OntopStream performance evaluation

Considerations

This is a performance evaluation notebook to assess OntopStream's behaviour as the data volume increases. The dockerized infrastructure simulates an entire analytical pipeline based on the "New York Taxi dataset" from the Ververica sql-training repository. The data streams, managed in Apache Kafka, are ingested by Flink and accessed through the S-OBDA paradigm with OntopStream, as shown in the figure below.

The evaluation notebook includes seven RSP-QL queries that test several aspects of OntopStream, with the output limited to 200,000 RDF results. For each query, the notebook computes the difference between the start time, recorded before sending the HTTP request, and the arrival time of each result received from the OntopStream remote endpoint. The table below shows the average time in milliseconds (5 runs per query) to fetch n thousand RDF results, measured on an AWS t3.xlarge machine with 4 vCPUs and 16 GB of memory.

Query	1K	10K	50K	100K	150K	200K
Q1	2050	2718	4340	5996	7768	9501
Q2	1685	2760	4729	7029	9326	11713
Q3	3498	4281	7211	10815	14491	18043
Q4	3695	4900	8491	13439	18282	25552
Q5	3656	5100	10365	17738	25263	33592
Q(s,o)	3368	4318	8060	12836	16719	20982
Q(s,p,o)	5377	7112	14418	23188	31953	40991
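The timing method described above can be approximated in bash. This is a minimal sketch, not the notebook's code: it assumes GNU `date` (for the `%N` nanosecond field) and an endpoint that streams one result per line, and the `stamp_results` function is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<result index> <elapsed ms>" for each line read
# from stdin, measured against a start time given in nanoseconds since the epoch.
# Assumes GNU date, which supports the %N (nanoseconds) field.
stamp_results() {
  local start_ns=$1 n=0 now_ns
  while IFS= read -r _; do
    now_ns=$(date +%s%N)
    n=$((n + 1))
    # Convert the nanosecond difference to whole milliseconds.
    printf '%d %d\n' "$n" $(( (now_ns - start_ns) / 1000000 ))
  done
}
```

A hypothetical invocation, where `ENDPOINT_URL` stands for the OntopStream query URL: `start=$(date +%s%N); curl -sN "$ENDPOINT_URL" | stamp_results "$start"`. Taking `start` before `curl` runs mirrors the rule that the start time is recorded before the request is sent; averaging over the 5 runs is left to whatever consumes the output.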

Queries Q1 and Q2 test simple mappings involving one and two Flink dynamic tables, respectively. Q3 demonstrates Ontop's stream reasoning capabilities: each ?Taxi variable is mapped as :Taxi, but the reasoner can infer that ?Taxi is also a :Car because the ontology states that :Taxi rdfs:subClassOf :Car. Q4 tests the FILTER() query condition. Response times grow faster in Q4 because the volume of data processed before filtering is much greater than the volume of results returned.
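The Q3 inference follows the RDFS subclass rule: if x is of type C and C rdfs:subClassOf D, then x is also of type D. The toy sketch below applies that rule by materializing the inferred type triples from a whitespace-separated triple list. It only illustrates the entailment; Ontop does not materialize triples like this but takes the ontology into account when translating queries over the mappings. The `infer_types` function and the individual `:taxi1` are hypothetical.

```bash
#!/usr/bin/env bash
# Toy RDFS subclass entailment (rule: x a C, C rdfs:subClassOf D => x a D).
# Input: one "subject predicate object" triple per line, where the predicate
# is either "a" (rdf:type) or "rdfs:subClassOf". Output: all type triples,
# asserted and inferred, sorted.
infer_types() {
  awk '
    $2 == "rdfs:subClassOf" { sup[$1] = sup[$1] " " $3; next }
    $2 == "a"               { type[$1 SUBSEP $3] = 1 }
    END {
      # Repeat passes until no new type is derived, so chains of
      # subclasses are followed transitively.
      do {
        changed = 0
        split("", fresh)
        for (k in type) {
          split(k, p, SUBSEP)
          n = split(sup[p[2]], parents, " ")
          for (i = 1; i <= n; i++) {
            nk = p[1] SUBSEP parents[i]
            if (!(nk in type)) fresh[nk] = 1
          }
        }
        # Merge after the scan instead of mutating type while iterating it.
        for (nk in fresh) { type[nk] = 1; changed = 1 }
      } while (changed)
      for (k in type) { split(k, p, SUBSEP); print p[1], "a", p[2] }
    }' | sort
}

# Example with the axiom from the text and a hypothetical individual :taxi1.
printf '%s\n' ':Taxi rdfs:subClassOf :Car' ':taxi1 a :Taxi' | infer_types
# prints:
# :taxi1 a :Car
# :taxi1 a :Taxi
```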

Moving to more complex queries, Q5 involves three mapped Flink dynamic tables and a more complex VKG query translation. Thanks to the OWL 2 QL compliance of Ontop's internal engine, Q(s,o) retrieves all RDF subjects and objects for a fixed predicate, while Q(s,p,o) is the full RDF materialization of the dataset under the OWL 2 QL entailment regime.
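One way to compare how quickly the response times grow, including for the heavier Q5 and Q(s,p,o) queries, is the average extra time per additional 1,000 results between the 100K and 200K columns. The sketch below computes it from the figures in the results table, assuming they are milliseconds; the `marginal_cost` function name is illustrative.

```bash
#!/usr/bin/env bash
# Average extra milliseconds per additional 1,000 results between the
# 100K and 200K measurements (fields 5 and 7 of each row; 100 steps of 1K).
marginal_cost() {
  awk 'NR > 1 { printf "%s %.2f\n", $1, ($7 - $5) / 100 }'
}

# Rows copied from the results table (header first, whitespace-separated).
table='Query 1K 10K 50K 100K 150K 200K
Q1 2050 2718 4340 5996 7768 9501
Q2 1685 2760 4729 7029 9326 11713
Q3 3498 4281 7211 10815 14491 18043
Q4 3695 4900 8491 13439 18282 25552
Q5 3656 5100 10365 17738 25263 33592
Q(s,o) 3368 4318 8060 12836 16719 20982
Q(s,p,o) 5377 7112 14418 23188 31953 40991'

printf '%s\n' "$table" | marginal_cost
```

With these figures it yields about 35 ms per extra 1,000 results for Q1, 121 for Q4, 159 for Q5 and 178 for Q(s,p,o).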

Test execution

1. Start the producer (Kafka and Flink)

sudo docker-compose -f flink-kafka.yml up -d
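If you script the setup instead of typing the commands, a small wrapper can start the producer and then list the services it created. `docker-compose ps` is a standard Compose subcommand; the `run` and `start_producer` helpers and the `DRY_RUN` switch are hypothetical additions that print the commands instead of executing them, which is handy for checking a script before running it with sudo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

start_producer() {
  # Start Kafka and Flink in the background.
  run sudo docker-compose -f flink-kafka.yml up -d
  # Show the state of the services defined in the same Compose file.
  run sudo docker-compose -f flink-kafka.yml ps
}

# Example: DRY_RUN=1 start_producer   (prints both commands, runs nothing)
```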

2. Start the REST endpoint

Connect to the sql-client remote terminal

sudo docker-compose -f flink-kafka.yml exec sql-client bash

Start the REST endpoint service

/opt/flink-sql-gateway-0.2-SNAPSHOT/bin/sql-gateway.sh --library /opt/sql-client/lib

Note(1): keep this terminal window open for as long as you need the service. To stop the REST service, press CTRL+C, then type exit to leave the terminal.

Note(2): the JDBC driver mappings are persisted in the local file sql-gateway-defaults.yaml, which is automatically loaded at startup in the sql-client Docker image.
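Because the gateway runs in the foreground of that terminal, a script in another terminal may need to wait until it accepts connections before starting OntopStream. The sketch below polls a TCP port through bash's `/dev/tcp` redirection. The port is an assumption: 8083 is, as far as we know, the default of the ververica flink-sql-gateway, but check `sql-gateway-defaults.yaml` and the Compose port mappings for the real value and host. The `wait_for_port` function is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed once host:port accepts a TCP connection,
# fail after the given number of seconds. Relies on bash's /dev/tcp.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-60}
  local deadline=$((SECONDS + timeout))
  while (( SECONDS < deadline )); do
    # Opening the pseudo-device attempts a TCP connect; the subshell closes it.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Example (assumed port): wait_for_port localhost 8083 120 && echo "gateway is up"
```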

3. Start OntopStream

Open a new terminal window, then run the command:

sudo docker-compose -f ontop.yml up -d

Note: the DB-descriptive ontology and the Ontop mapping file are stored in the taxiRides folder.

4. Start Jupyter

sudo docker-compose -f jupyter.yml up

Load the notebook OntopStream_performance_evaluation.ipynb in the Jupyter environment.

Run all the test queries in the notebook.

5. Stop the demo environment

sudo docker-compose -f jupyter.yml down

sudo docker-compose -f ontop.yml down

sudo docker-compose -f flink-kafka.yml down
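The three teardown commands above can be wrapped in one function that stops the stacks in the reverse of their start order. The `stop_all` name and the `DRY_RUN` switch (which prints the commands instead of running them) are hypothetical; `docker-compose down` itself stops and removes the containers and networks created by `up`.

```bash
#!/usr/bin/env bash
# Compose files in the order they are started in this guide.
STACKS=(flink-kafka.yml ontop.yml jupyter.yml)

stop_all() {
  local i
  # Walk the array backwards so consumers stop before the producer.
  for (( i = ${#STACKS[@]} - 1; i >= 0; i-- )); do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "sudo docker-compose -f ${STACKS[i]} down"
    else
      sudo docker-compose -f "${STACKS[i]}" down
    fi
  done
}

# Example: DRY_RUN=1 stop_all   (prints the three commands in stop order)
```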

chimera-suite/OntopStream-evaluation
