This performance evaluation notebook assesses OntopStream's behaviour as the data volume increases. The dockerized infrastructure simulates an entire analytical pipeline based on the "New York Taxi dataset" from the Ververica sql-training repository. The data streams, managed in Apache Kafka, are ingested by Flink and accessed through the S-OBDA paradigm with OntopStream, as shown in the figure below.
The evaluation notebook includes seven RSP-QL queries that test several aspects of OntopStream, with the output limited to 200,000 RDF results.
For each query, the notebook computes the difference between the start time, recorded before sending the HTTP request, and the arrival time of each result received from the OntopStream remote endpoint. The table below shows the average execution time (5 runs per query) for fetching n thousand RDF results, taken from a simulation on an AWS t3.xlarge machine with 4 vCPUs and 16 GB of memory.
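The measurement described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function names `time_to_checkpoints` and `average_runs` are hypothetical, the result stream is modelled as any Python iterable (in a real run it would presumably be the line stream of the HTTP response), and reporting in milliseconds is an assumption.

```python
import time
from typing import Callable, Dict, Iterable, List, Sequence


def time_to_checkpoints(
    results: Iterable[object],
    checkpoints: Sequence[int],
    start: float,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[int, float]:
    """Elapsed time (assumed milliseconds) from `start` to the n-th result.

    `start` must be taken from the same clock BEFORE the request is sent,
    so that connection and query translation time are included.
    """
    wanted = set(checkpoints)
    if not wanted:
        return {}
    last = max(wanted)
    elapsed: Dict[int, float] = {}
    for count, _ in enumerate(results, start=1):
        if count in wanted:
            elapsed[count] = (clock() - start) * 1000.0
        if count >= last:
            break  # stop consuming once the largest checkpoint is reached
    return elapsed


def average_runs(runs: List[Dict[int, float]]) -> Dict[int, float]:
    """Average the per-checkpoint timings over several runs of one query."""
    return {k: sum(r[k] for r in runs) / len(runs) for k in runs[0]}


if __name__ == "__main__":
    # Stand-in for a result stream: 200,000 dummy results.
    runs = []
    for _ in range(5):
        t0 = time.perf_counter()
        runs.append(time_to_checkpoints(iter(range(200_000)), [1_000, 200_000], t0))
    print(average_runs(runs))
```

Taking the start time before the request is issued means each checkpoint reflects end-to-end latency rather than only the streaming phase.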
| Query | 1K | 10K | 50K | 100K | 150K | 200K |
|---|---|---|---|---|---|---|
| Q1 | 2050 | 2718 | 4340 | 5996 | 7768 | 9501 |
| Q2 | 1685 | 2760 | 4729 | 7029 | 9326 | 11713 |
| Q3 | 3498 | 4281 | 7211 | 10815 | 14491 | 18043 |
| Q4 | 3695 | 4900 | 8491 | 13439 | 18282 | 25552 |
| Q5 | 3656 | 5100 | 10365 | 17738 | 25263 | 33592 |
| Q(s,o) | 3368 | 4318 | 8060 | 12836 | 16719 | 20982 |
| Q(s,p,o) | 5377 | 7112 | 14418 | 23188 | 31953 | 40991 |
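
To compare how quickly each query's cost grows, the marginal time per additional 1,000 results between the 100K and 200K columns can be derived from the table. The sketch below only re-uses the figures above; the helper name `marginal_per_1k` is hypothetical, and values stay in the table's own time unit.

```python
# Average times copied from the table above: (100K column, 200K column).
TIMES = {
    "Q1": (5996, 9501),
    "Q2": (7029, 11713),
    "Q3": (10815, 18043),
    "Q4": (13439, 25552),
    "Q5": (17738, 33592),
    "Q(s,o)": (12836, 20982),
    "Q(s,p,o)": (23188, 40991),
}


def marginal_per_1k(t_100k: float, t_200k: float) -> float:
    """Extra time per additional 1,000 results between 100K and 200K."""
    return (t_200k - t_100k) / 100.0  # 100 blocks of 1,000 results


if __name__ == "__main__":
    for query, (t100, t200) in TIMES.items():
        print(f"{query:9s} {marginal_per_1k(t100, t200):7.2f}")
```

Under this measure Q1 is the cheapest per additional block of results and Q(s,p,o) the most expensive, which matches the ordering of the 200K column.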
Queries Q1 and Q2 test simple mappings involving one and two Flink dynamic tables, respectively. Q3 demonstrates Ontop's stream reasoning capabilities: each ?Taxi binding is mapped as a :Taxi, but the reasoner also infers that it is a :Car, because the ontology states that :Taxi rdfs:subClassOf :Car. Q4 tests the FILTER() query condition. Response times grow faster in Q4 because the volume of data scanned before filtering is much larger than the volume of results returned.
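The entailment that Q3 relies on can be illustrated with a small, self-contained sketch. It only shows what follows from an rdfs:subClassOf axiom and does not reflect how Ontop computes it internally; the function names and the dictionary encoding of the ontology are assumptions made for illustration.

```python
from typing import Dict, Set

# The single axiom mentioned in the text: :Taxi rdfs:subClassOf :Car.
SUBCLASS_OF: Dict[str, Set[str]] = {":Taxi": {":Car"}}


def superclasses(cls: str, axioms: Dict[str, Set[str]]) -> Set[str]:
    """All classes reachable from `cls` by following rdfs:subClassOf."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in axioms.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(asserted: Set[str], axioms: Dict[str, Set[str]]) -> Set[str]:
    """Asserted types plus every type entailed by the subclass axioms."""
    result = set(asserted)
    for cls in asserted:
        result |= superclasses(cls, axioms)
    return result


if __name__ == "__main__":
    # An individual mapped only as :Taxi is also entailed to be a :Car.
    print(sorted(inferred_types({":Taxi"}, SUBCLASS_OF)))
```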
Moving to more complex queries, Q5 involves three mapped Flink dynamic tables and a more complicated VKG query translation. Thanks to the OWL 2 QL compliance of Ontop's internal engine, Q(s,o) can retrieve all the RDF subjects and objects for a fixed predicate, while Q(s,p,o) is the full RDF materialization of the dataset under the OWL 2 QL entailment regime.
sudo docker-compose -f flink-kafka.yml up -d
Connect to the sql-client remote terminal
sudo docker-compose -f flink-kafka.yml exec sql-client bash
Start the REST endpoint service
/opt/flink-sql-gateway-0.2-SNAPSHOT/bin/sql-gateway.sh --library /opt/sql-client/lib
Note(1): keep the terminal window open for as long as you need the service. To stop the REST service, press CTRL+C, then type exit to leave the terminal.
Note(2): the JDBC driver mappings are persisted in the local file sql-gateway-defaults.yaml, which is loaded automatically on startup in the sql-client Docker image.
Open a new terminal window, then run the command:
sudo docker-compose -f ontop.yml up -d
Note: the DB-descriptive ontology and the Ontop mapping file are stored in the taxiRides folder.
sudo docker-compose -f jupyter.yml up
Load the notebook OntopStream_preformance_evaluation.ipynb in the Jupyter environment.
Execute all the testing queries in the notebook.
sudo docker-compose -f jupyter.yml down
sudo docker-compose -f ontop.yml down
sudo docker-compose -f flink-kafka.yml down
