OntoSplit

This repository contains the dataset, experimental scripts, and results of a study on ontology partitioning and SPARQL query optimization. The research aims to reduce the execution time of complex SPARQL queries by splitting large RDF/XML ontologies into parts and running queries over those parts in parallel in Apache Jena Fuseki.
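A minimal sketch of the parallel-execution idea. It assumes that each ontology partition is loaded into its own Fuseki dataset and queried through the standard SPARQL 1.1 Protocol (HTTP POST with a form-encoded `query` parameter). The endpoint URLs and function names are hypothetical, and simply concatenating per-partition bindings is only correct for queries whose matches never span two partitions. This is not the repository's own benchmarking code.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical endpoints: one Fuseki dataset per ontology partition.
ENDPOINTS = [f"http://localhost:3030/part{i}/sparql" for i in range(1, 4)]


def http_select(endpoint: str, query: str) -> List[Dict]:
    """POST a SELECT query (SPARQL 1.1 Protocol) and return its JSON result bindings."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    req = urllib.request.Request(
        endpoint,
        data=data,  # supplying data makes urllib send a form-encoded POST
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["results"]["bindings"]


def parallel_select(
    query: str,
    endpoints: List[str],
    send: Callable[[str, str], List[Dict]] = http_select,
) -> List[Dict]:
    """Run the same query on every partition concurrently and concatenate the rows.

    Assumption: no solution needs triples from more than one partition.
    """
    merged: List[Dict] = []
    with ThreadPoolExecutor(max_workers=max(1, len(endpoints))) as pool:
        # pool.map yields results in the same order as `endpoints`.
        for rows in pool.map(lambda ep: send(ep, query), endpoints):
            merged.extend(rows)
    return merged
```

Usage, against running Fuseki datasets: `rows = parallel_select("SELECT ?s WHERE { ?s ?p ?o } LIMIT 10", ENDPOINTS)`. The `send` parameter is there so the HTTP call can be swapped out, for example for testing or for a different client library.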

📂 Contents:

📂 ./data – all ontology files (RDF/XML structures), both original and partitioned, plus sample datasets and other resources needed for experiments and demonstrations
📊 ./benchmarking-data – experimental data
📊 ./benchmarking-data/benchmark.xlsx – final test results (tables and charts)
📜 ./benchmarking-data/sparql-queries – test SPARQL queries grouped by execution time (1 = fast, 2 = medium, 3 = slow)
📜 ./benchmarking-data/results-time – JSON files with the execution times of the SPARQL queries in each category (1 = fast, 2 = medium, 3 = slow) across ontology partition configurations of 1 to 15 parts
🔧 ./benchmarking-data/scripts – Python scripts for running the benchmarks and calculating the results
🔧 ./scripts/ontology-creation – Python scripts for ontology creation (PDF to JSON; JSON to RDF/XML ontology with different splitting options)
📕 ./parsed-pdfs-json – files related to the dataset PDFs, including the original PDFs (optional) and the JSON output of the parsing scripts
📖 ./docs/ – methodology, findings, and implementation details (TODO)
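The actual benchmark scripts are in `./benchmarking-data/scripts`. As a hedged illustration of the kind of measurement stored in `results-time`, the sketch below times repeated runs of a query callable with `time.perf_counter` and writes summary statistics to a JSON file. The function names, JSON keys, and file naming are assumptions, not the repository's real format.

```python
import json
import statistics
import time
from pathlib import Path
from typing import Callable, Dict


def time_query(run: Callable[[], object], repeats: int = 5, warmup: int = 1) -> Dict[str, float]:
    """Call `run` repeatedly and return wall-clock statistics in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    for _ in range(warmup):
        run()  # warm-up calls are not timed (JVM warm-up, caches, connections)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return {
        "runs": repeats,
        "mean_s": statistics.mean(samples),
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
    }


def save_result(stats: Dict[str, float], category: int, parts: int, out_dir: Path) -> Path:
    """Write one measurement to JSON, e.g. cat2_parts05.json (hypothetical naming)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"cat{category}_parts{parts:02d}.json"
    record = {"category": category, "parts": parts, **stats}
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path
```

Here `run` would wrap a single query execution, so one `time_query` call per (query category, number of parts) pair produces one record.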

🚀 Sponsor this project

Please support @malakhovks. Despite the war in Ukraine, R&D in Digital Health and Ontology Engineering is being resumed:

Via credit card: https://send.monobank.ua/jar/5ad56oNAcD

Public Address to Receive USDT (BEP20): 0x1128A7b84728123dd4F55176c378754Dd396A674

Pay me via Trust Wallet: https://link.trustwallet.com/send?asset=c20000714_t0x55d398326f99059fF775485246999027B3197955&address=0x1128A7b84728123dd4F55176c378754Dd396A674

🔍 Key Topics:

SPARQL query optimization
Ontology partitioning (sharding)
Parallel query execution
Apache Jena Fuseki performance benchmarking
Semantic Web & RDF processing
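The splitting options actually used are implemented in `./scripts/ontology-creation`. As an illustration of one common sharding technique (not necessarily the one used in the study), the sketch below assigns triples to N parts using a stable hash of their subject. This keeps all statements about a resource in the same part. Triples are plain string tuples to keep the example dependency-free; in practice they might come from an RDF parser such as rdflib. All names here are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def part_index(subject: str, n_parts: int) -> int:
    """Map a subject IRI to a partition index.

    hashlib is used instead of the built-in hash(), which is salted per process
    for strings, so the same subject always lands in the same part across runs.
    """
    digest = hashlib.sha256(subject.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_parts


def partition_by_subject(triples: Iterable[Triple], n_parts: int) -> Dict[int, List[Triple]]:
    """Split triples into up to n_parts groups; triples sharing a subject stay together."""
    if n_parts < 1:
        raise ValueError("n_parts must be >= 1")
    parts: Dict[int, List[Triple]] = defaultdict(list)
    for s, p, o in triples:
        parts[part_index(s, n_parts)].append((s, p, o))
    return dict(parts)
```

With subject-based sharding, queries that only touch the properties of one resource can be answered inside a single part. Joins that follow links between resources may need matches from several parts, which is the main limitation of merging per-part results naively.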

🚀 Future Work:

The repository will be updated with further optimizations, including machine learning-based query performance prediction and dynamic ontology partitioning.

Contributions and discussions are welcome!

📕 Dataset

EBSCO articles dataset (domain: rehabilitation medicine), plus the JSON representation of every article:

wget -O ./ebsco-rehabilitation-dataset.zip https://cdn.e-rehab.pp.ua/u/ebsco-rehabilitation-dataset.zip

💳 Funding

This study would not have been possible without the financial support of the National Research Foundation of Ukraine (Open Funder Registry: 10.13039/100018227). Our work was funded under the grant contract:

Development of the cloud-based platform for patient-centered telerehabilitation of oncology patients with mathematical-related modeling, application ID: 2021.01/0136
Link at the National Repository of Academic Texts № 0225U001069
Link at the National Repository of Academic Texts № 0224U000467
The research was also conducted within the framework of the scientific and technical project "To develop theoretical foundations and a functional model of a computer for processing complex information structures". Link at the National Repository of Academic Texts № 0124U002317
The research was also part of the scientific and technical project "Develop means of supporting virtualization technologies and their use in computer engineering and other applications". Link at the National Repository of Academic Texts № 0124U001826

📖 How to Cite / BibTex

If you use this repository in your research, please cite it as follows:

🔹 Citation format for article:

@article{palagin_2025_ontosplit,
  title={A Method for Enhancing the Efficiency of RDF/XML-Structure Processing in the Apache Jena Semantic Web Framework},
  author={Palagin, O. V. and Petrenko, M. G. and Kaverinskiy, V. V. and Malakhov, K. S.},
  journal={Cybernetics and Systems Analysis},
  ISSN={1573-8337},
  volume={61},
  number={3},
  pages={469--486},
  DOI={10.1007/s10559-025-00784-w},
  year={2025}
}

🔹 Citation format for repository:

@misc{OntoSplit,
  author = {Kyrylo Malakhov and Vladislav Kaverinskiy},
  title = {OntoSplit: Ontology Partitioning and SPARQL Query Optimization},
  year = {2025},
  howpublished = {GitHub Repository},
  url = {https://github.com/knowledge-ukraine/OntoSplit}
}
