This repository contains the benchmarking experiments for QligFEPv2, a new iteration of the QligFEP software for relative binding free energy (RBFE) calculations. QligFEP is a Python-based tool that automates the setup, execution, and analysis of free energy perturbation (FEP) calculations using the Q simulation package.
Original QligFEP publication: Jespers, W., Esguerra, M., Åqvist, J., Gutiérrez-de-Terán, H., QligFEP: an automated workflow for small molecule free energy calculations in Q. J Cheminform 11, 26 (2019). https://doi.org/10.1186/s13321-019-0348-5
📄 Manuscript describing this work is coming soon.
This benchmarking study evaluates QligFEPv2 performance across 16 protein-ligand systems from two established datasets:
- JACS Benchmark Set (8 targets): BACE1, CDK2, JNK1, MCL1, P38, PTP1B, Thrombin, TYK2
- Merck Public Dataset (8 targets): CDK8, cMET, EG5, HIF-2α, PFKFB3, SHP2, SYK, TNKS2
The repository provides:
- Starting structures and preparation workflows
- Complete FEP setup and analysis scripts
- Interactive visualization dashboard
- Comprehensive performance analysis notebooks
- Detailed performance metrics and regression plots
Explore the benchmarking results through an interactive Dash web application that visualizes perturbation networks, molecular structures, and performance metrics for each target.
Install dependencies:

    python -m pip install git+https://github.com/David-Araripe/SFC_FreeEnergyCorrection.git \
        git+https://github.com/David-Araripe/Weighted_cc.git \
        git+https://github.com/David-Araripe/chemFilters.git \
        dash cinnabar dash-molstar dash-bootstrap-components \
        statannotations statsmodels fastparquet tabulate pyfonts

Launch the dashboard:

    python app.py

The dashboard provides:
- Interactive perturbation network graphs
- 3D molecular visualization using Molstar (everburstSun/dash-molstar)
- Experimental vs. calculated ΔΔG regression plots
- Statistical performance metrics (RMSE, MUE, Kendall's τ)
├── perturbations/                          # Input structures and FEP setup files
│   └── <target>/
│       ├── protein.pdb                     # Prepared protein structure
│       ├── water.pdb                       # Equilibrated water sphere
│       ├── ligands.sdf                     # All ligands for the target
│       ├── mapping.json                    # Perturbation network definition
│       ├── <ligand>.lib                    # Q library files (force field parameters)
│       ├── <ligand>.prm                    # Q parameter files
│       ├── <ligand>.pdb                    # Individual ligand structures
│       ├── prepare.sh                      # SLURM script to set up FEP calculations
│       └── analyze.sh                      # SLURM script to analyze results
│
├── results/                                # Analyzed FEP results
│   └── <target>/
│       ├── <target>_FEP_results.json       # Raw FEP energies
│       ├── <target>_dgBar_verbose.parquet  # Verbose output from qfep
│       ├── <target>_run_data.parquet       # Runtime data (all replicates)
│       └── mapping_ddG.json                # Network with calculated ΔΔG
│
├── startFiles/                             # Raw inputs, data preparation notebooks
├── figures/                                # Figures for manuscript
├── cache/                                  # Cached processed data for dashboard
├── app.py                                  # Interactive Dash visualization dashboard
└── results_check.ipynb                     # Main analysis notebook with all metrics
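
The results/ layout above implies a fixed set of files per target. As a minimal sketch, the following checks which of those files are present before analysis or plotting. The helper names `expected_result_files` and `missing_result_files` are hypothetical and not part of this repository; only the file name patterns are taken from the tree above.

```python
from pathlib import Path

# File name patterns copied from the results/ layout; "{target}" is the target name.
RESULT_PATTERNS = (
    "{target}_FEP_results.json",
    "{target}_dgBar_verbose.parquet",
    "{target}_run_data.parquet",
    "mapping_ddG.json",
)


def expected_result_files(target, results_root="results"):
    """Return the result file paths expected for one target (hypothetical helper)."""
    base = Path(results_root) / target
    return [base / pattern.format(target=target) for pattern in RESULT_PATTERNS]


def missing_result_files(target, results_root="results"):
    """Return the expected result files that do not exist on disk."""
    return [p for p in expected_result_files(target, results_root) if not p.is_file()]
```

Calling `missing_result_files("<target>")` from the repository root returns an empty list when all four files are in place. The exact spelling of each target directory name should be checked against the repository.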
The starting structures are derived from the IndustryBenchmarks2024 repository (Zenodo), with specific modifications detailed in PROT_PREPARATION.md.
The startFiles/ directory contains Jupyter notebooks documenting the complete preparation workflow:
| Notebook | Description |
|---|---|
| extract_data.ipynb | Downloads structures from IndustryBenchmarks2024 repository |
| ligand_alignment.ipynb | Aligns ligand structures to reference conformations |
| rename_and_prepare_pdbs.ipynb | Standardizes atom naming and generates water spheres with qprep |
| perturbation_mapping.ipynb | Creates perturbation network mappings using automatic algorithms |
| restraint_check.ipynb | Validates restraint selection for each perturbation |
| system_sizes_and_total_compute.ipynb | Analyzes system sizes and computational requirements |
Navigate to a target directory and run the preparation script:

    cd perturbations/<target>
    sbatch prepare.sh  # or run setupFEP command directly

The prepare.sh script:
- Splits the multi-molecule SDF file into individual ligand files
- Runs setupFEP to generate FEP input files with appropriate restraints
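
The first step, splitting a multi-molecule SDF file into one file per ligand, can be illustrated in plain Python. This is a sketch and not the code that prepare.sh runs; the function names `split_sdf` and `write_ligand_files` are hypothetical. It relies only on the SDF convention that each record ends with a `$$$$` line and starts with the molecule title, and it assumes titles are unique, non-empty, and usable as file names.

```python
from pathlib import Path


def split_sdf(sdf_text):
    """Split multi-molecule SDF text into (title, record) pairs.

    Each SDF record ends with a line containing only "$$$$"; the first
    line of a record is the molecule title. Trailing text without a
    closing "$$$$" line is ignored.
    """
    records, current = [], []
    for line in sdf_text.splitlines():
        current.append(line)
        if line.strip() == "$$$$":
            title = current[0].strip()
            records.append((title, "\n".join(current) + "\n"))
            current = []
    return records


def write_ligand_files(sdf_path, out_dir):
    """Write one <title>.sdf file per record and return the written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for title, record in split_sdf(Path(sdf_path).read_text()):
        path = out_dir / f"{title}.sdf"
        path.write_text(record)
        written.append(path)
    return written
```

For example, `write_ligand_files("ligands.sdf", "ligands_split")` would write one file per ligand into a `ligands_split` directory; that directory name is chosen here for illustration only.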
After simulations complete, analyze results:

    cd perturbations/<target>
    sbatch analyze.sh  # or run qligfep_analyze command directly

The analyze.sh script runs qligfep_analyze to:
- Process FEP trajectories using the Gbar estimator (Bennett acceptance ratio, BAR, as reported by qfep)
- Calculate ΔΔG values with statistical uncertainties
- Generate detailed results in parquet and JSON formats
- Move results to the appropriate directory
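
To make the ΔΔG and uncertainty step concrete, here is a minimal sketch of one common convention: average the replicate ΔG values of each leg, take the standard error of the mean as the uncertainty, and combine the two legs in quadrature. This is an assumption for illustration, not a description of what qligfep_analyze computes internally, and the function names are hypothetical.

```python
import math
import statistics


def mean_and_sem(values):
    """Return the mean and standard error of the mean of replicate values."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates to estimate an uncertainty")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def relative_binding_ddg(dg_protein, dg_water):
    """ddG = dG(protein leg) - dG(water leg), with SEMs added in quadrature.

    Both arguments are lists of per-replicate free energies (e.g. kcal/mol)
    for the same ligand pair. Treating the two legs as independent is an
    assumption of this sketch.
    """
    mean_p, sem_p = mean_and_sem(dg_protein)
    mean_w, sem_w = mean_and_sem(dg_water)
    return mean_p - mean_w, math.hypot(sem_p, sem_w)
```

The input values would come from the per-replicate data of each perturbation; how those are stored in the parquet files is not covered here.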
Note: Target-specific setup commands with restraint strategies are documented in perturbations/commands.md.
Comprehensive performance metrics and analysis are available in:
- results/README.md: Performance tables for ΔΔG and ΔG predictions across all targets, including:
  - Kendall's τ (correlation coefficient)
  - RMSE (root mean square error)
  - MUE (mean unsigned error)
  - Regression plots for each target
- results_check.ipynb: Interactive analysis notebook containing:
  - Detailed statistical comparisons across force fields (QligFEP, OPLS3e, PMX-Sage 2.0)
  - Stripplots and stacked metrics visualizations
  - Computational performance analysis
  - System size distributions
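
For readers who want to recompute the reported metrics on their own data, the following is a dependency-free sketch of RMSE, MUE, and Kendall's τ for paired experimental and calculated values. The function names are hypothetical, and the tau-b variant (which corrects for ties) is an assumption; the notebooks in this repository may use a library implementation or a different variant.

```python
import math


def rmse(exp, calc):
    """Root mean square error between experimental and calculated values."""
    return math.sqrt(sum((c - e) ** 2 for e, c in zip(exp, calc)) / len(exp))


def mue(exp, calc):
    """Mean unsigned error between experimental and calculated values."""
    return sum(abs(c - e) for e, c in zip(exp, calc)) / len(exp)


def kendall_tau_b(exp, calc):
    """Kendall's tau-b rank correlation, with correction for ties."""
    concordant = discordant = ties_exp = ties_calc = 0
    n = len(exp)
    for i in range(n):
        for j in range(i + 1, n):
            d_exp = exp[i] - exp[j]
            d_calc = calc[i] - calc[j]
            if d_exp == 0 and d_calc == 0:
                continue  # tied in both variables: excluded from both terms
            if d_exp == 0:
                ties_exp += 1
            elif d_calc == 0:
                ties_calc += 1
            elif d_exp * d_calc > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    denom = math.sqrt((pairs + ties_exp) * (pairs + ties_calc))
    return (concordant - discordant) / denom if denom else float("nan")
```

The pairwise loop is quadratic in the number of data points, which is acceptable for the tens of ligands or edges typical of a single target.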
If you use this data or code, please cite:
- QligFEP original paper: Jespers, W., Esguerra, M., Åqvist, J., Gutiérrez-de-Terán, H., QligFEP: an automated workflow for small molecule free energy calculations in Q. J Cheminform 11, 26 (2019). https://doi.org/10.1186/s13321-019-0348-5
- IndustryBenchmarks2024: Baumann H., Alibay I., Horton J., Ries B., Henry M., et al., OpenFreeEnergy/IndustryBenchmarks2024: v1.0.0 (v1.0.0). Zenodo. (2025) https://doi.org/10.5281/zenodo.17245550
- Manuscript for this work: Coming soon
See LICENSE file for details.
For questions about this benchmarking study or QligFEPv2, please open an issue in this repository.
