A Nextflow plugin for comprehensive file tracking and reporting, providing detailed information about workflow outputs including both named (emit) and unnamed outputs with multi-cloud storage support.
- File Output Tracking: Monitors all process outputs during workflow execution
- Named Output Support: Tracks outputs with emit names for easy identification
- Individual Process Reports: Generates individual JSON reports for each process/tag combination
- Published File Tracking: Automatically tracks both work directory and published file locations
- JSON Reporting: Generates structured JSON reports with file locations and metadata
- Collated Reports: Option to generate single consolidated report for entire workflow
- Multi-Cloud Storage Support: Built-in support for multiple cloud storage backends:
  - Amazon S3: s3://bucket/path
  - Google Cloud Storage: gs://bucket/path
  - Azure Blob Storage: azure://container/path
  - Latch Data: latch://workspace.account/path
  - Local filesystem: /local/path
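
The location formats listed above differ only in their URI scheme, so a script that consumes the reports can tell the backends apart by inspecting that scheme. The following is a minimal sketch of that idea, not the plugin's own logic; the function name `storage_backend` and the returned labels are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Scheme-to-backend table taken from the list of supported locations.
# The labels are illustrative and are not emitted by the plugin itself.
BACKENDS = {
    "s3": "Amazon S3",
    "gs": "Google Cloud Storage",
    "azure": "Azure Blob Storage",
    "latch": "Latch Data",
}


def storage_backend(location: str) -> str:
    """Classify a path or URI by its scheme (hypothetical helper)."""
    scheme = urlparse(location).scheme
    if not scheme:
        # A plain path such as /local/path has no scheme.
        return "Local filesystem"
    return BACKENDS.get(scheme, "Unknown")
```

Schemes outside the documented list are reported as "Unknown" rather than guessed.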
- Prerequisites:
  - Java 11+ (tested with Java 17)
  - Gradle 7.0+
  - Nextflow 24.10.0+
  - Git
- Clone the repository:
    git clone https://github.com/theiagen/nf-theia.git
    cd nf-theia
- Build the plugin:
    ./gradlew build
- Create plugin distribution (optional):
    ./gradlew makeZip  # Creates nf-theia-0.1.0.zip in build/libs/
- Install plugin locally for use:
    ./gradlew installPlugin  # Installs to ~/.nextflow/plugins/
- Test repository (alternative for development):
    export NXF_PLUGINS_TEST_REPOSITORY="https://github.com/theiagen/nf-theia/releases/download/v0.2.3/nf-theia-0.2.3-meta.json"
- Manual plugin installation:
  - The plugin must be extracted as a folder in ~/.nextflow/plugins/
  - Use ./gradlew installPlugin for local development builds (recommended)

Add the plugin configuration to your nextflow.config file:
plugins {
    id 'nf-theia@0.2.3'
}

theia {
    fileReport {
        enabled = true
        collate = true
        workdir = true
        collatedFileName = "collated-workflow-files.json"
    }
}

- enabled (boolean): Enable/disable file reporting (default: false)
- collate (boolean): Generate a single collated report file (default: false)
- workdir (boolean): Write the JSON files to the work directory (default: false)
- collatedFileName (string): Name of the collated report file (default: "collated-workflow-files.json")
Note: Ensure your environment has proper authentication configured for your chosen cloud storage provider.
Run your workflow as normal:
nextflow run your_workflow.nf

Check the generated reports:
- Individual JSON reports are generated for each process/tag in their respective publishDir locations
- If collate = true, a single consolidated JSON file is created in the root publishDir location with all file information
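
After a run, the reports can be located by walking the publish directory. The sketch below assumes the collated report uses the default name from the configuration above and that every other .json file under the directory is an individual report; both are assumptions, and the helper name `split_reports` is hypothetical. A pipeline that publishes its own JSON outputs would need a stricter filter.

```python
from pathlib import Path


def split_reports(publish_root, collated_name="collated-workflow-files.json"):
    """Return (individual_reports, collated_reports) found under publish_root.

    Assumption: any .json file that is not named `collated_name` is treated
    as an individual process/tag report.
    """
    individual, collated = [], []
    for path in sorted(Path(publish_root).rglob("*.json")):
        if path.name == collated_name:
            collated.append(path)
        else:
            individual.append(path)
    return individual, collated
```

For example, `split_reports("results")` would scan the results directory used by the example workflow.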
#!/usr/bin/env nextflow

process ANALYZE_DATA {
    publishDir 'results/analysis', mode: 'copy'
    tag "${sample_id}"

    input:
    val sample_id

    output:
    path "${sample_id}_analysis.txt", emit: analysis
    path "${sample_id}_summary.txt", emit: summary
    path "${sample_id}_raw.txt"

    script:
    """
    echo "Analysis results for ${sample_id}" > ${sample_id}_analysis.txt
    echo "Summary data for ${sample_id}" > ${sample_id}_summary.txt
    echo "Raw output for ${sample_id}" > ${sample_id}_raw.txt
    """
}

workflow {
    samples = Channel.of('sample_001', 'sample_002')
    ANALYZE_DATA(samples)
}

With the plugin enabled, this will generate:
- Individual JSON files: results/analysis/ANALYZE_DATA_sample_001.json, ANALYZE_DATA_sample_002.json
- Collated JSON file: results/workflow_files.json (if collate = true)
- Named outputs: analysis and summary
- Unnamed output: output_2 (for raw file)
- Published file paths: Automatically tracked in publishedFiles arrays
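
The names in the list above appear to follow a simple pattern: the report file is named after the process and tag, and an unnamed output is keyed by its position in the output block. The sketch below reproduces that pattern as inferred from this single example. The zero-based index and both helper names are assumptions, and the plugin may treat unusual tags or process names differently.

```python
def report_filename(process, tag):
    """Expected individual report name, e.g. ANALYZE_DATA_sample_001.json."""
    return f"{process}_{tag}.json"


def output_keys(emit_names):
    """Map declared outputs to report keys.

    `emit_names` lists the emit name of each output in declaration order,
    with None for outputs declared without emit. Unnamed outputs are assumed
    to be keyed as output_<zero-based position>.
    """
    return [
        name if name is not None else f"output_{index}"
        for index, name in enumerate(emit_names)
    ]
```

For the ANALYZE_DATA process, the third output has no emit name, which is why it appears as output_2.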
The generated JSON report contains:
{
  "workflow": {
    "totalTasks": 2,
    "timestamp": "Sat Jul 19 22:00:29 BST 2025"
  },
  "tasks": [
    {
      "process": "ANALYZE_DATA",
      "tag": "sample_001",
      "taskName": "ANALYZE_DATA (sample_001)",
      "workDir": "/path/to/work/dir/sample_001",
      "outputs": {
        "analysis": {
          "workDirFiles": ["/path/to/work/dir/sample_001_analysis.txt"],
          "publishedFiles": ["/path/to/results/analysis/sample_001_analysis.txt"]
        },
        "summary": {
          "workDirFiles": ["/path/to/work/dir/sample_001_summary.txt"],
          "publishedFiles": ["/path/to/results/analysis/sample_001_summary.txt"]
        },
        "output_2": {
          "workDirFiles": ["/path/to/work/dir/sample_001_raw.txt"],
          "publishedFiles": ["/path/to/results/analysis/sample_001_raw.txt"]
        }
      },
      "timestamp": "Sat Jul 19 22:00:29 BST 2025"
    }
  ]
}

If no reports are generated:
- Verify theia.fileReport.enabled = true in your config
- Check that your workflow actually produces outputs
- Ensure you have write permissions to the output directory
- Check the Nextflow log for plugin loading messages
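
When a report does exist but looks incomplete, summarizing it can help narrow down which outputs were tracked. The sketch below assumes the report structure shown in the JSON example (tasks, outputs, workDirFiles, publishedFiles); the helper name `summarize_report` is hypothetical and missing keys are treated as empty.

```python
import json


def summarize_report(report):
    """Return one (task name, output key, work files, published files) row
    per output, where the last two fields are file counts."""
    rows = []
    for task in report.get("tasks", []):
        for key, files in task.get("outputs", {}).items():
            rows.append((
                task.get("taskName"),
                key,
                len(files.get("workDirFiles", [])),
                len(files.get("publishedFiles", [])),
            ))
    return rows


def load_report(path):
    """Read a report file from disk and parse it as JSON."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

A row with a published-file count of zero suggests the output was produced in the work directory but not published, which is worth checking against the process's publishDir setting.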
If you encounter build issues:
- Java Version Compatibility:
    java -version  # Should be Java 11+
- Clean and rebuild:
    ./gradlew clean build
- Dependency issues:
    ./gradlew dependencies --configuration runtimeClasspath

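The Java version check above can also be scripted, for instance in a setup script that runs before the Gradle build. This is a minimal sketch under the assumption that `java` is on the PATH; the helper names are hypothetical. It handles both the legacy `1.8.0_301` style and the modern `17.0.8` style of version string.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    major = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x (for example 1.8.0_301).
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Run `java -version` and return the major version of that JVM."""
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    # `java -version` writes its banner to stderr, so inspect both streams.
    return parse_java_major(result.stderr + result.stdout)
```

A build script could then compare `installed_java_major()` against 11, the minimum listed in the prerequisites.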
- ✅ Core functionality: File tracking and JSON reporting
- ✅ Multi-cloud support: S3, GCS, Azure, Latch, and local storage
- ✅ Named outputs: Full support for emit parameters
- ✅ Collated reports: Workflow-level consolidated reports
- ✅ Build system: Standard Nextflow plugin build pipeline
- 🚧 Plugin registry: Not yet published to official registry
- 🚧 Test coverage: Basic integration tests available
- Issues: Report bugs and request features via GitHub Issues
Licensed under the Apache License, Version 2.0. See LICENSE file for details.