This is an AI-powered materials discovery system called "scientist" that uses DSPy to discover rare-earth-free permanent magnets. It combines large language models with computational materials science tools through the Ouro platform and GGen library to systematically generate, evaluate, mutate, and refine material candidates with full mutation history tracking.
The system follows a DSPy-based iterative discovery loop:
- Hypothesis Generation/Refinement (`GenerateMagnetHypothesis`, `RefineHypothesis`): Uses LLMs to generate scientific hypotheses about potential magnetic materials, incorporating mutation history feedback
- Material Design (`DesignMaterialCandidate`): Converts hypotheses into specific chemical compositions and crystal structures, or selects parent materials for mutation
- Mutation Strategy Selection (`SelectMutationStrategy`): AI-driven selection of mutation operations (scale lattice, substitute elements, jitter sites, etc.) based on performance analysis
- Structure Generation/Mutation (`ComputationalTools` + GGen): Generates new crystal structures from scratch or applies mutations to existing materials using the GGen library
- Computational Evaluation (`ComputationalTools`): Evaluates material properties using Ouro routes with mutation effect tracking
- Results Interpretation (`InterpretSimulationResults`): Analyzes computational results and extracts insights, including mutation effectiveness analysis
- `app.py`: Main application with the `MaterialDiscoveryScientist` DSPy module and mutation-aware discovery loop
- `tools.py`: `ComputationalTools` class that interfaces with the Ouro platform and GGen library for structure generation, mutation, and property evaluation
- `models.py`: Data structures (`Material`, `MaterialProperties`, `MutationRecord`) for representing materials, properties, and complete mutation lineage
- `publisher.py`: `Publisher` class for creating Ouro posts with discovery results, mutation analysis, and visualizations
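As a rough illustration of the data structures that `models.py` is described as holding, the sketch below uses plain dataclasses. Only the three class names come from this document. Every field name (`formula`, `space_group`, `curie_temperature_k`, and so on) is an assumption made for illustration and may not match the real module.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MaterialProperties:
    # Hypothetical property fields; the real class may track other quantities.
    magnetization_t: Optional[float] = None
    curie_temperature_k: Optional[float] = None
    formation_energy_ev: Optional[float] = None


@dataclass
class MutationRecord:
    # One step in a material's lineage (field names are assumptions).
    parent_id: str
    child_id: str
    mutation_type: str  # e.g. "scale_lattice"
    succeeded: bool


@dataclass
class Material:
    material_id: str
    formula: str
    space_group: Optional[int] = None
    properties: MaterialProperties = field(default_factory=MaterialProperties)
    # Ordered oldest-first: the chain of mutations that produced this material.
    mutation_history: list[MutationRecord] = field(default_factory=list)


# Illustrative values only: a parent and a child derived by one mutation.
parent = Material("m0", "Fe16N2", space_group=139)
record = MutationRecord("m0", "m1", "scale_lattice", succeeded=True)
child = Material(
    "m1",
    "Fe16N2",
    space_group=139,
    mutation_history=parent.mutation_history + [record],
)
```

Copying the parent's history into the child keeps each material's lineage self-contained, at the cost of some duplication. Storing only a parent ID and walking the links is an equally plausible design.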
The system supports intelligent material mutation through GGen:
- Mutation Types: Scale lattice, substitute elements, jitter atomic sites, shear lattice, break symmetry, change space groups
- History Tracking: Complete lineage of mutations with success/failure rates and property changes
- Strategy Selection: AI selects optimal mutation strategies based on performance analysis and target properties
- Fallback Generation: Falls back to Ouro crystal generation if GGen mutations fail
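The fallback behavior described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `mutate` and `generate` are hypothetical callables standing in for a GGen mutation call and an Ouro generation route, whose real signatures are not shown in this document, and the string labels for the mutation types are likewise assumptions.

```python
from typing import Callable

# The six mutation types listed above, as hypothetical string labels.
MUTATION_TYPES = (
    "scale_lattice",
    "substitute_elements",
    "jitter_sites",
    "shear_lattice",
    "break_symmetry",
    "change_space_group",
)


def mutate_with_fallback(
    parent: dict,
    mutation_type: str,
    mutate: Callable[[dict, str], dict],
    generate: Callable[[str], dict],
) -> tuple[dict, str]:
    """Try a mutation first; fall back to fresh generation if it raises.

    `mutate` stands in for a GGen call and `generate` for an Ouro route.
    Both are injected because their real signatures are assumptions here.
    """
    if mutation_type not in MUTATION_TYPES:
        raise ValueError(f"unknown mutation type: {mutation_type}")
    try:
        return mutate(parent, mutation_type), "mutation"
    except Exception:
        # Any mutation failure triggers the fallback path.
        return generate(parent["formula"]), "fallback_generation"


# Stubs that force the fallback path.
def failing_mutate(parent: dict, mutation_type: str) -> dict:
    raise RuntimeError("mutation failed")


def stub_generate(formula: str) -> dict:
    return {"formula": formula, "source": "generated"}


structure, route = mutate_with_fallback(
    {"formula": "MnAl"}, "jitter_sites", failing_mutate, stub_generate
)
```

Returning the route taken alongside the structure lets the caller record whether a candidate came from a mutation or from the fallback, which matters for the success-rate tracking described above.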
The system uses DSPy signatures for structured LLM reasoning:
- Input/output fields are clearly defined for each reasoning step
- ChainOfThought modules provide detailed reasoning traces
- Mutation-aware signatures: `SelectMutationStrategy` for intelligent mutation selection, updated `RefineHypothesis` and `InterpretSimulationResults` with mutation history context
- MLflow integration tracks optimization and evaluation metrics including mutation success rates
# Run the main discovery loop
python app.py
# or
scientist  # via setuptools entry point

# Start MLflow tracking server (must be running before app execution)
mlflow server --backend-store-uri sqlite:///scientist.sqlite

# Install in development mode
pip install -e .

Required environment variables (see `.env.example`):
- `OPENAI_API_KEY`: OpenAI API key for DSPy LLM calls
- `OURO_API_KEY`: Ouro platform API key for computational tools
- `OURO_TEAM_ID`: Ouro team ID for file/asset management
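A quick pre-flight check for these variables might look like the sketch below. The helper name `missing_env_vars` is hypothetical and not part of the project. It assumes the variables have already been loaded into the process environment, for example by the shell or a `.env` loader.

```python
import os
from typing import Mapping

REQUIRED_ENV_VARS = ("OPENAI_API_KEY", "OURO_API_KEY", "OURO_TEAM_ID")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_missing = missing_env_vars({"OPENAI_API_KEY": "sk-example"})
```

Calling `missing_env_vars()` with no argument checks the real environment, so the application could stop early with a clear message instead of failing on its first API call.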
- Ensure MLflow server is running locally on port 5000
- Configure environment variables in the `.env` file
- Run the discovery loop with `python app.py`
- The system will:
  - Start with random generation for first few iterations
  - Begin applying intelligent mutations based on best candidates
  - Track mutation success rates and property changes
  - Adapt strategy selection based on performance analysis
- Results are automatically logged to MLflow and published to Ouro
- Generated structures, mutations, and property evaluations are stored as artifacts
- Mutation history and lineage are preserved for analysis
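The two-phase behavior in the workflow above (random generation first, then mutation of the best candidates) can be sketched with a toy loop. Everything here is an assumption for illustration: the function name `discovery_loop`, the cutoff `n_random=3` for "first few iterations", and the stand-in `generate`, `mutate`, and `score` callables, which replace the real LLM, GGen, and Ouro steps.

```python
import random
from typing import Callable


def discovery_loop(
    n_iterations: int,
    generate: Callable[[], dict],
    mutate: Callable[[dict], dict],
    score: Callable[[dict], float],
    n_random: int = 3,
) -> list[dict]:
    """Generate randomly for `n_random` iterations, then mutate the best so far."""
    history: list[dict] = []
    for i in range(n_iterations):
        if i < n_random or not history:
            candidate = generate()
            origin = "random"
        else:
            # Pick the highest-scoring candidate seen so far as the parent.
            best = max(history, key=lambda entry: entry["score"])
            candidate = mutate(best["candidate"])
            origin = "mutation"
        history.append(
            {"candidate": candidate, "score": score(candidate), "origin": origin}
        )
    return history


# Toy stand-ins: a "candidate" is one lattice parameter pushed toward 3.0.
rng = random.Random(0)
history = discovery_loop(
    n_iterations=6,
    generate=lambda: {"a": rng.uniform(2.0, 4.0)},
    mutate=lambda c: {"a": c["a"] * rng.uniform(0.95, 1.05)},
    score=lambda c: -abs(c["a"] - 3.0),
)
origins = [entry["origin"] for entry in history]
```

The `origin` field is what makes later analysis possible: it records which candidates came from random generation and which from mutation.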
- DSPy: Framework for optimizing LLM-based reasoning
- GGen: Crystal generation and mutation library (local dependency at `/Users/mmoderwell/ouro/ggen`)
- PyMatGen: Materials science library for crystal structures
- MLflow: Experiment tracking and model management
- Ouro-py: Interface to Ouro computational platform
- OpenAI: LLM provider for hypothesis generation
- `generation/`: Contains additional modules (ignore per user request)
- `mlartifacts/`: MLflow artifact storage directory
- `scientist.sqlite`: MLflow backend database

- Material registry stores mutation lineage in memory during execution
- GGen integration allows fallback between GGen and Ouro generation methods
- Space group compatibility checking and resolution are handled automatically by the Ouro platform
- GGen library provides trajectory tracking for mutation sequences
- Material registry maintains relationships between parent and child materials for mutation lineage analysis
- Mutation success rates inform future strategy selection for continuous improvement
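To make the last few notes concrete, here is a minimal sketch of an in-memory registry that keeps parent/child links and per-strategy success rates. The class `MaterialRegistry` and all of its method names are hypothetical. The greedy `best_strategy` rule is a deliberate simplification: the system described here uses an LLM (`SelectMutationStrategy`) for that decision, not a plain maximum over rates.

```python
from collections import defaultdict


class MaterialRegistry:
    """In-memory parent/child registry (a hypothetical stand-in)."""

    def __init__(self) -> None:
        self.parent_of: dict[str, str] = {}
        self.outcomes: dict[str, list[bool]] = defaultdict(list)

    def record_mutation(
        self, parent_id: str, child_id: str, mutation_type: str, improved: bool
    ) -> None:
        self.parent_of[child_id] = parent_id
        self.outcomes[mutation_type].append(improved)

    def lineage(self, material_id: str) -> list[str]:
        """Walk parent links back to the root; returns root-first order."""
        chain = [material_id]
        while chain[-1] in self.parent_of:
            chain.append(self.parent_of[chain[-1]])
        return chain[::-1]

    def success_rates(self) -> dict[str, float]:
        """Fraction of recorded mutations of each type that improved the target."""
        return {t: sum(v) / len(v) for t, v in self.outcomes.items()}

    def best_strategy(self) -> str:
        """Greedy pick of the type with the highest observed success rate.

        Raises ValueError if no mutations have been recorded yet.
        """
        rates = self.success_rates()
        return max(rates, key=rates.get)


registry = MaterialRegistry()
registry.record_mutation("m0", "m1", "scale_lattice", improved=True)
registry.record_mutation("m1", "m2", "jitter_sites", improved=False)
registry.record_mutation("m1", "m3", "scale_lattice", improved=True)
```

With these three records, `registry.lineage("m3")` returns `["m0", "m1", "m3"]` and `registry.best_strategy()` returns `"scale_lattice"`. Because the registry lives in memory, the lineage is lost when the process exits unless it is also written to MLflow or Ouro.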