AI-powered COBOL to Java (Quarkus) migration framework built on Python using Atomic Agents and Instructor for seamless orchestration, structured validation, and automatic cost tracking.
What is Quarkus? Quarkus is a fast, cloud-native Java framework for microservices, optimized for Kubernetes and serverless. It offers quick startup, low memory use, and native compilation. This migration produces ready-to-deploy Quarkus code with annotations, dependency injection, and RESTful APIs.
Updates
- 08/07/2025: Tested with LiteLLM - best results with OpenAI o4 via Azure, Codex via Azure, Gemini via Vertex, and Claude via Vertex.
- 08/07/2025: Tested with OpenRouter - works on par with LiteLLM, plus support for Grok 3/4 and many more models.
- 08/07/2025: GPT-OSS models (both locally and via OpenRouter) proved too limited for complex input/output schemas, so I'm exploring alternative solutions.
- 08/08/2025: Tested with GPT-5 - outstanding results.
This repository is a modern Python rewrite of the original Microsoft project Legacy Modernization Agents (C#), which provided an AI-assisted approach to COBOL-to-Java migration using Semantic Kernel.
The Python implementation leverages the Atomic Agents framework to provide a more modular, schema-driven approach to AI agent management while maintaining all the functionality of the original C# system.
What's new in this version:
- Built on Atomic Agents for standardized AI orchestration
- Uses Instructor for structured LLM interactions with automatic validation
- Automatic token and cost tracking through Instructor hooks
- Docker-first deployment with simplified setup
- Enhanced observability with comprehensive logging and conversation tracking
The original Legacy Modernization Agents is a Microsoft initiative that emerged from a strategic collaboration with Denmark's Bankdata. The project demonstrates how AI agents can accelerate legacy COBOL modernization at enterprise scale.
Built on Microsoft's Semantic Kernel framework with .NET 8.0, the system leverages Process Functions to orchestrate three specialized AI agents: CobolAnalyzer, JavaConverter, and DependencyMapper. The architecture is specifically optimized for GPT-4.1 models running at enterprise capacity (1M tokens/minute) and integrates seamlessly with Azure OpenAI services. Development is streamlined through Visual Studio Code Dev Containers, ensuring consistent environments across teams.
The project's multi-agent architecture embodies a clear separation of concerns, with each agent specializing in distinct phases of the migration pipeline. Configuration management follows enterprise patterns with a dual-file system (template + local credentials), while the included doctor.sh CLI tool provides comprehensive setup, validation, and migration management capabilities. This design philosophy prioritizes observability, reliability, and scalability for large-scale modernization initiatives.
As a joint research initiative between Microsoft's Global Black Belt team and Bankdata, this project has garnered significant industry attention through featured blog posts and conference presentations. The open-source release was strategically designed to engage the broader COBOL community, gathering real-world code samples to further refine the AI models.
With enterprise-validated results, the system has demonstrated remarkable efficiency: processing 102 COBOL files into 99 Java files in just ~1.2 hours at sub-dollar costs, achieving an impressive 97% successful conversion rate. This proves the practical viability of AI-assisted legacy modernization at enterprise scale.
While the original C# project offered a strong foundation, this Python version introduces key innovations that dramatically improve developer experience, observability, AI reliability, and deployment simplicity:
This Python rewrite leverages Instructor library for superior LLM interactions:
- Optimized Prompts: All agent prompts have been refined for better accuracy and consistency
- Structured Communication: Instructor enforces strict input/output schemas via Pydantic, reducing AI errors
- Automatic Retries: Built-in retry logic with validation ensures robust responses
- Multi-Provider Support: Easy switching between OpenAI, Azure OpenAI, Anthropic, and other providers
- Performance: Structured responses eliminate manual parsing, improving speed and reliability
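A minimal sketch of what such a structured call can look like with Instructor and Pydantic (the schema, prompt, and model name below are illustrative, not the project's actual definitions):

```python
from pydantic import BaseModel, Field

class CobolAnalysis(BaseModel):
    """Hypothetical output schema, for illustration only."""
    program_id: str
    business_rules: list[str]
    complexity_score: int = Field(ge=1, le=10)

def analyze_cobol(source: str) -> CobolAnalysis:
    # Imported lazily so the schema above can be used without an API key.
    import instructor
    from openai import OpenAI

    # instructor.from_openai patches the client so every response is parsed
    # and validated against the Pydantic model; a failed validation triggers
    # automatic re-prompting up to max_retries.
    client = instructor.from_openai(OpenAI())
    return client.chat.completions.create(
        model="gpt-4o",  # placeholder; any supported provider/model works
        response_model=CobolAnalysis,
        max_retries=2,
        messages=[{"role": "user", "content": f"Analyze this COBOL program:\n{source}"}],
    )
```

If the model returns, say, a `complexity_score` outside 1-10, Instructor feeds the validation error back to the model instead of surfacing malformed output.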
Built on Atomic Agents, which offers significant improvements over frameworks like LangChain and CrewAI through its IPO model (Input-Process-Output) with Pydantic validation, atomic components with single responsibilities, and transparent operations without hidden abstractions.
This approach delivers a faster, more resilient, and scalable modernization pipeline with full developer control.
- Schema-Driven Architecture: Pydantic models ensure strong typing and validation across all data flows
- Asynchronous Processing: Full async/await support for improved performance and scalability
- Enhanced CLI Experience: Rich interface with progress bars, styling, and interactive feedback
- Service-Oriented Design: Clear separation of concerns with modular, testable components
- Automated Observability: Built-in hooks capture token usage, costs, and conversation flows automatically
| Feature | Benefit |
|---|---|
| Speed | Migration in minutes, not months |
| Cost | Orders of magnitude lower than traditional manual rewrites |
| Auditability | Full AI trace and confidence logs |
| Accuracy | Enterprise-grade output with >95% correctness potential, supported by AI-driven auto-review and suggestions to speed up human validation |
| Scalability | Agent-based architecture supports horizontal scale |
| Maintainability | Modern, clean Java output with Javadoc |
| Compliance | Traceability from COBOL to Java |
- Copy the example environment file and edit local settings:

```shell
cp ./config/settings.env.example ./config/settings.local.env
nano ./config/settings.local.env
```
- Run the setup script to initialize Docker containers and validate the environment:

```shell
./scripts/docker-setup.sh setup
```
- Follow on-screen instructions to verify the setup and perform a test migration.
- COBOL source files: `./data/cobol-source`
- Java output files: `./data/java-output`
- Log files: `./data/logs`
- For further options, consult the help menu: `./scripts/docker-setup.sh --help`
- To configure the environment interactively, use: `./run.sh cobol-migrate-setup`
- For a list of available commands: `./run.sh --help`
- You may use Docker Compose commands directly as an alternative to `./run.sh`. Example:

```shell
./run.sh cobol-migrate-setup
# is equivalent to:
docker-compose run --rm cobol-migration cobol-migrate-setup
```
More information:
- DOCKER_QUICKSTART.md
- DOCKER_GUIDE.md
- PARAMETERS.md - Complete configuration reference
The Docker-first setup ensures a consistent, isolated environment on any platform, with secure non-root containers and built-in scalability for enterprise use. Automated scripts streamline all operations, while development mode includes full debugging tools for rapid troubleshooting.
This containerized approach is ready for Docker Swarm clusters, Kubernetes with manifests, CI/CD integration (GitHub Actions, GitLab CI), and major cloud platforms like AWS ECS, Azure Container Instances, and GCP Cloud Run.
The framework is based on a modular multi-agent architecture, each agent specializing in one phase of the migration process. For a detailed technical diagram of the Python implementation, see the Architecture Diagram in Appendix.
| Agent | Role |
|---|---|
| `CobolAnalyzerAgent` | Parses COBOL structure, detects business logic, calculates complexity |
| `JavaConverterAgent` | Converts COBOL to modern Java using Quarkus best practices |
| `DependencyMapperAgent` | Maps relationships between files, generates Mermaid dependency diagrams |
```mermaid
graph TB
    A[File Discovery] --> B[Dependency Analysis]
    B --> C[COBOL Analysis]
    C --> D[Java Conversion]
    D --> E[File Generation]
    E --> F[Report Generation]
    G[Hook Tracking] -.-> C
    G -.-> D
    G -.-> B
```
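The pipeline above can be sketched as a plain asyncio sequence; the step functions here are illustrative stand-ins, not the framework's real API:

```python
import asyncio

# Illustrative stubs for the six pipeline stages.
async def discover_files(src): return [f"{src}/payroll.cbl"]
async def analyze_dependencies(files): return {f: [] for f in files}
async def analyze_cobol(files): return {f: "analysis" for f in files}
async def convert_to_java(analyses): return {f: "class Payroll {}" for f in analyses}
async def write_files(java): return list(java)
async def generate_report(written): return f"migrated {len(written)} file(s)"

async def run_pipeline(src: str) -> str:
    files = await discover_files(src)            # step 1: file discovery
    await analyze_dependencies(files)            # step 2: dependency analysis
    analyses = await analyze_cobol(files)        # step 3: COBOL analysis
    java = await convert_to_java(analyses)       # step 4: Java conversion
    written = await write_files(java)            # step 5: file generation
    return await generate_report(written)        # step 6: report generation

print(asyncio.run(run_pipeline("./data/cobol-source")))  # → migrated 1 file(s)
```

Because every stage is a coroutine, independent per-file work inside a stage can be fanned out with `asyncio.gather` without changing the overall shape.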
With automatic Instructor hooks, every AI call is traced and measured.
- Tokens: prompt, completion, total
- Cost: calculated based on model used
- Latency: per-agent performance
- Hook efficiency: % of calls automatically captured
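Instructor exposes a hook system for exactly this kind of bookkeeping. A minimal sketch of a cost-tracking handler follows; the per-token prices are placeholders, and the fake response object only stands in for a real completion:

```python
from types import SimpleNamespace

# Placeholder per-1K-token prices (illustrative, not current pricing).
PRICES = {"gpt-4o": {"prompt": 0.0025, "completion": 0.010}}

def call_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    p = PRICES[model]
    return (prompt_tokens * p["prompt"] + completion_tokens * p["completion"]) / 1000

totals = {"prompt_tokens": 0, "completion_tokens": 0, "cost_usd": 0.0}

def on_response(response) -> None:
    """Hook handler: accumulate token usage and cost from each response."""
    usage = response.usage
    totals["prompt_tokens"] += usage.prompt_tokens
    totals["completion_tokens"] += usage.completion_tokens
    totals["cost_usd"] += call_cost(response.model, usage.prompt_tokens,
                                    usage.completion_tokens)

# With an Instructor-patched client the handler would be registered as a
# hook, e.g. client.on("completion:response", on_response). Here we feed
# it a stand-in response object to show the bookkeeping.
fake = SimpleNamespace(model="gpt-4o",
                       usage=SimpleNamespace(prompt_tokens=1000, completion_tokens=500))
on_response(fake)
print(round(totals["cost_usd"], 6))  # → 0.0075
```

Because the handler fires on every call, no agent code has to remember to log usage: the "hook efficiency" metric above is simply the share of calls that passed through such handlers.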
- Conversion rate, token use, complexity score
- AI performance per file and per agent
- Code expansion ratio (COBOL → Java)
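As a trivial worked example of these metrics (the 99-of-102 figure comes from the original project's benchmark run mentioned above; the line counts are made up):

```python
def expansion_ratio(cobol_lines: int, java_lines: int) -> float:
    """Code expansion ratio: generated Java lines per COBOL source line."""
    return java_lines / cobol_lines

def conversion_rate(converted_files: int, total_files: int) -> float:
    """Share of COBOL files successfully converted to Java."""
    return converted_files / total_files

print(expansion_ratio(100, 180))              # → 1.8
print(f"{conversion_rate(99, 102):.0%}")      # → 97%
```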
- Timestamped agent reasoning
- Latency and confidence metrics per file
- Full visibility for debugging and auditing
This file is automatically generated after each migration. You can also regenerate it at any time using the conversation command with ./run.sh.
- Fully annotated Quarkus-compatible services
- Type-safe, idiomatic Java code
- Clean microservice structure, ready for REST APIs
Only recommended for advanced users.
- Python 3.12+
- pip
```shell
pip install -e .
```

```shell
# Copy template to create your local configuration
cp config/settings.env.example config/settings.local.env
nano config/settings.local.env
```

```shell
python -m cobol_migration_agents.cli main --cobol-source ./data/cobol-source --java-output ./data/java-output
```

Configuration Help: For detailed explanations of all configuration parameters, see PARAMETERS.md
The modular design makes it easy to add new agents, models, services, or CLI commands. Contributions are welcome: new agent types, improved prompts, better tests, or feedback from users.
MIT License, same as the original C# version.
This repository is an independent creation by Lorenzo Toscano, developed entirely separately from any professional activities or organizational affiliations. This work serves as a technical demonstration of how agentic AI configurations can effectively accelerate reverse engineering and code migration processes.
This project is intended for educational and research purposes, showcasing best practices in AI-assisted software migration. If you plan to use it in production environments, it is strongly recommended to seek guidance from qualified and experienced professionals.
This Python version implements a modern, modular architecture using Atomic Agents and Pydantic models:
```mermaid
graph TB
    subgraph ORCHESTRATOR ["Migration Orchestrator"]
        COORDINATOR["MigrationOrchestrator<br/>• 6-Step Process<br/>• File Discovery<br/>• Workflow Management"]
    end
    subgraph AI_AGENTS ["Atomic Agents"]
        COBOL_AGENT["CobolAnalyzerAgent<br/>• Structure Analysis<br/>• Variable Mapping<br/>• Logic Flow Detection<br/>• Copybook References"]
        JAVA_AGENT["JavaConverterAgent<br/>• COBOL→Java Translation<br/>• Quarkus Integration<br/>• Best Practices<br/>• Error Handling"]
        DEPENDENCY_AGENT["DependencyMapperAgent<br/>• Relationship Analysis<br/>• Mermaid Diagrams<br/>• Usage Patterns<br/>• Risk Assessment"]
    end
    subgraph DATA_MODELS ["Pydantic Models"]
        COBOL_MODELS["COBOL Models<br/>• CobolFile<br/>• CobolAnalysis<br/>• Complexity Metrics"]
        JAVA_MODELS["Java Models<br/>• JavaFile<br/>• Quarkus Metadata<br/>• Class Structures"]
        SCHEMA_MODELS["Schema Models<br/>• Input/Output Schemas<br/>• Migration Schemas<br/>• DependencyMap"]
    end
    subgraph SERVICES ["Core Services"]
        FILE_MANAGER["FileManager<br/>• Async File Operations<br/>• Directory Organization<br/>• Backup & Validation"]
        LOGGING_SERVICE["LoggingService<br/>• API Call Tracking<br/>• Conversation Logging<br/>• Cost Analysis"]
        REPORT_SERVICE["ReportGenerator<br/>• Markdown Reports<br/>• Migration Statistics<br/>• Recommendations"]
    end
    subgraph OUTPUT ["Generated Artifacts"]
        JAVA_OUTPUT["Java Files<br/>• Quarkus Services<br/>• Package Structure<br/>• Annotations"]
        REPORTS_OUTPUT["Reports<br/>• Migration Report<br/>• Conversation Logs<br/>• API Statistics"]
        DIAGRAMS_OUTPUT["Diagrams<br/>• Dependency Maps<br/>• Mermaid Charts<br/>• Risk Analysis"]
    end

    %% Main Flow
    COORDINATOR --> AI_AGENTS
    AI_AGENTS --> DATA_MODELS
    DATA_MODELS --> SERVICES
    SERVICES --> OUTPUT

    %% Detailed Connections
    COORDINATOR -.-> FILE_MANAGER
    COORDINATOR -.-> LOGGING_SERVICE
    COORDINATOR -.-> REPORT_SERVICE
    COBOL_AGENT --> COBOL_MODELS
    JAVA_AGENT --> JAVA_MODELS
    DEPENDENCY_AGENT --> SCHEMA_MODELS
    FILE_MANAGER --> JAVA_OUTPUT
    LOGGING_SERVICE --> REPORTS_OUTPUT
    REPORT_SERVICE --> DIAGRAMS_OUTPUT

    %% Styling
    classDef orchestratorStyle fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#0d47a1
    classDef agentStyle fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#e65100
    classDef modelStyle fill:#f1f8e9,stroke:#689f38,stroke-width:3px,color:#1b5e20
    classDef serviceStyle fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px,color:#4a148c
    classDef outputStyle fill:#e8f5e8,stroke:#2e7d32,stroke-width:3px,color:#1b5e20
    class COORDINATOR orchestratorStyle
    class COBOL_AGENT,JAVA_AGENT,DEPENDENCY_AGENT agentStyle
    class COBOL_MODELS,JAVA_MODELS,SCHEMA_MODELS modelStyle
    class FILE_MANAGER,LOGGING_SERVICE,REPORT_SERVICE serviceStyle
    class JAVA_OUTPUT,REPORTS_OUTPUT,DIAGRAMS_OUTPUT outputStyle
```
Ready-to-Compile Quarkus Code:
- Complete Java classes with proper Quarkus annotations (`@ApplicationScoped`, `@Entity`, `@Path`)
- Dependency injection setup with CDI patterns
- RESTful API structure with service/repository layers
- Package organization and imports
- Business logic preservation with type-safe conversions
- Build Configuration: Custom `pom.xml`/`build.gradle` for specific dependencies
- Application Properties: Runtime configuration (`application.properties`)
- Database Schema: DDL scripts for JPA entities if database integration is needed
- Integration Testing: End-to-end test suites for complex business scenarios
- Deployment Manifests: Kubernetes/Docker configurations for production
AI Model Limitations:
- Complex Business Logic: Very intricate COBOL programs may require manual review
- Legacy Extensions: Non-standard COBOL extensions might not be fully supported
- Performance Optimization: Generated code may need tuning for high-performance scenarios
Scope Limitations:
- JCL Integration: Job Control Language translation not included
- Database Specifics: COBOL-DB2 specific optimizations need manual attention
- Screen Handling: CICS/3270 screen logic requires additional conversion steps
Infrastructure Gaps:
- CI/CD Pipelines: Build and deployment automation not generated
- Monitoring Setup: Observability configuration for production monitoring
- Security Configuration: Authentication/authorization setup for enterprise deployment
When assessing production readiness, consider that the conversion of core business logic is already production-grade, with demonstrated accuracy above 95%. The Quarkus microservices architecture, Docker containerization, advanced logging, and optimized token usage are all fully implemented. However, it is essential to carefully review sections involving complex mathematical calculations, performance-critical code paths, integrations with external systems, and security-sensitive operations. Manual handling is still required for environment-specific configurations, database connection setup, authentication and authorization implementation, as well as load testing and performance tuning activities.
For optimal results, carefully follow the instructions provided below. Please note that even steps not directly managed by this tool can be efficiently automated using solutions such as GitHub Copilot and similar technologies.
Copy your COBOL source files into the data/cobol-source directory within your cloned repository.
By default, the migration process uses the following folders: data/cobol-source for input, data/java-output for generated Java code, and data/logs for migration logs.
1. Code Review Process:
```shell
# Generate migration
./scripts/docker-setup.sh migrate

# Review conversation logs for AI confidence scores
cat data/logs/conversation_log_*.md

# Test compilation
cd data/java-output && mvn compile
```

2. Integration Checklist:
- Configure `application.properties` for your environment
- Set up database connections and JPA configuration
- Implement authentication/authorization if needed
- Add integration tests for critical business flows
- Configure monitoring and alerting
- Set up CI/CD pipelines
3. Deployment Strategy:
- Development: Use generated Docker configurations
- Staging: Add environment-specific properties
- Production: Implement full observability stack
This framework provides a strong foundation for COBOL modernization while being transparent about what additional work may be needed for full production deployment.
Below are some potential enhancements. Note that advanced features are typically integrated into commercial AI solutions for migration and reverse engineering by specialized vendors.
Short Term:
- Unit Test Generation: Automated test creation for converted Java code
- Build Configuration: Auto-generate `pom.xml` with correct dependencies
- Enhanced COBOL Support: Better handling of complex nested structures
- Performance Optimization: AI-driven performance tuning suggestions
Medium Term:
- JCL Converter Agent: Job Control Language to Spring Batch/Quarkus Scheduler
- Database Migration Tools: Schema conversion and data migration scripts
- CICS Integration: Web service replacement for terminal-based applications
- Advanced Dependency Analysis: Cross-system dependency mapping
Long Term:
- Enterprise Integration: SAP, Oracle, IBM mainframe connectors
- Visual Migration Designer: GUI for migration planning and customization
- Real-time Migration: Incremental, zero-downtime migration strategies