Skip to content

toddstoffel/MongoSQL

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MongoSQL - SQL to MongoDB Query Language Translator

A command-line client that translates MariaDB/MySQL syntax to MongoDB Query Language (MQL) and executes queries against MongoDB databases with 100% compatibility across comprehensive test suites.

Features

  • Interactive CLI similar to MySQL client
  • Translates SQL SELECT, INSERT, UPDATE, DELETE statements to MongoDB operations
  • Comprehensive function mapping - 47+ MariaDB/MySQL functions supported
  • Advanced JOIN support - INNER, LEFT, RIGHT, and multi-table JOINs
  • Complete ORDER BY functionality with proper collation matching
  • Full DISTINCT operations support
  • GROUP BY operations with HAVING clause support
  • Conditional functions - IF, CASE WHEN, COALESCE, NULLIF
  • Complete SUBQUERY support - SCALAR, IN/EXISTS, ROW, and DERIVED subqueries
  • Reserved words handling - Complete MariaDB reserved word support
  • Connects to MongoDB using PyMongo with connection pooling
  • Collation-aware sorting to match MariaDB's utf8mb4_unicode_ci behavior
  • Rich terminal output with MariaDB-compatible formatting
  • Batch mode support
  • Environment-based configuration
  • Modular architecture with dedicated modules for all SQL operations

Compatibility Status

Current Test Results (100% success rate):

  • DATETIME functions: 22/22 (100.0%) - Complete date/time function support
  • STRING functions: 10/10 (100.0%) - Full string manipulation support
  • MATH functions: 10/10 (100.0%) - Complete mathematical operations
  • AGGREGATE functions: 5/5 (100.0%) - All aggregate functions working
  • JOINS: 4/4 (100.0%) - Complete JOIN functionality
  • GROUP BY: 3/3 (100.0%) - Full GROUP BY with HAVING support
  • ORDER BY: 3/3 (100.0%) - Full sorting with proper collation
  • DISTINCT: 3/3 (100.0%) - All DISTINCT operations supported
  • CONDITIONAL: 4/4 (100.0%) - IF, CASE WHEN, COALESCE, NULLIF fully implemented
  • SUBQUERIES: 5/5 (100.0%) - Complete subquery support with all patterns

Recent Updates

Version 1.3.0 (August 15, 2025)

🎉 COMPLETE SUBQUERY SUPPORT - Perfect 100% Compatibility Achieved!

✅ Completed Features:

  • SCALAR Subqueries: Single value returns with proper aggregation pipeline integration
  • IN/EXISTS Subqueries: Table-based and correlated subquery support with $lookup operations
  • ROW Subqueries: Multi-column tuple matching with $and conditions
  • DERIVED Subqueries: Complex table expressions with GROUP BY, aliases, and field mapping
  • Database Connection: Fixed CLI database switching for seamless query execution
  • Field Resolution: Enhanced alias-aware projection mapping for clean output formatting

🔧 Technical Achievements:

  • Complete token-based subquery parsing with context-aware type detection
  • MongoDB aggregation pipeline generation for all subquery patterns
  • Sophisticated field reference resolution for derived table operations
  • Clean output formatting with proper alias handling
  • Integration with existing JOIN, GROUP BY, and ORDER BY systems

📊 Test Results:

  • Before: 64/67 tests passing (95.5% success rate)
  • After: 69/69 tests passing (100% success rate)
  • Achievement: Perfect compatibility across all SQL operation categories
  • Subqueries: All 5 subquery patterns fully implemented and tested

Version 1.2.0 (August 15, 2025)

🎉 Major Conditional Functions Update - Compatibility increased to 95.5%

✅ Completed Features:

  • COALESCE Function: Fixed evaluation engine to properly handle $ifNull operators
  • NULLIF Function: Complete implementation with proper empty string formatting
  • Expression Evaluation: Enhanced MongoDB client with $eq and $ifNull operator support
  • Result Formatting: Improved null value handling to match MariaDB behavior
  • QA Parser: Fixed table output parsing to handle empty cell values correctly

🔧 Technical Improvements:

  • Enhanced mongodb_client.py with comprehensive expression evaluation
  • Updated argument parsing in translator.py for conditional function compatibility
  • Fixed result formatting pipeline from evaluation through display
  • Improved QA test parser robustness for edge cases

📊 Test Results:

  • Before: 62/67 tests passing (92.5% success rate)
  • After: 64/67 tests passing (95.5% success rate)
  • Fixed: COALESCE and NULLIF functions now working correctly
  • Remaining: Only 3 subquery-related tests pending

Installation

  1. Clone the repository:
git clone <repository-url>
cd MongoSQL
  1. Install dependencies:
pip install -r requirements.txt
  1. Make the mongosql script executable:
chmod +x mongosql
  1. Configure MongoDB connection:
cp .env.example .env
# Edit .env with your MongoDB connection details

Important: Collation Compatibility

Critical Discovery: For accurate comparison between MariaDB and MongoDB results, both systems must use compatible collation rules:

  • MariaDB: Uses utf8mb4_unicode_ci (case-insensitive, Unicode-aware)
  • MongoDB: Configured with equivalent collation settings:
    {
      locale: 'en',
      caseLevel: false,     // Case-insensitive like MariaDB
      strength: 1,          // Primary level only (ignore case/accents)
      numericOrdering: false
    }

This ensures ORDER BY and comparison operations return identical results between both database systems.

Usage

Interactive Mode (Default)

./mongosql

Or with connection parameters:

./mongosql --host localhost --port 27017 --database mydb --username myuser -p

Execute Single Statement

./mongosql classicmodels -e "SELECT * FROM customers WHERE customerNumber > 100"

Batch Mode

cat queries.sql | ./mongosql classicmodels --batch

Advanced Features

GROUP BY with Aggregation

-- GROUP BY with COUNT and HAVING
SELECT country, COUNT(*) as customer_count 
FROM customers 
GROUP BY country 
HAVING COUNT(*) > 5
ORDER BY customer_count DESC;

-- Multiple aggregation functions
SELECT country, COUNT(*) as total, AVG(creditLimit) as avg_credit
FROM customers 
GROUP BY country;

Conditional Functions

-- IF function
SELECT IF(creditLimit > 50000, 'High', 'Low') as credit_tier FROM customers;

-- CASE WHEN expression
SELECT CASE 
  WHEN creditLimit > 100000 THEN 'Premium'
  WHEN creditLimit > 50000 THEN 'Standard' 
  ELSE 'Basic'
END as tier FROM customers;

-- COALESCE function (returns first non-null value)
SELECT COALESCE(city, 'Unknown') as location FROM customers;

-- NULLIF function (returns null if values are equal, otherwise first value)
SELECT NULLIF(customerName, 'Unknown Customer') as name FROM customers;

JOIN Support

-- INNER JOIN
SELECT c.customerName, o.orderDate 
FROM customers c 
INNER JOIN orders o ON c.customerNumber = o.customerNumber;

-- LEFT JOIN with multiple tables
SELECT c.customerName, o.orderDate, od.quantityOrdered
FROM customers c
LEFT JOIN orders o ON c.customerNumber = o.customerNumber
LEFT JOIN orderdetails od ON o.orderNumber = od.orderNumber;

ORDER BY with Collation

-- Case-insensitive sorting matching MariaDB behavior
SELECT customerName FROM customers ORDER BY customerName ASC LIMIT 10;
-- Returns: Alpha Cognac, American Souvenirs Inc, Amica Models & Co., ANG Resellers...

DISTINCT Operations

-- Single and multi-column DISTINCT
SELECT DISTINCT country FROM customers;
SELECT DISTINCT city, country FROM customers;

Subqueries (In Development)

-- Scalar subquery for single value comparison
SELECT customerName FROM customers 
WHERE customerNumber = (SELECT customerNumber FROM orders ORDER BY orderDate DESC LIMIT 1);

-- IN subquery for membership testing
SELECT customerName FROM customers 
WHERE customerNumber IN (SELECT customerNumber FROM orders LIMIT 3);

-- EXISTS subquery for correlated existence testing
SELECT customerName FROM customers 
WHERE EXISTS (SELECT 1 FROM orders WHERE orders.customerNumber = customers.customerNumber) 
LIMIT 1;

SQL to MQL Translation Examples

SELECT Statements

-- SQL
SELECT customerName, customerNumber FROM customers WHERE customerNumber > 100 ORDER BY customerName LIMIT 10;

-- Translates to MongoDB
db.customers.find({"customerNumber": {"$gt": 100}}, {"customerName": 1, "customerNumber": 1}).sort({"customerName": 1}).limit(10)

INSERT Statements

-- SQL
INSERT INTO customers (customerName, customerNumber, contactFirstName) VALUES ('New Company', 500, 'John');

-- Translates to MongoDB
db.customers.insertOne({"customerName": "New Company", "customerNumber": 500, "contactFirstName": "John"})

UPDATE Statements

-- SQL
UPDATE customers SET contactFirstName = 'Jane' WHERE customerNumber = 500;

-- Translates to MongoDB
db.customers.updateMany({"customerNumber": 500}, {"$set": {"contactFirstName": "Jane"}})

DELETE Statements

-- SQL
DELETE FROM customers WHERE customerNumber = 500;

-- Translates to MongoDB
db.customers.deleteMany({"customerNumber": {"$eq": 500}})

Subquery Statements (Planned)

-- SQL Scalar Subquery
SELECT customerName FROM customers 
WHERE customerNumber = (SELECT customerNumber FROM orders ORDER BY orderDate DESC LIMIT 1);

-- Translates to MongoDB Aggregation Pipeline
db.customers.aggregate([
  {$lookup: {
    from: "orders", 
    pipeline: [{$sort: {orderDate: -1}}, {$limit: 1}], 
    as: "subquery"
  }},
  {$unwind: "$subquery"},
  {$match: {$expr: {$eq: ["$customerNumber", "$subquery.customerNumber"]}}},
  {$project: {customerName: 1}}
])

Supported SQL Functions

The translator supports 47+ MariaDB/MySQL functions with comprehensive mapping to MongoDB equivalents:

String Functions (10/10 - 100% supported)

  • CONCAT()$concat
  • SUBSTRING()$substr
  • LENGTH()$strLenCP
  • UPPER(), LOWER()$toUpper, $toLower
  • TRIM(), LTRIM(), RTRIM()$trim, $ltrim, $rtrim
  • REPLACE()$replaceAll
  • LEFT(), RIGHT()$substr variations
  • REVERSE() → Custom implementation

Mathematical Functions (10/10 - 100% supported)

  • ABS()$abs
  • CEIL(), FLOOR()$ceil, $floor
  • ROUND()$round
  • POWER(), SQRT()$pow, $sqrt
  • SIN(), COS()$sin, $cos
  • LOG()$ln
  • GREATEST()$max

Date/Time Functions (22/22 - 100% supported)

  • NOW(), CURDATE(), CURTIME()$$NOW and variants
  • YEAR(), MONTH(), DAY()$year, $month, $dayOfMonth
  • HOUR(), MINUTE(), SECOND()$hour, $minute, $second
  • DATE_FORMAT()$dateToString
  • MAKEDATE(), MAKETIME() → Custom implementations
  • TIMESTAMPADD(), ADDTIME(), SUBTIME() → Date arithmetic
  • EXTRACT(), TO_DAYS() → Temporal extractions

Aggregate Functions (5/5 - 100% supported)

  • COUNT()$sum: 1
  • SUM(), AVG(), MIN(), MAX()$sum, $avg, $min, $max

Advanced Operations (100% supported)

  • JOINs: INNER, LEFT, RIGHT, multi-table joins
  • ORDER BY: Single and multi-field sorting with proper collation
  • DISTINCT: Single and multi-column distinct operations
  • LIMIT/OFFSET: Result pagination

CLI Commands

Inside the interactive shell:

  • help - Show available commands
  • show collections - List all collections in current database
  • use <database> - Switch to different database
  • quit or exit - Exit the client

Configuration

Environment variables (.env file):

# MongoDB Configuration
MONGO_HOST=localhost
MONGO_USERNAME=username
MONGO_PASSWORD=password
MONGO_AUTH_DATABASE=admin
MONGO_DATABASE=classicmodels
MONGO_PORT=27017

# MongoDB Connection Options  
MONGO_RETRY_WRITES=true
MONGO_WRITE_CONCERN=majority
MONGO_APP_NAME=MongoSQL
MONGODB_TIMEOUT=5000
MONGODB_SSL=false

# For MongoDB Atlas, use format like:
# MONGO_HOST=cluster0.xxxxx.mongodb.net
# MONGODB_SSL=true

# MariaDB Configuration (for QA comparison testing)
MARIADB_HOST=localhost
MARIADB_USERNAME=username
MARIADB_PASSWORD=password
MARIADB_DATABASE=classicmodels

Command Line Options

  • --host - MongoDB host (default: localhost)
  • --port - MongoDB port (default: 27017)
  • --database, -d - Database name
  • --username, -u - Username for authentication
  • --password, -p - Prompt for password
  • --execute, -e - Execute statement and exit
  • --batch - Run in batch mode (non-interactive)

Project Structure

src/
├── cli/
│   └── main.py              # CLI interface and main entry point
├── database/
│   └── mongodb_client.py    # MongoDB connection with collation support
├── parsers/
│   └── token_sql_parser.py  # Advanced SQL statement parsing using sqlparse tokens
├── translators/
│   └── sql_to_mql.py        # Main SQL to MQL translation logic
├── functions/
│   ├── function_mapper.py   # Function mapping coordination
│   ├── string_functions.py  # String function mappings
│   ├── math_functions.py    # Mathematical function mappings
│   ├── datetime_functions.py # Date/time function mappings
│   └── aggregate_functions.py # Aggregate function mappings
├── joins/
│   ├── join_parser.py       # JOIN clause parsing
│   ├── join_translator.py   # JOIN translation to aggregation
│   ├── join_optimizer.py    # JOIN query optimization
│   └── join_types.py        # JOIN type definitions
├── groupby/
│   ├── groupby_parser.py    # GROUP BY clause parsing
│   ├── groupby_translator.py # GROUP BY to aggregation pipeline
│   └── groupby_types.py     # GROUP BY type definitions
├── orderby/
│   ├── orderby_parser.py    # ORDER BY clause parsing
│   ├── orderby_translator.py # ORDER BY to $sort translation
│   └── orderby_types.py     # ORDER BY type definitions
├── conditional/
│   ├── conditional_parser.py    # Conditional function parsing
│   ├── conditional_translator.py # IF, CASE WHEN, COALESCE translation
│   ├── conditional_types.py     # Conditional expression types
│   └── conditional_function_mapper.py # Function mapping
├── subqueries/
│   ├── subquery_parser.py       # Subquery detection and parsing
│   ├── subquery_translator.py   # Subquery to aggregation pipeline translation
│   ├── subquery_types.py        # Subquery type definitions (scalar, IN, EXISTS)
│   └── subquery_optimizer.py    # Subquery execution optimization
├── where/
│   ├── where_parser.py      # WHERE clause parsing
│   ├── where_translator.py  # WHERE to MongoDB match filters
│   └── where_types.py       # WHERE condition types
├── reserved_words/
│   ├── reserved_word_handler.py     # MariaDB reserved word handling
│   └── mariadb_reserved_words.py   # Complete MariaDB word lists
├── formatters/
│   ├── result_formatter.py  # Generic result formatting
│   └── mariadb_formatter.py # MariaDB-compatible output
└── utils/
    ├── helpers.py           # Utility functions
    └── schema.py            # Database schema utilities

QA/
└── mariadb_comparison_qa.py # Comprehensive test suite (100 tests across 2 phases)

KB/
├── MONGODB_FUNCTION_MAPPING.md # Complete function mapping reference
├── MISSING_FUNCTIONS.md        # Planned function implementations
├── REFERENCE_LINKS.md          # Documentation links and resources
└── mariadb.md                  # MariaDB compatibility notes

Module Architecture

Core Modules

JOIN Module (src/joins/)

Comprehensive JOIN support with MongoDB aggregation pipeline translation:

  • Supported Types: INNER, LEFT, RIGHT, CROSS JOIN
  • Features: Multi-table joins, complex ON conditions, table aliases
  • MongoDB Translation: Uses $lookup and $unwind operations
  • Optimization: Query optimization for MongoDB execution model

GROUP BY Module (src/groupby/)

Complete GROUP BY functionality with aggregation support:

  • Features: Single/multiple field grouping, aggregate functions, HAVING clauses
  • Supported Aggregates: COUNT, SUM, AVG, MIN, MAX
  • MongoDB Translation: Uses $group aggregation pipeline stage
  • Integration: Works seamlessly with ORDER BY and LIMIT

Conditional Module (src/conditional/)

SQL conditional functions translated to MongoDB operators:

  • IF Function: $cond operator for conditional logic ✅
  • CASE WHEN: $switch operator for multi-branch conditions ✅
  • COALESCE: Nested $ifNull operators for null handling ✅
  • NULLIF: $cond with $eq comparison for null conversion ✅
  • Expression Evaluation: Comprehensive MongoDB operator support
  • Edge Case Handling: Proper null value formatting and display

Subqueries Module (src/subqueries/)

Advanced subquery support for nested SELECT statements:

  • Scalar Subqueries: Single value comparisons in WHERE clauses
  • IN Subqueries: Value existence checking with subquery result sets
  • EXISTS Subqueries: Correlated subqueries for row existence testing
  • MongoDB Translation: Complex aggregation pipeline generation with $lookup
  • Performance Optimization: Query optimization for nested operations
  • Integration: Seamless integration with WHERE, JOIN, and other modules

Reserved Words Module (src/reserved_words/)

MariaDB compatibility with proper identifier handling:

  • Comprehensive Lists: All MariaDB reserved words and keywords
  • Oracle Mode: Additional words for Oracle compatibility mode
  • Smart Escaping: Automatic backtick escaping when needed
  • Context Aware: Different handling for different SQL contexts

WHERE Module (src/where/)

Complex WHERE clause parsing and translation:

  • Operators: All comparison operators, LIKE patterns, IN clauses
  • Logical Operations: AND, OR, NOT with proper precedence
  • MongoDB Translation: Converts to MongoDB match filters
  • Pattern Matching: SQL LIKE to MongoDB regex conversion

Supporting Modules

Functions Module (src/functions/)

47+ MariaDB/MySQL functions with MongoDB equivalents:

  • String Functions: CONCAT, SUBSTRING, LENGTH, TRIM, etc.
  • Math Functions: ABS, ROUND, POWER, SQRT, trigonometric functions
  • DateTime Functions: NOW, DATE_FORMAT, YEAR, MONTH, etc.
  • Aggregate Functions: COUNT, SUM, AVG, MIN, MAX

Parsers (src/parsers/)

Token-based SQL parsing using sqlparse library:

  • Robust Parsing: Handles complex SQL syntax correctly
  • Token-Based: No regex usage, reliable edge case handling
  • Modular: Delegates to specialized parsers for different clauses

Database (src/database/)

MongoDB connection and query execution:

  • Connection Pooling: Efficient MongoDB connection management
  • Collation Support: Matches MariaDB utf8mb4_unicode_ci behavior
  • Error Handling: Comprehensive error messages and recovery

Testing Framework

Quality Assurance (QA/)

Comprehensive testing framework achieving 100% core SQL compatibility:

  • 100 Test Cases across 13 functional categories organized in development phases
  • Phase 1 Complete: 69 tests for core SQL features (100% pass rate)
  • Phase 2 Testing: 31 tests for modern application extensions (JSON, enhanced strings, advanced aggregates)
  • Side-by-side Comparison: MariaDB vs MongoDB result validation
  • Collation Testing: Ensures identical sorting behavior
  • Automated Reporting: Detailed success/failure analysis with categorized results

QA Testing Usage:

# Test all categories
python QA/mariadb_comparison_qa.py

# Test specific development phase
python QA/mariadb_comparison_qa.py --phase 1    # Core SQL features
python QA/mariadb_comparison_qa.py --phase 2    # Modern extensions

# Test specific category
python QA/mariadb_comparison_qa.py --category datetime
python QA/mariadb_comparison_qa.py --category json

# Test specific function
python QA/mariadb_comparison_qa.py --function JSON_EXTRACT

# Verbose output with detailed comparisons
python QA/mariadb_comparison_qa.py --phase 1 --verbose

Error Handling

The translator provides helpful error messages for:

  • Connection issues
  • Invalid SQL syntax
  • Unsupported operations
  • MongoDB operation failures

Limitations and Roadmap

Currently Working (95.5% compatibility)

  • ✅ All basic SQL operations (SELECT, INSERT, UPDATE, DELETE)
  • ✅ Comprehensive function library (47+ functions)
  • ✅ Complete JOIN support (INNER, LEFT, RIGHT, multi-table)
  • ✅ Full ORDER BY with proper collation matching
  • ✅ Complete GROUP BY with HAVING clause support
  • ✅ All DISTINCT operations
  • ✅ All aggregate functions
  • ✅ MariaDB reserved words handling
  • Complete conditional functions (IF, CASE WHEN, COALESCE, NULLIF)

In Development

  • 🔄 Subqueries Module - Implementing nested SELECT statement support:
    • Scalar Subqueries: WHERE column = (SELECT ...) comparisons
    • IN Subqueries: WHERE column IN (SELECT ...) membership testing
    • EXISTS Subqueries: WHERE EXISTS (SELECT ...) correlated queries
    • MongoDB Translation: Converting to $lookup aggregation pipelines
    • Target: Complete remaining 3 test cases to reach 100% compatibility
  • 🔄 Additional MariaDB functions - Expanding function library beyond current 47+

MongoDB-Specific Considerations

  • Document-based nature: Complex JOINs are converted to aggregation pipelines
  • Collation importance: Proper collation configuration required for sorting compatibility
  • Schema flexibility: MongoDB's schemaless nature handled gracefully
  • Performance optimization: Queries optimized for MongoDB's execution model

Known Differences

  • Collation: Requires explicit configuration for MariaDB compatibility
  • NULL handling: MongoDB's null semantics may differ slightly from SQL
  • Data types: Automatic type conversion between SQL and BSON types

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Submit a pull request

License

[Add your license information here]

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages