A real-time, AI-powered data analysis and machine learning platform that delivers instant insights through live progress updates, background processing, and intelligent automation. Built with modern web technologies and enterprise-grade infrastructure.
- Live EDA Analysis: 11-step comprehensive data analysis with step-by-step progress updates
- Real-Time AutoML: Automated machine learning with live training progress and model performance tracking
- WebSocket Connections: Instant progress updates and real-time notifications
- Background Job Processing: Asynchronous task execution with Celery and Redis
- Live Progress Tracking: Detailed progress bars and status updates for all operations
- Automated EDA: Comprehensive exploratory data analysis with statistical insights
- Smart AutoML: Automated model selection and hyperparameter tuning using FLAML
- Model Explainability: SHAP, LIME, and permutation importance for model interpretability
- Gemini AI Integration: Natural language explanations and business insights
- Multi-Modal Support: Tabular, Computer Vision, NLP, and Time Series data
- Responsive Web UI: React + TypeScript + Tailwind CSS with dark/light themes
- Real-Time Dashboards: Live metrics, activity monitoring, and interactive visualizations
- Drag & Drop Interface: Intuitive file upload and data management
- Interactive Visualizations: Plotly.js and Recharts for data exploration
- Mobile Optimized: Fully responsive design for all devices
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│    Frontend     │     │     Backend     │     │ Infrastructure  │
│  (React + TS)   │────►│    (FastAPI)    │────►│    (Docker)     │
│                 │     │                 │     │                 │
│ • Real-time UI  │     │ • REST APIs     │     │ • PostgreSQL    │
│ • WebSocket     │     │ • Background    │     │ • Redis         │
│ • Visualizations│     │   Jobs          │     │ • MinIO         │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                 │
                                 ▼
                        ┌─────────────────┐
                        │   AI/ML Stack   │
                        │                 │
                        │ • FLAML AutoML  │
                        │ • SHAP/LIME     │
                        │ • Gemini AI     │
                        └─────────────────┘
```
**Frontend:**

- Framework: React 18.2.0 with TypeScript
- Build Tool: Vite 4.5.0
- Styling: Tailwind CSS 3.3.5
- Charts: Plotly.js 2.27.0, Recharts 2.8.0
- Routing: React Router DOM 6.20.1
- State Management: Zustand 4.4.7
- UI Components: Headless UI, Heroicons
**Backend:**

- Framework: FastAPI 0.104.1 with async support
- Server: Uvicorn with WebSocket support
- Data Processing: Pandas 2.1.4, NumPy 1.24.3
- Machine Learning: Scikit-learn 1.3.2, FLAML 2.1.1
- Model Explainability: SHAP 0.43.0, LIME 0.2.0.1
- AI Integration: Google Generative AI 0.3.2
- Task Queue: Celery 5.3.4 with Redis 5.0.1
- WebSockets: WebSockets 12.0 for real-time updates
**Infrastructure & DevOps:**

- Containerization: Docker & Docker Compose
- Database: PostgreSQL with SQLAlchemy 2.0.23
- Object Storage: MinIO 7.2.0
- Message Broker: Redis 7.0 (Alpine)
- Monitoring: Prometheus + Grafana
- Task Monitoring: Flower (Celery dashboard)
- Load Balancing: Nginx (production ready)
**Prerequisites:**

- Docker & Docker Compose (v20.10+)
- Git for version control
- Google Gemini API Key for AI explanations
1. **Clone the Repository**

   ```bash
   git clone <repository-url>
   cd auto-insights
   ```

2. **Environment Configuration**

   ```bash
   # Copy environment template
   cp .env.example .env

   # Edit .env file with your configuration
   nano .env  # or use your preferred editor
   ```

   Required Environment Variables:

   ```bash
   # AI Integration
   GEMINI_API_KEY=your_google_gemini_api_key_here

   # Database
   DATABASE_URL=postgresql://user:password@localhost:5432/auto_insights

   # Object Storage
   MINIO_ENDPOINT=localhost:9000
   MINIO_ACCESS_KEY=minioadmin
   MINIO_SECRET_KEY=minioadmin

   # Redis
   REDIS_URL=redis://localhost:6379/0
   ```

3. **Launch the Platform**

   ```bash
   # Start all services (recommended)
   ./start.sh

   # Or use Docker Compose directly
   docker-compose up -d
   ```

4. **Verify Installation**

   ```bash
   # Check service status
   docker-compose ps

   # Validate platform functionality
   python validate_platform.py
   ```

5. **Access Applications**

   - Main Application: http://localhost:3000
   - API Documentation: http://localhost:8000/docs
   - Interactive API Docs: http://localhost:8000/redoc
   - Task Monitoring: http://localhost:5555
   - MinIO Console: http://localhost:9001
   - Grafana Dashboard: http://localhost:3001
   - Prometheus Metrics: http://localhost:9090
The platform performs comprehensive exploratory data analysis with real-time progress updates:
1. Data Loading & Validation (5%)
2. Basic Statistics (15%)
3. Missing Values Analysis (25%)
4. Distribution Analysis (35%)
5. Correlation Analysis (45%)
6. Feature Importance (55%)
7. Outlier Detection (65%)
8. Data Quality Report (75%)
9. Visualization Generation (85%)
10. Summary & Recommendations (95%)
11. Complete (100%)
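To make this concrete, here is a minimal, hypothetical sketch of how a background task could emit these progress milestones; `celery_app`, `publish_progress`, and the step handling are illustrative stand-ins, not the platform's actual modules:

```python
# Hypothetical progress-reporting sketch for an EDA background task.
# `celery_app` and `publish_progress` are illustrative names, not the project's real modules.
from celery import Celery

celery_app = Celery("auto_insights", broker="redis://localhost:6379/0")

EDA_STEPS = [
    ("Data Loading & Validation", 5),
    ("Basic Statistics", 15),
    ("Missing Values Analysis", 25),
    # ... remaining steps, up to ("Complete", 100)
]

def publish_progress(job_id: str, progress: int, message: str) -> None:
    """Placeholder: push an update to the job's WebSocket channel (e.g. via Redis pub/sub)."""
    print(f"[{job_id}] {progress}% - {message}")

@celery_app.task(bind=True)
def run_eda(self, job_id: str, dataset_path: str):
    for step_name, pct in EDA_STEPS:
        # ... the actual analysis for this step would run here ...
        self.update_state(state="PROGRESS", meta={"progress": pct, "message": step_name})
        publish_progress(job_id, pct, step_name)
    return {"status": "completed", "progress": 100}
```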
Automated machine learning with live progress tracking:
- Algorithm Selection: Automatic model selection from 20+ algorithms
- Hyperparameter Tuning: Intelligent parameter optimization
- Cross-Validation: Real-time CV score updates
- Model Comparison: Live leaderboard updates
- Performance Metrics: Instant accuracy, precision, recall tracking
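For a rough sense of the underlying AutoML call, here is a minimal FLAML sketch; the toy dataset and 60-second time budget are placeholders, not the platform's actual training pipeline:

```python
# Minimal FLAML sketch on a toy dataset; budget and metric are illustrative.
from flaml import AutoML
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

automl = AutoML()
automl.fit(
    X_train=X_train,
    y_train=y_train,
    task="classification",   # or "regression"
    metric="accuracy",
    time_budget=60,          # seconds to spend searching models and hyperparameters
)

print("Best learner:", automl.best_estimator)
print("Best config:", automl.best_config)
print("Test accuracy:", (automl.predict(X_test) == y_test).mean())
```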
Real-time updates via WebSocket connections:
```javascript
// Frontend WebSocket integration
const ws = new WebSocket(`ws://localhost:8000/ws/job/${jobId}`);

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log('Progress:', data.progress, '%');
  console.log('Status:', data.status);
  console.log('Message:', data.message);
};
```
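On the server side, a FastAPI WebSocket endpoint along these lines could stream those updates. This is a minimal sketch in which the hypothetical `get_job_progress` helper stands in for the platform's actual job store:

```python
# Minimal FastAPI WebSocket sketch; `get_job_progress` is a hypothetical helper,
# not the platform's real job store.
import asyncio
from fastapi import FastAPI, WebSocket

app = FastAPI()

async def get_job_progress(job_id: str) -> dict:
    """Placeholder: look up job state (e.g. from Redis or the database)."""
    return {"job_id": job_id, "status": "completed", "progress": 100.0, "message": "Complete"}

@app.websocket("/ws/job/{job_id}")
async def job_progress(websocket: WebSocket, job_id: str):
    await websocket.accept()
    while True:
        update = await get_job_progress(job_id)
        await websocket.send_json(update)   # same JSON shape the frontend snippet above consumes
        if update["status"] in ("completed", "failed"):
            break
        await asyncio.sleep(1)              # simple polling; a Redis pub/sub push would also work
    await websocket.close()
```

The message fields here mirror the job progress format documented in the API section below.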
**Frontend development:**

```bash
cd frontend

# Install dependencies
npm install

# Start development server
npm run dev

# Build for production
npm run build

# Preview production build
npm run preview

# Run linting
npm run lint

# Type checking
npm run type-check
```
**Backend development:**

```bash
cd backend

# Install Python dependencies
pip install -r requirements.txt

# Start development server
uvicorn main:app --reload --host 0.0.0.0 --port 8000

# Run with background task support
celery -A celery_app.celery_app worker --loglevel=info

# Run Redis for local development
redis-server
```

**Full local development setup:**

```bash
# Terminal 1: Frontend
cd frontend && npm run dev

# Terminal 2: Backend
cd backend && uvicorn main:app --reload

# Terminal 3: Redis
redis-server

# Terminal 4: Celery Worker
cd backend && celery -A celery_app.celery_app worker --loglevel=info
```
**Platform validation:**

```bash
# Comprehensive platform validation
python validate_platform.py
```

**Real-time feature testing:**

```bash
# Install test dependencies
pip install websockets requests pandas

# Run comprehensive real-time test
python test_realtime.py
```

**Load and stress testing:**

```bash
# API load testing
python load_test.py

# WebSocket stress testing
python websocket_test.py
```

**Monitoring & observability:**

- Prometheus: System metrics collection (see the metrics sketch after this list)
- Grafana: Real-time dashboards and alerting
- Custom Metrics: Business KPIs and ML model performance
- Structured Logging: JSON formatted logs with correlation IDs
- Log Aggregation: Centralized logging with ELK stack ready
- Error Tracking: Comprehensive error handling and reporting
- Real-time Metrics: CPU, memory, disk usage
- Application Metrics: Response times, throughput, error rates
- ML Metrics: Model accuracy, training time, prediction latency
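Below is a minimal sketch of how custom application metrics could be exposed for Prometheus to scrape, assuming the `prometheus_client` package; the metric names are illustrative, not the platform's actual metric set:

```python
# Illustrative custom metrics; names and labels are assumptions, not the real metric set.
import time
from prometheus_client import Counter, Histogram, start_http_server

EDA_JOBS_TOTAL = Counter("eda_jobs_total", "Number of EDA jobs started", ["status"])
TRAINING_SECONDS = Histogram("automl_training_seconds", "AutoML training duration in seconds")

def run_training_job():
    EDA_JOBS_TOTAL.labels(status="started").inc()
    start = time.time()
    # ... training work would happen here ...
    TRAINING_SECONDS.observe(time.time() - start)

if __name__ == "__main__":
    start_http_server(8001)   # exposes /metrics for Prometheus to scrape
    run_training_job()
    time.sleep(60)            # keep the process alive briefly so the endpoint can be scraped
```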
**Security:**

- CORS Protection: Configured for production domains
- Input Validation: Pydantic models for all API inputs (see the sketch after this list)
- SQL Injection Protection: Parameterized queries
- XSS Protection: Input sanitization and validation
- CSRF Protection: Token-based authentication ready
- Encrypted Storage: Database and object storage encryption
- Secure APIs: HTTPS enforcement in production
- Access Control: Role-based permissions ready
- Audit Logging: Complete activity tracking
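To illustrate the input-validation point above, here is a minimal Pydantic sketch; the field names and constraints are hypothetical, not the platform's actual request schema:

```python
# Hypothetical request schema; field names are illustrative, not the real API models.
from pydantic import BaseModel, Field

class AutoMLTrainRequest(BaseModel):
    project_id: str = Field(..., min_length=1)
    dataset_id: str = Field(..., min_length=1)
    target_column: str = Field(..., min_length=1)
    time_budget_seconds: int = Field(60, ge=10, le=3600)

# When used as a FastAPI request body, invalid input is rejected with a 422 response
# before it reaches the endpoint logic.
request = AutoMLTrainRequest(project_id="p1", dataset_id="d1", target_column="churn")
print(request)
```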
**Production deployment:**

```bash
# Build production images
docker-compose -f docker-compose.prod.yml up -d

# Or use the production startup script
./deploy.sh
```

**Cloud deployment options:**

- AWS: ECS Fargate with RDS and ElastiCache
- Google Cloud: Cloud Run with Cloud SQL and Memorystore
- Azure: Container Instances with Azure Database and Redis Cache
**Scaling options:**

- Horizontal Scaling: Multiple backend instances behind load balancer
- Database Scaling: Read replicas and connection pooling
- Celery Scaling: Multiple worker nodes
- Caching: Redis clustering for high availability
**Project endpoints:**

- `GET /api/projects` - List all projects
- `POST /api/projects` - Create new project
- `GET /api/projects/{id}` - Get project details
- `PUT /api/projects/{id}` - Update project
- `DELETE /api/projects/{id}` - Delete project

**Dataset endpoints:**

- `POST /api/projects/{id}/upload` - Upload dataset
- `GET /api/projects/{id}/datasets` - List datasets
- `GET /api/projects/{id}/datasets/{dataset_id}` - Get dataset info
- `DELETE /api/projects/{id}/datasets/{dataset_id}` - Delete dataset

**Analysis endpoints:**

- `POST /api/eda/analyze` - Start EDA analysis with WebSocket progress updates
- `GET /api/eda/{project_id}/{dataset_id}/report` - Get EDA results
- `POST /api/automl/train` - Start AutoML training with WebSocket progress updates
- `GET /api/automl/{project_id}/leaderboard` - Get model leaderboard
- `GET /api/automl/{project_id}/models/{model_id}` - Get specific model
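Here is a quick illustration of the REST flow using `requests`; the payload and response fields below are assumptions, so check the interactive docs at `/docs` for the real schemas:

```python
# Illustrative API walkthrough; payload and response field names are assumptions,
# not the documented schemas (see http://localhost:8000/docs).
import requests

BASE = "http://localhost:8000"

# Create a project (assumed payload shape)
project = requests.post(f"{BASE}/api/projects", json={"name": "churn-analysis"}).json()
project_id = project["id"]  # assumed response field

# Upload a dataset as a multipart file
with open("churn.csv", "rb") as f:
    dataset = requests.post(f"{BASE}/api/projects/{project_id}/upload", files={"file": f}).json()

# Start EDA; the response is expected to include a job_id for WebSocket tracking
job = requests.post(
    f"{BASE}/api/eda/analyze",
    json={"project_id": project_id, "dataset_id": dataset.get("id")},
).json()
print("Track progress at:", f"ws://localhost:8000/ws/job/{job.get('job_id')}")
```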
**WebSocket endpoints:**

- `ws://localhost:8000/ws/job/{job_id}` - Real-time job progress
Job progress messages have the following shape:

```json
{
  "job_id": "uuid-string",
  "status": "running|completed|failed",
  "progress": 75.5,
  "message": "Processing step 8/11: Feature importance analysis",
  "data": {
    "results": "...",
    "metrics": {...}
  },
  "websocket_url": "ws://localhost:8000/ws/job/uuid-string"
}
```

**To contribute:**

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
**Code standards:**

- Python: PEP 8 with Black formatting
- TypeScript: ESLint + Prettier
- Git Hooks: Pre-commit hooks for code quality
**Testing requirements:**

- Unit tests for all new features
- Integration tests for API endpoints (see the sketch after this list)
- End-to-end tests for critical workflows
- Performance benchmarks for ML components
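For the integration-test requirement above, here is a minimal pytest sketch using FastAPI's `TestClient`; the `main` import path and the expected response shape are assumptions about the project layout:

```python
# Minimal integration-test sketch; the import path and expected response shape are assumptions.
from fastapi.testclient import TestClient

from main import app  # the backend FastAPI application

client = TestClient(app)

def test_list_projects_returns_ok():
    response = client.get("/api/projects")
    assert response.status_code == 200
    assert isinstance(response.json(), list)  # assumes the endpoint returns a JSON array
```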
This project is licensed under the MIT License - see the LICENSE file for details.
**Troubleshooting**

**Problem: WebSocket connections failing**

```bash
# Solution: Check Redis and Celery services
docker-compose logs redis
docker-compose logs celery_worker
```

**Problem: ML models not training**

```bash
# Solution: Verify Python dependencies
docker-compose exec backend pip list | grep -E "(pandas|scikit-learn|flaml)"
```

**Problem: File uploads failing**

```bash
# Solution: Check MinIO service and permissions
docker-compose logs minio
```
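If WebSocket issues persist, a quick connectivity check from Python can help isolate the problem; this assumes the `websockets` package and a `job_id` from a previously started job:

```python
# Quick WebSocket connectivity check; JOB_ID is a placeholder for a real job identifier.
import asyncio
import websockets

JOB_ID = "replace-with-a-real-job-id"

async def check_connection():
    uri = f"ws://localhost:8000/ws/job/{JOB_ID}"
    async with websockets.connect(uri) as ws:
        message = await ws.recv()   # first progress update from the server
        print("Received:", message)

asyncio.run(check_connection())
```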
---
## Acknowledgments
- **Google Gemini AI** for natural language explanations
- **FLAML** for automated machine learning
- **FastAPI** for the robust backend framework
- **React** ecosystem for the modern frontend
- **Open Source Community** for all the amazing tools and libraries
---
**Built with ❤️ for data scientists, ML engineers, and business analysts who need instant insights from their data.**
---
**⭐ Star this repository if you find it useful!**