zaidshaikh987/Auto-Insight


Auto-Insights Platform 🚀

A cutting-edge real-time AI-powered data analysis and machine learning platform that delivers instant insights through live progress updates, background processing, and intelligent automation. Built with modern web technologies and enterprise-grade infrastructure.

Auto-Insights Python React FastAPI Docker TypeScript Vite TailwindCSS Uvicorn WebSockets Celery Redis PostgreSQL MinIO Prometheus Grafana FLAML scikit-learn Pandas NumPy ESLint Prettier Black License: MIT


🌟 Key Features

🔴 Real-Time Processing

  • Live EDA Analysis: 11-step comprehensive data analysis with step-by-step progress updates
  • Real-Time AutoML: Automated machine learning with live training progress and model performance tracking
  • WebSocket Connections: Instant progress updates and real-time notifications
  • Background Job Processing: Asynchronous task execution with Celery and Redis
  • Live Progress Tracking: Detailed progress bars and status updates for all operations

🤖 AI-Powered Intelligence

  • Automated EDA: Comprehensive exploratory data analysis with statistical insights
  • Smart AutoML: Automated model selection and hyperparameter tuning using FLAML
  • Model Explainability: SHAP, LIME, and permutation importance for model interpretability
  • Gemini AI Integration: Natural language explanations and business insights
  • Multi-Modal Support: Tabular, Computer Vision, NLP, and Time Series data

🎨 Modern User Experience

  • Responsive Web UI: React + TypeScript + Tailwind CSS with dark/light themes
  • Real-Time Dashboards: Live metrics, activity monitoring, and interactive visualizations
  • Drag & Drop Interface: Intuitive file upload and data management
  • Interactive Visualizations: Plotly.js and Recharts for data exploration
  • Mobile Optimized: Fully responsive design for all devices

πŸ—οΈ Architecture Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Frontend      β”‚    β”‚    Backend      β”‚    β”‚   Infrastructureβ”‚
β”‚   (React + TS)  │◄──►│   (FastAPI)     │◄──►│   (Docker)      β”‚
β”‚                 β”‚    β”‚                 β”‚    β”‚                 β”‚
β”‚ β€’ Real-time UI  β”‚    β”‚ β€’ REST APIs     β”‚    β”‚ β€’ PostgreSQL    β”‚
β”‚ β€’ WebSocket     β”‚    β”‚ β€’ Background     β”‚    β”‚ β€’ Redis         β”‚
β”‚ β€’ Visualizationsβ”‚    β”‚   Jobs          β”‚    β”‚ β€’ MinIO         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
                       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                       β”‚   AI/ML Stack   β”‚
                       β”‚                 β”‚
                       β”‚ β€’ FLAML AutoML  β”‚
                       β”‚ β€’ SHAP/LIME     β”‚
                       β”‚ β€’ Gemini AI     β”‚
                       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ› οΈ Technology Stack

Frontend

  • Framework: React 18.2.0 with TypeScript
  • Build Tool: Vite 4.5.0
  • Styling: Tailwind CSS 3.3.5
  • Charts: Plotly.js 2.27.0, Recharts 2.8.0
  • Routing: React Router DOM 6.20.1
  • State Management: Zustand 4.4.7
  • UI Components: Headless UI, Heroicons

Backend

  • Framework: FastAPI 0.104.1 with async support
  • Server: Uvicorn with WebSocket support
  • Data Processing: Pandas 2.1.4, NumPy 1.24.3
  • Machine Learning: Scikit-learn 1.3.2, FLAML 2.1.1
  • Model Explainability: SHAP 0.43.0, LIME 0.2.0.1
  • AI Integration: Google Generative AI 0.3.2
  • Task Queue: Celery 5.3.4 with Redis (Python client 5.0.1)
  • WebSockets: WebSockets 12.0 for real-time updates

Infrastructure & DevOps

  • Containerization: Docker & Docker Compose
  • Database: PostgreSQL with SQLAlchemy 2.0.23
  • Object Storage: MinIO 7.2.0
  • Message Broker: Redis 7.0 (Alpine)
  • Monitoring: Prometheus + Grafana
  • Task Monitoring: Flower (Celery dashboard)
  • Load Balancing: Nginx (production ready)

🚀 Quick Start Guide

Prerequisites

  • Docker & Docker Compose (v20.10+)
  • Git for version control
  • Google Gemini API Key for AI explanations

Installation & Setup

  1. Clone the Repository

    git clone <repository-url>
    cd auto-insights
  2. Environment Configuration

    # Copy environment template
    cp .env.example .env
    
    # Edit .env file with your configuration
    nano .env  # or use your preferred editor

    Required Environment Variables:

    # AI Integration
    GEMINI_API_KEY=your_google_gemini_api_key_here
    
    # Database
    DATABASE_URL=postgresql://user:password@localhost:5432/auto_insights
    
    # Object Storage
    MINIO_ENDPOINT=localhost:9000
    MINIO_ACCESS_KEY=minioadmin
    MINIO_SECRET_KEY=minioadmin
    
    # Redis
    REDIS_URL=redis://localhost:6379/0
  3. Launch the Platform

    # Start all services (recommended)
    ./start.sh
    
    # Or use Docker Compose directly
    docker-compose up -d
  4. Verify Installation

    # Check service status
    docker-compose ps
    
    # Validate platform functionality
    python validate_platform.py
  5. Access Applications

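The required environment variables from step 2 can be checked before any service starts. The following is a minimal sketch, not part of the repository: `REQUIRED_VARS` and `load_settings` are hypothetical names, and only the variable names themselves come from the `.env` listing above.

```python
import os
from typing import Dict, Mapping

# Variable names copied from the "Required Environment Variables" listing.
REQUIRED_VARS = (
    "GEMINI_API_KEY",
    "DATABASE_URL",
    "MINIO_ENDPOINT",
    "MINIO_ACCESS_KEY",
    "MINIO_SECRET_KEY",
    "REDIS_URL",
)


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, or raise listing every missing variable."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Reporting all missing variables at once, rather than failing on the first, tends to save a restart cycle when several entries in `.env` were left unset.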

📊 Real-Time Features Deep Dive

Live EDA Analysis

The platform performs comprehensive exploratory data analysis with real-time progress updates:

  1. Data Loading & Validation (5%)
  2. Basic Statistics (15%)
  3. Missing Values Analysis (25%)
  4. Distribution Analysis (35%)
  5. Correlation Analysis (45%)
  6. Feature Importance (55%)
  7. Outlier Detection (65%)
  8. Data Quality Report (75%)
  9. Visualization Generation (85%)
  10. Summary & Recommendations (95%)
  11. Complete (100%)
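The step-to-percentage mapping above can be sketched as a small Python generator. This is an illustration only, not the platform's actual implementation: `EDA_STEPS`, `run_eda`, the `handlers` hook, and the message wording are hypothetical; the step titles and percentages are taken from the list above.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# (step title, progress % reported for that step), copied from the list above.
EDA_STEPS: List[Tuple[str, int]] = [
    ("Data Loading & Validation", 5),
    ("Basic Statistics", 15),
    ("Missing Values Analysis", 25),
    ("Distribution Analysis", 35),
    ("Correlation Analysis", 45),
    ("Feature Importance", 55),
    ("Outlier Detection", 65),
    ("Data Quality Report", 75),
    ("Visualization Generation", 85),
    ("Summary & Recommendations", 95),
    ("Complete", 100),
]


def run_eda(handlers: Dict[str, Callable[[], None]]) -> Iterator[dict]:
    """Run each step's handler (if any) and yield one progress update per step."""
    total = len(EDA_STEPS)
    for number, (title, percent) in enumerate(EDA_STEPS, start=1):
        handler = handlers.get(title)
        if handler is not None:
            handler()  # hypothetical hook where the real analysis would run
        yield {
            "status": "completed" if percent == 100 else "running",
            "progress": float(percent),
            "message": f"Step {number}/{total}: {title}",
        }


updates = list(run_eda({}))
print(updates[7]["message"])  # Step 8/11: Data Quality Report
```

Because the updates are yielded one at a time, a caller could forward each one to a WebSocket as soon as the corresponding step finishes.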

Real-Time AutoML Training

Automated machine learning with live progress tracking:

  • Algorithm Selection: Automatic model selection from 20+ algorithms
  • Hyperparameter Tuning: Intelligent parameter optimization
  • Cross-Validation: Real-time CV score updates
  • Model Comparison: Live leaderboard updates
  • Performance Metrics: Instant accuracy, precision, recall tracking
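A live leaderboard of the kind described above could be maintained with a structure along these lines. This is a hedged sketch in plain Python: `Leaderboard`, `record`, and `ranking` are hypothetical names, the model names and scores are made up, and the platform's real FLAML-based training loop is not shown in this README.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Leaderboard:
    """Keeps the best cross-validation score seen so far for each model."""

    best_scores: Dict[str, float] = field(default_factory=dict)

    def record(self, model: str, cv_score: float) -> bool:
        """Store a score; return True if it improved this model's best."""
        previous = self.best_scores.get(model)
        if previous is None or cv_score > previous:
            self.best_scores[model] = cv_score
            return True
        return False

    def ranking(self) -> List[Tuple[str, float]]:
        """Models sorted from best to worst score (higher is better)."""
        return sorted(self.best_scores.items(), key=lambda item: item[1], reverse=True)


board = Leaderboard()
board.record("lgbm", 0.81)
board.record("rf", 0.78)
board.record("lgbm", 0.84)  # a better trial replaces the earlier score
board.record("rf", 0.70)    # a worse trial is ignored
print(board.ranking())      # [('lgbm', 0.84), ('rf', 0.78)]
```

The boolean returned by `record` gives the caller a cheap way to decide whether a leaderboard update is worth pushing to connected clients.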

WebSocket Communication

Real-time updates via WebSocket connections:

// Frontend WebSocket integration
const ws = new WebSocket(`ws://localhost:8000/ws/job/${jobId}`);

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log('Progress:', data.progress, '%');
  console.log('Status:', data.status);
  console.log('Message:', data.message);
};
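The same subscription can be sketched from Python, for example in a test script. This assumes the third-party `websockets` package and the message fields used in the JavaScript example above; `handle_message`, `follow_job`, and the choice to treat `completed` and `failed` as terminal statuses are assumptions, not documented behaviour.

```python
import json
from typing import Tuple

TERMINAL_STATUSES = {"completed", "failed"}  # assumed to end the stream


def handle_message(raw: str) -> Tuple[str, bool]:
    """Format one progress message; the bool says whether the job has finished."""
    data = json.loads(raw)
    line = f"[{data['status']}] {data['progress']}% - {data['message']}"
    return line, data["status"] in TERMINAL_STATUSES


async def follow_job(job_id: str, base_url: str = "ws://localhost:8000") -> None:
    """Print progress updates for one job until it reaches a terminal status."""
    import websockets  # third-party: pip install websockets

    async with websockets.connect(f"{base_url}/ws/job/{job_id}") as ws:
        async for raw in ws:
            line, finished = handle_message(raw)
            print(line)
            if finished:
                break


# Usage (needs a running backend and a real job ID):
#   import asyncio
#   asyncio.run(follow_job("<job-id>"))
```

Keeping the parsing in `handle_message` separate from the network loop makes it possible to unit-test the message handling without a server.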

🔧 Development Workflow

Frontend Development

cd frontend

# Install dependencies
npm install

# Start development server
npm run dev

# Build for production
npm run build

# Preview production build
npm run preview

# Run linting
npm run lint

# Type checking
npm run type-check

Backend Development

cd backend

# Install Python dependencies
pip install -r requirements.txt

# Start development server
uvicorn main:app --reload --host 0.0.0.0 --port 8000

# Start a Celery worker for background jobs (run in a separate terminal)
celery -A celery_app.celery_app worker --loglevel=info

# Start a local Redis server (the Celery worker uses it as its message broker)
redis-server

Full Development Stack

# Terminal 1: Frontend
cd frontend && npm run dev

# Terminal 2: Backend
cd backend && uvicorn main:app --reload

# Terminal 3: Redis
redis-server

# Terminal 4: Celery Worker
cd backend && celery -A celery_app.celery_app worker --loglevel=info

🧪 Testing & Validation

Platform Validation

# Comprehensive platform validation
python validate_platform.py

Real-Time Feature Testing

# Install test dependencies
pip install websockets requests pandas

# Run comprehensive real-time test
python test_realtime.py

Load Testing

# API load testing
python load_test.py

# WebSocket stress testing
python websocket_test.py

📈 Monitoring & Observability

Application Monitoring

  • Prometheus: System metrics collection
  • Grafana: Real-time dashboards and alerting
  • Custom Metrics: Business KPIs and ML model performance

Log Management

  • Structured Logging: JSON formatted logs with correlation IDs
  • Log Aggregation: Centralized logging with ELK stack ready
  • Error Tracking: Comprehensive error handling and reporting
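JSON-formatted logs with correlation IDs can be produced with the standard `logging` module alone. The sketch below is one possible shape, not the platform's logging setup: `JsonFormatter`, `make_logger`, the logger name, and the exact set of JSON keys are assumptions.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Attached by the caller via `extra=`; None if it was omitted.
            "correlation_id": getattr(record, "correlation_id", None),
        })


def make_logger(name: str = "auto_insights") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


log = make_logger()
request_id = str(uuid.uuid4())  # one ID per request or job, reused on every line
log.info("EDA job started", extra={"correlation_id": request_id})
```

Reusing the same `correlation_id` across the API request, the Celery task, and the WebSocket updates is what lets a log aggregator stitch one job's lines back together.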

Performance Monitoring

  • Real-time Metrics: CPU, memory, disk usage
  • Application Metrics: Response times, throughput, error rates
  • ML Metrics: Model accuracy, training time, prediction latency

🔒 Security & Best Practices

Security Features

  • CORS Protection: Configured for production domains
  • Input Validation: Pydantic models for all API inputs
  • SQL Injection Protection: Parameterized queries
  • XSS Protection: Input sanitization and validation
  • CSRF Protection: Token-based authentication ready
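To illustrate why parameterized queries block SQL injection, here is a minimal sketch using the standard-library `sqlite3` module as a stand-in for PostgreSQL. The `projects` table, its columns, and `find_project` are hypothetical and are not taken from the platform's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for PostgreSQL
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO projects (name) VALUES (?)", ("churn-analysis",))


def find_project(name: str) -> list:
    """Look up projects by name; the driver binds `name` as data, not as SQL."""
    query = "SELECT id, name FROM projects WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


print(find_project("churn-analysis"))  # [(1, 'churn-analysis')]
print(find_project("x' OR '1'='1"))    # []
```

The second call returns nothing because the injection attempt is compared as a literal string; had the value been concatenated into the SQL text, the `OR '1'='1'` clause would have matched every row.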

Data Protection

  • Encrypted Storage: Database and object storage encryption
  • Secure APIs: HTTPS enforcement in production
  • Access Control: Role-based permissions ready
  • Audit Logging: Complete activity tracking

🚀 Deployment Options

Production Deployment

# Start the production stack in the background (images are built if missing)
docker-compose -f docker-compose.prod.yml up -d

# Or use the production startup script
./deploy.sh

Cloud Deployment

  • AWS: ECS Fargate with RDS and ElastiCache
  • Google Cloud: Cloud Run with Cloud SQL and Memorystore
  • Azure: Container Instances with Azure Database and Redis Cache

Scaling Considerations

  • Horizontal Scaling: Multiple backend instances behind load balancer
  • Database Scaling: Read replicas and connection pooling
  • Celery Scaling: Multiple worker nodes
  • Caching: Redis clustering for high availability

📚 API Documentation

Core Endpoints

Project Management

  • GET /api/projects - List all projects
  • POST /api/projects - Create new project
  • GET /api/projects/{id} - Get project details
  • PUT /api/projects/{id} - Update project
  • DELETE /api/projects/{id} - Delete project

Data Management

  • POST /api/projects/{id}/upload - Upload dataset
  • GET /api/projects/{id}/datasets - List datasets
  • GET /api/projects/{id}/datasets/{dataset_id} - Get dataset info
  • DELETE /api/projects/{id}/datasets/{dataset_id} - Delete dataset

Real-Time Analysis

  • POST /api/eda/analyze - Start EDA analysis (progress is streamed over WebSocket)
  • GET /api/eda/{project_id}/{dataset_id}/report - Get EDA results
  • POST /api/automl/train - Start AutoML training (progress is streamed over WebSocket)
  • GET /api/automl/{project_id}/leaderboard - Get model leaderboard
  • GET /api/automl/{project_id}/models/{model_id} - Get specific model

WebSocket Endpoints

  • ws://localhost:8000/ws/job/{job_id} - Real-time job progress
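Calling one of these endpoints from Python might look like the sketch below, which uses only the standard library. The request body is an assumption: this README does not document which fields `POST /api/eda/analyze` expects, so `project_id` and `dataset_id` are guesses based on the other routes, and `API_BASE`, `build_eda_request`, and `start_eda` are hypothetical names.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # assumed to match the WebSocket host above


def build_eda_request(project_id: str, dataset_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/eda/analyze request."""
    # Assumed payload shape; check the backend's own API docs for the real fields.
    body = json.dumps({"project_id": project_id, "dataset_id": dataset_id})
    return urllib.request.Request(
        f"{API_BASE}/api/eda/analyze",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_eda(project_id: str, dataset_id: str) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_eda_request(project_id, dataset_id)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

FastAPI applications normally serve interactive documentation at `/docs`, which is the place to confirm the real request schema before relying on this shape.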

Response Format

{
  "job_id": "uuid-string",
  "status": "running|completed|failed",
  "progress": 75.5,
  "message": "Processing step 8/11: Data Quality Report",
  "data": {
    "results": "...",
    "metrics": {...}
  },
  "websocket_url": "ws://localhost:8000/ws/job/uuid-string"
}
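A client can turn this payload into a typed object and reject unexpected values early. The following is a sketch under stated assumptions: `JobStatus` and `parse_job_status` are hypothetical names, the 0 to 100 progress range is inferred from the EDA step percentages, and treating `message`, `data`, and `websocket_url` as optional is a guess.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional

VALID_STATUSES = {"running", "completed", "failed"}  # from the "status" field above


@dataclass(frozen=True)
class JobStatus:
    job_id: str
    status: str
    progress: float
    message: str
    data: Optional[Dict[str, Any]] = None
    websocket_url: Optional[str] = None


def parse_job_status(raw: str) -> JobStatus:
    """Parse a job response and reject values outside the expected ranges."""
    payload = json.loads(raw)
    status = payload["status"]
    if status not in VALID_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    progress = float(payload["progress"])
    if not 0.0 <= progress <= 100.0:
        raise ValueError(f"progress out of range: {progress}")
    return JobStatus(
        job_id=payload["job_id"],
        status=status,
        progress=progress,
        message=payload.get("message", ""),
        data=payload.get("data"),
        websocket_url=payload.get("websocket_url"),
    )
```

Validating at the boundary keeps the rest of the client free of defensive checks on `status` and `progress`.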

🤝 Contributing

Development Setup

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Ensure all tests pass
  6. Submit a pull request

Code Style

  • Python: PEP 8 with Black formatting
  • TypeScript: ESLint + Prettier
  • Git Hooks: Pre-commit hooks for code quality

Testing Standards

  • Unit tests for all new features
  • Integration tests for API endpoints
  • End-to-end tests for critical workflows
  • Performance benchmarks for ML components

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🆘 Support & Troubleshooting

Common Issues

Problem: WebSocket connections failing

# Solution: Check Redis and Celery services
docker-compose logs redis
docker-compose logs celery_worker

Problem: ML models not training

# Solution: Verify Python dependencies
docker-compose exec backend pip list | grep -E "(pandas|scikit-learn|flaml)"

Problem: File uploads failing

# Solution: Check MinIO service and permissions
docker-compose logs minio
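Before reading through logs, it can help to confirm that each service is reachable at all. The sketch below probes the default ports that appear in this README's URLs and environment variables; `SERVICES`, `is_port_open`, and `check_services` are hypothetical names, and the check assumes the ports are published to the host, which depends on the Compose file.

```python
import socket

# Default ports taken from the URLs and environment variables in this README.
SERVICES = {"backend": 8000, "postgres": 5432, "redis": 6379, "minio": 9000}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}
```

Running `print(check_services())` gives a quick overview; a `False` entry points to the service whose logs are worth inspecting first. An open port only shows that something is listening, not that the service is healthy.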

🙏 Acknowledgments

  • Google Gemini AI for natural language explanations
  • FLAML for automated machine learning
  • FastAPI for the robust backend framework
  • React ecosystem for the modern frontend
  • Open Source Community for all the amazing tools and libraries

Built with ❤️ for data scientists, ML engineers, and business analysts who need instant insights from their data.

⭐ Star this repository if you find it useful!
