A Scalable Remote Sensing Vision-Language Model for Unified Spatial Reasoning
🥈 Awarded Silver Medal at Inter IIT Tech Meet 14.0
DRISHTI is a state-of-the-art Remote Sensing Vision-Language Model (RS-VLM) that unifies image-, region-, and pixel-level tasks for satellite and aerial imagery analysis. It addresses the token explosion of ultra-high-resolution imagery and the heterogeneity of optical and SAR modalities, enabling unified spatial reasoning across diverse remote sensing applications.
| Component | Description |
|---|---|
| DRISHTI | Unified RS-VLM integrating optical and SAR understanding at image, region, and pixel levels |
| STP (Spatial Token Pruning) | Single-pass token pruning that preserves multiple salient regions while reducing visual compute by up to 50% |
| EarthMind-4B | Structured Visual Question Answering (VQA) and image captioning for remote sensing imagery |
| RemoteSAM | Deterministic pixel reasoning for precise segmentation and oriented bounding box localization |
| Vyoma Interface | Natural language interface supporting grounded exploration of large RS scenes |
| Task Classifier | BERT-based automatic query routing to the appropriate model (caption/vqa/grounding/area); see the routing sketch below |
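To illustrate the routing step, here is a minimal sketch of how a BERT-based classifier could map a free-form query to one of the four task labels. The checkpoint path and label set are assumptions for illustration, not the repository's actual classifier.

```python
# Illustrative sketch of BERT-based query routing; the checkpoint path and
# label names are hypothetical, not the repo's actual classifier.
from transformers import pipeline

router = pipeline(
    "text-classification",
    model="path/to/task-classifier",  # hypothetical fine-tuned BERT checkpoint
)

def route(query: str) -> str:
    """Map a natural-language query to caption/vqa/grounding/area."""
    return router(query)[0]["label"]

print(route("Segment all aircraft on the runway"))  # e.g. "grounding"
```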
Unlike iterative zooming approaches, STP is a single-pass method that:
- Analyzes attention patterns across transformer layers (layers 19-29)
- Generates gradient-weighted attention maps to identify salient regions
- Dynamically selects relevant image tiles based on query context (see the sketch after this list)
- Reduces visual compute by up to 50% while preserving task-critical information
- Supports ultra-high-resolution imagery through intelligent tile selection
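To make the selection step concrete, here is a simplified sketch of gradient-weighted tile scoring. It assumes per-tile attention mass and gradient magnitudes have already been extracted from layers 19-29; the real STP implementation may compute and combine these differently.

```python
# Simplified sketch of STP-style tile selection, not the actual implementation.
# Assumes `attn` holds each tile's mean attention mass from layers 19-29 and
# `grads` holds the matching gradient magnitudes.
import torch

def select_tiles(attn: torch.Tensor, grads: torch.Tensor, keep_ratio: float = 0.5):
    """Return indices of the most salient tiles to keep."""
    saliency = attn * grads                         # gradient-weighted attention
    k = max(1, int(keep_ratio * saliency.numel()))  # keep_ratio=0.5 prunes ~50%
    return torch.topk(saliency, k).indices

# Toy example: 64 tiles from a tiled ultra-high-resolution scene
attn, grads = torch.rand(64), torch.rand(64)
print(select_tiles(attn, grads).shape)  # torch.Size([32])
```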
DRISHTI handles diverse remote sensing tasks through unified inference:
- Caption: Generate descriptive captions for satellite imagery
- VQA: Answer visual questions with binary, numeric, or semantic responses
- Grounding: Locate objects with precise segmentation masks and oriented bounding boxes
- Area Calculation: Compute geographic areas from segmentation masks (see the sketch below)
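For the area task, the arithmetic reduces to counting mask pixels and scaling by the ground sample distance (GSD). A minimal sketch, assuming a binary mask and a known per-pixel GSD in meters (the actual pipeline may derive these values differently):

```python
# Minimal area computation from a binary segmentation mask (illustrative;
# assumes a known ground sample distance rather than the pipeline's own logic).
import numpy as np

def mask_area_m2(mask: np.ndarray, gsd_m: float) -> float:
    """mask: binary (H, W) array; gsd_m: meters per pixel side."""
    return float(mask.sum()) * gsd_m ** 2  # each pixel covers gsd_m^2 m^2

mask = np.zeros((512, 512), dtype=np.uint8)
mask[100:200, 100:300] = 1            # a 100 x 200 px region
print(mask_area_m2(mask, gsd_m=0.5))  # 20000 px * 0.25 m^2 = 5000.0
```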
- Docker and Docker Compose
- NVIDIA GPU with CUDA support
- NVIDIA Container Toolkit (`nvidia-container-toolkit`)
- Docker configured to use the NVIDIA runtime: `sudo nvidia-ctk runtime configure --runtime=docker && sudo systemctl restart docker`
- At least 16GB GPU VRAM (recommended: 24GB+)
- ~20GB disk space for model weights
```bash
# Clone the repository
git clone https://github.com/4adex/drishti.git
cd drishti

# IMPORTANT: Change the server IP in .env if deploying on a remote machine

# Run the setup script
./setup.sh
```

This will:
- Download model weights (~15GB) from Hugging Face
- Build Docker images for backend and frontend
- Start all services
Once running:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
```bash
# Start in background
./setup.sh -d

# View logs
./setup.sh --logs

# Stop services
./setup.sh --down

# Skip re-downloading model weights
./setup.sh --skip-weights
```

```bash
# Copy and edit environment file
cp .env.example .env

# Configure ports, GPU, etc.
nano .env

# Then run setup
./setup.sh
```

| Variable | Default | Description |
|---|---|---|
| `MODELS_DIR` | `./backend/models` | Path to model weights |
| `BACKEND_PORT` | `8000` | Backend API port |
| `FRONTEND_PORT` | `3000` | Frontend web port |
| `CUDA_VISIBLE_DEVICES` | `0` | GPU device(s) to use |
| `NEXT_PUBLIC_API_URL` | `http://localhost:8000` | Backend URL for frontend |
When deploying on a remote server, you must configure `NEXT_PUBLIC_API_URL` to point to your server's public IP or domain. Otherwise, the frontend will try to connect to localhost, which won't work from external clients.
- Create or edit the `.env` file:

```bash
cp .env.example .env
nano .env
```

- Set your server's public IP:
```bash
# Drishti Configuration
BACKEND_PORT=8000
FRONTEND_PORT=3000
CUDA_VISIBLE_DEVICES=0

# IMPORTANT: Replace with your server's public IP or domain
NEXT_PUBLIC_API_URL=http://YOUR_SERVER_IP:8000

# Models directory
MODELS_DIR=./backend/models
```

- Rebuild the frontend (required because `NEXT_PUBLIC_*` vars are baked in at build time):
```bash
# If already running, rebuild frontend with new config
docker compose build frontend --no-cache
docker compose up -d

# Or for fresh deployment
./setup.sh
```

Note: Any time you change `NEXT_PUBLIC_API_URL`, you must rebuild the frontend for the changes to take effect.
```bash
# Backend (local development)
cd backend
./setup.sh
```

```bash
# Frontend (local development)
cd frontend
pnpm install
pnpm dev
```

```bash
# Build images
docker compose build

# Start services
docker compose up

# Or in detached mode
docker compose up -d
```

```bash
# Check NVIDIA Docker runtime
docker info | grep nvidia

# Install nvidia-container-toolkit if missing
sudo apt-get install nvidia-container-toolkit
sudo systemctl restart docker
```

Reduce GPU usage by modifying `backend/serve.py` or by using a single model at a time. The system requires:
- Minimum: 16GB GPU VRAM
- Recommended: 24GB+ for optimal performance with STP enabled
The backend takes 2-3 minutes to load models (EarthMind-4B ~8GB, RemoteSAM, BERT classifier). Wait for the health check to pass:
```bash
# Check backend status
curl http://localhost:8000/health
```

```bash
# Use different ports
BACKEND_PORT=8001 FRONTEND_PORT=3001 ./setup.sh
```

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check for all model deployments |
| `/predict` | POST | Unified inference endpoint (auto-routes via the Task Classifier; see example below) |
| `/geoNLI/eval` | POST | GeoNLI evaluation endpoint for benchmarking |
| `/docs` | GET | Interactive API documentation (Swagger UI) |
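As a quick smoke test of the unified endpoint, the sketch below posts an image and a query to /predict. The multipart field names are assumptions, not the documented schema; check http://localhost:8000/docs for the actual request format.

```python
# Hypothetical client for the unified /predict endpoint; the field names
# ("image", "query") are assumptions; consult /docs for the real schema.
import requests

with open("scene.tif", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/predict",
        files={"image": f},
        data={"query": "How many ships are docked at the harbor?"},
        timeout=300,  # first request may wait on model loading
    )
resp.raise_for_status()
print(resp.json())  # auto-routed to caption/vqa/grounding/area
```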
