Skip to content

whitebumblebee/sheetdb

Repository files navigation

SheetDB - Reanimating Spreadsheets πŸŽƒ

A Kiroween-themed project that brings Google Spreadsheets back to life as a functional relational database backend.

Universal Backend Solution - Use Google Sheets as a database with Python, REST API, or JavaScript client. No hosting, no bandwidth limits, no setup time.

Why SheetDB?

SheetDB transforms Google Sheets into a powerful relational database with the performance of a traditional backend. It's perfect for:

  • Rapid Prototyping - Build MVPs in hours, not days
  • πŸ‘₯ Collaborative Apps - Users can edit data directly in sheets
  • πŸ“Š Internal Tools - Admin dashboards without hosting costs
  • 🌐 Public APIs - Serve data from published sheets (no auth!)
  • πŸŽ“ Learning Projects - Database concepts without infrastructure

Key Features

  • πŸš€ 500-1000x Faster - DuckDB caching makes queries blazing fast (vs direct API calls)
  • πŸ”— Relational Queries - Joins, filters, aggregations, subqueries - full SQL power
  • πŸ”’ Type Safety - Schema validation and TypeScript support
  • 🌐 Multiple Interfaces - Python library, REST API, JavaScript client
  • 🎯 Framework Agnostic - Works with React, Django, FastAPI, Flask, and more
  • πŸ”“ Three Operating Modes - Public (no auth), Authenticated (full power), Hybrid (best of both)
  • πŸ“Š Built-in UI - Google Sheets provides the admin interface for free
  • ⚑ Zero Setup - Public mode works instantly, no credentials needed

Project Structure

⚠️ Note: This is a monorepo containing multiple components. When you pip install sheetdb, you get only the core library. See REPOSITORY_STRUCTURE.md for details.

.
β”œβ”€β”€ sheetdb/              # Core Python library (pip package)
β”œβ”€β”€ sheetdb_server/       # REST API server (separate component)
β”œβ”€β”€ sheetdb-js/           # JavaScript/TypeScript client (npm package)
β”œβ”€β”€ django_app/           # Monster Hunt Tracker demo
β”œβ”€β”€ examples/             # Example applications
β”œβ”€β”€ docs/                 # Documentation
└── tests/                # Test suite

What gets installed with pip install sheetdb:

  • βœ… Core library (sheetdb/)
  • βœ… CLI tool

What's separate:

  • ❌ REST API server (install FastAPI/uvicorn separately)
  • ❌ JavaScript client (separate npm package)
  • ❌ Examples (available in source only)

Setup

1. Create Virtual Environment

py -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

2. Install Dependencies

pip install -r requirements.txt

3. Configure Environment

cp .env.example .env
# Edit .env with your spreadsheet ID and credentials path

4. Set Up Google Sheets API

  1. Go to Google Cloud Console
  2. Create a new project or select existing one
  3. Enable Google Sheets API
  4. Create a service account
  5. Download the JSON credentials file
  6. Share your spreadsheet with the service account email

Three Operating Modes

SheetDB supports three modes to fit different use cases:

1. Public Mode (No Authentication) πŸ”“

Perfect for prototypes, public data, and learning projects. Zero setup required!

from sheetdb import SheetDB

# Just publish your sheet to web and use the URL
db = SheetDB.from_public_url("https://docs.google.com/spreadsheets/d/YOUR_ID/")
db.sync()  # Reads from public CSV export

# Query with full SQL power
monsters = db.query("SELECT * FROM monsters WHERE danger_level >= 7")

Benefits:

  • βœ… No OAuth setup or credentials needed
  • βœ… No rate limits on reads (unlimited!)
  • βœ… Perfect for prototypes and public data
  • βœ… Still get 500x performance boost from caching

How to enable:

  1. In Google Sheets: File β†’ Share β†’ Publish to web
  2. Select "Entire Document" and "Web page"
  3. Copy the URL and use it with SheetDB

2. Authenticated Mode (Full Power) πŸ”

Full CRUD operations with Google Sheets API. Best for private data and production apps.

from sheetdb import SheetDB

# Use service account credentials
db = SheetDB(
    credentials="credentials.json",
    spreadsheet_id="YOUR_SPREADSHEET_ID",
    cache_mode="aggressive"  # Load once, query many times
)

db.sync()  # One-time sync from Sheets API

# Unlimited fast queries (cached in DuckDB)
for i in range(1000):
    data = db.query("SELECT * FROM monsters")  # ~0.001s each!

# Write operations
db.insert('monsters', {'name': 'Vampire', 'danger_level': 8})
db.update('monsters', id=1, data={'danger_level': 9})
db.delete('monsters', id=1)

Benefits:

  • βœ… Full CRUD operations
  • βœ… Private data support
  • βœ… 500x faster reads via caching
  • βœ… Controlled write access

Setup:

  1. Create Google Cloud project
  2. Enable Google Sheets API
  3. Create service account and download JSON key
  4. Share spreadsheet with service account email

3. Hybrid Mode (Best of Both) ⚑

Combine public reads (unlimited) with authenticated writes (controlled). Recommended for production!

from sheetdb import SheetDB

# Public reads + authenticated writes
db = SheetDB(
    sheet_url="https://docs.google.com/spreadsheets/d/YOUR_ID/",
    credentials="credentials.json",
    mode="hybrid",
    cache_mode="smart",
    cache_ttl=300  # Auto-refresh every 5 minutes
)

# Unlimited fast reads from public URL
data = db.query("SELECT * FROM monsters")

# Controlled writes via API
db.insert('monsters', {'name': 'Vampire', 'danger_level': 8})

Benefits:

  • βœ… Unlimited reads (no rate limits)
  • βœ… Controlled writes (authenticated)
  • βœ… Best performance
  • βœ… Auto-sync for collaborative editing

Architecture

SheetDB uses a layered architecture that works with any framework:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Application Layer (React, Django, FastAPI, etc.)  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Interface Layer (JS Client, REST API, ORM)         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Core SheetDB Library (Unified API)                 β”‚
β”‚  β”œβ”€ Sheets Gateway (API + Public URLs)              β”‚
β”‚  β”œβ”€ DuckDB Engine (Relational Cache)                β”‚
β”‚  └─ Schema Config (Validation)                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Data Sources (Google Sheets, Public URLs, Forms)   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Principles:

  • Google Sheets = Source of truth for all data
  • DuckDB = In-memory relational cache (500x faster queries)
  • Schema Config = Type safety and validation
  • Framework Agnostic = Works with any Python/JS framework

For a detailed explanation of how SheetDB works, see ARCHITECTURE.md.

Quick Start

5-Minute Setup (Public Mode)

The fastest way to get started - no authentication required!

from sheetdb import SheetDB

# 1. Publish your Google Sheet to web (File β†’ Share β†’ Publish to web)
# 2. Use the URL with SheetDB
db = SheetDB.from_public_url("https://docs.google.com/spreadsheets/d/YOUR_ID/")

# 3. Sync data (one-time load into cache)
db.sync()

# 4. Query with full SQL power
monsters = db.query("""
    SELECT m.name, m.danger_level, COUNT(h.id) as hunt_count
    FROM monsters m
    LEFT JOIN hunts h ON m.id = h.monster_id
    GROUP BY m.id, m.name, m.danger_level
    ORDER BY hunt_count DESC
""")

# 5. That's it! No credentials, no setup, just works.

Python Library (All Modes)

from sheetdb import SheetDB

# Public mode (no auth)
db = SheetDB.from_public_url("https://docs.google.com/spreadsheets/d/YOUR_ID/")

# Authenticated mode (full CRUD)
db = SheetDB(
    credentials="credentials.json",
    spreadsheet_id="YOUR_ID",
    cache_mode="aggressive"  # Default: load once, query many times
)

# Hybrid mode (public reads + auth writes)
db = SheetDB(
    sheet_url="https://docs.google.com/spreadsheets/d/YOUR_ID/",
    credentials="credentials.json",
    mode="hybrid"
)

# Cache modes
db = SheetDB(..., cache_mode="aggressive")  # Load once (default)
db = SheetDB(..., cache_mode="smart", cache_ttl=300)  # Auto-refresh every 5 min
db = SheetDB(..., cache_mode="lazy")  # Cache on-demand
db = SheetDB(..., cache_mode="none")  # No cache (like Retrosheet)

# Operations
db.sync()  # Refresh cache from sheets
data = db.query("SELECT * FROM monsters WHERE danger_level > 5")
db.insert('monsters', {'name': 'Vampire', 'danger_level': 8})
db.update('monsters', id=1, data={'danger_level': 9})
db.delete('monsters', id=1)

REST API Server

Start a universal backend that any frontend can use:

# Start the server
cd sheetdb_server
python server.py

# API is now available at http://localhost:8000
# Docs at http://localhost:8000/docs

API Endpoints:

# List all tables
GET /api/tables

# Query rows with filters
GET /api/tables/monsters/rows?danger_level__gte=7&limit=10&order_by=name

# Create row
POST /api/tables/monsters/rows
{"name": "Vampire", "danger_level": 8}

# Update row
PUT /api/tables/monsters/rows/1
{"danger_level": 9}

# Delete row
DELETE /api/tables/monsters/rows/1

# Trigger sync
POST /api/sync

# Check sync status
GET /api/sync/status

JavaScript Client

Use SheetDB from any JavaScript/TypeScript frontend:

npm install sheetdb-js

Vanilla JavaScript:

import { SheetDB } from "sheetdb-js";

const db = new SheetDB("http://localhost:8000/api");

// Fluent query API
const monsters = await db
  .table("monsters")
  .filter({ danger_level__gte: 7 })
  .orderBy("name")
  .limit(10)
  .get();

// CRUD operations
await db.table("monsters").create({ name: "Vampire", danger_level: 8 });
await db.table("monsters").update(1, { danger_level: 9 });
await db.table("monsters").delete(1);

// Trigger sync
await db.sync();

React Hooks:

import { SheetDBProvider, useQuery, useMutation } from "sheetdb-js/react";

function App() {
  return (
    <SheetDBProvider apiUrl="http://localhost:8000/api">
      <MonsterList />
    </SheetDBProvider>
  );
}

function MonsterList() {
  const { data, loading, error, refetch } = useQuery("monsters", {
    filter: { danger_level__gte: 7 },
    orderBy: "name",
  });

  const { mutate: createMonster } = useMutation("monsters", "create");

  if (loading) return <div>Loading monsters...</div>;
  if (error) return <div>Error: {error.message}</div>;

  return (
    <div>
      <h1>Dangerous Monsters ({data.length})</h1>
      {data.map((monster) => (
        <div key={monster.id}>
          {monster.name} - Danger Level: {monster.danger_level}
        </div>
      ))}
      <button
        onClick={() => createMonster({ name: "Vampire", danger_level: 8 })}
      >
        Add Monster
      </button>
      <button onClick={refetch}>Refresh</button>
    </div>
  );
}

CLI Tool

SheetDB includes a command-line tool for quick setup and management:

# Install SheetDB with CLI
pip install -e .

# Initialize from public URL (no auth required)
sheetdb init --url "https://docs.google.com/spreadsheets/d/YOUR_ID/"

# Initialize with authentication
sheetdb init --credentials creds.json --spreadsheet-id YOUR_ID

# Sync data
sheetdb sync --url "https://docs.google.com/spreadsheets/d/YOUR_ID/"

# Generate schema
sheetdb schema generate --credentials creds.json --spreadsheet-id YOUR_ID

# Get help
sheetdb --help

For complete CLI documentation, see docs/CLI_USAGE.md.

Using the ORM

SheetDB provides a Django-like ORM for working with your spreadsheet data. The ORM infrastructure is provided by sheetdb, while model definitions live in your application code (e.g., Django app).

Important: SheetDB does not support Django migrations. The schema is defined in sheets_schema.yaml and Google Sheets is the source of truth. Models simply map to existing sheets - they don't create or modify the sheet structure.

Defining Models

from sheetdb import Model

class Monster(Model):
    class Meta:
        table_name = "monsters"
        primary_key = "id"

class Hunter(Model):
    class Meta:
        table_name = "hunters"
        primary_key = "id"

class Hunt(Model):
    class Meta:
        table_name = "hunts"
        primary_key = "id"

    def get_monster(self):
        """Get related monster."""
        if self.monster_id:
            return Monster.objects.get(id=self.monster_id)
        return None

Configuration

from sheetdb import (
    SheetsGateway,
    SchemaConfig,
    DuckDBEngine,
    configure_orm
)

# Initialize components
gateway = SheetsGateway('credentials.json', 'your-spreadsheet-id')
schema = SchemaConfig('.kiro/specs/sheetdb/sheets_schema.yaml')
engine = DuckDBEngine(":memory:")

# Configure ORM
configure_orm(engine, gateway, schema)

# Sync data from sheets
engine.refresh_all_tables(gateway, schema)

Querying Data

# Get all records
monsters = Monster.objects.all()

# Filter records
dangerous = Monster.objects.filter(danger_level__gte=7)

# Get single record
dracula = Monster.objects.get(id=1)

# Chain filters
forest_monsters = Monster.objects.filter(habitat="Forest").filter(danger_level__gte=5)

# Order results
ordered = Monster.objects.all().order_by('-danger_level')

# Count records
count = Monster.objects.count()

Filter Operators

  • field: Exact match
  • field__gte: Greater than or equal
  • field__lte: Less than or equal
  • field__contains: String contains

Creating and Updating Records

# Create new record
monster = Monster.create(
    id=10,
    name='Vampire',
    danger_level=8,
    habitat='Crypt'
)

# Update existing record
monster.danger_level = 9
monster.save()

Relationships

# Get related objects
hunt = Hunt.objects.get(id=1)
monster = hunt.get_monster()
hunter = hunt.get_hunter()

Example Script

See examples/orm_example.py for a complete working example.

Schema Management (No Migrations)

SheetDB uses a different approach than traditional Django ORMs:

  1. Create your Google Sheet with the desired structure (columns, headers)
  2. Define the schema in sheets_schema.yaml to map sheets to tables
  3. Define models in your Django app that match the schema
  4. Validate using python validate_schema.py

To add a new table:

  1. Create a new sheet in Google Sheets
  2. Add the table definition to sheets_schema.yaml
  3. Create a model class in your Django app
  4. Run validation to ensure everything is configured correctly

Note: Django's makemigrations and migrate commands are not used with SheetDB. The schema is managed through the YAML configuration file and Google Sheets itself.

For detailed Django setup instructions, see docs/django_setup.md.

Development

This project follows spec-driven development. See .kiro/specs/sheetdb/ for:

  • requirements.md - Feature requirements
  • design.md - Architecture and design decisions
  • tasks.md - Implementation task list

Schema Validation

Before deploying or running the application, validate your schema configuration:

python validate_schema.py <config_path> <credentials_path>

Example:

python validate_schema.py .kiro/specs/sheetdb/sheets_schema.yaml path/to/credentials.json

The validation script checks:

  • All referenced sheets exist in the spreadsheet
  • All referenced columns exist in sheets
  • Primary key columns exist
  • Foreign key columns exist in both source and target tables

Exit codes:

  • 0: Validation successful
  • 1: Validation failed with errors
  • 2: Script error (missing files, invalid arguments, etc.)

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=sheetdb

# Run property-based tests
pytest -k property

Documentation

Examples

  • examples/sheetdb_example.py - Core SheetDB usage
  • examples/orm_example.py - ORM usage
  • examples/cli_example.sh - CLI commands
  • examples/rest_api_example.sh - REST API usage
  • examples/forms_writer_example.py - Forms integration
  • examples/js-client-vanilla.html - Vanilla JavaScript example
  • examples/js-client-react.jsx - React example

Performance Comparison

SheetDB vs Retrosheet

The Problem with Direct API Calls:

Retrosheet and similar libraries query Google Sheets API directly for every request. This is slow and hits rate limits quickly.

# Retrosheet approach (direct API calls)
import retrosheet

sheet = retrosheet.Retrosheet("YOUR_ID", "credentials.json")

# Every query hits the API (slow!)
for i in range(100):
    data = sheet.fetch("monsters")  # ~0.5s each = 50 seconds total!

SheetDB's Solution: Smart Caching

SheetDB syncs data once into DuckDB, then queries run in-memory at blazing speed:

# SheetDB approach (cached queries)
from sheetdb import SheetDB

db = SheetDB.from_public_url("https://docs.google.com/spreadsheets/d/YOUR_ID/")
db.sync()  # One-time sync: ~2 seconds

# Unlimited fast queries (in-memory!)
for i in range(100):
    data = db.query("SELECT * FROM monsters")  # ~0.001s each = 0.1 seconds total!

# Result: 500x faster! (50s β†’ 0.1s)

Feature Comparison:

Feature SheetDB Retrosheet Traditional DB
Query Speed ⚑ 0.001s (cached) 🐌 0.5s (API call) ⚑ 0.001s
Relational Queries βœ… Full SQL (joins, aggregations) ❌ No joins βœ… Full SQL
Rate Limits βœ… Unlimited reads (public mode) ❌ 100 req/100s βœ… Unlimited
Setup Time βœ… 5 minutes (public mode) ⚠️ 30 minutes (OAuth) ❌ Hours (hosting)
Hosting Cost βœ… $0 (Google's infrastructure) βœ… $0 ❌ $5-50/month
Built-in UI βœ… Google Sheets βœ… Google Sheets ❌ Build your own
Type Safety βœ… Schema validation ❌ No validation βœ… Schema enforcement
Multiple Interfaces βœ… Python, REST API, JS ❌ Python only ⚠️ Varies
Collaborative Editing βœ… Real-time in Sheets βœ… Real-time in Sheets ❌ Build your own

When to Use Each:

  • SheetDB: Prototypes, internal tools, collaborative apps, public APIs
  • Retrosheet: Simple scripts, one-off data fetches
  • Traditional DB: High-scale production apps (1M+ rows, complex transactions)

vs Traditional Databases (PostgreSQL, MySQL)

Advantages of SheetDB:

  • βœ… No Hosting - Leverage Google's infrastructure (free!)
  • βœ… Unlimited Bandwidth - Google handles all traffic
  • βœ… Built-in Admin UI - Google Sheets is your admin panel
  • βœ… Collaboration - Multiple users can edit simultaneously
  • βœ… Version History - Built-in audit trail
  • βœ… Zero Setup - Public mode works instantly
  • βœ… Familiar Interface - Everyone knows spreadsheets

When Traditional DB is Better:

  • ❌ Very large datasets (1M+ rows)
  • ❌ Complex transactions (ACID guarantees)
  • ❌ High write throughput (1000s writes/sec)
  • ❌ Advanced features (triggers, stored procedures)

vs Firebase/Supabase

Advantages of SheetDB:

  • βœ… No Vendor Lock-in - Your data lives in Google Sheets
  • βœ… Familiar Interface - Spreadsheets vs custom admin panels
  • βœ… No Learning Curve - Everyone knows how to use sheets
  • βœ… Free Tier - Google Sheets is free for most use cases
  • βœ… Offline Editing - Edit sheets offline, sync later
  • βœ… Relational Queries - Full SQL power with joins

When Firebase/Supabase is Better:

  • ❌ Real-time subscriptions (live updates)
  • ❌ Authentication/authorization (built-in)
  • ❌ File storage (built-in)
  • ❌ Serverless functions (built-in)

Use Cases

Perfect For:

  • πŸš€ Rapid Prototyping - Build MVPs in hours
  • πŸ‘₯ Internal Tools - Admin dashboards, data entry forms
  • πŸ“Š Public APIs - Serve data from published sheets
  • πŸŽ“ Learning Projects - Database concepts without infrastructure
  • 🀝 Collaborative Apps - Users edit data directly in sheets
  • πŸ“ˆ Data Analysis - SQL queries on spreadsheet data

Not Recommended For:

  • ❌ High-scale production (1M+ rows)
  • ❌ Real-time gaming (sub-100ms latency)
  • ❌ Financial transactions (ACID guarantees)
  • ❌ High write throughput (1000s writes/sec)

License

MIT

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published