Bio-Block: Secure Document Management System

Bio-Block is a decentralized document management system that leverages blockchain technology, IPFS (InterPlanetary File System), and vector databases to provide secure, verifiable, and privacy-preserving document storage and management for healthcare data.

Key Features

🔐 Advanced Security & Privacy

Streaming Encryption: Chunked encryption for large files (>5MB) that keeps memory use bounded, with real-time progress tracking
PHI Anonymization: Automatic anonymization of Personal Health Information in spreadsheet and image files
Blockchain Verification: Document hashes stored on Ethereum for tamper-evident verification
Decentralized Storage: IPFS-based storage with encryption and secure access controls
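The idea behind Streaming Encryption and Blockchain Verification can be illustrated with a minimal sketch that hashes a file in fixed-size chunks, so memory use stays constant regardless of file size, and reports progress after each chunk. It assumes SHA-256 as the document hash. The README does not say which hash function is written on-chain (Ethereum contracts often use Keccak-256, which Python's `hashlib.sha3_256` does not match). `file_digest_with_progress` and `verify_against_onchain` are hypothetical names; the project's real streaming logic lives in the frontend's JavaScript.

```python
import hashlib
from typing import BinaryIO, Callable, Optional

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read (assumed; not the app's actual setting)


def file_digest_with_progress(
    stream: BinaryIO,
    total_size: int,
    on_progress: Optional[Callable[[float], None]] = None,
    chunk_size: int = CHUNK_SIZE,
) -> str:
    """Hash a stream chunk by chunk so only one chunk is held in memory at a time."""
    hasher = hashlib.sha256()  # assumption: SHA-256 is the on-chain document hash
    processed = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)
        processed += len(chunk)
        if on_progress is not None and total_size > 0:
            on_progress(min(processed / total_size, 1.0))  # fraction complete
    return hasher.hexdigest()


def verify_against_onchain(stream: BinaryIO, total_size: int, onchain_hex: str) -> bool:
    """Recompute the digest and compare it with a hex hash read from the contract."""
    expected = onchain_hex.lower()
    if expected.startswith("0x"):
        expected = expected[2:]
    return file_digest_with_progress(stream, total_size) == expected
```

Because only the digest is compared, a file that passes verification is byte-for-byte identical to the one whose hash was recorded; the chain does not prevent tampering, it makes tampering detectable.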

🏥 Healthcare Data Management

Multi-format Support: Spreadsheets (.xlsx, .xls, .xlsm, .xlsb, .csv, .tsv, .ods) plus medical images (.jpg, .jpeg, .png)
Smart Anonymization: Wallet-based hashing for personal data in spreadsheets; Presidio, with an OCR+NLP fallback, for medical images
Preview System: Free 5% preview of spreadsheet data for evaluation before purchase
Metadata Collection: Comprehensive tagging with disease types, demographics, and data sources
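The README does not spell out how wallet-based hashing works. One plausible reading, sketched here purely as an assumption, is a keyed hash (HMAC-SHA256) of each personal-data cell with the uploader's wallet address as the key: the same patient name maps to the same pseudonym within one uploader's files, but to different pseudonyms across uploaders. `pseudonymize_cell`, `anonymize_rows`, and `PERSONAL_COLUMNS` are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List

# Hypothetical set of column names treated as personal data.
PERSONAL_COLUMNS = {"name", "email", "phone", "address", "dob"}


def pseudonymize_cell(value: str, wallet_address: str, length: int = 16) -> str:
    """Replace a value with a truncated HMAC-SHA256 keyed by the wallet address."""
    key = wallet_address.strip().lower().encode("utf-8")  # ignore checksum casing
    digest = hmac.new(key, value.strip().encode("utf-8"), hashlib.sha256).hexdigest()
    return "anon_" + digest[:length]


def anonymize_rows(rows: List[Dict[str, str]], wallet_address: str) -> List[Dict[str, str]]:
    """Return new row dicts with personal columns pseudonymized; inputs are not mutated."""
    return [
        {
            col: pseudonymize_cell(val, wallet_address) if col.lower() in PERSONAL_COLUMNS else val
            for col, val in row.items()
        }
        for row in rows
    ]
```

Because wallet addresses are public, the address behaves as a per-uploader salt rather than a secret key: anyone who knows it can test guesses for low-entropy values such as names or birth dates.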

🔍 Intelligent Search & Discovery

Vector Search: Natural language queries using ChromaDB for semantic document discovery
Advanced Filtering: Filter by data type, gender, source, file type, and other metadata
Combined Search: Semantic search enhanced with metadata filters for precise results
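Combined search amounts to restricting the candidates with exact metadata matches and then ranking the survivors by vector similarity. The sketch below shows that idea in plain Python over precomputed embeddings. In the real service ChromaDB computes the embeddings and performs both steps; `combined_search`, `cosine`, and the document layout are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def combined_search(
    query_vec: Sequence[float],
    docs: List[Dict],
    filters: Dict[str, str],
    n_results: int = 5,
) -> List[Tuple[str, float]]:
    """Keep docs whose metadata matches every filter, then rank by cosine similarity."""
    candidates = [
        d for d in docs
        if all(d["metadata"].get(k) == v for k, v in filters.items())
    ]
    scored = [(d["id"], cosine(query_vec, d["embedding"])) for d in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:n_results]
```

Filtering before ranking means a highly similar document is never returned if its metadata does not match, which is the behavior the Advanced Filtering bullet describes.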

💰 Marketplace & Economics

Document Marketplace: Set prices and earn from document sales
Earnings Tracking: Real-time earnings display and withdrawal functionality
Preview Downloads: Free evaluation of data quality before purchase
Wallet Integration: Seamless Ethereum wallet connectivity

Architecture

Bio-Block follows a microservices architecture with separate frontend and backend services:

Project Structure

healthy/
├── prototype/                 # React frontend application
│   ├── src/
│   │   ├── App.js            # Main application with navigation
│   │   ├── contractService.js # Smart contract interactions
│   │   ├── upload_data.js    # Document upload interface with streaming encryption
│   │   ├── search_data.js    # Document search interface with smart decryption
│   │   ├── Dashboard.js      # User dashboard with earnings and document management
│   │   ├── encryptionUtils.js # Traditional document encryption utilities
│   │   ├── utils/
│   │   │   └── streamingEncryption.js # Memory-safe streaming encryption for large files
│   │   └── DocumentStorage.sol # Smart contract source
│   └── package.json
├── python_backend/           # FastAPI service
│   ├── main.py               # ChromaDB, search endpoints, and image PHI anonymization
│   ├── requirements.txt
│   ├── vercel.json           # Vercel deployment config
│   ├── tests/                # Python API test suite
│   │   ├── test_api.py       # Comprehensive API tests with unittest
│   │   └── test.jpg          # Test image for anonymization tests
│   └── chroma_db/            # Local ChromaDB storage
├── javascript_backend/        # Express.js API server
│   ├── controllers/          # Business logic controllers
│   │   ├── anonymizeController.js # Excel file anonymization logic
│   │   ├── ipfsController.js      # IPFS interaction logic
│   │   └── healthController.js    # Health check logic
│   ├── routes/              # API route definitions
│   │   ├── anonymize.js     # Excel anonymization routes
│   │   ├── ipfs.js          # IPFS routes
│   │   └── health.js        # Health check routes
│   ├── tests/               # JavaScript API test suite
│   │   ├── api.test.js      # Mocha/Chai/SuperTest API tests
│   │   └── test.xlsx        # Test Excel file with PHI data
│   ├── server.js            # Main server file
│   ├── vercel.json          # Vercel deployment config
│   └── package.json
└── README.md

Frontend (React)

Modern UI built with React.js and Tailwind CSS
Interactive progress tracking for uploads and encryption
Wallet integration for Ethereum connectivity
Document marketplace and earnings dashboard

JavaScript Backend (Express.js - Port 3001)

Spreadsheet processing and PHI anonymization
IPFS file upload handling
Preview generation for spreadsheet files
RESTful API with MVC architecture

Python Backend (FastAPI - Port 3002)

Vector database operations using ChromaDB
Image PHI anonymization using Presidio/OCR
Semantic search and document filtering
ML-based PHI detection using Presidio and spaCy NLP models

Smart Contracts (Solidity)

Document verification on Ethereum blockchain
Marketplace functionality for document sales
Earnings tracking and withdrawal system
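DocumentStorage.sol is not reproduced in this README, so its exact logic is unknown. A common way to implement earnings tracking and withdrawal is the pull-payment pattern: each purchase credits the seller's balance, and the seller later withdraws, with the balance zeroed before funds move (checks-effects-interactions). The Python model below only illustrates that bookkeeping; it is not Solidity, and `Marketplace`, its methods, and the wei amounts are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class Marketplace:
    """Toy model of pull-payment bookkeeping; amounts are integers in wei."""
    prices: Dict[str, int] = field(default_factory=dict)       # doc id -> price
    owners: Dict[str, str] = field(default_factory=dict)       # doc id -> seller address
    buyers: Dict[str, Set[str]] = field(default_factory=dict)  # doc id -> buyer addresses
    earnings: Dict[str, int] = field(default_factory=dict)     # seller -> withdrawable balance

    def list_document(self, doc_id: str, seller: str, price_wei: int) -> None:
        if price_wei < 0:
            raise ValueError("price must be non-negative")
        self.prices[doc_id] = price_wei
        self.owners[doc_id] = seller
        self.buyers.setdefault(doc_id, set())

    def purchase(self, doc_id: str, buyer: str, paid_wei: int) -> None:
        # Checks: the document is listed and the payment matches its price.
        if doc_id not in self.prices:
            raise KeyError(doc_id)
        if paid_wei != self.prices[doc_id]:
            raise ValueError("incorrect payment")
        # Effects: grant access and credit the seller; no transfer happens here.
        self.buyers[doc_id].add(buyer)
        seller = self.owners[doc_id]
        self.earnings[seller] = self.earnings.get(seller, 0) + paid_wei

    def withdraw(self, seller: str, send: Callable[[str, int], None]) -> int:
        amount = self.earnings.get(seller, 0)
        if amount == 0:
            raise ValueError("nothing to withdraw")
        self.earnings[seller] = 0  # effect before interaction
        try:
            send(seller, amount)   # interaction: the actual value transfer
        except Exception:
            self.earnings[seller] = amount  # mimic a Solidity revert restoring state
            raise
        return amount
```

Zeroing the balance before the transfer is what prevents a re-entrant caller from withdrawing the same earnings twice.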

Quick Start

Prerequisites

Node.js (v14+)
Python (v3.8+)
Tesseract OCR (used by the image anonymization fallback)
MetaMask or another Ethereum wallet
Git

Installation

Clone and setup

git clone https://github.com/yourusername/bio-block.git
cd bio-block

Install dependencies

# Python backend
cd python_backend
pip install -r requirements.txt
python -m spacy download en_core_web_lg

# JavaScript backend
cd ../javascript_backend
npm install

# Frontend
cd ../prototype
npm install

Environment configuration

Create .env in prototype/:

REACT_APP_PINATA_JWT=your_pinata_jwt_key
REACT_APP_ENCRYPTION_KEY=your_32_byte_encryption_key
REACT_APP_PYTHON_BACKEND_URL=http://localhost:3002
REACT_APP_JS_BACKEND_URL=http://localhost:3001
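REACT_APP_ENCRYPTION_KEY expects a 32-byte key. A minimal way to generate one with Python's standard library is sketched below. It assumes the frontend reads the key as a 64-character hex string, which the README does not state, so check encryptionUtils.js for the expected encoding first. `render_env` is a hypothetical helper. Note that Create React App embeds every REACT_APP_* variable into the built JavaScript bundle, so this value is readable by anyone who loads the app.

```python
import secrets
from typing import Optional


def render_env(pinata_jwt: str, encryption_key: Optional[str] = None,
               python_url: str = "http://localhost:3002",
               js_url: str = "http://localhost:3001") -> str:
    """Build the contents of prototype/.env.

    Assumption: the key is hex-encoded (32 random bytes -> 64 hex characters).
    """
    key = encryption_key or secrets.token_hex(32)  # cryptographically secure RNG
    lines = [
        f"REACT_APP_PINATA_JWT={pinata_jwt}",
        f"REACT_APP_ENCRYPTION_KEY={key}",
        f"REACT_APP_PYTHON_BACKEND_URL={python_url}",
        f"REACT_APP_JS_BACKEND_URL={js_url}",
    ]
    return "\n".join(lines) + "\n"
```

For example, from the repository root: `pathlib.Path("prototype/.env").write_text(render_env("your_pinata_jwt_key"))`.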

Run the application

# Terminal 1: Python backend
cd python_backend && uvicorn main:app --reload --port 3002

# Terminal 2: JavaScript backend  
cd javascript_backend && node server.js

# Terminal 3: Frontend
cd prototype && npm start

Access the application at http://localhost:3000

API Endpoints

JavaScript Backend (Express.js)

🔧 Local URL: http://localhost:3001

GET / - Root endpoint with API information
GET /api/health - Health check endpoint to verify server status
POST /api/anonymize - Anonymize PHI (Personal Health Information) in spreadsheet files with optional preview generation
- Input: Spreadsheet file (.xlsx, .xls, .csv, .ods, .tsv, .xlsm, .xlsb) via multipart form data
- Optional: Wallet address for personal data anonymization
- Optional: generatePreview=true parameter to create 5% sample preview
- Output: The full anonymized spreadsheet and, if requested, a preview file containing the first 5% of rows (minimum 5, maximum 50)
POST /api/ipfs/upload - Upload a file to IPFS
- Input: file via multipart form data
- Output: IPFS hash of the uploaded file
The Express API follows an MVC layout: routes/ defines the endpoints and controllers/ holds the business logic.
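The preview size rule for /api/anonymize (5% of rows, at least 5 and at most 50) can be expressed as a small function. The README does not say how fractional counts are rounded or what happens when a sheet has fewer than 5 rows; this sketch assumes rounding up and capping at the sheet's own row count. `preview_row_count` and `make_preview` are hypothetical names.

```python
def preview_row_count(total_rows: int, percent: int = 5,
                      minimum: int = 5, maximum: int = 50) -> int:
    """Number of leading data rows to copy into the preview file."""
    if total_rows <= 0:
        return 0
    target = (total_rows * percent + 99) // 100  # percent of rows, rounded up (assumed)
    target = max(minimum, min(maximum, target))  # clamp to the documented 5..50 range
    return min(target, total_rows)               # assumed: never exceed the sheet itself


def make_preview(rows: list) -> list:
    """Return the leading rows; header handling is left to the caller."""
    return rows[:preview_row_count(len(rows))]
```

Integer ceiling division avoids floating-point surprises such as `0.05 * n` landing just above a whole number.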

Python Backend (FastAPI)

🔧 Local URL: http://localhost:3002

GET / - Health check and API information
POST /store - Store document summaries and metadata in ChromaDB
POST /search - Search documents using natural language queries
POST /filter - Filter documents by metadata criteria (data type, gender, data source, file type)
POST /search_with_filter - Combined semantic search with metadata filtering
POST /anonymize_image - Anonymize PHI in medical images using Presidio ML models with OCR+spaCy fallback
- Input: Image file (.jpg, .jpeg, .png) via multipart form data
- Output: The image with detected PHI redacted
- Method: Presidio (primary), Tesseract OCR + spaCy NLP (fallback)
Search endpoints return similarity scores, document metadata, and summaries.
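The /filter and /search_with_filter endpoints accept several metadata filters at once (for example dataType and dataSource). If the backend forwards them to ChromaDB, recent ChromaDB versions expect multiple conditions in a `where` clause to be combined explicitly with `$and` rather than listed as sibling keys. The helper below is a hypothetical sketch of that translation; the README does not show how main.py builds its queries.

```python
from typing import Any, Dict, Optional


def to_chroma_where(filters: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Translate a flat {field: value} dict into a ChromaDB-style where clause.

    - no usable filters -> None (no filtering)
    - one condition     -> {"field": value}
    - several           -> {"$and": [{"field": value}, ...]}
    """
    # Skip empty values so unset form fields do not filter everything out.
    conditions = [{k: v} for k, v in filters.items() if v not in (None, "")]
    if not conditions:
        return None
    if len(conditions) == 1:
        return conditions[0]
    return {"$and": conditions}
```

The result can then be passed as `collection.query(query_texts=[query], n_results=n_results, where=to_chroma_where(filters))`.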

Example API Usage

# Health check - JavaScript backend
curl http://localhost:3001/api/health

# Health check - Python backend
curl http://localhost:3002/

# Search documents (POST request)
curl -X POST http://localhost:3002/search \
  -H "Content-Type: application/json" \
  -d '{"query": "patient information", "k": 5}'

# Filter documents by metadata
curl -X POST http://localhost:3002/filter \
  -H "Content-Type: application/json" \
  -d '{"filters": {"dataType": "Personal", "gender": "Male"}, "n_results": 10}'

# Combined search with filters
curl -X POST http://localhost:3002/search_with_filter \
  -H "Content-Type: application/json" \
  -d '{"query": "diabetes research", "filters": {"dataType": "Institution", "dataSource": "Hospital"}, "n_results": 5}'

# Anonymize medical image using Presidio ML models
curl -X POST http://localhost:3002/anonymize_image \
  -F "file=@medical_scan.jpg"

# Test spreadsheet anonymization (JavaScript backend)
curl -X POST http://localhost:3001/api/anonymize \
  -F "file=@sample_data.xlsx" \
  -F "generatePreview=true"

# Test with different spreadsheet formats
curl -X POST http://localhost:3001/api/anonymize \
  -F "file=@test_sample.csv" \
  -F "generatePreview=true"

curl -X POST http://localhost:3001/api/anonymize \
  -F "file=@test_sample.tsv" \
  -F "generatePreview=true"
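The JSON requests in these examples can also be sent from Python without third-party packages. This sketch builds them with `urllib.request`; `build_json_post` and `post_json` are hypothetical helpers, and the payload fields simply mirror the curl examples.

```python
import json
import urllib.request
from typing import Any, Dict

PYTHON_BACKEND = "http://localhost:3002"


def build_json_post(path: str, payload: Dict[str, Any],
                    base_url: str = PYTHON_BACKEND) -> urllib.request.Request:
    """Create a POST request with a JSON body, equivalent to curl -H ... -d ..."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: Dict[str, Any], timeout: float = 30.0) -> Any:
    """Send the request and decode the JSON response (the backend must be running)."""
    with urllib.request.urlopen(build_json_post(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example: `post_json("/search_with_filter", {"query": "diabetes research", "filters": {"dataType": "Institution", "dataSource": "Hospital"}, "n_results": 5})`.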

Testing

Bio-Block includes API test suites for both backend services:

Running Tests

# Python backend tests (6 tests)
cd python_backend && python tests/test_api.py

# JavaScript backend tests (3 tests)
cd javascript_backend && npm test

Test Coverage

Python Backend: Store, search, filter, anonymize image endpoints
JavaScript Backend: Health check, Excel anonymization, IPFS upload
Automated Test Data: Dynamic generation of test files with sample data
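Dynamic generation of test files can be illustrated with a standard-library sketch that writes a small CSV of clearly synthetic, PHI-like values for exercising the anonymizer. The column names and values are hypothetical and are not taken from the project's suites, which ship test.xlsx and test.jpg.

```python
import csv
import io
import random


def make_sample_csv(n_rows: int = 20, seed: int = 0) -> str:
    """Return CSV text with synthetic rows for exercising the anonymizer."""
    rng = random.Random(seed)  # seeded so generated files are reproducible
    first = ["Alice", "Bob", "Carol", "Dan"]
    last = ["Smith", "Jones", "Lee", "Patel"]
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["name", "email", "phone", "age", "diagnosis"])
    for i in range(n_rows):
        writer.writerow([
            f"{rng.choice(first)} {rng.choice(last)}",
            f"patient{i}@example.com",              # reserved example domain
            f"555-01{rng.randint(0, 99):02d}",      # 555-01xx numbers are fictional
            rng.randint(18, 90),
            rng.choice(["Type 2 diabetes", "Hypertension", "Asthma"]),
        ])
    return buf.getvalue()
```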

Contributing

We welcome contributions! Here's how to get started:

Development Setup

Fork the repository
Follow the Quick Start guide
Create a feature branch: git checkout -b feature/your-feature
Make your changes and test thoroughly
Submit a pull request

Code Style

Follow existing code patterns
Add tests for new features
Update documentation as needed
Ensure all tests pass before submitting

Project Structure

healthy/
├── prototype/                 # React frontend
├── python_backend/           # FastAPI service
├── javascript_backend/       # Express.js API
└── README.md                 # This file

For detailed setup instructions, see the Quick Start section above.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

Built with these amazing technologies:

React - Frontend framework
FastAPI - Python backend
Express.js - JavaScript backend
ChromaDB - Vector database
IPFS - Decentralized storage
Ethereum - Blockchain platform

Bio-Block - Secure, decentralized healthcare document management for the Web3 era.
