# 🔍 CryptoLens Certik Scraper

*Advanced cryptocurrency security data collection platform powered by CertikSkynet*


## 🚀 Overview

CryptoLens is a comprehensive cryptocurrency security intelligence platform that automatically scrapes, analyzes, and stores security metrics from CertikSkynet for the top 1000 cryptocurrencies. Built with enterprise-grade reliability and automated scheduling.

### 🎯 Why CryptoLens?

In the rapidly evolving crypto landscape, security intelligence is crucial. CryptoLens bridges the gap between raw security data and actionable insights by providing:

- 🔄 Real-time Security Monitoring for top 1000 cryptocurrencies
- 📊 Automated Data Collection with 85-90% success rate
- 🛡️ Enterprise-grade Reliability with advanced error handling
- ⚡ Optimized Performance processing 1000+ coins in 24-27 hours

## ✨ Key Features

- 🛡️ Security Scoring - Comprehensive security metrics and ratings
- 📊 Community Analytics - Twitter engagement and sentiment analysis
- 💰 Financial Data - Market cap, volume, and price tracking
- 🔄 Automated Scheduling - Smart cron-based updates every 48 hours
- 🎯 Intelligent Targeting - Only scrapes expired data to optimize resources
- 🔒 Overlap Protection - Prevents concurrent scraping conflicts
- 📈 Scalable Architecture - Handles 1000+ coins efficiently
- 🛠️ Error Recovery - Advanced retry mechanisms and fault tolerance
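
The error-recovery feature above is the kind of logic a retry-with-backoff wrapper captures. A minimal sketch — the `withRetries` helper and its option names are illustrative, not the repo's actual API:

```javascript
// Hypothetical retry wrapper: re-run a flaky async task with exponential
// backoff between attempts, rethrowing the last error if all attempts fail.
async function withRetries(task, { maxRetries = 3, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) {
        // Exponential backoff: baseDelayMs, 2x, 4x, ...
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
      }
    }
  }
  throw lastError;
}
```

A scrape call would then be wrapped as `withRetries(() => scrapeCoin(coin), { maxRetries: 3 })`, matching the per-coin retry counter in the database schema.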

## 📋 Prerequisites

- Node.js 18+
- pnpm package manager
- Supabase account and database
- 4GB+ RAM (for browser automation)

## ⚡ Quick Start

### 1. Clone & Install

```bash
git clone https://github.com/yourusername/cryptolens-certik-scraper.git
cd cryptolens-certik-scraper
pnpm install
```

### 2. Environment Setup

```bash
# Copy environment template
cp .env.example .env
```

Add your credentials to `.env`:

```bash
SUPABASE_URL=your_supabase_url
SUPABASE_ANON_KEY=your_supabase_anon_key
```
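
Both variables must be present before the scraper starts. A small validation helper along these lines can fail fast on a misconfigured environment — the `loadSupabaseConfig` name is hypothetical, not taken from the repo:

```javascript
// Hypothetical startup check: verify the Supabase credentials exist
// before launching any browser sessions.
function loadSupabaseConfig(env = process.env) {
  const required = ["SUPABASE_URL", "SUPABASE_ANON_KEY"];
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return { url: env.SUPABASE_URL, anonKey: env.SUPABASE_ANON_KEY };
}
```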

### 3. Database Initialization

```bash
# Upload top 1000 coins from CoinGecko
pnpm run upload-coins
```

### 4. Start Scraping

```bash
# One-time manual scrape
pnpm run scrape-certik

# Or start automated scheduler
pnpm run start-cron
```

## 🎮 Available Scripts

| Command | Description | Use Case |
| --- | --- | --- |
| `pnpm run upload-coins` | 📥 Fetch & update top 1000 coins from CoinGecko | Initial setup, monthly updates |
| `pnpm run scrape-certik` | 🔍 Manual scraping session | Testing, immediate data needs |
| `pnpm run start-cron` | ⏰ Start automated scheduler | Production deployment |

๐Ÿ—๏ธ Architecture

### Data Flow

```
CoinGecko API → Supabase DB → CertikSkynet Scraper → Enriched Database
     ↓              ↓                ↓                      ↓
  Top 1000      Coin List      Security Data          Complete Dataset
```

### Core Components

- 🕷️ Scraper Engine (`Scraper.js`) - Puppeteer-based web scraping
- 📊 Database Layer (`certikScraperSupabase.js`) - Supabase integration
- ⏰ Scheduler (`cronScraper.js`) - Automated task management
- 🪙 Coin Management (`uploadCoinsToSupabase.js`) - CoinGecko sync

## 📊 Database Schema

```
Table: certik_coins
├── coin_gecko_id (TEXT, UNIQUE) - CoinGecko identifier
├── symbol (TEXT) - Cryptocurrency symbol
├── name (TEXT) - Full coin name
├── market_cap_rank (INTEGER) - Market ranking
├── certik_data (JSONB) - Complete security metrics
├── certik_last_updated (TIMESTAMPTZ) - Last scrape time
├── certik_next_update (TIMESTAMPTZ) - Next scheduled update
├── certik_scrape_attempts (INTEGER) - Retry counter
└── certik_last_error (TEXT) - Error logging
```
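
The `certik_next_update` column drives the "only scrape expired data" behavior. A sketch of how the expiry check and next-update timestamp might be computed — helper names are assumptions, not the repo's code:

```javascript
// Hypothetical expiry check: a coin is due when it has never been scraped
// or its scheduled next update time has passed.
function isDueForScrape(coin, now = new Date()) {
  if (!coin.certik_next_update) return true;
  return new Date(coin.certik_next_update) <= now;
}

// Compute the next scheduled update, matching the 48-hour default interval.
function nextUpdateAfter(scrapedAt, intervalHours = 48) {
  return new Date(scrapedAt.getTime() + intervalHours * 60 * 60 * 1000);
}
```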

## 🔧 Configuration

### Scraping Parameters

```javascript
const scraper = new CertikScraperSupabase({
  batchSize: 3, // Parallel scraping limit
  maxRetries: 3, // Retry attempts per coin
  updateInterval: 48, // Hours between updates
});
```
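
A `batchSize` of 3 implies coins are processed in small parallel groups rather than one long concurrent burst. One plausible shape for that loop — the chunking helpers are illustrative, not the repo's code:

```javascript
// Split a list into fixed-size batches.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Scrape each batch in parallel, but run batches sequentially so at most
// `batchSize` browser sessions are open at once.
async function processInBatches(coins, scrapeOne, batchSize = 3) {
  const results = [];
  for (const batch of chunk(coins, batchSize)) {
    results.push(...(await Promise.all(batch.map(scrapeOne))));
  }
  return results;
}
```

This keeps memory bounded (the metrics below cite ~500MB-1GB per batch of 3) while still overlapping the slow page loads within each batch.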

### Scheduling Options

```javascript
// Check every 12 hours for expired coins
cron.schedule("0 */12 * * *", scraperFunction);
```
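
Because a full pass can take longer than the 12-hour check interval, the overlap protection mentioned in the features matters here. A minimal sketch of such a guard — names are illustrative, not the repo's implementation:

```javascript
// Hypothetical overlap guard: skip a scheduled run while the previous
// run is still in progress, instead of starting a second concurrent pass.
function withOverlapGuard(task) {
  let running = false;
  return async function guarded() {
    if (running) {
      console.log("Previous scrape still running; skipping this cycle");
      return "skipped";
    }
    running = true;
    try {
      return await task();
    } finally {
      running = false;
    }
  };
}
```

The scheduled function would then be registered as `cron.schedule("0 */12 * * *", withOverlapGuard(scraperFunction))`.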

## 📈 Performance Metrics

| Metric | Value | Notes |
| --- | --- | --- |
| Scraping Speed | ~80-90 seconds/coin | Including retries & waits |
| Success Rate | 85-90% | Varies by coin availability |
| Memory Usage | ~500MB-1GB | Per batch of 3 coins |
| Full Cycle Time | ~24-27 hours | For 1000 coins |
| Data Freshness | 48 hours | Configurable interval |
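
As a sanity check, the cycle time follows directly from the per-coin speed: at 80-90 seconds per coin, 1000 coins take roughly 22-25 hours of scraping, and retries push the observed total toward the quoted 24-27 hours.

```javascript
// Back-of-envelope check: per-coin seconds → full-cycle hours for 1000 coins.
const coins = 1000;
const hoursAt = (secondsPerCoin) => (coins * secondsPerCoin) / 3600;

console.log(hoursAt(80).toFixed(1)); // ~22.2 hours at the fast end
console.log(hoursAt(90).toFixed(1)); // 25.0 hours at the slow end
```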

## 🛡️ Production Deployment

### Using PM2 (Recommended)

```bash
# Install PM2 globally
npm install -g pm2

# Start scheduler
pm2 start cronScraper.js --name "certik-cron"

# Enable auto-restart on boot
pm2 startup
pm2 save

# Monitor
pm2 status
pm2 logs certik-cron
```

### Docker Deployment

```dockerfile
FROM node:18-alpine
WORKDIR /app
COPY package.json pnpm-lock.yaml ./
RUN npm install -g pnpm && pnpm install
COPY . .
CMD ["pnpm", "run", "start-cron"]
```

๐Ÿ” Monitoring & Debugging

Log Analysis

# Real-time monitoring
pm2 logs certik-cron --lines 100

# Check scraping status
pm2 monit

### Common Issues & Solutions

| Issue | Cause | Solution |
| --- | --- | --- |
| Timeout errors | Slow page loading | Increase timeout values |
| Rate limiting | Too many requests | Reduce batch size |
| Memory leaks | Long-running sessions | Restart the PM2 process |
| Stale data | Missing updates | Check the cron schedule |

## 📊 Data Output Example

```json
{
  "project": "Bitcoin",
  "securityScores": {
    "averageScore": "97.53",
    "additionalMetrics": [
      { "label": "Security Rank", "value": "1" },
      { "label": "Community Trust", "value": "High" }
    ]
  },
  "communityEngagement": [
    { "label": "Twitter Followers (24h)", "value": "1.2M" },
    { "label": "Twitter Activity Indicator", "value": "High" }
  ],
  "financialData": {
    "metrics": [
      { "label": "Market Cap", "value": "$1.2T" },
      { "label": "Volume (24h)", "value": "$15.3B" }
    ],
    "dailyInflows": [{ "label": "Net Inflow", "value": "+$127M" }]
  }
}
```
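
Note that numeric fields in this payload are strings (e.g. `"97.53"`), so consumers likely parse them before use. A defensive sketch — `parseAverageScore` is a hypothetical helper, not from the repo:

```javascript
// Extract the average security score as a number, returning null when the
// field is missing or not numeric (e.g. a failed scrape).
function parseAverageScore(payload) {
  const raw = payload?.securityScores?.averageScore;
  const score = Number.parseFloat(raw);
  return Number.isFinite(score) ? score : null;
}
```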

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

### Development Guidelines

- Follow the ESLint configuration
- Add tests for new features
- Update documentation
- Use conventional commit messages

## 📄 License

This project is licensed under the ISC License - see the LICENSE file for details.

## 🙏 Acknowledgments

- CertikSkynet - Security data provider
- CoinGecko - Cryptocurrency market data
- Supabase - Database and backend services
- Puppeteer - Web scraping framework

## 🚨 Important Notes

### ⚠️ Rate Limiting & Ethics

- Respect CertikSkynet's terms of service
- Use reasonable delays between requests
- Don't overload their servers - the default settings are optimized
- For commercial use, consider reaching out to Certik directly

### 🔒 Security Considerations

- Keep your `.env` file secure and never commit it
- Use strong Supabase credentials
- Monitor your scraping logs for unusual activity
- Regular database backups are recommended

## 📞 Support


โญ Star this repo if it helped you! โญ

Made with โค๏ธ for the crypto community
