A simple SQLite store for raw data.
It provides a thin REST API over different namespaces. Each namespace maps to its own SQLite database.
This project aims to be simple. Using SQLite as the store makes it easy to move files directly when needed.
A file server is embedded for that purpose, along with the option to take a snapshot of each namespace.
Future work could include a sharding strategy to split load, and an index for text data.
✨ New: if the -stream option is selected, each new entry is streamed per namespace to a Redis instance.
Intended for small objects (~1 MB).
My use case is storing crawled data (~700 KB per object), up to 500k objects per namespace.
Bigger files are discouraged: each file is loaded into memory on every request, and SQLite doesn't provide a way to stream data directly.
- A default namespace is created at startup.
- No auth; reverse-proxy auth is easy to add using nginx. In the future an auth endpoint could be included in the app.
- Every object is compressed and decompressed using zlib.
- -stream can be used to stream each new object to Redis.
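Since every object goes through zlib on the way in and out, a roundtrip with Go's standard library looks roughly like this (a minimal sketch; the helper names below are illustrative, not the project's actual code):

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
)

// compress deflates raw bytes with zlib, as done before storing a blob.
func compress(raw []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	if _, err := w.Write(raw); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decompress inflates a stored blob back into the original payload.
func decompress(blob []byte) ([]byte, error) {
	r, err := zlib.NewReader(bytes.NewReader(blob))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	packed, _ := compress([]byte("bigswag"))
	raw, _ := decompress(packed)
	fmt.Println(string(raw)) // prints the original payload: bigswag
}
```

The tradeoff is CPU for disk: crawled HTML compresses well, so this keeps the per-namespace database files considerably smaller.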
Also check the default config values:
var (
listenAddr = Env("RD_LISTEN_ADDR", ":6667")
nsDir = Env("RD_NS_DIR", "data/")
redisAddr = Env("RD_REDIS_ADDR", "localhost:6379")
redisPass = Env("RD_REDIS_PASS", "")
redisDB = Env("RD_REDIS_DB", "0")
redisNS = Env("RD_REDIS_NS", "RD")
streamNo = Env("RD_STREAM", "false")
eStreamLimit = Env("RD_STREAM_LIMIT", "1000")
)
Data Schema V1:
CREATE TABLE IF NOT EXISTS data (
data_id TEXT PRIMARY KEY,
data BLOB NOT NULL,
created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS created_ix ON data(created_at);
- GET /status
  - 200 if everything is OK
- GET /files
  - File server. Lists all the SQLite files for each namespace.
- POST /v1/namespace
  - Creates a namespace: { "name": "my_namespace" }
- GET /v1/namespace
  - Lists namespaces
- GET /v1/namespace/{namespace}/_backup
  - Takes a backup. This action is SYNC, so consider the duration of the request for big files (> 6 GB).
- GET /v1/data/{namespace}
  - Lists objects as an API; data is uncompressed and base64 encoded.
- GET /v1/data/{namespace}/_list
  - Lists only the ID and created fields
- PUT /{namespace}/{key}
  - 201 if created; anything else means failure
  - If the key already exists, the data is replaced with the newly sent payload.
- POST /{namespace}/{key}
  - 201 if created; anything else means failure
- DELETE /{namespace}/{key}
  - 200 Deleted
- GET /{namespace}/ (will be removed in the next release)
  - Lists objects as an API, base64 encoded.
  - This should be moved to the API endpoints. Filter options will be included in future versions.
Running the service:
rawdata volume -help
Usage of volume:
-listen string
Address to listen (default ":6667")
-namespace string
Namespace dir (default "data/")
-redis-ns string
Which key namespace use for redis (default "RD")
-stream
Enable stream data to redis
-stream-limit string
How many message by stream (default "1000")
rawdata volume
2023/04/05 17:48:50 new.go:57: NS Loading for default
2023/04/05 17:48:50 new.go:102: Starting from /home/nuxion/Projects/algorinfo/rawdata
2023/04/05 17:48:50 new.go:106: With stream disabled
2023/04/05 17:48:50 volume.go:100: Running web mode on: :6667
By default the default namespace is created.
Create or update an object:
curl -v -L -X PUT -d bigswag localhost:6667/default/wehave
Create a new object:
curl -v -L -X POST -d bigswag localhost:6667/default/wehave
Get an object (uncompressed, in its original format):
curl -v -L localhost:6667/default/wehave
Delete an object:
curl -v -L -X DELETE localhost:6667/default/wehave
- https://github.com/geohot/minikeyvalue: uses a Go web server as a coordinator that manages keys in a LevelDB store and redirects each request to an Nginx server used as a volume. The problem is the same as before: it is not suitable for small data in the long term, but I took from it the idea of having an easy way to restore the data if something fails.
- Some time later, I found that the previous work was based on this paper.
- Migrate to sqlc
- Queue for intensive inserts (using channels)
- Worker to read data from redis (?)
- JWT Auth
- Automatic backup to Object store
- Streaming response of a list of objects from a namespace
- Store/Bucket struct which performs all the actions related to the operations on objects
- General config SQLite store for the app?
- Optional WAL option for stores
- Locks
- Notifications through webservices (using simple pub/sub redis) per namespace
- Backup should be a goroutine: lock the namespace for writes when starting and emit a notification when ending (HTTP 423 should be returned by POST endpoints).
- Building your data lake
- Data Lake
- Data block in HDFS
- Facebook photo storage
- The anatomy of a Large-Scale web search engine
- When to use sqlite
- 35% faster than the filesystem
Redis INFO