This project is a Byte Pair Encoding (BPE) tokenizer written in Zig that tokenizes text and calculates the cost across various LLM providers.
It reads input from src/prompt.txt, performs BPE tokenization, and displays a comprehensive pricing table for popular language models.
- Pure Zig 0.15 implementation (no dependencies outside `std`)
- BPE Tokenization (see the sketch after this list):
  - Iteratively finds and merges the most frequent adjacent byte pair
  - Stops when no pair occurs more than once
  - ANSI-colored output for token visualization
- LLM Pricing Calculator:
  - Calculates prompt costs
  - Displays cost per prompt and price per million tokens
- Reads input from the `src/prompt.txt` file
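
The merge loop works exactly as the bullets above describe: count adjacent token pairs, replace the most frequent pair with a new token id, and repeat until no pair occurs more than once. Below is a minimal sketch of those two steps; the function names, the `u16` token type, and the in-place merge are illustrative assumptions, not the actual code in `src/main.zig`.

```zig
// Illustrative sketch of one BPE merge step; the real implementation in
// src/main.zig may differ in names, types, and data layout.
const Pair = struct { a: u16, b: u16, count: usize };

/// Scan adjacent token pairs and return the most frequent one, or null when
/// no pair occurs more than once (the loop's stopping condition).
fn mostFrequentPair(tokens: []const u16) ?Pair {
    var best: ?Pair = null;
    var i: usize = 0;
    while (i + 1 < tokens.len) : (i += 1) {
        var count: usize = 0;
        var j: usize = 0;
        while (j + 1 < tokens.len) : (j += 1) {
            if (tokens[j] == tokens[i] and tokens[j + 1] == tokens[i + 1]) count += 1;
        }
        if (best == null or count > best.?.count) {
            best = .{ .a = tokens[i], .b = tokens[i + 1], .count = count };
        }
    }
    if (best) |b| {
        if (b.count > 1) return b;
    }
    return null;
}

/// Replace every occurrence of the pair (a, b) with new_id, compacting the
/// slice in place and returning the new logical length.
fn mergePair(tokens: []u16, a: u16, b: u16, new_id: u16) usize {
    var write: usize = 0;
    var read: usize = 0;
    while (read < tokens.len) {
        if (read + 1 < tokens.len and tokens[read] == a and tokens[read + 1] == b) {
            tokens[write] = new_id;
            read += 2;
        } else {
            tokens[write] = tokens[read];
            read += 1;
        }
        write += 1;
    }
    return write;
}
```

A driver loop would call `mostFrequentPair`, assign the next unused token id to the winning pair, and call `mergePair`, repeating until `mostFrequentPair` returns `null`.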
Create a file `src/prompt.txt` with your text:

```
This sentence must be tokenized.
```

Run:

```sh
zig build run
```

The output shows the ANSI-colored token breakdown followed by the per-model pricing table.
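
The cost column follows directly from the token count and each model's per-million-token price. A minimal sketch of that calculation, assuming a helper like the one below (the name `promptCost` is illustrative, not taken from `src/main.zig`):

```zig
// Illustrative helper, not the project's actual code:
// cost per prompt = token_count * price_per_million / 1,000,000.
fn promptCost(token_count: usize, price_per_million: f64) f64 {
    return @as(f64, @floatFromInt(token_count)) * price_per_million / 1_000_000.0;
}
```

For example, a 1,000-token prompt priced at $0.50 per million tokens costs $0.0005.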
To add a new LLM model, edit the `models` array in `src/main.zig`:

```zig
const models = [_]Model{
    .{ .name = "Your Model Name", .price_per_million = 0.50 },
    // ... other models
};
```

Planned improvements:

- Allow reading text from a file instead of hardcoding
- Command-line arguments for file input
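
One possible shape for the planned command-line file input (purely hypothetical, not part of the current code) is to take the prompt path from the first argument and fall back to `src/prompt.txt`:

```zig
const std = @import("std");

// Hypothetical sketch of the planned command-line file input; shown only to
// illustrate the roadmap item above.
pub fn main() !void {
    const allocator = std.heap.page_allocator;

    // Read process arguments; argv[0] is the executable name.
    const args = try std.process.argsAlloc(allocator);
    defer std.process.argsFree(allocator, args);

    // Use the first argument as the prompt path, or fall back to the default.
    const path: []const u8 = if (args.len > 1) args[1] else "src/prompt.txt";
    std.debug.print("tokenizing {s}\n", .{path});
}
```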