Finish speaking → instantly translated. Built with Rust for steady, low latency. Powered by GPT-5 for context-aware accuracy.
Indonesian → 日本語 on the fly • Q&A in Deutsch • Recap in English • One click to 한국어 • Also available: العربية, Français, Nederlands, Русский, Español
🎥 Demo Videos
Now supports automatic simultaneous translation into four target languages. Perfect for multilingual conferences and hybrid events where real-time comprehension matters most.
- 🌐 Multi-language output (4 languages simultaneously)
- ⚡ Low-latency speech-to-text and translation pipeline
- 🧩 Customizable language sets (from 10+ available)
- 🔄 Stable for long live sessions
- 🖥️ Easy integration with event displays or streaming overlays
- International seminars & conferences
- Corporate training sessions
- Academic lectures & global classrooms
- Live streaming with multilingual audiences
- Religious & community events
Imagine a mixed-language room — Japan in front, Europe in the middle, the Middle East at the back — and you’re speaking Indonesian. As your first sentence ends, Japanese text instantly appears on the screen. A German engineer asks a question — you reply in your language; Deutsch captions flow without pause. The moderator wants a recap in English — done. A participant requests 한국어 — one click.
🎯 No device juggling. No awkward start/stop.
Why It Works
- 🦀 Rust keeps the audio → text → translation pipeline fast and predictable.
- 🧠 GPT-5 understands context, tone, and technical terms, producing natural translations.
- 🎤 Auto end-of-utterance detection — translations appear as soon as you finish speaking.
- True live captions — low, consistent latency from mic → screen
- Multi-language output — render one or many target languages at once
- Context & glossary-aware — supports per-session vocabulary
- Auto end-of-speech (VAD) — no manual start/stop
- Web UI — browser mic capture, real-time captions, instant language switching
- Stateless API — embeddable in meeting/presenter tools
- Production-ready — structured logs, graceful shutdown, configurable timeouts
```
[Browser Mic]
      |
      |  PCM chunks over WebSocket
      v
[Rust Server]
      ├─ VAD (end-of-utterance detection)
      ├─ ASR (speech → text) via GPT-5
      ├─ MT (text → target languages) via GPT-5
      └─ Caption bus (fan-out to connected clients)
      |
      v
[Web Clients / Screens] ←— subscribe → render captions in real time
```
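The caption bus in the diagram maps naturally onto a Tokio broadcast channel: the pipeline publishes each finished caption once, and every connected screen subscribes. Below is a minimal sketch of that fan-out with Axum WebSockets; the `AppState` struct, the `/captions` route, and the channel capacity are illustrative assumptions, not the project's actual module layout.

```rust
// Minimal sketch of the caption bus: one broadcast channel, one receiver per
// connected client. Names (AppState, the /captions route, capacity 64) are
// illustrative assumptions, not the project's actual API.
use axum::{
    extract::{
        ws::{Message, WebSocket, WebSocketUpgrade},
        State,
    },
    response::IntoResponse,
    routing::get,
    Router,
};
use tokio::sync::broadcast;

#[derive(Clone)]
struct AppState {
    captions: broadcast::Sender<String>, // JSON caption payloads
}

async fn captions_ws(ws: WebSocketUpgrade, State(state): State<AppState>) -> impl IntoResponse {
    // Each client subscribes to the bus when its WebSocket is upgraded.
    ws.on_upgrade(move |socket| forward_captions(socket, state.captions.subscribe()))
}

async fn forward_captions(mut socket: WebSocket, mut rx: broadcast::Receiver<String>) {
    // Push every caption published on the bus to this client until it disconnects.
    while let Ok(caption) = rx.recv().await {
        if socket.send(Message::Text(caption.into())).await.is_err() {
            break; // client went away
        }
    }
}

#[tokio::main]
async fn main() {
    let (captions, _) = broadcast::channel(64);
    // The VAD → ASR → MT pipeline would publish here via `captions.send(payload)`.
    let app = Router::new()
        .route("/captions", get(captions_ws))
        .with_state(AppState { captions });

    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```

Because each client owns its own receiver, a slow screen only lags itself; it never blocks the pipeline or the other subscribers.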
| Component | Technology |
|---|---|
| Language | Rust (async via Tokio) |
| Web Framework | Axum (HTTP + WebSocket) |
| Audio I/O | Web Audio API (getUserMedia → WS → server) |
| VAD | Lightweight energy-based detector (pluggable) |
| LLM | GPT-5 (ASR + translation) |
| Build/Run | Cargo / Docker |
You can swap or extend the VAD, ASR, or MT layers with other providers.
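For example, the pluggable VAD layer can be pictured as a small trait with the energy-based detector from the table behind it, roughly as sketched below; the trait name, threshold, and hangover values are assumptions for illustration, not the crate's real API.

```rust
// Sketch of a pluggable VAD layer: a small trait plus an energy-based detector.
// Trait name, threshold, and hangover values are illustrative assumptions.

/// Decision made for each incoming PCM frame.
#[derive(Debug, PartialEq)]
enum VadEvent {
    Speech,
    Silence,
    EndOfUtterance,
}

trait Vad {
    /// Feed one frame of 16-bit mono PCM samples and get a decision back.
    fn process_frame(&mut self, samples: &[i16]) -> VadEvent;
}

/// Energy-based detector: speech while RMS energy is above a threshold,
/// end-of-utterance after `hangover_frames` consecutive quiet frames.
struct EnergyVad {
    threshold: f32,
    hangover_frames: u32,
    quiet_streak: u32,
    in_speech: bool,
}

impl EnergyVad {
    fn new(threshold: f32, hangover_frames: u32) -> Self {
        Self { threshold, hangover_frames, quiet_streak: 0, in_speech: false }
    }
}

impl Vad for EnergyVad {
    fn process_frame(&mut self, samples: &[i16]) -> VadEvent {
        // Root-mean-square energy of the frame, normalized to [0, 1].
        let sum_sq: f64 = samples
            .iter()
            .map(|&s| (s as f64 / i16::MAX as f64).powi(2))
            .sum();
        let rms = (sum_sq / samples.len().max(1) as f64).sqrt() as f32;

        if rms >= self.threshold {
            self.in_speech = true;
            self.quiet_streak = 0;
            VadEvent::Speech
        } else if self.in_speech {
            self.quiet_streak += 1;
            if self.quiet_streak >= self.hangover_frames {
                self.in_speech = false;
                self.quiet_streak = 0;
                VadEvent::EndOfUtterance
            } else {
                VadEvent::Silence
            }
        } else {
            VadEvent::Silence
        }
    }
}

fn main() {
    let mut vad = EnergyVad::new(0.02, 3);
    let loud = vec![8000i16; 320];  // ~20 ms of "speech" at 16 kHz
    let quiet = vec![0i16; 320];
    assert_eq!(vad.process_frame(&loud), VadEvent::Speech);
    for _ in 0..2 {
        assert_eq!(vad.process_frame(&quiet), VadEvent::Silence);
    }
    assert_eq!(vad.process_frame(&quiet), VadEvent::EndOfUtterance);
}
```

A WebRTC-style or ML-based detector could implement the same trait without touching the ASR or MT stages.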
- 🦀 Rust (stable)
- 🔑 OpenAI API key with GPT-5 access
Create a .env file in your project root:
```
OPENAI_API_KEY=sk-...
REALTIME_MODEL=gpt-4o-realtime-preview
BASE_URL=http://localhost:8080
PORT=8080
```

Then run:

```bash
cargo run --release
```

Server will start at http://localhost:8080.
- Visit http://localhost:8080/
- Allow microphone access
- Choose target languages
- Start speaking Indonesian
- Captions appear instantly at the end of each utterance
```bash
docker build -t livetranslation:latest .

docker run -d \
  --name livetranslation \
  -p 8080:8080 \
  -e OPENAI_API_KEY=sk-yourkey \
  -e REALTIME_MODEL=gpt-4o-realtime-preview \
  livetranslation:latest
```

The server will be available at http://localhost:8080.
Create a docker-compose.yml:
version: "3.8"
services:
livetranslation:
build: .
container_name: livetranslation
ports:
- "8080:8080"
environment:
OPENAI_API_KEY: "sk-yourkey"
REALTIME_MODEL: "gpt-4o-realtime-preview"
restart: unless-stoppedThen run:
docker compose up -d- Audio Stream — Browser sends 16-bit PCM fragments via WebSocket.
- VAD — Detects end-of-speech boundaries.
- ASR — GPT-5 converts speech → text.
- MT — GPT-5 translates text into multiple target languages concurrently.
- Delivery — Each connected client (stage display, audience screen, recorder) receives caption payloads.
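As an illustration of the delivery step, a caption payload might look something like the following; the field names and the use of JSON here are assumptions, not a documented wire format.

```rust
// Illustrative shape of a caption payload as it might be fanned out to clients.
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
struct CaptionPayload {
    /// Monotonic utterance counter so clients can order and replace captions.
    utterance_id: u64,
    /// BCP-47 language tag of the translation, e.g. "ja", "de", "ko".
    lang: String,
    /// Translated caption text for this utterance.
    text: String,
    /// Source (recognized) text, useful for operators and logs.
    source_text: String,
    /// Milliseconds from end-of-utterance to caption emission.
    latency_ms: u64,
}

fn main() {
    let caption = CaptionPayload {
        utterance_id: 42,
        lang: "ja".into(),
        text: "こんにちは、皆さん".into(),
        source_text: "Halo semuanya".into(),
        latency_ms: 380,
    };
    println!("{}", serde_json::to_string_pretty(&caption).unwrap());
}
```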
- Use small audio frames and early VAD triggers for faster response.
- Back-pressure ensures smooth performance under load.
- Per-language fan-out handled concurrently for minimal lag.
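The per-language fan-out can be as simple as launching one translation future per target language and awaiting them together, so end-to-end delay tracks the slowest language rather than the sum of all of them. A rough sketch, with `translate` standing in for the real GPT-5 call:

```rust
// Sketch of concurrent per-language fan-out; `translate` is a stand-in for the
// actual MT request.
use futures::future::join_all;

async fn translate(text: &str, lang: &str) -> String {
    // Placeholder for the real MT call; returns a tagged copy for the demo.
    format!("[{lang}] {text}")
}

async fn translate_all(text: &str, langs: &[&str]) -> Vec<(String, String)> {
    // One future per target language, awaited together: total latency is roughly
    // the slowest single translation, not the sum of all of them.
    let futures = langs.iter().map(|lang| async move {
        (lang.to_string(), translate(text, lang).await)
    });
    join_all(futures).await
}

#[tokio::main]
async fn main() {
    let captions =
        translate_all("Selamat datang di konferensi ini", &["ja", "de", "en", "ko"]).await;
    for (lang, text) in captions {
        println!("{lang}: {text}");
    }
}
```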
```bash
cargo run -- --cli
```

Outputs the recognized Indonesian text and translations in the terminal.
- Each utterance triggers ASR + translation per language.
- Shorter utterances increase API calls but improve perceived latency.
- You can batch micro-utterances with a short buffer delay for cost efficiency.
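One way to batch micro-utterances, sketched below, is to hold finished segments for a short buffer window and translate them in a single request; the 300 ms window and the channel wiring are illustrative assumptions.

```rust
// Sketch of micro-utterance batching: collect segments that arrive within a short
// window, then issue one translation request for the whole batch.
use std::time::Duration;
use tokio::{sync::mpsc, time};

async fn batch_utterances(mut rx: mpsc::Receiver<String>, window: Duration) {
    while let Some(first) = rx.recv().await {
        let mut batch = vec![first];
        // Keep collecting anything that arrives within the buffer window.
        let deadline = time::sleep(window);
        tokio::pin!(deadline);
        loop {
            tokio::select! {
                _ = &mut deadline => break,
                next = rx.recv() => match next {
                    Some(text) => batch.push(text),
                    None => break,
                },
            }
        }
        // One API call for the whole batch instead of one per micro-utterance.
        println!("translating {} utterance(s): {:?}", batch.len(), batch);
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(32);
    tokio::spawn(batch_utterances(rx, Duration::from_millis(300)));

    for text in ["Halo", "semuanya,", "selamat pagi"] {
        tx.send(text.to_string()).await.unwrap();
    }
    drop(tx);
    time::sleep(Duration::from_millis(500)).await; // let the batcher flush
}
```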
- Audio processed in memory only — no persistence by default.
- Logs include only timing and size metadata unless explicitly enabled.
- Add auth for production (tokens, origin allowlists, rate limits).
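As a starting point for the token part of that checklist, a bearer-token check can be added as Axum middleware along these lines; the `CAPTION_TOKEN` variable and the route are hypothetical, and origin allowlists and rate limiting would be separate layers.

```rust
// Sketch of a bearer-token guard as Axum middleware. The CAPTION_TOKEN env var
// and the protected route are assumptions for illustration.
use axum::{
    extract::Request,
    http::{header::AUTHORIZATION, StatusCode},
    middleware::{self, Next},
    response::Response,
    routing::get,
    Router,
};

async fn require_token(req: Request, next: Next) -> Result<Response, StatusCode> {
    let expected = std::env::var("CAPTION_TOKEN").unwrap_or_default();
    let authorized = req
        .headers()
        .get(AUTHORIZATION)
        .and_then(|v| v.to_str().ok())
        .map(|v| v == format!("Bearer {expected}"))
        .unwrap_or(false);

    if authorized {
        Ok(next.run(req).await)
    } else {
        Err(StatusCode::UNAUTHORIZED)
    }
}

#[tokio::main]
async fn main() {
    let app = Router::new()
        .route("/", get(|| async { "ok" }))
        .layer(middleware::from_fn(require_token));

    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```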
- Speaker labels / diarization (meeting mode)
- Per-language screen styling & large-font stage mode
- Translation memory + domain glossary upload
- Optional TTS output per language
- Recording & export (SRT/VTT)
Contributions welcome! Please:
- Describe your use case and environment.
- Add tests for any new core logic.
- Keep latency metrics green. ✅
MIT License — see LICENSE.
- 🦀 Rust community for robust async foundations.
- 🤖 OpenAI GPT-5 for accurate, context-aware ASR and translation.
Live Translation — finish speaking, instantly translated. Rust for speed and reliability. GPT-5 for natural tone and accuracy. Perfect for conferences, onboarding, classrooms, and global teams.
Kukuh Tripamungkas Wicaksono (Kukuh TW) 📧 kukuhtw@gmail.com 📱 https://wa.me/628129893706 🔗 LinkedIn