kukuhtw/livetranslation_rust


🎙️ Live Translation (Rust + GPT OpenAI)

Finish speaking → instantly translated. Built with Rust for steady, low latency. Powered by GPT-5 for context-aware accuracy.

Indonesian → 日本語 on the fly • Q&A in Deutsch • Recap in English • One click to 한국어 • Also available: العربية, Français, Nederlands, Русский, Español


🚀 Latest Update — October 14, 2025

Now translates automatically into four languages at once, which suits multilingual conferences and hybrid events where real-time comprehension matters most.


⚙️ Key Features

  • 🌐 Multi-language output (4 languages simultaneously)
  • ⚡ Low-latency speech-to-text and translation pipeline
  • 🧩 Customizable language sets (from 10+ available)
  • 🔄 Stable for long live sessions
  • 🖥️ Easy integration with event displays or streaming overlays

🏛️ Ideal Use Cases

  • International seminars & conferences
  • Corporate training sessions
  • Academic lectures & global classrooms
  • Live streaming with multilingual audiences
  • Religious & community events

✨ What This Project Does

Imagine a mixed-language room — Japan in front, Europe in the middle, the Middle East at the back — and you’re speaking Indonesian. As your first sentence ends, Japanese text instantly appears on the screen. A German engineer asks a question — you reply in your language; Deutsch captions flow without pause. The moderator wants a recap in English — done. A participant requests 한국어 — one click.

🎯 No device juggling. No awkward start/stop.

Why It Works

  • 🦀 Rust keeps the audio → text → translation pipeline fast and predictable.
  • 🧠 GPT-5 understands context, tone, and technical terms, producing natural translations.
  • 🎤 Auto end-of-utterance detection — translations appear as soon as you finish speaking.

🔑 Core Features

  • True live captions — low, consistent latency from mic → screen
  • Multi-language output — render one or many target languages at once
  • Context & glossary-aware — supports per-session vocabulary
  • Auto end-of-speech (VAD) — no manual start/stop
  • Web UI — browser mic capture, real-time captions, instant language switching
  • Stateless API — embeddable in meeting/presenter tools
  • Production-ready — structured logs, graceful shutdown, configurable timeouts
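The "Context & glossary-aware" feature can be pictured as injecting session terms into the translation instructions. The Rust sketch below is an illustration under assumptions, not the project's code: `Glossary`, `build_translation_prompt`, and the prompt wording are hypothetical. Only glossary terms that actually occur in the utterance are included, which keeps each request short.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-session glossary: source term -> preferred rendering
/// for each target language code (e.g. "ja", "de").
#[derive(Default)]
pub struct Glossary {
    entries: BTreeMap<String, BTreeMap<String, String>>,
}

impl Glossary {
    pub fn add(&mut self, term: &str, lang: &str, rendering: &str) {
        self.entries
            .entry(term.to_string())
            .or_default()
            .insert(lang.to_string(), rendering.to_string());
    }

    /// Returns only entries whose source term occurs in `text` (case-insensitive)
    /// and that have a rendering for `lang`.
    fn relevant(&self, text: &str, lang: &str) -> Vec<(&str, &str)> {
        let lower = text.to_lowercase();
        self.entries
            .iter()
            .filter(|(term, _)| lower.contains(&term.to_lowercase()))
            .filter_map(|(term, by_lang)| by_lang.get(lang).map(|r| (term.as_str(), r.as_str())))
            .collect()
    }
}

/// Builds the instruction text for one target language (sketch only;
/// the wording is an assumption, not the project's prompt).
pub fn build_translation_prompt(glossary: &Glossary, source: &str, lang: &str) -> String {
    let mut prompt = format!(
        "Translate the following Indonesian utterance into {lang}. \
         Keep the speaker's tone and technical terms.\n"
    );
    let hits = glossary.relevant(source, lang);
    if !hits.is_empty() {
        prompt.push_str("Use these fixed renderings:\n");
        for (term, rendering) in hits {
            prompt.push_str(&format!("- {term} => {rendering}\n"));
        }
    }
    prompt.push_str("Utterance: ");
    prompt.push_str(source);
    prompt
}
```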

🧱 Architecture Overview

```text
[Browser Mic]
   |
   |  PCM chunks over WebSocket
   v
[Rust Server]
  ├─ VAD (end-of-utterance detection)
  ├─ ASR (speech → text) via GPT-5
  ├─ MT  (text → target languages) via GPT-5
  └─ Caption bus (fan-out to connected clients)
   |
   v
[Web Clients / Screens]  ←— subscribe → render captions in real time
```
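The caption bus at the bottom of the diagram is a one-to-many fan-out. Below is a minimal std-only sketch, assuming each connected screen gets its own bounded queue; `CaptionBus`, `Caption`, and the drop-when-full policy are hypothetical, not taken from the project. A slow client misses captions instead of stalling the pipeline for everyone, and disconnected clients are pruned on the next publish.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// One caption for one target language (hypothetical payload shape).
#[derive(Clone, Debug, PartialEq)]
pub struct Caption {
    pub lang: String,
    pub text: String,
}

/// Fan-out bus: every subscriber gets its own bounded queue.
pub struct CaptionBus {
    subscribers: Vec<SyncSender<Caption>>,
    capacity: usize,
}

impl CaptionBus {
    /// `capacity` should be at least 1: `sync_channel(0)` is a rendezvous
    /// channel, where `try_send` only succeeds if a receiver is already waiting.
    pub fn new(capacity: usize) -> Self {
        Self { subscribers: Vec::new(), capacity }
    }

    /// Registers a new screen/client and returns its receiving end.
    pub fn subscribe(&mut self) -> Receiver<Caption> {
        let (tx, rx) = sync_channel(self.capacity);
        self.subscribers.push(tx);
        rx
    }

    /// Delivers a caption to every subscriber without blocking.
    /// Full queue (slow client): the caption is skipped for that client.
    /// Disconnected client: it is removed. Returns how many clients got it.
    pub fn publish(&mut self, caption: &Caption) -> usize {
        let mut delivered = 0;
        self.subscribers.retain(|tx| match tx.try_send(caption.clone()) {
            Ok(()) => {
                delivered += 1;
                true
            }
            Err(TrySendError::Full(_)) => true,
            Err(TrySendError::Disconnected(_)) => false,
        });
        delivered
    }
}
```

In an async Tokio server the same role could be played by a bounded `tokio::sync::mpsc` channel per client, or by `tokio::sync::broadcast`, whose lagging receivers are told how many messages they missed.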

🛠 Tech Stack

| Component | Technology |
|---|---|
| Language | Rust (async via Tokio) |
| Web Framework | Axum (HTTP + WebSocket) |
| Audio I/O | Web Audio API (getUserMedia → WS → server) |
| VAD | Lightweight energy-based detector (pluggable) |
| LLM | GPT-5 (ASR + translation) |
| Build/Run | Cargo / Docker |

You can swap or extend the VAD, ASR, or MT layers with other providers.
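The table describes the VAD as a lightweight energy-based detector that can be swapped out. One way to express that is sketched here with hypothetical names (`Vad`, `EnergyVad`) and illustrative thresholds rather than the project's actual values: a trait serves as the plug-in point, and the implementation treats a frame as speech when its RMS level crosses a threshold, declaring end-of-utterance after a run of quiet frames (the "hangover").

```rust
/// Hypothetical plug-in point: anything that watches 16-bit PCM frames
/// and reports when an utterance has ended.
pub trait Vad {
    /// Feeds one frame; returns true when this frame completes an utterance.
    fn push_frame(&mut self, frame: &[i16]) -> bool;
}

/// Energy-based detector: a frame is "speech" when its RMS is at or above
/// `threshold`; an utterance ends after `hangover_frames` quiet frames
/// that follow at least one speech frame.
pub struct EnergyVad {
    threshold: f32,
    hangover_frames: usize,
    in_speech: bool,
    silent_run: usize,
}

impl EnergyVad {
    pub fn new(threshold: f32, hangover_frames: usize) -> Self {
        Self { threshold, hangover_frames, in_speech: false, silent_run: 0 }
    }
}

/// Root-mean-square amplitude of a frame, normalised to 0.0..=1.0.
fn rms(frame: &[i16]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = frame
        .iter()
        .map(|&s| {
            let x = s as f64 / 32768.0;
            x * x
        })
        .sum();
    (sum_sq / frame.len() as f64).sqrt() as f32
}

impl Vad for EnergyVad {
    fn push_frame(&mut self, frame: &[i16]) -> bool {
        if rms(frame) >= self.threshold {
            // Speech: (re)start the utterance and reset the silence counter.
            self.in_speech = true;
            self.silent_run = 0;
            false
        } else if self.in_speech {
            self.silent_run += 1;
            if self.silent_run >= self.hangover_frames {
                // End of utterance: reset so the next one starts fresh.
                self.in_speech = false;
                self.silent_run = 0;
                true
            } else {
                false
            }
        } else {
            // Silence before any speech: nothing to report.
            false
        }
    }
}
```

Assuming 16 kHz audio, a 320-sample frame is 20 ms, so `EnergyVad::new(0.02, 25)` would end an utterance after about 500 ms of silence. Shorter hangovers make captions appear sooner but risk splitting a sentence at a natural pause.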


🚀 Quick Start

Requirements

  • 🦀 Rust (stable)
  • 🔑 OpenAI API key with GPT-5 access

1️⃣ Configure Environment

Create a .env file in your project root:

```bash
OPENAI_API_KEY=sk-...
REALTIME_MODEL=gpt-4o-realtime-preview
BASE_URL=http://localhost:8080
PORT=8080
```
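A sketch of reading these settings at startup, assuming the variables are already present in the process environment (this code does not parse the `.env` file itself). The `Config` type, the error enum, and the fallback values for `PORT`, `REALTIME_MODEL`, and `BASE_URL` are illustrative assumptions; only `OPENAI_API_KEY` is treated as mandatory. Taking a lookup closure instead of calling `std::env::var` directly keeps the function testable.

```rust
/// Settings mirrored from the .env keys above. Defaults are assumptions
/// for local development, not values confirmed by the project.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub openai_api_key: String,
    pub realtime_model: String,
    pub base_url: String,
    pub port: u16,
}

#[derive(Debug, PartialEq)]
pub enum ConfigError {
    Missing(&'static str),
    Invalid(&'static str, String),
}

impl Config {
    /// Reads settings through `get`, so the same code works with
    /// `std::env::var` in production and a plain map in tests.
    pub fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Result<Self, ConfigError> {
        let openai_api_key = get("OPENAI_API_KEY")
            .filter(|k| !k.is_empty())
            .ok_or(ConfigError::Missing("OPENAI_API_KEY"))?;
        let port = match get("PORT") {
            Some(raw) => raw
                .parse::<u16>()
                .map_err(|_| ConfigError::Invalid("PORT", raw.clone()))?,
            None => 8080, // assumed default, matching the example above
        };
        Ok(Self {
            openai_api_key,
            realtime_model: get("REALTIME_MODEL")
                .unwrap_or_else(|| "gpt-4o-realtime-preview".to_string()),
            // Assumed fallback: derive the public URL from the port.
            base_url: get("BASE_URL").unwrap_or_else(|| format!("http://localhost:{port}")),
            port,
        })
    }

    /// Production entry point: read from the real process environment.
    pub fn from_env() -> Result<Self, ConfigError> {
        Self::from_lookup(|key| std::env::var(key).ok())
    }
}
```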

2️⃣ Run Locally (Cargo)

```bash
cargo run --release
```

The server will start at http://localhost:8080.


3️⃣ Open the Web UI

  1. Visit http://localhost:8080/
  2. Allow microphone access
  3. Choose target languages
  4. Start speaking Indonesian
  5. Captions appear instantly at the end of each utterance

🐳 Deployment with Docker

1️⃣ Build Docker Image

```bash
docker build -t livetranslation:latest .
```

2️⃣ Run the Container

```bash
docker run -d \
  --name livetranslation \
  -p 8080:8080 \
  -e OPENAI_API_KEY=sk-yourkey \
  -e REALTIME_MODEL=gpt-4o-realtime-preview \
  livetranslation:latest
```

The server will be available at http://localhost:8080.


3️⃣ Docker Compose (optional)

Create a docker-compose.yml:

```yaml
version: "3.8"
services:
  livetranslation:
    build: .
    container_name: livetranslation
    ports:
      - "8080:8080"
    environment:
      OPENAI_API_KEY: "sk-yourkey"
      REALTIME_MODEL: "gpt-4o-realtime-preview"
    restart: unless-stopped
```

Then run:

```bash
docker compose up -d
```

🧩 How It Works (Pipeline Details)

  1. Audio Stream — Browser sends 16-bit PCM fragments via WebSocket.
  2. VAD — Detects end-of-speech boundaries.
  3. ASR — GPT-5 converts speech → text.
  4. MT — GPT-5 translates text into multiple target languages concurrently.
  5. Delivery — Each connected client (stage display, audience screen, recorder) receives caption payloads.
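Step 1 implies the server has to turn binary WebSocket messages back into samples. Assuming the browser sends little-endian `Int16Array` buffers (the byte order of virtually all browser platforms, but not something the project documents), decoding could look like the sketch below. `decode_pcm16le` is a hypothetical helper, and carrying an odd trailing byte over to the next message is purely defensive.

```rust
/// Decodes one binary WebSocket message of 16-bit little-endian PCM
/// (assumed wire format) into samples. `carry` is a leftover byte from the
/// previous message, if any; a new odd trailing byte is returned so it can
/// be prepended to the next message instead of being misread.
pub fn decode_pcm16le(carry: Option<u8>, payload: &[u8]) -> (Vec<i16>, Option<u8>) {
    let mut bytes = Vec::with_capacity(payload.len() + 1);
    if let Some(b) = carry {
        bytes.push(b);
    }
    bytes.extend_from_slice(payload);

    // Every complete pair of bytes is one sample (low byte first).
    let samples = bytes
        .chunks_exact(2)
        .map(|pair| i16::from_le_bytes([pair[0], pair[1]]))
        .collect();

    let leftover = if bytes.len() % 2 == 1 { bytes.last().copied() } else { None };
    (samples, leftover)
}
```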

Latency Optimization

  • Use small audio frames and early VAD triggers for faster response.
  • Back-pressure ensures smooth performance under load.
  • Per-language fan-out handled concurrently for minimal lag.
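Per-language fan-out means the translation calls for one utterance run side by side, so total latency tracks the slowest language rather than the sum of all of them. The project runs on Tokio; to stay dependency-free, this sketch uses `std::thread::scope` instead, and `translate_all` and the `translate` closure signature are hypothetical stand-ins for the real model call.

```rust
use std::thread;

/// Translates `text` into every language in `langs` concurrently and returns
/// (language, translation) pairs in the same order as `langs`.
/// `translate` stands in for the real model call (hypothetical signature).
pub fn translate_all<F>(text: &str, langs: &[&str], translate: F) -> Vec<(String, String)>
where
    F: Fn(&str, &str) -> String + Sync,
{
    let translate = &translate;
    thread::scope(|scope| {
        // Spawn one thread per language first (collect), then join,
        // so all calls are in flight at the same time.
        let handles: Vec<_> = langs
            .iter()
            .map(|&lang| scope.spawn(move || (lang.to_string(), translate(text, lang))))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("translation thread panicked"))
            .collect()
    })
}
```

In the async server the equivalent would be one task or future per language awaited together, which avoids dedicating an OS thread to each network call.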

🧪 Local CLI Test

```bash
cargo run -- --cli
```

Outputs the recognized Indonesian text and translations in the terminal.


💸 Costs & Limits

  • Each utterance triggers ASR + translation per language.
  • Shorter utterances increase API calls but improve perceived latency.
  • You can batch micro-utterances with a short buffer delay for cost efficiency.
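The cost trade-off is easiest to see in numbers. With four target languages, each released segment costs one ASR call plus four translation calls, so three separate micro-utterances cost 15 calls, while merging the first two brings it to 10. The sketch below buffers audio segments until they add up to a minimum length or the oldest one has waited a maximum delay. `UtteranceBatcher`, `calls_per_batch`, and the millisecond parameters are hypothetical; the caller supplies the clock so the logic can be tested deterministically.

```rust
/// Hypothetical batcher: short audio segments are held back and merged until
/// the combined speech is long enough or the oldest segment waited too long.
pub struct UtteranceBatcher {
    sample_rate: u64,
    min_speech_ms: u64,
    max_wait_ms: u64,
    pending: Vec<i16>,
    first_arrival_ms: Option<u64>,
}

impl UtteranceBatcher {
    pub fn new(sample_rate: u64, min_speech_ms: u64, max_wait_ms: u64) -> Self {
        Self { sample_rate, min_speech_ms, max_wait_ms, pending: Vec::new(), first_arrival_ms: None }
    }

    /// Duration of the buffered audio in milliseconds.
    fn pending_ms(&self) -> u64 {
        self.pending.len() as u64 * 1000 / self.sample_rate
    }

    /// Adds one VAD-delimited segment; returns the merged audio when ready.
    pub fn push(&mut self, segment: &[i16], now_ms: u64) -> Option<Vec<i16>> {
        self.pending.extend_from_slice(segment);
        self.first_arrival_ms.get_or_insert(now_ms);
        if self.pending_ms() >= self.min_speech_ms {
            return self.flush();
        }
        self.poll(now_ms)
    }

    /// Must be called periodically (e.g. from a timer); otherwise a lone short
    /// segment would wait indefinitely. Releases a batch after `max_wait_ms`.
    pub fn poll(&mut self, now_ms: u64) -> Option<Vec<i16>> {
        let t0 = self.first_arrival_ms?;
        if now_ms.saturating_sub(t0) >= self.max_wait_ms {
            self.flush()
        } else {
            None
        }
    }

    fn flush(&mut self) -> Option<Vec<i16>> {
        self.first_arrival_ms = None;
        if self.pending.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.pending))
        }
    }
}

/// API calls per released batch: one ASR call plus one translation per language.
pub fn calls_per_batch(target_langs: usize) -> usize {
    1 + target_langs
}
```

The `max_wait_ms` bound caps the extra delay a short utterance can suffer, which is the price paid for the saved calls.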

🔐 Privacy & Security

  • Audio processed in memory only — no persistence by default.
  • Logs include only timing and size metadata unless explicitly enabled.
  • Add auth for production (tokens, origin allowlists, rate limits).
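As a starting point for the auth item, the check below combines an origin allowlist with a shared token. It is a hypothetical sketch: `AuthPolicy`, `AuthError`, and where the token comes from are assumptions. The browser `WebSocket` constructor cannot set custom headers such as `Authorization`, so a token for the upgrade request usually travels in the query string or a cookie. The `Origin` header only guards against other websites (non-browser clients can send any value), which is why the token check is still needed.

```rust
use std::collections::HashSet;

/// Hypothetical policy applied to each WebSocket upgrade request.
pub struct AuthPolicy {
    allowed_origins: HashSet<String>,
    api_token: String,
}

#[derive(Debug, PartialEq)]
pub enum AuthError {
    OriginNotAllowed,
    MissingToken,
    BadToken,
}

/// Compares without stopping at the first mismatching byte, so timing does
/// not reveal how much of the token was right (the length still leaks).
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

impl AuthPolicy {
    pub fn new(origins: &[&str], api_token: &str) -> Self {
        Self {
            allowed_origins: origins.iter().map(|o| o.to_string()).collect(),
            api_token: api_token.to_string(),
        }
    }

    /// `origin` is the Origin header value; `token` is whatever the server
    /// extracted from the query string or cookie (assumed transport).
    pub fn check(&self, origin: Option<&str>, token: Option<&str>) -> Result<(), AuthError> {
        match origin {
            Some(o) if self.allowed_origins.contains(o) => {}
            _ => return Err(AuthError::OriginNotAllowed),
        }
        let token = token.ok_or(AuthError::MissingToken)?;
        if constant_time_eq(token.as_bytes(), self.api_token.as_bytes()) {
            Ok(())
        } else {
            Err(AuthError::BadToken)
        }
    }
}
```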

🗺️ Roadmap

  • Speaker labels / diarization (meeting mode)
  • Per-language screen styling & large-font stage mode
  • Translation memory + domain glossary upload
  • Optional TTS output per language
  • Recording & export (SRT/VTT)

🤝 Contributing

Contributions welcome! Please:

  1. Describe your use case and environment.
  2. Add tests for any new core logic.
  3. Keep latency metrics green. ✅

📄 License

MIT License — see LICENSE.


🙏 Acknowledgements

  • 🦀 Rust community for robust async foundations.
  • 🤖 OpenAI GPT-5 for accurate, context-aware ASR and translation.

TL;DR

Live Translation — finish speaking, instantly translated. Rust for speed and reliability. GPT-5 for natural tone and accuracy. Perfect for conferences, onboarding, classrooms, and global teams.


👤 Author

Kukuh Tripamungkas Wicaksono (Kukuh TW) 📧 kukuhtw@gmail.com 📱 https://wa.me/628129893706 🔗 LinkedIn

