Finish speaking → instantly translated. Built with Rust for steady, low latency. Powered by GPT-5 for context-aware accuracy.
Indonesian → 日本語 on the fly • Q&A in Deutsch • Recap in English • One click to 한국어 • Also available: العربية, Français, Nederlands, Русский, Español
🎥 Demo Videos
Now supports automatic simultaneous translation into four target languages. Perfect for multilingual conferences and hybrid events where real-time comprehension matters most.
- 🌐 Multi-language output (4 languages simultaneously)
- ⚡ Low-latency speech-to-text and translation pipeline
- 🧩 Customizable language sets (from 10+ available)
- 🔄 Stable for long live sessions
- 🖥️ Easy integration with event displays or streaming overlays
- International seminars & conferences
- Corporate training sessions
- Academic lectures & global classrooms
- Live streaming with multilingual audiences
- Religious & community events
Imagine a mixed-language room — Japan in front, Europe in the middle, the Middle East at the back — and you’re speaking Indonesian. As your first sentence ends, Japanese text instantly appears on the screen. A German engineer asks a question — you reply in your language; Deutsch captions flow without pause. The moderator wants a recap in English — done. A participant requests 한국어 — one click.
🎯 No device juggling. No awkward start/stop.
Why It Works
- 🦀 Rust keeps the audio → text → translation pipeline fast and predictable.
- 🧠 GPT-5 understands context, tone, and technical terms, producing natural translations.
- 🎤 Auto end-of-utterance detection — translations appear as soon as you finish speaking.
- True live captions — low, consistent latency from mic → screen
- Multi-language output — render one or many target languages at once
- Context & glossary-aware — supports per-session vocabulary
- Auto end-of-speech (VAD) — no manual start/stop
- Web UI — browser mic capture, real-time captions, instant language switching
- Stateless API — embeddable in meeting/presenter tools
- Production-ready — structured logs, graceful shutdown, configurable timeouts
```
[Browser Mic]
      |
      |  PCM chunks over WebSocket
      v
[Rust Server]
      ├─ VAD (end-of-utterance detection)
      ├─ ASR (speech → text) via GPT-5
      ├─ MT (text → target languages) via GPT-5
      └─ Caption bus (fan-out to connected clients)
      |
      v
[Web Clients / Screens] ←— subscribe → render captions in real time
```
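The caption bus in the diagram maps naturally onto a Tokio broadcast channel: the pipeline publishes each finished caption once, and every connected screen subscribes. Below is a minimal sketch of that fan-out with Axum WebSockets; the `AppState` struct, the `/captions` route, and the channel capacity are illustrative assumptions, not the project's actual module layout.

```rust
// Minimal sketch of the caption bus: one broadcast channel, one receiver per
// connected client. Names (AppState, the /captions route, capacity 64) are
// illustrative assumptions, not the project's actual API.
use axum::{
    extract::{
        ws::{Message, WebSocket, WebSocketUpgrade},
        State,
    },
    response::IntoResponse,
    routing::get,
    Router,
};
use tokio::sync::broadcast;

#[derive(Clone)]
struct AppState {
    captions: broadcast::Sender<String>, // JSON caption payloads
}

async fn captions_ws(ws: WebSocketUpgrade, State(state): State<AppState>) -> impl IntoResponse {
    // Each client subscribes to the bus when its WebSocket is upgraded.
    ws.on_upgrade(move |socket| forward_captions(socket, state.captions.subscribe()))
}

async fn forward_captions(mut socket: WebSocket, mut rx: broadcast::Receiver<String>) {
    // Push every caption published on the bus to this client until it disconnects.
    while let Ok(caption) = rx.recv().await {
        if socket.send(Message::Text(caption.into())).await.is_err() {
            break; // client went away
        }
    }
}

#[tokio::main]
async fn main() {
    let (captions, _) = broadcast::channel(64);
    // The VAD → ASR → MT pipeline would publish here via `captions.send(payload)`.
    let app = Router::new()
        .route("/captions", get(captions_ws))
        .with_state(AppState { captions });

    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```

Because each client owns its own receiver, a slow screen only lags itself; it never blocks the pipeline or the other subscribers.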
| Component | Technology |
|---|---|
| Language | Rust (async via Tokio) |
| Web Framework | Axum (HTTP + WebSocket) |
| Audio I/O | Web Audio API (getUserMedia → WS → server) |
| VAD | Lightweight energy-based detector (pluggable) |
| LLM | GPT-5 (ASR + translation) |
| Build/Run | Cargo / Docker |
You can swap or extend the VAD, ASR, or MT layers with other providers.
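For example, the pluggable VAD layer can be pictured as a small trait with the energy-based detector from the table behind it, roughly as sketched below; the trait name, threshold, and hangover values are assumptions for illustration, not the crate's real API.

```rust
// Sketch of a pluggable VAD layer: a small trait plus an energy-based detector.
// Trait name, threshold, and hangover values are illustrative assumptions.

/// Decision made for each incoming PCM frame.
#[derive(Debug, PartialEq)]
enum VadEvent {
    Speech,
    Silence,
    EndOfUtterance,
}

trait Vad {
    /// Feed one frame of 16-bit mono PCM samples and get a decision back.
    fn process_frame(&mut self, samples: &[i16]) -> VadEvent;
}

/// Energy-based detector: speech while RMS energy is above a threshold,
/// end-of-utterance after `hangover_frames` consecutive quiet frames.
struct EnergyVad {
    threshold: f32,
    hangover_frames: u32,
    quiet_streak: u32,
    in_speech: bool,
}

impl EnergyVad {
    fn new(threshold: f32, hangover_frames: u32) -> Self {
        Self { threshold, hangover_frames, quiet_streak: 0, in_speech: false }
    }
}

impl Vad for EnergyVad {
    fn process_frame(&mut self, samples: &[i16]) -> VadEvent {
        // Root-mean-square energy of the frame, normalized to [0, 1].
        let sum_sq: f64 = samples
            .iter()
            .map(|&s| (s as f64 / i16::MAX as f64).powi(2))
            .sum();
        let rms = (sum_sq / samples.len().max(1) as f64).sqrt() as f32;

        if rms >= self.threshold {
            self.in_speech = true;
            self.quiet_streak = 0;
            VadEvent::Speech
        } else if self.in_speech {
            self.quiet_streak += 1;
            if self.quiet_streak >= self.hangover_frames {
                self.in_speech = false;
                self.quiet_streak = 0;
                VadEvent::EndOfUtterance
            } else {
                VadEvent::Silence
            }
        } else {
            VadEvent::Silence
        }
    }
}

fn main() {
    let mut vad = EnergyVad::new(0.02, 3);
    let loud = vec![8000i16; 320];  // ~20 ms of "speech" at 16 kHz
    let quiet = vec![0i16; 320];
    assert_eq!(vad.process_frame(&loud), VadEvent::Speech);
    for _ in 0..2 {
        assert_eq!(vad.process_frame(&quiet), VadEvent::Silence);
    }
    assert_eq!(vad.process_frame(&quiet), VadEvent::EndOfUtterance);
}
```

A WebRTC-style or ML-based detector could implement the same trait without touching the ASR or MT stages.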
- 🦀 Rust (stable)
- 🔑 OpenAI API key with GPT-5 access
Create a .env file in your project root:
```
OPENAI_API_KEY=sk-...
REALTIME_MODEL=gpt-4o-realtime-preview
BASE_URL=http://localhost:8080
PORT=8080
```

Then run:

```bash
cargo run --release
```

Server will start at http://localhost:8080.
- Visit http://localhost:8080/
- Allow microphone access
- Choose target languages
- Start speaking Indonesian
- Captions appear instantly at the end of each utterance
```bash
docker build -t livetranslation:latest .

docker run -d \
  --name livetranslation \
  -p 8080:8080 \
  -e OPENAI_API_KEY=sk-yourkey \
  -e REALTIME_MODEL=gpt-4o-realtime-preview \
  livetranslation:latest
```

The server will be available at http://localhost:8080.
Create a docker-compose.yml:
version: "3.8"
services:
livetranslation:
build: .
container_name: livetranslation
ports:
- "8080:8080"
environment:
OPENAI_API_KEY: "sk-yourkey"
REALTIME_MODEL: "gpt-4o-realtime-preview"
restart: unless-stoppedThen run:
docker compose up -d- Audio Stream — Browser sends 16-bit PCM fragments via WebSocket.
- VAD — Detects end-of-speech boundaries.
- ASR — GPT-5 converts speech → text.
- MT — GPT-5 translates text into multiple target languages concurrently.
- Delivery — Each connected client (stage display, audience screen, recorder) receives caption payloads.
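As an illustration of the delivery step, a caption payload might look something like the following; the field names and the use of JSON here are assumptions, not a documented wire format.

```rust
// Illustrative shape of a caption payload as it might be fanned out to clients.
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
struct CaptionPayload {
    /// Monotonic utterance counter so clients can order and replace captions.
    utterance_id: u64,
    /// BCP-47 language tag of the translation, e.g. "ja", "de", "ko".
    lang: String,
    /// Translated caption text for this utterance.
    text: String,
    /// Source (recognized) text, useful for operators and logs.
    source_text: String,
    /// Milliseconds from end-of-utterance to caption emission.
    latency_ms: u64,
}

fn main() {
    let caption = CaptionPayload {
        utterance_id: 42,
        lang: "ja".into(),
        text: "こんにちは、皆さん".into(),
        source_text: "Halo semuanya".into(),
        latency_ms: 380,
    };
    println!("{}", serde_json::to_string_pretty(&caption).unwrap());
}
```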
- Use small audio frames and early VAD triggers for faster response.
- Back-pressure ensures smooth performance under load.
- Per-language fan-out handled concurrently for minimal lag.
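The per-language fan-out can be as simple as launching one translation future per target language and awaiting them together, so end-to-end delay tracks the slowest language rather than the sum of all of them. A rough sketch, with `translate` standing in for the real GPT-5 call:

```rust
// Sketch of concurrent per-language fan-out; `translate` is a stand-in for the
// actual MT request.
use futures::future::join_all;

async fn translate(text: &str, lang: &str) -> String {
    // Placeholder for the real MT call; returns a tagged copy for the demo.
    format!("[{lang}] {text}")
}

async fn translate_all(text: &str, langs: &[&str]) -> Vec<(String, String)> {
    // One future per target language, awaited together: total latency is roughly
    // the slowest single translation, not the sum of all of them.
    let futures = langs.iter().map(|lang| async move {
        (lang.to_string(), translate(text, lang).await)
    });
    join_all(futures).await
}

#[tokio::main]
async fn main() {
    let captions =
        translate_all("Selamat datang di konferensi ini", &["ja", "de", "en", "ko"]).await;
    for (lang, text) in captions {
        println!("{lang}: {text}");
    }
}
```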
```bash
cargo run -- --cli
```

Outputs the recognized Indonesian text and translations in the terminal.
- Each utterance triggers ASR + translation per language.
- Shorter utterances increase API calls but improve perceived latency.
- You can batch micro-utterances with a short buffer delay for cost efficiency.
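One way to batch micro-utterances, sketched below, is to hold finished segments for a short buffer window and translate them in a single request; the 300 ms window and the channel wiring are illustrative assumptions.

```rust
// Sketch of micro-utterance batching: collect segments that arrive within a short
// window, then issue one translation request for the whole batch.
use std::time::Duration;
use tokio::{sync::mpsc, time};

async fn batch_utterances(mut rx: mpsc::Receiver<String>, window: Duration) {
    while let Some(first) = rx.recv().await {
        let mut batch = vec![first];
        // Keep collecting anything that arrives within the buffer window.
        let deadline = time::sleep(window);
        tokio::pin!(deadline);
        loop {
            tokio::select! {
                _ = &mut deadline => break,
                next = rx.recv() => match next {
                    Some(text) => batch.push(text),
                    None => break,
                },
            }
        }
        // One API call for the whole batch instead of one per micro-utterance.
        println!("translating {} utterance(s): {:?}", batch.len(), batch);
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(32);
    tokio::spawn(batch_utterances(rx, Duration::from_millis(300)));

    for text in ["Halo", "semuanya,", "selamat pagi"] {
        tx.send(text.to_string()).await.unwrap();
    }
    drop(tx);
    time::sleep(Duration::from_millis(500)).await; // let the batcher flush
}
```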
- Audio processed in memory only — no persistence by default.
- Logs include only timing and size metadata unless explicitly enabled.
- Add auth for production (tokens, origin allowlists, rate limits).
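As a starting point for the token part of that checklist, a bearer-token check can be added as Axum middleware along these lines; the `CAPTION_TOKEN` variable and the route are hypothetical, and origin allowlists and rate limiting would be separate layers.

```rust
// Sketch of a bearer-token guard as Axum middleware. The CAPTION_TOKEN env var
// and the protected route are assumptions for illustration.
use axum::{
    extract::Request,
    http::{header::AUTHORIZATION, StatusCode},
    middleware::{self, Next},
    response::Response,
    routing::get,
    Router,
};

async fn require_token(req: Request, next: Next) -> Result<Response, StatusCode> {
    let expected = std::env::var("CAPTION_TOKEN").unwrap_or_default();
    let authorized = req
        .headers()
        .get(AUTHORIZATION)
        .and_then(|v| v.to_str().ok())
        .map(|v| v == format!("Bearer {expected}"))
        .unwrap_or(false);

    if authorized {
        Ok(next.run(req).await)
    } else {
        Err(StatusCode::UNAUTHORIZED)
    }
}

#[tokio::main]
async fn main() {
    let app = Router::new()
        .route("/", get(|| async { "ok" }))
        .layer(middleware::from_fn(require_token));

    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```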
- Speaker labels / diarization (meeting mode)
- Per-language screen styling & large-font stage mode
- Translation memory + domain glossary upload
- Optional TTS output per language
- Recording & export (SRT/VTT)
Contributions welcome! Please:
- Describe your use case and environment.
- Add tests for any new core logic.
- Keep latency metrics green. ✅
MIT License — see LICENSE.
- 🦀 Rust community for robust async foundations.
- 🤖 OpenAI GPT-5 for accurate, context-aware ASR and translation.
Live Translation — finish speaking, instantly translated. Rust for speed and reliability. GPT-5 for natural tone and accuracy. Perfect for conferences, onboarding, classrooms, and global teams.
Kukuh Tripamungkas Wicaksono (Kukuh TW) 📧 kukuhtw@gmail.com 📱 https://wa.me/628129893706 🔗 LinkedIn