Selfdev-Speech

This repo provides a backend for Speech-to-Text (STT) and Text-to-Speech (TTS) services.

The project is based on Speaches.

Speaches is an OpenAI API-compatible server that supports streaming transcription, translation, and speech generation. Speech-to-Text is powered by faster-whisper, and Text-to-Speech by Piper and Kokoro. The Speaches project aims to be like Ollama, but for TTS/STT models.

Run stack

docker compose build
docker compose up

Once the stack is running, the web UI is available at http://localhost:8372.
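The server may need some time after `docker compose up` before it accepts requests. Below is a minimal readiness-check sketch. It assumes the server answers `GET /v1/models`, the model-listing endpoint of the OpenAI-compatible API, with an HTTP 2xx once it is up. This repo's scripts do not do this; the function name `wait_for_speaches` and its retry parameters are hypothetical.

```shell
# Hypothetical helper: poll the Speaches server until it responds, or give up.
# Assumption: GET <base_url>/v1/models returns HTTP 2xx once the server is ready.
wait_for_speaches() {
  base_url="${1:-${SPEACHES_BASE_URL:-http://localhost:8372}}"
  max_tries="${2:-30}"   # number of attempts before giving up
  delay="${3:-2}"        # seconds to sleep between attempts
  i=1
  while [ "$i" -le "$max_tries" ]; do
    # -f: fail on HTTP errors, -s: silent, -o /dev/null: discard the response body
    if curl -fs -o /dev/null "$base_url/v1/models"; then
      echo "Speaches is ready at $base_url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Speaches did not become ready at $base_url" >&2
  return 1
}
```

For example, you can start the stack in the background with `docker compose up -d` and then run `wait_for_speaches` before sending requests.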

Download Models

The ./download.sh script can download models when the container starts.

You can also download models manually:

# STT:
docker compose exec selfdev-speech speaches-cli registry ls --task automatic-speech-recognition
docker compose exec selfdev-speech speaches-cli model download Systran/faster-distil-whisper-small.en
docker compose exec selfdev-speech speaches-cli model ls --task automatic-speech-recognition

# TTS:
docker compose exec selfdev-speech uvx speaches-cli registry ls --task text-to-speech
docker compose exec selfdev-speech uvx speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX
docker compose exec selfdev-speech uvx speaches-cli model ls --task text-to-speech
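To fetch several models at once, you can wrap the per-model download commands above in a loop. The sketch below uses only the `speaches-cli model download` subcommand shown above and runs it through `docker compose exec` on the `selfdev-speech` service. The function `download_models` and the `SERVICE`/`SPEACHES_CLI` variables are hypothetical. The STT commands above call `speaches-cli` directly and the TTS commands use `uvx speaches-cli`, so the CLI prefix is configurable.

```shell
# Hypothetical helper: download several models into the running container.
# Assumes the compose service is "selfdev-speech" and speaches-cli is on the container's PATH;
# set SPEACHES_CLI="uvx speaches-cli" if your image needs uvx.
download_models() {
  service="${SERVICE:-selfdev-speech}"
  cli="${SPEACHES_CLI:-speaches-cli}"
  failed=0
  for model in "$@"; do
    echo "Downloading $model ..."
    # -T: don't allocate a TTY, so this also works from scripts and CI.
    # $cli is intentionally unquoted so "uvx speaches-cli" splits into two words.
    if ! docker compose exec -T "$service" $cli model download "$model"; then
      echo "Failed to download $model" >&2
      failed=1
    fi
  done
  # Non-zero if any download failed; remaining models are still attempted.
  return "$failed"
}
```

Usage: `download_models Systran/faster-distil-whisper-small.en` or `SPEACHES_CLI="uvx speaches-cli" download_models speaches-ai/Kokoro-82M-v1.0-ONNX`.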

Use

export SPEACHES_BASE_URL="http://localhost:8372"

# STT:
export MODEL_ID="Systran/faster-distil-whisper-small.en"
curl -s "$SPEACHES_BASE_URL/v1/audio/transcriptions" -F "file=@audio.webm" -F "model=$MODEL_ID"
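By default the transcription endpoint returns JSON. The OpenAI transcription API also accepts a `response_format` form field, and `text` returns the transcript as plain text. The sketch below assumes Speaches honors `response_format=text` as an OpenAI-compatible server; the function name `transcribe` is hypothetical.

```shell
# Hypothetical helper: transcribe an audio file and print only the text.
# Assumption: the server honors the OpenAI-style response_format=text form field.
transcribe() {
  audio_file="$1"
  model="${2:-${MODEL_ID:-Systran/faster-distil-whisper-small.en}}"
  base_url="${SPEACHES_BASE_URL:-http://localhost:8372}"
  if [ ! -f "$audio_file" ]; then
    echo "No such file: $audio_file" >&2
    return 1
  fi
  # -f: exit non-zero on HTTP errors instead of printing the error body as a transcript
  curl -fsS "$base_url/v1/audio/transcriptions" \
    -F "file=@$audio_file" \
    -F "model=$model" \
    -F "response_format=text"
}
```

Usage: `transcribe audio.webm > transcript.txt`.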

# TTS:
export MODEL_ID="speaches-ai/Kokoro-82M-v1.0-ONNX"
export VOICE_ID="af_heart"
curl "$SPEACHES_BASE_URL/v1/audio/speech" -s -H "Content-Type: application/json" \
  --output audio.mp3 \
  --data @- << EOF
{
  "input": "Hello World!",
  "model": "$MODEL_ID",
  "voice": "$VOICE_ID"
}
EOF
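The heredoc above puts its values into the JSON unescaped. Input text that contains a double quote or a backslash would therefore produce an invalid request body. The sketch below escapes those two characters before building the payload and then sends the same request. The functions `json_escape` and `speak` are hypothetical, and the sketch does not handle newlines or other control characters.

```shell
# Hypothetical helper: escape backslashes and double quotes for embedding in a JSON string.
# Newlines and other control characters are NOT handled by this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: synthesize speech for arbitrary text into an audio file.
speak() {
  text="$1"
  output="${2:-audio.mp3}"
  model="${MODEL_ID:-speaches-ai/Kokoro-82M-v1.0-ONNX}"
  voice="${VOICE_ID:-af_heart}"
  base_url="${SPEACHES_BASE_URL:-http://localhost:8372}"
  # Build the same JSON body as the heredoc example, with escaped values
  payload=$(printf '{"input": "%s", "model": "%s", "voice": "%s"}' \
    "$(json_escape "$text")" "$(json_escape "$model")" "$(json_escape "$voice")")
  curl -fsS "$base_url/v1/audio/speech" \
    -H "Content-Type: application/json" \
    --output "$output" \
    --data "$payload"
}
```

Usage: `speak 'She said "Hello World!"' hello.mp3`.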
