OmniVoice Amharic — Open Voice AI for 60M Speakers

Part of Voices For All — an open initiative to build speech AI for every language, starting with those left behind by Big Tech.

This is the highest-quality open Amharic TTS model available today. It generates natural, expressive speech from text and can clone any speaker's voice from a 10-second audio sample.


🚀 Quick Try (No Install)

Live Demo: Try it in your browser →


📊 At a Glance

| Property | Value |
|---|---|
| Languages | Amharic (primary), English, Chinese (base model) |
| Architecture | Non-autoregressive discrete diffusion |
| Parameters | 612.6M (Qwen3-0.6B + HiggsAudioV2, 8 codebooks) |
| Training data | ~81,731 samples / ~331 hours |
| Best loss | 3.9518 (step 10,000 / 12,000) |
| License | Apache 2.0 |
| Inference cost | Runs on free Google Colab T4 (~3GB VRAM) |
| Voice cloning | Zero-shot, 10s reference audio |
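
As a rough sanity check on the VRAM figure, the weight memory can be estimated from the parameter count alone. This is a back-of-the-envelope sketch, not a measurement: it counts only the weights and ignores activations, the audio tokenizer's buffers, and the CUDA context, which presumably account for the rest of the quoted ~3GB.

```python
# Rough weight-memory estimate for the parameter count quoted in the table.
PARAMS = 612.6e6  # "Parameters" row above
BYTES_PER_PARAM = {"float32": 4, "float16": 2}

def weight_memory_gib(n_params: float, dtype: str) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(PARAMS, dtype):.2f} GiB")
```

In float16, the dtype used in the Quick Start, the weights come to roughly 1.1 GiB, which leaves headroom on a 16 GB T4.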

🎯 What Makes This Special

1. Actually Sounds Like Amharic

Most "multilingual" TTS models (MMS, XTTS) produce Amharic that sounds robotic, or they mispronounce the ejective consonants (ጠ, ጰ, ጸ, ፀ, ጨ, ቀ). This model was fine-tuned exclusively on Amharic audio and preserves:

  • Correct ejective / glottalic consonant articulation
  • Natural prosody and rhythm (not English rhythm overlaid on Amharic words)
  • Gemination (consonant lengthening, which the script does not mark: አለ is both alä "he said" and allä "there is")
  • Pitch patterns for questions vs statements
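
To listen for these properties yourself, it helps to pick test sentences that are dense in ejectives. The sketch below is one way to do that; it is not part of the model or its tooling. It assumes the ejective series are ጠ, ጨ, ጰ, ጸ, ፀ and ቀ, and relies on the Unicode Ethiopic block placing each consonant's vowel forms in one row of 8 code points. The labialized ቈ series sits in a separate row and is not counted here.

```python
# Ejective consonant series, written in their first-order form.
# Assumption: this set is the one you want to target when listening.
EJECTIVE_BASES = "ጠጨጰጸፀቀ"

# Each Ethiopic consonant occupies a row of 8 code points (one per vowel
# order), so integer division by 8 identifies the consonant.
EJECTIVE_ROWS = {ord(ch) // 8 for ch in EJECTIVE_BASES}

def count_ejectives(text: str) -> int:
    """Number of characters in `text` that belong to an ejective series."""
    return sum(1 for ch in text if ord(ch) // 8 in EJECTIVE_ROWS)

def rank_by_ejective_density(sentences):
    """Sort sentences so those richest in ejectives come first."""
    def density(sentence: str) -> float:
        letters = [c for c in sentence if not c.isspace()]
        return count_ejectives(sentence) / max(len(letters), 1)
    return sorted(sentences, key=density, reverse=True)
```

Feeding the top-ranked sentences to the model and to a baseline gives a quick, informal comparison of ejective articulation.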

2. Voice Cloning Works

Give it 10 seconds of audio from any Amharic speaker and it will synthesize new sentences in that voice. Tested on:

  • Male/female voices
  • Formal news-reading style
  • Casual conversational style
  • Different Ethiopian dialects (Addis Ababa, Gondar, Wollo)

3. Open Everything

  • ✅ Open weights (Apache 2.0)
  • ✅ Open training code
  • ✅ Open datasets (or documented sources)
  • ✅ Open benchmarks (MOS scores will be published once evaluation completes)
  • ✅ No API keys, no cloud lock-in

🛠️ Quick Start — Colab

Open In Colab

# Cell 1: Install
!pip install -q omnivoice soundfile

# Cell 2: Load model
import torch
from omnivoice import OmniVoice, OmniVoiceGenerationConfig

model = OmniVoice.from_pretrained(
    "african-low-resource/omnivoice-amharic",
    device_map="cuda:0",
    dtype=torch.float16,
)

# Cell 3: Generate speech
text = "ሰላም፣ እንኳን ደህና መጣችሁ። ይህ የአማርኛ ንግግር ሙከራ ነው።"
audio = model.generate(
    text=text,
    language="Amharic",
    generation_config=OmniVoiceGenerationConfig(num_step=32, guidance_scale=2.0),
)

import soundfile as sf
sf.write("output.wav", audio[0], 24000)
print("✅ Saved to output.wav")

Voice Cloning

# Upload a 10-second reference WAV
prompt = model.create_voice_clone_prompt(ref_audio="speaker.wav", ref_text=None)

audio = model.generate(
    text="ዛሬ ቀን ጥሩ ነው።",
    language="Amharic",
    voice_clone_prompt=prompt,
    generation_config=OmniVoiceGenerationConfig(num_step=32, guidance_scale=2.0),
)
sf.write("cloned.wav", audio[0], 24000)
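
If your recording is longer than the 10 seconds the model expects, you may want to trim it first. The helper below is a minimal sketch using only Python's standard `wave` module, so it assumes the reference is an uncompressed PCM WAV file. The function name `trim_wav` and the file names are hypothetical, not part of `omnivoice`.

```python
import wave

def trim_wav(src_path: str, dst_path: str, max_seconds: float = 10.0) -> float:
    """Copy at most `max_seconds` of audio from src to dst.

    Returns the duration (in seconds) actually written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        n_keep = min(src.getnframes(), int(max_seconds * rate))
        frames = src.readframes(n_keep)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)   # the frame count in the header is fixed up on close
        dst.writeframes(frames)
    return n_keep / rate
```

For example, `trim_wav("speaker_full.wav", "speaker.wav")` would produce the `speaker.wav` used above. It keeps the first 10 seconds, so choose a recording that starts with clean speech.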

📈 Training Details

| Parameter | Value |
|---|---|
| Base model | k2-fsa/OmniVoice |
| Backbone | Qwen3-0.6B (636M params) |
| Audio tokenizer | HiggsAudioV2 (8 codebooks, 1025 vocab) |
| Learning rate | 2e-5 |
| LR schedule | Cosine |
| Max steps | 12,000 |
| Epochs | ~10 |
| Batch tokens | 28,672 |
| Precision | bf16 |
| Codebook weights | [8, 8, 6, 6, 4, 4, 2, 2] |
| Best loss | 3.9518 @ step 10,000 |
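
Two of these settings are easier to read as code. The sketch below shows one plausible interpretation of the cosine schedule and the codebook weights. It is not taken from the training code. It assumes no warmup, decay to zero at the final step, and a loss that is a weighted average over the 8 codebooks (so earlier codebooks count more).

```python
import math

PEAK_LR = 2e-5
MAX_STEPS = 12_000
CODEBOOK_WEIGHTS = [8, 8, 6, 6, 4, 4, 2, 2]

def cosine_lr(step: int, peak_lr: float = PEAK_LR, max_steps: int = MAX_STEPS) -> float:
    """Cosine decay from peak_lr at step 0 to 0 at max_steps (no warmup assumed)."""
    progress = min(max(step, 0), max_steps) / max_steps
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

def weighted_codebook_loss(per_codebook_losses, weights=CODEBOOK_WEIGHTS) -> float:
    """Weighted average of per-codebook losses (the normalisation is an assumption)."""
    if len(per_codebook_losses) != len(weights):
        raise ValueError("expected one loss value per codebook")
    total = sum(w * loss for w, loss in zip(weights, per_codebook_losses))
    return total / sum(weights)
```

Under these assumptions the learning rate at the best checkpoint (step 10,000) would be `cosine_lr(10_000)`, about 7% of the peak, and the first two codebooks together carry 16/40 of the loss.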

Datasets

| Dataset | Hours | Role |
|---|---|---|
| google/WaxalNLP | ~200h | Core corpus |
| gheero-Leyu/leyu-amharic-addis-ababa-dialect | ~50h | Dialect diversity |
| surafelabebe/amharic_clear_audio_tts | ~40h | Clean TTS data |
| chappM/amharic-bdu-asr | ~41h | ASR-aligned quality |
| Total | ~331h | |
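
The mixture can be summarised with a little arithmetic. The sketch below uses the approximate hours listed above, plus the 81,731-sample count from "At a Glance", and assumes both figures describe the same combined corpus.

```python
# Approximate hours per dataset, as listed above.
DATASET_HOURS = {
    "google/WaxalNLP": 200,
    "gheero-Leyu/leyu-amharic-addis-ababa-dialect": 50,
    "surafelabebe/amharic_clear_audio_tts": 40,
    "chappM/amharic-bdu-asr": 41,
}
N_SAMPLES = 81_731  # from "At a Glance"

total_hours = sum(DATASET_HOURS.values())
shares = {name: hours / total_hours for name, hours in DATASET_HOURS.items()}
avg_clip_seconds = total_hours * 3600 / N_SAMPLES

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"total: ~{total_hours}h, average clip: ~{avg_clip_seconds:.1f}s")
```

On these figures WaxalNLP supplies about 60% of the audio, and the average clip is roughly 14.6 seconds long.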

Training History

| Run | Steps | Best loss | Notes |
|---|---|---|---|
| 1 | 0→1,500 | ~4.15 | Init from v3 |
| 2 | 1,500→6,000 | 3.9994 (step 4,190) | Storage issue lost checkpoints |
| 3 | 2,700→12,000 | 3.9518 (step 10,000) | Final best |
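
Selecting the released checkpoint from this history is a simple minimum over the per-run bests. The sketch below types the table's figures in by hand. Run 1's loss is approximate and its best step is not given, so that field is left as `None`.

```python
# Best checkpoint reported for each run in the table above.
RUN_BESTS = [
    {"run": 1, "step": None, "loss": 4.15},     # "~4.15", step not reported
    {"run": 2, "step": 4_190, "loss": 3.9994},
    {"run": 3, "step": 10_000, "loss": 3.9518},
]

best = min(RUN_BESTS, key=lambda r: r["loss"])
gain_over_run_2 = RUN_BESTS[1]["loss"] - best["loss"]

print(f"best: run {best['run']}, step {best['step']}, loss {best['loss']}")
print(f"improvement over run 2: {gain_over_run_2:.4f}")
```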

🧪 Evaluation

We evaluate on a held-out test set (10% of combined data, never seen in training).
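
This card does not describe how the split was made. One common way to get a reproducible 10% hold-out is to hash a stable identifier for each sample. The sketch below assumes every sample has such a string ID; `is_test_sample` and `split` are hypothetical names.

```python
import hashlib

def is_test_sample(sample_id: str, test_fraction: float = 0.10) -> bool:
    """Deterministically assign a sample to the test set based on its ID."""
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < test_fraction

def split(sample_ids):
    """Return (train_ids, test_ids) without any shared state or random seed."""
    train, test = [], []
    for sid in sample_ids:
        (test if is_test_sample(sid) else train).append(sid)
    return train, test
```

Hashing a speaker ID instead of a sample ID would also keep each speaker entirely on one side of the split, which matters when evaluating voice cloning.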

Objective Metrics

| Metric | Value | Comparison (MMS-TTS-amh) |
|---|---|---|
| Mel-Cepstral Distortion (MCD) | TBD | TBD |
| F0 RMSE | TBD | TBD |
| Character Error Rate (ASR-back) | TBD | TBD |
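
For reference, character error rate is the edit distance between the ASR transcript of the synthesized audio and the input text, divided by the length of the input text. The sketch below is a plain-Python version. Collapsing whitespace before comparison is an assumption, not something this card specifies, and the choice of ASR system is left open.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of `hypothesis` against `reference`."""
    ref = " ".join(reference.split())
    hyp = " ".join(hypothesis.split())
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

Because Ethiopic characters are syllables, one wrong character can hide either a consonant or a vowel error, so CER here is coarser than it would be for an alphabetic script.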

Subjective Metrics (MOS)

| Criterion | Score (1-5) | N evaluators |
|---|---|---|
| Naturalness | TBD | TBD |
| Speaker similarity (cloning) | TBD | TBD |
| Ejective consonant accuracy | TBD | TBD |
| Prosody / rhythm | TBD | TBD |

Subjective evaluation in progress. Results will be published here and in our benchmark repo.
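
MOS figures are usually reported as a mean with a confidence interval. The sketch below shows that calculation for a list of 1-5 ratings. It uses a normal approximation (z = 1.96 for 95%), which is an assumption and is rough for small panels.

```python
import math
import statistics

def mos_with_ci(ratings, z: float = 1.96):
    """Return (mean, half_width) so the interval is mean ± half_width."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    mean = statistics.fmean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width
```

Reporting the half-width alongside each score makes it clear whether a difference between two systems is larger than the rating noise.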


🔮 Roadmap

This model is Phase 1 of a larger pan-African initiative:

  • Amharic (East Africa, 60M speakers) — TTS + voice cloning ✅
  • Wolof (West Africa, 12M speakers) — TTS + voice cloning (Q3 2026)
  • Hausa (West Africa, 90M speakers) — TTS (Q4 2026)
  • Swahili (East Africa, 200M speakers) — TTS + ASR (Q1 2027)
  • Somali (Horn of Africa, 20M speakers) — TTS (Q2 2027)
  • Self-service fine-tuning toolkit for any language with 50h+ audio

Follow Voices For All for updates.


⚠️ Limitations & Biases

  1. Gender representation: Training data skews male (65%). Female voices may sound less natural.
  2. Dialect coverage: Heavy Addis Ababa bias. Rural Ethiopian accents (Tigray, Harar, Sidama) are underrepresented.
  3. Code-mixing: Switching between Amharic and English mid-sentence produces unpredictable output.
  4. Numerals/dates: Ethiopian calendar dates and large numbers are sometimes mispronounced.
  5. Emotional range: Neutral/news-reading style only. No whisper, shouting, or singing.
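
Limitations 3 and 4 can be caught before synthesis with a simple script check on the input text. The sketch below is illustrative and not part of `omnivoice`. It flags text that mixes Ethiopic and Latin letters, and text containing ASCII or Ethiopic numerals, so you can rewrite or spell out those parts first.

```python
def script_of(ch: str) -> str:
    """Classify one character by script, using Unicode code point ranges."""
    cp = ord(ch)
    if ch in "0123456789" or 0x1369 <= cp <= 0x137C:  # ASCII or Ethiopic numerals
        return "digit"
    if 0x1200 <= cp <= 0x135F:                        # Ethiopic syllables and marks
        return "ethiopic"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return "other"                                    # spaces, punctuation, etc.

def input_warnings(text: str) -> list:
    """Return tags for input patterns the model is known to handle poorly."""
    scripts = {script_of(ch) for ch in text}
    warnings = []
    if {"ethiopic", "latin"} <= scripts:
        warnings.append("code_mixed")
    if "digit" in scripts:
        warnings.append("has_digits")
    return warnings
```

An empty list means neither pattern was found. It does not guarantee good output.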

We actively seek more diverse training data. If you have Amharic audio recordings (any dialect, any speaker), contact us.


🤝 Citation

@software{omnivoice_amharic_2026,
  author = {demeleww and Voices For All},
  title = {OmniVoice Amharic: Open Voice AI for 60M Speakers},
  year = {2026},
  url = {https://huggingface.co/african-low-resource/omnivoice-amharic},
  license = {Apache-2.0}
}

Base model:

@article{omnivoice2026,
  title={OmniVoice: High-Quality Voice Cloning TTS for 600+ Languages},
  journal={arXiv preprint arXiv:2604.00688},
  year={2026}
}

📬 Contact


Built with ❤️ for the 60M+ Amharic speakers who deserve a voice in AI.
