F5-TTS: Vietnamese Text-to-Speech Synthesis
The model was trained on approximately 1,000 hours of data using an RTX 3090 GPU.
Enter text and upload a sample voice to generate natural speech.
Sample Voice
Text
Speed (0.3 to 2)
Generate Voice
Generated Audio
Spectrogram
Model Limitations
1. The model may not handle numerals, dates, special characters, and similar non-word tokens well, so a text normalization module is needed.
2. The rhythm of some generated audio may be inconsistent or choppy. For better synthesis quality, choose sample audio that is clearly pronounced and has minimal pauses.
3. By default, the reference audio is transcribed with the pho-whisper-medium model, which may not always recognize Vietnamese accurately, resulting in poor voice synthesis quality.
4. Inference on overly long paragraphs may produce poor results.
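
Limitations 1 and 4 can be partly worked around by preprocessing the input text before it reaches the model. The following is a minimal sketch, not part of this demo: the function names `spell_digits` and `split_into_chunks` are hypothetical, the 200-character limit is an arbitrary assumption, and the digit handling is deliberately naive (it reads numbers digit by digit and ignores place value, dates, and units).

```python
import re

# Vietnamese readings for single digits. Reading digit by digit is a
# simplifying assumption: "15" becomes "một năm", not "mười lăm".
DIGIT_WORDS = ["không", "một", "hai", "ba", "bốn", "năm", "sáu", "bảy", "tám", "chín"]


def spell_digits(text: str) -> str:
    """Replace each run of ASCII digits with space-separated digit words."""
    def repl(match: re.Match) -> str:
        words = " ".join(DIGIT_WORDS[int(d)] for d in match.group())
        return f" {words} "  # pad so words never fuse with neighbours

    spaced = re.sub(r"[0-9]+", repl, text)
    spaced = re.sub(r"\s+", " ", spaced).strip()
    # Remove the space the padding may have left before punctuation.
    return re.sub(r"\s+([.,!?;:])", r"\1", spaced)


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept as its own chunk
    rather than being cut mid-sentence.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be synthesized separately and the resulting audio concatenated. A production normalizer would also need to handle place value, dates, currency, and abbreviations.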