🎤 F5-TTS: Vietnamese Text-to-Speech Synthesis

The model was trained on approximately 1,000 hours of data using an RTX 3090 GPU.

Enter text and upload a sample voice to generate natural speech.

🔊 Sample Voice

📝 Text

⚡ Speed (0.3 to 2)

🎧 Generated Audio

📊 Spectrogram

❗ Model Limitations