VibeVoice-Realtime TTS Demo

Text

Streaming Input Text

This area will display the streaming input text in real time.

This demo requires the full text to be provided upfront. The model then receives the text via streaming input during synthesis.
If the text contains special characters other than punctuation, normalizing it before synthesis often yields better results.
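
The normalization step mentioned above could look roughly like the following sketch. It is not the demo's own preprocessing: the `SPOKEN_FORMS` table and the `normalize_text` and `stream_words` helpers are hypothetical names introduced here for illustration, and the symbol-to-word mapping is an assumption that should be adapted to the target language. The second helper only imitates streaming input by handing the already complete text over one word at a time.

```python
import re
import unicodedata

# Hypothetical replacement table (an assumption, not part of the demo):
# maps non-punctuation symbols to words a TTS model can pronounce.
SPOKEN_FORMS = {
    "&": " and ",
    "%": " percent ",
    "@": " at ",
    "+": " plus ",
    "=": " equals ",
}


def normalize_text(text: str) -> str:
    """Return a cleaned copy of `text` that is easier to synthesize."""
    # NFKC folds compatibility characters (full-width letters, ligatures)
    # into their plain equivalents.
    text = unicodedata.normalize("NFKC", text)
    # Spell out each symbol listed in the table.
    for symbol, spoken in SPOKEN_FORMS.items():
        text = text.replace(symbol, spoken)
    # Collapse whitespace runs left behind by the replacements.
    return re.sub(r"\s+", " ", text).strip()


def stream_words(text: str):
    """Yield the full, upfront text one word at a time."""
    for word in text.split(" "):
        if word:
            yield word


if __name__ == "__main__":
    cleaned = normalize_text("Tom & Jerry  scored 90%")
    for word in stream_words(cleaned):
        print(word)
```

Keeping normalization separate from the word generator means the full text is cleaned once, before any of it is fed to the model.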

Speaker

CFG: 1.5, Inference Steps: 5

Model Generated Audio: 0.00s, Audio Played: 0.00s

Runtime Logs