AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data

Audio Samples

All audio samples use MelGAN as the vocoder.

Audio Quality

When a man looks for something beyond his reach, his friends say he is looking for the pot of gold at the end of the rainbow.

VCTK Female

GT | GT Mel+Vocoder | Joint-training
PPG-based | AdaSpeech | Ours (AdaSpeech 2)

VCTK Male

GT | GT Mel+Vocoder | Joint-training
PPG-based | AdaSpeech | Ours (AdaSpeech 2)

Some have accepted it as a miracle without physical explanation.

VCTK Female

GT | GT Mel+Vocoder | Joint-training
PPG-based | AdaSpeech | Ours (AdaSpeech 2)

VCTK Male

GT | GT Mel+Vocoder | Joint-training
PPG-based | AdaSpeech | Ours (AdaSpeech 2)

The invention of movable metal letters in the middle of the fifteenth century may justly be considered as the invention of the art of printing.

LJSpeech

GT | GT Mel+Vocoder | Joint-training
PPG-based | AdaSpeech | Ours (AdaSpeech 2)

Analyses on adaptation strategy

When a man looks for something beyond his reach, his friends say he is looking for the pot of gold at the end of the rainbow.

Origin | Without L2 loss constraint | Fine-tune mel encoder & decoder

Varying Adaptation Data

Please call Stella.

1 sample | 2 samples | 5 samples | 10 samples
20 samples | 50 samples | 100 samples
