Audio Samples
All audio samples use MelGAN as the vocoder.
Audio Quality
When a man looks for something beyond his reach, his friends say he is looking for the pot of gold at the end of the rainbow.
VCTK Female
GT | GT Mel+Vocoder | Joint-training | PPG-based | AdaSpeech | Ours (AdaSpeech 2)
VCTK Male
GT | GT Mel+Vocoder | Joint-training | PPG-based | AdaSpeech | Ours (AdaSpeech 2)
Some have accepted it as a miracle without physical explanation.
VCTK Female
GT | GT Mel+Vocoder | Joint-training | PPG-based | AdaSpeech | Ours (AdaSpeech 2)
VCTK Male
GT | GT Mel+Vocoder | Joint-training | PPG-based | AdaSpeech | Ours (AdaSpeech 2)
The invention of movable metal letters in the middle of the fifteenth century may justly be considered as the invention of the art of printing.
LJSpeech
GT | GT Mel+Vocoder | Joint-training | PPG-based | AdaSpeech | Ours (AdaSpeech 2)
Analyses on Adaptation Strategy
When a man looks for something beyond his reach, his friends say he is looking for the pot of gold at the end of the rainbow.
Origin | Without L2 loss constraint | Fine-tune mel encoder & decoder
Varying Adaptation Data
Please call Stella.
1 sample | 2 samples | 5 samples | 10 samples | 20 samples | 50 samples | 100 samples