MultiSpeech: Multi-Speaker Text to Speech with Transformer

Authors

TTS Audio Samples in the Paper

Experiments on VCTK and LibriTTS

VCTK speaker: Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob.

GT (ground truth) | GT mel + WaveNet | Transformer-based TTS | MultiSpeech

LibriTTS speaker: The hectic flushed into her thin cheeks, but her voice sounded calm as before.

GT (ground truth) | GT mel + WaveNet | Transformer-based TTS | MultiSpeech

Audio samples from the ablation study on VCTK

VCTK speaker: Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob.

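MultiSpeech | -DC | -LN | -PB | -DC-LN-PB
(-DC, -LN and -PB denote MultiSpeech without the diagonal constraint on encoder-decoder attention, without layer normalization on the phoneme embedding, and without the bottleneck in the decoder pre-net, respectively; -DC-LN-PB removes all three.)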

Audio samples of FastSpeech (with MultiSpeech as the teacher model)

VCTK speaker: People look, but no one ever finds it.

GT (ground truth) | MultiSpeech | FastSpeech

Almost Unsupervised Text to Speech and Automatic Speech Recognition
FastSpeech: Fast, Robust and Controllable Text to Speech
Semi-Supervised Neural Architecture Search
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition
FastSpeech 2: Fast and High-Quality End-to-End Text-to-Speech
UWSpeech: Speech to Speech Translation for Unwritten Languages
Denoising Text to Speech with Frame-Level Noise Modeling