MultiSpeech: Multi-Speaker Text to Speech with Transformer
Authors
- Mingjian Chen (Peking University) milk@pku.edu.cn
- Xu Tan (Microsoft Research) xuta@microsoft.com
- Yi Ren (Zhejiang University) rayeren@zju.edu.cn
- Jin Xu (Tsinghua University) j-xu18@mails.tsinghua.edu.cn
- Hao Sun (Peking University) sigmeta@pku.edu.cn
- Sheng Zhao (Microsoft STC Asia) Sheng.Zhao@microsoft.com
- Tao Qin (Microsoft Research) taoqin@microsoft.com
- Tie-Yan Liu (Microsoft Research) tyliu@microsoft.com
TTS Audio Samples in the Paper
Experiments on VCTK and LibriTTS
VCTK speaker : Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob.
GT | GT mel + WaveNet | Transformer based TTS | MultiSpeech |
---|---|---|---|
LibriTTS speaker : The hectic flushed into her thin cheeks, but her voice sounded calm as before.
GT | GT mel + WaveNet | Transformer based TTS | MultiSpeech |
---|---|---|---|
Audio samples from the ablation study on VCTK
VCTK speaker : Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob.
MultiSpeech | -DC | -LN | -PB | -DC-LN-PB |
---|---|---|---|---|
Audio samples from FastSpeech (with MultiSpeech as the teacher model)
VCTK speaker : People look, but no one ever finds it.
GT | MultiSpeech | FastSpeech |
---|---|---|
Our Related Work
Almost Unsupervised Text to Speech and Automatic Speech Recognition
FastSpeech: Fast, Robust and Controllable Text to Speech
Semi-Supervised Neural Architecture Search
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition
FastSpeech 2: Fast and High-Quality End-to-End Text-to-Speech
UWSpeech: Speech to Speech Translation for Unwritten Languages
Denoising Text to Speech with Frame-Level Noise Modeling