Speech Research

This page lists some of the speech-related research at Microsoft Research Asia, conducted by the team led by Xu Tan. The research topics include text to speech, singing voice synthesis, music generation, and automatic speech recognition. Some of this research is open-sourced via NeuralSpeech and Muzic.

We are hiring researchers in speech, NLP, and deep learning at Microsoft Research Asia. Please contact xuta@microsoft.com if you are interested.

BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis

May 29, 2022

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

May 03, 2022

Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech

April 02, 2022

AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios

March 06, 2022

Speech-T: Transducer for Text to Speech and Beyond

October 06, 2021

TeleMelody: Lyric-to-Melody Generation with a Template-Based Two-Stage Method

September 21, 2021

DeepRapper: Neural Rap Generation with Rhyme and Rhythm Modeling

August 16, 2021

PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Driven Adaptive Prior

June 11, 2021

AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style

June 02, 2021

AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data

March 05, 2021

AdaSpeech: Adaptive Text to Speech for Custom Voice

March 01, 2021

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

February 10, 2021

SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint

December 14, 2020

LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search

November 03, 2020

DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling

October 14, 2020

HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis

September 02, 2020

PopMAG: Pop Music Accompaniment Generation

August 01, 2020

UWSpeech: Speech to Speech Translation for Unwritten Languages

June 12, 2020

MultiSpeech: Multi-Speaker Text to Speech with Transformer

May 09, 2020

Semi-Supervised Neural Architecture Search

March 01, 2020

DeepSinger: Singing Voice Synthesis with Data Mined From the Web

February 14, 2020

LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition

February 02, 2020

FastSpeech: Fast, Robust and Controllable Text to Speech

May 10, 2019

Almost Unsupervised Text to Speech and Automatic Speech Recognition

April 10, 2019