Semi-Supervised Neural Architecture Search

ArXiv: arXiv:2002.10389

Authors

Renqian Luo (University of Science and Technology of China) lrq@mail.ustc.edu.cn
Xu Tan (Microsoft Research) xuta@microsoft.com
Rui Wang (Microsoft Research) ruiwa@microsoft.com
Tao Qin (Microsoft Research) taoqin@microsoft.com
Enhong Chen (University of Science and Technology of China) cheneh@ustc.edu.cn
Tie-Yan Liu (Microsoft Research) tyliu@microsoft.com

Abstract

Neural architecture search (NAS) relies on a good controller to generate better architectures or predict the accuracy of given architectures. However, training the controller requires both abundant and high-quality pairs of architectures and their accuracy, while it is costly to evaluate an architecture and obtain its accuracy. In this paper, we propose SemiNAS, a semi-supervised NAS approach that leverages numerous unlabeled architectures (without evaluation and thus nearly no cost) to improve the controller. Specifically, SemiNAS 1) trains an initial controller with a small set of architecture-accuracy data pairs; 2) uses the trained controller to predict the accuracy of large amount of architectures~(without evaluation); and 3) adds the generated data pairs to the original data to further improve the controller. SemiNAS has two advantages: 1) It reduces the computational cost under the same accuracy guarantee. 2) It achieves higher accuracy under the same computational cost. On NASBench-101 benchmark dataset, it discovers a top 0.01% architecture after evaluating roughly 300 architectures, with only 1/7 computational cost compared with regularized evolution and gradient-based methods. On ImageNet, it achieves a state-of-the-art top-1 error rate of 23.5% (under the mobile setting) using 4 GPU-days for search. We further apply it to LJSpeech text to speech task and it achieves 97% intelligibility rate in the low-resource setting and 15% test error rate in the robustness setting, with 9%, 7% improvements over the baseline respectively.

Model Architecture

Low-Resource Setting

Robustness Setting

Audio Samples

All of the audio samples use Griffin-Lim as vocoder.

Low-Resource Setting

*The good it tried to do took active shape in the establishment of temporary refuges - at hoxton for males and in the hackney road for females

GT(Griffin-Lim)	Transformer TTS	SemiNAS

*But shall also tend more effectually to preserve the health

GT(Griffin-Lim)	Transformer TTS	SemiNAS

*Hired a room for the night and morning which he and a large party of friends occupied before and during the execution

GT(Griffin-Lim)	Transformer TTS	SemiNAS

*Every link in that great human chain is shaken along the whole lengthened line has the motion jarred and each in turn sees

GT(Griffin-Lim)	Transformer TTS	SemiNAS

*This phial he had managed to retain in his possession in spite of the frequent searches to which he was subjected in newgate

GT(Griffin-Lim)	Transformer TTS	SemiNAS

*No time was lost in carrying out the dread ceremony but it was not completed without some of the officials turning sick and the moment it was over

GT(Griffin-Lim)	Transformer TTS	SemiNAS

*No other employee has been found who saw oswald enter that morning

GT(Griffin-Lim)	Transformer TTS	SemiNAS

*Three months prior to his regularly scheduled separation date ostensibly to care for his mother who had been injured in an accident at her work

GT(Griffin-Lim)	Transformer TTS	SemiNAS

*Oswald’s activities with regard to cuba raise serious questions as to how much he might have been motivated in the assassination

GT(Griffin-Lim)	Transformer TTS	SemiNAS

*Lawson also in the lead car did not scan any buildings since an important part of his job was to look backward at the president’s car

GT(Griffin-Lim)	Transformer TTS	SemiNAS

Robustness Setting

*Allergic trouser.

Transformer TTS	SemiNAS

*Christmas is coming.

Transformer TTS	SemiNAS

*Nineteen twenty is when we are unique together until we realise we are all the same.

Transformer TTS	SemiNAS

Neural Architecture Optimization
FastSpeech: Fast, Robust and Controllable Text to Speech
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition
Almost Unsupervised Text to Speech and Automatic Speech Recognition

Semi-Supervised Neural Architecture Search

Authors

Abstract

Model Architecture

Low-Resource Setting

Robustness Setting

Audio Samples

Low-Resource Setting

Robustness Setting

Our Related Works