Semi-Supervised Neural Architecture Search
ArXiv: arXiv:2002.10389
Authors
- Renqian Luo (University of Science and Technology of China) lrq@mail.ustc.edu.cn
- Xu Tan (Microsoft Research) xuta@microsoft.com
- Rui Wang (Microsoft Research) ruiwa@microsoft.com
- Tao Qin (Microsoft Research) taoqin@microsoft.com
- Enhong Chen (University of Science and Technology of China) cheneh@ustc.edu.cn
- Tie-Yan Liu (Microsoft Research) tyliu@microsoft.com
Abstract
Neural architecture search (NAS) relies on a good controller to generate better architectures or predict the accuracy of given architectures. However, training the controller requires both abundant and high-quality pairs of architectures and their accuracy, while it is costly to evaluate an architecture and obtain its accuracy. In this paper, we propose SemiNAS, a semi-supervised NAS approach that leverages numerous unlabeled architectures (without evaluation and thus nearly no cost) to improve the controller. Specifically, SemiNAS 1) trains an initial controller with a small set of architecture-accuracy data pairs; 2) uses the trained controller to predict the accuracy of large amount of architectures~(without evaluation); and 3) adds the generated data pairs to the original data to further improve the controller. SemiNAS has two advantages: 1) It reduces the computational cost under the same accuracy guarantee. 2) It achieves higher accuracy under the same computational cost. On NASBench-101 benchmark dataset, it discovers a top 0.01% architecture after evaluating roughly 300 architectures, with only 1/7 computational cost compared with regularized evolution and gradient-based methods. On ImageNet, it achieves a state-of-the-art top-1 error rate of 23.5% (under the mobile setting) using 4 GPU-days for search. We further apply it to LJSpeech text to speech task and it achieves 97% intelligibility rate in the low-resource setting and 15% test error rate in the robustness setting, with 9%, 7% improvements over the baseline respectively.
Model Architecture
Low-Resource Setting
 
Robustness Setting
 
Audio Samples
All of the audio samples use Griffin-Lim as vocoder.
Low-Resource Setting
*The good it tried to do took active shape in the establishment of temporary refuges - at hoxton for males and in the hackney road for females
| GT(Griffin-Lim) | Transformer TTS | SemiNAS | 
|---|---|---|
*But shall also tend more effectually to preserve the health
| GT(Griffin-Lim) | Transformer TTS | SemiNAS | 
|---|---|---|
*Hired a room for the night and morning which he and a large party of friends occupied before and during the execution
| GT(Griffin-Lim) | Transformer TTS | SemiNAS | 
|---|---|---|
*Every link in that great human chain is shaken along the whole lengthened line has the motion jarred and each in turn sees
| GT(Griffin-Lim) | Transformer TTS | SemiNAS | 
|---|---|---|
*This phial he had managed to retain in his possession in spite of the frequent searches to which he was subjected in newgate
| GT(Griffin-Lim) | Transformer TTS | SemiNAS | 
|---|---|---|
*No time was lost in carrying out the dread ceremony but it was not completed without some of the officials turning sick and the moment it was over
| GT(Griffin-Lim) | Transformer TTS | SemiNAS | 
|---|---|---|
*No other employee has been found who saw oswald enter that morning
| GT(Griffin-Lim) | Transformer TTS | SemiNAS | 
|---|---|---|
*Three months prior to his regularly scheduled separation date ostensibly to care for his mother who had been injured in an accident at her work
| GT(Griffin-Lim) | Transformer TTS | SemiNAS | 
|---|---|---|
*Oswald’s activities with regard to cuba raise serious questions as to how much he might have been motivated in the assassination
| GT(Griffin-Lim) | Transformer TTS | SemiNAS | 
|---|---|---|
*Lawson also in the lead car did not scan any buildings since an important part of his job was to look backward at the president’s car
| GT(Griffin-Lim) | Transformer TTS | SemiNAS | 
|---|---|---|
Robustness Setting
*Allergic trouser.
| Transformer TTS | SemiNAS | 
|---|---|
*Christmas is coming.
| Transformer TTS | SemiNAS | 
|---|---|
*Nineteen twenty is when we are unique together until we realise we are all the same.
| Transformer TTS | SemiNAS | 
|---|---|
Our Related Works
Neural Architecture Optimization
FastSpeech: Fast, Robust and Controllable Text to Speech
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition
Almost Unsupervised Text to Speech and Automatic Speech Recognition