Articulatory And Spectrum Features Integration Using Generalized Distillation Framework
Jianguo Yu, University of Aizu
Konstantin Markov, University of Aizu
Tomoko Matsui, Institute of Statistical Mathematics

Abstract:
It has been shown that combining acoustic and articulatory information can yield significant performance improvements in automatic speech recognition (ASR). In practice, however, articulatory information is not available during recognition, and the general approach is to estimate it from the acoustic signal. In this paper, we propose a different approach based on the generalized distillation framework, in which acoustic-articulatory inversion is not necessary. We trained two DNN models. The first, called the "teacher", learns from both acoustic and articulatory features. The second, called the "student", is trained on acoustic features only, but its training process is guided by the teacher model. Even without articulatory feature inputs at test time, the student reaches a performance that cannot be obtained by regular training. The paper is organized as follows: Section 1 gives the introduction and briefly discusses related work, Section 2 describes the distillation training process, Section 3 describes the ASR system used in this paper, Section 4 presents the experiments, and Section 5 concludes the paper.
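The teacher/student training described in the abstract can be illustrated with a per-frame loss. The sketch below assumes the generalized-distillation objective as commonly stated (Lopez-Paz et al., 2016): the student's cross-entropy against the hard label is mixed with its cross-entropy against the teacher's temperature-softened posteriors. The imitation weight `lam`, the temperature, and all function names are illustrative assumptions, not the exact formulation or settings used in this paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of a list of logits at a given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, pred_probs, eps=1e-12):
    """Cross-entropy H(target, pred) between two discrete distributions."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, pred_probs))


def distillation_loss(student_logits, teacher_logits, label, lam=0.5, temperature=2.0):
    """Generalized-distillation loss for one frame (hypothetical helper).

    student_logits: student DNN outputs computed from acoustic features only.
    teacher_logits: teacher DNN outputs computed from acoustic + articulatory
                    features (these are needed only at training time).
    label:          index of the hard target class for this frame.
    lam, temperature: imitation weight and softening temperature
                      (illustrative values, not taken from the paper).
    """
    num_classes = len(student_logits)
    hard_target = [1.0 if k == label else 0.0 for k in range(num_classes)]
    soft_target = softmax(teacher_logits, temperature)  # teacher's softened posteriors
    student_probs = softmax(student_logits)             # student posteriors (T = 1)
    hard_term = cross_entropy(hard_target, student_probs)
    soft_term = cross_entropy(soft_target, student_probs)
    return (1.0 - lam) * hard_term + lam * soft_term


if __name__ == "__main__":
    # Toy 3-class frame in which the student and teacher disagree on the top class.
    print(distillation_loss([2.0, 0.5, -1.0], [1.0, 3.0, 0.0], label=0, lam=0.5, temperature=2.0))
```

Setting `lam=0.0` reduces this to ordinary training on hard labels, while a larger `lam` makes the student imitate the teacher more closely. Because the teacher's outputs enter only the loss, the student alone, which sees acoustic features only, is used at recognition time.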