Retrieving Sounds By Vocal Imitation Recognition
Yichi Zhang, Zhiyao Duan

Abstract:
Vocal imitation is widely used in human communication. In this paper, we propose an approach to automatically recognize the concept of a vocal imitation, and then retrieve sounds of this concept. Because different acoustic aspects (e.g., pitch, loudness, timbre) are emphasized in imitating different sounds, a key challenge in vocal imitation recognition is to extract appropriate features. Hand-crafted features may not work well for a large variety of imitations. Instead, we use a stacked auto-encoder to automatically learn features from a set of vocal imitations in an unsupervised way. Then, a multi-class SVM is trained for sound concepts of interest using their training imitations. Given a new vocal imitation of a sound concept of interest, our system can recognize its underlying concept and return it with a high rank among all concepts. Experiments show that our system significantly outperforms an MFCC-based comparison system in both classification and retrieval.