Text-Informed Audio Source Separation Using Nonnegative Matrix Partial Co-Factorization
Luc Le Magoarou, Alexey Ozerov, Ngoc Q. K. Duong

We consider the single-channel source separation problem of separating speech from a nonstationary background such as music. We introduce a novel approach called {\it text-informed separation}, in which the source separation process is guided by the text corresponding to the speech. First, given the text, we produce a speech example using either a speech synthesizer or a human speaker. We then use this example to guide the separation and, for that purpose, introduce a new variant of the nonnegative matrix partial co-factorization (NMPCF) model based on a so-called {\it excitation-filter-channel} speech model. The proposed NMPCF model allows linguistic information to be shared between the example speech and the speech in the mixture. We derive the corresponding multiplicative update (MU) rules for parameter estimation. Experimental results on different types of mixtures and speech examples show the effectiveness of the proposed approach.

Keywords:
Informed audio source separation, text information, nonnegative matrix partial co-factorization, source-filter model
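
To illustrate the general idea of partial co-factorization with multiplicative updates mentioned in the abstract, the following is a minimal sketch, not the authors' model. It assumes a plain two-dictionary factorization with a Euclidean cost, where the mixture spectrogram `V` is approximated by `W_s @ H_s + W_b @ H_b` and the example spectrogram `Y` by `W_s @ H_e`, so that only the speech dictionary `W_s` is shared. The excitation-filter-channel structure of the proposed model is deliberately omitted, and the function name `nmpcf_sketch`, its parameters, and the choice of cost function are all hypothetical.

```python
import numpy as np


def nmpcf_sketch(V, Y, n_speech=6, n_background=3, n_iter=100, seed=0):
    """Simplified partial co-factorization (assumption: Euclidean cost).

    V : nonnegative mixture spectrogram, shape (F, N)
    Y : nonnegative example-speech spectrogram, shape (F, M)
    Returns the masked speech estimate, the soft mask, and the cost history.
    """
    eps = 1e-12
    rng = np.random.default_rng(seed)
    F, N = V.shape
    M = Y.shape[1]

    # Random nonnegative initialization of all factors.
    W_s = rng.random((F, n_speech)) + eps      # shared speech dictionary
    W_b = rng.random((F, n_background)) + eps  # background dictionary
    H_s = rng.random((n_speech, N)) + eps      # speech activations in the mixture
    H_b = rng.random((n_background, N)) + eps  # background activations
    H_e = rng.random((n_speech, M)) + eps      # speech activations in the example

    def cost():
        # Joint squared error over the mixture and the example.
        return (np.sum((V - W_s @ H_s - W_b @ H_b) ** 2)
                + np.sum((Y - W_s @ H_e) ** 2))

    costs = [cost()]
    for _ in range(n_iter):
        # Shared dictionary: numerator and denominator sum over both signals.
        V_hat = W_s @ H_s + W_b @ H_b
        Y_hat = W_s @ H_e
        W_s *= (V @ H_s.T + Y @ H_e.T) / (V_hat @ H_s.T + Y_hat @ H_e.T + eps)

        # Background dictionary: fitted to the mixture only.
        V_hat = W_s @ H_s + W_b @ H_b
        W_b *= (V @ H_b.T) / (V_hat @ H_b.T + eps)

        # Activations, each refreshed against the current model.
        V_hat = W_s @ H_s + W_b @ H_b
        H_s *= (W_s.T @ V) / (W_s.T @ V_hat + eps)
        V_hat = W_s @ H_s + W_b @ H_b
        H_b *= (W_b.T @ V) / (W_b.T @ V_hat + eps)
        H_e *= (W_s.T @ Y) / (W_s.T @ (W_s @ H_e) + eps)

        costs.append(cost())

    # Soft mask giving the fraction of each bin attributed to speech.
    V_hat = W_s @ H_s + W_b @ H_b
    mask = (W_s @ H_s) / (V_hat + eps)
    return V * mask, mask, costs


# Toy usage on synthetic data (hypothetical sizes, not real audio).
rng = np.random.default_rng(1)
W_true = rng.random((20, 4))
Y_demo = W_true @ rng.random((4, 15))
V_demo = W_true @ rng.random((4, 25)) + rng.random((20, 2)) @ rng.random((2, 25))
speech_demo, mask_demo, costs_demo = nmpcf_sketch(V_demo, Y_demo)
```

In this sketch the coupling between the two factorizations comes only from the update of `W_s`, whose numerator and denominator accumulate contributions from both `V` and `Y`. A model closer to the one described above would instead share the components that carry linguistic content while letting other components, such as pitch or recording conditions, differ between the example and the mixture.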