A Hypothesis Testing Approach For Real-Time Multichannel Speech Separation Using Time-Frequency Masks
Ryan Michael Corey, University of Illinois at Urbana-Champaign
Andrew Carl Singer, University of Illinois at Urbana-Champaign

We propose a new approach to time-frequency mask generation for real-time multichannel speech separation. Whereas conventional approaches select the strongest source in each time-frequency bin, we perform a binary hypothesis test to determine whether a target source is present. We derive a generalized likelihood ratio test and extend it to underdetermined mixtures by aggregating the outputs of several tests with different interference models. This approach is justified by the nonstationarity and time-frequency disjointedness of speech signals. The method is computationally simple and therefore suitable for real-time source separation in resource-constrained and latency-critical applications.
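To make the per-bin test concrete, the following is a minimal sketch of one way a binary mask could be generated from a generalized likelihood ratio test. It is not taken from the paper: it assumes a textbook model in which each bin's multichannel observation is zero-mean complex Gaussian with a known interference-plus-noise covariance `R`, the target has a known steering vector `a`, and the target amplitude is an unknown deterministic scalar. The function names, the threshold, and the logical-AND aggregation rule are all hypothetical choices made for illustration.

```python
import numpy as np


def glrt_statistic(X, a, R):
    """GLRT statistic |a^H R^-1 x|^2 / (a^H R^-1 a) for each column x of X.

    X: (M, N) complex array, one M-channel observation per time-frequency bin.
    a: (M,) assumed steering vector of the target source.
    R: (M, M) assumed Hermitian interference-plus-noise covariance.
    """
    w = np.linalg.solve(R, a)            # w = R^-1 a
    num = np.abs(w.conj() @ X) ** 2      # |a^H R^-1 x|^2, using R Hermitian
    den = np.real(a.conj() @ w)          # a^H R^-1 a, real and positive
    return num / den


def binary_mask(X, a, R, threshold):
    """Declare the target present in a bin when the statistic exceeds the threshold."""
    return glrt_statistic(X, a, R) > threshold


def aggregate_masks(X, a, covariances, threshold):
    """Hypothetical aggregation rule (an assumption, not the paper's):
    keep a bin only if every interference model declares the target present."""
    masks = [binary_mask(X, a, R, threshold) for R in covariances]
    return np.logical_and.reduce(masks)


# Two channels, two bins: the first bin is aligned with the target's
# steering vector, the second is orthogonal to it.
a = np.array([1.0, 1.0], dtype=complex)
X = np.array([[3.0, 1.0],
              [3.0, -1.0]], dtype=complex)
models = [np.eye(2, dtype=complex), 2.0 * np.eye(2, dtype=complex)]

stats = glrt_statistic(X, a, models[0])
mask = aggregate_masks(X, a, models, threshold=3.0)
```

Under these assumed Gaussian statistics, the test statistic is exponentially distributed with unit mean when the target is absent, so a threshold of `-log(p_fa)` would give a per-bin false-alarm probability of `p_fa`. Each bin costs one inner product once `R^-1 a` is precomputed, which is consistent with the low computational cost the abstract emphasizes.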