A Neural Network Based Algorithm For Speaker Localization In A Multi-Room Environment
Fabio Vesperini, UniversitÓ Politecnica delle Marche
Paolo Vecchiotti, UniversitÓ Politecnica delle Marche
Emanuele Principi, UniversitÓ Politecnica delle Marche
Stefano Squartini, UniversitÓ Politecnica delle Marche
Francesco Piazza, UniversitÓ Politecnica delle Marche

Abstract:
A Speaker Localization algorithm based on Neural Networks for multi-room domestic scenarios is proposed in this paper. The approach is fully data-driven and employs a Neural Network fed by GCC-PHAT (Generalized Cross Correlation Phase Transform) Pat-terns, calculated by means of the microphone signals, to determine the speaker position in the room under analysis. In particular, we deal with a multi-room case study, in which the acoustic scene of each room is influenced by sounds emitted in the other rooms. The algorithm is tested against the home recorded DIRHA dataset, characterized by multiple wall and ceiling microphone signals for each room. In particular, we focused on the speaker localization problem in two distinct neighbouring rooms. We assumed the presence of an Oracle multi-room Voice Activity Detector (VAD) in our experiments. A three-stage optimization procedure has been adopted to find the best network configuration and GCC-PHAT Patterns combination. Moreover, an algorithm based on Time Difference of Arrival (TDOA), recently proposed in literature for the addressed applicative context, has been considered as term of comparison. As result, the proposed algorithm outperforms the reference one, providing an average localization error, expressed in terms of RMSE, equal to 525 mm against "465 mm. Concluding, we also assessed the algorithm performance when a real VAD, recently proposed by some of the authors, is used. Even though a degradation of localization capability is registered (an average RMSE equal to 770 mm), still a remarkable improvement with respect to the state of the art performance is obtained.