Multimodal Factor Analysis
Yasin Yilmaz, Alfred O. Hero

A multimodal system with Poisson, Gaussian, and multinomial observations is considered. A generative graphical model that combines multiple modalities through common factor loadings is proposed. In this model, latent factors act as summary objects that have latent factor scores in each modality, and the observed objects are represented in terms of these summary objects. This representation can yield a significant dimensionality reduction. It also naturally enables clustering based on a diverse set of observations. An expectation-maximization (EM) algorithm for estimating the model parameters is provided. The algorithm is tested on a Twitter dataset that consists of the counts and geographical coordinates of hashtag occurrences, together with the bag of words for each hashtag. The resulting factors successfully localize the hashtags in all dimensions: counts, coordinates, and topics. The algorithm is also extended to accommodate the von Mises-Fisher distribution, which is used to model the spherical coordinates.
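
As a rough illustration of the generative idea described above, the following Python sketch samples Poisson counts, Gaussian coordinates, and multinomial word counts for a few objects that share one set of factor loadings. The parameterization is an assumption made for illustration only (Dirichlet-distributed loadings, per-object parameters formed as loading-weighted combinations of per-factor scores, a fixed isotropic coordinate noise), and the function name `sample_multimodal` and all of its arguments are hypothetical. It is not the model specification or the EM algorithm of the paper.

```python
import numpy as np


def sample_multimodal(n_objects=5, n_factors=3, vocab_size=8,
                      n_words=20, noise_std=0.1, seed=0):
    """Sample toy multimodal observations that share common factor loadings.

    Assumed (hypothetical) parameterization, not taken from the paper:
    each object's parameters are loading-weighted mixes of factor scores.
    """
    rng = np.random.default_rng(seed)

    # Common factor loadings: one row per object, reused by every modality.
    loadings = rng.dirichlet(np.ones(n_factors), size=n_objects)  # (N, K)

    # Per-modality factor scores, one set per latent factor.
    rates = rng.gamma(shape=2.0, scale=50.0, size=n_factors)      # Poisson rates
    means = rng.uniform(-1.0, 1.0, size=(n_factors, 2))           # 2-D locations
    topics = rng.dirichlet(np.ones(vocab_size), size=n_factors)   # word distributions

    # Poisson modality: counts with loading-weighted rates.
    counts = rng.poisson(loadings @ rates)                        # (N,)

    # Gaussian modality: coordinates around loading-weighted means.
    coords = loadings @ means + noise_std * rng.standard_normal((n_objects, 2))

    # Multinomial modality: bag of words from loading-weighted topics.
    word_probs = loadings @ topics
    word_probs /= word_probs.sum(axis=1, keepdims=True)  # guard against rounding
    words = np.array([rng.multinomial(n_words, p) for p in word_probs])

    return loadings, counts, coords, words


if __name__ == "__main__":
    loadings, counts, coords, words = sample_multimodal()
    print("loadings shape:", loadings.shape)
    print("counts:", counts)
    print("coords shape:", coords.shape)
    print("words shape:", words.shape)
```

Because the same `loadings` matrix drives all three modalities in this sketch, objects with similar loadings tend to have similar counts, locations, and word usage, which is the intuition behind clustering on the loadings.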