MOGT: Oversampling with a Parsimonious Mixture of Gaussian Trees Model for Imbalanced Time-Series Classification
John Zhen Fu Pang, Hong Cao, Vincent Yan Fu Tan

Abstract:
We propose a novel framework that uses a parsimonious statistical model, known as a mixture of Gaussian trees, to model the possibly multi-modal minority class and thereby address the problem of imbalanced time-series binary classification. By exploiting the fact that nearby time points are highly correlated, our model reduces the number of covariance parameters to be estimated from O(d^2) to O(Ld), where L denotes the number of mixture components and d is the dimension. Our model is therefore particularly effective for modelling high-dimensional time-series when the minority positive class has only a limited number of instances. We conduct extensive classification experiments on several well-known time-series datasets (both single- and multi-modal), first randomly generating synthetic instances from our learned mixture model to correct the imbalance. We then compare our results with several state-of-the-art oversampling techniques. The results demonstrate that, when our proposed model is used, the same support vector machine classifier achieves much better classification accuracy across the range of datasets. In fact, the proposed method achieves the best average performance on 27 of 30 multi-modal datasets according to the F-value metric.
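To make the parameter reduction concrete, the following is a minimal Python sketch and not the implementation used in the paper. It assumes that each Gaussian tree stores d variances plus one covariance per tree edge (d - 1 edges), and that a synthetic instance is drawn by sampling the root and then each node conditioned on its parent. The function names, the `parent` array encoding and the per-edge correlation list `rho` are hypothetical choices made for illustration only.

```python
import math
import random


def covariance_param_counts(d, L):
    """Return (full, tree_mixture) covariance parameter counts.

    Assumption: a full covariance matrix has d(d+1)/2 free entries, while
    each Gaussian tree keeps d variances and d - 1 edge covariances.
    """
    full = d * (d + 1) // 2
    per_tree = d + (d - 1)
    return full, L * per_tree


def sample_gaussian_tree(mean, std, parent, rho, rng):
    """Draw one synthetic instance from a single Gaussian tree.

    parent[i] is the index of node i's parent and must be smaller than i;
    the root has parent -1. rho[i] is the correlation between node i and
    its parent (ignored for the root).
    """
    x = [0.0] * len(mean)
    for i in range(len(mean)):
        p = parent[i]
        if p < 0:
            # Root node: sample from its marginal distribution.
            x[i] = rng.gauss(mean[i], std[i])
        else:
            # Child node: sample from the Gaussian conditional on the parent.
            cond_mean = mean[i] + rho[i] * std[i] / std[p] * (x[p] - mean[p])
            cond_std = std[i] * math.sqrt(max(0.0, 1.0 - rho[i] ** 2))
            x[i] = rng.gauss(cond_mean, cond_std)
    return x


def sample_mixture(weights, trees, rng):
    """Pick a component by its mixture weight, then sample from its tree."""
    k = rng.choices(range(len(trees)), weights=weights)[0]
    mean, std, parent, rho = trees[k]
    return sample_gaussian_tree(mean, std, parent, rho, rng)
```

For example, under these assumptions `covariance_param_counts(100, 3)` gives 5050 parameters for a full covariance matrix against 597 for a three-component mixture of trees. A chain-shaped tree linking consecutive time points can be encoded as `parent = [-1, 0, 1, ..., d - 2]`.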