Randomized Robust Subspace Recovery For Big Data
Mostafa Rahmani, George Atia

Abstract:
In this paper, we propose a randomized PCA algorithm that is robust to the presence of outliers and whose complexity is independent of the dimension of the given data matrix. Using random sampling and random embedding techniques, the given data matrix is reduced to a small compressed data matrix. A subspace learning approach is proposed to extract the column space of the low-rank matrix from the compressed data. Two ideas for robust subspace learning are proposed, each working under a different model assumption. The first is based on the linear dependence between the columns of the low-rank matrix, and the second is based on the independence between the column space of the low-rank matrix and the subspace spanned by the outlying columns. We derive sufficient conditions that guarantee the performance of the proposed approach with high probability. It is shown that the proposed algorithm can successfully identify the outliers using only roughly $\mathcal{O}(r^2)$ random linear data observations, where $r$ is the rank of the low-rank matrix, and that it provably achieves notable speedups in comparison to existing approaches.
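
To make the first idea concrete, the following is a minimal illustrative sketch and not the algorithm proposed in the paper. It assumes a toy setting in which inlier columns lie exactly in an $r$-dimensional subspace and outlier columns are random. A Gaussian random embedding compresses each column, and a column is flagged as an outlier when it is not linearly dependent on the remaining compressed columns. All names (`residual_norm`, `detect_outliers`, `demo`), the leave-one-out residual test, and the sizes and threshold are assumptions chosen for illustration only.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residual_norm(target, others, tol=1e-9):
    """Relative distance from `target` to span(`others`) (modified Gram-Schmidt)."""
    basis = []
    for v in others:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        # Skip vectors that are numerically dependent on the current basis.
        if norm > tol * math.sqrt(dot(v, v)):
            basis.append([wi / norm for wi in w])
    res = list(target)
    for q in basis:
        c = dot(res, q)
        res = [ri - c * qi for ri, qi in zip(res, q)]
    return math.sqrt(dot(res, res)) / math.sqrt(dot(target, target))


def detect_outliers(columns, threshold=1e-6):
    """Flag columns that are NOT in the span of the other columns."""
    flags = []
    for i, col in enumerate(columns):
        others = columns[:i] + columns[i + 1:]
        flags.append(residual_norm(col, others) > threshold)
    return flags


def demo(seed=0):
    rng = random.Random(seed)
    # Hypothetical sizes: ambient dimension N, rank r, embedding dimension m.
    N, r, n_in, n_out, m = 200, 2, 12, 3, 20

    # Inliers: random combinations of r basis vectors in R^N.
    U = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(r)]
    inliers = []
    for _ in range(n_in):
        coef = [rng.gauss(0, 1) for _ in range(r)]
        inliers.append([sum(c * u[k] for c, u in zip(coef, U)) for k in range(N)])

    # Outliers: unstructured random vectors in R^N.
    outliers = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(n_out)]
    data = inliers + outliers

    # Random embedding: each column x is replaced by Phi x, with Phi of size m x N.
    Phi = [[rng.gauss(0, 1) / math.sqrt(m) for _ in range(N)] for _ in range(m)]
    compressed = [[dot(row, col) for row in Phi] for col in data]

    return detect_outliers(compressed)


if __name__ == "__main__":
    print(demo())
```

In this toy setting the embedding dimension `m` only needs to exceed the rank plus the number of outliers for the dependence structure to survive compression, so the test runs on 20-dimensional columns instead of 200-dimensional ones. The exact-dependence test is a simplification: it assumes noiseless inliers and does not reproduce the sampling scheme or the guarantees derived in the paper.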