MLSP 2017
IEEE International Workshop on
Machine Learning for Signal Processing

September 25-28, 2017  Roppongi, Tokyo, Japan

PROGRAM


Monday September 25, 2017
8:30-18:00 | Foyer, ground floor | Registration
9:45-12:30 | Foyer, ground floor, registration desk | Google Tokyo Office visit (pre-registration only)
13:20-13:30 | Iwasaki Koyata Memorial Hall, ground floor | Opening Greeting
Chair: Tomoko Matsui, The Institute of Statistical Mathematics, Japan
13:30-15:00 | Iwasaki Koyata Memorial Hall, ground floor | Tutorial: Machine Learning Methods in Topological Data Analysis

Kenji Fukumizu
The Institute of Statistical Mathematics, Tokyo, Japan

Chair: Tomoko Matsui, The Institute of Statistical Mathematics, Japan

15:00-16:30 | Iwasaki Koyata Memorial Hall, ground floor | Tutorial: Data Fusion through Matrix and Tensor Decompositions: Diversity, Identifiability, and Interpretability

Tülay Adali
University of Maryland Baltimore County, USA

Chair: Raviv Raich, Oregon State University, USA

16:30-17:00 | Lecture Hall, 2nd floor | Coffee Break
17:00-19:00 | Lecture Hall, 2nd floor | Poster Session 1: Machine learning theory and algorithm
Chair: Simo Särkkä, Aalto University, Finland

Asymptotic Performance Of Regularized Quadratic Discriminant Analysis Based Classifiers
Khalil Elkhalil, Abla Kammoun, Romain Couillet, Tareq Al-Naffouri, Mohamed-Slim Alouini

This paper carries out a large dimensional analysis of the standard regularized quadratic discriminant analysis (QDA) classifier designed on the assumption that data arise from a Gaussian mixture model. The analysis relies on fundamental results from random matrix theory (RMT) when both the number of features and the cardinality of the training data within each class grow large at the same pace. Under some mild assumptions, we show that the asymptotic classification error converges to a deterministic quantity that depends only on the covariances and means associated with each class as well as the problem dimensions. Such a result permits a better understanding of the performance of regularized QDA and can be used to determine the optimal regularization parameter that minimizes the misclassification error probability. Despite being valid only for Gaussian data, our theoretical findings are shown to yield high accuracy in predicting the performance achieved on real data sets drawn from popular databases, thereby making an interesting connection between theory and practice.
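For readers who want the decision rule behind this analysis: regularized QDA assigns a test point x to the class with the largest Gaussian discriminant score, computed from regularized class covariance estimates. One common parameterization is shown below; the exact regularization analyzed in the paper may differ.

$$\delta_k(x) = -\tfrac{1}{2}\log\det\hat{\Sigma}_k^{\gamma} - \tfrac{1}{2}(x-\hat{\mu}_k)^\top\big(\hat{\Sigma}_k^{\gamma}\big)^{-1}(x-\hat{\mu}_k) + \log\pi_k, \qquad \hat{\Sigma}_k^{\gamma} = \hat{\Sigma}_k + \gamma I,$$

where $\hat{\mu}_k$, $\hat{\Sigma}_k$ and $\pi_k$ are the sample mean, sample covariance and prior of class $k$, and $\gamma$ is the regularization parameter whose asymptotically optimal value the analysis characterizes.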
 

An important research topic of the recent years has been to understand and analyze manifold-modeled data for clustering and classification applications. Most clustering methods developed for data of non-linear and low-dimensional structure are based on local linearity assumptions. However, clustering algorithms based on locally linear representations can tolerate difficult sampling conditions only to some extent, and may fail for scarcely sampled data manifolds or at high-curvature regions. In this paper, we consider a setting where each cluster is concentrated around a manifold and propose a manifold clustering algorithm that relies on the observation that the variation of the tangent space must be consistent along curves over the same data manifold. In order to achieve robustness against challenges due to noise, manifold intersections, and high curvature, we propose a progressive clustering approach: Observing the variation of the tangent space, we first detect the non-problematic manifold regions and form pre-clusters with the data samples belonging to such reliable regions. Next, these pre-clusters are merged together to form larger clusters with respect to constraints on both the distance and the tangent space variations. Finally, the samples identified as problematic are also assigned to the computed clusters to finalize the clustering. Experiments with synthetic and real datasets show that the proposed method outperforms comparable manifold clustering algorithms based on Euclidean distance and sparse representations.
 

In this paper, we propose a novel approach to automatically identify plant species using the dynamics of plant growth and development, or spatiotemporal evolution model (STEM). The online kernel adaptive autoregressive-moving-average (KAARMA) algorithm, a discrete-time dynamical system in a reproducing kernel Hilbert space (RKHS), is used to learn plant-development syntactic patterns from feature-vector sequences automatically extracted from 2D plant images, generated by stochastic L-systems. Results show that multiclass KAARMA STEM can automatically identify plant species based on growth patterns. Furthermore, finite state machines extracted from the trained KAARMA STEM retain competitive performance and are robust to noise. Automatically constructing an L-system or formal grammar to replicate a spatiotemporal structure is an open problem. This is an important first step to not only identify plants but also to generate realistic plant models automatically from observations.
 

Data in all Signal Processing (SP) applications is being generated super-exponentially, and at an ever increasing rate. A meaningful way to pre-process it so as to achieve feasible computation is by partitioning the data [5]. Indeed, the task of partitioning is one of the most difficult problems in computing, and it has extensive applications in solving real-life problems, especially when the amount of SP data (i.e., images, voices, speakers, libraries etc.) to be processed is prohibitively large. The problem is known to be NP-hard. The benchmark solution for the Equi-Partitioning Problem (EPP) has involved the classic field of Learning Automata (LA), and the corresponding algorithm, the Object Migrating Automata (OMA), has been used in numerous application domains. However, the OMA is a fixed-structure machine and does not incorporate the Pursuit concept that has recently significantly enhanced the field of LA. In this paper, we pioneer the incorporation of the Pursuit concept into the OMA. We do this by a non-intuitive paradigm, namely that of removing (or discarding) from the query stream queries that could be counter-productive. This can be perceived as a filtering agent triggered by a pursuit-based module. The resulting machine, referred to as the Pursuit OMA (POMA), has been rigorously tested in all the standard benchmark environments. Indeed, in certain extreme environments it is almost ten times faster than the original OMA. The application of the POMA to all signal processing applications is extremely promising.
 

We present a simple and computationally efficient algorithm, based on the accelerated Newton's method, to solve the root-finding problem associated with the projection onto the l1-ball. Considering an interpretation of Michelot's algorithm as a Newton method, our algorithm can be understood as an accelerated version of Michelot's algorithm that needs significantly fewer major iterations to converge to the solution. Although the worst-case performance of the proposed algorithm is O(n^2), it exhibits O(n) performance in practice, and it is empirically demonstrated to be competitive with or faster than existing methods.
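As background for the root-finding view used above: the l1-ball projection reduces to soft-thresholding with a threshold that is the root of a piecewise-linear function, and Newton steps on that function converge in finitely many iterations. The sketch below illustrates this plain Newton variant only; the function name and details are illustrative and it is not the authors' accelerated scheme.

    import numpy as np

    def project_l1_ball(v, radius=1.0, tol=1e-10, max_iter=100):
        """Euclidean projection of v onto the l1-ball of the given radius.

        The projection is soft-thresholding with a threshold theta that is the
        root of phi(theta) = sum_i max(|v_i| - theta, 0) - radius; the root is
        found here by Newton iterations on this piecewise-linear function.
        """
        a = np.abs(v)
        if a.sum() <= radius:                    # already inside the ball
            return v.copy()
        theta = (a.sum() - radius) / len(a)      # initial guess: all coords active
        for _ in range(max_iter):
            active = a > theta
            phi = a[active].sum() - active.sum() * theta - radius
            if abs(phi) < tol:
                break
            theta += phi / active.sum()          # Newton step: phi'(theta) = -#active
        return np.sign(v) * np.maximum(a - theta, 0.0)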
 

Variational methods for approximate inference in Bayesian models optimise a lower bound on the marginal likelihood, but the optimization problem often suffers from being non-convex and high-dimensional. This can be alleviated by working in a collapsed domain where a part of the parameter space is marginalized. We consider the KL-corrected collapsed variational bound and apply it to Dirichlet process mixture models, allowing us to reduce the optimization space considerably. We find that the variational bound exhibits consistent and exploitable structure, allowing the application of difference-of-convex optimization algorithms. We show how this yields an interpretable fixed-point update algorithm in the collapsed setting for the Dirichlet process mixture model. We connect this update formula to classical coordinate ascent updates, illustrating that the proposed improvement surprisingly reduces to the traditional scheme.
 

In this paper, we propose a new contextual bandit problem with two objectives, where one of the objectives dominates the other objective. Unlike single-objective bandit problems in which the learner obtains a random scalar reward for each arm it selects, in the proposed problem, the learner obtains a random reward vector, where each component of the reward vector corresponds to one of the objectives. The goal of the learner is to maximize its total reward in the non-dominant objective while ensuring that it maximizes its reward in the dominant objective. In this case, the optimal arm given a context is the one that maximizes the expected reward in the non-dominant objective among all arms that maximize the expected reward in the dominant objective. For this problem, we propose the multi-objective contextual multi-armed bandit algorithm (MOC-MAB), and prove that it achieves sublinear regret with respect to the optimal context dependent policy. Then, we compare the performance of the proposed algorithm with other state-of-the-art bandit algorithms. The proposed contextual bandit model and the algorithm have a wide range of real-world applications that involve multiple and possibly conflicting objectives ranging from wireless communication to medical diagnosis and recommender systems.
 

Manual labeling of individual instances is time-consuming. This is commonly resolved by labeling a bag of instances with a single common label or label-set. However, this approach is still time-costly for large datasets. In this paper, we propose a mixed-supervision multi-instance multi-label learning model for learning from easily available meta-data information (MIML-AI). This auxiliary information is normally collected automatically with the data, e.g., image location information or a document author name. We propose a discriminative graphical model with exact inference to train a classifier based on auxiliary label information and a small number of labeled bags. This strategy utilizes meta-data as a means of providing a weaker label as an alternative to intensive manual labeling. Experiments on real data illustrate the effectiveness of our proposed method relative to current approaches, which do not use the information from bags that contain only meta-data label information.
 
Hankel Subspace Method For Efficient Gesture Representation
Bernardo Bentes Gatto, Anna Bogdanova, Lincon Sales Souza, Eulanda Miranda Dos Santos

Gesture recognition technology provides multiple opportunities for direct human-computer interaction, without the use of additional external devices. As such, it has been an appealing research area in the field of computer vision. Many of its challenges are related to the joint complexity of human gestures, which produce inconsistent distributions under different viewpoints. In this paper, we introduce a novel framework for gesture recognition, which achieves high discrimination of spatial and temporal information while significantly decreasing the computational cost. The proposed method consists of four stages. First, we generate an ordered subset of images from a gesture video, filtering out those that do not contribute to the recognition task. Second, we express spatial and temporal gesture information in a compact trajectory matrix. Then, we represent the obtained matrix as a subspace, achieving discriminative information, as the trajectory matrices derived from different gestures generate dissimilar clusters in a low-dimensional space. Finally, we apply soft weights to find the optimal dimension of each gesture subspace. We demonstrate practical and theoretical gains of our compact representation through experimental evaluation using two publicly available gesture datasets.
 

In an extension to some previous work on the topic, we show how all classical polynomial-based quadrature rules can be interpreted as Bayesian quadrature rules if the covariance kernel is selected suitably. As the resulting Bayesian quadrature rules have zero posterior integral variance, the results of this article are mostly of theoretical interest in clarifying the relationship between the two different approaches to numerical integration.
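For context, a Bayesian quadrature rule with covariance kernel $k$, nodes $x_1,\dots,x_n$ and integration measure $\mu$ estimates the integral of $f$ and its posterior variance as

$$\hat{I} = z^\top K^{-1} f, \qquad \mathbb{V} = \iint k(x,x')\,d\mu(x)\,d\mu(x') - z^\top K^{-1} z,$$

with $K_{ij}=k(x_i,x_j)$, $z_i=\int k(x,x_i)\,d\mu(x)$ and $f_i=f(x_i)$. The claim above is that, for suitably chosen kernels, the weights $z^\top K^{-1}$ reproduce the classical polynomial quadrature weights and the posterior variance $\mathbb{V}$ is zero.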
 
Online Function Minimization With Convex Random Relu Expansions
Laurens Bliek, Michel Verhaegen, Sander Wahls

We propose CDONE, a convex version of the DONE algorithm. DONE is a derivative-free online optimization algorithm that uses surrogate modeling with noisy measurements to find a minimum of objective functions that are expensive to evaluate. Inspired by their success in deep learning, CDONE makes use of rectified linear units, together with a nonnegativity constraint to enforce convexity of the surrogate model. This leads to a sparse and cheap to evaluate surrogate model of the unknown optimization objective that is still accurate and that can be minimized with convex optimization algorithms. The CDONE algorithm is demonstrated on a toy example and on the problem of hyper-parameter optimization for a deep learning example on handwritten digit classification.
 

Two standard assumptions of the classical blind source separation (BSS) theory are frequently violated by modern data sets. First, the majority of the existing methodology assumes vector-valued signals while data exhibiting a natural tensor structure is frequently observed. Second, many typical BSS applications exhibit serial dependence which is usually modeled using second order stationarity assumptions, which is however often quite unrealistic. To address these two issues we extend three existing methods of nonstationary blind source separation to tensor-valued time series. The resulting methods naturally factor in the tensor form of the observations without resorting to vectorization of the signals. Additionally, the methods allow for two types of nonstationarity, either the source series are blockwise second order weak stationary or their variances change smoothly in time. A simulation study and an application to video data show that the proposed extensions outperform their vectorial counterparts and successfully identify source series of interest.
 

We study the problem of matrix factorization by the variational Bayes method, under the assumption that the observed matrix is the product of a low-rank dense matrix and a sparse matrix, with additional noise. Assuming a Laplace prior for the sparse matrix, we analytically derive an approximate solution of the matrix factorization by minimizing the Kullback-Leibler divergence between the posterior and a trial function. By evaluating our solution numerically, we also discuss the accuracy of the resulting matrix factorization.
 

In this paper, we regarded an absorbing inhomogeneous medium as an assembly of thin layers having different propagation properties. We derived a stochastic model for the refractive index and formulated the localisation problem, given noisy distance measurements, as a graph realisation problem. We relaxed the problem using a semi-definite programming (SDP) approach in the l^p realisation domain and derived upper bounds that follow the Edmundson-Madansky bound of order 6p (EM6p) on the SDP objective function, to provide an estimate of the technique's localisation accuracy. Our results showed that the inhomogeneity of the media and the choice of l^p norm have a significant impact on the ratio of the expected value of the localisation error to the upper bound for the expected optimal SDP objective value. The tightest ratio was obtained when the l^inf norm was used.
 
Blind Channel Equalization Of Encoded Data Over Galois Fields
Denis Gustavo Fantinato, Daniel Guerreiro E Silva, Romis Ribeiro De Faissol Attux, Aline Neves

In communication systems, the study of elements and structures defined over Galois fields is generally limited to data coding. In this work, however, a novel perspective that combines data coding and channel equalization is considered to compose a simplified communication system over the field. Besides the coding advantages, this framework is able to compensate for distortions or malfunctioning processes, and can potentially be applied in network coding models. Interestingly, the operation of the equalizer is possible from a blind standpoint through the exploration of the redundant information introduced by the encoder. More specifically, we define a blind equalization criterion based on the matching of probability mass functions (PMFs) via the Kullback-Leibler divergence. Simulations involving the main aspects of the equalizer and the criterion are performed, including the use of a genetic algorithm to aid the search for the solution, with promising results.
 
Unsupervised Domain Adaptation With Copula Models
Cuong D Tran, Vladimir Pavlovic, Ognjen (oggi) Rudovic

We study the task of unsupervised domain adaptation, where no labeled data from the target domain is provided during training time. To deal with the potential discrepancy between the source and target distributions, both in features and labels, we exploit a copula-based regression framework. The benefits of this approach are two-fold: (a) it allows us to model a broader range of conditional predictive densities beyond the common exponential family; (b) we show how to leverage Sklar’s theorem, the essence of the copula formulation relating the joint density to the copula dependency functions, to find effective feature mappings that mitigate the domain mismatch. By transforming the data to a copula domain, we show on a number of benchmark datasets (including human emotion estimation), and using different regression models for prediction, that we can achieve a more robust and accurate estimation of target labels, compared to recently proposed feature transformation (adaptation) methods.
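Sklar's theorem, invoked above, states that (for absolutely continuous distributions) any joint distribution factors into its marginals and a copula $C$ with density $c$:

$$F(y_1,\dots,y_d) = C\big(F_1(y_1),\dots,F_d(y_d)\big), \qquad f(y_1,\dots,y_d) = c\big(F_1(y_1),\dots,F_d(y_d)\big)\prod_{j=1}^{d} f_j(y_j),$$

which is what allows the dependence structure to be modelled separately from the marginals and the data to be transformed into the copula domain to mitigate the domain mismatch.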
19:10-21:00 | Kabayama/Matsumoto Room, ground floor | Welcome Reception


Tuesday September 26, 2017
8:30-12:00 | Foyer, ground floor | Registration
9:00-10:00 | Iwasaki Koyata Memorial Hall, ground floor | Keynote Lecture: Information Geometry for Signal Processing

Professor Shun-ichi Amari
RIKEN, Tokyo, Japan

Chair: Tomoko Matsui, The Institute of Statistical Mathematics, Japan

10:00-10:30 | Kabayama/Matsumoto Room, ground floor | Coffee Break
10:30-12:30 | Iwasaki Koyata Memorial Hall, ground floor | Lecture Session 1: Machine learning theory and algorithm
Chair: Robert Jenssen, University of Tromso, Norway

10:30
Parallelizable Sparse Inverse Formulation Gaussian Processes (SpInGP)
Alexander Grigorievskiy, Neil Lawrence, Simo Särkkä

We propose a parallelizable sparse inverse formulation Gaussian process (SpInGP) for temporal models. It uses a sparse precision GP formulation and sparse matrix routines to speed up the computations. Due to the state-space formulation used in the algorithm, the time complexity of the basic SpInGP is linear, and because all the computations are parallelizable, the parallel form of the algorithm is sublinear in the number of data points. We provide example algorithms to implement the sparse matrix routines and experimentally test the method using both simulated and real data.
 
10:50

Polynomials have been shown to be useful basis functions for the identification of nonlinear systems. However, estimation of the unknown coefficients requires expensive algorithms, as occurs, for instance, when applying an optimal least-squares approach. Bernstein polynomials have the property that the coefficients are the values of the function to be approximated at points on a fixed grid, thus avoiding a time-consuming training stage. This paper presents a novel machine learning approach to regression, based on new functions named particle-Bernstein polynomials, which is particularly suitable for solving multivariate regression problems. Several experimental results show the validity of the technique for the identification of nonlinear systems and the better performance achieved with respect to standard techniques.
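The property mentioned above, that the coefficients are simply function values on a fixed grid, is easiest to see for the classical univariate Bernstein basis; the short sketch below illustrates it (standard Bernstein polynomials only, not the particle-Bernstein construction of the paper; names are illustrative).

    import numpy as np
    from math import comb

    def bernstein_approx(f_vals, x):
        """Degree-n Bernstein approximation on [0, 1].

        f_vals[k] is the value of the target function at the grid point k/n,
        i.e. the coefficients require no fitting step.
        """
        n = len(f_vals) - 1
        k = np.arange(n + 1)
        basis = np.array([comb(n, ki) for ki in k]) * x**k * (1 - x)**(n - k)
        return float(np.dot(f_vals, basis))

    # Example: approximate sin on [0, 1] from 21 grid samples
    grid = np.linspace(0.0, 1.0, 21)
    print(bernstein_approx(np.sin(grid), 0.3), np.sin(0.3))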
 
11:10
The Time Series Cluster Kernel
Karl Øyvind Mikalsen, Filippo Maria Bianchi, Cristina Soguero-Ruiz, Robert Jenssen

This paper presents the time series cluster kernel (TCK) for multivariate time series with missing data. Our approach leverages the missing data handling properties of Gaussian mixture models (GMM) augmented with empirical prior distributions. Further, we exploit an ensemble learning approach to ensure robustness to parameters by combining the clustering results of many GMM to form the final kernel. In comparative experiments, we demonstrate that the TCK is robust to parameter choices and illustrate its capabilities of dealing with multivariate time series, both with and without missing data.
 
11:30
Macau: Scalable Bayesian Factorization With High-Dimensional Side Information Using MCMC
Jaak Simm, Adam Arany, Pooya Zakeri, Tom Haber, Joerg Kurt Wegner, Vladimir Chupakhin, Hugo Ceulemans, Yves Moreau

Bayesian matrix factorization is a method of choice for making predictions for large-scale incomplete matrices, due to the availability of efficient Gibbs sampling schemes and its robustness to overfitting. In this paper, we consider factorization of large-scale matrices with high-dimensional side information. However, sampling the link matrix for the side information with standard approaches costs $O(F^3)$ time, where F is the dimensionality of the features. To overcome this limitation we, firstly, propose a prior for the link matrix whose strength is proportional to the scale of the latent variables. Secondly, using this prior we derive an efficient sampler, with linear complexity in the number of non-zeros, $O(N_{nz})$, by leveraging Krylov subspace methods, such as block conjugate gradient, allowing us to handle million-dimensional side information. We demonstrate the effectiveness of our proposed method on a drug-protein interaction prediction task.
 
11:50
DTW-Approach For Uncorrelated Multivariate Time Series Imputation
Thi Thu Hong Phan, Émilie Poisson Caillault, André Bigand, Alain Lefebvre

Missing data are inevitable in almost all domains of applied science. Data analysis with missing values can lead to a loss of efficiency and unreliable results, especially for large missing sub-sequence(s). Some well-known methods for multivariate time series imputation require high correlations between series or their features. In this paper, we propose an approach based on the shape-behaviour relation in low- or un-correlated multivariate time series, under an assumption of recurrent data. This method involves two main steps. Firstly, we find the most similar sub-sequence to the sub-sequence before (resp. after) a gap, based on shape-feature extraction and Dynamic Time Warping algorithms. Secondly, we fill in the gap with the next (resp. previous) sub-sequence of the most similar one on the signal containing missing values. Experimental results show that our approach performs better than several related methods in the case of multivariate time series with low or no correlation, where each signal carries effective information.
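As a reminder of the building block used in the first step, a textbook dynamic time warping distance between two univariate sub-sequences can be computed as below (illustrative sketch only; the shape-feature extraction the paper combines with DTW is omitted, and the function name is ours).

    import numpy as np

    def dtw_distance(a, b):
        """Classic dynamic time warping distance between two 1-D sequences."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                # extend the cheapest of the three allowed warping moves
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]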
 
12:10

In this paper, we consider parameter estimation in latent, spatio-temporal Gaussian processes using particle Markov chain Monte Carlo methods. In particular, we use spectral decomposition of the covariance function to obtain a high-dimensional state-space representation of the Gaussian processes, which is assumed to be observed through a nonlinear non-Gaussian likelihood. We develop a Rao-Blackwellized particle Gibbs sampler to sample the state trajectory and show how to sample the hyperparameters and possible parameters in the likelihood. The proposed method is evaluated on a spatio-temporal population model and the predictive performance is evaluated using leave-one-out cross-validation.
12:30-13:45 | Kabayama/Matsumoto Room, ground floor | Lunch
13:45-14:00 | Iwasaki Koyata Memorial Hall, ground floor | Supporter presentation: Whole Brain Architecture Initiative (WBAI)
14:00-16:00 | Iwasaki Koyata Memorial Hall, ground floor | Lecture Session 2: Machine learning applications
Chair: Deniz Erdogmus, Northeastern University, USA

14:00

Recent approaches in generative adversarial networks (GANs) can automatically synthesize realistic images from descriptive text. Despite the overall fair quality, the generated images often expose visible flaws that lack structural definition for an object of interest. In this paper, we aim to extend state of the art for GAN-based text-to-image synthesis by improving perceptual quality of generated images. Differentiated from previous work, our synthetic image generator optimizes on perceptual loss functions that measure pixel, feature activation, and texture differences against a natural image. We present visually more compelling synthetic images of birds and flowers generated from text descriptions in comparison to some of the most prominent existing work.
 
14:20

This paper describes an iterative data-driven algorithm for automatically labeling coronary vessel segments in MDCT images. Such techniques are useful for effective presentation and communication of findings on coronary vessel pathology by physicians and computer-assisted diagnosis systems. The experiments are done on the 18 sets of coronary vessel data in the Rotterdam Coronary Artery Algorithm Evaluation Framework that contain segment labeling by medical experts. The performance of our algorithm shows both good accuracy and efficiency compared to previous work on this task.
 
14:40
Neonatal Seizure Detection Using Convolutional Neural Networks
Alison O'shea, Gordon Lightbody, Geraldine Boylan, Andriy Temko

This study presents a novel end-to-end architecture that learns hierarchical representations from raw EEG data using fully convolutional deep neural networks for the task of neonatal seizure detection. The deep neural network acts as both feature extractor and classifier, allowing for end-to-end optimization of the seizure detector. The designed system is evaluated on a large dataset of continuous unedited multi-channel neonatal EEG totaling 835 hours and comprising 1389 seizures. The proposed deep architecture, with sample-level filters, achieves an accuracy that is comparable to the state-of-the-art SVM-based neonatal seizure detector, which operates on a set of carefully designed hand-crafted features. The fully convolutional architecture allows for the localization of EEG waveforms and patterns that result in high seizure probabilities for further clinical examination.
 
15:00
Correntropy Induced Metric Based Common Spatial Patterns
Jiyao Dong, Badong Chen, Na Lu, Haixian Wang, Nanning Zheng

Common spatial patterns (CSP) is a widely used method in the field of electroencephalogram (EEG) signal processing. The goal of CSP is to find spatial filters that maximize the ratio between the variances of two classes. The conventional CSP is however sensitive to outliers because it is based on the L2-norm. Inspired by the correntropy induced metric (CIM), we propose in this work a new algorithm, called CIM based CSP (CSP-CIM), to improve the robustness of CSP with respect to outliers. The CSP-CIM searches the optimal solution by a simple gradient based iterative algorithm. A toy example and a real EEG dataset are used to demonstrate the desirable performance of the new method.
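For context, the correntropy induced metric with a Gaussian kernel saturates for large element-wise errors, which is what gives CSP-CIM its robustness to outliers; a minimal sketch of the metric itself follows (the full CSP-CIM objective and its gradient iteration are as in the paper and are not reproduced here; the function name is ours).

    import numpy as np

    def cim(x, y, sigma=1.0):
        """Correntropy induced metric between two vectors (Gaussian kernel).

        Large per-element errors contribute a bounded amount, unlike the L2-norm.
        """
        x, y = np.asarray(x, float), np.asarray(y, float)
        k0 = 1.0 / (np.sqrt(2 * np.pi) * sigma)                  # kernel at zero
        kxy = k0 * np.exp(-(x - y) ** 2 / (2 * sigma ** 2))      # kernel at the errors
        return np.sqrt(np.mean(k0 - kxy))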
 
15:20

This paper addresses issues in human fall detection from videos. Unlike the handcrafted features used in conventional machine learning, we extract features from Convolutional Neural Networks (CNNs) for human fall detection. Similar to many existing works using two-stream inputs, we use a spatial CNN stream with raw image difference and a temporal CNN stream with optical flow as the inputs of the CNN. Different from conventional two-stream action recognition work, we exploit sparse representation with residual-based pooling on the CNN extracted features to obtain more discriminative feature codes. For characterizing the sequential information in video activity, we use the code vector from a long-range dynamic feature representation, obtained by concatenating codes at segment level, as the input to an SVM classifier. Experiments have been conducted on two public video databases for fall detection. Comparisons with six existing methods show the effectiveness of the proposed method.
 
15:40
A Bayesian Forecasting And Anomaly Detection Framework For Vehicular Monitoring Networks
Maria Scalabrin, Matteo Gadaleta, Riccardo Bonetto, Michele Rossi

In this paper, we are concerned with the automated and runtime analysis of vehicular data from large scale traffic monitoring networks. This problem is tackled through localized and small-size Bayesian networks (BNs), which are utilized to capture the spatio-temporal relationships underpinning traffic data from nearby road links. A dedicated BN is set up, trained, and tested for each road in the monitored geographical map. The joint probability distribution between the cause nodes and the effect node in the BN is tracked through a Gaussian Mixture Model (GMM), whose parameters are estimated via Bayesian Variational Inference (BVI). Forecasting and anomaly detection are performed on statistical measures derived at runtime by the trained GMMs. Our design choices lead to several advantages: the approach is scalable as a small-size BN is associated with and independently trained for each road and the localized nature of the framework allows flagging atypical behaviors at their point of origin in the monitored geographical map. The effectiveness of the proposed framework is tested using a large dataset from a real network deployment, comparing its prediction performance with that of selected regression algorithms from the literature, while also quantifying its anomaly detection capabilities.
16:00-16:30 | Lecture Hall, 2nd floor | Coffee Break
16:30-18:30 | Lecture Hall, 2nd floor | Poster Session 2: Machine learning applications
Chair: Vladimir Pavlovic, Rutgers University, USA


In machine learning, feature engineering has been a pivotal stage in building a high-quality predictor. Particularly, this work explores the multiple Kernel Discriminant Component Analysis (mKDCA) feature-map and its variants. However, seeking the right subset of kernels for the mKDCA feature-map can be challenging. Therefore, we consider the problem of kernel selection, and propose an algorithm based on Differential Mutual Information (DMI) and incremental forward search. DMI serves as an effective metric for selecting kernels, as is theoretically supported by mutual information and Fisher's discriminant analysis. On the other hand, incremental forward search plays a role in removing redundancy among kernels. Finally, we illustrate the potential of the method via an application in privacy-aware classification, and show on three mobile-sensing datasets that selecting an effective set of kernels for mKDCA feature-maps can enhance the utility classification performance, while successfully preserving data privacy. Specifically, the results show that the proposed DMI forward search method can perform better than the state-of-the-art, and, with much smaller computational cost, can perform as well as the optimal, yet computationally expensive, exhaustive search.
 

This paper proposes a linear stochastic state space model for electrocardiogram signal processing and analysis. The model is obtained as a discretized version of Wiener process acceleration model. The model is combined with a fixed-lag Rauch-Tung-Striebel smoother to perform on-line signal denoising, feature extraction, and beat classification. The results indicate that the proposed approach outperforms a conventional FIR filter in terms of improved signal-to-noise ratio, and that the approach can be used for highly accurate on-line classification of normal beats and premature ventricular contractions. The benefits of the model include the possibility to use closed-form solutions to the optimal filtering and smoothing problems, quick adaptation to sudden changes in beat morphology and heart rate, simple and fast initialization, preprocessing-free operation, intuitive interpretation of the system state, and more.
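For reference, a standard discretization of the Wiener process acceleration model mentioned above tracks the signal value, its velocity and its acceleration with sampling period $T$ and process noise spectral density $q$ (this is the textbook constant-acceleration form commonly used with Kalman/RTS smoothing; the paper's exact parameterization may differ):

$$x_{k+1} = \begin{pmatrix} 1 & T & T^2/2 \\ 0 & 1 & T \\ 0 & 0 & 1 \end{pmatrix} x_k + w_k, \qquad w_k \sim \mathcal{N}\!\left(0,\; q \begin{pmatrix} T^5/20 & T^4/8 & T^3/6 \\ T^4/8 & T^3/3 & T^2/2 \\ T^3/6 & T^2/2 & T \end{pmatrix}\right), \qquad y_k = \begin{pmatrix} 1 & 0 & 0 \end{pmatrix} x_k + r_k,$$

with $y_k$ the noisy ECG sample; the fixed-lag Rauch-Tung-Striebel smoother then yields the denoised signal and its derivatives in closed form.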
 
Predicting Individualized Intelligence Quotient Scores Using Brainnetome-Atlas Based Functional Connectivity
Rongtao Jiang, Shile Qi, Yuhui Du, Weizheng Yan, Vince D. Calhoun, Tianzi Jiang, Jing Sui

Variation in several brain regions and neural parameters is associated with intelligence. In this study, we adopted functional connectivity (FC) based on the Brainnetome atlas to predict intelligence quotient (IQ) scores quantitatively with a prediction framework incorporating advanced feature selection and regression methods. We compared the prediction performance of five regression models and evaluated the effectiveness of feature selection. The best prediction performance was achieved by ReliefF+LASSO, by which correlations of r=0.72 and r=0.46 between predicted and true values were obtained for 174 female and 186 male subjects, respectively, in a leave-one-out cross-validation, suggesting that for female subjects, a better prediction of IQ scores can be achieved using precise FCs. Further, weight analysis revealed the most predictive FCs and the relevant regions. The results support the hypothesis that intelligence is characterized by interaction between multiple brain regions, especially the areas implicated by the parieto-frontal integration theory. This study facilitates our understanding of the biological basis of intelligence by individualized prediction.
 

Ventricular tachycardia, ventricular flutter, and ventricular fibrillation are malignant forms of cardiac arrhythmias, whose occurrence may be a life-threatening event. Several methods exist for detecting these arrhythmias in the electrocardiogram. However, the use of Gaussian process classifiers in this context has not been reported in the current literature. In comparison to the popular support vector machines, Gaussian processes have the advantage of being fully probabilistic, they can be re-cast in a Bayesian-filtering-compatible state-space form, and they can be flexibly combined with first-principles physical models. In this paper we use Gaussian process classification to detect malignant ventricular arrhythmias in the electrocardiogram. We describe how Gaussian process classifiers can be used to solve the detection problem, and show that the proposed classifiers achieve a performance that is comparable to that of state-of-the-art methods, thereby laying promising foundations for a more general electrocardiogram-based arrhythmia detection framework.
 

Estimation is a critical component of synchronization in wireless and signal processing systems. There is a rich body of work on estimator derivation, optimization, and statistical characterization from analytic system models which are used pervasively today. We explore an alternative approach to building estimators which relies principally on approximate regression using large datasets and large computationally efficient artificial neural network models capable of learning non-linear function mappings that provide compact and accurate estimates. For single-carrier PSK modulation, we explore the accuracy and computational complexity of such estimators compared with the current gold-standard analytically derived alternatives. We compare performance in various wireless operating conditions and consider the trade-offs between the two different classes of systems. Our results show the learned estimators can provide improvements in areas such as short-time estimation and estimation under non-trivial real-world channel conditions such as fading or other non-linear hardware or propagation effects.
 
A Deep Neural Network With A Restricted Noisy Channel For Identification Of Functional Introns
Alan Joseph Bekker, Michal Chorev, Liran Carmel, Jacob Goldberger

An appreciable fraction of introns is thought to be involved in cellular functions, but there is no obvious way to predict which specific intron is likely to be functional. For each intron we are given a feature representation that is based on its evolutionary patterns. For a small subset of introns we are also given an indication that they are functional. For all other introns it is not known whether they are functional or not. Our task is to estimate what fraction of introns are functional and how likely it is that each individual intron is functional. We define a probabilistic classification model that treats the given functionality labels as noisy versions of labels created by a Deep Neural Network model. The maximum-likelihood model parameters are found by utilizing the Expectation-Maximization algorithm. We show that roughly 80% of the functional introns are still not recognized as such, and that roughly a third of all introns are functional.
 
Adversarial Learning: A Critical Review And Active Learning Study
David J. Miller, Xinyi Hu, Zhicong Qiu, George Kesidis

This paper consists of two parts. The first is a critical review of prior art on adversarial learning, i) identifying some significant limitations of previous works, which have focused mainly on attack exploits, and ii) proposing novel defenses against adversarial attacks. The second part is an experimental study considering the adversarial active learning scenario and an investigation of the efficacy of a mixed sample selection strategy for combating an adversary who attempts to disrupt the classifier learning.
 

Deep learning has gained considerable attention in the scientific community, breaking benchmark records in many fields such as speech and visual recognition. Motivated by extending the advancement of deep learning approaches to brain imaging classification, we propose a framework, called "deep neural network (DNN) + layer-wise relevance propagation (LRP)", to distinguish schizophrenia patients (SZ) from healthy controls (HCs) using functional network connectivity (FNC). 100 Chinese subjects from 7 sites are included, each with a 50*50 FNC matrix resulting from group ICA on resting-state fMRI data. The proposed DNN+LRP not only improves classification accuracy significantly compared to four state-of-the-art classification methods (84% vs. less than 79%, 10-fold cross validation) but also enables identification of the most contributing FNC patterns related to SZ classification, which cannot be easily traced back by general DNN models. By conducting LRP, we identified the FNC patterns that exhibit the highest discriminative power in SZ classification. More importantly, when using leave-one-site-out cross validation (using 6 sites for training, 1 site for testing, 7 times in total), the cross-site classification accuracy reached 82%, suggesting high robustness and generalization performance of the proposed method, promising a wide utility in the community and great potential for biomarker identification of brain disorders.
 

In the present paper, we propose a deep network architecture to improve the accuracy of pedestrian detection. The proposed method contains a proposal network and a classification network that are trained separately. We use a single shot multibox detector (SSD) as a proposal network to generate the set of pedestrian proposals. The proposal network is fine-tuned from a pre-trained network on several pedestrian data sets of large input size (512 × 512 pixels) in order to improve detection accuracy for small pedestrians. Then, we use a classification network to classify the pedestrian proposals. We then combine the scores from the proposal network and the classification network to obtain better final detection scores. Experiments were evaluated using the Caltech test set, and, compared to other state-of-the-art methods for the pedestrian detection task, the proposed method obtains better results for small pedestrians (30 to 50 pixels in height), with an average miss rate of 42%.
 

Current approaches on optimal spatio-spectral feature extraction for single-trial BCIs exploit mutual information based feature ranking and selection algorithms. In order to overcome potential confounders underlying feature selection by information theoretic criteria, we propose a non-parametric feature projection framework for dimensionality reduction that utilizes mutual information based stochastic gradient descent. We demonstrate the feasibility of the protocol based on analyses of EEG data collected during execution of open and close palm hand gestures. We further discuss the approach in terms of potential insights in the context of neurophysiologically driven prosthetic hand control.
 

This paper presents a fully convolutional architecture for pedestrian detection. The DenseNet model is incorporated in the Faster R-CNN framework to extract the deep convolutional features. A two-phase approach is suggested to minimize the false positives owing to hard negative backgrounds. Feature maps from multiple intermediate layers are taken into consideration to facilitate small-scale detection. The proposed method alongside few competent schemes are compared on two benchmark datasets. The obtained results demonstrate the potential of our approach in addressing the real world challenges.
 
Identification Of A Thermal Building Model By Learning The Dynamics Of The Solar Flux
Tahar Nabil, François Roueff, Jean-Marc Jicquel, Alexandre Girard

This article deals with the identification of a dynamic building model from on-site input-output records. In practice, the solar gains, a key input, are often unobserved due to the cost of the associated sensor. We suggest here to replace this sensor by a cheap outdoor temperature sensor exposed to the sun. Our assumption is that the temperature bias between this sensor and a second, sheltered sensor is an indirect observation of the solar flux. We derive a novel state-space model for the outdoor temperature bias, with sudden changes in the weather conditions accounted for by occasional high-variance increments of the hidden state. The magnitude of the high values and the times at which they occur are estimated with an l1-regularized maximum likelihood approach. Finally, this model is appended to a thermal building model based on an equivalent RC network, forming a conditionally linear Gaussian state-space system. We apply the Expectation-Maximization algorithm with Rao-Blackwellised particle smoothing in order to learn the thermal model. We are able, despite the indirect observation of the solar flux, to correctly estimate the physical parameters of the building, in particular the static coefficients and the fast time constant.
 
Texture Classification From Single Uncalibrated Images: Random Matrix Theory Approach
Esmaeil S. Nadimi, Jurgen Herp, Maria Magdalena Buijs, Victoria Blanes-Vidal

We studied the problem of classifying textured materials from their single-image appearance, under general viewing and illumination conditions, using the theory of random matrices. To evaluate the performance of our algorithm, two distinct databases of images were used: the CUReT database and our database of colorectal polyp images collected from patients undergoing colon capsule endoscopy for early cancer detection. During the learning stage, our classifier algorithm established the universality laws for the empirical spectral density of the largest singular value and the normalized largest singular value of the image intensity matrix, adapted to the eigenvalues of the information-plus-noise model. We showed that these two densities converge to the generalized extreme value (GEV-Frechet) and Gaussian G_1 distributions, respectively, with rate O(N^{1/2}). To validate the algorithm, we introduced a set of unseen images to the algorithm. A misclassification rate of approximately 1%-6%, depending on the database, was obtained, which is superior to the reported values of 5%-45% in previous research studies.
 
Discriminating Bipolar Disorder From Major Depression Based On Kernel SVM Using Functional Independent Components
Shuang Gao, Elizabeth A. Osuch, Michael Wammes, Jean Théberge, Tian-Zi Jiang, Vince D. Calhoun, Jing Sui

Bipolar disorder (BD) and major depressive disorder (MDD) share depressive symptoms, so discriminating them in early depressive episodes is a major clinical challenge. Independent components (ICs) extracted from fMRI data have been proved to carry distinguishing information and can be used for classification. Here we extend a previous method that makes use of multiple fMRI ICs to build linear subspaces for each individual, which are further used as input for classifiers. The similarity matrix between different subjects is first calculated using the principal-angle distance metric, and is then projected into a kernel space for support vector machine (SVM) classification among 37 BDs and 36 MDDs. In practice, we adopt a forward selection technique on 20 ICs and nested 10-fold cross validation to select the most discriminative IC combinations of fMRI, and determine the final diagnosis by a majority voting mechanism. The results on human data demonstrate that the proposed method achieves much better performance than its initial version [8] (93% vs. 75%), and identifies 5 discriminative fMRI components for distinguishing BD and MDD patients, which are mainly located in the prefrontal cortex, default mode network, and thalamus. This work provides a new framework for helping diagnose new patients with overlapping symptoms between BD and MDD, which not only adds to our understanding of functional deficits in mood disorders, but may also serve as a potential biomarker for their differential diagnosis.


Wednesday September 27, 2017
8:30-12:00 | Foyer, ground floor | Registration
9:00-10:00 | Iwasaki Koyata Memorial Hall, ground floor | Keynote Lecture: On Bayesian Deep Learning and Deep Bayesian Learning

Yee Whye Teh
University of Oxford and DeepMind, United Kingdom

Chair: Jen-Tzung Chien, National Chiao Tung University, Taiwan

10:00-10:30 | Room 3, ground floor | Coffee Break
10:30-12:30 | Iwasaki Koyata Memorial Hall, ground floor | Lecture Session 3: Deep learning and pattern recognition
Chair: David J. Miller, Pennsylvania State University, USA

10:30

Convolutional neural networks (CNNs), in which several convolutional layers extract feature patterns from an input image, are one of the most popular network architectures used for image classification. The convolutional computation, however, requires a high computational cost, resulting in increased power consumption and processing time. In this paper, we propose a novel algorithm that substitutes a single layer for a pair formed by a convolutional layer and the following average-pooling layer. The key idea of the proposed scheme is to compute the output of the pair of original layers without the computation of convolution. To this end, our algorithm first generates summed area tables (SATs) of the input images and then directly computes the output values from the SATs. We implemented our algorithm for forward propagation and backward propagation to evaluate the performance. Our experimental results showed that our algorithm achieved 17.1 times faster performance than the original algorithm for the same parameters used in ResNet-34.
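The core trick, reading an average-pooled box output off a summed area table instead of evaluating a convolution at every position, can be illustrated with plain cumulative sums; the sketch below is a simplified stride-1 box-mean over a single-channel image, not the fused conv+pool layer of the paper, and the function name is ours.

    import numpy as np

    def box_means_via_sat(image, k):
        """Mean over every k x k window of a 2-D image, computed from a SAT."""
        # SAT with a leading row/column of zeros: sat[i, j] = sum(image[:i, :j])
        sat = np.pad(image, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
        h, w = image.shape
        # Box sums from four SAT lookups per output position (no convolution).
        out = (sat[k:h + 1, k:w + 1] - sat[:h - k + 1, k:w + 1]
               - sat[k:h + 1, :w - k + 1] + sat[:h - k + 1, :w - k + 1])
        return out / (k * k)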
 
10:50

Cross-resolution face recognition tackles the problem of matching face images with different resolutions. Although state-of-the-art convolutional neural network (CNN) based methods have reported promising performance on standard face recognition problems, such models cannot sufficiently describe images with resolutions different from those seen during training, and thus cannot solve the above task accordingly. In this paper, we propose the Guided Convolutional Neural Network (Guided-CNN), which is a novel CNN architecture with parallel sub-CNN models acting as guide and learners. Unique loss functions are introduced, which serve as joint supervision for images within and across resolutions. Our experiments verify not only the use of our model for cross-resolution recognition, but also its applicability to recognizing face images with different degrees of occlusion.
 
11:10
Limiting The Reconstruction Capability Of Generative Neural Network Using Negative Learning
Asim Munawar, Phongtharin Vinayavekhin, Giovanni De Magistris

Generative models are widely used for unsupervised learning with various applications, including data compression and signal restoration. Training methods for such systems focus on the generality of the network given a limited amount of training data. A less researched type of technique concerns generation of only a single type of input. This is useful for applications such as constraint handling, noise reduction and anomaly detection. In this paper we present a technique to limit the generative capability of the network using negative learning. The proposed method searches for the solution in the gradient direction for the desired input and in the opposite direction for the undesired input. One application is anomaly detection, where the undesired inputs are the anomalous data. We demonstrate the features of the algorithm using the MNIST handwritten digit dataset and later apply the technique to a real-world obstacle detection problem. The results clearly show that the proposed learning technique can significantly improve the performance for anomaly detection.
 
11:30
A Layer-Block-Wise Pipeline For Memory And Bandwidth Reduction In Distributed Deep Learning
Haruki Mori, Tetsuya Youkawa, Shintaro Izumi, Masahiko Yoshimoto, Hiroshi Kawaguchi, Atsuki Inoue

This paper describes a pipelined stochastic gradient descent (SGD) algorithm and its hardware architecture with a memory-distributed structure. In the proposed architecture, a pipeline stage takes charge of multiple layers: a "layer block." The layer-block-wise pipeline has far fewer weight parameters for network training than conventional multithreading, because weight memory is distributed to the workers assigned to pipeline stages. The memory capacity of 2.25 GB for the four-stage proposed pipeline is about half of the 3.82 GB for multithreading when the batch size is 32 in VGG-F. Unlike multithreaded data parallelism, no parameter server for weight updates or shared I/O data bus is necessary. Therefore, the memory bandwidth is drastically reduced. The proposed four-stage pipeline only needs memory bandwidths of 36.3 MB and 17.0 MB per batch, respectively, for the forward propagation and backpropagation processes, whereas four-thread multithreading requires a bandwidth of 974 MB overall for send and receive processes to unify its weight parameters. At a parallelization degree of four, the proposed pipeline maintains training convergence by a factor of 1.12, compared with the conventional multithreaded architecture, although the memory capacity and the memory bandwidth are decreased.
 
11:50
Text To Image Generative Model Using Constrained Embedding Space Mapping
Subhajit Chaudhury, Sakyasingha Dasgupta, Asim Munawar, Md. Anwarus Salam Khan, Ryuki Tachibana

We present a conditional generative method that maps low-dimensional embeddings of image and natural language to a common latent space hence extracting semantic relationships between them. The embedding specific to a modality is first extracted and subsequently a constrained optimization procedure is performed to project the two embedding spaces to a common manifold. Based on this, we present a method to learn the conditional probability distribution of the two embedding spaces; first, by mapping them to a shared latent space and generating back the individual embeddings from this common space. However, in order to enable independent conditional inference for separately extracting the corresponding embeddings from the common latent space representation, we deploy a proxy variable trick - wherein, the single shared latent space is replaced by two separate latent spaces. We design an objective function, such that, during training we can force these separate spaces to lie close to each other, by minimizing the Euclidean distance between their distribution functions. Experimental results demonstrate that the learned joint model can generalize to learning concepts of double MNIST digits with additional attributes of colors, thereby enabling the generation of specific colored images from the respective text data.
 
12:10
Deep Divergence-Based Clustering
Michael Kampffmeyer, Sigurd Løkse, Filippo Maria Bianchi, Lorenzo Livi, Arnt-Børre Salberg, Robert Jenssen

A promising direction in deep learning research is to learn representations and simultaneously discover cluster structure in unlabeled data by optimizing a discriminative loss function. Contrary to supervised deep learning, this line of research is in its infancy and the design and optimization of a suitable loss function with the aim of training deep neural networks for clustering is still an open challenge. In this paper, we propose to leverage the discriminative power of information theoretic divergence measures, which have experienced success in traditional clustering, to develop a new deep clustering network. Our proposed loss function incorporates explicitly the geometry of the output space, and facilitates fully unsupervised training end-to-end. Experiments on real datasets show that the proposed algorithm achieves competitive performance with respect to other state-of-the-art methods.
12:30-13:45 | Kabayama/Matsumoto Room, ground floor | Lunch
12:30-13:45 | Room 4, ground floor | MLSP Technical Committee Lunch
13:45-14:00 | Iwasaki Koyata Memorial Hall, ground floor | Supporter presentation: NEC Corporation
14:00-16:00 | Iwasaki Koyata Memorial Hall, ground floor | Lecture Session 4: Special session on machine learning for computational imaging
Chair: Brendt Wohlberg, Los Alamos National Laboratory, USA and Jong Chul Ye, KAIST, South Korea

14:00

Recent research in computed tomographic imaging has focused on developing techniques that enable reduction of the X-ray radiation dose without loss of quality of the reconstructed images or volumes. While penalized weighted-least squares (PWLS) approaches have been popular for CT image reconstruction, their performance degrades for very low dose levels due to the inaccuracy of the underlying WLS statistical model. We propose a new formulation for low-dose CT image reconstruction based on a shifted-Poisson likelihood function and a data-adaptive regularizer using the sparsifying transform model for images. The sparsifying transform is pre-learned from a dataset of patches extracted from CT images. The nonconvex cost function of the proposed penalized-likelihood reconstruction with sparsifying transform regularizer (PL-ST) is optimized by alternating between a sparse coding step and an image update step. The image update step deploys a series of convex quadratic majorizers that are optimized using a relaxed linearized augmented Lagrangian method with ordered subsets, reducing the number of (expensive) forward and backward projection operations. Numerical experiments show that for low dose levels, the proposed data-driven PL-ST approach outperforms prior methods employing a nonadaptive edge-preserving regularizer. PL-ST also outperforms the prior PWLS-ST approach at very low X-ray doses.
 
14:20
Scene-Adapted Plug-And-Play Algorithm With Convergence Guarantees
Afonso Teodoro, José Bioucas-Dias, Mário Figueiredo

Recent frameworks, such as the so-called plug-and-play, allow us to leverage the developments in image denoising to tackle other, and more involved, problems in image processing. As the name suggests, state-of-the-art denoisers are plugged into an iterative algorithm that alternates between a denoising step and the inversion of the observation operator. While these tools offer flexibility, the convergence of the resulting algorithm may be difficult to analyse. In this paper, we plug a state-of-the-art denoiser, based on a Gaussian mixture model, in the iterations of an alternating direction method of multipliers and prove the algorithm is guaranteed to converge. Moreover, we build upon the concept of scene-adapted priors where we learn a model targeted to a specific scene being imaged, and apply the proposed method to address the hyperspectral sharpening problem.
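The plug-and-play structure referred to above alternates a data-fidelity update with a call to an arbitrary denoiser inside ADMM; the following sketch shows only that skeleton for a generic dense linear observation model (the scene-adapted GMM denoiser and the convergence proof are specific to the paper; the function name and dense-matrix setup are illustrative).

    import numpy as np

    def plug_and_play_admm(y, A, denoise, rho=1.0, n_iter=50):
        """Plug-and-play ADMM skeleton for y = A x + noise, A a dense matrix.

        `denoise` is any off-the-shelf denoiser acting as the prior's proximal step.
        """
        n = A.shape[1]
        x = A.T @ y
        v, u = x.copy(), np.zeros(n)
        M = A.T @ A + rho * np.eye(n)            # factor for the quadratic x-update
        for _ in range(n_iter):
            x = np.linalg.solve(M, A.T @ y + rho * (v - u))   # data-fidelity step
            v = denoise(x + u)                                # denoiser as prior step
            u = u + x - v                                     # scaled dual update
        return v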
 
14:40
Cover Tree Compressed Sensing For Fast MR Fingerprint Recovery
Mohammad Golbabaee, Zhouye Chen, Yves Wiaux, Mike Davies

We adopt a data structure in the form of cover trees and iteratively apply approximate nearest neighbour (ANN) searches for fast compressed sensing reconstruction of signals living on discrete smooth manifolds. Leveraging the recent stability results for the inexact Iterative Projected Gradient (IPG) algorithm and using the cover tree's ANN searches, we decrease the projection cost of the IPG algorithm so that it grows logarithmically with the data population for low-dimensional smooth manifolds. We apply our results to quantitative MRI compressed sensing, in particular within the Magnetic Resonance Fingerprinting (MRF) framework. For a similar (or sometimes better) reconstruction accuracy, we report a 2-3 orders of magnitude reduction in computations compared to the standard iterative method, which uses brute-force searches.
 
15:00

We present a new deep learning-based approach for dense stereo matching. Compared to previous works, our approach does not use deep learning of pixel appearance descriptors, employing very fast classical matching scores instead. At the same time, our approach uses a deep convolutional network to predict the local parameters of the cost volume aggregation process, which in this paper we implement using the differentiable domain transform. By treating this transform as a recurrent neural network, we are able to train our whole system, which includes cost volume computation, cost-volume aggregation (smoothing), and winner-takes-all disparity selection, end-to-end. The resulting method is highly efficient at test time, while achieving good matching accuracy. On the KITTI 2012 and KITTI 2015 benchmarks, it achieves error rates of 5.08% and 6.34%, respectively, while running at 29 frames per second on a modern GPU.
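For reference, the forward pass of a one-dimensional domain transform is just a recursive filter; in the approach above, the per-position weights are predicted by the network instead of being fixed. A minimal NumPy sketch (a full filter also runs a backward pass and alternates between rows and columns):

import numpy as np

def domain_transform_1d(cost, weights):
    # cost: per-pixel matching cost along one image row
    # weights[i] in (0, 1): how much aggregation is allowed to cross position i
    out = cost.astype(float).copy()
    for i in range(1, len(out)):
        out[i] = (1.0 - weights[i]) * out[i] + weights[i] * out[i - 1]
    return out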
 
15:20
Noise Reduction In Low-Dose CT Using A 3D Multiscale Sparse Denoising Autoencoder
Katrin Mentl, Boris Mailhé, Florin C. Ghesu, Frank Schebesch, Tino Haderlein, Andreas Maier, Mariappan S. Nadar

This article presents a novel neural network-based approach for the enhancement of 3D medical image data. The proposed networks learn a sparse representation basis by mapping the corrupted input data to corresponding optimal targets. To reinforce the adjustment of the network to the given data, the threshold values are also adaptively learned. In order to capture important image features on various scales and to process large computed tomography (CT) volumes in a reasonable time, a multiscale approach is applied. Recursively downsampled versions of the input are used and denoising operators of constant size are learned at each scale. The networks are trained end-to-end from a database of real high-dose acquisitions with synthetic additional noise to simulate the corresponding low-dose scans. Both 2D and 3D networks are evaluated on CT volumes and compared to BM3D. The presented methods achieve an increase of 4% to 1% in the SSIM and of 2.4 to 2.8 dB in the PSNR with respect to the ground truth, outperform BM3D in quantitative comparisons, and present no visible texture artifacts. By exploiting volumetric information, 3D networks achieve superior results over 2D networks.
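A single denoising block of the kind described above, with thresholds learned jointly with the filters, might look as follows; this is a simplified 2D PyTorch sketch with hypothetical layer sizes, whereas the paper's pipeline applies such operators at several scales of recursively downsampled CT volumes.

import torch
import torch.nn as nn

class SparseDenoisingBlock(nn.Module):
    def __init__(self, channels=32, kernel=5):
        super().__init__()
        self.analysis = nn.Conv2d(1, channels, kernel, padding=kernel // 2)
        self.synthesis = nn.Conv2d(channels, 1, kernel, padding=kernel // 2)
        self.threshold = nn.Parameter(torch.full((1, channels, 1, 1), 0.1))

    def forward(self, x):
        z = self.analysis(x)                                                 # sparse code
        z = torch.sign(z) * torch.clamp(z.abs() - self.threshold, min=0.0)   # learned soft threshold
        return self.synthesis(z)                                             # reconstruction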
 
15:40

Sequential dictionary learning algorithms have been successfully applied to a number of image processing problems. In a number of these problems, however, the data used for dictionary learning are structured matrices with notions of smoothness in the column direction. This prior information, which can be expressed as a smoothness constraint on the learned dictionary atoms, has not been included in existing dictionary learning algorithms. In this paper, we remedy this situation by proposing a regularized sequential dictionary learning algorithm. The proposed algorithm differs from existing ones in its dictionary update stage. It generates smooth dictionary atoms via the solution of a regularized rank-one matrix approximation problem, where the regularization is introduced via penalization in the dictionary update stage. Experimental results on synthetic and real data illustrating the performance of the proposed algorithm are provided.
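The regularized rank-one update at the heart of such a dictionary update stage can be alternated in closed form; the NumPy sketch below assumes a second-difference smoothness penalty on the atom (one possible choice of regularizer, not necessarily the paper's).

import numpy as np

def smooth_rank_one_update(E, lam=1.0, n_iter=10):
    # E: residual matrix to be approximated by d g^T with a smooth atom d
    m, _ = E.shape
    D2 = np.diff(np.eye(m), n=2, axis=0)          # second-difference operator
    d = np.random.randn(m)
    d /= np.linalg.norm(d)
    for _ in range(n_iter):
        g = E.T @ d / (d @ d)                     # code update (least squares)
        A = (g @ g) * np.eye(m) + lam * D2.T @ D2
        d = np.linalg.solve(A, E @ g)             # smooth atom update
    return d, g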
16:00-16:30Lecture Hall, 2nd floorCoffee Break
16:30-18:30Lecture Hall, 2nd floor
Poster Session 3: Deep learning and pattern recognition
Chair: Jinsub Kim, Oregon State University, USA


Scattering Transforms (or ScatterNets), introduced by Mallat in 2012, are a promising step toward creating a well-defined feature extractor for pattern recognition and image classification tasks. They are of particular interest due to their architectural similarity to Convolutional Neural Networks (CNNs), while requiring no parameter learning and still performing very well (particularly in constrained classification tasks). In this paper we visualize what the deeper layers of a ScatterNet are sensitive to using a 'DeScatterNet'. We show that the higher orders of ScatterNets are sensitive to complex, edge-like patterns (checker-boards and rippled edges). These complex patterns may be useful for texture classification, but are quite dissimilar from the patterns visualized in the second and third layers of Convolutional Neural Networks (CNNs) - the current state-of-the-art image classifiers. We propose that this may be the source of the current gaps in performance between ScatterNets and CNNs (83% vs 93% on CIFAR-10 for ScatterNet+SVM vs ResNet). We then use these visualization tools to propose possible enhancements to the ScatterNet design, which show that it has the power to extract features more closely resembling those of CNNs, while still being well-defined and having the invariance properties fundamental to ScatterNets.
 
Deep Convolutional Neural Networks For Interpretable Analysis Of EEG Sleep Stage Scoring
Albert Vilamala, Kristoffer Hougaard Madsen, Lars Kai Hansen

Sleep studies are important for diagnosing sleep disorders such as insomnia, narcolepsy or sleep apnea. They rely on manual scoring of sleep stages from raw polysomnography signals, which is a tedious visual task requiring the workload of highly trained professionals. Consequently, research efforts to pursue automatic stage scoring based on machine learning techniques have been carried out in recent years. In this work, we resort to multitaper spectral analysis to create visually interpretable images of sleep patterns from EEG signals as inputs to a deep convolutional network trained to solve visual recognition tasks. As a working example of transfer learning, a system able to accurately classify sleep stages in new unseen patients is presented. Evaluations on a widely used, publicly available dataset compare favourably to state-of-the-art results, while providing a framework for visual interpretation of outcomes.
 
Adversarial Domain Separation And Adaptation
Jen-Chieh Tsai, Jen-Tzung Chien

Traditional domain adaptation methods attempt to learn a shared representation for distribution matching between the source domain and the target domain, where the individual information in each domain is not characterized. Such a solution suffers from the mixing of individual information with the shared features, which considerably constrains the performance of domain adaptation. To relax this constraint, it is crucial to extract both shared information and individual information. This study captures both kinds of information via a new domain separation network where the shared features are extracted and purified via separate modeling of the individual information in both domains. In particular, a hybrid adversarial learning scheme is incorporated in a separation network as well as an adaptation network, where the associated discriminators are jointly trained for domain separation and adaptation according to minimax optimization over the separation loss and the domain discrepancy, respectively. Experiments on different tasks show the merit of the proposed adversarial domain separation and adaptation.
 

Due to the high spectral resolution and the similarity of some spectra between different classes, hyperspectral image classification is an important but challenging task. Research has shown the powerful ability of deep learning for hyperspectral image classification. However, the lack of training samples makes it difficult to extract discriminative features and achieve the expected performance. To solve this problem, a multi-scale CNN (MSCNN) which can extract multi-scale features is designed for hyperspectral image classification. Furthermore, D-DSML, a diversified metric, is proposed to further improve the representational ability of deep methods. In this paper, a D-DSML-MSCNN method, which jointly learns deep multi-scale features and diversified metrics for hyperspectral image classification, is proposed to take advantage of both D-DSML and MSCNN. Experiments are conducted on the Pavia University data to show the effectiveness of our method for hyperspectral image classification. The results show the advantage of our method when compared with other recent results.
 
Domain-Adaptive Generative Adversarial Networks For Sketch-To-Photo Inversion
Yen-Cheng Liu, Wei-Chen Chiu, Sheng-De Wang, Yu-Chiang Frank Wang

Generating photo-realistic images from multiple styles of sketches is a challenging task in image synthesis with important applications such as facial composites for suspects. While machine learning techniques have been applied to this problem, the requirement of collecting sketch and face photo image pairs limits the use of the learned model for rendering sketches of different styles. In this paper, we propose a novel deep learning model, Domain-adaptive Generative Adversarial Networks (DA-GAN). The design of DA-GAN performs cross-style sketch-to-photo inversion, which mitigates the differences across input sketch styles without the need to collect a large number of sketch and face image pairs for training purposes. In experiments, we show that our method produces satisfactory results and performs favorably against state-of-the-art approaches.
 

The interpretability of prediction mechanisms with respect to the underlying prediction problem is often unclear. While several studies have focused on developing prediction models with meaningful parameters, the causal relationships between the predictors and the actual prediction have not been considered. Here, we connect the underlying causal structure of a data generation process and the causal structure of a prediction mechanism. To achieve this, we propose a framework that identifies the feature with the greatest causal influence on the prediction and estimates the causal intervention on a feature that is necessary to obtain a desired prediction. The general concept of the framework has no restrictions regarding data linearity; however, we focus on an implementation for linear data here. The framework's applicability is evaluated using artificial data and demonstrated using real-world data.
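For the linear case mentioned above, the intervention needed on a single feature to reach a target prediction has a closed form; a small worked NumPy example with hypothetical weights:

import numpy as np

w, b = np.array([0.8, -1.5, 0.3]), 0.2     # hypothetical linear model
x = np.array([1.0, 0.5, 2.0])              # current instance
y_hat = w @ x + b                          # current prediction (0.85)

j, y_target = 1, 1.0                       # intervene on feature j
delta = (y_target - y_hat) / w[j]          # required shift of feature j
x_new = x.copy()
x_new[j] += delta
assert np.isclose(w @ x_new + b, y_target)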
 

We consider a training data collection mechanism wherein, instead of annotating each training instance with a class label, additional features drawn from a known class-conditional distribution are acquired concurrently. Treating the true labels as latent variables, a maximum likelihood approach is proposed to train a classifier based on these unlabeled training data. Furthermore, the case of correlated training instances is considered, wherein the latent label variables for subsequently collected training instances form a first-order Markov chain. A convex optimization approach and expectation-maximization algorithms are presented to train the classifiers. The efficacy of the proposed approach is validated in experiments with the iris data and the MNIST handwritten digit data.
 

Estimation of a generating partition is critical for symbolization of measurements from discrete-time dynamical systems, where a sequence of symbols from a (finite-cardinality) alphabet uniquely specifies the underlying time series. Such symbolization is useful for computing measures (e.g., Kolmogorov-Sinai entropy) to characterize the (possibly unknown) dynamical system. It is also useful for time series classification and anomaly detection. Previous work attempted to minimize a clustering objective function that measures the discrepancy between a set of reconstruction values and the points from the time series. Unfortunately, the resulting algorithm is non-convergent, with no guarantee of finding even locally optimal solutions. The problem lies in a heuristic "nearest neighbor" symbol assignment step. Alternatively, we introduce a new, locally optimal algorithm. We apply iterative "nearest neighbor" symbol assignments with guaranteed discrepancy descent, by which a joint, locally optimal symbolization of the time series is achieved. While some approaches use vector quantization to partition the state space, our approach only ensures a partition in the space consisting of the entire time series (effectively, clustering in an infinite-dimensional space). Our approach also amounts to a novel type of sliding-block lossy source coding. We demonstrate improvement, with respect to several measures, over a popular method used in the literature.
 

Time-series sensor data processing is indispensable for system monitoring. Working with autonomous vehicles requires mechanisms that provide insightful information about the status of a mission. In a setting where time and resources are limited, trajectory classification plays a vital role in mission monitoring and failure detection. In this context, we use navigational data to interpret trajectory patterns and classify them. We implement Long Short-Term Memory (LSTM) based Recurrent Neural Networks (RNNs) that learn the most commonly used survey trajectory patterns from surveys executed by two types of Autonomous Underwater Vehicles (AUVs). We compare the performance of our network against baseline machine learning methods.
 

The paper proposes the ScatterNet Hybrid Deep Learning (SHDL) network, which extracts invariant and discriminative image representations for object recognition. The SHDL framework is constructed with a multi-layer ScatterNet front-end, an unsupervised learning middle module, and a supervised learning back-end module. Each layer of the SHDL network is automatically designed as an explicit optimization problem, leading to an optimal deep learning architecture with improved computational performance compared to more standard deep network architectures. The SHDL network produces state-of-the-art classification performance compared with unsupervised and semi-supervised learning (GANs) on two image datasets. Advantages of the SHDL network over supervised methods (NIN, VGG) are also demonstrated with experiments performed on training datasets of reduced size.
 
Improving Image Classification With Frequency Domain Layers For Feature Extraction
José Augusto Stuchi, Marcus A. Angeloni, Rodrigo De Freitas Pereira, Levy Boccato, Guilherme Folego, Paulo Victor De Souza Prado, Romis Attux

Machine learning has seen increasingly widespread use in recent years. Great improvements, especially in deep neural networks, have helped to boost the achievable performance in computer vision and signal processing applications. Although different techniques have been applied in deep architectures, the frequency domain has not been thoroughly explored in this field. In this context, this paper presents a new method for extracting discriminative features based on Fourier analysis. The proposed frequency extractor layer can be combined with deep architectures in order to improve image classification. Computational experiments were performed on the face liveness detection problem, yielding better results than those presented in the literature for the grandtest protocol of the Replay-Attack Database. This paper also aims to raise the discussion on how frequency domain layers can be used in deep architectures to further improve network performance.
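A frequency-domain feature layer of the general kind discussed above can be as simple as taking the log-magnitude of the 2D FFT and keeping a low-frequency band; this NumPy sketch is purely illustrative (the band size and normalization are assumptions, not the paper's exact layer).

import numpy as np

def frequency_features(image, keep=16):
    spectrum = np.fft.fftshift(np.fft.fft2(image))   # centred 2D spectrum
    mag = np.log1p(np.abs(spectrum))                 # log-magnitude
    c0, c1 = mag.shape[0] // 2, mag.shape[1] // 2
    half = keep // 2
    return mag[c0 - half:c0 + half, c1 - half:c1 + half]  # low-frequency band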
 
Object Classification With Convolution Neural Network Based On The Time-Frequency Representation Of Their Echo
Mariia Dmitrieva, Matias Valdenegro-Toro, Keith Brown, Gary Heald, David Lane

This paper presents the classification of spherical objects with different physical properties. The classification is based on the energy distribution in wideband pulses that have been scattered from the objects. The echo is represented in the Time-Frequency Domain (TFD), using the Short Time Fourier Transform (STFT) with different window lengths, and is fed into a Convolution Neural Network (CNN) for classification. The results for different window lengths are analysed to study the influence of time and frequency resolution on classification. The CNN achieves the best results, with an accuracy of (98.44 ± 0.8)% over 5 object classes, when trained on grayscale TFD images with a 0.1 ms STFT window length. The CNN is compared with a Multilayer Perceptron classifier, a Support Vector Machine, and Gradient Boosting.
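The time-frequency images fed to the CNN can be produced with a standard STFT; the SciPy sketch below (with an assumed sampling-rate argument) shows how the window length studied above enters the representation.

import numpy as np
from scipy.signal import stft

def tfd_image(echo, fs, window_ms=0.1):
    nperseg = max(8, int(fs * window_ms * 1e-3))   # window length in samples
    _, _, Z = stft(echo, fs=fs, nperseg=nperseg)
    img = np.log1p(np.abs(Z))                      # log-magnitude TFD
    return img / img.max()                         # grayscale image for the CNN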
 
Compact Kernel Classifiers Trained With Minimum Classification Error Criterion
Ryoma Tani, Hideyuki Watanabe, Shigeru Katagiri, Miho Ohsaki

Unlike the Support Vector Machine (SVM), Kernel Minimum Classification Error (KMCE) training frees kernels from training samples and jointly optimizes weights and kernel locations. Focusing on this feature of KMCE training, we propose a new method for developing compact (small-scale but highly accurate) kernel classifiers by applying KMCE training to support vectors (SVs) that are selected (based on the weight vector norm) from the original SVs produced by the Multi-class SVM (MSVM). We evaluate our proposed method in four classification tasks and clearly demonstrate its effectiveness: only a 3% drop in classification accuracy (from 91.9 to 89.1%) with just 10% of the original SVs. In addition, we mathematically reveal that the value of MSVM's kernel weight indicates the geometric relation between a training sample and the margin boundaries.
18:40-21:00Kabayama/Matsumoto Room, ground floorBanquet


Thursday September 28, 2017
8:30-12:00Foyer, ground floorRegistration
9:00-10:00Iwasaki Koyata Memorial Hall, ground floorKeynote Lecture: Cosmology and Fundamental Physics with Big Astronomical Data

Naoki Yoshida, University of Tokyo, Japan

Chair: Naonori Ueda, NTT Communication Science Laboratories, Japan

10:00-10:30Room 3, ground floorCoffee Break
10:30-12:30Iwasaki Koyata Memorial Hall, ground floorLecture Session 5: Special session on deep learning for speech enhancement
Chair: Jun Du, University of Science and Technology of China, China

10:30

In this paper we propose to use utterance-level Permutation Invariant Training (uPIT) for speaker independent multi-talker speech separation and denoising, simultaneously. Specifically, we train deep bi-directional Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) using uPIT, for single-channel speaker independent multi-talker speech separation in multiple noisy conditions, including both synthetic and real-life noise signals. We focus our experiments on generalizability and noise robustness of models that rely on various types of a priori knowledge e.g. in terms of noise type and number of simultaneous speakers. We show that deep bi-directional LSTM RNNs trained using uPIT in noisy environments can improve the Signal-to-Distortion Ratio (SDR) as well as the Extended Short-Time Objective Intelligibility (ESTOI) measure, on the speaker independent multi-talker speech separation and denoising task, for various noise types and Signal-to-Noise Ratios (SNRs). Specifically, we first show that LSTM RNNs can achieve large SDR and ESTOI improvements, when evaluated using known noise types, and that a single model is capable of handling multiple noise types with only a slight decrease in performance. Furthermore, we show that a single LSTM RNN can handle both two-speaker and three-speaker noisy mixtures, without a priori knowledge about the exact number of speakers. Finally, we show that LSTM RNNs trained using uPIT generalize well to noise types not seen during training.
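The utterance-level PIT criterion itself is compact: one speaker permutation is chosen per utterance by minimizing the total reconstruction error. A minimal PyTorch sketch (brute force over permutations, which is practical for two or three speakers):

import itertools
import torch

def upit_loss(estimates, targets):
    # estimates, targets: tensors of shape (n_speakers, time, freq)
    n_spk = estimates.shape[0]
    losses = []
    for perm in itertools.permutations(range(n_spk)):
        mse = sum(torch.mean((estimates[i] - targets[p]) ** 2)
                  for i, p in enumerate(perm))
        losses.append(mse / n_spk)
    return torch.min(torch.stack(losses))   # best permutation for the whole utterance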
 
10:50
Semi-Blind Speech Enhancement Based On Recurrent Neural Network For Source Separation And Dereverberation
Masaya Wake, Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

This paper describes a semi-blind speech enhancement method using a semi-blind recurrent neural network (SB-RNN) for human-robot speech interaction. When a robot interacts with a human using speech signals, the robot receives not only audio signals recorded by its own microphone but also the speech signals produced by the robot itself, which can be used for semi-blind speech enhancement. The SB-RNN consists of two cascaded modules: a semi-blind source separation module and a blind dereverberation module. Each module has a recurrent layer to capture the temporal correlations of speech signals. The SB-RNN is trained in a multi-task learning manner, i.e., isolated echoic speech signals are used as teacher signals for the output of the separation module, in addition to isolated anechoic signals for the output of the dereverberation module. Experimental results showed that the source-to-distortion ratio was improved by 2.30 dB on average compared to a conventional method based on semi-blind independent component analysis. The results also showed the effectiveness of the modularization of the network, multi-task learning, the recurrent structure, and semi-blind source separation.
 
11:10

Recurrent neural networks (RNNs) based on long short-term memory (LSTM) have been successfully developed for single-channel source separation. Temporal information is learned by using dynamic states which evolve through time and are stored as an internal memory. The performance of source separation is constrained by the limitation of the internal memory, which cannot sufficiently preserve the long-term characteristics of different sources. This study deals with this limitation by incorporating an external memory in the RNN and accordingly presents a memory-augmented neural network for source separation. In particular, we employ a neural Turing machine to learn a separation model for sequential signals of speech and noise in the presence of different speakers and noise types. Experiments show that speech enhancement based on the memory-augmented neural network consistently outperforms that using a deep neural network or an LSTM in terms of the short-time objective intelligibility measure.
 
11:30

Recently we proposed a novel multichannel end-to-end speech recognition architecture that integrates the components of multichannel speech enhancement and speech recognition into a single neural-network-based architecture, and demonstrated its fundamental utility for automatic speech recognition (ASR). However, the behavior of the proposed integrated system remains insufficiently clarified. An open question is whether the speech enhancement component really gains speech enhancement (noise suppression) ability, because it is optimized based on end-to-end ASR objectives instead of speech enhancement objectives. In this paper, we address this question by conducting systematic evaluation experiments using the CHiME-4 corpus. We first show that the integrated end-to-end architecture successfully obtains adequate speech enhancement ability that is superior to that of a conventional alternative (a delay-and-sum beamformer), as measured by two signal-level metrics: the signal-to-distortion ratio and the perceptual evaluation of speech quality. Our findings suggest that to further increase the performance of the integrated system, we must boost the power of the latter-stage speech recognition component; however, the amount of available multichannel noisy speech data is insufficient. Motivated by this, we next investigate the effect of using a large amount of single-channel clean speech data, e.g., the WSJ corpus, for additional training of the speech recognition component. We also show that our approach with clean speech significantly improves the total performance of the multichannel end-to-end architecture on multichannel noisy ASR tasks.
 
11:50

We first examine the generalization issue with the noise samples used in training nonlinear mapping functions between noisy and clean speech features for deep neural network (DNN) based speech enhancement. Then an empirical proof is established to explain why the DNN-based approach has good noise generalization capability provided that a large collection of noise types is included in generating diverse noisy speech samples for training. It is shown that an arbitrary noise signal segment can be well represented by a linear combination of microstructure noise bases. Accordingly, we propose to generate the mixing noise signals by designing a set of compact and analytic noise bases without using any realistic noise types. The experiments demonstrate that this noise generation scheme can yield performance comparable to that obtained using 50 real noise types. Furthermore, by supplementing the collected noise types with the synthesized noise bases, we observe remarkable performance improvements, implying not only that the need for a large collection of real-world noise signals can be alleviated, but also that a good noise generalization capability can be achieved.
 
12:10

This paper aims to address two issues in current speech enhancement methods: 1) the difficulty of phase estimation; 2) a single objective function cannot consider multiple metrics simultaneously. To solve the first problem, we propose a novel convolutional neural network (CNN) model for complex spectrogram enhancement, namely estimating clean real and imaginary (RI) spectrograms from noisy ones. The reconstructed RI spectrograms are directly used to synthesize enhanced speech waveforms. In addition, since the log-power spectrogram (LPS) can be represented as a function of the RI spectrograms, its reconstruction is also considered as another target. Thus a unified objective function, which combines these two targets (reconstruction of RI spectrograms and LPS), is equivalent to simultaneously optimizing two commonly used objective metrics: segmental signal-to-noise ratio (SSNR) and log-spectral distortion (LSD). Therefore, the learning process is called multi-metrics learning (MML). Experimental results confirm the effectiveness of the proposed CNN with RI spectrograms and MML in terms of improved standardized evaluation metrics on a speech enhancement task.
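The combined objective can be written as a weighted sum of an RI-spectrogram term and an LPS term computed from the same RI estimates; a PyTorch sketch with an assumed weighting factor alpha (illustrative, not the paper's exact formulation):

import torch

def multi_metrics_loss(ri_est, ri_ref, alpha=0.5, eps=1e-8):
    # ri_est, ri_ref: tensors of shape (2, time, freq); index 0 = real, 1 = imaginary
    ri_term = torch.mean((ri_est - ri_ref) ** 2)
    lps_est = torch.log(ri_est[0] ** 2 + ri_est[1] ** 2 + eps)   # LPS is a function of RI
    lps_ref = torch.log(ri_ref[0] ** 2 + ri_ref[1] ** 2 + eps)
    lps_term = torch.mean((lps_est - lps_ref) ** 2)
    return alpha * ri_term + (1 - alpha) * lps_term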
12:30-13:45Kabayama/Matsumoto Room, ground floorLunch
13:45-14:00Iwasaki Koyata Memorial Hall, ground floorSupporter presentation: Bridgestone Corporation
14:00-16:00Iwasaki Koyata Memorial Hall, ground floorLecture Session 6: Special session on new extensions and applications of non-negative audio modeling
Chair: Hirokazu Kameoka, NTT Communication Science Laboratories, Japan and Alexey Ozerov, Technicolor, France

14:00

While time-frequency masking is a powerful approach for speech enhancement in terms of signal recovery accuracy (e.g., signal-to-noise ratio), it can over-suppress and damage speech components, leading to limited performance of succeeding speech processing systems. To overcome this shortcoming, this paper proposes a method to restore missing components of time-frequency masked speech spectrograms based on direct estimation of a time-domain signal. The proposed method allows us to take into account the local interdependencies of the elements of the complex spectrogram, derived from the redundancy of a time-frequency representation, as well as the global structure of the magnitude spectrogram. The effectiveness of the proposed method is demonstrated through experimental evaluation, using spectrograms filtered with masks for the enhancement of noisy speech. Experimental results show that the proposed method significantly outperforms conventional methods, and has the potential to estimate both phase and magnitude spectra simultaneously and precisely.
 
14:20

This paper presents an accelerated version of positive semidefinite tensor factorization (PSDTF) for blind source separation. PSDTF works better than nonnegative matrix factorization (NMF) by dropping the arguable assumption that audio signals can be whitened in the frequency domain by using the short-time Fourier transform (STFT). Indeed, this assumption only holds true in an ideal situation where each frame is infinitely long and the target signal is completely stationary in each frame. PSDTF thus deals with full covariance matrices over frequency bins instead of forcing them to be diagonal as in NMF. Although PSDTF significantly outperforms NMF in terms of separation performance, it suffers from a heavy computational cost due to the repeated inversion of large covariance matrices. To solve this problem, we propose an intermediate model based on diagonal plus low-rank covariance matrices and derive an expectation-maximization (EM) algorithm for efficiently updating the parameters of PSDTF. Experimental results showed that our method can dramatically reduce the complexity of PSDTF by several orders of magnitude without a significant decrease in separation performance.
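The computational saving of a diagonal-plus-low-rank covariance comes from the Woodbury identity, which avoids forming and inverting the full matrix; a NumPy sketch for C = diag(d) + U U^T with rank r much smaller than the number of frequency bins F:

import numpy as np

def diag_plus_lowrank_inverse(d, U):
    # d: positive diagonal entries, shape (F,); U: low-rank factor, shape (F, r)
    Dinv = 1.0 / d
    K = np.eye(U.shape[1]) + (U.T * Dinv) @ U             # small (r, r) system
    middle = np.linalg.solve(K, U.T * Dinv)               # K^{-1} U^T D^{-1}
    return np.diag(Dinv) - (Dinv[:, None] * U) @ middle   # Woodbury identity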
 
14:40
Independent Low-Rank Matrix Analysis Based On Complex Student's T-Distribution For Blind Source Separation
Shinichi Mogami, Daichi Kitamura, Yoshiki Mitsui, Norihiro Takamune, Hiroshi Saruwatari, Nobutaka Ono

In this paper, we generalize a source generative model in a state-of-the-art blind source separation (BSS), independent low-rank matrix analysis (ILRMA). ILRMA is a unified method of frequency-domain independent component analysis and nonnegative matrix factorization and can provide better performance for audio BSS tasks. To further improve the performance and stability of the separation, we introduce an isotropic complex Student’s t-distribution as a source generative model, which includes the isotropic complex Gaussian distribution used in conventional ILRMA. Experiments are conducted using both music and speech BSS tasks, and the results show the validity of the proposed method.
 
15:00
Neural Network Alternatives To Convolutive Audio Models For Source Separation
Shrikant Venkataramani, Cem Subakan, Paris Smaragdis

The convolutive non-negative matrix factorization (NMF) model factorizes a given audio spectrogram using frequency templates with a temporal dimension. In this paper, we present a convolutional auto-encoder model that acts as a neural network alternative to convolutive NMF. Using the modeling flexibility granted by neural networks, we also explore the idea of using a recurrent neural network in the encoder. Experimental results on speech mixtures from the TIMIT dataset indicate that the convolutive architecture provides a significant improvement in separation performance in terms of BSS_eval metrics.
 
15:20

This paper introduces the use of representations based on nonnegative matrix factorization (NMF) to train deep neural networks with applications to environmental sound classification. Deep learning systems for sound classification usually rely on the network to learn meaningful representations from spectrograms or hand-crafted features. Instead, we introduce a NMF-based feature learning stage before training deep networks, whose usefulness is highlighted in this paper, especially for multi-source acoustic environments such as sound scenes. We rely on two established unsupervised and supervised NMF techniques to learn better input representations for deep neural networks. This will allow us, with simple architectures, to reach competitive performance with more complex systems such as convolutional networks for acoustic scene classification. The proposed systems outperform neural networks trained on time-frequency representations on two acoustic scene classification datasets as well as the best systems from the 2016 DCASE challenge.
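In its simplest unsupervised form, the NMF feature-learning stage factorizes the magnitude spectrogram and hands the per-frame activations to the classifier; an illustrative scikit-learn sketch (component count and initialization are assumptions):

import numpy as np
from sklearn.decomposition import NMF

def nmf_features(spectrogram, n_components=64):
    # spectrogram: non-negative magnitude spectrogram, shape (freq, time)
    model = NMF(n_components=n_components, init='nndsvda', max_iter=400)
    W = model.fit_transform(spectrogram)   # spectral templates (freq, k)
    H = model.components_                  # activations (k, time)
    return H.T                             # one feature vector per frame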
 
15:40

We consider example-guided audio source separation approaches, where the audio mixture to be separated is supplied with source examples that are assumed to match the sources in the mixture in both frequency and time. These approaches have been successfully applied to tasks such as source separation by humming, score-informed music source separation, and music source separation guided by covers. Most of the proposed methods are based on nonnegative matrix factorization (NMF) and its variants, including methods using NMF models pre-trained from examples as an initialization of the mixture NMF decomposition, methods using those models as hyperparameters of priors of the mixture NMF decomposition, and methods using coupled NMF models. Moreover, those methods differ in the choice of the NMF divergence and the NMF prior. However, there has been no systematic comparison of all these methods. In this work, we compare existing methods and some new variants on the score-informed and cover-guided source separation tasks.
16:00-16:30Lecture Hall, 2nd floorCoffee Break
16:30-18:30Lecture Hall, 2nd floor
Poster Session 4: Applications of machine learning in speech, audio and music processing
Chair: Shinji Watanabe, Johns Hopkins University, USA

Inferring Room Semantics Using Acoustic Monitoring
Muhammad Ahmed Shah, Bhiksha Raj, Khaled Harras

Having knowledge of the environmental context of the user, i.e., the user's indoor location and the semantics of their environment, can facilitate the development of a host of location-aware applications. In this paper we propose an acoustic monitoring technique that infers semantic knowledge about an indoor space over time, using audio recordings from it. Our technique uses the impulse response of these spaces as well as the ambient sounds produced in them in order to determine a semantic label for them. As we process more recordings, we update our confidence in the assigned label. We evaluate our technique on a dataset of single-speaker human speech recordings obtained in different types of rooms on three university campuses. In our evaluations, the confidence for the true label generally outstripped the confidence for all other labels and in some cases converged to 100% with fewer than 30 samples.
 

This paper presents an automatic approach to parameter training for a previously published sparsity-based pitch estimation method. For this pitch estimation method, the harmonic dictionary is a key parameter that needs to be carefully prepared beforehand. In the original method, extensive human supervision and involvement are required to construct and label the dictionary. In this study, we propose to employ dictionary learning algorithms to learn the dictionary directly from training data. We apply and compare three typical dictionary learning algorithms, i.e., the method of optimized directions (MOD), K-SVD and online dictionary learning (ODL), and propose a post-processing method to label and adapt a learned dictionary for pitch estimation. Results show that MOD and properly initialized ODL (pi-ODL) can lead to dictionaries that exhibit the desired harmonic structures for pitch estimation, and the post-processing method can significantly improve the performance of the learned dictionaries in pitch estimation. The dictionary obtained with pi-ODL and post-processing attained pitch estimation accuracy close to the optimal performance of the manual dictionary. These results show that dictionary learning is feasible and promising for this application.
 
Local Gaussian Model With Source-Set Constraints In Audio Source Separation
Rintaro Ikeshita, Masahito Togami, Yohei Kawaguchi, Yusuke Fujita, Kenji Nagamatsu

To improve the performance of blind audio source separation of convolutive mixtures, the local Gaussian model (LGM) with full-rank covariance matrices proposed by Duong et al. is extended. The previous model basically assumes that all sources contribute to each time-frequency slot, which may fail to capture the characteristics of signals with many intermittent silent periods. A constraint on the source sets that contribute to each time-frequency slot is therefore explicitly introduced. This approach can be regarded as a relaxation of the sparsity constraint in the conventional time-frequency mask. The proposed model is jointly optimized over the original local Gaussian model parameters, the relaxed version of the time-frequency mask, and a permutation alignment, leading to a robust permutation-free algorithm. We also present a novel multichannel Wiener filter weighted by a relaxed version of the time-frequency mask. Experimental results on noisy speech signals show that the proposed model is effective compared with the original local Gaussian model and is comparable to its extension, multichannel nonnegative matrix factorization.
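For orientation, a mask-weighted multichannel Wiener filter of the general form used in local Gaussian modelling is sketched below for a single time-frequency slot (NumPy; the scalar masks stand in for the relaxed source-set weights described above, and the spatial covariances are assumed given):

import numpy as np

def masked_multichannel_wiener(x_ft, spatial_covs, masks):
    # x_ft: mixture vector at one time-frequency slot, shape (n_mics,)
    # spatial_covs: list of (n_mics, n_mics) source spatial covariance matrices
    # masks: list of scalars in [0, 1], one per source, for this slot
    mix_cov = sum(m * C for m, C in zip(masks, spatial_covs))
    return [m * C @ np.linalg.solve(mix_cov, x_ft)          # source image estimates
            for m, C in zip(masks, spatial_covs)]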
 
Infinite Probabilistic Latent Component Analysis For Audio Source Separation
Kazuyoshi Yoshii, Eita Nakamura, Katsutoshi Itoyama, Masataka Goto

This paper presents a statistical method of audio source separation based on a nonparametric Bayesian extension of probabilistic latent component analysis (PLCA). A major approach to audio source separation is to use nonnegative matrix factorization (NMF), which approximates the magnitude spectrum of a mixture signal at each frame as the weighted sum of a small number of source spectra. Another approach is to use PLCA, which regards the magnitude spectrogram as a two-dimensional histogram of "sound quanta" and assigns each quantum to one of the sources. While NMF has a physically natural interpretation, PLCA has been used successfully for music signal analysis. To enable PLCA to estimate the number of sources, we propose Dirichlet process PLCA (DP-PLCA) and derive two kinds of learning methods based on variational Bayes and collapsed Gibbs sampling. Unlike existing learning methods for nonparametric Bayesian NMF based on the beta or gamma processes (BP-NMF and GaP-NMF), our sampling method can efficiently search for the optimal number of sources without truncating the number of sources to be considered. Experimental results showed that DP-PLCA is superior to GaP-NMF in terms of source number estimation.
 
Mutual Singular Spectrum Analysis For Bioacoustics Classification
Bernardo Bentes Gatto, Juan Gabriel Colonna, Eulanda Miranda Dos Santos, Eduardo Freire Nakamura

Bioacoustic signal classification is an important instrument in environmental monitoring, as it provides the means to efficiently acquire information from areas that are often infeasible to approach. To address these challenges, bioacoustic signal classification systems should meet certain requirements, such as low computational resource demands. In this paper, we propose a novel bioacoustic signal classification method in which no preprocessing techniques are involved and which is able to match sets of signals. The advantages of our proposed method include a novel and compact representation for bioacoustic signals which is independent of the signal length. In addition, no preprocessing is required, such as segmentation, noise reduction or syllable extraction. We show that our method is theoretically and practically attractive through experimental results employing a publicly available bioacoustic signal dataset.
 

The non-negative matrix factorization (NMF) approach has been shown to work reasonably well for monaural speech enhancement tasks. This paper proposes addressing two shortcomings of the original NMF approach: (1) the objective functions for the basis training and separation (Wiener filtering) are inconsistent (the basis spectra are not trained so that the separated signal becomes optimal); (2) minimizing spectral divergence measures does not necessarily lead to an enhancement in the feature domain (e.g., cepstral domain) or in terms of perceived quality. To address the first shortcoming, we have previously proposed an algorithm for Discriminative NMF (DNMF), which optimizes the same objective for basis training and separation. To address the second shortcoming, we have previously introduced novel frameworks called the cepstral distance regularized NMF (CDRNMF) and mel-generalized cepstral distance regularized NMF (MGCRNMF), which aim to enhance speech both in the spectral domain and the feature domain. This paper proposes combining the goals of DNMF and MGCRNMF by incorporating the MGC regularizer into the DNMF objective function and proposes an algorithm for parameter estimation. The experimental results revealed that the proposed method outperformed the baseline approaches.
 

Recently, the minimum mean squared error (MMSE) has been a benchmark optimization criterion for deep neural network (DNN) based speech enhancement. In this study, a probabilistic learning framework to estimate the DNN parameters for single-channel speech enhancement is proposed. First, statistical analysis shows that the prediction error vector at the DNN output closely follows a unimodal density for each log-power spectral component. Accordingly, we present a maximum likelihood (ML) approach to DNN parameter learning by characterizing the prediction error vector as a multivariate Gaussian density with a zero mean vector and an unknown covariance matrix. It is demonstrated that the proposed learning approach can achieve a better generalization capability than MMSE-based DNN learning for unseen noise types, which can significantly reduce the speech distortions in low-SNR environments.
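Under a diagonal simplification of the error covariance, the ML criterion reduces to a Gaussian negative log-likelihood with learned per-bin variances; a PyTorch sketch of such a loss (an illustration of the idea, not the paper's exact multivariate formulation):

import torch

def gaussian_nll_loss(pred, target, log_var):
    # log_var: learned log-variance per log-power spectral bin
    err = target - pred
    return torch.mean(0.5 * (log_var + err ** 2 / torch.exp(log_var)))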
 
Learning Embeddings For Speaker Clustering Based On Voice Equality
Yanick Lukic, Carlo Vogt, Oliver Dürr, Thilo Stadelmann

Recent work has shown that convolutional neural networks (CNNs) trained in a supervised fashion for speaker identification are able to extract features from spectrograms which can be used for speaker clustering. These features are represented by the activations of a certain hidden layer and are called embeddings. However, previous approaches require plenty of additional speaker data to learn the embedding, and although the clustering results are then on par with more traditional approaches using MFCC features etc., room for improvement stems from the fact that these embeddings are trained with a surrogate task that is rather far removed from segregating unknown voices, namely identifying a few specific speakers. We address both problems by training a CNN to extract embeddings that are similar for equal speakers (regardless of their specific identity) using weakly labeled data. We demonstrate our approach on the well-known TIMIT dataset, which has often been used for speaker clustering experiments in the past. We exceed the clustering performance of all previous approaches, but require just 100 instead of 590 unrelated speakers to learn an embedding suited for clustering.
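The 'same voice or not' supervision can be expressed as a standard contrastive objective over pairs of embeddings; a PyTorch sketch with an assumed margin:

import torch
import torch.nn.functional as F

def pairwise_voice_loss(emb_a, emb_b, same_speaker, margin=1.0):
    # same_speaker: float tensor of 1s (same voice) and 0s (different voices)
    d = F.pairwise_distance(emb_a, emb_b)
    pull = same_speaker * d ** 2                                       # pull equal voices together
    push = (1 - same_speaker) * torch.clamp(margin - d, min=0.0) ** 2  # push others apart
    return torch.mean(pull + push)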
 

We aim to reduce the cost of sound monitoring for maintaining machinery by reducing the sampling rate, i.e., sub-Nyquist sampling. Monitoring based on sub-Nyquist sampling requires two sub-systems: an on-site sub-system for sampling machinery sounds at a low rate and an off-site sub-system for detecting anomalies from the subsampled signal. This paper proposes a method for achieving both sub-systems. First, the proposed method uses non-uniform sampling to encode frequency components above the Nyquist frequency. Second, the method applies a long short-term memory (LSTM) based autoencoder network for detecting anomalies. The novelty of the proposed network is that the subsampled time-domain signal is demultiplexed and received as input in an end-to-end manner, enabling anomaly detection from the subsampled signal. Experimental results indicate that our method is suitable for anomaly detection from the subsampled signal.
 
Renyi Entropy Based Mutual Information For Semi-Supervised Bird Vocalization Segmentation
Anshul Thakur, Vinayak Abrol, Pulkit Sharma, Padmanabhan Rajan

In this paper we describe a semi-supervised algorithm to segment bird vocalizations using matrix factorization and Rényi entropy based mutual information. Singular value decomposition (SVD) is applied to pooled time-frequency representations of bird vocalizations to learn basis vectors. By utilizing only a few of the bases, a compact feature representation is obtained for the input test data. Rényi entropy based mutual information is calculated between the feature representations of consecutive frames. After some simple post-processing, a threshold is used to reliably distinguish bird vocalizations from other sounds. The algorithm is evaluated on field recordings of different bird species under different SNR conditions. The results highlight the effectiveness of the proposed method in all SNR conditions, its improvements over other methods, and its generality.
 
A Recurrent Encoder-Decoder Approach With Skip-Filtering Connections For Monaural Singing Voice Separation
Stylianos Ioannis Mimilakis, Konstantinos Drossos, Tuomas Virtanen, Gerald Schuller

The objective of deep learning methods based on encoder-decoder architectures for music source separation is to approximate either ideal time-frequency masks or spectral representations of the target music source(s). The spectral representations are then used to derive time-frequency masks. In this work we introduce a method to directly learn time-frequency masks from an observed mixture magnitude spectrum. We employ recurrent neural networks and train them using prior knowledge only of the magnitude spectrum of the target source. To assess the performance of the proposed method, we focus on the task of singing voice separation. The results from an objective evaluation show that our proposed method provides results comparable to deep learning based methods which operate over more complicated signal representations. Compared to previous methods that approximate time-frequency masks, our method improves the signal-to-distortion ratio by an average of 3.8 dB.
 
Speech Recognition Features Based On Deep Latent Gaussian Models
Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

This paper constructs speech features based on a generative model using a deep latent Gaussian model (DLGM), which is trained using the stochastic gradient variational Bayes (SGVB) algorithm and performs efficient approximate inference and learning with a directed probabilistic graphical model. The trained DLGM then generates Gaussian-distributed latent variables, which are used as new features for a deep neural network (DNN) acoustic model. Here we compare our results with and without features transformed by the DLGM and also observe the benefits of combining both the proposed and original features in a single DNN. Our experimental results show that the proposed features using the DLGM improved the ASR performance. Furthermore, the DNN acoustic model which combined the proposed and original features gave the best performance.
18:30-19:00Iwasaki Koyata Memorial Hall, ground floorEnd of the conference