Andrew Errity

Graduate Research Student, School Of Computing, Dublin City University.

Research Summary


Manifold learning for speech analysis

During speech production, the movements of the articulators (tongue, lips, etc.) are limited by constraints imposed by human physiology, so the speech production apparatus has only a limited number of degrees of freedom. Furthermore, the set of sounds used in human spoken communication is only a small subset of all producible sounds. These facts motivate the view that speech may be adequately represented using only a small number of parameters, and thus that speech has inherent low-dimensional structure. A number of dimensionality reduction methods capable of discovering such underlying structure have previously been applied to speech. However, speech may lie on a manifold nonlinearly embedded in high-dimensional space, which classic linear dimensionality reduction methods would be unable to discover. A number of manifold learning methods, also referred to as nonlinear dimensionality reduction methods, which aim to exploit such nonlinearities have recently been proposed. We apply these manifold learning methods to speech in order to explore and exploit this possible underlying manifold structure.

The manifold learning methods used in our investigations are locally linear embedding, Isomap, and Laplacian eigenmaps. The classic linear method, principal component analysis (PCA), is also applied so that linear and nonlinear methods can be compared. We apply these methods to a number of different high-dimensional feature representations of speech data; both conventional representations derived from the magnitude spectrum and less widely used representations derived from the phase spectrum are investigated. The resulting low-dimensional representations are analysed through visualisation, phone recognition, and speaker recognition experiments. The recognition experiments are primarily used to evaluate how much meaningful discriminatory information is contained in the low-dimensional representations produced by each method. They also serve to demonstrate the potential value of these methods in speech processing applications.
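
To illustrate the kind of comparison described above, the following is a minimal sketch that applies PCA and Laplacian eigenmaps to the same data using only NumPy. It is not the code used in our experiments: the helix data is a synthetic stand-in for speech feature vectors, and the function names, the binary k-nearest-neighbour graph, and the choice of `n_neighbors=6` are assumptions made for this example.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top principal directions (linear method)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def laplacian_eigenmaps(X, n_components, n_neighbors=6):
    """Embed the rows of X using a symmetric k-nearest-neighbour graph."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances; the diagonal is excluded.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)
    # Binary adjacency: connect each point to its nearest neighbours.
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)  # symmetrise the graph
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalised Laplacian; its eigenvectors v give solutions
    # y = D^{-1/2} v of the generalised problem L y = lambda D y.
    L_sym = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # Skip the first (trivial) eigenvector, which has eigenvalue zero.
    return d_inv_sqrt[:, None] * vecs[:, 1:n_components + 1]

# Synthetic stand-in for high-dimensional speech features: a helix, i.e. a
# one-dimensional manifold nonlinearly embedded in three dimensions.
t = np.linspace(0.0, 4.0 * np.pi, 200)
X = np.column_stack([np.cos(t), np.sin(t), 0.1 * t])

Y_pca = pca(X, 2)                 # linear projection
Y_le = laplacian_eigenmaps(X, 2)  # nonlinear embedding
```

In this toy case the first Laplacian eigenmaps coordinate varies smoothly along the helix parameter `t`, whereas a two-dimensional PCA projection mainly recovers the circular cross-section. Real speech features would of course be noisier and higher dimensional.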

We have found manifold learning methods to be capable of producing meaningful low-dimensional representations of speech data, suggesting that speech has low-dimensional manifold structure. In general, these methods were found to outperform PCA in low dimensions, indicating that speech may indeed lie on a manifold nonlinearly embedded in high-dimensional space.


Nonlinear filtering of speech

Nonlinear filtering algorithms allow one to estimate the underlying time-varying state of a system from measurements that are nonlinearly related to this state. We have developed an approach using one such algorithm, unscented Kalman filtering, to estimate features describing the vocal tract (line spectral frequencies) directly from the speech signal. Evaluation of this approach on synthetic vowel sounds has shown that it produces smooth and accurate estimates of the underlying feature trajectories, with performance comparable to existing methods.
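
The idea behind unscented Kalman filtering can be sketched on a toy problem. The example below tracks a single slowly varying scalar state from noisy, nonlinearly related measurements. It is only an illustration under stated assumptions, not our speech model: the random-walk state model, the measurement function `h`, the noise variances `q` and `r`, and the scaling parameter `kappa` are all hypothetical choices, and the line spectral frequency model itself is not reproduced here.

```python
import math
import random

def sigma_points(mean, var, kappa=2.0):
    """Sigma points and weights for a scalar Gaussian (state dimension n = 1)."""
    n = 1
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    return points, weights

def ukf_step(mean, var, y, f, h, q, r):
    """One predict/update cycle of a scalar unscented Kalman filter."""
    # Predict: propagate sigma points through the state transition f.
    pts, w = sigma_points(mean, var)
    fp = [f(p) for p in pts]
    m_pred = sum(wi * p for wi, p in zip(w, fp))
    v_pred = sum(wi * (p - m_pred) ** 2 for wi, p in zip(w, fp)) + q
    # Update: propagate fresh sigma points through the measurement function h.
    pts, w = sigma_points(m_pred, v_pred)
    hp = [h(p) for p in pts]
    y_pred = sum(wi * z for wi, z in zip(w, hp))
    s = sum(wi * (z - y_pred) ** 2 for wi, z in zip(w, hp)) + r
    c = sum(wi * (p - m_pred) * (z - y_pred) for wi, p, z in zip(w, pts, hp))
    gain = c / s
    return m_pred + gain * (y - y_pred), v_pred - gain * s * gain

# Hypothetical toy system (an assumption for illustration only).
f = lambda x: x                       # random-walk state transition
h = lambda x: x + 0.5 * math.sin(x)   # nonlinear measurement function
q, r = 1e-3, 1e-2                     # process and measurement noise variances

random.seed(0)
truth = [1.0 + 0.5 * math.sin(2.0 * math.pi * k / 100.0) for k in range(200)]
measurements = [h(x) + random.gauss(0.0, math.sqrt(r)) for x in truth]

mean, var = 0.0, 1.0                  # deliberately vague initial belief
estimates = []
for y in measurements:
    mean, var = ukf_step(mean, var, y, f, h, q, r)
    estimates.append(mean)

rmse = math.sqrt(sum((e - x) ** 2 for e, x in zip(estimates, truth)) / len(truth))
```

The filter never differentiates `h`; it only evaluates it at a few sigma points, which is what makes the unscented approach convenient when the measurement model is awkward to linearise.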