Andrew Errity

School Of Computing, Dublin City University.

Bibliography

[1] A. Errity and J. McKenna. A comparison of linear and nonlinear dimensionality reduction methods applied to synthetic speech. In Proc. of Interspeech 2009 - Eurospeech, pages 1095-1098, Brighton, UK, September 2009.
In this study a number of linear and nonlinear dimensionality reduction methods are applied to high dimensional representations of synthetic speech to produce corresponding low dimensional embeddings. Several important characteristics of the synthetic speech, such as formant frequencies and f0, are known and controllable prior to dimensionality reduction. The degree to which these characteristics are retained after dimensionality reduction is examined in visualisation and classification experiments. Results of these experiments indicate that each method is capable of discovering meaningful low dimensional representations of synthetic speech and that the nonlinear methods may outperform linear methods in some cases.
[2] A. Errity and J. McKenna. A comparative study of linear and nonlinear dimensionality reduction for speaker identification. In Proc. of the 15th Int. Conf. on Digital Signal Processing (DSP), pages 587-590, Cardiff, Wales, July 2007.
In this paper we apply linear and nonlinear dimensionality reduction methods to speech produced by a number of different speakers in an effort to yield low dimensional features capable of discriminating between speakers. The classical linear dimensionality reduction method, principal component analysis (PCA), and the nonlinear manifold learning method, Isomap, are investigated. The resulting features are evaluated in GMM-based speaker identification experiments and compared to conventional cepstral features. Isomap is shown to give the highest accuracy for very low dimensions, outperforming MFCCs and PCA transformed features. Isomap is shown to be useful for visualisation of speaker clusters. For higher dimensions, speaker identification results indicate that features resulting from PCA offer improvements over conventional MFCCs.
[3] A. Errity and J. McKenna. An investigation of manifold learning for speech analysis. In Proc. of the Int. Conf. on Spoken Language Processing (Interspeech 2006 - ICSLP), pages 2506-2509, Pittsburgh PA, USA, September 2006.
Due to the physiological constraints of articulatory motion the speech apparatus has limited degrees of freedom. As a result, the range of speech sounds a human is capable of producing may lie on a low dimensional submanifold of the high dimensional space of all possible sounds. In this study a number of manifold learning algorithms are applied to speech data in an effort to extract useful low dimensional structure from the high dimensional speech signal. The ability of these manifold learning algorithms to separate vowels in a low dimensional space is evaluated and compared to a classical linear dimensionality reduction method. Results indicate that manifold learning algorithms outperform classical methods in low dimensions and are capable of discovering useful manifold structure in speech data.
Keywords: ISOMAP, LLE, nonlinear dimensionality reduction, speech dimensionality
[4] A. Errity, J. McKenna, and S. Isard. Unscented Kalman filtering of line spectral frequencies. In Proc. of the Int. Conf. on Spoken Language Processing (Interspeech 2004 - ICSLP), pages 2697-2700, Jeju, Korea, October 2004.
We propose a new method for estimating Line Spectral Frequency (LSF) trajectories that uses unscented Kalman filtering (UKF). This method is based upon an iterative Expectation Maximisation (EM) approach in which LSF estimates are generated during a forward pass and then smoothed during a backward pass. The EM approach also provides re-estimated Kalman filter parameters for further forward-backward passes that improve estimation. This approach exploits the non-independence of neighbouring spectra. We estimate LSFs as they have good interpolation and quantization properties. This allows us to estimate LSF trajectories that are guaranteed to result in stable filters. We analyse noisy synthetic speech using this technique. The results compare favourably with other methods.
[5] Andrew Errity, John McKenna, and Barry Kirkpatrick. Manifold learning-based feature transformation for phone classification. In Mohamed Chetouani, Amir Hussain, Bruno Gas, Maurice Milgram, and Jean-Luc Zarader, editors, Advances in Nonlinear Speech Processing, International Conference on Non-Linear Speech Processing, NOLISP 2007, Paris, France, May 22-25, 2007, Revised Selected Papers, volume 4885 of Lecture Notes in Computer Science, pages 132-141. Springer, 2007.
This paper investigates approaches for low dimensional speech feature transformation using manifold learning. It has recently been shown that speech sounds may exist on a low dimensional manifold nonlinearly embedded in high dimensional space. A number of techniques have been developed in recent years that attempt to discover the geometric structure of the underlying low dimensional manifold. The manifold learning techniques locally linear embedding and Isomap are considered in this study. The low dimensional feature representations produced by these techniques are applied to several phone classification tasks on the TIMIT corpus. Classification accuracy is analysed and compared to conventional MFCC features and to features transformed with PCA, a linear dimensionality reduction method. It is shown that features resulting from manifold learning are capable of yielding higher classification accuracy than these baseline features. The best phone classification accuracy in general is demonstrated by feature transformation with Isomap.
[6] A. Errity, J. McKenna, and B. Kirkpatrick. Manifold learning-based feature transformation for phone classification. In Proc. of the ISCA Tutorial and Research Workshop on Nonlinear Speech Processing (NOLISP), pages 43-46, Paris, France, May 2007.
This paper investigates approaches for low dimensional speech feature transformation using manifold learning. It has recently been shown that speech sounds may exist on a low dimensional manifold nonlinearly embedded in high dimensional space. A number of techniques have been developed in recent years that attempt to discover the geometric structure of the underlying low dimensional manifold. The manifold learning techniques locally linear embedding and Isomap are considered in this study. The low dimensional feature representations produced by these techniques are applied to several phone classification tasks on the TIMIT corpus. Classification accuracy is analysed and compared to conventional MFCC features and to features transformed with PCA, a linear dimensionality reduction method. It is shown that features resulting from manifold learning are capable of yielding higher classification accuracy than these baseline features. The best phone classification accuracy in general is demonstrated by feature transformation with Isomap.
[7] A. Errity, J. McKenna, and B. Kirkpatrick. Dimensionality reduction methods applied to both magnitude and phase derived features. In Proc. of Interspeech 2007 - Eurospeech, pages 1957-1960, Antwerp, Belgium, August 2007.
A number of previous studies have shown that speech sounds may have an intrinsic low dimensional structure. Such studies have focused on magnitude-based features ignoring phase information, as is the convention in many speech processing applications. In this paper dimensionality reduction methods are applied to MFCC and modified group delay function (MODGDF) features derived from the magnitude and phase spectrum, respectively. The low dimensional structure of these representations is examined and a method to combine these features is detailed. Results show that both magnitude and phase derived features have a low dimensional structure. MFCCs are found to offer higher accuracy than MODGDFs in phone classification tasks. Results indicate that combining MFCCs and MODGDFs gives improvements for phone classification. PCA is shown to be capable of efficiently combining MFCCs and MODGDFs for improved classification accuracy without large increases in feature dimensionality.
Keywords: manifold learning, dimensionality reduction, phase, modified group delay function
[8] B. Kirkpatrick, D. O'Brien, R. Scaife, and A. Errity. Spectral dynamics as a source of discontinuity in concatenative speech synthesis. In Proc. of the 15th Int. Conf. on Digital Signal Processing (DSP), pages 615-618, Cardiff, Wales, July 2007.
The quality of concatenative speech synthesis depends on the cost function employed for unit selection. Effective cost functions for spectral continuity have proven difficult to define and standard measures do not accurately reflect human perception of spectral discontinuity in concatenated speech. Previous studies on spectral join costs have focused predominantly on static spectral measures extracted from the unit boundary. In this paper spectral dynamic behaviour is investigated as a source of discontinuity in concatenated speech. A number of measures representing spectral dynamics are tested for the task of detecting discontinuities. The spectral dynamic measures tested contain information correlating with human perception of discontinuities, suggesting that spectral dynamics are a source of discontinuity in concatenated speech. A strategy to effectively combine dynamic and static measures is proposed using principal component analysis (PCA).
[9] B. Kirkpatrick, D. O'Brien, R. Scaife, and A. Errity. On the role of spectral dynamics in unit selection speech synthesis. In Proc. of Interspeech 2007 - Eurospeech, pages 2889-2892, Antwerp, Belgium, August 2007.