School of Computing RINCE Speech Processing Group Dublin City University


Barry Kirkpatrick

PhD Research Student
Speech Laboratory
RINCE
School of Computing
Dublin City University
Dublin 9
Ireland




Investigation Into New Distance Measures For Concatenative Speech Synthesis

In concatenative speech synthesis, the speech waveform is generated by sequentially concatenating pre-recorded speech segments, referred to as acoustic units, which are normally chosen from a large database of such units. The primary problem associated with this form of speech synthesis is selecting a sequence of acoustic units that results in natural-sounding speech; this is referred to as the unit selection problem. When acoustic units are concatenated, spectral discontinuities occur across the boundaries between units. Such discontinuities give rise to undesirable acoustic artefacts in the synthesised waveform, reducing the quality of the speech.

To optimise the unit selection process, a distance measure is defined to quantify the discontinuity between adjacent units. Units are then selected so as to minimise this measure. To date, no distance measure has been found to agree consistently with human auditory perception of discontinuity in synthesised speech.
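As a rough illustration of how a distance measure can drive unit selection, the sketch below scores each join by the Euclidean distance between the last feature frame of one unit and the first feature frame of the next, then uses dynamic programming to pick the candidate sequence with the lowest total join cost. The Euclidean distance, the representation of a unit as a list of feature vectors, and the names join_cost and select_units are assumptions made for this example only; this is not the measure developed in the project, and a practical system would typically also include a target cost.

```python
import math


def join_cost(left_frame, right_frame):
    """Euclidean distance between two feature vectors of equal length.

    Assumption: each frame is a list of spectral features (for example
    cepstral coefficients); the feature type is not fixed here.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(left_frame, right_frame)))


def select_units(candidates):
    """Choose one unit per position so that the summed join cost is minimal.

    candidates: list of positions; each position is a list of candidate
    units; each unit is a list of feature frames.
    Returns (total_cost, chosen_indices).
    """
    # best[j] holds (cost, path) of the cheapest sequence that ends with
    # candidate j at the current position.
    best = [(0.0, [j]) for j in range(len(candidates[0]))]

    for pos in range(1, len(candidates)):
        new_best = []
        for j, unit in enumerate(candidates[pos]):
            options = []
            for i, prev_unit in enumerate(candidates[pos - 1]):
                # Discontinuity across the boundary: last frame of the
                # previous unit against the first frame of this unit.
                cost = best[i][0] + join_cost(prev_unit[-1], unit[0])
                options.append((cost, best[i][1] + [j]))
            new_best.append(min(options, key=lambda t: t[0]))
        best = new_best

    return min(best, key=lambda t: t[0])


if __name__ == "__main__":
    # Toy data: one candidate for the first position, two for the second.
    first = [[[0.0, 0.0], [1.0, 1.0]]]
    second = [
        [[5.0, 5.0], [6.0, 6.0]],  # starts far from the end of the first unit
        [[1.0, 1.0], [2.0, 2.0]],  # starts exactly where the first unit ends
    ]
    total, path = select_units([first, second])
    print(total, path)
```

In the toy data, the second candidate at the second position starts on the same frame that the first unit ends on, so it is the one selected. Replacing join_cost with a perceptually motivated measure leaves the search unchanged, which is why the choice of distance measure is the central question.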

The goal of this project is to develop a new objective distance measure that quantifies the level of discontinuity between adjacent units in a manner consistent with the discontinuities perceived by a human listener. This will contribute to advancing automatic high-quality speech synthesis and to a better understanding of the human auditory system.


Project Supervisors

Dr. Darragh O'Brien, School of Computing, Dublin City University.
Dr. Ronan Scaife, School of Electronic Engineering, Dublin City University.

Publications

  1. A. Errity, J. McKenna, and B. Kirkpatrick, "Manifold learning-based feature transformation for phone classification," in Advances in Nonlinear Speech Processing, M. Chetouani, A. Hussain, B. Gas, M. Milgram, and J.-L. Zarader, Eds., Lecture Notes in Computer Science, vol. 4885, pp. 132-141, Springer, 2007.

  2. B. Kirkpatrick, D. O'Brien and R. Scaife, "Feature transformation applied to the detection of discontinuities in concatenated speech," in Proc. of the 6th ISCA Workshop on Speech Synthesis (SSW6), Bonn, Germany, August 2007.

  3. B. Kirkpatrick, D. O'Brien, R. Scaife, and A. Errity, "On the role of spectral dynamics in unit selection speech synthesis," in Proc. of Interspeech 2007 - Eurospeech, Antwerp, Belgium, August 2007.

  4. A. Errity, J. McKenna, and B. Kirkpatrick, "Dimensionality reduction methods applied to both magnitude and phase derived features," in Proc. of Interspeech 2007 - Eurospeech, Antwerp, Belgium, August 2007.

  5. B. Kirkpatrick, D. O'Brien, R. Scaife, and A. Errity, "Spectral dynamics as a source of discontinuity in concatenative speech synthesis," in Proc. of the 15th Int. Conf. on Digital Signal Processing (DSP), Cardiff, Wales, July 2007.

  6. A. Errity, J. McKenna, and B. Kirkpatrick, "Manifold learning-based feature transformation for phone classification," in Proc. of the ISCA Tutorial and Research Workshop on Nonlinear Speech Processing (NOLISP), Paris, France, May 2007.

  7. B. Kirkpatrick, D. O'Brien and R. Scaife, "Feature extraction for spectral continuity measures in concatenative speech synthesis," in Proc. of the International Conference on Spoken Language Processing (ICSLP), Pittsburgh, PA, USA, September 2006.

  8. B. Kirkpatrick, D. O'Brien and R. Scaife, "A comparison of spectral continuity measures as a join cost in concatenative speech synthesis," in Proc. of the Irish Signals and Systems Conference (ISSC), Dublin, Ireland, 2006.