Barry Kirkpatrick
PhD Research Student, Speech Laboratory
RINCE, School of Computing, Dublin City University
Dublin 9, Ireland
Investigation Into New Distance Measures For Concatenative Speech Synthesis
In concatenative speech synthesis, the speech waveform is generated by sequentially concatenating
pre-recorded speech segments, referred to as acoustic units, which are normally chosen from a large
database of such units. The primary problem with this form of speech synthesis is selecting
a sequence of acoustic units that results in natural-sounding speech.
When acoustic units are concatenated, spectral discontinuities can occur across the boundary between
units. Such discontinuities give rise to undesirable acoustic artefacts in the synthesised
speech waveform, reducing the quality of the speech. The process of choosing acoustic units that
result in natural-sounding speech is referred to as the unit selection problem.
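As a rough illustration of where such a discontinuity arises, the sketch below joins two synthetic "units" end to end and compares the log-magnitude spectra of the frames on either side of the boundary. It is a minimal sketch under stated assumptions: Python with NumPy, a 16 kHz sampling rate, 256-sample Hann-windowed frames and a plain Euclidean distance. The function names (`concatenate_units`, `log_spectrum`, `boundary_discontinuity`) are hypothetical, and this is not the distance measure developed in this project.

```python
import numpy as np


def concatenate_units(units):
    """Build a waveform by sequentially joining units (no boundary smoothing)."""
    return np.concatenate(units)


def log_spectrum(frame, n_fft=512):
    """Log-magnitude spectrum of a Hann-windowed frame."""
    windowed = frame * np.hanning(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed, n=n_fft))
    return np.log(magnitude + 1e-10)  # small floor avoids log(0)


def boundary_discontinuity(left, right, frame_len=256):
    """Euclidean distance between the log spectra of the last frame of `left`
    and the first frame of `right`, i.e. measured across the join."""
    return float(np.linalg.norm(log_spectrum(left[-frame_len:]) -
                                log_spectrum(right[:frame_len])))


fs = 16000  # assumed sampling rate in Hz
n = np.arange(3200)
tone_250 = np.sin(2 * np.pi * 250 * n / fs)
tone_1500 = np.sin(2 * np.pi * 1500 * n / fs)

# Smooth join: two halves of one continuous 250 Hz tone.
smooth = boundary_discontinuity(tone_250[:1600], tone_250[1600:])
# Abrupt join: a 250 Hz unit followed by a 1500 Hz unit.
abrupt = boundary_discontinuity(tone_250[:1600], tone_1500[:1600])
waveform = concatenate_units([tone_250[:1600], tone_1500[:1600]])
print(f"smooth join: {smooth:.3f}, abrupt join: {abrupt:.3f}")
```

The smooth join yields a distance close to zero, while the abrupt 250 Hz to 1500 Hz join yields a much larger one. Practical join costs often use other spectral features, such as cepstral coefficients, rather than raw log spectra.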
To optimise the unit selection process, a distance measure is defined to quantify the discontinuity
between adjacent units, and units are then selected to minimise this measure. To date, no
distance measure has been developed that agrees consistently with human auditory perception
of discontinuity in synthesised speech.
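Selecting units that minimise a join distance can be sketched as a dynamic-programming (Viterbi-style) search over candidate units. This is a minimal sketch under stated assumptions: each candidate is represented by hypothetical feature vectors at its start and end, the join cost is the Euclidean distance between the end of one unit and the start of the next, and the target costs that full unit selection systems usually also include are left out. `select_units` is an illustrative name, not code from this project.

```python
import numpy as np


def select_units(candidates):
    """Choose one candidate per position so that the summed join cost is minimal.

    candidates: list (one entry per position) of lists of (start_feat, end_feat)
    pairs of equal-length NumPy vectors (hypothetical boundary features).
    Returns (chosen candidate index per position, total join cost).
    """
    best = np.zeros(len(candidates[0]))  # best cumulative cost per candidate
    backpointers = []
    for i in range(1, len(candidates)):
        prev_ends = np.array([end for _, end in candidates[i - 1]])
        cur_starts = np.array([start for start, _ in candidates[i]])
        # join[p, c]: Euclidean distance across the boundary from unit p to unit c
        join = np.linalg.norm(prev_ends[:, None, :] - cur_starts[None, :, :], axis=-1)
        total = best[:, None] + join
        backpointers.append(np.argmin(total, axis=0))  # best predecessor of each c
        best = np.min(total, axis=0)
    # Trace the lowest-cost path backwards from the final position.
    path = [int(np.argmin(best))]
    for bp in reversed(backpointers):
        path.append(int(bp[path[-1]]))
    return path[::-1], float(np.min(best))


def vec(x):
    """One-dimensional toy feature vector."""
    return np.array([x])


candidates = [
    [(vec(0.0), vec(1.0)), (vec(0.0), vec(5.0))],
    [(vec(1.1), vec(2.0)), (vec(4.9), vec(9.0))],
    [(vec(2.0), vec(0.0)), (vec(8.0), vec(0.0))],
]
path, cost = select_units(candidates)
print(path, round(cost, 3))  # [0, 0, 0] 0.1
```

Because the join cost depends only on adjacent pairs of units, dynamic programming finds the minimum-cost sequence in time linear in the number of positions, instead of enumerating every combination of candidates.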
The goal of this project is to develop a new objective distance measure that quantifies the degree of
discontinuity between adjacent units in a way that is consistent with the discontinuities perceived by a human listener.
This work aims to contribute to high-quality automatic speech synthesis and to a better understanding of
the human auditory system.
Project Supervisors:
Dr. Darragh O'Brien, School of Computing, Dublin City University.
Dr. Ronan Scaife, School of Electronic Engineering, Dublin City University.
Publications
-
A. Errity, J. McKenna, and B. Kirkpatrick, "Manifold learning-based feature transformation for phone classification,"
in M. Chetouani, A. Hussain, B. Gas, M. Milgram, and J.-L. Zarader (Eds.), Advances in Nonlinear Speech Processing,
Lecture Notes in Computer Science, vol. 4885, pp. 132-141. Springer, 2007.
-
B. Kirkpatrick, D. O'Brien and R. Scaife,
"Feature transformation applied to the detection of discontinuities in concatenated speech,"
in Proc. of the 6th ISCA Workshop on Speech Synthesis (SSW6), Bonn, Germany, August 2007.
-
B. Kirkpatrick, D. O'Brien, R. Scaife, and A. Errity,
"On the role of spectral dynamics in unit selection speech synthesis,"
in Proc. of Interspeech 2007 - Eurospeech, Antwerp, Belgium, August 2007.
-
A. Errity, J. McKenna, and B. Kirkpatrick,
"Dimensionality reduction methods applied to both magnitude and phase derived features,"
in Proc. of Interspeech 2007 - Eurospeech, Antwerp, Belgium, August 2007.
-
B. Kirkpatrick, D. O'Brien, R. Scaife, and A. Errity, "Spectral dynamics as a source of discontinuity in
concatenative speech synthesis,"
in Proc. of the 15th Int. Conf. on Digital Signal Processing (DSP), Cardiff, Wales, July 2007.
-
A. Errity, J. McKenna, and B. Kirkpatrick, "Manifold learning-based feature transformation for phone classification,"
in Proc. of the ISCA Tutorial and Research Workshop on Nonlinear Speech Processing (NOLISP), Paris, France, May 2007.
-
B. Kirkpatrick, D. O'Brien and R. Scaife,
"Feature extraction for spectral continuity measures in concatenative speech synthesis,"
in Proc. of the International Conference on Spoken Language Processing (ICSLP), Pittsburgh, PA, USA, September 2006.
-
B. Kirkpatrick, D. O'Brien and R. Scaife,
"A comparison of spectral continuity measures as a join cost in concatenative speech synthesis,"
in Proc. of the Irish Signals and Systems Conference (ISSC), Dublin, 2006.