
Assistive Technology Machine Translation
The Integrated Language Technologies (ILT) 1.9 work package "Assistive Technologies" is a project within the Centre for Next Generation Localisation (CNGL), a Centre for Science, Engineering and Technology (CSET) funded by SFI for 2008-2013. Research will focus on the use of language technology in general, and translation in particular, to assist patients with limited English when communicating with healthcare professionals such as doctors and medical secretaries. The project has a twofold focus: patients from linguistic minorities and Deaf native users of Irish Sign Language. We will be focussing on scenarios that take place in clinic and hospital reception areas, such as appointment scheduling, registering for treatment, collecting prescriptions and leaving samples.

Rough overview of past work (PhD 2004-2008)

Sign languages (SLs) are the first and preferred languages of the Deaf Community worldwide. As with other minority languages, they are often poorly resourced and in many cases lack political and social recognition. Like speakers of other minority languages, Deaf people are often required to access documentation or communicate in a language that is not natural to them. In an attempt to alleviate this problem, we are developing an example-based machine translation (EBMT) system to allow Deaf people to access information in the language of their choice. While some research exists on translating between spoken and sign languages, we believe ours is the first attempt to tackle this problem using an EBMT approach.

An EBMT approach requires a bilingual data set that is aligned at both the sentence and sub-sentence level using a predefined method. The lack of a formally adopted, or even recognised, writing system for SLs makes it difficult to find a dataset suited to our method. Of the few transcription methods available, we have chosen to use annotated video data to construct our bilingual corpus. An example of such data may be seen below, where the video of the SL utterances appears in the upper left corner and the corresponding annotations are presented horizontally beneath it, aligned to a timeline.
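As a rough illustration of what time-aligned annotation data can look like in code, the sketch below models each annotation as a labelled span on a named tier and looks up everything that is active at a given moment. The class name `Annotation`, the tier names, the helper `active_at` and the sample glosses are all invented for illustration; they are assumptions, not the format of the corpus described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One labelled span on a tier of the timeline (hypothetical layout)."""
    tier: str      # e.g. "gloss_rh", "gloss_lh", "nmf" (assumed tier names)
    start_ms: int  # start of the span, in milliseconds
    end_ms: int    # end of the span (exclusive), in milliseconds
    value: str     # gloss or non-manual feature label


def active_at(annotations, t_ms):
    """Return the annotations whose span contains t_ms, sorted by tier name."""
    return sorted(
        (a for a in annotations if a.start_ms <= t_ms < a.end_ms),
        key=lambda a: a.tier,
    )


# Invented sample data: two right-hand glosses with a non-manual
# feature (raised eyebrows) spanning both of them.
timeline = [
    Annotation("gloss_rh", 0, 400, "FLIGHT"),
    Annotation("gloss_rh", 450, 900, "ARRIVE"),
    Annotation("nmf", 0, 900, "brows-raised"),
]

for ann in active_at(timeline, 500):
    print(ann.tier, ann.value)
```

Keeping every tier on a shared millisecond timeline is what makes it possible to ask which glosses and non-manual features co-occur, without needing a written form of the sign language.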

The annotations are composed of a gloss for the articulations of the right and left hands, with the possibility of including non-manual feature (NMF) details such as head nods and eyebrow movement that can alter the semantics of a sentence. One of the main advantages of using annotated data is that all features (i.e. glosses, NMFs and phonetic descriptions of the signs in terms of handshape, orientation, etc.) can be included and temporally aligned. This allows the annotations to be bound together according to their time frames to form chunks that can correspond to the chunks formed on the spoken language side of the text. The Marker Hypothesis is used to chunk the spoken language side of the texts. Despite the different chunking methods, manual examination of both chunk sets showed that a large number of potentially alignable chunks are produced.
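The two chunking steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the marker word list `MARKERS` is a small invented sample of closed-class words, the rule that a chunk must contain at least one non-marker word is a common convention in marker-based chunking, and binding annotations by merging overlapping time spans is one plausible reading of "bound together according to their time frames". The function names and sample data are hypothetical.

```python
# Illustrative closed-class "marker" words (an assumed sample, not the real list).
MARKERS = {"the", "a", "an", "from", "to", "in", "on", "at",
           "and", "or", "when", "what", "which"}


def marker_chunks(sentence, markers=MARKERS):
    """Split a sentence at marker words; each chunk keeps >= 1 non-marker word."""
    chunks, current, has_content = [], [], False
    for word in sentence.lower().split():
        # A marker opens a new chunk only once the current one has content.
        if word in markers and has_content:
            chunks.append(current)
            current, has_content = [], False
        current.append(word)
        if word not in markers:
            has_content = True
    if current:
        if chunks and not has_content:
            chunks[-1].extend(current)  # trailing markers join the last chunk
        else:
            chunks.append(current)
    return [" ".join(c) for c in chunks]


def bind_by_time(annotations):
    """Group (start_ms, end_ms, tier, value) tuples whose time spans overlap."""
    groups, group_end = [], None
    for ann in sorted(annotations):
        start, end = ann[0], ann[1]
        if groups and start < group_end:
            groups[-1].append(ann)          # overlaps the current group
            group_end = max(group_end, end)
        else:
            groups.append([ann])            # gap in time: start a new group
            group_end = end
    return groups


# Invented sample data for both sides of a sentence pair.
spoken = marker_chunks("When does the flight from Dublin arrive")
signed = bind_by_time([
    (0, 400, "gloss", "FLIGHT"),
    (0, 400, "nmf", "brows-raised"),
    (450, 900, "gloss", "ARRIVE"),
])
print(spoken)
print([[a[3] for a in group] for group in signed])
```

The two sides are segmented by unrelated criteria (closed-class words on one side, time overlap on the other), so the resulting chunks still have to be aligned with each other in a separate step.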

We have developed an EBMT system using data in Dutch Sign Language/Nederlandse Gebarentaal (NGT). The dataset consists of only 561 sentences of poetry and children's fables, a domain poorly suited to machine translation. For this reason, we have created a dataset of Irish Sign Language (ISL) videos with corresponding annotations that is three times the size of the NGT corpus and covers the better-suited closed domain of flight information queries.

Currently, output is in the form of the SL video annotations. In future work, we intend to use the phonetic details added to the annotations, in combination with the glosses and NMFs, to automatically produce sign language using a signing avatar like the one below.