GaelTech Project

I currently manage the GaelTech project at the ADAPT Centre (School of Computing) in DCU. This research programme is funded by the Department of Culture, Heritage and the Gaeltacht. The project focuses on the development of natural language processing (NLP) tools and resources for the Irish language. Two key motivations are (1) to provide training for a new generation of Irish language technologists and (2) to develop resources that help mitigate the risk of digital extinction of the Irish language. This is in line with the forthcoming Digital Plan for the Irish language and with the recommendations set out in the 2012 White Paper Irish in the Digital Age.

The project involves participation in the Universal Dependencies (UD) Project. Our aim is to ensure that the Irish Universal Dependency treebank is included in each UD release and grows in size, so that it is useful for parsing technologies and supports downstream tasks such as machine translation, grammar checking, text summarisation and language analysis tools. With its inclusion in the UD dataset, we hope that international researchers will use it in cross-lingual or multilingual studies.

Three main strands of research are:

  • Irish Parsing Research and Expansion of the Irish Dependency Treebank(s)
  • Automated Processing of Irish Multiword Expressions
  • NLP for Irish User-Generated Content (Twitter)

Projected outcomes of the project:

  • Irish Universal Dependency Treebank (~5,000 trees)
  • Annotation Guidelines for Irish parsing in UD
  • Improved models for automatic parsing of Irish text
  • Lexicon of Irish Multiword Expressions
  • MWE Identification tool for Irish
  • Treebank of Irish tweets (1,500+ tweets)
  • English/Irish Tweet part-of-speech tagger
  • Linguistic analysis of Irish-English code-switching online