TET 2020: 1st International Workshop on Terminology Extraction and Translation
Co-located with xyz
Terms are productive in nature, and new terms are being created all the time. A term can have multiple meanings depending on the context in which it appears. For example, the word `terminal' (`a bus terminal', `terminal disease' or `computer terminal') has very different meanings depending on its context. A polysemous term such as `terminal' can therefore have many translation equivalents in a target language. For example, the English word `charge' has more than twenty target equivalents in Hindi (e.g. `dam' for `value', `bhar' for `load', `bojh' for `burden'). In a judicial document, however, `charge' has to be translated as one particular Hindi word: `aarop'. The target translation can lose its meaning if term translation and domain knowledge are not taken into account. Accordingly, the preservation of domain knowledge from source to target is pivotal in any translation workflow (TW), and it is one of the primary concerns of customers in the translation industry.
For years, human translators have used terminologies to ensure translation consistency and reduce ambiguity, and terminology is arguably the most important external translational resource in any TW. The creation of (monolingual and bilingual) termbanks is a never-ending process, as new terms are always being produced. In this context, Terminology as a Service (TaaS), a European Commission project, aimed to address the need for instant access to the most up-to-date terms, user participation in the acquisition and sharing of multilingual terminological data, and efficient solutions for the reuse of terminology resources. Terminology management, acquisition of terminological resources (monolingual or multilingual), and term annotation are active areas of natural language processing (NLP) research.
As an example, automatic extraction of monolingual or bilingual terminology is a productive field of NLP research, be it from domain-specific parallel corpora (Haque et al., 2018) or comparable corpora (Terryn et al., 2019).
Multiword units (MWUs) are overwhelmingly present in terminology (Sag et al., 2002). Computational treatment of MWUs is pivotal in many NLP tasks (Haque et al., 2019). Accordingly, techniques for handling single-word and multiword terms are actively investigated in translation technology, machine translation (MT), recommender systems, information retrieval (IR) and other NLP tasks.
Annotation techniques have been widely studied in many areas of NLP. However, terminology annotation is a rarely investigated domain in NLP due to many challenges (Terryn et al., 2019). Terryn et al. (2019) presented studies relating to annotation and evaluation of multilingual automatic terminology extraction from comparable corpora. Pinnis et al. (2012) investigated term extraction, tagging and mapping techniques for under-resourced languages. In order to evaluate the quality of the bilingual terms in MT, Arčan et al. (2014) manually created a terminology gold standard for the IT domain. Haque et al. (2019) demonstrated a semi-automatic terminology annotation strategy from which a gold standard for evaluating terminology translation in automatic translation can be created.
Terminology translation plays a critical role in domain-specific MT. Over the long history of classical MT models, e.g. phrase-based statistical MT (PB-SMT) (Koehn et al., 2003), many studies have investigated the exploitation of terminological resources for improving MT quality. Even the PB-SMT decoder Moses offers a solution to term translation with its XML-markup approach. Integrating external knowledge, such as terminology, into neural MT (NMT) (Vaswani et al., 2017) is a challenging task due to its architecture and other reasons. Nevertheless, a few research works have looked into downstream applications of bilingual terminologies, such as their use in improving the quality of MT itself, e.g. guiding the NMT decoder to prioritize translation recommendations (Chatterjee et al., 2017), constrained decoding with translation candidates from a terminology (Hokamp and Liu, 2017), and instance-based adaptation (Farajian et al., 2018). As far as evaluation of term translation is concerned, no standard automatic MT evaluation metric (e.g. BLEU) can provide much information on how good or bad an MT system is at translating domain-specific expressions. In industrial TWs, translation service providers generally hire human experts in the domain concerned to identify term translation problems in MT. Such a process is expensive and time-consuming. Moreover, in an industrial setting, customer-specific MT engines are retrained from scratch quite often, namely whenever a reasonable amount of new training data pertaining to the domain and style on which the MT system was built, or a new state-of-the-art MT technique, becomes available. Carrying out human evaluation of term translation from scratch each time an MT system is updated would be exorbitantly expensive in a commercial context. Researchers have started to explore terminology translation evaluation in MT (e.g. Haque et al., 2019).
Effective solutions to the problem of terminology translation evaluation would certainly aid MT users who want to assess their MT systems quickly in the area of domain-specific term translation.
With this workshop, we attempt to bring the topics discussed above and the wider related domains together under one umbrella. The workshop welcomes various stakeholders, such as NLP experts and scholars, early-stage researchers, computer scientists and corpus linguists, to present their work and share their experiences.
Call for Papers - Topics
We are interested in a wide range of NLP topics that are relevant to terminology extraction, management, handling techniques, annotation, translation, evaluation, etc. Topics of interest include (but are not limited to):
-- Computational treatment of terminology in NLP
-- Terminology in translation technology
-- Terminology translation and evaluation
-- Terminology in industrial TWs
-- Terminology management
-- Terminology annotation and development of monolingual and bilingual term annotation tools
-- Evaluation of terminology annotation
-- Terminology acquisition from parallel, comparable, or monolingual data (methodologies and description of tools)
-- Evaluation of monolingual and bilingual terminology extraction
-- Techniques for handling multi-domain terminologies
-- Integration of terminology in NLP models and tools (e.g. MT, CAT tools)
-- Techniques for handling discontinuous terms
-- Terminology extraction, annotation, management, translation and evaluation for morphologically rich and complex languages
-- Terminology extraction, annotation, management, translation and evaluation for under-resourced languages
Rejwanul Haque, ADAPT Centre, Dublin City University, Ireland
Mohammed Hasanuzzaman, ADAPT Centre, CIT, Ireland
Chao-Hong Liu, ADAPT Centre, Dublin City University, Ireland
Submission Deadline: Date, Date – 23:59 EST (New York City Time)
Notification of Acceptance: Date
Camera-ready Papers Due: Date
Workshop: Days, Date
We invite the submission of original research results related to the areas of the workshop.
– Research papers (maximum 8 pages, excluding references, in ACL style) should present mature work and established results.
– Short papers (maximum 6 pages, excluding references, in ACL style) may present proposed research directions, initial results, etc.
– Final versions will be given one additional page of content so that reviewers' comments can be taken into account.
– Submissions should be in English and be submitted in PDF.
– Papers should be submitted through the TET-2020 EasyChair page.
– System descriptions may consist of up to four (4) pages of content, plus unlimited references.