Seminar talk Adapt - Prof Mikel Forcada - 12/06/18

Video Category: Transfer Talk

Seminar: One-parameter models for sentence-level post-editing effort estimation

Abstract: Methods to predict the effort needed to post-edit a given machine translation (MT) output are seen as a promising direction for making MT more useful in the translation industry. Despite the wide variety of approaches that have been proposed, with increasing complexity as regards their number of features and parameters, the problem is far from solved. Focusing on post-editing time as the effort indicator, this paper takes a step back and analyses the performance of very simple, easy-to-interpret one-parameter estimators that are based on general properties of the data: (a) a weighted average of measured post-editing times in a training set, where weights are an exponential function of edit distances between the new segment and those in the training data; (b) post-editing time as a linear function of the length of the segment; and (c) source and target statistical language models. These simple estimators outperform strong baselines and are surprisingly competitive compared to more complex estimators, which have many more parameters and combine rich features. These results suggest that before blindly attempting sophisticated machine learning approaches to build post-editing effort predictors, one should first consider simple, intuitive and interpretable models, and only then incrementally improve them by adding new features and gradually increasing their complexity. In a preliminary analysis, simple linear combinations of estimators of types (b) and (c) do not seem to improve on the performance of the single best estimator, which suggests that more complex, non-linear models could indeed be beneficial when multiple indicators are used.
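As a rough illustration of estimators (a) and (b), here is a minimal Python sketch. It is not the implementation from the paper. The weight form exp(-alpha * d), the word-level Levenshtein distance, the intercept-free linear model, and all function and parameter names (`edit_distance`, `weighted_average_time`, `fit_length_slope`, `length_time`, `alpha`, `beta`) are assumptions made for this example; the abstract only states that the weights are an exponential function of edit distance and that time is linear in segment length.

```python
import math


def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def weighted_average_time(segment, training, alpha):
    """Estimator (a): average of training times weighted by exp(-alpha * distance).

    `training` is a list of (segment_text, post_editing_time) pairs;
    `alpha` is the single free parameter (assumed functional form).
    """
    tokens = segment.split()
    dists = [edit_distance(tokens, text.split()) for text, _ in training]
    d_min = min(dists)  # shift distances so the largest weight is 1 (avoids underflow)
    weights = [math.exp(-alpha * (d - d_min)) for d in dists]
    total = sum(weights)
    return sum(w * t for w, (_, t) in zip(weights, training)) / total


def fit_length_slope(training):
    """Least-squares slope for time = beta * length, with no intercept (assumed)."""
    num = sum(len(text.split()) * t for text, t in training)
    den = sum(len(text.split()) ** 2 for text, _ in training)
    return num / den


def length_time(segment, beta):
    """Estimator (b): post-editing time proportional to segment length in words."""
    return beta * len(segment.split())
```

With `alpha = 0` every training segment gets the same weight, so estimator (a) reduces to the plain mean of the training times; as `alpha` grows, the prediction is dominated by the nearest training segments.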

Speaker: Prof. Mikel L. Forcada, Universitat d'Alacant

Prof. Mikel L. Forcada was born in Caracas (Venezuela) in 1963 and is married with two children. He graduated in Science in 1986 and got his Ph.D. in Chemistry in 1991. He has been full professor of Computer Languages and Systems at the Universitat d'Alacant since 2002. Prof. Forcada has been president of the European Association for Machine Translation since 2015 and is book review editor of the international journal Machine Translation. Since the turn of the millennium, Prof. Forcada's interests have mainly focused on the field of translation technologies, but he has worked in fields as diverse as quantum chemistry, biotechnology, surface physics, machine learning (especially with neural networks) and automata theory. He is the author of more than 70 articles in international journals, papers in international conferences and book chapters, of which about 40 are about translation technologies. In 2004, after heading several publicly- and privately-funded projects on machine translation, he started the free/open-source machine translation platform Apertium (with more than 26 language pairs), where he is currently the president of the project management committee. He is also an administrator of three more free/open-source software projects (Bitextor, Orthoepikon, Tagaligner) and co-founder of Prompsit Language Engineering (2006). Prof. Forcada has participated in the scientific committees of more than twenty international conferences and workshops. During 2009–2010 he was an ETS Walton Visiting Professor in the machine translation group at Dublin City University.