Pratyush Banerjee - PhD Transfer Talk - 13th September 2011

Title: Combinations of Domain-Specific Models in Domain Adaptation of Statistical Machine Translation

Abstract: Adapting existing Statistical Machine Translation (SMT) systems to domains that differ from the training domain poses a range of challenges. In this report, we address some of these challenges by identifying and leveraging homogeneity in sub-parts of the training data. Depending on the type of adaptation required, we have experimented with different techniques for creating and combining domain-specific SMT models. In our first set of experiments, we train a classifier to combine multiple domain-specific SMT models in order to produce higher-quality translations than those produced by a single generic model. In the next set of experiments, we examine the effect of combining domain-specific models within different components of the SMT system. We test and evaluate our approaches by applying them to the real-life problems of translating professionally edited corporate content and user-generated forum content. Finally, we detail our future directions: automatically identifying the sub-parts of the training data that are best suited to domain-specific translation, and comparing them with the natural divisions present in the data. As part of this future work, we also aim to find an optimal way of splitting a dataset into homogeneous sub-parts to ensure improved translation quality.