Judging Grammaticality: Experiments in Sentence Classification

Joachim Wagner, Jennifer Foster and Josef van Genabith (2009): Judging Grammaticality: Experiments in Sentence Classification. CALICO Journal, volume 26, number 3, pages 474-490


A classifier capable of distinguishing a syntactically well-formed sentence from a syntactically ill-formed one has the potential to be useful in an L2 language-learning context. In this article, we describe a classifier which classifies English sentences as either well formed or ill formed using information gleaned from three different natural language processing techniques. We describe the issues involved in acquiring data to train such a classifier and present experimental results for this classifier on a variety of ill-formed sentences. We demonstrate that (a) the combination of information from a variety of linguistic sources is helpful, (b) the trade-off between accuracy on well-formed sentences and accuracy on ill-formed sentences can be fine-tuned by training multiple classifiers in a voting scheme, and (c) the performance of the classifier varies across test sets, with better performance on transcribed spoken sentences produced by less advanced language learners.


Grammar Checker, Error Detection, Natural Language Parsing, Probabilistic Grammars, Precision Grammars, Decision Tree Learning, Voting Classifiers, N-gram Models, Learner Corpora


We have presented a new method for judging the grammaticality of a sentence which makes use of probabilistic parsing with treebank-induced grammars. Our new method exploits the differences between parse results for grammars trained on grammatical, ungrammatical, and mixed treebanks. The method combines well with n-gram and deep grammar methods in a machine-learning-based framework. In addition, voting classifiers have been proposed to tune the accuracy trade-off. This provides an alternative to the common practice of applying n-gram filters to increase the accuracy on grammatical data (Gamon et al., 2008; Lee & Seneff, 2008).
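The voting scheme for tuning the accuracy trade-off can be sketched as follows. This is a minimal illustration, not the article's implementation: the toy member classifiers and the `min_votes` threshold are assumptions standing in for the trained models described above.

```python
from typing import Callable, List

# A member classifier maps a sentence to True (ungrammatical) or False (grammatical).
Classifier = Callable[[str], bool]

def vote_ungrammatical(sentence: str, classifiers: List[Classifier],
                       min_votes: int) -> bool:
    """Flag a sentence as ungrammatical only if at least `min_votes`
    member classifiers agree. Raising `min_votes` increases accuracy on
    grammatical sentences at the cost of missing more ungrammatical
    ones; lowering it has the opposite effect."""
    votes = sum(1 for classifier in classifiers if classifier(sentence))
    return votes >= min_votes

# Toy member classifiers (illustrative stand-ins for trained models):
members: List[Classifier] = [
    lambda s: "  " in s,                                 # doubled whitespace
    lambda s: s[:1].islower(),                           # missing initial capital
    lambda s: not s.rstrip().endswith((".", "!", "?")),  # missing end punctuation
]

print(vote_ungrammatical("the cat sat on the mat", members, min_votes=2))   # True
print(vote_ungrammatical("The cat sat on the mat.", members, min_votes=2))  # False
```

Varying `min_votes` from 1 to the number of members traces out the trade-off curve between the two accuracies, which is the tuning knob the voting scheme provides.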

Our method was trained on sentences from the BNC and artificially distorted versions of these sentences produced using an error creation procedure. When tested on real learner data, we found that the method's accuracy drops, indicating that the next step in our research is to refine the error creation procedure to take into account a broader class of errors, including, for example, preposition errors and mass noun errors. In addition, we intend to experiment with adding noncontinuous sequential patterns as used by Sun et al. (2007) to our n-gram method to see if this improves performance. Another interesting future direction is to explore the relationship between our work and the machine-learning-based methods used in the machine translation community to evaluate the fluency of machine translation system output (Albrecht & Hwa, 2007). The area of research concerned with automatically evaluating writing style might also provide useful insights.
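An error creation procedure of the kind described above can be sketched as follows. The three operations shown (deleting, duplicating, or swapping adjacent words) are hypothetical examples; the actual procedure targets specific attested learner-error types, and as noted above still needs broadening to cover errors such as preposition and mass noun errors.

```python
import random

def distort(sentence: str, rng: random.Random) -> str:
    """Produce an artificially ill-formed variant of a sentence by
    applying one simple operation: delete a word, duplicate a word,
    or swap two adjacent words. (Illustrative only.)"""
    words = sentence.split()
    if len(words) < 2:
        return sentence  # too short to distort meaningfully
    op = rng.choice(["delete", "duplicate", "swap"])
    i = rng.randrange(len(words) - 1)
    if op == "delete":
        del words[i]
    elif op == "duplicate":
        words.insert(i, words[i])
    else:  # swap the word with its right neighbour
        words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

rng = random.Random(0)
print(distort("She has lived in Dublin for three years", rng))
```

Pairing each original sentence with its distorted counterpart yields the labelled grammatical/ungrammatical training data without requiring a large error-annotated learner corpus.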

Full article available on the CALICO website


Presentation at CALICO '08 Workshop

DCU Online Research Access Service #15662


© 2009, 2010 Joachim Wagner jwagner@computing.dcu.ie