Joachim Wagner and Jennifer Foster (2009): The effect of correcting grammatical errors on parse probabilities. In Proceedings of the 11th International Conference on Parsing Technologies (IWPT'09), Paris, France, 7th-9th October, 2009
Broad-coverage probabilistic grammars induced from treebanks tend to parse any input. In many applications, robustness is required, but in others, the ability to distinguish ill-formed sentences from well-formed ones is also necessary. We explore the relationship between parse tree probability and sentence grammaticality to better understand how the probability of a parse tree can be used to detect a grammatical error. We parse the sentences in three parallel error corpora using a generative, probabilistic parser and compare the parse probabilities of the most likely analyses for each grammatical sentence and its closely related ungrammatical counterpart. We examine the effect of particular error types on parse probability. While there is a clear tendency for a grammatical error to negatively affect parse probability, the picture is less clear for error types which involve a change in sentence length.
Grammar Checker, Error Detection, Natural Language Parsing, Probabilistic Grammars, Learner Corpora
We have parsed the sentences in three parallel error corpora using a generative, probabilistic parser and examined the parse probability of the most likely analysis of each sentence. We find that grammatical errors have some negative effect on the probability assigned to the best parse, a finding which corroborates previous evidence linking sentence grammaticality to frequency. In our experiment, we approximate sentence probability by looking only at the most likely analysis; it might be useful to see whether the same effect holds if we sum over parse trees. To fully exploit parse or sentence probability in an error detection system, it is necessary to account for the effect on probability of 1) non-structural factors such as sentence length and 2) particular error types. This study represents a contribution towards the latter.
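The difference between scoring a sentence by its most likely analysis and by a sum over parse trees can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names and the log probabilities are hypothetical, and summing over a short n-best list only approximates a sum over all parse trees.

```python
import math

def best_parse_logprob(tree_logprobs):
    """Log probability of the single most likely analysis."""
    return max(tree_logprobs)

def summed_logprob(tree_logprobs):
    """Log of the summed probability of the supplied analyses (log-sum-exp)."""
    m = max(tree_logprobs)
    return m + math.log(sum(math.exp(lp - m) for lp in tree_logprobs))

def logprob_drop(grammatical, ungrammatical, score=best_parse_logprob):
    """Positive when the grammatical sentence scores higher than its counterpart."""
    return score(grammatical) - score(ungrammatical)

# Hypothetical natural-log probabilities for the n-best parses of a
# grammatical sentence and its ungrammatical counterpart (made-up numbers).
grammatical_parses = [-42.1, -43.0, -45.7]
ungrammatical_parses = [-47.3, -47.9, -52.4]

drop_best = logprob_drop(grammatical_parses, ungrammatical_parses)
drop_sum = logprob_drop(grammatical_parses, ungrammatical_parses,
                        score=summed_logprob)
print(f"drop using best parse:   {drop_best:.3f}")
print(f"drop using summed parses: {drop_sum:.3f}")
```

Working in log space avoids numerical underflow for long sentences. The two scores can rank a sentence pair differently when one sentence has many analyses of similar probability, which is one reason the summed variant may be worth checking.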