Research Interests

Please note that this page is under construction and that it is all a bit rough at the moment ...

Most of the publications cited here are collaborations with friends and colleagues, many of whom are now located in different parts of the world. In subsequent versions of this page I hope to introduce links to their pages, where you can find lots of interesting stuff. The research interests listed below are both past and present; you'll be able to tell which from the dates of the publications.

Compiling LFGs from TreeBank Resources

Probabilistic Unification Grammars (e.g. LFG-DOP) require large, high-quality training corpora. These corpora have to provide tree structures with feature structure annotations. Such corpora are expensive to construct and hard to come by. The traditional procedure for constructing them is to write (or use) a large-scale unification grammar and parse text. Typically, for each string in the input text the grammar will produce hundreds or thousands of candidate tree-feature structure pairs, from which a highly trained linguist has to pick the best analysis for inclusion in the training corpus. This is time consuming and error prone. We have developed a number of alternative methods. In each case the basic idea is extremely simple. As input our methods require a treebank. In the original method we automatically compile a CF-PSG from the treebank following the method of [Charniak,96]. We then manually annotate the CF-PSG with f-structure equations and provide macros for the lexical categories. Then (and this is the trick) we "reparse" the treebank entries (not the strings), simply following the annotations put in there by the original human annotators, and while we do that we solve the f-equations on the rules encountered along the way. This results in an f-structure induced by the best-fitting tree for the example at hand. If the f-structure annotations are deterministic, then so is the whole process, and we do not have to choose from hundreds or thousands of alternatives. In further work we have automated f-structure annotation. We state a small number of annotation principles in the form of regular expressions which are applied to PCFG rules extracted from the treebank. Alternatively, annotation principles can be stated in terms of a rewriting system that rewrites flat sets of tree descriptions.
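The original method described above can be illustrated with a toy sketch in Python. It is not our implementation: the bracketed tree, the category names, the ANNOTATIONS table and the LEXICON are all hypothetical, each preterminal is assumed to dominate exactly one word, and "solving" the f-equations is reduced to a naive dictionary merge rather than real unification. The sketch reads one treebank entry, counts the CF-PSG rules it contains, and then follows the given tree (not the string) to build an f-structure from hand-written annotations, where "=" stands for the equation up = down and a function name such as SUBJ stands for (up SUBJ) = down.

```python
from collections import Counter


def parse_bracketed(text):
    """Parse a Penn-style bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()


def is_preterminal(tree):
    """True if the node dominates only words."""
    return all(isinstance(child, str) for child in tree[1])


def extract_rules(tree, counts):
    """Count the phrase-structure rules used in one tree (lexical rules skipped)."""
    if is_preterminal(tree):
        return
    label, children = tree
    counts[(label, tuple(child[0] for child in children))] += 1
    for child in children:
        extract_rules(child, counts)


# Hypothetical hand-written annotations: one entry per daughter of each rule.
ANNOTATIONS = {
    ("S", ("NP", "VP")): ("SUBJ", "="),
    ("VP", ("V", "NP")): ("=", "OBJ"),
    ("NP", ("N",)): ("=",),
}

# Hypothetical lexical entries standing in for the lexical macros.
LEXICON = {
    "John": {"PRED": "John"},
    "sees": {"PRED": "see<SUBJ,OBJ>"},
    "Mary": {"PRED": "Mary"},
}


def f_structure(tree):
    """Follow the given tree and combine the annotations on the rules met."""
    label, children = tree
    if is_preterminal(tree):
        return dict(LEXICON[children[0]])
    rhs = tuple(child[0] for child in children)
    fs = {}
    for annotation, child in zip(ANNOTATIONS[(label, rhs)], children):
        sub = f_structure(child)
        if annotation == "=":
            fs.update(sub)  # up = down: naive merge, no clash checking
        else:
            fs[annotation] = sub  # (up GF) = down
    return fs


tree = parse_bracketed("(S (NP (N John)) (VP (V sees) (NP (N Mary))))")
rule_counts = Counter()
extract_rules(tree, rule_counts)
print(rule_counts)
print(f_structure(tree))
```

Because the tree is given, each rule is visited exactly once and no choice between competing analyses arises, which is the point of the "reparsing" trick. Relative frequencies computed from the rule counts would give rule probabilities for a PCFG.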
More recently, we have developed an automatic annotation algorithm that traverses treebank trees and annotates nodes in the tree with attribute-value structure information and have applied this to the whole WSJ section of the Penn-II treebank resource. This is joint work with (at various stages) Louisa Sadler, Anette Frank, Aoife Cahill, Mairead McCarthy and Andy Way. Some resources produced are available at the Dublin-Essex TreeBank webpage maintained by Andy Way. The approach and further ideas on compiling LFG semantic forms and on structure-preserving grammar compaction are reported in:

Metaphors and Logic

Metaphors and logic have traditionally been seen as uneasy bedfellows. Most metaphors are simply literally false, hence logic has not much to say about them, or so it goes. Here is a different idea: metaphor is intimately related to a notion of similarity and hence simile. What we can do (at least naively) is interpret a metaphor as a corresponding reduced (or elliptical) simile: "John is a fox" = "in some sense John is like a fox" = "John and the set of foxes share a property". Now the last sentence can easily be represented in classical type theory, and in fact we can translate "John is a fox" compositionally into a type theory expression denoting "John and the set of all foxes share a property". Traditionally, reduction of metaphor to a corresponding elliptical simile has been opposed [Davidson,67] on three grounds: (i) metaphor and simile have different truth conditions: metaphors are false, similes are true; (ii) simile is trivial: everything is similar to everything else; and (iii) simile does not have the special force metaphor has: metaphor is harder to understand than simile. Our approach answers these three criticisms. (i) In our approach (most) metaphors are literally false while their translations as simile are (mostly) contingent. (ii) Our translation to simile guards against trivialization (hence the translations are contingent). (iii) Translation requires special work and the result is interpreted as an invitation to the recipient to find an interesting shared property. A deductive account of this is sketched. Part of this work is with Carl Vogel and is reported in:

Linear Logic and Glue Language Semantics

Unlike classical logic, linear logic is a resource-sensitive logic. Versions of linear logic are used in linguistics and natural language processing: e.g. non-commutative linear logic is employed in categorial grammar, and multiplicative linear logic is used in the so-called glue language based approaches to semantic composition in LFG. The work reported below is with Richard Crouch, Anette Frank and Michael Dorna. Our work has mainly centered on ways of introducing (i) underspecification and (ii) dynamic semantics into the glue language approach. We also worked on ways of using linear logic in ambiguity-preserving machine translation. This is reported in:

Interpreting LFG f-structures as Quasi-Logical Forms or Underspecified Discourse Representation Structures

LFG f-structures are first and foremost abstract syntactic representations. However, they do contain some basic semantic information, so much so that they can be read as (i.e. translated into) corresponding Quasi-Logical Forms (QLFs) or Underspecified Discourse Representation Structures (UDRSs). This is joint work with Richard Crouch. It is reported in:

Machine Translation and Ambiguity-Preserving Transfer

Ideally, if some ambiguity in a source language carries over intact into a target language, one would not want to define transfer in a machine translation system on disambiguated (semantic or syntactic) representations. The reason is that one simply doesn't want to do the extra work involved in transferring disambiguated representations if one can avoid it. The work reported below is with Richard Crouch, Anette Frank, Michael Dorna and Martin Emele. Some of our explorations in LFG, using a number of approaches such as glue language semantics, f-structures or exploiting the f-structure-UDRS correspondence, are reported in:

FraCaS: A Framework for Computational Semantics

This was a European project on taking stock of current approaches to computational semantics, to see what they have in common and where they differ, all with a view to suggesting a unified approach. The FraCaS webpage is located at http://www.cogsci.ed.ac.uk/~fracas and many of our reports are downloadable from there. Our findings are reported in:

Reusability of Grammatical Resources

This was a European project on the reusability of grammatical resources. The basic idea is that rather than developing grammatical resources from scratch for each application, one could take a look at which resources are available and how they might be migrated to a new application or into a new formalism. Some of the findings are reported in:
