

What follows is a reverse-chronological listing (starting from June 2008) of some of the work I have done in computational linguistics, language technology, and artificial intelligence at the University of Washington, Stanford University, and the University of Texas.

"Learning a Translation Lexicon from non Parallel Corpora." A final project for the Master's degree in Computational Linguistics, evaluating the performance of syntactic context windows against positional context windows in extracting word translations from non-parallel English and German newswire corpora.
Washington, June 2008 [Paper] | [Slides]

"Unsupervised Approaches to POS Tagging." A survey of five strategies for learning tagging probabilities from unlabeled text when building a part-of-speech tagger.
Washington, March 2008 [Report]

"Springing up Baby." A problem on word ambiguity in translation, co-authored with Prof. Emily Bender for the open round of the 2008 North American Computational Linguistics Olympiad.
Washington, February 2008 [NACLO] | [Problems] | [Solutions]

"A Phonemic Inventory of Italian." A project on recording and analyzing the basic sounds, allophones, and phonological rules in a native speaker's language.
Washington, December 2007 [Report] | [Sounds.zip]

"Press Release Translation." Translated the press release of a multilingual image search tool into Hindi using native-speaker knowledge and Hindi Wordnet.
Washington, September 2007 [Original] | [Hindi] | [PanImages] | [Wordnet]

"Certificate in Empirical Foundations for Theories of Language." Attended the month-long Linguistic Society of America Summer Institute, hosted at Stanford University. Activities included attending lectures on statistical machine translation (Kevin Knight, Philip Resnik, Philipp Koehn), statistical parsing (Christopher Manning), and statistical grammar induction (Dan Klein).
Stanford, July 2007 [LSA] | [Summer Institute]

"Computational Linguistics in Industry." A part of the LSA Summer Institute, hosted at Stanford University, in which computational linguists working in industry gave lectures. I wrote a short commentary about lessons learned.
Stanford, July 2007 [Program] | [Commentary]

"Cross-linguistic Annotator for Hindi, Icelandic, and Spanish." A semester-long project on inducing a language-independent, semi-supervised part-of-speech tagger using tagged English text and interlinear gloss text in the target language. There are three sets of slides detailing the task description, progress, and challenge language exercise. Joint work with Sabrina Burleigh.
Washington, June 2007 [IGT] | [Report] | [Slides 1] | [Slides 2] | [Slides 3]

"Sanskrit LKB." A term project on implementing a unification-based lexical framework for the Sanskrit language using Head-driven Phrase Structure Grammar and the Linguistic Knowledge Builder.
Washington, May 2007 [LKB] | [HPSG] | [Data]

"IGT Detection: A Machine Learning Approach to the Sequence Labeling Task." A project on designing a system for supervised learning of labels using off-the-shelf classifiers. Joint work with Sabrina Burleigh.
Washington, March 2007 [Report] | [Slides]

"Natural Language Generator." A project on generating well-formed English sentences in the domain of family genealogy by implementing a text-planner, a micro-planner, and a surface-realizer. Joint work with Sabrina Burleigh, Raghavan Srinivasan, and Yohei Sakata.
Washington, December 2006 [Report] | [Data.zip]

"Document Clustering." A project on implementing supervised clustering of legitimate emails and junk emails using data from the Enron email corpus and the Hierarchical Agglomerative Clustering algorithm. Joint work with Sabrina Burleigh and Kathleen Sickles.
Washington, December 2006 [Enron Corpus]

"Vector Space Retrieval." A project on building a vector space model over a set of documents, facilitating search and retrieval of documents most relevant to a query. Joint work with Sabrina Burleigh and Kathleen Sickles.
Washington, November 2006

"CYK Parser." A project on developing a broad-coverage probabilistic parser for English using the Cocke-Younger-Kasami algorithm and a probabilistic context-free grammar learned from the Penn Treebank. Joint work with Sabrina Burleigh.
Washington, November 2006 [CYK] | [Penn Treebank]

"N-gram Language Models." A quasi-genetic study of English, German, Spanish, and Portuguese using Kullback-Leibler divergence and Cavnar & Trenkle's Textcat. Also conducted experiments on language models of different sizes for Portuguese using data from the Portuguese Newswire corpus.
Washington, October 2006 [KL] | [Textcat] | [Portuguese Newswire]

"Korean Markov Tagger." A project on building a supervised tagger using Hidden Markov Models and Viterbi decoding on a subset of morphologically annotated Korean text. Joint work with Sabrina Burleigh and Kathleen Sickles.
Washington, October 2006 [Korean Corpus]

"Automation - Its Impact on our Lives." A commentary on computer automation seeping into every aspect of our lives.
Texas, July 2006 [Commentary]

"Handwritten Digit Recognition." Evaluated the application of several neural network learning algorithms (backpropagation, self-organizing maps, learning vector quantization) to a script recognition task.
Texas, December 2005 [LENS] | [Corpus]

"Agent Dispersion." A project on implementing dispersion of software agents in a multiagent environment. Joint work with Byung Kang.
Texas, May 2005 [RoboCupRescue] | [Report]

"Story Project." Designed a knowledge representation and question answering engine based on children's stories.
Texas, December 2004 [KM]

"Artificial Intelligence in Movies and Media." A presentation on the application of AI techniques in movie and media production, and on the depiction and coverage of AI in films and news. Note that links to media clips and sound bites do not work at present. Joint work with Lingling Tong, Richard Meth, and Michael Rosiles.
Texas, October 2004 [Slides]

 
