|
"Learning a Translation
Lexicon from non Parallel Corpora." A final project for the
Master's degree in Computational Linguistics evaluating the
performance of syntactic context windows against positional context
windows in extracting word translations from non-parallel English
and German newswire corpora.
Washington, June 2008 [Paper] | [Slides] |
|
"Unsupervised Approaches to
POS Tagging." A survey of five strategies for
learning tagging probabilities from unlabeled text when building a part-of-speech tagger.
Washington, March 2008 [Report] |
|
"Springing up Baby."
Co-authored a problem with Prof. Emily Bender on word ambiguity in
translation for the open round of the 2008 North American
Computational Linguistics Olympiad.
Washington, February 2008 [NACLO]
| [Problems]
| [Solutions] |
|
"A Phonemic Inventory of
Italian." A project on recording and analyzing the basic
sounds, allophones, and phonological rules in a native speaker's
language.
Washington, December 2007 [Report] | [Sounds.zip] |
|
"Press Release
Translation." Translated the press release for a
multilingual image search tool into Hindi using native-speaker
knowledge and Hindi Wordnet.
Washington, September 2007 [Original]
| [Hindi] | [PanImages]
| [Wordnet] |
|
"Certificate in
Empirical Foundations for Theories of Language." Attended the
month-long Linguistic Society of America Summer Institute hosted
at Stanford University. Activities included attending lectures on
statistical machine translation (Kevin Knight, Philip Resnik,
Philipp Koehn), statistical parsing (Christopher Manning), and
statistical grammar induction (Dan Klein).
Stanford, July 2007 [LSA]
| [Summer Institute] |
"Computational Linguistics in Industry."
Part of the LSA Summer Institute hosted at Stanford University,
featuring lectures by computational linguists working in
industry. I wrote a short commentary on the lessons learned.
Stanford, July 2007 [Program]
| [Commentary] |
|
"Cross-linguistic Annotator
for Hindi, Icelandic, and Spanish." A semester-long project on inducing
a language-independent, semi-supervised part-of-speech tagger using
tagged English text and interlinear gloss text in the target
language. There are three sets of slides detailing the task
description, progress, and challenge language exercise. Joint work
with Sabrina Burleigh.
Washington, June 2007 [IGT]
| [Report] | [Slides 1] | [Slides 2] | [Slides 3] |
|
"Sanskrit LKB."
A term project on implementing a unification-based lexical framework
for the Sanskrit language using
Head-driven Phrase Structure Grammar and the
Linguistic Knowledge Builder.
Washington, May 2007 [LKB]
| [HPSG] | [Data] |
"IGT Detection: A Machine
Learning Approach to the Sequence Labeling Task." A project
on designing a system for supervised learning of labels using
off-the-shelf classifiers. Joint work with Sabrina
Burleigh.
Washington, March 2007 [Report]
| [Slides] |
|
"Natural Language
Generator." A project on generating well-formed English
sentences in the domain of family genealogy by implementing a
text planner, a microplanner, and a surface realizer. Joint work with Sabrina
Burleigh, Raghavan Srinivasan, and Yohei Sakata.
Washington, December 2006 [Report]
| [Data.zip] |
|
"Document
Clustering." A project on implementing supervised clustering of legitimate and junk
emails using data from the Enron email corpus and the Hierarchical Agglomerative
Clustering algorithm. Joint work with Sabrina Burleigh and Kathleen Sickles.
Washington, December 2006 [Enron Corpus] |
"Vector Space Retrieval."
A project on building a vector space model over a set of
documents, facilitating search and retrieval of documents most
relevant to a query. Joint work with Sabrina Burleigh and Kathleen Sickles.
Washington, November 2006 |
|
"CYK Parser." A
project on developing a broad-coverage probabilistic parser for
English using the Cocke-Younger-Kasami algorithm and a probabilistic context-free grammar learned from the
Penn Treebank. Joint work with Sabrina Burleigh.
Washington, November 2006 [CYK]
| [Penn Treebank] |
|
"N-gram Language
Models." A quasi-genetic study of English, German, Spanish, and
Portuguese using Kullback-Leibler divergence and
Cavnar & Trenkle's Textcat. Also conducted
experiments on differently sized language models for Portuguese using
data from the Portuguese Newswire corpus.
Washington, October 2006 [KL]
| [Textcat]
| [Portuguese Newswire] |
|
"Korean Markov Tagger."
A project on building a supervised tagger using Hidden Markov
Models and Viterbi decoding on a subset of morphologically annotated
Korean text. Joint work with Sabrina Burleigh and
Kathleen Sickles.
Washington, October 2006 [Korean Corpus] |
|
"Automation - Its Impact on
our Lives." A commentary on computer automation seeping into every
aspect of our lives.
Texas, July 2006 [Commentary] |
|
"Handwritten Digit Recognition." Evaluated application of
several neural network learning algorithms (backpropagation,
self-organizing maps, learning vector quantization) to a script
recognition task.
Texas, December 2005 [LENS]
| [Corpus] |
"Agent Dispersion." A
project on implementing dispersion of software agents in a multiagent environment.
Joint work with Byung Kang.
Texas, May 2005 [RoboCupRescue]
| [Report] |
|
"Story Project." Designed a knowledge
representation and question-answering engine based on children's
stories.
Texas, December 2004 [KM] |
|
"Artificial Intelligence in
Movies and Media." A presentation on the application of AI
techniques in movie and media production, and the depiction and coverage of
AI in films and news. Note: links to media clips and sound bites do
not work at present. Joint work with Lingling Tong, Richard Meth, and Michael
Rosiles.
Texas, October 2004 [Slides] |