###################################################
##                                               ##
## ALANIS: A Learning by ANalogy Inferencer for  ##
##         Structured data                       ##
## Copyright (C) 2004 2005 2006 Nicolas Stroppa  ##
## This program is distributed under the terms   ##
## of the GNU General Public License.            ##
##                                               ##
###################################################

Description
===========

- ALANIS implements a supervised learning algorithm based on
  analogical proportions, capable of dealing with structured data
  such as strings, trees, and feature structures. Its main area of
  application is Natural Language Learning. For example, it can be
  used to infer the morphological analysis (seen as a tree or a
  feature structure) or the pronunciation (seen as a string) of a
  given wordform. It needs a training file of (input, output) pairs
  and a file of inputs to analyze. For a given input, several
  solutions, or none, may be proposed.

Usage
=====

For example, with lemma_to_derivation_tree (string => tree):

Usage: lemma_to_derivation_tree [-h] -n searched_bins_n -i input_file
       -t test_file -o output_file [ -s input_seed_file]
       [ -S output_seed_file] [ -l first_line] -d input_max_degree
       -e input_max_conseps -w input_weighting_factor
       -D output_max_degree -E output_max_conseps
       -W output_weighting_factor

Options:
  -n, --searched_bins_n           Number of searched bins per input
  -i, --input_file                Input file (training set)
  -t, --test_file                 Test file (test set)
  -o, --output_file               Output file
  -s, --input_seed_file           Input seed file (default=)
  -S, --output_seed_file          Output seed file (default=)
  -l, --first_line                Number of the first line to process (default=1)
  -h, --help                      Display this help message and exit
  -d, --input_max_degree          Input max degree
  -e, --input_max_conseps         Input max conseps
  -w, --input_weighting_factor    Input weighting factor
  -D, --output_max_degree         Output max degree
  -E, --output_max_conseps        Output max conseps
  -W, --output_weighting_factor   Output weighting factor

Format
======

Each line in the training file is composed of three columns separated
by a TAB. These three columns contain, respectively, the input, the
system_id, and the output. The input file contains one entry to
analyze per line. The output is an XML file containing the (weighted
and sorted) list of proposed solutions for each input. (See the
references for more information about the system_id.)

References
==========

- Nicolas Stroppa and François Yvon. An analogical learner for
  morphological analysis. In Proceedings of the 9th Conference on
  Computational Natural Language Learning (CoNLL 2005), pages
  120-127, Ann Arbor, MI, June 2005. Association for Computational
  Linguistics.

- Nicolas Stroppa. Définitions et caractérisations de modèles à base
  d'analogies pour l'apprentissage automatique des langues
  naturelles. PhD thesis. École Nationale Supérieure des
  Télécommunications, Paris, November 2005.

Dependencies
============

- In order to compile alanis, you will need to install:
  - vaucanson, version >= 1.0
    (http://www.lrde.epita.fr/cgi-bin/twiki/view/Vaucanson/WebHome)

How to install
==============

- unpack the distribution:
  > tar -zvxf alanis-version.tar.gz

- the package is now in the directory ./alanis-version
  Go to that directory:
  > cd alanis-version

- configure, build and install:
  > ./configure --with-vcsn=root_of_vaucanson/include
  > make
  > make install
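
Example: checking a training file
=================================

The training-file format described under Format (three TAB-separated
columns: input, system_id, output) can be sanity-checked before
running ALANIS. The following Python sketch is not part of the
distribution. The names TrainingExample and parse_training_lines are
hypothetical, the sample lines are made up, and skipping blank lines
is an assumption of this sketch rather than documented ALANIS
behaviour.

```python
from typing import Iterable, Iterator, NamedTuple


class TrainingExample(NamedTuple):
    """One line of a training file (field names are illustrative)."""
    input: str
    system_id: str
    output: str


def parse_training_lines(lines: Iterable[str]) -> Iterator[TrainingExample]:
    """Yield one TrainingExample per non-empty line.

    Assumes exactly three TAB-separated columns per line, as described
    in the Format section; any other column count raises ValueError.
    """
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # assumption: blank lines are ignored
        columns = line.split("\t")
        if len(columns) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 TAB-separated columns, "
                f"got {len(columns)}"
            )
        yield TrainingExample(*columns)


# Made-up sample data: the actual encoding of inputs, system_ids and
# outputs is not specified in this README.
sample = [
    "walked\t1\t(walk +ed)\n",
    "talked\t1\t(talk +ed)\n",
]
examples = list(parse_training_lines(sample))
```

To check a real file, pass an open file object instead of the sample
list, e.g. parse_training_lines(open(path, encoding="utf-8")).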