#################################################
##                                             ##
## ANHI: A Naive HMM Implementation            ##
## Copyright (C) 2005 2006 Nicolas Stroppa     ##
## This program is distributed under the terms ##
## of the GNU General Public License.          ##
##                                             ##
#################################################

Description
===========

- Anhi is a naive HMM implementation whose primary goal is to align
  sequences of graphemes and sequences of phonemes using a Hidden
  Markov Model and the EM algorithm.

- You can use Anhi in two modes: supervised and unsupervised.
  In the supervised mode, you need to provide a file in which the
  sequences of phonemes and graphemes have equal lengths. The
  parameters of the HMM are estimated from these alignments.
  In the unsupervised mode, you first have to use a bootstrap
  strategy that adapts these lengths in order to obtain an initial
  alignment. Four strategies are provided: left, right, center, or
  random alignment. The random alignment will provide the best
  results.

Usage
=====

Usage: anhi_align [-h] [-k] -i input_file -o output_file
                  [ -t training_file] [ -n n_iterations]
                  [ -b bootstrap_alignment] [ -p placeholder_char]
Options:
  -i, --input_file               Input file (file to align)
  -o, --output_file              Output file (result of alignment)
  -t, --training_file            Training file to estimate HMM parameters
                                 (not required in the unsupervised mode)
                                 (default=)
  -n, --n_iterations             Number of iterations of the EM algorithm
                                 (unsupervised mode) (default=0)
  -b, --bootstrap_alignment      Strategy to use to create a bootstrap
                                 alignment (left, right, center, random)
                                 (default=)
  -k, --keep_intermediate_files  Keep generated files in unsupervised mode
  -p, --placeholder_char         Placeholder character (default==)
  -h, --help                     Display this help message and exit

Format
======

Each line in the input file is composed of two columns separated by a
TAB. These two columns contain, respectively, the sequence of phonemes
and the sequence of graphemes.
Each phoneme must be represented using a _unique_ ASCII symbol.
The placeholder character is used to denote a null alignment
(symbol '=' by default).

Examples
========

## Supervised mode.
> anhi_align -t training_file -i input_file -o output_file_b -b random

## Unsupervised mode.
> anhi_align -i input_file -o output_file_b -b random
## This will perform a (random) bootstrap alignment.
> anhi_align -i output_file_b -n 5 -o output_file
## This will execute 5 iterations of EM.

## You can also directly do:
> anhi_align -i input_file -o output_file -b random -n 5

How to install
==============

- unpack the distribution:
  > tar -zvxf anhi-version.tar.gz

- the package is now in the directory ./anhi-version
  Go to that directory:
  > cd anhi-version

- configure, build and install:
  > ./configure
  > make
  > make install
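
Illustration: bootstrap alignment
=================================

The bootstrap step described above (padding the two columns to equal
length with the placeholder character) can be pictured with a small
Python sketch. This is not Anhi's code. The helper names `pad` and
`bootstrap_line` are hypothetical, the sample line is made up, and the
exact meaning of "left", "right" and "center" is an assumption: here
"left" keeps the symbols flush left and appends placeholders, "right"
does the opposite, and "center" splits the padding on both sides.

```python
import random

PLACEHOLDER = "="  # default placeholder character according to this README


def pad(seq, length, strategy, rng=random):
    """Pad `seq` with placeholders up to `length` (hypothetical helper)."""
    missing = length - len(seq)
    if missing <= 0:
        return seq
    if strategy == "left":    # assumption: symbols flush left, padding at the end
        return seq + PLACEHOLDER * missing
    if strategy == "right":   # assumption: symbols flush right, padding in front
        return PLACEHOLDER * missing + seq
    if strategy == "center":  # assumption: padding split between both sides
        before = missing // 2
        return PLACEHOLDER * before + seq + PLACEHOLDER * (missing - before)
    if strategy == "random":  # placeholders inserted at random positions
        symbols = list(seq)
        for _ in range(missing):
            symbols.insert(rng.randint(0, len(symbols)), PLACEHOLDER)
        return "".join(symbols)
    raise ValueError(f"unknown strategy: {strategy}")


def bootstrap_line(line, strategy):
    """Equalize the two TAB-separated columns (phonemes, graphemes) of one line."""
    phonemes, graphemes = line.rstrip("\n").split("\t")
    length = max(len(phonemes), len(graphemes))
    return pad(phonemes, length, strategy) + "\t" + pad(graphemes, length, strategy)


if __name__ == "__main__":
    # Made-up input line: phonemes first, then graphemes.
    for strategy in ("left", "right", "center", "random"):
        print(strategy, bootstrap_line("fon\tphone", strategy), sep="\t")
```

Whatever the strategy, both columns end up with the same length, which
is the form the EM iterations expect as a starting point.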