This software is a parameter estimator for maximun entropy models [1]. Given a set of events as training data, the program outputs parameters that maximize the likelihood of the training data. The software supports the following functions.
A maximum entropy model gives a conditional probability p(x|y) of an event e=<x,y>, where x is a target event and y is a history event. Characteristics of an event are represented with a bundle of feature functions (or feature in short) f_i. Each characteristic corresponds to each i, and f_i(x|y) takes the non-zero positive value when an event <x,y> has the i-th characteristic.
Given an event e=<x,y> and feature functions f_i, a maximum entropy model gives a probability by the following formula.
p(x|y) = 1/(Z_y) prod( a_i^f_i(x,y) )a_i is a model parameter, and intuitively, it represents the weight of a feature f_i. Z_y is a normalizing factor for letting the summation of probabilities p(x|y) be 1 for all possible x under a given y.
Given a set of feature functions and training data of observed events as an input, Amis computes and outputs the optimal parameters a_i. The program supports several algorithms for parameter estimation: IIS, GIS, BFGS, and their modified version for supporting feature forests.
The program can be compiled and run on IA machines of the above specs, or SPARC machines of the equivalent specs. More memory/hard disk will be required depending on the size of input data.
Amis supports "configure" script, and is compiled and installed by the following instructions. (In the following examples, % represents a command prompt.)
% ./configureAs a default, the executable is installed in /usr/local/. When you want to install in another directory ($DIR in the example), specify the option as following.
% ./configure --prefix=$DIROther than "--prefix", various options are supported.
| Option | Default | Valid values | Effect |
|---|---|---|---|
| --enable-debug | no | 0 - 5 or no | Specify whether debug messages are printed or not. The greater value is given, and more messages are printed. |
| --enable-profile | no | 0 - 5 or no | Specify whether profiling (measuring the execution time of each function) is enabled or not. The greater value is specified, and more functions are profiled. |
| --enable-ambiguity | no | no or yes | Specify whether ambiguous events are supported or not. If you enable the support for ambiguous events, the program gets slow very much. You should specify "no" when ambiguous events are not required. |
% makeThe executable "amis" is created.
% make check
% make install
The above instructions will install "/usr/local/bin/amis".
To start up Amis, execute "amis" with an argument specifying a configuration file (described later).
% amis [configuration file]When you omit the argument, "amis.conf" will be a configuration file as a default. When the configuration file is not found (or cannot be read), the program stops with an error.
You can specify the following options at startup-time.
| Option | Default | Valid values | Effect |
|---|---|---|---|
| -h | nothing | nothing | Print help messages |
| -f | binary | [binary|integer|real] | Specify the type of features |
| -m | amis.model | file name | Specify the name of a model file (described later) |
| -e | amis.event | file name | Specify the name of an event file (described later) |
| -o | amis.output | file name | Specify the name of an output file (described later) |
| -l | amis.log | file name | Specify the name of a log file (described later) |
| -d | Amis | [Amis|CMEE] | Specify the data file format |
| -a | IIS | [IIS|GIS|BFGS] | Specify the parameter estimation algorithm |
| -i | 200 | non-zero positive value | Specify the number of iterations |
| -n | 200 | non-zero positive value | Specify the number of iterations in Newton's method |
| -s | 5 | non-zero positive value | Specify the memory size for limited-memory BFGS |
| -r | 1 | non-zero positive value | Specify the interval of printing the progress of computation |
| -p | 6 | non-zero positive value | Specify the significant digits |
The configuration is enabled in the following order.
This program is composed of the following modules
Model file
Event file Configuration file
| |
V V
----------------------- ---------------
| | | |
| Data file manager |<---| Property |
| | | |
----------------------- ---------------
| | |
| V V
| ----------- ---------------
| | | | |
| | Model |---->| |
| | | | |
| ----------- | Algorithm |---> Output model
| --------------- | |
| | | | |
--->| Event space |---->| |
| | | |
--------------- ---------------
|
-----------------------
| Model expectation |
| Empirical expectation |
-----------------------
This section explains the data file format: Amis format and AmisTree format. While this software also accepts CMEE format, Amis format is more flexible. AmisTree format is used for estimating maximum entropy models for feature forests.
For each file, # to the end of line is a comment and ignored. Comments are treated as a space. Each token is separated by spaces or tabs, and "new line" represents the end of line. Colons (:) are a special character. When you want to use these special characters as a part of tokens, escape the character with a backslash (\). A backslash itself is represented as \\.
To use amis, configuration file,model file, and event file should be prepared. Each of them is explained below.
In a configuration file, specify the name of an option and its value. See the following example.
DATA_FORMAT Amis FEATURE_TYPE integer MODEL_FILE me.model EVENT_FILE me.event OUTPUT_FILE me.output LOG_FILE me.log ESTIMATION_ALGORITHM IIS NUM_ITERATIONS 200 NUM_NEWTON_ITERATIONS 20 REPORT_INTERVAL 1 PRECISION 6
The following options are available.
| Option | Default | Valid values | Effect |
|---|---|---|---|
| DATA_FORMAT | Amis | Amis CMEE AmisTree | Specify the data file format |
| FEATURE_TYPE | binary | binary integer real | Specify the type of features |
| MODEL_FILE | amis.model | file names | Specify the names of model files |
| EVENT_FILE | amis.event | file names | Specify the names of event files |
| OUTPUT_FILE | amis.output | file name | Specify the name of an output file |
| LOG_FILE | amis.log | file name | Specify the name of a log file |
| ESTIMATION_ALGORITHM | IIS | IIS GIS BFGS | Specify the algorithm for parameter estimation. |
| FEATURE_COUNT_HASH | FALSE | TRUE FALSE | Specify the class ("map" or "vector") used for the factoring optimization of IIS algorithms. As a default, "vector" is used. |
| NUM_ITERATIONS | 200 | non-zero positive value | Specify the number of iterations |
| MEMORY_SIZE | 5 | non-zero positive value | Specify the memory size for limited-memory BFGS |
| REPORT_INTERVAL | 1 | non-zero positive value | Specify the interval of reporting the progress of computation |
| PRECISION | 6 | non-zero positive value | Specify the number of significant digits |
| EVENT_ON_FILE | FALSE | TRUE FALSE | Specify whether the event data is put on memory (default), or on a file. This option is used when the event data is too large and the memory is unsufficient. |
| EVENT_ON_FILE_NAME | amis.event.tmp | file name | Specify the file of storing event data. Effective only when EVENT_ON_FILE option is enabled |
A model file gives a set of feature functions and corresponding initial parameters. See the following example.
feature1 1.0 feature2 2.0 feature3 0.3 ...
Each line corresponds to each feature. First, you specify the name of a feature. For feature names, you can use any characters except for spaces, tabs, colons (:), and pounds (#). Next, following spaces or tabs, specify the initial parameter of the feature. Initial values are given by C-style floating point values. Initial parameters can be any positive values (usually, 1.0). Only two tokens, feature name and parameter, can be specified in each line.
An event file gives events as training data. In detail, you specify activated features for an observed event and its complement events. Here, an observed event is an event observed in the training data. A complement event is an event whose target event is substituted from the observed event. As descrbied in Introduction, we must normalize the summation of probabilities for all events which can be observed under the same history event of the observed event. Such events are given via complement events. To sum up, each event is represented with an observed event and a set of events observable under the same history event.
See the following example.
event_1 1 feature1:2 feature3 0 feature1 0 feature2:3 event_2 0 feature2 feature3:5 1 feature1 0 feature3 ...
Each block separated by blank lines corresponds to one event. In the first line, you specify the name of an event. You can use any characters except for special characters mentioned above. Other lines represent an observed event or complement events. At the beginning of a line, specify the number of times the event observed. For an observed event, it should be non-zero positive value, and for a complement event it should be zero. When you have enabled ambiguous events, you can specify positive real values. Otherwise, the number of observed times must be an integer and only one observed event is permitted for one event description. Next, enumerate activated features for an event. Each feature must be defined in a model file. If you specified a feature not found in a model file, it would be an error. The value of a feature function can be specified following the feature name. As in the above example, specify the feature value following a colon (:). When omitted, it will be 1.
Each event description is separated by blank lines. Note that a line with only comments is also treated as a blank line.
When you use an estimation algorithm for feature forests, an event file must be written in AmisTree format. Configuration and model files are the same as Amis format. This section describes the format of event files.
An event file in AmisTree format is as following.
event_1 2
feature1:2 feature2:3 feature3
{ dnode_1 ( node_1 feature1:2 { dnode_2 ( node_2
feature2:3 ) ( node_3 ) } { dnode_3 $node_2 (
node_4 feature3 ) } ) }
event_2 1
feature2:3
{ dnode_1 ( node_1 feature1 ) ( node_2 { dnode_2
( node_3 feature2:3 ) ( node_4 feature3 ) } ) }
...
Note: The third and following lines in each event description
must be in deed represented in a line.
As in the Amis format, blank lines separates each event description. In the AmisTree format, an event is represented with three lines. The first line specifies the name of an event and the number of times of the observed event. In the above example, event_1 is observed twice, and event_2 once. In the second line, you enumerate activated features of an observed event. As in the Amis format, specify the name of a feature together with its value. The third line represents an observed event and complement events in a feature forest. Disjunctive nodes are represented with curly braces. Between the curly braces, the name of a node is first specified, and conjunctive nodes follow. Conjunctive nodes are represented with round brackets. Between the round brackets, the name of a node is first specified, and activated features follow. Feature descriptions are the same as Amis format. You can also specify disjunctive nodes as daughter nodes. Node names are used to represent structure-sharing. Already appeared nodes can be refered by "$" followed by the node name. In event_1 in the above example, $node2 represents the sharing with node2 already appeared. By using node sharing and pack the feature forest smaller, the computational complexity reduces, and the computation will be accelerated. You can use any characters except for special characters for node names and feature names.
Do not forget to put spaces before and after curly braces and round brackets. Without spaces, they will be treated as a part of node names or feature names.
An output format is common to Amis and AmisTree format.
Amis outputs pairs of feature and parameter. The output format is the same as a model file. Since the output format is the same as a model file, the output file can be reused as an input of the new computation. That is, we can further progress the parameter estimation given already estimated data.
Parameters a_i corresponding to feature functions are output, and we can compute the probability of an unknown event by the product of a_i for all activated features.
Along with a simple example, I describe how to use the estimator. Suppose we are making a maximum entropy model for POS tagging. For example, to sentence "I like handball", we can assign POS to each word, like "I/Noun like/Verb handball/Noun". We consider the tagging task as a sequence of events, each of which is the task to assign one of tags to each word given a previous word and tag. For example, when we are looking at "like", we select a tag (target) for "like" under the context (history) of having "I/Noun" as a previous word. In the Amis format, tagging events are expressed as following ("BOS" is for "Beginnig Of Sentence").
event_BOB/BOS-I/Noun 1 BOS/BOS-I/Noun */*-I/Noun */*-*/Noun 0 */*-*/Verb 0 */*-*/Prep 0 */*-*/Modif event_I/Noun-like/Verb 0 I/Noun-like/Noun */Noun-like/Noun */*-like/Noun */*-*/Noun 1 I/Noun-like/Verb */Noun-like/Verb */*-like/Verb */*-*/Verb 0 */Noun-like/Prep */*-like/Prep */*-*/Prep 0 */*-*/Modif event_like/Verb-handball/Noun ...
Each block separated by a blank line corresponds to each event. The first line of each block is a name of an event, which has in fact no effect to the estimator (just for human readability). Other lines describe the features activated for the event. Each line corresponds to each target, in this example, Noun, Verb, Prep, and Modif. Every event description must have every line for all possible targets for the history. The beginning of the line describes whether the event is "observed" or not. If the pair of target and history corresponding to the line is observed in the training data, it has the positive number. If it is not observed, for example "assign Verb for I", it is 0.
The model file will be as follows.
BOS/BOS-I/Noun 1.0 */*-I/Noun 1.0 */*-*/Noun 1.0 */*-*/Verb 1.0 */*-*/Prep 1.0 */*-*/Modif 1.0 ...
Running the estimator, and we get will get the output file like this:
BOS/BOS-I/Noun 8.03 */*-I/Noun 1.45 */*-*/Noun 0.84 */*-*/Verb 0.72 */*-*/Prep 0.54 */*-*/Modif 0.48 ...Using these parameter values, we can compute the probability of an event. For example, un-normalized score for event "assign Noun for I under the context BOS/BOS" is,
Similarly, q(Verb|...)=0.72, q(Prep|...)=0.54, q(Modif|...)=0.48. Hence, the probability of the event of assining Noun is,q(Noun|BOS/BOS-I/*)= prod( a_i^f_i(Noun|BOS/BOS-I/*) )= prod( 8.03 * 1.45 * 0.84 )= 9.78
p(Noun|BOS/BOS-I/*)= q(Noun|BOS/BOS-I/*) / Z_y= 9.78 / (9.78+0.72+0.54+0.48)= 0.849