Amis - A maximum entropy estimator for feature forests

April 10th, 2002
Yusuke Miyao
Department of Computer Science, University of Tokyo
yusuke@is.s.u-tokyo.ac.jp
Japanese version

Contents


Introduction

This software is a parameter estimator for maximun entropy models [1]. Given a set of events as training data, the program outputs parameters that maximize the likelihood of the training data. The software supports the following functions.

A maximum entropy model gives a conditional probability p(x|y) of an event e=<x,y>, where x is a target event and y is a history event. Characteristics of an event are represented with a bundle of feature functions (or feature in short) f_i. Each characteristic corresponds to each i, and f_i(x|y) takes the non-zero positive value when an event <x,y> has the i-th characteristic.

Given an event e=<x,y> and feature functions f_i, a maximum entropy model gives a probability by the following formula.

p(x|y) = 1/(Z_y) prod( a_i^f_i(x,y) )
a_i is a model parameter, and intuitively, it represents the weight of a feature f_i. Z_y is a normalizing factor for letting the summation of probabilities p(x|y) be 1 for all possible x under a given y.

Given a set of feature functions and training data of observed events as an input, Amis computes and outputs the optimal parameters a_i. The program supports several algorithms for parameter estimation: IIS, GIS, BFGS, and their modified version for supporting feature forests.


Requirements

Hardware

CPU
PentiumIII 500MHz or above
Memory
256 MB or above
Hard disk
50MB or above

The program can be compiled and run on IA machines of the above specs, or SPARC machines of the equivalent specs. More memory/hard disk will be required depending on the size of input data.

Software

OS
Linux, Solaris
Compiler
G++ 2.95.4
Library
Standard C++ Library

Installation and startup

Installation

Amis supports "configure" script, and is compiled and installed by the following instructions. (In the following examples, % represents a command prompt.)

  1. Run "configure" script
    % ./configure
    As a default, the executable is installed in /usr/local/. When you want to install in another directory ($DIR in the example), specify the option as following.
    % ./configure --prefix=$DIR
    Other than "--prefix", various options are supported.
    OptionDefaultValid valuesEffect
    --enable-debugno0 - 5 or no Specify whether debug messages are printed or not. The greater value is given, and more messages are printed.
    --enable-profileno0 - 5 or no Specify whether profiling (measuring the execution time of each function) is enabled or not. The greater value is specified, and more functions are profiled.
    --enable-ambiguitynono or yes Specify whether ambiguous events are supported or not. If you enable the support for ambiguous events, the program gets slow very much. You should specify "no" when ambiguous events are not required.
    For other options, see the help message of configure script (printed by help option) or manuals.
  2. Compile the program by "make" command
    % make
    The executable "amis" is created.
  3. Check whether the compilation is successful or not.
    % make check
  4. Install the executable and manuals.
    % make install

The above instructions will install "/usr/local/bin/amis".

Startup

To start up Amis, execute "amis" with an argument specifying a configuration file (described later).

% amis [configuration file]
When you omit the argument, "amis.conf" will be a configuration file as a default. When the configuration file is not found (or cannot be read), the program stops with an error.

You can specify the following options at startup-time.

OptionDefaultValid valuesEffect
-hnothingnothingPrint help messages
-fbinary[binary|integer|real] Specify the type of features
-mamis.modelfile name Specify the name of a model file (described later)
-eamis.eventfile name Specify the name of an event file (described later)
-oamis.outputfile name Specify the name of an output file (described later)
-lamis.logfile name Specify the name of a log file (described later)
-dAmis[Amis|CMEE] Specify the data file format
-aIIS[IIS|GIS|BFGS] Specify the parameter estimation algorithm
-i200non-zero positive value Specify the number of iterations
-n200non-zero positive value Specify the number of iterations in Newton's method
-s5non-zero positive value Specify the memory size for limited-memory BFGS
-r1non-zero positive value Specify the interval of printing the progress of computation
-p6non-zero positive value Specify the significant digits

The configuration is enabled in the following order.

  1. Startup options
  2. Configuration file
  3. Default values


External specs

This program is composed of the following modules

Model file and event file are processed by the responsible modules, and model object and event space object are created. Given model and event space objects, the algorithm module computes parameter estimation and outputs the model data. The property module reads a configuration file and controlls the other modules.

       Model file
       Event file            Configuration file
            |                        |
            V                        V
 -----------------------      ---------------
|                       |    |               |
|   Data file manager   |<---|   Property    |
|                       |    |               |
 -----------------------      ---------------
   |             |                   |
   |             V                   V
   |        -----------       ---------------
   |       |           |     |               |
   |       |   Model   |---->|               |
   |       |           |     |               |
   |        -----------      |   Algorithm   |---> Output model
   |    ---------------      |               |
   |   |               |     |               |
   --->|  Event space  |---->|               |
       |               |     |               |
        ---------------       ---------------
                                     |
                          -----------------------
                         |   Model expectation   |
                         | Empirical expectation |
                          -----------------------

Input/output

This section explains the data file format: Amis format and AmisTree format. While this software also accepts CMEE format, Amis format is more flexible. AmisTree format is used for estimating maximum entropy models for feature forests.

For each file, # to the end of line is a comment and ignored. Comments are treated as a space. Each token is separated by spaces or tabs, and "new line" represents the end of line. Colons (:) are a special character. When you want to use these special characters as a part of tokens, escape the character with a backslash (\). A backslash itself is represented as \\.

Input (Amis format)

To use amis, configuration filemodel file, and event file should be prepared. Each of them is explained below.

Configuration file

In a configuration file, specify the name of an option and its value. See the following example.

DATA_FORMAT     Amis
FEATURE_TYPE    integer
MODEL_FILE      me.model
EVENT_FILE      me.event
OUTPUT_FILE     me.output
LOG_FILE        me.log
ESTIMATION_ALGORITHM    IIS
NUM_ITERATIONS  200
NUM_NEWTON_ITERATIONS   20
REPORT_INTERVAL 1
PRECISION       6

The following options are available.

OptionDefaultValid valuesEffect
DATA_FORMATAmisAmis CMEE AmisTree Specify the data file format
FEATURE_TYPEbinarybinary integer real Specify the type of features
MODEL_FILEamis.modelfile names Specify the names of model files
EVENT_FILEamis.eventfile names Specify the names of event files
OUTPUT_FILEamis.outputfile name Specify the name of an output file
LOG_FILEamis.logfile name Specify the name of a log file
ESTIMATION_ALGORITHMIISIIS GIS BFGS Specify the algorithm for parameter estimation.
FEATURE_COUNT_HASHFALSETRUE FALSE Specify the class ("map" or "vector") used for the factoring optimization of IIS algorithms. As a default, "vector" is used.
NUM_ITERATIONS200non-zero positive value Specify the number of iterations
MEMORY_SIZE5non-zero positive value Specify the memory size for limited-memory BFGS
REPORT_INTERVAL1non-zero positive value Specify the interval of reporting the progress of computation
PRECISION6non-zero positive value Specify the number of significant digits
EVENT_ON_FILEFALSETRUE FALSE Specify whether the event data is put on memory (default), or on a file. This option is used when the event data is too large and the memory is unsufficient.
EVENT_ON_FILE_NAMEamis.event.tmpfile name Specify the file of storing event data. Effective only when EVENT_ON_FILE option is enabled
Options specified by the configuration file can be overwritten by startup options.

Model file

A model file gives a set of feature functions and corresponding initial parameters. See the following example.

feature1    1.0
feature2    2.0
feature3    0.3
...

Each line corresponds to each feature. First, you specify the name of a feature. For feature names, you can use any characters except for spaces, tabs, colons (:), and pounds (#). Next, following spaces or tabs, specify the initial parameter of the feature. Initial values are given by C-style floating point values. Initial parameters can be any positive values (usually, 1.0). Only two tokens, feature name and parameter, can be specified in each line.

Event file

An event file gives events as training data. In detail, you specify activated features for an observed event and its complement events. Here, an observed event is an event observed in the training data. A complement event is an event whose target event is substituted from the observed event. As descrbied in Introduction, we must normalize the summation of probabilities for all events which can be observed under the same history event of the observed event. Such events are given via complement events. To sum up, each event is represented with an observed event and a set of events observable under the same history event.

See the following example.

event_1
1    feature1:2 feature3
0    feature1
0    feature2:3

event_2
0    feature2 feature3:5
1    feature1
0    feature3

...

Each block separated by blank lines corresponds to one event. In the first line, you specify the name of an event. You can use any characters except for special characters mentioned above. Other lines represent an observed event or complement events. At the beginning of a line, specify the number of times the event observed. For an observed event, it should be non-zero positive value, and for a complement event it should be zero. When you have enabled ambiguous events, you can specify positive real values. Otherwise, the number of observed times must be an integer and only one observed event is permitted for one event description. Next, enumerate activated features for an event. Each feature must be defined in a model file. If you specified a feature not found in a model file, it would be an error. The value of a feature function can be specified following the feature name. As in the above example, specify the feature value following a colon (:). When omitted, it will be 1.

Each event description is separated by blank lines. Note that a line with only comments is also treated as a blank line.

Input (AmisTree format)

When you use an estimation algorithm for feature forests, an event file must be written in AmisTree format. Configuration and model files are the same as Amis format. This section describes the format of event files.

An event file in AmisTree format is as following.

event_1  2
feature1:2 feature2:3 feature3
{ dnode_1 ( node_1 feature1:2 { dnode_2 ( node_2
feature2:3 ) ( node_3 ) } { dnode_3 $node_2 (
node_4 feature3 ) } ) }

event_2  1
feature2:3
{ dnode_1 ( node_1 feature1 ) ( node_2 { dnode_2
( node_3 feature2:3 ) ( node_4 feature3 ) } ) }

...

Note: The third and following lines in each event description must be in deed represented in a line.

As in the Amis format, blank lines separates each event description. In the AmisTree format, an event is represented with three lines. The first line specifies the name of an event and the number of times of the observed event. In the above example, event_1 is observed twice, and event_2 once. In the second line, you enumerate activated features of an observed event. As in the Amis format, specify the name of a feature together with its value. The third line represents an observed event and complement events in a feature forest. Disjunctive nodes are represented with curly braces. Between the curly braces, the name of a node is first specified, and conjunctive nodes follow. Conjunctive nodes are represented with round brackets. Between the round brackets, the name of a node is first specified, and activated features follow. Feature descriptions are the same as Amis format. You can also specify disjunctive nodes as daughter nodes. Node names are used to represent structure-sharing. Already appeared nodes can be refered by "$" followed by the node name. In event_1 in the above example, $node2 represents the sharing with node2 already appeared. By using node sharing and pack the feature forest smaller, the computational complexity reduces, and the computation will be accelerated. You can use any characters except for special characters for node names and feature names.

Do not forget to put spaces before and after curly braces and round brackets. Without spaces, they will be treated as a part of node names or feature names.

Output

An output format is common to Amis and AmisTree format.

Amis outputs pairs of feature and parameter. The output format is the same as a model file. Since the output format is the same as a model file, the output file can be reused as an input of the new computation. That is, we can further progress the parameter estimation given already estimated data.

Parameters a_i corresponding to feature functions are output, and we can compute the probability of an unknown event by the product of a_i for all activated features.


Example

Along with a simple example, I describe how to use the estimator. Suppose we are making a maximum entropy model for POS tagging. For example, to sentence "I like handball", we can assign POS to each word, like "I/Noun like/Verb handball/Noun". We consider the tagging task as a sequence of events, each of which is the task to assign one of tags to each word given a previous word and tag. For example, when we are looking at "like", we select a tag (target) for "like" under the context (history) of having "I/Noun" as a previous word. In the Amis format, tagging events are expressed as following ("BOS" is for "Beginnig Of Sentence").

event_BOB/BOS-I/Noun
1  BOS/BOS-I/Noun */*-I/Noun */*-*/Noun
0  */*-*/Verb
0  */*-*/Prep
0  */*-*/Modif

event_I/Noun-like/Verb
0  I/Noun-like/Noun */Noun-like/Noun */*-like/Noun */*-*/Noun
1  I/Noun-like/Verb */Noun-like/Verb */*-like/Verb */*-*/Verb
0  */Noun-like/Prep */*-like/Prep */*-*/Prep
0  */*-*/Modif

event_like/Verb-handball/Noun
...

Each block separated by a blank line corresponds to each event. The first line of each block is a name of an event, which has in fact no effect to the estimator (just for human readability). Other lines describe the features activated for the event. Each line corresponds to each target, in this example, Noun, Verb, Prep, and Modif. Every event description must have every line for all possible targets for the history. The beginning of the line describes whether the event is "observed" or not. If the pair of target and history corresponding to the line is observed in the training data, it has the positive number. If it is not observed, for example "assign Verb for I", it is 0.

The model file will be as follows.

BOS/BOS-I/Noun  1.0
*/*-I/Noun      1.0
*/*-*/Noun      1.0
*/*-*/Verb      1.0
*/*-*/Prep      1.0
*/*-*/Modif     1.0
...

Running the estimator, and we get will get the output file like this:

BOS/BOS-I/Noun  8.03
*/*-I/Noun      1.45
*/*-*/Noun      0.84
*/*-*/Verb      0.72
*/*-*/Prep      0.54
*/*-*/Modif     0.48
...
Using these parameter values, we can compute the probability of an event. For example, un-normalized score for event "assign Noun for I under the context BOS/BOS" is,
q(Noun|BOS/BOS-I/*)
= prod( a_i^f_i(Noun|BOS/BOS-I/*) )
= prod( 8.03 * 1.45 * 0.84 )
= 9.78
Similarly, q(Verb|...)=0.72, q(Prep|...)=0.54, q(Modif|...)=0.48. Hence, the probability of the event of assining Noun is,
p(Noun|BOS/BOS-I/*)
= q(Noun|BOS/BOS-I/*) / Z_y
= 9.78 / (9.78+0.72+0.54+0.48)
= 0.849

Internal specs

For internal specifications, see other files

Misc. (restrictions and known bugs)


References

[1] Adam L. Berger, Stephen A. Della Pietra and Vincent J. Della Pietra. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39-71, 1996.

[2] Yusuke Miyao and Jun'ichi Tsujii. Maximum entropy estimation for feature forests. In Proc. HLT2002.

[3] Stephen A. Della Pietra, Vincent J. Della Pietra and John Lafferty. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):380-393, 1997.

[4] Jorge Nocedal. Updating quasi-Newton matrices with limited storage. Mathematics of Computation, 35:773-783, 1980.

[5] Andrew Borthwick. ChoiceMaker Maximum Entropy Estimator. ChoiceMaker Technologies, Inc. Email borthwic@cs.nyu.edu for information. 1999.