Junk Tagger

An English part-of-speech (POS) tagger that uses maximum entropy (ME) models with unsupervised hidden Markov models (HMMs), as described in
"A Maximum Entropy Tagger with Unsupervised Hidden Markov Models"
Jun'ichi Kazama, Yusuke Miyao, and Jun'ichi Tsujii,
In Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium (NLPRS2001), pp. 333--340

Download

2003/5/21
Junk0.41dist_genia_2003_5_21.tar.gz
 -  binaries for Linux/x86.
 -  source code for the tagger and for HMM estimation.
 -  parameters of the HMMs and ME models for the WSJ (news) and GENIA (biomedical) corpora.

To estimate the ME models, I used Amis, developed by Yusuke Miyao (see http://www-tsujii.is.s.u-tokyo.ac.jp/~yusuke/amis/index.html).

This package cannot currently be compiled with GCC3 and the released Amis packages.
I'm working on making the package compatible with GCC3 and the latest Amis.

2003/6/15
Junk0.42_2003_6_15.tar.gz
  What's new?
  - now compatible with GCC3.2 and the latest Amis (Amis3, ver0.29; as of 2003/6/12).
  - contains binaries for Linux/x86 (includes the above Amis binary for those who want to train ME models).
  - training procedures are partially documented.

2003/10/16
Junk0.42_1.2003_10_16.tar.gz (withdrawn: I pulled this version after finding critical bugs)
   What's new?
   - bug fix
   - added parameters from the new GENIA POS corpus (Ver 3.01)
     (use Ruby/tagger2genia2.rb, DATA/GENIA/TATEPOS2)

2004/2/13
Junk0.43_2004_2_13.tar.gz
Junk0.43_2004_3_4.tar.gz
   - fixed the critical bugs in the previous version


Future Direction
  - Clean up the source code. It contains a lot of dead code abandoned during the research.
  - Rewrite in a more stable language (i.e., one that keeps backward compatibility or rarely changes), such as Java.



Jun'ichi Kazama