Junk Tagger
An English part-of-speech (POS) tagger that uses maximum entropy (ME)
models with unsupervised hidden Markov models (HMMs), as described in
"A Maximum Entropy Tagger with Unsupervised Hidden Markov Models"
Jun'ichi Kazama, Yusuke Miyao, and Jun'ichi Tsujii,
In Proceedings of the Sixth Natural Language Processing Pacific Rim
Symposium (NLPRS2001), pp. 333--340
Download
2003/5/21
Junk0.41dist_genia_2003_5_21.tar.gz
- binaries for Linux/x86.
- source code for the tagger and for HMM estimation.
- parameters of HMMs and ME models for the WSJ (news) and GENIA
(biomedical) corpora.
To estimate the ME models, I used Amis, developed by Yusuke Miyao
(see http://www-tsujii.is.s.u-tokyo.ac.jp/~yusuke/amis/index.html).
This package currently cannot be compiled with GCC3 or with the released
Amis packages.
I am working on making the package compatible with GCC3 and the
latest Amis.
2003/6/15
Junk0.42_2003_6_15.tar.gz
What's new?
- now compatible with GCC3.2 and the latest Amis (Amis3,
ver0.29; as of 2003/6/12).
- contains binaries for Linux/x86 (includes the above Amis
binary for those who want to train ME models).
- training procedures are partially documented.
2003/10/16
Junk0.42_1.2003_10_16.tar.gz (I withdrew this version after finding
critical bugs)
What's new?
- bug fix
- added parameters from the new GENIA POS corpus (Ver 3.01)
(use Ruby/tagger2genia2.rb,
DATA/GENIA/TATEPOS2)
2004/2/13
Junk0.43_2004_2_13.tar.gz
Junk0.43_2004_3_4.tar.gz
- fixed the critical bugs in the previous version
Future Direction
- Clean up the source code. It contains a lot of dead code
abandoned during the research.
- Rewrite the tagger in a more stable language (i.e., one that keeps
backward compatibility or rarely changes), such as Java.
Jun'ichi Kazama