opennlp.grok.preprocess.tokenize
Class TokenizerME
java.lang.Object
|
+--opennlp.grok.preprocess.tokenize.TokenizerME
- All Implemented Interfaces:
- opennlp.common.preprocess.Pipelink, opennlp.common.preprocess.Tokenizer
- Direct Known Subclasses:
- EnglishTokenizerME
- public class TokenizerME
- extends java.lang.Object
- implements opennlp.common.preprocess.Tokenizer
A Tokenizer for converting raw text into separated tokens. It uses Maximum
Entropy to make its decisions. The features are loosely based off of Jeff
Reynar's UPenn thesis "Topic Segmentation: Algorithms and Applications.",
which is available from his homepage: .
- Version:
- $Revision: 1.12 $, $Date: 2002/11/26 03:27:51 $
- Author:
- Tom Morton
Constructor Summary |
TokenizerME(opennlp.maxent.MaxentModel mod)
Class constructor which takes the string locations of the information
which the maxent model needs. |
Method Summary |
static void |
main(java.lang.String[] args)
Trains a new model. |
void |
process(opennlp.common.xml.NLPDocument doc)
Tokenize an NLPDocument. |
java.util.Set |
requires()
|
java.lang.String[] |
tokenize(java.lang.String s)
Tokenize a String. |
static void |
train(java.lang.String[] args)
|
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
TokenizerME
public TokenizerME(opennlp.maxent.MaxentModel mod)
- Class constructor which takes the string locations of the information
which the maxent model needs.
process
public void process(opennlp.common.xml.NLPDocument doc)
- Tokenize an NLPDocument.
- Specified by:
process
in interface opennlp.common.preprocess.Pipelink
requires
public java.util.Set requires()
- Specified by:
requires
in interface opennlp.common.preprocess.Pipelink
tokenize
public java.lang.String[] tokenize(java.lang.String s)
- Tokenize a String.
- Specified by:
tokenize
in interface opennlp.common.preprocess.Tokenizer
- Parameters:
s
- The string to be tokenized.
- Returns:
- A string array containing individual tokens as elements.
train
public static void train(java.lang.String[] args)
main
public static void main(java.lang.String[] args)
- Trains a new model. Call from the command line with "java opennlp.grok.preprocess.tokenize.TokenizerME trainingdata modelname"
Copyright © 2003 Jason Baldridge and Gann Bierner. All Rights Reserved.