java.lang.Object
  |
  +--opennlp.grok.preprocess.sentdetect.SentenceDetectorME
A sentence detector for splitting up raw text into sentences. A maximum entropy model is used to evaluate the characters ".", "!", and "?" in a string to determine if they signify the end of a sentence.
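The approach described above (look only at candidate characters, then ask a model whether each one ends a sentence) can be illustrated with a small self-contained sketch. This is not the opennlp.grok implementation: the `ToySentenceSplitter` class and the `BreakDecider` interface are hypothetical names, and a hand-written rule stands in for the trained maximum entropy model.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration only; not the opennlp.grok implementation. */
final class ToySentenceSplitter {

    /** Hypothetical stand-in for the maximum entropy model's yes/no decision. */
    interface BreakDecider {
        boolean isSentenceEnd(String text, int index);
    }

    /** Splits text at every '.', '!' or '?' that the decider accepts. */
    static List<String> split(String text, BreakDecider decider) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean candidate = c == '.' || c == '!' || c == '?';
            if (candidate && decider.isSentenceEnd(text, i)) {
                String sentence = text.substring(start, i + 1).trim();
                if (!sentence.isEmpty()) {
                    sentences.add(sentence);
                }
                start = i + 1;
            }
        }
        // Keep any trailing text that has no accepted end-of-sentence character.
        String rest = text.substring(start).trim();
        if (!rest.isEmpty()) {
            sentences.add(rest);
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Crude rule standing in for a trained model: accept a candidate only
        // if it is the last character or is followed by whitespace.
        BreakDecider rule = (t, i) ->
                i + 1 == t.length() || Character.isWhitespace(t.charAt(i + 1));
        // prints [It costs 3.50 today., Really?, Yes!]
        System.out.println(split("It costs 3.50 today. Really? Yes!", rule));
    }
}
```

The period inside "3.50" is a candidate but is rejected by the rule, which is the kind of decision the real class delegates to its MaxentModel.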
Constructor Summary

SentenceDetectorME(opennlp.maxent.MaxentModel m)
    Constructor which takes a MaxentModel and calls the three-arg constructor with that model, an SDContextGenerator, and the default end of sentence scanner.

SentenceDetectorME(opennlp.maxent.MaxentModel m, opennlp.maxent.ContextGenerator cg)
    Constructor which takes a MaxentModel and a ContextGenerator.

SentenceDetectorME(opennlp.maxent.MaxentModel m, opennlp.maxent.ContextGenerator cg, EndOfSentenceScanner s)
    Creates a new SentenceDetectorME instance.
Method Summary

protected boolean isAcceptableBreak(java.lang.String s, int fromIndex, int candidateIndex)
    Allows subclasses to keep an overzealous (read: poorly trained) model from flagging obvious non-breaks as breaks, based on some boolean determination of a break's acceptability.

static void main(java.lang.String[] args)
    Trains a new sentence detection model.

void process(opennlp.common.xml.NLPDocument doc)
    Sentence detect a document.

java.util.Set requires()

java.lang.String[] sentDetect(java.lang.String s)
    Detect sentences in a String.

int[] sentPosDetect(java.lang.String s)
    Detect the position of the first words of sentences in a String.

static opennlp.maxent.GISModel train(opennlp.maxent.EventStream es, int iterations, int cut)

static opennlp.maxent.GISModel train(java.io.File inFile, int iterations, int cut, EndOfSentenceScanner scanner)
    Use this training method if you wish to supply an end of sentence scanner which provides a different set of ending chars than the default one, which is "\\.|!|\\?|\\\"|\\)".

Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail

public SentenceDetectorME(opennlp.maxent.MaxentModel m)
    Parameters:
        m - The MaxentModel which this SentenceDetectorME will use to evaluate end-of-sentence decisions.

public SentenceDetectorME(opennlp.maxent.MaxentModel m, opennlp.maxent.ContextGenerator cg)
    Parameters:
        m - The MaxentModel which this SentenceDetectorME will use to evaluate end-of-sentence decisions.
        cg - The ContextGenerator object which this SentenceDetectorME will use to turn Strings into contexts for the model to evaluate.

public SentenceDetectorME(opennlp.maxent.MaxentModel m, opennlp.maxent.ContextGenerator cg, EndOfSentenceScanner s)
    Creates a new SentenceDetectorME instance.
    Parameters:
        m - The MaxentModel which this SentenceDetectorME will use to evaluate end-of-sentence decisions.
        cg - The ContextGenerator object which this SentenceDetectorME will use to turn Strings into contexts for the model to evaluate.
        s - The EndOfSentenceScanner which this SentenceDetectorME will use to locate end of sentence indexes.

Method Detail
public void process(opennlp.common.xml.NLPDocument doc)
    Specified by:
        process in interface opennlp.common.preprocess.Pipelink
    Parameters:
        doc - The NLPDocument on which to perform sentence detection.

public java.util.Set requires()
    Specified by:
        requires in interface opennlp.common.preprocess.Pipelink

public java.lang.String[] sentDetect(java.lang.String s)
    Specified by:
        sentDetect in interface opennlp.common.preprocess.SentenceDetector
    Parameters:
        s - The string to be processed.

public int[] sentPosDetect(java.lang.String s)
    Specified by:
        sentPosDetect in interface opennlp.common.preprocess.SentenceDetector
    Parameters:
        s - The string to be processed.
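To show how an array of sentence start positions relates to an array of sentence strings, here is a minimal sketch. It assumes the `int[]` holds the start offset of every sentence, including the first, in ascending order; the documentation does not spell this out, so check the actual return value before relying on it. `SentenceOffsets` and `slice` are hypothetical names, and the offsets are hard-coded, not produced by the library.

```java
import java.util.Arrays;

/** Illustration of slicing a string by sentence start offsets (assumed format). */
final class SentenceOffsets {

    /** Cuts text into pieces beginning at each offset in starts (ascending order). */
    static String[] slice(String text, int[] starts) {
        String[] out = new String[starts.length];
        for (int i = 0; i < starts.length; i++) {
            // A sentence runs up to the next start offset, or to the end of the text.
            int end = (i + 1 < starts.length) ? starts[i + 1] : text.length();
            out[i] = text.substring(starts[i], end).trim();
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Hello there. How are you?";
        int[] starts = {0, 13}; // hypothetical offsets, written by hand
        // prints [Hello there., How are you?]
        System.out.println(Arrays.toString(slice(text, starts)));
    }
}
```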
protected boolean isAcceptableBreak(java.lang.String s, int fromIndex, int candidateIndex)
    The implementation here always returns true, which means that the MaxentModel's outcome is taken as is.
    Parameters:
        s - the string in which the break occurred.
        fromIndex - the start of the segment currently being evaluated
        candidateIndex - the index of the candidate sentence ending
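A subclass could use this hook to veto breaks after known abbreviations. The sketch below shows only the decision logic as a standalone static method with the same parameter list, so that it runs without the library; in a real subclass the body would go into an override of `isAcceptableBreak`. The abbreviation list and the `AbbreviationBreakCheck` class are hypothetical, and the code assumes `candidateIndex` points at the candidate character itself.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Standalone sketch of a break-acceptability check; not part of opennlp.grok. */
final class AbbreviationBreakCheck {

    // Hypothetical abbreviation list, for illustration only.
    private static final Set<String> ABBREVIATIONS =
            new HashSet<>(Arrays.asList("Dr.", "Mr.", "Mrs.", "e.g."));

    /** Same parameter list as isAcceptableBreak; returns false to veto a break. */
    static boolean isAcceptableBreak(String s, int fromIndex, int candidateIndex) {
        // Walk back to the start of the whitespace-delimited token that ends
        // at the candidate, without going before the current segment.
        int tokenStart = candidateIndex;
        while (tokenStart > fromIndex
                && !Character.isWhitespace(s.charAt(tokenStart - 1))) {
            tokenStart--;
        }
        String token = s.substring(tokenStart, candidateIndex + 1);
        return !ABBREVIATIONS.contains(token);
    }

    public static void main(String[] args) {
        String s = "Dr. Smith arrived.";
        System.out.println(isAcceptableBreak(s, 0, 2));  // false: "Dr." is listed
        System.out.println(isAcceptableBreak(s, 0, 17)); // true: "arrived." is not
    }
}
```

Returning true for everything not on the list leaves the model's decision in place for all other candidates, which matches the default behaviour described above.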
public static opennlp.maxent.GISModel train(opennlp.maxent.EventStream es, int iterations, int cut) throws java.io.IOException
    Throws:
        java.io.IOException

public static opennlp.maxent.GISModel train(java.io.File inFile, int iterations, int cut, EndOfSentenceScanner scanner) throws java.io.IOException
    Throws:
        java.io.IOException
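The method summary quotes the default set of ending characters as the expression "\\.|!|\\?|\\\"|\\)". Read as a Java string literal, that is a regular expression matching a period, an exclamation mark, a question mark, a double quote, or a closing parenthesis. The sketch below uses `java.util.regex` only to show which characters the expression picks out; whether the default scanner actually uses a regex engine is an assumption not confirmed here, and `EndCharScan` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Shows which positions the quoted default expression matches. */
final class EndCharScan {

    // The expression as quoted in the method summary, read as a Java string literal.
    static final Pattern DEFAULT_END_CHARS =
            Pattern.compile("\\.|!|\\?|\\\"|\\)");

    /** Returns the index of every character matched by the expression. */
    static List<Integer> candidatePositions(String text) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = DEFAULT_END_CHARS.matcher(text);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // prints [8, 11, 12, 21, 22]
        System.out.println(candidatePositions("He said \"no.\" (Really?)"));
    }
}
```

Note that the opening parenthesis at index 14 is not a candidate, while both quotation marks are, because the expression does not distinguish opening from closing quotes.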

public static void main(java.lang.String[] args) throws java.io.IOException
    Trains a new sentence detection model.
    Usage: java opennlp.grok.preprocess.sentdetect.SentenceDetectorME data_file new_model_name (iterations cutoff)?
    Throws:
        java.io.IOException