opennlp.grok.preprocess.sentdetect
Class SentenceDetectorME

java.lang.Object
  |
  +--opennlp.grok.preprocess.sentdetect.SentenceDetectorME
All Implemented Interfaces:
opennlp.common.preprocess.Pipelink, opennlp.common.preprocess.SentenceDetector
Direct Known Subclasses:
EnglishSentenceDetectorME

public class SentenceDetectorME
extends java.lang.Object
implements opennlp.common.preprocess.SentenceDetector

A sentence detector for splitting up raw text into sentences. A maximum entropy model is used to evaluate the characters ".", "!", and "?" in a string to determine if they signify the end of a sentence.
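The flow described above (find candidate end characters, then decide for each one whether it really ends a sentence) can be sketched without the library. This is only an illustration, not this class's implementation: `SentenceSplitSketch`, `candidateEnds`, `split`, and the toy rule are hypothetical names, and a plain `BiPredicate` stands in for the maximum entropy model's decision.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical stand-in; none of these names belong to opennlp.grok.
final class SentenceSplitSketch {

    /** Returns the indexes of every '.', '!' and '?' in s (the candidate sentence ends). */
    static List<Integer> candidateEnds(String s) {
        List<Integer> ends = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '.' || c == '!' || c == '?') {
                ends.add(i);
            }
        }
        return ends;
    }

    /**
     * Splits s after each candidate that isBreak accepts.
     * isBreak plays the role of the model: it receives the text and a candidate index.
     */
    static List<String> split(String s, BiPredicate<String, Integer> isBreak) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int end : candidateEnds(s)) {
            if (isBreak.test(s, end)) {
                String sentence = s.substring(start, end + 1).trim();
                if (!sentence.isEmpty()) {
                    sentences.add(sentence);
                }
                start = end + 1;
            }
        }
        // Keep any trailing text that has no accepted terminator.
        String rest = s.substring(start).trim();
        if (!rest.isEmpty()) {
            sentences.add(rest);
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Toy rule (an assumption): break only at the end of the text or before whitespace.
        BiPredicate<String, Integer> toyRule =
            (text, i) -> i + 1 == text.length() || Character.isWhitespace(text.charAt(i + 1));
        System.out.println(split("It costs 3.50 today. Really? Yes!", toyRule));
    }
}
```

With the toy rule, the period inside "3.50" is a candidate but is rejected because a digit follows it. A trained model would make that call from context features instead of a fixed rule.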

Version:
$Revision: 1.13 $, $Date: 2002/02/08 12:13:37 $
Author:
Jason Baldridge

Constructor Summary
SentenceDetectorME(opennlp.maxent.MaxentModel m)
          Constructor which takes a MaxentModel and calls the three-arg constructor with that model, an SDContextGenerator, and the default end of sentence scanner.
SentenceDetectorME(opennlp.maxent.MaxentModel m, opennlp.maxent.ContextGenerator cg)
          Constructor which takes a MaxentModel and a ContextGenerator.
SentenceDetectorME(opennlp.maxent.MaxentModel m, opennlp.maxent.ContextGenerator cg, EndOfSentenceScanner s)
          Creates a new SentenceDetectorME instance.
 
Method Summary
protected  boolean isAcceptableBreak(java.lang.String s, int fromIndex, int candidateIndex)
          Allows subclasses to keep an overzealous (read: poorly trained) model from flagging obvious non-breaks as breaks, based on some boolean determination of a break's acceptability.
static void main(java.lang.String[] args)
          Trains a new sentence detection model.
 void process(opennlp.common.xml.NLPDocument doc)
          Sentence detect a document.
 java.util.Set requires()
           
 java.lang.String[] sentDetect(java.lang.String s)
          Detect sentences in a String.
 int[] sentPosDetect(java.lang.String s)
          Detect the position of the first words of sentences in a String.
static opennlp.maxent.GISModel train(opennlp.maxent.EventStream es, int iterations, int cut)
           
static opennlp.maxent.GISModel train(java.io.File inFile, int iterations, int cut, EndOfSentenceScanner scanner)
          Use this training method if you wish to supply an end of sentence scanner which provides a different set of ending chars than the default one, which is "\\.|!|\\?|\\\"|\\)".
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SentenceDetectorME

public SentenceDetectorME(opennlp.maxent.MaxentModel m)
Constructor which takes a MaxentModel and calls the three-arg constructor with that model, an SDContextGenerator, and the default end of sentence scanner.

Parameters:
m - The MaxentModel which this SentenceDetectorME will use to evaluate end-of-sentence decisions.

SentenceDetectorME

public SentenceDetectorME(opennlp.maxent.MaxentModel m,
                          opennlp.maxent.ContextGenerator cg)
Constructor which takes a MaxentModel and a ContextGenerator. Calls the three-arg constructor with a default end of sentence scanner.

Parameters:
m - The MaxentModel which this SentenceDetectorME will use to evaluate end-of-sentence decisions.
cg - The ContextGenerator object which this SentenceDetectorME will use to turn Strings into contexts for the model to evaluate.

SentenceDetectorME

public SentenceDetectorME(opennlp.maxent.MaxentModel m,
                          opennlp.maxent.ContextGenerator cg,
                          EndOfSentenceScanner s)
Creates a new SentenceDetectorME instance.

Parameters:
m - The MaxentModel which this SentenceDetectorME will use to evaluate end-of-sentence decisions.
cg - The ContextGenerator object which this SentenceDetectorME will use to turn Strings into contexts for the model to evaluate.
s - the EndOfSentenceScanner which this SentenceDetectorME will use to locate end of sentence indexes.
Method Detail

process

public void process(opennlp.common.xml.NLPDocument doc)
Sentence detect a document.

Specified by:
process in interface opennlp.common.preprocess.Pipelink
Parameters:
doc - The NLPDocument on which to perform sentence detection.

requires

public java.util.Set requires()
Specified by:
requires in interface opennlp.common.preprocess.Pipelink

sentDetect

public java.lang.String[] sentDetect(java.lang.String s)
Detect sentences in a String.

Specified by:
sentDetect in interface opennlp.common.preprocess.SentenceDetector
Parameters:
s - The string to be processed.
Returns:
A string array containing individual sentences as elements.

sentPosDetect

public int[] sentPosDetect(java.lang.String s)
Detect the position of the first words of sentences in a String.

Specified by:
sentPosDetect in interface opennlp.common.preprocess.SentenceDetector
Parameters:
s - The string to be processed.
Returns:
An integer array containing the positions of the beginning of every sentence.
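As a rough illustration of how such a position array relates to the sentences themselves, the sketch below cuts a string at a list of start offsets. It assumes the positions are zero-based character offsets in ascending order and that the first sentence's start is included; the documentation above does not confirm either point. `SentenceSpansSketch` and `cut` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical helper; not part of opennlp.grok.
final class SentenceSpansSketch {

    /**
     * Cuts s at the given ascending start offsets. Each piece runs from its
     * start to the next start (or to the end of s) and is trimmed.
     */
    static String[] cut(String s, int[] starts) {
        String[] out = new String[starts.length];
        for (int i = 0; i < starts.length; i++) {
            int end = (i + 1 < starts.length) ? starts[i + 1] : s.length();
            out[i] = s.substring(starts[i], end).trim();
        }
        return out;
    }

    public static void main(String[] args) {
        // The offsets are written by hand here; in practice they would come from a detector.
        String text = "Hello there. How are you?";
        System.out.println(Arrays.toString(cut(text, new int[] {0, 13})));
    }
}
```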

isAcceptableBreak

protected boolean isAcceptableBreak(java.lang.String s,
                                    int fromIndex,
                                    int candidateIndex)
Allows subclasses to keep an overzealous (read: poorly trained) model from flagging obvious non-breaks as breaks, based on some boolean determination of a break's acceptability.

The implementation here always returns true, which means that the MaxentModel's outcome is taken as is.

Parameters:
s - the string in which the break occurred.
fromIndex - the start of the segment currently being evaluated
candidateIndex - the index of the candidate sentence ending
Returns:
true if the break is acceptable
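One kind of check a subclass might apply is to veto a break when the token just before the candidate is a known abbreviation. The sketch below is a standalone static method with the same parameter list, not an actual subclass, so that it compiles without the library; in a real subclass the same logic would go in a protected override. The class name and the abbreviation list are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical example; not part of opennlp.grok.
final class AbbreviationBreakFilter {

    // Illustrative list only; a real filter would need a much fuller one.
    private static final Set<String> ABBREVIATIONS =
        new HashSet<>(Arrays.asList("Mr", "Mrs", "Dr", "Prof"));

    /**
     * Returns false when the token immediately before candidateIndex is a
     * listed abbreviation, and true otherwise.
     */
    static boolean isAcceptableBreak(String s, int fromIndex, int candidateIndex) {
        // Walk back from the candidate to the start of the preceding token,
        // without going past the start of the current segment.
        int tokenStart = candidateIndex;
        while (tokenStart > fromIndex && !Character.isWhitespace(s.charAt(tokenStart - 1))) {
            tokenStart--;
        }
        String token = s.substring(tokenStart, candidateIndex);
        return !ABBREVIATIONS.contains(token);
    }

    public static void main(String[] args) {
        String text = "Dr. Smith arrived.";
        System.out.println(isAcceptableBreak(text, 0, 2));   // period after "Dr"
        System.out.println(isAcceptableBreak(text, 0, 17));  // final period
    }
}
```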

train

public static opennlp.maxent.GISModel train(opennlp.maxent.EventStream es,
                                            int iterations,
                                            int cut)
                                     throws java.io.IOException
Throws:
java.io.IOException

train

public static opennlp.maxent.GISModel train(java.io.File inFile,
                                            int iterations,
                                            int cut,
                                            EndOfSentenceScanner scanner)
                                     throws java.io.IOException
Use this training method if you wish to supply an end of sentence scanner which provides a different set of ending chars than the default one, which is "\\.|!|\\?|\\\"|\\)".

Throws:
java.io.IOException
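To see which characters the quoted default pattern covers, it can be compiled as an ordinary regular expression. This sketch assumes the quoted text is a Java string literal and uses java.util.regex purely for illustration; it does not show how EndOfSentenceScanner is implemented. `EndCharPatternSketch` and `positions` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demonstration; not part of opennlp.grok.
final class EndCharPatternSketch {

    // The default pattern as quoted in the train documentation, kept as a Java string literal.
    static final Pattern DEFAULT_END_CHARS = Pattern.compile("\\.|!|\\?|\\\"|\\)");

    /** Returns the index of every character in s that the pattern matches. */
    static List<Integer> positions(String s) {
        List<Integer> found = new ArrayList<>();
        Matcher m = DEFAULT_END_CHARS.matcher(s);
        while (m.find()) {
            found.add(m.start());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(positions("He said \"stop!\" (twice)."));
    }
}
```

Read this way, the pattern matches a period, an exclamation mark, a question mark, a double quote, and a closing parenthesis. An opening parenthesis is not matched.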

main

public static void main(java.lang.String[] args)
                 throws java.io.IOException

Trains a new sentence detection model.

Usage: java opennlp.grok.preprocess.sentdetect.SentenceDetectorME data_file new_model_name (iterations cutoff)?

Throws:
java.io.IOException


Copyright © 2003 Jason Baldridge and Gann Bierner. All Rights Reserved.