opennlp.grok.preprocess.mwe
Class LexicalMWE

java.lang.Object
  |
  +--opennlp.grok.preprocess.mwe.LexicalMWE
All Implemented Interfaces:
opennlp.maxent.Evalable, opennlp.common.preprocess.NameFinder, opennlp.common.preprocess.Pipelink
Direct Known Subclasses:
EnglishCommonFixedLexicalMWE, EnglishFixedLexicalMWE, EnglishVariableLexicalMWE

public class LexicalMWE
extends java.lang.Object
implements opennlp.common.preprocess.NameFinder, opennlp.maxent.Evalable

A fixed-lexicon multi-word expression (MWE) finder that uses a list of MWEs to determine whether a sequence of words is in the MWE model.

This finds common multi-word expressions that are completely fixed in English, such as "ad hoc" and "au pair". Most are foreign-language expressions that English has borrowed. Although they may be analysable in their native language, the constituent words make no sense when analysed with English grammar except as part of the MWE. Rather than extending the grammar to cover these special usages, it is much easier to treat the whole MWE as a single lexicon entry with the right POS, semantic, and other tags.

Token tagging is delayed to a later stage in the pipeline.

 <?xml version="1.0" encoding="UTF-8"?>
 <nlpDocument>
   <text>
     <p>
       <s>
         <t>
           <w>ad</w>
         </t>
         <t>
           <w>hoc</w>
         </t>
       </s>
     </p>
   </text>
 </nlpDocument>

 is transformed to:

 <?xml version="1.0" encoding="UTF-8"?>
 <nlpDocument>
   <text>
     <p>
       <s>
         <t type="mwe">
           <w>ad</w>
           <w>hoc</w>
         </t>
       </s>
     </p>
   </text>
 </nlpDocument>

 
This class implements the matching algorithm, while subclasses are used to load particular models (lists of MWEs).
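The grouping shown in the XML example can be illustrated with a minimal sketch. It is not the Grok implementation: FixedMweSketch and its group method are hypothetical names, the MWE list is held in a plain Set rather than an MWEModel, and greedy longest-match with exact, case-sensitive comparison is an assumption about how matching might work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for illustration; not the LexicalMWE or MWEModel API. */
class FixedMweSketch {
    // Each known expression is stored as its sequence of words.
    private final Set<List<String>> expressions = new HashSet<>();
    private int maxLength = 0;

    FixedMweSketch(List<List<String>> mwes) {
        for (List<String> mwe : mwes) {
            expressions.add(new ArrayList<>(mwe));
            maxLength = Math.max(maxLength, mwe.size());
        }
    }

    /**
     * Groups words into tokens. A run of two or more words that appears in
     * the expression list becomes one token; every other word is a token
     * on its own. The longest candidate at each position is tried first.
     */
    List<List<String>> group(List<String> words) {
        List<List<String>> tokens = new ArrayList<>();
        int i = 0;
        while (i < words.size()) {
            int matched = 1;
            int longest = Math.min(maxLength, words.size() - i);
            for (int len = longest; len >= 2; len--) {
                if (expressions.contains(words.subList(i, i + len))) {
                    matched = len;
                    break;
                }
            }
            tokens.add(new ArrayList<>(words.subList(i, i + matched)));
            i += matched;
        }
        return tokens;
    }

    public static void main(String[] args) {
        FixedMweSketch finder = new FixedMweSketch(Arrays.asList(
                Arrays.asList("ad", "hoc"),
                Arrays.asList("au", "pair")));
        // prints [[an], [ad, hoc], [committee]]
        System.out.println(finder.group(Arrays.asList("an", "ad", "hoc", "committee")));
    }
}
```

In the sketch, each inner list corresponds to one <t> element and each string to one <w> element; a token with more than one word is the analogue of <t type="mwe">.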

It requires a Tokenizer class to be ahead of it in the pipeline.

There is no EventCollector defined, as this does not use the maximum entropy algorithm.

Version:
$Revision: 1.1 $, $Date: 2002/03/12 12:51:20 $
Author:
Mike Atkinson

Field Summary
protected  MWEModel model
          The multi-word expression model to use to find MWEs.
 
Constructor Summary
protected LexicalMWE()
          Constructor for the LexicalMWE object
  LexicalMWE(MWEModel mod)
          Constructor for the LexicalMWE object
 
Method Summary
 opennlp.maxent.EventCollector getEventCollector(java.io.Reader r)
          Gets the EventCollector attribute of the LexicalMWE object
 java.lang.String getNegativeOutcome()
          Gets the NegativeOutcome attribute of the LexicalMWE object
 void localEval(opennlp.maxent.MaxentModel model, java.io.Reader r, opennlp.maxent.Evalable e, boolean verbose)
          NOT IMPLEMENTED
 void process(opennlp.common.xml.NLPDocument doc)
          Find the Fixed Lexical Multi-Word Expressions in a document.
 java.util.Set requires()
          It requires a Tokenizer class to be ahead of it in the pipeline.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

model

protected MWEModel model
The multi-word expression model to use to find MWEs.

Constructor Detail

LexicalMWE

public LexicalMWE(MWEModel mod)
Constructor for the LexicalMWE object

Parameters:
mod - The model to use to find MWEs.

LexicalMWE

protected LexicalMWE()
Constructor for the LexicalMWE object

Method Detail

getNegativeOutcome

public java.lang.String getNegativeOutcome()
Gets the NegativeOutcome attribute of the LexicalMWE object

Specified by:
getNegativeOutcome in interface opennlp.maxent.Evalable
Returns:
The NegativeOutcome value

getEventCollector

public opennlp.maxent.EventCollector getEventCollector(java.io.Reader r)
Gets the EventCollector attribute of the LexicalMWE object

Specified by:
getEventCollector in interface opennlp.maxent.Evalable
Parameters:
r - Source of the event collector data.
Returns:
always returns null

localEval

public void localEval(opennlp.maxent.MaxentModel model,
                      java.io.Reader r,
                      opennlp.maxent.Evalable e,
                      boolean verbose)
NOT IMPLEMENTED

Specified by:
localEval in interface opennlp.maxent.Evalable
Parameters:
model - Maximum Entropy models are not used by this class.
r - Source of the data.
e -
verbose -

process

public void process(opennlp.common.xml.NLPDocument doc)
Find the Fixed Lexical Multi-Word Expressions in a document.

Specified by:
process in interface opennlp.common.preprocess.Pipelink
Parameters:
doc - A JDOM document that contains the results of processing by the previous pipeline stages.

requires

public java.util.Set requires()
It requires a Tokenizer class to be ahead of it in the pipeline.

Specified by:
requires in interface opennlp.common.preprocess.Pipelink
Returns:
A set containing the classes that must come before this class in the pipeline.
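
As a rough illustration of how a returned set of prerequisite classes could be used to validate stage order, consider the sketch below. Everything in it is hypothetical: Stage is a cut-down stand-in for Pipelink (which also declares process), TokenizerStage and MweStage are invented placeholder stages, and the exact-class comparison is an assumption rather than documented Grok behaviour.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; none of these types belong to the opennlp packages. */
class PipelineOrderSketch {
    /** Cut-down stand-in for a pipeline stage that declares its prerequisites. */
    interface Stage {
        Set<Class<?>> requires();
    }

    /** Placeholder tokenizer stage with no prerequisites. */
    static class TokenizerStage implements Stage {
        public Set<Class<?>> requires() {
            return Collections.emptySet();
        }
    }

    /** Placeholder MWE stage that needs the tokenizer stage ahead of it. */
    static class MweStage implements Stage {
        public Set<Class<?>> requires() {
            Set<Class<?>> needed = new HashSet<>();
            needed.add(TokenizerStage.class);
            return needed;
        }
    }

    /**
     * Returns true if, for every stage, all classes it requires belong to
     * stages that appear earlier in the list. Classes are compared exactly;
     * subclasses of a required class are not accepted in this sketch.
     */
    static boolean isValidOrder(List<Stage> stages) {
        Set<Class<?>> seen = new HashSet<>();
        for (Stage stage : stages) {
            if (!seen.containsAll(stage.requires())) {
                return false;
            }
            seen.add(stage.getClass());
        }
        return true;
    }
}
```

With these placeholder stages, a list ordered tokenizer then MWE passes the check, while the reverse order fails it.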


Copyright © 2003 Jason Baldridge and Gann Bierner. All Rights Reserved.