java.lang.Object
  |
  +--opennlp.grok.preprocess.mwe.LexicalMWE
A fixed-lexicon multi-word expression (MWE) finder that uses a list of MWEs to determine whether a sequence of words is in the MWE model.
This finds common multi-word expressions that are completely fixed in English, such as "ad hoc" and "au pair". Most are foreign-language expressions that English has borrowed. Although they might be analysable in their native language, the constituent words make no sense when analysed with English grammar except as part of the MWE. Rather than extend the grammar to include these special usages, it is much easier to treat the whole MWE as a lexicon entry with the right POS, semantic, etc. tags.
Token tagging is delayed to a later stage in the pipeline.
  <?xml version="1.0" encoding="UTF-8"?>
  <nlpDocument>
    <text>
      <p>
        <s>
          <t> <w>ad</w> </t>
          <t> <w>hoc</w> </t>
        </s>
      </p>
    </text>
  </nlpDocument>
is transformed to:
  <?xml version="1.0" encoding="UTF-8"?>
  <nlpDocument>
    <text>
      <p>
        <s>
          <t type="mwe"> <w>ad</w> <w>hoc</w> </t>
        </s>
      </p>
    </text>
  </nlpDocument>
This class implements the matching algorithm, while subclasses are used to load particular models (lists of MWEs).
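The grouping shown in the XML example can be illustrated with a minimal sketch. It is not the opennlp.grok implementation: the class name FixedLexiconMatcherSketch, the longest-match-first strategy, and the exact (case-sensitive) comparison are all assumptions made for illustration. Each inner list in the result corresponds to one <t> element, and a list with more than one word corresponds to a <t type="mwe"> element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the matching step; not the opennlp.grok code. */
class FixedLexiconMatcherSketch {
    // Each lexicon entry is the word sequence of one fixed expression.
    private final Set<List<String>> lexicon = new HashSet<>();
    private int maxLength = 0;

    FixedLexiconMatcherSketch(List<String> expressions) {
        for (String expression : expressions) {
            List<String> words = Arrays.asList(expression.trim().split("\\s+"));
            lexicon.add(words);
            maxLength = Math.max(maxLength, words.size());
        }
    }

    /** Groups tokens so that every listed MWE becomes one multi-word group. */
    List<List<String>> group(List<String> tokens) {
        List<List<String>> groups = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 1; // default: the word stays a single-word token
            int longest = Math.min(maxLength, tokens.size() - i);
            // Assumption: prefer the longest expression starting at position i.
            for (int len = longest; len >= 2; len--) {
                if (lexicon.contains(tokens.subList(i, i + len))) {
                    matched = len;
                    break;
                }
            }
            groups.add(new ArrayList<>(tokens.subList(i, i + matched)));
            i += matched;
        }
        return groups;
    }

    public static void main(String[] args) {
        FixedLexiconMatcherSketch matcher =
                new FixedLexiconMatcherSketch(Arrays.asList("ad hoc", "au pair"));
        List<String> tokens = Arrays.asList("an", "ad", "hoc", "committee");
        // prints [[an], [ad, hoc], [committee]]
        System.out.println(matcher.group(tokens));
    }
}
```

Storing each expression as a list of words lets a hash set answer the membership question directly, so the scan only has to try window lengths up to the longest expression in the lexicon.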
It requires a Tokenizer class to be ahead of it in the pipeline.
There is no EventCollector defined, as this does not use the maximum entropy algorithm.
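The ordering requirement can be pictured as a check over the declared prerequisites of each stage. The sketch below is an assumption-laden illustration, not the opennlp.common.preprocess.Pipelink API: the Stage interface, the two stage classes, and the isValidOrder helper are hypothetical names, and the real requires() returns an untyped java.util.Set whose element type is not stated here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of checking that prerequisites run first. */
class PipelineOrderSketch {
    /** Assumed shape of a pipeline stage that declares what must precede it. */
    interface Stage {
        Set<Class<? extends Stage>> requires();
    }

    static class TokenizerStage implements Stage {
        public Set<Class<? extends Stage>> requires() {
            return Collections.emptySet();
        }
    }

    static class MweStage implements Stage {
        public Set<Class<? extends Stage>> requires() {
            // Mirrors the documented rule: a tokenizer must come earlier.
            return Collections.singleton(TokenizerStage.class);
        }
    }

    /** Returns true if every stage's requirements appear earlier in the list. */
    static boolean isValidOrder(List<Stage> stages) {
        Set<Class<?>> seen = new HashSet<>();
        for (Stage stage : stages) {
            for (Class<? extends Stage> needed : stage.requires()) {
                boolean satisfied = false;
                for (Class<?> earlier : seen) {
                    if (needed.isAssignableFrom(earlier)) {
                        satisfied = true;
                        break;
                    }
                }
                if (!satisfied) {
                    return false;
                }
            }
            seen.add(stage.getClass());
        }
        return true;
    }

    public static void main(String[] args) {
        List<Stage> good = Arrays.asList(new TokenizerStage(), new MweStage());
        List<Stage> bad = Arrays.asList(new MweStage(), new TokenizerStage());
        System.out.println(isValidOrder(good)); // prints true
        System.out.println(isValidOrder(bad));  // prints false
    }
}
```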
Field Summary | |
protected MWEModel |
model
The multi-word expression model to use to find MWEs. |
Constructor Summary | |
protected |
LexicalMWE()
Constructor for the LexicalMWE object |
|
LexicalMWE(MWEModel mod)
Constructor for the LexicalMWE object |
Method Summary | |
opennlp.maxent.EventCollector |
getEventCollector(java.io.Reader r)
Gets the EventCollector attribute of the LexicalMWE object |
java.lang.String |
getNegativeOutcome()
Gets the NegativeOutcome attribute of the LexicalMWE object |
void |
localEval(opennlp.maxent.MaxentModel model,
java.io.Reader r,
opennlp.maxent.Evalable e,
boolean verbose)
NOT IMPLEMENTED |
void |
process(opennlp.common.xml.NLPDocument doc)
Find the Fixed Lexical Multi-Word Expressions in a document. |
java.util.Set |
requires()
It requires a Tokenizer class to be ahead of it in the pipeline. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
protected MWEModel model
Constructor Detail |
public LexicalMWE(MWEModel mod)
mod
- The model to use to find MWEs.
protected LexicalMWE()
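Since subclasses are responsible for loading a particular list of MWEs, the loading step might look roughly like the sketch below. The file format (one expression per line, blank lines and lines starting with "#" ignored) and the class name WordListModelSketch are assumptions; the MWEModel type itself is not reproduced here, and a plain set of word sequences stands in for it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical loader for a list of fixed expressions; format is assumed. */
class WordListModelSketch {
    /** Reads one expression per line and returns each as a word sequence. */
    static Set<List<String>> load(Reader source) {
        Set<List<String>> entries = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                // Assumed convention: skip blank lines and '#' comments.
                if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                    continue;
                }
                entries.add(Arrays.asList(trimmed.split("\\s+")));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        Reader source = new StringReader("ad hoc\n# borrowed from French\nau pair\n");
        // prints [[ad, hoc], [au, pair]]
        System.out.println(load(source));
    }
}
```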
Method Detail |
public java.lang.String getNegativeOutcome()
getNegativeOutcome
in interface opennlp.maxent.Evalable
public opennlp.maxent.EventCollector getEventCollector(java.io.Reader r)
getEventCollector
in interface opennlp.maxent.Evalable
r
- Source of the event collector data.
Returns: null
public void localEval(opennlp.maxent.MaxentModel model, java.io.Reader r, opennlp.maxent.Evalable e, boolean verbose)
localEval
in interface opennlp.maxent.Evalable
model
- Maximum Entropy models are not used by this class.
r
- Source of the data.
e
-
verbose
-
public void process(opennlp.common.xml.NLPDocument doc)
process
in interface opennlp.common.preprocess.Pipelink
doc
- A JDOM document that contains the results of processing by the previous pipeline stages.
public java.util.Set requires()
requires
in interface opennlp.common.preprocess.Pipelink