opennlp.tools.chunker
Class ChunkerME

java.lang.Object
  extended by opennlp.tools.chunker.ChunkerME
All Implemented Interfaces:
Chunker
Direct Known Subclasses:
ParserChunker, TreebankChunker

public class ChunkerME
extends java.lang.Object
implements Chunker

This class represents a maximum-entropy-based chunker. Such a chunker finds flat structures, such as noun phrases or named entities, in a sequence of input tokens.


Field Summary
protected  BeamSearch beam
          The beam used to search for sequences of chunk tag assignments.
protected  opennlp.maxent.MaxentModel model
          The model used to assign chunk tags to a sequence of tokens.
 
Constructor Summary
ChunkerME(opennlp.maxent.MaxentModel mod)
          Creates a chunker using the specified model.
ChunkerME(opennlp.maxent.MaxentModel mod, ChunkerContextGenerator cg)
          Creates a chunker using the specified model and context generator.
ChunkerME(opennlp.maxent.MaxentModel mod, ChunkerContextGenerator cg, int beamSize)
          Creates a chunker using the specified model and context generator and decodes the model using a beam search of the specified size.
 
Method Summary
 java.util.List chunk(java.util.List toks, java.util.List tags)
          Generates chunk tags for the given sequence returning the result in a list.
 java.lang.String[] chunk(java.lang.Object[] toks, java.lang.String[] tags)
          Generates chunk tags for the given sequence returning the result in an array.
static void main(java.lang.String[] args)
          Trains the chunker using the specified parameters.
 double[] probs()
          Returns an array with the probabilities of the last decoded sequence.
 void probs(double[] probs)
          Populates the specified array with the probabilities of the last decoded sequence.
static opennlp.maxent.GISModel train(opennlp.maxent.EventStream es, int iterations, int cut)
          Trains a new model for the ChunkerME.
protected  boolean validOutcome(java.lang.String outcome, java.lang.String[] sequence)
          This method determines whether the outcome is valid for the preceding sequence.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

beam

protected BeamSearch beam
The beam used to search for sequences of chunk tag assignments.


model

protected opennlp.maxent.MaxentModel model
The model used to assign chunk tags to a sequence of tokens.

Constructor Detail

ChunkerME

public ChunkerME(opennlp.maxent.MaxentModel mod)
Creates a chunker using the specified model.

Parameters:
mod - The maximum entropy model for this chunker.

ChunkerME

public ChunkerME(opennlp.maxent.MaxentModel mod,
                 ChunkerContextGenerator cg)
Creates a chunker using the specified model and context generator.

Parameters:
mod - The maximum entropy model for this chunker.
cg - The context generator to be used by the specified model.

ChunkerME

public ChunkerME(opennlp.maxent.MaxentModel mod,
                 ChunkerContextGenerator cg,
                 int beamSize)
Creates a chunker using the specified model and context generator and decodes the model using a beam search of the specified size.

Parameters:
mod - The maximum entropy model for this chunker.
cg - The context generator to be used by the specified model.
beamSize - The size of the beam that should be used when decoding sequences.
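
To illustrate what the beamSize parameter controls, the following is a minimal, self-contained sketch of beam search over tag sequences. It is not the BeamSearch class referenced by this API: the BeamSearchSketch class, the Scorer interface, and the toy probabilities are all hypothetical names and values chosen for illustration. At each position, only the beamSize highest-scoring partial sequences are kept.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; not part of opennlp.tools.chunker.
class BeamSearchSketch {

    /** Stand-in for a model: returns one probability per tag, given the tags chosen so far. */
    interface Scorer {
        double[] score(int position, List<Integer> previousTags);
    }

    /** A partial tag sequence and its accumulated log probability. */
    static final class Hypothesis {
        final List<Integer> tags;
        final double logProb;

        Hypothesis(List<Integer> tags, double logProb) {
            this.tags = tags;
            this.logProb = logProb;
        }
    }

    /** Returns the best tag sequence found while keeping at most beamSize hypotheses per step. */
    static List<Integer> decode(int length, int beamSize, Scorer scorer) {
        List<Hypothesis> beam = new ArrayList<>();
        beam.add(new Hypothesis(new ArrayList<>(), 0.0));
        for (int pos = 0; pos < length; pos++) {
            List<Hypothesis> next = new ArrayList<>();
            for (Hypothesis h : beam) {
                double[] probs = scorer.score(pos, h.tags);
                for (int tag = 0; tag < probs.length; tag++) {
                    if (probs[tag] <= 0.0) {
                        continue; // skip impossible extensions
                    }
                    List<Integer> extended = new ArrayList<>(h.tags);
                    extended.add(tag);
                    next.add(new Hypothesis(extended, h.logProb + Math.log(probs[tag])));
                }
            }
            // Highest log probability first, then prune to the beam size.
            next.sort(Comparator.comparingDouble((Hypothesis h) -> -h.logProb));
            beam = new ArrayList<>(next.subList(0, Math.min(beamSize, next.size())));
        }
        return beam.isEmpty() ? new ArrayList<>() : beam.get(0).tags;
    }

    public static void main(String[] args) {
        // Toy scorer with two tags: tag 0 looks best at position 0,
        // but the sequence that starts with tag 1 has the higher overall probability.
        Scorer toy = (pos, prev) -> pos == 0
                ? new double[] {0.6, 0.4}
                : (prev.get(0) == 1 ? new double[] {0.9, 0.1} : new double[] {0.5, 0.5});
        System.out.println("beamSize 1: " + decode(2, 1, toy));
        System.out.println("beamSize 2: " + decode(2, 2, toy));
    }
}
```

With the toy scorer, a beam of size 1 behaves greedily and commits to the locally best first tag, while a beam of size 2 keeps the alternative alive long enough to find the sequence with the higher overall probability. Larger beams trade decoding time for a more thorough search.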
Method Detail

chunk

public java.util.List chunk(java.util.List toks,
                            java.util.List tags)
Description copied from interface: Chunker
Generates chunk tags for the given sequence returning the result in a list.

Specified by:
chunk in interface Chunker
Parameters:
toks - a list of the tokens or words of the sequence.
tags - a list of the part-of-speech (POS) tags of the sequence.
Returns:
a list of chunk tags for each token in the sequence.

chunk

public java.lang.String[] chunk(java.lang.Object[] toks,
                                java.lang.String[] tags)
Description copied from interface: Chunker
Generates chunk tags for the given sequence returning the result in an array.

Specified by:
chunk in interface Chunker
Parameters:
toks - an array of the tokens or words of the sequence.
tags - an array of the part-of-speech (POS) tags of the sequence.
Returns:
an array of chunk tags for each token in the sequence.
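
Because chunk returns one tag per token, callers usually need to group adjacent tokens into phrases. The sketch below shows one way to do that, assuming BIO-style chunk tags such as "B-NP", "I-NP" and "O". That tag scheme is an assumption and is not stated on this page; the ChunkSpanSketch class and its phrases method are hypothetical helpers, not part of the API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; assumes BIO-style chunk tags ("B-NP", "I-NP", "O").
class ChunkSpanSketch {

    /** Groups tokens into bracketed phrases such as "[NP the deficit]". */
    static List<String> phrases(String[] toks, String[] chunkTags) {
        if (toks.length != chunkTags.length) {
            throw new IllegalArgumentException("toks and chunkTags must have the same length");
        }
        List<String> result = new ArrayList<>();
        StringBuilder current = null;
        String currentType = null;
        for (int i = 0; i < toks.length; i++) {
            String tag = chunkTags[i];
            // An "I-X" tag continues the open chunk only if that chunk has type X.
            if (tag.startsWith("I-") && tag.substring(2).equals(currentType)) {
                current.append(' ').append(toks[i]);
                continue;
            }
            // Any other tag closes the chunk that is currently open.
            if (current != null) {
                result.add(current.append(']').toString());
                current = null;
                currentType = null;
            }
            // "B-X" starts a new chunk; a stray "I-X" is treated as a start as well.
            if (tag.startsWith("B-") || tag.startsWith("I-")) {
                currentType = tag.substring(2);
                current = new StringBuilder("[").append(currentType).append(' ').append(toks[i]);
            }
        }
        if (current != null) {
            result.add(current.append(']').toString());
        }
        return result;
    }

    public static void main(String[] args) {
        String[] toks = {"He", "reckons", "the", "deficit"};
        String[] tags = {"B-NP", "B-VP", "B-NP", "I-NP"}; // illustrative tags, not real model output
        System.out.println(phrases(toks, tags));
    }
}
```

In real use, the tags array would come from a call to chunk rather than being written by hand as it is here.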

validOutcome

protected boolean validOutcome(java.lang.String outcome,
                               java.lang.String[] sequence)
This method determines whether the outcome is valid for the preceding sequence. This can be used to implement constraints on what sequences are valid.

Parameters:
outcome - The outcome.
sequence - The preceding sequence of outcome assignments.
Returns:
true if the outcome is valid for the sequence, false otherwise.
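
As an example of the kind of constraint a subclass might implement, the sketch below rejects a continuation tag that does not follow a chunk of the same type. It assumes BIO-style outcomes ("B-NP", "I-NP", "O"), which this page does not specify, and it is not taken from ChunkerME or its subclasses. The ValidOutcomeSketch class is a hypothetical stand-alone illustration with the same parameter list as validOutcome.

```java
// Hypothetical illustration; not the constraint implemented by ChunkerME or its subclasses.
class ValidOutcomeSketch {

    /**
     * Assumes BIO-style outcomes: an "I-X" outcome is valid only
     * directly after a "B-X" or "I-X" outcome of the same type X.
     */
    static boolean validOutcome(String outcome, String[] sequence) {
        if (!outcome.startsWith("I-")) {
            return true; // "B-X" and "O" may follow anything
        }
        if (sequence.length == 0) {
            return false; // a sequence cannot begin with a continuation tag
        }
        String previous = sequence[sequence.length - 1];
        String type = outcome.substring(2);
        return previous.equals("B-" + type) || previous.equals("I-" + type);
    }

    public static void main(String[] args) {
        System.out.println(validOutcome("I-NP", new String[] {"B-NP"}));
        System.out.println(validOutcome("I-NP", new String[] {"B-VP"}));
        System.out.println(validOutcome("I-NP", new String[] {}));
    }
}
```

A check like this lets the decoder discard ill-formed tag sequences before they are scored further, rather than repairing them afterwards.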

probs

public void probs(double[] probs)
Populates the specified array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to chunk. The specified array should be at least as large as the number of tokens in the previous call to chunk.

Parameters:
probs - An array used to hold the probabilities of the last decoded sequence.

probs

public double[] probs()
Returns an array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to chunk.

Returns:
An array containing one probability for each token passed to the most recent call to chunk.

train

public static opennlp.maxent.GISModel train(opennlp.maxent.EventStream es,
                                            int iterations,
                                            int cut)
                                     throws java.io.IOException
Trains a new model for the ChunkerME.

Parameters:
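es - The stream of training events.
iterations - The number of training iterations to perform.
cut - The cutoff: the minimum number of times a feature must be seen to be used in training.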
Returns:
the new model
Throws:
java.io.IOException

main

public static void main(java.lang.String[] args)
                 throws java.io.IOException
Trains the chunker using the specified parameters.
Usage: ChunkerME trainingFile modelFile.
The training file should contain one word per line, where each line is a space-delimited triple of "word pos outcome". Sentence breaks are indicated by blank lines.

Parameters:
args - The training file and the model file.
Throws:
java.io.IOException - When the specified files cannot be read.
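
The training file format described for main can be read with a few lines of plain Java. The sketch below parses text in that format into sentences; the ChunkTrainingFormatSketch class, its Sentence type, and the sample words and tags are hypothetical and are not part of the API. It only illustrates the "word pos outcome" layout with blank-line sentence breaks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of opennlp.tools.chunker.
class ChunkTrainingFormatSketch {

    /** One sentence: parallel lists of words, POS tags, and chunk outcomes. */
    static final class Sentence {
        final List<String> words = new ArrayList<>();
        final List<String> posTags = new ArrayList<>();
        final List<String> outcomes = new ArrayList<>();
    }

    /** Splits training text into sentences; a blank line ends the current sentence. */
    static List<Sentence> parse(String text) {
        List<Sentence> sentences = new ArrayList<>();
        Sentence current = new Sentence();
        for (String line : text.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                if (!current.words.isEmpty()) {
                    sentences.add(current);
                    current = new Sentence();
                }
                continue;
            }
            String[] fields = trimmed.split("\\s+");
            if (fields.length != 3) {
                throw new IllegalArgumentException("Expected \"word pos outcome\" but got: " + line);
            }
            current.words.add(fields[0]);
            current.posTags.add(fields[1]);
            current.outcomes.add(fields[2]);
        }
        if (!current.words.isEmpty()) {
            sentences.add(current); // last sentence may lack a trailing blank line
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Illustrative sample data; the tag names are assumptions, not taken from this page.
        String sample = "He PRP B-NP\nreckons VBZ B-VP\n\nThe DT B-NP\ndeficit NN I-NP\n";
        for (Sentence s : parse(sample)) {
            System.out.println(s.words + " " + s.posTags + " " + s.outcomes);
        }
    }
}
```

A parser like this can also serve as a quick sanity check that a training file has exactly three fields on every non-blank line before it is passed to the trainer.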


Copyright 2008 Jason Baldridge, Gann Bierner, and Thomas Morton. All Rights Reserved.