public final class JcmsTokenizer
extends org.apache.lucene.analysis.Tokenizer
This should be a good tokenizer for most European-language documents:
| Modifier and Type | Field and Description |
|---|---|
| static int | ACRONYM |
| static int | ACRONYM_DEP |
| static int | ALPHANUM |
| static int | APOSTROPHE |
| static int | CJ |
| static int | COMPANY |
| static int | EMAIL |
| static int | HOST |
| static int | NUM |
| static java.lang.String[] | TOKEN_TYPES: String token types that correspond to the token type int constants |
| Constructor and Description |
|---|
| JcmsTokenizer(): Creates a new instance of the JcmsTokenizer. |
| JcmsTokenizer(org.apache.lucene.util.AttributeFactory factory): Creates a new JcmsTokenizer with a given AttributeFactory. |
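Lucene consumers normally obtain token streams from an Analyzer rather than from a bare tokenizer. The sketch below is a hypothetical wrapper (the name TokenizerAnalyzer is illustrative, not part of JCMS), written against the Lucene 5.x+ Analyzer API, in which createComponents(String) receives no Reader and the input is attached later through Tokenizer.setReader; this is consistent with the no-argument JcmsTokenizer() constructor above. Because JcmsTokenizer's package is not shown on this page, the example accepts any Tokenizer supplier rather than importing the class.

```java
import java.util.function.Supplier;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;

/**
 * Hypothetical Analyzer that wraps a single tokenizer with no filters.
 * With the real class: new TokenizerAnalyzer(JcmsTokenizer::new).
 */
final class TokenizerAnalyzer extends Analyzer {
    private final Supplier<? extends Tokenizer> tokenizerFactory;

    TokenizerAnalyzer(Supplier<? extends Tokenizer> tokenizerFactory) {
        this.tokenizerFactory = tokenizerFactory;
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // No Reader here: Analyzer attaches the input later via Tokenizer.setReader
        // and caches the returned components for reuse.
        Tokenizer source = tokenizerFactory.get();
        // No filters: tokens pass through unchanged (no lowercasing, no stop words).
        return new TokenStreamComponents(source);
    }
}
```

To use the AttributeFactory constructor instead, pass a lambda such as `() -> new JcmsTokenizer(factory)`.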
| Modifier and Type | Method and Description |
|---|---|
| void | close() |
| void | end() |
| int | getMaxTokenLength(): Retrieve the max allowed token length. |
| boolean | incrementToken() |
| void | reset() |
| void | setMaxTokenLength(int length): Set the max allowed token length. |
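The methods above follow Lucene's TokenStream consumer contract: register attributes, call reset(), loop on incrementToken() until it returns false, then call end() and close(). The sketch below shows that sequence; TokenDump and dump are hypothetical names. It also reads the TypeAttribute, assuming that JcmsTokenizer, whose field list mirrors Lucene's ClassicTokenizer, labels each token with an entry of TOKEN_TYPES as ClassicTokenizer does. That behaviour is an assumption and is not documented on this page.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

/** Hypothetical helper: runs a Tokenizer over a string and lists "term/type" pairs. */
final class TokenDump {

    private TokenDump() {
    }

    static List<String> dump(Tokenizer tokenizer, String text) throws IOException {
        // addAttribute returns the already-registered instance on repeat calls,
        // so reusing the same tokenizer is harmless.
        CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
        TypeAttribute type = tokenizer.addAttribute(TypeAttribute.class);

        List<String> tokens = new ArrayList<>();
        // Lucene 5+: input is supplied after construction, not in the constructor.
        tokenizer.setReader(new StringReader(text));
        try {
            tokenizer.reset();                   // required before the first incrementToken()
            while (tokenizer.incrementToken()) { // false once the input is exhausted
                // Attribute values are overwritten on each call, so copy them out now.
                tokens.add(term.toString() + "/" + type.type());
            }
            tokenizer.end();                     // end-of-stream work, e.g. setting the final offset
        } finally {
            tokenizer.close();                   // releases the Reader; a new setReader() is allowed afterwards
        }
        return tokens;
    }
}
```

Because dump closes the tokenizer in a finally block, the same instance can be passed again: Lucene's Tokenizer accepts a new setReader(...) only after close(). With the real class, setMaxTokenLength(int) would be called on the JcmsTokenizer instance before dump; the documented default is 255.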
Methods inherited from class org.apache.lucene.util.AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
public static final int ALPHANUM
public static final int APOSTROPHE
public static final int ACRONYM
public static final int COMPANY
public static final int EMAIL
public static final int HOST
public static final int NUM
public static final int CJ
public static final int ACRONYM_DEP
public static final java.lang.String[] TOKEN_TYPES
public JcmsTokenizer()
Creates a new instance of the JcmsTokenizer. Attaches the input to the newly created JFlex scanner.
See http://issues.apache.org/jira/browse/LUCENE-1068
public JcmsTokenizer(org.apache.lucene.util.AttributeFactory factory)
Creates a new JcmsTokenizer with a given AttributeFactory.
Parameters:
factory - the attribute factory to use
public void setMaxTokenLength(int length)
Set the max allowed token length.
Parameters:
length - a length; must be greater than zero. The default is 255.
public int getMaxTokenLength()
Retrieve the max allowed token length.
See Also:
setMaxTokenLength(int)
public final boolean incrementToken()
throws java.io.IOException
Specified by:
incrementToken in class org.apache.lucene.analysis.TokenStream
Throws:
java.io.IOException
public final void end()
throws java.io.IOException
Overrides:
end in class org.apache.lucene.analysis.TokenStream
Throws:
java.io.IOException
public void close()
throws java.io.IOException
Specified by:
close in interface java.io.Closeable
Specified by:
close in interface java.lang.AutoCloseable
Overrides:
close in class org.apache.lucene.analysis.Tokenizer
Throws:
java.io.IOException
public void reset()
throws java.io.IOException
Overrides:
reset in class org.apache.lucene.analysis.Tokenizer
Throws:
java.io.IOException

Copyright © 2001-2017 Jalios SA. All Rights Reserved.