public final class JcmsTokenizer
extends org.apache.lucene.analysis.Tokenizer
This should be a good tokenizer for most European-language documents.
Modifier and Type | Field and Description |
---|---|
static int | ACRONYM |
static int | ACRONYM_DEP |
static int | ALPHANUM |
static int | APOSTROPHE |
static int | CJ |
static int | COMPANY |
static int | EMAIL |
static int | HOST |
static int | NUM |
static java.lang.String[] | TOKEN_TYPES: String token types that correspond to token type int constants |
Constructor and Description |
---|
JcmsTokenizer(): Creates a new instance of the JcmsTokenizer. |
JcmsTokenizer(org.apache.lucene.util.AttributeFactory factory): Creates a new JcmsTokenizer with a given AttributeFactory. |
Modifier and Type | Method and Description |
---|---|
void | close() |
void | end() |
int | getMaxTokenLength(): Retrieve the max allowed token length. |
boolean | incrementToken() |
void | reset() |
void | setMaxTokenLength(int length): Set the max allowed token length. |
Methods inherited from class org.apache.lucene.util.AttributeSource:
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
public static final int ALPHANUM
public static final int APOSTROPHE
public static final int ACRONYM
public static final int COMPANY
public static final int EMAIL
public static final int HOST
public static final int NUM
public static final int CJ
public static final int ACRONYM_DEP
public static final java.lang.String[] TOKEN_TYPES
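The field list suggests that each int constant is an index into TOKEN_TYPES. The sketch below illustrates that lookup pattern only. The numeric values, the angle-bracket strings, the class name `TokenTypeLookupSketch` and the helper `typeName` are all assumptions made for illustration (the values are modelled on Lucene's ClassicTokenizer); this page does not state the actual values used by JcmsTokenizer.

```java
public class TokenTypeLookupSketch {
    // ASSUMPTION: values modelled on Lucene's ClassicTokenizer.
    // The JcmsTokenizer documentation lists these names but not their values.
    public static final int ALPHANUM = 0;
    public static final int APOSTROPHE = 1;
    public static final int ACRONYM = 2;
    public static final int COMPANY = 3;
    public static final int EMAIL = 4;
    public static final int HOST = 5;
    public static final int NUM = 6;
    public static final int CJ = 7;
    public static final int ACRONYM_DEP = 8;

    // ASSUMPTION: one string per int constant, stored at the constant's index.
    public static final String[] TOKEN_TYPES = {
        "<ALPHANUM>", "<APOSTROPHE>", "<ACRONYM>", "<COMPANY>", "<EMAIL>",
        "<HOST>", "<NUM>", "<CJ>", "<ACRONYM_DEP>"
    };

    /** Hypothetical helper: maps an int constant to its string type, or null if out of range. */
    public static String typeName(int tokenType) {
        if (tokenType < 0 || tokenType >= TOKEN_TYPES.length) {
            return null;
        }
        return TOKEN_TYPES[tokenType];
    }

    public static void main(String[] args) {
        System.out.println(typeName(EMAIL)); // prints "<EMAIL>" under the assumed values
        System.out.println(typeName(NUM));   // prints "<NUM>" under the assumed values
    }
}
```

With a real tokenizer instance, the string form would more likely be read from the token's type attribute than looked up by hand; the table is mainly useful when code needs to compare against a known type.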
public JcmsTokenizer()
Creates a new instance of the JcmsTokenizer. Attaches the input to the newly created JFlex scanner.
See http://issues.apache.org/jira/browse/LUCENE-1068
public JcmsTokenizer(org.apache.lucene.util.AttributeFactory factory)
Creates a new JcmsTokenizer with a given AttributeFactory.
factory - the attribute factory to use
public void setMaxTokenLength(int length)
Set the max allowed token length.
length - a length, must be greater than zero, default is 255
public int getMaxTokenLength()
Retrieve the max allowed token length.
See also: setMaxTokenLength(int)
public final boolean incrementToken() throws java.io.IOException
Specified by: incrementToken in class org.apache.lucene.analysis.TokenStream
Throws: java.io.IOException
public final void end() throws java.io.IOException
Overrides: end in class org.apache.lucene.analysis.TokenStream
Throws: java.io.IOException
public void close() throws java.io.IOException
Specified by: close in interface java.io.Closeable
Specified by: close in interface java.lang.AutoCloseable
Overrides: close in class org.apache.lucene.analysis.Tokenizer
Throws: java.io.IOException
public void reset() throws java.io.IOException
Overrides: reset in class org.apache.lucene.analysis.Tokenizer
Throws: java.io.IOException
Copyright © 2001-2018 Jalios SA. All Rights Reserved.