Package com.jalios.jcms.search.analysis
Class JcmsTokenizer
- java.lang.Object
  - org.apache.lucene.util.AttributeSource
    - org.apache.lucene.analysis.TokenStream
      - org.apache.lucene.analysis.Tokenizer
        - com.jalios.jcms.search.analysis.JcmsTokenizer
All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable
public final class JcmsTokenizer extends org.apache.lucene.analysis.Tokenizer
A grammar-based tokenizer constructed with JFlex, based on Lucene's default ClassicTokenizer. This should be a good tokenizer for most European-language documents:
- Splits words at punctuation characters, removing punctuation. However, a dot that's not followed by whitespace is considered part of a token.
- Splits words at hyphens, underscores and dashes, unless there's a number in the token.
- Recognizes email addresses and internet hostnames as one token.
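The three rules above can be approximated with plain regular expressions. The sketch below is only an illustration of the documented behavior: it does not use the JFlex grammar, the class name `TokenizerRulesSketch` and its `EMAIL` pattern are hypothetical, and the real tokenizer may split some inputs differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Rough stand-in for the documented rules; NOT the JFlex grammar used by JcmsTokenizer. */
final class TokenizerRulesSketch {

    // Assumed, deliberately loose email shape: local@label.label...
    private static final Pattern EMAIL =
            Pattern.compile("[\\p{L}\\p{N}._-]+@[\\p{L}\\p{N}-]+(\\.[\\p{L}\\p{N}-]+)+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            // Drop punctuation at both ends, so a dot followed by whitespace is removed.
            String core = chunk.replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "");
            if (core.isEmpty()) {
                continue;
            }
            // An email address is kept as a single token.
            if (EMAIL.matcher(core).matches()) {
                tokens.add(core);
                continue;
            }
            // Split at punctuation, keeping inner dots, hyphens, underscores and dashes for now.
            for (String piece : core.split("[^\\p{L}\\p{N}._\\-\\u2013\\u2014]+")) {
                if (piece.isEmpty()) {
                    continue;
                }
                if (piece.chars().anyMatch(Character::isDigit)) {
                    // A digit is present: the piece is not split at hyphens.
                    tokens.add(piece);
                } else {
                    // No digit: split at hyphens, underscores and dashes.
                    for (String word : piece.split("[-_\\u2013\\u2014]+")) {
                        if (!word.isEmpty()) {
                            tokens.add(word);
                        }
                    }
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(
                "Mail john.doe@example.com about well-known part XK-42, see www.example.com."));
    }
}
```

In this sketch, hostnames survive as one token only because inner dots are never split; the actual grammar recognizes them through dedicated rules.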
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
- static int ACRONYM
- static int ACRONYM_DEP
- static int ALPHANUM
- static int APOSTROPHE
- static int CJ
- static int COMPANY
- static int EMAIL
- static int HOST
- static int NUM
- static java.lang.String[] TOKEN_TYPES: String token types that correspond to token type int constants
-
Constructor Summary
Constructors (Constructor, Description):
- JcmsTokenizer(): Creates a new instance of the JcmsTokenizer.
- JcmsTokenizer(org.apache.lucene.util.AttributeFactory factory): Creates a new JcmsTokenizer with a given AttributeFactory.
-
Method Summary
Methods (Modifier and Type, Method, Description):
- void close()
- void end()
- int getMaxTokenLength(): Retrieve the max allowed token length.
- boolean incrementToken()
- void reset()
- void setMaxTokenLength(int length): Set the max allowed token length.
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
-
-
-
-
Field Detail
-
ALPHANUM
public static final int ALPHANUM
- See Also:
- Constant Field Values
-
APOSTROPHE
public static final int APOSTROPHE
- See Also:
- Constant Field Values
-
ACRONYM
public static final int ACRONYM
- See Also:
- Constant Field Values
-
COMPANY
public static final int COMPANY
- See Also:
- Constant Field Values
-
EMAIL
public static final int EMAIL
- See Also:
- Constant Field Values
-
HOST
public static final int HOST
- See Also:
- Constant Field Values
-
NUM
public static final int NUM
- See Also:
- Constant Field Values
-
CJ
public static final int CJ
- See Also:
- Constant Field Values
-
ACRONYM_DEP
public static final int ACRONYM_DEP
- See Also:
- Constant Field Values
-
TOKEN_TYPES
public static final java.lang.String[] TOKEN_TYPES
String token types that correspond to token type int constants
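As an illustration of how an int constant can index into TOKEN_TYPES, here is a minimal sketch. The numeric values and the angle-bracket labels are assumptions copied from Lucene's ClassicTokenizer, on which this class is based; this page does not list the actual values (see Constant Field Values), and `TokenTypesSketch` and `typeName` are hypothetical names.

```java
/** Hypothetical mirror of the constants; values and labels are ASSUMED from Lucene's ClassicTokenizer. */
final class TokenTypesSketch {
    static final int ALPHANUM = 0;
    static final int APOSTROPHE = 1;
    static final int ACRONYM = 2;
    static final int COMPANY = 3;
    static final int EMAIL = 4;
    static final int HOST = 5;
    static final int NUM = 6;
    static final int CJ = 7;
    static final int ACRONYM_DEP = 8;

    // One label per constant, stored at the index given by the constant's value.
    static final String[] TOKEN_TYPES = {
        "<ALPHANUM>", "<APOSTROPHE>", "<ACRONYM>", "<COMPANY>", "<EMAIL>",
        "<HOST>", "<NUM>", "<CJ>", "<ACRONYM_DEP>"
    };

    /** Looks up the String label for an int token type. */
    static String typeName(int tokenType) {
        return TOKEN_TYPES[tokenType];
    }

    public static void main(String[] args) {
        // Under the assumed values this prints <EMAIL>.
        System.out.println(typeName(EMAIL));
    }
}
```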
-
-
Constructor Detail
-
JcmsTokenizer
public JcmsTokenizer()
Creates a new instance of the JcmsTokenizer. Attaches the input to the newly created JFlex scanner. See http://issues.apache.org/jira/browse/LUCENE-1068
-
JcmsTokenizer
public JcmsTokenizer(org.apache.lucene.util.AttributeFactory factory)
Creates a new JcmsTokenizer with a given AttributeFactory.
- Parameters:
factory - the attribute factory to use
-
-
Method Detail
-
setMaxTokenLength
public void setMaxTokenLength(int length)
Set the max allowed token length. Any token longer than this is skipped.
- Parameters:
length - a length, must be greater than zero; default is 255
-
getMaxTokenLength
public int getMaxTokenLength()
Retrieve the max allowed token length.
- Returns:
- a length greater than 0, default is 255
- See Also:
setMaxTokenLength(int)
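The effect of the limit can be pictured with a small stand-alone sketch: tokens longer than the limit are skipped rather than truncated. `MaxTokenLengthSketch` and its `filter` method are hypothetical and do not use the real tokenizer; rejecting a non-positive length with an IllegalArgumentException is an assumption, since this page only states that the value must be greater than zero.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the max-token-length rule; not the JcmsTokenizer implementation. */
final class MaxTokenLengthSketch {
    private int maxTokenLength = 255; // documented default

    void setMaxTokenLength(int length) {
        if (length < 1) {
            // Assumption: invalid lengths are rejected with an exception.
            throw new IllegalArgumentException("maxTokenLength must be greater than zero");
        }
        this.maxTokenLength = length;
    }

    int getMaxTokenLength() {
        return maxTokenLength;
    }

    /** Keeps tokens within the limit; longer ones are skipped, not truncated. */
    List<String> filter(List<String> candidates) {
        List<String> kept = new ArrayList<>();
        for (String token : candidates) {
            if (token.length() <= maxTokenLength) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        MaxTokenLengthSketch sketch = new MaxTokenLengthSketch();
        sketch.setMaxTokenLength(5);
        // Prints [short, ok]: "muchlonger" exceeds the limit and is skipped.
        System.out.println(sketch.filter(Arrays.asList("short", "muchlonger", "ok")));
    }
}
```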
-
incrementToken
public final boolean incrementToken() throws java.io.IOException
- Specified by:
incrementToken in class org.apache.lucene.analysis.TokenStream
- Throws:
java.io.IOException
-
end
public final void end() throws java.io.IOException
- Overrides:
end in class org.apache.lucene.analysis.TokenStream
- Throws:
java.io.IOException
-
close
public void close() throws java.io.IOException
- Specified by:
close in interface java.lang.AutoCloseable
- Specified by:
close in interface java.io.Closeable
- Overrides:
close in class org.apache.lucene.analysis.Tokenizer
- Throws:
java.io.IOException
-
reset
public void reset() throws java.io.IOException
- Overrides:
reset in class org.apache.lucene.analysis.Tokenizer
- Throws:
java.io.IOException
-
-