Package com.jalios.jcms.search.analysis
Class JcmsTokenizer
- java.lang.Object
  - org.apache.lucene.util.AttributeSource
    - org.apache.lucene.analysis.TokenStream
      - org.apache.lucene.analysis.Tokenizer
        - com.jalios.jcms.search.analysis.JcmsTokenizer

All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable
 
public final class JcmsTokenizer
extends org.apache.lucene.analysis.Tokenizer

A grammar-based tokenizer constructed with JFlex, based on Lucene's ClassicTokenizer.

This should be a good tokenizer for most European-language documents:
- Splits words at punctuation characters, removing punctuation. However, a dot that's not followed by whitespace is considered part of a token.
- Splits words at hyphens, underscores and dashes, unless there's a number in the token.
- Recognizes email addresses and internet hostnames as one token.
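As a rough illustration of these three rules, the sketch below approximates them with a single java.util.regex pattern. It is a hypothetical stand-in (TokenRulesSketch and tokenize are invented names), not the JFlex grammar JcmsTokenizer actually uses: it only splits at ASCII hyphens and underscores, keeps any chunk containing "@", an inner dot or a digit as one token, and ignores apostrophes, acronyms and CJK text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical regex approximation of the splitting rules described for
 * JcmsTokenizer. NOT the real JFlex grammar; many cases are ignored.
 */
public class TokenRulesSketch {

    // Runs of letters, digits and the joiner characters that may stay inside a token.
    private static final Pattern CHUNK = Pattern.compile("[\\p{L}\\p{N}_.@-]+");

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = CHUNK.matcher(text);
        while (m.find()) {
            // A dot at either end of a chunk is followed by whitespace, other
            // punctuation or end of text, so it is punctuation: strip it.
            String chunk = trimJoiners(m.group());
            if (chunk.isEmpty()) {
                continue;
            }
            boolean keepWhole = chunk.indexOf('@') >= 0        // email address
                    || chunk.indexOf('.') >= 0                 // hostname / inner dot
                    || chunk.chars().anyMatch(Character::isDigit); // contains a number
            if (keepWhole) {
                tokens.add(chunk);
            } else {
                // Otherwise split at hyphens and underscores.
                for (String part : chunk.split("[-_]+")) {
                    if (!part.isEmpty()) {
                        tokens.add(part);
                    }
                }
            }
        }
        return tokens;
    }

    // Removes leading/trailing dots, hyphens, underscores and '@' characters.
    private static String trimJoiners(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && "._@-".indexOf(s.charAt(start)) >= 0) start++;
        while (end > start && "._@-".indexOf(s.charAt(end - 1)) >= 0) end--;
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(tokenize(
                "Mail jdoe@example.com about the X-15 on www.jalios.com, then re-index."));
    }
}
```

For the sample sentence, the e-mail address, the hostname and the product number X-15 each stay whole, while "re-index" is split into "re" and "index" and the trailing comma and period are dropped.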
 
Field Summary
- static int ACRONYM
- static int ACRONYM_DEP
- static int ALPHANUM
- static int APOSTROPHE
- static int CJ
- static int COMPANY
- static int EMAIL
- static int HOST
- static int NUM
- static java.lang.String[] TOKEN_TYPES: String token types that correspond to the token type int constants
Constructor Summary
- JcmsTokenizer(): Creates a new instance of the JcmsTokenizer.
- JcmsTokenizer(org.apache.lucene.util.AttributeFactory factory): Creates a new JcmsTokenizer with a given AttributeFactory.
Method Summary
- void close()
- void end()
- int getMaxTokenLength(): Retrieve the max allowed token length.
- boolean incrementToken()
- void reset()
- void setMaxTokenLength(int length): Set the max allowed token length.
Methods inherited from class org.apache.lucene.util.AttributeSource:
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
 
Field Detail
ALPHANUM
public static final int ALPHANUM
See Also:
- Constant Field Values

APOSTROPHE
public static final int APOSTROPHE
See Also:
- Constant Field Values

ACRONYM
public static final int ACRONYM
See Also:
- Constant Field Values

COMPANY
public static final int COMPANY
See Also:
- Constant Field Values

EMAIL
public static final int EMAIL
See Also:
- Constant Field Values

HOST
public static final int HOST
See Also:
- Constant Field Values

NUM
public static final int NUM
See Also:
- Constant Field Values

CJ
public static final int CJ
See Also:
- Constant Field Values

ACRONYM_DEP
public static final int ACRONYM_DEP
See Also:
- Constant Field Values

TOKEN_TYPES
public static final java.lang.String[] TOKEN_TYPES
String token types that correspond to the token type int constants.
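Each int constant acts as an index into TOKEN_TYPES, so the string form of a token type is TOKEN_TYPES[type]. Lucene's ClassicTokenizer uses this array to fill each token's TypeAttribute, and JcmsTokenizer presumably does the same. The numeric values below are an assumption copied from ClassicTokenizer (listed in the same order as the Field Detail section); check the Constant Field Values page for the real ones. TokenTypesSketch and typeName are hypothetical names.

```java
/**
 * Hypothetical mirror of the int-constant / TOKEN_TYPES pattern.
 * Values are ASSUMED from Lucene's ClassicTokenizer, not taken from JcmsTokenizer.
 */
public class TokenTypesSketch {
    public static final int ALPHANUM = 0;
    public static final int APOSTROPHE = 1;
    public static final int ACRONYM = 2;
    public static final int COMPANY = 3;
    public static final int EMAIL = 4;
    public static final int HOST = 5;
    public static final int NUM = 6;
    public static final int CJ = 7;
    public static final int ACRONYM_DEP = 8;

    // Each int constant above is the index of its string form in this array.
    public static final String[] TOKEN_TYPES = {
        "<ALPHANUM>", "<APOSTROPHE>", "<ACRONYM>", "<COMPANY>", "<EMAIL>",
        "<HOST>", "<NUM>", "<CJ>", "<ACRONYM_DEP>"
    };

    /** Returns the string type reported for an int token type. */
    public static String typeName(int tokenType) {
        if (tokenType < 0 || tokenType >= TOKEN_TYPES.length) {
            throw new IllegalArgumentException("Unknown token type: " + tokenType);
        }
        return TOKEN_TYPES[tokenType];
    }

    public static void main(String[] args) {
        System.out.println(typeName(EMAIL)); // prints <EMAIL>
    }
}
```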
 
Constructor Detail
JcmsTokenizer
public JcmsTokenizer()
Creates a new instance of the JcmsTokenizer. Attaches the input to the newly created JFlex scanner.
See http://issues.apache.org/jira/browse/LUCENE-1068

JcmsTokenizer
public JcmsTokenizer(org.apache.lucene.util.AttributeFactory factory)
Creates a new JcmsTokenizer with a given AttributeFactory.
Parameters:
- factory - the attribute factory to use
 
 
Method Detail
setMaxTokenLength
public void setMaxTokenLength(int length)
Set the max allowed token length. Any token longer than this is skipped.
Parameters:
- length - the maximum length; must be greater than zero (default: 255)
 

getMaxTokenLength
public int getMaxTokenLength()
Retrieve the max allowed token length.
Returns:
- a length greater than 0 (default: 255)
See Also:
- setMaxTokenLength(int)
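The documented rule is that an over-long token is skipped (dropped entirely), not truncated. The sketch below models just that rule together with the default of 255 and the "greater than zero" constraint. MaxTokenLengthSketch is a hypothetical class: whitespace splitting stands in for the real grammar, and throwing IllegalArgumentException for a non-positive length is an assumption, since the Javadoc only says the value must be greater than zero.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the max-token-length rule: tokens longer than the
 * limit are skipped, not truncated. Whitespace splitting replaces the real grammar.
 */
public class MaxTokenLengthSketch {
    public static final int DEFAULT_MAX_TOKEN_LENGTH = 255;

    private int maxTokenLength = DEFAULT_MAX_TOKEN_LENGTH;

    public void setMaxTokenLength(int length) {
        // ASSUMPTION: invalid values are rejected with an exception.
        if (length < 1) {
            throw new IllegalArgumentException("maxTokenLength must be greater than zero");
        }
        this.maxTokenLength = length;
    }

    public int getMaxTokenLength() {
        return maxTokenLength;
    }

    public List<String> tokenize(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            // Over-long tokens are dropped entirely rather than cut down.
            if (!token.isEmpty() && token.length() <= maxTokenLength) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        MaxTokenLengthSketch t = new MaxTokenLengthSketch();
        t.setMaxTokenLength(5);
        System.out.println(t.tokenize("short toolongtoken ok")); // prints [short, ok]
    }
}
```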
 

incrementToken
public final boolean incrementToken() throws java.io.IOException
Specified by:
- incrementToken in class org.apache.lucene.analysis.TokenStream
Throws:
- java.io.IOException
 

end
public final void end() throws java.io.IOException
Overrides:
- end in class org.apache.lucene.analysis.TokenStream
Throws:
- java.io.IOException
 

close
public void close() throws java.io.IOException
Specified by:
- close in interface java.lang.AutoCloseable
- close in interface java.io.Closeable
Overrides:
- close in class org.apache.lucene.analysis.Tokenizer
Throws:
- java.io.IOException
 

reset
public void reset() throws java.io.IOException
Overrides:
- reset in class org.apache.lucene.analysis.Tokenizer
Throws:
- java.io.IOException
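These methods follow Lucene's TokenStream consumer workflow. With the real class (Lucene 5 and later), a caller sets the input with setReader(java.io.Reader) inherited from Tokenizer, calls reset(), calls incrementToken() until it returns false, then calls end() and close(); the term text is normally read through a CharTermAttribute obtained with addAttribute. The sketch below models only that call order using the JDK. LifecycleSketch, term() and consume() are hypothetical names, the whitespace splitting is not JcmsTokenizer's grammar, and the exact exceptions thrown on misuse are assumptions.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical JDK-only model of the TokenStream call order:
 * setReader -> reset -> incrementToken (until false) -> end -> close.
 */
public class LifecycleSketch implements AutoCloseable {
    private Reader input;
    private String text;
    private int pos;
    private boolean resetCalled;
    private String term;

    public void setReader(Reader reader) {
        this.input = reader;
    }

    public void reset() throws IOException {
        if (input == null) {
            throw new IllegalStateException("setReader() must be called before reset()");
        }
        // Buffer the whole input, then start scanning from the beginning.
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = input.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        text = sb.toString();
        pos = 0;
        resetCalled = true;
    }

    public boolean incrementToken() {
        if (!resetCalled) {
            throw new IllegalStateException("reset() must be called before incrementToken()");
        }
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
        if (pos >= text.length()) return false; // no more tokens
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) pos++;
        term = text.substring(start, pos);
        return true;
    }

    public String term() {
        return term;
    }

    public void end() {
        // In Lucene, end() records end-of-stream state such as the final offset.
        term = null;
    }

    @Override
    public void close() throws IOException {
        if (input != null) input.close();
        input = null;
        resetCalled = false;
    }

    /** Runs the full workflow and collects every term. */
    public static List<String> consume(String text) throws IOException {
        List<String> out = new ArrayList<>();
        try (LifecycleSketch ts = new LifecycleSketch()) {
            ts.setReader(new StringReader(text));
            ts.reset();
            while (ts.incrementToken()) {
                out.add(ts.term());
            }
            ts.end();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(consume("hello jcms world")); // prints [hello, jcms, world]
    }
}
```

Calling incrementToken() before reset() fails in this model, which mirrors the ordering Lucene expects from every consumer.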
 
 