com.jalios.util.lucene
Class ISOLatin1AccentFilter
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
com.jalios.util.lucene.ISOLatin1AccentFilter
public class ISOLatin1AccentFilter
- extends org.apache.lucene.analysis.TokenFilter
A filter that replaces accented characters in the ISO Latin 1 character set
(ISO-8859-1) with their unaccented equivalents. Case is not altered.
For instance, 'à' is replaced by 'a'.
When indexing, the filter acts like a synonym filter and returns two tokens:
the original accented token and the new unaccented one.
Otherwise, it returns only the unaccented token.
- Version:
- $Revision: 27751 $
- Author:
- Olivier Jaquemet
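The indexing and search behaviour described above can be illustrated without Lucene. The following is a minimal sketch, not the Jalios implementation: the class AccentTokenSketch and its methods are hypothetical, tokens are plain strings rather than Lucene Token objects, and accent stripping relies on java.text.Normalizer as a stand-in for removeAccents. It also assumes that a token containing no accents is emitted only once during indexing.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AccentTokenSketch {

    // Hypothetical stand-in for removeAccents: decompose each character (NFD)
    // and drop the combining diacritical marks. Case is preserved.
    static String stripAccents(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // Mimics the documented behaviour on a list of plain-string tokens.
    static List<String> filter(List<String> tokens, boolean isIndexing) {
        List<String> out = new ArrayList<String>();
        for (String token : tokens) {
            String unaccented = stripAccents(token);
            // Indexing: keep the original accented token as well (synonym-like).
            // Assumption: tokens without accents are not duplicated.
            if (isIndexing && !unaccented.equals(token)) {
                out.add(token);
            }
            // Indexing and search: always emit the unaccented form.
            out.add(unaccented);
        }
        return out;
    }

    public static void main(String[] args) {
        // "\u00e9" is 'é' and "\u00e0" is 'à' (escaped to avoid source-encoding issues).
        List<String> tokens = Arrays.asList("d\u00e9j\u00e0", "vu");
        System.out.println(filter(tokens, true));   // three tokens: accented, unaccented, "vu"
        System.out.println(filter(tokens, false));  // two tokens: unaccented, "vu"
    }
}
```

Keeping both forms in the index while reducing queries to the unaccented form means a search typed without accents can still match accented text.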
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
Constructor Summary
ISOLatin1AccentFilter(org.apache.lucene.analysis.TokenStream input, boolean isIndexing)
Method Summary
org.apache.lucene.analysis.Token next()
static String removeAccents(String input)
Replaces accented characters in a String with their unaccented equivalents.
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end, reset
Methods inherited from class org.apache.lucene.analysis.TokenStream
getOnlyUseNewAPI, incrementToken, next, setOnlyUseNewAPI
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString
REVISION
public static final String REVISION
- See Also:
- Constant Field Values
TOKEN_TYPE_UNACCENTED
public static final String TOKEN_TYPE_UNACCENTED
- See Also:
- Constant Field Values
isIndexing
protected final boolean isIndexing
unaccentedToken
protected org.apache.lucene.analysis.Token unaccentedToken
ISOLatin1AccentFilter
public ISOLatin1AccentFilter(org.apache.lucene.analysis.TokenStream input,
boolean isIndexing)
- Parameters:
input - the TokenStream to filter
isIndexing - true if this filter is used during indexing, false if it is used during search
next
public final org.apache.lucene.analysis.Token next()
throws IOException
- Overrides:
next
in class org.apache.lucene.analysis.TokenStream
- Throws:
IOException
removeAccents
public static final String removeAccents(String input)
- Replaces accented characters in a String with their unaccented equivalents.
- Parameters:
input - the string to process
- Returns:
- a new string with the accented characters replaced
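As an illustration of the kind of transformation removeAccents performs, here is a minimal sketch built on java.text.Normalizer. It is an assumption for demonstration purposes, not the method's actual implementation: the class RemoveAccentsSketch and the method removeAccentsSketch are hypothetical names. Characters with no canonical decomposition (for example 'æ', 'ø' or 'ß') pass through this sketch unchanged, and the real method may treat them differently.

```java
import java.text.Normalizer;

public class RemoveAccentsSketch {

    // Hypothetical approximation of removeAccents(String):
    // 1. NFD normalization splits 'à' into 'a' followed by a combining grave accent.
    // 2. The regex removes the combining marks, leaving the base letters.
    static String removeAccentsSketch(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // "\u00c0" is 'À' and "\u00e9" is 'é' (escaped to avoid source-encoding issues).
        System.out.println(removeAccentsSketch("\u00c0 la carte")); // prints "A la carte"
        System.out.println(removeAccentsSketch("caf\u00e9"));       // prints "cafe"
    }
}
```

As with the documented method, the sketch returns a new string and leaves the case of each letter untouched.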
Copyright © 2001-2010 Jalios SA. All Rights Reserved.