com.jalios.util.lucene
Class ISOLatin1AccentFilter

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.TokenFilter
              extended by com.jalios.util.lucene.ISOLatin1AccentFilter

public class ISOLatin1AccentFilter
extends org.apache.lucene.analysis.TokenFilter

A filter that replaces accented characters in the ISO Latin 1 character set (ISO-8859-1) with their unaccented equivalents. Case is preserved.

For instance, 'à' will be replaced by 'a'.

When indexing, the filter acts like a synonym filter and returns two tokens: the original accented token and the new unaccented one.
Otherwise, it returns only the unaccented token.
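
The difference between the two modes can be illustrated with a plain-Java simulation. This is only a sketch of the behaviour described above, not the filter's actual code: it does not use the Lucene API, the names AccentFilterSketch, filter and stripAccents are hypothetical, and it assumes that a token containing no accents is emitted once in either mode.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of the documented behaviour (no Lucene classes involved).
final class AccentFilterSketch {

    // Hypothetical stand-in for removeAccents: decompose each character (NFD),
    // then drop the combining marks that carry the accents.
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Returns the tokens the filter would emit for the given input tokens.
    static List<String> filter(List<String> tokens, boolean isIndexing) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            String unaccented = stripAccents(token);
            boolean changed = !unaccented.equals(token);
            if (isIndexing && changed) {
                // Indexing mode: keep the original accented token as well.
                out.add(token);
            }
            // Both modes: emit the unaccented token.
            out.add(unaccented);
        }
        return out;
    }
}
```

Under these assumptions, indexing the tokens "café" and "bar" yields "café", "cafe", "bar", whereas the search mode yields only "cafe", "bar". Indexing both forms lets an accented or an unaccented query term match the same document.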

Version:
$Revision: 27751 $
Author:
Olivier Jaquemet

Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
 
Field Summary
protected  boolean isIndexing
           
static String REVISION
           
static String TOKEN_TYPE_UNACCENTED
           
protected  org.apache.lucene.analysis.Token unaccentedToken
           
 
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
 
Constructor Summary
ISOLatin1AccentFilter(org.apache.lucene.analysis.TokenStream input, boolean isIndexing)
           
 
Method Summary
 org.apache.lucene.analysis.Token next()
           
static String removeAccents(String input)
          Replaces accented characters in a String with their unaccented equivalents.
 
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end, reset
 
Methods inherited from class org.apache.lucene.analysis.TokenStream
getOnlyUseNewAPI, incrementToken, next, setOnlyUseNewAPI
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

REVISION

public static final String REVISION
See Also:
Constant Field Values

TOKEN_TYPE_UNACCENTED

public static final String TOKEN_TYPE_UNACCENTED
See Also:
Constant Field Values

isIndexing

protected final boolean isIndexing

unaccentedToken

protected org.apache.lucene.analysis.Token unaccentedToken
Constructor Detail

ISOLatin1AccentFilter

public ISOLatin1AccentFilter(org.apache.lucene.analysis.TokenStream input,
                             boolean isIndexing)
Parameters:
input - the TokenStream to filter
isIndexing - whether this filter is used during indexing or search
Method Detail

next

public final org.apache.lucene.analysis.Token next()
                                            throws IOException
Overrides:
next in class org.apache.lucene.analysis.TokenStream
Throws:
IOException

removeAccents

public static final String removeAccents(String input)
Replaces accented characters in a String with their unaccented equivalents.

Parameters:
input - the string to process
Returns:
a new string with the accented characters replaced
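
The kind of transformation this method performs can be approximated with the JDK alone. The sketch below is an assumption-laden illustration, not the Jalios implementation: the class RemoveAccentsSketch and its method are hypothetical, and the approach (Unicode decomposition followed by removal of combining marks) leaves ISO Latin 1 characters that have no decomposition, such as 'ø' or 'æ', unchanged, which the real method may handle differently.

```java
import java.text.Normalizer;

// Hypothetical approximation of removeAccents built on java.text.Normalizer.
final class RemoveAccentsSketch {

    static String removeAccentsSketch(String input) {
        // NFD splits an accented letter into a base letter plus combining marks,
        // e.g. U+00E0 (a with grave) becomes 'a' followed by U+0300.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Removing the combining marks keeps the base letters, so case is unchanged.
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its encoding.
        System.out.println(removeAccentsSketch("\u00c0 la carte, d\u00e9j\u00e0 vu"));
    }
}
```

Because only the combining marks are removed, an upper-case 'À' maps to 'A' and a lower-case 'à' maps to 'a', in line with the case-preserving behaviour described for this class.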


Copyright © 2001-2010 Jalios SA. All Rights Reserved.