com.jalios.jcms.search
Class LuceneDataSearchEngine

java.lang.Object
  extended by com.jalios.jcms.search.LuceneDataSearchEngine
All Implemented Interfaces:
JcmsConstants, JaliosConstants
Direct Known Subclasses:
LuceneCategorySearchEngine, LuceneFileSearchEngine, LucenePublicationSearchEngine

public abstract class LuceneDataSearchEngine
extends Object
implements JcmsConstants

This class provides a base class to index JCMS Data using Lucene.

Since:
jcms-5.5.0
Version:
$Revision: 50111 $

Field Summary
protected  AlarmManager alarmMgr
           
protected  Channel channel
           
protected  String directoryName
           
static String ID_FIELD
           
protected  Object indexAccessLock
           
static String INDEXING_DATE_EXTRAINFO
           
static String INDEXING_DATE_FIELD
           
protected  List<String> langList
          Languages in which the engine indexes and searches its content.
protected  Map<String,org.apache.lucene.store.FSDirectory> langToIndexDirMap
          Lucene FSDirectory in which the engine indexes and searches its content.
static String MAX_BUFFERED_DOCS
          Name of the optional integer property that defines maxBufferedDocs when writing to the Lucene index.
static String MAX_FIELD_LENGTH
          Name of the optional integer property that defines maxFieldLength when writing to the Lucene index.
static String MAX_MERGE_DOCS
          Name of the optional integer property that defines maxMergeDocs when writing to the Lucene index.
static String MERGE_FACTOR
          Name of the optional integer property that defines mergeFactor when writing to the Lucene index.
protected  boolean multilingual
           
static String REVISION
           
 
Fields inherited from interface com.jalios.jcms.JcmsConstants
ADATE_SEARCH, ADMIN_NOTES_PROP, ADVANCED_TAB, ARCHIVES_DIR, ASCII_WIDTH, CATEGORY_TAB, CDATE_SEARCH, COMMON_ALARM, CONTENT_TAB, COOKIE_MAX_AGE, CTRL_TOPIC_INTERNAL, CTRL_TOPIC_REF, CTRL_TOPIC_VALUE, CTRL_TOPIC_WRITE, CUSTOM_PROP, DOCCHOOSER_HEIGHT, DOCCHOOSER_WIDTH, DOCS_DIR, EDATE_SEARCH, EMAIL_REGEXP, ERROR_MSG, FORBIDDEN_FILE_ACCESS, FORBIDDEN_REDIRECT, FORCE_REDIRECT, ICON_ARCHIVE, ICON_LOCK, ICON_LOCK_STRONG, ICON_WARN, ICON_WH_BOOK_CLOSED, ICON_WH_BOOK_OPEN, INFORMATION_MSG, JALIOS_JUNIT_PROP, JCMS_CADDY, JCMS_MSG_LIST, JSYNC_DOWNLOAD_DIR, JSYNC_SYNC_ALARM, LOG_FILE, LOG_TOPIC_SECURITY, LOGGER_PROP, LOGGER_XMLPROP, MBR_PHOTO_DIR, MDATE_SEARCH, MONITOR_XML, OP_CREATE, OP_DEEP_COPY, OP_DEEP_DELETE, OP_DELETE, OP_MERGE, OP_UPDATE, PDATE_SEARCH, PHOTO_DIR, PHOTO_ICON, PHOTO_ICON_HEIGHT, PHOTO_ICON_WIDTH, PHOTO_LARGE, PHOTO_LARGE_HEIGHT, PHOTO_LARGE_WIDTH, PHOTO_NORMAL, PHOTO_NORMAL_HEIGHT, PHOTO_NORMAL_WIDTH, PHOTO_SMALL, PHOTO_SMALL_HEIGHT, PHOTO_SMALL_WIDTH, PHOTO_TINY, PHOTO_TINY_HEIGHT, PHOTO_TINY_WIDTH, PREVIOUS_TAB, PRINT_VIEW, PRIVATE_FILE_ACCESS, PUBLIC_FILE_ACCESS, READ_RIGHT_TAB, SDATE_SEARCH, SEARCHENGINE_ALARM, SESSION_AUTHORIZED_FILENAMES_SET, STATS_REPORT_DIR, STATUS_PROP, STORE_XML, TEMPLATE_TAB, THUMBNAIL_LARGE_HEIGHT, THUMBNAIL_LARGE_WIDTH, THUMBNAIL_SMALL_HEIGHT, THUMBNAIL_SMALL_WIDTH, UDATE_SEARCH, UPDATE_RIGHT_TAB, UPLOAD_DIR, URL_REGEXP, WARNING_MSG, WEBAPP_PROP, WFEXPRESS_ALARM, WFREMINDER_ALARM, WORKFLOW_TAB, WORKFLOW_XML
 
Fields inherited from interface com.jalios.util.JaliosConstants
CRLF, MILLIS_IN_ONE_DAY, MILLIS_IN_ONE_HOUR, MILLIS_IN_ONE_MINUTE, MILLIS_IN_ONE_MONTH, MILLIS_IN_ONE_SECOND, MILLIS_IN_ONE_WEEK, MILLIS_IN_ONE_YEAR
 
Constructor Summary
protected LuceneDataSearchEngine(String directoryName, boolean multilingual)
          Construct a new Lucene Data Search Engine given a directory name.
 
Method Summary
protected  void addData(Data data)
          Add given Data to this lucene search engine.
protected  void addDataCollection(Collection<? extends Data> coll)
          Add given Collection of Data to this lucene search engine.
 void clearIndices()
          Deletes all Documents from all indices (overwrites each existing index with a new one).
protected  void clearSearcher()
          Close current searchers and clear it for future renewal.
protected  void deleteData(Data data)
          Delete given Data from this lucene search engine.
protected  void deleteDataCollection(Collection<? extends Data> coll)
          Delete given Collection of Data from this lucene search engine.
protected abstract  com.jalios.jcms.search.DataIterator<Data> getAllDataIterator()
          This method must be implemented by the LuceneSearchEngine.
 org.apache.lucene.store.FSDirectory getDirectory(String lang)
          Returns the lucene directory used for the specified language.
 Date getIndexingDate(Data data)
          Retrieve the Date at which the specified Data was indexed in the main language of the site.
 Date getIndexingDate(Data data, String lang)
          Returns the date at which the specified Data has been indexed for the specified language.
 Date getLastOptimizeDateSinceRestart()
           
 long getLastOptimizeDuration()
           
 Date getLastReindexDateSinceRestart()
           
 long getLastReindexDuration()
           
protected abstract  org.apache.log4j.Logger getLogger()
          This method must be implemented by the LuceneSearchEngine.
 org.apache.lucene.document.Document getLuceneDocument(Data data, String lang)
          Returns the Lucene Document corresponding to the specified Data in the index of the specified language.
 long getOperationStartTime()
           
protected  org.apache.lucene.index.Term getPrimaryTerm(Data data)
          Retrieves a Lucene Term suitable for use as a primary key when searching, removing, or updating the unique Lucene document for the specified data.
 int getProgressState()
           
protected  org.apache.lucene.search.Searcher getSearcher()
          Retrieve a valid lucene Searcher instance
protected  void index(org.apache.lucene.store.Directory directory, Collection<? extends Data> coll, String lang)
          Index a Collection of Data into lucene.
protected  void index(org.apache.lucene.store.Directory directory, Iterator<? extends Data> iterator, String lang)
          Index all Data returned by the specified Iterator into lucene.
protected abstract  void indexData(org.apache.lucene.index.IndexWriter writer, Data data, String lang)
          This method must be implemented by the LuceneSearchEngine.
 boolean isOperationRunning()
           
 void optimizeIndices()
          Optimizes all indices of the LuceneSearchEngine.
 void reindexAll()
          Clears the Lucene indices of this search engine, then reindexes all content retrieved using the protected method getAllDataIterator().
 void remove(org.apache.lucene.store.Directory directory, Collection<? extends Data> coll)
          Remove a Collection of Data from the lucene index.
static void setIndexWriterOptions(org.apache.lucene.index.IndexWriter writer)
          Set IndexWriter options retrieved from hooks and channel properties.
protected  void updateData(Data data)
          Update given Data in this lucene search engine.
protected  void updateDataCollection(Collection<? extends Data> coll)
          Update given Collection of Data in this lucene search engine.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

REVISION

public static final String REVISION
See Also:
Constant Field Values

ID_FIELD

public static final String ID_FIELD
See Also:
Constant Field Values

INDEXING_DATE_FIELD

public static final String INDEXING_DATE_FIELD
See Also:
Constant Field Values

INDEXING_DATE_EXTRAINFO

public static final String INDEXING_DATE_EXTRAINFO
See Also:
Constant Field Values

channel

protected final Channel channel

alarmMgr

protected final AlarmManager alarmMgr

directoryName

protected final String directoryName

multilingual

protected final boolean multilingual

langList

protected final List<String> langList
Languages in which the engine indexes and searches its content.


langToIndexDirMap

protected final Map<String,org.apache.lucene.store.FSDirectory> langToIndexDirMap
Lucene FSDirectory in which the engine indexes and searches its content.


indexAccessLock

protected final Object indexAccessLock

MERGE_FACTOR

public static final String MERGE_FACTOR
Name of the optional integer property that defines mergeFactor when writing to the Lucene index.


MAX_MERGE_DOCS

public static final String MAX_MERGE_DOCS
Name of the optional integer property that defines maxMergeDocs when writing to the Lucene index.


MAX_BUFFERED_DOCS

public static final String MAX_BUFFERED_DOCS
Name of the optional integer property that defines maxBufferedDocs when writing to the Lucene index.


MAX_FIELD_LENGTH

public static final String MAX_FIELD_LENGTH
Name of the optional integer property that defines maxFieldLength when writing to the Lucene index.
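
The four constants above name optional integer properties. As a rough illustration of what "optional" can mean for a caller, the sketch below reads an integer from a java.util.Properties object and reports absence instead of failing. The class OptionalIntProperty and the property keys in the usage note are hypothetical; the actual string values of these constants and the way JCMS reads them are not stated on this page.

```java
import java.util.OptionalInt;
import java.util.Properties;

/** Hypothetical helper (not part of JCMS): reads an optional integer property. */
final class OptionalIntProperty {
  private OptionalIntProperty() {}

  /**
   * Returns the integer stored under key, or an empty OptionalInt when the
   * property is absent, blank, or not a valid integer.
   */
  static OptionalInt read(Properties props, String key) {
    String raw = props.getProperty(key);
    if (raw == null || raw.trim().isEmpty()) {
      return OptionalInt.empty();
    }
    try {
      return OptionalInt.of(Integer.parseInt(raw.trim()));
    } catch (NumberFormatException e) {
      // An unparsable value is treated like a missing one in this sketch.
      return OptionalInt.empty();
    }
  }
}
```

A caller would then apply a setting only when a value is present, for example OptionalIntProperty.read(props, "example.merge-factor").ifPresent(v -> ...), where "example.merge-factor" is a placeholder key.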

Constructor Detail

LuceneDataSearchEngine

protected LuceneDataSearchEngine(String directoryName,
                                 boolean multilingual)
                          throws Exception
Construct a new Lucene Data Search Engine given a directory name.

Parameters:
directoryName - the name of the directory to create
multilingual - true to use one index per language, false to use only one index
Throws:
Exception - on any error
Method Detail

setIndexWriterOptions

public static void setIndexWriterOptions(org.apache.lucene.index.IndexWriter writer)
Set IndexWriter options retrieved from hooks and channel properties.

Parameters:
writer - the IndexWriter whose options will be modified
Since:
jcms-6.0.3, jcms-6.1.2

index

protected void index(org.apache.lucene.store.Directory directory,
                     Collection<? extends Data> coll,
                     String lang)
              throws IOException
Index a Collection of Data into lucene.
This method is NOT synchronized; the caller is responsible for synchronization.

Throws:
IOException

index

protected void index(org.apache.lucene.store.Directory directory,
                     Iterator<? extends Data> iterator,
                     String lang)
              throws IOException
Index all Data returned by the specified Iterator into lucene.
This method is NOT synchronized; the caller is responsible for synchronization.

Throws:
IOException

remove

public void remove(org.apache.lucene.store.Directory directory,
                   Collection<? extends Data> coll)
            throws IOException
Remove a Collection of Data from the lucene index.
This method is NOT synchronized; the caller is responsible for synchronization.

Parameters:
directory - the Lucene directory from which to remove the Data
coll - a collection of Data, must not be null
Throws:
IOException - if the directory could not be opened or deletion could not be performed
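
Since index(...) and remove(...) leave synchronization to the caller, one possible pattern is to route every write through a single shared lock. The sketch below shows this with a plain list standing in for the index. GuardedIndexSketch and its members are hypothetical, and using a field such as indexAccessLock for this purpose is an assumption, since this page does not say which lock the engine expects.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an engine whose write methods are not thread-safe. */
final class GuardedIndexSketch {
  // Plays the role of a shared lock such as indexAccessLock (assumed usage).
  private final Object indexLock = new Object();
  // Stand-in for the index; ArrayList is deliberately not thread-safe.
  private final List<String> indexedIds = new ArrayList<>();

  /** Unsynchronized write, mirroring the contract of index(...). */
  private void indexUnsafe(List<String> ids) {
    indexedIds.addAll(ids);
  }

  /** Caller-side synchronization: every write goes through the same lock. */
  void indexSafely(List<String> ids) {
    synchronized (indexLock) {
      indexUnsafe(ids);
    }
  }

  /** Reads take the same lock so they see a consistent state. */
  int size() {
    synchronized (indexLock) {
      return indexedIds.size();
    }
  }
}
```

The point of the design is that readers and writers agree on one lock object; synchronizing on different objects would give no protection.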

getPrimaryTerm

protected org.apache.lucene.index.Term getPrimaryTerm(Data data)
Retrieves a Lucene Term suitable for use as a primary key when searching, removing, or updating the unique Lucene document for the specified data.

Returns:
a Term instance, must not return null

getDirectory

public org.apache.lucene.store.FSDirectory getDirectory(String lang)
Returns the lucene directory used for the specified language.
Warning: do not modify the index (LuceneDataSearchEngine relies on tracking its own modifications for optimization purposes); use this method only for read-only access to the directory.

Parameters:
lang - the JCMS language (ISO-639 code) for which to retrieve the Directory
Returns:
the FSDirectory of the specified language or null if no Directory is available for this language.

getLuceneDocument

public org.apache.lucene.document.Document getLuceneDocument(Data data,
                                                             String lang)
Returns the Lucene Document corresponding to the specified Data in the index of the specified language.

Parameters:
data - the Data being looked for
lang - the language in which to check
Returns:
a Lucene Document, or null if it could not be found
Since:
jcms-6.0.1

getIndexingDate

public Date getIndexingDate(Data data,
                            String lang)
Returns the date at which the specified Data has been indexed for the specified language.

Parameters:
data - the Data being looked for
lang - the language in which to check
Returns:
a Date or null if it could not be found
Since:
jcms-6.0.1

getIndexingDate

public Date getIndexingDate(Data data)
Retrieve the Date at which the specified Data was indexed in the main language of the site.

Parameters:
data - the Data for which to retrieve the indexing date.
Returns:
the indexing date of the Data, or null if it was not indexed.
Since:
jcms-6.0.1

addData

protected void addData(Data data)
Add given Data to this lucene search engine. This method is asynchronous: the given data may not be (and most likely will not be) available immediately after the call.
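
Because the addition is asynchronous, a caller that needs to know when a Data has actually been indexed cannot check right after the call. One option, sketched below, is to poll a lookup that returns null until the entry exists, such as getIndexingDate(Data), and to give up after a timeout. IndexingWait and awaitNonNull are hypothetical names, and this polling approach is an assumption rather than a mechanism provided by JCMS.

```java
import java.util.function.Supplier;

/** Hypothetical helper (not part of JCMS): waits for a lookup to become non-null. */
final class IndexingWait {
  private IndexingWait() {}

  /**
   * Polls lookup until it returns a non-null value or timeoutMillis elapses.
   * Returns the value, or null on timeout or if the thread is interrupted.
   */
  static <T> T awaitNonNull(Supplier<T> lookup, long timeoutMillis, long pollMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (true) {
      T value = lookup.get();
      if (value != null) {
        return value;
      }
      if (System.nanoTime() - deadline >= 0) {
        return null; // timed out
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return null;
      }
    }
  }
}
```

With an engine instance and a Data at hand (both hypothetical here), the call could look like IndexingWait.awaitNonNull(() -> engine.getIndexingDate(data), 5000L, 100L).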


updateData

protected void updateData(Data data)
Update given Data in this lucene search engine. This method is asynchronous: the updated data may not be (and most likely will not be) available immediately after the call.


deleteData

protected void deleteData(Data data)
Delete given Data from this lucene search engine. This method is asynchronous: the deletion may not be (and most likely will not be) effective immediately after the call.


addDataCollection

protected void addDataCollection(Collection<? extends Data> coll)
Add given Collection of Data to this lucene search engine. This method is asynchronous: the given data may not be (and most likely will not be) available immediately after the call.


updateDataCollection

protected void updateDataCollection(Collection<? extends Data> coll)
Update given Collection of Data in this lucene search engine. This method is asynchronous: the updated data may not be (and most likely will not be) available immediately after the call.


deleteDataCollection

protected void deleteDataCollection(Collection<? extends Data> coll)
Delete given Collection of Data from this lucene search engine. This method is asynchronous: the deletion may not be (and most likely will not be) effective immediately after the call.


clearIndices

public void clearIndices()
Deletes all Documents from all indices (overwrites each existing index with a new one). Warning: this operation cannot be undone! It is synchronized with the indexing thread: it does not return until the indexing process is done, and it blocks the indexing thread while it runs.


optimizeIndices

public void optimizeIndices()
Optimizes all indices of the LuceneSearchEngine. Warning: this is a potentially long and heavy process on a large index; do not call it unless you are sure of what you are doing. It is synchronized with the indexing thread: it does not return until the indexing process is done, and it blocks the indexing thread while it runs.


getLastOptimizeDateSinceRestart

public Date getLastOptimizeDateSinceRestart()
Returns:
a date indicating the last time the optimize was done, or null if no optimization was done.

getLastOptimizeDuration

public long getLastOptimizeDuration()
Returns:
the duration, in milliseconds, of the last optimize operation since restart (or 0 if none occurred).

reindexAll

public void reindexAll()
                throws IOException
Clears the Lucene indices of this search engine, then reindexes all content retrieved using the protected method getAllDataIterator(). It is synchronized with the indexing thread: it does not return until the indexing process is done, and it blocks the indexing thread while it runs. Operation progress can be monitored using isOperationRunning() and getProgressState().

Throws:
IOException - if an error occurs during indexing
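
Since reindexAll() does not return until it is done, progress has to be observed from another thread. The sketch below polls the two status methods named above through a small interface. OperationStatus and ProgressPoller are hypothetical names introduced for illustration, and the polling interval is an arbitrary choice.

```java
import java.util.function.IntConsumer;

/** Hypothetical view of the two status methods; an engine could be adapted to it. */
interface OperationStatus {
  boolean isOperationRunning();
  int getProgressState();
}

/** Hypothetical helper (not part of JCMS): reports progress until the operation ends. */
final class ProgressPoller {
  private ProgressPoller() {}

  /**
   * Polls status while an operation is running and passes each observed
   * percentage to report. Returns the number of polls made while running.
   * Stops early if the polling thread is interrupted.
   */
  static int watch(OperationStatus status, long pollMillis, IntConsumer report) {
    int polls = 0;
    while (status.isOperationRunning()) {
      report.accept(status.getProgressState());
      polls++;
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        break;
      }
    }
    return polls;
  }
}
```

A monitoring thread could call ProgressPoller.watch(...) with a logging callback while a different thread runs reindexAll().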

getLastReindexDateSinceRestart

public Date getLastReindexDateSinceRestart()
Returns:
a date indicating the last time the reindex was done, or null if no reindex was done.

getLastReindexDuration

public long getLastReindexDuration()
Returns:
the duration, in milliseconds, of the last reindex operation since restart (or 0 if none occurred).

getSearcher

protected org.apache.lucene.search.Searcher getSearcher()
                                                 throws IOException
Retrieve a valid lucene Searcher instance

Returns:
the currently available searcher, or a new one if an index change has occurred.
Throws:
IOException

clearSearcher

protected void clearSearcher()
Close current searchers and clear it for future renewal. Called after index change.


isOperationRunning

public boolean isOperationRunning()
Returns:
true if an operation whose progress is being tracked (reindexing, optimizing) is currently running
See Also:
getProgressState()

getProgressState

public int getProgressState()
Returns:
a percentage showing current state of operation, or 100 if no operation is running
See Also:
isOperationRunning()

getOperationStartTime

public long getOperationStartTime()
Returns:
the time at which the current operation was started, or 0 if no operation is running
See Also:
isOperationRunning()

getLogger

protected abstract org.apache.log4j.Logger getLogger()
This method must be implemented by the LuceneSearchEngine. It must return the logger to be used for log messages.

Returns:
Logger of this engine.

getAllDataIterator

protected abstract com.jalios.jcms.search.DataIterator<Data> getAllDataIterator()
This method must be implemented by the LuceneSearchEngine. It must return a DataIterator used to iterate over all Data to index. Used by reindexAll().


indexData

protected abstract void indexData(org.apache.lucene.index.IndexWriter writer,
                                  Data data,
                                  String lang)
                           throws IOException
This method must be implemented by the LuceneSearchEngine. It must index the given data, in the given language, using the given index writer.

Throws:
IOException


Copyright © 2001-2010 Jalios SA. All Rights Reserved.