com.jalios.jcms.search
Class LuceneFileSearchEngine

java.lang.Object
  extended by com.jalios.jcms.search.LuceneDataSearchEngine
      extended by com.jalios.jcms.search.LuceneFileSearchEngine
All Implemented Interfaces:
JcmsConstants, FileSearchEngine, JaliosConstants

public class LuceneFileSearchEngine
extends LuceneDataSearchEngine
implements FileSearchEngine, JcmsConstants

This class is an implementation of FileSearchEngine based on the Lucene search engine.

Since:
jcms-4.1
Version:
$Revision: 49380 $

Field Summary
static String AUTHORID_FIELD
          Field name for the id of the FileDocument's author, e.g. "j_2" or "21243_DBMember"
static String CLASSNAME_FIELD
          Field name for the className in jcms, e.g. "com.jalios.jcms.DBFileDocument"
static String CONTENTS_FIELD
          Field name for the content of the file
static String FILE_INDEX_DIRECTORY
           
static String JALIOS_DATE_FIELD
          Field name for the Indexing Date (time in ms)
static String JCMS_ID_FIELD
          Field name for the id in jcms, e.g. "c_345" or "21243_DBFileDocument"
static String JCMS_PATH_FIELD
          Field name for the relative path in jcms, e.g. "upload/docs/file.txt", or fileDoc.getFilename()
static String MODIFIED_FIELD
          Field name for the last modified date of the file (time in ms) when it was indexed
static String PATH_FIELD
          Field name for the file path (file.getPath()) when it was indexed
static String PSTATUS_FIELD
          Field name for the pstatus of the FileDocument, e.g. "-10", "0", "100"
static String REVISION
           
static String WORKSPACEID_FIELD
          Field name for the id of the FileDocument's workspace, e.g. "j_4"
 
Fields inherited from class com.jalios.jcms.search.LuceneDataSearchEngine
alarmMgr, directoryName, ID_FIELD, indexAccessLock, INDEXING_DATE_EXTRAINFO, INDEXING_DATE_FIELD, langList, langToIndexDirMap, MAX_BUFFERED_DOCS, MAX_FIELD_LENGTH, MAX_MERGE_DOCS, MERGE_FACTOR, multilingual
 
Fields inherited from interface com.jalios.jcms.JcmsConstants
ADATE_SEARCH, ADMIN_NOTES_PROP, ADVANCED_TAB, ARCHIVES_DIR, ASCII_WIDTH, CATEGORY_TAB, CDATE_SEARCH, COMMON_ALARM, CONTENT_TAB, COOKIE_MAX_AGE, CTRL_TOPIC_INTERNAL, CTRL_TOPIC_REF, CTRL_TOPIC_VALUE, CTRL_TOPIC_WRITE, CUSTOM_PROP, DOCCHOOSER_HEIGHT, DOCCHOOSER_WIDTH, DOCS_DIR, EDATE_SEARCH, EMAIL_REGEXP, ERROR_MSG, FORBIDDEN_FILE_ACCESS, FORBIDDEN_REDIRECT, FORCE_REDIRECT, ICON_ARCHIVE, ICON_LOCK, ICON_LOCK_STRONG, ICON_WARN, ICON_WH_BOOK_CLOSED, ICON_WH_BOOK_OPEN, INFORMATION_MSG, JALIOS_JUNIT_PROP, JCMS_CADDY, JCMS_MSG_LIST, JSYNC_DOWNLOAD_DIR, JSYNC_SYNC_ALARM, LOG_FILE, LOG_TOPIC_SECURITY, LOGGER_PROP, LOGGER_XMLPROP, MBR_PHOTO_DIR, MDATE_SEARCH, MONITOR_XML, OP_CREATE, OP_DEEP_COPY, OP_DEEP_DELETE, OP_DELETE, OP_MERGE, OP_UPDATE, PDATE_SEARCH, PHOTO_DIR, PHOTO_ICON, PHOTO_ICON_HEIGHT, PHOTO_ICON_WIDTH, PHOTO_LARGE, PHOTO_LARGE_HEIGHT, PHOTO_LARGE_WIDTH, PHOTO_NORMAL, PHOTO_NORMAL_HEIGHT, PHOTO_NORMAL_WIDTH, PHOTO_SMALL, PHOTO_SMALL_HEIGHT, PHOTO_SMALL_WIDTH, PHOTO_TINY, PHOTO_TINY_HEIGHT, PHOTO_TINY_WIDTH, PREVIOUS_TAB, PRINT_VIEW, PRIVATE_FILE_ACCESS, PUBLIC_FILE_ACCESS, READ_RIGHT_TAB, SDATE_SEARCH, SEARCHENGINE_ALARM, SESSION_AUTHORIZED_FILENAMES_SET, STATS_REPORT_DIR, STATUS_PROP, STORE_XML, TEMPLATE_TAB, THUMBNAIL_LARGE_HEIGHT, THUMBNAIL_LARGE_WIDTH, THUMBNAIL_SMALL_HEIGHT, THUMBNAIL_SMALL_WIDTH, UDATE_SEARCH, UPDATE_RIGHT_TAB, UPLOAD_DIR, URL_REGEXP, WARNING_MSG, WEBAPP_PROP, WFEXPRESS_ALARM, WFREMINDER_ALARM, WORKFLOW_TAB, WORKFLOW_XML
 
Fields inherited from interface com.jalios.util.JaliosConstants
CRLF, MILLIS_IN_ONE_DAY, MILLIS_IN_ONE_HOUR, MILLIS_IN_ONE_MINUTE, MILLIS_IN_ONE_MONTH, MILLIS_IN_ONE_SECOND, MILLIS_IN_ONE_WEEK, MILLIS_IN_ONE_YEAR
 
Constructor Summary
LuceneFileSearchEngine()
           
 
Method Summary
 void add(FileDocument fileDocument)
          Add given FileDocument to this lucene search engine.
 void delete(FileDocument fileDocument)
          Delete given FileDocument from this lucene search engine.
protected  com.jalios.jcms.search.DataIterator<Data> getAllDataIterator()
          This method must be implemented by the LuceneSearchEngine.
 org.apache.lucene.store.FSDirectory getDirectory()
          Returns the lucene directory used by this LuceneFileSearchEngine.
 org.apache.lucene.document.Document getDocument(String filename)
          Retrieve the Lucene Document bound to the specified filename.
 int getFileCount()
           
protected  org.apache.log4j.Logger getLogger()
          This method must be implemented by the LuceneSearchEngine.
 org.apache.lucene.document.Document getLuceneDocument(FileDocument fileDoc, String content)
          Retrieve a new lucene Document for the specified file in preparation of indexing.
protected  org.apache.lucene.index.Term getPrimaryTerm(Data data)
          Override method for compatibility with legacy lucene file index which uses lucene field "id" (JCMS_ID_FIELD) for Data id, instead of the lucene field "_id_" (ID_FIELD) expected by default by LuceneDataSearchEngine.
 void index(FileDocument fileDoc, String content)
          Add the specified FileDocument to the index, with the specified content.
protected  void indexData(org.apache.lucene.index.IndexWriter writer, Data data, String lang)
          This method indexes the given FileDocument in the default language, into the given index writer.
 boolean isAvailable()
           
 LinkedHashMap<String,Float> search(QueryHandler qh)
          Returns the identifiers of the publications matching a Lucene search.
 boolean search(QueryHandler qh, HashSet<? extends Publication> pubSet, LinkedHashMap<String,Float> resultMap)
          Perform a full-text search on indexed files
 boolean search(QueryHandler qh, HashSet<? extends Publication> pubSet, QueryResultSet resultSet, boolean searchInDB)
          Perform a full-text search on indexed files
 LinkedHashMap<String,Float> search(QueryHandler qh, List<String> idList)
          Filters the given list of publication identifiers with a Lucene search.
 void update(FileDocument fileDocument)
          Update the given FileDocument in this Lucene search engine.
 
Methods inherited from class com.jalios.jcms.search.LuceneDataSearchEngine
addData, addDataCollection, clearIndices, clearSearcher, deleteData, deleteDataCollection, getDirectory, getIndexingDate, getIndexingDate, getLastOptimizeDateSinceRestart, getLastOptimizeDuration, getLastReindexDateSinceRestart, getLastReindexDuration, getLuceneDocument, getOperationStartTime, getProgressState, getSearcher, index, index, isOperationRunning, optimizeIndices, reindexAll, remove, setIndexWriterOptions, updateData, updateDataCollection
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

REVISION

public static final String REVISION
See Also:
Constant Field Values

FILE_INDEX_DIRECTORY

public static final String FILE_INDEX_DIRECTORY
See Also:
Constant Field Values

PATH_FIELD

public static final String PATH_FIELD
Field name for the file path (file.getPath()) when it was indexed

See Also:
Constant Field Values

CONTENTS_FIELD

public static final String CONTENTS_FIELD
Field name for the content of the file

See Also:
Constant Field Values

MODIFIED_FIELD

public static final String MODIFIED_FIELD
Field name for the last modified date of the file (time in ms) when it was indexed

See Also:
Constant Field Values

JALIOS_DATE_FIELD

public static final String JALIOS_DATE_FIELD
Field name for the Indexing Date (time in ms)

See Also:
Constant Field Values

JCMS_PATH_FIELD

public static final String JCMS_PATH_FIELD
Field name for the relative path in jcms, e.g. "upload/docs/file.txt", or fileDoc.getFilename()

See Also:
Constant Field Values

JCMS_ID_FIELD

public static final String JCMS_ID_FIELD
Field name for the id in jcms, e.g. "c_345" or "21243_DBFileDocument"

See Also:
Constant Field Values

AUTHORID_FIELD

public static final String AUTHORID_FIELD
Field name for the id of the FileDocument's author, e.g. "j_2" or "21243_DBMember"

See Also:
Constant Field Values

PSTATUS_FIELD

public static final String PSTATUS_FIELD
Field name for the pstatus of the FileDocument, e.g. "-10", "0", "100"

See Also:
Constant Field Values

WORKSPACEID_FIELD

public static final String WORKSPACEID_FIELD
Field name for the id of the FileDocument's workspace, e.g. "j_4"

See Also:
Constant Field Values

CLASSNAME_FIELD

public static final String CLASSNAME_FIELD
Field name for the className in jcms, e.g. "com.jalios.jcms.DBFileDocument"

See Also:
Constant Field Values
Constructor Detail

LuceneFileSearchEngine

public LuceneFileSearchEngine()
                       throws Exception
Throws:
Exception
Method Detail

getDirectory

public org.apache.lucene.store.FSDirectory getDirectory()
Returns the Lucene directory used by this LuceneFileSearchEngine.
Warning: do not modify the index; use this method only for read-only access to the directory.
Note: the directory may not exist; check with IndexReader.indexExists(Directory).

Returns:
the instance of the FSDirectory used internally.

isAvailable

public boolean isAvailable()
Specified by:
isAvailable in interface FileSearchEngine
Returns:
true if the FileSearchEngine is available
Since:
jcms-4.0

getDocument

public org.apache.lucene.document.Document getDocument(String filename)
Retrieve the Lucene Document bound to the specified filename.

Parameters:
filename - relative file path e.g. "upload/docs/file.txt"
Returns:
the Lucene Document bound to the given filename, or null if it could not be found
Since:
jcms-4.0.1

search

public boolean search(QueryHandler qh,
                      HashSet<? extends Publication> pubSet,
                      QueryResultSet resultSet,
                      boolean searchInDB)
Perform a full-text search on indexed files

Specified by:
search in interface FileSearchEngine
Parameters:
qh - the QueryHandler in which to find search text and search options.
pubSet - a HashSet containing all the Publications to search.
If empty, the search is not performed at all.
If null, all Publications found will be returned.
This set MUST NOT be modified by the implementation.
resultSet - the QueryResultSet that must be filled with the matching Publications
searchInDB - if false, only JStore publications are set in pubSet
Returns:
true if a search was performed in the FileSearchEngine. Useful to differentiate a query returning zero results from a query not performed due to missing parameters (text, for example)
Since:
jcms-5.5.0

search

public boolean search(QueryHandler qh,
                      HashSet<? extends Publication> pubSet,
                      LinkedHashMap<String,Float> resultMap)
Perform a full-text search on indexed files

Parameters:
qh - the QueryHandler in which to find search text and search options.
pubSet - a HashSet containing all the Publications to search.
If empty, the search is not performed at all.
If null, all Publications found will be returned.
This set MUST NOT be modified by the implementation.
resultMap - the map that must be filled with the matching {Publication's Id, score} pairs
Returns:
true if a search was performed in the FileSearchEngine. Useful to differentiate a query returning zero results from a query not performed due to missing parameters (text, for example)
Since:
jcms-5.5.0
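
The pubSet and return-value contract described above can be sketched with plain collections. This is only an illustration under stated assumptions, not the JCMS implementation: SearchContractSketch, its put method, the substring match and the constant score are hypothetical stand-ins, a plain String replaces the QueryHandler, and returning false for an empty set is an assumption drawn from "search is not performed at all".

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in that only mimics the documented contract of
// search(qh, pubSet, resultMap); it does not use JCMS or Lucene.
final class SearchContractSketch {

  // Stand-in for the file index: publication id -> indexed text.
  private final Map<String, String> indexedText = new LinkedHashMap<>();

  void put(String id, String text) {
    indexedText.put(id, text);
  }

  // text      : search text (what a QueryHandler would carry)
  // idSet     : ids to search in; null means "no restriction",
  //             empty means "nothing to search"
  // resultMap : filled with {id, score} for each match
  // returns true only if a search was actually performed
  boolean search(String text, Set<String> idSet, Map<String, Float> resultMap) {
    if (text == null || text.isEmpty()) {
      return false; // missing parameter: no search performed
    }
    if (idSet != null && idSet.isEmpty()) {
      return false; // assumption: an empty set means no search is performed
    }
    for (Map.Entry<String, String> entry : indexedText.entrySet()) {
      boolean allowed = idSet == null || idSet.contains(entry.getKey());
      if (allowed && entry.getValue().contains(text)) {
        resultMap.put(entry.getKey(), 1.0f); // placeholder score
      }
    }
    return true; // performed, even when nothing matched
  }
}
```

A caller can then tell "searched, zero results" (true with an empty map) apart from "not searched" (false). Note that the sketch never modifies idSet, in line with the MUST NOT above.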

search

public LinkedHashMap<String,Float> search(QueryHandler qh,
                                          List<String> idList)
Description copied from interface: FileSearchEngine
Filters the given list of publication identifiers with a Lucene search.

Specified by:
search in interface FileSearchEngine
Parameters:
qh - the QueryHandler in which to find search text and search options.
idList - the list of publication identifiers
Returns:
a map of the publications matching the Lucene query and their scores. This map is a subset of idList and respects its order.
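
The "subset of idList that respects its order" guarantee can be pictured with a LinkedHashMap, which iterates in insertion order. The following is a minimal sketch, assuming the Lucene hits are already available as an id-to-score map; OrderedFilterSketch and its filter method are hypothetical names, not part of JCMS.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating the documented ordering guarantee.
final class OrderedFilterSketch {

  // Keeps only the ids of idList that have a score, in idList order.
  static LinkedHashMap<String, Float> filter(List<String> idList,
                                             Map<String, Float> scores) {
    LinkedHashMap<String, Float> result = new LinkedHashMap<>();
    for (String id : idList) {
      Float score = scores.get(id);
      if (score != null) {
        result.put(id, score); // insertion order follows idList
      }
    }
    return result;
  }
}
```

Ids that matched the query but are absent from idList are dropped, and ids of idList without a match are skipped, so the result keys are always a sub-sequence of idList.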

search

public LinkedHashMap<String,Float> search(QueryHandler qh)
Description copied from interface: FileSearchEngine
Returns the identifiers of the publications matching a Lucene search.

Specified by:
search in interface FileSearchEngine
Parameters:
qh - the QueryHandler in which to find search text and search options.
Returns:
a map of the publications matching the Lucene query and their scores.
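
This page does not state the iteration order of the returned map, so a caller that needs the best matches first may prefer to sort explicitly. A minimal sketch, where ScoreRankingSketch and topIds are hypothetical names and the input is any id-to-score map:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: ranks an {id -> score} map without relying on
// the order in which the map was filled.
final class ScoreRankingSketch {

  // Returns at most 'limit' ids, highest score first.
  static List<String> topIds(Map<String, Float> scores, int limit) {
    List<Map.Entry<String, Float>> entries = new ArrayList<>(scores.entrySet());
    entries.sort((a, b) -> Float.compare(b.getValue(), a.getValue()));
    List<String> ids = new ArrayList<>();
    for (Map.Entry<String, Float> entry : entries) {
      if (ids.size() >= limit) {
        break;
      }
      ids.add(entry.getKey());
    }
    return ids;
  }
}
```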

getFileCount

public int getFileCount()
Specified by:
getFileCount in interface FileSearchEngine
Returns:
the number of indexed files
Since:
jcms-4.1

getLuceneDocument

public org.apache.lucene.document.Document getLuceneDocument(FileDocument fileDoc,
                                                             String content)
Retrieve a new lucene Document for the specified file in preparation of indexing.

Parameters:
fileDoc - the FileDocument for which file is being indexed
content - the content of the file, optional.
Returns:
A new instance of Document suitable for indexation through

add

public void add(FileDocument fileDocument)
Add the given FileDocument to this Lucene search engine. This method is asynchronous: the given data will most likely not have been added yet when the call returns.

Specified by:
add in interface FileSearchEngine
Parameters:
fileDocument - the FileDocument to index.
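
The practical consequence of an asynchronous add is that a search issued right after the call may not see the document. The sketch below illustrates that pattern with a single background thread; it is an assumption about the general shape of such a design, not the JCMS implementation, and AsyncIndexSketch with all its members is hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in: add() only queues the work, a single
// background thread performs it later.
final class AsyncIndexSketch {

  private final Set<String> indexedIds = ConcurrentHashMap.newKeySet();
  private final ExecutorService indexingThread = Executors.newSingleThreadExecutor();

  // Returns immediately; the id is visible only once the task has run.
  void add(String id) {
    indexingThread.execute(() -> {
      indexedIds.add(id);
    });
  }

  boolean isIndexed(String id) {
    return indexedIds.contains(id);
  }

  // Stops accepting work and waits for the queued tasks to finish.
  boolean shutdownAndWait(long timeoutMillis) {
    indexingThread.shutdown();
    try {
      return indexingThread.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

Code that must observe the result (a test, for example) has to wait for the background work rather than check isIndexed directly after add.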

update

public void update(FileDocument fileDocument)
Update the given FileDocument in this Lucene search engine. This method is asynchronous: the given data will most likely not have been updated yet when the call returns.

Specified by:
update in interface FileSearchEngine
Parameters:
fileDocument - the FileDocument to reindex.

delete

public void delete(FileDocument fileDocument)
Delete the given FileDocument from this Lucene search engine. This method is asynchronous: the given data will most likely not have been deleted yet when the call returns.

Specified by:
delete in interface FileSearchEngine
Parameters:
fileDocument - the FileDocument to delete from the index.

index

public void index(FileDocument fileDoc,
                  String content)
Add the specified FileDocument to the index, with the specified content.

Thread safety: this method runs under the same lock as the indexing thread created for LuceneFileSearchEngine. It therefore blocks while an indexing operation is already in progress, and it blocks indexing until it has finished.
Invoke it sparingly (it should only be needed by JCMSUploadIndexer and during unit tests).

Parameters:
fileDoc - the FileDocument to be indexed in lucene
content - the content that was extracted for the FileDocument
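
The blocking behaviour described above follows from two code paths sharing one lock. A minimal sketch of that idea, assuming a ReentrantLock; the class SharedIndexLockSketch, its methods and the log it keeps are hypothetical, and the actual lock type used by JCMS is not documented on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in: a batch path and a direct path guarded by
// the same lock, so each one waits for the other to finish.
final class SharedIndexLockSketch {

  private final ReentrantLock lock = new ReentrantLock();
  private final List<String> log = new ArrayList<>();

  // What a background indexing thread would run.
  void runBatch(List<String> ids) {
    lock.lock();
    try {
      for (String id : ids) {
        log.add("batch:" + id);
      }
    } finally {
      lock.unlock();
    }
  }

  // Synchronous call: blocks while a batch holds the lock, and
  // holds the lock itself until the document is written.
  void index(String id) {
    lock.lock();
    try {
      log.add("direct:" + id);
    } finally {
      lock.unlock();
    }
  }

  List<String> log() {
    lock.lock();
    try {
      return new ArrayList<>(log);
    } finally {
      lock.unlock();
    }
  }
}
```

Because a direct call holds the lock for its whole duration, frequent direct calls would starve the batch path, which is why such a method is best reserved for a few callers.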

getLogger

protected org.apache.log4j.Logger getLogger()
Description copied from class: LuceneDataSearchEngine
This method must be implemented by the LuceneSearchEngine. It must return the logger to be used for log messages.

Specified by:
getLogger in class LuceneDataSearchEngine
Returns:
Logger of this engine.

getAllDataIterator

protected com.jalios.jcms.search.DataIterator<Data> getAllDataIterator()
Description copied from class: LuceneDataSearchEngine
This method must be implemented by the LuceneSearchEngine. It must return a DataIterator used to iterate on all Data to index. Used by LuceneDataSearchEngine.reindexAll().

Specified by:
getAllDataIterator in class LuceneDataSearchEngine

indexData

protected void indexData(org.apache.lucene.index.IndexWriter writer,
                         Data data,
                         String lang)
                  throws IOException
This method indexes the given FileDocument in the default language, into the given index writer.

Specified by:
indexData in class LuceneDataSearchEngine
Throws:
IOException

getPrimaryTerm

protected org.apache.lucene.index.Term getPrimaryTerm(Data data)
Override method for compatibility with legacy lucene file index which uses lucene field "id" (JCMS_ID_FIELD) for Data id, instead of the lucene field "_id_" (ID_FIELD) expected by default by LuceneDataSearchEngine.

Overrides:
getPrimaryTerm in class LuceneDataSearchEngine
Returns:
a Term instance, must not return null
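
The override can be pictured as a subclass choosing a different field name for the term that identifies a document. In the sketch below only the field names "id" and "_id_" come from the description above; the Term class is a simplified stand-in for org.apache.lucene.index.Term, and DataEngine, FileEngine and the String parameter are hypothetical.

```java
// Hypothetical, Lucene-free illustration of the legacy-field override.
final class PrimaryTermSketch {

  // Simplified stand-in for org.apache.lucene.index.Term.
  static final class Term {
    final String field;
    final String text;

    Term(String field, String text) {
      this.field = field;
      this.text = text;
    }
  }

  // Default behaviour: documents are identified by the "_id_" field.
  static class DataEngine {
    static final String ID_FIELD = "_id_";

    Term getPrimaryTerm(String dataId) {
      return new Term(ID_FIELD, dataId);
    }
  }

  // Legacy file index: documents are identified by the "id" field.
  static class FileEngine extends DataEngine {
    static final String JCMS_ID_FIELD = "id";

    @Override
    Term getPrimaryTerm(String dataId) {
      return new Term(JCMS_ID_FIELD, dataId);
    }
  }
}
```

Any update or delete routed through the base class then targets the field that the existing file index actually contains, so an old index does not need to be rebuilt.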


Copyright © 2001-2010 Jalios SA. All Rights Reserved.