com.jalios.jcms.search
Class LuceneFileSearchEngine

java.lang.Object
  extended by com.jalios.jcms.search.LuceneFileSearchEngine
All Implemented Interfaces:
JcmsConstants, FileSearchEngine, JaliosConstants

public class LuceneFileSearchEngine
extends Object
implements FileSearchEngine, JcmsConstants

This class is an implementation of FileSearchEngine based on the Lucene search engine.

Since:
jcms-4.1
Version:
$Revision: 29913 $
Author:
Olivier Dedieu, Olivier Jaquemet

Field Summary
static String AUTHORID_FIELD
          Field name for the id of the FileDocument's author, e.g. "j_2" or "21243_DBMember"
static String CLASSNAME_FIELD
          Field name for the className in jcms.
static String CONTENTS_FIELD
          Field name for the content of the file
static String FILE_INDEX_DIRECTORY
           
static String JALIOS_DATE_FIELD
          Field name for the Indexing Date (time in ms)
static String JCMS_ID_FIELD
          Field name for the id in jcms.
static String JCMS_PATH_FIELD
          Field name for the relative path in jcms.
static String MODIFIED_FIELD
          Field name for the last modified date of the file (time in ms) when it was indexed
static String PATH_FIELD
          Field name for the file path (file.getPath()) when it was indexed
static String PSTATUS_FIELD
          Field name for the pstatus of the FileDocument, e.g. "-10", "0", "100"
static String REVISION
           
static String WORKSPACEID_FIELD
          Field name for the id of the FileDocument's workspace, e.g. "j_4"
 
Fields inherited from interface com.jalios.jcms.JcmsConstants
ADATE_SEARCH, ADMIN_NOTES_PROP, ADVANCED_TAB, ARCHIVES_DIR, ASCII_WIDTH, CATEGORY_TAB, CDATE_SEARCH, COMMON_ALARM, CONTENT_TAB, COOKIE_MAX_AGE, CRYPT_MD5, CRYPT_UNDEFINED, CRYPT_UNIX, CTRL_TOPIC_INTERNAL, CTRL_TOPIC_REF, CTRL_TOPIC_VALUE, CTRL_TOPIC_WRITE, CUSTOM_PROP, DOCCHOOSER_HEIGHT, DOCCHOOSER_WIDTH, DOCS_DIR, EDATE_SEARCH, EMAIL_REGEXP, ERROR_MSG, FORBIDDEN_FILE_ACCESS, FORBIDDEN_REDIRECT, FORCE_REDIRECT, ICON_ARCHIVE, ICON_LOCK, ICON_LOCK_STRONG, ICON_WARN, ICON_WH_BOOK_CLOSED, ICON_WH_BOOK_OPEN, INFORMATION_MSG, JALIOS_JUNIT_PROP, JCMS_CADDY, JCMS_MSG_LIST, JSYNC_DOWNLOAD_DIR, JSYNC_SYNC_ALARM, LOG_FILE, LOG_TOPIC_SECURITY, LOGGER_PROP, LOGGER_XMLPROP, MBR_PHOTO_DIR, MDATE_SEARCH, MONITOR_XML, OP_CREATE, OP_DEEP_COPY, OP_DEEP_DELETE, OP_DELETE, OP_MERGE, OP_UPDATE, PDATE_SEARCH, PHOTO_DIR, PHOTO_ICON, PHOTO_ICON_HEIGHT, PHOTO_ICON_WIDTH, PHOTO_LARGE, PHOTO_LARGE_HEIGHT, PHOTO_LARGE_WIDTH, PHOTO_NORMAL, PHOTO_NORMAL_HEIGHT, PHOTO_NORMAL_WIDTH, PHOTO_SMALL, PHOTO_SMALL_HEIGHT, PHOTO_SMALL_WIDTH, PHOTO_TINY, PHOTO_TINY_HEIGHT, PHOTO_TINY_WIDTH, PREVIOUS_TAB, PRINT_VIEW, PRIVATE_FILE_ACCESS, PUBLIC_FILE_ACCESS, READ_RIGHT_TAB, SDATE_SEARCH, SEARCHENGINE_ALARM, SESSION_AUTHORIZED_FILENAMES_SET, STATS_REPORT_DIR, STATUS_PROP, STORE_XML, TEMPLATE_TAB, THUMBNAIL_LARGE_HEIGHT, THUMBNAIL_LARGE_WIDTH, THUMBNAIL_SMALL_HEIGHT, THUMBNAIL_SMALL_WIDTH, UDATE_SEARCH, UPDATE_RIGHT_TAB, UPLOAD_DIR, URL_REGEXP, WARNING_MSG, WEBAPP_PROP, WFEXPRESS_ALARM, WFREMINDER_ALARM, WORKFLOW_TAB, WORKFLOW_XML
 
Fields inherited from interface com.jalios.util.JaliosConstants
CRLF, MILLIS_IN_ONE_DAY, MILLIS_IN_ONE_HOUR, MILLIS_IN_ONE_MINUTE, MILLIS_IN_ONE_MONTH, MILLIS_IN_ONE_SECOND, MILLIS_IN_ONE_WEEK, MILLIS_IN_ONE_YEAR
 
Constructor Summary
LuceneFileSearchEngine()
           
 
Method Summary
 org.apache.lucene.store.FSDirectory getDirectory()
          Returns the Lucene directory used by this LuceneFileSearchEngine.
 org.apache.lucene.document.Document getDocument(String filename)
          Retrieve the Lucene Document bound to the specified filename.
 int getFileCount()
           
 org.apache.lucene.document.Document getLuceneDocument(File file, String content)
          Build a new Lucene Document for the specified file in preparation for indexing.
 void index(File file, org.apache.lucene.document.Document doc)
          Add the specified Lucene Document to the index.
 boolean isAvailable()
           
 void optimize()
          Optimize the Lucene file index.
 void remove(File file)
          Remove the specified file from the Lucene index.
 LinkedHashMap<String,Float> search(QueryHandler qh)
          Return the identifiers of the publications matching a Lucene search.
 boolean search(QueryHandler qh, HashSet<? extends Publication> pubSet, LinkedHashMap<String,Float> resultMap)
          Perform a full-text search on indexed files
 boolean search(QueryHandler qh, HashSet<? extends Publication> pubSet, QueryResultSet resultSet, boolean searchInDB)
          Perform a full-text search on indexed files
 LinkedHashMap<String,Float> search(QueryHandler qh, List<String> idList)
          Filters the given list of publication identifiers with a Lucene search.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

REVISION

public static final String REVISION
See Also:
Constant Field Values

FILE_INDEX_DIRECTORY

public static final String FILE_INDEX_DIRECTORY
See Also:
Constant Field Values

PATH_FIELD

public static final String PATH_FIELD
Field name for the file path (file.getPath()) when it was indexed

See Also:
Constant Field Values

CONTENTS_FIELD

public static final String CONTENTS_FIELD
Field name for the content of the file

See Also:
Constant Field Values

MODIFIED_FIELD

public static final String MODIFIED_FIELD
Field name for the last modified date of the file (time in ms) when it was indexed

See Also:
Constant Field Values

JALIOS_DATE_FIELD

public static final String JALIOS_DATE_FIELD
Field name for the Indexing Date (time in ms)

See Also:
Constant Field Values

JCMS_PATH_FIELD

public static final String JCMS_PATH_FIELD
Field name for the relative path in jcms, e.g. "upload/docs/file.txt", or fileDoc.getFilename()

See Also:
Constant Field Values

JCMS_ID_FIELD

public static final String JCMS_ID_FIELD
Field name for the id in jcms, e.g. "c_345" or "21243_DBFileDocument"

See Also:
Constant Field Values

AUTHORID_FIELD

public static final String AUTHORID_FIELD
Field name for the id of the FileDocument's author, e.g. "j_2" or "21243_DBMember"

See Also:
Constant Field Values

PSTATUS_FIELD

public static final String PSTATUS_FIELD
Field name for the pstatus of the FileDocument, e.g. "-10", "0", "100"

See Also:
Constant Field Values

WORKSPACEID_FIELD

public static final String WORKSPACEID_FIELD
Field name for the id of the FileDocument's workspace, e.g. "j_4"

See Also:
Constant Field Values

CLASSNAME_FIELD

public static final String CLASSNAME_FIELD
Field name for the className in jcms, e.g. "c_345", or "com.jalios.jcms.DBFileDocument"

See Also:
Constant Field Values
Constructor Detail

LuceneFileSearchEngine

public LuceneFileSearchEngine()
                       throws Exception
Throws:
Exception
Method Detail

getDirectory

public org.apache.lucene.store.FSDirectory getDirectory()
Returns the Lucene directory used by this LuceneFileSearchEngine.
Warning: do not modify the index; use this method only for read-only access to the directory.
Note: the directory may not exist; check with IndexReader.indexExists(Directory).

Returns:
the instance of the FSDirectory used internally.

isAvailable

public boolean isAvailable()
Specified by:
isAvailable in interface FileSearchEngine
Returns:
true if the FileSearchEngine is available
Since:
jcms-4.0

getDocument

public org.apache.lucene.document.Document getDocument(String filename)
Retrieve the Lucene Document bound to the specified filename.

Parameters:
filename - relative file path e.g. "upload/docs/file.txt"
Returns:
the Lucene Document bound to the given filename, or null if it could not be found
Since:
jcms-4.0.1

search

public boolean search(QueryHandler qh,
                      HashSet<? extends Publication> pubSet,
                      QueryResultSet resultSet,
                      boolean searchInDB)
Perform a full-text search on indexed files

Specified by:
search in interface FileSearchEngine
Parameters:
qh - the QueryHandler in which to find search text and search options.
pubSet - a HashSet containing all the Publications to search.
If empty, the search is not performed at all.
If null, all Publications found are returned.
This set MUST NOT be modified by the implementation.
resultSet - the QueryResultSet that must be filled with the matching Publications
searchInDB - if false, pubSet contains only JStore publications
Returns:
true if a search was performed in the FileSearchEngine. Useful to differentiate a query returning zero results from a query not performed because of missing parameters (the search text, for example)
Since:
jcms-5.5.0

search

public boolean search(QueryHandler qh,
                      HashSet<? extends Publication> pubSet,
                      LinkedHashMap<String,Float> resultMap)
Perform a full-text search on indexed files

Parameters:
qh - the QueryHandler in which to find search text and search options.
pubSet - a HashSet containing all the Publications to search.
If empty, the search is not performed at all.
If null, all Publications found are returned.
This set MUST NOT be modified by the implementation.
resultMap - the map that must be filled with the matching {Publication id, score} pairs
Returns:
true if a search was performed in the FileSearchEngine. Useful to differentiate a query returning zero results from a query not performed because of missing parameters (the search text, for example)
Since:
jcms-5.5.0
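
The return value and the filled map together allow three cases to be told apart: no search performed, search performed with no match, and search performed with matches. The following is a minimal caller-side sketch of that distinction. SearchOutcomeSketch, Outcome and classify are hypothetical names, not part of JCMS, and the boolean is simulated here rather than obtained from a real call to search(qh, pubSet, resultMap).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of JCMS: interprets the result of one search call.
final class SearchOutcomeSketch {

  /** What a caller can conclude from one search call. */
  enum Outcome { NOT_PERFORMED, NO_MATCH, MATCHED }

  /**
   * Classifies a search call.
   * performed: the boolean returned by the search method.
   * resultMap: the {publication id, score} map that was passed to it.
   */
  static Outcome classify(boolean performed, Map<String, Float> resultMap) {
    if (!performed) {
      // For example, the query carried no search text.
      return Outcome.NOT_PERFORMED;
    }
    return resultMap.isEmpty() ? Outcome.NO_MATCH : Outcome.MATCHED;
  }

  public static void main(String[] args) {
    LinkedHashMap<String, Float> resultMap = new LinkedHashMap<>();
    // In JCMS the flag would come from the engine; it is simulated here.
    System.out.println(classify(false, resultMap));
    System.out.println(classify(true, resultMap));
    resultMap.put("c_345", 0.8f); // assumed id and score, for illustration only
    System.out.println(classify(true, resultMap));
  }
}
```

Keeping the "not performed" case separate lets a caller fall back to other search criteria instead of reporting an empty result.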

search

public LinkedHashMap<String,Float> search(QueryHandler qh,
                                          List<String> idList)
Description copied from interface: FileSearchEngine
Filters the given list of publication identifiers with a Lucene search.

Specified by:
search in interface FileSearchEngine
Parameters:
qh - the QueryHandler in which to find search text and search options.
idList - the list of publication identifiers
Returns:
a map of the publications matching the Lucene query and their scores. The keys of this map are a subset of idList and respect its order.
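
The ordering guarantee described above (only identifiers from idList are kept, in idList order) can be illustrated with plain collections. This is a minimal sketch, not the JCMS implementation: IdListFilterSketch, filterInOrder and the scoresById map standing in for the Lucene hits are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of an order-preserving filter; not JCMS code.
final class IdListFilterSketch {

  /**
   * Keeps the ids of idList that have a score in scoresById,
   * in the order in which they appear in idList.
   */
  static LinkedHashMap<String, Float> filterInOrder(List<String> idList,
                                                    Map<String, Float> scoresById) {
    LinkedHashMap<String, Float> result = new LinkedHashMap<>();
    for (String id : idList) {
      Float score = scoresById.get(id);
      if (score != null) {
        // LinkedHashMap iterates in insertion order, so idList order is kept.
        result.put(id, score);
      }
    }
    return result;
  }
}
```

A LinkedHashMap is what makes the order observable to the caller: iterating over its keys yields the matching identifiers in the same sequence as the input list.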

search

public LinkedHashMap<String,Float> search(QueryHandler qh)
Description copied from interface: FileSearchEngine
Return the identifiers of the publications matching a Lucene search.

Specified by:
search in interface FileSearchEngine
Parameters:
qh - the QueryHandler in which to find search text and search options.
Returns:
a map of the publications matching the Lucene query and their scores.

getFileCount

public int getFileCount()
Specified by:
getFileCount in interface FileSearchEngine
Returns:
the number of indexed files
Since:
jcms-4.1

getLuceneDocument

public org.apache.lucene.document.Document getLuceneDocument(File file,
                                                             String content)
Build a new Lucene Document for the specified file in preparation for indexing.

Parameters:
file - the File to index (must not be null, and the file must exist). This file MUST be located under the webapp root directory (usually inside the upload directory).
content - the content of the file, optional.
Returns:
A new instance of Document suitable for indexing through index(File, Document)
See Also:
index(File, Document)
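
Because the file passed to this method, and to index(File, Document) and remove(File), must be located under the webapp root directory, a caller may want to check that before calling the engine. The following is a minimal sketch using only the standard library. WebappPathSketch and relativePath are hypothetical names, and the webapp root is assumed to be known to the caller. The sketch does not check that the file exists and does not resolve symbolic links.

```java
import java.io.File;
import java.nio.file.Path;

// Hypothetical pre-check, not part of JCMS.
final class WebappPathSketch {

  /**
   * Returns the '/'-separated path of file relative to webappRoot,
   * or null when file is not located under webappRoot.
   */
  static String relativePath(File webappRoot, File file) {
    Path root = webappRoot.toPath().toAbsolutePath().normalize();
    Path target = file.toPath().toAbsolutePath().normalize();
    // startsWith compares whole path elements, so "/a/bc" is not under "/a/b".
    if (target.equals(root) || !target.startsWith(root)) {
      return null;
    }
    return root.relativize(target).toString().replace(File.separatorChar, '/');
  }

  public static void main(String[] args) {
    File root = new File("/srv/webapp"); // assumed location, for illustration only
    System.out.println(relativePath(root, new File("/srv/webapp/upload/docs/file.txt")));
    System.out.println(relativePath(root, new File("/srv/other/file.txt")));
  }
}
```

Normalizing both paths before comparing them means that a path escaping the root through ".." segments is rejected rather than accepted by prefix.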

index

public void index(File file,
                  org.apache.lucene.document.Document doc)
Add the specified Lucene Document to the index.

Parameters:
file - the File to be indexed in Lucene. This file MUST be located under the webapp root directory (usually inside the upload directory).
doc - the Lucene Document instance to add (see getLuceneDocument(File, String))

remove

public void remove(File file)
Remove the specified file from the Lucene index.

Parameters:
file - the File to be removed from Lucene. This file MUST be located under the webapp root directory (usually inside the upload directory).

optimize

public void optimize()
Optimize the Lucene file index.

Since:
jcms-6.0.2


Copyright © 2001-2010 Jalios SA. All Rights Reserved.