com.jalios.jcms.search
Class LucenePublicationSearchEngine

java.lang.Object
  extended by com.jalios.jcms.search.LuceneDataSearchEngine
      extended by com.jalios.jcms.search.LucenePublicationSearchEngine
All Implemented Interfaces:
JcmsConstants, PublicationSearchEngine, JaliosConstants

public class LucenePublicationSearchEngine
extends LuceneDataSearchEngine
implements PublicationSearchEngine, JcmsConstants

This PublicationSearchEngine is responsible for indexing and searching JCMS content using Lucene.

Architecture and notable points:

  • 1 lucene index per language: WEB-INF/data/lucene/PublicationsIndices/<lang>/.
  • 1 Document per indexed Publication.
  • Date fields are indexed using "yyyyMMdd" format.
  • Only String and String[] fields are added to the common appendable field ALLFIELDS_FIELD used for searching.
  • Index optimization runs on the schedule specified by the property "search-engine.optimize-schedule" (cron-like format of jdring's AlarmEntry).

  • Possible Hooks/Modification:
  • Specify analyzer for each language: Analyzer getAnalyzer(String lang);
  • Specify boost for each Document, in each language: LuceneSearchEnginePolicyFilter.getPublicationBoost(Publication, String, float)
  • Specify boost for each Document's Field, in each language: LuceneSearchEnginePolicyFilter.getFieldBoost(Publication, String, String, String, float)
    Since:
    jcms-5.5.0
    Version:
    $Revision: 27753 $
    Author:
    Olivier Jaquemet
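
Two conventions from the architecture notes above (one index directory per language, dates stored as "yyyyMMdd") can be sketched as follows. This is only an illustration using the JDK: the class and method names are hypothetical and not part of the JCMS API, and the UTC time zone is an assumption made so the output is reproducible. The point of the date format is that plain string comparison of the indexed values follows chronological order.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

/** Illustrative helper (hypothetical); not part of the JCMS API. */
final class IndexLayoutSketch {

  /** Resolves WEB-INF/data/lucene/PublicationsIndices/<lang>/ under a webapp root. */
  static Path indexDirFor(Path webappRoot, String lang) {
    return webappRoot.resolve(
        Paths.get("WEB-INF", "data", "lucene", "PublicationsIndices", lang));
  }

  /** Formats a date as "yyyyMMdd"; UTC is an assumption of this sketch. */
  static String toIndexDate(Date date) {
    // SimpleDateFormat is not thread-safe, so a fresh instance is used per call.
    SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
    return fmt.format(date);
  }

  public static void main(String[] args) {
    String earlier = toIndexDate(new Date(0L));                 // 19700101
    String later = toIndexDate(new Date(86_400_000L * 365));    // 19710101
    // Lexicographic order of the strings matches chronological order.
    System.out.println(earlier + " < " + later + " : " + (earlier.compareTo(later) < 0));
    System.out.println(indexDirFor(Paths.get("webapp"), "en"));
  }
}
```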

    Nested Class Summary
     
    Nested classes/interfaces inherited from class com.jalios.jcms.search.LuceneDataSearchEngine
    LuceneDataSearchEngine.MultiSearcherWrapper
     
    Field Summary
    static String ABSTRACT_FIELD
               
    static String ADATE_FIELD
               
    static String ALLFIELDS_FIELD
               
    static String AUTHORID_FIELD
               
    static String CDATE_FIELD
               
    static DateFormat dateFormatter
               
    static String EDATE_FIELD
               
    static String FILEDOCUMENT_CONTENT_TYPE_FIELD
               
    static String FILEDOCUMENT_FILE_EXTENSION_FIELD
               
    static String FILEDOCUMENT_FILENAME_FIELD
               
    static String FILEDOCUMENT_ORIGINAL_FILENAME_FIELD
               
    protected static long HITS_TIMEOUT
               
    static String HITS_TIMEOUT_PROP
               
    static String LUCENE_ADVANCED_QUERY_ATTRIBUTE
              QueryHandler attribute name for the optional advanced Lucene query that can be performed with this engine.
    static String PDATE_FIELD
               
    static String PSTATUS_FIELD
               
    protected static String PUBLICATION_INDEX_DIRECTORY
               
    static String REVISION
               
    static String SDATE_FIELD
               
    static String SPELLSUGGEST_ATTRIBUTE
              Attribute key under which the LucenePublicationSearchEngine stores the suggested search string in the QueryResultSet.
    protected  SpellSuggestEngine spellSuggestEngine
               
    static String TITLE_FIELD
               
    static String WORKSPACEID_FIELD
               
     
    Fields inherited from class com.jalios.jcms.search.LuceneDataSearchEngine
    alarmMgr, channel, directoryName, ID_FIELD, indexAccessLock, INDEXING_DATE_EXTRAINFO, INDEXING_DATE_FIELD, langList, langToIndexDirMap, MAX_BUFFERED_DOCS, MAX_FIELD_LENGTH, MAX_MERGE_DOCS, MERGE_FACTOR
     
    Fields inherited from interface com.jalios.jcms.JcmsConstants
    ADATE_SEARCH, ADMIN_NOTES_PROP, ADVANCED_TAB, ARCHIVES_DIR, ASCII_WIDTH, CATEGORY_TAB, CDATE_SEARCH, COMMON_ALARM, CONTENT_TAB, COOKIE_MAX_AGE, CRYPT_MD5, CRYPT_UNDEFINED, CRYPT_UNIX, CTRL_TOPIC_INTERNAL, CTRL_TOPIC_REF, CTRL_TOPIC_VALUE, CTRL_TOPIC_WRITE, CUSTOM_PROP, DOCCHOOSER_HEIGHT, DOCCHOOSER_WIDTH, DOCS_DIR, EDATE_SEARCH, EMAIL_REGEXP, ERROR_MSG, FORBIDDEN_FILE_ACCESS, FORBIDDEN_REDIRECT, FORCE_REDIRECT, ICON_ARCHIVE, ICON_LOCK, ICON_LOCK_STRONG, ICON_WARN, ICON_WH_BOOK_CLOSED, ICON_WH_BOOK_OPEN, INFORMATION_MSG, JALIOS_JUNIT_PROP, JCMS_CADDY, JCMS_MSG_LIST, JSYNC_DOWNLOAD_DIR, JSYNC_SYNC_ALARM, LOG_FILE, LOG_TOPIC_SECURITY, LOGGER_PROP, LOGGER_XMLPROP, MBR_PHOTO_DIR, MDATE_SEARCH, MONITOR_XML, OP_CREATE, OP_DEEP_COPY, OP_DEEP_DELETE, OP_DELETE, OP_MERGE, OP_UPDATE, PDATE_SEARCH, PHOTO_DIR, PHOTO_ICON, PHOTO_ICON_HEIGHT, PHOTO_ICON_WIDTH, PHOTO_LARGE, PHOTO_LARGE_HEIGHT, PHOTO_LARGE_WIDTH, PHOTO_NORMAL, PHOTO_NORMAL_HEIGHT, PHOTO_NORMAL_WIDTH, PHOTO_SMALL, PHOTO_SMALL_HEIGHT, PHOTO_SMALL_WIDTH, PHOTO_TINY, PHOTO_TINY_HEIGHT, PHOTO_TINY_WIDTH, PREVIOUS_TAB, PRINT_VIEW, PRIVATE_FILE_ACCESS, PUBLIC_FILE_ACCESS, READ_RIGHT_TAB, SDATE_SEARCH, SEARCHENGINE_ALARM, SESSION_AUTHORIZED_FILENAMES_SET, STATS_REPORT_DIR, STATUS_PROP, STORE_XML, TEMPLATE_TAB, THUMBNAIL_LARGE_HEIGHT, THUMBNAIL_LARGE_WIDTH, THUMBNAIL_SMALL_HEIGHT, THUMBNAIL_SMALL_WIDTH, UDATE_SEARCH, UPDATE_RIGHT_TAB, UPLOAD_DIR, URL_REGEXP, WARNING_MSG, WEBAPP_PROP, WFEXPRESS_ALARM, WFREMINDER_ALARM, WORKFLOW_TAB, WORKFLOW_XML
     
    Fields inherited from interface com.jalios.util.JaliosConstants
    CRLF, MILLIS_IN_ONE_DAY, MILLIS_IN_ONE_HOUR, MILLIS_IN_ONE_MINUTE, MILLIS_IN_ONE_MONTH, MILLIS_IN_ONE_SECOND, MILLIS_IN_ONE_WEEK, MILLIS_IN_ONE_YEAR
     
    Constructor Summary
    LucenePublicationSearchEngine()
              Initialize the Lucene Search Engine
     
    Method Summary
     void add(Collection<? extends Publication> coll)
              Add given Collection of Publication to this lucene search engine.
     void add(Publication pub)
              Add given Publication to this lucene search engine.
     void addKeywordField(org.apache.lucene.document.Document doc, Publication pub, String lang, String fieldName, String fieldValue, boolean applyBoost)
              Creates an unstored Lucene Field from the given field value of the given Publication in the given language, and adds it to the given Document.
     void addRawField(org.apache.lucene.document.Document doc, Publication pub, String lang, String fieldName, String fieldValue, boolean applyBoost)
              Creates an unstored and untokenized Lucene Field from the given field value of the given Publication in the given language, and adds it to the given Document.
     void addStoredField(org.apache.lucene.document.Document doc, Publication pub, String lang, String fieldName, String fieldValue, boolean applyBoost)
              Creates a stored Lucene Field from the given field value of the given Publication in the given language, and adds it to the given Document.
     void addUnStoredField(org.apache.lucene.document.Document doc, Publication pub, String lang, String fieldName, String fieldValue, boolean applyBoost)
              Creates an unstored Lucene Field from the given field value of the given Publication in the given language, and adds it to the given Document.
     void clearAll()
              Clear all indices in this search engine (this cannot be undone!).
     void delete(Collection<? extends Publication> coll)
              Delete given Collection of Publication from this lucene search engine.
     void delete(Publication pub)
              Delete given Publication from this lucene search engine.
    protected  com.jalios.jcms.search.LuceneDataSearchEngine.DataIterator<Data> getAllDataIterator()
              This method must be implemented by the LuceneSearchEngine.
     Date getIndexingDate(Publication pub)
              Retrieve the Date at which the specified Publication was indexed in the search engine.
    protected  org.apache.log4j.Logger getLogger()
              This method must be implemented by the LuceneSearchEngine.
     SpellSuggestEngine getSpellSuggestEngine()
               
    static boolean hasAdvancedLuceneQuery(QueryHandler qh)
              Check if an advanced lucene query has been specified in the QueryHandler attribute of the specified QueryHandler.
    protected  void indexData(org.apache.lucene.index.IndexWriter writer, Data data, String lang)
              Indexes the given publication in the given language into the given index writer.
     LinkedHashMap<String,Float> search(QueryHandler qh)
              Return the identifiers of the publications matching a Lucene search.
     boolean search(QueryHandler qh, HashSet<? extends Publication> pubSet, QueryResultSet resultSet)
              Search Publication using lucene search engine.
     LinkedHashMap<String,Float> search(QueryHandler qh, List<String> idList)
              Filters the given list of publication identifiers with a Lucene search.
     void update(Collection<? extends Publication> coll)
              Update given Collection of Publication in this lucene search engine.
     void update(Publication pub)
              Update given Publication in this lucene search engine.
     
    Methods inherited from class com.jalios.jcms.search.LuceneDataSearchEngine
    addData, addDataCollection, clearIndices, deleteData, deleteDataCollection, getDirectory, getIndexingDate, getIndexingDate, getLastOptimizeDateSinceRestart, getLastOptimizeDuration, getLastReindexDateSinceRestart, getLastReindexDuration, getLuceneDocument, getOperationStartTime, getProgressState, getSearcher, index, index, isOperationRunning, optimizeIndices, reindexAll, remove, setIndexWriterOptions, updateData, updateDataCollection
     
    Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
     

    Field Detail

    REVISION

    public static final String REVISION
    See Also:
    Constant Field Values

    SPELLSUGGEST_ATTRIBUTE

    public static final String SPELLSUGGEST_ATTRIBUTE
    Attribute key under which the LucenePublicationSearchEngine stores the suggested search string in the QueryResultSet.

    See Also:
    Constant Field Values

    LUCENE_ADVANCED_QUERY_ATTRIBUTE

    public static final String LUCENE_ADVANCED_QUERY_ATTRIBUTE
    QueryHandler attribute name for the optional advanced Lucene query that can be performed with this engine.

    Example requiring a custom field:

     QueryHandler qh = new QueryHandler();
     //qh.set(...)
     qh.setAttribute(LUCENE_ADVANCED_QUERY_ATTRIBUTE, "+myField:somevalue"); 
     QueryResultSet qrs = qh.getResultSet();
     

    See Also:
    Constant Field Values
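
When the value placed in an advanced query comes from user input, characters with a special meaning in the query syntax can change the query. The sketch below builds a required clause such as "+myField:somevalue" while backslash-escaping those characters. It assumes the attribute is parsed with the classic Lucene query syntax, which this page does not state; the class and its methods are hypothetical helpers, not part of JCMS or Lucene.

```java
/** Illustrative helper for building advanced query strings (hypothetical). */
final class AdvancedQuerySketch {

  // Characters with a special meaning in the classic Lucene query syntax.
  private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

  /** Backslash-escapes query-syntax special characters in a value. */
  static String escape(String value) {
    StringBuilder sb = new StringBuilder(value.length() + 8);
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      if (SPECIAL.indexOf(c) >= 0) {
        sb.append('\\');
      }
      sb.append(c);
    }
    return sb.toString();
  }

  /** Builds a required clause of the form +field:value. */
  static String require(String field, String value) {
    return "+" + field + ":" + escape(value);
  }

  public static void main(String[] args) {
    System.out.println(require("myField", "somevalue")); // prints +myField:somevalue
    System.out.println(require("myField", "a:b"));       // prints +myField:a\:b
  }
}
```

The resulting string would be passed as the attribute value, as in the example above. Values containing whitespace are not handled by this sketch and would need quoting.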

    PUBLICATION_INDEX_DIRECTORY

    protected static final String PUBLICATION_INDEX_DIRECTORY
    See Also:
    Constant Field Values

    TITLE_FIELD

    public static final String TITLE_FIELD
    See Also:
    Constant Field Values

    ABSTRACT_FIELD

    public static final String ABSTRACT_FIELD
    See Also:
    Constant Field Values

    ALLFIELDS_FIELD

    public static final String ALLFIELDS_FIELD
    See Also:
    Constant Field Values

    AUTHORID_FIELD

    public static final String AUTHORID_FIELD
    See Also:
    Constant Field Values

    PSTATUS_FIELD

    public static final String PSTATUS_FIELD
    See Also:
    Constant Field Values

    WORKSPACEID_FIELD

    public static final String WORKSPACEID_FIELD
    See Also:
    Constant Field Values

    CDATE_FIELD

    public static final String CDATE_FIELD
    See Also:
    Constant Field Values

    PDATE_FIELD

    public static final String PDATE_FIELD
    See Also:
    Constant Field Values

    SDATE_FIELD

    public static final String SDATE_FIELD
    See Also:
    Constant Field Values

    EDATE_FIELD

    public static final String EDATE_FIELD
    See Also:
    Constant Field Values

    ADATE_FIELD

    public static final String ADATE_FIELD
    See Also:
    Constant Field Values

    FILEDOCUMENT_CONTENT_TYPE_FIELD

    public static final String FILEDOCUMENT_CONTENT_TYPE_FIELD
    See Also:
    Constant Field Values

    FILEDOCUMENT_FILE_EXTENSION_FIELD

    public static final String FILEDOCUMENT_FILE_EXTENSION_FIELD
    See Also:
    Constant Field Values

    FILEDOCUMENT_FILENAME_FIELD

    public static final String FILEDOCUMENT_FILENAME_FIELD
    See Also:
    Constant Field Values

    FILEDOCUMENT_ORIGINAL_FILENAME_FIELD

    public static final String FILEDOCUMENT_ORIGINAL_FILENAME_FIELD
    See Also:
    Constant Field Values

    HITS_TIMEOUT_PROP

    public static final String HITS_TIMEOUT_PROP
    See Also:
    Constant Field Values

    dateFormatter

    public static final DateFormat dateFormatter

    HITS_TIMEOUT

    protected static final long HITS_TIMEOUT

    spellSuggestEngine

    protected SpellSuggestEngine spellSuggestEngine
    Constructor Detail

    LucenePublicationSearchEngine

    public LucenePublicationSearchEngine()
                                  throws Exception
    Initialize the Lucene Search Engine

    Throws:
    Exception - if the Publication search engine could not be instantiated correctly
    Method Detail

    add

    public void add(Publication pub)
    Add the given Publication to this Lucene search engine. This method is asynchronous: the given data is not guaranteed to be (and in practice will not be) added immediately after the call.

    Specified by:
    add in interface PublicationSearchEngine
    Parameters:
    pub - the Publication to index.

    update

    public void update(Publication pub)
    Update the given Publication in this Lucene search engine. This method is asynchronous: the given data is not guaranteed to be (and in practice will not be) updated immediately after the call.

    Specified by:
    update in interface PublicationSearchEngine
    Parameters:
    pub - the Publication to reindex.

    delete

    public void delete(Publication pub)
    Delete the given Publication from this Lucene search engine. This method is asynchronous: the given data is not guaranteed to be (and in practice will not be) deleted immediately after the call.

    Specified by:
    delete in interface PublicationSearchEngine
    Parameters:
    pub - the Publication to delete.

    add

    public void add(Collection<? extends Publication> coll)
    Add the given Collection of Publication to this Lucene search engine. This method is asynchronous: the given data is not guaranteed to be (and in practice will not be) added immediately after the call.

    Specified by:
    add in interface PublicationSearchEngine
    Parameters:
    coll - the Collection of Publication to index.

    update

    public void update(Collection<? extends Publication> coll)
    Update the given Collection of Publication in this Lucene search engine. This method is asynchronous: the given data is not guaranteed to be (and in practice will not be) updated immediately after the call.

    Specified by:
    update in interface PublicationSearchEngine
    Parameters:
    coll - the Collection of Publication to reindex.

    delete

    public void delete(Collection<? extends Publication> coll)
    Delete the given Collection of Publication from this Lucene search engine. This method is asynchronous: the given data is not guaranteed to be (and in practice will not be) deleted immediately after the call.

    Specified by:
    delete in interface PublicationSearchEngine
    Parameters:
    coll - the Collection of Publication to delete.

    getIndexingDate

    public Date getIndexingDate(Publication pub)
    Retrieve the Date at which the specified Publication was indexed in the search engine.

    Specified by:
    getIndexingDate in interface PublicationSearchEngine
    Parameters:
    pub - the Publication for which to retrieve the indexing date.
    Returns:
    the indexing date of the publication, or null if it was not indexed.
    Since:
    jcms-6.0.1
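
Because add, update and delete are asynchronous, a caller cannot assume that a Publication is searchable as soon as the call returns; getIndexingDate returning null is the documented sign that it has not been indexed. The toy model below illustrates that interaction with a single background worker. It is not the JCMS implementation: the class, the use of String identifiers instead of Publication objects, and the shutdownAndWait helper are all assumptions made for the sketch.

```java
import java.util.Date;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Toy model of an asynchronous indexer (hypothetical); not the JCMS code. */
final class AsyncIndexSketch {

  private final ExecutorService worker = Executors.newSingleThreadExecutor();
  private final Map<String, Date> indexingDates = new ConcurrentHashMap<>();

  /** Queues the publication id for indexing and returns immediately. */
  void add(String pubId) {
    worker.execute(() -> {
      indexingDates.put(pubId, new Date());
    });
  }

  /** Returns the indexing date, or null if the id has not been indexed (yet). */
  Date getIndexingDate(String pubId) {
    return indexingDates.get(pubId);
  }

  /** Stops accepting work and waits for queued jobs; true if all completed. */
  boolean shutdownAndWait(long timeoutMillis) {
    worker.shutdown();
    try {
      return worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return false;
    }
  }

  public static void main(String[] args) {
    AsyncIndexSketch engine = new AsyncIndexSketch();
    engine.add("pub_1");
    // May still be null here: the indexing job can be waiting in the queue.
    System.out.println("right after add: " + engine.getIndexingDate("pub_1"));
    engine.shutdownAndWait(5_000);
    System.out.println("after drain:     " + engine.getIndexingDate("pub_1"));
    System.out.println("never added:     " + engine.getIndexingDate("pub_2"));
  }
}
```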

    clearAll

    public void clearAll()
    Clear all indices in this search engine (this cannot be undone!).

    Specified by:
    clearAll in interface PublicationSearchEngine

    search

    public boolean search(QueryHandler qh,
                          HashSet<? extends Publication> pubSet,
                          QueryResultSet resultSet)
    Search Publication using the Lucene search engine.
  • Searches all Lucene indices using the text of the QueryHandler (qh.getText()).
  • Uses only the Lucene Analyzer of the user's language.
  • Adds a Publication to the result set only if it is already in the given pubSet, or if pubSet is null.
  • Caution! This method ignores all DBData.

    Specified by:
    search in interface PublicationSearchEngine
    Parameters:
    qh - the QueryHandler in which to find search text and search options.
    pubSet - a HashSet containing all the Publication to search.
    if empty, search is not performed at all.
    if null, all Publication found will be returned.
    This set MUST NOT be modified by implementation.
    resultSet - the QueryResultSet that must be filled with matching Publication
    Returns:
    true if a search was performed in the PublicationSearchEngine. Useful to differentiate a query returning zero results from a query not performed due to missing parameters (text, for example).
    Since:
    jcms-5.5.0
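
The pubSet and return-value rules above can be summarised in a small model. The sketch works on plain String identifiers and a precomputed list of hits rather than on QueryHandler, Publication and QueryResultSet, so every name in it is hypothetical. Returning false for an empty pubSet is this sketch's reading of "search is not performed at all", not something the page states outright.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Model of the pubSet contract on plain ids (hypothetical); not the JCMS code. */
final class PubSetContractSketch {

  /**
   * @param text   query text; when missing, no search is performed
   * @param hits   ids matched by the (simulated) full-text search
   * @param pubSet candidate ids: null accepts every hit, empty skips the search
   * @param result filled with the accepted ids; pubSet itself is never modified
   * @return true if a search was actually performed
   */
  static boolean search(String text, List<String> hits, Set<String> pubSet, Set<String> result) {
    if (text == null || text.trim().isEmpty()) {
      return false; // missing parameter: nothing searched
    }
    if (pubSet != null && pubSet.isEmpty()) {
      return false; // empty candidate set: nothing to search in
    }
    for (String id : hits) {
      if (pubSet == null || pubSet.contains(id)) {
        result.add(id);
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> hits = Arrays.asList("a", "b", "c");

    Set<String> restricted = new LinkedHashSet<>();
    boolean ran = search("lucene", hits, new HashSet<>(Arrays.asList("b", "c", "x")), restricted);
    System.out.println(ran + " " + restricted); // prints true [b, c]

    Set<String> all = new LinkedHashSet<>();
    System.out.println(search("lucene", hits, null, all) + " " + all); // prints true [a, b, c]

    Set<String> none = new LinkedHashSet<>();
    System.out.println(search("lucene", hits, Collections.<String>emptySet(), none) + " " + none); // prints false []
  }
}
```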

    search

    public LinkedHashMap<String,Float> search(QueryHandler qh,
                                              List<String> idList)
    Filters the given list of publication identifiers with a Lucene search.

    Specified by:
    search in interface PublicationSearchEngine
    Parameters:
    qh - the QueryHandler in which to find search text and search options.
    idList - the list of publication identifiers
    Returns:
    a map from the identifiers of the publications matching the Lucene query to their scores. Its keys are a subset of idList and respect its order.
    Since:
    jcms-6.0.0
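
The returned map keeps only the identifiers that match, in the order they had in idList. A minimal sketch of that filtering step is shown below; the scores map stands in for the outcome of the Lucene query, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Order-preserving id filter (hypothetical); not the JCMS implementation. */
final class IdListFilterSketch {

  /** Keeps the ids of idList that have a score, preserving idList order. */
  static LinkedHashMap<String, Float> filter(List<String> idList, Map<String, Float> scores) {
    LinkedHashMap<String, Float> matches = new LinkedHashMap<>();
    for (String id : idList) {
      Float score = scores.get(id);
      if (score != null) {
        matches.put(id, score);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    Map<String, Float> scores = new HashMap<>();
    scores.put("a", 0.9f);
    scores.put("c", 0.4f);
    scores.put("z", 1.0f); // matched by the query but absent from idList

    // "b" has no score and is dropped; "z" is not in idList and is ignored.
    System.out.println(filter(Arrays.asList("c", "a", "b"), scores)); // prints {c=0.4, a=0.9}
  }
}
```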

    search

    public LinkedHashMap<String,Float> search(QueryHandler qh)
    Description copied from interface: PublicationSearchEngine
    Return the identifiers of the publications matching a Lucene search.

    Specified by:
    search in interface PublicationSearchEngine
    Parameters:
    qh - the QueryHandler in which to find search text and search options.
    Returns:
    a map from the identifiers of the publications matching the Lucene query to their scores.

    getSpellSuggestEngine

    public SpellSuggestEngine getSpellSuggestEngine()

    getLogger

    protected org.apache.log4j.Logger getLogger()
    Description copied from class: LuceneDataSearchEngine
    This method must be implemented by the LuceneSearchEngine. It must return the logger to be used for log messages.

    Specified by:
    getLogger in class LuceneDataSearchEngine
    Returns:
    Logger of this engine.

    getAllDataIterator

    protected com.jalios.jcms.search.LuceneDataSearchEngine.DataIterator<Data> getAllDataIterator()
    Description copied from class: LuceneDataSearchEngine
    This method must be implemented by the LuceneSearchEngine. It must return a DataIterator used to iterate over all Data to index. Used by LuceneDataSearchEngine.reindexAll().

    Specified by:
    getAllDataIterator in class LuceneDataSearchEngine

    indexData

    protected void indexData(org.apache.lucene.index.IndexWriter writer,
                             Data data,
                             String lang)
                      throws IOException
    Indexes the given publication in the given language into the given index writer.

    Specified by:
    indexData in class LuceneDataSearchEngine
    Throws:
    IOException

    addStoredField

    public void addStoredField(org.apache.lucene.document.Document doc,
                               Publication pub,
                               String lang,
                               String fieldName,
                               String fieldValue,
                               boolean applyBoost)
    Creates a stored Lucene Field from the given field value of the given Publication in the given language, and adds it to the given Document.

    Parameters:
    doc - the lucene Document in which field will be added
    pub - the publication for which field is added
    lang - the language in which field is added, if relevant
    fieldName - the name of the field in the lucene index
    fieldValue - the value of the field in the lucene index
    applyBoost - whether to apply the boost. Useful for appendable fields, where the boost should be applied only to the first element.

    addUnStoredField

    public void addUnStoredField(org.apache.lucene.document.Document doc,
                                 Publication pub,
                                 String lang,
                                 String fieldName,
                                 String fieldValue,
                                 boolean applyBoost)
    Creates an unstored Lucene Field from the given field value of the given Publication in the given language, and adds it to the given Document.

    Parameters:
    doc - the lucene Document in which field will be added
    pub - the publication for which field is added
    lang - the language in which field is added, if relevant
    fieldName - the name of the field in the lucene index
    fieldValue - the value of the field in the lucene index
    applyBoost - whether to apply the boost. Useful for appendable fields, where the boost should be applied only to the first element.

    addKeywordField

    public void addKeywordField(org.apache.lucene.document.Document doc,
                                Publication pub,
                                String lang,
                                String fieldName,
                                String fieldValue,
                                boolean applyBoost)
    Creates an unstored Lucene Field from the given field value of the given Publication in the given language, and adds it to the given Document.

    Parameters:
    doc - the lucene Document in which field will be added
    pub - the publication for which field is added
    lang - the language in which field is added, if relevant
    fieldName - the name of the field in the lucene index
    fieldValue - the value of the field in the lucene index
    applyBoost - whether to apply the boost. Useful for appendable fields, where the boost should be applied only to the first element.

    addRawField

    public void addRawField(org.apache.lucene.document.Document doc,
                            Publication pub,
                            String lang,
                            String fieldName,
                            String fieldValue,
                            boolean applyBoost)
    Creates an unstored and untokenized Lucene Field from the given field value of the given Publication in the given language, and adds it to the given Document.

    Parameters:
    doc - the lucene Document in which field will be added
    pub - the publication for which field is added
    lang - the language in which field is added, if relevant
    fieldName - the name of the field in the lucene index
    fieldValue - the value of the field in the lucene index
    applyBoost - whether to apply the boost. Useful for appendable fields, where the boost should be applied only to the first element.
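
The applyBoost parameter shared by the four add...Field methods matters for appendable fields, where several values are added under the same field name. The sketch below models why, assuming the behaviour documented for Lucene releases of that period: boosts of same-named fields in one document are multiplied together. Under that assumption, boosting every appended value would compound the boost, while boosting only the first keeps it at the intended value. The classes are hypothetical stand-ins, not Lucene or JCMS types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Model of boost handling for appendable fields (hypothetical types). */
final class AppendableBoostSketch {

  /** One value appended under a shared field name, with its own boost. */
  static final class FieldInstance {
    final String value;
    final float boost;

    FieldInstance(String value, float boost) {
      this.value = value;
      this.boost = boost;
    }
  }

  /** Appends all values; when firstOnly is true, only the first carries the boost. */
  static List<FieldInstance> append(List<String> values, float fieldBoost, boolean firstOnly) {
    List<FieldInstance> instances = new ArrayList<>();
    boolean applyBoost = true;
    for (String value : values) {
      instances.add(new FieldInstance(value, applyBoost ? fieldBoost : 1.0f));
      if (firstOnly) {
        applyBoost = false; // later values are added with a neutral boost
      }
    }
    return instances;
  }

  /** Assumed combination rule: boosts of same-named fields multiply. */
  static float effectiveBoost(List<FieldInstance> instances) {
    float product = 1.0f;
    for (FieldInstance instance : instances) {
      product *= instance.boost;
    }
    return product;
  }

  public static void main(String[] args) {
    List<String> values = Arrays.asList("alpha", "beta", "gamma");
    System.out.println(effectiveBoost(append(values, 2.0f, true)));  // prints 2.0
    System.out.println(effectiveBoost(append(values, 2.0f, false))); // prints 8.0
  }
}
```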

    hasAdvancedLuceneQuery

    public static boolean hasAdvancedLuceneQuery(QueryHandler qh)
    Check if an advanced lucene query has been specified in the QueryHandler attribute of the specified QueryHandler.

    Parameters:
    qh - the QueryHandler to check
    Returns:
    true if the QueryHandler contains the attribute LucenePublicationSearchEngine.LUCENE_ADVANCED_QUERY_ATTRIBUTE
    Since:
    jcms-6.2


    Copyright © 2001-2010 Jalios SA. All Rights Reserved.