Class LucenePublicationSearchEngine

  • All Implemented Interfaces:
    JcmsConstants, PublicationSearchEngine, JaliosConstants

    public class LucenePublicationSearchEngine
    extends LuceneDataSearchEngine
    implements PublicationSearchEngine, JcmsConstants
    This PublicationSearchEngine is responsible for indexing and searching JCMS content using Lucene.

    Architecture and notable points:
    • One Lucene index per language: WEB-INF/data/lucene/PublicationsIndices/<lang>/.
    • One Document per indexed Publication.
    • Date fields are indexed using the "yyyyMMdd" format.
    • Only String and String[] fields are added to the common appendable field ALLFIELDS_FIELD used for searching.
    • Index optimization runs on the schedule specified by the property "search-engine.optimize-schedule" (jdring's AlarmEntry cron-like format).
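To make the date format above concrete, here is a minimal sketch of formatting a date as "yyyyMMdd" with the JDK only. The class and method names are hypothetical, and the UTC time zone is an assumption made so the output is deterministic; the engine's actual time zone handling is not documented here.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

class IndexedDateFormatSketch {

  // Formats a date with the "yyyyMMdd" pattern named in the class description.
  // UTC is an assumption of this sketch, not a documented engine setting.
  static String toIndexedDate(Date date) {
    SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
    return fmt.format(date);
  }

  // Builds a Date at midnight UTC for the given calendar day (month is 1-based).
  static Date utcDate(int year, int month, int day) {
    Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
    cal.clear();
    cal.set(year, month - 1, day);
    return cal.getTime();
  }

  public static void main(String[] args) {
    String earlier = toIndexedDate(utcDate(2009, 12, 31));
    String later = toIndexedDate(utcDate(2010, 1, 2));
    // String comparison agrees with chronological order for this fixed-width format.
    System.out.println(earlier + " < " + later + " : " + (earlier.compareTo(later) < 0));
    // prints "20091231 < 20100102 : true"
  }
}
```

Because the pattern is fixed-width and ordered from year to day, plain string comparison sorts values chronologically. The time of day is dropped, so such a field has day resolution only.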

    Possible Hooks/Modification:
    • Specify analyzer for each language: Analyzer getAnalyzer(String lang);
    Since:
    jcms-5.5.0
    • Field Detail

      • SPELLSUGGEST_ATTRIBUTE

        public static final java.lang.String SPELLSUGGEST_ATTRIBUTE
        Attribute key under which LucenePublicationSearchEngine stores the suggested search string in the QueryResultSet attributes.
        See Also:
        Constant Field Values
      • PUBLICATION_ENGINE_NAME

        protected static final java.lang.String PUBLICATION_ENGINE_NAME
        See Also:
        Constant Field Values
      • PUBLICATION_INDEX_DIRECTORY

        protected static final java.lang.String PUBLICATION_INDEX_DIRECTORY
        See Also:
        Constant Field Values
      • OPAUTHORID_FIELD

        public static final java.lang.String OPAUTHORID_FIELD
        See Also:
        Constant Field Values
      • WORKSPACEID_FIELD

        public static final java.lang.String WORKSPACEID_FIELD
        See Also:
        Constant Field Values
      • CATEGORYID_FIELD

        public static final java.lang.String CATEGORYID_FIELD
        See Also:
        Constant Field Values
      • CLASSIFICATIONLEVEL_FIELD

        public static final java.lang.String CLASSIFICATIONLEVEL_FIELD
        See Also:
        Constant Field Values
      • FILEDOCUMENT_CONTENT_TYPE_FIELD

        public static final java.lang.String FILEDOCUMENT_CONTENT_TYPE_FIELD
        Field name for the content type of the file.
        See Also:
        Constant Field Values
      • FILEDOCUMENT_FILE_EXTENSION_FIELD

        public static final java.lang.String FILEDOCUMENT_FILE_EXTENSION_FIELD
        Field name for the extension of the file, e.g. "txt", or IOUtil.getExtension(fileDoc.getFile())
        See Also:
        Constant Field Values
      • FILEDOCUMENT_FILENAME_FIELD

        public static final java.lang.String FILEDOCUMENT_FILENAME_FIELD
        Field name for the relative path in jcms, e.g. "upload/docs/file.txt", or fileDoc.getFilename()
        See Also:
        Constant Field Values
      • FILEDOCUMENT_ORIGINAL_FILENAME_FIELD

        public static final java.lang.String FILEDOCUMENT_ORIGINAL_FILENAME_FIELD
        Field name for the original filename as uploaded by the user, e.g. "My file.txt", or fileDoc.getOriginalFilename()
        See Also:
        Constant Field Values
      • FILEDOCUMENT_MODIFIED_FIELD

        public static final java.lang.String FILEDOCUMENT_MODIFIED_FIELD
        Field name for the last modified date of the file (time in ms) when it was indexed
        See Also:
        Constant Field Values
      • FILEDOCUMENT_INDEXING_DATE_FIELD

        public static final java.lang.String FILEDOCUMENT_INDEXING_DATE_FIELD
        Field name for the date at which the FileDocument was indexed with non-empty content
        See Also:
        Constant Field Values
      • FILEDOCUMENT_CONTENT

        public static final java.lang.String FILEDOCUMENT_CONTENT
        Field name for the content of the file as extracted by an additional parser in FileProcessor
        See Also:
        Constant Field Values
      • HITS_TIMEOUT_PROP

        public static final java.lang.String HITS_TIMEOUT_PROP
        See Also:
        Constant Field Values
      • dateFormatter

        public static final java.text.DateFormat dateFormatter
      • HITS_TIMEOUT

        protected static final long HITS_TIMEOUT
      • LIMIT_REACHED_DURING_JSTORE_SEARCH

        public static final java.lang.String LIMIT_REACHED_DURING_JSTORE_SEARCH
        QueryHandler attribute set (to a boolean value) if the maximum number of Lucene results allowed (as specified by the property query.lucene.pub.max-results) was reached during the JStore search.
        Since:
        JCMS-6660
        See Also:
        Constant Field Values
      • TOTAL_HITS_DURING_JSTORE_SEARCH

        public static final java.lang.String TOTAL_HITS_DURING_JSTORE_SEARCH
        QueryHandler attribute set to a Long value indicating the total number of hits found in the Lucene index for the current JStore search.
        Since:
        JCMS-6660, Long used since JCMS-7415 (was Integer before)
        See Also:
        Constant Field Values
      • LIMIT_REACHED_DURING_DB_SEARCH

        public static final java.lang.String LIMIT_REACHED_DURING_DB_SEARCH
        QueryHandler attribute set (to a boolean value) if the maximum number of Lucene results allowed (as specified by the property query.lucene.pub.max-results) was reached during the DB search.
        Since:
        JCMS-6660
        See Also:
        Constant Field Values
      • TOTAL_HITS_DURING_DB_SEARCH

        public static final java.lang.String TOTAL_HITS_DURING_DB_SEARCH
        QueryHandler attribute set to a Long value indicating the total number of hits found in the Lucene index for the current DB search.
        Since:
        JCMS-6660, Long used since JCMS-7415 (was Integer before)
        See Also:
        Constant Field Values
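Since the total-hits attributes changed from Integer to Long in JCMS-7415, code that must work across versions may prefer to read them through Number. The sketch below uses a plain Map as a stand-in for the QueryHandler attributes; the class name and the two key strings are hypothetical placeholders, not the real constant values.

```java
import java.util.HashMap;
import java.util.Map;

class TotalHitsSketch {

  // Hypothetical keys: the real values of TOTAL_HITS_DURING_JSTORE_SEARCH and
  // LIMIT_REACHED_DURING_JSTORE_SEARCH are not reproduced here.
  static final String TOTAL_HITS_KEY = "example.total-hits";
  static final String LIMIT_REACHED_KEY = "example.limit-reached";

  // Accepts both Integer (older versions) and Long values; returns -1 if absent.
  static long readTotalHits(Map<String, Object> attributes) {
    Object value = attributes.get(TOTAL_HITS_KEY);
    return (value instanceof Number) ? ((Number) value).longValue() : -1L;
  }

  // Treats a missing attribute as "limit not reached".
  static boolean isLimitReached(Map<String, Object> attributes) {
    return Boolean.TRUE.equals(attributes.get(LIMIT_REACHED_KEY));
  }

  public static void main(String[] args) {
    Map<String, Object> attributes = new HashMap<>();
    attributes.put(TOTAL_HITS_KEY, Integer.valueOf(42)); // pre-JCMS-7415 style value
    attributes.put(LIMIT_REACHED_KEY, Boolean.TRUE);
    System.out.println(readTotalHits(attributes) + " " + isLimitReached(attributes));
    // prints "42 true"
  }
}
```

Casting directly to Long would fail with a ClassCastException on an Integer value, which is why the sketch goes through Number.longValue().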
    • Constructor Detail

      • LucenePublicationSearchEngine

        public LucenePublicationSearchEngine()
                                      throws java.lang.Exception
        Initialize the Lucene Search Engine
        Throws:
        java.lang.Exception - if the Publication search engine could not be instantiated correctly
    • Method Detail

      • add

        public void add​(Publication pub)
        Adds the given Publication to this Lucene search engine. This method is asynchronous: the given data will most likely not have been added yet when the call returns.
        Specified by:
        add in interface PublicationSearchEngine
        Parameters:
        pub - the Publication to index.
      • update

        public void update​(Publication pub)
        Updates the given Publication in this Lucene search engine. This method is asynchronous: the given data will most likely not have been updated yet when the call returns.
        Specified by:
        update in interface PublicationSearchEngine
        Parameters:
        pub - the Publication to reindex.
      • delete

        public void delete​(Publication pub)
        Deletes the given Publication from this Lucene search engine. This method is asynchronous: the given data will most likely not have been deleted yet when the call returns.
        Specified by:
        delete in interface PublicationSearchEngine
        Parameters:
        pub - the Publication to remove from the index.
      • add

        public void add​(java.util.Collection<? extends Publication> coll)
        Adds the given Collection of Publications to this Lucene search engine. This method is asynchronous: the given data will most likely not have been added yet when the call returns.
        Specified by:
        add in interface PublicationSearchEngine
        Parameters:
        coll - the Collection of Publications to index.
      • update

        public void update​(java.util.Collection<? extends Publication> coll)
        Updates the given Collection of Publications in this Lucene search engine. This method is asynchronous: the given data will most likely not have been updated yet when the call returns.
        Specified by:
        update in interface PublicationSearchEngine
        Parameters:
        coll - the Collection of Publications to reindex.
      • delete

        public void delete​(java.util.Collection<? extends Publication> coll)
        Deletes the given Collection of Publications from this Lucene search engine. This method is asynchronous: the given data will most likely not have been deleted yet when the call returns.
        Specified by:
        delete in interface PublicationSearchEngine
        Parameters:
        coll - the Collection of Publications to remove from the index.
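The asynchronous behaviour of add, update and delete can be illustrated with a small stand-in. The sketch below is not the engine's implementation: the class, its single worker thread and the string ids are all assumptions, used only to show why a caller should not expect a change to be visible as soon as the call returns.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AsyncIndexSketch {

  // Stand-in for the index content: the ids currently "indexed".
  private final Set<String> indexedIds = ConcurrentHashMap.newKeySet();
  // A single worker applies queued operations in submission order.
  private final ExecutorService worker = Executors.newSingleThreadExecutor();

  // Queues the operation and returns immediately, like an asynchronous add.
  void add(String id) {
    worker.execute(() -> indexedIds.add(id));
  }

  // Queues the removal and returns immediately, like an asynchronous delete.
  void delete(String id) {
    worker.execute(() -> indexedIds.remove(id));
  }

  boolean isIndexed(String id) {
    return indexedIds.contains(id);
  }

  // Stops accepting work and waits for queued operations; false on timeout or interrupt.
  boolean shutdownAndWait(long timeoutMs) {
    worker.shutdown();
    try {
      return worker.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    AsyncIndexSketch engine = new AsyncIndexSketch();
    engine.add("p1");
    engine.add("p2");
    engine.delete("p2");
    // Right here, isIndexed("p1") may be true or false: the worker may not have run yet.
    boolean drained = engine.shutdownAndWait(5000);
    System.out.println(drained + " " + engine.isIndexed("p1") + " " + engine.isIndexed("p2"));
  }
}
```

Only after the queue has drained is the state reliable: "p1" is present and "p2" is absent. With the real engine, code that searches right after indexing should be written to tolerate stale results.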
      • getIndexingDate

        public java.util.Date getIndexingDate​(Publication pub)
        Retrieve the Date at which the specified Publication was indexed in the search engine.
        Specified by:
        getIndexingDate in interface PublicationSearchEngine
        Parameters:
        pub - the Publication for which to retrieve the indexing date.
        Returns:
        the indexing date of the publication, or null if it was not indexed.
        Since:
        jcms-6.0.1
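Because getIndexingDate can return null, callers need to handle the "never indexed" case explicitly. The sketch below shows one possible use: deciding whether something needs (re)indexing. The helper class and the lastModified parameter are hypothetical, and the comparison rule is an assumption of this example rather than documented engine behaviour.

```java
import java.util.Date;

class IndexingDateSketch {

  // indexingDate: the value a call such as getIndexingDate(pub) returned (null if never indexed).
  // lastModified: hypothetical modification date of the same publication.
  static boolean needsIndexing(Date indexingDate, Date lastModified) {
    if (indexingDate == null) {
      return true; // never indexed
    }
    return lastModified != null && lastModified.after(indexingDate);
  }

  public static void main(String[] args) {
    Date indexed = new Date(1_000_000L);
    Date modifiedLater = new Date(2_000_000L);
    System.out.println(needsIndexing(null, modifiedLater));    // prints "true"
    System.out.println(needsIndexing(indexed, modifiedLater)); // prints "true"
    System.out.println(needsIndexing(modifiedLater, indexed)); // prints "false"
  }
}
```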
      • search

        public boolean search​(QueryHandler qh,
                              java.util.HashSet<? extends Publication> pubSet,
                              QueryResultSet resultSet)
        Searches for JStore Publications using the Lucene search engine.

        • Performs a Lucene text search using QueryHandler.getText(), which is required.
        • Adds a Publication to the result set only if it is already in the given pubSet, or if pubSet is null.
        • Caution! This method ignores all JcmsDB Publications.
        Specified by:
        search in interface PublicationSearchEngine
        Parameters:
        qh - the QueryHandler in which to find the search text and search options.
        pubSet - a HashSet containing all the Publications to search.
        If empty, the search is not performed at all.
        If null, all Publications found are returned.
        This set MUST NOT be modified by the implementation.
        resultSet - the QueryResultSet that must be filled with the matching Publications.
        Returns:
        true if a search was performed in the PublicationSearchEngine. Useful to differentiate a query returning zero results from a query not performed due to missing parameters (the text, for example).
        Since:
        jcms-5.5.0
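The contract described for pubSet and the boolean return value can be sketched with plain collections. Everything below is a stand-in: the class, the tiny in-memory index and the string ids are made up, and returning false for an empty pubSet is this sketch's reading of "search is not performed at all".

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class SearchContractSketch {

  // Stand-in for the Lucene index: publication id -> indexed text (made-up data).
  static final Map<String, String> INDEX = new LinkedHashMap<>();
  static {
    INDEX.put("p1", "annual report");
    INDEX.put("p2", "meeting report");
    INDEX.put("p3", "holiday schedule");
  }

  // Returns true only if a search was actually performed; never modifies pubSet.
  static boolean search(String text, Set<String> pubSet, Set<String> resultSet) {
    if (text == null || text.trim().isEmpty()) {
      return false; // required text is missing
    }
    if (pubSet != null && pubSet.isEmpty()) {
      return false; // empty candidate set: nothing to search
    }
    for (Map.Entry<String, String> entry : INDEX.entrySet()) {
      boolean matches = entry.getValue().contains(text);
      boolean allowed = pubSet == null || pubSet.contains(entry.getKey());
      if (matches && allowed) {
        resultSet.add(entry.getKey());
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Set<String> all = new LinkedHashSet<>();
    System.out.println(search("report", null, all) + " " + all); // prints "true [p1, p2]"

    Set<String> candidates = new LinkedHashSet<>();
    candidates.add("p2");
    candidates.add("p3");
    Set<String> filtered = new LinkedHashSet<>();
    System.out.println(search("report", candidates, filtered) + " " + filtered); // prints "true [p2]"

    Set<String> none = new LinkedHashSet<>();
    System.out.println(search("budget", null, none) + " " + none); // prints "true []"
    System.out.println(search("", null, none));                    // prints "false"
  }
}
```

The last two calls show the distinction the return value is meant to carry: a performed search with zero results returns true, while a search skipped for lack of text returns false.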
      • getMaximumResults

        public static int getMaximumResults()
        Retrieves the maximum number of results that a search is allowed to return.

        Can be configured using the property query.lucene.pub.max-results.

        Returns:
        the maximum number of results retrieved (results beyond this limit are ignored)
        Since:
        jcms-10.0.0
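As an illustration of a property-driven limit, the sketch below reads query.lucene.pub.max-results from a java.util.Properties object. The class name, the fallback value of 1000 and the validation rules are assumptions of this example; the engine's real default and parsing are not documented here.

```java
import java.util.Properties;

class MaxResultsSketch {

  static final String MAX_RESULTS_PROP = "query.lucene.pub.max-results";
  // Hypothetical fallback used only by this sketch.
  static final int FALLBACK_MAX_RESULTS = 1000;

  // Returns the configured limit, or the fallback if the value is absent, malformed or not positive.
  static int readMaximumResults(Properties props) {
    String raw = props.getProperty(MAX_RESULTS_PROP);
    if (raw == null) {
      return FALLBACK_MAX_RESULTS;
    }
    try {
      int value = Integer.parseInt(raw.trim());
      return value > 0 ? value : FALLBACK_MAX_RESULTS;
    } catch (NumberFormatException e) {
      return FALLBACK_MAX_RESULTS;
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(MAX_RESULTS_PROP, "250");
    System.out.println(readMaximumResults(props));            // prints "250"
    System.out.println(readMaximumResults(new Properties())); // prints "1000"
  }
}
```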
      • search

        public java.util.LinkedHashMap<java.lang.String,​java.lang.Float> search​(QueryHandler qh)
        Searches for JcmsDB Publications using the Lucene search engine.

        • Performs a Lucene text search using QueryHandler.getText(), which is required.
        • Returns the matching publication ids with their scores; this overload takes no pubSet, so results are not filtered against a given set.
        • Caution! This method ignores all JStore Publications.
        Specified by:
        search in interface PublicationSearchEngine
        Parameters:
        qh - the QueryHandler in which to find the search text and search options.
        Returns:
        a map from publication id to score.
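A caller typically walks the returned map entry by entry. The sketch below filters ids by a minimum score while keeping the map's iteration order, which a LinkedHashMap guarantees to be insertion order. The class name, ids and scores are made up, and whether the engine inserts entries in relevance order is not stated here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ScoredResultsSketch {

  // Collects the ids whose score is at least minScore, in the map's iteration order.
  static List<String> idsWithScoreAtLeast(LinkedHashMap<String, Float> scores, float minScore) {
    List<String> ids = new ArrayList<>();
    for (Map.Entry<String, Float> entry : scores.entrySet()) {
      if (entry.getValue() >= minScore) {
        ids.add(entry.getKey());
      }
    }
    return ids;
  }

  public static void main(String[] args) {
    // Made-up ids and scores standing in for a search(qh) result.
    LinkedHashMap<String, Float> scores = new LinkedHashMap<>();
    scores.put("id_a", 0.5f);
    scores.put("id_b", 0.25f);
    scores.put("id_c", 0.75f);
    System.out.println(idsWithScoreAtLeast(scores, 0.5f)); // prints "[id_a, id_c]"
  }
}
```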
      • getLogger

        protected org.apache.log4j.Logger getLogger()
        Description copied from class: LuceneDataSearchEngine
        This method must be implemented by the LuceneSearchEngine. It must return the logger to be used for log messages.
        Specified by:
        getLogger in class LuceneDataSearchEngine
        Returns:
        Logger of this engine.
      • indexData

        protected void indexData​(org.apache.lucene.index.IndexWriter writer,
                                 Data data,
                                 java.lang.String lang)
                          throws java.io.IOException
        Indexes the given publication, in the given language, into the given index writer.
        Specified by:
        indexData in class LuceneDataSearchEngine
        Throws:
        java.io.IOException
      • addTextFieldStored

        public void addTextFieldStored​(org.apache.lucene.document.Document doc,
                                       Publication pub,
                                       java.lang.String lang,
                                       java.lang.String fieldName,
                                       java.lang.String fieldValue)
        Creates a stored Lucene Field from the given field value of the given Publication in the given language, and adds it to the given Document.
        Parameters:
        doc - the lucene Document in which field will be added
        pub - the publication for which field is added
        lang - the language in which field is added, if relevant
        fieldName - the name of the field in the lucene index
        fieldValue - the value of the field in the lucene index
      • addTextFieldNotStored

        public void addTextFieldNotStored​(org.apache.lucene.document.Document doc,
                                          Publication pub,
                                          java.lang.String lang,
                                          java.lang.String fieldName,
                                          java.lang.String fieldValue)
        Creates an unstored Lucene Field from the given field value of the given Publication in the given language, and adds it to the given Document.
        Parameters:
        doc - the lucene Document in which field will be added
        pub - the publication for which field is added
        lang - the language in which field is added, if relevant
        fieldName - the name of the field in the lucene index
        fieldValue - the value of the field in the lucene index
      • addStringFieldStored

        public void addStringFieldStored​(org.apache.lucene.document.Document doc,
                                         Publication pub,
                                         java.lang.String lang,
                                         java.lang.String fieldName,
                                         java.lang.String fieldValue)
        Creates a stored Lucene Field from the given field value of the given Publication in the given language, and adds it to the given Document.
        Parameters:
        doc - the lucene Document in which field will be added
        pub - the publication for which field is added
        lang - the language in which field is added, if relevant
        fieldName - the name of the field in the lucene index
        fieldValue - the value of the field in the lucene index
      • addStringFieldNotStored

        public void addStringFieldNotStored​(org.apache.lucene.document.Document doc,
                                            Publication pub,
                                            java.lang.String lang,
                                            java.lang.String fieldName,
                                            java.lang.String fieldValue)
        Creates an unstored, untokenized Lucene Field from the given field value of the given Publication in the given language, and adds it to the given Document.
        Parameters:
        doc - the lucene Document in which field will be added
        pub - the publication for which field is added
        lang - the language in which field is added, if relevant
        fieldName - the name of the field in the lucene index
        fieldValue - the value of the field in the lucene index