Class LuceneDataSearchEngine

    • Field Detail

      • INDEXING_DATE_FIELD

        public static final java.lang.String INDEXING_DATE_FIELD
        See Also:
        Constant Field Values
      • INDEXING_DATE_EXTRAINFO

        public static final java.lang.String INDEXING_DATE_EXTRAINFO
        See Also:
        Constant Field Values
      • channel

        protected final Channel channel
      • engineName

        protected final java.lang.String engineName
      • directoryName

        protected final java.lang.String directoryName
      • multilingual

        protected final boolean multilingual
      • langList

        protected final java.util.List<java.lang.String> langList
        Languages in which the engine indexes and searches its content.

        • all site languages for multilingual engine
        • default site language for monolingual engine
      • langToIndexDirMap

        protected final java.util.Map<java.lang.String,​org.apache.lucene.store.FSDirectory> langToIndexDirMap
        Lucene FSDirectory instances in which the engine indexes and searches its content.

        • one Directory per site language for a multilingual engine
        • a single Directory, in the default site language, for a monolingual engine
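
        The language rules described for langList and langToIndexDirMap can be pictured with the minimal sketch below. It uses only the JDK: IndexLanguages, resolve and directoriesFor are hypothetical names, plain Path values stand in for Lucene FSDirectory instances, and the one-sub-directory-per-language layout is an assumption made for illustration, not the documented behavior of the engine.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of the JCMS API. */
final class IndexLanguages {

  private IndexLanguages() {}

  /** All site languages for a multilingual engine, only the default language otherwise. */
  static List<String> resolve(boolean multilingual, List<String> siteLanguages, String defaultLanguage) {
    if (multilingual) {
      return new ArrayList<>(siteLanguages);
    }
    return List.of(defaultLanguage);
  }

  /** One index location per resolved language (Path stands in for FSDirectory). */
  static Map<String, Path> directoriesFor(Path baseDir, List<String> langList) {
    Map<String, Path> result = new LinkedHashMap<>();
    for (String lang : langList) {
      // Assumed layout: <baseDir>/<lang>
      result.put(lang, baseDir.resolve(lang));
    }
    return result;
  }
}
```

        A lookup for a language that is absent from the map returns null, which mirrors the null return documented for getDirectory(String).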
      • langToIndexWriterMap

        protected final java.util.Map<java.lang.String,​org.apache.lucene.index.IndexWriter> langToIndexWriterMap
        Lucene IndexWriter reused for all write operations.
      • indexAccessLock

        protected final java.lang.Object indexAccessLock
    • Constructor Detail

      • LuceneDataSearchEngine

        protected LuceneDataSearchEngine​(java.lang.String engineName,
                                         java.lang.String directoryName,
                                         boolean multilingual)
                                  throws java.lang.Exception
        Construct a new Lucene Data Search Engine given a directory name.
        Parameters:
        engineName - the name of the engine (eg "Publication", "Member", "Category")
        directoryName - the name of the directory to create (eg Publication
        multilingual - true to use one index per language, false to use only one index
        Throws:
        java.lang.Exception - on any error
    • Method Detail

      • createSnapshot

        public void createSnapshot​(java.io.File targetDirectory)
                            throws java.io.IOException
        Create a snapshot of every index in this engine, inside the specified target directory.
        Parameters:
        targetDirectory - the directory in which to store the snapshot
        Example : WEB-INF/data/lucene-snapshots/snapshot-2021-04-29-1640
        Throws:
        java.io.IOException
        Since:
        jcms-10.0.5 / JCMS-8395
        See Also:
        IndexSnapshotManager.createSnapshot()
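
        A target directory named like the documented example could be built as in the sketch below. SnapshotDirectories and targetFor are hypothetical names, and the date pattern is only inferred from the example path snapshot-2021-04-29-1640; the naming actually used by IndexSnapshotManager may differ.

```java
import java.io.File;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper, not part of the JCMS API. */
final class SnapshotDirectories {

  // Pattern inferred from the documented example "snapshot-2021-04-29-1640" (assumption).
  private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyy-MM-dd-HHmm");

  private SnapshotDirectories() {}

  /** Builds a child of the given parent named "snapshot-" followed by the formatted time. */
  static File targetFor(File parent, LocalDateTime when) {
    return new File(parent, "snapshot-" + STAMP.format(when));
  }
}
```

        The resulting File can then be passed as the targetDirectory argument of createSnapshot(File).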
      • index

        protected void index​(org.apache.lucene.index.IndexWriter writer,
                             java.util.Collection<? extends Data> coll,
                             java.lang.String lang)
                      throws java.io.IOException
        Expert: Index a Collection of Data into Lucene.
        This method is NOT synchronized; the caller is responsible for synchronization.
        Parameters:
        writer - The Lucene directory writer with which data should be added
        coll - a collection of Data, must not be null
        lang - the language in which data are being added
        Throws:
        java.io.IOException - on io error
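
        Because the expert write methods are not synchronized, the caller has to serialize access itself. The sketch below models that with JDK types only: SynchronizedIndexingSketch is a stand-in class, a plain list plays the role of the index, and using a dedicated lock object in the manner of indexAccessLock is an assumption about how a subclass might coordinate writers.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Stand-in model of caller-side locking; none of these types belong to JCMS or Lucene. */
final class SynchronizedIndexingSketch {

  /** Plays the role of indexAccessLock. */
  private final Object indexAccessLock = new Object();

  /** Plays the role of the index content; ArrayList is deliberately not thread-safe. */
  private final List<String> documents = new ArrayList<>();

  /** Unsynchronized write, comparable to the expert index(...) methods. */
  private void index(Collection<String> items) {
    documents.addAll(items);
  }

  /** The caller takes the lock before delegating to the unsynchronized method. */
  void indexSafely(Collection<String> items) {
    synchronized (indexAccessLock) {
      index(items);
    }
  }

  /** Reads under the same lock so that concurrent writes are visible. */
  int size() {
    synchronized (indexAccessLock) {
      return documents.size();
    }
  }
}
```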
      • index

        protected void index​(org.apache.lucene.index.IndexWriter writer,
                             java.util.Iterator<? extends Data> iterator,
                             int iteratorSize,
                             java.lang.String lang)
                      throws java.io.IOException
        Expert: Index all Data returned by the specified Iterator into Lucene.
        This method is NOT synchronized; the caller is responsible for synchronization.
        Parameters:
        writer - The Lucene directory writer with which data should be added
        iterator - an iterator of Data, must not be null
        lang - the language in which data are being added
        Throws:
        java.io.IOException - on io error
      • remove

        protected void remove​(org.apache.lucene.index.IndexWriter writer,
                              java.util.Collection<? extends Data> coll,
                              java.lang.String lang)
                       throws java.io.IOException
        Expert: Remove a Collection of Data from the Lucene index.
        This method is NOT synchronized; the caller is responsible for synchronization.
        Parameters:
        writer - The Lucene directory writer with which data should be removed
        coll - a collection of Data, must not be null
        Throws:
        java.io.IOException - if the directory could not be opened or deletion could not be performed
      • getPrimaryTerm

        protected org.apache.lucene.index.Term getPrimaryTerm​(Data data)
        Retrieve a Lucene Term suitable for use as a primary key when searching, removing or updating the unique Lucene document for the specified data.
        Returns:
        a Term instance, must not return null
      • getDirectory

        public org.apache.lucene.store.FSDirectory getDirectory​(java.lang.String lang)
        Expert: Returns the Lucene directory used for the specified language.
        Warning: do not modify the index through this directory (LuceneDataSearchEngine relies on its own modifications for optimization purposes); use this method for read-only access to the directory.
        Parameters:
        lang - the language of JCMS (ISO-639) in which to retrieve the Directory
        Returns:
        the FSDirectory of the specified language or null if no Directory is available for this language.
      • getIndexWriter

        public org.apache.lucene.index.IndexWriter getIndexWriter​(java.lang.String lang)
        Expert: Returns the lucene writer used for the specified language.
        Parameters:
        lang - the language of JCMS (ISO-639) in which to retrieve the IndexWriter
        Returns:
        the IndexWriter of the specified language or null if no Directory is available for this language.
      • getSearcherManager

        public org.apache.lucene.search.SearcherManager getSearcherManager​(java.lang.String lang)
        Expert: Returns the lucene SearcherManager used for the specified language.
        Parameters:
        lang - the language of JCMS (ISO-639) in which to retrieve the SearcherManager
        Returns:
        the SearcherManager of the specified language or null if no Directory is available for this language.
      • getAnalyzer

        public org.apache.lucene.analysis.Analyzer getAnalyzer​(java.lang.String lang,
                                                               boolean isIndexing)
        Expert: Retrieve the Lucene Analyzer to use during search and indexing.
        Parameters:
        lang - the ISO-639 language code of the text analyzed, may be null
        isIndexing - true if the returned analyzer is to be used for indexing, false in any other case (e.g. during search)
        Returns:
        an instance of Analyzer to use, never returns null.
        Since:
        9.0.3
      • getSimilarity

        public org.apache.lucene.search.similarities.Similarity getSimilarity()
        Expert: Returns the Similarity implementation to be used by the searchers of this LuceneDataSearchEngine.
        Returns:
        the Similarity instance to use with this search engine, never returns null.
      • getLuceneDocument

        public org.apache.lucene.document.Document getLuceneDocument​(Data data,
                                                                     java.lang.String lang)
        Returns the Lucene Document corresponding to the specified Data in the index of the specified language.
        Parameters:
        data - the Data being looked for
        lang - the language in which to check
        Returns:
        a Lucene Document or null if it could not be found
        Since:
        jcms-6.0.1
      • getIndexingDate

        public java.util.Date getIndexingDate​(Data data,
                                              java.lang.String lang)
        Returns the date at which the specified Data has been indexed for the specified language.
        Parameters:
        data - the Data being looked for
        lang - the language in which to check
        Returns:
        a Date or null if it could not be found
        Since:
        jcms-6.0.1
      • getIndexingDate

        public java.util.Date getIndexingDate​(Data data)
        Retrieve the Date at which the specified Data was indexed in the main language of the site.
        Parameters:
        data - the Data for which to retrieve the indexing date.
        Returns:
        the indexing date of the Data or null if it was not indexed.
        Since:
        jcms-6.0.1
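
        Since getIndexingDate returns null for Data that is not indexed, callers need a null check before comparing dates. The sketch below shows one possible use, under stated assumptions: IndexFreshness and needsIndexing are hypothetical names, and how the modification date of the Data is obtained is left to the caller.

```java
import java.util.Date;

/** Hypothetical helper, not part of the JCMS API. */
final class IndexFreshness {

  private IndexFreshness() {}

  /**
   * @param indexingDate value returned by getIndexingDate(...), null when the Data is not indexed
   * @param lastModified modification date of the Data, supplied by the caller, may be null
   * @return true when the Data is missing from the index or was indexed before its last modification
   */
  static boolean needsIndexing(Date indexingDate, Date lastModified) {
    if (indexingDate == null) {
      return true;
    }
    return lastModified != null && indexingDate.before(lastModified);
  }
}
```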
      • addData

        protected void addData​(Data data)
        Add the given Data to this Lucene search engine. This method is asynchronous: the given data may not be (and almost certainly will not be) available in the index immediately after the call.
      • updateData

        protected void updateData​(Data data)
        Update the given Data in this Lucene search engine. This method is asynchronous: the updated data may not be (and almost certainly will not be) available in the index immediately after the call.
      • deleteData

        protected void deleteData​(Data data)
        Delete the given Data from this Lucene search engine. This method is asynchronous: the deletion may not be (and almost certainly will not be) effective in the index immediately after the call.
      • addDataCollection

        protected void addDataCollection​(java.util.Collection<? extends Data> coll)
        Add the given Collection of Data to this Lucene search engine. This method is asynchronous: the given data may not be (and almost certainly will not be) available in the index immediately after the call.
      • updateDataCollection

        protected void updateDataCollection​(java.util.Collection<? extends Data> coll)
        Update the given Collection of Data in this Lucene search engine. This method is asynchronous: the updated data may not be (and almost certainly will not be) available in the index immediately after the call.
      • deleteDataCollection

        protected void deleteDataCollection​(java.util.Collection<? extends Data> coll)
        Delete the given Collection of Data from this Lucene search engine. This method is asynchronous: the deletion may not be (and almost certainly will not be) effective in the index immediately after the call.
      • clearIndices

        protected void clearIndices()
        Delete all Documents from all indices (each existing index is overwritten with a new one). Warning: this operation cannot be undone! It is run against the indexing thread: it does not return until the indexing process is done, and it blocks the indexing thread from running while it does its job.
      • optimizeIndices

        public void optimizeIndices()
        Optimize all indices of the LuceneSearchEngine. Warning: this is a potentially long and heavy process on a large index; do not call it unless you are sure of what you are doing. It is run against the indexing thread: it does not return until the indexing process is done, and it blocks the indexing thread from running while it does its job.
      • getLastOptimizeDateSinceRestart

        public java.util.Date getLastOptimizeDateSinceRestart()
        Returns:
        a date indicating the last time the optimize was done, or null if no optimization was done.
      • getLastOptimizeDuration

        public long getLastOptimizeDuration()
        Returns:
        the duration, in milliseconds, of the last optimize operation since restart (or 0 if none occurred).
      • reindexAll

        public void reindexAll()
                        throws java.io.IOException
        Clears the Lucene indices of this search engine, then reindexes all content retrieved using the protected method getAllDataIterator(). It is run against the indexing thread: it does not return until the indexing process is done, and it blocks the indexing thread from running while it does its job. You can check the progress of the operation using isOperationRunning() and getProgressState().
        Throws:
        java.io.IOException - if an error occurs during indexing
      • reindex

        public void reindex​(LuceneDataSearchEngine.ReindexOptions options)
                     throws java.io.IOException
        Reindex all Data matching the specified reindex options. It is run against the indexing thread: it does not return until the indexing process is done, and it blocks the indexing thread from running while it does its job. You can check the progress of the operation using isOperationRunning() and getProgressState().
        Parameters:
        options - the reindex options selecting the Data to reindex
        Throws:
        java.io.IOException
        Since:
        jcms-10.0.5 / JCMS-8170
      • getLastReindexDateSinceRestart

        public java.util.Date getLastReindexDateSinceRestart()
        Returns:
        a date indicating the last time the reindex was done, or null if no reindex was done.
      • getLastReindexDuration

        public long getLastReindexDuration()
        Returns:
        the duration, in milliseconds, of the last reindex operation since restart (or 0 if none occurred).
      • acquireSearcher

        protected org.apache.lucene.search.IndexSearcher acquireSearcher​(java.lang.String[] languages,
                                                                         java.lang.String defaultLanguage)
                                                                  throws java.io.IOException
        Acquire a searcher to search in the specified languages.
        Parameters:
        languages - the languages in which search is requested
        defaultLanguage - language used if no languages were explicitly specified and if the current language is not available either (ISO-639 language code)
        Returns:
        an IndexSearcher instance
        Throws:
        java.io.IOException
      • releaseSearcher

        protected void releaseSearcher​(java.lang.String[] languages,
                                       java.lang.String defaultLanguage,
                                       org.apache.lucene.search.IndexSearcher searcher)
        Release the specified searcher, which was acquired for the specified languages.
        Parameters:
        languages - the languages in which search was requested
        defaultLanguage - language used if no languages were explicitly specified and if the current language is not available either (ISO-639 language code)
        searcher - the searcher that was acquired through acquireSearcher(String[], String)
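
        acquireSearcher and releaseSearcher are meant to be used as a pair with the same language arguments. A common way to guarantee the release is a try/finally block, sketched below with JDK types only. SearcherLeaseSketch, acquire, release and search are stand-in names, and a plain Object replaces the Lucene IndexSearcher so that the example stays self-contained.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Stand-in model of the acquire/release pairing; not JCMS or Lucene code. */
final class SearcherLeaseSketch {

  /** Counts searchers that were acquired and not yet released. */
  private final AtomicInteger leased = new AtomicInteger();

  /** Stand-in for acquireSearcher(String[], String). */
  Object acquire(String[] languages, String defaultLanguage) {
    leased.incrementAndGet();
    return new Object();
  }

  /** Stand-in for releaseSearcher(String[], String, IndexSearcher). */
  void release(String[] languages, String defaultLanguage, Object searcher) {
    leased.decrementAndGet();
  }

  int leasedCount() {
    return leased.get();
  }

  /** Runs a query and releases the searcher even if the query throws. */
  <T> T search(String[] languages, String defaultLanguage, Function<Object, T> query) {
    Object searcher = acquire(languages, defaultLanguage);
    try {
      return query.apply(searcher);
    } finally {
      // Same language arguments as the acquire call
      release(languages, defaultLanguage, searcher);
    }
  }
}
```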
      • clearSearcher

        protected void clearSearcher()
        Close the current searchers and clear them for future renewal. Called after an index change.
      • getIndexingLatch

        public LuceneDataSearchEngine.IndexingLatch getIndexingLatch​(Data data)
        Retrieve a new IndexingLatch useful to be notified of the end of the next indexing operation to take place on the specified Data.

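           Data original = ...
           Data update = original.getUpdateInstance();
           // update.set(...)
           // Request the latch before triggering the update
           IndexingLatch latch = searchEngine.getIndexingLatch(original);
           update.performUpdate()...;
           // Block until the indexing operation triggered by the update is done
           latch.await();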
         
        Parameters:
        data - the Data that must be monitored for indexing
        Returns:
        an IndexingLatch instance, or null if specified Data was null
        Since:
        jcms-8.0.2, jcms-9, JCMS-3805
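
        The interplay between asynchronous indexing and a latch can be modeled with JDK concurrency classes, as in the sketch below. It is an illustration only: AsyncIndexSketch is a stand-in class, a CountDownLatch stands in for IndexingLatch, String identifiers stand in for Data, and the internals of the real engine are not documented here.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Stand-in model of asynchronous indexing plus a latch; not JCMS code. */
final class AsyncIndexSketch {

  /** Single worker thread, playing the role of the indexing thread. */
  private final ExecutorService indexingThread = Executors.newSingleThreadExecutor();
  private final Set<String> index = ConcurrentHashMap.newKeySet();
  private final Map<String, CountDownLatch> latches = new ConcurrentHashMap<>();

  /** Registers a latch released by the next indexing operation on the given id. */
  CountDownLatch getIndexingLatch(String id) {
    return latches.computeIfAbsent(id, key -> new CountDownLatch(1));
  }

  /** Queues the write and returns at once; the id is not searchable yet. */
  void addData(String id) {
    indexingThread.execute(() -> {
      index.add(id);
      CountDownLatch latch = latches.remove(id);
      if (latch != null) {
        latch.countDown();
      }
    });
  }

  boolean contains(String id) {
    return index.contains(id);
  }

  void shutdown() {
    indexingThread.shutdown();
  }
}
```

        As in the documented snippet, the latch is requested before the write is triggered, so the end of that write cannot be missed.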
      • isOperationRunning

        public boolean isOperationRunning()
        Returns:
        true if an operation whose progress is being tracked (reindexing, optimizing) is currently running
        See Also:
        getProgressState()
      • getProgressState

        public int getProgressState()
        Returns:
        a percentage showing current state of operation, or 100 if no operation is running
        See Also:
        isOperationRunning()
      • getOperationStartTime

        public long getOperationStartTime()
        Returns:
        the time at which the current operation was started, or 0 if no operation is running
        See Also:
        isOperationRunning()
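
        Because reindexing and optimizing block their caller, progress is typically observed from another thread. The polling loop below is a minimal sketch of that idea: OperationStatus is a stand-in interface exposing the two documented status methods, while ProgressMonitor, watch and the polling interval are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for the two status methods; the real engine is not needed to follow the loop. */
interface OperationStatus {
  boolean isOperationRunning();
  int getProgressState();
}

/** Hypothetical monitor, not part of the JCMS API. */
final class ProgressMonitor {

  private ProgressMonitor() {}

  /** Polls until no operation is running and returns each change of percentage observed. */
  static List<Integer> watch(OperationStatus status, long pollMillis) throws InterruptedException {
    List<Integer> seen = new ArrayList<>();
    while (status.isOperationRunning()) {
      int percent = status.getProgressState();
      // Record the value only when it differs from the previous observation
      if (seen.isEmpty() || seen.get(seen.size() - 1) != percent) {
        seen.add(percent);
      }
      Thread.sleep(pollMillis);
    }
    return seen;
  }
}
```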
      • getLogger

        protected abstract org.apache.log4j.Logger getLogger()
        This method must be implemented by the LuceneSearchEngine. It must return the logger to be used for log messages.
        Returns:
        Logger of this engine.
      • indexData

        protected abstract void indexData​(org.apache.lucene.index.IndexWriter writer,
                                          Data data,
                                          java.lang.String lang)
                                   throws java.io.IOException
        This method must be implemented by the LuceneSearchEngine. It must index the given data, in the given language, into the given index writer.
        Throws:
        java.io.IOException