Interface FileParser

  • All Superinterfaces:
    FileActionComponent
    All Known Implementing Classes:
    SampleProcessor

    public interface FileParser
    extends FileActionComponent
    A class that implements this interface is a parser for at least one kind of file. It extracts (parses) the text from the files it supports.
    Version:
    $Revision: 106102 $
    • Method Detail

      • extractText

        java.lang.String extractText(java.io.File file,
                                     java.util.Map<java.lang.String,java.lang.Object> ctxt)
                              throws ProcessingException
        Parsers must implement at least this method, which extracts the text from the specified file and returns it.
        Parameters:
        file - the file to parse
        ctxt - a Map used to share information between processing steps.
        Returns:
        the text to index (may be empty), or null if no action was performed. Do not return null on error; throw a ProcessingException instead.
        Throws:
        ProcessingException - if the text could not be extracted.
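To illustrate the return contract above (text, possibly empty; null when no action was performed; an exception on error), here is a minimal sketch of a parser for plain-text files. It assumes that .txt files are UTF-8 encoded and that returning null is the right signal for a file this parser does not handle. PlainTextParser is a hypothetical name, and ProcessingException and FileActionComponent are local stand-ins for the real JCMS types, declared only so the sketch compiles on its own.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Map;

// Local stand-ins for the JCMS types, only so this sketch compiles on its own.
class ProcessingException extends Exception {
  ProcessingException(String message, Throwable cause) {
    super(message, cause);
  }
}

interface FileActionComponent { }

interface FileParser extends FileActionComponent {
  String extractText(File file, Map<String, Object> ctxt) throws ProcessingException;
}

// Hypothetical parser for plain-text files, assumed to be UTF-8 encoded.
class PlainTextParser implements FileParser {
  @Override
  public String extractText(File file, Map<String, Object> ctxt) throws ProcessingException {
    // Assumption: a file this parser does not support means "no action", so return null.
    if (!file.getName().toLowerCase().endsWith(".txt")) {
      return null;
    }
    try {
      // An empty file yields "", which is a valid (empty) text to index.
      return new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
    } catch (IOException e) {
      // On error, throw rather than returning null.
      throw new ProcessingException("Cannot read " + file, e);
    }
  }
}
```

Keeping null for "not handled" and an exception for "failed" lets a caller distinguish a skipped file from a broken one.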
      • extractText

        default void extractText(java.io.File inFile,
                                 java.io.File outFile,
                                 java.util.Map<java.lang.String,java.lang.Object> ctxt)
                          throws ProcessingException
        Parsers may implement this method for more efficient text extraction.

        It lets a parser handle large files more efficiently, that is, consuming less memory by never loading the whole text into memory.

        The default implementation delegates to the simpler, historical extractText(File, Map) method.

        Parameters:
        inFile - the file to parse
        outFile - the UTF-8 text file to which the extracted text must be written.
        ctxt - a Map used to share information between processing steps.
        Throws:
        ProcessingException - if the text could not be extracted or saved.
        Since:
        JCMS-5312
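The sketch below illustrates both sides of the three-argument method. The default method shown is an assumption about what delegating to extractText(File, Map) might look like (write the returned text to outFile as UTF-8, and write nothing when null is returned); it is not the actual JCMS source. StreamingTextParser is a hypothetical parser that overrides the method to copy a UTF-8 text file through a fixed-size buffer, so its memory use does not grow with the file size. As before, ProcessingException and FileActionComponent are local stand-ins.

```java
import java.io.File;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Map;

// Local stand-ins for the JCMS types, only so this sketch compiles on its own.
class ProcessingException extends Exception {
  ProcessingException(String message, Throwable cause) {
    super(message, cause);
  }
}

interface FileActionComponent { }

interface FileParser extends FileActionComponent {
  String extractText(File file, Map<String, Object> ctxt) throws ProcessingException;

  // Assumed shape of the default: delegate to the historical method, then
  // save the returned text as UTF-8. Not the actual JCMS implementation.
  default void extractText(File inFile, File outFile, Map<String, Object> ctxt)
      throws ProcessingException {
    String text = extractText(inFile, ctxt);
    if (text == null) {
      return; // assumption: no action was performed, so nothing is written
    }
    try {
      Files.write(outFile.toPath(), text.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new ProcessingException("Cannot save text to " + outFile, e);
    }
  }
}

// Hypothetical parser for UTF-8 text files that streams large inputs.
class StreamingTextParser implements FileParser {
  @Override
  public String extractText(File file, Map<String, Object> ctxt) throws ProcessingException {
    // Historical variant: the whole text is held in memory.
    try {
      return new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new ProcessingException("Cannot read " + file, e);
    }
  }

  @Override
  public void extractText(File inFile, File outFile, Map<String, Object> ctxt)
      throws ProcessingException {
    // Copy through a fixed buffer of 8192 chars: memory use does not depend on file size.
    char[] buffer = new char[8192];
    try (Reader in = Files.newBufferedReader(inFile.toPath(), StandardCharsets.UTF_8);
         Writer out = Files.newBufferedWriter(outFile.toPath(), StandardCharsets.UTF_8)) {
      int n;
      while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
      }
    } catch (IOException e) {
      throw new ProcessingException("Cannot extract " + inFile + " to " + outFile, e);
    }
  }
}
```

With this split, a parser that only implements extractText(File, Map) keeps working through the default method, while a parser meant for large files can override the three-argument method to stream its output.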