Class WebPageMetaDataExtractorUtils


  • public final class WebPageMetaDataExtractorUtils
    extends java.lang.Object
    Utilities for extracting the metadata of a web page (title, description, images, ...).
    Since:
    jcms-9.0.4 && jcms-10
    Author:
    Kevin Bransard
    • Method Summary

      Modifier and Type Method Description
      static java.lang.String extractContent(org.jsoup.nodes.Document document, java.lang.String attrName, java.lang.String... cssQueries)
      Returns the content extracted from the document using the given CSS queries and attribute name
      static WebPageMetaData getWebPageMetaData(java.lang.String url, java.lang.String userAgent)
      Returns the metadata of the page at the given URL as a WebPageMetaData object
      static WebPageMetaData getWebPageMetaDataFromHtml(java.lang.String html)
      Returns the metadata found in the given HTML source as a WebPageMetaData object
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • getWebPageMetaDataFromHtml

        public static WebPageMetaData getWebPageMetaDataFromHtml(java.lang.String html)
        Returns the metadata found in the given HTML source as a WebPageMetaData object. The source is traversed directly; no URL is involved.
        Parameters:
        html - the HTML source to extract metadata from
        Returns:
        a WebPageMetaData object
        Since:
        jcms-9.0.4
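
To give a rough idea of what extracting metadata from an HTML string involves, here is a minimal, self-contained sketch. It is not the implementation of getWebPageMetaDataFromHtml (the signature of extractContent suggests the class relies on jsoup); regular expressions are swapped in here only to keep the example free of dependencies, and they are far less robust than a real HTML parser. The class name HtmlMetaSketch and its methods are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not part of JCMS. */
final class HtmlMetaSketch {

  // Captures the text between <title> and </title>, case-insensitively, across lines.
  private static final Pattern TITLE =
      Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

  // Captures the content of <meta name="description" content="...">.
  // Simplification: double quotes only, and the attributes must appear in this order.
  private static final Pattern DESCRIPTION =
      Pattern.compile("<meta\\s+name=\"description\"\\s+content=\"([^\"]*)\"",
          Pattern.CASE_INSENSITIVE);

  private HtmlMetaSketch() {}

  static Optional<String> title(String html) {
    return firstGroup(TITLE, html);
  }

  static Optional<String> description(String html) {
    return firstGroup(DESCRIPTION, html);
  }

  // Returns the trimmed first capture group of the first match, if any.
  private static Optional<String> firstGroup(Pattern pattern, String html) {
    if (html == null) {
      return Optional.empty();
    }
    Matcher matcher = pattern.matcher(html);
    return matcher.find() ? Optional.of(matcher.group(1).trim()) : Optional.empty();
  }
}
```

For real pages, a parser-based approach such as the one this class exposes is preferable, since attribute order, quoting and nesting vary widely.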
      • getWebPageMetaData

        public static WebPageMetaData getWebPageMetaData(java.lang.String url,
                                                         java.lang.String userAgent)
        Returns the metadata of the page at the given URL as a WebPageMetaData object. The page is retrieved by connecting to the URL.
        Parameters:
        url - the URL of the page to extract metadata from
        userAgent - the user agent to send when accessing the URL (a default user agent is used if null)
        Returns:
        a WebPageMetaData object
        Since:
        jcms-9.0.4
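
The null handling described for userAgent can be sketched with the standard library alone. This is an illustration of the fallback pattern, not the code used by getWebPageMetaData: the class UserAgentSketch, its methods, and in particular the DEFAULT_USER_AGENT value are assumptions, since this page does not state which default JCMS actually sends.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical illustration only; not part of JCMS. */
final class UserAgentSketch {

  // Assumed placeholder; the real default user agent is not documented on this page.
  static final String DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; ExampleFetcher/1.0)";

  private UserAgentSketch() {}

  // Falls back to the default when the caller passes null.
  static String resolveUserAgent(String userAgent) {
    return userAgent != null ? userAgent : DEFAULT_USER_AGENT;
  }

  // Prepares an HTTP connection carrying the resolved User-Agent header.
  // openConnection() only creates the object; no request is sent until the caller connects.
  static HttpURLConnection prepare(String url, String userAgent) {
    try {
      HttpURLConnection connection =
          (HttpURLConnection) URI.create(url).toURL().openConnection();
      connection.setRequestProperty("User-Agent", resolveUserAgent(userAgent));
      return connection;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Passing an explicit user agent is useful when a site serves different markup, or refuses the request, depending on the client it believes it is talking to.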
      • extractContent

        public static java.lang.String extractContent(org.jsoup.nodes.Document document,
                                                      java.lang.String attrName,
                                                      java.lang.String... cssQueries)
        Returns the content extracted from the document using the given CSS queries and attribute name
        Parameters:
        document - the jsoup Document to search
        attrName - the name of the attribute to read on the elements matched by the CSS queries (can be empty)
        cssQueries - the CSS queries used to search for elements
        Returns:
        the value extracted using the CSS queries and the attribute name
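
Because cssQueries is a varargs parameter, one plausible reading is that the queries act as ordered fallbacks, with the first one that yields a usable value winning. That reading is an assumption; this page does not confirm it, nor does it say what is returned when nothing matches. The sketch below models only that fallback idea, with a plain lookup function standing in for the jsoup query. QueryFallbackSketch and firstNonBlank are hypothetical names, and returning an empty string on no match is a choice made for this example.

```java
import java.util.function.Function;

/** Hypothetical illustration only; not part of JCMS. */
final class QueryFallbackSketch {

  private QueryFallbackSketch() {}

  // Applies `lookup` to each query in order and returns the first non-blank result, trimmed.
  // Returns "" when no query yields a value (a choice of this sketch, not documented behavior).
  static String firstNonBlank(Function<String, String> lookup, String... queries) {
    for (String query : queries) {
      String value = lookup.apply(query);
      if (value != null && !value.trim().isEmpty()) {
        return value.trim();
      }
    }
    return "";
  }
}
```

Under this reading, a caller would list the most specific source first (for example an Open Graph meta tag) and more generic ones afterwards (for example the title element).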