Class MetadataDocument

  • Direct Known Subclasses:
    OAIMetadataDocument

    public class MetadataDocument
    extends Object
    This class holds all harvested information and provides methods for DocumentProcessor to create an XContentBuilder instance from it.
    Author:
    Uwe Schindler
    • Constructor Detail

      • MetadataDocument

        public MetadataDocument(HarvesterConfig iconfig)
        Constructor that creates an empty instance for the supplied index configuration.
    • Method Detail

      • loadFromElasticSearchHit

        public void loadFromElasticSearchHit(org.elasticsearch.search.SearchHit hit)
                                      throws Exception
        "Harvests" an Elasticsearch SearchHit from the index for re-parsing. Extracts the XML blob, identifier, and datestamp from the document. Stored fields are not restored; they are regenerated by re-executing all XPath expressions and templates. The HarvesterConfig is used for index-specific conversions.
        Throws:
        Exception
      • setFinalDOM

        public void setFinalDOM(Document dom)
        Sets the final (transformed) XML contents as a DOM tree. Invalidates the cache.
      • getFinalDOM

        public Document getFinalDOM()
        Returns the XML contents as a DOM tree.
      • getConverter

        public MetadataDocument.XMLConverter getConverter()
        Returns a converter instance that performs transformation and validation according to the index configuration.
      • setDeleted

        public void setDeleted(boolean deleted)
        Marks a harvested document as deleted. A deleted document is not indexed and will be explicitly deleted from the index. A deleted document should not contain XML data; any XML data present is ignored.
      • isDeleted

        public boolean isDeleted()
        Returns deletion status.
        See Also:
        setDeleted(boolean)
      • setDatestamp

        public void setDatestamp(Instant datestamp)
        Sets the datestamp (last modification time of the document file).
      • setIdentifier

        public void setIdentifier(String identifier)
        Sets the document identifier.
      • getKeyValuePairs

        public KeyValuePairs getKeyValuePairs()
                                       throws Exception
        Converts this instance to an Elasticsearch JSON node.
        Returns:
        KeyValuePairs, or null if the document was deleted.
        Throws:
        Exception - if an exception occurs during transformation (various types of exceptions can be thrown).
        IllegalStateException - if the index configuration is unknown.
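
        The relationship between getKeyValuePairs and the protected helper methods documented in this section can be pictured as a template method. The following is only a sketch under stated assumptions: SketchDocument is a hypothetical stand-in for this class, a plain Map replaces KeyValuePairs, the field names are illustrative, and the call order (empty container, then fields, then finalization) is assumed rather than taken from the implementation.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for MetadataDocument; a Map replaces KeyValuePairs. */
class SketchDocument {
    private String identifier;
    private Instant datestamp;
    private boolean deleted;

    void setIdentifier(String identifier) { this.identifier = identifier; }
    void setDatestamp(Instant datestamp) { this.datestamp = datestamp; }
    void setDeleted(boolean deleted) { this.deleted = deleted; }
    boolean isDeleted() { return deleted; }

    /** Assumed order: empty container, then fields, then finalization. */
    Map<String, Object> getKeyValuePairs() {
        if (deleted) {
            return null; // a deleted document produces no output
        }
        Map<String, Object> kv = createEmptyKeyValuePairs();
        addFields(kv);
        finalizeKeyValuePairs(kv);
        return kv;
    }

    protected Map<String, Object> createEmptyKeyValuePairs() {
        if (identifier == null || identifier.isEmpty()) {
            throw new IllegalStateException("identifier is empty");
        }
        Map<String, Object> kv = new LinkedHashMap<>();
        kv.put("identifier", identifier); // field names are illustrative only
        kv.put("datestamp", datestamp);
        return kv;
    }

    protected void addFields(Map<String, Object> kv) {
        // Placeholder: the documented method evaluates XPath expressions and templates.
    }

    protected void finalizeKeyValuePairs(Map<String, Object> kv) {
        // Placeholder: the finalization step is not described in detail here.
    }
}
```

        In this sketch, a caller checks the return value for null before indexing, which mirrors the documented behaviour for deleted documents.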
      • createEmptyKeyValuePairs

        protected KeyValuePairs createEmptyKeyValuePairs()
                                                  throws Exception
        Helper method that generates an empty KeyValuePairs instance. The standard fields are set from the document properties (identifier, datestamp).
        Returns:
        KeyValuePairs, or null if the document was deleted.
        Throws:
        Exception - if an exception occurs during transformation (various types of exceptions can be thrown).
        IllegalStateException - if identifier is empty.
      • finalizeKeyValuePairs

        protected void finalizeKeyValuePairs(KeyValuePairs kv)
                                      throws Exception
        Helper method that finalizes the JSON document.
        Throws:
        Exception
      • addFields

        protected void addFields(KeyValuePairs kv)
                          throws Exception
        Helper method that adds all fields to the given KeyValuePairs instance. This method executes all XPath expressions and templates and converts the results.
        Throws:
        Exception - if an exception occurs during transformation (various types of exceptions can be thrown).
      • processFilters

        protected boolean processFilters()
                                  throws Exception
        Helper method that evaluates all filters. This method executes the XPath and converts the results to a boolean. The results of all filters are combined according to the ACCEPT/DENY type.
        Throws:
        Exception - if an exception occurs during transformation (various types of exceptions can be thrown).
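
        The exact combination rule for ACCEPT/DENY filters is not spelled out above. The following sketch shows one plausible reading, assuming a default verdict that is overridden by the last matching filter. FilterSketch, Filter, FilterType, and the defaultType parameter are hypothetical names, and only the standard javax.xml.xpath API is used.

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class FilterSketch {
    enum FilterType { ACCEPT, DENY }

    /** Hypothetical filter: an XPath expression plus its ACCEPT/DENY type. */
    static final class Filter {
        final String xpath;
        final FilterType type;
        Filter(String xpath, FilterType type) { this.xpath = xpath; this.type = type; }
    }

    /** Assumption: start from the default verdict; the last matching filter decides. */
    static boolean processFilters(Document dom, List<Filter> filters, FilterType defaultType)
            throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        boolean accept = (defaultType == FilterType.ACCEPT);
        for (Filter f : filters) {
            // XPathConstants.BOOLEAN applies the XPath boolean() conversion to the result
            Boolean matched = (Boolean) xp.evaluate(f.xpath, dom, XPathConstants.BOOLEAN);
            if (matched) {
                accept = (f.type == FilterType.ACCEPT);
            }
        }
        return accept;
    }

    /** Parses an XML string into a DOM tree for the example. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }
}
```

        Under these assumptions, a document with no matching filter simply receives the default verdict, so an empty filter list never changes the outcome.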
      • addSystemVariables

        protected void addSystemVariables(Map<QName,Object> vars)
        Helper method to register all standard variables for the XPath/Templates evaluation. Override this method to register any special variables dependent on the MetadataDocument implementation. The variables must be registered in the supplied Map.
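
        An override might look like the following sketch. BaseDocumentSketch and SetAwareDocumentSketch are hypothetical stand-ins (the real base class is not reproduced here), and the variable names docIdentifier and setSpec are invented for illustration. The evaluate helper only demonstrates how a Map of QName keys can back a standard XPathVariableResolver.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

/** Hypothetical stand-in for the base class; only the hook is modelled. */
class BaseDocumentSketch {
    protected String identifier = "doc-1";

    protected void addSystemVariables(Map<QName, Object> vars) {
        vars.put(new QName("docIdentifier"), identifier); // illustrative variable name
    }
}

/** Hypothetical subclass that registers one implementation-specific variable. */
class SetAwareDocumentSketch extends BaseDocumentSketch {
    @Override
    protected void addSystemVariables(Map<QName, Object> vars) {
        super.addSystemVariables(vars); // keep the standard variables
        vars.put(new QName("setSpec"), "maps"); // illustrative extra variable
    }

    /** Evaluates an XPath expression with the registered variables in scope. */
    String evaluate(String expression) throws Exception {
        Map<QName, Object> vars = new HashMap<>();
        addSystemVariables(vars);
        XPath xp = XPathFactory.newInstance().newXPath();
        xp.setXPathVariableResolver(vars::get); // look variables up in the Map
        Document empty = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        return xp.evaluate(expression, empty);
    }
}
```

        Calling super.addSystemVariables first keeps the standard variables available and lets the subclass add to them rather than replace them.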
      • processXPathVariables

        protected final void processXPathVariables()
                                            throws Exception
        Helper method to process all user-supplied variables for the XPath/Templates evaluation. The variables are stored in thread-local storage.
        Throws:
        Exception - if an exception occurs during transformation (various types of exceptions can be thrown).
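
        Thread-local storage of XPath variables can be sketched with java.lang.ThreadLocal. XPathVariableStore and its methods are hypothetical names used only to illustrate the idea, not the storage class actually used by this library.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;

/** Hypothetical per-thread variable store; each thread sees its own Map. */
class XPathVariableStore {
    private static final ThreadLocal<Map<QName, Object>> VARS =
            ThreadLocal.withInitial(HashMap::new);

    static void put(QName name, Object value) {
        VARS.get().put(name, value);
    }

    static Object get(QName name) {
        return VARS.get().get(name);
    }

    /** Drops the current thread's variables, e.g. after a document is processed. */
    static void clear() {
        VARS.remove();
    }
}
```

        With this layout, several harvester threads could evaluate expressions concurrently without seeing each other's variables, provided each thread clears its store when it is done.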
      • walkNodeTexts

        protected void walkNodeTexts(StringBuilder sb,
                                     Node n,
                                     boolean topLevel)
        Helper method to walk through a DOM tree node (n) and collect strings.

        For internal use only!
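
        A recursive text walk over a DOM node can be sketched as follows. The separator handling is an assumption: the role of topLevel is not documented above, so this version treats it as a marker for the starting node and surrounds nested elements with single spaces. NodeTextWalker and textOf are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeTextWalker {
    /**
     * Appends the text found below n to sb. Assumption: nested elements are
     * surrounded by a single space, and topLevel marks the starting node.
     */
    static void walkNodeTexts(StringBuilder sb, Node n, boolean topLevel) {
        if (n == null) {
            return;
        }
        switch (n.getNodeType()) {
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                sb.append(n.getNodeValue());
                break;
            case Node.DOCUMENT_NODE:
            case Node.ELEMENT_NODE:
                if (!topLevel) {
                    sb.append(' ');
                }
                for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
                    walkNodeTexts(sb, c, false);
                }
                if (!topLevel) {
                    sb.append(' ');
                }
                break;
            default:
                break; // comments, processing instructions, etc. are skipped
        }
    }

    /** Convenience helper for the example: parse a string and collect its text. */
    static String textOf(String xml) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringBuilder sb = new StringBuilder();
        walkNodeTexts(sb, dom.getDocumentElement(), true);
        return sb.toString();
    }
}
```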

      • addField

        protected void addField(KeyValuePairs kv,
                                FieldConfig f,
                                String val)
                         throws Exception
        Helper method to add a field in the correct format to the given KeyValuePairs instance. The format is defined by the FieldConfig. The value is given as a string.

        For internal use only!

        Throws:
        Exception - if an exception occurs during transformation (various types of exceptions can be thrown).
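
        The idea of converting a string value according to a field's configured format can be sketched as below. FieldConversionSketch, its DataType enum, and the Map used in place of KeyValuePairs are hypothetical. The actual set of formats supported by FieldConfig is not listed in this section, so the three types shown are illustrative only.

```java
import java.time.Instant;
import java.util.Map;

class FieldConversionSketch {
    /** Hypothetical formats; the real FieldConfig may define different ones. */
    enum DataType { STRING, NUMBER, DATETIME }

    /** Converts val according to type and stores it under name. */
    static void addField(Map<String, Object> kv, String name, DataType type, String val) {
        Object converted;
        switch (type) {
            case NUMBER:
                converted = Double.valueOf(val.trim()); // throws NumberFormatException on bad input
                break;
            case DATETIME:
                converted = Instant.parse(val.trim()); // expects ISO-8601, e.g. 2020-01-01T00:00:00Z
                break;
            default:
                converted = val; // strings are stored unchanged
                break;
        }
        kv.put(name, converted);
    }
}
```

        Conversion failures surface as exceptions in this sketch, which is consistent with the documented method declaring throws Exception.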