de.pangaea.metadataportal.harvester
Class MetadataDocument

java.lang.Object
  extended by de.pangaea.metadataportal.harvester.MetadataDocument
Direct Known Subclasses:
OAIMetadataDocument

public class MetadataDocument
extends Object

This class holds all harvested information and provides the methods that IndexBuilder uses to create a Lucene Document instance from it.

Author:
Uwe Schindler

Nested Class Summary
 class MetadataDocument.XMLConverter
          This class handles the transformation from any source to the "official" metadata format and can also validate the result.
 
Field Summary
protected  Date datestamp
           
protected  boolean deleted
           
protected  SingleIndexConfig iconfig
          The index configuration.
protected  String identifier
           
 
Constructor Summary
MetadataDocument(SingleIndexConfig iconfig)
          Constructor that creates an empty instance for the supplied index configuration.
 
Method Summary
protected  void addDefaultField(org.apache.lucene.document.Document ldoc)
          Helper method that adds the default field to the given Lucene Document instance.
protected  void addField(org.apache.lucene.document.Document ldoc, FieldConfig f, String val)
          Helper method to add a field in the correct format to the given Lucene Document.
protected  void addFields(org.apache.lucene.document.Document ldoc)
          Helper method that adds all fields to the given Lucene Document instance.
protected  void addSystemVariables(Map<QName,Object> vars)
          Helper method to register all standard variables for the XPath/Templates evaluation.
protected  org.apache.lucene.document.Document createEmptyDocument()
          Helper method that generates an empty Lucene Document instance.
static MetadataDocument createInstanceFromLucene(SingleIndexConfig iconf, org.apache.lucene.document.Document ldoc)
          This static method "harvests" a stored Lucene Document from the index for re-parsing.
protected  NodeList evaluateTemplate(ExpressionConfig expr)
          Helper method to evaluate a template.
protected  String evaluateTemplateAsXHTML(FieldConfig expr)
          Helper method to evaluate a template and return the result as XHTML.
 MetadataDocument.XMLConverter getConverter()
          Returns a converter instance that does transformation and validation according to index config.
 Date getDatestamp()
           
 Document getFinalDOM()
          Returns XML contents as DOM tree.
 String getIdentifier()
           
 org.apache.lucene.document.Document getLuceneDocument()
          Converts this instance to a Lucene Document.
 String getXML()
          Returns XML contents as String (a cache is used).
 boolean isDeleted()
          Returns deletion status.
 void loadFromLucene(org.apache.lucene.document.Document ldoc)
          "Harvests" a stored Lucene Document from the index for re-parsing.
protected  void processDocumentBoost(org.apache.lucene.document.Document ldoc)
          Helper method that evaluates the document boost for the Lucene Document instance.
protected  boolean processFilters()
          Helper method that evaluates all filters.
protected  void processXPathVariables()
          Helper method to process all user-supplied variables for the XPath/Templates evaluation.
 void setDatestamp(Date datestamp)
          Sets the datestamp (last modification time of the document file).
 void setDeleted(boolean deleted)
          Marks a harvested document as deleted.
 void setFinalDOM(Document dom)
          Sets the final (transformed) XML contents as a DOM tree.
 void setIdentifier(String identifier)
          Sets the document identifier.
 String toString()
           
protected  void walkNodeTexts(StringBuilder sb, Node n, boolean topLevel)
          Helper method to walk through a DOM tree node (n) and collect strings.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

deleted

protected boolean deleted
See Also:
setDeleted(boolean)

datestamp

protected Date datestamp
See Also:
setDatestamp(java.util.Date)

identifier

protected String identifier
See Also:
setIdentifier(java.lang.String)

iconfig

protected SingleIndexConfig iconfig
The index configuration.

Constructor Detail

MetadataDocument

public MetadataDocument(SingleIndexConfig iconfig)
Constructor that creates an empty instance for the supplied index configuration. Subclasses must always supply this exact constructor so that they work with Rebuilder and createInstanceFromLucene(de.pangaea.metadataportal.config.SingleIndexConfig, org.apache.lucene.document.Document).
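
The constructor requirement suggests that instances are created reflectively from a stored class name. The following is a minimal sketch of that pattern, not the panFMP implementation: IndexConfig, BaseDoc, OaiDoc and create are hypothetical stand-ins for SingleIndexConfig, MetadataDocument, a subclass and the factory method, so that the example compiles on its own.

```java
import java.lang.reflect.Constructor;

class ReflectiveCreationSketch {
    /** Hypothetical stand-in for SingleIndexConfig. */
    static class IndexConfig {
        final String id;
        IndexConfig(String id) { this.id = id; }
    }

    /** Hypothetical stand-in for MetadataDocument. */
    static class BaseDoc {
        protected final IndexConfig iconfig;
        public BaseDoc(IndexConfig iconfig) { this.iconfig = iconfig; }
    }

    /** A subclass that repeats the single-argument constructor exactly. */
    static class OaiDoc extends BaseDoc {
        public OaiDoc(IndexConfig iconfig) { super(iconfig); }
    }

    /** Creates an instance from a class name, looking up the one-argument constructor. */
    static BaseDoc create(String className, IndexConfig cfg) throws Exception {
        Class<? extends BaseDoc> cls = Class.forName(className).asSubclass(BaseDoc.class);
        // Fails with NoSuchMethodException if the subclass lacks this exact public constructor.
        Constructor<? extends BaseDoc> ctor = cls.getConstructor(IndexConfig.class);
        return ctor.newInstance(cfg);
    }

    public static void main(String[] args) throws Exception {
        BaseDoc doc = create(OaiDoc.class.getName(), new IndexConfig("demo"));
        System.out.println(doc.getClass().getSimpleName() + " for index " + doc.iconfig.id);
    }
}
```

A subclass that only declares a constructor with a different parameter list would fail at the getConstructor lookup in this sketch, which is why the exact signature matters.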

Method Detail

createInstanceFromLucene

public static final MetadataDocument createInstanceFromLucene(SingleIndexConfig iconf,
                                                              org.apache.lucene.document.Document ldoc)
                                                       throws Exception
This static method "harvests" a stored Lucene Document from the index for re-parsing. The class name of the correct MetadataDocument subclass is read from the field IndexConstants.FIELDNAME_MDOC_IMPL. Once the correct instance is created, the method sets its SingleIndexConfig and calls loadFromLucene(org.apache.lucene.document.Document).

This method is used by the Rebuilder.

Returns:
An instance of a subclass of MetadataDocument
Throws:
Exception

loadFromLucene

public void loadFromLucene(org.apache.lucene.document.Document ldoc)
                    throws Exception
"Harvests" a stored Lucene Document from the index for re-parsing. Extracts the XML blob, identifier and datestamp from the Document. Stored fields are not restored; they are regenerated by re-executing all XPath expressions and templates. The SingleIndexConfig is used for index-specific conversions.

Throws:
Exception

getXML

public String getXML()
              throws Exception
Returns XML contents as String (a cache is used).

Throws:
Exception

setFinalDOM

public void setFinalDOM(Document dom)
Sets the final (transformed) XML contents as a DOM tree. Invalidates the cache.
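
The interplay between getXML() (cached string) and setFinalDOM() (cache invalidation) can be illustrated roughly as follows. This is a sketch under assumptions, not the actual implementation: the class XmlCacheSketch, its fields, the newDoc helper and the serializations counter are made up for illustration, and the real class may serialize the DOM differently.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

class XmlCacheSketch {
    private Document dom;     // final (transformed) DOM tree
    private String xmlCache;  // serialized form; null means "stale or not yet built"
    int serializations = 0;   // hypothetical counter, only here to make caching observable

    /** Stores the DOM tree and drops any previously cached string. */
    void setFinalDOM(Document dom) {
        this.dom = dom;
        this.xmlCache = null;
    }

    /** Serializes the DOM on first use and returns the cached string afterwards. */
    String getXML() throws Exception {
        if (dom == null) return null;
        if (xmlCache == null) {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter sw = new StringWriter();
            t.transform(new DOMSource(dom), new StreamResult(sw));
            xmlCache = sw.toString();
            serializations++;
        }
        return xmlCache;
    }

    /** Helper that builds a one-element document for the demo. */
    static Document newDoc(String rootName) throws Exception {
        Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        d.appendChild(d.createElement(rootName));
        return d;
    }

    public static void main(String[] args) throws Exception {
        XmlCacheSketch doc = new XmlCacheSketch();
        doc.setFinalDOM(newDoc("a"));
        doc.getXML();
        doc.getXML();                 // served from the cache
        doc.setFinalDOM(newDoc("b")); // invalidates the cache
        doc.getXML();
        System.out.println("serializations: " + doc.serializations); // prints "serializations: 2"
    }
}
```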


getFinalDOM

public Document getFinalDOM()
Returns XML contents as DOM tree.


getConverter

public MetadataDocument.XMLConverter getConverter()
Returns a converter instance that does transformation and validation according to index config.


setDeleted

public void setDeleted(boolean deleted)
Marks a harvested document as deleted. A deleted document is not indexed and will be explicitly deleted from the index. A deleted document should not contain XML data; any XML data present is ignored.


isDeleted

public boolean isDeleted()
Returns deletion status.

See Also:
setDeleted(boolean)

setDatestamp

public void setDatestamp(Date datestamp)
Sets the datestamp (last modification time of the document file).


getDatestamp

public Date getDatestamp()
See Also:
setDatestamp(java.util.Date)

setIdentifier

public void setIdentifier(String identifier)
Sets the document identifier.


getIdentifier

public String getIdentifier()
See Also:
setIdentifier(java.lang.String)

toString

public String toString()
Overrides:
toString in class Object

getLuceneDocument

public org.apache.lucene.document.Document getLuceneDocument()
                                                      throws Exception
Converts this instance to a Lucene Document.

Returns:
The Lucene Document, or null if the document was deleted.
Throws:
Exception - if an exception occurs during transformation (various types of exceptions can be thrown).
IllegalStateException - if index configuration is unknown

createEmptyDocument

protected org.apache.lucene.document.Document createEmptyDocument()
                                                           throws Exception
Helper method that generates an empty Lucene Document instance. The standard fields are set from the document properties (identifier, datestamp).

Returns:
Lucene Document or null, if doc was deleted.
Throws:
Exception - if an exception occurs during transformation (various types of exceptions can be thrown).
IllegalStateException - if identifier is empty.

addDefaultField

protected void addDefaultField(org.apache.lucene.document.Document ldoc)
                        throws Exception
Helper method that adds the default field to the given Lucene Document instance. This method executes the XPath for the default field.

Throws:
Exception - if an exception occurs during transformation (various types of exceptions can be thrown).

addFields

protected void addFields(org.apache.lucene.document.Document ldoc)
                  throws Exception
Helper method that adds all fields to the given Lucene Document instance. This method executes all XPath/Templates and converts the results.

Throws:
Exception - if an exception occurs during transformation (various types of exceptions can be thrown).

processDocumentBoost

protected void processDocumentBoost(org.apache.lucene.document.Document ldoc)
                             throws Exception
Helper method that evaluates the document boost for the Lucene Document instance. This method executes the XPath and converts the results to a float (default is 1.0f).

Throws:
Exception - if an exception occurs during transformation (various types of exceptions can be thrown).
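
As an illustration of evaluating an XPath to a float boost with a default of 1.0f, here is a minimal sketch using the standard javax.xml.xpath API. The class and method names are hypothetical, and the fallback rule is an assumption: this sketch uses the default when no expression is configured or when the result is not a number, which may differ from what the real method does.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class BoostSketch {
    static final float DEFAULT_BOOST = 1.0f;

    /**
     * Evaluates boostXPath against the XML text and converts the result to a float.
     * Assumption: a missing expression or a NaN result yields the default boost.
     */
    static float evaluateBoost(String xml, String boostXPath) throws Exception {
        if (boostXPath == null) return DEFAULT_BOOST;
        XPath xpath = XPathFactory.newInstance().newXPath();
        Double n = (Double) xpath.evaluate(
            boostXPath, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        return (n == null || n.isNaN()) ? DEFAULT_BOOST : n.floatValue();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(evaluateBoost("<doc><boost>2.5</boost></doc>", "/doc/boost")); // prints 2.5
        System.out.println(evaluateBoost("<doc/>", "/doc/boost"));                        // prints 1.0
    }
}
```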

processFilters

protected boolean processFilters()
                          throws Exception
Helper method that evaluates all filters. This method executes the XPath of each filter and converts the result to a boolean. The results of all filters are combined according to each filter's ACCEPT/DENY type.

Throws:
Exception - if an exception occurs during transformation (various types of exceptions can be thrown).
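
The page does not spell out how the ACCEPT/DENY results are combined. One plausible reading, shown here purely as an assumption, is that evaluation starts from a configured default and every matching filter, in order, overrides the decision with its own type. FilterSketch, FilterResult and combine are hypothetical names, and the XPath evaluation itself is replaced by precomputed booleans.

```java
import java.util.Arrays;
import java.util.List;

class FilterSketch {
    enum FilterType { ACCEPT, DENY }

    /** Hypothetical pairing of a filter's type with the boolean result of its XPath. */
    static class FilterResult {
        final FilterType type;
        final boolean matched;
        FilterResult(FilterType type, boolean matched) {
            this.type = type;
            this.matched = matched;
        }
    }

    /**
     * Assumed combination rule: start from the default, then let each matching
     * filter (in configuration order) set the decision to its own type.
     */
    static boolean combine(FilterType defaultType, List<FilterResult> results) {
        boolean accept = (defaultType == FilterType.ACCEPT);
        for (FilterResult r : results) {
            if (r.matched) {
                accept = (r.type == FilterType.ACCEPT);
            }
        }
        return accept;
    }

    public static void main(String[] args) {
        List<FilterResult> results = Arrays.asList(
            new FilterResult(FilterType.DENY, true),     // matches: decision becomes "deny"
            new FilterResult(FilterType.ACCEPT, false)); // does not match: no effect
        System.out.println(combine(FilterType.ACCEPT, results)); // prints false
    }
}
```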

addSystemVariables

protected void addSystemVariables(Map<QName,Object> vars)
Helper method to register all standard variables for the XPath/Templates evaluation. Override this method to register any special variables that depend on the MetadataDocument implementation. The variables must be registered in the supplied Map.
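
A subclass would typically call the superclass method first and then add its own entries to the supplied Map. The sketch below shows that pattern with stand-in classes so that it compiles without panFMP; BaseDoc, SetAwareDoc, the setSpec property and the variable names and namespace URI are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;

class VariablesSketch {
    // Hypothetical namespace for the variables; the real names are not given on this page.
    static final String NS = "urn:example:vars";

    /** Hypothetical stand-in for MetadataDocument; only the variable hook is modelled. */
    static class BaseDoc {
        protected String identifier = "doc-1";

        protected void addSystemVariables(Map<QName,Object> vars) {
            vars.put(new QName(NS, "docIdentifier"), identifier);
        }
    }

    /** A subclass that registers one implementation-specific variable. */
    static class SetAwareDoc extends BaseDoc {
        protected String setSpec = "geology";

        @Override
        protected void addSystemVariables(Map<QName,Object> vars) {
            super.addSystemVariables(vars); // keep the standard variables
            vars.put(new QName(NS, "setSpec"), setSpec);
        }
    }

    /** Collects the variables a document would expose to XPath/template evaluation. */
    static Map<QName,Object> collect(BaseDoc doc) {
        Map<QName,Object> vars = new HashMap<>();
        doc.addSystemVariables(vars);
        return vars;
    }

    public static void main(String[] args) {
        System.out.println(collect(new SetAwareDoc()).size()); // prints 2
    }
}
```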


processXPathVariables

protected final void processXPathVariables()
                                    throws Exception
Helper method to process all user-supplied variables for the XPath/Templates evaluation. The variables are stored in thread-local storage.

Throws:
Exception - if an exception occurs during transformation (various types of exceptions can be thrown).

evaluateTemplate

protected NodeList evaluateTemplate(ExpressionConfig expr)
                             throws TransformerException
Helper method to evaluate a template. This method is called by variables and fields when a template is used instead of an XPath.

For internal use only!

Throws:
TransformerException

evaluateTemplateAsXHTML

protected String evaluateTemplateAsXHTML(FieldConfig expr)
                                  throws TransformerException,
                                         IOException
Helper method to evaluate a template and return result as XHTML. This method is called by fields with datatype XHTML.

For internal use only!

Throws:
TransformerException
IOException

walkNodeTexts

protected void walkNodeTexts(StringBuilder sb,
                             Node n,
                             boolean topLevel)
Helper method to walk through a DOM tree node (n) and collect strings.

For internal use only!
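
A recursive text collector with this signature could look roughly like the sketch below. It is not the actual implementation. It assumes that text pieces are trimmed and joined with newlines, and that the topLevel flag means an attribute value is collected only when the attribute node itself is the starting point of the walk. The class name and the append and parse helpers are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WalkTextsSketch {
    /** Recursively collects text from node n into sb. */
    static void walkNodeTexts(StringBuilder sb, Node n, boolean topLevel) {
        if (n == null) return;
        switch (n.getNodeType()) {
            case Node.DOCUMENT_NODE:
            case Node.DOCUMENT_FRAGMENT_NODE:
            case Node.ELEMENT_NODE:
                // Descend into children; they are never "top level".
                for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
                    walkNodeTexts(sb, c, false);
                }
                break;
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                append(sb, n.getNodeValue());
                break;
            case Node.ATTRIBUTE_NODE:
                // Assumption: attributes count only when addressed directly.
                if (topLevel) append(sb, n.getNodeValue());
                break;
            default:
                break; // comments, processing instructions etc. are skipped
        }
    }

    /** Appends trimmed, non-empty text, separating pieces with a newline. */
    private static void append(StringBuilder sb, String text) {
        String t = (text == null) ? "" : text.trim();
        if (t.isEmpty()) return;
        if (sb.length() > 0) sb.append('\n');
        sb.append(t);
    }

    /** Helper that parses an XML string into a DOM document for the demo. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document d = parse("<r a=\"x\"><t>Hello</t><u> World </u></r>");
        StringBuilder sb = new StringBuilder();
        walkNodeTexts(sb, d, true);
        System.out.println(sb); // prints "Hello" and "World" on separate lines
    }
}
```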


addField

protected void addField(org.apache.lucene.document.Document ldoc,
                        FieldConfig f,
                        String val)
                 throws Exception
Helper method to add a field in the correct format to the given Lucene Document. The format is defined by the FieldConfig. The value is given as a string.

For internal use only!

Throws:
Exception - if an exception occurs during transformation (various types of exceptions can be thrown).


Copyright ©2007-2009 panFMP Developers c/o Uwe Schindler