de.pangaea.metadataportal.harvester
Class SingleFileEntitiesHarvester

java.lang.Object
  extended by de.pangaea.metadataportal.harvester.Harvester
      extended by de.pangaea.metadataportal.harvester.SingleFileEntitiesHarvester
Direct Known Subclasses:
DirectoryHarvester, ExternalIndexHarvester, WebCrawlingHarvester, ZipFileHarvester

public abstract class SingleFileEntitiesHarvester
extends Harvester

Abstract harvester class for single file entities (like files from a web page or from a local directory). The harvester makes it possible to add XML documents given by a Source to the index. If a fatal parse error occurs while a document is harvested, the harvester will either stop harvesting (as it would with OAI-PMH), ignore the document, or delete it (if it exists in the index), depending on the harvester property "parseErrorAction".
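
The three outcomes described above can be illustrated with a minimal, self-contained sketch. It is not the panFMP implementation: the class ParseErrorPolicySketch, the enum constants, and the boolean parsedOk flag are hypothetical stand-ins, and the values actually accepted by "parseErrorAction" may be named differently.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in, not part of panFMP: it only models the three
// outcomes the class description names for a fatal parse error.
class ParseErrorPolicySketch {
    // Assumed names; the real values of "parseErrorAction" may differ.
    enum ParseErrorAction { STOP, IGNORE_DOCUMENT, DELETE_DOCUMENT }

    private final ParseErrorAction action;
    // Identifiers of documents currently in the (simulated) index.
    private final Set<String> index = new LinkedHashSet<>();

    ParseErrorPolicySketch(ParseErrorAction action) {
        this.action = action;
    }

    // Simulates a document that is already indexed from an earlier run.
    void preload(String identifier) {
        index.add(identifier);
    }

    boolean isIndexed(String identifier) {
        return index.contains(identifier);
    }

    // "parsedOk" simulates whether parsing the document's XML succeeded.
    void add(String identifier, boolean parsedOk) {
        if (parsedOk) {
            index.add(identifier);
            return;
        }
        switch (action) {
            case STOP:
                // Abort the whole harvesting run.
                throw new IllegalStateException("fatal parse error in " + identifier);
            case IGNORE_DOCUMENT:
                // Leave the index untouched and continue with the next document.
                break;
            case DELETE_DOCUMENT:
                // Remove a previously indexed copy, if there is one.
                index.remove(identifier);
                break;
        }
    }
}
```

In this sketch, ignoring keeps a previously indexed copy of the broken document, while deleting removes it; stopping surfaces the error to the caller.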

This panFMP harvester supports the following harvester properties in addition to the default ones:

Author:
Uwe Schindler

Field Summary
 
Fields inherited from class de.pangaea.metadataportal.harvester.Harvester
fromDateReference, harvestCount, harvestMessageStep, iconfig, index, log
 
Constructor Summary
SingleFileEntitiesHarvester()
           
 
Method Summary
protected  void addDocument(String identifier, Date lastModified, Source xml)
          Adds a document to the Harvester.index working in the background.
protected  void addDocument(String identifier, long lastModified, Source xml)
          Adds a document to the Harvester.index working in the background.
protected  void cancelMissingDocumentDelete()
          Disables the property "deleteMissingDocuments" for this instance.
 void close(boolean cleanShutdown)
          Closes harvester.
protected  void enumerateValidHarvesterPropertyNames(Set<String> props)
          This method is used by subclasses to enumerate all available harvester properties that are implemented by them.
 void open(SingleIndexConfig iconfig)
          Opens harvester for harvesting documents into the index described by the given SingleIndexConfig.
 
Methods inherited from class de.pangaea.metadataportal.harvester.Harvester
addDocument, createMetadataDocumentInstance, getValidHarvesterPropertyNames, harvest, isClosed, isDocumentOutdated, isDocumentOutdated, main, runHarvester, runHarvester, setHarvestingDateReference
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SingleFileEntitiesHarvester

public SingleFileEntitiesHarvester()
Method Detail

open

public void open(SingleIndexConfig iconfig)
          throws Exception
Description copied from class: Harvester
Opens harvester for harvesting documents into the index described by the given SingleIndexConfig. Opens Harvester.index for usage in Harvester.harvest() method.

Overrides:
open in class Harvester
Throws:
Exception - if an exception occurs during opening (various types of exceptions can be thrown).

close

public void close(boolean cleanShutdown)
           throws Exception
Description copied from class: Harvester
Closes harvester. All resources are freed and the Harvester.index is closed.

Overrides:
close in class Harvester
Parameters:
cleanShutdown - enables writing of status information to the index for the next harvesting. If an error occurred during harvesting, this should not be done.
Throws:
Exception - if an exception occurs during closing (various types of exceptions can be thrown). Exceptions can be thrown asynchronously and may not affect the current document.

addDocument

protected final void addDocument(String identifier,
                                 Date lastModified,
                                 Source xml)
                          throws Exception
Adds a document to the Harvester.index working in the background. If a parsing error occurs, the document is handled according to the harvester property "parseErrorAction". Its identifier is also added to the valid identifiers (if unseen documents should be deleted).

Parameters:
identifier - is the document's identifier in the index
lastModified - is the last-modification date, which is used to calculate the next harvesting start date. If the document is older than the last harvesting, it is skipped.
xml - is the transformer source of the document; pass null to only update the document status (lastModified) and add the identifier to the valid identifiers
Throws:
Exception
See Also:
Harvester.addDocument(MetadataDocument)
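
The parameter rules above can be sketched as follows. This is a simplified model under stated assumptions, not the panFMP code: AddDocumentSketch and its fields are hypothetical, a String stands in for the Source, lastModified is assumed to be non-null, and it is assumed that a skipped document still counts as a valid identifier.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the documented addDocument rules; not part of panFMP.
class AddDocumentSketch {
    private final Date lastHarvest; // null means there was no previous harvest
    final Set<String> validIdentifiers = new HashSet<>();
    final List<String> indexed = new ArrayList<>();
    Date newestSeen; // candidate for the next harvesting start date

    AddDocumentSketch(Date lastHarvest) {
        this.lastHarvest = lastHarvest;
    }

    // "xml" is a String here only to keep the sketch small.
    void addDocument(String identifier, Date lastModified, String xml) {
        // Assumption: every enumerated document is remembered as valid,
        // so that it is not treated as missing later on.
        validIdentifiers.add(identifier);

        // Track the newest modification date seen in this run.
        if (newestSeen == null || lastModified.after(newestSeen)) {
            newestSeen = lastModified;
        }

        // Documents older than the last harvesting are skipped.
        if (lastHarvest != null && lastModified.before(lastHarvest)) {
            return;
        }

        // A null source only updates the status; nothing is indexed.
        if (xml == null) {
            return;
        }
        indexed.add(identifier);
    }
}
```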

addDocument

protected void addDocument(String identifier,
                           long lastModified,
                           Source xml)
                    throws Exception
Adds a document to the Harvester.index working in the background.

Throws:
Exception
See Also:
addDocument(String,Date,Source)
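
This page does not document the parameters of the long variant. One plausible reading, shown here purely as an assumption, is that lastModified is a timestamp in milliseconds since the epoch (the form returned by java.io.File.lastModified()) and that the method delegates to the Date variant. LongOverloadSketch is a hypothetical stand-in, with a String in place of the Source.

```java
import java.util.Date;

// Hypothetical stand-in; the delegation shown is an assumption, not
// confirmed behavior of SingleFileEntitiesHarvester.
class LongOverloadSketch {
    Date received; // records what the Date variant was called with

    void addDocument(String identifier, Date lastModified, String xml) {
        received = lastModified;
    }

    // Assumed: epoch milliseconds are wrapped in a Date and passed on.
    void addDocument(String identifier, long lastModified, String xml) {
        addDocument(identifier, new Date(lastModified), xml);
    }
}
```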

cancelMissingDocumentDelete

protected void cancelMissingDocumentDelete()
Disables the property "deleteMissingDocuments" for this instance. This can be used when the container (like a ZIP file) was not modified and the documents it contains are therefore not enumerated. Call this method to prevent deletion of all these documents.
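
Why the call matters can be shown with a small model. It assumes that, on close, every indexed identifier that was not seen during the run is deleted; MissingDocumentDeleteSketch and its members are hypothetical and only mimic that idea.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of "deleteMissingDocuments"; not part of panFMP.
class MissingDocumentDeleteSketch {
    private final Set<String> index; // identifiers already in the index
    // Identifiers seen in this run; null means deletion was cancelled.
    private Set<String> validIdentifiers = new HashSet<>();

    MissingDocumentDeleteSketch(Set<String> initialIndex) {
        this.index = new TreeSet<>(initialIndex);
    }

    void addDocument(String identifier) {
        index.add(identifier);
        if (validIdentifiers != null) {
            validIdentifiers.add(identifier);
        }
    }

    // Stops tracking, so nothing is deleted when the harvester closes.
    void cancelMissingDocumentDelete() {
        validIdentifiers = null;
    }

    // Returns the index content after the (assumed) clean-up step.
    Set<String> close() {
        if (validIdentifiers != null) {
            index.retainAll(validIdentifiers); // drop everything not seen
        }
        return index;
    }
}
```

In this model, a harvester that finds its ZIP file unchanged enumerates nothing; without the cancel call, close() would empty the index.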


enumerateValidHarvesterPropertyNames

protected void enumerateValidHarvesterPropertyNames(Set<String> props)
Description copied from class: Harvester
This method is used by subclasses to enumerate all available harvester properties that are implemented by them. Override this method in your own implementation and append all harvester property names to the supplied Set. The public API for client code requesting property names is Harvester.getValidHarvesterPropertyNames().

Overrides:
enumerateValidHarvesterPropertyNames in class Harvester
See Also:
Harvester.getValidHarvesterPropertyNames()
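
The override pattern described above might look like the following sketch. BaseHarvesterSketch and FileHarvesterSketch are hypothetical stand-ins for Harvester and a subclass, and "defaultProperty" is a placeholder name; only "parseErrorAction" and "deleteMissingDocuments" are property names taken from this page.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the Harvester base class.
abstract class BaseHarvesterSketch {
    // Subclasses append the property names they implement.
    protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
        props.add("defaultProperty"); // placeholder for the default properties
    }

    // Public entry point for client code, mirroring the documented split.
    public final Set<String> getValidHarvesterPropertyNames() {
        Set<String> props = new TreeSet<>();
        enumerateValidHarvesterPropertyNames(props);
        return props;
    }
}

// Hypothetical subclass that adds its own properties.
class FileHarvesterSketch extends BaseHarvesterSketch {
    @Override
    protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
        // Keep the names contributed by the superclass.
        super.enumerateValidHarvesterPropertyNames(props);
        props.add("parseErrorAction");
        props.add("deleteMissingDocuments");
    }
}
```

Calling the superclass method first keeps the inherited names in the set, so each level of the hierarchy only has to list what it adds.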


Copyright ©2007-2009 panFMP Developers c/o Uwe Schindler