de.pangaea.metadataportal.harvester
Class Harvester

java.lang.Object
  extended by de.pangaea.metadataportal.harvester.Harvester
Direct Known Subclasses:
OAIHarvesterBase, Rebuilder, SingleFileEntitiesHarvester

public abstract class Harvester
extends Object

Harvester interface to panFMP. This class is the abstract superclass of all harvesters. It also supplies an entry point for the command line interface.

All panFMP harvesters support the following harvester properties:

Author:
Uwe Schindler

Field Summary
protected  Date fromDateReference
          Date from which documents should be harvested (in the time reference of the original server)
protected  int harvestCount
          Count of harvested documents.
protected  int harvestMessageStep
          Step at which addDocument(de.pangaea.metadataportal.harvester.MetadataDocument) prints log messages.
protected  SingleIndexConfig iconfig
          Index configuration
protected  IndexBuilder index
          Instance of IndexBuilder that converts and updates the Lucene index in other threads.
protected  org.apache.commons.logging.Log log
          Logger instance (shared by all subclasses).
 
Constructor Summary
Harvester()
          Default constructor.
 
Method Summary
protected  void addDocument(MetadataDocument mdoc)
          Adds a document to the index working in the background.
 void close(boolean cleanShutdown)
          Closes harvester.
protected  MetadataDocument createMetadataDocumentInstance()
          Creates an instance of MetadataDocument and initializes it with the index config.
protected  void enumerateValidHarvesterPropertyNames(Set<String> props)
          This method is used by subclasses to enumerate all available harvester properties that are implemented by them.
 Set<String> getValidHarvesterPropertyNames()
          Return the Set of harvester property names that this harvester supports.
abstract  void harvest()
          This method is called after the harvester has been opened with open(de.pangaea.metadataportal.config.SingleIndexConfig).
 boolean isClosed()
          Checks if harvester is closed.
protected  boolean isDocumentOutdated(Date lastModified)
          Checks whether the supplied datestamp needs harvesting.
protected  boolean isDocumentOutdated(long lastModified)
          Checks whether the supplied datestamp needs harvesting.
static void main(String[] args)
          External entry point to the harvester interface.
 void open(SingleIndexConfig iconfig)
          Opens harvester for harvesting documents into the index described by the given SingleIndexConfig.
static void runHarvester(Config conf, String index)
          Harvests one (index="indexname") or more (index="*") indexes.
protected static void runHarvester(Config conf, String index, Class<? extends Harvester> harvesterClass)
          Harvests one (index="indexname") or more (index="*") indexes.
protected  void setHarvestingDateReference(Date harvestingDateReference)
          Reference date of this harvesting event (in time reference of the original server).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

log

protected org.apache.commons.logging.Log log
Logger instance (shared by all subclasses).


index

protected IndexBuilder index
Instance of IndexBuilder that converts and updates the Lucene index in other threads.


iconfig

protected SingleIndexConfig iconfig
Index configuration


harvestCount

protected int harvestCount
Count of harvested documents. Incremented by addDocument(de.pangaea.metadataportal.harvester.MetadataDocument).


harvestMessageStep

protected int harvestMessageStep
Step at which addDocument(de.pangaea.metadataportal.harvester.MetadataDocument) prints log messages. Can be changed by the harvester property harvestMessageStep.
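
The interplay of harvestCount and harvestMessageStep can be pictured roughly as follows. This is only a sketch with a stand-in class (MessageStepDemo is a hypothetical name, and the value 100 and the message text are assumptions); the real addDocument also hands the document to the IndexBuilder.

```java
class MessageStepDemo {
    // Stand-ins for the protected fields of Harvester.
    static int harvestCount = 0;
    static int harvestMessageStep = 100; // assumed value, normally set via the harvester property

    // Counts the document and reports whether a progress message was printed for it.
    static boolean addDocument(String documentId) {
        harvestCount++;
        if (harvestCount % harvestMessageStep == 0) {
            System.out.println(harvestCount + " documents harvested so far.");
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // With a step of 100, 250 documents produce messages at 100 and 200.
        for (int i = 0; i < 250; i++) {
            addDocument("doc-" + i);
        }
    }
}
```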


fromDateReference

protected Date fromDateReference
Date from which documents should be harvested (in the time reference of the original server).

Constructor Detail

Harvester

public Harvester()
Default constructor.

Method Detail

main

public static void main(String[] args)
External entry point to the harvester interface. Called from the Java command line with two parameters: the config file and the index name.


runHarvester

public static void runHarvester(Config conf,
                                String index)
Harvests one (index="indexname") or more (index="*") indexes. The harvester implementation is defined by the given configuration.


runHarvester

protected static void runHarvester(Config conf,
                                   String index,
                                   Class<? extends Harvester> harvesterClass)
Harvests one (index="indexname") or more (index="*") indexes. The harvester implementation is defined by the given configuration unless harvesterClass is not null, in which case the specified harvester class is used. This variant is used by Rebuilder. Public code should use runHarvester(Config,String).


open

public void open(SingleIndexConfig iconfig)
          throws Exception
Opens harvester for harvesting documents into the index described by the given SingleIndexConfig. Opens the index for use in the harvest() method.

Throws:
Exception - if an exception occurs during opening (various types of exceptions can be thrown).

isClosed

public boolean isClosed()
Checks if harvester is closed.


close

public void close(boolean cleanShutdown)
           throws Exception
Closes harvester. All resources are freed and the index is closed.

Parameters:
cleanShutdown - enables writing of status information to the index for the next harvesting. If an error occurred during harvesting, this should not be done.
Throws:
Exception - if an exception occurs during closing (various types of exceptions can be thrown). Exceptions can be thrown asynchronously and may not relate to the current document.
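
The open / harvest / close sequence and the role of cleanShutdown can be sketched with a stand-in class. Everything here is hypothetical (LifecycleDemo, FakeHarvester and the recorded event strings are invented for illustration); it only shows the pattern of closing with cleanShutdown=true after a successful harvest and with false after a failure, not how runHarvester is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

class LifecycleDemo {
    // Hypothetical stand-in that mimics the open/harvest/close contract.
    static class FakeHarvester {
        final List<String> events = new ArrayList<String>();
        private final boolean failDuringHarvest;
        private boolean closed = true;

        FakeHarvester(boolean failDuringHarvest) {
            this.failDuringHarvest = failDuringHarvest;
        }

        void open() {
            closed = false;
            events.add("open");
        }

        void harvest() throws Exception {
            if (failDuringHarvest) throw new Exception("simulated harvest failure");
            events.add("harvest");
        }

        void close(boolean cleanShutdown) {
            closed = true;
            events.add(cleanShutdown ? "close(clean)" : "close(unclean)");
        }

        boolean isClosed() {
            return closed;
        }
    }

    // Runs one harvest; status information is only written if harvest() succeeded.
    static List<String> run(FakeHarvester h) {
        boolean ok = false;
        h.open();
        try {
            h.harvest();
            ok = true;
        } catch (Exception e) {
            h.events.add("error: " + e.getMessage());
        } finally {
            if (!h.isClosed()) h.close(ok);
        }
        return h.events;
    }

    public static void main(String[] args) {
        System.out.println(run(new FakeHarvester(false)));
        System.out.println(run(new FakeHarvester(true)));
    }
}
```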

createMetadataDocumentInstance

protected MetadataDocument createMetadataDocumentInstance()
Creates an instance of MetadataDocument and initializes it with the index config. This method should be overridden if a harvester uses another class.


addDocument

protected void addDocument(MetadataDocument mdoc)
                    throws IndexBuilderBackgroundFailure,
                           InterruptedException
Adds a document to the index working in the background.

Throws:
IndexBuilderBackgroundFailure - if an error occurred in the background thread. Exceptions can be thrown asynchronously and may not relate to the current document. The real exception is thrown again in close(boolean).
InterruptedException - if wait operation was interrupted.
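
Why a failure may surface on a later document, and why the real exception reappears on close, can be illustrated with a small background worker. This is a sketch under stated assumptions, not the IndexBuilder implementation: BackgroundFailureDemo, FakeIndexBuilder, BackgroundFailure and the "bad" document naming are all hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class BackgroundFailureDemo {
    // Hypothetical marker exception, standing in for IndexBuilderBackgroundFailure.
    static class BackgroundFailure extends RuntimeException {
        BackgroundFailure() {
            super("an earlier document failed in the background");
        }
    }

    // Hypothetical stand-in for a builder that processes documents in another thread.
    static class FakeIndexBuilder {
        private final ExecutorService worker = Executors.newSingleThreadExecutor();
        private final AtomicReference<Exception> failure = new AtomicReference<Exception>();

        void add(String doc) {
            // The failure seen here may belong to a document added earlier.
            if (failure.get() != null) throw new BackgroundFailure();
            worker.execute(() -> {
                if (doc.startsWith("bad")) {
                    failure.compareAndSet(null, new Exception("cannot index " + doc));
                }
            });
        }

        void close() throws Exception {
            worker.shutdown();
            worker.awaitTermination(10, TimeUnit.SECONDS);
            Exception e = failure.get();
            if (e != null) throw e; // the real exception is rethrown on close
        }
    }

    public static void main(String[] args) {
        FakeIndexBuilder builder = new FakeIndexBuilder();
        try {
            builder.add("good-1");
            builder.add("bad-2");
            builder.add("good-3"); // may or may not notice the failure, depending on timing
        } catch (BackgroundFailure f) {
            System.out.println("noticed while adding: " + f.getMessage());
        }
        try {
            builder.close();
        } catch (Exception e) {
            System.out.println("close rethrew: " + e.getMessage());
        }
    }
}
```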

isDocumentOutdated

protected final boolean isDocumentOutdated(Date lastModified)
Checks whether the supplied datestamp needs harvesting. Use this method to find out whether a document needs harvesting.

See Also:
isDocumentOutdated(long)

isDocumentOutdated

protected boolean isDocumentOutdated(long lastModified)
Checks whether the supplied datestamp needs harvesting. Use this method to find out whether a document needs harvesting.

See Also:
isDocumentOutdated(Date)
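
One plausible reading of the two overloads is sketched below. It rests on several assumptions that the documentation above does not confirm: that the Date variant delegates to the long variant, that the long value is milliseconds since the epoch, that a null fromDateReference means everything needs harvesting, and that the comparison is strictly "later than". OutdatedCheckDemo is a hypothetical stand-in, not the panFMP class.

```java
import java.util.Date;

class OutdatedCheckDemo {
    // Stand-in for the fromDateReference field; null is assumed to mean "no previous harvest".
    static Date fromDateReference = null;

    // Assumed delegation from the Date overload to the long overload.
    static boolean isDocumentOutdated(Date lastModified) {
        return isDocumentOutdated(lastModified.getTime());
    }

    // Assumed rule: harvest if there is no reference date or the document changed after it.
    static boolean isDocumentOutdated(long lastModified) {
        if (fromDateReference == null) return true;
        return lastModified > fromDateReference.getTime();
    }

    public static void main(String[] args) {
        System.out.println(isDocumentOutdated(new Date(1000L))); // no reference date yet
        fromDateReference = new Date(5000L);
        System.out.println(isDocumentOutdated(new Date(1000L))); // older than the reference
        System.out.println(isDocumentOutdated(new Date(9000L))); // newer than the reference
    }
}
```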

setHarvestingDateReference

protected void setHarvestingDateReference(Date harvestingDateReference)
Reference date of this harvesting event (in time reference of the original server). This date is made available as fromDateReference during the next harvesting run. As long as it is null, the harvester does not write or update the value in the index directory.


enumerateValidHarvesterPropertyNames

protected void enumerateValidHarvesterPropertyNames(Set<String> props)
This method is used by subclasses to enumerate all available harvester properties that are implemented by them. Override this method in your own implementation and add all of its harvester property names to the supplied Set. The public API for client code requesting property names is getValidHarvesterPropertyNames().

See Also:
getValidHarvesterPropertyNames()

getValidHarvesterPropertyNames

public final Set<String> getValidHarvesterPropertyNames()
Return the Set of harvester property names that this harvester supports. This method is called when the Config is loaded to check that all property names in the config file are correct. You cannot override this method in your own implementation, as it is responsible for returning an unmodifiable Set. For custom harvesters, add your property names in enumerateValidHarvesterPropertyNames(java.util.Set).

See Also:
enumerateValidHarvesterPropertyNames(java.util.Set)
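
The division of labour between the final getter and the overridable enumeration method can be sketched with stand-in classes. This is an illustration, not the panFMP source: PropertyNamesDemo, BaseHarvester and MyHarvester are hypothetical, the property name "myEndpoint" is invented, and it is an assumption that a subclass should call the super implementation to keep the inherited names.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

class PropertyNamesDemo {
    // Hypothetical stand-in for Harvester; only the property-name logic is modelled.
    static abstract class BaseHarvester {
        // Subclasses add their own names to the supplied Set.
        protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
            props.add("harvestMessageStep");
        }

        // Final: collects all names and returns a read-only view.
        public final Set<String> getValidHarvesterPropertyNames() {
            Set<String> props = new TreeSet<String>();
            enumerateValidHarvesterPropertyNames(props);
            return Collections.unmodifiableSet(props);
        }
    }

    static class MyHarvester extends BaseHarvester {
        @Override
        protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
            super.enumerateValidHarvesterPropertyNames(props); // keep inherited names (assumption)
            props.add("myEndpoint"); // invented property name
        }
    }

    public static void main(String[] args) {
        Set<String> names = new MyHarvester().getValidHarvesterPropertyNames();
        System.out.println(names); // prints [harvestMessageStep, myEndpoint]
    }
}
```

Callers that try to modify the returned Set get an UnsupportedOperationException, which is why the getter itself does not need to be overridable.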

harvest

public abstract void harvest()
                      throws Exception
This method is called after the harvester has been opened with open(de.pangaea.metadataportal.config.SingleIndexConfig). Override this method in your harvester class. It should harvest files from somewhere, generate MetadataDocuments and add them with addDocument(de.pangaea.metadataportal.harvester.MetadataDocument).

Throws:
Exception - of any type.


Copyright ©2007-2009 panFMP Developers c/o Uwe Schindler