java.lang.Object
  de.pangaea.metadataportal.harvester.Harvester
public abstract class Harvester
Harvester interface to panFMP. This class is the abstract superclass of all harvesters. It also supplies an entry point for the command line interface.
All panFMP harvesters support the following harvester properties:
harvestMessageStep: after how many documents should a status message be printed by the method addDocument(de.pangaea.metadataportal.harvester.MetadataDocument)? (default: 100)
maxBufferedIndexChanges: how many documents should be harvested before the index changes are written to disk? If HarvesterCommitEvents are used, the changes are also committed (seen by the search service) after this number of changes. (default: 1000)
numConverterThreads: how many threads should convert documents (XPath queries and XSL templates)? (default: 1) Raise this value if the indexer waits too often for more documents and you have more than one processor. The optimal value is one lower than the number of processors. If you have very simple metadata documents (simple XML schema) and few fields, lower values may be enough. The optimal value can only be found by testing.
maxConverterQueue: size of the queue for converter threads. (default: 250 metadata documents)
maxIndexerQueue: size of the queue for the indexer thread. (default: 250 metadata documents)
autoOptimize: should the index be optimized after harvesting is finished? (default: false)
validate: validate harvested documents against the schema given in the configuration? (default: true, if a schema is given)
compressXML: compress the harvested XML blob when storing it in the index? (default: true)
conversionErrorAction: what to do if a conversion error occurs (e.g. a number format error)? Can be STOP, IGNOREDOCUMENT, or DELETEDOCUMENT (default is to stop conversion).
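The documented defaults and the rule of thumb for numConverterThreads can be illustrated with a small helper. This is only a sketch: the class HarvesterPropertyDefaults and its methods are hypothetical and not part of panFMP, and java.util.Properties is used merely as a stand-in for however the configuration actually delivers harvester properties.

```java
import java.util.Properties;

// Hypothetical helper, NOT part of panFMP: mirrors the defaults documented above.
final class HarvesterPropertyDefaults {
    private HarvesterPropertyDefaults() {}

    // Rule of thumb from the text: one thread fewer than the processor count, but at least 1.
    static int suggestedConverterThreads(int processors) {
        return Math.max(1, processors - 1);
    }

    // Reads an integer property, falling back to the given default when it is absent.
    static int intProperty(Properties props, String name, int defaultValue) {
        String value = props.getProperty(name);
        return (value == null) ? defaultValue : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("maxBufferedIndexChanges", "500");

        int step = intProperty(props, "harvestMessageStep", 100);          // not set, default applies
        int buffered = intProperty(props, "maxBufferedIndexChanges", 1000); // explicitly set
        int threads = suggestedConverterThreads(Runtime.getRuntime().availableProcessors());

        System.out.println("harvestMessageStep=" + step);
        System.out.println("maxBufferedIndexChanges=" + buffered);
        System.out.println("suggested numConverterThreads=" + threads);
    }
}
```

The suggested thread count is only a starting point; as noted above, the optimal value can only be found by testing.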
| Field Summary | |
|---|---|
| protected Date | fromDateReference: Date from which documents should be harvested (in time reference of the original server). |
| protected int | harvestCount: Count of harvested documents. |
| protected int | harvestMessageStep: Step at which addDocument(de.pangaea.metadataportal.harvester.MetadataDocument) prints log messages. |
| protected SingleIndexConfig | iconfig: Index configuration. |
| protected IndexBuilder | index: Instance of IndexBuilder that converts and updates the Lucene index in other threads. |
| protected org.apache.commons.logging.Log | log: Logger instance (shared by all subclasses). |
| Constructor Summary | |
|---|---|
| Harvester() | Default constructor. |
| Method Summary | |
|---|---|
| protected void | addDocument(MetadataDocument mdoc): Adds a document to the index working in the background. |
| void | close(boolean cleanShutdown): Closes harvester. |
| protected MetadataDocument | createMetadataDocumentInstance(): Creates an instance of MetadataDocument and initializes it with the index config. |
| protected void | enumerateValidHarvesterPropertyNames(Set<String> props): This method is used by subclasses to enumerate all available harvester properties that are implemented by them. |
| Set<String> | getValidHarvesterPropertyNames(): Returns the Set of harvester property names that this harvester supports. |
| abstract void | harvest(): This method is called by the harvester after opening it with open(de.pangaea.metadataportal.config.SingleIndexConfig). |
| boolean | isClosed(): Checks if harvester is closed. |
| protected boolean | isDocumentOutdated(Date lastModified): Checks whether the supplied datestamp needs harvesting. |
| protected boolean | isDocumentOutdated(long lastModified): Checks whether the supplied datestamp needs harvesting. |
| static void | main(String[] args): External entry point to the harvester interface. |
| void | open(SingleIndexConfig iconfig): Opens harvester for harvesting documents into the index described by the given SingleIndexConfig. |
| static void | runHarvester(Config conf, String index): Harvests one (index="indexname") or more (index="*") indexes. |
| protected static void | runHarvester(Config conf, String index, Class<? extends Harvester> harvesterClass): Harvests one (index="indexname") or more (index="*") indexes. |
| protected void | setHarvestingDateReference(Date harvestingDateReference): Reference date of this harvesting event (in time reference of the original server). |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected org.apache.commons.logging.Log log
Logger instance (shared by all subclasses).

protected IndexBuilder index
Instance of IndexBuilder that converts and updates the Lucene index in other threads.

protected SingleIndexConfig iconfig
Index configuration.

protected int harvestCount
Count of harvested documents (see addDocument(de.pangaea.metadataportal.harvester.MetadataDocument)).

protected int harvestMessageStep
Step at which addDocument(de.pangaea.metadataportal.harvester.MetadataDocument) prints log messages. Can be changed by
the harvester property harvestMessageStep.

protected Date fromDateReference
Date from which documents should be harvested (in time reference of the original server).
| Constructor Detail |
|---|
public Harvester()
Default constructor.
| Method Detail |
|---|
public static void main(String[] args)
External entry point to the harvester interface.

public static void runHarvester(Config conf, String index)
Harvests one (index="indexname") or more (index="*") indexes. The harvester
implementation is defined by the given configuration.

protected static void runHarvester(Config conf, String index, Class<? extends Harvester> harvesterClass)
Harvests one (index="indexname") or more (index="*") indexes. The harvester
implementation is defined by the given configuration or, if
harvesterClass is not null, by the specified harvester class.
This is used by Rebuilder.
Public code should use runHarvester(Config,String).
public void open(SingleIndexConfig iconfig) throws Exception
Opens harvester for harvesting documents into the index described by the given SingleIndexConfig.
Opens index for usage in harvest() method.
Throws:
Exception - if an exception occurs during opening (various types of exceptions can be thrown).

public boolean isClosed()
Checks if harvester is closed.
public void close(boolean cleanShutdown) throws Exception
Closes harvester. The index is closed.
Parameters:
cleanShutdown - enables writing of status information to the index for the next harvesting. If an error occurred during harvesting, this should not be done.
Throws:
Exception - if an exception occurs during closing (various types of exceptions can be thrown).
Exceptions can be thrown asynchronously and may not concern the current document.

protected MetadataDocument createMetadataDocumentInstance()
Creates an instance of MetadataDocument and initializes it with the index config.
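The open, harvest, close(boolean) contract described above can be sketched as a small driver. Everything in this block is a stand-in: LifecycleSketch is not the panFMP Harvester class, and the run method is a hypothetical driver that only shows one way to pass cleanShutdown = false when harvesting failed.

```java
// Stand-in for the documented lifecycle; this is NOT the panFMP Harvester class.
abstract class LifecycleSketch {
    private boolean closed = true;
    boolean lastCloseWasClean = false;

    void open() throws Exception {
        closed = false;
    }

    abstract void harvest() throws Exception;

    // In the documented class, cleanShutdown == true enables writing status
    // information for the next harvesting; here the flag is only recorded.
    void close(boolean cleanShutdown) throws Exception {
        closed = true;
        lastCloseWasClean = cleanShutdown;
    }

    boolean isClosed() {
        return closed;
    }

    // Hypothetical driver: open, harvest, then close cleanly only if harvesting succeeded.
    static boolean run(LifecycleSketch h) throws Exception {
        boolean clean = false;
        h.open();
        try {
            h.harvest();
            clean = true;
        } finally {
            if (!h.isClosed()) {
                h.close(clean); // false when harvest() threw
            }
        }
        return clean;
    }
}
```

If harvest() throws, the exception still propagates to the caller after close(false) has run, so no status information from a failed run would be treated as valid.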
protected void addDocument(MetadataDocument mdoc) throws IndexBuilderBackgroundFailure, InterruptedException
Adds a document to the index working in the background.
Throws:
IndexBuilderBackgroundFailure - if an error occurred in the background thread.
Exceptions can be thrown asynchronously and may not concern the current document.
The real exception is thrown again in close(boolean).
InterruptedException - if the wait operation was interrupted.

protected final boolean isDocumentOutdated(Date lastModified)
Checks whether the supplied datestamp needs harvesting.
See Also:
isDocumentOutdated(long)

protected boolean isDocumentOutdated(long lastModified)
Checks whether the supplied datestamp needs harvesting.
See Also:
isDocumentOutdated(Date)

protected void setHarvestingDateReference(Date harvestingDateReference)
Reference date of this harvesting event (in time reference of the original server). See fromDateReference.
As long as this is null, the harvester will not write or update the value in the index directory.
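How the two isDocumentOutdated overloads might relate to fromDateReference can be sketched as follows. This page does not state the comparison rule, so the logic below is an assumption: a document is taken to need harvesting when no reference date is set or when it was modified after that date. OutdatedCheckSketch is a stand-in, not the panFMP implementation.

```java
import java.util.Date;

// Stand-in illustrating the two overloads; NOT the panFMP implementation.
class OutdatedCheckSketch {
    // Date from which documents should be harvested.
    // ASSUMPTION: null means "no previous harvest", so everything is harvested.
    Date fromDateReference;

    // The Date overload is final in the documented class; here it simply delegates.
    final boolean isDocumentOutdated(Date lastModified) {
        return isDocumentOutdated(lastModified.getTime());
    }

    // ASSUMPTION: "needs harvesting" means modified strictly after the reference date.
    boolean isDocumentOutdated(long lastModified) {
        return fromDateReference == null || lastModified > fromDateReference.getTime();
    }
}
```

Keeping the comparison in the long overload means a subclass would only have to override one method to change the rule for both.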
protected void enumerateValidHarvesterPropertyNames(Set<String> props)
This method is used by subclasses to enumerate all available harvester properties that are implemented by them. Property names are appended to the supplied Set.
The public API for client code requesting property names is getValidHarvesterPropertyNames().
See Also:
getValidHarvesterPropertyNames()

public final Set<String> getValidHarvesterPropertyNames()
Returns the Set of harvester property names that this harvester supports.
This method is called on Config loading to check if all property names in the config file are correct.
You cannot override this method in your own implementation, as this method is
responsible for returning an unmodifiable Set.
For custom harvesters, append your property names in enumerateValidHarvesterPropertyNames(java.util.Set).
See Also:
enumerateValidHarvesterPropertyNames(java.util.Set)
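The split between an overridable enumerate method and a final getter that returns an unmodifiable Set can be sketched as below. PropertyNamesBase and DirectoryHarvesterSketch are stand-ins, not panFMP classes; the property name "directory", the use of TreeSet, and the convention of calling super first are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Stand-in base class; NOT the panFMP Harvester, only the same pattern.
abstract class PropertyNamesBase {
    // Subclasses append their own names here (assumed convention: call super first).
    protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
        props.add("harvestMessageStep");
        props.add("maxBufferedIndexChanges");
    }

    // final: callers always receive an unmodifiable view, whatever subclasses do.
    public final Set<String> getValidHarvesterPropertyNames() {
        Set<String> props = new TreeSet<>();
        enumerateValidHarvesterPropertyNames(props);
        return Collections.unmodifiableSet(props);
    }
}

// Hypothetical custom harvester adding one property of its own.
class DirectoryHarvesterSketch extends PropertyNamesBase {
    @Override
    protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
        super.enumerateValidHarvesterPropertyNames(props);
        props.add("directory"); // hypothetical property name
    }
}
```

Client code would call only getValidHarvesterPropertyNames(); any attempt to modify the returned Set fails with UnsupportedOperationException.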
public abstract void harvest() throws Exception
This method is called by the harvester after opening it with open(de.pangaea.metadataportal.config.SingleIndexConfig). Override this
method in your harvester class.
This method should harvest files from somewhere, generate MetadataDocuments, and add
them with addDocument(de.pangaea.metadataportal.harvester.MetadataDocument).
Throws:
Exception - of any type.
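A custom harvest() implementation that generates documents and passes each one to addDocument could look roughly like this. The block is self-contained and uses stand-ins only: HarvestBaseSketch replaces the real Harvester, plain strings replace MetadataDocument, and addDocument here merely counts and records a message every harvestMessageStep documents instead of handing work to background threads.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the documented base class; NOT the panFMP Harvester.
abstract class HarvestBaseSketch {
    protected int harvestCount = 0;
    protected int harvestMessageStep = 100; // documented default
    final List<String> messages = new ArrayList<>();

    // Simplified: counts the document and records a status message at every step.
    protected void addDocument(String mdoc) {
        harvestCount++;
        if (harvestCount % harvestMessageStep == 0) {
            messages.add(harvestCount + " documents harvested so far.");
        }
    }

    public abstract void harvest() throws Exception;
}

// Hypothetical harvester that "harvests" from an in-memory list.
class ListHarvesterSketch extends HarvestBaseSketch {
    private final List<String> source;

    ListHarvesterSketch(List<String> source) {
        this.source = source;
    }

    @Override
    public void harvest() {
        for (String xml : source) {
            addDocument(xml); // one call per generated document
        }
    }
}
```

With 250 source documents and the default step of 100, this sketch records two status messages, after the 100th and the 200th document.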