java.lang.Object
  extended by de.pangaea.metadataportal.harvester.Harvester
    extended by de.pangaea.metadataportal.harvester.OAIHarvesterBase
public abstract class OAIHarvesterBase
extends Harvester
Abstract base class for OAI harvesting support in panFMP. Use one of the subclasses for harvesting OAI-PMH or OAI Static Repositories.
This harvester supports the following additional harvester properties:
- setSpec: OAI set to harvest (default: none)
- retryCount: how often to retry on HTTP errors (default: 5)
- retryAfterSeconds: time between retries in seconds (default: 60)
- timeoutAfterSeconds: HTTP timeout for harvesting in seconds
- metadataPrefix: OAI metadata prefix to harvest
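As a minimal sketch, assuming the harvester properties arrive as a plain java.util.Properties object (panFMP's real configuration API is not shown on this page), the documented defaults could be applied as follows. The class name OaiHarvesterSettings, the helper intProp, passing the timeout default in as a parameter (the value of DEFAULT_TIMEOUT is not given here), treating setSpec as a single set, and rejecting a missing metadataPrefix are all illustrative assumptions.

```java
import java.util.Collections;
import java.util.Properties;
import java.util.Set;

// Hypothetical settings holder; panFMP's real configuration API is not shown here.
final class OaiHarvesterSettings {
    // Documented defaults: 5 retries, 60 seconds between retries.
    static final int DEFAULT_RETRY_COUNT = 5;
    static final int DEFAULT_RETRY_TIME = 60;

    final Set<String> sets;       // null means "harvest all sets"
    final int retryCount;         // number of retries on HTTP errors
    final int retryTime;          // seconds to wait between retries
    final int timeout;            // HTTP timeout in seconds
    final String metadataPrefix;  // OAI metadata prefix, e.g. "oai_dc"

    OaiHarvesterSettings(Properties props, int defaultTimeout) {
        String setSpec = props.getProperty("setSpec");
        sets = (setSpec == null) ? null : Collections.singleton(setSpec);
        retryCount = intProp(props, "retryCount", DEFAULT_RETRY_COUNT);
        retryTime = intProp(props, "retryAfterSeconds", DEFAULT_RETRY_TIME);
        timeout = intProp(props, "timeoutAfterSeconds", defaultTimeout);
        metadataPrefix = props.getProperty("metadataPrefix");
        // Assumption: a missing prefix is a configuration error, because
        // OAI-PMH ListRecords requests cannot be issued without one.
        if (metadataPrefix == null) {
            throw new IllegalArgumentException("metadataPrefix is required");
        }
    }

    // Reads an integer property, falling back to the given default if absent.
    private static int intProp(Properties props, String key, int fallback) {
        String value = props.getProperty(key);
        return (value == null) ? fallback : Integer.parseInt(value.trim());
    }
}
```

Only setSpec, retryCount, retryAfterSeconds, timeoutAfterSeconds and metadataPrefix come from the list above; everything else in the sketch is scaffolding.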
| Field Summary | |
|---|---|
| static int | DEFAULT_RETRY_COUNT |
| static int | DEFAULT_RETRY_TIME |
| static int | DEFAULT_TIMEOUT |
| protected boolean | filterIncomingSets: The harvester should filter incoming documents according to its set metadata. |
| protected String | metadataPrefix: the metadata prefix used, from the configuration |
| static String | OAI_NS |
| static String | OAI_STATICREPOSITORY_NS |
| protected int | retryCount: the retryCount from the configuration |
| protected int | retryTime: the retryTime from the configuration |
| protected Set<String> | sets: the sets to harvest from the configuration, null to harvest all |
| protected int | timeout: the timeout from the configuration |
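The retryCount and retryTime fields suggest a retry loop around HTTP requests. The following is a sketch under stated assumptions, not panFMP's code: the class RetrySketch is hypothetical, IOException is assumed to be the retryable failure, and retryCount is read as the number of retries after the first attempt.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Hypothetical helper showing how retryCount and retryTime could drive a retry loop.
final class RetrySketch {
    private RetrySketch() {}

    /**
     * Calls action; on IOException retries up to retryCount more times,
     * sleeping retryTimeSeconds before each retry. Other exceptions propagate
     * immediately. After the last failed attempt the IOException is rethrown.
     */
    static <T> T withRetries(Callable<T> action, int retryCount, int retryTimeSeconds)
            throws Exception {
        int retries = 0;
        while (true) {
            try {
                return action.call();
            } catch (IOException e) {
                if (retries >= retryCount) {
                    throw e; // retries exhausted: report the last error
                }
                retries++;
                TimeUnit.SECONDS.sleep(retryTimeSeconds);
            }
        }
    }
}
```

With the documented defaults this would allow up to six attempts spaced 60 seconds apart; the timeout field would apply separately to each individual request.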
| Fields inherited from class de.pangaea.metadataportal.harvester.Harvester |
|---|
fromDateReference, harvestCount, harvestMessageStep, iconfig, index, log |
| Constructor Summary | |
|---|---|
| OAIHarvesterBase() | |
| Method Summary | |
|---|---|
| void | addDocument(MetadataDocument mdoc): Adds a document to the Harvester.index working in the background. |
| void | close(boolean cleanShutdown): Closes the harvester. |
| protected MetadataDocument | createMetadataDocumentInstance(): Creates an instance of MetadataDocument and initializes it with the index config. |
| protected boolean | doParse(ExtendedDigester dig, String url, AtomicReference<Date> checkModifiedDate): Harvests a URL using the supplied digester. |
| protected void | enumerateValidHarvesterPropertyNames(Set<String> props): This method is used by subclasses to enumerate all available harvester properties that they implement. |
| protected EntityResolver | getEntityResolver(EntityResolver parent): Returns an EntityResolver that resolves all HTTP URLs using getInputSource(java.net.URL, java.util.concurrent.atomic.AtomicReference). |
| protected InputSource | getInputSource(URL url, AtomicReference<Date> checkModifiedDate): Returns a SAX InputSource for retrieving the stream data of a URL. |
| protected org.apache.commons.digester.ObjectCreationFactory | getMetadataDocumentFactory(): Returns a factory for creating the MetadataDocuments in Digester code (using FactoryCreateRule). |
| void | open(SingleIndexConfig iconfig): Opens the harvester for harvesting documents into the index described by the given SingleIndexConfig. |
| protected void | reset(): Resets the internal variables. |
| Methods inherited from class de.pangaea.metadataportal.harvester.Harvester |
|---|
getValidHarvesterPropertyNames, harvest, isClosed, isDocumentOutdated, isDocumentOutdated, main, runHarvester, runHarvester, setHarvestingDateReference |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final String OAI_NS
public static final String OAI_STATICREPOSITORY_NS
public static final int DEFAULT_RETRY_TIME
public static final int DEFAULT_RETRY_COUNT
public static final int DEFAULT_TIMEOUT
protected String metadataPrefix
protected Set<String> sets
the sets to harvest from the configuration, null to harvest all
protected int retryCount
protected int retryTime
protected int timeout
protected boolean filterIncomingSets
The harvester should filter incoming documents according to its set metadata. Default is true.
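The interplay of sets (null means "harvest all") and filterIncomingSets can be read as a simple keep-or-drop decision per harvested record. This sketch assumes a record exposes its OAI setSpec values as a collection of strings; the class SetFilterSketch and its accept method are hypothetical and not part of panFMP.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch of the decision the sets/filterIncomingSets fields suggest.
final class SetFilterSketch {
    private SetFilterSketch() {}

    /**
     * Returns true if a harvested record carrying recordSets should be kept.
     * configuredSets == null means "harvest all", so every record is kept.
     */
    static boolean accept(Set<String> configuredSets, boolean filterIncomingSets,
                          Collection<String> recordSets) {
        if (!filterIncomingSets || configuredSets == null) {
            return true;
        }
        // Keep the record only if it belongs to at least one configured set.
        return !Collections.disjoint(configuredSets, recordSets);
    }
}
```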
| Constructor Detail |
|---|
public OAIHarvesterBase()
| Method Detail |
|---|
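public void open(SingleIndexConfig iconfig)
throws Exception
Description copied from class: Harvester
Opens the harvester for harvesting documents into the index described by the given SingleIndexConfig. Opens Harvester.index for use in the Harvester.harvest() method.
Overrides:
open in class Harvester
Throws:
Exception - if an exception occurs during opening (various types of exceptions can be thrown).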
public void addDocument(MetadataDocument mdoc)
throws IndexBuilderBackgroundFailure,
InterruptedException
Description copied from class: Harvester
Adds a document to the Harvester.index working in the background.
Overrides:
addDocument in class Harvester
Throws:
IndexBuilderBackgroundFailure - if an error occurred in the background thread. Exceptions can be thrown asynchronously and may not affect the current document. The real exception is thrown again in Harvester.close(boolean).
InterruptedException - if the wait operation was interrupted.
protected MetadataDocument createMetadataDocumentInstance()
Description copied from class: Harvester
Creates an instance of MetadataDocument and initializes it with the index config.
Overrides:
createMetadataDocumentInstance in class Harvester
protected org.apache.commons.digester.ObjectCreationFactory getMetadataDocumentFactory()
Returns a factory for creating the MetadataDocuments in Digester code (using FactoryCreateRule).
See Also:
createMetadataDocumentInstance()
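The addDocument contract (a failure in the background thread may surface on a later call and is thrown again when the harvester is closed) follows a common "remember the first background error" pattern. This is a minimal sketch of that pattern only: BackgroundIndexerSketch, its index method, and the use of IllegalStateException in place of IndexBuilderBackgroundFailure are all hypothetical, and String stands in for MetadataDocument.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; IllegalStateException stands in for IndexBuilderBackgroundFailure.
final class BackgroundIndexerSketch implements AutoCloseable {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final AtomicReference<Throwable> failure = new AtomicReference<>();

    /** Queues a document; may report a failure caused by an earlier document. */
    void addDocument(String doc) {
        rethrowIfFailed();
        worker.execute(() -> {
            try {
                index(doc);
            } catch (Throwable t) {
                failure.compareAndSet(null, t); // remember only the first failure
            }
        });
    }

    /** Hypothetical indexing step; in this sketch it rejects empty documents. */
    private void index(String doc) {
        if (doc.isEmpty()) {
            throw new IllegalStateException("cannot index empty document");
        }
    }

    private void rethrowIfFailed() {
        Throwable t = failure.get();
        if (t != null) {
            throw new IllegalStateException("background failure", t);
        }
    }

    @Override
    public void close() throws InterruptedException {
        worker.shutdown();
        worker.awaitTermination(1, TimeUnit.MINUTES);
        rethrowIfFailed(); // the real exception surfaces again on close
    }
}
```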
protected boolean doParse(ExtendedDigester dig,
String url,
AtomicReference<Date> checkModifiedDate)
throws Exception
Harvests a URL using the supplied digester.
Parameters:
dig - the digester instance.
url - the URL to be parsed by this digester instance.
checkModifiedDate - for static repositories, a reference to a Date can be given to check the last modification; in this case, false is returned if the URL was not modified. If it was modified, the reference contains a new Date object with the new modification date. Supply null to skip the last-modification check; no modification date is then returned (as there is no reference).
Returns:
true if harvested, false if not modified and no harvesting was done.
Throws:
Exception
protected EntityResolver getEntityResolver(EntityResolver parent)
Returns an EntityResolver that resolves all HTTP URLs using getInputSource(java.net.URL, java.util.concurrent.atomic.AtomicReference).
Parameters:
parent - an EntityResolver that receives all unprocessed requests
See Also:
getInputSource(java.net.URL, java.util.concurrent.atomic.AtomicReference)
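The getEntityResolver description (HTTP URLs are handled locally, everything else goes to the parent) maps onto the standard SAX EntityResolver interface. In this sketch the class HttpEntityResolverSketch and its Opener interface are hypothetical; Opener stands in for getInputSource, whose internals are not shown here. A null parent falls back to the parser's default resolution, as SAX specifies for a null return.

```java
import java.io.IOException;
import java.util.Locale;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch of an HTTP-aware resolver that delegates the rest to a parent.
final class HttpEntityResolverSketch implements EntityResolver {
    /** Hypothetical opener standing in for getInputSource(URL, AtomicReference). */
    interface Opener {
        InputSource open(String url) throws IOException;
    }

    private final Opener opener;
    private final EntityResolver parent; // may be null

    HttpEntityResolverSketch(Opener opener, EntityResolver parent) {
        this.opener = opener;
        this.parent = parent;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        if (systemId != null) {
            String lower = systemId.toLowerCase(Locale.ROOT);
            if (lower.startsWith("http://") || lower.startsWith("https://")) {
                InputSource src = opener.open(systemId);
                src.setPublicId(publicId);
                src.setSystemId(systemId); // keeps relative references resolvable
                return src;
            }
        }
        // Unprocessed requests go to the parent (null = parser default behaviour).
        return (parent == null) ? null : parent.resolveEntity(publicId, systemId);
    }
}
```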
protected InputSource getInputSource(URL url,
AtomicReference<Date> checkModifiedDate)
throws IOException
Returns a SAX InputSource for retrieving the stream data of a URL. It is optimized for HTTP(S) compression and timeout checking.
Parameters:
url - the URL to open
checkModifiedDate - for static repositories, a reference to a Date can be given to check the last modification; in this case, null is returned if the URL was not modified. If it was modified, the reference contains a new Date object with the new modification date. Supply null to skip the last-modification check; no modification date is then returned (as there is no reference).
Throws:
IOException
See Also:
getEntityResolver(org.xml.sax.EntityResolver)
protected void reset()
Resets the internal variables.
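One plausible way to get the behaviour described for getInputSource (compressed HTTP transfer, timeouts, and a null result when the resource is unchanged) uses only java.net.URLConnection. This is a sketch, not the panFMP implementation: the class InputSourceSketch, the timeoutSeconds parameter, and the decode helper are assumptions, and storing null when the server reports no Last-Modified date is an illustrative choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.util.Date;
import java.util.concurrent.atomic.AtomicReference;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;
import org.xml.sax.InputSource;

// Hypothetical sketch of a compression- and timeout-aware InputSource factory.
final class InputSourceSketch {
    private InputSourceSketch() {}

    /** Returns null if checkModifiedDate is set and the server answers 304 Not Modified. */
    static InputSource open(URL url, AtomicReference<Date> checkModifiedDate, int timeoutSeconds)
            throws IOException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(timeoutSeconds * 1000);
        conn.setReadTimeout(timeoutSeconds * 1000);
        if (conn instanceof HttpURLConnection) {
            conn.setRequestProperty("Accept-Encoding", "gzip, deflate");
        }
        if (checkModifiedDate != null && checkModifiedDate.get() != null) {
            conn.setIfModifiedSince(checkModifiedDate.get().getTime());
        }
        if (conn instanceof HttpURLConnection) {
            HttpURLConnection http = (HttpURLConnection) conn;
            if (http.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
                http.disconnect();
                return null; // unchanged: the caller skips harvesting
            }
        }
        if (checkModifiedDate != null) {
            long lastModified = conn.getLastModified(); // 0 if unknown
            checkModifiedDate.set(lastModified == 0 ? null : new Date(lastModified));
        }
        InputSource src = new InputSource(decode(conn.getInputStream(), conn.getContentEncoding()));
        src.setSystemId(url.toString());
        return src;
    }

    /** Wraps the raw stream according to the Content-Encoding header, if any. */
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if ("gzip".equalsIgnoreCase(contentEncoding)) {
            return new GZIPInputStream(raw);
        }
        if ("deflate".equalsIgnoreCase(contentEncoding)) {
            return new InflaterInputStream(raw); // expects zlib-wrapped data per HTTP spec
        }
        return raw;
    }
}
```

HttpURLConnection does not decompress responses on its own, which is why the sketch wraps the stream based on the Content-Encoding header.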
public void close(boolean cleanShutdown)
throws Exception
Description copied from class: Harvester
Closes the harvester. The Harvester.index is closed.
Overrides:
close in class Harvester
Parameters:
cleanShutdown - enables writing of status information to the index for the next harvesting. If an error occurred during harvesting, this should not be done.
Throws:
Exception - if an exception occurs during closing (various types of exceptions can be thrown). Exceptions can be thrown asynchronously and may not affect the current document.
protected void enumerateValidHarvesterPropertyNames(Set<String> props)
Description copied from class: Harvester
This method is used by subclasses to enumerate all available harvester properties that are implemented by them. The property names are added to the given Set.
The public API for client code requesting property names is Harvester.getValidHarvesterPropertyNames().
Overrides:
enumerateValidHarvesterPropertyNames in class Harvester
See Also:
Harvester.getValidHarvesterPropertyNames()
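The split between a protected enumeration hook and a public query method is a template-method pattern: each subclass calls super and then adds its own names. The sketch below uses hypothetical stand-in classes (BaseHarvesterSketch, OaiHarvesterSketch) rather than panFMP's Harvester; "exampleBaseProperty" is an invented placeholder, while the OAI names are the ones listed for this class.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for Harvester: collects property names across the hierarchy. */
abstract class BaseHarvesterSketch {
    /** Public entry point, analogous to getValidHarvesterPropertyNames(). */
    public final Set<String> validPropertyNames() {
        Set<String> props = new TreeSet<>();
        enumerateValidHarvesterPropertyNames(props);
        return Collections.unmodifiableSet(props);
    }

    /** Subclasses add their names and call super so inherited names are kept. */
    protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
        props.add("exampleBaseProperty"); // hypothetical base-class property
    }
}

/** Hypothetical OAI layer adding the properties documented for OAIHarvesterBase. */
class OaiHarvesterSketch extends BaseHarvesterSketch {
    @Override
    protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
        super.enumerateValidHarvesterPropertyNames(props);
        Collections.addAll(props, "setSpec", "retryCount", "retryAfterSeconds",
                "timeoutAfterSeconds", "metadataPrefix");
    }
}
```

Making the public method final and returning an unmodifiable set keeps client code on the public API while subclasses only extend the protected hook.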