public abstract class SingleFileEntitiesHarvester extends Harvester
Abstract harvester class for single file entities (like files from a web page or from a local directory). The harvester makes it possible to add XML documents given by a Source to the index. These are harvested, but if a fatal parse error occurs, the harvester will either stop harvesting (as it would with OAI-PMH), ignore the document, or delete it (if it exists in the index), depending on the harvester property "parseErrorAction".
This panFMP harvester supports the following harvester properties in addition to the default ones:
parseErrorAction: What to do if a parse error occurs? Can be DELETEDOCUMENT (default is to ignore the document)
deleteMissingDocuments: Remove documents after harvesting that were deleted from the source (may be a heavy operation). (default: true)
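The three parse error behaviours described above (stop, ignore, delete) can be pictured as a simple dispatch. The following is only a minimal sketch with stand-in types, not panFMP code: the class `ParseErrorActionSketch`, its `index` set, and the enum constants `STOP` and `IGNOREDOCUMENT` are hypothetical names chosen for illustration; only `DELETEDOCUMENT` appears in the text.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative stand-in for parse error handling; not the panFMP implementation. */
final class ParseErrorActionSketch {
    // Only DELETEDOCUMENT is named in the documentation above. STOP and
    // IGNOREDOCUMENT are hypothetical names for the other two behaviours.
    enum ParseErrorAction { STOP, IGNOREDOCUMENT, DELETEDOCUMENT }

    private final ParseErrorAction action;
    // Hypothetical stand-in for the index: just a set of document identifiers.
    final Set<String> index = new HashSet<>();

    ParseErrorActionSketch(ParseErrorAction action) {
        this.action = action;
    }

    /** Called when parsing the document with the given identifier failed fatally. */
    void onParseError(String identifier, Exception cause) {
        switch (action) {
            case STOP:
                // Abort the whole harvesting run.
                throw new IllegalStateException("Harvesting stopped at " + identifier, cause);
            case IGNOREDOCUMENT:
                // Leave the index untouched and continue with the next document.
                return;
            case DELETEDOCUMENT:
                // Remove the document if it exists in the index, then continue.
                index.remove(identifier);
                return;
            default:
                throw new AssertionError(action);
        }
    }

    public static void main(String[] args) {
        ParseErrorActionSketch sketch = new ParseErrorActionSketch(ParseErrorAction.DELETEDOCUMENT);
        sketch.index.add("doc-1");
        sketch.onParseError("doc-1", new Exception("malformed XML"));
        System.out.println("doc-1 still indexed: " + sketch.index.contains("doc-1"));
    }
}
```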
- Author: Uwe Schindler
Method Description
addDocument(String identifier, long lastModified, Source xml)
addDocument(String identifier, Instant lastModified, Source xml)
cancelMissingDocumentDelete() - Disables the property "deleteMissingDocuments" for this instance.
close(boolean cleanShutdown) - Closes harvester.
enumerateValidHarvesterPropertyNames(Set<String> props) - This method is used by subclasses to enumerate all available harvester properties that are implemented by them.
Methods inherited from class de.pangaea.metadataportal.harvester.Harvester
addDocument, createMetadataDocumentInstance, deleteDocument, finishReindex, getValidHarvesterPropertyNames, harvest, isAllIndexes, isClosed, isDocumentOutdated, main, open, prepareReindex, runHarvester, runHarvester, setHarvestingDateReference, setValidIdentifiers
public void close(boolean cleanShutdown) throws Exception
Description copied from class: Harvester
Closes harvester. All resources are freed and the
cleanShutdown - enables writing of status information to the Elasticsearch instance for the next harvesting. If an error occurred during harvesting, this should not be done.
Exception - if an exception occurs during closing (various types of exceptions can be thrown). Exceptions can be thrown asynchronously and may not relate to the document currently being processed.
protected final void addDocument(String identifier, long lastModified, Source xml) throws Exception
Adds a document to the Harvester.processor working in the background. If a parsing error occurs, the document is handled according to parseErrorAction. It is also added to the valid identifiers (if unseen documents should be deleted).
identifier - is the document's identifier in the index
lastModified - is the last-modification date, which is used to calculate the next harvesting start date. If the document is older than the last harvesting, it is skipped.
xml - is the transformer source of the document, or null to only update the document status (lastModified) and add it to the valid identifiers
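The parameter rules above (skip outdated documents, treat a null source as a status-only update, always record the identifier as valid) can be sketched as follows. This is a simplified illustration under stated assumptions, not the panFMP implementation: `AddDocumentSketch`, `lastHarvested`, and `queuedForIndexing` are hypothetical names, and the order of the checks (identifier recorded before the date check) is an assumption.

```java
import java.io.StringReader;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;

/** Illustrative stand-in for the addDocument contract; not panFMP code. */
final class AddDocumentSketch {
    // Hypothetical: time of the previous harvesting run, or null on the first run.
    private final Instant lastHarvested;
    // Identifiers seen in this run; documents not listed here could be deleted later.
    final Set<String> validIdentifiers = new HashSet<>();
    // Hypothetical stand-in for the background processor's work queue.
    final List<String> queuedForIndexing = new ArrayList<>();

    AddDocumentSketch(Instant lastHarvested) {
        this.lastHarvested = lastHarvested;
    }

    void addDocument(String identifier, Instant lastModified, Source xml) {
        // Assumption: the identifier counts as valid even if the document is skipped,
        // otherwise unchanged documents would be treated as missing.
        validIdentifiers.add(identifier);
        if (lastHarvested != null && lastModified.isBefore(lastHarvested)) {
            return; // older than the last harvesting: skipped
        }
        if (xml == null) {
            return; // status-only update, nothing to parse
        }
        queuedForIndexing.add(identifier);
    }

    public static void main(String[] args) {
        AddDocumentSketch harvester = new AddDocumentSketch(Instant.parse("2020-01-01T00:00:00Z"));
        Source xml = new StreamSource(new StringReader("<doc/>"));
        harvester.addDocument("old", Instant.parse("2019-06-01T00:00:00Z"), xml);
        harvester.addDocument("new", Instant.parse("2020-06-01T00:00:00Z"), xml);
        System.out.println("queued: " + harvester.queuedForIndexing);
        System.out.println("valid identifiers: " + harvester.validIdentifiers.size());
    }
}
```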
protected void cancelMissingDocumentDelete()
Disables the property "deleteMissingDocuments" for this instance. This can be used when the container (like a ZIP file) was not modified and the documents it contains are therefore not enumerated. Call this to prevent deletion of all these documents.
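To see why cancelling matters, consider a sketch of missing-document deletion: after a run, every indexed identifier that was not seen is removed. If an unmodified container is never opened, none of its documents are seen, so all of them would be removed unless the deletion is cancelled. The class below is a hypothetical stand-in (`MissingDocumentDeleteSketch`, `markSeen`, and `deleteMissing` are illustrative names), and representing "cancelled" as a null set is an assumption.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative stand-in for "deleteMissingDocuments" handling; not panFMP code. */
final class MissingDocumentDeleteSketch {
    // Identifiers seen during this run; null means deletion of missing documents is off.
    private Set<String> validIdentifiers = new HashSet<>();

    void markSeen(String identifier) {
        if (validIdentifiers != null) {
            validIdentifiers.add(identifier);
        }
    }

    /** Mirrors the idea of cancelMissingDocumentDelete(): stop tracking, delete nothing. */
    void cancelMissingDocumentDelete() {
        validIdentifiers = null;
    }

    /** Removes all unseen identifiers from the given index and returns what was removed. */
    Set<String> deleteMissing(Set<String> index) {
        Set<String> removed = new TreeSet<>();
        if (validIdentifiers == null) {
            return removed; // cancelled: keep everything
        }
        removed.addAll(index);
        removed.removeAll(validIdentifiers);
        index.removeAll(removed);
        return removed;
    }

    public static void main(String[] args) {
        Set<String> index = new HashSet<>(Set.of("a.xml", "b.xml", "c.xml"));
        MissingDocumentDeleteSketch sketch = new MissingDocumentDeleteSketch();
        // The ZIP container was not modified, so no document is enumerated.
        sketch.cancelMissingDocumentDelete();
        System.out.println("removed: " + sketch.deleteMissing(index));
    }
}
```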
enumerateValidHarvesterPropertyNames
Description copied from class: Harvester
This method is used by subclasses to enumerate all available harvester properties that are implemented by them. Override this method in your own implementation and append all harvester property names to the supplied Set. The public API for client code requesting property names is
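The override-and-append pattern for property names might look like the sketch below. It uses stand-in classes rather than the real panFMP types: `BaseHarvesterSketch`, `SingleFileSketch`, and the property name `hypotheticalBaseProperty` are invented for illustration. The accessor name `getValidHarvesterPropertyNames` is taken from the inherited method list, but its body here is an assumption about how such an accessor could collect the names.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical base class standing in for Harvester; not panFMP code. */
class BaseHarvesterSketch {
    /** Subclasses override this and append their own property names. */
    protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
        props.add("hypotheticalBaseProperty"); // placeholder for the default properties
    }

    /** Assumed shape of the public accessor: collects the names from the whole hierarchy. */
    public final Set<String> getValidHarvesterPropertyNames() {
        Set<String> props = new TreeSet<>();
        enumerateValidHarvesterPropertyNames(props);
        return Collections.unmodifiableSet(props);
    }
}

/** Hypothetical subclass adding the two properties documented for this harvester. */
class SingleFileSketch extends BaseHarvesterSketch {
    @Override
    protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
        // Call super first so inherited property names are kept.
        super.enumerateValidHarvesterPropertyNames(props);
        props.add("parseErrorAction");
        props.add("deleteMissingDocuments");
    }

    public static void main(String[] args) {
        System.out.println(new SingleFileSketch().getValidHarvesterPropertyNames());
    }
}
```

Calling `super` first is the design choice that matters here: a subclass that forgets it would silently drop the property names of its parent classes.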