Class SingleFileEntitiesHarvester
- java.lang.Object
-
- de.pangaea.metadataportal.harvester.Harvester
-
- de.pangaea.metadataportal.harvester.SingleFileEntitiesHarvester
-
- Direct Known Subclasses:
DirectoryHarvester, ElasticsearchHarvester, PanFMP1IndexHarvester, PushWrapperHarvester, WebCrawlingHarvester, ZipFileHarvester
public abstract class SingleFileEntitiesHarvester extends Harvester
Abstract harvester class for single-file entities (such as files from a web page or from a local directory). The harvester makes it possible to add XML documents given by a Source to the index. If a fatal parse error occurs while harvesting, the harvester either stops harvesting (as it would with OAI-PMH), ignores the document, or deletes it (if it exists in the index), depending on the harvester property "parseErrorAction".
This panFMP harvester supports the following harvester properties in addition to the default ones:
- parseErrorAction: what to do if a parse error occurs. Can be STOP, IGNOREDOCUMENT, or DELETEDOCUMENT (default is to ignore the document).
- deleteMissingDocuments: remove documents after harvesting that were deleted from the source (may be a heavy operation). Default: true.
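The two properties above could be read from a configuration roughly as follows. This is a minimal sketch, not panFMP's own code: the class ParseErrorActionSketch, the stand-in enum ParseErrorAction, the use of java.util.Properties, and the case-insensitive matching are all assumptions made for illustration. Only the property names, the three values, and the defaults come from the description above.

```java
import java.util.Locale;
import java.util.Properties;

public class ParseErrorActionSketch {
    /** Stand-in for the three documented values; hypothetical, not the library's own type. */
    enum ParseErrorAction { STOP, IGNOREDOCUMENT, DELETEDOCUMENT }

    /** Reads "parseErrorAction", falling back to the documented default (ignore the document). */
    static ParseErrorAction parseErrorAction(Properties props) {
        String value = props.getProperty("parseErrorAction");
        if (value == null || value.trim().isEmpty()) {
            return ParseErrorAction.IGNOREDOCUMENT;
        }
        // Assumption: values are matched case-insensitively; unknown values throw IllegalArgumentException.
        return ParseErrorAction.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    /** Reads "deleteMissingDocuments", falling back to the documented default (true). */
    static boolean deleteMissingDocuments(Properties props) {
        return Boolean.parseBoolean(props.getProperty("deleteMissingDocuments", "true"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("parseErrorAction", "DELETEDOCUMENT");
        System.out.println(parseErrorAction(props));       // DELETEDOCUMENT
        System.out.println(deleteMissingDocuments(props)); // true
    }
}
```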
- Author:
- Uwe Schindler
-
-
Field Summary
-
Fields inherited from class de.pangaea.metadataportal.harvester.Harvester
fromDateReference, harvestCount, HARVESTER_METADATA_FIELD_LAST_HARVESTED, harvestMessageStep, iconfig, log, processor
-
-
Constructor Summary
Constructors
Modifier, Constructor, Description
SingleFileEntitiesHarvester(HarvesterConfig iconfig)
protected SingleFileEntitiesHarvester(HarvesterConfig iconfig, DocumentErrorAction parseErrorAction)
-
Method Summary
All Methods, Instance Methods, Concrete Methods
Modifier and Type, Method, Description
protected void addDocument(String identifier, long lastModified, Source xml): Adds a document to the Harvester.processor working in the background.
protected void addDocument(String identifier, Instant lastModified, Source xml): Adds a document to the Harvester.processor working in the background.
protected void cancelMissingDocumentDelete(): Disables the property "deleteMissingDocuments" for this instance.
void close(boolean cleanShutdown): Closes harvester.
protected void enumerateValidHarvesterPropertyNames(Set<String> props): This method is used by subclasses to enumerate all available harvester properties that are implemented by them.
-
Methods inherited from class de.pangaea.metadataportal.harvester.Harvester
addDocument, createMetadataDocumentInstance, deleteDocument, finishReindex, getValidHarvesterPropertyNames, harvest, isAllIndexes, isClosed, isDocumentOutdated, main, open, prepareReindex, runHarvester, runHarvester, setHarvestingDateReference, setValidIdentifiers
-
-
-
-
Constructor Detail
-
SingleFileEntitiesHarvester
public SingleFileEntitiesHarvester(HarvesterConfig iconfig)
-
SingleFileEntitiesHarvester
protected SingleFileEntitiesHarvester(HarvesterConfig iconfig, DocumentErrorAction parseErrorAction)
-
-
Method Detail
-
close
public void close(boolean cleanShutdown) throws Exception
Description copied from class: Harvester
Closes harvester. All resources are freed and the Harvester.processor is closed.
- Overrides:
close in class Harvester
- Parameters:
cleanShutdown - enables writing of status information to the Elasticsearch instance for the next harvesting. If an error occurred during harvesting, this should not be done.
- Throws:
Exception - if an exception occurs during closing (various types of exceptions can be thrown). Exceptions can be thrown asynchronously and may not refer to the document that caused them.
-
addDocument
protected final void addDocument(String identifier, long lastModified, Source xml) throws Exception
Adds a document to the Harvester.processor working in the background. If a parsing error occurs, the document is handled according to parseErrorAction. It is also added to the valid identifiers (if unseen documents should be deleted).
- Parameters:
identifier - is the document's identifier in the index
lastModified - is the last-modification date, which is used to calculate the next harvesting start date. If the document is older than the last harvesting, it is skipped.
xml - is the transformer source of the document, or null to only update the document status (lastModified) and add it to the valid identifiers
- Throws:
Exception
- See Also:
Harvester.addDocument(MetadataDocument)
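The skip rule described for lastModified can be illustrated as follows. This is a sketch under stated assumptions, not the harvester's actual logic: the class LastModifiedSketch and the helper names toInstant and needsHarvest are hypothetical, the long value is assumed to be milliseconds since the epoch (as returned by java.io.File.lastModified()), and a missing date on either side is assumed to mean "harvest the document".

```java
import java.time.Instant;

public class LastModifiedSketch {
    /** Assumption: the long timestamp is epoch milliseconds. */
    static Instant toInstant(long lastModifiedMillis) {
        return Instant.ofEpochMilli(lastModifiedMillis);
    }

    /**
     * Returns true if the document should be harvested, i.e. it was modified
     * after the last harvesting. Unknown dates are treated as "harvest it".
     */
    static boolean needsHarvest(Instant lastModified, Instant lastHarvested) {
        if (lastModified == null || lastHarvested == null) {
            return true;
        }
        return lastModified.isAfter(lastHarvested);
    }

    public static void main(String[] args) {
        Instant lastHarvested = Instant.parse("2024-01-01T00:00:00Z");
        Instant older = toInstant(lastHarvested.toEpochMilli() - 1000L);
        Instant newer = toInstant(lastHarvested.toEpochMilli() + 1000L);
        System.out.println(needsHarvest(older, lastHarvested)); // false: skipped
        System.out.println(needsHarvest(newer, lastHarvested)); // true: harvested
    }
}
```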
-
addDocument
protected void addDocument(String identifier, Instant lastModified, Source xml) throws Exception
Adds a document to the Harvester.processor working in the background.
- Throws:
Exception
- See Also:
addDocument(String,Instant,Source)
-
cancelMissingDocumentDelete
protected void cancelMissingDocumentDelete()
Disables the property "deleteMissingDocuments" for this instance. This can be used when the container (like a ZIP file) was not modified and the documents it contains are therefore not enumerated. Call this method to prevent deletion of all these documents.
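The interplay between valid identifiers and missing-document deletion can be pictured with the following sketch. It is an illustration only, assuming that seen identifiers are collected in a set and that cancelling simply discards that set; the class MissingDocumentSketch and the methods markSeen and identifiersToDelete are hypothetical and do not reflect how panFMP stores or deletes documents.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class MissingDocumentSketch {
    // Identifiers seen during this harvest; null means deletion was cancelled.
    private Set<String> validIdentifiers = new HashSet<>();

    /** Records an identifier as still present in the source. */
    void markSeen(String identifier) {
        if (validIdentifiers != null) {
            validIdentifiers.add(identifier);
        }
    }

    /** Mirrors the documented idea: no documents are deleted as "missing" afterwards. */
    void cancelMissingDocumentDelete() {
        validIdentifiers = null;
    }

    /** Returns the indexed identifiers that were not seen, or nothing if cancelled. */
    Set<String> identifiersToDelete(Set<String> indexedIdentifiers) {
        if (validIdentifiers == null) {
            return new TreeSet<>();
        }
        Set<String> missing = new TreeSet<>(indexedIdentifiers);
        missing.removeAll(validIdentifiers);
        return missing;
    }

    public static void main(String[] args) {
        Set<String> indexed = new HashSet<>();
        indexed.add("doc-a");
        indexed.add("doc-b");

        MissingDocumentSketch harvest = new MissingDocumentSketch();
        harvest.markSeen("doc-a");
        System.out.println(harvest.identifiersToDelete(indexed)); // [doc-b]

        // Container unchanged, nothing enumerated: cancel so nothing is deleted.
        MissingDocumentSketch unchanged = new MissingDocumentSketch();
        unchanged.cancelMissingDocumentDelete();
        System.out.println(unchanged.identifiersToDelete(indexed)); // []
    }
}
```

Without the cancel call, a harvest that enumerates nothing would report every indexed identifier as missing, which is the situation the method is meant to avoid.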
-
enumerateValidHarvesterPropertyNames
protected void enumerateValidHarvesterPropertyNames(Set<String> props)
Description copied from class: Harvester
This method is used by subclasses to enumerate all available harvester properties that are implemented by them. Override this method in your own implementation and append all harvester property names to the supplied Set. The public API for client code requesting property names is Harvester.getValidHarvesterPropertyNames().
- Overrides:
enumerateValidHarvesterPropertyNames in class Harvester
- See Also:
Harvester.getValidHarvesterPropertyNames()
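The override pattern described above might look like the following sketch. The classes BaseHarvester and FileHarvester are hypothetical stand-ins so that the example compiles on its own; "baseProperty" is a placeholder name, and calling the superclass implementation first is an assumption about good practice rather than something the documentation states.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class PropertyNamesSketch {
    /** Hypothetical stand-in for the base class; not the real Harvester. */
    static class BaseHarvester {
        protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
            props.add("baseProperty"); // placeholder for the default properties
        }

        /** Public entry point that collects the names from the whole hierarchy. */
        public final Set<String> getValidHarvesterPropertyNames() {
            Set<String> props = new TreeSet<>();
            enumerateValidHarvesterPropertyNames(props);
            return props;
        }
    }

    /** Hypothetical subclass that appends its own property names. */
    static class FileHarvester extends BaseHarvester {
        @Override
        protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
            super.enumerateValidHarvesterPropertyNames(props); // keep inherited names
            props.addAll(Arrays.asList("parseErrorAction", "deleteMissingDocuments"));
        }
    }

    public static void main(String[] args) {
        System.out.println(new FileHarvester().getValidHarvesterPropertyNames());
        // [baseProperty, deleteMissingDocuments, parseErrorAction]
    }
}
```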
-
-