java.lang.Object
- de.pangaea.metadataportal.harvester.Harvester
- - de.pangaea.metadataportal.harvester.SingleFileEntitiesHarvester

Direct Known Subclasses:

DirectoryHarvester, ElasticsearchHarvester, PanFMP1IndexHarvester, PushWrapperHarvester, WebCrawlingHarvester, ZipFileHarvester
```
public abstract class SingleFileEntitiesHarvester
extends Harvester
```
Abstract harvester class for single-file entities (such as files from a web page or from a local directory). The harvester adds XML documents, supplied as a Source, to the index. If a fatal parse error occurs while a document is harvested, the harvester either stops harvesting (as it would with OAI-PMH), ignores the document, or deletes it (if it exists in the index), depending on the harvester property "parseErrorAction".
This panFMP harvester supports the following harvester properties in addition to the default ones:
- parseErrorAction: the action to take when a parse error occurs. Can be STOP, IGNOREDOCUMENT, or DELETEDOCUMENT (default: IGNOREDOCUMENT)
- deleteMissingDocuments: after harvesting, remove documents from the index that were deleted from the source (this may be an expensive operation). (default: true)
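The three parseErrorAction values can be illustrated with a small stand-alone model. This is only a sketch and not panFMP code: the class `ParseErrorActionSketch`, its nested enum, the toy `Map` index, and the case-insensitive parsing of the property value are all assumptions made for illustration. Only the value names and the default come from the description above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in (not panFMP code) modelling the documented parseErrorAction values. */
final class ParseErrorActionSketch {

  /** Mirrors the documented property values; the enum itself is an assumption. */
  enum Action { STOP, IGNOREDOCUMENT, DELETEDOCUMENT }

  /** Resolves a property value; a missing value falls back to the documented default. */
  static Action resolve(String propertyValue) {
    if (propertyValue == null || propertyValue.trim().isEmpty()) {
      return Action.IGNOREDOCUMENT;
    }
    // Assumption: values are matched case-insensitively.
    return Action.valueOf(propertyValue.trim().toUpperCase(Locale.ROOT));
  }

  /**
   * Applies the action to a toy index (identifier -> XML).
   * Returns true if harvesting may continue, false if it should stop.
   */
  static boolean onParseError(Action action, String identifier, Map<String, String> index) {
    switch (action) {
      case STOP:
        return false;                // abort the whole harvest
      case DELETEDOCUMENT:
        index.remove(identifier);    // no-op if the document is not in the index
        return true;
      case IGNOREDOCUMENT:
      default:
        return true;                 // leave the index untouched and carry on
    }
  }

  public static void main(String[] args) {
    Map<String, String> index = new HashMap<>();
    index.put("doc-1", "<old/>");
    boolean goOn = onParseError(resolve("DELETEDOCUMENT"), "doc-1", index);
    System.out.println("continue=" + goOn + ", stillIndexed=" + index.containsKey("doc-1"));
  }
}
```

With STOP the caller would end the harvest; with the other two values harvesting continues and only the affected document is touched.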
Author:

Uwe Schindler

Field Summary
- Fields inherited from class de.pangaea.metadataportal.harvester.Harvester
  fromDateReference, harvestCount, HARVESTER_METADATA_FIELD_LAST_HARVESTED, harvestMessageStep, iconfig, log, processor

Constructor Summary

Constructors
Modifier	Constructor	Description
	`SingleFileEntitiesHarvester(HarvesterConfig iconfig)`
`protected`	`SingleFileEntitiesHarvester(HarvesterConfig iconfig, DocumentErrorAction parseErrorAction)`

Method Summary

Modifier and Type	Method	Description
`protected void`	`addDocument(String identifier, long lastModified, Source xml)`	Adds a document to the `Harvester.processor` working in the background.
`protected void`	`addDocument(String identifier, Instant lastModified, Source xml)`	Adds a document to the `Harvester.processor` working in the background.
`protected void`	`cancelMissingDocumentDelete()`	Disables the property "deleteMissingDocuments" for this instance.
`void`	`close(boolean cleanShutdown)`	Closes harvester.
`protected void`	`enumerateValidHarvesterPropertyNames(Set<String> props)`	This method is used by subclasses to enumerate all available harvester properties that are implemented by them.

Methods inherited from class de.pangaea.metadataportal.harvester.Harvester
addDocument, createMetadataDocumentInstance, deleteDocument, finishReindex, getValidHarvesterPropertyNames, harvest, isAllIndexes, isClosed, isDocumentOutdated, main, open, prepareReindex, runHarvester, runHarvester, setHarvestingDateReference, setValidIdentifiers

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - SingleFileEntitiesHarvester
```
public SingleFileEntitiesHarvester(HarvesterConfig iconfig)
```
  - SingleFileEntitiesHarvester
```
protected SingleFileEntitiesHarvester(HarvesterConfig iconfig,
                                      DocumentErrorAction parseErrorAction)
```
- Method Detail
  - close
```
public void close(boolean cleanShutdown)
           throws Exception
```
    Description copied from class: Harvester
    
    Closes harvester. All resources are freed and the Harvester.processor is closed.
    
    Overrides:
    
    close in class Harvester
    
    Parameters:
    
    cleanShutdown - enables writing of status information to the Elasticsearch instance for the next harvesting. If an error occurred during harvesting this should not be done.
    
    Throws:
    
    Exception - if an exception occurs during closing (various types of exceptions can be thrown). Exceptions can be thrown asynchronously and may not relate to the document currently being processed.
  - addDocument
```
protected final void addDocument(String identifier,
                                 long lastModified,
                                 Source xml)
                          throws Exception
```
    Adds a document to the Harvester.processor working in the background. If a parsing error occurs the document is handled according to parseErrorAction. It is also added to the valid identifiers (if unseen documents should be deleted).
    
    Parameters:
    
    identifier - is the document's identifier in the index
    
    lastModified - is the last-modification date, which is used to calculate the next harvesting start date. If the document is older than the last harvesting, it is skipped.
    
    xml - is the transformer source of the document, or null to only update the document status (lastModified) and add the identifier to the valid identifiers
    
    Throws:
    
    Exception
    
    See Also:
    
    Harvester.addDocument(MetadataDocument)
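    The parameter rules above can be pictured with a toy model. This is a sketch under assumptions, not the panFMP implementation: the class `AddDocumentSketch` and its fields are hypothetical, the XML is a plain `String` instead of a `Source`, the `long` value is assumed to be epoch milliseconds, and it is assumed that skipped documents still count as valid identifiers.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model (not panFMP code) of the documented addDocument parameter rules. */
final class AddDocumentSketch {
  final Instant lastHarvested;                       // null means: no previous harvest
  final Set<String> validIdentifiers = new HashSet<>();
  final List<String> processed = new ArrayList<>();  // identifiers handed on for indexing
  Instant newestSeen;                                // candidate for the next start date

  AddDocumentSketch(Instant lastHarvested) {
    this.lastHarvested = lastHarvested;
  }

  /** Assumption: the long value is milliseconds since the epoch. */
  void addDocument(String identifier, long lastModifiedMillis, String xml) {
    addDocument(identifier, Instant.ofEpochMilli(lastModifiedMillis), xml);
  }

  void addDocument(String identifier, Instant lastModified, String xml) {
    // Every enumerated document is remembered so it is not treated as missing later.
    validIdentifiers.add(identifier);
    if (newestSeen == null || lastModified.isAfter(newestSeen)) {
      newestSeen = lastModified;
    }
    if (xml == null) {
      return; // status-only update: nothing to parse or index
    }
    if (lastHarvested != null && lastModified.isBefore(lastHarvested)) {
      return; // older than the last harvesting: skipped
    }
    processed.add(identifier);
  }

  public static void main(String[] args) {
    AddDocumentSketch h = new AddDocumentSketch(Instant.parse("2024-01-01T00:00:00Z"));
    h.addDocument("old", Instant.parse("2023-06-01T00:00:00Z"), "<a/>");
    h.addDocument("new", Instant.parse("2024-02-01T00:00:00Z").toEpochMilli(), "<b/>");
    h.addDocument("status-only", Instant.parse("2024-03-01T00:00:00Z"), null);
    System.out.println("processed=" + h.processed + ", known=" + h.validIdentifiers.size());
  }
}
```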
  - addDocument
```
protected void addDocument(String identifier,
                           Instant lastModified,
                           Source xml)
                    throws Exception
```
    Adds a document to the Harvester.processor working in the background.
    
    Throws:
    
    Exception
    
    See Also:
    
    addDocument(String,Instant,Source)
  - cancelMissingDocumentDelete
```
protected void cancelMissingDocumentDelete()
```
    Disables the property "deleteMissingDocuments" for this instance. This is useful when a container (such as a ZIP file) was not modified and the documents it contains are therefore not enumerated. Call this method to prevent all of these documents from being deleted.
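    Why this matters can be seen in a small model of missing-document deletion. The sketch below is an assumption about the general mechanism, not panFMP code: `MissingDocumentDeleteSketch`, `seen`, and `finish` are hypothetical names, and the index is a plain `Set` of identifiers.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model (not panFMP code) of deleting documents that were not seen during a harvest. */
final class MissingDocumentDeleteSketch {
  // null means: deletion of missing documents is disabled for this run
  private Set<String> validIdentifiers = new HashSet<>();

  /** Records an identifier that was enumerated during this harvest. */
  void seen(String identifier) {
    if (validIdentifiers != null) {
      validIdentifiers.add(identifier);
    }
  }

  /** Counterpart of the documented method: nothing will be deleted at the end. */
  void cancelMissingDocumentDelete() {
    validIdentifiers = null;
  }

  /** Removes unseen identifiers from the toy index and returns what was removed. */
  Set<String> finish(Set<String> index) {
    Set<String> deleted = new TreeSet<>();
    if (validIdentifiers == null) {
      return deleted; // deletion was cancelled
    }
    for (String id : index) {
      if (!validIdentifiers.contains(id)) {
        deleted.add(id);
      }
    }
    index.removeAll(deleted);
    return deleted;
  }

  public static void main(String[] args) {
    Set<String> index = new HashSet<>(Set.of("a.xml", "b.xml"));
    MissingDocumentDeleteSketch run = new MissingDocumentDeleteSketch();
    // The container is unchanged, so nothing is enumerated; without the cancel
    // call below, both documents would be considered missing and removed.
    run.cancelMissingDocumentDelete();
    System.out.println("deleted=" + run.finish(index) + ", remaining=" + index.size());
  }
}
```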
  - enumerateValidHarvesterPropertyNames
```
protected void enumerateValidHarvesterPropertyNames(Set<String> props)
```
    Description copied from class: Harvester
    
    This method is used by subclasses to enumerate all available harvester properties that are implemented by them. Override this method in your own implementation and append all harvester property names to the supplied Set. The public API for client code requesting property names is Harvester.getValidHarvesterPropertyNames().
    
    Overrides:
    
    enumerateValidHarvesterPropertyNames in class Harvester
    
    See Also:
    
    Harvester.getValidHarvesterPropertyNames()
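    The override pattern described above can be sketched with stand-in classes. None of this is panFMP code: `BaseHarvesterSketch`, `FileHarvesterSketch`, and the property name `baseProperty` are hypothetical, and calling the superclass implementation first is an assumption about good practice. Only the two method names and the properties parseErrorAction and deleteMissingDocuments come from this page.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-ins (not panFMP classes) showing the enumerate/override pattern. */
final class PropertyNamesSketch {

  static class BaseHarvesterSketch {
    /** Subclasses append the names of the properties they implement. */
    protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
      props.add("baseProperty"); // hypothetical default property
    }

    /** Public entry point: collects names from the whole class hierarchy. */
    public final Set<String> getValidHarvesterPropertyNames() {
      Set<String> props = new TreeSet<>();
      enumerateValidHarvesterPropertyNames(props);
      return Collections.unmodifiableSet(props);
    }
  }

  static class FileHarvesterSketch extends BaseHarvesterSketch {
    @Override
    protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
      super.enumerateValidHarvesterPropertyNames(props); // keep inherited names
      props.add("parseErrorAction");
      props.add("deleteMissingDocuments");
    }
  }

  public static void main(String[] args) {
    System.out.println(new FileHarvesterSketch().getValidHarvesterPropertyNames());
  }
}
```

    Forgetting the `super` call in such a design would silently drop the inherited property names, which is why the sketch calls it first.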
