java.lang.Object
- de.pangaea.metadataportal.harvester.Harvester
  - de.pangaea.metadataportal.harvester.SingleFileEntitiesHarvester
    - de.pangaea.metadataportal.harvester.ZipFileHarvester

```
public class ZipFileHarvester
extends SingleFileEntitiesHarvester
```
Harvester that unzips a ZIP file and reads its contents. Identifiers have the form "zip:<identifierPrefix><entryFilename>".
This harvester supports the following additional harvester properties:
- zipFile: filename or URL of the ZIP file to harvest
- identifierPrefix: prefix that is prepended to all document identifiers (default: "")
- filenameFilter: regular expression that the entry filename must match (default: none)
- useZipFileDate: if "yes", check the modification date of the ZIP file and re-harvest it completely; if "no", look at each file in the archive and store its modification date in the index. Use "yes" for ZIP files from network connections that seldom change, as it avoids scanning the complete ZIP file. "no" is recommended for large local files in which only some entries are modified frequently (default: yes)
- retryCount: number of retries on HTTP errors (default: 5)
- retryAfterSeconds: time between retries in seconds (default: 60)
- timeoutAfterSeconds: HTTP timeout for harvesting in seconds
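
The identifier format and the filenameFilter property described above can be illustrated with a minimal sketch. It uses only `java.util.zip` and `java.util.regex` from the JDK; the class `ZipIdentifierSketch` and its methods are hypothetical names chosen for illustration and are not part of the harvester API. Two points are assumptions: that the filter must match the whole entry filename, and that directory entries are skipped.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration class; not the actual ZipFileHarvester code.
class ZipIdentifierSketch {

  // Builds an identifier of the documented form "zip:<identifierPrefix><entryFilename>".
  static String buildIdentifier(String identifierPrefix, String entryFilename) {
    return "zip:" + identifierPrefix + entryFilename;
  }

  // Walks all entries of a ZIP stream and returns identifiers for the accepted ones.
  // Assumption: a null filter accepts everything, otherwise the full name must match.
  static List<String> listIdentifiers(InputStream zipData, String identifierPrefix,
      Pattern filenameFilter) throws IOException {
    List<String> ids = new ArrayList<>();
    try (ZipInputStream zip = new ZipInputStream(zipData)) {
      ZipEntry entry;
      while ((entry = zip.getNextEntry()) != null) {
        if (entry.isDirectory()) continue; // assumption: directories are not documents
        String name = entry.getName();
        if (filenameFilter != null && !filenameFilter.matcher(name).matches()) continue;
        ids.add(buildIdentifier(identifierPrefix, name));
      }
    }
    return ids;
  }

  // Creates a small in-memory ZIP archive so the sketch needs no external file.
  static byte[] sampleZip(String... names) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
      for (String name : names) {
        zip.putNextEntry(new ZipEntry(name));
        zip.write("<doc/>".getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
      }
    }
    return buffer.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] data = sampleZip("a.xml", "notes.txt", "sub/b.xml");
    List<String> ids = listIdentifiers(
        new ByteArrayInputStream(data), "demo/", Pattern.compile(".*\\.xml"));
    System.out.println(ids); // prints [zip:demo/a.xml, zip:demo/sub/b.xml]
  }
}
```

Whether the real harvester applies the filter with a full match or a partial match is not stated in this page, so the matching mode should be checked against the source before relying on it.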
Author:

Uwe Schindler

Field Summary

Fields
| Modifier and Type | Field | Description |
| --- | --- | --- |
| `static int` | `DEFAULT_RETRY_COUNT` | |
| `static int` | `DEFAULT_RETRY_TIME` | |
| `static int` | `DEFAULT_TIMEOUT` | |
| `protected int` | `retryCount` | the retryCount from configuration |
| `protected int` | `retryTime` | the retryTime from configuration |
| `protected int` | `timeout` | the timeout from configuration |
| `static String` | `USER_AGENT` | |
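
To make the roles of `retryCount` and `retryTime` more concrete, here is a minimal sketch of a retry loop driven by two such values. It is not the harvester's implementation: the class `RetrySketch`, the method `withRetries`, and the choice to retry only on `IOException` are assumptions for illustration, as is the reading that `retryCount` counts retries after the first attempt.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical illustration class; not part of the harvester API.
class RetrySketch {

  // Runs the action and retries on IOException.
  // Assumption: retryCount is the number of retries after the first attempt,
  // so the action is called at most retryCount + 1 times.
  static <T> T withRetries(Callable<T> action, int retryCount, long retryTimeMillis)
      throws Exception {
    if (retryCount < 0) {
      throw new IllegalArgumentException("retryCount must not be negative");
    }
    IOException last = null;
    for (int attempt = 0; attempt <= retryCount; attempt++) {
      try {
        return action.call();
      } catch (IOException e) {
        last = e;
        if (attempt < retryCount) {
          Thread.sleep(retryTimeMillis); // wait before the next attempt
        }
      }
    }
    throw last; // all attempts failed; rethrow the last error
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Simulated flaky download: fails twice, then succeeds.
    String result = withRetries(() -> {
      if (++calls[0] < 3) throw new IOException("simulated HTTP error");
      return "ok";
    }, 5, 10L);
    System.out.println(result + " after " + calls[0] + " calls"); // prints ok after 3 calls
  }
}
```

The sketch takes the wait time in milliseconds to keep the demo fast; the documented retryAfterSeconds property is given in seconds.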

Fields inherited from class de.pangaea.metadataportal.harvester.Harvester
fromDateReference, harvestCount, HARVESTER_METADATA_FIELD_LAST_HARVESTED, harvestMessageStep, iconfig, log, processor

Constructor Summary

Constructors
Constructor Description

ZipFileHarvester(HarvesterConfig iconfig)

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| `protected void` | `enumerateValidHarvesterPropertyNames(Set<String> props)` | This method is used by subclasses to enumerate all available harvester properties that are implemented by them. |
| `void` | `harvest()` | This method is called by the harvester after it has been opened with `Harvester.open(de.pangaea.metadataportal.processor.ElasticsearchConnection, java.lang.String)`. |
| `void` | `open(ElasticsearchConnection es, String targetIndex)` | Opens harvester for harvesting documents described by the given `HarvesterConfig`. |

Methods inherited from class de.pangaea.metadataportal.harvester.SingleFileEntitiesHarvester
addDocument, addDocument, cancelMissingDocumentDelete, close

Methods inherited from class de.pangaea.metadataportal.harvester.Harvester
addDocument, createMetadataDocumentInstance, deleteDocument, finishReindex, getValidHarvesterPropertyNames, isAllIndexes, isClosed, isDocumentOutdated, main, prepareReindex, runHarvester, runHarvester, setHarvestingDateReference, setValidIdentifiers

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - DEFAULT_RETRY_TIME
```
public static final int DEFAULT_RETRY_TIME
```
    See Also:
    
    Constant Field Values
  - DEFAULT_RETRY_COUNT
```
public static final int DEFAULT_RETRY_COUNT
```
    See Also:
    
    Constant Field Values
  - DEFAULT_TIMEOUT
```
public static final int DEFAULT_TIMEOUT
```
    See Also:
    
    Constant Field Values
  - USER_AGENT
```
public static final String USER_AGENT
```
  - retryCount
```
protected final int retryCount
```
    the retryCount from configuration
  - retryTime
```
protected final int retryTime
```
    the retryTime from configuration
  - timeout
```
protected final int timeout
```
    the timeout from configuration
- Constructor Detail
  - ZipFileHarvester
```
public ZipFileHarvester(HarvesterConfig iconfig)
```
- Method Detail
  - open
```
public void open(ElasticsearchConnection es,
                 String targetIndex)
          throws Exception
```
    Description copied from class: Harvester
    
    Opens harvester for harvesting documents described by the given HarvesterConfig. Opens Harvester.processor for usage in Harvester.harvest() method.
    
    Overrides:
    
    open in class Harvester
    
    Throws:
    
    Exception - if an exception occurs during opening (various types of exceptions can be thrown).
  - harvest
```
public void harvest()
             throws Exception
```
    Description copied from class: Harvester
    
    This method is called by the harvester after it has been opened with Harvester.open(de.pangaea.metadataportal.processor.ElasticsearchConnection, java.lang.String). Override this method in your harvester class. It should harvest files from somewhere, generate MetadataDocuments and add them with Harvester.addDocument(de.pangaea.metadataportal.processor.MetadataDocument).
    
    Specified by:
    
    harvest in class Harvester
    
    Throws:
    
    Exception - of any type.
  - enumerateValidHarvesterPropertyNames
```
protected void enumerateValidHarvesterPropertyNames(Set<String> props)
```
    Description copied from class: Harvester
    
    This method is used by subclasses to enumerate all available harvester properties that are implemented by them. Override this method in your own implementation and append all harvester property names to the supplied Set. The public API for client code requesting property names is Harvester.getValidHarvesterPropertyNames().
    
    Overrides:
    
    enumerateValidHarvesterPropertyNames in class SingleFileEntitiesHarvester
    
    See Also:
    
    Harvester.getValidHarvesterPropertyNames()
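
The override pattern described for enumerateValidHarvesterPropertyNames can be sketched with stand-in classes. The nested classes `Base` and `ZipLike` below are hypothetical and only mimic the shape of the documented methods; they are not the real `Harvester` or `ZipFileHarvester`, and `exampleBaseProperty` is an invented placeholder. The seven ZIP property names are taken from the property list in the class description.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration class; the nested types are stand-ins, not the real API.
class PropertyNamesSketch {

  // Stand-in for a base harvester that collects property names from subclasses.
  static abstract class Base {
    protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
      props.add("exampleBaseProperty"); // invented placeholder name
    }

    // Public entry point: gathers the names contributed along the class hierarchy.
    public final Set<String> getValidHarvesterPropertyNames() {
      Set<String> props = new TreeSet<>();
      enumerateValidHarvesterPropertyNames(props);
      return Collections.unmodifiableSet(props);
    }
  }

  // Stand-in subclass that appends its own property names to the supplied Set.
  static class ZipLike extends Base {
    @Override
    protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
      super.enumerateValidHarvesterPropertyNames(props); // keep inherited names
      Collections.addAll(props, "zipFile", "identifierPrefix", "filenameFilter",
          "useZipFileDate", "retryCount", "retryAfterSeconds", "timeoutAfterSeconds");
    }
  }

  public static void main(String[] args) {
    System.out.println(new ZipLike().getValidHarvesterPropertyNames());
  }
}
```

Calling the superclass method first is the design choice that matters here: it lets each level of the hierarchy contribute its names without knowing about the others.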
