Class ZipFileHarvester
- java.lang.Object
  - de.pangaea.metadataportal.harvester.Harvester
    - de.pangaea.metadataportal.harvester.SingleFileEntitiesHarvester
      - de.pangaea.metadataportal.harvester.ZipFileHarvester
public class ZipFileHarvester extends SingleFileEntitiesHarvester
Harvester for unzipping ZIP files and reading their contents. Identifiers look like: "zip:<identifierPrefix><entryFilename>"
This harvester supports the following additional harvester properties:
- zipFile: filename or URL of the ZIP file to harvest
- identifierPrefix: prefix prepended to all identifiers (that is, the identifiers of the documents) (default: "")
- filenameFilter: regex to match the entry filename (default: none)
- useZipFileDate: if "yes", check the modification date of the ZIP file and re-harvest it completely; if "no", look at each file in the archive and store its modification date in the index. Use "yes" for ZIP files from network connections that seldom change, as it avoids scanning the complete ZIP file. "no" is recommended for large local files with many modifications in only some files (default: yes)
- retryCount: how often to retry on HTTP errors (default: 5)
- retryAfterSeconds: time between retries in seconds (default: 60)
- timeoutAfterSeconds: HTTP timeout for harvesting in seconds
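The identifier scheme and the filenameFilter property can be illustrated with a minimal sketch that uses only the JDK. It is not the harvester's actual implementation: the class ZipIdentifierSketch and its methods are hypothetical, and it assumes that the regex must match the whole entry name and that directory entries are skipped, neither of which is stated above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration class; not part of panFMP.
class ZipIdentifierSketch {

  // Returns "zip:<identifierPrefix><entryFilename>" for every file entry.
  // A null filter accepts all entries (mirrors "default: none").
  // Assumption: the filter has to match the complete entry name.
  static List<String> listIdentifiers(InputStream in, String identifierPrefix,
      Pattern filenameFilter) {
    List<String> ids = new ArrayList<>();
    try (ZipInputStream zip = new ZipInputStream(in)) {
      ZipEntry entry;
      while ((entry = zip.getNextEntry()) != null) {
        if (entry.isDirectory()) {
          continue; // assumption: directories are not documents
        }
        String name = entry.getName();
        if (filenameFilter != null && !filenameFilter.matcher(name).matches()) {
          continue; // entry name rejected by the filter
        }
        ids.add("zip:" + identifierPrefix + name);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return ids;
  }

  // Builds a small in-memory ZIP archive so the sketch needs no external file.
  static byte[] sampleZip() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
      for (String name : new String[] {"a.xml", "notes.txt", "sub/b.xml"}) {
        zip.putNextEntry(new ZipEntry(name));
        zip.write("<doc/>".getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    List<String> ids = listIdentifiers(new ByteArrayInputStream(sampleZip()),
        "demo/", Pattern.compile(".*\\.xml"));
    ids.forEach(System.out::println);
  }
}
```

With the prefix "demo/" and the filter ".*\.xml", the sample archive yields the identifiers "zip:demo/a.xml" and "zip:demo/sub/b.xml"; "notes.txt" is filtered out.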
- Author:
- Uwe Schindler
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
- static int DEFAULT_RETRY_COUNT
- static int DEFAULT_RETRY_TIME
- static int DEFAULT_TIMEOUT
- protected int retryCount: the retryCount from configuration
- protected int retryTime: the retryTime from configuration
- protected int timeout: the timeout from configuration
- static String USER_AGENT
Fields inherited from class de.pangaea.metadataportal.harvester.Harvester
fromDateReference, harvestCount, HARVESTER_METADATA_FIELD_LAST_HARVESTED, harvestMessageStep, iconfig, log, processor
-
-
Constructor Summary
Constructors (Constructor, Description):
- ZipFileHarvester(HarvesterConfig iconfig)
-
Method Summary
Methods (Modifier and Type, Method, Description):
- protected void enumerateValidHarvesterPropertyNames(Set<String> props): This method is used by subclasses to enumerate all available harvester properties that are implemented by them.
- void harvest(): This method is called by the harvester after it has been opened with Harvester.open(de.pangaea.metadataportal.processor.ElasticsearchConnection, java.lang.String).
- void open(ElasticsearchConnection es, String targetIndex): Opens harvester for harvesting documents described by the given HarvesterConfig.
Methods inherited from class de.pangaea.metadataportal.harvester.SingleFileEntitiesHarvester
addDocument, addDocument, cancelMissingDocumentDelete, close
-
Methods inherited from class de.pangaea.metadataportal.harvester.Harvester
addDocument, createMetadataDocumentInstance, deleteDocument, finishReindex, getValidHarvesterPropertyNames, isAllIndexes, isClosed, isDocumentOutdated, main, prepareReindex, runHarvester, runHarvester, setHarvestingDateReference, setValidIdentifiers
-
Field Detail
-
DEFAULT_RETRY_TIME
public static final int DEFAULT_RETRY_TIME
- See Also:
- Constant Field Values
-
DEFAULT_RETRY_COUNT
public static final int DEFAULT_RETRY_COUNT
- See Also:
- Constant Field Values
-
DEFAULT_TIMEOUT
public static final int DEFAULT_TIMEOUT
- See Also:
- Constant Field Values
-
USER_AGENT
public static final String USER_AGENT
-
retryCount
protected final int retryCount
the retryCount from configuration
-
retryTime
protected final int retryTime
the retryTime from configuration
-
timeout
protected final int timeout
the timeout from configuration
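How retryCount and retryTime might interact can be sketched as a simple retry loop. This is an illustration under assumptions, not the harvester's code: RetrySketch, IoAction and withRetries are hypothetical names, retryCount is read as the number of retries after the first failed attempt, and the wait is given in milliseconds to keep the demo fast, whereas the configured value is in seconds.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration class; not part of panFMP.
class RetrySketch {

  // An action that may fail with an I/O error, such as an HTTP request.
  @FunctionalInterface
  interface IoAction<T> {
    T run() throws IOException;
  }

  // Runs the action and retries up to retryCount times after a failure,
  // sleeping retryTimeMillis between attempts.
  // Assumption: retryCount counts retries, not total attempts.
  static <T> T withRetries(IoAction<T> action, int retryCount, long retryTimeMillis) {
    int failures = 0;
    while (true) {
      try {
        return action.run();
      } catch (IOException e) {
        failures++;
        if (failures > retryCount) {
          throw new UncheckedIOException(e); // retries exhausted
        }
        try {
          Thread.sleep(retryTimeMillis);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
      }
    }
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // Simulated request that fails twice before it succeeds.
    String result = withRetries(() -> {
      if (++calls[0] < 3) {
        throw new IOException("simulated HTTP error");
      }
      return "ok";
    }, 5, 10L);
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```

The timeout field is not modelled here; in a real HTTP client it would typically be applied as a connect or read timeout on each individual attempt.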
-
-
Constructor Detail
-
ZipFileHarvester
public ZipFileHarvester(HarvesterConfig iconfig)
-
-
Method Detail
-
open
public void open(ElasticsearchConnection es, String targetIndex) throws Exception
Description copied from class: Harvester
Opens harvester for harvesting documents described by the given HarvesterConfig. Opens Harvester.processor for usage in the Harvester.harvest() method.
-
harvest
public void harvest() throws Exception
Description copied from class: Harvester
This method is called by the harvester after it has been opened with Harvester.open(de.pangaea.metadataportal.processor.ElasticsearchConnection, java.lang.String). Override this method in your harvester class. This method should harvest files from somewhere, generate MetadataDocuments and add them with Harvester.addDocument(de.pangaea.metadataportal.processor.MetadataDocument).
-
enumerateValidHarvesterPropertyNames
protected void enumerateValidHarvesterPropertyNames(Set<String> props)
Description copied from class: Harvester
This method is used by subclasses to enumerate all available harvester properties that are implemented by them. Override this method in your own implementation and add all harvester property names to the supplied Set. The public API for client code requesting property names is Harvester.getValidHarvesterPropertyNames().
- Overrides:
- enumerateValidHarvesterPropertyNames in class SingleFileEntitiesHarvester
- See Also:
- Harvester.getValidHarvesterPropertyNames()
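The override pattern described for enumerateValidHarvesterPropertyNames can be sketched as follows. To stay self-contained, the sketch uses a hypothetical stand-in base class (BaseHarvesterSketch) instead of the real Harvester, whose internals are not shown here; only the two method names and the seven property names are taken from this page, and calling super first is an assumption about good practice rather than a documented requirement.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the real Harvester base class.
abstract class BaseHarvesterSketch {

  // Subclasses override this and add the property names they support.
  protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
    // The real base class presumably adds its own common properties here;
    // they are not listed on this page, so the sketch adds nothing.
  }

  // Public entry point for client code requesting the property names.
  public final Set<String> getValidHarvesterPropertyNames() {
    Set<String> props = new TreeSet<>();
    enumerateValidHarvesterPropertyNames(props);
    return Collections.unmodifiableSet(props);
  }
}

// Hypothetical subclass that registers the properties documented for the ZIP harvester.
class ZipPropertiesSketch extends BaseHarvesterSketch {

  @Override
  protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
    // Assumption: inherited property names should be kept, so call super first.
    super.enumerateValidHarvesterPropertyNames(props);
    props.addAll(Set.of("zipFile", "identifierPrefix", "filenameFilter",
        "useZipFileDate", "retryCount", "retryAfterSeconds", "timeoutAfterSeconds"));
  }

  public static void main(String[] args) {
    new ZipPropertiesSketch().getValidHarvesterPropertyNames()
        .forEach(System.out::println);
  }
}
```

Keeping the enumerating method protected and exposing a separate public getter means client code gets a read-only view, while each class in the hierarchy contributes only its own names.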
-
-