java.lang.Object
  de.pangaea.metadataportal.harvester.Harvester
    de.pangaea.metadataportal.harvester.SingleFileEntitiesHarvester
      de.pangaea.metadataportal.harvester.ZipFileHarvester

public class ZipFileHarvester
extends SingleFileEntitiesHarvester
Harvester that unzips ZIP files and reads their contents. Identifiers have the form "zip:<identifierPrefix><entryFilename>".
This harvester supports the following additional harvester properties:
- zipFile: filename or URL of the ZIP file to harvest
- identifierPrefix: prefix placed before every document identifier (default: "")
- filenameFilter: regular expression that entry filenames must match (default: none)
- useZipFileDate: if "yes", check the modification date of the ZIP file itself and re-harvest the complete archive; if "no", check each file in the archive and store its modification date in the index. Use "yes" for ZIP files from network connections that seldom change, as it avoids scanning the complete archive. "No" is recommended for large local archives in which only some files are modified frequently (default: yes)
- retryCount: how often to retry on HTTP errors (default: 5)
- retryAfterSeconds: time between retries in seconds (default: 60)
- timeoutAfterSeconds: HTTP timeout for harvesting in seconds
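The property semantics above can be summarized in a small settings holder. This is a minimal sketch, not the harvester's actual configuration code. Several details are assumptions: the class name `ZipHarvesterSettings`, the use of `java.util.Properties`, the strict "yes"/"no" parsing, and treating `filenameFilter` as a match against the whole entry name. The defaults are the ones listed above. No default is documented here for `timeoutAfterSeconds`, so the sketch leaves it unset.

```java
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical holder for the documented ZipFileHarvester properties (not the real class).
final class ZipHarvesterSettings {
  final String zipFile;               // filename or URL of the ZIP file (required)
  final String identifierPrefix;      // default: ""
  final Pattern filenameFilter;       // default: none (null = accept every entry)
  final boolean useZipFileDate;       // default: yes
  final int retryCount;               // default: 5
  final int retryAfterSeconds;        // default: 60
  final Integer timeoutAfterSeconds;  // no default documented; null = unset

  private ZipHarvesterSettings(Properties p) {
    zipFile = p.getProperty("zipFile");
    if (zipFile == null) {
      throw new IllegalArgumentException("Missing harvester property 'zipFile'");
    }
    identifierPrefix = p.getProperty("identifierPrefix", "");
    String filter = p.getProperty("filenameFilter");
    filenameFilter = (filter == null) ? null : Pattern.compile(filter);
    useZipFileDate = parseYesNo(p.getProperty("useZipFileDate", "yes"));
    retryCount = Integer.parseInt(p.getProperty("retryCount", "5"));
    retryAfterSeconds = Integer.parseInt(p.getProperty("retryAfterSeconds", "60"));
    String timeout = p.getProperty("timeoutAfterSeconds");
    timeoutAfterSeconds = (timeout == null) ? null : Integer.valueOf(timeout);
  }

  static ZipHarvesterSettings fromProperties(Properties p) {
    return new ZipHarvesterSettings(p);
  }

  // Assumption: only "yes" and "no" (case-insensitive) are accepted.
  private static boolean parseYesNo(String value) {
    if ("yes".equalsIgnoreCase(value)) return true;
    if ("no".equalsIgnoreCase(value)) return false;
    throw new IllegalArgumentException("Expected 'yes' or 'no', got: " + value);
  }

  // Assumption: the regex must match the whole entry filename.
  boolean accepts(String entryName) {
    return filenameFilter == null || filenameFilter.matcher(entryName).matches();
  }

  // Identifier scheme from the class description: "zip:<identifierPrefix><entryFilename>".
  String identifierFor(String entryName) {
    return "zip:" + identifierPrefix + entryName;
  }
}
```

For example, with identifierPrefix "pangaea/", the entry "a.xml" gets the identifier "zip:pangaea/a.xml".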
| Field Summary | |
|---|---|
| static int | DEFAULT_RETRY_COUNT |
| static int | DEFAULT_RETRY_TIME |
| static int | DEFAULT_TIMEOUT |
| protected int | retryCount: the retryCount from configuration |
| protected int | retryTime: the retryTime from configuration |
| protected int | timeout: the timeout from configuration |
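The retryCount and retryTime fields hold the configured HTTP retry settings. Below is a minimal sketch of how a retry count and a delay in seconds could drive a retry loop. It is not the harvester's actual code, and it rests on several assumptions. The class `RetrySketch` is hypothetical. Only `IOException` triggers a retry. retryCount is read as "retries after the first attempt". The timeout setting is not modeled.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Hypothetical helper illustrating how a retry count and a retry delay could be used.
final class RetrySketch {
  private RetrySketch() {}

  /**
   * Runs the action; after an IOException, retries up to retryCount more times,
   * sleeping retryTimeSeconds between attempts (assumed semantics).
   */
  static <T> T withRetries(Callable<T> action, int retryCount, int retryTimeSeconds)
      throws Exception {
    int retriesLeft = retryCount;
    while (true) {
      try {
        return action.call();
      } catch (IOException e) {
        if (retriesLeft <= 0) {
          throw e; // retries exhausted: propagate the last error
        }
        retriesLeft--;
        TimeUnit.SECONDS.sleep(retryTimeSeconds);
      }
    }
  }
}
```

With this reading, a retryCount of 5 allows at most six attempts in total.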
| Fields inherited from class de.pangaea.metadataportal.harvester.Harvester |
|---|
fromDateReference, harvestCount, harvestMessageStep, iconfig, index, log |
| Constructor Summary | |
|---|---|
| ZipFileHarvester() | |
| Method Summary | |
|---|---|
| protected void | enumerateValidHarvesterPropertyNames(Set<String> props): This method is used by subclasses to enumerate all available harvester properties that are implemented by them. |
| void | harvest(): This method is called by the harvester after it has been opened with Harvester.open(de.pangaea.metadataportal.config.SingleIndexConfig). |
| void | open(SingleIndexConfig iconfig): Opens the harvester for harvesting documents into the index described by the given SingleIndexConfig. |
| Methods inherited from class de.pangaea.metadataportal.harvester.SingleFileEntitiesHarvester |
|---|
addDocument, addDocument, cancelMissingDocumentDelete, close |
| Methods inherited from class de.pangaea.metadataportal.harvester.Harvester |
|---|
addDocument, createMetadataDocumentInstance, getValidHarvesterPropertyNames, isClosed, isDocumentOutdated, isDocumentOutdated, main, runHarvester, runHarvester, setHarvestingDateReference |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final int DEFAULT_RETRY_TIME
public static final int DEFAULT_RETRY_COUNT
public static final int DEFAULT_TIMEOUT
protected int retryCount
protected int retryTime
protected int timeout
| Constructor Detail |
|---|
public ZipFileHarvester()
| Method Detail |
|---|
public void open(SingleIndexConfig iconfig)
throws Exception
Opens the harvester for harvesting documents into the index described by the given SingleIndexConfig.
Opens Harvester.index for use in the Harvester.harvest() method.
Overrides: open in class SingleFileEntitiesHarvester
Throws: Exception - if an exception occurs during opening (various types of exceptions can be thrown).
public void harvest()
throws Exception
This method is called by the harvester after it has been opened with Harvester.open(de.pangaea.metadataportal.config.SingleIndexConfig). Override this method in your harvester class.
This method should harvest files from somewhere, generate MetadataDocuments, and add them with Harvester.addDocument(de.pangaea.metadataportal.harvester.MetadataDocument).
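Below is a minimal sketch of the per-entry scan a ZIP harvester performs, using only java.util.zip. It walks the entries of a ZIP stream, skips directories, and applies an optional filename filter. It also builds identifiers of the documented form "zip:<identifierPrefix><entryFilename>" and records each entry's modification time, which is the kind of per-file date the useZipFileDate="no" mode stores in the index. The class `ZipScanSketch` and its `Entry` type are hypothetical. The sketch does not create MetadataDocuments or call Harvester.addDocument. It also assumes the filter must match the whole entry name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical sketch of a per-entry ZIP scan (not the real harvest() implementation).
final class ZipScanSketch {
  private ZipScanSketch() {}

  /** One harvested entry: identifier, modification time, and raw content. */
  static final class Entry {
    final String identifier;
    final long lastModified; // ms since epoch, or -1 if the archive stores no time
    final byte[] content;

    Entry(String identifier, long lastModified, byte[] content) {
      this.identifier = identifier;
      this.lastModified = lastModified;
      this.content = content;
    }
  }

  static List<Entry> scan(InputStream zipData, String identifierPrefix, Pattern filenameFilter)
      throws IOException {
    List<Entry> result = new ArrayList<>();
    try (ZipInputStream zin = new ZipInputStream(zipData)) {
      ZipEntry entry;
      while ((entry = zin.getNextEntry()) != null) {
        String name = entry.getName();
        // Skip directories and entries rejected by the optional filter.
        if (entry.isDirectory()
            || (filenameFilter != null && !filenameFilter.matcher(name).matches())) {
          continue;
        }
        // On a ZipInputStream, readAllBytes() stops at the end of the current entry.
        byte[] content = zin.readAllBytes();
        result.add(new Entry("zip:" + identifierPrefix + name, entry.getTime(), content));
      }
    }
    return result;
  }
}
```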
harvest in class Harvester
Throws: Exception - of any type.

protected void enumerateValidHarvesterPropertyNames(Set<String> props)
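This method is used by subclasses to enumerate all available harvester properties that are implemented by them, by adding their names to the given Set.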
The public API for client code requesting property names is Harvester.getValidHarvesterPropertyNames().
Overrides: enumerateValidHarvesterPropertyNames in class SingleFileEntitiesHarvester
See Also: Harvester.getValidHarvesterPropertyNames()
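The split between the protected enumeration hook and the public getter can be sketched with hypothetical stand-in classes. `HarvesterSketch`, `ZipFileHarvesterSketch` and the placeholder name "baseProperty" are invented for illustration. Only the seven ZIP-specific property names come from this page. Calling `super` first keeps the names contributed by parent classes, so each level only adds what it implements.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-ins mirroring the documented pattern; not the real Harvester classes.
abstract class HarvesterSketch {
  /** Public entry point, analogous to Harvester.getValidHarvesterPropertyNames(). */
  public final Set<String> getValidHarvesterPropertyNames() {
    Set<String> props = new TreeSet<>();
    enumerateValidHarvesterPropertyNames(props);
    return Collections.unmodifiableSet(props);
  }

  /** Hook for subclasses; "baseProperty" is a placeholder for inherited properties. */
  protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
    props.add("baseProperty");
  }
}

class ZipFileHarvesterSketch extends HarvesterSketch {
  @Override
  protected void enumerateValidHarvesterPropertyNames(Set<String> props) {
    super.enumerateValidHarvesterPropertyNames(props); // keep inherited names
    Collections.addAll(props, "zipFile", "identifierPrefix", "filenameFilter",
        "useZipFileDate", "retryCount", "retryAfterSeconds", "timeoutAfterSeconds");
  }
}
```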