Changelog from development snapshot

This page shows all changes included in the development snapshot (the coming v2.0 trunk). The list is extracted from the SVN commit log:

Revision 766 by thetaphi on 2018-04-19T09:20:12Z:

Update Elasticsearch

Revision 765 by thetaphi on 2018-04-16T13:32:44Z:

Final version of the XALANJ-2419 fix

Revision 764 by thetaphi on 2018-04-15T23:20:26Z:

More fixes for serializer

Revision 763 by thetaphi on 2018-04-15T21:04:01Z:

Add a patched version of serializer.jar so it becomes compatible with supplementary characters. Unfortunately, XALANJ has had no release that includes the XALANJ-2419 bugfix!

Revision 762 by thetaphi on 2018-03-01T09:56:33Z:

Update Elasticsearch to 5.6.8

Revision 761 by thetaphi on 2018-02-08T17:16:11Z:

Update to Elasticsearch 5.6.7

Revision 760 by thetaphi on 2018-01-22T15:08:53Z:

Update Elasticsearch to 5.6.6

Revision 759 by thetaphi on 2017-12-13T11:32:25Z:

Small simplification

Revision 758 by thetaphi on 2017-12-08T17:58:53Z:

Update Elasticsearch to 5.6.5

Revision 757 by thetaphi on 2017-11-09T13:09:01Z:

Update Elasticsearch

Revision 756 by thetaphi on 2017-10-05T09:43:00Z:

Update Elasticsearch to 5.6.2

Revision 755 by thetaphi on 2017-09-19T11:28:58Z:

Update Elasticsearch to 5.6.1

Revision 754 by thetaphi on 2017-09-15T17:23:16Z:

Update Elasticsearch to 5.6.0

Revision 753 by thetaphi on 2017-08-18T13:47:12Z:

Update Elasticsearch to 5.5.2

Revision 752 by thetaphi on 2017-08-08T07:35:12Z:

Use G1GC by default

Revision 751 by thetaphi on 2017-07-25T16:49:38Z:

Update to Elasticsearch 5.5.1

Revision 750 by thetaphi on 2017-07-07T22:28:00Z:

Update Elasticsearch to 5.5.0

Revision 749 by thetaphi on 2017-06-29T16:59:08Z:

Restore attribute-to-JSON conversion by default (after the xmlns fix was applied)

Revision 748 by thetaphi on 2017-06-29T15:55:40Z:

Ignore more namespaces during JSON attribute marshalling

Revision 747 by thetaphi on 2017-06-29T15:24:42Z:

Fix attribute handling in XML to JSON converter, disable attribute handling in JSON type by default

Revision 746 by thetaphi on 2017-06-28T09:59:13Z:

Update Elasticsearch to 5.4.3

Revision 745 by thetaphi on 2017-06-21T17:02:53Z:

Update to Elasticsearch 5.4.2

Revision 744 by thetaphi on 2017-06-19T17:11:26Z:

Simplify date parsing

Revision 743 by thetaphi on 2017-06-18T12:22:28Z:

Change ISO year representation to be compatible with SimpleDateFormat

Revision 742 by thetaphi on 2017-06-17T10:21:47Z:

fix NPE in OAI harvester

Revision 741 by thetaphi on 2017-06-17T00:52:41Z:

Fix Javadocs

Revision 740 by thetaphi on 2017-06-17T00:51:28Z:

Update panFMP to use Java 8's java.time classes (mainly instants for datestamps).
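As an illustration of instant-based datestamps, here is a minimal sketch that turns OAI-PMH datestamps into `java.time.Instant` values. It assumes the two granularities defined by OAI-PMH (`YYYY-MM-DD` and `YYYY-MM-DDThh:mm:ssZ`, both UTC); the class `OaiDatestamps` is hypothetical and not panFMP's actual code.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

/** Hypothetical helper: parses OAI-PMH datestamps (always UTC) into Instants. */
final class OaiDatestamps {
  static Instant parse(String datestamp) {
    if (datestamp.length() == 10) {
      // Day granularity (YYYY-MM-DD): interpret as midnight UTC.
      return LocalDate.parse(datestamp).atStartOfDay(ZoneOffset.UTC).toInstant();
    }
    // Seconds granularity (YYYY-MM-DDThh:mm:ssZ) is plain ISO-8601 instant syntax.
    return Instant.parse(datestamp);
  }
}
```

Formatting back for a request is just `Instant.toString()`, which yields the seconds-granularity form when the instant has no fractional seconds.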

Revision 739 by thetaphi on 2017-06-04T12:07:33Z:

Update to Elasticsearch 5.4.1

Revision 738 by thetaphi on 2017-05-04T19:47:35Z:

Cleanup Exception handling

Revision 737 by thetaphi on 2017-05-04T16:50:32Z:

Update Ivy version

Revision 736 by thetaphi on 2017-05-04T16:43:50Z:

Use a lambda here

Revision 735 by thetaphi on 2017-04-28T18:22:13Z:

Update to Elasticsearch 5.3.2

Revision 734 by thetaphi on 2017-04-22T12:34:57Z:

Update to Elasticsearch 5.3.1

Revision 733 by thetaphi on 2017-03-31T14:27:58Z:

Update to Elasticsearch 5.3.0

Revision 732 by thetaphi on 2017-03-28T08:00:36Z:

Add support for boolean datatype

Revision 731 by thetaphi on 2017-03-02T16:41:50Z:

Update Elasticsearch to 5.2.2

Revision 730 by thetaphi on 2017-02-15T11:17:23Z:

Update Elasticsearch to 5.2.1

Revision 729 by thetaphi on 2017-02-15T10:26:51Z:

Update forbiddenapis

Revision 728 by thetaphi on 2017-02-10T13:09:43Z:

Remove useless check

Revision 727 by thetaphi on 2017-02-10T09:41:29Z:

Small API change using helper method

Revision 726 by thetaphi on 2017-02-08T22:09:24Z:

Disable debug logging

Revision 725 by thetaphi on 2017-02-08T18:56:28Z:

Remove obsolete harvester property

Revision 724 by thetaphi on 2017-02-08T18:11:45Z:

Update to Elasticsearch 5.2.0:
- Java 8
- Remove custom delete by query for identifiers
- Update to Log4j v2
- Update configs for new data types
- Use sourceAsMap and filtering instead of stored fields

Revision 722 by thetaphi on 2017-01-14T23:17:33Z:

Update Elasticsearch to 2.4.4

Revision 721 by thetaphi on 2016-12-16T18:50:05Z:

Update ES to 2.4.3

Revision 720 by thetaphi on 2016-11-26T23:26:56Z:

Update ES to 2.4.2

Revision 719 by thetaphi on 2016-09-29T17:22:21Z:

Update Elasticsearch to 2.4.1

Revision 718 by thetaphi on 2016-09-01T17:15:56Z:

Update Elasticsearch to 2.4.0

Revision 717 by thetaphi on 2016-08-04T21:50:57Z:

Update to ES 2.3.5

Revision 716 by thetaphi on 2016-07-10T11:45:09Z:

Update Elasticsearch & forbiddenapis

Revision 715 by thetaphi on 2016-06-29T18:26:58Z:

Add hook for rebuilding

Revision 714 by thetaphi on 2016-06-29T17:33:57Z:

Fix bug when adding multiple kv-pairs

Revision 713 by thetaphi on 2016-05-18T23:03:58Z:

Elasticsearch version 2.3.3

Revision 712 by thetaphi on 2016-04-26T14:05:23Z:

Update ES to 2.3.2

Revision 711 by thetaphi on 2016-04-05T07:43:28Z:

Update ES to 2.3.1

Revision 710 by thetaphi on 2016-03-30T21:42:44Z:

Update ES to 2.3.0

Revision 709 by thetaphi on 2016-03-15T23:21:18Z:

Update ES to 2.2.1

Revision 708 by thetaphi on 2016-02-03T10:12:44Z:

Update Elasticsearch to 2.2.0

Revision 707 by thetaphi on 2016-01-12T10:01:40Z:

Fix missing property check

Revision 706 by thetaphi on 2015-12-20T23:54:25Z:

Fix typo

Revision 705 by thetaphi on 2015-12-20T23:47:27Z:

Refactoring, hide private set

Revision 704 by thetaphi on 2015-12-20T23:34:13Z:

On full OAI harvesting, track valid identifiers by default, too

Revision 703 by thetaphi on 2015-12-20T22:16:04Z:

Log and fail on bulk execution errors

Revision 702 by thetaphi on 2015-12-20T19:10:49Z:

Improve logging and make shutdown of DocumentProcessor safer

Revision 701 by thetaphi on 2015-12-20T18:37:23Z:

Minor cleanups

Revision 700 by thetaphi on 2015-12-19T10:44:30Z:

Update Elasticsearch to 2.1.1

Revision 699 by thetaphi on 2015-12-02T18:31:20Z:

Wait indefinitely for uncompleted bulk requests (bug in ES)

Revision 698 by thetaphi on 2015-12-02T17:12:35Z:

Improve thread pool rejection (don't ignore executions after shutdown)

Revision 697 by thetaphi on 2015-11-25T14:53:20Z:

Remove scrolling

Revision 696 by thetaphi on 2015-11-24T22:23:55Z:

Update Elasticsearch to 2.1.0

Revision 694 by thetaphi on 2015-10-30T22:45:42Z:

Switch dev version to 2.1

Revision 693 by thetaphi on 2015-10-30T22:45:14Z:

Update to Elasticsearch 2.0

Revision 689 by thetaphi on 2015-10-15T21:05:12Z:

Update elasticsearch

Revision 687 by thetaphi on 2015-10-14T15:53:45Z:

Update forbidden-apis to 2.0

Revision 684 by thetaphi on 2015-09-16T10:25:51Z:

Update Elasticsearch to 1.7.2

Revision 677 by thetaphi on 2015-08-01T09:11:50Z:

Update Elasticsearch to 1.7.1

Revision 676 by thetaphi on 2015-07-25T22:51:27Z:

Update Elasticsearch to 1.7.0

Revision 675 by thetaphi on 2015-06-11T10:13:26Z:

Update elasticsearch version

Revision 674 by thetaphi on 2015-05-27T11:50:19Z:

Define String output encoding as UTF-16 to work around a surrogate bug in the XALAN/XERCES serializer.jar

Revision 673 by thetaphi on 2015-05-16T15:20:28Z:

Cleanups; also build relative file URIs by resolving them with the URI class (handles encoding and forward slashes)

Revision 672 by thetaphi on 2015-05-16T14:27:01Z:

fix typo in attribute

Revision 671 by thetaphi on 2015-05-16T14:24:33Z:

Add signatures to forbidden-apis

Revision 670 by thetaphi on 2015-05-16T13:50:02Z:

Migrate to NIO.2

Revision 669 by thetaphi on 2015-05-15T17:43:33Z:

Fix previous commit

Revision 668 by thetaphi on 2015-05-15T17:36:42Z:

Print processed document status only when pool was running

Revision 667 by thetaphi on 2015-05-15T16:39:54Z:

Add a method to delete documents in Harvester subclasses

Revision 666 by thetaphi on 2015-05-15T13:44:15Z:

Remove non-bulk processDocument in favour of an accessor to get the Elasticsearch ActionRequest

Revision 665 by thetaphi on 2015-05-15T11:31:11Z:

Remove useless logging in direct requests

Revision 664 by thetaphi on 2015-05-15T10:50:49Z:

Remove useless flush call

Revision 663 by thetaphi on 2015-05-15T10:22:44Z:

Use actionGet() instead of Future.get()

Revision 662 by thetaphi on 2015-05-15T10:10:40Z:

Cleanups with indexing and RequestBuilders

Revision 661 by thetaphi on 2015-05-14T17:46:28Z:

Rewrite DocumentProcessor:
- no document queues anymore, just use a fixed thread pool and a bounded task queue
- use BulkProcessor
- add configuration to allow sending multiple bulk requests to Elasticsearch
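The "fixed thread pool and a bounded task queue" design can be sketched with plain `java.util.concurrent`. This is a hypothetical illustration, not the actual DocumentProcessor code: when the queue is full, the submitting thread blocks (back-pressure instead of dropping documents), and after shutdown new tasks are rejected with an exception rather than silently ignored (compare revision 698). The class `BoundedPool` is an assumed name.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: fixed-size pool whose producers block when the queue is full. */
final class BoundedPool {
  static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
    RejectedExecutionHandler blockWhenFull = (task, executor) -> {
      if (executor.isShutdown()) {
        // Fail loudly: a task arriving after shutdown would otherwise be lost.
        throw new RejectedExecutionException("Pool already shut down");
      }
      try {
        // Queue is full: wait until a worker thread takes the next task.
        executor.getQueue().put(task);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RejectedExecutionException("Interrupted while waiting for queue space", e);
      }
    };
    // Core size == max size gives a fixed pool; ArrayBlockingQueue bounds pending tasks.
    return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(queueCapacity), blockWhenFull);
  }
}
```

Putting directly into the queue bypasses the executor's own shutdown check, so a task enqueued concurrently with `shutdownNow()` could be dropped; a production version would need to coordinate shutdown with producers.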

Revision 660 by thetaphi on 2015-05-14T12:32:01Z:

Fix forbidden-apis

Revision 659 by thetaphi on 2015-05-14T12:31:10Z:

Fix cleanup bug

Revision 658 by thetaphi on 2015-05-14T12:30:17Z:

First version using Elasticsearch's BulkProcessor

Revision 657 by thetaphi on 2015-05-14T10:54:42Z:


Revision 656 by thetaphi on 2015-05-13T22:50:02Z:

On close, if something failed, clean up the mdocBuffer, so we can reliably enqueue the final EOF docs

Revision 655 by thetaphi on 2015-05-13T22:21:09Z:

Change shutdown algorithm

Revision 654 by thetaphi on 2015-05-13T21:58:03Z:

Don't use submit if no Future is needed

Revision 653 by thetaphi on 2015-05-13T17:40:11Z:

Use an ExecutorService as threadpool

Revision 652 by thetaphi on 2015-05-13T16:39:09Z:

Some refactoring, also make thread counting safe

Revision 651 by thetaphi on 2015-05-13T13:43:19Z:

Remove no longer used

Revision 650 by thetaphi on 2015-05-08T12:39:08Z:

Make content type for _source field configurable, switch to CBOR instead of JSON (by default)

Revision 649 by thetaphi on 2015-04-28T23:53:40Z:

Update Elasticsearch to 1.5.2 and forbiddenapis to 1.8

Revision 648 by thetaphi on 2015-04-10T09:46:09Z:

Update Elasticsearch to 1.5.1

Revision 647 by thetaphi on 2015-03-24T12:06:56Z:

Rewrite delete by query to do a scanning search that issues a bulk delete for each found item. This removes bugs in clouds, where shards may return inconsistent results (see

Revision 646 by thetaphi on 2015-03-24T12:04:52Z:

Update to Elasticsearch 1.5. Fix one deprecation. TODO: deleteByQuery is also deprecated

Revision 645 by thetaphi on 2015-02-20T16:52:29Z:

Update elasticsearch to 1.4.4

Revision 644 by thetaphi on 2015-02-12T21:45:27Z:

Update Elasticsearch

Revision 643 by thetaphi on 2015-01-29T10:32:06Z:

Simplify thread-local SimpleCookieHandler (use Java 6+ impl, per thread)
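A per-thread cookie handler built on the Java 6+ `java.net.CookieManager` could look like the sketch below. It is an assumption about the general approach, not the actual SimpleCookieHandler: each thread lazily gets its own `CookieManager`, so a session cookie recorded by one harvester thread (e.g. for GeoNetwork, see revision 380) is never sent by another. `ThreadLocalCookieHandler` is a hypothetical name; `ThreadLocal.withInitial` requires Java 8.

```java
import java.io.IOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.URI;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: delegates to a separate CookieManager for each thread. */
final class ThreadLocalCookieHandler extends CookieHandler {
  // Each thread gets its own CookieManager with its own in-memory cookie store.
  private final ThreadLocal<CookieManager> perThread = ThreadLocal.withInitial(CookieManager::new);

  @Override
  public Map<String, List<String>> get(URI uri, Map<String, List<String>> requestHeaders)
      throws IOException {
    return perThread.get().get(uri, requestHeaders);
  }

  @Override
  public void put(URI uri, Map<String, List<String>> responseHeaders) throws IOException {
    perThread.get().put(uri, responseHeaders);
  }

  /** Drops all cookies of the calling thread, e.g. when its harvester finishes. */
  void clearCurrentThread() {
    perThread.remove();
  }
}
```

To take effect for `HttpURLConnection`, such a handler would be registered with `CookieHandler.setDefault(...)`.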

Revision 642 by thetaphi on 2015-01-01T16:55:58Z:

Prevent creation of synthetic accessor methods (by using pkg-private access modifier)

Revision 641 by thetaphi on 2014-12-28T13:58:18Z:

Rename rule to have better suited name

Revision 640 by thetaphi on 2014-12-28T13:56:13Z:

Whitespace fix

Revision 639 by thetaphi on 2014-12-28T13:55:47Z:

Fix documentation

Revision 638 by thetaphi on 2014-12-28T13:47:59Z:

Fix bug with settings and their prefix. Add new Digester rule to only populate with element name (instead of the whole path), so no filtering is needed

Revision 637 by thetaphi on 2014-12-28T12:55:55Z:

Allow index settings parsed from file

Revision 636 by thetaphi on 2014-12-17T09:07:07Z:

Update ES

Revision 635 by thetaphi on 2014-11-28T17:29:04Z:

add support for plugins

Revision 634 by thetaphi on 2014-11-28T16:49:53Z:

update elasticsearch

Revision 633 by thetaphi on 2014-11-28T13:14:21Z:

Update forbidden apis again

Revision 632 by thetaphi on 2014-11-28T13:12:00Z:

Update forbidden APIS

Revision 631 by thetaphi on 2014-11-16T17:41:15Z:

Make scroll requests safer

Revision 630 by thetaphi on 2014-11-12T20:29:05Z:

Fix bug in serializing KeyValuePairs: Single items of type KeyValuePairs were not handled correctly

Revision 629 by thetaphi on 2014-11-05T18:45:57Z:

Update ES

Revision 628 by thetaphi on 2014-10-27T18:24:00Z:

Allow setting a maximum bulk size (in number of bytes of the JSON document sources)
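A byte-based bulk limit can be pictured as a batcher that flushes once the accumulated source size would exceed the configured maximum. This is a hypothetical sketch (the class `ByteBoundedBatcher` and its methods are assumed names); it measures sources as UTF-8 bytes and sends an oversized single document in a batch of its own.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: groups JSON sources into batches bounded by total byte size. */
final class ByteBoundedBatcher {
  private final long maxBytes;
  private final List<List<String>> batches = new ArrayList<>();
  private List<String> current = new ArrayList<>();
  private long currentBytes = 0;

  ByteBoundedBatcher(long maxBytes) {
    this.maxBytes = maxBytes;
  }

  void add(String jsonSource) {
    long size = jsonSource.getBytes(StandardCharsets.UTF_8).length;
    // Flush first if this document would push a non-empty batch over the limit.
    if (!current.isEmpty() && currentBytes + size > maxBytes) {
      flush();
    }
    current.add(jsonSource);
    currentBytes += size;
  }

  /** Closes the current batch; in a real indexer this is where a bulk request would be sent. */
  void flush() {
    if (!current.isEmpty()) {
      batches.add(current);
      current = new ArrayList<>();
      currentBytes = 0;
    }
  }

  List<List<String>> batches() {
    return batches;
  }
}
```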

Revision 627 by thetaphi on 2014-10-15T01:00:36Z:

Use JAXB to unmarshal XML->JSON nodes (to preserve type)

Revision 626 by thetaphi on 2014-10-14T22:51:42Z:

Add support for INTEGER values (mapped internally to Long). This is safer where rounding might otherwise occur.

Revision 625 by thetaphi on 2014-10-08T20:49:21Z:

Fix XHTML bug

Revision 624 by thetaphi on 2014-10-07T12:56:49Z:

Don't record identifiers from deleted documents in OAIStaticRepository. Don't use valid identifiers if no change

Revision 623 by thetaphi on 2014-10-07T12:52:44Z:

Allow ignoring datestamps in OAI-PMH. If datestamps are ignored, incremental harvesting is never done. This allows a complete re-harvest.

Revision 622 by thetaphi on 2014-10-07T10:35:53Z:

Don't resolve URLs / dirs in ctor

Revision 621 by thetaphi on 2014-10-07T09:45:59Z:

Move config parsing in harvesters up to the ctor, make fields final; add identifierPrefix for OAI

Revision 620 by thetaphi on 2014-10-01T14:18:52Z:

Update Elasticsearch

Revision 619 by thetaphi on 2014-09-30T21:52:48Z:

formatting cleanup

Revision 618 by thetaphi on 2014-09-30T19:44:03Z:

Allow more Exceptions on addDocument()

Revision 617 by thetaphi on 2014-09-30T14:54:25Z:

Update count on direct pushing of docs

Revision 616 by thetaphi on 2014-09-30T13:36:19Z:

Small update in logging

Revision 615 by thetaphi on 2014-09-29T18:12:40Z:

Update Elasticsearch

Revision 614 by thetaphi on 2014-09-11T17:38:34Z:

Fix bug with xml fields

Revision 613 by thetaphi on 2014-09-11T10:28:47Z:

Fix warnings

Revision 612 by thetaphi on 2014-08-27T15:45:01Z:

Update nekohtml

Revision 610 by thetaphi on 2014-08-26T14:04:57Z:

Fix mapping of "src" property

Revision 609 by thetaphi on 2014-08-26T13:41:33Z:

Allow src= also for <field-template/>

Revision 608 by thetaphi on 2014-08-14T08:23:38Z:

Update XALAN, finally :-)

Revision 607 by thetaphi on 2014-08-14T07:52:06Z:

Update ES

Revision 606 by thetaphi on 2014-07-28T22:16:18Z:

Update ES.

Revision 605 by thetaphi on 2014-07-24T21:26:36Z:

Update to ES 1.3.0

Revision 604 by thetaphi on 2014-07-09T20:34:09Z:

Update Elasticsearch

Revision 603 by thetaphi on 2014-06-19T16:39:47Z:

split properties for tools, reformat build.xml

Revision 602 by thetaphi on 2014-06-19T09:40:38Z:

change constant

Revision 601 by thetaphi on 2014-06-18T17:31:37Z:

After createIndex, wait for the cluster to reach yellow status.

Revision 600 by thetaphi on 2014-06-17T23:17:21Z:

Small fixes. Initially create index with right realName, not alternate. Cleanup Exceptions.

Revision 599 by thetaphi on 2014-06-16T18:02:58Z:

Add support for removing aliases not found in config

Revision 598 by thetaphi on 2014-06-16T17:17:02Z:

Add support for aliases

Revision 597 by thetaphi on 2014-06-16T14:52:32Z:

Fix typo

Revision 596 by thetaphi on 2014-06-16T14:41:31Z:

Make configs more immutable, remove null map

Revision 595 by thetaphi on 2014-06-16T14:26:52Z:

Move to Java 7 diamond

Revision 594 by thetaphi on 2014-06-16T14:18:04Z:

Code cleanup by Eclipse

Revision 593 by thetaphi on 2014-06-16T14:13:14Z:

Factor out code to get aliased index

Revision 592 by thetaphi on 2014-06-16T13:44:11Z:


Revision 591 by thetaphi on 2014-06-16T12:35:31Z:

direct delegation to correct method

Revision 590 by thetaphi on 2014-06-16T12:32:44Z:

Small refactoring

Revision 589 by thetaphi on 2014-06-16T11:30:08Z:

Refactor config, move Elasticsearch stuff out of TargetIndexConfig

Revision 588 by thetaphi on 2014-06-16T09:46:48Z:

More UTF-8 changes

Revision 587 by thetaphi on 2014-06-16T09:44:20Z:

Use UTF8 in XML.

Revision 586 by thetaphi on 2014-06-16T09:43:30Z:

Revert DIF schema removal. Add DIF schema as example file locally (license???)

Revision 585 by thetaphi on 2014-06-16T08:51:57Z:

Support for index settings and configuration of alternate names.

Revision 584 by thetaphi on 2014-06-15T23:32:02Z:

Cleanup formatting

Revision 583 by thetaphi on 2014-06-15T23:21:40Z:

Cleanup formatting

Revision 582 by thetaphi on 2014-06-15T23:02:10Z:

Merge mappings before sending to ES, send mapping with index create

Revision 581 by thetaphi on 2014-06-15T16:07:48Z:

Fix NPE when upgrading from older versions

Revision 580 by thetaphi on 2014-06-15T15:43:34Z:

Only allow targetIndex ids for rebuilder (we cannot rebuild parts of an elasticsearch index, would cause data loss)

Revision 579 by thetaphi on 2014-06-15T10:22:49Z:

Refactor harvester metadata (for now always save it, even if empty)

Revision 578 by thetaphi on 2014-06-15T09:23:05Z:

Remove the waiting for clusterstate; for now just leave the flush in place to work around the ES bug. Also change the metadata to be non-stored, but with _source enabled

Revision 577 by thetaphi on 2014-06-14T23:08:41Z:

Add some cluster state checks... TODO: Investigate

Revision 576 by thetaphi on 2014-06-14T21:52:06Z:

Refactor target index creation out of DocumentProcessor and make rebuilder create new index. This is done by automatically creating aliases.

Revision 575 by thetaphi on 2014-06-13T18:34:37Z:

improve default mapping

Revision 574 by thetaphi on 2014-06-03T22:28:24Z:

Update Elasticsearch

Revision 573 by thetaphi on 2014-05-22T18:08:17Z:

Update to Elasticsearch 1.2.0

Revision 572 by thetaphi on 2014-05-10T17:44:08Z:

Fix logging directory

Revision 571 by thetaphi on 2014-04-30T13:10:42Z:

Change log message

Revision 570 by thetaphi on 2014-04-29T23:10:48Z:

Simplify the default mapping with dynamic fields. Disable "_all". Also put provided mapping before defining internal types.

Revision 569 by thetaphi on 2014-04-29T22:35:30Z:

Remove "include_in_all" from default mapping

Revision 568 by thetaphi on 2014-04-29T14:52:08Z:

Rename class and improve Javadocs

Revision 567 by thetaphi on 2014-04-29T12:03:12Z:

Use a KeyValuePairs object before building JSON to handle duplicate field names. This also improves the XML converter.
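The idea of collecting key/value pairs before building JSON can be sketched as an insertion-ordered multimap: repeated field names are merged into one JSON array, single values stay scalars, and nested pair objects are converted recursively (the nested single-item case is what revision 630 later fixes). `KeyValuePairsSketch` is a hypothetical stand-in, not panFMP's KeyValuePairs class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: collects pairs so duplicate keys become a single JSON array. */
final class KeyValuePairsSketch {
  // LinkedHashMap keeps the field order of the source document.
  private final Map<String, List<Object>> values = new LinkedHashMap<>();

  void add(String key, Object value) {
    values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
  }

  /** Builds a JSON-ready map: one value stays scalar, several become a list. */
  Map<String, Object> toJsonMap() {
    Map<String, Object> out = new LinkedHashMap<>();
    for (Map.Entry<String, List<Object>> e : values.entrySet()) {
      List<Object> list = e.getValue();
      if (list.size() == 1) {
        out.put(e.getKey(), convert(list.get(0))); // single nested objects are converted too
      } else {
        List<Object> converted = new ArrayList<>();
        for (Object v : list) {
          converted.add(convert(v));
        }
        out.put(e.getKey(), converted);
      }
    }
    return out;
  }

  private static Object convert(Object v) {
    return (v instanceof KeyValuePairsSketch) ? ((KeyValuePairsSketch) v).toJsonMap() : v;
  }
}
```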

Revision 566 by thetaphi on 2014-04-24T15:13:15Z:

Add forbidden-api checks

Revision 565 by thetaphi on 2014-04-24T15:00:25Z:

Some charset cleanup

Revision 564 by thetaphi on 2014-04-24T14:52:20Z:

Fix some smaller problems

Revision 563 by thetaphi on 2014-04-24T13:35:40Z:

fix typo

Revision 562 by thetaphi on 2014-04-24T10:26:42Z:

Update scripts to actually work with the binary distribution

Revision 561 by thetaphi on 2014-04-23T18:42:03Z:

Cleanup line endings #2

Revision 560 by thetaphi on 2014-04-23T18:40:09Z:

Cleanup line endings

Revision 559 by thetaphi on 2014-04-23T18:36:31Z:

Remove old scripts, update other ones (preliminary, does not yet work)

Revision 558 by thetaphi on 2014-04-23T18:33:03Z:

Remove useless example folder

Revision 556 by thetaphi on 2014-04-23T15:12:24Z:

Merge ES branch into trunk

Revision 552 by thetaphi on 2014-04-23T14:36:56Z:

add release signing

Revision 449 by thetaphi on 2013-06-27T12:10:18Z:

Update of javadocs patch macro

Revision 448 by thetaphi on 2013-06-26T18:54:49Z:

Fix javadocs frame injection bug, set encoding of source files and javadocs

Revision 447 by thetaphi on 2013-04-15T17:14:13Z:

Move 1.1 branch back to trunk

Revision 445 by thetaphi on 2013-04-15T17:07:33Z:

test commit

Revision 444 by thetaphi on 2013-01-29T13:51:40Z:


Revision 443 by thetaphi on 2012-12-25T13:00:52Z:

Update to Lucene 3.6.2

Revision 442 by thetaphi on 2012-07-21T20:42:23Z:

Update to Lucene 3.6.1

Revision 441 by thetaphi on 2012-07-17T08:16:00Z:

Fix filtering of sets when the OAI repository does not report sets assigned to metadata. Set filtering is only needed for static repositories and if harvesting more than one set for network repositories.

Revision 440 by thetaphi on 2012-04-13T21:59:02Z:

Lucene 3.6.0 final version update (from Snapshot used before). This will be the last Lucene 3.x version.

Revision 439 by thetaphi on 2012-04-03T07:13:41Z:

Update Lucene snapshot

Revision 438 by thetaphi on 2012-02-18T01:28:58Z:

Separate document boost from norms. The change is backwards compatible, but it's still recommended to reindex (./ to make use of absolute boosts (also works with numeric queries)

Revision 437 by thetaphi on 2012-02-17T18:46:54Z:

Update to 3.x branch and "remove" my own deprecations :-)

Revision 436 by thetaphi on 2011-11-25T23:55:51Z:

Upgrade to Lucene Core 3.5.0

Revision 435 by thetaphi on 2011-09-15T07:13:00Z:

Upgrade to Lucene 3.4.0

Revision 434 by thetaphi on 2011-09-01T22:56:57Z:

update nekohtml to 1.9.15

Revision 433 by thetaphi on 2011-07-14T14:46:04Z:

Remove deprecations, upgrade NumericField usage (with index backwards compatibility)

Revision 432 by thetaphi on 2011-07-01T06:53:03Z:

Upgrade to Lucene 3.3.0

Revision 431 by thetaphi on 2011-06-03T15:42:02Z:

Upgrade Lucene to version 3.2.0

Revision 430 by thetaphi on 2011-03-30T17:28:08Z:

Upgrade Lucene to 3.1.0. Deprecation warnings will be fixed later!

Revision 427 by thetaphi on 2011-02-01T14:12:15Z:

fix TODO.txt

Revision 424 by thetaphi on 2011-02-01T14:05:09Z:

update Lucene's javadocs location

Revision 422 by thetaphi on 2011-02-01T13:36:14Z:

Change version information for 1.1 branch

Revision 421 by thetaphi on 2011-02-01T13:33:13Z:

Create branch for 1.1 before the new Solr-based 2.0 development starts

Revision 419 by thetaphi on 2011-01-31T18:54:04Z:

update xerces to 2.11.0

Revision 418 by thetaphi on 2010-12-09T13:49:55Z:

update jetty and slf4j

Revision 417 by thetaphi on 2010-12-06T11:07:00Z:

Update Lucene to 3.0.3

Revision 416 by thetaphi on 2010-07-15T02:54:07Z:

update jetty

Revision 415 by thetaphi on 2010-06-21T23:20:43Z:

add XERCES 2.10.0

Revision 414 by thetaphi on 2010-06-18T16:17:27Z:

Update Lucene to 3.0.2 (released today)

Revision 412 by thetaphi on 2010-05-17T17:24:02Z:

fix lots of upper/lower case problems with default locale; update slf4j

Revision 409 by thetaphi on 2010-03-02T15:02:36Z:

update lucene to 3.0.1

Revision 408 by thetaphi on 2010-02-09T10:22:17Z:

initial version with warmer

Revision 407 by thetaphi on 2009-11-27T13:49:33Z:

Add support for indexVersionCompatibility (default is Version.LUCENE_24 for backwards compatibility).

Revision 406 by thetaphi on 2009-11-27T11:49:21Z:

First version for Lucene 3.0, that is still backwards compatible (it uses Version.LUCENE_24 for analyzers and query parsers).
The support for compressed fields is preserved, but the index format for that changes. It is recommended to rebuild the index, because already compressed fields may suddenly get bigger before reindexed again (see Lucene 3.0 release notes).
Later commits will have support for configuring the version number of analyzers and query parsers; until then it's fixed to LUCENE_24 for BW compatibility.

Revision 405 by thetaphi on 2009-11-23T08:48:51Z:

Change version number for trunk

Revision 403 by thetaphi on 2009-11-22T22:09:11Z:

Only use Lucene Core documentation

Revision 400 by thetaphi on 2009-11-22T16:28:25Z:

update some libs before release of 1.0

Revision 399 by thetaphi on 2009-11-21T16:08:58Z:

add support for md5 and sha1 checksums during package build

Revision 398 by thetaphi on 2009-11-08T19:10:47Z:

Upgrade to Lucene 2.9.1

Revision 397 by thetaphi on 2009-10-29T12:29:32Z:

fix javadoc generation

Revision 396 by thetaphi on 2009-10-26T17:22:22Z:

fix typo and dead code

Revision 395 by thetaphi on 2009-10-26T17:18:10Z:

Automatically compute maxScore/score for sorted results

Revision 394 by thetaphi on 2009-09-24T21:25:19Z:

Update to final version of Lucene 2.9.0 - ready to publish panFMP version 1.0!

Revision 393 by thetaphi on 2009-09-19T00:03:48Z:

Update to Lucene 2.9.0-RC5

Revision 392 by thetaphi on 2009-09-14T16:03:43Z:

upgrade jetty to 6.1.20

Revision 391 by thetaphi on 2009-09-13T15:03:25Z:

- Update Lucene to 2.9-RC4
- Ignore one deprecation warning (Field.STORE.COMPRESS related)

Revision 390 by thetaphi on 2009-09-09T17:01:08Z:

Update to Lucene 2.9 RC3

Revision 389 by thetaphi on 2009-09-01T13:23:23Z:

remove default stop-word list

Revision 387 by thetaphi on 2009-08-28T21:06:51Z:

Update Lucene to 2.9.0-rc2

Revision 386 by thetaphi on 2009-08-28T08:14:52Z:

Update Lucene to 2.9.0-rc1

Revision 385 by thetaphi on 2009-08-26T06:07:20Z:

Update Lucene to latest trunk (2009-08-26).

Revision 384 by thetaphi on 2009-08-15T06:48:14Z:

Update Lucene to latest trunk.

Revision 383 by thetaphi on 2009-08-12T11:54:33Z:

Update XML file URL for COPEPOD example

Revision 382 by thetaphi on 2009-08-12T11:49:22Z:

Add support for additional XSL params in index configuration. They can be passed as attributes to <cfg:transform/>

Revision 381 by thetaphi on 2009-08-12T08:09:46Z:

Do not fail on invalid cookies, just print warning and ignore.

Revision 380 by thetaphi on 2009-08-11T22:10:42Z:

Add basic support for Cookies in OAI-/WebCrawlingHarvester. This is needed for GeoNetworkOpenSource (which sets a session ID needed later). Cookies are only recorded/enabled for the running thread and only when running affected harvesters.

Revision 379 by thetaphi on 2009-08-11T08:37:32Z:

- Update Lucene to latest Hudson trunk 2.9 build (the old query parser now produces a lot of deprecation warnings, but it is not yet certain that it will really be deprecated; I will fix this as soon as Lucene 2.9 is released). Further work may be the move to the new QueryParser currently located in Lucene Contrib
- Update Jetty
- Update Nekohtml

Revision 378 by thetaphi on 2009-07-16T06:26:28Z:

- Use allowDocsOutOfOrder=false for collectors and remove the usage of SorterTemplate.
- Fixes a compiler warning with the new harvester.
- Update lucene-core.jar to the current trunk version.

Revision 377 by thetaphi on 2009-07-13T12:08:48Z:

Add a new harvester, that harvests foreign panFMP indexes (from another installation).
The foreign indexes can use another XML schema and field structure, because a mapping
can be done using XSLT (as with other harvesters). It is also possible to only harvest
a subset of documents by specifying a query string. The QueryParsers and Analyzers of the source index can be specified for that.

Revision 376 by thetaphi on 2009-07-02T06:57:55Z:

update Lucene to latest trunk (fix some bugs)

Revision 375 by thetaphi on 2009-06-29T08:21:45Z:

Update Jetty to 6.1.18

Revision 374 by thetaphi on 2009-06-24T10:28:37Z:

JavaDoc update in DateRangeQuery

Revision 373 by thetaphi on 2009-06-24T10:26:13Z:

Documentation updates #2

Revision 372 by thetaphi on 2009-06-24T10:24:26Z:

Documentation updates

Revision 371 by thetaphi on 2009-06-24T08:35:53Z:

Add missing @Override

Revision 370 by thetaphi on 2009-06-24T08:01:26Z:

New Lucene Trunk Version, TrieRangeQuery is now in Lucene-Core (with new name, so contrib-queries is no longer needed). This commit also respects other Lucene API improvements/changes (Collector).

Revision 369 by thetaphi on 2009-06-02T09:26:07Z:

- new Lucene JARs
- Changes to directory implementations. It now supports AUTO to choose NIO on all platforms except Windows

Revision 368 by thetaphi on 2009-05-29T16:58:41Z:

Use NativeFSLockFactory, which has no problems with local filesystems.
May fail with NFS filesystems, which should not be used for Lucene.

Revision 367 by thetaphi on 2009-05-06T15:36:31Z:

nicer toString() for dates in TrieRangeQuery

Revision 366 by thetaphi on 2009-04-25T21:58:38Z:

Remove usage of HitCollector and replace by Collector (new Lucene API)

Revision 365 by thetaphi on 2009-04-24T07:47:19Z:

update lucene JARs

Revision 364 by thetaphi on 2009-04-17T07:28:43Z:

update Lucene, some new deprecations appeared, must be fixed (Collector, COMPRESS)

Revision 363 by thetaphi on 2009-04-11T20:18:09Z:

New Lucene TrieRange version (updated to Lucene Trunk 2009-04-10):
The internal encoding of numeric and date fields changed in index again.
You need to rebuild indexes using
If you do not do this, range queries will return no or only few results.

Revision 362 by thetaphi on 2009-03-31T12:41:15Z:

fix NPE in rebuilder

Revision 361 by thetaphi on 2009-03-20T23:10:13Z:

fix typo

Revision 360 by thetaphi on 2009-03-15T17:18:13Z:

update to Jetty 6.1.15

Revision 359 by thetaphi on 2009-03-10T19:29:49Z:

Automatically optimize index after rebuild (even if config does not enable auto-optimize)

Revision 358 by thetaphi on 2009-02-23T07:47:43Z:

update lucene jars to latest snapshot

Revision 357 by thetaphi on 2009-02-14T10:56:31Z:

Another update that changes the index encoding of numeric values. To use indexes created before this update, you have to reindex them (using the script). Datestamp metadata may get lost, but this is not a problem.
New features:
- Update to snapshot build of Lucene 2.9, that has a completely reimplemented trie package. This change is not backwards compatible, because of that, you need to rebuild.
- Change in config.xml: property numericalTrieImplamentation renamed to triePrecisionStep, the new variable contains the step in the bit precision when generating trie encoded numeric values. Default is 8 (as before, which was "8bit"). Now every number between 1 and 64 is possible, lower values create bigger indexes, but faster queries (see javadoc).
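Why lower `triePrecisionStep` values create bigger indexes can be shown with a small count. In a trie encoding, each value is indexed once per shift of `precisionStep` bits (full precision at shift 0, then progressively coarser), so a 64-bit value yields ceil(64 / precisionStep) terms. This is a simplified sketch of that relationship, not Lucene's actual encoder; `TrieTerms` is a hypothetical name.

```java
/** Hypothetical sketch: number of index terms a trie encoding produces per value. */
final class TrieTerms {
  static int termsPerValue(int valueBits, int precisionStep) {
    if (precisionStep < 1 || precisionStep > valueBits) {
      throw new IllegalArgumentException("precisionStep must be in 1.." + valueBits);
    }
    int terms = 0;
    for (int shift = 0; shift < valueBits; shift += precisionStep) {
      terms++; // one term holding the value with its lowest `shift` bits stripped
    }
    return terms;
  }
}
```

With the default step of 8, a 64-bit value produces 8 terms; a step of 4 doubles that to 16, while a step of 64 produces a single term (smallest index, slowest range queries).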

Revision 356 by thetaphi on 2009-02-05T11:08:03Z:

add a static main() method to LenientDateParser for testing.

Revision 355 by thetaphi on 2009-02-02T13:17:59Z:

Improved sorting in LuceneHitCollector (uses new SorterTemplate utility from Lucene 2.9 to sort two arrays in parallel).

Revision 354 by thetaphi on 2009-02-01T12:42:57Z:

Link trie package from Lucene in a better way in JavaDocs (makes update of final URL with build.xml easier)

Revision 353 by thetaphi on 2009-01-31T22:53:22Z:

link Lucene's Hudson Javadocs

Revision 352 by thetaphi on 2009-01-31T22:29:55Z:

Improve Javadocs

Revision 351 by thetaphi on 2009-01-29T09:38:25Z:

Update Lucene JARs to Hudson nightly build:
- new TrieRangeQuery version
- optimized index reopen when sorting enabled on queries
Other updates:
- FSDirectory now configurable in config.xml
- some improvements in config parser

Revision 350 by thetaphi on 2009-01-24T00:42:53Z:

- extra check for score
- no synchronization needed, as no MT search anymore

Revision 349 by thetaphi on 2009-01-15T08:37:32Z:

Do not use norms and tf for string and numeric/datetime fields. To support this for numeric and datetime fields, lucene-queries.jar is also updated.

Revision 348 by thetaphi on 2009-01-13T00:50:45Z:

Update some libs:
- Digester to 2.0, now for Java 1.5, small changes in code for that
- nekohtml to 1.9.11
Remove libs:
- commons-collections-3.2.1.jar
Small fixes in config etc.

Revision 347 by thetaphi on 2009-01-12T08:46:43Z:

Update Lucene to development snapshot of 2009-01-12 (includes efficient sortable numeric/datetime fields and TrieRangeQuery optimization)

Revision 346 by thetaphi on 2009-01-11T11:56:27Z:

- remove copyright year from source file headers
- change year to 2009 in documentation and build files

Revision 345 by thetaphi on 2008-12-05T09:27:22Z:

replace lucene-queries-2.9-dev.jar by hudson version

Revision 344 by thetaphi on 2008-12-04T15:27:15Z:

fix compile error in example.

Revision 343 by thetaphi on 2008-12-04T14:09:11Z:

!!! WARNING !!! Backwards incompatible change!
TrieRangeQuery was given to Apache Lucene as a contrib package (see During the move, it was optimized, the trie encoding was changed to be more compact, and you can now tune search speed by using more indexed precisions (using more disk space).
This patch removes TrieRangeQuery and TrieUtils, adds a new dependency to the not yet released version of lucene-queries.jar contrib (version 2.9-dev, built locally/by Apache's Hudson).
The backwards incompatible change is the use of a new trie encoding in the index. Indexes created with earlier versions of panFMP no longer work when used with numeric/datetime fields. To make them work again, you can reharvest them after dropping or use the index rebuilder ( If you rebuild the index, datestamps of the metadata get lost (it will print out a warning for each document). As the metadata datestamp is not used anywhere in panFMP, this is not a problem.
If you do not rebuild/reharvest indexes, you will get spurious NumberFormatExceptions.
Other fixes:
- This patch also fixes sorting of numeric fields (now possible again). A further patch/issue for Lucene (not yet done) will sort not string-based on the encoded trie values, but use a more memory-efficient FieldCache of longs.
- Throw correct exceptions in SearchService (copy/paste error)

Revision 342 by thetaphi on 2008-11-29T22:39:15Z:

upgrade jetty to 6.1.14

Revision 341 by thetaphi on 2008-11-21T13:21:42Z:

rename Hash to UUID in axis webservice

Revision 340 by thetaphi on 2008-11-21T11:55:09Z:

Add more documentation in comments to the default config file

Revision 339 by thetaphi on 2008-11-21T10:37:42Z:

Incompatible change in SearchService API:
- storeQuery() now returns a UUID instead of a String. You may need to change your code, see documentation.
- readStoredQuery uses UUIDs, too.

Revision 338 by thetaphi on 2008-11-20T14:41:34Z:

Show the <cfg:transform src="..."/> element in example config.xml

Revision 337 by thetaphi on 2008-11-16T22:48:50Z:

ExtendedDigester changes:
- Remove usage of a Stack in favor of a linked List for the namespaces
- Refactor replaying of prefix mappings for SaxRule
- New access methods, unneeded ones removed

Revision 336 by thetaphi on 2008-11-10T17:33:23Z:

some cleanup with sax parser for differentiating between xinclude (Config) and not-xinclude (elsewhere)

Revision 335 by thetaphi on 2008-11-09T22:16:57Z:

Use new parent Config in SingleIndexConfig constructor to initialize harvester properties without the special class InheritedProperties (removed). Do the index check at end of config loading.

Revision 334 by thetaphi on 2008-11-09T17:21:36Z:

- add some more final declarations
- reset the namespace map in digester on clear() and startDocument(), so that every document starts clean without unneeded prefix declarations

Revision 333 by thetaphi on 2008-11-09T14:02:59Z:

- Rename <cfg:transform/> "href" attribute to "src"
- Remove unneeded import

Revision 332 by thetaphi on 2008-11-07T15:56:09Z:

make some fields in configuration final

Revision 331 by thetaphi on 2008-11-07T14:56:36Z:

Possibility to set the XSL template in index configuration by a simple href attribute
or by including the template as before.
Both options are supported: short templates may be included directly in the config
document, or the template may be given by filename. The latter is optimized for the case
where the same template is always used, because the template is cached.
There was also some refactoring: the parent element of IndexConfig is now set directly
in the constructor.

Revision 330 by thetaphi on 2008-11-07T08:39:11Z:

- Centralize and unify trimming of harvester properties and search properties
- Some cleanups in Config code

Revision 329 by thetaphi on 2008-11-07T01:07:49Z:

remove unneeded Axis Ant file

Revision 328 by thetaphi on 2008-11-07T00:03:06Z:

Update to Jetty 6.1.12

Revision 327 by thetaphi on 2008-11-06T20:02:44Z:

Fix NPE in Axis Webservice

Revision 326 by thetaphi on 2008-11-04T20:09:45Z:

- replace empty datestamp variable by ""
- some checks added
- optimized and unmodifiable set/map constants

Revision 325 by thetaphi on 2008-11-04T12:32:07Z:

Fix bug with empty datestamp and rewrite variable registration for XMLConverter

Revision 324 by thetaphi on 2008-11-04T11:58:07Z:

- refactoring the XMLConverter
- new methods for checking last modified datestamp
- factory for MetadataDocument inside Harvester.

Revision 323 by thetaphi on 2008-11-04T00:26:46Z:

- Missing index builder variables in TransformerHandler
- add datestamp variable
- use the final identifier as identifier, not the source systemId

Revision 322 by thetaphi on 2008-11-03T18:02:44Z:

New feature: Set index builder variables also in XSL for transforming metadata.
Currently this works for all ib:-variables, but not the date stamp.

Revision 321 by thetaphi on 2008-11-03T15:58:45Z:

wrong variable in initializer -> NPE

Revision 320 by thetaphi on 2008-10-30T22:50:29Z:

add missing formats

Revision 319 by thetaphi on 2008-10-30T17:44:01Z:

- Use UTC instead of GMT (should not change anything)
- Remove German Date/Time formats from LenientDateParser
- Correct order of Date parsing

Revision 318 by thetaphi on 2008-10-30T00:51:40Z:

fix deadlock in IndexBuilder. Problem was "unconditioned wait".
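The commit does not show the fix itself. As a general illustration only: an "unconditioned wait" usually means calling wait() without re-checking, in a loop, the condition being waited for. The following minimal sketch shows the conventional guarded-wait pattern on a hypothetical bounded buffer (GuardedBuffer is an assumption for illustration, not the IndexBuilder code):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical bounded buffer illustrating a guarded wait; not panFMP's IndexBuilder.
final class GuardedBuffer<T> {
    private final Deque<T> items = new ArrayDeque<T>();
    private final int capacity;

    GuardedBuffer(int capacity) {
        this.capacity = capacity;
    }

    synchronized void put(T item) throws InterruptedException {
        // Re-check the condition in a loop: wakeups may be spurious, or another
        // thread may have changed the state before this thread reacquired the lock.
        while (items.size() >= capacity) {
            wait();
        }
        items.addLast(item);
        notifyAll(); // wake consumers waiting for a non-empty buffer
    }

    synchronized T take() throws InterruptedException {
        while (items.isEmpty()) {
            wait();
        }
        T item = items.removeFirst();
        notifyAll(); // wake producers waiting for free capacity
        return item;
    }
}
```

Using notifyAll() rather than notify() keeps the sketch simple: producers and consumers wait on the same monitor, so waking everyone and letting each re-check its own condition avoids waking only the wrong kind of thread.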

Revision 317 by thetaphi on 2008-10-29T18:10:45Z:

revert last commit (this will not work correctly)

Revision 316 by thetaphi on 2008-10-29T16:55:14Z:

cleanup sessions in LRUMap and cache background task

Revision 315 by thetaphi on 2008-10-29T10:47:19Z:

Optimize and compact StringBuilder appends.

Revision 310 by thetaphi on 2008-10-23T12:47:52Z:

fix encoding issue

Revision 309 by thetaphi on 2008-10-23T08:33:26Z:

Add missing javadocs in utils package. Some small changes in code of ExtendedDigester.

Revision 308 by thetaphi on 2008-10-22T21:31:41Z:

Add JavaDoc for TrieUtils and TrieRangeQuery, cite the paper.

Revision 307 by thetaphi on 2008-10-22T17:40:42Z:

Update of Todo list

Revision 306 by thetaphi on 2008-10-22T12:37:09Z:

small fixes in examples (names, incorrect web.xml, readme)

Revision 305 by thetaphi on 2008-10-22T06:36:14Z:

slf4j update

Revision 304 by thetaphi on 2008-10-21T12:33:26Z:

Add examples to panFMP distribution:
- a PHP example using SOAP API
- two Java Servlets (Paging and Collector API)
Both examples use the example configuration (DIF metadata) and have XSLs to map to HTML.

Revision 303 by thetaphi on 2008-10-10T22:24:20Z:

Update to final release of Lucene 2.4.0

Revision 302 by thetaphi on 2008-10-08T16:30:41Z:

update nekohtml (1.9.9) and slf4j (1.5.3)

Revision 301 by thetaphi on 2008-10-08T13:28:43Z:

rename a harvester property, as changes to the index are now only committed at the end of harvesting.
This change is not backwards compatible, you may have to edit your config file:
<cfg:changesBeforeIndexCommit>1000</cfg:changesBeforeIndexCommit> -> gets:

Revision 300 by thetaphi on 2008-10-08T09:12:25Z:

small refactoring with LoggingErrorListener

Revision 299 by thetaphi on 2008-10-03T11:53:07Z:

small performance optimization: use System.currentTimeMillis() instead of new Date().getTime()

Revision 298 by thetaphi on 2008-10-01T16:16:23Z:

use remove() in ThreadLocal
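Calling ThreadLocal.remove() in a finally block discards the current thread's value, so pooled threads (for example in a servlet container) do not keep it alive. A minimal sketch, assuming a hypothetical per-thread StringBuilder cache (ThreadLocalExample is illustrative, not panFMP code):

```java
// Hypothetical helper: reuses a per-thread StringBuilder and clears it afterwards.
final class ThreadLocalExample {
    private static final ThreadLocal<StringBuilder> BUFFER = new ThreadLocal<StringBuilder>() {
        @Override
        protected StringBuilder initialValue() {
            return new StringBuilder();
        }
    };

    private ThreadLocalExample() {}

    static String join(String a, String b) {
        try {
            StringBuilder sb = BUFFER.get();
            sb.append(a).append(b);
            return sb.toString();
        } finally {
            // remove() drops this thread's value; the next get() calls initialValue() again.
            BUFFER.remove();
        }
    }
}
```

Without the remove() call, a second join() on the same thread would append to the old builder and return the concatenation of both calls.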

Revision 297 by thetaphi on 2008-09-30T09:01:50Z:

disable saving of TF for trie fields.

Revision 296 by thetaphi on 2008-09-26T20:14:58Z:

Add JavaDoc #2

Revision 295 by thetaphi on 2008-09-26T09:27:45Z:

Add JavaDoc.

Revision 294 by thetaphi on 2008-09-25T15:51:19Z:

update to Lucene 2.4.0-rc2:
- fix Checker
- fix deprecation of autoCommit=false with IndexWriter ctor.

Revision 293 by thetaphi on 2008-09-25T14:07:41Z:

javadoc fix

Revision 292 by thetaphi on 2008-09-25T14:02:29Z:

some final optimizations in AutoCloseIndexReader

Revision 291 by thetaphi on 2008-09-24T13:41:04Z:

New implementation of LuceneCache with TimerTasks (cache cleanup in background):
- new search config variables
- index readers are kept open after reloading for a configurable time or until GC removes them
- use WeakReference to hold old readers in IndexConfig

Revision 290 by thetaphi on 2008-09-24T08:52:28Z:

remove "lastharvested" index file if CheckIndex finds error

Revision 289 by thetaphi on 2008-09-24T08:25:13Z:

remove empty lines from logging PrintStream

Revision 288 by thetaphi on 2008-09-23T21:15:37Z:

correct fixing of index

Revision 287 by thetaphi on 2008-09-23T17:54:38Z:

- rename ReadOnlyAutoCloseIndexReader to AutoCloseIndexReader
- fix Checker for Lucene 2.4 (may change when final version is out, see

Revision 286 by thetaphi on 2008-09-23T14:32:19Z:

- upgrade to Lucene 2.4 (RC1)
- some restructuring
- not backwards compatible renaming of IndexConfig methods

Revision 284 by thetaphi on 2008-09-23T08:33:41Z:

force IndexWriter to optimize synchronously

Revision 283 by thetaphi on 2008-09-22T18:16:01Z:

fix some small leaks

Revision 282 by thetaphi on 2008-09-21T20:41:05Z:

new code to keep reopened or closed IndexReaders open until finalization. This helps prevent the problem where one thread reopens all index readers while another thread is searching, which then crashes.
Hopefully this is bug-free; it may need some testing.

Revision 281 by thetaphi on 2008-09-19T10:32:21Z:

doc update

Revision 280 by thetaphi on 2008-09-19T10:22:27Z:

support change of default query parser operator

Revision 279 by thetaphi on 2008-09-16T08:03:41Z:

hide constructor of LogUtil

Revision 278 by thetaphi on 2008-09-14T10:28:12Z:

change log messages

Revision 277 by thetaphi on 2008-09-12T17:33:51Z:

remove ZipFileHarvester TODO items (as already done).

Revision 276 by thetaphi on 2008-09-12T12:31:13Z:

change date stamp handling of ZIP file

Revision 275 by thetaphi on 2008-09-12T10:36:18Z:

fix some inconsistency with datestamps

Revision 274 by thetaphi on 2008-09-11T13:54:20Z:

fix doc bug

Revision 273 by thetaphi on 2008-09-11T10:26:22Z:

- add missing system identifier (for error messages) in ZipFileHarvester
- fix some StringBuilder misuse

Revision 272 by thetaphi on 2008-09-09T17:43:13Z:

New Harvester: ZipFileHarvester (reads files from ZIP file/URL)

Revision 271 by thetaphi on 2008-09-08T08:38:59Z:

- Change parameter parsing ("*" optional)
- Possibility to detach Jetty, reset default to for debugging
- build.xml: move deleting of log files to other node

Revision 270 by thetaphi on 2008-09-07T16:17:06Z:

- Update of some Jakarta Commons components
- Update Nekohtml
- Update license infos with URLs and the SLF4J MIT license

Revision 269 by thetaphi on 2008-09-07T13:40:40Z:

Bundle Jetty as webserver for Axis

Revision 268 by thetaphi on 2008-09-07T12:35:28Z:

- Use "exec" in scripts
- README.txt: add note about parameter parsing

Revision 267 by thetaphi on 2008-09-07T11:39:55Z:

preserve exit code in

Revision 266 by thetaphi on 2008-09-07T11:38:07Z:

- Locking mechanism for cronjobs
- some script names changed
- remove "export" in
- change location of harvest.log file

Revision 265 by thetaphi on 2008-09-07T09:34:11Z:

make posix shell compatible

Revision 264 by thetaphi on 2008-09-07T09:23:09Z:

Fix build script to not delete the empty lucene-store

Revision 263 by thetaphi on 2008-09-07T08:51:13Z:

Fix build script to include new repository dir in binpackage and delete lucene-store on clean

Revision 262 by thetaphi on 2008-09-06T22:23:47Z:

first version of new directory structure with configurable scripts

Revision 261 by thetaphi on 2008-09-06T12:01:09Z:

remove externals

Revision 258 by thetaphi on 2008-09-06T11:30:16Z:

move external libs directory to trunk and delete it

Revision 255 by thetaphi on 2008-09-05T18:11:46Z:

Add logging of index ID when collecting results

Revision 254 by thetaphi on 2008-09-05T17:52:27Z:

- add new script with class for checking indexes
- cleanup .sh files

Revision 253 by thetaphi on 2008-09-05T10:59:40Z:

- maybe fix the annoying reopen bug...????
- add log to IndexConfig classes

Revision 252 by thetaphi on 2008-09-02T09:34:29Z:

Big renaming operation of LuceneConversions:
- new name: TrieUtils
- method names changed
Should not bring changes for normal client code

Revision 251 by thetaphi on 2008-08-28T12:29:07Z:

- Update citation of C&G article
- Make variables/parameters final in LuceneConversions and TrieRangeQuery

Revision 250 by thetaphi on 2008-08-03T22:57:59Z:

fix small bug: during interruption of the converter thread, the InterruptedException was not handled correctly

Revision 249 by thetaphi on 2008-07-04T08:44:30Z:

documentation fix

Revision 248 by thetaphi on 2008-07-03T18:21:28Z:

fix memory leak: InflaterInputStream & Co. use a native library and should be closed directly after use, not by the garbage collector. See
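InflaterInputStream (and its subclass GZIPInputStream) release their native zlib resources in close() when they created their own Inflater, so closing them promptly avoids waiting for finalization. A minimal sketch using try-with-resources (Java 7+, newer than the Java 1.5 baseline of this revision); GzipUtil is a hypothetical helper, not panFMP code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper showing deterministic closing of zlib-backed streams.
final class GzipUtil {
    private GzipUtil() {}

    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Closing finishes the GZIP trailer and ends the internal Deflater immediately.
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        }
        return bos.toByteArray();
    }

    static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Closing ends the internal Inflater instead of leaving it to the garbage collector.
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        }
        return bos.toByteArray();
    }
}
```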

Revision 247 by thetaphi on 2008-06-20T00:13:37Z:

more elegant dom tree enumeration

Revision 246 by thetaphi on 2008-05-29T06:27:33Z:

remove unneeded package prefix (package is in imports)

Revision 245 by thetaphi on 2008-05-28T13:58:55Z:

remove unneeded synchronization for sessions (Collections.synchronizedMap() uses a different mutex, not itself).

Revision 244 by thetaphi on 2008-05-27T18:41:15Z:

javadoc update (author missing in new file)

Revision 243 by thetaphi on 2008-05-27T09:47:24Z:

Make harvester more fault-tolerant on conversion errors (e.g. NumberFormatException during XPath). The default is still to stop conversion (important, for example, if XPath queries are faulty). Once a configuration has been "tested", it can be switched to ignore conversion errors or to delete all faulty documents.

Revision 242 by thetaphi on 2008-05-23T06:29:33Z:

synchronization added

Revision 241 by thetaphi on 2008-05-22T21:41:35Z:

make SearchResultList members private (not needed outside anymore)

Revision 240 by thetaphi on 2008-05-22T21:39:13Z:

get size of SearchResultList list with IOException (outside List-code)

Revision 239 by thetaphi on 2008-05-21T13:44:16Z:

cache factor fix

Revision 238 by thetaphi on 2008-05-21T13:16:25Z:

again fix a bug in new Hits implementation

Revision 237 by thetaphi on 2008-05-21T13:10:33Z:


Revision 236 by thetaphi on 2008-05-21T13:08:47Z:


Revision 235 by thetaphi on 2008-05-21T12:42:37Z:

remove soon deprecated "Hits" usage

Revision 234 by thetaphi on 2008-05-21T09:26:16Z:

fix uninitialized variable

Revision 233 by thetaphi on 2008-05-19T09:25:45Z:

use java.util.concurrent.locks.* for locking and implement timeout
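With java.util.concurrent.locks, a lock with a timeout is typically acquired via ReentrantLock.tryLock(long, TimeUnit), which returns false instead of blocking forever. A minimal sketch; TimedIndexLock and its method are hypothetical, and the timeout value is chosen by the caller:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical guard around an index operation; not panFMP's actual locking code.
final class TimedIndexLock {
    private final ReentrantLock lock = new ReentrantLock();

    /** Runs the task while holding the lock, or returns false if the lock
     *  could not be acquired within the given timeout. */
    boolean runLocked(Runnable task, long timeout, TimeUnit unit) throws InterruptedException {
        if (!lock.tryLock(timeout, unit)) {
            return false; // the caller decides how to report the timeout
        }
        try {
            task.run();
            return true;
        } finally {
            lock.unlock(); // always release, even if the task throws
        }
    }
}
```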

Revision 232 by thetaphi on 2008-05-18T12:19:43Z:

hopefully fix a deadlock...

Revision 223 by thetaphi on 2008-03-05T22:23:55Z:

rename tag for augmentation during validation

Revision 222 by thetaphi on 2008-03-05T13:06:36Z:

- Add possibility to not augment documents during validation (default is to do it)
- Add search property with QueryParser class

Revision 218 by thetaphi on 2008-01-22T23:20:03Z:

bug in index reopening => restructure again :)

Revision 217 by thetaphi on 2008-01-22T22:37:35Z:

restructure IndexConfig & Co.

Revision 215 by thetaphi on 2008-01-20T12:28:39Z:

update version checking

Revision 214 by thetaphi on 2008-01-20T01:05:20Z:

- new version comparison
- check digester version

Revision 213 by thetaphi on 2008-01-19T23:04:53Z:

again the enums... ;-)

Revision 212 by thetaphi on 2008-01-19T10:37:39Z:

prevent a NPE

Revision 210 by thetaphi on 2008-01-18T13:36:13Z:

small bug in calculation of next harvesting datestamp in SingleFileEntitiesHarvester

Revision 209 by thetaphi on 2008-01-18T13:30:58Z:

AXIS support for MoreLikeThis

Revision 208 by thetaphi on 2008-01-18T13:10:30Z:

MoreLikeThisQuery priorityqueue refactored

Revision 207 by thetaphi on 2008-01-18T11:03:33Z:

- documentation error
- w/s

Revision 206 by thetaphi on 2008-01-18T00:28:37Z:

- remove usage of EnumSet, as it is simpler and faster another way
- rename entry in SingleFileEntitiesHarvester.ParseErrorAction

Revision 205 by thetaphi on 2008-01-17T23:49:55Z:

Cleanup code, made harvester API clearer

Revision 204 by thetaphi on 2008-01-17T16:04:06Z:

parseErrorAction values are now checked against the underlying enum, and a meaningful error message is printed on harvester startup.

Revision 203 by thetaphi on 2008-01-17T15:49:59Z:

remove dead code part

Revision 202 by thetaphi on 2008-01-17T14:17:48Z:

little update in harvester error handler

Revision 201 by thetaphi on 2008-01-17T14:06:38Z:

- Changed Harvester interface to differentiate between clean and unclean shutdown (simplifies handling of index properties like lastHarvestDate and validIdentifiers)
- New abstract SingleFileEntitiesHarvester as superclass of WebCrawlingHarvester and DirectoryHarvester that manages their common code and the parsing of each file entity, with better error reporting. Both harvesters are now able to ignore corrupt XML files during harvesting.
- Better error handling in Harvester (SAXParseExceptions and TransformerExceptions are logged with location info)

Revision 200 by thetaphi on 2008-01-16T21:36:22Z:


Revision 199 by thetaphi on 2008-01-16T14:10:07Z:

Add support for "More like this" queries to SearchService. The AXIS implementation is still missing this, but will be added later.

Revision 197 by thetaphi on 2008-01-13T23:53:16Z:

use boost in TrieRangeQuery for hashCode(), toString() and equals()

Revision 196 by thetaphi on 2008-01-13T23:34:08Z:

Javadoc problem.

Revision 195 by thetaphi on 2008-01-13T21:47:46Z:

Make TrieRangeQuery final

Revision 194 by thetaphi on 2008-01-13T21:37:33Z:

small StringBuilder optimization

Revision 193 by thetaphi on 2008-01-11T10:42:59Z:

- remove support for threaded virtual indexes (as this causes more problems and is not optimal for heavy sites with many indexes per virtual index)
- refactor code of index configuration
- cache MultiReader in virtual index for sort performance

Revision 191 by thetaphi on 2008-01-11T08:34:49Z:

for non-threaded virtual indexes (recommended in most cases), use an IndexSearcher over a MultiReader instead of a MultiSearcher over separate IndexSearchers

Revision 190 by thetaphi on 2008-01-10T21:45:09Z:

- reopen corrected (close old reader after reopening)
- made closeIndex() abstract

Revision 189 by thetaphi on 2008-01-10T14:33:09Z:

missed an error message rewrite

Revision 188 by thetaphi on 2008-01-10T14:30:31Z:

switch to RC1 of Lucene 2.3, which seems stable. As soon as the final version is released, I will replace the file again. This commit also uses the new IndexReader.reopen() method to reopen indexes more quickly after a parallel harvest.

Revision 185 by thetaphi on 2008-01-02T21:46:07Z:

bump year

Revision 184 by thetaphi on 2008-01-02T21:35:37Z:

- Support for a connect/read timeout in OAI harvesters and WebCrawlingHarvester
- Refactor code (remove OAIDownload class)
- Remove recursion in download retries

Revision 183 by thetaphi on 2007-12-29T10:13:12Z:

better logging of new session (with index)

Revision 182 by thetaphi on 2007-12-27T10:19:46Z:

more effective casts: primitive to objects

Revision 181 by thetaphi on 2007-12-20T19:08:36Z:

enable logging in optimizer writer, take #2

Revision 180 by thetaphi on 2007-12-20T18:44:12Z:

enable logging in optimizer writer.

Revision 179 by thetaphi on 2007-12-20T13:45:58Z:

* Better logging of conversion errors in indexer
* Only store the first error in background threads as failure, later ones are simply logged.

Revision 178 by thetaphi on 2007-12-20T10:57:23Z:

update nekohtml parser.

Revision 177 by thetaphi on 2007-12-18T10:32:39Z:

add URL exclusion filter to WebCrawlingHarvester

Revision 176 by thetaphi on 2007-12-18T09:58:01Z:

- Add support for term vectors (no search support for them yet, but indexes containing them can be built)
- Renaming of FieldConfig variables (remove lucene*)

Revision 175 by thetaphi on 2007-11-29T08:13:21Z:

Optimization in session creation

Revision 174 by thetaphi on 2007-11-28T23:15:51Z:

remove an outdated comment

Revision 173 by thetaphi on 2007-11-28T23:02:17Z:

better locking mechanism in search cache cleanup

Revision 172 by thetaphi on 2007-11-28T10:35:27Z:

fix bug in cache cleanup code by replacing it with Commons' LRUMap

Revision 171 by thetaphi on 2007-11-11T12:08:37Z:

API change for valid harvester properties. They are now added to a Set and returned in public API (unmodifiable). Custom harvesters must be changed to this new API.

Revision 170 by thetaphi on 2007-11-11T10:28:05Z:

small problem in #2

Revision 169 by thetaphi on 2007-11-11T10:23:56Z:

small problem in

Revision 168 by thetaphi on 2007-11-11T10:19:58Z:

Support PrintStream logging of IndexWriter through commons logging system (debug level) to make it possible to track index merges etc.

Revision 167 by thetaphi on 2007-10-27T07:23:09Z:

typos in documentation

Revision 166 by thetaphi on 2007-10-26T14:09:30Z:

list and explain harvester properties in each harvester class documentation

Revision 164 by thetaphi on 2007-10-17T08:21:00Z:

- Renaming methods in LuceneConversions to show data type in method name.
- Remove public method for inserting trie-based values in encoded form

Revision 163 by thetaphi on 2007-10-16T23:44:09Z:

Restructuring methods in LuceneConversions that create trie-based index entries. The methods can now be reused more easily in other projects without knowing much about panFMP internals.

Revision 162 by thetaphi on 2007-10-11T09:29:17Z:

Some cleanups and documentation updates

Revision 161 by thetaphi on 2007-10-10T21:55:15Z:

fix bug in OAIStaticRepositoryHarvester

Revision 160 by thetaphi on 2007-10-10T19:44:47Z:

Support for If-Modified-Since in OAIStaticRepositoryHarvester

Revision 159 by thetaphi on 2007-10-10T18:19:05Z:

new OAIStaticRepositoryHarvester to harvest static repositories.
TODO: enable If-Modified-Since for harvesting the static file.

Revision 158 by thetaphi on 2007-10-07T18:05:48Z:

documentation for

Revision 157 by thetaphi on 2007-09-06T12:28:19Z:

convert spaces to tabs, unix line endings

Revision 156 by thetaphi on 2007-09-06T09:41:39Z:

Add boost support to FieldCheckingQuery

Revision 155 by thetaphi on 2007-09-06T08:19:41Z:

add missing "private"

Revision 154 by thetaphi on 2007-08-22T11:47:28Z:

Make compatible with Digester 1.8

Revision 153 by thetaphi on 2007-08-20T21:59:13Z:

Documentation update.

Revision 152 by thetaphi on 2007-08-19T20:49:24Z:


Revision 151 by thetaphi on 2007-08-19T20:31:50Z:

* Introduction of BooleanParser that accepts true, false, yes, no, on, off as values
* New harvesterProperty to enable/disable compression of XML (default=true)
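A parser that accepts true/false, yes/no and on/off can be sketched as follows. This is a hypothetical illustration; the class name LenientBoolean, the trimming and case-insensitivity, and throwing IllegalArgumentException for other values are assumptions, not the behavior of panFMP's BooleanParser:

```java
import java.util.Locale;

// Hypothetical lenient boolean parser; not panFMP's actual BooleanParser.
final class LenientBoolean {
    private LenientBoolean() {}

    static boolean parse(String value) {
        if (value == null) {
            throw new IllegalArgumentException("Boolean value is null");
        }
        // Locale.ROOT avoids locale-specific lower-casing surprises (e.g. Turkish "I").
        String v = value.trim().toLowerCase(Locale.ROOT);
        if (v.equals("true") || v.equals("yes") || v.equals("on")) {
            return true;
        }
        if (v.equals("false") || v.equals("no") || v.equals("off")) {
            return false;
        }
        throw new IllegalArgumentException("Not a boolean value: " + value);
    }
}
```

Rejecting unknown values, instead of silently mapping them to false as Boolean.parseBoolean() does, surfaces typos in config files early.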

Revision 150 by thetaphi on 2007-08-19T15:59:08Z:

SearchResultItem: Default is to return field as String (this is forward-compatible)

Revision 149 by thetaphi on 2007-08-19T14:34:19Z:

Clean up namespace declarations in templates

Revision 148 by thetaphi on 2007-08-19T13:17:55Z:

Two new datatypes for fields (for stored fields only):
* XML: saves the XML of XPath/Template expression as XML string
* XHTML: stores a <field-template/> as XHTML string in index. Useful to generate XHTML-thumbnails

lucenestorage attribute may now contain COMPRESSED, too. This stores the field result in compressed form in index. Useful e.g. for XHTML fields

Revision 147 by thetaphi on 2007-08-19T09:35:55Z:

Documentation updates.

Revision 146 by thetaphi on 2007-08-19T08:29:58Z:

Documentation: one bold tag too many.

Revision 145 by thetaphi on 2007-08-18T12:00:11Z:

Documentation updates.

Revision 144 by thetaphi on 2007-08-17T16:18:25Z:

Make it possible to hide XML without specifying other fields to load. Now, search always uses a FieldSelector

Revision 143 by thetaphi on 2007-08-17T12:55:38Z:

Some cleanups

Revision 142 by thetaphi on 2007-08-17T09:54:36Z:

Remove unneeded IOExceptions in Query factories. Add MatchAllDocsQuery support. Remove declarations of unchecked NumberFormatExceptions.

Revision 141 by thetaphi on 2007-08-16T12:44:51Z:

Make constructors of return objects in webservice protected

Revision 140 by thetaphi on 2007-08-16T12:35:53Z:

New possibility in API and webservice to store queries in a cache, to be retrieved later by a simple hash string.

Revision 139 by thetaphi on 2007-08-15T17:51:46Z:

fix bug with anyOf using wrong operator in AXIS webservice

Revision 138 by thetaphi on 2007-08-15T15:48:16Z:

Some changes to show Java Collection API list usage for paged results. Implement this in webservice, too.

Revision 137 by thetaphi on 2007-08-15T13:37:01Z:

TODO update

Revision 136 by thetaphi on 2007-08-15T13:33:51Z:

New search API. Please read the JavaDocs for examples how to use it. The old AXIS engine was moved to Please use this API only in web services. The new API supports queries in any boolean combination and uses all standard Lucene classes for query construction.

Revision 134 by thetaphi on 2007-08-08T21:03:03Z:

Fix ISO 8601-compatible (but not Java-compatible) timezone with ":" in

Revision 133 by thetaphi on 2007-08-07T21:34:21Z:

If the index is being created rather than updated, do not try to delete unknown identifiers; this is useless.

Revision 132 by thetaphi on 2007-08-02T22:33:32Z:

WebCrawlingHarvester: Initial redirect to foreign address allowed, better checking of redirect targets, fixed EOF error when HEAD request with Content-Encoding

Revision 131 by thetaphi on 2007-08-02T16:21:11Z:

* Fix NPE in toString() of some config classes
* Fix content-type parsing with not-lowercase charset
* Default log4J logfile with separate logging entry for this package

Revision 130 by thetaphi on 2007-08-02T10:20:42Z:

Update javadoc documentation

Revision 129 by thetaphi on 2007-08-01T17:18:58Z:

some changes and documentation, new property to insert a short pause between HTTP requests.

Revision 128 by thetaphi on 2007-08-01T13:22:01Z:

* outdated Comments removed
* HTML flow changed, some features changed, "Accept:" header changed

Revision 127 by thetaphi on 2007-08-01T10:59:33Z:

Small fix in HTML parsing to enable body inside frameset/noframes

Revision 126 by thetaphi on 2007-08-01T10:51:03Z:

Replace the unreasonable HTML parser in WebCrawlingHarvester by NekoHTML.

Revision 125 by thetaphi on 2007-07-31T22:37:04Z:

Better pattern matching in HTML analyzer of WebCrawlingHarvester

Revision 124 by thetaphi on 2007-07-31T21:19:59Z:

Default XML charset support for WebCrawlingHarvester

Revision 123 by thetaphi on 2007-07-31T20:11:47Z:

* Added support for harvesters that do not get "deleted" documents (e.g. DirectoryHarvester) to delete "unknown" documents. The harvester can create a set of "valid" identifiers on harvesting and submit that list to the IndexBuilder. After indexing all new documents, this set is synchronized with the index and spare documents deleted.
* Added new harvester: WebCrawlingHarvester -- works like WGET and harvests all documents from a directory and its subdirectories on a webserver. It analyzes HTML pages for links and harvests all documents below the initial URL that have the correct MIME type and extension.

Revision 122 by thetaphi on 2007-07-31T12:09:55Z:

Remove fromDateReference/thisHarvestDateReference from individual harvester and make it available from the abstract Harvester. thisHarvestDateReference should be set after a successful harvest only.

Revision 119 by thetaphi on 2007-07-31T07:42:38Z:

print SAXParseException correctly

Revision 118 by thetaphi on 2007-07-31T07:05:28Z:

* OAIHarvester: Exceptions embedded in Digester SAXExceptions are reported with the correct stack trace (redesign of yesterday's patch)
* Config: ditto.

Revision 117 by thetaphi on 2007-07-30T22:42:58Z:

* IndexBuilder: Manage the Exceptions in background threads better and only throw a new special IndexBuilderBackgroundFailure to stop harvesting process. The real error is then printed after closing Harvester.
* IndexBuilder: Use Java 1.5 atomic API
* OAIHarvester: Exceptions embedded in Digester SAXExceptions are reported with the correct stack trace.

Revision 115 by thetaphi on 2007-07-30T17:14:32Z:

Use TreeHashMaps/TreeHashSets in some cases to preserve order of indexes and fields. HarvesterCommitEvent was changed and documented to receive Sets of committed identifiers.

Revision 114 by thetaphi on 2007-07-30T15:59:54Z:

Better error code in IndexBuilder with better messages on shutdown. A second deadlock situation was resolved.

Revision 113 by thetaphi on 2007-07-30T12:34:30Z:

Error in Harvester that always used the same harvester -- hmpf!

Revision 112 by thetaphi on 2007-07-30T12:28:55Z:

* Remove AbstractHarvester and replace by Harvester
* Add javadocs to Harvester
* New Rebuilder
* More small changes.
* Deadlock bug in IndexBuilder solved.

Revision 111 by thetaphi on 2007-07-29T23:19:48Z:

fix bug with javadoc generation in older ANT

Revision 110 by thetaphi on 2007-07-29T22:59:12Z:

Cleanup in Rebuilder.

Revision 109 by thetaphi on 2007-07-29T22:29:33Z:

* Javadocs updates.
* MetadataDocument now with field for config.

Revision 108 by thetaphi on 2007-07-29T20:15:59Z:

MetadataDocument.invalidateXMLCache() no longer needed

Revision 107 by thetaphi on 2007-07-29T19:49:46Z:

make MetadataDocument a correct JavaBean

Revision 103 by thetaphi on 2007-07-29T17:34:21Z:

more simplification in Config (remove unused ExtendedDigester parameters from methods in Config)

Revision 102 by thetaphi on 2007-07-29T17:23:25Z:

simplify inner classes of Config

Revision 101 by thetaphi on 2007-07-29T17:02:17Z:

simplify TrieRangeQuery #2 (faster, because not private)

Revision 100 by thetaphi on 2007-07-29T16:50:51Z:

simplify TrieRangeQuery

Revision 99 by thetaphi on 2007-07-29T16:44:09Z:

Link to website in javadocs

Revision 98 by thetaphi on 2007-07-29T16:04:24Z:

add missing @PublicForDigesterUse

Revision 97 by thetaphi on 2007-07-29T16:01:47Z:

add missing deprecated

Revision 96 by thetaphi on 2007-07-29T15:36:30Z:

New annotation @PublicForDigesterUse which marks methods/classes that are only public for Digester but are not intended to be public.

Revision 95 by thetaphi on 2007-07-29T14:09:33Z:

* more @Override
* rename AnyExpressionConfig

Revision 94 by thetaphi on 2007-07-29T13:58:37Z:

add @Override where applicable to be sure that the overridden method has the correct signature

Revision 93 by thetaphi on 2007-07-29T12:50:09Z:

* SaxRule redesign
* Version printout on Config startup

Revision 92 by thetaphi on 2007-07-28T09:59:42Z:

version information to log on config startup

Revision 91 by thetaphi on 2007-07-24T15:04:29Z:

make example config file with a namespace prefix, this helps with xincluded xsl, because default namespace is still valid!

Revision 90 by thetaphi on 2007-07-24T15:02:10Z:

make example config file with a namespace prefix, this helps with xincluded xsl, because default namespace is still valid!

Revision 89 by thetaphi on 2007-07-24T09:26:59Z:

print list of supported properties on error.

Revision 88 by thetaphi on 2007-07-24T09:21:36Z:

implement checking of harvester property names on config load.

Revision 87 by thetaphi on 2007-07-24T07:54:13Z:

some additional checks for queue sizes and thread count

Revision 86 by thetaphi on 2007-07-23T22:51:06Z:

fix bug in harvesterCommitEvent (PangaVista!!!)

Revision 85 by thetaphi on 2007-07-23T22:37:13Z:

implement missing checkIndexerBuffer() in IndexBuilder

Revision 84 by thetaphi on 2007-07-23T20:49:54Z:

* New IndexBuilder implementation using BlockingQueues
* New harvester properties, see config.xml

Revision 83 by thetaphi on 2007-07-22T20:59:18Z:

Make Config.TemplateSaxRule compatible with XSLTC by adding dummy namespace prefixes to auto-generated stylesheets

Revision 82 by thetaphi on 2007-07-22T18:04:08Z:

* Implement better thread synchronisation between harvester, converter and indexer
* remove <validate> and <autoOptimize> from index configuration. Instead put it in a similar way into <harvesterProperties>. This needs a change in config files! By that it is possible to globally enable autoOptimize or disable validation from globalHarvesterProperties

Revision 81 by thetaphi on 2007-07-22T16:13:31Z:

rename "indices" to its correct English name "indexes"

Revision 80 by thetaphi on 2007-07-22T15:48:12Z:

rename "indices" to its correct English name "indexes"

Revision 79 by thetaphi on 2007-07-22T15:41:46Z:

small fixes

Revision 77 by thetaphi on 2007-07-17T07:03:26Z:

rename function in XPathResolverImpl

Revision 76 by thetaphi on 2007-07-16T06:40:30Z:

supply index config to MetadataDocument.loadFromLucene()

Revision 75 by thetaphi on 2007-07-16T06:36:12Z:

Exception on document loading from index when identifier empty

Revision 74 by thetaphi on 2007-07-15T22:04:23Z:

Some cosmetic changes:
* Only explicitly start indexer thread on IndexBuilder.close()
* logging messages

Revision 73 by thetaphi on 2007-07-15T21:29:14Z:

fix bug of missing document: last converter thread finalizes indexer

Revision 72 by thetaphi on 2007-07-15T20:51:18Z:

Multiple converter threads support (see config.xml). Default=1

Revision 71 by thetaphi on 2007-07-15T19:38:59Z:

separate Locks in IndexBuilder (new Object()), this will enable more than one converter thread.

Revision 70 by thetaphi on 2007-07-15T17:07:29Z:

supply index config to MetadataDocument.createInstanceFromLucene()

Revision 69 by thetaphi on 2007-07-15T16:31:14Z:

* Store class name of MetadataDocument class used to store the document.
* Remove Sets from standard MetadataDocument
* New OAIMetadataDocument with sets
* Rebuilding now generates documents using stored class

Revision 68 by thetaphi on 2007-07-15T14:22:05Z:

javadocs build update

Revision 66 by thetaphi on 2007-07-15T12:57:39Z:

check that all variables are declared *before* fields and filters

Revision 65 by thetaphi on 2007-07-14T15:31:49Z:

cleanup imports

Revision 64 by thetaphi on 2007-07-14T15:15:15Z:

Restructuring of configuration without inner classes #1

Revision 63 by thetaphi on 2007-07-11T09:29:02Z:

add identifierPrefix as harvesterProperty to This enables setting a prefix, that is inserted after "file:" and before the relative file name.

Revision 62 by thetaphi on 2007-07-10T16:46:50Z:

small update in startThreads()

Revision 61 by thetaphi on 2007-07-10T11:57:34Z:

Fix failure throwing and exit conditions.

Revision 60 by thetaphi on 2007-07-09T22:21:38Z:

* Small timestamp bug in
* Start with JavaDoc comments

Revision 59 by thetaphi on 2007-07-09T18:47:11Z:

Move Lucene version check into Package class.

Revision 58 by thetaphi on 2007-07-09T17:17:13Z:

Put a Lucene version check into Config class.

Revision 57 by thetaphi on 2007-07-08T21:57:57Z:

todo update

Revision 56 by thetaphi on 2007-07-08T21:45:21Z:

Logging during harvesting centralized in

Revision 55 by thetaphi on 2007-07-08T20:01:38Z:

Disable memory checking completely; make global harvester properties for IndexBuilder buffers.

Revision 54 by thetaphi on 2007-07-08T19:31:24Z:

Disable memory checking pending further investigation

Revision 53 by thetaphi on 2007-07-08T18:14:03Z:

Enable memory checking for index builder to auto-decrement the buffers on low memory. Added global harvester properties.

Revision 52 by thetaphi on 2007-07-08T17:37:55Z:

Make Rebuilder a subclass of AbstractHarvester.

Revision 51 by thetaphi on 2007-07-08T14:22:46Z:

IndexBuilder was extended by an additional thread for converting MetadataDocuments to Lucene Documents. Harvesting now runs with three threads:
* Harvesting (primary)
* Converting Documents
* Indexing
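Revision 84 later reimplemented IndexBuilder on top of BlockingQueues. The three-thread layout above can be pictured as stages connected by bounded hand-off queues; the following is a minimal Java sketch under assumptions, not the project's code. The class PipelineSketch, the queue capacity of 16 and the end-of-stream marker are hypothetical, and plain strings stand in for MetadataDocument and Lucene Document objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Hypothetical sketch (not panFMP's IndexBuilder): a three-stage
 * harvest -> convert -> index pipeline connected by bounded BlockingQueues.
 * Plain Strings stand in for MetadataDocument and Lucene Document objects.
 */
final class PipelineSketch {
    // End-of-stream marker, compared by identity (!=), so a harvested
    // document whose text happens to be "EOF" is not mistaken for it.
    private static final String EOF = new String("EOF");

    static List<String> run(List<String> harvested) {
        BlockingQueue<String> toConvert = new ArrayBlockingQueue<>(16);
        BlockingQueue<String> toIndex = new ArrayBlockingQueue<>(16);
        List<String> index = new ArrayList<>(); // stand-in for the Lucene index

        // Converter thread: turns harvested records into indexable documents.
        Thread converter = new Thread(() -> {
            try {
                for (String doc; (doc = toConvert.take()) != EOF; ) {
                    toIndex.put("converted:" + doc);
                }
                toIndex.put(EOF); // pass end-of-stream on to the indexer
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "converter");

        // Indexer thread: the only thread that writes to the index.
        Thread indexer = new Thread(() -> {
            try {
                for (String doc; (doc = toIndex.take()) != EOF; ) {
                    index.add(doc);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "indexer");

        converter.start();
        indexer.start();
        try {
            // Harvesting runs on the calling (primary) thread; put() blocks
            // while the converter is behind, which throttles the harvester.
            for (String doc : harvested) {
                toConvert.put(doc);
            }
            toConvert.put(EOF);
            converter.join();
            indexer.join(); // join() also makes the indexer's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while harvesting", e);
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("rec1", "rec2", "rec3")));
    }
}
```

Bounded queues make put() block when a downstream stage falls behind, so a fast harvester cannot fill memory with unconverted documents.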

Revision 50 by thetaphi on 2007-07-07T13:24:35Z:

make SaxFilter private

Revision 49 by thetaphi on 2007-07-06T17:10:47Z:

implement document boosting by NUMBER-returning XPath expressions

Revision 48 by thetaphi on 2007-07-06T14:39:48Z:

Fix filters in SEARCH mode

Revision 47 by thetaphi on 2007-07-06T14:28:31Z:

Support for XSL Templates in variables and fields <variable-template>, <field-template>. The result is treated as an XPath NodeSet and indexed.

Revision 46 by thetaphi on 2007-07-05T07:44:00Z:

listTerms with prefix, final for equals() and hashCode(), better hashCodes

Revision 45 by thetaphi on 2007-07-03T22:30:04Z:

warning message about close failure with exception info

Revision 44 by thetaphi on 2007-07-03T22:19:31Z:

Constants and Enums uppercase (change Config.DataType)

Revision 43 by thetaphi on 2007-07-03T13:57:15Z:

remove unneeded synchronization

Revision 42 by thetaphi on 2007-07-03T09:13:49Z:

cleanups, new separate QNameParser

Revision 41 by thetaphi on 2007-07-03T06:51:51Z:

Give XPathResolver a clearer structure...

Revision 40 by thetaphi on 2007-07-02T20:45:28Z:

- re-implement XPath resolvers
- add xpath function to check other indices for duplicates
- change behavior of IndexConfig to allow opening uncached IndexReaders that can be closed
- add finally blocks to correctly close Lucene resources

Revision 39 by thetaphi on 2007-07-02T06:58:33Z:

config.xml did not do what it should

Revision 38 by thetaphi on 2007-07-02T06:38:14Z:

cleanup on Exception during variable processing

Revision 37 by thetaphi on 2007-07-02T00:26:19Z:

remove inner class from Rebuilder and implement reconstructing of MetadataDocument from Lucene

Revision 36 by thetaphi on 2007-07-01T22:24:06Z:

REVERT: Context node of XPath is the document element, not the DOM Document itself!

Revision 35 by thetaphi on 2007-07-01T20:39:26Z:

debugging functionality

Revision 34 by thetaphi on 2007-07-01T16:44:35Z:

Context node of XPath is the document element, not the DOM Document itself!

Revision 33 by thetaphi on 2007-07-01T16:14:59Z:

Implement filter mechanism. Docs can be filtered during harvesting by specifying one or more XPath expressions that allow or deny them.
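A minimal sketch of allow/deny filtering with the standard javax.xml.xpath API. The rule semantics are an assumption for illustration: rules are checked in declaration order, the first boolean XPath that evaluates to true decides, and a configurable default applies otherwise. XPathFilterSketch, Action and the sample expressions are hypothetical names, not the project's configuration syntax.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/**
 * Hypothetical sketch: ordered allow/deny rules, each a boolean XPath.
 * The first rule that evaluates to true decides; otherwise the default applies.
 */
final class XPathFilterSketch {
    enum Action { ACCEPT, DENY }

    private final XPath xpath = XPathFactory.newInstance().newXPath();
    // Insertion order is preserved, so rules are checked as declared.
    private final Map<XPathExpression, Action> rules = new LinkedHashMap<>();
    private final Action defaultAction;

    XPathFilterSketch(Action defaultAction) {
        this.defaultAction = defaultAction;
    }

    XPathFilterSketch add(String expr, Action action) {
        try {
            rules.put(xpath.compile(expr), action);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expr, e);
        }
        return this;
    }

    boolean accept(Document doc) {
        for (Map.Entry<XPathExpression, Action> rule : rules.entrySet()) {
            try {
                Boolean matches = (Boolean) rule.getKey().evaluate(doc, XPathConstants.BOOLEAN);
                if (matches) {
                    return rule.getValue() == Action.ACCEPT;
                }
            } catch (XPathExpressionException e) {
                throw new IllegalStateException(e);
            }
        }
        return defaultAction == Action.ACCEPT;
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```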

Revision 32 by thetaphi on 2007-07-01T13:06:42Z:

change metadata structure for variables to prepare document filtering

Revision 31 by thetaphi on 2007-07-01T10:05:15Z:

TermCheckerSet with correct generics

Revision 30 by thetaphi on 2007-06-30T08:37:15Z:

better analyzer from classname generator

Revision 29 by thetaphi on 2007-06-30T08:12:18Z:

enable check for deprecation and unchecked

Revision 28 by thetaphi on 2007-06-30T08:11:24Z:

remove deprecation and unchecked warnings

Revision 26 by thetaphi on 2007-06-29T09:47:51Z:

New Feature: XPath variables in <fields>: you can define XPath variables like in XSLT before the field definitions and use these variables in other XPath expressions
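The standard javax.xml.xpath API supports this pattern through XPathVariableResolver. A minimal sketch, assuming variables are evaluated once per document in declaration order and stored as strings (the real implementation may keep node-sets); XPathVariablesSketch and its methods are hypothetical.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/**
 * Hypothetical sketch: variables are evaluated first, in declaration order,
 * and later expressions (further variables or field XPaths) reference them as $name.
 */
final class XPathVariablesSketch {
    private final Map<QName, Object> variables = new HashMap<>();
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    XPathVariablesSketch() {
        // Resolve $name from the values evaluated so far; an unknown name
        // resolves to null, which the XPath engine treats as an error.
        xpath.setXPathVariableResolver(variables::get);
    }

    /** Evaluates expr against doc and stores the string result as $name. */
    void defineVariable(String name, String expr, Document doc) {
        variables.put(new QName(name), evaluate(expr, doc));
    }

    /** Evaluates a field XPath, which may reference defined variables. */
    String evaluate(String expr, Document doc) {
        try {
            return xpath.evaluate(expr, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate: " + expr, e);
        }
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

For a record <rec><id>42</id></rec>, defining id as /rec/id lets a field use concat('doc-', $id), which yields doc-42.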

Revision 25 by thetaphi on 2007-06-26T19:37:10Z:

fix issue with orphaned files on optimize after harvesting/rebuilding

Revision 23 by thetaphi on 2007-06-26T19:13:16Z:

fix build.xml update

Revision 22 by thetaphi on 2007-06-26T19:03:39Z:

fix build.xml update

Revision 21 by thetaphi on 2007-06-26T19:01:39Z:

make scripts & config working
enable building of source package

Revision 20 by thetaphi on 2007-06-26T17:07:46Z:

fix build.xml update

Revision 19 by thetaphi on 2007-06-26T17:06:42Z:

build.xml update

Revision 18 by thetaphi on 2007-06-26T14:28:49Z:

build.xml update

Revision 16 by thetaphi on 2007-06-26T13:21:41Z:

build system with version numbers and manifest

Revision 14 by thetaphi on 2007-06-26T10:12:12Z:

javadoc fix

Revision 13 by thetaphi on 2007-06-25T12:27:08Z:

IndexBuilder error handling on close

Revision 12 by thetaphi on 2007-06-22T22:44:41Z:

Rebuilder opens the index before creating the IndexBuilder. This helps to support a complete rebuild (create=true).

Revision 11 by thetaphi on 2007-06-22T22:22:43Z:

IndexBuilder update #2

Revision 10 by thetaphi on 2007-06-22T22:12:33Z:

IndexBuilder rewritten to not use IndexReader to delete/update docs. This can be done with IndexWriter directly.

Revision 9 by thetaphi on 2007-06-22T19:16:23Z:

Fix IndexBuilder close, activate Lucene 2.2 setAllowDocsOutOfOrder in BooleanQuery

Revision 6 by thetaphi on 2007-06-22T17:13:14Z:

documentation things

Revision 4 by thetaphi on 2007-06-22T16:45:14Z:

Rename AdvRangeQuery & others to TrieRangeQuery (2)

Revision 3 by thetaphi on 2007-06-22T16:24:04Z:

Rename AdvRangeQuery & others to TrieRangeQuery

Revision 2 by thetaphi on 2007-06-21T18:12:21Z:

version number

Revision 1 by thetaphi on 2007-06-21T17:15:50Z:

initial import