commit 091115564e1b5c9716d322db62f2274e454a5cbe Author: Uwe Schindler Date: Fri Jul 21 19:43:31 2023 +0200 Update XALAN-J (faster XSLTC, bugfix); unfortunately serializer is still broken, so keep existing patched serializer commit a49c97680833fd56277153c9ba4df5f8320f84de Author: Uwe Schindler Date: Mon Mar 27 16:34:48 2023 +0200 Upgrade forbiddenapis to 3.5 commit 51713766cef221c1b5ec25aadfb09147e7d67161 Author: Uwe Schindler Date: Tue Feb 28 00:02:20 2023 +0100 Update forbiddenapis to 3.4 commit 46057a4d0e302f19e8dd14f3444ec50bc615eb48 Author: Uwe Schindler Date: Tue Aug 2 12:01:11 2022 +0200 Better handling of HttpClient InterruptedException commit 1b021c9d82b7f1c57dc728b8dd162d71410e4974 Author: Uwe Schindler Date: Sun Jul 31 12:04:22 2022 +0200 Add workaround for: https://stackoverflow.com/questions/55087292/how-to-handle-http-2-goaway-with-httpclient commit 0e8ad77ef91b2dfca4169ed9099b9a6174ae1ae2 Author: Uwe Schindler Date: Sat Jul 30 00:02:13 2022 +0200 Clean up socket leak; suppress close exception commit a58b94b9bdd12e74d8d9a6acdc2998fa8580f684 Author: Uwe Schindler Date: Fri Jul 29 16:55:53 2022 +0200 Fix bug with initial lookup of base URL commit 6b015e37e311a3614ea4861e310fb2a7706b5ed2 Author: Uwe Schindler Date: Fri Jul 29 14:14:24 2022 +0200 Add missing documentation commit ceeeb283d604007f2e25f9eb1e09e2e5260864db Author: Uwe Schindler Date: Fri Jul 29 13:47:33 2022 +0200 Use new Java 11 HttpClient for WebCrawlingHarvester and OAI (this should improve speed and redirects) commit c9db875fdaac6882cd9dfa6c1393b9806621fe05 Author: Uwe Schindler Date: Thu Jul 28 17:00:39 2022 +0200 Use Runtime.version() to get Java version commit 18a23fe3c90fa90d6edac2edea516e4e96379e8a Author: Uwe Schindler Date: Thu Mar 31 22:43:53 2022 +0200 Fix NPE commit 30cd2dfa276b2363d72230adab46a2e844f009c3 Author: Uwe Schindler Date: Thu Mar 31 22:26:54 2022 +0200 Use new Java 11 static collections commit b61a360f237ec26bd2e88abf1ceaeb15f55004e9 Author: Uwe Schindler Date: Thu 
Mar 31 16:04:05 2022 +0200 Update minimum requirements to Java 11 commit 658c331a77d5d7f36a8750383423e41f47998c53 Author: Uwe Schindler Date: Thu Mar 31 12:49:16 2022 +0200 Add runtime dependency for JAXB commit 277ff3e2797eeeada5b1fd4adef05802bca8efb0 Author: Uwe Schindler Date: Thu Mar 31 12:26:54 2022 +0200 Use jakarta.xml.bind instead of javax (for Java 11 compatibility) commit 1743c05732b02003d78d1de05465842e5e9dc4a6 Author: Uwe Schindler Date: Sat Mar 5 12:21:09 2022 +0100 Update log4j to 2.17.2 commit 90f312e12d1c8ba47f5d7a457a13935981cda10c Author: Uwe Schindler Date: Wed Dec 29 18:55:06 2021 +0100 Update some libraries commit 8ad9f91a02057b24be6df8b51087fb5d0670f8aa Author: Uwe Schindler Date: Sat Dec 18 12:44:16 2021 +0100 Update to log4j 2.17.0 commit 7e02adec57ac47cf64234f3324a60b027c454bdf Author: Uwe Schindler Date: Tue Dec 14 09:37:16 2021 +0100 Update to Log4J 2.16.0 commit f63b291b345fa509ae2694fdbffe2da29c670f94 Author: Uwe Schindler Date: Fri Dec 10 13:26:49 2021 +0100 Update Log4J to prevent RCE when logging untrusted text commit 2546c3965a8b4bce9477d511900f5d3de1307fb9 Author: Uwe Schindler Date: Mon May 3 12:49:36 2021 +0200 Add missing documentation commit 57f2e8c5ea3e138b93d8ffb5fa37d0e64951f895 Author: Uwe Schindler Date: Mon May 3 12:43:48 2021 +0200 Add support for "Authorization" header in OAI-PMH -- against standard! commit b3aebe732dfe4decd908774533bece6b2502ceaa Author: Uwe Schindler Date: Wed Dec 2 18:32:32 2020 +0100 Improve builder API and make the stream impl ready for parallel execution (bugfix) commit cf7ecfb7d21f30217c861e40f28e1877da5f8106 Author: Uwe Schindler Date: Wed Dec 2 14:26:09 2020 +0100 Improve valid identifiers tracking by using a new hash builder that uses Lucene's BytesRefHash, which is optimized for huge string hashes (UTF-8 instead of UTF-16 encoding, byte pools,...) 
commit 190585fd44f1b1a2570359e3f817b4cab7c31087 Author: Uwe Schindler Date: Wed Dec 2 00:56:42 2020 +0100 small fix commit bd6fa86ad43e7f83827fb8db9247645667af2027 Author: Uwe Schindler Date: Wed Dec 2 00:52:24 2020 +0100 For huge validIdentifiers sequentially delete (to not allocate the huge terms query) in addition to the Set commit 8015f74e455da0d6bc725853b82c96e0cd859752 Author: Uwe Schindler Date: Wed Nov 4 00:04:18 2020 +0100 Improve shell scripts commit c9861a89cc94aaa0403010072f57c3074c68e637 Author: Uwe Schindler Date: Tue Nov 3 23:52:20 2020 +0100 Improve error handling commit 448ee9e34ae8c7281bbc81eaf314a471dd0d9f83 Author: Uwe Schindler Date: Tue Nov 3 11:14:35 2020 +0100 Remove synchronization on bulkProcessor field (does block) commit 6855755b4cafa9596eba7eed874d66854e989676 Author: Uwe Schindler Date: Tue Nov 3 10:34:19 2020 +0100 Fix threading issue commit 8f84446fe10a6bdd13780a3688c97d1599e49128 Author: Uwe Schindler Date: Tue Nov 3 02:10:53 2020 +0100 Add support for reindex plugin commit cf523643afb1f7500c67838028e3b95f200de0f4 Author: Uwe Schindler Date: Mon Nov 2 15:30:16 2020 +0100 Make startup messages conform with Elasticsearch commit 14c7111c506d8811a866836c7b1398668750dac0 Author: Uwe Schindler Date: Mon Nov 2 15:23:52 2020 +0100 Add Javadocs commit c768540dd8576a561dabf1bf1f004512b9d11601 Author: Uwe Schindler Date: Mon Nov 2 15:19:02 2020 +0100 Cleanup server code: - add path prefix (server.rootPath) - make host/port configurable - add shutdown listener to shutdown server graceful and commit changes to Elasticsearch commit 0f50a6d69bdcbed7ac0d29b8249c8198aedb3b78 Author: Uwe Schindler Date: Sat Oct 31 00:02:31 2020 +0100 Add license header commit 98adcc09d581da787ab90ad37851dc868e3bbb06 Author: Uwe Schindler Date: Sat Oct 31 00:00:28 2020 +0100 Add support for async pushes to index using REST API (#1) First version of new push client: Add Undertow server into separate lib (not bundled yet) and only use for compile. 
Also cleanup Elasticsearch dependencies, so no needless plugins are initialized commit 135fbc49599c3717fe0bb683c518bb685bc2473d Author: Uwe Schindler Date: Fri Oct 30 01:39:22 2020 +0100 Add missing license header commit 2c41c7cb511354d42d58eef403b9404f5e8e2bfe Author: Uwe Schindler Date: Thu Oct 29 18:43:46 2020 +0100 Update Log4J2 (security) commit aa58cb20e6d2f190e9789bba0ec2f121ccf91b50 Author: Uwe Schindler Date: Thu Oct 29 17:54:59 2020 +0100 Update libraries commit 0f30ed6c3fdc9a905d86525c9a3e87d94767facd Author: Uwe Schindler Date: Thu Oct 29 16:28:51 2020 +0100 Add NoOpHarvester in preparation to allowing a webservice to push documents manually commit b4127ed43e83eb1a03a819ff7080df2cb9c883c4 Author: Uwe Schindler Date: Wed Aug 26 22:30:53 2020 +0200 Add a hack to add "complex attributes" (needed for JSON-LD): Elements with name "__AT_xxx" are converted into a JSON-LD attribute named @xxx commit 438e810217f3966ed292511f0fe714cf6447c431 Author: Uwe Schindler Date: Mon Jun 22 10:53:09 2020 +0200 Update forbiddenapis commit 9795351c8583515307002d09af67725995a5bb5e Author: Uwe Schindler Date: Mon Jun 22 10:52:04 2020 +0200 Ignore datestamps on delete requests commit 74dfd935435f5e06beb5b348b3c8bae81917c67b Author: Uwe Schindler Date: Mon Jun 22 10:51:41 2020 +0200 Allow to recover on network error (broken digester instances) commit d3e3bc76183f9b43dc5ee927294540aaeeae9875 Author: Uwe Schindler Date: Thu Feb 6 10:00:05 2020 +0100 Fix download of Ivy commit 2bf3e88eca49c3ed0093fcfabafaa78c19c2ea10 Author: Uwe Schindler Date: Sun Oct 13 11:54:17 2019 +0200 Update forbiddenapis to v2.7 commit 98f74465e7eb2b28e48178bc833240a1e9028eb7 Author: Uwe Schindler Date: Thu Jul 25 17:53:36 2019 +0200 Hack: Return boolean if anything was done during harvesting/rebuilding commit 5cb863a6321d187c157da63db3bdcacbc7ee859c Author: Uwe Schindler Date: Wed May 1 23:55:58 2019 +0200 Move former releases to GitHub commit f0c8cbb4620ee6e1e14cd61dd5becd3e476da2d2 Author: Uwe Schindler 
Date: Wed May 1 20:57:11 2019 +0200 Move project to GitHub commit 8a7300adf0f192fcb386e916c089d7d196458ea9 Author: Uwe Schindler Date: Tue Mar 19 17:03:44 2019 +0000 Update Elasticsearch to 5.6.16 commit f66648ee6f58a5feee997a676a0f80220280a1af Author: Uwe Schindler Date: Thu Feb 21 08:08:51 2019 +0000 Update Elasticsearch to 5.6.15 commit a10c503622edb3a8fbb4b99783836700991c457e Author: Uwe Schindler Date: Wed Dec 19 11:25:08 2018 +0000 Update Elasticsearch to 5.6.14 commit aa8cab1a632545b1e2f32f9a0638232c8588e415 Author: Uwe Schindler Date: Sun Nov 11 00:25:17 2018 +0000 Update Elasticsearch to 5.6.13 commit e70f10196b2a95c52cdf07b85d01c04a96f36afa Author: Uwe Schindler Date: Sun Sep 23 18:09:38 2018 +0000 Update Elasticsearch to 5.6.12 commit 207053dd807b0bbcedb766a1d7ab542319dbb718 Author: Uwe Schindler Date: Mon Sep 17 11:50:53 2018 +0000 Update forbiddenapis to 2.6 commit f3f742faa91f82e7f8844e4189046ed846e50711 Author: Uwe Schindler Date: Mon Aug 27 11:35:38 2018 +0000 Update Elasticsearch to 5.6.11 commit 9a22fd9e154a14ba4483e65f8121e75bd158f929 Author: Uwe Schindler Date: Thu Apr 19 09:20:12 2018 +0000 Update Elasticsearch commit 1d9a8f3ae252a76270f40c95ee4bb1f848261941 Author: Uwe Schindler Date: Mon Apr 16 13:32:44 2018 +0000 Final version of the XALANJ-2419 fix commit 9c6db848ea019fbb994497aa3625458142505032 Author: Uwe Schindler Date: Sun Apr 15 23:20:26 2018 +0000 More fixes for serializer commit ae463ce955a9cd0737161e60ee02cf8d7c7f82e8 Author: Uwe Schindler Date: Sun Apr 15 21:04:01 2018 +0000 Add a patched version of serializer.jar so it gets compatible with supplementary characters. Unfortunately XALANJ had no release including XALANJ-2419 bugfix! 
commit 32a27a7088dd34aa3539f043b58e4bff88379283 Author: Uwe Schindler Date: Thu Mar 1 09:56:33 2018 +0000 Update Elasticsearch to 5.6.8 commit a879bd2de3cfcc1ad62bf75b72b19259185639e4 Author: Uwe Schindler Date: Thu Feb 8 17:16:11 2018 +0000 Update to Elasticsearch 5.6.7 commit 5b4134b84fcfa8b53452af7bd2d8d9bca8054803 Author: Uwe Schindler Date: Mon Jan 22 15:08:53 2018 +0000 Update Elasticsearch to 5.6.6 commit 480fa1e9ce91e4dced0bd8059c2f00eaa18ea952 Author: Uwe Schindler Date: Wed Dec 13 11:32:25 2017 +0000 Small simplification commit 817c33b09258ed425e29bbd231e9747b3f36b6d3 Author: Uwe Schindler Date: Fri Dec 8 17:58:53 2017 +0000 Update Elasticsearch to 5.6.5 commit c2935709445e0e8872a2d9b2fc5f065846c4776e Author: Uwe Schindler Date: Thu Nov 9 13:09:01 2017 +0000 Update Elasticsearch commit 63802ba6efe709c14ddeb975c3e2f5b1c4f79ded Author: Uwe Schindler Date: Thu Oct 5 09:43:00 2017 +0000 Update Elasticsearch to 5.6.2 commit d095cdf099fcd68420001af31594f5534df50eba Author: Uwe Schindler Date: Tue Sep 19 11:28:58 2017 +0000 Update Elasticsearch to 5.6.1 commit fbc455035e0528fa6a9b5381d70375f3511fb979 Author: Uwe Schindler Date: Fri Sep 15 17:23:16 2017 +0000 Update Elasticsearch to 5.6.0 commit 2d676c90b9c012cd51b428d9698c7e8693b023e8 Author: Uwe Schindler Date: Fri Aug 18 13:47:12 2017 +0000 Update Elasticsearch to 5.5.2 commit 4b8ef94946b5f00e0dced0b2ac7d55cbbd9e44ad Author: Uwe Schindler Date: Tue Aug 8 07:35:12 2017 +0000 Use G1GC by default commit c204b044dd88692d2894f1e348e6c784a95eb89a Author: Uwe Schindler Date: Tue Jul 25 16:49:38 2017 +0000 Update to Elasticsearch 5.5.1 commit 57fb4d2adf63160bdcb348767397b379dd368c75 Author: Uwe Schindler Date: Fri Jul 7 22:28:00 2017 +0000 Update Elasticsearch to 5.5.0 commit c668aebf3a6286782c2f1aebfb96b909ef874b8c Author: Uwe Schindler Date: Thu Jun 29 16:59:08 2017 +0000 Restore attribute to json conversion by default (after xmlns fix was applied) commit 2e5713cd86c9aafccf05199e22d458809c7489b2 Author: Uwe 
Schindler Date: Thu Jun 29 15:55:40 2017 +0000 Ignore more namespaces during JSON attribute marshalling commit 75ef2d3e8ad9b7df7809fc21dda646832324abec Author: Uwe Schindler Date: Thu Jun 29 15:24:42 2017 +0000 Fix attribute handling in XML to JSON converter, disable attribute handling in JSON type by default commit dfbffc75f49e4d9620af5cb4f1de31b32a427bf0 Author: Uwe Schindler Date: Wed Jun 28 09:59:13 2017 +0000 Update Elasticsearch to 5.4.3 commit 39cb256bec2b40266c0c95401b8718eb04f25267 Author: Uwe Schindler Date: Wed Jun 21 17:02:53 2017 +0000 Update to Elasticsearch 5.4.2 commit 08137d8f87d75dfaf4639b971605e96eb2231a7b Author: Uwe Schindler Date: Mon Jun 19 17:11:26 2017 +0000 Simplify date parsing commit 97768c97276aa42a1ef81297a0f3f3213f5c8f74 Author: Uwe Schindler Date: Sun Jun 18 12:22:28 2017 +0000 Change ISO year representation to be compatible to SimpleDateFormat commit 05174c0e220affce3fce639835161327c3f9add2 Author: Uwe Schindler Date: Sat Jun 17 10:21:47 2017 +0000 fix NPE in OAI harvester commit 885d9f24581a46e4b3b451022f64bc2226a9012d Author: Uwe Schindler Date: Sat Jun 17 00:52:41 2017 +0000 Fix Javadocs commit 5edaa9f1193182527eba371d11d03b43c429db62 Author: Uwe Schindler Date: Sat Jun 17 00:51:28 2017 +0000 Update panFMP to use Java 8's java.time classes (mainly instants for datestamps). 
commit 62ef21b10b0e7489d0a43a06e7cd05b322a8e14f Author: Uwe Schindler Date: Sun Jun 4 12:07:33 2017 +0000 Update to Elasticsearch 5.4.1 commit e1a6a1bdb4644accae456ec35e7cdc5e601141a6 Author: Uwe Schindler Date: Thu May 4 19:47:35 2017 +0000 Cleanup Exception handling commit 00fda0dd2ac9863f0252f0b2b0613dfef0fee174 Author: Uwe Schindler Date: Thu May 4 16:50:32 2017 +0000 Update Ivy version commit 8bb8041a1303dce60992cefbeb560703a106be05 Author: Uwe Schindler Date: Thu May 4 16:43:50 2017 +0000 Use a lambda here commit e97cbb53ceed9ec336cce31bca521e02e3dea313 Author: Uwe Schindler Date: Fri Apr 28 18:22:13 2017 +0000 Update to Elasticsearch 5.3.2 commit 7f23fc2339205750e3703541f8c8ba86608b4559 Author: Uwe Schindler Date: Sat Apr 22 12:34:57 2017 +0000 Update to Elasticsearch 5.3.1 commit 49c86913cb6092708d690aa4ae07d95325a99e1a Author: Uwe Schindler Date: Fri Mar 31 14:27:58 2017 +0000 Update to Elasticsearch 5.3.0 commit 3e5b5b20740f289fbbab5291749163091a28b0b6 Author: Uwe Schindler Date: Tue Mar 28 08:00:36 2017 +0000 Add support for boolean datatype commit fa89b3dc244dd3d592c318e717ecf46cffd88b0e Author: Uwe Schindler Date: Thu Mar 2 16:41:50 2017 +0000 Update Elasticsearch to 5.2.2 commit 1b1cadc978600961a6766ac64b651416b6b34b27 Author: Uwe Schindler Date: Wed Feb 15 11:17:23 2017 +0000 Update Elasticsearch to 5.2.1 commit 5514e3e8cbc8378ed54d01fe1ff9deeefabee979 Author: Uwe Schindler Date: Wed Feb 15 10:26:51 2017 +0000 Update forbiddenapis commit 414a05730e8ad64837be9beb1ea12869a5c2de07 Author: Uwe Schindler Date: Fri Feb 10 13:09:43 2017 +0000 Remove useless check commit fe87352a1e849691f2ee2ba5cf8620596708e4dd Author: Uwe Schindler Date: Fri Feb 10 09:41:29 2017 +0000 Small API change using helper method commit 6f25eed5cc0d3234032c956cc10128cedb7f307f Author: Uwe Schindler Date: Wed Feb 8 22:09:24 2017 +0000 Disable debug logging commit a0319060005174b965e5b56b2c26d377c3ebf2cd Author: Uwe Schindler Date: Wed Feb 8 18:56:28 2017 +0000 Remove obsolete 
harvester property commit 586c34c228b2bdd6af26756c62373fdbae61b30d Author: Uwe Schindler Date: Wed Feb 8 18:11:45 2017 +0000 Update to Elasticsearch 5.2.0: - Java 8 - Remove custom delete by query for identifiers - Update to Log4j v2 - Update configs for new data types - Use sourceAsMap and filtering instead of stored fields commit 69ed279e9837f1c0d4c5772f76301ec727cf1e23 Author: Uwe Schindler Date: Sat Jan 14 23:17:33 2017 +0000 Update Elasticsearch to 2.4.4 commit e06ddeeee7332385ac29f747d9eccce1b84e3e2e Author: Uwe Schindler Date: Fri Dec 16 18:50:05 2016 +0000 Update ES to 2.4.3 commit d15f317e033a8f48a5b44701372204c9285aee4e Author: Uwe Schindler Date: Sat Nov 26 23:26:56 2016 +0000 Update ES to 2.4.2 commit 24eb69e330afe283bf26d2cdb2bd50b106b8420e Author: Uwe Schindler Date: Thu Sep 29 17:22:21 2016 +0000 Update Elasticsearch to 2.4.1 commit 2225a54bb1db0d0a14bff4a9784c11b243f98557 Author: Uwe Schindler Date: Thu Sep 1 17:15:56 2016 +0000 Update Elasticsearch to 2.4.0 commit 2cd5c054a7d5c49ea7809c456776f1ccece91893 Author: Uwe Schindler Date: Thu Aug 4 21:50:57 2016 +0000 Update to ES 2.3.5 commit d6949db1b864996422bd92a09b8e3708709763e7 Author: Uwe Schindler Date: Sun Jul 10 11:45:09 2016 +0000 Update Elasticsearch & forbiddenapis commit 12489de3ff6570cdb29d74053174c90a72d3f4a9 Author: Uwe Schindler Date: Wed Jun 29 18:26:58 2016 +0000 Add hook for rebuilding commit 3b6de08b6b274908a8a584d5795f67b9824a4e49 Author: Uwe Schindler Date: Wed Jun 29 17:33:57 2016 +0000 Fix bug when adding multiple kv-pairs commit 571fc66e8df3a9db7b8af55735fe7a383f8803a9 Author: Uwe Schindler Date: Wed May 18 23:03:58 2016 +0000 Elasticsearch version 2.3.3 commit bb82046b7e2b0705d87a530e0bc097dafc9d6abe Author: Uwe Schindler Date: Tue Apr 26 14:05:23 2016 +0000 Update ES to 2.3.2 commit a42b73f23d3856eb5a5465b00313cf138c61ddda Author: Uwe Schindler Date: Tue Apr 5 07:43:28 2016 +0000 Update ES to 2.3.1 commit beb2bf4f46de9bf91eaacd72874ae945d0eae6f9 Author: Uwe Schindler Date: Wed 
Mar 30 21:42:44 2016 +0000 Update ES to 2.3.0 commit b573be4630720817279d10fc123d2551ad92db12 Author: Uwe Schindler Date: Tue Mar 15 23:21:18 2016 +0000 Update ES to 2.2.1 commit 79e6fed3a71515e059390c750951d389bb696f5a Author: Uwe Schindler Date: Wed Feb 3 10:12:44 2016 +0000 Update Elasticsearch to 2.2.0 commit f28fcd1fe9fa88a824d0ef060d91153726900673 Author: Uwe Schindler Date: Tue Jan 12 10:01:40 2016 +0000 Fix missing property check commit a5d083f8ed5a42ca4deaba0febbe09848d5dfea0 Author: Uwe Schindler Date: Sun Dec 20 23:54:25 2015 +0000 Fix typo commit 0f0aefbcb0d96a2cf659989e995c1c64f3592f0d Author: Uwe Schindler Date: Sun Dec 20 23:47:27 2015 +0000 Refactoring, hide private set commit a7a7f9d688d81cec042b0f742db34b5be3a8b875 Author: Uwe Schindler Date: Sun Dec 20 23:34:13 2015 +0000 On full OAI harvesting by default track valid identifiers, too commit e1b0be6dbd7b68f706f1960a19ad6e01c21cad84 Author: Uwe Schindler Date: Sun Dec 20 22:16:04 2015 +0000 Log and fail on bulk execution errors commit 74c8295047c388342fc76cefd40c1d78b233c629 Author: Uwe Schindler Date: Sun Dec 20 19:10:49 2015 +0000 Improve logging and make shutdown of DocumentProcessor more safe commit 7adb3d4c9e59b534e8f8f7c9757f2fa10213689a Author: Uwe Schindler Date: Sun Dec 20 18:37:23 2015 +0000 Minor cleanups commit a04a6e58a61ca4ae7a93748255cbfaa721eeed79 Author: Uwe Schindler Date: Sat Dec 19 10:44:30 2015 +0000 Update Elasticsearch to 2.1.1 commit fff5fef2c7f5e9c009ef136980a2b50e488acedb Author: Uwe Schindler Date: Wed Dec 2 18:31:20 2015 +0000 Wait infinite for uncompleted bulk requests (bug in ES) commit acddaedf595d702c09c5a42863ccef37b9651599 Author: Uwe Schindler Date: Wed Dec 2 17:12:35 2015 +0000 Improve thread pool rejection (don't ignore executions after shutdown) commit b142740dc5c91a10205b168cbd6916d0056e5552 Author: Uwe Schindler Date: Wed Nov 25 14:53:20 2015 +0000 Remove scrolling commit 82f88a2e1c8a38ef632e2098de6f9df6d550a05e Author: Uwe Schindler Date: Tue Nov 24 22:23:55 
2015 +0000 Update Elasticsearch to 2.1.0 commit a96c26627365993dc90f108ef8d9f14bf0cd24ae Author: Uwe Schindler Date: Fri Oct 30 22:45:42 2015 +0000 Switch dev version to 2.1 commit d411ef5e08a869bdd2676aa854afe391e39dd11e Merge: 177897b bbf5c0c Author: Uwe Schindler Date: Fri Oct 30 22:45:14 2015 +0000 Update to Elasticsearch 2.0 commit bbf5c0c1e3612f70937759566f46f40a46f74a0e Merge: 989a8fa 177897b Author: Uwe Schindler Date: Fri Oct 30 22:42:06 2015 +0000 Merge trunk commit 989a8fa2fa16575aac9ca8af99a827b989ea2cd1 Author: Uwe Schindler Date: Thu Oct 29 21:20:46 2015 +0000 Elasticsearch 2.0.0 final release commit 177897b25e9660e0d2f2697e0086cf1ae2bbc9c3 Author: Uwe Schindler Date: Thu Oct 15 21:05:12 2015 +0000 Update elasticsearch commit db38bbfc9141c6d2bc07ab886b14519bad299216 Merge: 3529b21 c3d04a4 Author: Uwe Schindler Date: Wed Oct 14 15:55:31 2015 +0000 Merge trunk commit c3d04a4f42ab386b6df284c8d3761b96b5eae25c Author: Uwe Schindler Date: Wed Oct 14 15:53:45 2015 +0000 Update forbidden-apis to 2.0 commit 3529b21fccfbb2c7bd482e81e36954b0df37b8d4 Author: Uwe Schindler Date: Wed Oct 14 15:53:01 2015 +0000 Try ES 2.0 RC1 -> works commit cce9e593d4795b726d321ec0d77950de27f35be2 Author: Uwe Schindler Date: Fri Sep 25 12:45:48 2015 +0000 Update to beta2 commit 63dc5e41d27bc4a64115cc0dc7e5db2af6a0830b Author: Uwe Schindler Date: Wed Sep 16 10:25:51 2015 +0000 Update Elasticsearch to 1.7.2 commit 3f163a6f71f4f5799f1f16a8fabde696d1aaec20 Author: Uwe Schindler Date: Fri Sep 4 16:39:33 2015 +0000 Update Javadocs commit d41b1342ab9146078d32147350c984ac31b10181 Author: Uwe Schindler Date: Fri Sep 4 16:30:44 2015 +0000 Add a temporary workaround for https://github.com/elastic/elasticsearch/issues/13155 commit dc01f50292618903595800931f2dddd5f9cf5607 Author: Uwe Schindler Date: Fri Sep 4 16:27:35 2015 +0000 Fix default unit size parsing bug; Remove default analyzer from mapping, it must now be configured in the index settings, named "default" commit 
b1ece2047974bff52510d59b6678608a73d0d3c7 Author: Uwe Schindler Date: Fri Sep 4 15:55:39 2015 +0000 Fix ivy downloading commit a6fc6ca29fbcd591a807f4eae0a6f17dd52d8590 Author: Uwe Schindler Date: Fri Sep 4 15:22:20 2015 +0000 Initial update, completely untested commit bb9b34ddfa899be971f5af7e1533da47a8b85a7c Author: Uwe Schindler Date: Fri Sep 4 14:56:19 2015 +0000 Start branch for Update to Elasticsearch 2.0 commit 7ca40189b6afee6c0ffac2e581e09a74add6e910 Author: Uwe Schindler Date: Sat Aug 1 09:11:50 2015 +0000 Update Elasticsearch to 1.7.1 commit 03fad34be6b27c3d838ec2c27dd5418cf34a9b70 Author: Uwe Schindler Date: Sat Jul 25 22:51:27 2015 +0000 Update Elasticsearch to 1.7.0 commit cf0f83d96284c7d8969e85c07f2e3e42b57ca944 Author: Uwe Schindler Date: Thu Jun 11 10:13:26 2015 +0000 Update elasticsearch version commit 920bcecfd2556a31e58ff2ca708e95ae58e6e9ed Author: Uwe Schindler Date: Wed May 27 11:50:19 2015 +0000 Define String output encoding as UTF-16 to work around surrogate bug in XALAN/XERCES serializer.jar commit 67314fb0ef796878084bb102c8131a501f2ad6c3 Author: Uwe Schindler Date: Sat May 16 15:20:28 2015 +0000 Cleanups, also make relative file URIs by resolving using URI class (makes encoding + forward slashes) commit b7f816f66105b23579ed462497545bf716860984 Author: Uwe Schindler Date: Sat May 16 14:27:01 2015 +0000 fix typo in attribute commit 7ecb1cb58911dd5b72ae8fb5558194172d67508d Author: Uwe Schindler Date: Sat May 16 14:24:33 2015 +0000 Add java.io.File* signatures to forbidden-apis commit 7dadefe139f1dfbc2df26ce8cf06aed5ea8562da Author: Uwe Schindler Date: Sat May 16 13:50:02 2015 +0000 Migrate to NIO.2 commit 2cdf277547153e83773b50d825c0ef3fc4110a0f Author: Uwe Schindler Date: Fri May 15 17:43:33 2015 +0000 Fix previous commit commit 5d1441053c5a57b9fa1bd03b72855781e945fcda Author: Uwe Schindler Date: Fri May 15 17:36:42 2015 +0000 Print processed document status only when pool was running commit 5d8551501ef64ccccfe77d3abd6eac0f7ddac51d Author: Uwe 
Schindler Date: Fri May 15 16:39:54 2015 +0000 Add a method to delete documents in Harvester subclasses commit b235a354e1086fc2903e0e4ed79ad45d28fe225d Author: Uwe Schindler Date: Fri May 15 13:44:15 2015 +0000 Remove non-bulk processDocument in favour of accessor to get Elasticsearch ActionRequest commit 51c7e4861d6db0fc52018a0365574c4406f1f7fd Author: Uwe Schindler Date: Fri May 15 11:31:11 2015 +0000 Remove useless logging in direct requests commit b1233decbc4d02de592162804c727c0307117104 Author: Uwe Schindler Date: Fri May 15 10:50:49 2015 +0000 Remove useless flush call commit b8dfbbbe37a8ac3c683f779ef4f3629d3e2a5974 Author: Uwe Schindler Date: Fri May 15 10:22:44 2015 +0000 Use actionGet() instead Future.get() commit 0a83db9364b30255d341190d2e261a738685f91c Author: Uwe Schindler Date: Fri May 15 10:10:40 2015 +0000 Cleanups with indexing and RequestBuilders commit 26e4e85e155cbf1f3eecb4dc8cebd60a6a35a0d8 Author: Uwe Schindler Date: Thu May 14 17:46:28 2015 +0000 Rewrite DocumentProcessor: - no document queues anymore, just use a fixed thread pool and a bounded task queue - use BulkProcessor - add configuration to allow sending multiple bulk requests to Elasticsearch commit 1564eaf7c50e389d18ad4eb74d032471d4ec3096 Author: Uwe Schindler Date: Thu May 14 12:32:01 2015 +0000 Fix forbidden-apis commit 53c9222b72a14a23caacb9ff54be02d3283f3795 Author: Uwe Schindler Date: Thu May 14 12:31:10 2015 +0000 Fix cleanup bug commit 1f505ced9af534d1a86c9339442c548a5c410049 Author: Uwe Schindler Date: Thu May 14 12:30:17 2015 +0000 First version using Elasticsearch's BulkProcessor commit 47608151ca05068d3a985546a4f88331faa26114 Author: Uwe Schindler Date: Thu May 14 10:54:42 2015 +0000 Cleanups commit d1fd4fd9b77f1a9f2890d59d5e03958b2f586289 Author: Uwe Schindler Date: Wed May 13 22:50:02 2015 +0000 On close, if something failed, clean up the mdocBuffer, so we can for sure enqueue the final EOF docs commit ac943cca5ea62f1e703c2677b9c3b704ca091a49 Author: Uwe Schindler Date: 
Wed May 13 22:21:09 2015 +0000 Change shutdown algorithm commit 0a57c795b3636921b9275cd92275f834681c442f Author: Uwe Schindler Date: Wed May 13 21:58:03 2015 +0000 Don't use submit if no Future is needed commit d7f5a27e49ebd6f450096b329412f0cb9afbff7e Author: Uwe Schindler Date: Wed May 13 17:40:11 2015 +0000 Use an ExecutorService as threadpool commit 3bc0b9168e07a9bd76b8b1597ccddba167ee2755 Author: Uwe Schindler Date: Wed May 13 16:39:09 2015 +0000 Some refactoring, also make thread counting safe commit a46324d6d245ce252a18cf0665967baff9dcb06f Author: Uwe Schindler Date: Wed May 13 13:43:19 2015 +0000 Remove no longer used CommitEvent.java commit 1d27dc7fbbc8a5a4b51da9d9ed35878fdcf30d50 Author: Uwe Schindler Date: Fri May 8 12:39:08 2015 +0000 Make content type for _source field configurable, switch to CBOR instead of JSON (by default) commit aad417fd0ef9e44ae3c9db09f76caac347f55fcf Author: Uwe Schindler Date: Tue Apr 28 23:53:40 2015 +0000 Update Elasticsearch to 1.5.2 and forbiddenapis to 1.8 commit fabfd3b96328d66e180d9a74a02da060d6978ef6 Author: Uwe Schindler Date: Fri Apr 10 09:46:09 2015 +0000 Update Elasticsearch to 1.5.1 commit 138aa402cd82ebb23a46b45990edc1b9953756a5 Author: Uwe Schindler Date: Tue Mar 24 12:06:56 2015 +0000 Rewrite delete by query to do a scanning search, that issues a bulk delete for each found item. This removed bugs in clouds, where shards may get inconsistent results (see https://github.com/elastic/elasticsearch/pull/10082) commit 71dc9fc9c00d401b4610f479798ceabb1b528b15 Author: Uwe Schindler Date: Tue Mar 24 12:04:52 2015 +0000 Update to ElasticSearch 1.5. Fix one deprecation. 
TODO: deleteByQuery is also deprecated commit b5d7d342e9025e29a7f9ea409751f900a7897120 Author: Uwe Schindler Date: Fri Feb 20 16:52:29 2015 +0000 Update elasticsearch to 1.4.4 commit 26a752866bfc69264f888b59f1f3f1002caa1f30 Author: Uwe Schindler Date: Thu Feb 12 21:45:27 2015 +0000 Update Elasticsearch commit 2c7a445afb9cef7a374c27ccd04035e90c11997f Author: Uwe Schindler Date: Thu Jan 29 10:32:06 2015 +0000 Simplify thread-local SimpleCookieHandler (use Java 6+ impl, per thread) commit 2042b1ca2ab7927c0e7a4dc0f3b76c5fc05ef226 Author: Uwe Schindler Date: Thu Jan 1 16:55:58 2015 +0000 Prevent creation of synthetic accessor methods (by using pkg-private access modifier) commit f2fa967502e1fcc66b0d3903b0820a7b8ea5c09f Author: Uwe Schindler Date: Sun Dec 28 13:58:18 2014 +0000 Rename rule to have better suited name commit d95a003fa3889e771dbc227f0c7d68be456dad14 Author: Uwe Schindler Date: Sun Dec 28 13:56:13 2014 +0000 Whitespace fix commit d44b01de4d92bda9100b6104549ea0b34278e62a Author: Uwe Schindler Date: Sun Dec 28 13:55:47 2014 +0000 Fix documentation commit 25247b5a19144d4af2934b2d12590bfc197777be Author: Uwe Schindler Date: Sun Dec 28 13:47:59 2014 +0000 Fix bug with settings and their prefix. 
Add new Digester rule to only populate with element name (instead whole path), so no filtering is needed commit aee5e565c49dc67123aac9ec8e02dab3e6aad13d Author: Uwe Schindler Date: Sun Dec 28 12:55:55 2014 +0000 Allow index settings parsed from file commit 6fd3fd6fdeee2eec0ed2a6a0bb07ae963a057157 Author: Uwe Schindler Date: Wed Dec 17 09:07:07 2014 +0000 Update ES commit d7ece44c93d0d9fa2a53f0918545270eaaa9bd9f Author: Uwe Schindler Date: Fri Nov 28 17:29:04 2014 +0000 add support for plugins commit 662ca67860cdcab7ca663516987b3b9af3793b8a Author: Uwe Schindler Date: Fri Nov 28 16:49:53 2014 +0000 update elasticsearch commit 2a5620be1d4baec73616c396334421c995a1dad0 Author: Uwe Schindler Date: Fri Nov 28 13:14:21 2014 +0000 Update forbidden apis again commit 5d5a1591644299912d3d83c1520e913f59acd9c1 Author: Uwe Schindler Date: Fri Nov 28 13:12:00 2014 +0000 Update forbidden APIS commit 5962007b85612ce0f65a5710f3640a1f0df1d087 Author: Uwe Schindler Date: Sun Nov 16 17:41:15 2014 +0000 Make scroll requests more safe commit 4757acc272ed90c6a1fd1eb6c724b2c29933555c Author: Uwe Schindler Date: Wed Nov 12 20:29:05 2014 +0000 Fix bug in serializing KeyValuePairs: Single items of type KeyValuePairs were not handled correctly commit 3f74d3fa9a16d3a832bcd351748a631c1abdf223 Author: Uwe Schindler Date: Wed Nov 5 18:45:57 2014 +0000 Update ES commit bd904ec990e9a8f366317dc6496491d76040ab75 Author: Uwe Schindler Date: Mon Oct 27 18:24:00 2014 +0000 Allow to set a maximum bulk size (in number of bytes of the JSON document sources) commit 5f6cc4e5d5dc2b6543ef09163ccf9171b9c88d65 Author: Uwe Schindler Date: Wed Oct 15 01:00:36 2014 +0000 Use JAXB to unmarshal XML->JSON nodes (to preserve type) commit 99e333fd3428f973654975973555421711156aac Author: Uwe Schindler Date: Tue Oct 14 22:51:42 2014 +0000 Add support for INTEGER values (mapped internally to Long). This allows safety when rounding might occur. 
commit 31f3d18e7cd4cec7fc59b0938cedf5a74c8d0c18 Author: Uwe Schindler Date: Wed Oct 8 20:49:21 2014 +0000 Fix XHTML bug commit 186889f8f2daaf9eda7a521e66c0d1baba2b6d96 Author: Uwe Schindler Date: Tue Oct 7 12:56:49 2014 +0000 Don't record identifiers from deleted documents in OAIStaticRepository. Don't use valid identifiers if no change commit 7b723aba78ffda6c6a0c99c551415a155244c826 Author: Uwe Schindler Date: Tue Oct 7 12:52:44 2014 +0000 Allow to ignore datestamps in OAI-PMH. If datestamps are ignored, it never does incremental harvesting. This allows to do a complete re-harvest. commit 5f50acbbd36720f302086ab78c8a8ae597d55667 Author: Uwe Schindler Date: Tue Oct 7 10:35:53 2014 +0000 Don't resolve URLs / dirs in ctor commit 126ce1128c2873108c36935e6d7bcff59c72f62a Author: Uwe Schindler Date: Tue Oct 7 09:45:59 2014 +0000 Make config parsing in harvesters up to ctor, make fields final; add identifierPrefix for OAI commit 8a95b8fd235adc1628521fd39582cc0328a8c540 Author: Uwe Schindler Date: Wed Oct 1 14:18:52 2014 +0000 Update Elasticsearch commit d296b1ffc388f78e895f3c9eaf75adc8950ce236 Author: Uwe Schindler Date: Tue Sep 30 21:52:48 2014 +0000 formatting cleanup commit 2496584c1d076ce7b2a822fafd7e6015e72ca1ed Author: Uwe Schindler Date: Tue Sep 30 19:44:03 2014 +0000 Allow more Exceptions on addDocument() commit 54489a9b851a13b164b7cfa64626ef08dbc1cd82 Author: Uwe Schindler Date: Tue Sep 30 14:54:25 2014 +0000 Update count on direct pushing of docs commit ad284e9bf94cc219a69133beab052bbac28f53fe Author: Uwe Schindler Date: Tue Sep 30 13:36:19 2014 +0000 Small update in logging commit 03489a48bec16ceecefd33a5f454e3b4d0cf98ae Author: Uwe Schindler Date: Mon Sep 29 18:12:40 2014 +0000 Update Elasticsearch commit 5e0ebe744bb64f2787132097e44fd8f06cc5810d Author: Uwe Schindler Date: Thu Sep 11 17:38:34 2014 +0000 Fix bug with xml fields commit f927c52e9930169dfaa6a0ecfd199a2afb43f606 Author: Uwe Schindler Date: Thu Sep 11 10:28:47 2014 +0000 Fix warnings commit 
3693f44251b4652b99b4f70dc40917f39d4136e3 Author: Uwe Schindler Date: Wed Aug 27 15:45:01 2014 +0000 Update nekohtml commit 3146098c3ab7c4485b4ffc923fcc83859e4fbc12 Author: Uwe Schindler Date: Tue Aug 26 14:04:57 2014 +0000 Fix mapping of "src" property commit d38c60a0bfd558416c40620bb1c0161251547239 Author: Uwe Schindler Date: Tue Aug 26 13:41:33 2014 +0000 Allow src= also for commit 96b1330bffdd2a366a67e162d0f70eb77ce2a17a Author: Uwe Schindler Date: Thu Aug 14 08:23:38 2014 +0000 Update XALAN, finally :-) commit e6acf3644c1e0c3d0d02c7e85c96d556ea085e17 Author: Uwe Schindler Date: Thu Aug 14 07:52:06 2014 +0000 Update ES commit e759e77b6dd93bd1430364eb3e167247525f5d12 Author: Uwe Schindler Date: Mon Jul 28 22:16:18 2014 +0000 Update ES. commit 62bacb4c95c0fb59ec51a5e78802f628dd1476cc Author: Uwe Schindler Date: Thu Jul 24 21:26:36 2014 +0000 Update to ES 1.3.0 commit 7f901ec5febcbc739b71d27884b66e653222b18e Author: Uwe Schindler Date: Wed Jul 9 20:34:09 2014 +0000 Update Elasticsearch commit 8980e6c3c1b2b6c5b59410207352e691c7280de3 Author: Uwe Schindler Date: Thu Jun 19 16:39:47 2014 +0000 split properties for tools, reformat build.xml commit d92f7e052457a5d3aa7d5ef4a824758d2b82379d Author: Uwe Schindler Date: Thu Jun 19 09:40:38 2014 +0000 change constant commit 3aaee2094ff3c11d03f146792ffc35fcff55c044 Author: Uwe Schindler Date: Wed Jun 18 17:31:37 2014 +0000 After createIndex wait for cluster to get yellow. commit bbca5a9e9e911ab901a19a71ff80d2407b56ba58 Author: Uwe Schindler Date: Tue Jun 17 23:17:21 2014 +0000 Small fixes. Initially create index with right realName, not alternate. Cleanup Exceptions. 
commit 2c65bd3eadcb7a21c64188f7776f423148db4154 Author: Uwe Schindler Date: Mon Jun 16 18:02:58 2014 +0000 Add support to remove aliases, not found in config commit 3569f93f2825fe270c2b9466d6786115e0f5adcb Author: Uwe Schindler Date: Mon Jun 16 17:17:02 2014 +0000 Add support for aliases commit 7e9df7b0dedc954af61293994a29902cf342adff Author: Uwe Schindler Date: Mon Jun 16 14:52:32 2014 +0000 Fix typo commit a38f3286d41ebd8cb7cef5f69ff65b9486a39943 Author: Uwe Schindler Date: Mon Jun 16 14:41:31 2014 +0000 Make configs more immutable, remove null map commit 77068b3424909fa6b783f1eabc2ba80a17b5a49e Author: Uwe Schindler Date: Mon Jun 16 14:26:52 2014 +0000 Move to Java 7 diamond commit decce6272ec8b8fee981ff572e579c0e0ed2b253 Author: Uwe Schindler Date: Mon Jun 16 14:18:04 2014 +0000 Code cleanup by Eclipse commit d9273366f6eab65d89a335e5655c40208fe04ebe Author: Uwe Schindler Date: Mon Jun 16 14:13:14 2014 +0000 Factor out code to get aliased index commit 42c99fe8992fa150cc6ab835a208a55d38cc3b5f Author: Uwe Schindler Date: Mon Jun 16 13:44:11 2014 +0000 Cleanups commit f5680ef2c1011eb5cfb6f3f027b8f5027d38f133 Author: Uwe Schindler Date: Mon Jun 16 12:35:31 2014 +0000 direct delegation to correct method commit dc1f6d0cfa460cf6a429eb7fccb54fd56444076c Author: Uwe Schindler Date: Mon Jun 16 12:32:44 2014 +0000 Small refactoring commit c7a208f6a471494c6c2ed9b9d957426de5266037 Author: Uwe Schindler Date: Mon Jun 16 11:30:08 2014 +0000 Refactor config, move Elasticsearch stuff out of TargetIndexConfig commit 345dc0dcc7008d53792a538385c897a403bb427c Author: Uwe Schindler Date: Mon Jun 16 09:46:48 2014 +0000 More UTF-8 changes commit 9bc2cf38a828c3f7cc8048729f728f44f3e1883e Author: Uwe Schindler Date: Mon Jun 16 09:44:20 2014 +0000 Use UTF8 in XML. commit d547ad3346f908c15c09447f62fdddeb79c830c7 Author: Uwe Schindler Date: Mon Jun 16 09:43:30 2014 +0000 Revert DIF schema removal. Add DIF schema as example file locally (license???) 
commit a649a6e2d822ffc413e064b65dda9d4fbbfd27bd Author: Uwe Schindler Date: Mon Jun 16 08:51:57 2014 +0000 Support for index settings and configuration of alternate names. commit e8fe4df46349b57b9e5132fcdb9841fcf45f7720 Author: Uwe Schindler Date: Sun Jun 15 23:32:02 2014 +0000 Cleanup formatting commit b32037c614ca9436e3bcd4c9971d4fbb42801e6b Author: Uwe Schindler Date: Sun Jun 15 23:21:40 2014 +0000 Cleanup formatting commit 701f0f9de43cdb786aa1ab20762be3b946e17d1c Author: Uwe Schindler Date: Sun Jun 15 23:02:10 2014 +0000 Merge mappings before sending to ES, send mapping with index create commit bf9315ab5bf321eba19be0694c796a90b278d746 Author: Uwe Schindler Date: Sun Jun 15 16:07:48 2014 +0000 Fix NPE when upgrading from older versions commit f75d5784db420e3a2b4bcc4c7d3bc6e740c0b425 Author: Uwe Schindler Date: Sun Jun 15 15:43:34 2014 +0000 Only allow targetIndex ids for rebuilder (we cannot rebuild parts of an elasticsearch index, would cause data loss) commit edcac0125bfa65af7fbd7090807fffa43e8da78b Author: Uwe Schindler Date: Sun Jun 15 10:22:49 2014 +0000 Refactor harvester metadata (for now always save it, if empty, too) commit cc79c4497f81260c2401a28b4d3b57adcaaeb6d3 Author: Uwe Schindler Date: Sun Jun 15 09:23:05 2014 +0000 Remove the waiting for clusterstate, for now just leave flush to work against ES bug. Also change the metadata to be non-stored, but with _source enabled commit 5d3f80634e549c02a83c4cd64e6e9a5669a36352 Author: Uwe Schindler Date: Sat Jun 14 23:08:41 2014 +0000 Add some cluster state checks... TODO: Investigate commit 657369fa72029b8464e9a6eb286e773d259f5067 Author: Uwe Schindler Date: Sat Jun 14 21:52:06 2014 +0000 Refactor target index creation out of DocumentProcessor and make rebuilder create new index. This is done by automatically creating aliases. 
commit faa3968faaefe1db6a7651c38c9d3a1125e03604 Author: Uwe Schindler Date: Fri Jun 13 18:34:37 2014 +0000 improve default mapping commit feb4230fcb4087f49256d024458b5f3694be256c Author: Uwe Schindler Date: Tue Jun 3 22:28:24 2014 +0000 Update Elasticsearch commit 8d7f040ef1d2b89f30f4ceb742a77111ce33e01d Author: Uwe Schindler Date: Thu May 22 18:08:17 2014 +0000 Update to Elasticsearch 1.2.0 commit da8f7357dac4fcceda04b0fac20df986b8af4586 Author: Uwe Schindler Date: Sat May 10 17:44:08 2014 +0000 Fix logging directory commit 531dad96ed81b1b59e32700ca987c58d0f65a53f Author: Uwe Schindler Date: Wed Apr 30 13:10:42 2014 +0000 Change log message commit ed6d02083bec071c8ae404fb0ff7a73ba7d269af Author: Uwe Schindler Date: Tue Apr 29 23:10:48 2014 +0000 Simplify the default mapping with dynamic fields. Disable "_all". Also put provided mapping before defining internal types. commit a3c43c02e220ec21a6fac1e1552a0215166bb90b Author: Uwe Schindler Date: Tue Apr 29 22:35:30 2014 +0000 Remove "include_in_all" from default mapping commit 984ee2122bc562676a7c89d5da6b075d1ebe816e Author: Uwe Schindler Date: Tue Apr 29 14:52:08 2014 +0000 Rename class and improve Javadocs commit 75fe131fea205a10dad9855e64b319236ad91be1 Author: Uwe Schindler Date: Tue Apr 29 12:03:12 2014 +0000 Use a KeyValuePairs object before building JSON to handle duplicate field names. This also improves the XML converter. 
commit 250f062cdc39c10dee51b51b3e15e45089e4ea9f Author: Uwe Schindler Date: Thu Apr 24 15:13:15 2014 +0000 Add forbidden-api checks commit 7c3d7999da8521f86975442f9bd4536485ad87e3 Author: Uwe Schindler Date: Thu Apr 24 15:00:25 2014 +0000 Some charset cleanup commit dcec4a83d694b15eb1e5a3e52a77ec477ef4e78a Author: Uwe Schindler Date: Thu Apr 24 14:52:20 2014 +0000 Fix some smaller problems commit bb40158741b953f5ea8ab4ea78204649d18c2d60 Author: Uwe Schindler Date: Thu Apr 24 13:35:40 2014 +0000 fix typo commit ab39872213d6af551d89e44db669e3a366ced972 Author: Uwe Schindler Date: Thu Apr 24 10:26:42 2014 +0000 Update scripts to actually work with the binary distribution commit f966fcbb9dd975caf7d3c8a39cbc71c598f9418c Author: Uwe Schindler Date: Wed Apr 23 18:42:03 2014 +0000 Cleanup line endings #2 commit b574c5258cd05d89cbe5ecb92e2411b3bf42eb50 Author: Uwe Schindler Date: Wed Apr 23 18:40:09 2014 +0000 Cleanup line endings commit ceeeec340ee0d08639cffb3b5a0d06d3f6647eac Author: Uwe Schindler Date: Wed Apr 23 18:36:31 2014 +0000 Remove old scripts, update other ones (preliminary, does not yet work) commit 00c29621046f1c6293e6064d51639a50633ed22e Author: Uwe Schindler Date: Wed Apr 23 18:33:03 2014 +0000 Remove useless example folder commit 830e59f8d9aec0fcfcc8234e83ae475768d905cb Merge: 3afaa77 e7f60e0 Author: Uwe Schindler Date: Wed Apr 23 15:12:24 2014 +0000 Merge ES branch into trunk commit e7f60e0e19aaf6a178a0c814f293f98be3c22067 Merge: f66c566 3afaa77 Author: Uwe Schindler Date: Wed Apr 23 14:41:35 2014 +0000 Merged revision(s) 552 from main/trunk: add release signing commit 3afaa77748f393b43dc3222df8b9afdb35fac1cd Author: Uwe Schindler Date: Wed Apr 23 14:36:56 2014 +0000 add release signing commit f66c566af0af747a9a0600495511dfc81048a4fc Author: Uwe Schindler Date: Tue Apr 22 09:18:02 2014 +0000 Improve schema and mapping for more example facets commit 8955aa3f9e75118b3e03f33a07f7a554f3f623bc Author: Uwe Schindler Date: Tue Apr 22 08:57:51 2014 +0000 Fix 
mapping.json commit a020a27854876ceb007b93735d50d798b2aedcce Author: Uwe Schindler Date: Tue Apr 22 08:52:39 2014 +0000 Update ES, supply default mapping for example config commit 45b438ecf2bed2a2f50d25ba90d6aa8d2bbaab44 Author: Uwe Schindler Date: Wed Apr 9 17:22:22 2014 +0000 Update Javadocs commit 3635ad38a7353bd5d389028dbdf7a7947128d885 Author: Uwe Schindler Date: Wed Apr 9 17:12:42 2014 +0000 Fix javadocs commit e352deea3b12f96e21e6a0c4495b3d171fc45d7d Author: Uwe Schindler Date: Wed Apr 9 17:08:34 2014 +0000 Rename field commit d77930f7fab89a329e6ebd9b8a37ae9514fbd499 Author: Uwe Schindler Date: Wed Apr 9 17:07:11 2014 +0000 Add missing harvesterProperties to PanFMP1IndexHarvester commit 3afcd1e025ec2eabdbfb2db1c3e2b1af1f042336 Author: Uwe Schindler Date: Wed Apr 9 16:16:01 2014 +0000 Fix bug with empty index name, better logging commit 734b4ffe87a0d8c8c5ccd0b8db51c798f1d3e800 Author: Uwe Schindler Date: Wed Apr 9 15:53:55 2014 +0000 Allow configuration of ES address in ElasticsearchHarvester commit b9a1a502198ac2206faab36fb1b0618ee420f949 Author: Uwe Schindler Date: Wed Apr 9 13:42:05 2014 +0000 Prevent NPE commit fd753ba2f0890fc1901e95f2098a2209886a0b6f Author: Uwe Schindler Date: Wed Apr 9 13:30:35 2014 +0000 add missing cleanup commit 76ab097a4aad1e8c39283e65af9f16081aceaa8a Author: Uwe Schindler Date: Wed Apr 9 13:29:40 2014 +0000 Refactor some names, add ElasticsearchHarvester commit 6c587ad93510e6b11c0daeb01d0acfe33b519b22 Author: Uwe Schindler Date: Tue Apr 8 13:45:18 2014 +0000 Remove unused constants commit 1ac77c9c1719175e8c5b843701e74e0c82d68bf6 Author: Uwe Schindler Date: Tue Apr 8 13:42:48 2014 +0000 Rename Legacy Harvester commit 8fafcdee752979a9991fb80d10e1f423e81c636d Author: Uwe Schindler Date: Tue Apr 8 13:38:00 2014 +0000 OAIMetadataDocument & Sets update commit 62532dfed620e0a1667f7ce6d98e145dd9f3b132 Author: Uwe Schindler Date: Tue Apr 8 09:49:27 2014 +0000 Set Locale to ROOT for generic date formats commit 
2a470aea94b620e4b5088e080a2fd247a428d741 Author: Uwe Schindler Date: Wed Apr 2 17:36:51 2014 +0000 Add implementation for Rebuilder. Change Harvester to no longer store MetadataDocument class name in index. Change Harvester API to take HarvesterConfig in ctor commit 82354eac227e3dabcae554a394ec8fb13c42005b Author: Uwe Schindler Date: Fri Mar 28 13:04:51 2014 +0000 Some refactoring, more logging commit 5498c46383a34f99b95adcc4318232246bfe820f Author: Uwe Schindler Date: Fri Mar 28 11:28:30 2014 +0000 Add support for custom mapping from config file commit ac0aa591cb95e51f91f9fdabb7acef1b56579fb4 Author: Uwe Schindler Date: Fri Mar 28 10:36:50 2014 +0000 Add more logging commit 86c364a4e5a36ef7295b692d88686edcad5016a8 Author: Uwe Schindler Date: Fri Mar 28 10:34:03 2014 +0000 Add mapping for HARVESTER_METADATA_TYPE (simple string-string kv pairs) commit 31a3e9c0b5b96cd93f6982271d1c2caf55a82404 Author: Uwe Schindler Date: Fri Mar 28 09:52:11 2014 +0000 Support for mappings of internal fields commit 4cb67c84269f5ed275dde41f2872c040ef1b3363 Author: Uwe Schindler Date: Thu Mar 27 13:16:38 2014 +0000 Nuke more instances of the term "Index" commit 1e6e41580aeab6c2428973e07c5367af9ffe7582 Author: Uwe Schindler Date: Thu Mar 27 13:15:36 2014 +0000 Nuke more instances of the term "Index" commit 8523a156bd79bb4c1b8210a9b3e85fb7b19e8877 Author: Uwe Schindler Date: Thu Mar 27 10:19:03 2014 +0000 Add default config options commit d389aa3bdf7b62e314c99f5e957036b27135d3a9 Author: Uwe Schindler Date: Thu Mar 27 10:11:41 2014 +0000 Refactor the counting of indexed items, extract constants commit 81623b2daadacedfc6f38838976fe607b9d48878 Author: Uwe Schindler Date: Tue Mar 25 19:54:21 2014 +0000 Remove useless field, update to ES 1.1.0 commit 8dfd141f497ffdd16ba2a107b42767adca187f72 Author: Uwe Schindler Date: Mon Mar 24 22:50:52 2014 +0000 ctor refactoring commit 31b9c389e30b1ed9a7a25b21ccbf84a4e7122116 Author: Uwe Schindler Date: Mon Mar 24 21:58:06 2014 +0000 Refactoring, add 
thread-less document addition (for PANGAEA use without harvester) commit 991cfc83437e8b35e48bd5bdeb664cc9375c7fa9 Author: Uwe Schindler Date: Mon Mar 24 20:59:35 2014 +0000 Use shorter get() on ActionRequest commit b7350870a227f5b7ee2254f105775c152bd6a90c Author: Uwe Schindler Date: Mon Mar 24 19:17:53 2014 +0000 Refactor large method commit 05cd3886636db5680d26129819bc98361bc288a6 Author: Uwe Schindler Date: Mon Mar 24 19:14:08 2014 +0000 More refactoring commit fcc34606411fce7e38258654a47e381e0ccf191f Author: Uwe Schindler Date: Mon Mar 24 19:02:55 2014 +0000 Add assert commit f8e7fc76f44b8b4bd246b4fa28bb5bd16bbd07b9 Author: Uwe Schindler Date: Mon Mar 24 19:01:16 2014 +0000 Refactor threading in DocumentProcessor commit 7254a568baf238ac2b1ffecd56b48542f6a53f90 Author: Uwe Schindler Date: Mon Mar 24 16:27:56 2014 +0000 Make vars final commit 1eacbbdc7c86d689ab6649312651255347d12c49 Author: Uwe Schindler Date: Mon Mar 24 15:19:38 2014 +0000 Decouple index name from harvester name, make typeName configurable. 
commit 6ed5c95ee91945ffc46d789a7fca3dc53bd2f1a3 Author: Uwe Schindler Date: Mon Mar 24 14:36:05 2014 +0000 Rename internal variables commit 03a5f5fc6bd47ab747ec5bd3a6a53c8734ddee95 Author: Uwe Schindler Date: Mon Mar 24 14:28:39 2014 +0000 More config refactoring commit b153c84b8ab8f5268c32e989a40256a4c359ce61 Author: Uwe Schindler Date: Mon Mar 24 14:24:02 2014 +0000 Rename config classes commit 5123a706e1580f29f29ff5c90d6b4a58be9f5704 Author: Uwe Schindler Date: Mon Mar 24 14:18:49 2014 +0000 Rename some config settings commit a682aff6115e6c23876805caebe1386b2991f01b Author: Uwe Schindler Date: Mon Mar 24 14:02:31 2014 +0000 Support for lastHarvested metadata commit bef6d54eb434e119f4ab7952176f808247660ab6 Author: Uwe Schindler Date: Fri Mar 21 19:10:42 2014 +0000 Cleanup notice commit 7d750da615bde14111756430a14877df7de26cce Author: Uwe Schindler Date: Fri Mar 21 14:33:59 2014 +0000 Add configuration for internal field names, add XMLToJSON class commit 2a8f367d5668576bfeab4fd20617aa25c2dd745c Author: Uwe Schindler Date: Tue Mar 11 10:03:43 2014 +0000 Remove unneeded XPath functions (keep template on how to implement for later adding new functions), cleanup imports for XPath commit 2915bc204338f7af1afe9e2dd0296448f89b5a2c Author: Uwe Schindler Date: Thu Mar 6 13:00:58 2014 +0000 Cleanup some code commit 8288dc35ffc6e674675c6a32f1d0e2d8f8306ab9 Author: Uwe Schindler Date: Thu Mar 6 11:34:58 2014 +0000 Remove "_all" field special handling. Add some template for later JSON support. 
commit 7f11c489a0a8f118d2e019203dc212d7fb56a406 Author: Uwe Schindler Date: Thu Mar 6 11:08:43 2014 +0000 Update Elasticsearch commit 9162247a44294dbde730979ee94084c082089106 Author: Uwe Schindler Date: Wed Feb 12 17:10:42 2014 +0000 Update to ElasticSearch 1.0.0 commit 62c77899705530db19af1937a54599695eabf888 Author: Uwe Schindler Date: Thu Feb 6 11:06:00 2014 +0000 rename provider commit 57fced3ff5e541b1cc36bd710ad1f0104ed6081a Author: Uwe Schindler Date: Thu Feb 6 11:05:34 2014 +0000 Fix data providers & version numbers commit f17b2c3e4f3a10eb0dbba291a33f30343c2d85f1 Author: Uwe Schindler Date: Wed Feb 5 13:19:46 2014 +0000 Import org.w3c.Document (no more Lucene Documents used) commit 4b059d96cc516ad8c29484ce68cf7983cfc56e15 Author: Uwe Schindler Date: Wed Feb 5 13:05:57 2014 +0000 Update ElasticSearch, change version handling commit 3219cf803c1e63c1433ff905e0ff456b544c1197 Author: Uwe Schindler Date: Wed Jan 29 23:33:55 2014 +0000 Simple package version. commit 98ddde85faf1a48534195a61dbc2dc5de50ae994 Author: Uwe Schindler Date: Wed Jan 29 23:19:08 2014 +0000 Better HostAndPort parser (borrowed from Google Guava) commit a497f32498b42b58df1dc4085b45403ce6976e7b Author: Uwe Schindler Date: Wed Jan 29 22:49:51 2014 +0000 Factor out host:port parsing. 
Make the digester path to strip from node settings dynamic commit 68a796976a6c2bdb6f0a8830f156711b5abfc25b Author: Uwe Schindler Date: Wed Jan 29 15:51:09 2014 +0000 Clean up IndexConstants commit f81f9869a28ab9838a193834d70c63fa603c8a9d Author: Uwe Schindler Date: Wed Jan 29 13:57:01 2014 +0000 Cleanup docs commit 8b3e5c92d554182458989a30e03ce657b99719eb Author: Uwe Schindler Date: Wed Jan 29 13:55:22 2014 +0000 Refactor IndexBuilder to DocumentProcessor commit daf894a74218fa6d1e083cb9eab18ba457ab77c2 Author: Uwe Schindler Date: Wed Jan 29 13:52:35 2014 +0000 move HarvesterCommitEvent commit 32f2671c9e57e3aecfcab6d0e8d228b19ffe804c Author: Uwe Schindler Date: Wed Jan 29 13:33:26 2014 +0000 Refactoring of harvester package commit 81ed8f94193969a304ada3552fc3d17ef8e76db4 Author: Uwe Schindler Date: Wed Jan 29 09:06:23 2014 +0000 Move "repository" to "conf" commit 06ff7c791a3275589f3f939dd711a21322c545d2 Author: Uwe Schindler Date: Wed Jan 29 09:03:40 2014 +0000 remove outdated script commit 2fecff023d13a1b891359bb64890f57a0b144dbf Author: Uwe Schindler Date: Tue Jan 28 23:56:29 2014 +0000 remove autoOptimize commit 08840dbf763790d35fea2e74ad171cf6256888ae Author: Uwe Schindler Date: Tue Jan 28 23:48:04 2014 +0000 Add ES config commit 2c8eb8ca7e3de0a8b1b89e54d3f7b3d5d1089f5a Author: Uwe Schindler Date: Tue Jan 28 16:43:36 2014 +0000 Cleanup commit 7bd73bf43091e70e74be7bedbdd1ac068102b67f Author: Uwe Schindler Date: Tue Jan 28 16:08:54 2014 +0000 Use type on top-level request commit e950c1042bd89c71a9df865126b2d0e7def2e9ae Author: Uwe Schindler Date: Tue Jan 28 15:44:07 2014 +0000 fix warnings commit 0938014ec02819bb32f2d6fc49018640a99b9354 Author: Uwe Schindler Date: Tue Jan 28 15:40:26 2014 +0000 Add support for deleting unseen document identifiers commit 6aa483a203f12fe70e376da8aec7c8a52d15f8e0 Author: Uwe Schindler Date: Tue Jan 28 13:45:38 2014 +0000 Cleanup commit 6f5780432c241501d1615fff0744ddde0928dfe4 Author: Uwe Schindler Date: Tue Jan 28 13:43:02 2014 +0000 
Fix bugs with XML doc, remove displayName, remove indexDir commit cf2ae6c428e6adc18900737ccf572fc69a898a1e Author: Uwe Schindler Date: Tue Jan 28 11:38:37 2014 +0000 First version of (hopefully) working indexer (just a hack) commit 55d9420ecd3d923c4bb6dfdbbb428854d2b615fa Author: Uwe Schindler Date: Tue Jan 28 10:30:28 2014 +0000 Some refactoring to make Config class smaller commit be9a6fabf1a22516578b16eb6ac124e4ae186cc7 Author: Uwe Schindler Date: Sat Jan 18 00:39:25 2014 +0000 Refactor try/catch commit 925b74f0b4d259997fbcf690233e6f022ecdeee7 Author: Uwe Schindler Date: Sat Jan 18 00:30:05 2014 +0000 Cleanup ExternalIndexHarvester commit c57c2e507faf116fe6e9c88b03a749ae4e18501b Author: Uwe Schindler Date: Sat Jan 18 00:19:04 2014 +0000 Remove identifier from document JSON commit 46873fdce830de942eed4f23d450fbc04bc7560a Author: Uwe Schindler Date: Sat Jan 18 00:15:32 2014 +0000 More cleanup, remove tokenizedText datatype commit 9fabf576d43577c118f74bec2763dbb50e624b62 Author: Uwe Schindler Date: Fri Jan 17 10:42:41 2014 +0000 Make loggers final commit 31e66781e260cfffec03f0a1ee976c3376077d5e Author: Uwe Schindler Date: Fri Jan 17 10:30:36 2014 +0000 Remove more Lucene parts commit e49ca11d24cb36093f3407f2db78a0904092b4f2 Author: Uwe Schindler Date: Fri Jan 17 10:00:50 2014 +0000 Remove Lucene stuff commit ad495821d6f9ffe9bb48b557fef7749166c7d777 Author: Uwe Schindler Date: Thu Jan 16 14:14:02 2014 +0000 First working config! 
commit f7f3594d2864c78ad6b4669855a329548823bcb0 Author: Uwe Schindler Date: Thu Jan 16 10:12:57 2014 +0000 Commit JSON generator commit b69fbd322ea916a79fe9c0fe9df3f9dfd69dec00 Author: Uwe Schindler Date: Mon Jan 13 09:19:31 2014 +0000 Remove more code for now (TODO), fix warnings commit d57e5d97a1c96814553891260601b7aaa743a3eb Author: Uwe Schindler Date: Mon Jan 13 09:01:20 2014 +0000 Disable some code commit 36661bdadd1a0534531d12feb4459cf039dab73c Author: Uwe Schindler Date: Mon Jan 13 08:50:00 2014 +0000 Update ES commit 896b90a659d042da545890f66c44fa71669bf748 Author: Uwe Schindler Date: Tue Oct 8 12:54:42 2013 +0000 Useless if commit 713eb3ddd06bca11aa494d57b16c9ed0cc07445e Author: Uwe Schindler Date: Tue Oct 8 12:41:53 2013 +0000 Remove SingleIndexConfig, merge with super commit 5735f4dc390dd6d9ae37ba15cd59f0b64711cf0f Author: Uwe Schindler Date: Tue Oct 8 12:15:53 2013 +0000 More cleanups, Locale.ROOT as we are on Java 6 now commit 4958b52d1be74a6ebfb3b2d56240655513b41025 Author: Uwe Schindler Date: Tue Oct 8 12:08:42 2013 +0000 More cleanup commit 442972e03e18ae5ee63c2fa733aa4003c58a063d Author: Uwe Schindler Date: Tue Oct 8 11:57:41 2013 +0000 More cleanup, remove Config modes (we will only support harvesting) commit 696d53b4826a7d30a03242587d4d8321aa7e244f Author: Uwe Schindler Date: Fri Oct 4 11:47:33 2013 +0000 Reindent all code with 2 spaces commit 1f412697220df01f5ba871474a7609ffcbeff681 Author: Uwe Schindler Date: Fri Oct 4 11:45:35 2013 +0000 Reindent all code with 2 spaces commit 3c3bc3dcdd8d823e79f0ab0328f8d58c5201d48e Author: Uwe Schindler Date: Fri Oct 4 11:41:33 2013 +0000 Some cleanups for ES and Lucene 4.x commit 2b2152685188ac3e3b68064836ccdabac63c0d14 Author: Uwe Schindler Date: Fri Oct 4 10:27:44 2013 +0000 Small exclusion while packaging commit ef6571946939bb765d1efae342978983043df8af Author: Uwe Schindler Date: Thu Sep 12 10:17:40 2013 +0000 Exclude xml-apis, as shipped with Java 6 commit e63fa80a4f47916ff1bdcbdbbb2823805d1187c4 Author: 
Uwe Schindler Date: Thu Sep 12 10:08:15 2013 +0000 No custom tasks at the moment commit 127cb2a3e7d81a6cf957cb896df0c2bdf4dd7f2b Author: Uwe Schindler Date: Thu Sep 12 10:06:48 2013 +0000 Minimum supported Java version, for now leave xml-apis in commit 463a0cc1797ec825e694937bd47edd7d6111e504 Author: Uwe Schindler Date: Thu Sep 12 10:02:08 2013 +0000 Remove obsolete JAR files commit 44e7fe3397d8f326049d2133c931c56044e72830 Author: Uwe Schindler Date: Thu Sep 12 10:00:51 2013 +0000 Use IVY to resolve dependencies commit e4f05a52b6d5ad769b09259af2f2f0ccf8d634b8 Author: Uwe Schindler Date: Wed Sep 4 15:57:45 2013 +0000 Remove more classes commit 47893f6a78a724a337d69d75436c34ab97d5e309 Author: Uwe Schindler Date: Wed Sep 4 15:53:11 2013 +0000 First cleanup: - remove search package.html - remove Jetty / Webapp / Java example commit 7ca32d689f5670176fec5fe36df96a7c7defe17f Author: Uwe Schindler Date: Wed Sep 4 13:11:59 2013 +0000 Branch for development with ES commit c9e7f755e4598c33a81e8a00d45212778c3fabca Author: Uwe Schindler Date: Thu Jun 27 12:10:18 2013 +0000 Update of javadocs patch macro commit 6bbeeea0199e48da0f20e553190811b20cc9d2c6 Author: Uwe Schindler Date: Wed Jun 26 18:54:49 2013 +0000 Fix javadocs frame injection bug, set encoding of source files and javadocs commit 38b726824a6bcb4a6ae9fbdc282fd28963a19549 Author: Uwe Schindler Date: Mon Apr 15 17:07:33 2013 +0000 test commit commit ea13aa66edb7771968b94636ffa023dfa00b4bc7 Author: Uwe Schindler Date: Tue Jan 29 13:51:40 2013 +0000 Update NEKOHTML commit 3f288bff3d189e08636bd6a79de6f08cd4f9a26c Author: Uwe Schindler Date: Tue Dec 25 13:00:52 2012 +0000 Update to Lucene 3.6.2 commit c6267dc4af8ec4050d14b8aa33c39bb77660d08e Author: Uwe Schindler Date: Sat Jul 21 20:42:23 2012 +0000 Update to Lucene 3.6.1 commit f98324b9d6c1f5255c830ed81675c9d902f0f16a Author: Uwe Schindler Date: Tue Jul 17 08:16:00 2012 +0000 Fix filtering of sets when OAI repository does not report sets assigned to metadata. 
Set filtering is only needed for static repositories and if harvesting more than one set for network repositories. commit 010949bd311487acb0a11109f727048fb5efb1f8 Author: Uwe Schindler Date: Fri Apr 13 21:59:02 2012 +0000 Lucene 3.6.0 final version update (from Snapshot used before). This will be the last Lucene 3.x version. commit e807bf10cc57a98c21bbd701d756904179a173fa Author: Uwe Schindler Date: Tue Apr 3 07:13:41 2012 +0000 Update Lucene snapshot commit 0f9f362df1e4e69af7307bbc13b9caffe1bddb6d Author: Uwe Schindler Date: Sat Feb 18 01:28:58 2012 +0000 Separate document boost from norms. The change is backwards compatible, but it's still recommended to reindex (./rebuild.sh) to make use of absolute boosts (also works with numeric queries) commit 19d6dec28cf0c75e1d3cb425b78eb8beb7ca7450 Author: Uwe Schindler Date: Fri Feb 17 18:46:54 2012 +0000 Update to 3.x branch and "remove" my own deprecations :-) commit c8be5de61276244ac361a64456c42b4b3b2802e0 Author: Uwe Schindler Date: Fri Nov 25 23:55:51 2011 +0000 Upgrade to Lucene Core 3.5.0 commit dd3aff2fc228fe4aa3c3867acee054b07c6d1c26 Author: Uwe Schindler Date: Thu Sep 15 07:13:00 2011 +0000 Upgrade to Lucene 3.4.0 commit be460ec74e25ca07e864a678ee7df011b65fbed7 Author: Uwe Schindler Date: Thu Sep 1 22:56:57 2011 +0000 update nekohtml to 1.9.15 commit 2cef3ec988598d13ed7bd2b289c31a66d808adef Author: Uwe Schindler Date: Thu Jul 14 14:46:04 2011 +0000 Remove deprecations, upgrade NumericField usage (with index backwards compatibility) commit 928f02b55e2aa0e3ec36fe08abf49c0d7898eaee Author: Uwe Schindler Date: Fri Jul 1 06:53:03 2011 +0000 Upgrade to Lucene 3.3.0 commit 317032ffe065a62ea6ebfe0db6788003c25f92c9 Author: Uwe Schindler Date: Fri Jun 3 15:42:02 2011 +0000 Upgrade Lucene to version 3.2.0 commit b9ea163277bd775eb73e73450ae53cd86a39e0e5 Author: Uwe Schindler Date: Wed Mar 30 17:28:08 2011 +0000 Upgrade Lucene to 3.1.0. Deprecation warnings will be fixed later! 
commit 26ed3f35343558ed932a31dc78b7d7cb1923d1d8 Author: Uwe Schindler Date: Tue Feb 1 14:12:15 2011 +0000 fix TODO.txt commit 63c30147bde346de626d2c5eeff96b09bb3c9f64 Author: Uwe Schindler Date: Tue Feb 1 14:05:09 2011 +0000 update Lucene's javadocs location commit 5b821eab11c9cb76551ea5b48eadea3147da618b Author: Uwe Schindler Date: Tue Feb 1 13:36:14 2011 +0000 Change version information for 1.1 branch commit 6e9efda46a2b75fd16a3fa0b2cd1438444391fb0 Author: Uwe Schindler Date: Mon Jan 31 18:54:04 2011 +0000 update xerces to 2.11.0 commit 96ea63660145bfe024338bce2d4baf85d13f00f8 Author: Uwe Schindler Date: Thu Dec 9 13:49:55 2010 +0000 update jetty and slf4j commit 6580f678001872d952ee76dd3bec4b055e5a5f67 Author: Uwe Schindler Date: Mon Dec 6 11:07:00 2010 +0000 Update Lucene to 3.0.3 commit 6aded52bb7ef2c6667e13961ab3935e2803bc397 Author: Uwe Schindler Date: Thu Jul 15 02:54:07 2010 +0000 update jetty commit 1038099c8a9f9d71a782cd681b49afac5a2c627b Author: Uwe Schindler Date: Mon Jun 21 23:20:43 2010 +0000 add XERCES 2.10.0 commit 2c30be5954ae9eb44c8855e802e745a3a23eec7c Author: Uwe Schindler Date: Fri Jun 18 16:17:27 2010 +0000 Update Lucene to 3.0.2 (released today) commit aec055f721873b9b2c053dd540496286a6ea652c Author: Uwe Schindler Date: Mon May 17 17:24:02 2010 +0000 fix lots of upper/lower case problems with default locale; update slf4j commit e69bde672d7c602eb426aa75ca021687715919cb Author: Uwe Schindler Date: Tue Mar 2 15:02:36 2010 +0000 update lucene to 3.0.1 commit e7e85a199c2ba920053ecdd896233a3cbc647227 Author: Uwe Schindler Date: Tue Feb 9 10:22:17 2010 +0000 initial version with warmer commit 17adcfdc61455016b51c40f10413b675deeb6fc1 Author: Uwe Schindler Date: Fri Nov 27 13:49:33 2009 +0000 Add support for indexVersionCompatibility (default is Version.LUCENE_24 for backwards compatibility). 
commit ed82009c506a93693ab077dd7e291766735f2eb2 Author: Uwe Schindler Date: Fri Nov 27 11:49:21 2009 +0000 First version for Lucene 3.0, that is still backwards compatible (it uses Version.LUCENE_24 for analyzers and query parsers). The support for compressed fields is preserved, but the index format for that changes. It is recommended to rebuild the index, because already compressed fields may suddenly get bigger before reindexed again (see Lucene 3.0 release notes). Later commits will have support for configuring the version number of analyzers and query parsers, until then it's fixed to LUCENE_24 for BW compatibility. commit ed1c2b2f4599cfbe31a8a59f5284fa075b0d00bb Author: Uwe Schindler Date: Mon Nov 23 08:48:51 2009 +0000 Change version number for trunk commit a74f99f728f799b3a496780875c52d44fce93999 Author: Uwe Schindler Date: Sun Nov 22 22:09:11 2009 +0000 Only use Lucene Core documentation commit 76708a12338b449199bb7ac74dd6c289f1877dc1 Author: Uwe Schindler Date: Sun Nov 22 16:28:25 2009 +0000 update some libs before release of 1.0 commit 29996857e9afac929bced2e9c19be559d799d962 Author: Uwe Schindler Date: Sat Nov 21 16:08:58 2009 +0000 add support for md5 and sha1 checksums during package build commit 9d4254a69f347addc8c92d42d58cfb3d575d9581 Author: Uwe Schindler Date: Sun Nov 8 19:10:47 2009 +0000 Upgrade to Lucene 2.9.1 commit fc117edd9f97dcfa427cdb5ddce41f89912449e4 Author: Uwe Schindler Date: Thu Oct 29 12:29:32 2009 +0000 fix javadoc generation commit 7fbdc916b6695726d23e863fe53e0a1ec5e42b43 Author: Uwe Schindler Date: Mon Oct 26 17:22:22 2009 +0000 fix typo and dead code commit af41b9e222ee3920d4b81d4c890a34475929db76 Author: Uwe Schindler Date: Mon Oct 26 17:18:10 2009 +0000 Auto do maxScore/Score for sorted results commit 5f1fafa5ffbb3871b32e9123a0250a98c52b06d7 Author: Uwe Schindler Date: Thu Sep 24 21:25:19 2009 +0000 Update to final version of Lucene 2.9.0 - ready to publish panFMP version 1.0! 
commit 677f8423443a8772af1ad819a8a608ce67a3da3c Author: Uwe Schindler Date: Sat Sep 19 00:03:48 2009 +0000 Update to Lucene 2.9.0-RC5 commit 0b8662a2ab8eff2cad1fa614091da5487641b2b8 Author: Uwe Schindler Date: Mon Sep 14 16:03:43 2009 +0000 upgrade jetty to 6.1.20 commit 0245091ea789e4a1826f8ea50dc51cf6a13be138 Author: Uwe Schindler Date: Sun Sep 13 15:03:25 2009 +0000 - Update Lucene to 2.9-RC4 - Ignore one deprecation warning (Field.STORE.COMPRESS related) commit fdbf36f6c0461a5e0a3788530c2801c9fb86bd82 Author: Uwe Schindler Date: Wed Sep 9 17:01:08 2009 +0000 Update to Lucene 2.9 RC3 commit a647b35e1088ffebe38100a3c2d8b6315fa17290 Author: Uwe Schindler Date: Tue Sep 1 13:23:23 2009 +0000 remove default stop-word list commit 23bcb0354917ddfccf541b756c9b6c46f4d128f9 Author: Uwe Schindler Date: Fri Aug 28 21:06:51 2009 +0000 Update Lucene to 2.9.0-rc2 commit 04bae9e70a07319ab5cdce003c45e05886b3f2db Author: Uwe Schindler Date: Fri Aug 28 08:14:52 2009 +0000 Update Lucene to 2.9.0-rc1 commit 1309df7ed66fcb5b332c1fc3b46599e38f86b0d0 Author: Uwe Schindler Date: Wed Aug 26 06:07:20 2009 +0000 Update Lucene to latest trunk (2009-08-26). commit f33d30f0da165fbfa9c62ca7f7e0b73bca2fa589 Author: Uwe Schindler Date: Sat Aug 15 06:48:14 2009 +0000 Update Lucene to latest trunk. commit 55f191210e3257f66f7bcb7a1255e1e59603ea60 Author: Uwe Schindler Date: Wed Aug 12 11:54:33 2009 +0000 Update XML file URL for COPEPOD example commit 38c16369873c7ff6454be13d397ba0a823e4fdaa Author: Uwe Schindler Date: Wed Aug 12 11:49:22 2009 +0000 Add support for additional XSL params in index configuration. They can be passed as attributes to commit a0b934a19806de5504570d65ad85db4e8f50e6ef Author: Uwe Schindler Date: Wed Aug 12 08:09:46 2009 +0000 Do not fail on invalid cookies, just print warning and ignore. commit 2fbe82a380a50fc7c0809600e3c16b295e2ca3d6 Author: Uwe Schindler Date: Tue Aug 11 22:10:42 2009 +0000 Add basic support for Cookies in OAI-/WebCrawlingHarvester. 
This is needed for GeoNetworkOpenSource (which sets a session ID needed later). Cookies are only recorded/enabled for the running thread and only when running affected harvesters. commit 962c9b31ca4df7a9dfd49d8035660aafa4b7dd55 Author: Uwe Schindler Date: Tue Aug 11 08:37:32 2009 +0000 - Update Lucene to latest Hudson trunk 2.9 build (the old query parser produces now a lot of deprecation warnings, but it is not yet sure if it gets really deprecated - I will fix this, as soon as Lucene 2.9 is released). Further work may be the move to the new QueryParser currently staying in Lucene Contrib - Update Jetty - Update Nekohtml commit 6809fe14dc0ddf8315802c5840dada938d33f9f9 Author: Uwe Schindler Date: Thu Jul 16 06:26:28 2009 +0000 - Use allowDocsOutOfOrder=false for collectors and remove the usage of SorterTemplate. - Fixes a compiler warning with the new harvester. - Update lucene-core.jar to the current trunk version. commit 3d84f07fb3dfe108190346e57674141888293cae Author: Uwe Schindler Date: Mon Jul 13 12:08:48 2009 +0000 Add a new harvester, that harvests foreign panFMP indexes (from another installation). The foreign indexes can use another XML schema and field structure, because a mapping can be done using XSLT (as with other harvesters). It is also possible to only harvest a subset of documents by specifying a query string. QueryParsers and Analyzers of the source index can be specified for that. 
commit 913910eb5110140591df42bc6f5537e27bc15161 Author: Uwe Schindler Date: Thu Jul 2 06:57:55 2009 +0000 update Lucene to latest trunk (fix some bugs) commit 3c60d8623c87c844cb18b48b0ec593f788f5bbb3 Author: Uwe Schindler Date: Mon Jun 29 08:21:45 2009 +0000 Update Jetty to 6.1.18 commit e95738ed2be34b5c1713735ab8a3c2ba58337adb Author: Uwe Schindler Date: Wed Jun 24 10:28:37 2009 +0000 JavaDoc update in DateRangeQuery commit 3a21a0fe009fa8efc4fdf0b011cec545bdd79b61 Author: Uwe Schindler Date: Wed Jun 24 10:26:13 2009 +0000 Documentation updates #2 commit 31eeafc870e9f997d5908c2b4f47d0f29d35b37a Author: Uwe Schindler Date: Wed Jun 24 10:24:26 2009 +0000 Documentation updates commit 22750c56acf5faf996425a46be3192be34a3e3f9 Author: Uwe Schindler Date: Wed Jun 24 08:35:53 2009 +0000 Add missing @Override commit 0771db3c5dbbc20bd1c5922708c0ff10259aad17 Author: Uwe Schindler Date: Wed Jun 24 08:01:26 2009 +0000 New Lucene Trunk Version, TrieRangeQuery is now in Lucene-Core (with new name, so contrib-queries is no longer needed). This commit also respects other Lucene API improvements/changes (Collector). commit a478704043362e8e0241fba451d6d1f642b4b430 Author: Uwe Schindler Date: Tue Jun 2 09:26:07 2009 +0000 - new Lucene JARs - Changes to directory implementations. It now supports AUTO to choose NIO on all platforms excluding windows commit f31cab7502db5d71740be441af6617a8c91215ae Author: Uwe Schindler Date: Fri May 29 16:58:41 2009 +0000 Use NativeFSLockFactory, which has no problems with local filesystems. May fail with NFS filesystems, which should not be used for Lucene. 
commit a238921118e2a5dc93495448b149e39344963af4 Author: Uwe Schindler Date: Wed May 6 15:36:31 2009 +0000 nicer toString() for dates in TrieRangeQuery commit 21217ebdc3b25c0b4570714be1e0f11254db4150 Author: Uwe Schindler Date: Sat Apr 25 21:58:38 2009 +0000 Remove usage of HitCollector and replace by Collector (new Lucene API) commit 397bd8668c3f60c9532fe22cfb44c1e916dda400 Author: Uwe Schindler Date: Fri Apr 24 07:47:19 2009 +0000 update lucene JARs commit 741d20109b30b4b9ae24dc193d2ede31ca8f6e21 Author: Uwe Schindler Date: Fri Apr 17 07:28:43 2009 +0000 update Lucene, some new deprecations appeared, must be fixed (Collector, COMPRESS) commit 4f8e0302b6fbd88d0e436de0152a6666e8e8f779 Author: Uwe Schindler Date: Sat Apr 11 20:18:09 2009 +0000 New Lucene TrieRange version (updated to Lucene Trunk 2009-04-10): The internal encoding of numeric and date fields changed in index again. You need to rebuild indexes using rebuild.sh/rebuild.cmd. If you do not do this, range queries will return no or only few results. commit 0e34fe25b36a02a745fdd0ef71240971188f7809 Author: Uwe Schindler Date: Tue Mar 31 12:41:15 2009 +0000 fix NPE in rebuilder commit 572700b6800cfd7f4cb731130ad0c5df8f01fa51 Author: Uwe Schindler Date: Fri Mar 20 23:10:13 2009 +0000 fix typo commit 63339e9e85654585b439d011ecd295ebdf2b962f Author: Uwe Schindler Date: Sun Mar 15 17:18:13 2009 +0000 update to Jetty 6.1.15 commit 1bbc48bef5e12607fa8b4e45e89d52e765148c7c Author: Uwe Schindler Date: Tue Mar 10 19:29:49 2009 +0000 Automatically optimize index after rebuild (even if config does not enable auto-optimize) commit 240072907e391ebd77a3ac4f8b4f26f8fe125f69 Author: Uwe Schindler Date: Mon Feb 23 07:47:43 2009 +0000 update lucene jars to latest snapshot commit 865628e2289463d36c5938546e42f7306e6a087e Author: Uwe Schindler Date: Sat Feb 14 10:56:31 2009 +0000 Again an update, that changes index encoding of numeric values. 
To use indexes created before this update, you have to reindex them (using the rebuild.sh/.cmd script). Datestamp metadata may get lost, but this is no problem. New features: - Update to snapshot build of Lucene 2.9, that has a completely reimplemented trie package. This change is not backwards compatible, because of that, you need to rebuild. - Change in config.xml: property numericalTrieImplamentation renamed to triePrecisionStep, the new variable contains the step in the bit precision when generating trie encoded numeric values. Default is 8 (as before, which was "8bit"). Now every number between 1 and 64 is possible, lower values create bigger indexes, but faster queries (see javadoc). commit 5c98fc16578c097ce6ef12a19dd4dac89cd072e5 Author: Uwe Schindler Date: Thu Feb 5 11:08:03 2009 +0000 add a static main() method to LenientDateParser for testing. commit b5b6242cae6eb3289f4a257174100b6edff830da Author: Uwe Schindler Date: Mon Feb 2 13:17:59 2009 +0000 Improved sorting in LuceneHitCollector (uses new SorterTemplate utility from Lucene 2.9 to sort two arrays in parallel). 
commit 13c917976501e493c1eaadc1345b33707cd4c2b9 Author: Uwe Schindler Date: Sun Feb 1 12:42:57 2009 +0000 Link trie package from Lucene in a better way in JavaDocs (makes update of final URL with build.xml easier) commit eeceaf801f1729a4706565feee7dee50f5eef06e Author: Uwe Schindler Date: Sat Jan 31 22:53:22 2009 +0000 link Lucene's Hudson Javadocs commit b7f8fc08a7dca60dbcb5de5deea7bca098fdee0e Author: Uwe Schindler Date: Sat Jan 31 22:29:55 2009 +0000 Improve Javadocs commit bf54387bf480879ca9ca26cd096531558466df43 Author: Uwe Schindler Date: Thu Jan 29 09:38:25 2009 +0000 Update Lucene JARs to Hudson nightly build: - new TrieRangeQuery version - optimized index reopen when sorting enabled on queries Other updates: - FSDirectory now configurable in config.xml - some improvements in config parser commit 7290e0e49412a6c29a60b2aef0b335658ef916f0 Author: Uwe Schindler Date: Sat Jan 24 00:42:53 2009 +0000 - extra check for score - no synchronization needed, as no MT search anymore commit d469056c684607f6de7197b2e55d02e6e4caa031 Author: Uwe Schindler Date: Thu Jan 15 08:37:32 2009 +0000 Do not use norms and tf for string and numeric/datetime fields. To support this for numeric and datetime fields, lucene-queries.jar is also updated. commit bca5c57dbecf0af9c5ca45ded407b2d2033b462a Author: Uwe Schindler Date: Tue Jan 13 00:50:45 2009 +0000 Update some libs: - Digester to 2.0, now for Java 1.5, small changes in code for that - nekohtml to 1.9.11 Remove libs: - commons-collections-3.2.1.jar Small fixes in config etc.
commit 1528246e62679eb21902a543c45349ba81cdc6dd Author: Uwe Schindler Date: Mon Jan 12 08:46:43 2009 +0000 Update Lucene to development snapshot of 2009-01-12 (includes efficient sortable numeric/datetime fields and TrieRangeQuery optimization) commit 30fe4d79492cb1ccf13cf5547d8f3745674c543c Author: Uwe Schindler Date: Sun Jan 11 11:56:27 2009 +0000 - remove copyright year from source file headers - change year to 2009 in documentation and build files commit 9c23e92aa35487214cdfe970cba9860e4caf9190 Author: Uwe Schindler Date: Fri Dec 5 09:27:22 2008 +0000 replace lucene-queries-2.9-dev.jar by hudson version commit e0f49e129ae8f21dccc671c9743c85b9841dd2fa Author: Uwe Schindler Date: Thu Dec 4 15:27:15 2008 +0000 fix compile error in example. commit 282396fba68e0103ca8cca6512d073c600d453a3 Author: Uwe Schindler Date: Thu Dec 4 14:09:11 2008 +0000 !!! WARNING !!! Backwards incompatible change! TrieRangeQuery was given to Apache Lucene as a contrib package (see https://issues.apache.org/jira/browse/LUCENE-1470). During the move, it was optimized, the trie encoding was changed to be more compact, and you now have the possibility to tune search speed by using more indexed precisions (using more disk space). This patch removes TrieRangeQuery and TrieUtils, adds a new dependency to the not yet released version of lucene-queries.jar contrib (version 2.9-dev, built locally/by Apache's Hudson). The backwards incompatible change is the use of a new trie encoding in the index. Indexes created with earlier versions of panFMP no longer work when used with numeric/datetime fields. To make them work again, you can reharvest them after dropping or use the index rebuilder (rebuild.sh/rebuild.cmd). If you rebuild the index, datestamps of the metadata get lost (it will print out a warning for each document). As the metadata datestamp is not used anywhere in panFMP, this is not a problem. If you do not rebuild/reharvest indexes, you will get spurious NumberFormatExceptions.
Other fixes: - This patch also fixes sorting of numeric fields (now possible again). A further patch/issue (not yet done) for Lucene will do sorting not string-based on the encoded trie values, but use a more memory effective FieldCache of longs. - Throw correct exceptions in SearchService (copy/paste error) commit 377c5b3aeae1f37cd1accaeec8936b12614ec30a Author: Uwe Schindler Date: Sat Nov 29 22:39:15 2008 +0000 upgrade jetty to 6.1.14 commit 1f1f0539e623d6ac1801187b70a7161e1018b4df Author: Uwe Schindler Date: Fri Nov 21 13:21:42 2008 +0000 rename Hash to UUID in axis webservice commit f4b4bbd409751dc932956efe3bb5794a889848fc Author: Uwe Schindler Date: Fri Nov 21 11:55:09 2008 +0000 Add more documentation in comments to the default config file commit 6a8f53232542d92ed8ebdc3a6fe5198a2e369f7a Author: Uwe Schindler Date: Fri Nov 21 10:37:42 2008 +0000 Incompatible change in SearchService API: - storeQuery() now returns a UUID instead of a String. You may need to change your code, see documentation. - readStoredQuery uses UUIDs, too. commit d0d08acb2cc2e94cbcd302f882cd37d2c47182eb Author: Uwe Schindler Date: Thu Nov 20 14:41:34 2008 +0000 Show the element in example config.xml commit 2ac8088477ee0b5296c653285289356784290077 Author: Uwe Schindler Date: Sun Nov 16 22:48:50 2008 +0000 ExtendedDigester changes: - Remove usage of a Stack in favor of a linked List for the namespaces - Refactor replaying of prefix mappings for SaxRule - New access methods, unneeded ones removed commit 48e7351cdf3453642f156ba945f97be8ca52a131 Author: Uwe Schindler Date: Mon Nov 10 17:33:23 2008 +0000 some cleanup with sax parser for differentiating between xinclude (Config) and not-xinclude (elsewhere) commit a3b2842a8a26769147eed79a2778c9c465ca9754 Author: Uwe Schindler Date: Sun Nov 9 22:16:57 2008 +0000 Use new parent Config in SingleIndexConfig constructor to initialize harvester properties without the special class InheritedProperties (removed). 
Do the index check at end of config loading. commit 975ab475c7de26a593feee0a1c04ea1c94a8aed9 Author: Uwe Schindler Date: Sun Nov 9 17:21:36 2008 +0000 - add some more final declarations - reset the namespace map in digester on clear() and startDocument() to have always a clear document start without unneeded prefixes to be declared commit e702617468c904965bdc220f2c97a436331e6b0c Author: Uwe Schindler Date: Sun Nov 9 14:02:59 2008 +0000 - Rename "href" attribute to "src" - Remove unneeded import commit 4d2a783fc950984d36b3c99da594bf09e446d5f5 Author: Uwe Schindler Date: Fri Nov 7 15:56:09 2008 +0000 make some fields in configuration final commit f5294c09b19d4b991dbd71b205092b505d7c214a Author: Uwe Schindler Date: Fri Nov 7 14:56:36 2008 +0000 Possibility to set the XSL template in index configuration by a simple href attribute or by including the template as before. Both possibilities are supported, short templates may be directly included into the config document or given by filename, which is optimized, when you always have to set the same template, which is cached. There was also some refactoring, the parent element of IndexConfig is directly set in constructor. 
commit 2e044972d3a82dec6cd37de777b6631e3dd1d270 Author: Uwe Schindler Date: Fri Nov 7 08:39:11 2008 +0000 - Centralize & uniform trimming of harvester properties and search properties - Some cleanups in Config code commit 46a1932b789d5a1ed586075802f370358dcaf3e4 Author: Uwe Schindler Date: Fri Nov 7 01:07:49 2008 +0000 remove unneeded Axis Ant file commit 09dd86b97ee1966573f1a11590b38d93930ea977 Author: Uwe Schindler Date: Fri Nov 7 00:03:06 2008 +0000 Update to Jetty 6.1.12 commit d687a19d3816143171ed4fdd240063907d086728 Author: Uwe Schindler Date: Thu Nov 6 20:02:44 2008 +0000 Fix NPE in Axis Webservice commit 8dd220261b126208760706d743515d2ee09404b2 Author: Uwe Schindler Date: Tue Nov 4 20:09:45 2008 +0000 - replace empty datestamp variable by "" - some checks added - optimized and unmodifiable set/map constants commit a7521319beaf43db90f1689669063544f3f0664f Author: Uwe Schindler Date: Tue Nov 4 12:32:07 2008 +0000 Fix bug with empty datestamp and rewrite variable registration for XMLConverter commit a59c7531fe31a57ee57f68a17f8a2ce82cd91b12 Author: Uwe Schindler Date: Tue Nov 4 11:58:07 2008 +0000 - refactoring the XMLConverter - new methods for checking last modified datestamp - factory for MetadataDocument inside Harvester. commit beb1e0c07fa185d8582e2ae0d08ce491afa472e5 Author: Uwe Schindler Date: Tue Nov 4 00:26:46 2008 +0000 - Missing index builder variables in TransformerHandler - add datestamp variable - use the final identifier as identifier, not the source systemId commit 97fb11eba73fcffbe2c5ff9f9f37bad0b97f9705 Author: Uwe Schindler Date: Mon Nov 3 18:02:44 2008 +0000 New feature: Set index builder variables also in XSL for transforming metadata. Currently this works for all ib:-variables, but not the date stamp. 
commit de9215499d1a3ea94f3db54af1ab469fb4ff9569 Author: Uwe Schindler Date: Mon Nov 3 15:58:45 2008 +0000 wrong variable in initializer -> NPE commit a45c51b63238d274189237abc14b35ec6583c688 Author: Uwe Schindler Date: Thu Oct 30 22:50:29 2008 +0000 add missing formats commit 4663b1981fe28e822f52ad8c04788bb7697c10fb Author: Uwe Schindler Date: Thu Oct 30 17:44:01 2008 +0000 - Use UTC instead of GMT (should not change anything) - Remove German Date/Time formats from LenientDateParser - Correct order of Date parsing commit b55f16491da9aca35eb61e0d202b69db265d9514 Author: Uwe Schindler Date: Thu Oct 30 00:51:40 2008 +0000 fix deadlock in IndexBuilder. Problem was "unconditioned wait". commit 9940cbfc68c08baefe0b05bd58215452906a245d Author: Uwe Schindler Date: Wed Oct 29 18:10:45 2008 +0000 revert last commit (this will not work correctly) commit 1f640fd8ac6645d93244872934347f2889193652 Author: Uwe Schindler Date: Wed Oct 29 16:55:14 2008 +0000 cleanup sessions in LRUMap and cache background task commit 7abc239bd57c7b8bf6ccc0f8fad77a1b08b4d8eb Author: Uwe Schindler Date: Wed Oct 29 10:47:19 2008 +0000 Optimize and compact StringBuilder appends. commit 40a34ca893fb9fd74b3cd9fec66b5588386b8b5c Author: Uwe Schindler Date: Thu Oct 23 12:47:52 2008 +0000 fix encoding issue commit 31171ea628734183223b206171d9a77ffc3afff4 Author: Uwe Schindler Date: Thu Oct 23 08:33:26 2008 +0000 Add missing javadocs in utils package. Some small changes in code of ExtendedDigester. commit 5d6298d606056c7dc6f830c46d3e7ffd8eca7055 Author: Uwe Schindler Date: Wed Oct 22 21:31:41 2008 +0000 Add JavaDoc for TrieUtils and TrieRangeQuery, cite the paper. 
commit 5bb6f1d44e79a3dd5e483f02a26b02c27fa02b3a Author: Uwe Schindler Date: Wed Oct 22 17:40:42 2008 +0000 Update of Todo list commit 6b9045241b3c1dcb7ed9b6d5d6af5b5984ba4483 Author: Uwe Schindler Date: Wed Oct 22 12:37:09 2008 +0000 small fixes in examples (names, incorrect web.xml, readme) commit 202a1e24b85da3ff7fdb98a09821fb3f7c4b82e5 Author: Uwe Schindler Date: Wed Oct 22 06:36:14 2008 +0000 slf4j update commit cb9cdd978be18c2efc1f026ad9c312ede323d9dc Author: Uwe Schindler Date: Tue Oct 21 12:33:26 2008 +0000 Add examples to panFMP distribution: - a PHP example using SOAP API - two Java Servlets (Paging and Collector API) Both examples use the example configuration (DIF metadata) and have XSLs to map to HTML. commit 1755f178b5465bee39141f20f19560e217b3dc65 Author: Uwe Schindler Date: Fri Oct 10 22:24:20 2008 +0000 Update to final release of Lucene 2.4.0 commit 366a539df8877149078172bb1e33922d818ab692 Author: Uwe Schindler Date: Wed Oct 8 16:30:41 2008 +0000 update nekohtml (1.9.9) and slf4j (1.5.3) commit 86e813aa2cb672cc6d806a6f440ebcfe8e17b824 Author: Uwe Schindler Date: Wed Oct 8 13:28:43 2008 +0000 rename a harvester property, as changes to the index are now only committed at the end of harvesting. This change is not backwards compatible, you may have to edit your config file: 1000 -> gets: 1000 commit a2ba9bf203b78b368ff59891eeb98cfe45d47aca Author: Uwe Schindler Date: Wed Oct 8 09:12:25 2008 +0000 small refactoring with LoggingErrorListener commit 58c21d5d58779574663dd4149de2bfbbdf60afcd Author: Uwe Schindler Date: Fri Oct 3 11:53:07 2008 +0000 small performance optimization: use System.currentTimeMillis() instead of new Date().getTime() commit a736dc0927a1cb2fd3c82246b21cfc389a37befc Author: Uwe Schindler Date: Wed Oct 1 16:16:23 2008 +0000 use remove() in ThreadLocal commit d842024f96df76dd81c2c44dc75513331ee72df6 Author: Uwe Schindler Date: Tue Sep 30 09:01:50 2008 +0000 disable saving of TF for trie fields.
commit e5bc38e300a490a95d536b00afad8fe57826e297 Author: Uwe Schindler Date: Fri Sep 26 20:14:58 2008 +0000 Add JavaDoc #2 commit 08edc7db285c26fa992074ff29ee768be80c9657 Author: Uwe Schindler Date: Fri Sep 26 09:27:45 2008 +0000 Add JavaDoc. commit 586207b31e42bf165366f98599f81c7b293f8d25 Author: Uwe Schindler Date: Thu Sep 25 15:51:19 2008 +0000 update to Lucene 2.4.0-rc2: - fix Checker - fix deprecation of autoCommit=false with IndexWriter ctor. commit ea63984a92df592449bcf2a7b4d6f1e5f6cdb057 Author: Uwe Schindler Date: Thu Sep 25 14:07:41 2008 +0000 javadoc fix commit bffe3dd4fb828d48a68a9f382c12230c6ee385ef Author: Uwe Schindler Date: Thu Sep 25 14:02:29 2008 +0000 some final optimizations in AutoCloseIndexReader commit b70307aa2328d3eb1e73f355c90c5df4f1a24448 Author: Uwe Schindler Date: Wed Sep 24 13:41:04 2008 +0000 New implementation of LuceneCache with TimerTasks (cache cleanup in background): - new search config variables - index readers are kept open after reloading for configurable time or until GC removes them - use WeakReference to hold old readers in IndexConfig commit 4b5a1f064738530be0261463e0eac11dc9de904a Author: Uwe Schindler Date: Wed Sep 24 08:52:28 2008 +0000 remove "lastharvested" index file if CheckIndex finds error commit 825a472f63a20d186f356902e296eaa389dafbd8 Author: Uwe Schindler Date: Wed Sep 24 08:25:13 2008 +0000 remove empty lines from logging PrintStream commit 0366074359a3ce9f4f19ce16ade96cedbe57b7f7 Author: Uwe Schindler Date: Tue Sep 23 21:15:37 2008 +0000 correct fixing of index commit 5181ff94d8eabb8707939c13ed70211fad6fff19 Author: Uwe Schindler Date: Tue Sep 23 17:54:38 2008 +0000 - rename ReadOnlyAutoCloseIndexReader to AutoCloseIndexReader - fix Checker for Lucene 2.4 (may change when final version is out, see https://issues.apache.org/jira/browse/LUCENE-1402) commit a015c306abe5475c060c0ac3e3a50a9b565530c1 Author: Uwe Schindler Date: Tue Sep 23 14:32:19 2008 +0000 - upgrade to Lucene 2.4 (RC1) - some restructuring - not
backwards compatible renaming of IndexConfig methods commit d8e05f3c89b77e65cdf58573b0c45951068d0638 Author: Uwe Schindler Date: Tue Sep 23 08:33:41 2008 +0000 force IndexWriter to optimize synchronous commit d3c56abe21c1d8308d5cbc1d6333f42edb377a7b Author: Uwe Schindler Date: Mon Sep 22 18:16:01 2008 +0000 fix some small leaks commit 80d504ead3ac1d72ab31887a1ed8e39c0480f9d1 Author: Uwe Schindler Date: Sun Sep 21 20:41:05 2008 +0000 new code to keep reopened or closed IndexReaders open until finalization. This helps preventing the problem, that one thread reopens all index readers at the same time another thread does a search, which then crashes. Hope, this is bug free, may need some testing. commit 8dc22a824fd98b20f31a4158c28f5ad7cc990064 Author: Uwe Schindler Date: Fri Sep 19 10:32:21 2008 +0000 doc update commit 0c40cd96979ad9b686ca75bf16b80ca0bb0712c6 Author: Uwe Schindler Date: Fri Sep 19 10:22:27 2008 +0000 support change of default query parser operator commit e3c2f484e8258f8a18838f6a055df2bea1314a4c Author: Uwe Schindler Date: Tue Sep 16 08:03:41 2008 +0000 hide constructor of LogUtil commit 18feacdd6ef6aaaec982fbd6cdd07424ccba8efe Author: Uwe Schindler Date: Sun Sep 14 10:28:12 2008 +0000 change log messages commit 65195da9227d8f429c86e3da955f9333865bd543 Author: Uwe Schindler Date: Fri Sep 12 17:33:51 2008 +0000 remove ZipFileHarvester TODO items (as already done). 
commit 44550f87db014a019c799cbbae2e4f280ee3c7eb Author: Uwe Schindler Date: Fri Sep 12 12:31:13 2008 +0000 change date stamp handling of ZIP file commit 1e755fea573910381a280aa59931003a23331478 Author: Uwe Schindler Date: Fri Sep 12 10:36:18 2008 +0000 fix some inconsistency with datestamps commit 44c6d4303e2d6ac1c1e5aa3aad9c86fc6827c92b Author: Uwe Schindler Date: Thu Sep 11 13:54:20 2008 +0000 fix doc bug commit 614537d0dd44bff0d42630f089b704134a873a89 Author: Uwe Schindler Date: Thu Sep 11 10:26:22 2008 +0000 - add missing system identifier (for error messages) in ZipFileHarvester - fix some StringBuilder mis-use commit 0e89e2ea85d116886a32387377f3ccdfe8c1f903 Author: Uwe Schindler Date: Tue Sep 9 17:43:13 2008 +0000 New Harvester: ZipFileHarvester (reads files from ZIP file/URL) commit e5ec23de6830d087e6e98152a13f09427e818290 Author: Uwe Schindler Date: Mon Sep 8 08:38:59 2008 +0000 - Change parameter parsing ("*" optional) - Possibility to detach Jetty, reset default to console.log.properties for debugging - build.xml: move deleting of log files to other node commit d56fbb3306ae3bcbd02584b53039b64a4b1a395a Author: Uwe Schindler Date: Sun Sep 7 16:17:06 2008 +0000 - Update of some Jakarta Commons components - Update Nekohtml - Update license infos with URLs and the SLF4J MIT license commit 2b17e9dcb238cc50a8d53b697902b9fd1212d1a7 Author: Uwe Schindler Date: Sun Sep 7 13:40:40 2008 +0000 Bundle Jetty as webserver for Axis commit 977b5e4c5d03933437ad4b40771a8506d0ea3dd2 Author: Uwe Schindler Date: Sun Sep 7 12:35:28 2008 +0000 - Use "exec" in scripts - README.txt: add note about parameter parsing commit 47377aad0a1190e8ace4435fe1d5426d968bad19 Author: Uwe Schindler Date: Sun Sep 7 11:39:55 2008 +0000 preserve exit code in withlock.sh commit 8dc804eabf6fab289ec65928c4cbcf89c50a12b1 Author: Uwe Schindler Date: Sun Sep 7 11:38:07 2008 +0000 - Locking mechanism for cronjobs - some script names changed - remove "export" in config.sh - change location of harvest.log 
file commit 99c1981465f25e0d40f436d586dff46d99e4ccc4 Author: Uwe Schindler Date: Sun Sep 7 09:34:11 2008 +0000 make config.sh.inc posix shell compatible commit c68c9e25bbb83527352f487a832e6d2cc178dc1b Author: Uwe Schindler Date: Sun Sep 7 09:23:09 2008 +0000 Fix build script, to not delete the empty lucene-store commit 927f59cf30d04c977e622b0c4647437fa12fe4c1 Author: Uwe Schindler Date: Sun Sep 7 08:51:13 2008 +0000 Fix build script to include new repository dir in binpackage and delete lucene-store on clean commit 575c15b9d69acd5f9abf770ea5e71ad474d89ab2 Author: Uwe Schindler Date: Sat Sep 6 22:23:47 2008 +0000 first version of new directory structure with configurable scripts commit ddfbd795354bf42262be09aeb8f1c75099e44f64 Author: Uwe Schindler Date: Sat Sep 6 12:01:09 2008 +0000 remove externals commit 1b50b797c614ceb780b07665351dd1494c470a53 Author: Uwe Schindler Date: Sat Sep 6 11:30:16 2008 +0000 move external libs directory to trunk and delete it commit b27b1348f13155ec64733fbf1c3a4e027655c9e5 Author: Uwe Schindler Date: Fri Sep 5 18:11:46 2008 +0000 Add logging of index ID when collecting results commit bb2fc9d9d1675cbb0c023ff2fd9c6e9d5150ce9e Author: Uwe Schindler Date: Fri Sep 5 17:52:27 2008 +0000 - add new script with class for checking indexes - cleanup .sh files commit 91d6c2251c56b7e11bc23288eb816cc147cee66c Author: Uwe Schindler Date: Fri Sep 5 10:59:40 2008 +0000 - maybe fix the annoying reopen bug...????
- add log to IndexConfig classes commit 3b85798fa3bd53382948d619260a5a1366f4ce62 Author: Uwe Schindler Date: Tue Sep 2 09:34:29 2008 +0000 Big renaming operation of LuceneConversions: - new name: TrieUtils - method names changed Should not bring changes for normal client code commit d05101501db2f92d0848e474436cc90249feeb07 Author: Uwe Schindler Date: Thu Aug 28 12:29:07 2008 +0000 - Update citation of C&G article - Make variables/parameters final in LuceneConversions and TrieRangeQuery commit 701ef78df95480321698e92e4d2f44fb7743990e Author: Uwe Schindler Date: Sun Aug 3 22:57:59 2008 +0000 fix small bug: during interruption of converter thread, the interrupted exception is not handled correctly commit e66f9fb12a1a39d00bb01600aec6bc52740b329d Author: Uwe Schindler Date: Fri Jul 4 08:44:30 2008 +0000 documentation fix commit 2942e9428a98e80ef9185dd435fdd60a0a140eab Author: Uwe Schindler Date: Thu Jul 3 18:21:28 2008 +0000 fix memory leak because InflaterInputStream & Co use native library and should be closed directly after using and not by garbage collector. See http://bugs.sun.com/view_bug.do?bug_id=4797189 commit 132593df8a9eaa827b8c7362abe7e673abd8b9aa Author: Uwe Schindler Date: Fri Jun 20 00:13:37 2008 +0000 more elegant dom tree enumeration commit e46e8e020adeb449105eefb38ec69dc185f78287 Author: Uwe Schindler Date: Thu May 29 06:27:33 2008 +0000 remove unneeded package prefix (package is in imports) commit 9cb1aeaa00025a1ab9e5940833e36d5a446fd68c Author: Uwe Schindler Date: Wed May 28 13:58:55 2008 +0000 remove unneeded synchronization for sessions (Collections.synchronizedMap() uses different mutex, not itself). commit 17c6d1535033686750cdb9c5559ecbc2ba123b72 Author: Uwe Schindler Date: Tue May 27 18:41:15 2008 +0000 javadoc update (author missing in new file) commit 30237a33012076d69ada3e0231fae9520551bfcb Author: Uwe Schindler Date: Tue May 27 09:47:24 2008 +0000 Make harvester more fault tolerant on conversion errors (e.g.
NumberFormatException during XPath). The default is still to stop conversion (important for example if XPath Queries are faulty). When configuration is "tested" it can be switched to ignore conversion errors or delete all faulty documents. commit d86a629d58175534323a26788422fa2077f6ad2f Author: Uwe Schindler Date: Fri May 23 06:29:33 2008 +0000 synchronization added commit 12739a055d5842a91fdd89d9f58fc307ba8b3765 Author: Uwe Schindler Date: Thu May 22 21:41:35 2008 +0000 make SearchResultList members private (not needed outside anymore) commit aa3a056e9bc16aa8358b813878d53ee61f84f981 Author: Uwe Schindler Date: Thu May 22 21:39:13 2008 +0000 get size of SearchResultList list with IOException (outside List-code) commit b657135979dde03db3a8be18f0be6bdfb6f139df Author: Uwe Schindler Date: Wed May 21 13:44:16 2008 +0000 cache factor fix commit 2f5f9f28a1df12358dab5cd81ba8610e0b78e4cf Author: Uwe Schindler Date: Wed May 21 13:16:25 2008 +0000 again fix a bug in new Hits implementation commit f1cf7b523086d1e200f4215d2779f534ffc12806 Author: Uwe Schindler Date: Wed May 21 13:10:33 2008 +0000 logging commit d7f279c71daa7f146ce20f1617eab99ff5d1112b Author: Uwe Schindler Date: Wed May 21 13:08:47 2008 +0000 optimizations commit 7fd417ea6e083d9a081978c2ddc9decff65d9459 Author: Uwe Schindler Date: Wed May 21 12:42:37 2008 +0000 remove soon deprecated "Hits" usage commit 0db382a545231d177981403a173030cd229ba136 Author: Uwe Schindler Date: Wed May 21 09:26:16 2008 +0000 fix uninitialized variable commit fdbe50ea46e35950f10e2ceaf74f329fd5f160d2 Author: Uwe Schindler Date: Mon May 19 09:25:45 2008 +0000 use java.concurrent.locks.* for locking and implement timeout commit 04bb95b8313c9accd7c3bc92bcb9d71f57429d75 Author: Uwe Schindler Date: Sun May 18 12:19:43 2008 +0000 hopefully fix a deadlock... 
commit 28d4d4b04c1b623df297068766d2f8f88146f8b6 Author: Uwe Schindler Date: Wed Mar 5 22:23:55 2008 +0000 rename tag for augmentation during validation commit a0465e236d769d2643c9cb9ac96e2b0a3b1eb5e4 Author: Uwe Schindler Date: Wed Mar 5 13:06:36 2008 +0000 - Add possibility to not augment documents during validation (default is to do it) - Add search property with QueryParser class commit ebf4b1d81ec70f6f55140c7ab82d3387d48167a7 Author: Uwe Schindler Date: Tue Jan 22 23:20:03 2008 +0000 bug in index reopening => restructure again :) commit ea929a52ce8bbf4bf030de0f18542399fba3539b Author: Uwe Schindler Date: Tue Jan 22 22:37:35 2008 +0000 restructure IndexConfig & Co. commit 9bbc9471152fb2d6836398328d35faaebd22f5b8 Author: Uwe Schindler Date: Sun Jan 20 12:28:39 2008 +0000 update version checking commit 13597baaa0b6ada37f29353b6c1db893fccc349c Author: Uwe Schindler Date: Sun Jan 20 01:05:20 2008 +0000 - new version comparison - check digester version commit 7e3e9b5bc2f02054bfa69e069c4d24ccae1645b4 Author: Uwe Schindler Date: Sat Jan 19 23:04:53 2008 +0000 again the enums... 
;-) commit a7044caa81a5af786070296c2aa85a49856e0345 Author: Uwe Schindler Date: Sat Jan 19 10:37:39 2008 +0000 prevent a NPE commit 081836d93ca7121447f70648e19a6a32e8e4eed5 Author: Uwe Schindler Date: Fri Jan 18 13:36:13 2008 +0000 small bug in calculation of next harvesting datestamp in SingleFileEntitiesHarvester commit e3d5629a23f87f0455597fcea1813188af470878 Author: Uwe Schindler Date: Fri Jan 18 13:30:58 2008 +0000 AXIS support for MoreLikeThis commit 9fe6f818808b7a305689553ac158d30a0c603069 Author: Uwe Schindler Date: Fri Jan 18 13:10:30 2008 +0000 MoreLikeThisQuery priorityqueue refactored commit 243a1be16f8931fbc2d3010d9462dbdff6d14754 Author: Uwe Schindler Date: Fri Jan 18 11:03:33 2008 +0000 - documentation error - w/s commit e7d6e51e01d2a2605b0e9839ddacd6a00e95cbdd Author: Uwe Schindler Date: Fri Jan 18 00:28:37 2008 +0000 - remove usage of EnumSet, it is simpler and faster another way - rename entry in SingleFilesEntitiesHarvester.ParseErrorAction commit 09b9817b7a7b14fe89f2e3718c410755c509f05d Author: Uwe Schindler Date: Thu Jan 17 23:49:55 2008 +0000 Cleanup code, made harvester API more clear commit 241cc28dae0b937e5c41b3b500dba196fb5f79ac Author: Uwe Schindler Date: Thu Jan 17 16:04:06 2008 +0000 parseErrorAction values are now checked against the underlying enum and a meaningful error message is printed on harvester startup.
commit 3cd98094908687be479990a032b0f67d0c3d1b45 Author: Uwe Schindler Date: Thu Jan 17 15:49:59 2008 +0000 remove dead code part commit e8b5570cfbd01fe31f21a99532cea46180bf0c8e Author: Uwe Schindler Date: Thu Jan 17 14:17:48 2008 +0000 little update in harvester error handler commit 083ec0ed940a65dd88dca2d1565a90fc48b90c7f Author: Uwe Schindler Date: Thu Jan 17 14:06:38 2008 +0000 - Changed Harvester interface to differentiate between clean and unclean shutdown (simplifies handling of index properties like lastHarvestDate and validIdentifiers) - New abstract SingleFileEntitiesHarvester as superclass of WebCrawlingHarvester and DirectoryHarvester that manages similarities and parsing of each file entity with better error reporting. Both harvesters are now able to ignore corrupt XML files during harvesting. - Better error handling in Harvester (SAXParseExceptions and TransformerExceptions are logged with location info commit c14d01b1aaa03eac7de4526385c6d8899b26e485 Author: Uwe Schindler Date: Wed Jan 16 21:36:22 2008 +0000 reformatting commit 6256a5799db4a9d98a078272c2b4e91b39838e15 Author: Uwe Schindler Date: Wed Jan 16 14:10:07 2008 +0000 Add support for "More like this" queries to SearchService. The AXIS implementation is still missing this, but will be added later. commit 209d4e03dd6939aeaca38602000a4e3f8c0dce09 Author: Uwe Schindler Date: Sun Jan 13 23:53:16 2008 +0000 use boost in TrieRangeQuery for hashCode(), toString() and equals() commit 6a817e703fb476051bda7a97bdbfe2b699921b0e Author: Uwe Schindler Date: Sun Jan 13 23:34:08 2008 +0000 Javadoc problem. 
commit d72ce9c96e710bca1d208e24b4c4191ef45a09b0 Author: Uwe Schindler Date: Sun Jan 13 21:47:46 2008 +0000 Make TrieRangeQuery final commit 4a5729fcae15a1110af134f529c9bc2feb24fa68 Author: Uwe Schindler Date: Sun Jan 13 21:37:33 2008 +0000 small StringBuilder optimization commit 01c90f97dbbda0935d9ff6f487f8be26b0043708 Author: Uwe Schindler Date: Fri Jan 11 10:42:59 2008 +0000 - remove support for threaded virtual indexes (as this causes more problems and is not optimal for heavy sites with many indexes per virtual index) - refactor code of index configuration - cache MultiReader in virtual index for sort performance commit 27c8c4d1fd31397d1d5f25a609f11c2f2df771dc Author: Uwe Schindler Date: Fri Jan 11 08:34:49 2008 +0000 for non-threaded virtual indexes (recommended in most cases) use an IndexSearcher over a MultiReader instead of a MultiSearcher over separate IndexSearchers commit 81af4ea4a8f9cc9b055a50dc18d123eb4f53926a Author: Uwe Schindler Date: Thu Jan 10 21:45:09 2008 +0000 - reopen corrected (close old reader after reopening) - made closeIndex() abstract commit 3736dc5c760a9e11396bd43ceb7f1460129d21e3 Author: Uwe Schindler Date: Thu Jan 10 14:33:09 2008 +0000 missed an error message rewrite commit 33be73aee766fde5696eeb94cdebfbdf193d0aa0 Author: Uwe Schindler Date: Thu Jan 10 14:30:31 2008 +0000 switch to RC1 of Lucene 2.3, which seems stable. As soon as the final version is released I will replace the file again. This commit also uses the new IndexReader.reopen() method to reopen indexes more quickly after a parallel harvesting.
commit 3a483db3f2868ddc0632e2254855a888dd79bddf Author: Uwe Schindler Date: Wed Jan 2 21:46:07 2008 +0000 bump year commit 63de9506ff409235bdcc7c2cbca5996df374f28f Author: Uwe Schindler Date: Wed Jan 2 21:35:37 2008 +0000 - Support of OAI harvesters and WebCrawlingHarvester for a connect/read timeout - Refactor code (remove OAIDownload class) - Remove recursion in download retries commit bcbda97a2e3c0a0dce8c96b5dd3e19f0cca3135f Author: Uwe Schindler Date: Sat Dec 29 10:13:12 2007 +0000 better logging of new session (with index) commit 876ca473996267e93925c97ea63c3a8d33a754bf Author: Uwe Schindler Date: Thu Dec 27 10:19:46 2007 +0000 more effective casts: primitive to objects commit 101a4e1ddcd4ef1aa70cefdfc649d38aa6923475 Author: Uwe Schindler Date: Thu Dec 20 19:08:36 2007 +0000 enable logging in optimizer writer, take #2 commit 4eabf3e640a2aed811986d508b458b3e7dfc7cad Author: Uwe Schindler Date: Thu Dec 20 18:44:12 2007 +0000 enable logging in optimizer writer. commit e37aec0334519ac3aca3388c67210cd44955bf43 Author: Uwe Schindler Date: Thu Dec 20 13:45:58 2007 +0000 * Better logging of conversion errors in indexer * Only store the first error in background threads as failure, later ones are simply logged. commit c1b206987b3d9cc8c795147fd365a2f19fdbcbf4 Author: Uwe Schindler Date: Thu Dec 20 10:57:23 2007 +0000 update nekohtml parser. 
commit 1a1f89222d42458bf331fc3d3266aa31f7f2303e Author: Uwe Schindler Date: Tue Dec 18 10:32:39 2007 +0000 add URL exclusion filter to WebCrawlingHarvester commit 296beae8c41828d7f7932285496bdd6c2bb0ae40 Author: Uwe Schindler Date: Tue Dec 18 09:58:01 2007 +0000 - Add support for term vectors (no search support for that until now, but indexes supporting them can be built) - Renaming of FieldConfig variables (remove lucene*) commit 3b31c2a614481dfd96ffef88de870f98639ca295 Author: Uwe Schindler Date: Thu Nov 29 08:13:21 2007 +0000 Optimization in session creation commit 0172a00fd65c78cad4292ad18665104141b274e8 Author: Uwe Schindler Date: Wed Nov 28 23:15:51 2007 +0000 remove an outdated comment commit a7fd6f9491a76509dc389048c4894e56bf94ab77 Author: Uwe Schindler Date: Wed Nov 28 23:02:17 2007 +0000 better locking mechanism in search cache cleanup commit 28b3f08d11edfb37c98e4415a9b09fe6db2102b4 Author: Uwe Schindler Date: Wed Nov 28 10:35:27 2007 +0000 fix bug in cache cleanup code by replacing with Common's LRUMap commit 3f14f5caf1c0a509f98e889f5ab2d07390db4a04 Author: Uwe Schindler Date: Sun Nov 11 12:08:37 2007 +0000 API change for valid harvester properties. They are now added to a Set and returned in public API (unmodifiable). Custom harvesters must be changed to this new API. commit 287ee09ab85b6106e99e78e3f2fcc15fadc5f927 Author: Uwe Schindler Date: Sun Nov 11 10:28:05 2007 +0000 small problem in LogUtil.java #2 commit da382ea5f66f53c86251f5bea4bf3bbcac488dd6 Author: Uwe Schindler Date: Sun Nov 11 10:23:56 2007 +0000 small problem in LogUtil.java commit 566944b7e7b19f172781d35d911ee22940ef6b1e Author: Uwe Schindler Date: Sun Nov 11 10:19:58 2007 +0000 Support PrintStream logging of IndexWriter through commons logging system (debug level) to make it possible to track index merges etc.
commit 2d7ec4bf3d2eb2c8a3547d7cd57f8c052a5fb41b Author: Uwe Schindler Date: Sat Oct 27 07:23:09 2007 +0000 typos in documentation commit 92156bf3776acc2446cc3a1915f8d5361fc83c17 Author: Uwe Schindler Date: Fri Oct 26 14:09:30 2007 +0000 list and explain harvester properties in each harvester class documentation commit 42f497023291906b5268837b675c6e05feb13ff8 Author: Uwe Schindler Date: Wed Oct 17 08:21:00 2007 +0000 - Renaming methods in LuceneConversions to show data type in method name. - Remove public method for inserting trie-based values in encoded form commit 5f17afa2596980cb8cd6a2fc57a2d15fb24149aa Author: Uwe Schindler Date: Tue Oct 16 23:44:09 2007 +0000 Restructuring methods in LuceneConversions that create Trie based index entries. The methods can now be reused more easily in foreign projects without knowing too much about panFMP internals. commit d4b2553c366f5e317f93f14751031235a03bb1f8 Author: Uwe Schindler Date: Thu Oct 11 09:29:17 2007 +0000 Some cleanups and documentation updates commit 64068c4902790c23c9abc28cdd6043dde5db0c33 Author: Uwe Schindler Date: Wed Oct 10 21:55:15 2007 +0000 fix bug in OAIStaticRepositoryHarvester commit 7ec130eb446822d5951efe06660ee0e1cd952289 Author: Uwe Schindler Date: Wed Oct 10 19:44:47 2007 +0000 Support for If-Modified-Since in OAIStaticRepositoryHarvester commit 2cb854e1903c71b7ec7e066d6ffd8d22960cfd53 Author: Uwe Schindler Date: Wed Oct 10 18:19:05 2007 +0000 new OAIStaticRepositoryHarvester to harvest static repositories. TODO: enable If-Modified-Since for harvesting the static file.
commit c6501af7ee8b2a24f376ece0b74012ec781d1dfd Author: Uwe Schindler Date: Sun Oct 7 18:05:48 2007 +0000 documentation for LuceneConversions.java commit 2ad892fbec4df1208a6b673432400b53e0463947 Author: Uwe Schindler Date: Thu Sep 6 12:28:19 2007 +0000 convert spaces to tabs, unix line endings commit c5552f80f2aba065e9ade5d65a37bd12cb8abec3 Author: Uwe Schindler Date: Thu Sep 6 09:41:39 2007 +0000 Add boost support to FieldCheckingQuery commit 38b249d255527c84939cebb2dab4251f8d4bc5a4 Author: Uwe Schindler Date: Thu Sep 6 08:19:41 2007 +0000 add missing "private" commit ae117723c302720209db2c6b25d36eeffffd8afc Author: Uwe Schindler Date: Wed Aug 22 11:47:28 2007 +0000 Make compatible with Digester 1.8 commit 0446b504a0fc84e970ec5cc3004ded3573c8d839 Author: Uwe Schindler Date: Mon Aug 20 21:59:13 2007 +0000 Documentation update. commit 63ec3d9849779a3534fe343c500139b7e2c3ac0a Author: Uwe Schindler Date: Sun Aug 19 20:49:24 2007 +0000 cleanup commit 6818495b2c985ccd7503688988d964112fd183b6 Author: Uwe Schindler Date: Sun Aug 19 20:31:50 2007 +0000 * Introduction of BooleanParser that accepts true, false, yes, no, on, off as values * New harvesterProperty to enable/disable compression of XML (default=true) commit 3675d3c8986c96c56ceb5e14a6fc79ad64cc4119 Author: Uwe Schindler Date: Sun Aug 19 15:59:08 2007 +0000 SearchResultItem: Default is to return field as String (this is forward-compatible) commit e8cced0402a4393c4016c3fdda531838af55e549 Author: Uwe Schindler Date: Sun Aug 19 14:34:19 2007 +0000 Clean up namespace declarations in templates commit b79518e948fa89a1c02d5dea068a10ced5d7063c Author: Uwe Schindler Date: Sun Aug 19 13:17:55 2007 +0000 Two new datatypes for fields (for stored fields only): * XML: saves the XML of XPath/Template expression as XML string * XHTML: stores a as XHTML string in index. Useful to generate XHTML-thumbnails lucenestorage attribute may now contain COMPRESSED, too. This stores the field result in compressed form in index. Useful e.g.
for XHTML fields commit c6926a6d709333d41ebff46fe8bacb2b00e827b7 Author: Uwe Schindler Date: Sun Aug 19 09:35:55 2007 +0000 Documentation updates. commit 3fdde13eaa1e3871979cb571934195606102faef Author: Uwe Schindler Date: Sun Aug 19 08:29:58 2007 +0000 Documentation: one bold tag too many. commit b9f4d6e39bee51da23f48d49b18d25a775ca3f3a Author: Uwe Schindler Date: Sat Aug 18 12:00:11 2007 +0000 Documentation updates. commit a60647f568dbc7b07d80b03a6ce7ddcd35009bfe Author: Uwe Schindler Date: Fri Aug 17 16:18:25 2007 +0000 Make it possible to hide XML without specifying other fields to load. Now, search always uses a FieldSelector commit de0bbf02f07c5eac77031502ca993a19a5392690 Author: Uwe Schindler Date: Fri Aug 17 12:55:38 2007 +0000 Some cleanups commit 8e90a6b41a214c2b52b653987fd31b1572f14e33 Author: Uwe Schindler Date: Fri Aug 17 09:54:36 2007 +0000 Remove not needed IOExceptions in Query factories. Add MatchAllDocsQuery support. Removed declaration of unchecked NumberFormatExceptions commit 5442c24b5fe1c436286b697b52c01ff191151b58 Author: Uwe Schindler Date: Thu Aug 16 12:44:51 2007 +0000 Make constructors of return objects in webservice protected commit e3d8b0203228c208d11325b13b463082df49d48d Author: Uwe Schindler Date: Thu Aug 16 12:35:53 2007 +0000 New possibility in API and webservice to store queries in cache to retrieve later by a simple hash string. commit 6b749e40a85b7fd7b162e59c395282ce9428fd40 Author: Uwe Schindler Date: Wed Aug 15 17:51:46 2007 +0000 fix bug with anyOf using wrong operator in AXIS webservice commit 4e31aeabec9a4d6662e79a7726fc6fd69ddd6417 Author: Uwe Schindler Date: Wed Aug 15 15:48:16 2007 +0000 Some changes to show Java Collection API list usage for paged results. Implement this in webservice, too.
commit 29b8ff40d802dfd8ae3b4bd830e8d69f4145ea74 Author: Uwe Schindler Date: Wed Aug 15 13:37:01 2007 +0000 TODO update commit b7063da855dc4dd4655bd2388c0adf40d63a3d88 Author: Uwe Schindler Date: Wed Aug 15 13:33:51 2007 +0000 New search API. Please read the JavaDocs for examples how to use it. The old AXIS engine was moved to de.pangaea.metadataportal.search.axis. Please use this API only in web services. The new API supports queries in any boolean combination and uses all standard Lucene classes for query construction. commit 4a236b3909e89d168e2f9d2a2eae654f35c03087 Author: Uwe Schindler Date: Wed Aug 8 21:03:03 2007 +0000 Fix not Java but ISO8601 compatible timezone with ":" in LenientDateParser.java commit bfad835703c41dc74b786fe67c03eb00b8f7845a Author: Uwe Schindler Date: Tue Aug 7 21:34:21 2007 +0000 If index is created not updated, do not try to delete unknown identifiers, this is useless. commit b90921fc4db5eb22125f911748564da6d0c550a5 Author: Uwe Schindler Date: Thu Aug 2 22:33:32 2007 +0000 WebCrawlingHarvester: Initial redirect to foreign address allowed, better checking of redirect targets, fixed EOF error when HEAD request with Content-Encoding commit 2fbbeba4679be18eb3812fa7ec7be473df27880b Author: Uwe Schindler Date: Thu Aug 2 16:21:11 2007 +0000 * Fix NPE in toString() of some config classes * Fix content-type parsing with not-lowercase charset * Default log4J logfile with separate logging entry for this package commit 84d99e7ef7051d49448e6ce22f9f34db129e1c1f Author: Uwe Schindler Date: Thu Aug 2 10:20:42 2007 +0000 Update javadoc documentation commit 96aec51759269247d568a578ac1849b28b8f6af2 Author: Uwe Schindler Date: Wed Aug 1 17:18:58 2007 +0000 some changes and documentation, new property to insert a short pause between HTTP requests. 
commit bbf21a709ca582d69e6c21fc6ce652e051249ae0 Author: Uwe Schindler Date: Wed Aug 1 13:22:01 2007 +0000 * outdated Comments removed * HTML flow changed, some features changed, "Accept:" header changed commit 3b43bda4b2a641e8eacda372f5c8c740a93aa3f8 Author: Uwe Schindler Date: Wed Aug 1 10:59:33 2007 +0000 Small fix in HTML parsing to enable body inside frameset/noframes commit 7cf8fe46ccc76374944f32c6a84e4ac995bd7a58 Author: Uwe Schindler Date: Wed Aug 1 10:51:03 2007 +0000 Replace the unreasonable HTML parser in WebCrawlingHarvester by NekoHTML. commit 8eee7edb7a3fd11519a946568708884d674bd338 Author: Uwe Schindler Date: Tue Jul 31 22:37:04 2007 +0000 Better pattern matching in HTML analyzer of WebCrawlingHarvester commit f1cfc1f6bcea84909f5d5df1b6423a10400eb0cb Author: Uwe Schindler Date: Tue Jul 31 21:19:59 2007 +0000 Default XML charset support for WebCrawlingHarvester commit 910b0a0b78a3e6e62296a9516dca1f27b35e6302 Author: Uwe Schindler Date: Tue Jul 31 20:11:47 2007 +0000 * Added support for harvesters that do not get "deleted" documents (e.g. DirectoryHarvester) to delete "unknown" documents. The harvester can create a set of "valid" identifiers on harvesting and submit that list to the IndexBuilder. After indexing all new documents, this set is synchronized with the index and spare documents deleted. * Added new harvester: WebCrawlingHarvester -- works like WGET and harvests all documents from a directory and its subdirectories on an webserver. It analyzes HTML pages with links and harvests all documents with correct MIME type and extension that are below the initial URL. commit 657576efc487d6134fbfdc107fceb61b5377e45b Author: Uwe Schindler Date: Tue Jul 31 12:09:55 2007 +0000 Remove fromDateReference/thisHarvestDateReference from individual harvester and make it available from the abstract Harvester. thisHarvestDateReference should be set after a successful harvest only. 
commit 4df873d5213589b91de69566bed3427a148f4939 Author: Uwe Schindler Date: Tue Jul 31 07:42:38 2007 +0000 print SAXParseException correctly commit d005687d973b6818c215785a891853af89e58e75 Author: Uwe Schindler Date: Tue Jul 31 07:05:28 2007 +0000 * OAIHarvester: Exceptions, embedded in Digester SAXExceptions, are reported with correct stack trace (redesign for patch yesterday) * Config ditto. commit 7b796898ef1ae133d6cee8a3aa95f0c78d2d3553 Author: Uwe Schindler Date: Mon Jul 30 22:42:58 2007 +0000 * IndexBuilder: Manage the Exceptions in background threads better and only throw a new special IndexBuilderBackgroundFailure to stop harvesting process. The real error is then printed after closing Harvester. * IndexBuilder: Use Java 1.5 atomic API * OAIHarvester: Exceptions, embedded in Digester SAXExceptions, are reported with correct stack trace. commit 129512a535b295be65ac5fb4d6cde651041fcef0 Author: Uwe Schindler Date: Mon Jul 30 17:14:32 2007 +0000 Use TreeHashMaps/TreeHashSets in some cases to preserve order of indexes and fields. HarvesterCommitEvent was changed and documented to receive Sets of committed identifiers. commit 7db1c00b0d899d7de01a6c13f2c9b7f9c21f0b87 Author: Uwe Schindler Date: Mon Jul 30 15:59:54 2007 +0000 Better error code in IndexBuilder with better messages on shutdown. A second deadlock situation was resolved. commit dc667d9d9ad29c1855478adcb3a6bee96cea77d8 Author: Uwe Schindler Date: Mon Jul 30 12:34:30 2007 +0000 Error in Harvester that always used the same harvester -- hmpf! commit 7d65dbd532272a95dcdc3d858a53e34388abe8b9 Author: Uwe Schindler Date: Mon Jul 30 12:28:55 2007 +0000 * Remove AbstractHarvester and replace by Harvester * Add javadocs to Harvester * New Rebuilder * More small changes. * Deadlock bug in IndexBuilder solved.
commit f2aeb67f432346a529e1526cb31c2a4ff0e8d601 Author: Uwe Schindler Date: Sun Jul 29 23:19:48 2007 +0000 fix bug with javadoc generation in older ANT commit 90f609b481d371f4953bb2c9d9f86442d43346ea Author: Uwe Schindler Date: Sun Jul 29 22:59:12 2007 +0000 Cleanup in Rebuilder. commit c302bac1d88ec1fcd713016d569188d5e550a055 Author: Uwe Schindler Date: Sun Jul 29 22:29:33 2007 +0000 * Javadocs updates. * MetadataDocument now with field for config. commit aa99a04817e7bf7f6fff43df2e57fba1825d956e Author: Uwe Schindler Date: Sun Jul 29 20:15:59 2007 +0000 MetadataDocument.invalidateXMLCache() no longer needed commit a44fb5a9ccdc9e1c14b867513e91c4fabe157da3 Author: Uwe Schindler Date: Sun Jul 29 19:49:46 2007 +0000 make MetadataDocument a correct JavaBean commit 85c5c4f09f15504d86bd1bca5e8bebde8a155d36 Author: Uwe Schindler Date: Sun Jul 29 17:34:21 2007 +0000 more simplification in Config (remove unused ExtendedDigester parameters from methods in Config) commit ce780f092ee332dfca12f7832eae9b711b3367ea Author: Uwe Schindler Date: Sun Jul 29 17:23:25 2007 +0000 simplify inner classes of Config commit 03572a859ab9606675eda6e5f68f9e6bd356fd1d Author: Uwe Schindler Date: Sun Jul 29 17:02:17 2007 +0000 simplify TrieRangeQuery #2 (faster, because not private) commit 1908d633c23e155201880c032cceed2ce36df4fe Author: Uwe Schindler Date: Sun Jul 29 16:50:51 2007 +0000 simplify TrieRangeQuery commit a7b272b2c3edcaa69d042dde7f315d20d271731c Author: Uwe Schindler Date: Sun Jul 29 16:44:09 2007 +0000 Link to website in javadocs commit d2779765e9924533e6a8a1a7fed05d43ac97486c Author: Uwe Schindler Date: Sun Jul 29 16:04:24 2007 +0000 add missing @PublicForDigesterUse commit 6f4b03ae9baee2d3160d873e3d8f27dfbc39e0b8 Author: Uwe Schindler Date: Sun Jul 29 16:01:47 2007 +0000 add missing deprecated commit e5bbdb59ec0cbb259d1dc9864cb9052df6693617 Author: Uwe Schindler Date: Sun Jul 29 15:36:30 2007 +0000 New annotation @PublicForDigesterUse which marks methods/classes that are only 
public for Digester but are not intended to be public. commit d360ae0eba1b021fa4fa8e9a0ecd3a0585e3a7b7 Author: Uwe Schindler Date: Sun Jul 29 14:09:33 2007 +0000 * more @Override * rename AnyExpressionConfig commit cc6615a38e4bc3b0cd80a463fb2c20a7de6f3d3f Author: Uwe Schindler Date: Sun Jul 29 13:58:37 2007 +0000 add @Override where applicable to be sure that overridden method has correct signature commit c7969f76e8602aabca86d333e992e62fc4431b68 Author: Uwe Schindler Date: Sun Jul 29 12:50:09 2007 +0000 * SaxRule redesign * Version printout on Config startup commit 845220c37179c2eaa487d78a45223acb1e112bb0 Author: Uwe Schindler Date: Sat Jul 28 09:59:42 2007 +0000 version information to log on config startup commit e9ddb68ab1c0cb67c1c678131010b2750c4d6dd6 Author: Uwe Schindler Date: Tue Jul 24 15:04:29 2007 +0000 make example config file with a namespace prefix, this helps with xincluded xsl, because default namespace is still valid! commit 9dae86834be839f3358b51412a67a6e2bf15213e Author: Uwe Schindler Date: Tue Jul 24 15:02:10 2007 +0000 make example config file with a namespace prefix, this helps with xincluded xsl, because default namespace is still valid! commit 6e9d7f4e06f66696ff00910a7d8c3ca2876e5065 Author: Uwe Schindler Date: Tue Jul 24 09:26:59 2007 +0000 print list of supported properties on error. commit 970a2f4ebb8ee56be698422d8096f7450b516453 Author: Uwe Schindler Date: Tue Jul 24 09:21:36 2007 +0000 implement checking of harvester property names on config load. commit 5abae4d81ed9ca7cd215c40bfe4c982726f5131f Author: Uwe Schindler Date: Tue Jul 24 07:54:13 2007 +0000 some additional checks for queue sizes and thread count commit 2073c807c3b8660d0588405fd0585c63f7252463 Author: Uwe Schindler Date: Mon Jul 23 22:51:06 2007 +0000 fix bug in harvesterCommitEvent (PangaVista!!!)
commit b2bf1ba3556db5d00333a6e341e9ef582ce653c3 Author: Uwe Schindler Date: Mon Jul 23 22:37:13 2007 +0000 implement missing checkIndexerBuffer() in IndexBuilder commit 7f8150e510135153f48c244539bc50f08811c0ca Author: Uwe Schindler Date: Mon Jul 23 20:49:54 2007 +0000 * New IndexBuilder implementation using BlockingQueue's * New harvester properties, see config.xml commit b90c2b655f48fe01f8ca5803b7ac9f9fd0260eba Author: Uwe Schindler Date: Sun Jul 22 20:59:18 2007 +0000 Make Config.TemplateSaxRule compatible with XSLTC by adding dummy namespace prefixes to auto-generated stylesheets commit ac3e475dfe5221ad45c2c1c68a205439760ebd37 Author: Uwe Schindler Date: Sun Jul 22 18:04:08 2007 +0000 * Implement better thread synchronisation between harvester, converter and indexer * remove and from index configuration. Instead put it in a similar way into . This needs a change in config files! By that it is possible to globally enable autoOptimize or disable validation from globalHarvesterProperties commit 012720c694dcb56ec0a023ef045ef8fd0d1c9185 Author: Uwe Schindler Date: Sun Jul 22 16:13:31 2007 +0000 rename "indices" to its correct English name "indexes" commit a660be35084973a8c3512291307fc8d89f64a253 Author: Uwe Schindler Date: Sun Jul 22 15:48:12 2007 +0000 rename "indices" to its correct English name "indexes" commit 8bc9d0dc51c88a08916d33238c8814b1aa692ee0 Author: Uwe Schindler Date: Sun Jul 22 15:41:46 2007 +0000 small fixes commit e3ce224a3c32152b5122c7fc5eeb30f723a8d64b Author: Uwe Schindler Date: Tue Jul 17 07:03:26 2007 +0000 rename function in XPathResolverImpl commit 64e3e8dc85fb009798add623212827defec5f245 Author: Uwe Schindler Date: Mon Jul 16 06:40:30 2007 +0000 supply index config to MetadataDocument.loadFromLucene() commit a9311395ecedcf68f2c26882bab079a0b6bbb1df Author: Uwe Schindler Date: Mon Jul 16 06:36:12 2007 +0000 Exception on document loading from index when identifier empty commit 99bb334fd0dca7d163d688b7c47f7340134e2558 Author: Uwe Schindler Date: 
Sun Jul 15 22:04:23 2007 +0000 Some cosmetic changes: * Only explicitly start indexer thread on IndexBuilder.close() * logging messages commit 1f3f3f36bd252072ff3ec77c514ed2dd9c5fe395 Author: Uwe Schindler Date: Sun Jul 15 21:29:14 2007 +0000 fix bug of missing document: last converter thread finalizes indexer commit 5a4286db5ca47d02ca970c14e55715ef0e9b3096 Author: Uwe Schindler Date: Sun Jul 15 20:51:18 2007 +0000 Multiple converter threads support (see config.xml). Default=1 commit 58bf49d8ab91bf08f0239374a61f930ede0190db Author: Uwe Schindler Date: Sun Jul 15 19:38:59 2007 +0000 separate Locks in IndexBuilder (new Object()), this will enable more than one converter thread. commit 86733cb3e7be7b939ad89ed06bd35802039568da Author: Uwe Schindler Date: Sun Jul 15 17:07:29 2007 +0000 supply index config to MetadataDocument.createInstanceFromLucene() commit 0e21eec23f20c394d75acf7785e7a938fb02aa0b Author: Uwe Schindler Date: Sun Jul 15 16:31:14 2007 +0000 * Store class name of MetadataDocument class used to store the document. * Remove Sets from standard MetadataDocument * New OAIMetadataDocument with sets * Rebuilding now generates documents using stored class commit 7917d31c17e20bb7ea07821a0e912d6361a26c25 Author: Uwe Schindler Date: Sun Jul 15 14:22:05 2007 +0000 javadocs build update commit b1531d01e3f4086ac2f7a2a098aa8a8de25db18b Author: Uwe Schindler Date: Sun Jul 15 12:57:39 2007 +0000 check that all variables are declared *before* fields and filters commit e0a4338aedfc7d7ecd68a3b7845849b4c871ee1c Author: Uwe Schindler Date: Sat Jul 14 15:31:49 2007 +0000 cleanup imports commit a4a5647fcca7134b05f3884a7ca17741f6936c70 Author: Uwe Schindler Date: Sat Jul 14 15:15:15 2007 +0000 Restructuring of configuration without inner classes #1 commit 2da6e733450993e5b6e2bb2d8157270db06c3658 Author: Uwe Schindler Date: Wed Jul 11 09:29:02 2007 +0000 add identifierPrefix as harvesterProperty to DirectoryHarvester.java.
This enables setting a prefix, that is inserted after "file:" and before the relative file name. commit 64ba157996c1f0da1c78420e5478d0a6795c7e1f Author: Uwe Schindler Date: Tue Jul 10 16:46:50 2007 +0000 small update in startThreads() commit afeec98a7f205b03f04cdd69f605ead90ea39f2c Author: Uwe Schindler Date: Tue Jul 10 11:57:34 2007 +0000 Fix failure throwing and exit conditions. commit d7ef1c739af0901410782eef78430598526bf4b0 Author: Uwe Schindler Date: Mon Jul 9 22:21:38 2007 +0000 * Small timestamp bug in DirectoryHarvester.java * Start with JavaDoc comments commit 1589f78290250814fef51150f5a6a46bbf1b65aa Author: Uwe Schindler Date: Mon Jul 9 18:47:11 2007 +0000 Move Lucene version check into Package class. commit e3969639d44450fce61678a40cd5517bc718022c Author: Uwe Schindler Date: Mon Jul 9 17:17:13 2007 +0000 Put a Lucene version check into Config class. commit 7cb42e3267bcaf6cba791fef276976ab7302cbc2 Author: Uwe Schindler Date: Sun Jul 8 21:57:57 2007 +0000 todo update commit 37f54a2051426e30df5a67c278abc505c03ff761 Author: Uwe Schindler Date: Sun Jul 8 21:45:21 2007 +0000 Logging during harvesting centralized in AbstractHarvester.java commit 48ba9ef8d13ebd56deff2715422b8e7bcc3a47c3 Author: Uwe Schindler Date: Sun Jul 8 20:01:38 2007 +0000 Disable memory checking complete, make global harvester properties for IndexBuilder buffers. commit 88b402e740ed783507f2ad6ebb8e159449a40dff Author: Uwe Schindler Date: Sun Jul 8 19:31:24 2007 +0000 Disable memory checking until further investigations commit 777fc7df2682ad2f56772e0923326e864bcb2c68 Author: Uwe Schindler Date: Sun Jul 8 18:14:03 2007 +0000 Enable memory checking for index builder to auto-decrement the buffers on low memory. Added global harvester properties. commit fb153d2990543642130a57ec37ee2e08973d6986 Author: Uwe Schindler Date: Sun Jul 8 17:37:55 2007 +0000 Make Rebuilder a subclass of AbstractHarvester. 
commit c626b2c6f83b74f5060d2019ef993f7cc9b7fe56 Author: Uwe Schindler Date: Sun Jul 8 14:22:46 2007 +0000 IndexBuilder was extended by an additional thread for converting the MetadataDocument to Lucene Documents. Harvesting now runs with three threads: * Harvesting (primary) * Converting Documents * Indexing commit dc50e8de9deaa688f9a3ca2748ceac5c6dbba79f Author: Uwe Schindler Date: Sat Jul 7 13:24:35 2007 +0000 make SaxFilter private commit bcad47f205e9a9bb6066eb42ba2d291e02a0dfed Author: Uwe Schindler Date: Fri Jul 6 17:10:47 2007 +0000 implement document boosting NUMBER-returning by XPath commit 9271a737baabe0a632ce9a686f16512405c6723d Author: Uwe Schindler Date: Fri Jul 6 14:39:48 2007 +0000 Fix filters in SEARCH mode commit f642c928d4b2644878857ebfd7182117ce747829 Author: Uwe Schindler Date: Fri Jul 6 14:28:31 2007 +0000 Support for XSL Templates in variables and fields , . Result is treated like a XPath NodeSet and indexed. commit 78f0c9c794f73da62f5ff887ad9f4f4dc3652b9b Author: Uwe Schindler Date: Thu Jul 5 07:44:00 2007 +0000 listTerms with prefix, final for equals() and hashCode(), better hashCodes commit ce7b9a58711b721ab4227d03480b073b626a03e0 Author: Uwe Schindler Date: Tue Jul 3 22:30:04 2007 +0000 warning message about close failure with exception info commit 6572353d5a2152696a699fb298b48c89611084f6 Author: Uwe Schindler Date: Tue Jul 3 22:19:31 2007 +0000 Constants and Enums uppercase (change Config.DataType) commit e79b7cbea47b4ec725a0b08f5f7578d025a5cd98 Author: Uwe Schindler Date: Tue Jul 3 13:57:15 2007 +0000 remove not needed synchronization commit 723103bfd0efe7047e80228f26cdb1219d4bad4a Author: Uwe Schindler Date: Tue Jul 3 09:13:49 2007 +0000 cleanups, new separate QNameParser commit 4e39d54028e8448fba2e0475df7b2b45b2feeda7 Author: Uwe Schindler Date: Tue Jul 3 06:51:51 2007 +0000 Make XPathResolver more clearly structured...
commit d4213ed1f0e25528a6a68555ec2fedd9f33db506 Author: Uwe Schindler Date: Mon Jul 2 20:45:28 2007 +0000 - re-implement XPath resolvers - add xpath function to check other indices for duplicates - change behavior of IndexConfig to enable opening of IndexReaders without cache that can be closed - add finallys to correctly close lucene Resources commit 829f10be7520f10a4da5c234d6ef08e8b0a20b37 Author: Uwe Schindler Date: Mon Jul 2 06:58:33 2007 +0000 config.xml did not do what it should do commit 1d9d7a7afa57b839a46d336d7de234c3652d5f35 Author: Uwe Schindler Date: Mon Jul 2 06:38:14 2007 +0000 cleanup on Exception during variable processing commit 9b3e1b5f7a50811c50d2af57bf275b9287c2e7ea Author: Uwe Schindler Date: Mon Jul 2 00:26:19 2007 +0000 remove inner class from Rebuilder and implement reconstructing of MetadataDocument from Lucene commit 9c2fef76f67252a3815a86b707c52ac0a3a814e6 Author: Uwe Schindler Date: Sun Jul 1 22:24:06 2007 +0000 REVERT: Context node of XPath is document element not the DOM Document itself! commit 3a2e306aa62caf000bca8651b2810b0262733f72 Author: Uwe Schindler Date: Sun Jul 1 20:39:26 2007 +0000 debugging functionality commit 9c3bf01c52be66b833a35062b17de1323adaff7c Author: Uwe Schindler Date: Sun Jul 1 16:44:35 2007 +0000 Context node of XPath is document element not the DOM Document itself! commit 6ba99ef84efd148683674276b85c67c95f8b6ebd Author: Uwe Schindler Date: Sun Jul 1 16:14:59 2007 +0000 Implement filter mechanism. Docs can be filtered during harvesting by specifying one or more XPaths that allow or deny them.
commit c968737f37b9c7ffd7bdf46d0c8b3d7444416b73 Author: Uwe Schindler Date: Sun Jul 1 13:06:42 2007 +0000 change metadata structure for variables to prepare document filtering commit e408645c8fc2b1a425edf181437bb274c3dfbebc Author: Uwe Schindler Date: Sun Jul 1 10:05:15 2007 +0000 TermCheckerSet with correct generics commit 94de7e24e9d19a29cfdce9cb7beb73153307f785 Author: Uwe Schindler Date: Sat Jun 30 08:37:15 2007 +0000 better analyzer from classname generator commit 4e2883a3756866198f4c33b115a5ca70708f2e9c Author: Uwe Schindler Date: Sat Jun 30 08:12:18 2007 +0000 enable check for deprecation and unchecked commit dc36efa5a3e41380eb49d5d8dea6e471f889efdd Author: Uwe Schindler Date: Sat Jun 30 08:11:24 2007 +0000 remove deprecation and unchecked warnings commit 853cae9e87ec9e37ef0632ac00b09bcdef413b6a Author: Uwe Schindler Date: Fri Jun 29 09:47:51 2007 +0000 New Feature: XPath variables in : you can define XPath variables like in XSLT before the field definitions and use these variables in other XPaths commit 62ea9b704956df588fe379904ddeca3c7bcb964c Author: Uwe Schindler Date: Tue Jun 26 19:37:10 2007 +0000 fix issue with orphaned files on optimize after harvesting/rebuilding commit 8d022dd0a7989f96f842edbbbcb04656f4440115 Author: Uwe Schindler Date: Tue Jun 26 19:13:16 2007 +0000 fix build.xml update commit 3358ac3a09a084c1a2385b2a9ad65e8f0f1c09af Author: Uwe Schindler Date: Tue Jun 26 19:03:39 2007 +0000 fix build.xml update commit 535857d6d36892e7f0a49d3cb17f42365602a937 Author: Uwe Schindler Date: Tue Jun 26 19:01:39 2007 +0000 make scripts & config working enable building of source package commit 226c5ffc87f377edde7d992b6f058b90333a625a Author: Uwe Schindler Date: Tue Jun 26 17:07:46 2007 +0000 fix build.xml update commit c8095027e5fe5c14d4f37eda0392fadfc51a9cee Author: Uwe Schindler Date: Tue Jun 26 17:06:42 2007 +0000 build.xml update commit 5b9b09e4b7794db06d5278e1d5e27c47ddce669b Author: Uwe Schindler Date: Tue Jun 26 14:28:49 2007 +0000 build.xml
update commit 512b0e59fda9afb7432000adac2c9a25d6790bec Author: Uwe Schindler Date: Tue Jun 26 13:21:41 2007 +0000 build system with version numbers and manifest commit 690f187d69abc73724325d302a35bafca921b83f Author: Uwe Schindler Date: Tue Jun 26 10:12:12 2007 +0000 javadoc fix commit 4c06e266e8b8ccf72b8ea535b9c75fb6a09ac72e Author: Uwe Schindler Date: Mon Jun 25 12:27:08 2007 +0000 IndexBuilder error handling on close commit c5d95257e009182e57ed57ff1c876432025f8b95 Author: Uwe Schindler Date: Fri Jun 22 22:44:41 2007 +0000 rebuilder opens index before creating indexbuilder. this helps to support complete rebuild (create=true). commit 8be11eaa08ce93e2948c4ff63896ff369553203a Author: Uwe Schindler Date: Fri Jun 22 22:22:43 2007 +0000 IndexBuilder update #2 commit cbda36d1b7c440df73b7ec5608e274eb745e6089 Author: Uwe Schindler Date: Fri Jun 22 22:12:33 2007 +0000 IndexBuilder rewritten to not use IndexReader to delete/update docs. This can be done with IndexWriter directly. commit 4a7e8a6e370e44422b939d33f12a88c52a4825b9 Author: Uwe Schindler Date: Fri Jun 22 19:16:23 2007 +0000 Fix IndexBuilder close, activate Lucene 2.2 setAllowDocsOutOfOrder in BooleanQuery commit 454a00abaf7b2d142dc2b7a8b8474e38e82b7b0c Author: Uwe Schindler Date: Fri Jun 22 17:13:14 2007 +0000 documentation things commit bd381ca55c4e49623746ffe79266001f4285fc47 Author: Uwe Schindler Date: Fri Jun 22 16:45:14 2007 +0000 Rename AdvRangeQuery & others to TrieRangeQuery (2) commit 060490194bf2b861fad3292666785ce601edd410 Author: Uwe Schindler Date: Fri Jun 22 16:24:04 2007 +0000 Rename AdvRangeQuery & others to TrieRangeQuery commit 40f3db0858e98f0f5c43fc374b06cf60e9e44e68 Author: Uwe Schindler Date: Thu Jun 21 18:12:21 2007 +0000 version number commit 267411c34456c2571f6626f9bc6db841a3377221 Author: Uwe Schindler Date: Thu Jun 21 17:15:50 2007 +0000 initial import