java.lang.Object
  de.pangaea.metadataportal.search.SearchService
public class SearchService
This class is the main entry point to panFMP's search engine.
import de.pangaea.metadataportal.search.*;
import org.apache.lucene.search.*;
import java.util.List;
...
// create a search service
SearchService service=new SearchService("config.xml", "indexname");
// build a query
BooleanQuery bq=service.newBooleanQuery();
bq.add(service.newDefaultFieldQuery("a search query for the simple search"), BooleanClause.Occur.MUST);
bq.add(service.newNumericRangeQuery("longitude", -20.0, 10.0), BooleanClause.Occur.MUST);
bq.add(service.newNumericRangeQuery("latitude", null, 30.5), BooleanClause.Occur.MUST);
Retrieve sorted results as a list (this works well for web pages that display paged search results, as Google does). If the query was executed before, the results may be returned from the cache:
// create a Sort, if you want standard sorting by relevance use sort=null
Sort sort=service.newSort(service.newFieldBasedSort("longitude", false));
// start search
SearchResultList list=service.search(bq,sort);
// print search results (start is item to start with, count is number of results)
int start=0,count=10;
List<SearchResultItem> page=list.subList(
Math.min(start, list.size()),
Math.min(start+count, list.size())
);
for (SearchResultItem item : page) {
System.out.println(item.getIdentifier());
}
Note that SearchResultList implements the List interface, so you can use the standard Java Collections API
to access search results, as the example shows.
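The paging idiom from the example can be isolated into a small helper. The sketch below is only an illustration on a plain java.util.List; the class and method names (PagingSketch, page) are hypothetical and not part of panFMP. It assumes start and count are non-negative.

```java
import java.util.List;

final class PagingSketch {
    /**
     * Returns the page that begins at index start and holds at most count items.
     * Both bounds are clamped to the list size, so a request past the end
     * yields a shorter or empty page instead of an IndexOutOfBoundsException.
     */
    static <T> List<T> page(List<T> results, int start, int count) {
        int from = Math.min(start, results.size());
        int to = Math.min(start + count, results.size());
        return results.subList(from, to);
    }
}
```

Because subList returns a view, no result items are copied when a page is cut out of the list.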
Retrieve a large number of results in unsorted order through a SearchResultCollector. This is recommended for creating
large files with thousands of results or for processing map data, because iterating over a SearchResultList is very slow
and consumes a lot of memory:
service.search(new SearchResultCollector() {
public boolean collect(SearchResultItem item) {
System.out.println(item.getIdentifier());
return true; // return false to stop collecting results
}
}, bq);
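The callback contract shown above (return false to stop collecting) can be tried out without an index. The following is a minimal sketch with stand-in types: IdCollector and feed are hypothetical simplifications of SearchResultCollector and the search call, reduced to identifiers.

```java
import java.util.ArrayList;
import java.util.List;

final class CollectorSketch {
    /** Hypothetical stand-in for SearchResultCollector, reduced to identifiers. */
    interface IdCollector {
        boolean collect(String identifier); // return false to stop collecting
    }

    /** Hypothetical stand-in for a search: feeds hits until the collector declines. */
    static int feed(List<String> hits, IdCollector collector) {
        int delivered = 0;
        for (String id : hits) {
            delivered++;
            if (!collector.collect(id)) {
                break;
            }
        }
        return delivered;
    }

    /** Collects at most limit identifiers, then asks the feeder to stop. */
    static List<String> firstN(List<String> hits, final int limit) {
        final List<String> out = new ArrayList<String>();
        if (limit <= 0) {
            return out;
        }
        feed(hits, new IdCollector() {
            public boolean collect(String identifier) {
                out.add(identifier);
                return out.size() < limit; // false once the limit is reached
            }
        });
        return out;
    }
}
```

Stopping early this way avoids materializing results that the caller does not need.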
The examples above use methods such as search(Query,Sort) that return the whole XML document and all stored fields.
This is like a select * from table in SQL.
This is not recommended, especially when collecting a large number of results. It is better to fetch only the fields
needed for processing (like a SQL select column1,column2 from table). This can be done with the
search(Query,Sort,boolean,Collection) methods, which accept lists of fields.
To configure this class, use search properties in your config file (these are the defaults):
<queryParserClass>org.apache.lucene.queryParser.QueryParser</queryParserClass>
<defaultQueryParserOperator>AND</defaultQueryParserOperator>
More search properties are listed in LuceneCache.
| Field Summary | |
|---|---|
| protected LuceneCache | cache |
| protected int | collectorBufferSize |
| protected org.apache.lucene.queryParser.QueryParser.Operator | defaultQueryParserOperator |
| protected IndexConfig | index |
| protected Class<? extends org.apache.lucene.queryParser.QueryParser> | queryParserClass |
| protected Constructor<? extends org.apache.lucene.queryParser.QueryParser> | queryParserConstructor |
| Constructor Summary | |
|---|---|
| SearchService(String cfgFile, String indexId) | Main constructor that initializes a SearchService. |
| Method Summary | |
|---|---|
| Config | getConfig(): Returns the underlying configuration. |
| SearchResultItem | getDocument(String identifier): Reads one document from the index using its identifier. |
| SearchResultItem | getDocument(String identifier, boolean loadXml, Collection<String> fieldsToLoad): Reads one document from the index using its identifier. |
| SearchResultItem | getDocument(String identifier, boolean loadXml, String... fieldName): Reads one document from the index using its identifier. |
| IndexConfig | getIndexConfig(): Returns the underlying index configuration. |
| List<String> | listTerms(String fieldName, int count): Returns a list of terms for fields of type FieldConfig.DataType.STRING. |
| List<String> | listTerms(String fieldName, String prefix, int count): Returns a list of terms for fields of type FieldConfig.DataType.STRING. |
| org.apache.lucene.search.BooleanQuery | newBooleanQuery(): Constructs a BooleanQuery. |
| org.apache.lucene.search.Query | newDateRangeQuery(String fieldName, Calendar min, Calendar max): Constructs a Query for querying a FieldConfig.DataType.DATETIME field. |
| org.apache.lucene.search.Query | newDateRangeQuery(String fieldName, Date min, Date max): Constructs a Query for querying a FieldConfig.DataType.DATETIME field. |
| org.apache.lucene.search.Query | newDateRangeQuery(String fieldName, String min, String max): Constructs a Query for querying a FieldConfig.DataType.DATETIME field. |
| org.apache.lucene.search.Query | newDefaultFieldQuery(String query): Constructs a Query for querying the default field. |
| org.apache.lucene.search.Query | newDefaultFieldQuery(String query, org.apache.lucene.queryParser.QueryParser.Operator operator): Constructs a Query for querying the default field. |
| MoreLikeThisQuery | newDefaultMoreLikeThisQuery(String identifier): Constructs a Query for matching all documents similar to the given one (by identifier). |
| org.apache.lucene.search.SortField | newFieldBasedSort(String fieldName, boolean reverse): Constructs a SortField instance to sort the results of a query based on a field. |
| MoreLikeThisQuery | newFieldedMoreLikeThisQuery(String identifier, String fieldName): Constructs a Query for matching all documents whose contents on a specific field are similar to the given document's one (by identifier). |
| org.apache.lucene.search.Query | newMatchAllDocsQuery(): Constructs a MatchAllDocsQuery. |
| org.apache.lucene.search.Query | newNumericRangeQuery(String fieldName, Double min, Double max): Constructs a Query for querying a FieldConfig.DataType.NUMBER field. |
| org.apache.lucene.search.Query | newNumericRangeQuery(String fieldName, Number min, Number max): Constructs a Query for querying a FieldConfig.DataType.NUMBER field. |
| org.apache.lucene.search.Query | newNumericRangeQuery(String fieldName, String min, String max): Constructs a Query for querying a FieldConfig.DataType.NUMBER field. |
| org.apache.lucene.search.Sort | newSort(org.apache.lucene.search.SortField... sortFields): Constructs a Sort instance to sort the results of a query based on different fields (like a SELECT ... ORDER BY clause in SQL). |
| org.apache.lucene.search.Query | newTextQuery(String fieldName, String query): Constructs a Query for querying a FieldConfig.DataType.TOKENIZEDTEXT or FieldConfig.DataType.STRING field. |
| org.apache.lucene.search.Query | newTextQuery(String fieldName, String query, org.apache.lucene.queryParser.QueryParser.Operator operator): Constructs a Query for querying a FieldConfig.DataType.TOKENIZEDTEXT or FieldConfig.DataType.STRING field. |
| protected org.apache.lucene.search.Query | parseQuery(String fieldName, String query, org.apache.lucene.queryParser.QueryParser.Operator operator): Override in a subclass to use another query parser. |
| org.apache.lucene.search.Query | readStoredQuery(UUID uuid): Reads a query identified by a hash code from the cache. |
| SearchResultList | search(org.apache.lucene.search.Query query): Executes search and returns search results with default sorting by relevance. |
| SearchResultList | search(org.apache.lucene.search.Query query, boolean loadXml, Collection<String> fieldsToLoad): Executes search and returns search results with default sorting by relevance. |
| SearchResultList | search(org.apache.lucene.search.Query query, boolean loadXml, String... fieldName): Executes search and returns search results with default sorting by relevance. |
| SearchResultList | search(org.apache.lucene.search.Query query, org.apache.lucene.search.Sort sort): Executes search and returns search results. |
| SearchResultList | search(org.apache.lucene.search.Query query, org.apache.lucene.search.Sort sort, boolean loadXml, Collection<String> fieldsToLoad): Executes search and returns search results. |
| SearchResultList | search(org.apache.lucene.search.Query query, org.apache.lucene.search.Sort sort, boolean loadXml, String... fieldName): Executes search and returns search results. |
| void | search(SearchResultCollector collector, org.apache.lucene.search.Query query): Executes search and feeds search results to the supplied SearchResultCollector. |
| void | search(SearchResultCollector collector, org.apache.lucene.search.Query query, boolean loadXml, Collection<String> fieldsToLoad): Executes search and feeds search results to the supplied SearchResultCollector. |
| void | search(SearchResultCollector collector, org.apache.lucene.search.Query query, boolean loadXml, String... fieldName): Executes search and feeds search results to the supplied SearchResultCollector. |
| void | setCollectorBufferSize(int bufferSize): Sets the buffer size of search methods using a SearchResultCollector. |
| UUID | storeQuery(org.apache.lucene.search.Query query): Stores a query for later use in the cache. |
| List<String> | suggest(String query, int count): Returns a list of query strings that can be displayed as a "suggest" drop-down box in search interfaces. |
| List<String> | suggest(String query, org.apache.lucene.queryParser.QueryParser.Operator operator, int count): Returns a list of query strings that can be displayed as a "suggest" drop-down box in search interfaces. |
| List<String> | suggest(String fieldName, String query, int count): Returns a list of query strings that can be displayed as a "suggest" drop-down box in search interfaces. |
| List<String> | suggest(String fieldName, String query, org.apache.lucene.queryParser.QueryParser.Operator operator, int count): Returns a list of query strings that can be displayed as a "suggest" drop-down box in search interfaces. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected LuceneCache cache
protected IndexConfig index
protected Class<? extends org.apache.lucene.queryParser.QueryParser> queryParserClass
protected Constructor<? extends org.apache.lucene.queryParser.QueryParser> queryParserConstructor
protected org.apache.lucene.queryParser.QueryParser.Operator defaultQueryParserOperator
protected int collectorBufferSize
| Constructor Detail |
|---|
public SearchService(String cfgFile,
String indexId)
throws Exception
Main constructor that initializes a SearchService. The underlying LuceneCache is a singleton per config file,
so you can create more than one instance of this class without additional memory consumption.
cfgFile - file name and path of configuration file
Exception
| Method Detail |
|---|
public org.apache.lucene.search.BooleanQuery newBooleanQuery()
Constructs a BooleanQuery. Use this query type to combine different query types from the factory methods
(native Lucene Query instances are usable, too). The current version is equivalent to
BooleanQuery bq=new BooleanQuery();
but direct instantiation should be avoided so that this class can be extended in the future.
public org.apache.lucene.search.Query newTextQuery(String fieldName,
String query,
org.apache.lucene.queryParser.QueryParser.Operator operator)
throws org.apache.lucene.queryParser.ParseException
Constructs a Query for querying a FieldConfig.DataType.TOKENIZEDTEXT or
FieldConfig.DataType.STRING field.
String fields are not parsed by the query parser; they are matched exactly.
Tokenized text fields are parsed by parseQuery(java.lang.String, java.lang.String, org.apache.lucene.queryParser.QueryParser.Operator) and expand to different query types combined by a BooleanQuery.
The query parser uses the given default operator.
IllegalFieldConfigException - if the configuration of fieldName does not match this query type or it is unknown
org.apache.lucene.queryParser.ParseException
public org.apache.lucene.search.Query newTextQuery(String fieldName,
String query)
throws org.apache.lucene.queryParser.ParseException
Constructs a Query for querying a FieldConfig.DataType.TOKENIZEDTEXT or
FieldConfig.DataType.STRING field.
String fields are not parsed by the query parser; they are matched exactly.
Tokenized text fields are parsed by parseQuery(java.lang.String, java.lang.String, org.apache.lucene.queryParser.QueryParser.Operator) and expand to different query types combined by a BooleanQuery.
The query parser uses the default query operator (AND), which can be configured with the search property "defaultQueryParserOperator".
IllegalFieldConfigException - if the configuration of fieldName does not match this query type or it is unknown
org.apache.lucene.queryParser.ParseException
public org.apache.lucene.search.Query newDefaultFieldQuery(String query,
org.apache.lucene.queryParser.QueryParser.Operator operator)
throws org.apache.lucene.queryParser.ParseException
Constructs a Query for querying the default field.
The query is parsed by parseQuery(java.lang.String, java.lang.String, org.apache.lucene.queryParser.QueryParser.Operator) and expands to different query types combined by a BooleanQuery.
The query parser uses the given default operator.
org.apache.lucene.queryParser.ParseException
public org.apache.lucene.search.Query newDefaultFieldQuery(String query)
throws org.apache.lucene.queryParser.ParseException
Constructs a Query for querying the default field.
The query is parsed by parseQuery(java.lang.String, java.lang.String, org.apache.lucene.queryParser.QueryParser.Operator) and expands to different query types combined by a BooleanQuery.
The query parser uses the default query operator (AND), which can be configured with the search property "defaultQueryParserOperator".
org.apache.lucene.queryParser.ParseException
public MoreLikeThisQuery newDefaultMoreLikeThisQuery(String identifier)
Constructs a Query for matching all documents similar to the given one (by identifier).
The default field must have term vectors enabled.
The query may be configured by setting its properties after creation.
newFieldedMoreLikeThisQuery(java.lang.String, java.lang.String)
public MoreLikeThisQuery newFieldedMoreLikeThisQuery(String identifier,
String fieldName)
Constructs a Query for matching all documents whose contents in a specific field are similar to those of the given document (by identifier).
This is based on the indexed terms in the given field name. The field must have term vectors enabled.
The query may be configured by setting its properties after creation.
newDefaultMoreLikeThisQuery(java.lang.String)
public org.apache.lucene.search.Query newDateRangeQuery(String fieldName,
Date min,
Date max)
Constructs a Query for querying a FieldConfig.DataType.DATETIME field.
min - Minimum value as Date or null if lower bound open
max - Maximum value as Date or null if upper bound open
IllegalFieldConfigException - if the configuration of fieldName does not match this query type or it is unknown
public org.apache.lucene.search.Query newDateRangeQuery(String fieldName,
Calendar min,
Calendar max)
Constructs a Query for querying a FieldConfig.DataType.DATETIME field.
min - Minimum value as Calendar or null if lower bound open; the Calendar is internally converted to a Date
max - Maximum value as Calendar or null if upper bound open; the Calendar is internally converted to a Date
IllegalFieldConfigException - if the configuration of fieldName does not match this query type or it is unknown
public org.apache.lucene.search.Query newDateRangeQuery(String fieldName,
String min,
String max)
throws ParseException
Constructs a Query for querying a FieldConfig.DataType.DATETIME field.
min - Minimum value as String or null if lower bound open; the String is parsed by LenientDateParser
max - Maximum value as String or null if upper bound open; the String is parsed by LenientDateParser
IllegalFieldConfigException - if the configuration of fieldName does not match this query type or it is unknown
ParseException - if one of the boundaries is not a parseable date string
public org.apache.lucene.search.Query newNumericRangeQuery(String fieldName,
Double min,
Double max)
Constructs a Query for querying a FieldConfig.DataType.NUMBER field.
min - Minimum value as Double or null if lower bound open
max - Maximum value as Double or null if upper bound open
IllegalFieldConfigException - if the configuration of fieldName does not match this query type or it is unknown
public org.apache.lucene.search.Query newNumericRangeQuery(String fieldName,
Number min,
Number max)
Constructs a Query for querying a FieldConfig.DataType.NUMBER field.
min - Minimum value as Number or null if lower bound open; the Number is internally converted to a Double,
but it can be any numeric Java type
max - Maximum value as Number or null if upper bound open; the Number is internally converted to a Double,
but it can be any numeric Java type
IllegalFieldConfigException - if the configuration of fieldName does not match this query type or it is unknown
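The boundary handling described for the Number overload (and for the String overload documented next) can be pictured as follows. This is a sketch under assumptions, not the library's code: NumericBoundSketch and its methods are hypothetical, and Double.valueOf(String) is assumed to be what "parsed according to Java standard" refers to. A null boundary stays null and stands for an open bound.

```java
final class NumericBoundSketch {
    /** Converts any numeric Java type to a Double; null (open bound) stays null. */
    static Double fromNumber(Number bound) {
        return bound == null ? null : Double.valueOf(bound.doubleValue());
    }

    /**
     * Parses a boundary string; null (open bound) stays null.
     * Throws NumberFormatException if the string is not a parseable number.
     */
    static Double fromString(String bound) {
        return bound == null ? null : Double.valueOf(bound);
    }
}
```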
public org.apache.lucene.search.Query newNumericRangeQuery(String fieldName,
String min,
String max)
Constructs a Query for querying a FieldConfig.DataType.NUMBER field.
min - Minimum value as String or null if lower bound open; the String is parsed according to Java standard
max - Maximum value as String or null if upper bound open; the String is parsed according to Java standard
IllegalFieldConfigException - if the configuration of fieldName does not match this query type or it is unknown
NumberFormatException - if one of the boundaries is not a parseable numeric string
public org.apache.lucene.search.Query newMatchAllDocsQuery()
Constructs a MatchAllDocsQuery. Use this to generate a query that matches all documents. It is useful under
two circumstances:
BooleanQuery containing this MatchAllDocsQuery with BooleanClause.Occur.MUST
and the excluding query with BooleanClause.Occur.MUST_NOT
The current version is equivalent to
Query q=new MatchAllDocsQuery();
but direct instantiation should be avoided so that this class can be extended in the future (e.g. future indexes may contain documents
marked as "deleted" that should not be returned).
public org.apache.lucene.search.SortField newFieldBasedSort(String fieldName,
boolean reverse)
Constructs a SortField instance to sort the results of a query based on a field.
IllegalFieldConfigException - if the configuration of fieldName is not valid for sorting or it is unknown:
The field must be indexed but not tokenized, and does not need to be stored (unless you want it returned with the rest of your document data).
Please note: You cannot sort on fields that are only stored!
public org.apache.lucene.search.Sort newSort(org.apache.lucene.search.SortField... sortFields)
Constructs a Sort instance to sort the results of a query based on different fields (like a SELECT ... ORDER BY clause in SQL).
This implementation constructs the Sort so that search results are ordered by relevance
if all other criteria are equal for two search results.
sortFields - a varargs parameter consisting of previously generated SortField instances (created with newFieldBasedSort(java.lang.String, boolean))
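The relevance tie-break described above behaves like a comparator with a secondary key. The sketch below illustrates the ordering on plain Java objects; SortSketch and Hit are hypothetical stand-ins, and the assumption that higher scores come first mirrors ordinary relevance ranking rather than anything stated on this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class SortSketch {
    /** Hypothetical stand-in for a search hit with one sortable field. */
    static final class Hit {
        final String id;
        final double longitude;
        final float score;

        Hit(String id, double longitude, float score) {
            this.id = id;
            this.longitude = longitude;
            this.score = score;
        }
    }

    /** Sorts by longitude ascending; equal longitudes fall back to score, highest first. */
    static List<String> sortedIds(List<Hit> hits) {
        List<Hit> copy = new ArrayList<Hit>(hits);
        Collections.sort(copy, new Comparator<Hit>() {
            public int compare(Hit a, Hit b) {
                int byField = Double.compare(a.longitude, b.longitude);
                if (byField != 0) {
                    return byField;
                }
                return Float.compare(b.score, a.score); // tie-break by relevance
            }
        });
        List<String> ids = new ArrayList<String>();
        for (Hit h : copy) {
            ids.add(h.id);
        }
        return ids;
    }
}
```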
public List<String> suggest(String fieldName,
String query,
org.apache.lucene.queryParser.QueryParser.Operator operator,
int count)
throws IOException
Returns a list of query strings that can be displayed as a "suggest" drop-down box in search interfaces.
fieldName - the field name for a field-specific input field. If you want suggestions for the default field, use suggest(String,int)
query - the query string the user has typed in. It is parsed and the last term found in it is expanded
operator - the default operator used by the query parser
count - limits the number of results
query
IllegalFieldConfigException - if the configuration of fieldName does not match data type TOKENIZEDTEXT or it is unknown
IOException
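To make "the last term found in it is expanded" concrete, here is a deliberately simplified sketch. It does not use the query parser as the real method does: it splits on the last space, treats the final token as a prefix, and looks it up in a sorted set of terms. SuggestSketch and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

final class SuggestSketch {
    /** Expands the last whitespace-separated token of query against the given terms. */
    static List<String> suggest(SortedSet<String> indexedTerms, String query, int count) {
        List<String> out = new ArrayList<String>();
        String trimmed = query.trim();
        if (trimmed.length() == 0) {
            return out;
        }
        int cut = trimmed.lastIndexOf(' ') + 1;
        String head = trimmed.substring(0, cut); // everything before the last term
        String last = trimmed.substring(cut);    // the term to expand
        // In a sorted set, all terms sharing a prefix are contiguous from tailSet(prefix).
        for (String term : indexedTerms.tailSet(last)) {
            if (out.size() >= count || !term.startsWith(last)) {
                break;
            }
            out.add(head + term);
        }
        return out;
    }
}
```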
public List<String> suggest(String fieldName,
String query,
int count)
throws IOException
Returns a list of query strings that can be displayed as a "suggest" drop-down box in search interfaces.
fieldName - the field name for a field-specific input field. If you want suggestions for the default field, use suggest(String,int)
query - the query string the user has typed in. It is parsed and the last term found in it is expanded
count - limits the number of results
query
IllegalFieldConfigException - if the configuration of fieldName does not match data type TOKENIZEDTEXT or it is unknown
IOException
public List<String> suggest(String query,
int count)
throws IOException
Returns a list of query strings that can be displayed as a "suggest" drop-down box in search interfaces.
query - the query string the user has typed in. It is parsed and the last term found in it is expanded
count - limits the number of results
query
IOException
public List<String> suggest(String query,
org.apache.lucene.queryParser.QueryParser.Operator operator,
int count)
throws IOException
Returns a list of query strings that can be displayed as a "suggest" drop-down box in search interfaces.
query - the query string the user has typed in. It is parsed and the last term found in it is expanded
operator - the default operator used by the query parser
count - limits the number of results
query
IOException
public List<String> listTerms(String fieldName,
String prefix,
int count)
throws IOException
Returns a list of terms for fields of type FieldConfig.DataType.STRING.
fieldName - the field name for which terms should be listed
prefix - limits the returned list to terms starting with prefix. Set to "" for a full list
count - limits the number of results
IllegalFieldConfigException - if the configuration of fieldName does not match data type STRING or it is unknown
IOException
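The interplay of prefix and count can be sketched on an in-memory term set. This is an illustration only: TermListSketch is hypothetical, and the real method reads terms from the index rather than from a TreeSet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

final class TermListSketch {
    /** Returns up to count terms that start with prefix; "" lists terms from the beginning. */
    static List<String> listTerms(SortedSet<String> terms, String prefix, int count) {
        List<String> out = new ArrayList<String>();
        // tailSet(prefix) starts at the first term >= prefix; matches are contiguous from there.
        for (String term : terms.tailSet(prefix)) {
            if (out.size() >= count || !term.startsWith(prefix)) {
                break;
            }
            out.add(term);
        }
        return out;
    }
}
```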
public List<String> listTerms(String fieldName,
int count)
throws IOException
Returns a list of terms for fields of type FieldConfig.DataType.STRING.
fieldName - the field name for which terms should be listed
count - limits the number of results
IllegalFieldConfigException - if the configuration of fieldName does not match data type STRING or it is unknown
IOException
public SearchResultList search(org.apache.lucene.search.Query query,
org.apache.lucene.search.Sort sort,
boolean loadXml,
Collection<String> fieldsToLoad)
throws IOException
Executes search and returns search results.
query - the previously constructed query
sort - if you want to sort search results, supply a Sort instance that describes the sort order (use newSort(org.apache.lucene.search.SortField...) for that).
Supply null for default sorting (by descending relevance).
loadXml - return the XML blob of search results
fieldsToLoad - a collection of field names that should be made available. null to return all fields
IOException
public SearchResultList search(org.apache.lucene.search.Query query,
org.apache.lucene.search.Sort sort,
boolean loadXml,
String... fieldName)
throws IOException
Executes search and returns search results.
IOException
search(Query,Sort,boolean,Collection)
public SearchResultList search(org.apache.lucene.search.Query query,
org.apache.lucene.search.Sort sort)
throws IOException
Executes search and returns search results.
IOException
search(Query,Sort,boolean,Collection)
public SearchResultList search(org.apache.lucene.search.Query query,
boolean loadXml,
Collection<String> fieldsToLoad)
throws IOException
Executes search and returns search results with default sorting by relevance.
IOException
search(Query,Sort,boolean,Collection)
public SearchResultList search(org.apache.lucene.search.Query query,
boolean loadXml,
String... fieldName)
throws IOException
Executes search and returns search results with default sorting by relevance.
IOException
search(Query,Sort,boolean,String...)
public SearchResultList search(org.apache.lucene.search.Query query)
throws IOException
Executes search and returns search results with default sorting by relevance.
IOException
search(Query,Sort)
public void search(SearchResultCollector collector,
org.apache.lucene.search.Query query,
boolean loadXml,
Collection<String> fieldsToLoad)
throws IOException
Executes search and feeds search results to the supplied SearchResultCollector.
Note: Scores of returned documents are raw scores and not normalized to 0.0<score<=1.0.
collector - a class implementing interface SearchResultCollector
query - the previously constructed query
loadXml - return the XML blob of search results
fieldsToLoad - a collection of field names that should be made available. null to return all fields
IOException
public void search(SearchResultCollector collector,
org.apache.lucene.search.Query query,
boolean loadXml,
String... fieldName)
throws IOException
Executes search and feeds search results to the supplied SearchResultCollector.
This version uses varargs to list the field names to return.
IOException
search(SearchResultCollector,Query,boolean,Collection)
public void search(SearchResultCollector collector,
org.apache.lucene.search.Query query)
throws IOException
Executes search and feeds search results to the supplied SearchResultCollector.
All fields are returned.
IOException
search(SearchResultCollector,Query,boolean,Collection)
public void setCollectorBufferSize(int bufferSize)
Sets the buffer size of search methods using a SearchResultCollector.
The buffer is filled with document ids and scores during search; when it is full, all notifications to the
collector are done in a bulk operation that fetches the document fields from the index. Fetching document fields
for every found document id individually degrades performance by an order of magnitude because of heavy I/O.
Default size is 32768, which is suitable for most use cases. If you expect
large numbers of documents to be processed and you have a lot of memory, increase this value.
The buffer does not hold whole documents; it only contains ids and scores, so large values are fine.
search(SearchResultCollector,Query,boolean,Collection)
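The buffering scheme described for setCollectorBufferSize can be pictured with a small model. The sketch below only counts bulk operations; BufferedNotifySketch, onHit and flush are hypothetical names, and the actual field loading and collector notification are replaced by appending ids to a list.

```java
import java.util.ArrayList;
import java.util.List;

final class BufferedNotifySketch {
    private final int bufferSize;
    private final List<Integer> buffer = new ArrayList<Integer>();
    int bulkFetches = 0;                                    // number of bulk operations so far
    final List<Integer> delivered = new ArrayList<Integer>(); // ids handed on, in order

    BufferedNotifySketch(int bufferSize) {
        if (bufferSize < 1) {
            throw new IllegalArgumentException("bufferSize must be >= 1");
        }
        this.bufferSize = bufferSize;
    }

    /** Called per matching document; only the id is buffered, no fields are loaded yet. */
    void onHit(int docId) {
        buffer.add(docId);
        if (buffer.size() >= bufferSize) {
            flush();
        }
    }

    /** One bulk operation: stands in for fetching fields and notifying the collector. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        bulkFetches++;
        delivered.addAll(buffer);
        buffer.clear();
    }
}
```

With a buffer of 4 and 10 hits, this model performs three bulk operations instead of ten single fetches, which is the effect the buffer size controls.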
public SearchResultItem getDocument(String identifier,
boolean loadXml,
Collection<String> fieldsToLoad)
throws IOException
Reads one document from the index using its identifier. The score of the returned SearchResultItem will be set to 1.0.
IOException
public SearchResultItem getDocument(String identifier,
boolean loadXml,
String... fieldName)
throws IOException
Reads one document from the index using its identifier.
IOException
getDocument(String,boolean,Collection)
public SearchResultItem getDocument(String identifier)
throws IOException
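Reads one document from the index using its identifier.
IOException
getDocument(String,boolean,Collection)
public Config getConfig()
Returns the underlying configuration.
public IndexConfig getIndexConfig()
Returns the underlying index configuration.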
public UUID storeQuery(org.apache.lucene.search.Query query)
Stores a query for later use in the cache.
This method can be used to store a query that a user generated in your web interface for later use, e.g. if you generate a query through the web service interface of panFMP and want to use it later in a Java servlet that generates a geographical map of document locations using the Collector API. In this case, store the query using the web service and supply the hash code as a parameter to the servlet. See the examples for that.
This method returns a UUID. The AXIS web service itself uses Strings (as there is no specification for UUIDs in WSDL files);
you may convert the result to a String with UUID.toString().
query - the query to store.
readStoredQuery(java.util.UUID)
public org.apache.lucene.search.Query readStoredQuery(UUID uuid)
Reads a query identified by a hash code from the cache.
uuid - the UUID returned by storeQuery(org.apache.lucene.search.Query). The AXIS web service itself uses Strings
(as there is no specification for UUIDs in WSDL files); you may convert the String from the web service
to a UUID with UUID.fromString(String).
null if the hash code does not specify a query. This method may return null
even if a query identified by this hash existed in the past, because when the query store is full, older (LRU) queries are removed by
earlier storeQuery(org.apache.lucene.search.Query) calls.
storeQuery(org.apache.lucene.search.Query)
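The store and read behaviour, including the possibility of getting null back after eviction, can be modelled with a bounded LRU map. This is a sketch under assumptions: QueryStoreSketch is hypothetical, it generates random UUIDs (the real identifier is described as a hash code), and it says nothing about how LuceneCache actually implements its store.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

final class QueryStoreSketch<Q> {
    private final Map<UUID, Q> store;

    QueryStoreSketch(final int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        // accessOrder=true keeps the least recently used entry first in iteration order
        this.store = new LinkedHashMap<UUID, Q>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<UUID, Q> eldest) {
                return size() > capacity; // evict the LRU entry when the store is full
            }
        };
    }

    /** Stores the query and returns its identifier (random here, an assumption). */
    synchronized UUID storeQuery(Q query) {
        UUID id = UUID.randomUUID();
        store.put(id, query);
        return id;
    }

    /** Returns the stored query, or null if it is unknown or was evicted. */
    synchronized Q readStoredQuery(UUID id) {
        return store.get(id);
    }
}
```

A caller that passes the identifier through a String-only interface can round-trip it with UUID.toString() and UUID.fromString(String), and should always be prepared for a null result.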
protected org.apache.lucene.search.Query parseQuery(String fieldName,
String query,
org.apache.lucene.queryParser.QueryParser.Operator operator)
throws org.apache.lucene.queryParser.ParseException
Override in a subclass to use another query parser.
fieldName - the expanded field name of the field that is used as default when creating queries (when no prefix notation is used)
query - the query string entered by the user
operator - the default operator passed to QueryParser
org.apache.lucene.queryParser.ParseException
QueryParser