de.pangaea.metadataportal.search
Class SearchService

java.lang.Object
  extended by de.pangaea.metadataportal.search.SearchService

public class SearchService
extends Object

This class is the main entry point to panFMP's search engine.

To start a query with panFMP do the following:

 import de.pangaea.metadataportal.search.*;
 import org.apache.lucene.search.*;
 import java.util.List;
 ...

 // create a search service
 SearchService service=new SearchService("config.xml", "indexname");
 // build a query
 BooleanQuery bq=service.newBooleanQuery();
 bq.add(service.newDefaultFieldQuery("a search query for the simple search"), BooleanClause.Occur.MUST);
 bq.add(service.newNumericRangeQuery("longitude", -20.0, 10.0), BooleanClause.Occur.MUST);
 bq.add(service.newNumericRangeQuery("latitude", null, 30.5), BooleanClause.Occur.MUST);
 

You have two possibilities to start the search: retrieve the results as a SearchResultList from one of the search(Query,...) methods, or pass a SearchResultCollector to one of the search(SearchResultCollector,Query,...) methods, which feed each result to it.

In their standard forms, such as search(Query,Sort), both approaches return the whole XML document and all stored fields. This is like a select * from table in SQL and is not recommended, especially when collecting a large number of results. It is better to fetch only the fields needed for processing (like a SQL select column1,column2 from table). This can be done with the search(Query,Sort,boolean,Collection) variants, which accept a list of field names.

To configure this class, use search properties in your config file (these are the defaults):

<queryParserClass>org.apache.lucene.queryParser.QueryParser</queryParserClass>
<defaultQueryParserOperator>AND</defaultQueryParserOperator>
More search properties are listed in LuceneCache.

Author:
Uwe Schindler

Field Summary
protected  LuceneCache cache
           
protected  int collectorBufferSize
           
protected  QueryParser.Operator defaultQueryParserOperator
           
protected  IndexConfig index
           
protected  Class<? extends QueryParser> queryParserClass
           
protected  Constructor<? extends QueryParser> queryParserConstructor
           
 
Constructor Summary
SearchService(String cfgFile, String indexId)
          Main constructor that initializes a SearchService.
 
Method Summary
 Config getConfig()
          Return the underlying configuration
 SearchResultItem getDocument(String identifier)
          Reads one document from index using its identifier.
 SearchResultItem getDocument(String identifier, boolean loadXml, Collection<String> fieldsToLoad)
          Reads one document from index using its identifier.
 SearchResultItem getDocument(String identifier, boolean loadXml, String... fieldName)
          Reads one document from index using its identifier.
 IndexConfig getIndexConfig()
          Return the underlying index configuration
 List<String> listTerms(String fieldName, int count)
          Returns a list of terms for fields of type FieldConfig.DataType.STRING.
 List<String> listTerms(String fieldName, String prefix, int count)
          Returns a list of terms for fields of type FieldConfig.DataType.STRING.
 BooleanQuery newBooleanQuery()
          Constructs a BooleanQuery.
 Query newDateRangeQuery(String fieldName, Calendar min, Calendar max)
          Constructs a Query for querying a FieldConfig.DataType.DATETIME field.
 Query newDateRangeQuery(String fieldName, Date min, Date max)
          Constructs a Query for querying a FieldConfig.DataType.DATETIME field.
 Query newDateRangeQuery(String fieldName, String min, String max)
          Constructs a Query for querying a FieldConfig.DataType.DATETIME field.
 Query newDefaultFieldQuery(String query)
          Constructs a Query for querying the default field.
 Query newDefaultFieldQuery(String query, QueryParser.Operator operator)
          Constructs a Query for querying the default field.
 MoreLikeThisQuery newDefaultMoreLikeThisQuery(String identifier)
          Constructs a Query for matching all documents similar to the given one (by identifier).
 SortField newFieldBasedSort(String fieldName, boolean reverse)
          Constructs a SortField instance to sort the results of a query based on a field.
 MoreLikeThisQuery newFieldedMoreLikeThisQuery(String identifier, String fieldName)
          Constructs a Query for matching all documents whose contents on a specific field are similar to the given document's one (by identifier).
 Query newMatchAllDocsQuery()
          Constructs a MatchAllDocsQuery.
 Query newNumericRangeQuery(String fieldName, Double min, Double max)
          Constructs a Query for querying a FieldConfig.DataType.NUMBER field.
 Query newNumericRangeQuery(String fieldName, Number min, Number max)
          Constructs a Query for querying a FieldConfig.DataType.NUMBER field.
 Query newNumericRangeQuery(String fieldName, String min, String max)
          Constructs a Query for querying a FieldConfig.DataType.NUMBER field.
 Sort newSort(SortField... sortFields)
          Constructs a Sort instance to sort the results of a query based on different fields (like a SELECT ... ORDER BY clause in SQL).
 Query newTextQuery(String fieldName, String query)
          Constructs a Query for querying a FieldConfig.DataType.TOKENIZEDTEXT or FieldConfig.DataType.STRING field.
 Query newTextQuery(String fieldName, String query, QueryParser.Operator operator)
          Constructs a Query for querying a FieldConfig.DataType.TOKENIZEDTEXT or FieldConfig.DataType.STRING field.
protected  Query parseQuery(String fieldName, String query, QueryParser.Operator operator)
          Override in a subclass to use another query parser.
 Query readStoredQuery(UUID uuid)
          Reads a query identified by its UUID from the cache.
 SearchResultList search(Query query)
          Executes search and returns search results with default sorting by relevance.
 SearchResultList search(Query query, boolean loadXml, Collection<String> fieldsToLoad)
          Executes search and returns search results with default sorting by relevance.
 SearchResultList search(Query query, boolean loadXml, String... fieldName)
          Executes search and returns search results with default sorting by relevance.
 SearchResultList search(Query query, Sort sort)
          Executes search and returns search results.
 SearchResultList search(Query query, Sort sort, boolean loadXml, Collection<String> fieldsToLoad)
          Executes search and returns search results.
 SearchResultList search(Query query, Sort sort, boolean loadXml, String... fieldName)
          Executes search and returns search results.
 void search(SearchResultCollector collector, Query query)
          Executes search and feeds search results to the supplied SearchResultCollector.
 void search(SearchResultCollector collector, Query query, boolean loadXml, Collection<String> fieldsToLoad)
          Executes search and feeds search results to the supplied SearchResultCollector.
 void search(SearchResultCollector collector, Query query, boolean loadXml, String... fieldName)
          Executes search and feeds search results to the supplied SearchResultCollector.
 void setCollectorBufferSize(int bufferSize)
          Sets the buffer size of search methods using a SearchResultCollector.
 UUID storeQuery(Query query)
          Stores a query for later use in the cache.
 List<String> suggest(String query, int count)
          Returns a list of query strings that can be displayed as a "suggest" drop-down box in search interfaces.
 List<String> suggest(String query, QueryParser.Operator operator, int count)
          Returns a list of query strings that can be displayed as a "suggest" drop-down box in search interfaces.
 List<String> suggest(String fieldName, String query, int count)
          Returns a list of query strings that can be displayed as a "suggest" drop-down box in search interfaces.
 List<String> suggest(String fieldName, String query, QueryParser.Operator operator, int count)
          Returns a list of query strings that can be displayed as a "suggest" drop-down box in search interfaces.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

cache

protected LuceneCache cache

index

protected IndexConfig index

queryParserClass

protected Class<? extends QueryParser> queryParserClass

queryParserConstructor

protected Constructor<? extends QueryParser> queryParserConstructor

defaultQueryParserOperator

protected QueryParser.Operator defaultQueryParserOperator

collectorBufferSize

protected int collectorBufferSize
Constructor Detail

SearchService

public SearchService(String cfgFile,
                     String indexId)
              throws Exception
Main constructor that initializes a SearchService. The underlying LuceneCache is a singleton per config file, so you can create more than one instance of this class without additional memory consumption.

Parameters:
cfgFile - file name and path of configuration file
indexId - id of the index to search, as defined in the configuration file
Throws:
Exception
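
The "singleton per config file" behaviour described above can be pictured with a small registry keyed by file name. This is only a sketch of the general pattern: SharedCacheSketch and forConfig are hypothetical names used for illustration, not part of panFMP, and the real LuceneCache may be organised differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an expensive shared resource; not the real LuceneCache.
final class SharedCacheSketch {
    // One entry per configuration file name.
    private static final Map<String, SharedCacheSketch> INSTANCES = new ConcurrentHashMap<>();

    final String configFile;

    private SharedCacheSketch(String configFile) {
        this.configFile = configFile;
    }

    // Returns the single instance for this config file, creating it on first use.
    static SharedCacheSketch forConfig(String configFile) {
        return INSTANCES.computeIfAbsent(configFile, SharedCacheSketch::new);
    }

    public static void main(String[] args) {
        SharedCacheSketch a = forConfig("config.xml");
        SharedCacheSketch b = forConfig("config.xml");
        SharedCacheSketch c = forConfig("other.xml");
        System.out.println(a == b); // same config file: shared instance
        System.out.println(a == c); // different config file: separate instance
    }
}
```

Under this pattern, several service objects created from the same config file would all share one heavy object, which is why extra instances cost little memory.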
Method Detail

newBooleanQuery

public BooleanQuery newBooleanQuery()
Constructs a BooleanQuery. Use this query type to combine different query types from the factory methods (native Lucene Query instances are usable, too). The current version is equivalent to
 BooleanQuery bq=new BooleanQuery();
 
but direct construction should be avoided so that this class can be extended in the future.


newTextQuery

public Query newTextQuery(String fieldName,
                          String query,
                          QueryParser.Operator operator)
                   throws ParseException
Constructs a Query for querying a FieldConfig.DataType.TOKENIZEDTEXT or FieldConfig.DataType.STRING field. String fields are not parsed by the query parser; they are matched exactly. Tokenized text fields are parsed by parseQuery(java.lang.String, java.lang.String, org.apache.lucene.queryParser.QueryParser.Operator) and expand to different query types combined by a BooleanQuery. The query parser will use the given default operator.

Throws:
IllegalFieldConfigException - if the configuration of fieldName does not match this query type or it is unknown
ParseException

newTextQuery

public Query newTextQuery(String fieldName,
                          String query)
                   throws ParseException
Constructs a Query for querying a FieldConfig.DataType.TOKENIZEDTEXT or FieldConfig.DataType.STRING field. String fields are not parsed by the query parser; they are matched exactly. Tokenized text fields are parsed by parseQuery(java.lang.String, java.lang.String, org.apache.lucene.queryParser.QueryParser.Operator) and expand to different query types combined by a BooleanQuery. The query parser uses the default query operator (AND), which can be configured by the search property "defaultQueryParserOperator".

Throws:
IllegalFieldConfigException - if the configuration of fieldName does not match this query type or it is unknown
ParseException

newDefaultFieldQuery

public Query newDefaultFieldQuery(String query,
                                  QueryParser.Operator operator)
                           throws ParseException
Constructs a Query for querying the default field. The query is parsed by parseQuery(java.lang.String, java.lang.String, org.apache.lucene.queryParser.QueryParser.Operator) and expands to different query types combined by a BooleanQuery. The query parser will use the given default operator.

Throws:
ParseException

newDefaultFieldQuery

public Query newDefaultFieldQuery(String query)
                           throws ParseException
Constructs a Query for querying the default field. The query is parsed by parseQuery(java.lang.String, java.lang.String, org.apache.lucene.queryParser.QueryParser.Operator) and expands to different query types combined by a BooleanQuery. The query parser uses the default query operator (AND), which can be configured by the search property "defaultQueryParserOperator".

Throws:
ParseException

newDefaultMoreLikeThisQuery

public MoreLikeThisQuery newDefaultMoreLikeThisQuery(String identifier)
Constructs a Query for matching all documents similar to the given one (by identifier). The default field must have term vectors enabled. The query may be configured by setting its properties after creation.

See Also:
newFieldedMoreLikeThisQuery(java.lang.String, java.lang.String)

newFieldedMoreLikeThisQuery

public MoreLikeThisQuery newFieldedMoreLikeThisQuery(String identifier,
                                                     String fieldName)
Constructs a Query for matching all documents whose contents on a specific field are similar to the given document's one (by identifier). This is based on the indexed terms in the given field name. The field must have term vectors enabled. The query may be configured by setting its properties after creation.

See Also:
newDefaultMoreLikeThisQuery(java.lang.String)

newDateRangeQuery

public Query newDateRangeQuery(String fieldName,
                               Date min,
                               Date max)
Constructs a Query for querying a FieldConfig.DataType.DATETIME field.

Parameters:
min - Minimum value as Date or null if lower bound open
max - Maximum value as Date or null if upper bound open
Throws:
IllegalFieldConfigException - if the configuration of fieldName does not match this query type or it is unknown

newDateRangeQuery

public Query newDateRangeQuery(String fieldName,
                               Calendar min,
                               Calendar max)
Constructs a Query for querying a FieldConfig.DataType.DATETIME field.

Parameters:
min - Minimum value as Calendar or null if lower bound open; the Calendar is internally converted to a Date
max - Maximum value as Calendar or null if upper bound open; the Calendar is internally converted to a Date
Throws:
IllegalFieldConfigException - if the configuration of fieldName does not match this query type or it is unknown
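
The Calendar-to-Date conversion mentioned for min and max can be sketched with the standard Calendar.getTime() call, keeping null as the marker for an open bound. CalendarBoundSketch and toDate are hypothetical helper names; how panFMP performs the conversion internally is not shown in this documentation.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical helper, for illustration only.
final class CalendarBoundSketch {
    // Converts a Calendar bound to a Date; null stays null (open bound).
    static Date toDate(Calendar bound) {
        return (bound == null) ? null : bound.getTime();
    }

    public static void main(String[] args) {
        Calendar min = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        min.clear();                          // reset all fields, including time of day
        min.set(2000, Calendar.JANUARY, 1);   // lower bound: 2000-01-01 00:00 UTC
        System.out.println(toDate(min));      // closed lower bound as a Date
        System.out.println(toDate(null));     // open upper bound stays null
    }
}
```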

newDateRangeQuery

public Query newDateRangeQuery(String fieldName,
                               String min,
                               String max)
                        throws ParseException
Constructs a Query for querying a FieldConfig.DataType.DATETIME field.

Parameters:
min - Minimum value as String or null if lower bound open; the String is parsed by LenientDateParser
max - Maximum value as String or null if upper bound open; the String is parsed by LenientDateParser
Throws:
IllegalFieldConfigException - if the configuration of fieldName does not match this query type or it is unknown
ParseException - if one of the boundaries is not a parseable date string

newNumericRangeQuery

public Query newNumericRangeQuery(String fieldName,
                                  Double min,
                                  Double max)
Constructs a Query for querying a FieldConfig.DataType.NUMBER field.

Parameters:
min - Minimum value as Double or null if lower bound open
max - Maximum value as Double or null if upper bound open
Throws:
IllegalFieldConfigException - if the configuration of fieldName does not match this query type or it is unknown

newNumericRangeQuery

public Query newNumericRangeQuery(String fieldName,
                                  Number min,
                                  Number max)
Constructs a Query for querying a FieldConfig.DataType.NUMBER field.

Parameters:
min - Minimum value as Number or null if lower bound open; the Number is internally converted to a Double, but it can be any numeric Java type
max - Maximum value as Number or null if upper bound open; the Number is internally converted to a Double, but it can be any numeric Java type
Throws:
IllegalFieldConfigException - if the configuration of fieldName does not match this query type or it is unknown

newNumericRangeQuery

public Query newNumericRangeQuery(String fieldName,
                                  String min,
                                  String max)
Constructs a Query for querying a FieldConfig.DataType.NUMBER field.

Parameters:
min - Minimum value as String or null if lower bound open; the String is parsed according to Java standard
max - Maximum value as String or null if upper bound open; the String is parsed according to Java standard
Throws:
IllegalFieldConfigException - if the configuration of fieldName does not match this query type or it is unknown
NumberFormatException - if one of the boundaries is not a parseable numeric string
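
As a rough illustration of the String variant, the sketch below assumes that "parsed according to Java standard" means Double.valueOf(String); this documentation does not state which parser is actually used. NumericStringBoundSketch and parseBound are hypothetical names.

```java
// Hypothetical helper, for illustration only.
final class NumericStringBoundSketch {
    // Parses a bound; null stays null (open bound).
    // Throws NumberFormatException if the text is not a valid number.
    static Double parseBound(String bound) {
        return (bound == null) ? null : Double.valueOf(bound);
    }

    public static void main(String[] args) {
        System.out.println(parseBound("30.5"));  // closed bound
        System.out.println(parseBound(null));    // open bound
        try {
            parseBound("thirty");
        } catch (NumberFormatException e) {
            System.out.println("not a number: " + e.getMessage());
        }
    }
}
```

Because NumberFormatException is unchecked, callers that pass user input through may want to catch it explicitly, as in the last call.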

newMatchAllDocsQuery

public Query newMatchAllDocsQuery()
Constructs a MatchAllDocsQuery. Use this to generate a query that matches all documents. It is useful under two circumstances:

The current version is equivalent to

 Query q=new MatchAllDocsQuery();
 
but this should be avoided to make further extensions to this class possible (e.g. in future indexes may contain documents marked as "deleted" that should not be returned).


newFieldBasedSort

public SortField newFieldBasedSort(String fieldName,
                                   boolean reverse)
Constructs a SortField instance to sort the results of a query based on a field.

Throws:
IllegalFieldConfigException - if fieldName is unknown or its configuration is not valid for sorting. The field must be indexed but not tokenized; it does not need to be stored (unless you want it returned with the rest of your document data).

Please note: You cannot sort fields that are only stored!


newSort

public Sort newSort(SortField... sortFields)
Constructs a Sort instance to sort the results of a query based on different fields (like a SELECT ... ORDER BY clause in SQL). This implementation constructs the Sort so that two search results are ordered by relevance if all other criteria are equal.

Parameters:
sortFields - a VARARG parameter consisting of a number of previously generated SortField (using newFieldBasedSort(java.lang.String, boolean))
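
The tie-breaking rule described above (field order first, relevance when the fields are equal) can be sketched with plain Java comparators. Hit, SortTieBreakSketch and the field names are hypothetical and only model the ordering; they are not panFMP or Lucene classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SortTieBreakSketch {
    // Hypothetical result record: one sortable field value plus a relevance score.
    static final class Hit {
        final String title;
        final double score;
        Hit(String title, double score) { this.title = title; this.score = score; }
    }

    // Orders by title ascending; hits with equal titles fall back to descending score.
    static final Comparator<Hit> BY_TITLE_THEN_RELEVANCE =
            Comparator.comparing((Hit h) -> h.title)
                      .thenComparing(Comparator.comparingDouble((Hit h) -> h.score).reversed());

    // Returns "title:score" labels in sorted order, leaving the input untouched.
    static List<String> sortedLabels(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(BY_TITLE_THEN_RELEVANCE);
        List<String> labels = new ArrayList<>();
        for (Hit h : copy) labels.add(h.title + ":" + h.score);
        return labels;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(new Hit("b", 0.2), new Hit("a", 0.5), new Hit("a", 0.9));
        System.out.println(sortedLabels(hits));
    }
}
```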

suggest

public List<String> suggest(String fieldName,
                            String query,
                            QueryParser.Operator operator,
                            int count)
                     throws IOException
Returns a list of query strings that can be displayed as a "suggest" drop-down box in search interfaces.

Parameters:
fieldName - contains the field name for a field-specific input field. If you want suggestions for the default field, use suggest(String,int)
query - is the query string the user has typed in. It will be parsed and the last term found in it is expanded
operator - is the default operator used by the query parser
count - limits the number of results
Returns:
a list of query strings correlated to the parameter query
Throws:
IllegalFieldConfigException - if the configuration of fieldName does not match data type TOKENIZEDTEXT or it is unknown
IOException
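
To make "the last term found in it is expanded" concrete, here is a simplified sketch. It assumes terms are separated by single spaces and that the index terms are available as a sorted set; the real method parses the query with the query parser and reads terms from the index, so SuggestSketch is only a hypothetical model of the idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class SuggestSketch {
    // Expands the last space-separated word of the query against the given terms.
    static List<String> suggest(SortedSet<String> indexTerms, String query, int count) {
        List<String> suggestions = new ArrayList<>();
        int cut = query.lastIndexOf(' ') + 1;       // start of the last word
        String head = query.substring(0, cut);      // everything before it, kept as typed
        String prefix = query.substring(cut);       // the partial word to expand
        if (prefix.isEmpty()) return suggestions;   // nothing to expand
        // In a sorted set, all terms sharing the prefix are contiguous from tailSet(prefix).
        for (String term : indexTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix) || suggestions.size() >= count) break;
            suggestions.add(head + term);
        }
        return suggestions;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(
                Arrays.asList("salt", "sea", "sediment", "sedimentation", "seismic"));
        System.out.println(suggest(terms, "marine sed", 5));
    }
}
```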

suggest

public List<String> suggest(String fieldName,
                            String query,
                            int count)
                     throws IOException
Returns a list of query strings that can be displayed as a "suggest" drop-down box in search interfaces. The query parser uses the default query operator (AND), which can be configured by the search property "defaultQueryParserOperator".

Parameters:
fieldName - contains the field name for a field-specific input field. If you want suggestions for the default field, use suggest(String,int)
query - is the query string the user has typed in. It will be parsed and the last term found in it is expanded
count - limits the number of results
Returns:
a list of query strings correlated to the parameter query
Throws:
IllegalFieldConfigException - if the configuration of fieldName does not match data type TOKENIZEDTEXT or it is unknown
IOException

suggest

public List<String> suggest(String query,
                            int count)
                     throws IOException
Returns a list of query strings that can be displayed as a "suggest" drop-down box in search interfaces. The query parser uses the default query operator (AND), which can be configured by the search property "defaultQueryParserOperator".

Parameters:
query - is the query string the user has typed in. It will be parsed and the last term found in it is expanded
count - limits the number of results
Returns:
a list of query strings correlated to the parameter query
Throws:
IOException

suggest

public List<String> suggest(String query,
                            QueryParser.Operator operator,
                            int count)
                     throws IOException
Returns a list of query strings that can be displayed as a "suggest" drop-down box in search interfaces.

Parameters:
query - is the query string the user has typed in. It will be parsed and the last term found in it is expanded
operator - is the default operator used by the query parser
count - limits the number of results
Returns:
a list of query strings correlated to the parameter query
Throws:
IOException

listTerms

public List<String> listTerms(String fieldName,
                              String prefix,
                              int count)
                       throws IOException
Returns a list of terms for fields of type FieldConfig.DataType.STRING.

Parameters:
fieldName - contains the field name for which terms should be listed.
prefix - limits the returned list to terms starting with prefix. Set to "" for a full list.
count - limits the number of results
Throws:
IllegalFieldConfigException - if the configuration of fieldName does not match data type STRING or it is unknown
IOException
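
The roles of prefix and count can be illustrated on an in-memory list. This sketch assumes the terms are already sorted and simply filters them; ListTermsSketch is a hypothetical name, and the real method reads the terms from the index.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class ListTermsSketch {
    // Keeps terms starting with prefix, at most count of them.
    // An empty prefix matches every term, so "" yields the first count terms.
    static List<String> listTerms(List<String> sortedTerms, String prefix, int count) {
        return sortedTerms.stream()
                .filter(term -> term.startsWith(prefix))
                .limit(count)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("basalt", "clay", "sand", "sandstone", "silt");
        System.out.println(listTerms(terms, "sa", 10)); // [sand, sandstone]
        System.out.println(listTerms(terms, "", 3));    // [basalt, clay, sand]
    }
}
```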

listTerms

public List<String> listTerms(String fieldName,
                              int count)
                       throws IOException
Returns a list of terms for fields of type FieldConfig.DataType.STRING.

Parameters:
fieldName - contains the field name for which terms should be listed.
count - limits the number of results
Throws:
IllegalFieldConfigException - if the configuration of fieldName does not match data type STRING or it is unknown
IOException

search

public SearchResultList search(Query query,
                               Sort sort,
                               boolean loadXml,
                               Collection<String> fieldsToLoad)
                        throws IOException
Executes search and returns search results. If the query was previously executed, it may return the results from cache.

Parameters:
query - the previously constructed query
sort - to sort the search results, supply a Sort instance that describes the sort order (use newSort(org.apache.lucene.search.SortField...) to build one). Supply null for default sorting (by relevance, most relevant first).
loadXml - return the XML blob of search results.
fieldsToLoad - a collection of field names that should be made available. null to return all fields.
Throws:
IOException

search

public SearchResultList search(Query query,
                               Sort sort,
                               boolean loadXml,
                               String... fieldName)
                        throws IOException
Executes search and returns search results. If the query was previously executed, it may return the results from cache. This version uses VARARGs to list field names to return.

Throws:
IOException
See Also:
search(Query,Sort,boolean,Collection)

search

public SearchResultList search(Query query,
                               Sort sort)
                        throws IOException
Executes search and returns search results. If the query was previously executed, it may return the results from cache. All fields are returned.

Throws:
IOException
See Also:
search(Query,Sort,boolean,Collection)

search

public SearchResultList search(Query query,
                               boolean loadXml,
                               Collection<String> fieldsToLoad)
                        throws IOException
Executes search and returns search results with default sorting by relevance. If the query was previously executed, it may return the results from cache.

Throws:
IOException
See Also:
search(Query,Sort,boolean,Collection)

search

public SearchResultList search(Query query,
                               boolean loadXml,
                               String... fieldName)
                        throws IOException
Executes search and returns search results with default sorting by relevance. If the query was previously executed, it may return the results from cache. This version uses VARARGs to list field names to return.

Throws:
IOException
See Also:
search(Query,Sort,boolean,String...)

search

public SearchResultList search(Query query)
                        throws IOException
Executes search and returns search results with default sorting by relevance. If the query was previously executed, it may return the results from cache. All fields are returned.

Throws:
IOException
See Also:
search(Query,Sort)

search

public void search(SearchResultCollector collector,
                   Query query,
                   boolean loadXml,
                   Collection<String> fieldsToLoad)
            throws IOException
Executes search and feeds search results to the supplied SearchResultCollector.

Note: Scores of returned documents are raw scores and not normalized to 0.0<score<=1.0.

Parameters:
collector - a class implementing interface SearchResultCollector
query - the previously constructed query
loadXml - return the XML blob of search results
fieldsToLoad - a collection of field names that should be made available. null to return all fields.
Throws:
IOException
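
If a collector needs scores in the range 0.0<score<=1.0, it has to rescale the raw values itself. The sketch below assumes that normalization means dividing each raw score by the largest one and that all raw scores are positive; this is one common convention, not necessarily the one used by the SearchResultList-based methods. ScoreNormalizationSketch is a hypothetical name.

```java
import java.util.Arrays;

final class ScoreNormalizationSketch {
    // Scales raw scores so that the best hit gets 1.0 (assumes positive raw scores).
    static double[] normalize(double[] rawScores) {
        double max = 0.0;
        for (double s : rawScores) max = Math.max(max, s);
        if (max <= 0.0) return rawScores.clone();   // nothing sensible to scale by
        double[] normalized = new double[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            normalized[i] = rawScores[i] / max;
        }
        return normalized;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(normalize(new double[] {4.0, 2.0, 1.0})));
    }
}
```

Note that this needs the maximum score, which is only known after all hits have been collected.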

search

public void search(SearchResultCollector collector,
                   Query query,
                   boolean loadXml,
                   String... fieldName)
            throws IOException
Executes search and feeds search results to the supplied SearchResultCollector. This version uses VARARGs to list field names to return.

Throws:
IOException
See Also:
search(SearchResultCollector,Query,boolean,Collection)

search

public void search(SearchResultCollector collector,
                   Query query)
            throws IOException
Executes search and feeds search results to the supplied SearchResultCollector. All fields are returned.

Throws:
IOException
See Also:
search(SearchResultCollector,Query,boolean,Collection)

setCollectorBufferSize

public void setCollectorBufferSize(int bufferSize)
Sets the buffer size of search methods using a SearchResultCollector. During a search the buffer is filled with document ids and scores; when it is full, all pending notifications are delivered to the collector in one bulk operation, which fetches the document fields from the index. Fetching document fields for every single document id as it is found degrades performance by an order of magnitude because of heavy I/O.

The default size is 32768, which is suitable for most use cases. If you expect a large number of documents to be processed and you have a lot of memory, increase this value. The buffer does not hold whole documents, only ids and scores, so large values are fine.

See Also:
search(SearchResultCollector,Query,boolean,Collection)
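
The buffering scheme described above can be sketched as follows. BufferedCollectSketch is a hypothetical model: it stores only ids and scores and records how many entries each bulk step handles, where the real implementation would load the document fields and notify the SearchResultCollector.

```java
import java.util.ArrayList;
import java.util.List;

final class BufferedCollectSketch {
    private final int[] ids;
    private final float[] scores;
    private int filled = 0;
    // Size of each bulk flush, recorded only to make the behaviour visible.
    final List<Integer> flushSizes = new ArrayList<>();

    BufferedCollectSketch(int bufferSize) {
        ids = new int[bufferSize];
        scores = new float[bufferSize];
    }

    // Called once per matching document; only the id and score are kept.
    void collect(int docId, float score) {
        ids[filled] = docId;
        scores[filled] = score;
        if (++filled == ids.length) flush();
    }

    // Bulk step: this is where documents would be fetched and passed on.
    // Must also be called once at the end of the search to drain the remainder.
    void flush() {
        if (filled == 0) return;
        flushSizes.add(filled);
        filled = 0;
    }

    public static void main(String[] args) {
        BufferedCollectSketch collector = new BufferedCollectSketch(4);
        for (int docId = 0; docId < 10; docId++) collector.collect(docId, 1.0f);
        collector.flush();
        System.out.println(collector.flushSizes); // [4, 4, 2]
    }
}
```

Two parallel primitive arrays keep the per-hit cost to a few bytes, which matches the remark that large buffer sizes are acceptable.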

getDocument

public SearchResultItem getDocument(String identifier,
                                    boolean loadXml,
                                    Collection<String> fieldsToLoad)
                             throws IOException
Reads one document from index using its identifier. The score of the returned SearchResultItem will be set to 1.0.

Throws:
IOException

getDocument

public SearchResultItem getDocument(String identifier,
                                    boolean loadXml,
                                    String... fieldName)
                             throws IOException
Reads one document from index using its identifier. This version uses VARARGs to list field names to return.

Throws:
IOException
See Also:
getDocument(String,boolean,Collection)

getDocument

public SearchResultItem getDocument(String identifier)
                             throws IOException
Reads one document from index using its identifier. Returns all stored fields and XML.

Throws:
IOException
See Also:
getDocument(String,boolean,Collection)

getConfig

public Config getConfig()
Return the underlying configuration


getIndexConfig

public IndexConfig getIndexConfig()
Return the underlying index configuration


storeQuery

public UUID storeQuery(Query query)
Stores a query for later use in the cache. The query can be retrieved again using the returned UUID.

This function can be used to store a query that a user generated in your web interface for later use. For example, you may generate a query through the web service interface of panFMP and later use it in a Java Servlet that draws a geographical map of document locations using the Collector API. In this case you store the query using the web service and supply the returned identifier as a parameter to the servlet. See the examples for that.

This method returns a UUID. The AXIS web service itself uses Strings (as there is no specification for UUIDs in WSDL files); you may convert the result to a String with UUID.toString().

Parameters:
query - the query to store.
Returns:
a UUID code identifying the query.
See Also:
readStoredQuery(java.util.UUID)
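
The String conversion mentioned above uses only standard java.util.UUID methods. The sketch shows the round trip a UUID would make when it is passed through a String-based interface; UuidStringSketch, toWire and fromWire are hypothetical helper names.

```java
import java.util.UUID;

final class UuidStringSketch {
    // UUID -> String, e.g. before handing the id to a String-based web service.
    static String toWire(UUID id) {
        return id.toString();
    }

    // String -> UUID, e.g. for a request parameter received by a servlet.
    // Throws IllegalArgumentException if the text is not a valid UUID.
    static UUID fromWire(String text) {
        return UUID.fromString(text);
    }

    public static void main(String[] args) {
        UUID original = UUID.randomUUID();
        String wire = toWire(original);
        UUID restored = fromWire(wire);
        System.out.println(wire + " round-trips: " + original.equals(restored));
    }
}
```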

readStoredQuery

public Query readStoredQuery(UUID uuid)
Reads a query identified by its UUID from the cache.

Parameters:
uuid - the UUID returned by storeQuery(org.apache.lucene.search.Query). The AXIS web service itself uses Strings (as there is no specification for UUIDs in WSDL files); you may convert a String from the web service to a UUID with UUID.fromString(String).
Returns:
the stored query, or null if the UUID does not identify a stored query. This method may return null even if a query with this UUID existed in the past: when the query store is full, older (least recently used) queries are removed by storeQuery(org.apache.lucene.search.Query) calls made in the meantime.
See Also:
storeQuery(org.apache.lucene.search.Query)
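
Why a previously valid UUID can later yield null is easiest to see in a small LRU store. The sketch below uses LinkedHashMap in access order with a fixed capacity and stores plain Strings instead of Query objects; LruQueryStoreSketch is a hypothetical model, and panFMP's actual store and capacity setting are not described here. It is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

final class LruQueryStoreSketch {
    private final Map<UUID, String> store;

    LruQueryStoreSketch(final int capacity) {
        // accessOrder=true keeps the least recently used entry first.
        this.store = new LinkedHashMap<UUID, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<UUID, String> eldest) {
                return size() > capacity;   // evict once the store is over capacity
            }
        };
    }

    // Stores the query under a fresh UUID; may evict the least recently used one.
    UUID storeQuery(String query) {
        UUID id = UUID.randomUUID();
        store.put(id, query);
        return id;
    }

    // Returns the query, or null if the UUID is unknown or was evicted.
    String readStoredQuery(UUID id) {
        return store.get(id);
    }

    public static void main(String[] args) {
        LruQueryStoreSketch queries = new LruQueryStoreSketch(2);
        UUID first = queries.storeQuery("q1");
        queries.storeQuery("q2");
        queries.storeQuery("q3");                            // store is full: q1 is evicted
        System.out.println(queries.readStoredQuery(first));  // null
    }
}
```

Callers should therefore always check the result for null and be prepared to rebuild the query.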

parseQuery

protected Query parseQuery(String fieldName,
                           String query,
                           QueryParser.Operator operator)
                    throws ParseException
Override in a subclass to use another query parser.

Parameters:
fieldName - the expanded field name of the field that is used as default when creating queries (when no prefix-notation is used)
query - the query string entered by the user
operator - the default operator passed to QueryParser
Throws:
ParseException
See Also:
QueryParser


Copyright ©2007-2013 panFMP Developers c/o Uwe Schindler