
Querying Riak Search

    Query Syntax

    Search queries use the same syntax as Lucene and support most Lucene operators, including term searches, field searches, boolean operators, grouping, lexicographical range queries, and wildcards (at the end of a word only).

    Querying has two distinct stages, planning and execution. During query planning, the system creates a directed graph of the query, grouping points on the graph in order to maximize data locality and minimize inter-node traffic. Single term queries can be executed on a single node, while range queries are executed using the minimal set of nodes that cover the query.

    As the query executes, Riak Search uses a series of merge-joins, merge-intersections, and filters to generate the resulting set of matching bucket/key pairs.
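
    To make the merge-intersection step concrete, here is a minimal Python sketch that intersects two sorted lists of bucket/key pairs in a single pass, as an AND of two terms might. The function name and the sample data are illustrative assumptions; this is not Riak Search's actual implementation.

```python
def merge_intersect(left, right):
    """Intersect two sorted lists of (bucket, key) pairs in one pass."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            result.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1  # left side is behind, so advance it
        else:
            j += 1  # right side is behind, so advance it
    return result


# Hypothetical result lists for the terms "red" and "blue".
red = [("products", "k1"), ("products", "k3"), ("products", "k7")]
blue = [("products", "k3"), ("products", "k4"), ("products", "k7")]
print(merge_intersect(red, blue))
```

    Because both inputs are already sorted, neither list has to be held in a hash table, which is what makes this style of join attractive for streaming results between nodes.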

    Terms and Phrases

    A query can be as simple as a single term (e.g., “red”) or a series of terms surrounded by quotes, called a phrase (e.g., “See spot run”). The term (or phrase) is analyzed using the default analyzer for the index.

    The index schema contains a setting that determines whether a phrase is treated as an AND operation or an OR operation. By default, a phrase is treated as an OR operation. In other words, a document is returned if it matches any one of the terms in the phrase.

    Fields

    You can specify a field to search by putting it in front of the term or phrase to search. For example:

    color:red
    

    Or:

    title:"See spot run"
    

    You can further specify an index by prefixing the field with the index name. For example:

    products.color:red
    

    Or:

    books.title:"See spot run"
    

    If the value you are searching for contains special characters such as '+', '-', '/', '[', ']', '(', ')', ':', or a space, either surround the value in single quotes or escape each special character with a backslash.

    books.url:'http://mycompany.com/url/to/my-book#foo'
    

    -or-

    books.url:http\:\/\/mycompany.com\/url\/to\/my\-book\#foo
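
    When queries are built programmatically, the backslash escaping can be automated. The following Python sketch escapes only the characters listed above; the helper name escape_term is hypothetical, and also escaping the backslash itself is an assumption made so that existing backslashes are not misread.

```python
# Characters listed in the text as special, plus the backslash itself
# (escaping the backslash is an assumption, not something the text states).
SPECIAL_CHARS = set('+-/[]():\\ ')


def escape_term(value):
    """Backslash-escape each special character in a query value."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in value)


print("books.url:" + escape_term("http://mycompany.com/url/to/my-book"))
```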
    

    Wildcard Searches

    Terms can include wildcards in the form of an asterisk ( * ) to allow prefix matching, or a question mark ( ? ) to match a single character.

    Currently, the wildcard must come at the end of the term in both cases.

    For example:
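
    As a rough illustration of the two wildcard forms, the Python sketch below translates a term with a trailing wildcard into a regular expression and applies it to a few sample words. The function name and the word list are assumptions made for the example; Riak Search does not necessarily evaluate wildcards this way internally.

```python
import re


def wildcard_to_regex(term):
    """Translate a term with a trailing * or ? into an anchored regex."""
    if term.endswith("*"):
        # Asterisk: prefix match, any number of trailing characters.
        return re.compile(re.escape(term[:-1]) + r".*\Z")
    if term.endswith("?"):
        # Question mark: exactly one trailing character.
        return re.compile(re.escape(term[:-1]) + r".\Z")
    return re.compile(re.escape(term) + r"\Z")


words = ["bus", "busy", "business", "bug"]
print([w for w in words if wildcard_to_regex("bus*").match(w)])
print([w for w in words if wildcard_to_regex("bus?").match(w)])
```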

    Proximity Searches

    Proximity searching allows you to find terms that are within a certain number of words from each other. To specify a proximity search, use the tilde argument on a phrase.

    For example:

    "See spot run"~20
    

    Will find documents that have the words “see”, “spot”, and “run” all within the same block of 20 words.
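
    The "same block of N words" idea can be sketched as a sliding window over the words of a document. This Python snippet is a simplified model under stated assumptions (whitespace tokenization, lowercase comparison, a hypothetical function name); the real analyzer and matching logic may differ.

```python
def within_proximity(text, terms, distance):
    """Return True if all terms occur inside one window of `distance` words."""
    words = text.lower().split()
    wanted = {t.lower() for t in terms}
    for start in range(len(words)):
        window = set(words[start:start + distance])
        if wanted <= window:  # every wanted term is in this window
            return True
    return False


sentence = "See the big spot and then run"
print(within_proximity(sentence, ["see", "spot", "run"], 20))
print(within_proximity(sentence, ["see", "spot", "run"], 3))
```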

    Range Searches

    Range searches allow you to find documents with terms that fall within a specific range. Ranges are calculated lexicographically. Use square brackets to specify an inclusive range, and curly braces to specify an exclusive range.

    The following example will return documents containing the words “red” or “rum”, or any word that sorts between them.

    "field:[red TO rum]"
    

    The following example will return documents with words in between “red” and “rum”:

    "field:{red TO rum}"
    
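
    Because ranges are lexicographic, they follow string ordering rather than numeric ordering. The short Python sketch below mimics the inclusive and exclusive forms with plain string comparison; the function name and word list are illustrative only.

```python
def in_range(term, low, high, inclusive=True):
    """Return True if term falls in the lexicographic range low..high."""
    if inclusive:
        return low <= term <= high  # like field:[low TO high]
    return low < term < high        # like field:{low TO high}


words = ["red", "rose", "rum", "run"]
print([w for w in words if in_range(w, "red", "rum")])
print([w for w in words if in_range(w, "red", "rum", inclusive=False)])
# Strings compare character by character, so "10" sorts between "1" and "9".
print(in_range("10", "1", "9"))
```

    The last line shows why numeric values generally need to be zero-padded before they are indexed if range queries over them are expected to behave numerically.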

    Boosting a Term

    A term (or phrase) can have its score boosted using the caret operator along with an integer boost factor.

    In the following example, documents with the term “red” will have their score boosted:

    red^5 OR blue
    

    Boolean Operators - AND, OR, NOT

    Queries can use the boolean operators AND, OR, and NOT. The boolean operators must be capitalized.

    The following example returns documents containing the words “red” and “blue” but not “yellow”.

    red AND blue AND NOT yellow
    

    The required ( + ) operator can be used in place of “AND”, and the prohibited ( - ) operator can be used in place of “AND NOT”. For example, the query above can be rewritten as:

    +red +blue -yellow
    

    Grouping

    Clauses in a query can be grouped using parentheses. The following query returns documents that contain the terms “red” or “blue”, but not “yellow”:

    (red OR blue) AND NOT yellow
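
    The effect of grouping can be modeled with set operations: OR is a union, AND is an intersection, and AND NOT is a difference. The Python sketch below uses a small, made-up collection of documents to compare the grouped query with the earlier ungrouped one.

```python
# Hypothetical documents, each reduced to the set of terms it contains.
docs = {
    "doc1": {"red", "blue"},
    "doc2": {"red", "yellow"},
    "doc3": {"blue"},
    "doc4": {"green"},
}


def matching(term):
    """Return the set of document IDs that contain the term."""
    return {doc_id for doc_id, terms in docs.items() if term in terms}


# (red OR blue) AND NOT yellow
grouped = (matching("red") | matching("blue")) - matching("yellow")
# red AND blue AND NOT yellow
strict = (matching("red") & matching("blue")) - matching("yellow")
print(sorted(grouped), sorted(strict))
```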
    

    Querying via the Command Line

    To run a single query from the command line, use:

    bin/search-cmd search [INDEX] QUERY
    

    For example:

    bin/search-cmd search books "title:\"See spot run\""
    

    This will display a list of Document ID values matching the query. To conduct a document search, use the search-doc command. For example:

    bin/search-cmd search-doc books "title:\"See spot run\""
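
    Because the query itself contains double quotes, it has to be quoted carefully for the shell. When the command is launched from a script, one way to sidestep manual escaping is to keep the arguments in a list. The Python sketch below uses the standard shlex module only to display the equivalent shell command line; the index and query are taken from the example above.

```python
import shlex

# Each list element is one argument, so the inner double quotes need no escaping.
argv = ["bin/search-cmd", "search", "books", 'title:"See spot run"']

# shlex.join renders the list as a single, safely quoted shell command line.
print(shlex.join(argv))
```

    The same list could be handed to subprocess.run(argv) to execute the command without going through a shell at all.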
    

    Querying via the Erlang Command Line

    To run a query from the Erlang shell, use search:search(Query) or search:search(Index, Query). For example:

    search:search(<<"books">>, <<"author:joyce">>).
    

    This will display a list of Document ID values matching the query. To conduct a document search, use search:search_doc(Index, Query). For example:

    search:search_doc(<<"books">>, <<"author:joyce">>).
    

    Querying via the Solr Interface

    Riak Search supports a Solr-compatible interface for searching documents via HTTP. By default, the select endpoint is located at http://hostname:8098/solr/select.

    Alternatively, the index can be included in the URL, for example http://hostname:8098/solr/INDEX/select.

    The following parameters are supported:

    Limitations on Presort

    When paginating results using presort, note that results can only be sorted by search score or by key order. There is currently no way to presort on an arbitrary field. In practice, if you wish to paginate on some field, build your keys to include that field value and use presort=key.
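
    One way to build keys that paginate in a useful order is to put the sort field first and pad it to a fixed width. The Python sketch below assumes keys are compared as plain strings and uses a hypothetical price field and product ID; adapt the layout to your own data.

```python
def make_key(price_cents, product_id):
    """Build a key that sorts by price first, then by product ID.

    Zero-padding keeps string order consistent with numeric order.
    """
    return "{:010d}_{}".format(price_cents, product_id)


keys = [make_key(1500, "b"), make_key(999, "a"), make_key(20000, "c")]
print(sorted(keys))
```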

    To query data in the system with curl:

    curl "http://localhost:8098/solr/books/select?start=0&rows=10000&q=prog*"
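
    The same request can be assembled in code. The Python sketch below builds the select URL with the standard library so that the query string is encoded correctly; the helper name is hypothetical, and only the q, start, rows, and presort parameters that appear on this page are used.

```python
from urllib.parse import urlencode


def build_select_url(host, index, query, start=0, rows=10, presort=None):
    """Build a select URL for the Solr-compatible endpoint of an index."""
    params = {"q": query, "start": start, "rows": rows}
    if presort is not None:
        params["presort"] = presort  # e.g. "key", as in the presort notes
    return "http://{}:8098/solr/{}/select?{}".format(
        host, index, urlencode(params))


url = build_select_url("localhost", "books", "prog*", rows=10000)
print(url)
```

    The resulting URL can be fetched with any HTTP client, for example urllib.request.urlopen(url) against a running node.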
    

    Querying via the Riak Client API

    The Riak Client APIs have been updated to support querying of Riak Search. See the client documentation for more information. Currently, the Ruby, Python, PHP, and Erlang clients are supported.

    The API takes a default search index as well as a search query, and returns a list of bucket/key pairs. Some clients transform this list into objects specific to that client.

    Querying Integrated with Map/Reduce

    The Riak Client APIs that integrate with Riak Search also support using a search query to generate inputs for a map/reduce operation. This allows you to perform powerful analysis and computation across your data based on a search query. See the client documentation for more information. Currently, the Ruby, Python, PHP, and Erlang clients are supported.

    Kicking off a map/reduce query with the same result set over HTTP would use a POST body like this:

    {
      "inputs": {
                 "bucket":"mybucket",
                 "query":"foo OR bar"
                },
      "query":...
     }
    

    or

    {
      "inputs": {
                 "bucket":"mybucket",
                 "query":"foo OR bar",
                 "filter":"field2:baz"
                },
      "query":...
     }
    

    The phases in the “query” field should be exactly the same as usual. An initial map phase will be given each object matching the search for processing, but an initial link phase or reduce phase will also work.

    The query field specifies the search query. All syntax available in other Search interfaces is available in this query field. The optional filter field specifies the query filter.
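
    A job body of this shape can be generated rather than written by hand. The Python sketch below assembles the inputs section from a bucket, a query, and an optional filter; the helper name is hypothetical, and the single JavaScript map phase is only a placeholder for whatever phases your job needs.

```python
import json


def search_mapred_body(bucket, query, phases, filter_query=None):
    """Build the JSON body for a map/reduce job fed by a search query."""
    inputs = {"bucket": bucket, "query": query}
    if filter_query is not None:
        inputs["filter"] = filter_query  # optional query filter
    return json.dumps({"inputs": inputs, "query": phases})


# Placeholder phase list (an assumption): one built-in JavaScript map phase.
phases = [{"map": {"language": "javascript",
                   "name": "Riak.mapValuesJson",
                   "keep": True}}]
print(search_mapred_body("mybucket", "foo OR bar", phases, "field2:baz"))
```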

    The old but still functioning syntax is:

    {
      "inputs": {
                 "module":"riak_search",
                 "function":"mapred_search",
                 "arg":["customers","first_name:john"]
                },
      "query":...
     }
    

    The “arg” field of the inputs specification is always a two-element list. The first element is the name of the bucket you wish to search, and the second element is the query to search for.

    Querying via HTTP/Curl

    Developers who are using a language without an official Riak API or prefer to use the pure HTTP API can still execute a search-based map/reduce operation.

    The syntax is fairly simple. In the “inputs” section of your map/reduce query, use the new “modfun” specification, naming “riak_search” as your module, “mapred_search” as your function, and your index and query as the arguments.

    For example, if you wanted to search the “customers” bucket for objects that had the text “john” in their “first_name” field, you would normally issue a Solr query like:

    $ curl "http://localhost:8098/solr/customers/select?q=first_name:john"
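
    For comparison, a search-based map/reduce request for the same data could be prepared as follows. This Python sketch only constructs the HTTP request and does not send it; the /mapred URL and the placeholder map phase are assumptions, while the inputs follow the module, function, and argument layout described above.

```python
import json
import urllib.request

# Inputs in the "modfun" form: module, function, and [index, query] arguments.
inputs = {
    "module": "riak_search",
    "function": "mapred_search",
    "arg": ["customers", "first_name:john"],
}

# Placeholder phase list (an assumption); substitute your own phases.
phases = [{"map": {"language": "javascript",
                   "name": "Riak.mapValuesJson",
                   "keep": True}}]

request = urllib.request.Request(
    "http://localhost:8098/mapred",  # assumed map/reduce endpoint
    data=json.dumps({"inputs": inputs, "query": phases}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(request.get_method(), request.full_url)
```

    Passing the request to urllib.request.urlopen(request) would submit the job to a running node.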
    

    Query Scoring

    Documents are scored using roughly the formulas described on the Lucene Similarity page.

    The key difference is in how Riak Search calculates the Inverse Document Frequency. The equations described on the Similarity page require knowledge of the total number of documents in a collection. Riak Search does not maintain this information for a collection, so it instead uses the total number of documents associated with each term in the query.
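
    To illustrate the substitution, the Python sketch below starts from the classic Lucene form idf = 1 + ln(numDocs / (docFreq + 1)) and replaces numDocs with the sum of the per-term document counts for the query. That particular estimate, the function name, and the sample counts are assumptions made for illustration; the exact calculation Riak Search performs may differ.

```python
import math


def estimated_idf(term, doc_freqs):
    """Lucene-style idf with the collection size replaced by an estimate.

    doc_freqs maps each query term to the number of documents containing it.
    Using the sum of those counts as the collection size is an assumption.
    """
    estimated_total = sum(doc_freqs.values())
    return 1.0 + math.log(estimated_total / (doc_freqs[term] + 1))


# Hypothetical document counts for the two terms of "red OR blue".
freqs = {"red": 9, "blue": 99}
print(estimated_idf("red", freqs), estimated_idf("blue", freqs))
```

    As with the standard formula, the rarer term receives the higher weight, even though the true collection size is never consulted.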