Download Riak 2.0

Using Secondary Indexes

Note: Riak Search preferred for querying

If you're interested in non-primary-key-based querying in Riak, i.e. if you're looking to go beyond straightforward K/V operations, we now recommend Riak Search rather than secondary indexes for a variety of reasons. Riak Search has a far more capacious querying API and can be used with all of Riak's storage backends.

When using the LevelDB or Memory backend (or both at the same time), you can use Riak to retrieve objects using external indexes.

Introduction

Secondary indexing (2i) in Riak gives developers the ability to tag an object stored in Riak, at write time, with one or more queryable values. Those values can be either a binary or string, such as sensor_1_data or admin_user or click_event, or an integer, such as 99 or 141121.

Since key/value data is completely opaque to 2i, applications must attach metadata to objects that tell Riak 2i exactly which attribute(s) to index and what the values of those indexes should be.

Riak Search serves analogous purposes but is quite different because it parses key/value data itself and builds indexes on the basis of Solr schemas.

Features

Note on 2i and strong consistency

Secondary indexes do not currently work with the strong consistency feature introduced in Riak version 2.0. If you store objects in strongly consistent buckets and attach secondary index metadata to those objects, you can still perform strongly consistent operations on those objects but the secondary indexes will be ignored.

When to Use Secondary Indexes

Secondary indexes are useful when you want to find data on the basis of something other than objects' bucket type, bucket, and key, i.e. when you want objects to be discoverable based on more than their location alone.

2i works best for objects whose value is stored in an opaque blob, like a binary file, because those objects don't offer any clues that enable you to discover them later. Indexing enables you to tag those objects and find all objects with the same tag in a specified bucket later on.

2i is thus recommended when your use case requires an easy-to-use search mechanism that does not require a schema (as does Riak Search) and a basic query interface, i.e. an interface that enables an application to tell Riak things like “fetch all objects tagged with the string Milwaukee_Bucks” or “fetch all objects tagged with numbers between 1500 and 1509.”

2i is also recommended if your use case requires anti-entropy. Since secondary indexes are just metadata attached to key/value objects, 2i piggybacks off of read-repair.

When Not to Use Secondary Indexes

Query Interfaces and Examples

Typically, the result set from a 2i query is a list of object keys from the specified bucket that include the index values in question. As we'll see below, when executing range queries in Riak 1.4 or higher, it is possible to retrieve the index values along with the object keys.

Inserting the Object with Secondary Indexes

In this example, the key john_smith is used to store user data in the bucket users, which bears the default bucket type. Let's say that an application would like add a Twitter handle and an email address to this object as secondary indexes.

bucket = client.bucket_type('default').bucket('users')
obj = Riak::RObject.new(bucket, 'john_smith')
obj.content_type = 'application/json'
obj.raw_data = '{"user_data":{ ... }}'
obj.indexes['twitter_bin'] = 'jsmith123'
obj.indexes['email_bin'] = 'jsmith@basho.com'
obj.store

# In the Ruby client (and all clients), if you do not specify a bucket
# type, the client will use the default type. And so the following set
# of commands would be equivalent to the one above:

bucket = client.bucket('users')
# repeat the same commands for building the object
obj.store
Location johnSmithKey = new Location(new Namespace("default", "users"), "john_smith");

// In the Java client (and all clients), if you do not specify a bucket type,
// the client will use the default type. And so the following store command
// would be equivalent to the one above:
Location johnSmithKey = new Location(new Namespace("users"), "john_smith");

RiakObject obj = new RiakObject()
        .setContentType("application/json")
        .setValue(BinaryValue.create("{'user_data':{ ... }}"));

obj.getIndexes().getIndex(StringBinIndex.named("twitter")).add("jsmith123");
obj.getIndexes().getIndex(StringBinIndex.named("email")).add("jsmith@basho.com");

StoreValue store = new StoreValue.Builder(obj)
        .withLocation(johnSmithKey)
        .build();
client.execute(store);
bucket = client.bucket_type('default').bucket('users')
# In the Python client (and all clients), if you do not specify a bucket type,
# the client will use the default type. And so the following store command
# would be equivalent to the one above:
bucket = client.bucket('users')

obj = RiakObject(client, bucket, 'john_smith')
obj.content_type = 'text/plain'
obj.data = '...user data...'
obj.add_index('twitter_bin', 'jsmith123')
obj.add_index('email_bin', 'jsmith@basho.com')
obj.store()
Obj = riakc_obj:new({<<"default">>, <<"users">>},
                    <<"john_smith">>,
                    <<"...user data...">>,
                    <<"text/plain">>),
%% In the Erlang client (and all clients), if you do not specify a bucket type,
%% the client will use the default type. And so the following object would be
%% equivalent to the one above:

Obj = riakc_obj:new(<<"users">>,
                    <<"john_smith">>,
                    <<"...user data...">>,
                    <<"text/plain">>),
MD1 = riakc_obj:get_update_metadata(Obj),
MD2 = riakc_obj:set_secondary_index(
    MD1,
    [{{binary_index, "twitter"}, [<<"jsmith123">>]},
     {{binary_index, "email"}, [<<"jsmith@basho.com">>]}]),
Obj2 = riakc_obj:update_metadata(Obj, MD2),
riakc_pb_socket:put(Pid, Obj2).
curl -XPOST localhost:8098/types/mytype/buckets/users/keys/john_smith \
  -H 'x-riak-index-twitter_bin: jsmith123' \
  -H 'x-riak-index-email_bin: jsmith@basho.com' \
  -H 'Content-Type: application/json' \
  -d '{"userData":"data"}'
Getting started with Riak clients

If you are connecting to Riak using one of Basho's official client libraries, you can find more information about getting started with your client in our quickstart guide.

This has accomplished the following:

Querying the Object with Secondary Indexes

Let's query the users bucket on the basis of Twitter handle to make sure that we can find our stored object:

bucket = client.bucket('users')
bucket.get_index('twitter_bin', 'jsmith123')

# This is equivalent to the following:
bucket = client.bucket_type('default').bucket('users')
bucket.get_index('twitter_bin', 'jsmith123')
Namespace usersBucket = new Namespace("users");
BinIndexQuery biq = new BinIndexQuery.Builder(usersBucket, "twitter", "jsmith123")
        .build();
BinIndexQuery.Response response = client.execute(biq);
List<BinIndexQuery.Response.Entry> entries = response.getEntries();
for (BinIndexQuery.Response.Entry entry : entries) {
    System.out.println(entry.getRiakObjectLocation().getKey());
}
bucket = client.bucket('users') # equivalent to client.bucket_type('default').bucket('users')
bucket.get_index('twitter_bin', 'jsmith123').results
{ok, Results} =
    riakc_pb_socket:get_index(Pid,
                              <<"users">>, %% bucket
                              {binary_index, "twitter"}, %% index name
                              <<"jsmith123">>). %% index
curl localhost:8098/buckets/users/index/twitter_bin/jsmith123

The response:

["john_smith"]
john_smith
['john_smith']
{ok,{index_results_v1,[<<"john_smith">>],
                      undefined,undefined}}.
{
  "keys": [
    "john_smith"
  ]
}

Examples

To run the following examples, make sure that Riak is configured to use an index-capable storage backend, such as LevelDB or Memory.

Indexing Objects

The following example indexes four different objects. Notice that we're storing both integer and string (aka binary) fields. Field names are automatically lowercased, some fields have multiple values, and duplicate fields are automatically de-duplicated, as in the following example:

bucket = client.bucket_type('indexes').bucket('people')

obj1 = Riak::RObject.new(bucket, 'larry')
obj1.content_type = 'text/plain'
obj1.raw_data = 'My name is Larry'
obj1.indexes['field1_bin'] = 'val1'
obj1.indexes['field2_int'] = 1001
obj1.store

obj2 = Riak::RObject.new(bucket, 'moe')
obj2.content_type = 'text/plain'
obj2.raw_data = 'My name is Larry'
obj2.indexes['Field1_bin'] = 'val2'
obj2.indexes['Field2_int'] = 1002
obj2.store

obj3 = Riak::RObject.new(bucket, 'curly')
obj3.content_type = 'text/plain'
obj3.raw_data = 'My name is Curly'
obj3.indexes['FIELD1_BIN'] = 'val3'
obj3.indexes['FIELD2_INT'] = 1003
obj3.store

obj4 = Riak::RObject.new(bucket, 'veronica')
obj4.content_type = 'text/plain'
obj4.raw_data = 'My name is Veronica'
obj4.indexes['field1_bin'] = %w{val4, val4, val4a, val4b}
obj4.indexes['field2_int'] = [1004, 1004, 1005, 1006]
obj4.indexes['field2_int'] = 1004
obj4.indexes['field2_int'] = 1004
obj4.indexes['field2_int'] = 1004
obj4.indexes['field2_int'] = 1007
obj4.store
Namespace peopleBucket = new Namespace("indexes", "people");

RiakObject larry = new RiakObject()
        .setValue(BinaryValue.create("My name is Larry"));
larry.getIndexes().getIndex(StringBinIndex.named("field1")).add("val1");
larry.getIndexes().getIndex(LongIntIndex.named("field2")).add(1001L);
StoreValue storeLarry = new StoreValue.Builder(larry)
        .withLocation(peopleBucket.setKey("larry"))
        .build();
client.execute(storeLarry);

RiakObject moe = new RiakObject()
        .setValue(BinaryValue.create("Ny name is Moe"));
moe.getIndexes().getIndex(StringBinIdex.named("Field1")).add("val2");
moe.getIndexes().getIndex(LongIntIndex.named("Field2")).add(1002L);
StoreValue storeMoe = new StoreValue.Builder(moe)
        .withLocation(peopleBucket.setKey("moe"))
        .build();
client.execute(storeMoe);

RiakObject curly = new RiakObject()
        .setValue(BinaryValue.create("My name is Curly"));
curly.getIndexes().getIndex(StringBinIndex.named("FIELD1")).add("val3");
curly.getIndexes().getIndex(LongIntIndex.named("FIELD2")).add(1003L);
StoreValue storeCurly = new StoreValue.Builder(curly)
        .withLocation(peopleBucket.setKey("curly"))
        .build();
client.execute(storeCurly);

RiakObject veronica = new RiakObject()
        .setValue(BinaryValue.create("My name is Veronica"));
veronica.getIndexes().getIndex(StringBinIndex.named("field1"))
        .add("val4").add("val4");
veronica.getIndexes().getIndex(LongIntIndex.named("field2"))
        .add(1004L).add(1005L).add(1006L).add(1004L).add(1004L).add(1007L);
bucket = client.bucket_type('indexes').bucket('people')

obj1 = RiakObject(client, bucket, 'larry')
obj1.content_type = 'text/plain'
obj1.data = 'My name is Larry'
obj1.add_index('field1_bin', 'val1').add_index('field2_int', 1001)
obj1.store()

obj2 = RiakObject(client, bucket, 'moe')
obj2.content_type = 'text/plain'
obj2data = 'Moe'
obj2.add_index('Field1_bin', 'val2').add_index('Field2_int', 1002)
obj2.store()

obj3 = RiakObject(client, bucket, 'curly')
obj3.content_type = 'text/plain'
obj3.data = 'Curly'
obj3.add_index('FIELD1_BIN', 'val3').add_index('FIELD2_INT', 1003)
obj3.store()

obj4 = RiakObject(client, bucket, 'veronica')
obj4.content_type = 'text/plain'
obj4.data = 'Veronica'
obj4.add_index('field1_bin', 'val4').add_index('field1_bin', 'val4').add_index('field2_int', 1004).add_index('field2_int', 1004).add_index('field2_int', 1005).add_index('field2_int', 1006).add_index('field2_int', 1004).add_index('field2_int', 1004).add_index('field2_int', 1004).add_index('field2_int', 1007)
obj4.store()
Larry = riakc_obj:new(
    {<<"indexes">>, <<"people">>},
    <<"larry">>,
    <<"My name is Larry">>,
    <<"text/plain">>),
LarryMetadata = riakc_obj:get_update_metadata(Larry),
LarryIndexes = riakc_obj:set_secondary_index(
    LarryMetadata,
    [{{binary_index, "field1"}, [<<"val1">>]}, {{integer_index, "field2"}, [1001]}]
),
LarryWithIndexes = riakc_obj:update_metadata(Larry, LarryIndexes).

Moe = riakc_obj:new(
    {<<"indexes">>, <<"people">>},
    <<"moe">>,
    <<"My name is Moe">>,
    <<"text/plain">>),
MoeMetadata = riakc_obj:get_update_metadata(Moe),
MoeIndexes = riakc_obj:set_secondary_index(
    MoeMetadata,
    [{{binary_index, "Field1"}, [<<"val2">>]}, {{integer_index, "Field2"}, [1002]}]
),
MoeWithIndexes = riakc_obj:update_metadata(Moe, MoeIndexes).

Curly = riakc_obj:new(
    {<<"indexes">>, <<"people">>},
    <<"curly">>,
    <<"My name is Curly">>,
    <<"text/plain">>),
CurlyMetadata = riakc_obj:get_update_metadata(Curly),
CurlyIndexes = riakc_obj:set_secondary_index(
    CurlyMetadata,
    [{{binary_index, "FIELD1"}, [<<"val3">>]}, {{integer_index, "FIELD2"}, [1003]}]
),
CurlyWithIndexes = riakc_obj:update_metadata(Curly, CurlyIndexes).

Veronica = riakc_obj:new(
    {<<"indexes">>, <<"people">>},
    <<"veronica">>,
    <<"My name is Veronica">>,
    <<"text/plain">>),
VeronicaMetadata = riakc_obj:get_update_metadata(Veronica),
VeronicaIndexes = riakc_obj:set_secondary_index(
    VeronicaMetadata,
    [{{binary_index, "field1"}, [<<"val4">>]}, {{binary_index, "field1"}, [<<"val4">>]}, {{integer_index, "field2"}, [1004]}, {{integer_index, "field2"}, [1004]}, {{integer_index, "field2"}, [1005]}, {{integer_index, "field2"}, [1006]}, {{integer_index, "field2"}, [1004]}, {{integer_index, "field2"}, [1004]}, {{integer_index, "field2"}, [1007]}]
),
VeronicaWithIndexes = riakc_obj:update_metadata(Veronica, VeronicaIndexes).
curl -v -XPUT localhost:8098/types/indexes/buckets/people/keys/larry \
  -H "x-riak-index-field1_bin: val1" \
  -H "x-riak-index-field2_int: 1001" \
  -d 'My name is Larry'

curl -v -XPUT localhost:8098/types/indexes/buckets/people/keys/moe \
  -H "x-riak-index-Field1_bin: val2" \
  -H "x-riak-index-Field2_int: 1002" \
  -d 'My name is Moe'

curl -v -XPUT localhost:8098/types/indexes/buckets/people/keys/curly \
  -H "X-RIAK-INDEX-FIELD1_BIN: val3" \
  -H "X-RIAK-INDEX-FIELD2_INT: 1003" \
  -d 'My name is Curly'

curl -v -XPUT 127.0.0.1:8098/types/indexes/buckets/people/keys/veronica \
  -H "x-riak-index-field1_bin: val4, val4, val4a, val4b" \
  -H "x-riak-index-field2_int: 1004, 1004, 1005, 1006" \
  -H "x-riak-index-field2_int: 1004" \
  -H "x-riak-index-field2_int: 1004" \
  -H "x-riak-index-field2_int: 1004" \
  -H "x-riak-index-field2_int: 1007" \
  -d 'My name is Veronica'

The above objects will end up having the following secondary indexes, respectively:

As these examples show, there are safeguards in Riak that both normalize the names of indexes and prevent the accumulation of redundant indexes.

Invalid Field Names and Types

The following examples demonstrate what happens when an index field is specified with an invalid field name or type. The system responds with 400 Bad Request and a description of the error.

Invalid field name:

bucket = client.bucket('mybucket')
obj = Riak::RObject.new(bucket, 'mykey')
obj.indexes['field2_foo'] = 1001

# The Ruby client will let you get away with this...at first. But when you
# attempt to store the object, you will get an error response such as this:

NoMethodError: undefined method 'map' for 1001:Fixnum
// The Java client will not allow you to provide invalid index names,
// because you are not required to add "_bin" or "_int" to the end of
// those names
bucket = client.bucket_type('mytype').bucket('mybucket')
obj = RiakObject(client, bucket, 'mykey')
obj.add_index('field2_foo', 1001)

# Result:
riak.RiakError: "Riak 2i fields must end with either '_bin' or '_int'."
Obj = riakc_obj:new(
    {<<"mytype">>, <<"mybucket">>},
    <<"mykey">>,
    <<"some data">>,
    <<"text/plain">>
),
MD1 = riakc_obj:get_update_metadata(Obj),
MD2 = riakc_obj:set_secondary_index(MD1, [{{foo_index, "field2"}, [1001]}]).

%% The Erlang client will return an error message along these lines:
** exception error: no function clause matching
                    riakc_obj:set_secondary_index( ... ).
curl -XPUT 127.0.0.1:8098/types/mytype/buckets/mybucket/keys/mykey \
  -H "x-riak-index-field2_foo: 1001" \
  -d 'data1'

# Response
Unknown field type for field: 'field2_foo'.

Incorrect data type:

bucket = client.bucket('mybucket')
obj = Riak::RObject.new(bucket, 'mykey')
obj.indexes['field2_int'] = 'bar'

# The Ruby client will let you get away with this...at first. But when you
# attempt to store the object, you will get an error response such as this:

NoMethodError: undefined method 'map' for 1001:Fixnum
Location key = new Location(new Namespace("mybucket"), "mykey");
RiakObject obj = new RiakObject();
obj.getIndexes().getIndex(LongIntIndex.named("field2")).add("bar");

// The Java client will return a response indicating a type mismatch.
// The output may look something like this:

Error:(46, 68) java: no suitable method found for add(java.lang.String)
    method com.basho.riak.client.query.indexes.RiakIndex.add(java.lang.Long) is not applicable
      (argument mismatch; java.lang.String cannot be converted to java.lang.Long)
    method com.basho.riak.client.query.indexes.RiakIndex.add(java.util.Collection<java.lang.Long>) is not applicable
      (argument mismatch; java.lang.String cannot be converted to java.util.Collection<java.lang.Long>)
bucket = client.bucket_type('mytype').bucket('mybucket')
obj = RiakObject(client, bucket, 'mykey')
obj.add_index('field2_int', 'bar')

# The Python client will let you get away with this...at first. But when you
# attempt to store the object, you will get an error response such as this:
riak.RiakError: '{precommit_fail,[{field_parsing_failed,{<<"field2_int">>,<<"bar">>}}]}'
Obj = riakc_obj:new(
    {<<"mytype">>, <<"mybucket">>},
    <<"mykey">>,
    <<"some data">>,
    <<"text/plain">>
),
MD1 = riakc_obj:get_update_metadata(Obj),
MD2 = riakc_obj:set_secondary_index(MD1, [{{integer_index, "field2"}, [<<"bar">>]}]).

%% The Erlang client will return an error message along these lines:
** exception error: bad argument
     in function  integer_to_list/1
        called as integer_to_list(<<"bar">>) ...
curl -XPUT 127.0.0.1:8098/types/mytype/buckets/mybucket/keys/mykey \
  -H "x-riak-index-field2_int: bar" \
  -d 'data1'

# Response
HTTP/1.1 400 Bad Request

Could not parse field 'field2_int', value 'bar'.

Querying

Note on 2i queries and the R parameter

For all 2i queries, the R parameter is set to 1, which means that queries that are run while handoffs and related operations are underway may not return all keys as expected.

Exact Match

The following examples perform an exact match index query.

Query a binary index:

bucket = client.bucket_type('mytype').bucket('mybucket')
bucket.get_index('field1_bin', 'val1')
Namespace myBucket = new Namespace("mytype", "mybucket");
BinIndexQuery biq = new BinIndexQuery.Builder(myBucket, "field1", "val1").build();
BinIndexQuery.Response response = client.execute(biq);
bucket = client.bucket_type('mytype').bucket('mybucket')
bucket.get_index('field1_bin', 'val1')
{ok, Results} = riakc_pb_socket:get_index(
    Pid,
    {<<"mytype">>, <<"mybucket">>}, %% bucket type and bucket name
    {binary_index, "field2"},
    <<"val1">>
).
curl localhost:8098/types/mytype/buckets/mybucket/index/field1_bin/val1

Query an integer index:

bucket = client.bucket_type('mytype').bucket('mybucket')
bucket.get_index('field1_int', 1001)
Namespace myBucket = new Namespace("mytype", "mybucket");
IntIndexQuery iiq = new IntIndexQuery.Builder(myBucket, "field1", 1001L)
        .build();
IntIndexQuery.Response response = client.execute(iiq);
bucket = client.bucket_type('mytype').bucket('mybucket')
bucket.get_index('field1_int', 1001)
{ok, Results} = riakc_pb_socket:get_index(
    Pid,
    {<<"mytype">>, <<"mybucket">>}, %% bucket type and bucket name
    {integer_index, "field1"},
    1001
).
curl localhost:8098/types/mytype/buckets/mybucket/index/field2_int/1001

The following example performs an exact match query and pipes the results into a MapReduce job:

curl -XPOST localhost:8098/mapred \
  -H "Content-Type: application/json" \
  -d @-<<EOF
{
  "inputs": {
    "bucket": "mybucket",
    "index": "field1_bin",
    "key":"val3"
  },
  "query": [
    {
      "reduce": {
        "language":"erlang",
        "module": "riak_kv_mapreduce",
        "function": "reduce_identity",
        "keep": true
      }
    }
  ]
}
EOF

Range

The following examples perform a range query.

Query a binary index…

bucket = client.bucket_type('mytype').bucket('mybucket')
bucket.get_index('field1_bin', 'val2'..'val4')
Namespace myBucket = new Namespace("mytype", "mybucket");
BinIndexQuery biq = new BinIndexQuery.Builder(myBucket, "field1", "val2", "val4")
        .build();
BinIndexQuery.Response response = client.execute(biq);
bucket = client.bucket_type('mytype').bucket('mybucket')
bucket.get_index('field1_bin', 'val2', 'val4')
{ok, Results} = riakc_pb_socket:get_index_range(
    Pid,
    {<<"mytype">>, <<"mybucket">>}, %% bucket type and bucket name
    {binary_index, "field1"}, %% index name
    <<"val2">>, <<"val4">> %% range query for keys between "val2" and "val4"
).
curl localhost:8098/types/mytype/buckets/mybucket/index/field1_bin/val2/val4

Or query an integer index…

bucket = client.bucket_type('mytype').bucket('mybucket')
bucket.get_index('field2_int', 1002..1004)
Namespace myBucket = new Namespace("mytype", "mybucket");
IntIndexQuery iiq = new IntIndexQuery.Builder(myBucket, "field2", 1002L, 1004L)
        .build();
IntIndexQuery.Response response = client.execute(iiq);
bucket = client.bucket_type('mytype').bucket('mybucket')
bucket.get_index('field2_int', 1002, 1004)
{ok, Results} = riakc_pb_socket:get_index_range(
    Pid,
    {<<"mytype">>, <<"mybucket">>}, %% bucket type and bucket name
    {integer_index, "field2"}, %% index name
    1002, 1004 %% range query for keys between "val2" and "val4"
).
curl localhost:8098/types/mytype/buckets/mybucket/index/field2_int/1002/1004

The following example performs a range query and pipes the results into a MapReduce job:

curl -XPOST localhost:8098/mapred\
  -H "Content-Type: application/json" \
  -d @-<<EOF
{
  "inputs": {
    "bucket": "mybucket",
    "index": "field1_bin",
    "start": "val2",
    "end": "val4"
  },
  "query": [
    {
      "reduce": {
        "language": "erlang",
        "module": "riak_kv_mapreduce",
        "function": "reduce_identity",
        "keep": true
      }
    }
  ]
}
EOF

Range with terms

When performing a range query, it is possible to retrieve the matched index values alongside the Riak keys using return_terms=true. An example from a small sampling of Twitter data with indexed hash tags:

bucket = client.bucket_type('mytype').bucket('tweets')
bucket.get_index('hashtags_bin', 'rock'..'rocl', return_terms: true)
Namespace tweetsBucket = new Namespace("mytype", "tweets");
BinIndexQuery biq = new BinIndexQuery.Builder(tweetsBucket, "hashtags", "rock", "rocl")
        .withKeyAndIndex(true)
        .build();
BinIndexQuery.Response response = client.execute(biq);
bucket = client.bucket_type('mytype').bucket('tweets')
bucket.get_index('hashtags_bin', 'rock', 'rocl', return_terms=True)
{ok, Results} = riakc_pb_socket:get_index_range(
    Pid,
    {<<"mytype">>, <<"mybucket">>}, %% bucket type and bucket name
    {binary_index, "hashtags"},     %% index name
    <<"rock">>, <<"rocl">>          %% range query for keys between "val2" and "val4"
).
curl localhost:8098/types/mytype/buckets/tweets/index/hashtags_bin/rock/rocl?return_terms=true

Response:

{
  "results": [
    {
      "rock": "349224101224787968"
    },
    {
      "rocks": "349223639880699905"
    }
  ]
}

Pagination

When asking for large result sets, it is often desirable to ask the servers to return chunks of results instead of a firehose. You can do so using max_results=<n>, where n is the number of results you'd like to receive.

Assuming more keys are available, a continuation value will be included in the results to allow the client to request the next page.

Here is an example of a range query with both return_terms and pagination against the same Twitter data set.

bucket = client.bucket_type('mytype').bucket('tweets')
bucket.get_index('hashtags_bin', 'ri'..'ru', max_results: 5)
Namespace tweetsBucket = new Namespace("mytype", "tweets");
BinIndexQuery biq = new BinIndexQuery.Builder(tweetsBucket, "hashtags", "ri", "ru")
        .withMaxResults(5)
        .build();
BinIndexQuery.Response response = client.execute(biq);
bucket = client.bucket_type('mytype').bucket('tweets')
bucket.get_index('hashtags_bin', 'ri', 'ru', max_results=5)
{ok, Results} = riakc_pb_socket:get_index_range(
    Pid,
    {<<"mytype">>, <<"tweets">>}, %% bucket type and bucket name
    {binary_index, "hashtags"}, %% index name
    <<"ri">>, <<"ru">>, %% range query from "ri" to "ru"
    {max_results, 5}
).
curl localhost:8098/types/mytype/buckets/tweets/index/hashtags_bin/ri/ru?max_results=5&return_terms=true

Here is an example JSON response (your client-specific response may differ):

{
  "continuation": "g2gCbQAAAAdyaXBqYWtlbQAAABIzNDkyMjA2ODcwNTcxMjk0NzM=",
  "results": [
    { "rice": "349222574510710785" },
    { "rickross": "349222868095217664" },
    { "ridelife": "349221819552763905" },
    { "ripjake": "349220649341952001" },
    { "ripjake": "349220687057129473" }
  ]
}

Take the continuation value from the previous result set and feed it back into the query.

bucket = client.bucket_type('mytype').bucket('tweets')
bucket.get_index(
  'hashtags_bin',
  'ri'..'ru',
  continuation: 'g2gCbQAAAAdyaXBqYWtlbQAAABIzNDkyMjA2ODcwNTcxMjk0NzM',
  max_results: 5,
  return_terms: true
)
Namespace tweetsBucket = new Namespace("mytype", "tweets");
BinIndexQuery biq = new BinIndexQuery.Builder(tweetsBucket, "hashtags", "ri", "ru")
        .withContinuation(BinaryValue.create("g2gCbQAAAAdyaXBqYWtlbQAAABIzNDkyMjA2ODcwNTcxMjk0NzM"))
        .withMaxResults(5)
        .withKeyAndIndex(true)
        .build();
BinIndexQuery.Response response = client.execute(biq);
{ok, Results} = riakc_pb_socket:get_index_range(
    Pid,
    {<<"mytype">>, <<"tweets">>}, %% bucket type and bucket name
    {binary_index, "hashtags"}, %% index name
    <<"ri">>, <<"ru">>, %% range query from "ri" to "ru"
    [
        {continuation, <<"g2gCbQAAAAdyaXBqYWtlbQAAABIzNDkyMjA2ODcwNTcxMjk0NzM">>},
        {max_results, 5},
        {return_terms, true}
    ]
).
bucket = client.bucket_type('mytype').bucket('tweets')
bucket.get_index(
    'hashtags_bin',
    'ri', 'ru',
    continuation='g2gCbQAAAAdyaXBqYWtlbQAAABIzNDkyMjA2ODcwNTcxMjk0NzM',
    max_results=5,
    return_terms=True
)
curl localhost:8098/types/mytype/buckets/tweets/index/hashtags_bin/ri/ru?continuation=g2gCbQAAAAdyaXBqYWtlbQAAABIzNDkyMjA2ODcwNTcxMjk0NzM=&max_results=5&return_terms=true

The result:

{
    "continuation": "g2gCbQAAAAlyb2Jhc2VyaWFtAAAAEjM0OTIyMzcwMjc2NTkxMjA2NQ==",
    "results": [
        {
            "ripjake": "349221198774808579"
        },
        {
            "ripped": "349224017347100672"
        },
        {
            "roadtrip": "349221207155032066"
        },
        {
            "roastietime": "349221370724491265"
        },
        {
            "robaseria": "349223702765912065"
        }
    ]
}

Streaming

It is also possible to stream results:

bucket = client.bucket_type('mytype').bucket('mybucket')
bucket.get_index('myindex_bin', 'foo', stream: true)
/*
  It is not currently possible to stream results using the Java client
*/
bucket = client.bucket_type('mytype').bucket('mybucket')
keys = []
for key in bucket.stream_index('myindex_bin', 'foo'):
    keys.append(key)
{ok, KeyStream} = riakc_pb_socket:get_index_eq(
    Pid,
    {<<"mytype">>, <<"mybucket">>}, %% bucket type and bucket name
    {binary_index, "myindex"}, %% index name and type
    <<"foo">>, %% value of the index
    [{stream, true}] %% enable streaming
).
curl localhost:8098/types/mytype/buckets/mybucket/index/myindex_bin/foo?stream=true

Streaming can also be combined with pagination and return_terms.

Sorting

As of Riak 1.4, the result set is sorted on index values (when executing range queries) and object keys. See the pagination example above: hash tags (2i keys) are returned in ascending order, and the object keys (Twitter IDs) for the messages which contain the ripjake hash tag are also returned in ascending order.

Retrieve all Bucket Keys via the $bucket Index

The following example retrieves the keys for all objects stored in the bucket mybucket using an exact match on the special $bucket index.

curl localhost:8098/types/mytype/buckets/mybucket/index/\$bucket/_

Count Bucket Objects via $bucket Index

The following example performs a secondary index lookup on the $bucket index like in the previous example and pipes this into a MapReduce that counts the number of records in the mybucket bucket. In order to improve efficiency, the batch size has been increased from the default size of 20.

curl -XPOST localhost:8098/mapred\
  -H "Content-Type: application/json" \
  -d @-<<EOF
{
  "inputs": {
    "bucket": "mybucket",
    "index": "$bucket",
    "key":"mybucket"
  },
  "query": [
             {
               "reduce": {
                 "language": "erlang",
                 "module": "riak_kv_mapreduce",
                 "function": "reduce_count_inputs",
                 "arg": { 
                   "reduce_phase_batch_size":1000
                 }
               }
             }
           ]
}
EOF