Configuring Riak CS

For Riak CS to operate properly it must know how to connect to Riak. A Riak CS node typically runs on the same server as its corresponding Riak node, which means that changes will only be necessary if Riak is configured using non-default settings.

Riak CS's settings reside in the CS node's app.config file, which is typically located in the /etc directory. Parameters specific to Riak CS can be found in the riak_cs section of that file. That section looks something like this:

{riak_cs, [
    {parameter1, value},
    {parameter2, value},
    %% and so on...
]},

The sections below walk you through some of the main configuration categories that you will likely encounter while operating Riak CS. For a comprehensive listing of available parameters, see the Full Configuration Listing section below.

Host and Port

To connect Riak CS to Riak, make sure that the riak_ip and riak_pb_port parameters are set to the host and Protocol Buffers port used by Riak.
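
As a sketch, the host and port entries in the riak_cs section might look like the following. The values shown are the defaults given in the Full Configuration Listing below; substitute the address and Protocol Buffers port of your own Riak node.

```erlang
{riak_cs, [
    %% Other configs

    %% IP address of the Riak node this Riak CS node connects to
    {riak_ip, "127.0.0.1"},
    %% Protocol Buffers port of that Riak node
    {riak_pb_port, 8087},

    %% Other configs
]}
```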

Note on IP addresses

The IP address you enter here must match the IP address specified for the Protocol Buffers interface in the Riak app.config file unless Riak CS is running on a completely different network, in which case address translation is required.

After making any changes to the app.config file in Riak CS, restart the node if it is already running.

Specifying the Stanchion Node

If you're running a single Riak CS node, you don't have to change the Stanchion settings because Stanchion runs on the local host. If your Riak CS system has multiple nodes, however, you must specify the IP address and port for the Stanchion node and whether or not SSL is enabled.

The Stanchion settings reside in the Riak CS app.config file, which is located in the /etc directory of each Riak CS node. The settings appear in the riak_cs config section of the file.

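To set the host and port for Stanchion, set stanchion_ip to the IP address of the Stanchion node and stanchion_port to the port on which the Stanchion node listens.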

Enabling SSL

SSL is disabled by default in Stanchion, i.e. the stanchion_ssl variable is set to false. If Stanchion is configured to use SSL, change this variable to true. The following example configuration would set the Stanchion host to localhost, the port to 8085 (the default), and set up Stanchion to use SSL:

{riak_cs, [
    %% Other configs

    {stanchion_ip, "127.0.0.1"},
    {stanchion_port, 8085},
    {stanchion_ssl, true},

    %% Other configs
]}

Specifying the Node Name

You can also set a more useful name for the Riak CS node, which helps identify the node from which requests originate during troubleshooting. This setting resides in the Riak CS vm.args configuration file, which is also located in the /etc directory. The following line sets the name of the Riak CS node to riak_cs@127.0.0.1:

-name riak_cs@127.0.0.1

Change 127.0.0.1 to the IP address or hostname for the server on which Riak CS is running.

Specifying the Admin User

The admin user is authorized to perform actions such as creating users or obtaining billing statistics. An admin user account is no different from any other user account. You must create an admin user to use Riak CS.

Note on anonymous user creation

Before creating an admin user, you must first set {anonymous_user_creation, true} in the Riak CS app.config. You may disable this again once the admin user has been created.
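
A minimal sketch of that temporary setting, placed in the riak_cs section like the other parameters on this page:

```erlang
{riak_cs, [
    %% Other configs

    %% Temporarily allow user creation without credentials so that
    %% the admin user can be created; set back to false afterwards
    {anonymous_user_creation, true},

    %% Other configs
]}
```

As with any change to app.config, the node must be restarted for the setting to take effect.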

To create an account for the admin user, use an HTTP POST request with the username you want to use for the admin account. The following is an example:

curl -H 'Content-Type: application/json' \
  -XPOST http://localhost:8080/riak-cs/user \
  --data '{"email":"foobar@example.com", "name":"admin user"}'

The JSON response will look something like this:

{
  "Email": "foobar@example.com",
  "DisplayName": "adminuser",
  "KeyId": "324ABC0713CD0B420EFC086821BFAE7ED81442C",
  "KeySecret": "5BE84D7EEA1AEEAACF070A1982DDA74DA0AA5DA7",
  "Name": "admin user",
  "Id":"8d6f05190095117120d4449484f5d87691aa03801cc4914411ab432e6ee0fd6b",
  "Buckets": []
}

You can optionally send and receive XML instead of JSON by setting the Content-Type header to application/xml.

Once the admin user exists, you must specify the admin user's credentials on each node in the Riak CS system. The admin credential settings reside in the Riak CS app.config file, which is located in the etc/riak-cs directory, and appear in the riak_cs section of that file. Paste the KeyId string from the JSON response between the quotes for admin_key, and paste the KeySecret string between the quotes for admin_secret, as shown here:

%% Admin user credentials
{admin_key, "324ABC0713CD0B420EFC086821BFAE7ED81442C"},
{admin_secret, "5BE84D7EEA1AEEAACF070A1982DDA74DA0AA5DA7"},

These are the same credentials that you received in the JSON object returned by the POST request that created the user.

Bucket Restrictions

If you wish, you can limit the number of buckets created per user. The default maximum is 100. Please note that if a user exceeds the bucket creation limit, they are still able to perform other actions, including bucket deletion. You can change the default limit using the max_buckets_per_user parameter in each node's app.config file. The example configuration below would set the maximum to 1000:

{riak_cs, [
    %% Other configs

    {max_buckets_per_user, 1000},

    %% Other configs
]}

If you want to avoid setting a limit on per-user bucket creation, you can set max_buckets_per_user to unlimited.
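
For example, the entry for removing the limit would look like this, following the same layout as the example above:

```erlang
{riak_cs, [
    %% Other configs

    %% No per-user cap on bucket creation
    {max_buckets_per_user, unlimited},

    %% Other configs
]}
```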

Connection Pools

Riak CS uses two distinct connection pools for communication with Riak: a primary and a secondary pool.

The primary connection pool is used to service the majority of API requests related to the upload or retrieval of objects. It is identified in the configuration file as request_pool. The default size of this pool is 128.

The secondary connection pool is used strictly for requests to list the contents of buckets. This separate connection pool is maintained in order to improve performance. It is identified in the configuration file as bucket_list_pool. The default size of this pool is 5.

The following shows the default connection_pools configuration entry that can be found in the app.config file:

{riak_cs, [
    %% Other configs

    {connection_pools,
    [
     {request_pool, {128, 0} },
     {bucket_list_pool, {5, 0} }
    ]},

    %% Other configs
]}

The value for each pool is a pair. The first element is the normal size of the pool, which is the number of concurrent requests of a particular type that a Riak CS node can service. The second element is the number of overflow pool requests that are allowed. We do not recommend using any value other than 0 for the overflow amount unless careful analysis and testing have shown it to be beneficial for a particular use case.
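
For illustration only, a node that needs to service more concurrent object requests might raise the size of request_pool while leaving both overflow values at 0. The size of 256 below is a hypothetical value chosen for the example, not a recommendation.

```erlang
{riak_cs, [
    %% Other configs

    {connection_pools,
    [
     %% {FixedSize, OverflowSize}: 256 is an illustrative size only
     {request_pool, {256, 0} },
     %% Bucket listing pool left at its default
     {bucket_list_pool, {5, 0} }
    ]},

    %% Other configs
]}
```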

Tuning

We strongly recommend that you increase the value of the pb_backlog setting in Riak. When a Riak CS node is started, each connection pool begins to establish connections to Riak. This can result in a thundering herd problem in which connections in the pool believe they are connected to Riak, but in reality some of the connections have been reset. Due to TCP RST packet rate limiting (controlled by net.inet.icmp.icmplim), some of the connections may not receive notification until they are used to service a user's request. This manifests as an {error, disconnected} message in the Riak CS logs and an error returned to the user.
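
A minimal sketch of such a change, assuming that pb_backlog is set in the riak_api section of the Riak node's app.config (not the Riak CS one). The value 256 is illustrative; it is chosen here only because it exceeds the combined default pool sizes of a single Riak CS node (128 + 5 = 133).

```erlang
%% In the Riak node's app.config (assumed location: riak_api section)
{riak_api, [
    %% Other configs

    %% Illustrative value, larger than the sum of the Riak CS pool sizes
    {pb_backlog, 256},

    %% Other configs
]}
```

If several Riak CS nodes connect to the same Riak node, the combined size of all their pools is the figure to compare against.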

Enabling SSL in Riak CS

To enable SSL in Riak CS, first uncomment the following lines in the Riak CS app.config file:

%%{ssl, [
%%    {certfile, "./etc/cert.pem"},
%%    {keyfile, "./etc/key.pem"}
%%   ]},

Then replace the text in quotes with the path and filename for your SSL encryption files. By default, there's a cert.pem and a key.pem in each node's /etc directory. You're free to use those or to supply your own.

Please note that you must also provide a certificate authority (CA) certificate and specify its location using the cacertfile parameter. Unlike certfile and keyfile, the cacertfile parameter does not appear in the default file, even in commented-out form, so you will need to add it yourself. Here's an example configuration with this parameter included:

{ssl, [
       {certfile, "./etc/cert.pem"},
       {keyfile, "./etc/key.pem"},
       {cacertfile, "./etc/cacert.pem"}
      ]},
      %% Other configs

Instructions on creating your own CA cert can be found here.

Proxy vs. Direct Configuration

Riak CS can interact with S3 clients in one of two ways: through a proxy configuration or through a direct configuration.

Proxy

To establish a proxy configuration, configure your client's proxy settings to point to the Riak CS cluster's address. Then configure your client with Riak CS credentials.

When Riak CS receives the request to be proxied, it services the request itself and responds back to the client as if the request went to S3.

On the server side, the cs_root_host parameter in the riak_cs section of the app.config configuration file must be set to s3.amazonaws.com because all of the bucket URLs requested by the client will be destined for s3.amazonaws.com. This is the default.

Note: One issue with proxy configurations is that many GUI clients only allow for one proxy to be configured for all connections. For customers trying to connect to both S3 and Riak CS, this can prove problematic.

Direct

To establish a direct configuration, the cs_root_host parameter in the riak_cs section of app.config must be set to the FQDN of your Riak CS endpoint, as all of the bucket URLs will be destined for the FQDN endpoint.

You will also need wildcard DNS entries for any child of the endpoint to resolve to the endpoint itself. Here's an example:

data.riakcs.net
*.data.riakcs.net
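
As a sketch, a direct configuration for the example endpoint above would set cs_root_host as follows. The endpoint name data.riakcs.net is only the example used in this section; use your own FQDN.

```erlang
{riak_cs, [
    %% Other configs

    %% FQDN of the Riak CS endpoint; buckets resolve as
    %% <bucket>.data.riakcs.net via the wildcard DNS entry
    {cs_root_host, "data.riakcs.net"},

    %% Other configs
]}
```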

Garbage Collection Settings

The leeway_seconds, gc_interval, gc_retry_interval, and gc_paginated_indexes options adjust the Riak CS garbage collection system. Each is described in the Full Configuration Listing below. More details about garbage collection in Riak CS are available in Garbage Collection.
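
As a sketch, the garbage collection timing options from the Full Configuration Listing below could be set in the riak_cs section as follows. The values shown are the documented defaults.

```erlang
{riak_cs, [
    %% Other configs

    %% Seconds before a deleted or overwritten object version
    %% becomes eligible for garbage collection (1 day)
    {leeway_seconds, 86400},
    %% Seconds between runs of the garbage collection daemon (15 minutes)
    {gc_interval, 900},
    %% Seconds before retrying a write to the eligibility bucket (6 hours)
    {gc_retry_interval, 21600},

    %% Other configs
]}
```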

There are two configuration options designed to provide improved performance for Riak CS when using Riak 1.4.0 or later. These options take advantage of additions to Riak that are not present prior to version 1.4.0.
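
The two options are not named in this paragraph. Judging from the Full Configuration Listing below, they are most likely fold_objects_for_list_keys and n_val_1_get_requests; treat that identification as an assumption. A sketch with both enabled:

```erlang
{riak_cs, [
    %% Other configs

    %% More efficient bucket listing (default: false);
    %% see the listing below for the required Riak version
    {fold_objects_for_list_keys, true},
    %% Request each object block from a single vnode (default: true)
    {n_val_1_get_requests, true},

    %% Other configs
]}
```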

Other Riak CS Settings

The app.config file includes other settings, such as whether to create log files and where to store them. These settings have default values that work in most cases.

Full Configuration Listing

| Config | Subsection | Description | Default |
|---|---|---|---|
| cs_ip | | The IP address of the Riak CS node. | "127.0.0.1" |
| cs_port | | The port on which the Riak CS node listens. | 8080 |
| riak_ip | | The IP address for the Riak node accessed by this Riak CS node. | "127.0.0.1" |
| riak_pb_port | | The Protocol Buffers port for the Riak node accessed by this Riak CS node. | 8087 |
| stanchion_ip | | The IP address for the Stanchion node associated with this Riak CS node. | "127.0.0.1" |
| stanchion_port | | The port on which the Stanchion node associated with this Riak CS node listens. | 8085 |
| stanchion_ssl | | Whether SSL is enabled on the Stanchion node associated with this Riak CS node. | false |
| anonymous_user_creation | | Whether users can be created by a currently anonymous user. We recommend enabling anonymous user creation only if your use case specifically demands allowing anonymous users to create accounts. You may also need to temporarily enable anonymous user creation when you are first setting up your Riak CS installation, so that you can create an admin user. | false |
| admin_key | | The key ID that admin users must use to authenticate themselves, particularly for access to things like the /riak-cs/stats endpoint. Please note that the admin credentials set on Riak CS nodes must match the admin key set in Stanchion’s app.config file. | "admin-key" |
| admin_secret | | The secret that admin users must use to authenticate themselves. See the instructions for the admin_key setting above for more information. | "admin-secret" |
| admin_ip | | The IP address to listen on for admin-related tasks. This setting is commented out by default, which means that this value is the same as cs_ip by default. Only uncomment this setting if you wish to use a different IP for admin tasks. | "127.0.0.1" (commented) |
| admin_port | | The port to listen on for admin-related tasks. This setting is commented out by default, which means that this value is the same as cs_port by default. Only uncomment this setting if you wish to use a different port for admin tasks. | 8000 (commented) |
| certfile | ssl | The location of the SSL cert used by this Riak CS node. This setting is commented out by default. Only uncomment it if you need to use SSL. Otherwise, Riak CS will be available via HTTP only. | ./etc/cert.pem (commented) |
| keyfile | ssl | The location of the SSL keyfile used by this Riak CS node. This setting is commented out by default. Only uncomment it if you need to use SSL. Otherwise, Riak CS will be available via HTTP only. | ./etc/key.pem (commented) |
| cs_root_host | | The root host name that Riak CS accepts. A CS bucket would be accessible via a URL like http://bucket.s3.example.com/object/name if this parameter were set to s3.example.com. | s3.amazonaws.com |
| request_pool | connection_pools | Sets the fixed and overflow sizes of this Riak CS node’s request pool, expressed as a {FixedSize, OverflowSize} tuple. | {128, 0} |
| bucket_list_pool | connection_pools | Sets the fixed and overflow sizes of this Riak CS node’s bucket listing pool, expressed as a {FixedSize, OverflowSize} tuple. | {5, 0} |
| rewrite_module | | The module used to handle object rewrites in Riak CS. | riak_cs_s3_rewrite |
| auth_module | | The module used to handle authentication in Riak CS. | riak_cs_s3_auth |
| fold_objects_for_list_keys | | Setting this option to true enables Riak CS to use a more efficient method of retrieving Riak CS bucket contents from Riak. Using this option provides improved performance and stability, especially for buckets that contain millions of objects or more. This option should not be enabled unless Riak 1.4.10 is being used. | false |
| n_val_1_get_requests | | This option, if set to true, causes Riak CS to use a special request option when retrieving the blocks of an object. More specifically, this option instructs Riak to send a request for the object block to a single eligible vnode instead of all eligible vnodes. This differs from a standard r request option in that r affects how many vnode responses to wait for before returning and has no effect on how many vnodes are actually contacted. Enabling this option greatly reduces the intra-cluster bandwidth used by Riak when retrieving objects with Riak CS. This option is harmless if used with a version of Riak prior to 1.4.0, but the option to disable it is provided as a safety measure. | true |
| cs_version | | The Riak CS version in use on this node. This setting is used to selectively enable new features for the current version to better support rolling upgrades. If you’re installing Riak CS anew, you will not need to change this setting. If you’re performing a rolling upgrade, keep the original value set in the old app.config until all nodes have been upgraded, and then set it to the new value. If this parameter is not defined, Riak CS will use 0. | |
| access_log_flush_factor | | How often the access log gets flushed. The value must be an integer: 1 means once per archive period, 2 means twice per period, etc. The length of archive periods is determined by the access_archive_period setting explained below. | 1 |
| access_log_flush_size | | You can also set the access log to be flushed when the log exceeds the number of accesses that you choose for this parameter. | 1000000 |
| access_archive_period | | How long each access archive period lasts, set as an integer number of seconds. This setting should be a multiple of access_log_flush_factor. | 3600 (one hour) |
| access_archiver_max_backlog | | The number of access logs that are allowed to accumulate in the archiver’s log queue before it begins skipping to catch up. Set as an integer. | 2 |
| storage_schedule | | Determines when storage calculation batches are automatically started. This parameter takes a list of HHMM UTC times. If the list is empty, i.e. the parameter is set to [], no automatic calculations will take place. Otherwise, ["0600"] will set calculations to be triggered at 6 am UTC every day, ["0600", "1945"] will set calculations to be triggered at 6 am and 7:45 pm UTC, etc. | [] |
| storage_archive_period | | How large each storage archive object is. This should be chosen in such a way that each storage_schedule entry falls in a different period. Set as an integer number of seconds. | 86400 (1 day) |
| usage_request_limit | | The number of archive periods a user can request in one usage read, applied independently to access and storage. Specified as an integer number of intervals. To give an example, 744 (the default) means one month at 1-hour intervals. | 744 |
| leeway_seconds | | The number of seconds that must elapse before an object version that has been explicitly deleted or overwritten is eligible for garbage collection. Set as an integer number of seconds. | 86400 (1 day) |
| gc_interval | | The time interval, in seconds, at which the garbage collection daemon runs. This daemon searches for and reaps eligible object versions. | 900 (15 minutes) |
| gc_retry_interval | | The number of seconds that must elapse before another attempt is made to write a record for an object manifest in the pending_delete state to the garbage collection eligibility bucket. In general, this condition should be rare, but it could happen if an error condition caused the original record in the garbage collection eligibility bucket to be removed before the reaping process completed. | 21600 (6 hours) |
| gc_paginated_indexes | | This option indicates whether the garbage collection daemon should use paginated secondary index (2i) queries when searching the garbage collection bucket for eligible records to reap. Setting this option to true (the default) is generally more efficient and is recommended for cases where the underlying Riak nodes are of version 1.4.0 or above. | true |
| trust_x_forwarded_for | | If your load balancer adds an X-Forwarded-For header and it is reliable (i.e. it is able to guarantee that it is not added by a malicious user), set this option to true. Otherwise, Riak CS takes the source IP address as an input (which is the default). | false |
| dtrace_support | | If your Erlang VM supports DTrace or user-space SystemTap, set this option to true. | false |
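
To show how the access and storage settings in the listing fit together, here is a sketch that schedules one storage calculation per day and keeps the remaining values at their documented defaults. The 06:00 UTC schedule is taken from the storage_schedule example above; everything else is a default from the table.

```erlang
{riak_cs, [
    %% Other configs

    %% Start a storage calculation batch at 06:00 UTC every day
    {storage_schedule, ["0600"]},
    %% One storage archive object per day, so the single schedule
    %% entry falls in its own period (in seconds)
    {storage_archive_period, 86400},
    %% One-hour access archive periods (in seconds)
    {access_archive_period, 3600},
    %% Flush the access log once per archive period
    {access_log_flush_factor, 1},

    %% Other configs
]}
```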

Webmachine Settings

| Config | Subsection | Description | Default |
|---|---|---|---|
| webmachine_log_handler | log_handlers | | |
| riak_cs_access_log_handler | log_handlers | | |

lager Settings

lager is the logging framework used by Riak CS.

| Config | Subsection | Description | Default |
|---|---|---|---|
| handlers | | | |

SASL Settings

sasl is Erlang's built-in error logger.

| Config | Subsection | Description | Default |
|---|---|---|---|
| sasl_error_logger | | Whether to use Erlang’s built-in error logger. Set to true to enable it or false to disable it. It is disabled by default. | false |

Erlang VM Settings in vm.args

In addition to an app.config file, each Riak CS node has a vm.args file that you can use to pass arguments to the Erlang VM on which Riak CS runs. A full listing of configurable parameters for vm.args can be found in our configuration files documentation.