Download Riak 2.0 RC1

Cluster Capacity Planning

This is a short document that outlines the various elements and variables that should be considered when planning your Riak cluster. Your use case and environment variables will obviously be specific to what you're building, but this document should set you on the right path to planning and launching the suitable Riak cluster.

RAM

RAM should be viewed as the most important resource when sizing your Riak cluster. Aside from helping you keep more data closer to your users, memory will also be required when running complex MapReduce queries, and caching data to provide low latency request times.

Bitcask and Memory Requirements

Your choice of local storage backend for Riak directly impacts your RAM needs. Though Riak has pluggable backend storage, it ships with Bitcask by default, and this is the recommended production backend. Why? Because it's purpose built for:

Bitcask's one major requirement, however, is that it must keep the entire “keydir” in memory. The “keydir” is a hash table that maps each concatenated bucket+key name in a Bitcask (“a Bitcask” is the name for each file contained within each Bitcask backend ) to a fixed-size structure giving the file, offset, and size of the most recently written entry for that bucket+key on disk.

If you want to read more about what the keydir is and what it entails, and some more about Bitcask in general, go here and here. (You should read these.)

When you calculate that your RAM needs will exceed your hardware resources–in other words, if you can't afford the RAM to enable you to use Bitcask—we recommend that you use LevelDB.

Check out Bitcask Capacity Planning for more details on designing a Bitcask-backed cluster.

LevelDB

If RAM requirements for Bitcask are prohibitive, Basho recommends use of the LevelDB backend. While LevelDB doesn't require a large amount of RAM to operate, supplying it with the maximum amount of memory available will lead to higher performance.

For more information see LevelDB.

Disk

Now that you have an idea of how much RAM you'll need, it's time to think about disk space. Disk space needs are much easier to calculate and essentially boil down to this simple equation:

Estimated Total Objects * Average Object Size * n_val

For example, with

then you would need just over approximately 286 GB of disk space in the entire cluster to accommodate your data.

(Here at Basho, we believe that databases should be durable out of the box. When we built Riak, we did so in a way that you could write to disk while keeping response times below your users' expectations. So this calculation assumes you'll be keeping the entire data set on disk.)

Many of the considerations taken when configuring a machine to serve a database can be applied to configuring a node for Riak as well. Mounting disks with noatime and having separate disks for your OS and Riak data lead to much better performance. See System Planning for more information.

Disk Space Planning and Ownership Handoff

When Riak nodes fail or leave the cluster for some other reason, other nodes in the cluster begin engaging in the process of ownership handoff, whereby the remaining nodes assume ownership of the data partitions handled by the node that has left. While this is an expected state of affairs in Riak, one side effect is that this requires more intensive disk space usage from the other nodes, in rare cases to the point of filling the disk of one or more of those nodes.

When making disk space planning decisions, we recommend that you:

Another possibility worth considering is using Riak with a filesystem that allows for growth, for example LVM), RAID, or ZFS.

Read/Write Profile

Read/write ratios, as well as the distribution of key access, should influence the configuration and design of your cluster. If your use case is write heavy you will need less RAM for caching, in addition if only a certain portion of keys is accessed regularly, such as a pareto distribution, you won't need as much RAM available to cache those keys' values.

Number of Nodes

The number of nodes (i.e. physical servers) in your Riak Cluster depends on the number of times data is Replicated across the cluster. To ensure that the cluster is always available to respond to read and write requests, Basho recommends a “sane default” of N=3 replicas. This requirement can be met with a three or four node cluster (you can tweak nodes installed through the Five Minute Install). However, for production deployments we recommend using no fewer than 5 nodes, as node failures in smaller clusters can compromise the fault-tolerance of the system. Additionally, in clusters smaller than 5 nodes, a high percentage of the nodes (75-100% of them) will need to respond to each request, putting undue load on the cluster that may degrade performance. For more details on this recommendation, see this blog post.

Ring Size/Number of Partitions

Ring size is the number of partitions that make up your Riak Cluster. This is a number that is configured before you cluster is started, and is set in your app.config file under the ring_creation_size parameter.

The default number of partitions in a Riak cluster is 64. This works for smaller clusters, but if you plan to grow your cluster past 5 nodes it is recommended you consider a larger ring size. Ring sizes must be a power of 2. The minimum number of partitions recommended per node is 10, and you can determine the number of partitions that will be allocated per node by dividing the number of partitions by the number of nodes.

For most moderate-sized Riak clusters (8-16 nodes) 128, 256, and 512 partitions are excellent options that will allow you to incrementally grow (or shrink) your cluster. If you're unsure about the best number of partitions to use, consult the Riak Mailing List for some suggestions.

Other Factors

Riak is built to run in a clustered environment and while it will compensate for network partitions they cause increased load on the system. In addition, running in a virtualized environment that lacks low latency IO access can drastically decrease performance. Before putting your Riak cluster in production is recommended you gain a full understanding of your environment's behavior so you know how your cluster performs under load for an extended period of time. Doing so will help you size your cluster for future growth and lead to optimal performance.

Basho recommends using Basho Bench for benchmarking the performance of your cluster.

Bandwidth

Riak uses Erlang's built in distribution capabilities to provide reliable access to data. A Riak cluster can be deployed in many different network topologies, but it is recommended to have as little latency between nodes as possible. High latency leads to sub-optimal performance. It is not recommended to deploy a single Riak cluster across two data centers. If you need this capability Basho offers an inter-datacenter replication option that is built to keep multiple Riak clusters in sync across several geographically diverse deployments.

IO

In general the biggest bottleneck for Riak will be the amount of IO available to it, especially in the case of write-heavy work loads. Riak functions much like any other database and the design of your disk access should take this into account. Because Riak is clustered and your data is stored on multiple physical nodes, you can consider forgoing a tradition RAID setup for redundancy and focus on providing the least latency possible using SATA Drives or SSDs, for example.

Additional resources