
Riak Compared to Couchbase

This is intended to be a brief, objective, and technical comparison of Riak and Couchbase (i.e., Couchbase Server). The versions described are Couchbase 2.0 and Riak 2.x. If you feel this comparison is inaccurate for any reason, please submit an issue or send an email to docs@basho.com.

At A Very High Level

Couchbase vs CouchDB

Keep in mind that Couchbase and CouchDB are two separate database projects. CouchDB is a document database providing replication, MapReduce, and an HTTP API. Couchbase uses CouchDB as its backend, “wrapping” it with advanced features like caching, and is designed to be clustered.

Feature/Capability Comparison

The table below gives a high-level comparison of Riak and Couchbase features and capabilities. To keep this page relevant in the face of rapid development on both sides, low-level details are left to the online documentation for Riak and Couchbase, which the table links to.

Feature/Capability | Riak | Couchbase
Data Model | Riak stores values under keys in buckets. Using bucket types, you can set bucket-level configuration for things like replication properties. In addition to basic key/value lookup, Riak has a variety of features for discovering objects, including Riak Search and secondary indexes. | Couchbase is a JSON-based document datastore. Like other document datastores, records have no intrinsic relationships, and are stored in buckets. Value size is limited to 20 MB.
Storage Model | Riak has a modular, extensible local storage system that lets you plug in a backend store of your choice to suit your use case. The default backend is Bitcask. You can also write your own storage backend for Riak using our backend API. | Couchbase 2.0 is largely memory-based, asynchronously persisting data using a CouchDB fork and the C library “couchstore” (prior versions of Couchbase use the SQLite storage engine).
Data Access and APIs | Riak offers two primary interfaces (in addition to raw Erlang access): HTTP and Protocol Buffers. Riak client libraries are wrappers around these APIs, and client support exists for dozens of languages. Basho currently has officially supported clients for Java, Ruby, Python, and Erlang. | Couchbase provides drivers in several languages to access data through its binary memcached protocol. Couchbase also provides a REST API to monitor and manage a cluster (though it is not used to directly manage stored data).
Query Types and Queryability | There are currently five ways to query data in Riak. | Couchbase also provides four query options. Hadoop support is also possible through a plugin that streams data to a Hadoop Distributed File System (HDFS) or Hive for processing.
Data Versioning and Consistency | Riak uses a data structure called a vector clock to reason about causality and staleness of stored values. Vector clocks enable clients to always write to the database, in exchange for consistency conflicts being resolved at read time by either application or client code. Vector clocks can be configured to store copies of a given datum based on the size and age of that datum. There is also an option to disable vector clocks and fall back to simple timestamp-based “last-write-wins”. | Couchbase is strongly consistent within a datacenter, replicating data between nodes in a cluster for failover. Inter-datacenter replication follows an eventually consistent CouchDB replication model. Via CouchDB, documents are internally revisioned (stored in a “_rev” value). However, prior revisions are removed by file compaction, so they cannot be relied on.
Concurrency | In Riak, any node in the cluster can coordinate a read/write operation for any other node. Riak stresses availability for writes and reads, and puts the burden of resolution on the client at read time. | Couchbase claims to be ACID-compliant on a per-item basis, but has no multi-operation transactions. Couchbase clients connect to a server list (or via a proxy) where keys are sharded across the nodes. Couchbase nodes inherit memcached’s default (and recommended) connection limit of 10k.
Replication | Riak’s replication system is heavily influenced by the Dynamo Paper and Dr. Eric Brewer’s CAP Theorem. Riak uses consistent hashing to replicate and distribute N copies of each value around a Riak cluster composed of any number of physical machines. Under the hood, Riak uses virtual nodes to handle the distribution and dynamic rebalancing of data, thus decoupling the data distribution from physical assets. The Riak APIs expose tunable consistency and availability parameters that let you select the level of each that best fits your use case. Replication is configurable at the bucket level when first storing data in Riak. Subsequent reads and writes to that data can have request-level parameters. See [[Reading, Writing, and Updating Data]]. | Couchbase supports two types of replication. For intra-datacenter clusters, Couchbase uses membase-style replication, which favors immediate consistency in the face of a network partition. For multi-datacenter deployments, CouchDB’s master-master replication is used.
Scaling Out and In | Riak allows you to elastically grow and shrink your cluster while evenly balancing the load on each machine. No node in Riak is special or has any particular role. In other words, Riak is masterless. When you add a physical machine to Riak, the cluster is made aware of its membership via gossiping of ring state. Once it’s a member of the ring, it’s assigned an equal percentage of the partitions and subsequently takes ownership of the data belonging to those partitions. The process for removing a machine is the inverse of this. Riak also ships with a comprehensive suite of command line tools to help make node operations simple and straightforward. | Couchbase scales elastically by auto-sharding. Clusters can be rebalanced to grow or shrink through the administrative interface.
Multi-Datacenter Replication and Awareness | Riak features two distinct types of replication. Users can replicate to any number of nodes in one cluster (which is usually contained within one datacenter over a LAN) using the Apache 2.0-licensed database. Riak Enterprise, Basho’s commercial extension to Riak, is required for Multi-Datacenter deployments (meaning the ability to run active Riak clusters in N datacenters). | Couchbase 2.0 supports cross-datacenter replication (XDCR).
Graphical Monitoring/Admin Console | Riak ships with Riak Control, an open source graphical console for monitoring and managing Riak clusters. | Couchbase provides a web-based monitoring/admin console.
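
To make the Data Versioning and Consistency entry more concrete, here is a minimal sketch of how vector clocks can be compared to decide whether one value supersedes another or whether the two are siblings that must be resolved at read time. This is an illustration only, not Riak's implementation: the dictionary representation and the function names `descends`, `compare`, and `merge` are assumptions made for this example.

```python
from typing import Dict

# Assumed representation: actor id -> number of updates seen from that actor.
VClock = Dict[str, int]


def descends(a: VClock, b: VClock) -> bool:
    """Return True if clock `a` has seen every update recorded in clock `b`."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: VClock, b: VClock) -> str:
    """Classify the causal relationship between two clocks."""
    a_covers_b = descends(a, b)
    b_covers_a = descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "a_newer"      # b is a stale ancestor of a
    if b_covers_a:
        return "b_newer"      # a is a stale ancestor of b
    return "concurrent"       # neither saw the other: a conflict to resolve


def merge(a: VClock, b: VClock) -> VClock:
    """Build the clock for a resolved value: the per-actor maximum of both."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


if __name__ == "__main__":
    print(compare({"x": 2, "y": 1}, {"x": 1, "y": 1}))
    print(compare({"x": 2}, {"y": 1}))
    print(merge({"x": 2}, {"y": 1}))
```

When `compare` reports `concurrent`, neither write can be safely discarded, which is why resolution falls to application or client code; “last-write-wins” skips this comparison and keeps whichever value has the later timestamp.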
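
The Replication and Scaling entries describe consistent hashing over a ring of partitions owned by virtual nodes. The sketch below illustrates that idea under simplifying assumptions: keys are hashed with SHA-1 over a `bucket/key` string (Riak's actual key encoding differs), partitions are claimed round-robin, and the helper names `key_hash`, `claim_partitions`, and `preference_list` are hypothetical. It is not Riak's claim algorithm.

```python
import hashlib
from typing import List

RING_SIZE = 2 ** 160  # size of the SHA-1 output space


def key_hash(bucket: str, key: str) -> int:
    """Map a bucket/key pair to a position on the ring (assumed encoding)."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def claim_partitions(nodes: List[str], num_partitions: int) -> List[str]:
    """Assign partitions to physical nodes round-robin (a simplification)."""
    return [nodes[i % len(nodes)] for i in range(num_partitions)]


def preference_list(bucket: str, key: str, owners: List[str], n_val: int = 3) -> List[str]:
    """Return the owners of the n_val consecutive partitions starting at the key's position."""
    num_partitions = len(owners)
    partition_size = RING_SIZE // num_partitions
    first = (key_hash(bucket, key) // partition_size) % num_partitions
    return [owners[(first + i) % num_partitions] for i in range(n_val)]


if __name__ == "__main__":
    owners = claim_partitions(["node1", "node2", "node3", "node4"], 64)
    print(preference_list("users", "alice", owners, n_val=3))

    # Adding a machine changes who owns the partitions, not how keys map to them.
    owners = claim_partitions(["node1", "node2", "node3", "node4", "node5"], 64)
    print(preference_list("users", "alice", owners, n_val=3))
```

Because a key always maps to the same partition, growing or shrinking the cluster only reassigns partition ownership, which is the sense in which data distribution is decoupled from physical machines.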