This is intended to be a brief, objective, and technical comparison of Riak and Couchbase (i.e. Couchbase Server). The Couchbase version described is 2.0. The Riak version described is Riak 1.2.x. If you find this comparison inaccurate in any way, please fix it or send an email to email@example.com.
At A Very High Level
- Riak is Apache 2.0 licensed. According to Couchbase, it offers two free versions: the Couchbase open source code, which is Apache 2.0 licensed, and Couchbase Server Community Edition, which is licensed under a community agreement
- Riak is written primarily in Erlang with some bits in C; Couchbase is written in Erlang and C/C++
The table below gives a high-level comparison of Riak and Couchbase features and capabilities. To keep this page relevant in the face of rapid development on both sides, low-level details are left to the Riak and Couchbase online documentation.
|Data Model||Riak stores key/value pairs in a higher-level namespace called a bucket.||Couchbase is a JSON-based document datastore. Like other document datastores, records have no intrinsic relationships, and are stored in buckets. Value size is limited to 20 MB.|
|Storage Model||Riak has a modular, extensible local storage system that lets you plug in a backend store of your choice to suit your use case. The default backend is Bitcask; other backends plug in through Riak's backend API.||Couchbase 2.0 is largely memory-based, asynchronously persisting data using a CouchDB fork and the C library “couchstore” (prior versions of Couchbase use the SQLite storage engine).|
|Data Access and APIs||Riak offers two primary interfaces (in addition to raw Erlang access): HTTP and Protocol Buffers.||Couchbase provides drivers in several languages to access data through its binary memcached protocol. Couchbase also provides a REST API to monitor and manage a cluster (though it is not used to directly manage stored data).|
|Query Types and Query-ability||There are currently four ways to query data in Riak.||Couchbase also provides four query options.|
|Data Versioning and Consistency||Riak uses a data structure called a vector clock to reason about causality and staleness of stored values. Vector clocks enable clients to always write to the database in exchange for consistency conflicts being resolved at read time by either application or client code. Vector clocks can be configured to store copies of a given datum based on size and age of said datum. There is also an option to disable vector clocks and fall back to simple time-stamp based “last-write-wins”.||Couchbase is strongly consistent within a datacenter, replicating data between nodes in a cluster for failover. Inter-datacenter replication follows an eventually consistent CouchDB replication model. Via CouchDB, documents are internally revisioned (stored in a “_rev” value). However, prior revisions will be removed on a file compaction operation, making them unreliable.|
|Concurrency||In Riak, any node in the cluster can coordinate a read/write operation for any other node. Riak stresses availability for writes and reads, and puts the burden of resolution on the client at read time.||Couchbase claims to be ACID-compliant on a per-item basis, but has no multi-operation transactions. Couchbase clients connect to a server list (or via a proxy) where keys are sharded across the nodes. Couchbase nodes inherit memcached’s default (and recommended) connection limit of 10k.|
|Replication||Riak’s replication system is heavily influenced by the Dynamo Paper and Dr. Eric Brewer’s CAP Theorem. Riak uses consistent hashing to replicate and distribute N copies of each value around a Riak cluster composed of any number of physical machines. Under the hood, Riak uses virtual nodes to handle the distribution and dynamic rebalancing of data, thus decoupling the data distribution from physical assets.||Couchbase supports two types of replication. For intra-datacenter clusters, Couchbase uses membase-style replication, which favors immediate consistency in the face of a network partition. For multi-datacenter deployments, CouchDB’s master-master replication is used.|
|Scaling Out and In||Riak allows you to elastically grow and shrink your cluster while evenly balancing the load on each machine. No node in Riak is special or has any particular role. In other words, all nodes are masterless. When you add a physical machine to Riak, the cluster is made aware of its membership via gossiping of ring state. Once it’s a member of the ring, it’s assigned an equal percentage of the partitions and subsequently takes ownership of the data belonging to those partitions. The process for removing a machine is the inverse of this. Riak also ships with a comprehensive suite of command line tools to help make node operations simple and straightforward.||Couchbase scales elastically by auto-sharding. Clusters can be rebalanced to grow or shrink through the administrative interface.|
|Multi-Datacenter Replication and Awareness||Riak features two distinct types of replication. Users can replicate to any number of nodes in one cluster (which is usually contained within one datacenter over a LAN) using the Apache 2.0 licensed database. Riak Enterprise, Basho’s commercial extension to Riak, is required for Multi-Datacenter deployments (meaning the ability to run active Riak clusters in N datacenters).||Couchbase 2.0 supports cross-datacenter replication (XDCR).|
|Graphical Monitoring/Admin Console||Riak ships with Riak Control, an open source graphical console for monitoring and managing Riak clusters.||Couchbase provides a web-based monitoring/admin console.|
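
The vector clock behavior described in the Data Versioning and Consistency row can be illustrated with a minimal Python sketch. This is not Riak's implementation: the function names (`increment`, `descends`, `compare`, `merge`) and the client identifiers are hypothetical, and pruning by size and age is left out. It only shows how per-actor counters let a reader tell an ancestor from a genuine conflict.

```python
from typing import Dict

# Hypothetical representation: actor id -> number of updates seen from that actor.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, actor: str) -> VectorClock:
    """Return a copy of `clock` with `actor`'s counter advanced by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if `a` has seen every update recorded in `b`."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify the causal relationship between two clocks."""
    a_covers_b = descends(a, b)
    b_covers_a = descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "a_newer"
    if b_covers_a:
        return "b_newer"
    return "conflict"  # concurrent writes: the reader must resolve siblings


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Clock for a value that resolves both siblings (per-actor maximum)."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in set(a) | set(b)}


# Two clients update the same base value without seeing each other's write.
base = increment({}, "client_x")
left = increment(base, "client_y")
right = increment(base, "client_z")

print(compare(left, base))                 # a_newer
print(compare(left, right))                # conflict
print(compare(merge(left, right), left))   # a_newer
```

The "conflict" case is the one the table refers to as being resolved at read time: neither clock descends from the other, so application or client code has to pick or combine the values and write the result back with the merged clock.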
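
The Replication and Scaling rows describe consistent hashing over partitions (virtual nodes) that are owned by physical machines. A simplified Python sketch of that idea follows. Everything in it is an assumption made for illustration: the `Ring` class, the round-robin ownership, the 64-partition default, the `bucket/key` hashing scheme, and the node names are hypothetical and do not reproduce Riak's actual claim algorithm or preference-list rules.

```python
import hashlib
from collections import Counter
from typing import List, Tuple

RING_SIZE = 2 ** 160  # size of the SHA-1 hash space


class Ring:
    """Toy ring: a fixed set of partitions, each owned by one physical node."""

    def __init__(self, nodes: List[str], num_partitions: int = 64, n_val: int = 3):
        self.num_partitions = num_partitions
        self.n_val = n_val  # number of replicas (N) per value
        self.partition_size = RING_SIZE // num_partitions
        # Assumption: round-robin ownership so each node holds an equal share.
        self.owners = [nodes[i % len(nodes)] for i in range(num_partitions)]

    def partition_for(self, bucket: str, key: str) -> int:
        """Hash bucket/key onto the ring and return the partition index."""
        digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
        index = int.from_bytes(digest, "big") // self.partition_size
        return min(index, self.num_partitions - 1)  # guard against rounding

    def preference_list(self, bucket: str, key: str) -> List[Tuple[int, str]]:
        """Return the N consecutive partitions (and their owners) for a key."""
        first = self.partition_for(bucket, key)
        result = []
        for offset in range(self.n_val):
            partition = (first + offset) % self.num_partitions
            result.append((partition, self.owners[partition]))
        return result


ring = Ring(["node_a", "node_b", "node_c", "node_d"])
print(Counter(ring.owners))                    # each node owns 16 partitions
print(ring.preference_list("users", "alice"))  # 3 partitions on 3 distinct nodes
```

Because keys map to partitions rather than to machines, adding or removing a machine only changes the `owners` table; the key-to-partition mapping stays fixed, which is the decoupling of data distribution from physical assets that the table describes.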