This is intended to be a brief, objective, and technical comparison of Riak and CouchDB. The CouchDB version described is 1.2.x; the Riak version described is 1.2.x. If you feel this comparison is inaccurate in any way, please fix it or send an email to email@example.com.
At A Very High Level
- Riak and CouchDB are both Apache 2.0 licensed
- Riak is written primarily in Erlang with some bits in C; CouchDB is written in Erlang
The table below gives a high level comparison of Riak and CouchDB features/capabilities. To keep this page relevant in the face of rapid development on both sides, low level details are found in links to Riak and CouchDB online documentation.
|Feature/Capability||Riak||CouchDB|
|Data Model||Riak stores key/value pairs in a higher level namespace called a bucket.||CouchDB’s data format is JSON stored as documents (self-contained records with no intrinsic relationships), grouped into “database” namespaces.|
|Storage Model||Riak has a modular, extensible local storage system that lets you plug in a backend store of your choice to suit your use case. The default backend is Bitcask. You can also write your own storage backend using the backend API.||CouchDB stores data on disk in “append-only” files. As the files continue to grow, they require occasional compaction.|
|Data Access and APIs||Riak offers two primary interfaces (in addition to raw Erlang access): an HTTP API and a Protocol Buffers API.||CouchDB provides an HTTP API for both data access and administration.|
|Query Types and Query-ability||There are currently four ways to query data in Riak: primary key operations (GET, PUT, DELETE), MapReduce, secondary indexes, and Riak Search.||CouchDB is generally queried by direct ID lookups, or by creating MapReduce “views” that CouchDB runs to build an index that can be queried by, or used to compute, other attributes. In addition, the Changes API shows documents in the order they were last modified. Finally, there are community plugins that expand CouchDB’s queryability, such as the CouchDB-Lucene full-text search plugin.|
|Data Versioning and Consistency||Riak uses a data structure called a vector clock to reason about causality and staleness of stored values. Vector clocks enable clients to always write to the database in exchange for consistency conflicts being resolved at read time by either application or client code. Vector clocks can be configured to store copies of a given datum based on size and age of said datum. There is also an option to disable vector clocks and fall back to simple timestamp-based “last-write-wins”.||CouchDB replicates newer document versions between nodes, making it an eventually consistent system. CouchDB uses Multi-Version Concurrency Control (MVCC) to avoid locking the database file during writes. Conflicts are left to the application to resolve at write time. Older document versions (called revisions) may be lost when the append-only database file is compacted.|
|Concurrency||In Riak, any node in the cluster can coordinate a read/write operation for any other node. Riak stresses availability for writes and reads, and puts the burden of resolution on the client at read time.||Because of CouchDB’s append-only value mutation, individual instances will not lock. When distributed, CouchDB won’t allow updating a similarly keyed document without a preceding version number, and conflicts must be manually resolved before concluding a write.|
|Replication||Riak’s replication system is heavily influenced by the Dynamo Paper and Dr. Eric Brewer’s CAP Theorem. Riak uses consistent hashing to replicate and distribute N copies of each value around a Riak cluster composed of any number of physical machines. Under the hood, Riak uses virtual nodes to handle the distribution and dynamic rebalancing of data, thus decoupling the data distribution from physical assets.||CouchDB incrementally replicates document changes between nodes. It can be deployed with master/master or master/slave replication. Replication can be finely controlled by way of replication filters.|
|Scaling Out and In||Riak allows you to elastically grow and shrink your cluster while evenly balancing the load on each machine. No node in Riak is special or has any particular role. In other words, all nodes are masterless. When you add a physical machine to Riak, the cluster is made aware of its membership via gossiping of ring state. Once it’s a member of the ring, it’s assigned an equal percentage of the partitions and subsequently takes ownership of the data belonging to those partitions. The process for removing a machine is the inverse of this. Riak also ships with a comprehensive suite of command line tools to help make node operations simple and straightforward.||Out of the box, CouchDB is focused on master-master replication of values (using MVCC to help with conflict resolution). There are external projects that help manage a CouchDB cluster, such as BigCouch (also Apache 2.0 licensed), which shards values across multiple nodes.|
|Multi-Datacenter Replication and Awareness||Riak features two distinct types of replication. Users can replicate to any number of nodes in one cluster (which is usually contained within one datacenter over a LAN) using the Apache 2.0 licensed database. Riak Enterprise, Basho’s commercial extension to Riak, is required for Multi-Datacenter deployments (meaning the ability to run active Riak clusters in N datacenters).||CouchDB can be configured to run in multiple datacenters. Robust awareness will generally require a third-party solution or custom replication filters.|
|Graphical Monitoring/Admin Console||Riak ships with Riak Control, an open source graphical console for monitoring and managing Riak clusters.||CouchDB ships with a graphical interface called Futon.|
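The vector clock behavior described in the Data Versioning and Consistency row can be illustrated with a minimal Python sketch. This is not Riak's implementation: the function names (`increment`, `descends`, `compare`) and the actor labels are hypothetical, and a clock is modeled as a plain dictionary mapping an actor to a counter. It only shows how two versions of a value can be classified as ordered or conflicting.

```python
def increment(clock, actor):
    """Return a copy of the clock with the given actor's counter advanced by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify two clocks: one supersedes the other, or they conflict."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "a_newer"
    if descends(b, a):
        return "b_newer"
    # Neither clock has seen all of the other's events: concurrent writes.
    return "conflict"


# Two clients update the same base version independently (hypothetical actors).
base = increment({}, "client_x")
v1 = increment(base, "client_y")
v2 = increment(base, "client_z")

print(compare(v1, base))  # v1 was derived from base
print(compare(v1, v2))    # v1 and v2 are siblings
```

In this sketch, `v1` and `v2` are the kind of sibling values that would be handed to application or client code for resolution at read time.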
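The Replication row's description of consistent hashing with virtual nodes can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Riak's actual code: the function names, the `bucket/key` string encoding, the round-robin partition assignment, and the default of 64 partitions are all illustrative choices. The sketch hashes a bucket/key pair onto a 160-bit ring split into equal partitions and returns the N consecutive partitions that would hold the replicas.

```python
import hashlib
from collections import Counter

RING_SIZE = 2 ** 160  # size of the SHA-1 output space


def build_ring(nodes, num_partitions=64):
    """Assign equally sized partitions to physical nodes round-robin.

    Index i of the returned list is the owner of partition i.
    """
    return [nodes[i % len(nodes)] for i in range(num_partitions)]


def preference_list(ring, bucket, key, n=3):
    """Return (partition, owner) pairs for the n replicas of bucket/key."""
    # Assumption: the pair is encoded as "bucket/key" before hashing.
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    # Map the ring position to the partition that contains it.
    first = position * len(ring) // RING_SIZE
    indexes = [(first + i) % len(ring) for i in range(n)]
    return [(index, ring[index]) for index in indexes]


ring = build_ring(["node_a", "node_b", "node_c", "node_d"])
print(Counter(ring))                          # partitions owned per node
print(preference_list(ring, "users", "alice"))  # where the replicas land
```

Because keys map to partitions rather than directly to machines, changing cluster membership only changes which machine owns a partition. The round-robin reassignment used here moves far more partitions than a real claim algorithm would; it is kept only for brevity.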