Multi Data Center Replication: Operations

    Riak Enterprise Only
    This documentation applies only to Riak Enterprise, Basho's commercial extension to Riak. To talk to us about using Riak Enterprise, let us know.

    The riak-repl Command

    Replication is controlled by the riak-repl command. Usage:

    add-listener Adds a listener (primary) to the given node, IP address and port.

    add-nat-listener Adds a NAT-aware listener (primary) to the given node, IP address, port, NAT IP, and NAT port. If a non-NAT listener already exists with the same internal IP and port, it is “upgraded” to a NAT listener.

    del-listener Removes and shuts down a listener (primary) on the given node, IP address and port.

    add-site Adds a site (secondary) to the local node, connecting to the specified listener.

    del-site Removes a site (secondary) from the local node by name.

    status Gets status information about replication. Reports counts on how much data has been transmitted, transfer rates, message queue lengths of clients and servers, number of full synchronizations, and connection status. This command only displays useful information on the leader node.

    start-fullsync Manually initiates full synchronization with connected sites.

    cancel-fullsync Cancels any full synchronizations in progress. If a partition is in progress, synchronization will stop after that partition completes. During cancellation, riak-repl status will show 'cancelled' in the status.

    pause-fullsync Pauses any full synchronizations in progress. If a partition is in progress, synchronization will pause after that partition completes. While paused, riak-repl status will show 'paused' in the status information. Fullsync may be cancelled while paused.

    resume-fullsync Resumes any full synchronizations that were paused. If full synchronization was running at the time of the pause, the next partition will be synchronized. If not, fullsync waits until the next start-fullsync command or the next fullsync_interval.
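
The interaction between start-fullsync, pause-fullsync, cancel-fullsync, and resume-fullsync described above can be sketched as a small state model. This is only an illustrative toy, not Riak's implementation: the class name, the state names "idle" and "running", and the choice to ignore any command in a state the text does not cover are all assumptions. Only 'paused' and 'cancelled' correspond to values the text says appear in riak-repl status.

```python
from dataclasses import dataclass


@dataclass
class FullsyncModel:
    """Hypothetical model of the fullsync commands; not part of Riak."""
    state: str = "idle"            # idle | running | paused | cancelled
    resume_running: bool = False   # was a fullsync in progress when paused?

    def start_fullsync(self):
        # Manually initiates a full synchronization.
        if self.state in ("idle", "cancelled"):
            self.state = "running"

    def pause_fullsync(self):
        # Remember whether a fullsync was running, so resume can pick it up.
        if self.state in ("idle", "running"):
            self.resume_running = self.state == "running"
            self.state = "paused"

    def cancel_fullsync(self):
        # Fullsync may be cancelled while running or while paused.
        if self.state in ("running", "paused"):
            self.state = "cancelled"
            self.resume_running = False

    def resume_fullsync(self):
        # Continue with the next partition if a fullsync was running at pause
        # time; otherwise wait for the next start-fullsync or fullsync_interval.
        if self.state == "paused":
            self.state = "running" if self.resume_running else "idle"
```

In this model, pausing a running fullsync and resuming it returns to "running", while pausing when nothing is in progress and resuming returns to "idle", which mirrors the two cases in the resume-fullsync description.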

    riak-repl status output

    The following definitions describe the output of riak-repl status. Please note that many of these statistics will only appear on the current leader node.

    All counts will be reset to 0 upon restarting Riak EE.

    Field Description
    listener_[nodeid] Defines a replication listener that is running on node [nodeid]
    [sitename]_ips Defines a replication site
    client_bytes_recv The total number of bytes the client has received since the server was started
    client_bytes_sent The total number of bytes sent to all connected sites
    client_connect_errors The number of TCP/IP connection errors
    client_connects A count of the number of site connections made to this node.
    client_redirect If a client connects to a non-leader node, it will be redirected to a leader node
    client_rx_kbps A snapshot of the client (site) received kilobits/second taken once a minute. The past 8 snapshots are stored in this list. Newest snapshots appear on the left side of the list
    client_tx_kbps A snapshot of the client (site) sent kilobits/second taken once a minute. The past 8 snapshots are stored in this list. Newest snapshots appear on the left side of the list
    elections_elected If the replication leader node becomes unresponsive or unavailable, a new leader node in the cluster will be elected
    elections_leader_changed The number of times a Riak node has surrendered leadership
    objects_dropped_no_clients If the real-time replication work queue is full and there aren't any clients to receive objects, then objects will be dropped from the queue. These objects will be synchronized during a full synchronization
    objects_dropped_no_leader If a client (site) cannot connect to a leader, objects will be dropped during real-time replication
    objects_forwarded The number of Riak objects forwarded to the leader to participate in replication. Please note that this value will only be accurate on a non-leader node.
    objects_sent The number of objects sent via real-time replication
    server_bytes_recv The total number of bytes the server (listener) has received
    server_bytes_sent The total number of bytes the server (listener) has sent
    server_connect_errors The number of listener-to-site connection errors
    server_connects The number of times the listener connects to the client site
    server_fullsyncs The number of full synchronizations that have occurred since the server was started
    server_rx_kbps A snapshot of the server (listener) received kilobits/second taken once a minute. The past 8 snapshots are stored in this list. Newest snapshots appear on the left side of the list
    server_tx_kbps A snapshot of the server (listener) sent kilobits/second taken once a minute. The past 8 snapshots are stored in this list. Newest snapshots appear on the left side of the list
    leader Which node is the current leader of the cluster
    local_leader_message_queue_len The length of the object queue on the leader
    local_leader_heap_size The amount of memory the leader is using
    client_stats See Client Statistics
    server_stats See Server Statistics
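
As a rough illustration of how a monitoring script might consume these fields, the Python sketch below handles two details from the definitions above: counters reset to 0 when Riak EE restarts, and the kbps lists hold up to 8 once-a-minute snapshots with the newest on the left. It assumes the values have already been parsed into Python integers and lists; the function names are hypothetical and not part of Riak.

```python
from statistics import mean


def counter_delta(previous, current):
    """Growth of a counter (e.g. objects_sent) between two samples.

    A current value lower than the previous one is taken to mean the node
    restarted and the counter reset to 0, so the whole current value counts.
    """
    if current < previous:
        return current
    return current - previous


def summarize_kbps(snapshots):
    """Summarize a list such as client_rx_kbps (newest snapshot first)."""
    if not snapshots:
        return {"latest": None, "oldest": None, "average": None}
    return {
        "latest": snapshots[0],    # leftmost entry is the newest snapshot
        "oldest": snapshots[-1],   # rightmost entry is the oldest snapshot
        "average": mean(snapshots),
    }
```

For example, summarize_kbps([40, 20, 0, 0, 0, 0, 0, 20]) reports a latest value of 40 and an average of 10 over the stored snapshots.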

    Client Statistics

    Field Description
    node A unique ID for the Riak node that the client (site) is running on
    site The connected site name configured with riak-repl add-site
    strategy A replication strategy defines an implementation of the Riak Replication protocol. Valid values are: keylist, syncv1
    fullsync_worker The Erlang process ID of the full synchronization worker
    waiting_to_retry The listeners currently waiting to retry replication after a failure
    connected A list of connected clients
    • connected The IP address and port of a connected client (site)
    • cluster_name The name of the connected client (site)
    • connecting The PID, IP address and port of a client currently establishing a connection
    state Shows what the replication strategy is currently processing. The following values appear in the status output if the keylist strategy is being used. They can be used by Basho support to identify replication issues.
    • request_partition
    • wait_for_fullsync
    • send_keylist
    • wait_ack

    Server Statistics

    Field Description
    node A unique ID for the Riak node that the server (listener) is running on
    site The connected site name configured with riak-repl add-site
    strategy A replication strategy defines an implementation of the Riak Replication protocol. Valid values are: keylist, syncv1
    fullsync_worker The Erlang process ID of the full synchronization worker
    bounded_queue See Bounded Queue
    state Shows what the replication strategy is currently processing. The following values appear in the status output if the keylist strategy is being used. They can be used by Basho support to identify replication issues.
    • wait_for_partition
    • build_keylist
    • wait_keylist
    • diff_bloom
    • diff_keylist
    message_queue_len The number of Erlang messages that are waiting to be processed by the server

    Keylist Strategy

    The following fields appear under both the keylist_server and keylist_client fields. Any differences between the two are described in the table.

    Field Description
    fullsync On the client, the number of partitions that remain to be processed. On the server, the partition currently being processed by full-synchronization replication.
    partition_start The number of elapsed seconds since replication has started on a given partition
    stage_start The number of elapsed seconds since replication has started on a given stage
    get_pool_size The number of Riak GET finite state machine (FSM) workers available to process requests

    Bounded Queue

    The bounded queue is responsible for holding objects that are waiting to participate in real-time replication. Please see the Riak EE MDC Replication Configuration guide for more information.

    Field Description
    queue_pid The Erlang process ID of the bounded queue
    dropped_count The number of objects that failed to be enqueued in the bounded_queue due to the queue being full. These objects will be replicated during the next full synchronization
    queue_length The number of Riak objects currently in the bounded queue
    queue_byte_size The size of all objects currently in the queue
    queue_max_size The number of bytes the queue can hold before objects are dropped. These objects will be replicated during the next full synchronization
    queue_percentage The percentage of the queue that is full
    queue_pending The current count of 'in flight' objects we've sent that the client has not acknowledged
    queue_max_pending The maximum number of objects that can be 'in flight' before we refuse to send any more.
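
The bounded queue fields above lend themselves to a simple health check. The Python sketch below is a minimal example under stated assumptions: the field values are already available as integers in a dictionary, the fill level is derived as queue_byte_size divided by queue_max_size (which may differ from how Riak computes queue_percentage), and the function name and returned keys are hypothetical.

```python
def queue_health(queue):
    """Derive a few indicators from bounded queue fields (illustrative only)."""
    max_size = queue["queue_max_size"]
    # Assumed derivation: share of the byte budget currently in use.
    fill_percent = 100.0 * queue["queue_byte_size"] / max_size if max_size else 0.0
    return {
        "fill_percent": fill_percent,
        # No more objects are sent once the in-flight limit is reached.
        "pending_limit_reached": queue["queue_pending"] >= queue["queue_max_pending"],
        # Dropped objects are replicated during the next full synchronization.
        "has_dropped_objects": queue["dropped_count"] > 0,
    }
```

A script could call queue_health on each status sample and alert when has_dropped_objects becomes true, since those objects will only reach the site during the next full synchronization.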