Inspecting a Riak Node

    Several tools can help you inspect a Riak node to gather performance metrics or diagnose potential issues. Some are included with Riak itself; others are made available through the Riak community.

    This guide provides starting points and details on some of the available tools for inspecting a Riak node.

    riak-admin status

    riak-admin status is a subcommand of the riak-admin command that is included with every installation of Riak. The status subcommand provides data related to current operating status for a node. The output of riak-admin status is categorized and detailed below.
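    Because the status output is a flat list of statistics, it is convenient to load it into a dictionary before inspecting individual counters. The following is a minimal sketch, assuming each statistic is printed on its own line as `name : value`. The layout and the sample text are assumptions for illustration, and `parse_status` is a hypothetical helper, not part of Riak.

```python
def parse_status(text):
    """Parse 'name : value' lines into a dict; skip anything else.

    Assumes the one-stat-per-line layout described in the lead-in.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" : ")
        if not sep:
            continue  # header, separator, or blank line
        name, value = name.strip(), value.strip()
        try:
            stats[name] = int(value)
        except ValueError:
            stats[name] = value  # keep non-integer values as raw strings
    return stats


# Illustrative text only; not captured from a real node.
sample = """\
1-minute stats for 'riak@127.0.0.1'
-------------------------------------------
node_gets : 12
node_puts : 3
node_get_fsm_time_mean : undefined
"""

stats = parse_status(sample)
```

    In practice the text would come from running `riak-admin status` on the node and capturing its standard output.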

    Note that some counters, such as node_get_fsm_objsize, require a minimum of 5 transactions before statistics are generated.

    One-minute

    One-minute Counters are data points delineating the number of times a particular activity has occurred within the last minute on this node.

    Sample one-minute counters:

    FSM_Time

    FSM_Time Counters represent the amount of time in microseconds required to traverse the GET or PUT Finite State Machine code, offering a picture of general node health. From your application's perspective, FSM_Time effectively represents experienced latency. Mean, Median, and 95th-, 99th-, and 100th-percentile (Max) counters are displayed. These are one-minute stats.
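    Since these values are reported in microseconds, converting them to milliseconds often makes them easier to compare with application-side latency measurements. The sketch below assumes the statistics have already been loaded into a dictionary and that the keys follow the pattern `node_get_fsm_time_<suffix>` and `node_put_fsm_time_<suffix>`. The function name `fsm_latency_ms` is hypothetical.

```python
def fsm_latency_ms(stats, op="get"):
    """Return FSM time stats for op ('get' or 'put') converted to milliseconds.

    Keys are assumed to look like node_get_fsm_time_mean, node_get_fsm_time_95, ...
    Non-numeric values (for example, a stat that has not been generated yet)
    are left out of the result.
    """
    suffixes = ("mean", "median", "95", "99", "100")
    result = {}
    for suffix in suffixes:
        value = stats.get(f"node_{op}_fsm_time_{suffix}")
        if isinstance(value, (int, float)):
            result[suffix] = value / 1000.0  # microseconds -> milliseconds
    return result
```

    For example, a reported mean of 1500 microseconds becomes 1.5 milliseconds.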

    Sample finite state machine time counters:

    GET_FSM_Siblings

    GET_FSM_Siblings stats offer a count of the number of siblings encountered by this node on each GET request. These are one-minute stats.

    Sample finite state machine sibling counters:

    GET_FSM_Objsize

    GET_FSM_Objsize is a window on the sizes of objects flowing through this node's GET_FSM. The size of an object is obtained by summing the lengths of the bucket name, the key, the serialized vector clock, and the value and serialized metadata of each sibling. GET_FSM_Objsize and GET_FSM_Siblings are therefore closely linked: every additional sibling adds to the object's size. These are one-minute stats.
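    The size calculation described above can be sketched as follows. This is an illustration of the arithmetic only, not Riak's implementation: it assumes the bucket name, key, and serialized vector clock are counted once, and that each sibling contributes the length of its value plus its serialized metadata. The function name and the byte-string inputs are hypothetical.

```python
def object_size(bucket, key, vclock, siblings):
    """Approximate object size in bytes.

    bucket, key, vclock: bytes (vclock is the already-serialized vector clock).
    siblings: list of (value, metadata) pairs, both bytes.
    """
    size = len(bucket) + len(key) + len(vclock)
    for value, metadata in siblings:
        size += len(value) + len(metadata)  # each sibling adds to the total
    return size
```

    Under these assumptions, an object with two siblings is always larger than the same object with only one of them, which is why a rising sibling count tends to show up as rising object sizes.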

    Sample finite state machine object size counters:

    Totals

    Total Counters are data points that represent the total number of times a particular activity has occurred since this node was started.
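    Because a total only ever grows while the node is running, two snapshots taken some time apart can be turned into an average rate. The following is a minimal sketch under that assumption. The helper name `rate_per_second` is hypothetical, and the snapshots are assumed to be dictionaries of statistic names to integers.

```python
def rate_per_second(earlier, later, name, elapsed_seconds):
    """Average per-second rate of a total counter between two snapshots.

    Returns None if the counter went backwards, which would suggest the
    node was restarted between the snapshots and the total was reset.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = later[name] - earlier[name]
    if delta < 0:
        return None
    return delta / elapsed_seconds
```

    For instance, if a total moves from 100 to 160 over 30 seconds, the average rate is 2.0 operations per second.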

    Sample total counters:

    CPU and Memory

    CPU statistics are taken directly from Erlang’s cpu_sup module. Documentation for the module can be found at ErlDocs: cpu_sup.

    Memory statistics are taken directly from the Erlang virtual machine. Documentation for them can be found at ErlDocs: Memory.

    Miscellaneous Information

    Miscellaneous Information stats are data points that provide details particular to this node.

    Sample miscellaneous information statistics:

    Pipeline Metrics

    The following metrics from riak_pipe are generated during MapReduce operations.

    Application and Subsystem Versions

    The specific version of each Erlang application and subsystem that makes up a Riak node appears in the riak-admin status output.

    Riak Search Statistics

    The following statistics related to Riak Search message queues are available.

    Note that under ideal operation, and with the exception of riak_search_vnodes_running, these statistics should contain low values (e.g., 0-10). Higher values could indicate an issue.
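    The rule of thumb above can be turned into a simple check. The sketch below assumes the statistics are available as a dictionary and that the Riak Search statistics share the `riak_search_` name prefix. The threshold of 10 comes from the range given above, and the function name is hypothetical.

```python
def high_search_stats(stats, threshold=10):
    """Return Riak Search stats whose value exceeds the threshold.

    riak_search_vnodes_running is skipped because it is expected to be
    large on a healthy node.
    """
    flagged = {}
    for name, value in stats.items():
        if not name.startswith("riak_search_"):
            continue  # not a Riak Search statistic
        if name == "riak_search_vnodes_running":
            continue
        if isinstance(value, int) and value > threshold:
            flagged[name] = value
    return flagged
```

    An empty result means every checked statistic is within the expected range. A non-empty result is a prompt to investigate, not proof of a fault.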

    Riaknostic

    Riaknostic is a small suite of diagnostic checks that can be run against a Riak node to discover common problems and recommend how to resolve them. These checks are derived from the experience of the Basho Client Services Team as well as numerous public discussions on the mailing list, the #riak IRC channel, and other online media.

    As of Riak version 1.3, Riaknostic is installed by default.

    Riaknostic is included with Riak and exposed through the riak-admin diag command. It is an open source project developed by Basho Technologies and Riak community members. The code is available in the Riaknostic GitHub repository.
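    For scripted health checks, the diag command can be invoked from a small wrapper. The following is a minimal sketch, assuming `riak-admin` is on the PATH of the user running the script. The wrapper name `run_diag` is hypothetical, and no assumption is made about the format of the command's output, which is returned unparsed.

```python
import shutil
import subprocess


def run_diag(executable="riak-admin"):
    """Run '<executable> diag' and return (exit_code, stdout).

    Returns None when the executable cannot be found on the PATH.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    completed = subprocess.run(
        [path, "diag"], capture_output=True, text=True
    )
    return completed.returncode, completed.stdout
```

    Returning None instead of raising keeps the wrapper usable on machines where Riak is not installed, such as a monitoring host that probes several environments.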

    Related Resources