Run a Sensu cluster

To deploy Sensu for use outside of a local development environment, first decide whether you want to run a Sensu cluster.

A Sensu cluster is a group of at least three sensu-backend nodes, each connected to a shared database provided either by Sensu’s embedded etcd or an external etcd cluster. Creating a Sensu cluster ultimately configures an etcd cluster.

Clustering improves Sensu’s availability, reliability, and durability. It allows you to absorb the loss of a backend node, prevent data loss, and distribute the network load of agents. If you have a healthy clustered backend, you only need to make Sensu API calls to any one of the cluster members. The cluster protocol will replicate your changes to all cluster members.

Scaling a single backend to a cluster or migrating a cluster from cleartext HTTP to encrypted HTTPS without downtime can require a number of tedious steps. For this reason, we recommend that you decide whether your deployment will require clustering as part of your initial planning effort.

No matter whether you deploy a single backend or a clustered configuration, begin by securing Sensu with transport layer security (TLS). The first step in setting up TLS is to generate the certificates you need. Then, follow our Secure Sensu guide to make Sensu production-ready.

After you’ve secured Sensu, continue reading this document to set up and update a clustered configuration.

NOTE: We recommend using a load balancer to evenly distribute agent connections across a cluster.

Configure a cluster

The sensu-backend arguments for its store mirror the etcd configuration flags, but the Sensu configuration options are prefixed with etcd. For more detailed descriptions of the different arguments, read the etcd documentation or Sensu backend reference.

You can configure a Sensu cluster in a couple different ways — we’ll show you a few below — but you should adhere to some etcd cluster guidelines as well:

The recommended etcd cluster size is 3, 5 or 7, which is decided by the fault tolerance requirement. A 7-member cluster can provide enough fault tolerance in most cases. While a larger cluster provides better fault tolerance, the write performance reduces since data needs to be replicated to more machines. It is recommended to have an odd number of members in a cluster. Having an odd cluster size doesn’t change the number needed for majority, but you gain a higher tolerance for failure by adding the extra member. etcd2 Admin Guide

We also recommend using stable platforms to support your etcd instances (review etcd’s supported platforms).

NOTE: If a cluster member is started before it is configured to join a cluster, the member will persist its prior configuration to disk. For this reason, you must remove any previously started member’s etcd data by stopping sensu-backend and deleting the contents of /var/lib/sensu/sensu-backend/etcd before proceeding with cluster configuration.

Docker

If you prefer to stand up your Sensu cluster within Docker containers, check out the Sensu Go Docker configuration. This configuration defines three sensu-backend containers and three sensu-agent containers.

Traditional computer instance

NOTE: The remainder of this guide describes on-disk configuration. If you are using an ephemeral computer instance, you can use sensu-backend start --help to list etcd command line flags. The configuration file entries in the rest of this guide translate to sensu-backend flags.

Sensu backend configuration

WARNING: You must update the default configuration for Sensu’s embedded etcd with an explicit, non-default configuration to secure etcd communication in transit. If you do not properly configure secure etcd communication, your Sensu configuration will be vulnerable to unauthorized manipulation via etcd client connections.

The examples in this section are configuration snippets from /etc/sensu/backend.yml using a three-node cluster. The nodes are named backend-1.example.com, backend-2.example.com and backend-3.example.com with IP addresses 10.0.0.1, 10.0.0.2 and 10.0.0.3, respectively.

NOTE: This backend configuration assumes you have set up and installed the sensu-backend on all the nodes used in your cluster. Follow the Install Sensu guide if you have not already done this.

Store configuration for backend-1.example.com/10.0.0.1

etcd-advertise-client-urls: "https://10.0.0.1:2379"
etcd-listen-client-urls: "https://10.0.0.1:2379"
etcd-listen-peer-urls: "https://0.0.0.0:2380"
etcd-initial-cluster: "backend-1.example.com=https://10.0.0.1:2380,backend-2.example.com=https://10.0.0.2:2380,backend-3.example.com=https://10.0.0.3:2380"
etcd-initial-advertise-peer-urls: "https://10.0.0.1:2380"
etcd-initial-cluster-state: "new"
etcd-initial-cluster-token: "unique_token_for_this_cluster"
etcd-name: "backend-1.example.com"

Store configuration for backend-2.example.com/10.0.0.2

etcd-advertise-client-urls: "https://10.0.0.2:2379"
etcd-listen-client-urls: "https://10.0.0.2:2379"
etcd-listen-peer-urls: "https://0.0.0.0:2380"
etcd-initial-cluster: "backend-1.example.com=https://10.0.0.1:2380,backend-2.example.com=https://10.0.0.2:2380,backend-3.example.com=https://10.0.0.3:2380"
etcd-initial-advertise-peer-urls: "https://10.0.0.2:2380"
etcd-initial-cluster-state: "new"
etcd-initial-cluster-token: "unique_token_for_this_cluster"
etcd-name: "backend-2.example.com"

Store configuration for backend-3.example.com/10.0.0.3

etcd-advertise-client-urls: "https://10.0.0.3:2379"
etcd-listen-client-urls: "https://10.0.0.3:2379"
etcd-listen-peer-urls: "https://0.0.0.0:2380"
etcd-initial-cluster: "backend-1.example.com=https://10.0.0.1:2380,backend-2.example.com=https://10.0.0.2:2380,backend-3.example.com=https://10.0.0.3:2380"
etcd-initial-advertise-peer-urls: "https://10.0.0.3:2380"
etcd-initial-cluster-state: "new"
etcd-initial-cluster-token: "unique_token_for_this_cluster"
etcd-name: "backend-3.example.com"

WARNING: To properly secure etcd communication, replace the default URLs for etcd-advertise-client-urls, etcd-listen-client-urls, etcd-listen-peer-urls, and etcd-initial-cluster in the store configurations for your backends with non-default values.

Specify the same etcd-initial-cluster-token value for all three backends. This allows etcd to generate unique cluster IDs and member IDs even for clusters that have otherwise identical configurations and prevents cross-cluster-interaction.

After you configure each node as described in these examples, start each sensu-backend:

sudo systemctl start sensu-backend
Add Sensu agents to clusters

Each Sensu agent should have the following entries in /etc/sensu/agent.yml to ensure the agent is aware of all cluster members. This allows the agent to reconnect to a working backend if the backend it is currently connected to goes into an unhealthy state.

Here is an example backend-url configuration for all agents connecting to the cluster over WebSocket:

backend-url:
  - "ws://10.0.0.1:8081"
  - "ws://10.0.0.2:8081"
  - "ws://10.0.0.3:8081"

You should now have a highly available Sensu cluster! Confirm cluster health and try other cluster management commands with sensuctl.

Manage and monitor clusters with sensuctl

Sensuctl includes several commands to help you manage and monitor your cluster. Run sensuctl cluster -h for additional help information.

Get cluster health status

Get cluster health status and etcd alarm information:

sensuctl cluster health

The cluster health response will list the health status for each cluster member, similar to this example:

       ID            Name                                      Error                           Healthy  
────────────────── ────────────────────── ─────────────────────────────────────────────────── ─────────
a32e8f613b529ad4   backend-1.example.com                                                        true
c3d9f4b8d0dd1ac9   backend-2.example.com  dial tcp 10.0.0.2:2379: connect: connection refused   false
c8f63ae435a5e6bf   backend-3.example.com                                                        true

Add a cluster member

To add a new member node to an existing cluster:

  1. Configure the new node’s store in its /etc/sensu/backend.yml configuration file. For the new node backend-4.example.com with IP address 10.0.0.4:

    etcd-advertise-client-urls: "https://10.0.0.4:2379"
    etcd-listen-client-urls: "https://10.0.0.4:2379"
    etcd-listen-peer-urls: "https://0.0.0.0:2380"
    etcd-initial-cluster: "backend-1.example.com=https://10.0.0.1:2380,backend-2.example.com=https://10.0.0.2:2380,backend-3.example.com=https://10.0.0.3:2380,backend-4.example.com=https://10.0.0.4:2380"
    etcd-initial-advertise-peer-urls: "https://10.0.0.4:2380"
    etcd-initial-cluster-state: "existing"
    etcd-initial-cluster-token: "unique_token_for_this_cluster"
    etcd-name: "backend-4.example.com"

    NOTE: To make sure the new member is added to the correct cluster, specify the same etcd-initial-cluster-token value that you used for the other members in the cluster.

    Also, when you are adding a cluster member, make sure the etcd-initial-cluster-state value is existing, not new.

  2. Run the sensuctl command to add the new cluster member:

    sensuctl cluster member-add backend-4.example.com https://10.0.0.4:2380

    You will receive a sensuctl response to confirm that the new member was added:

    added member 2f7ae42c315f8c2d to cluster
  3. Start the new backend:

    sudo systemctl start sensu-backend
  4. Add the new cluster member’s WebSocket backend-url in /etc/sensu/agent.yml for all agents that connect to the cluster over WebSocket:

    backend-url:
      - "ws://10.0.0.1:8081"
      - "ws://10.0.0.2:8081"
      - "ws://10.0.0.3:8081"
      - "ws://10.0.0.4:8081"

List cluster members

List the ID, name, peer URLs, and client URLs of all nodes in a cluster:

sensuctl cluster member-list

You will receive a sensuctl response that lists all cluster members:

       ID            Name                        Peer URLs                Client URLs
────────────────── ─────────────────────── ───────────────────────── ─────────────────────────
a32e8f613b529ad4   backend-1.example.com    https://10.0.0.1:2380     https://10.0.0.1:2379  
c3d9f4b8d0dd1ac9   backend-2.example.com    https://10.0.0.2:2380     https://10.0.0.2:2379
c8f63ae435a5e6bf   backend-3.example.com    https://10.0.0.3:2380     https://10.0.0.3:2379
2f7ae42c315f8c2d   backend-4.example.com    https://10.0.0.4:2380     https://10.0.0.4:2379

Remove a cluster member

Remove a faulty or decommissioned member node from a cluster:

sensuctl cluster member-remove 2f7ae42c315f8c2d

You will receive a sensuctl response to confirm that the cluster member was removed:

Removed member 2f7ae42c315f8c2d from cluster

Replace a faulty cluster member

To replace a faulty cluster member to restore a cluster’s health:

  1. Get cluster health status and etcd alarm information:

    sensuctl cluster health

    In the response, for a faulty cluster member, the Error column will include an error message and the Healthy column will list false. In this example, the response indicates that cluster member backend-4 is faulty:

           ID            Name                                       Error                           Healthy  
    ────────────────── ─────────────────────── ─────────────────────────────────────────────────── ─────────
    a32e8f613b529ad4   backend-1.example.com                                                        true
    c3d9f4b8d0dd1ac9   backend-2.example.com                                                        true
    c8f63ae435a5e6bf   backend-3.example.com                                                        true
    2f7ae42c315f8c2d   backend-4.example.com  dial tcp 10.0.0.4:2379: connect: connection refused   false

  2. Remove the faulty cluster member — in this example, backend-4 — using its ID. Removing the faulty cluster member prevents the cluster size from growing.

    sensuctl cluster member-remove 2f7ae42c315f8c2d

    The response should indicate that the cluster member was removed:

    Removed member 2f7ae42c315f8c2d from cluster

  3. Follow the steps in Add a cluster member to configure the replacement cluster member.

    NOTE: You can use the same name and IP address as the removed faulty member for the replacement cluster member. When updating the replacement member’s backend configuration file, make sure the etcd-initial-cluster-state value is existing, not new.

If replacing the faulty cluster member does not resolve the problem, read the etcd operations guide for more information.

Update a cluster member

Update the peer URLs of a member in a cluster:

sensuctl cluster member-update c8f63ae435a5e6bf https://10.0.0.4:2380

You will receive a sensuctl response to confirm that the cluster member was updated:

Updated member with ID c8f63ae435a5e6bf in cluster

Cluster security

Read Secure Sensu for information about cluster security.

Use an external etcd cluster

WARNING: You must update the example configuration for external etcd with an explicit, non-default configuration to secure etcd communication in transit. If you do not properly configure secure etcd communication, your Sensu configuration will be vulnerable to unauthorized manipulation via etcd client connections.

To use Sensu with an external etcd cluster, you must have etcd 3.3.2 or newer. To stand up an external etcd cluster, follow etcd’s clustering guide using the same store configuration. Do not configure external etcd in Sensu via backend command line flags or the backend configuration file (/etc/sensu/backend.yml).

Configure key space access

Follow these steps to configure read and write access to the /sensu.io/ key space for your users so you can initialize a backend that uses etcd authentication.

  1. Add the sensu user:

    etcdctl user add sensu

  2. Enter the sensu user password when prompted.

  3. Create the sensu_readwrite role:

    etcdctl role add sensu_readwrite

  4. Grant read/write permissions to the sensu_readwrite role under the /sensu.io/ key space:

    etcdctl role grant-permission sensu_readwrite readwrite --from-key '/sensu.io/'

  5. Grant the sensu_readwrite role to the sensu user:

    etcdctl user grant-role sensu sensu_readwrite

  6. Confirm that the grant is configured correctly:

    /opt/etcd/etcdctl user get USERNAME --detail

    You should see the following output:

    User: USERNAME
    
    Role sensu_readwrite
    KV Read:
      [/sensu.io/, <open ended>
    KV Write:
      [/sensu.io/, <open ended>

Etcd does not enable authentication by default, so additional configuration may be needed before etcd will enforce these controls. See the etcd operators documentation for details.

Start etcd

In this example, you will enable client-to-server and peer communication authentication using self-signed TLS certificates. To start etcd for backend-1.example.com based on the three-node configuration example:

etcd \
--listen-client-urls "https://10.0.0.1:2379" \
--advertise-client-urls "https://10.0.0.1:2379" \
--listen-peer-urls "https://10.0.0.1:2380" \
--initial-cluster "backend-1.example.com=https://10.0.0.1:2380,backend-2.example.com=https://10.0.0.2:2380,backend-3.example.com=https://10.0.0.3:2380" \
--initial-advertise-peer-urls "https://10.0.0.1:2380" \
--initial-cluster-state "new" \
--name "backend-1.example.com" \
--trusted-ca-file=./ca.pem \
--cert-file=./backend-1.example.com.pem \
--key-file=./backend-1.example.com-key.pem \
--client-cert-auth \
--peer-trusted-ca-file=./ca.pem \
--peer-cert-file=./backend-1.example.com.pem \
--peer-key-file=./backend-1.example.com-key.pem \
--peer-client-cert-auth \
--auto-compaction-mode revision \
--auto-compaction-retention 2

NOTE: Without the auto-compaction-mode and auto-compaction-retention flags, your database may quickly reach etcd’s maximum database size limit.

Tell Sensu to use this external etcd data source by adding the sensu-backend flag --no-embed-etcd to the original configuration and the path to a client certificate created using your CA:

sensu-backend start \
--etcd-trusted-ca-file=./ca.pem \
--etcd-cert-file=./backend-1.example.com.pem \
--etcd-key-file=./backend-1.example.com-key.pem \
--etcd-client-urls='https://10.0.0.1:2379 https://10.0.0.2:2379 https://10.0.0.3:2379' \
--no-embed-etcd

NOTE: The etcd and sensu-backend certificates must share a CA, and the etcd-client-urls value must be a space-delimited list or a YAML array.

Authenticate with username and password for external etcd

Managed database services (database-as-a-service, or DBaaS) often support external etcd authentication via username and password rather than client certificates.

To use username and password authentication to connect to external etcd, add the SENSU_BACKEND_ETCD_CLIENT_USERNAME and SENSU_BACKEND_ETCD_CLIENT_PASSWORD environment variables to the environment file. Replace <your_username> and <your_password> with the username and password you use for your external etcd provider:

SENSU_BACKEND_ETCD_CLIENT_USERNAME=<your_username>
SENSU_BACKEND_ETCD_CLIENT_PASSWORD=<your_password>

Read Configuration via environment variables to learn how to create and save environment variables.

The SENSU_BACKEND_ETCD_CLIENT_USERNAME and SENSU_BACKEND_ETCD_CLIENT_PASSWORD environment variables do not have corresponding configuration flags. To use username/passsword authentication for external etcd, you must configure these environment variables in the environment file.

Migrate from embedded etcd to external etcd

To migrate from embedded etcd to external etcd, first decide whether you need to migrate all of your etcd data or just your Sensu configurations.

If you need to migrate all etcd data, you must create an etcd snapshot. Use the snapshot to restore your entire cluster after setting up the new external cluster.

If you need to migrate only your Sensu configuration, you can use sensuctl dump to create a backup and use sensuctl create to import your configuration to the new external cluster.

NOTE: The sensuctl dump command does not export user passwords, and sensuctl create does not restore API keys from a sensuctl dump backup. For this reason, you must use the etcd snapshot and restore process to migrate your entire embedded cluster to external etcd.

After you create the backups you need, follow Use an external etcd cluster to configure Sensu to use the external cluster as your datastore.

Troubleshoot clusters

Failure modes

Read the etcd failure modes documentation for information about cluster failure modes.

Disaster recovery

Read Restore your Sensu configuration for disaster recovery for instructions for recovering a Sensu cluster as well as best practices for backups.

Redeploy a cluster

To redeploy a cluster due to an issue like loss of quorum among cluster members, etcd corruption, or hardware failure, read Remove and redeploy a cluster.