Scaling Sensu - Overview

This article gives a brief overview of the ways you can scale a Sensu deployment, from scaling individual components to scaling across multiple sites.

Scaling Sensu at a Single Site
Scaling Sensu Across Multiple Sites

Sensu Components

A typical Sensu deployment consists of four pieces:

Sensu Server
Sensu API
Redis (data store)
RabbitMQ (message bus)

The message bus and data store components can vary, but using Redis as the data store and RabbitMQ as the message bus is the most common (and supported) way of deploying them.

Sensu Server

The sensu-server process is the workhorse of any deployment. It performs a number of tasks, including check scheduling and publishing, monitoring clients via keepalives, and event processing. To scale this component, add the desired number of Sensu servers and point them at your RabbitMQ instance; the servers then perform their own internal leader election.
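As a minimal sketch of what "point them at your RabbitMQ instance" can look like, the snippet below renders one shared connection config for several server nodes. The hostnames, credentials, and node names are placeholders, and the `rabbitmq` and `redis` key names reflect the usual Sensu JSON configuration layout; check them against the reference documentation for your version.

```python
import json

def server_config(rabbitmq_host, redis_host):
    """Build the connection settings shared by every sensu-server node."""
    return {
        "rabbitmq": {
            "host": rabbitmq_host,   # placeholder hostname
            "port": 5672,
            "vhost": "/sensu",
            "user": "sensu",
            "password": "CHANGE_ME",  # placeholder credential
        },
        "redis": {"host": redis_host, "port": 6379},
    }

# Every server gets identical settings. No node is marked as leader here,
# because the servers elect a leader among themselves.
nodes = ("sensu-server-1", "sensu-server-2", "sensu-server-3")
configs = {
    node: server_config("rabbitmq.example.internal", "redis.example.internal")
    for node in nodes
}

print(json.dumps(configs["sensu-server-1"], indent=2))
```

Because the configuration is identical on every node, adding a server is a matter of deploying the same file to a new host and starting the process.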

Sensu API

The sensu-api component is a stateless HTTP frontend. It can be scaled with traditional HTTP load-balancing strategies (HAProxy, Nginx, etc.). Configure each additional API instance to point to your Redis instance, and add the API instance to your load-balancing pool.
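To illustrate the load-balancing pool, here is a small sketch that renders an HAProxy backend stanza for a list of API instances. The backend name, server labels, and IP addresses are hypothetical, and port 4567 is assumed to be the port your API instances listen on.

```python
def haproxy_backend(api_hosts, port=4567):
    """Render an HAProxy backend stanza that balances across sensu-api hosts."""
    lines = ["backend sensu_api", "    balance roundrobin"]
    for index, host in enumerate(api_hosts, start=1):
        # "check" enables HAProxy health checks for this pool member.
        lines.append(f"    server sensu-api-{index} {host}:{port} check")
    return "\n".join(lines)

# Hypothetical addresses of three API instances.
stanza = haproxy_backend(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
print(stanza)
```

Since the API is stateless, any balancing algorithm should work; round-robin is used here only as a simple default.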

Redis

Redis can be scaled out in several different ways. Redis Sentinel is the primary supported way of scaling Redis. You can read more about installing and configuring Sentinel in our Redis reference documentation.
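The following sketch shows roughly how a Sentinel-backed Redis definition might be generated for Sensu. The `master` and `sentinels` key names are given from memory of the Sensu Redis configuration and should be verified against the Redis reference documentation; the master name and hostnames are placeholders.

```python
import json

def redis_sentinel_config(master_name, sentinel_hosts, port=26379):
    """Build a Sensu-style redis definition that discovers the master via Sentinel."""
    return {
        "redis": {
            # Name of the master as registered with Sentinel (placeholder).
            "master": master_name,
            # Sensu asks these Sentinels which Redis node is currently master.
            "sentinels": [{"host": host, "port": port} for host in sentinel_hosts],
        }
    }

sentinel_config = redis_sentinel_config(
    "sensu-redis",
    ["sentinel-1.example.internal", "sentinel-2.example.internal", "sentinel-3.example.internal"],
)
print(json.dumps(sentinel_config, indent=2))
```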

RabbitMQ

RabbitMQ can be used in a clustered configuration for Sensu. You can read more about configuring RabbitMQ clusters in our RabbitMQ reference documentation.
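As a rough sketch, a clustered RabbitMQ setup can be expressed on the Sensu side by listing one connection entry per broker. This assumes Sensu accepts an array of broker definitions under the `rabbitmq` key, which should be confirmed in the RabbitMQ reference documentation; the hostnames and credentials are placeholders.

```python
import json

def rabbitmq_cluster_config(broker_hosts, port=5672):
    """Build a Sensu-style rabbitmq definition with one entry per cluster member."""
    return {
        "rabbitmq": [
            {
                "host": host,
                "port": port,
                "vhost": "/sensu",
                "user": "sensu",
                "password": "CHANGE_ME",  # placeholder credential
            }
            for host in broker_hosts
        ]
    }

cluster_config = rabbitmq_cluster_config(
    ["rabbit-1.example.internal", "rabbit-2.example.internal", "rabbit-3.example.internal"]
)
print(json.dumps(cluster_config, indent=2))
```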

Scaling Sensu at a Single Site

Each Sensu component can be scaled independently at a single site, whether you need to make Redis highly available or increase the number of consumers (sensu-server instances) to keep your RabbitMQ queue depth at manageable levels. We’ll put all of these elements together in the next guide.

Scaling Sensu Across Multiple Sites

Every distributed system, Sensu included, needs special consideration when it is scaled across multiple sites (datacenters), because the network between them (the WAN) will be unreliable.

For the purpose of this documentation each site will be referred to as a “datacenter”.

Strategy 1: Isolated Clusters Aggregated by Uchiwa

This strategy involves building an isolated, independent Sensu server or cluster at each datacenter, and then using Uchiwa’s multi-datacenter configuration option to get an aggregate view of the events and clients.
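A minimal sketch of Uchiwa's multi-datacenter configuration is shown below: one entry in the `sensu` array per datacenter, each pointing at that site's Sensu API. The datacenter names and API hostnames are hypothetical, and the port and timeout values are assumptions to adjust for your environment.

```python
import json

def uchiwa_config(datacenters, api_port=4567):
    """Build an Uchiwa config that aggregates one Sensu API per datacenter."""
    return {
        "sensu": [
            {"name": name, "host": api_host, "port": api_port, "timeout": 10}
            for name, api_host in datacenters.items()
        ],
        # Address and port the Uchiwa dashboard itself listens on.
        "uchiwa": {"host": "0.0.0.0", "port": 3000},
    }

dashboard_config = uchiwa_config({
    "us-east": "sensu-api.us-east.example.internal",
    "eu-west": "sensu-api.eu-west.example.internal",
})
print(json.dumps(dashboard_config, indent=2))
```

Each datacenter keeps its own RabbitMQ, Redis, and Sensu servers; only the API endpoints are exposed to Uchiwa.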

Pros

WAN instability does not lead to flapping Sensu checks
Sensu operation continues uninterrupted during a WAN outage
The overall architecture is easier to understand and troubleshoot

Cons

WAN outages mean a whole datacenter can go dark and not set off alerts (cross-datacenter checks are therefore essential)
WAN instability can lead to a lack of visibility as Uchiwa may not be able to connect to the remote Sensu APIs
Requires all the Sensu infrastructure in every datacenter

Strategy 2: Centralized Sensu and Distributed RabbitMQ

Sensu clients only need to connect to a RabbitMQ server to submit events. One scaling strategy is to centralize the Sensu infrastructure in one location and run only a RabbitMQ broker at each remote site, which in turn forwards events to the central cluster.

This is done with either the RabbitMQ Federation plugin or the Shovel plugin. (See a comparison here)

NOTE: This approach favors availability and partition tolerance over consistency in RabbitMQ.
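For the Shovel option, the sketch below builds a `rabbitmqctl set_parameter shovel` command that would be run on a remote site's broker to move messages from a local queue to the same-named queue on the central broker. The queue names `results` and `keepalives`, the shovel names, the hostnames, and the credentials are all assumptions made for illustration; a real deployment also has to get check requests from the central site to the remote clients, which this sketch does not cover.

```python
import json
import shlex
from urllib.parse import quote

def amqp_uri(host, vhost="/sensu", user="sensu", password="CHANGE_ME"):
    """Build an AMQP URI; the vhost must be percent-encoded ("/sensu" -> "%2Fsensu")."""
    return f"amqp://{user}:{password}@{host}/{quote(vhost, safe='')}"

def shovel_command(name, queue, central_host):
    """Return a rabbitmqctl command defining a dynamic shovel for one queue."""
    definition = {
        "src-uri": amqp_uri("localhost"),    # the remote site's own broker
        "src-queue": queue,                  # assumed queue name
        "dest-uri": amqp_uri(central_host),  # the central Sensu broker
        "dest-queue": queue,
    }
    return shlex.join(
        ["rabbitmqctl", "set_parameter", "shovel", name, json.dumps(definition)]
    )

commands = [
    shovel_command(f"sensu-{queue}", queue, "rabbitmq.central.example.internal")
    for queue in ("results", "keepalives")
]
for command in commands:
    print(command)
```

During a WAN outage the shovel simply stops forwarding and retries later, which is the availability-over-consistency trade-off described in the note above.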

Pros

Fewer infrastructure components necessary at remote datacenters
All Sensu server alerts originate from a single source

Cons

WAN instability can result in floods of client keepalive alerts. (The Sensu Enterprise check dependencies filter can help with this.)
Increased RabbitMQ configuration complexity
All clients appear to be in the same datacenter in Uchiwa

Strategy 3: Centralized Sensu and Directly Connected Clients

All Sensu clients execute checks locally. Their only interaction with Sensu servers is to push events onto RabbitMQ. Therefore, clients at remote sites can connect directly to the central RabbitMQ broker over the WAN.
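As an illustration, a client at a remote site might be configured roughly as follows, with its `rabbitmq` definition pointing at the central broker rather than a local one. The client name, address, subscription, broker hostname, and credentials are placeholders.

```python
import json

def remote_client_config(name, address, subscriptions, central_broker):
    """Build a Sensu client config whose transport is the central RabbitMQ broker."""
    return {
        "client": {
            "name": name,
            "address": address,
            "subscriptions": subscriptions,
        },
        "rabbitmq": {
            "host": central_broker,  # reached over the WAN
            "port": 5672,
            "vhost": "/sensu",
            "user": "sensu",
            "password": "CHANGE_ME",  # placeholder credential
        },
    }

client_config = remote_client_config(
    name="web-01.remote-site",
    address="192.0.2.10",
    subscriptions=["webserver"],
    central_broker="rabbitmq.central.example.internal",
)
print(json.dumps(client_config, indent=2))
```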

Pros

Very simple architecture, no additional infrastructure needed at remote sites
Centralized alert handling

Cons

Keepalive failures are now indistinguishable from WAN instability
Lots of remote clients means lots of TCP connections over the WAN
All clients appear to be in the same datacenter in Uchiwa
