Tune Sensu

This page describes tuning options that may help restore proper operation if you experience performance issues with your Sensu installation.

NOTE: Before you tune your Sensu installation, read Troubleshoot Sensu, Hardware requirements, and Deployment architecture for Sensu. These pages describe common problems and solutions, planning and optimization considerations, and other recommendations that may resolve your issue without tuning adjustments.

Latency tolerances for etcd

If you use embedded etcd for storage, you might notice high network or storage latency.

To make etcd more latency-tolerant, increase the values of the etcd election timeout and etcd heartbeat interval backend configuration options. For example, you might increase etcd-election-timeout from 3000 to 5000 and etcd-heartbeat-interval from 300 to 500 (both values are in milliseconds).

Read the etcd tuning documentation for etcd-specific tuning best practices.
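The example values above keep the election timeout at ten times the heartbeat interval. etcd's tuning guidance relates both settings to the round-trip time between members, so when you raise one, it is worth checking the other. The following Go sketch is a hypothetical pre-flight check, not part of Sensu: `etcdTiming` and `ratioOK` are illustrative names, and the 10:1 minimum is an assumption taken from the example values rather than a limit Sensu enforces.

```go
package main

import "fmt"

// etcdTiming holds proposed values for the Sensu backend options
// etcd-election-timeout and etcd-heartbeat-interval, in milliseconds.
type etcdTiming struct {
	ElectionTimeoutMs   int
	HeartbeatIntervalMs int
}

// ratioOK reports whether the election timeout is at least minRatio times
// the heartbeat interval. The ratio is an assumed rule of thumb,
// not a check that Sensu or etcd performs for you.
func ratioOK(t etcdTiming, minRatio int) bool {
	if t.HeartbeatIntervalMs <= 0 {
		return false
	}
	return t.ElectionTimeoutMs >= minRatio*t.HeartbeatIntervalMs
}

func main() {
	current := etcdTiming{ElectionTimeoutMs: 3000, HeartbeatIntervalMs: 300}
	proposed := etcdTiming{ElectionTimeoutMs: 5000, HeartbeatIntervalMs: 500}
	for _, t := range []etcdTiming{current, proposed} {
		fmt.Printf("election=%dms heartbeat=%dms ratio>=10: %v\n",
			t.ElectionTimeoutMs, t.HeartbeatIntervalMs, ratioOK(t, 10))
	}
}
```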

Advanced backend configuration options for etcd

The backend reference describes other advanced configuration options in addition to etcd election timeout and heartbeat interval.

Adjust these values with caution. Improper adjustment can increase memory and CPU usage or result in a non-functioning Sensu instance.

Input/output operations per second (IOPS)

The speed at which the backend completes write operations is critical to Sensu cluster performance and health. Provision Sensu backend infrastructure to provide the sustained input/output operations per second (IOPS) required for the rate of observability events the system must process.

Read Backend recommended configuration and Hardware sizing for details.
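One rough way to turn an event rate into a sustained IOPS target is to multiply it by the number of writes each event causes and then add headroom. The Go sketch below does only that arithmetic. `writesPerEvent` and `headroom` are hypothetical inputs that you would need to measure or choose for your own deployment, and the numbers in `main` are illustrative, not Sensu benchmarks.

```go
package main

import (
	"fmt"
	"math"
)

// estimateIOPS returns a rough sustained-IOPS target for the backend datastore.
// writesPerEvent and headroom are HYPOTHETICAL inputs: measure writes per event
// in your own deployment and pick a headroom factor (for example, 1.5 = +50%).
func estimateIOPS(eventsPerSecond, writesPerEvent, headroom float64) int {
	return int(math.Ceil(eventsPerSecond * writesPerEvent * headroom))
}

func main() {
	// Hypothetical workload: 2,000 events/s, 2 writes per event, 50% headroom.
	fmt.Println(estimateIOPS(2000, 2, 1.5)) // 6000
}
```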

PostgreSQL settings

The datastore reference lists the PostgreSQL configuration parameters and settings we recommend as a starting point for your postgresql.conf file. Adjust the parameters and settings as needed based on your hardware and performance observations.

Read the PostgreSQL parameters documentation for information about setting parameters.

Agent reconnection rate

COMMERCIAL FEATURE: Access the agent-rate-limit backend configuration option in the packaged Sensu Go distribution. For more information, read Get started with commercial features.

It may take several minutes for all agents to reconnect after a sensu-backend restart, especially if you have a large number of agents. The agent reconnection rate depends on deployment variables like the number of CPUs, disk space, network speeds, whether you’re using a load balancer, and even physical distance between agents and backends.

Although many variables affect the agent reconnection rate, a reasonable estimate is approximately 100 agents per backend per second. If you observe slower agent reconnection rates in your Sensu deployment, consider using the agent-rate-limit backend configuration option.

The agent-rate-limit backend configuration option allows you to set the maximum number of agent transport WebSocket connections per second, per backend. Setting agent-rate-limit to 100 can improve the agent reconnection rate and reduce the time required for all of your agents to reconnect after a backend restart.

Splay and proxy check scheduling

Adjust the splay and splay_coverage check attributes to spread proxy check executions across the check interval instead of running them all at once. Read Fine-tune proxy check scheduling with splay for an example.

Tokens and resource re-use

Tokens are placeholders in a check, hook, or dynamic runtime asset definition that the agent replaces with entity information before execution. Use tokens to fine-tune check, hook, and asset attributes for each entity while reusing the same resource definitions.

Read the tokens reference for token syntax and examples.
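Sensu token syntax resembles Go's text/template package (for example, {{ .name }} for the entity name). The following sketch renders a check command against entity data with text/template directly to show how per-entity substitution lets one definition serve many entities. It is not Sensu's implementation, and the check-disk-usage command and the disk_warning and disk_critical labels are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// substituteTokens renders a command template against entity data.
// missingkey=error makes an unmatched token fail instead of rendering "<no value>".
func substituteTokens(command string, entity map[string]any) (string, error) {
	tmpl, err := template.New("command").Option("missingkey=error").Parse(command)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, entity); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	// Hypothetical entity with per-entity threshold labels.
	entity := map[string]any{
		"name": "web-01",
		"labels": map[string]string{
			"disk_warning":  "80",
			"disk_critical": "90",
		},
	}
	cmd, err := substituteTokens("check-disk-usage -w {{ .labels.disk_warning }} -c {{ .labels.disk_critical }}", entity)
	if err != nil {
		fmt.Println("token error:", err)
		return
	}
	fmt.Println(cmd) // check-disk-usage -w 80 -c 90
}
```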

Occurrences and alert fatigue

Use the occurrences and occurrences_watermark event attributes in event filters to tune incident notifications and reduce alert fatigue.
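Sensu event filters are written as JavaScript expressions, so the Go function below only illustrates one possible policy, not filter syntax. It assumes a threshold-and-repeat approach: notify when an incident reaches threshold occurrences and then every repeat occurrences after that; for a resolution event, notify only when occurrences_watermark shows the preceding incident reached the threshold, so no one is told about recovery from an incident they were never alerted about. `checkEvent` and `shouldNotify` are hypothetical names.

```go
package main

import "fmt"

// checkEvent holds the event attributes this sketch needs.
// It assumes only incidents and resolutions reach shouldNotify
// (for example, after Sensu's built-in is_incident filter).
type checkEvent struct {
	Status               int // 0 = OK (resolution); non-zero = incident
	Occurrences          int // consecutive events at the current status
	OccurrencesWatermark int // on a resolution, how far the preceding incident got
}

// shouldNotify is a HYPOTHETICAL policy: alert at threshold occurrences,
// repeat every repeat occurrences, and only announce resolutions of
// incidents that reached the threshold.
func shouldNotify(e checkEvent, threshold, repeat int) bool {
	if threshold < 1 {
		threshold = 1
	}
	if repeat < 1 {
		repeat = 1
	}
	if e.Status == 0 {
		return e.OccurrencesWatermark >= threshold
	}
	return e.Occurrences >= threshold && (e.Occurrences-threshold)%repeat == 0
}

func main() {
	// Alert after 3 occurrences, then every 10 more.
	for occ := 1; occ <= 25; occ++ {
		if shouldNotify(checkEvent{Status: 2, Occurrences: occ}, 3, 10) {
			fmt.Println("notify at occurrence", occ)
		}
	}
	// Resolution after an incident that reached 5 occurrences.
	fmt.Println(shouldNotify(checkEvent{Status: 0, Occurrences: 1, OccurrencesWatermark: 5}, 3, 10)) // true
}
```

With a threshold of 3 and a repeat of 10, this sketch notifies on occurrences 3, 13, and 23.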