Intro to Checks
The purpose of this guide is to help Sensu users create monitoring checks. At the conclusion of this guide, you should have several Sensu checks in place to monitor and measure machine resources, applications, and services. Each Sensu monitoring check in this guide demonstrates one or more check definition features. For more information, please refer to the Sensu checks reference documentation.
Objectives
What will be covered in this guide:
- Creation of standard checks (functional tests)
- Creation of metric collection checks (server resources, etc)
- Creation of metric analysis checks (querying time series data, etc)
What are Sensu checks?
Sensu checks allow you to monitor server resources, services, and application
health, as well as collect and analyze metrics; they are executed on servers
running the Sensu client. Checks are essentially commands (or scripts) that
output data to STDOUT or STDERR and produce an exit status code to indicate
a state. The common exit status codes used are 0 for OK, 1 for WARNING,
2 for CRITICAL, and 3 or greater to indicate UNKNOWN or CUSTOM.
Sensu checks use the same specification as Nagios; therefore, Nagios check
plugins may be used with Sensu.
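The exit status convention above can be sketched in a few lines of Ruby. This is only an illustration: the `status_for` and `check_output` helpers, the thresholds, and the `disk_usage` name are hypothetical and are not part of Sensu or any plugin.

```ruby
# Exit status codes described above; anything else is treated as UNKNOWN here.
STATUS_NAMES = { 0 => "OK", 1 => "WARNING", 2 => "CRITICAL" }.freeze

# Map a measured value to an exit status using hypothetical thresholds.
def status_for(value, warning:, critical:)
  return 2 if value >= critical
  return 1 if value >= warning
  0
end

# Build the human-readable line a check would print to STDOUT.
def check_output(name, value, status)
  label = STATUS_NAMES.fetch(status, "UNKNOWN")
  "#{name} #{label}: value is #{value}"
end

# A real check script would typically end with something like:
#   status = status_for(value, warning: 80, critical: 90)
#   puts check_output("disk_usage", value, status)
#   exit status
```

The message goes to STDOUT and the severity is carried only by the exit status, which is what allows Nagios-style plugins to be reused.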
Create a standard check
Standard Sensu checks are used to determine the health of server resources,
services, and applications. A standard check will query a resource for
information to determine its state. Once a standard check has determined the
resource state, it outputs a human-readable message and exits with the
appropriate exit status code to indicate its state/severity (OK, WARNING,
etc.).
Monitor the cron service
The following instructions install the check dependencies and configure the Sensu check definition in order to monitor the Cron service.
Install dependencies
The check-process.rb script provided by the Sensu Process Checks Plugin
can reliably detect if a service such as Cron is running or not. The following
instructions will install the Sensu Process Checks Plugin (version
0.0.6) using Sensu’s embedded Ruby, providing the check-process.rb script.
sudo sensu-install -p process-checks:0.0.6
Create the check definition for Cron
The following is an example Sensu check definition, a JSON configuration file
located at /etc/sensu/conf.d/check_cron.json. This check definition uses the
check-process.rb script (installed above) to determine if the Cron
service is running. The check is named cron and it runs
check-process.rb -p cron on Sensu clients with the production
subscription, every 60 seconds (interval).
NOTE: Sensu services must be restarted in order to pick up configuration changes. Sensu Enterprise can be reloaded.
{
  "checks": {
    "cron": {
      "command": "check-process.rb -p cron",
      "subscribers": [
        "production"
      ],
      "interval": 60
    }
  }
}
For a full listing of the check-process.rb command line arguments, run
/opt/sensu/embedded/bin/check-process.rb -h.
Currently, the Cron check definition requires that check requests be sent to
Sensu clients with the production subscription. This is known as a pubsub
check. Optionally, a check may use standalone mode, which allows clients to
schedule their own check executions. The following is an example of the Cron
check using standalone mode (true). The Cron check will now be executed
every 60 seconds on each Sensu client with the check definition. A Sensu check
definition with "standalone": true does not need to specify subscribers.
{
  "checks": {
    "cron": {
      "command": "check-process.rb -p cron",
      "standalone": true,
      "interval": 60
    }
  }
}
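The difference between the two scheduling modes can be checked mechanically before a definition is deployed. The following Ruby sketch classifies a check definition as standalone or pubsub; the `scheduling_mode` helper is hypothetical and only mirrors the rule described above (it is not how Sensu itself validates configuration).

```ruby
require "json"

# Classify a check definition hash as :standalone, :pubsub, or :invalid.
# Assumption: a check needs either "standalone": true or a non-empty
# "subscribers" array, as described in the text above.
def scheduling_mode(check)
  return :standalone if check["standalone"] == true

  subscribers = check["subscribers"]
  return :pubsub if subscribers.is_a?(Array) && !subscribers.empty?

  :invalid
end

config = JSON.parse(<<~JSON)
  {
    "checks": {
      "cron": {
        "command": "check-process.rb -p cron",
        "standalone": true,
        "interval": 60
      }
    }
  }
JSON

# Report the scheduling mode of every check in the parsed configuration.
config["checks"].each do |name, check|
  puts "#{name}: #{scheduling_mode(check)}"
end
```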
By default, Sensu checks use the default Sensu event handler for events they
create. To specify a different Sensu event handler for a check, use the
handler attribute. The debug event handler used in this example will log the
Sensu event data to the Sensu server (or Sensu Enterprise) log.
{
  "checks": {
    "cron": {
      "command": "check-process.rb -p cron",
      "standalone": true,
      "interval": 60,
      "handler": "debug"
    }
  }
}
Using multiple handlers
To specify multiple Sensu event handlers, use the handlers attribute (plural).
NOTE: If both the handler and handlers (plural) check definition attributes are
used, handlers will take precedence.
{
  "checks": {
    "cron": {
      "command": "check-process.rb -p cron",
      "standalone": true,
      "interval": 60,
      "handlers": ["default", "debug"]
    }
  }
}
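The handler selection rules described in this section (the default handler when nothing is specified, handler for a single handler, and handlers taking precedence when both are present) can be summarized in a short Ruby sketch. The `effective_handlers` helper is hypothetical and only restates those rules; it is not Sensu's implementation.

```ruby
# Return the list of handler names that would apply to a check definition,
# following the precedence rules stated above.
def effective_handlers(check)
  if check.key?("handlers")
    Array(check["handlers"])   # plural attribute wins when both are present
  elsif check.key?("handler")
    [check["handler"]]
  else
    ["default"]                # no handler specified: the default handler
  end
end

both = { "handler" => "debug", "handlers" => ["default", "debug"] }
puts effective_handlers(both).inspect
```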
Create a metric collection check
Metric collection checks are used to collect measurements from server resources, services, and applications. Metric collection checks can output metric data in a variety of metric formats; the check in this section uses the Graphite plaintext format.
Measuring CPU utilization
Install dependencies
The metrics-cpu.rb script provided by the Sensu CPU Checks Plugin
collects and outputs CPU metrics in the Graphite plaintext format. The following
instructions will install the Sensu CPU Checks Plugin (version 0.0.3) using
Sensu’s embedded Ruby, providing the metrics-cpu.rb script.
sudo sensu-install -p cpu-checks:0.0.3
Create the check definition for CPU metrics
The following is an example Sensu check definition, a JSON configuration file
located at /etc/sensu/conf.d/cpu_metrics.json. This check definition uses the
metrics-cpu.rb script (installed above) to collect CPU metrics and output
them in the Graphite plaintext format.
By default, Sensu checks with an exit status code of 0 (for OK) do not
create events unless they indicate a change in state from a non-zero status to a
zero status (i.e. resulting in a resolve action; see: Sensu Events).
Metric collection checks will output metric data regardless of the check exit
status code; however, they usually exit 0. To ensure events are always created
for a metric collection check, the check type of metric is used.
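The event creation rule in the paragraph above can be restated as a small predicate. This Ruby sketch is an illustration under stated assumptions: the `creates_event?` helper and its keyword arguments are hypothetical, and the function only encodes the rule as this guide describes it, not Sensu's actual event pipeline.

```ruby
# Decide whether a check result would create an event.
#   type:            the check type ("metric", or nil for a standard check)
#   status:          the exit status code of the current result
#   previous_status: the exit status code of the previous result (nil if none)
def creates_event?(type:, status:, previous_status:)
  return true if type == "metric"  # metric checks always create events
  return true if status != 0       # non-zero status indicates a problem

  # Status 0: only a change from non-zero to zero (a resolve) creates an event.
  !previous_status.nil? && previous_status != 0
end

puts creates_event?(type: nil, status: 0, previous_status: 0)
puts creates_event?(type: "metric", status: 0, previous_status: 0)
```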
The check is named cpu_metrics, and it runs metrics-cpu.rb on Sensu clients
with the production subscription, every 10 seconds (interval). The debug
handler is used to log the Graphite plaintext CPU metrics to the Sensu server
(or Sensu Enterprise) log.
NOTE: Sensu services must be restarted in order to pick up configuration changes. Sensu Enterprise can be reloaded.
{
  "checks": {
    "cpu_metrics": {
      "type": "metric",
      "command": "metrics-cpu.rb",
      "subscribers": [
        "production"
      ],
      "interval": 10,
      "handler": "debug"
    }
  }
}
For a full listing of the metrics-cpu.rb command line arguments, run
/opt/sensu/embedded/bin/metrics-cpu.rb -h.
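For readers unfamiliar with the Graphite plaintext format, each metric is a single line of the form "path value timestamp", where the timestamp is in Unix epoch seconds. The Ruby sketch below formats a few such lines. The helper names, the host01.cpu prefix, and the sample values are hypothetical; the exact metric names emitted by metrics-cpu.rb may differ.

```ruby
# Format one metric as a Graphite plaintext line: "<path> <value> <timestamp>".
def graphite_line(path, value, timestamp)
  "#{path} #{value} #{timestamp.to_i}"
end

# Format a hash of metric name => value under a common path prefix.
def metric_lines(prefix, values, timestamp)
  values.map { |name, value| graphite_line("#{prefix}.#{name}", value, timestamp) }
end

# Hypothetical sample data; a metric collection check would print lines
# like these to STDOUT and normally exit 0.
sample = { "user" => 12.5, "idle" => 80.0 }
puts metric_lines("host01.cpu", sample, Time.at(1_500_000_000))
```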
Create a metric analysis check
A metric analysis check analyzes metric data, which may or may not have been collected by a metric collection check. By querying external metric stores (e.g. Graphite) to perform data evaluations, metric analysis checks allow you to perform powerful analytics based on trends in metric data rather than a single data point. For example, where monitoring and alerting on a single CPU utilization data point can result in false positive events based on momentary spikes, monitoring and alerting on CPU utilization data over a specified period of time will improve alerting accuracy.
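The effect of evaluating a trend instead of a single data point can be seen with a few lines of Ruby. This is a minimal sketch with made-up CPU samples and thresholds; the `moving_average` and `threshold_status` helpers are hypothetical and do not query any metric store.

```ruby
# Average of the last `window` data points, or nil when there is no data.
def moving_average(points, window)
  recent = points.last(window)
  return nil if recent.empty?

  recent.sum.to_f / recent.size
end

# Map a value to an exit status code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
def threshold_status(value, warning:, critical:)
  return 3 if value.nil?
  return 2 if value >= critical
  return 1 if value >= warning

  0
end

# Hypothetical CPU utilization samples with one momentary spike (98).
samples = [20, 22, 19, 21, 98, 23, 20, 22, 21, 20]

spike_status = threshold_status(samples[4], warning: 80, critical: 95)
trend_status = threshold_status(moving_average(samples, 10), warning: 80, critical: 95)

puts "single data point: #{spike_status}"
puts "moving average:    #{trend_status}"
```

Evaluated alone, the spike would be reported as critical, while the average over the ten samples stays well below the warning threshold.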
Because metric analysis checks require interaction with an external metric store, providing a functional example is outside of the scope of this guide. However, assuming the existence of a Graphite installation that is populated with metric data, the following example checks could be used.
NOTE: If you’ve not configured a Graphite instance for these example checks but would like to, you can head over to Graphite’s quick start guides and get a Graphite Vagrant box spun up fairly quickly, and continue with the examples below.
The following check uses the check-graphite-data.rb script, provided by the
Sensu Graphite Plugin, to query the Graphite API at localhost:9001. The
check queries Graphite for a calculated moving average (using the last 10 data
points) of the load balancer session count. The session count moving average is
compared with the provided alert thresholds. A Sensu client running on the
Graphite server would be responsible for scheduling and executing this check
(standalone mode).
NOTE: Sensu services must be restarted in order to pick up configuration changes. Sensu Enterprise can be reloaded.
{
  "checks": {
    "session_count": {
      "command": "check-graphite-data.rb -s localhost:9001 -t 'movingAverage(lb1.assets_backend.session_current,10)' -w 100 -c 200",
      "standalone": true,
      "interval": 30
    }
  }
}
The following check uses the check-graphite-data.rb script, provided by the
Sensu Graphite Plugin, to query the Graphite API at localhost:9001 for
disk capacity metrics. The Graphite API query uses highestCurrent() to grab
only the highest disk capacity metric, to be compared with the provided alert
thresholds. This check will trigger an event (alert) when one or more disks on
any machine are at the configured capacity threshold. In this example
configuration, the check is configured to warn at 85% capacity (-w 85),
and to raise a critical alert at 95% capacity (-c 95).
{
  "checks": {
    "disk_capacity": {
      "command": "check-graphite-data.rb -s localhost:9001 -t 'highestCurrent(*.disk.*.capacity,1)' -w 85 -c 95 -a 120",
      "standalone": true,
      "interval": 30
    }
  }
}
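To make the disk capacity check easier to reason about, the following Ruby sketch imitates what the query and thresholds do together: pick the series whose most recent value is highest, then compare that value with the warning and critical thresholds. The series names and values are made up, and the `highest_current` helper is a local approximation of Graphite's highestCurrent() function, not a call to Graphite or to the plugin.

```ruby
# Return the n series with the highest most-recent (non-nil) value,
# as [name, last_value] pairs, highest first.
def highest_current(series, n = 1)
  series
    .map { |name, values| [name, values.compact.last] }
    .reject { |_, last| last.nil? }
    .max_by(n) { |_, last| last }
end

# Map a value to an exit status code using the thresholds from the example.
def capacity_status(value, warning: 85, critical: 95)
  return 2 if value >= critical
  return 1 if value >= warning

  0
end

# Hypothetical disk capacity series (percent used), oldest value first.
series = {
  "web1.disk.sda.capacity" => [70, 72, 88],
  "web2.disk.sda.capacity" => [40, 41, nil],
  "db1.disk.sdb.capacity"  => [90, 96, 97]
}

name, value = highest_current(series, 1).first
puts "#{name} is at #{value}% (status #{capacity_status(value)})"
```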
The following instructions will install the Sensu Graphite Plugin (version
0.0.6) using Sensu’s embedded Ruby, providing the check-graphite-data.rb script.
sudo sensu-install -p graphite:0.0.6
Checking on Other Clients
Sensu supports running checks where the results are considered to be for a client that isn’t actually the one executing the check, regardless of whether that client is a Sensu client or simply a proxy client. There are a number of reasons for this use case, but fundamentally, Sensu handles them all the same way.
Checks are scheduled normally, but by specifying a Proxy Request in your check, clients that match certain definitions (their client_attributes) cause the check to run for each one. The attributes supplied must normally match exactly as stated; no variables or directives have any special meaning. You can still use eval to perform more complicated filtering with Ruby on the available value, such as finding clients with particular subscriptions (given that we’re dealing with arrays):
{
  "checks": {
    "...": "...",
    "proxy_requests": {
      "client_attributes": {
        "user_variable": "some_value",
        "subscriptions": "eval: value.include?('a_subscription')"
      }
    }
  }
}
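The matching behavior described above (exact comparison by default, Ruby evaluation for values prefixed with eval:) can be approximated as follows. This is a simplified sketch, not Sensu's implementation: the helper names and the sample client are hypothetical, only top-level attributes are compared, and plain Ruby eval is used for brevity, which should never be applied to untrusted input.

```ruby
EVAL_PREFIX = "eval:".freeze

# Compare one expected attribute against a client's actual value.
# Strings starting with "eval:" are evaluated as Ruby, with the client's
# attribute available to the expression as the local variable `value`.
def attribute_matches?(expected, value)
  if expected.is_a?(String) && expected.start_with?(EVAL_PREFIX)
    expression = expected.sub(EVAL_PREFIX, "").strip
    # `value` is visible to the expression through this method's binding.
    !!eval(expression, binding)
  else
    expected == value
  end
end

# A client matches when every listed attribute matches.
def client_matches?(client, client_attributes)
  client_attributes.all? do |key, expected|
    attribute_matches?(expected, client[key])
  end
end

client_attributes = {
  "user_variable" => "some_value",
  "subscriptions" => "eval: value.include?('a_subscription')"
}

# Hypothetical client data for illustration.
client = {
  "name" => "example-client",
  "user_variable" => "some_value",
  "subscriptions" => ["a_subscription", "production"]
}

puts client_matches?(client, client_attributes)
```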