Checks

Checks work with Sensu agents to produce monitoring events automatically. You can use checks to monitor server resources, services, and application health as well as collect and analyze metrics. Read the guide to monitoring server resources to get started. You can discover, download, and share Sensu check assets using Bonsai, the Sensu asset index.

Check commands

Each Sensu check definition specifies a command and the schedule on which it should be executed. A check command is an executable that a Sensu agent runs.

A command may include command line arguments for controlling the behavior of the command executable. Many common checks are available as assets from Bonsai and support command line arguments so different check definitions can use the same executable.

Sensu advises against requiring root privileges to execute check commands or scripts. The sensu user is not permitted to kill timed-out processes started by the root user, which can leave zombie processes behind.

How and where are check commands executed?

All check commands are executed by Sensu agents as the sensu user. Commands must be executable files that are discoverable on the Sensu agent system (for example, installed in a directory on the system $PATH).

Check result specification

Although Sensu agents attempt to execute any command defined for a check, successful processing of check results requires adherence to a simple specification.

  • Result data is output to STDOUT or STDERR
    • For service checks, this output is typically a human-readable message.
    • For metric checks, this output contains the measurements gathered by the check.
  • Exit status code indicates state
    • 0 indicates “OK”
    • 1 indicates “WARNING”
    • 2 indicates “CRITICAL”
    • Exit status codes other than 0, 1, or 2 indicate an “UNKNOWN” or custom status
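A minimal sketch of a check that follows this specification, written in Python. The disk-usage measurement, the path, and the 75/90 percent thresholds are illustrative assumptions rather than a Sensu-provided plugin; the point is the contract: a human-readable message on STDOUT and an exit status of 0, 1, 2, or anything else for UNKNOWN.

```python
import shutil
import sys

# Exit codes defined by the check result specification.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(used_percent: float, warn: float, crit: float) -> tuple[int, str]:
    """Map a measurement to an exit status and a human-readable message."""
    if used_percent >= crit:
        return CRITICAL, f"CRITICAL: disk usage {used_percent:.1f}% >= {crit}%"
    if used_percent >= warn:
        return WARNING, f"WARNING: disk usage {used_percent:.1f}% >= {warn}%"
    return OK, f"OK: disk usage {used_percent:.1f}%"


def main(path: str = "/", warn: float = 75.0, crit: float = 90.0) -> int:
    try:
        usage = shutil.disk_usage(path)
    except OSError as err:
        # Any status other than 0, 1, or 2 is treated as UNKNOWN.
        print(f"UNKNOWN: cannot read {path}: {err}")
        return UNKNOWN
    status, message = evaluate(100.0 * usage.used / usage.total, warn, crit)
    print(message)  # result data goes to STDOUT
    return status


if __name__ == "__main__":
    sys.exit(main())
```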

PRO TIP: Those familiar with the Nagios monitoring system may recognize this specification, as it is the same one used by Nagios plugins. As a result, Nagios plugins can be used with Sensu without any modification.

At every execution of a check command – regardless of success or failure – the Sensu agent publishes the check’s result for eventual handling by the event processor (the Sensu backend).

Check scheduling

Checks are scheduled by the Sensu backend, which publishes check execution requests to entities via a publish-subscribe model.

Subscriptions

Checks have a defined set of subscriptions, transport topics to which the Sensu backend publishes check requests. Sensu entities become subscribers to these topics (called subscriptions) via their individual subscriptions attribute. In practice, subscriptions typically correspond to a specific role or responsibility (for example: a webserver or database).

Subscriptions are powerful primitives in the monitoring context because they allow you to effectively monitor for specific behaviors or characteristics corresponding to the function being provided by a particular system. For example, disk capacity thresholds might be more important (or at least different) on a database server as opposed to a webserver; conversely, CPU or memory usage thresholds might be more important on a caching system than on a file server. Subscriptions also allow you to configure check requests for an entire group or subgroup of systems rather than requiring a traditional one-to-one mapping.

To configure subscriptions for a check, use the subscriptions attribute to specify an array of one or more subscription names. Sensu schedules checks once per interval for each agent with a matching subscription. For example, if we have three agents configured with the system subscription, a check configured with the system subscription results in three monitoring events per interval: one check execution per agent per interval. In order for Sensu to execute a check, the check definition must include a subscription that matches the subscription of at least one Sensu agent.
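To make the one-execution-per-matching-agent rule concrete, the following sketch models agents as plain Python objects (a hypothetical stand-in, not a Sensu API) and lists which agents would execute a check in a single interval. It assumes each matching agent counts once and does not model how the backend publishes to overlapping subscription topics.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical stand-in for an agent entity and its subscriptions."""
    name: str
    subscriptions: list[str] = field(default_factory=list)


def agents_for_check(check_subscriptions: list[str], agents: list[Agent]) -> list[str]:
    """Return the agents that execute the check (once each) in one interval.

    An agent is selected when it shares at least one subscription with the check.
    """
    wanted = set(check_subscriptions)
    return [agent.name for agent in agents if wanted & set(agent.subscriptions)]


agents = [
    Agent("agent-a", ["system"]),
    Agent("agent-b", ["system", "webserver"]),
    Agent("agent-c", ["system"]),
    Agent("agent-d", ["database"]),
]
# Three matching agents, so three executions (and three events) per interval.
print(agents_for_check(["system"], agents))  # ['agent-a', 'agent-b', 'agent-c']
```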

Round-robin checks

By default, Sensu schedules checks once per interval for each agent with a matching subscription: one check execution per agent per interval. Sensu also supports deduplicated check execution when configured with the round_robin check attribute. For checks with round_robin set to true, Sensu executes the check once per interval, cycling through the available agents alphabetically according to agent name.

For example, for three agents configured with the system subscription (agents A, B, and C), a check configured with the system subscription and round_robin set to true results in one monitoring event per interval, with the agent creating the event following the pattern A -> B -> C -> A -> B -> C for the first six intervals.

Round robin check diagram

In the diagram above, the standard check is executed by agents A, B, and C every 60 seconds, while the round-robin check cycles through the available agents, resulting in each agent executing the check every 180 seconds.
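The alphabetical rotation can be modeled in a few lines. This sketch assumes a fixed set of agents; it does not model agents joining or leaving, or how the backend tracks its position in the rotation.

```python
from itertools import cycle, islice


def round_robin_schedule(agent_names: list[str], intervals: int) -> list[str]:
    """Return the agent that executes a round-robin check in each interval.

    Agents are ordered alphabetically by name and then cycled.
    """
    ordered = sorted(agent_names)
    return list(islice(cycle(ordered), intervals))


def seconds_between_runs_per_agent(interval: int, agent_count: int) -> int:
    """How often each individual agent ends up running the round-robin check."""
    return interval * agent_count


print(round_robin_schedule(["C", "A", "B"], 6))  # ['A', 'B', 'C', 'A', 'B', 'C']
print(seconds_between_runs_per_agent(60, 3))     # 180
```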

To use check ttl and round_robin together, your check configuration must also specify a proxy_entity_name. If you do not specify a proxy_entity_name when using check ttl and round_robin together, your check will stop executing.

PRO TIP: Use round robin to distribute check execution workload across multiple agents when using proxy checks.

Scheduling

You can schedule checks using the interval, cron, and publish attributes. Sensu requires that checks include either an interval attribute (interval scheduling) or a cron attribute (cron scheduling).

Interval scheduling

You can schedule a check to be executed at regular intervals using the interval and publish check attributes. For example, to schedule a check to execute every 60 seconds, set the interval attribute to 60 and the publish attribute to true.

NOTE: When creating an interval check, Sensu calculates an initial offset to splay the check’s first scheduled request. This helps to balance the load of both the backend and the agent, and may result in a delay before initial check execution.

Example interval check

type: CheckConfig
api_version: core/v2
metadata:
  name: interval_check
  namespace: default
spec:
  command: check-cpu.sh -w 75 -c 90
  handlers:
  - slack
  interval: 60
  publish: true
  subscriptions:
  - system

{
  "type": "CheckConfig",
  "api_version": "core/v2",
  "metadata": {
    "name": "interval_check",
    "namespace": "default"
  },
  "spec": {
    "command": "check-cpu.sh -w 75 -c 90",
    "subscriptions": ["system"],
    "handlers": ["slack"],
    "interval": 60,
    "publish": true
  }
}

Cron scheduling

You can also schedule checks using cron syntax. For example, to schedule a check to execute once a minute at the start of the minute, set the cron attribute to * * * * * and the publish attribute to true.
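For readers less familiar with cron, the following simplified matcher shows how a five-field expression such as * * * * * or 0 0 * * * selects minutes. It is an illustrative sketch, not the parser Sensu uses: it supports *, single values, ranges, steps, and comma lists, but not month or weekday names, predefined schedules, or classic cron's rule that treats restricted day-of-month and day-of-week fields as alternatives.

```python
from datetime import datetime

# (low, high) bounds for: minute, hour, day of month, month, day of week (Sunday = 0)
FIELD_BOUNDS = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def field_matches(expr: str, value: int, low: int, high: int) -> bool:
    """Match one cron field: '*', 'N', or 'A-B', each optionally with '/STEP',
    combined into a comma-separated list."""
    for part in expr.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = int(base)
            # 'N/STEP' means "from N up to the field maximum, every STEP".
            end = high if step_text else start
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True if a five-field cron expression fires at the given minute."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {expression!r}")
    weekday = (moment.weekday() + 1) % 7  # Python: Monday = 0; cron: Sunday = 0
    values = [moment.minute, moment.hour, moment.day, moment.month, weekday]
    return all(
        field_matches(expr, value, low, high)
        for expr, value, (low, high) in zip(fields, values, FIELD_BOUNDS)
    )
```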

Example cron check

type: CheckConfig
api_version: core/v2
metadata:
  name: cron_check
  namespace: default
spec:
  command: check-cpu.sh -w 75 -c 90
  cron: '* * * * *'
  handlers:
  - slack
  publish: true
  subscriptions:
  - system

{
  "type": "CheckConfig",
  "api_version": "core/v2",
  "metadata": {
    "name": "cron_check",
    "namespace": "default"
  },
  "spec": {
    "command": "check-cpu.sh -w 75 -c 90",
    "subscriptions": ["system"],
    "handlers": ["slack"],
    "cron": "* * * * *",
    "publish": true
  }
}

Ad-hoc scheduling

In addition to automatic execution, you can create checks to be scheduled manually using the checks API. To create a check with ad-hoc scheduling, set the publish attribute to false in addition to an interval or cron schedule.
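As a sketch of triggering an ad-hoc check, the following builds (but does not send) a request to the checks API execute endpoint. The backend address http://127.0.0.1:8080, the bearer token, and the optional subscriptions override (for example, targeting a hypothetical entity named web-01 through its entity:web-01 subscription) are assumptions; confirm the endpoint path and authentication scheme against the checks API reference for your Sensu version.

```python
import json
import urllib.request
from typing import Optional


def build_execute_request(backend_url: str, namespace: str, check: str,
                          token: str,
                          subscriptions: Optional[list] = None) -> urllib.request.Request:
    """Build (but do not send) a POST to the checks API execute endpoint."""
    payload = {"check": check}
    if subscriptions:
        # Optional: limit execution to these subscriptions, e.g. "entity:web-01".
        payload["subscriptions"] = list(subscriptions)
    url = (f"{backend_url.rstrip('/')}/api/core/v2/namespaces/"
           f"{namespace}/checks/{check}/execute")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


request = build_execute_request("http://127.0.0.1:8080", "default",
                                "ad_hoc_check", "EXAMPLE_TOKEN")
# To actually send it: urllib.request.urlopen(request)
```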

Example ad-hoc check

type: CheckConfig
api_version: core/v2
metadata:
  name: ad_hoc_check
  namespace: default
spec:
  command: check-cpu.sh -w 75 -c 90
  handlers:
  - slack
  interval: 60
  publish: false
  subscriptions:
  - system

{
  "type": "CheckConfig",
  "api_version": "core/v2",
  "metadata": {
    "name": "ad_hoc_check",
    "namespace": "default"
  },
  "spec": {
    "command": "check-cpu.sh -w 75 -c 90",
    "subscriptions": ["system"],
    "handlers": ["slack"],
    "interval": 60,
    "publish": false
  }
}

Proxy checks

Sensu supports proxy checks, whose results are attributed to an entity other than the one executing the check, whether that entity is a Sensu agent entity or a proxy entity. Proxy entities allow Sensu to monitor external resources on systems or devices where a Sensu agent cannot be installed, like a network switch or a website. You can create a proxy check using the proxy_entity_name attribute or the proxy_requests attributes.

Using a proxy check to monitor a proxy entity

When executing checks that include a proxy_entity_name, Sensu agents report the resulting events under the specified proxy entity instead of the agent entity. If the proxy entity doesn’t exist, Sensu creates the proxy entity when the event is received by the backend. To avoid duplicate events, we recommend using the round_robin attribute with proxy checks.

Example proxy check using a proxy_entity_name

The following proxy check runs every 60 seconds, cycling through the agents with the proxy subscription alphabetically according to the agent name, for the proxy entity sensu-site.

type: CheckConfig
api_version: core/v2
metadata:
  name: proxy_check
  namespace: default
spec:
  command: http_check.sh https://sensu.io
  handlers:
  - slack
  interval: 60
  proxy_entity_name: sensu-site
  publish: true
  round_robin: true
  subscriptions:
  - proxy

{
  "type": "CheckConfig",
  "api_version": "core/v2",
  "metadata": {
    "name": "proxy_check",
    "namespace": "default"
  },
  "spec": {
    "command": "http_check.sh https://sensu.io",
    "subscriptions": ["proxy"],
    "handlers": ["slack"],
    "interval": 60,
    "publish": true,
    "round_robin": true,
    "proxy_entity_name": "sensu-site"
  }
}

Using a proxy check to monitor multiple proxy entities

The proxy_requests check attributes allow Sensu to run a check for each entity that matches the definitions specified in the entity_attributes, resulting in monitoring events that represent each matching proxy entity. The entity attributes must match exactly as stated; no variables or directives have any special meaning, but you can still use Sensu query expressions to perform more complicated filtering on the available values, such as finding entities with particular subscriptions.

The proxy_requests attributes are a great way to monitor multiple entities using a single check definition when combined with token substitution. Since checks including proxy_requests attributes need to be executed for each matching entity, we recommend using the round_robin attribute to distribute the check execution workload evenly across your Sensu agents.
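Sensu evaluates entity_attributes as Sensu query expressions on the backend. The following sketch uses plain Python functions as hypothetical stand-ins for the two expressions used in this section and assumes that an entity must satisfy every expression to match; each matching entity then receives its own check request.

```python
from typing import Callable


def is_proxy_class(entity: dict) -> bool:
    # Stand-in for the expression: entity.entity_class == 'proxy'
    return entity.get("entity_class") == "proxy"


def is_website(entity: dict) -> bool:
    # Stand-in for the expression: entity.labels.proxy_type == 'website'
    return entity.get("labels", {}).get("proxy_type") == "website"


def matching_entities(entities: list[dict],
                      predicates: list[Callable[[dict], bool]]) -> list[str]:
    """Names of entities for which every predicate holds (one check request each)."""
    return [e["name"] for e in entities if all(p(e) for p in predicates)]


entities = [
    {"name": "sensu-site", "entity_class": "proxy",
     "labels": {"proxy_type": "website", "url": "https://sensu.io"}},
    {"name": "switch-01", "entity_class": "proxy", "labels": {"proxy_type": "switch"}},
    {"name": "web-agent", "entity_class": "agent", "labels": {"proxy_type": "website"}},
]
print(matching_entities(entities, [is_proxy_class, is_website]))  # ['sensu-site']
```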

Example proxy check using proxy_requests

The following proxy check runs every 60 seconds, cycling through the agents with the proxy subscription alphabetically according to the agent name, for all existing proxy entities with the custom label proxy_type set to website.

This check uses token substitution to import the value of the custom entity label url to complete the check command. See the entity reference for information about using custom labels.

type: CheckConfig
api_version: core/v2
metadata:
  name: proxy_check_proxy_requests
  namespace: default
spec:
  command: http_check.sh {{ .labels.url }}
  handlers:
  - slack
  interval: 60
  proxy_requests:
    entity_attributes:
    - entity.labels.proxy_type == 'website'
  publish: true
  round_robin: true
  subscriptions:
  - proxy

{
  "type": "CheckConfig",
  "api_version": "core/v2",
  "metadata": {
    "name": "proxy_check_proxy_requests",
    "namespace": "default"
  },
  "spec": {
    "command": "http_check.sh {{ .labels.url }}",
    "subscriptions": ["proxy"],
    "handlers": ["slack"],
    "interval": 60,
    "publish": true,
    "proxy_requests": {
      "entity_attributes": [
        "entity.labels.proxy_type == 'website'"
      ]
    },
    "round_robin": true
  }
}

Fine-tuning proxy check scheduling with splay

Sensu supports distributing proxy check executions across an interval using the splay and splay_coverage attributes. For example, if we assume that the proxy_check_proxy_requests check in the example above matches three proxy entities, we’d expect to see a burst of three events every 60 seconds. If we add the splay attribute (set to true) and the splay_coverage attribute (set to 90) to the proxy_requests scope, Sensu distributes the three check executions over 90% of the 60-second interval, resulting in three events splayed evenly across a 54-second period.
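The arithmetic in this example can be sketched as follows, assuming the splay window is divided into equal slices and one proxy request is published at the start of each slice. The backend's exact placement of requests may differ.

```python
def splay_window(interval: float, splay_coverage: int) -> float:
    """Seconds of the interval over which proxy requests are spread."""
    return interval * splay_coverage / 100  # e.g. 60 s * 90% = 54 s


def splay_offsets(interval: float, splay_coverage: int, entity_count: int) -> list[float]:
    """Seconds after the start of the interval at which each request is published."""
    if entity_count == 0:
        return []
    step = splay_window(interval, splay_coverage) / entity_count
    return [i * step for i in range(entity_count)]


print(splay_window(60, 90))      # 54.0
print(splay_offsets(60, 90, 3))  # [0.0, 18.0, 36.0]
```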

Check token substitution

Sensu check definitions may include attributes that you want to override on an entity-by-entity basis. For example, check commands, which may include command line arguments for controlling the behavior of the check command, may benefit from entity-specific thresholds. Sensu check tokens are check definition placeholders that the Sensu agent replaces with the corresponding entity attribute values (including custom attributes).

Learn how to use check tokens with the Sensu tokens reference documentation.

NOTE: Check tokens are processed before check execution, so token substitution does not apply to check data delivered via the local agent socket input.
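To illustrate the idea, the following sketch substitutes simple dotted-path tokens such as {{ .labels.url }} from an entity represented as a Python dict shaped like the token paths. It is a simplified, hypothetical stand-in: Sensu processes tokens with its own template engine (see the tokens reference for the supported syntax), and this sketch simply raises an error when a token cannot be resolved rather than attempting any fallback.

```python
import re

# Matches tokens like "{{ .labels.url }}" or "{{.name}}" and captures the dotted path.
TOKEN = re.compile(r"\{\{\s*\.([\w.]+)\s*\}\}")


def substitute_tokens(template: str, entity: dict) -> str:
    """Replace each token with the value found at its dotted path in entity."""
    def resolve(match: re.Match) -> str:
        value = entity
        for key in match.group(1).split("."):
            if not isinstance(value, dict) or key not in value:
                raise KeyError(f"unmatched token: {match.group(0)}")
            value = value[key]
        return str(value)

    return TOKEN.sub(resolve, template)


entity = {"name": "sensu-site", "labels": {"url": "https://sensu.io"}}
print(substitute_tokens("http_check.sh {{ .labels.url }}", entity))
# http_check.sh https://sensu.io
```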

Check hooks

Check hooks are commands run by the Sensu agent in response to the result of check command execution. The Sensu agent executes the appropriate configured hook command depending on the check execution status (for example, 0, 1, or 2).

Learn how to use check hooks with the Sensu hooks reference documentation.
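The following sketch shows one way to select hooks for an exit status from a check_hooks array shaped like the one in the check specification. The matching rules are assumptions based on this page: numeric keys match the exact status, ok, warning, and critical match 0, 1, and 2, unknown matches any other status (as in the check result specification), and non-zero matches any status except 0. The sketch does not model hook precedence or de-duplication; see the hooks reference for Sensu's actual behavior.

```python
def hook_type_matches(hook_type: str, status: int) -> bool:
    """Decide whether a check_hooks key applies to a check exit status."""
    if hook_type.isdigit():
        return int(hook_type) == status
    named = {
        "ok": status == 0,
        "warning": status == 1,
        "critical": status == 2,
        # Any status other than 0, 1, or 2 is UNKNOWN per the result specification.
        "unknown": status not in (0, 1, 2),
        "non-zero": status != 0,
    }
    return named.get(hook_type, False)


def hooks_to_run(check_hooks: list[dict], status: int) -> list[str]:
    """Collect hook names from every check_hooks entry whose type matches status."""
    selected = []
    for entry in check_hooks:
        for hook_type, names in entry.items():
            if hook_type_matches(hook_type, status):
                selected.extend(names)
    return selected


check_hooks = [
    {"0": ["passing-hook", "always-run-this-hook"]},
    {"critical": ["failing-hook", "collect-diagnostics", "always-run-this-hook"]},
]
print(hooks_to_run(check_hooks, 2))
# ['failing-hook', 'collect-diagnostics', 'always-run-this-hook']
```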

Check specification

Top-level attributes

type
description Top-level attribute specifying the sensuctl create resource type. Checks should always be of type CheckConfig.
required Required for check definitions in wrapped-json or yaml format for use with sensuctl create.
type String
example
"type": "CheckConfig"
api_version
description Top-level attribute specifying the Sensu API group and version. For checks in Sensu backend version 5.4, this attribute should always be core/v2.
required Required for check definitions in wrapped-json or yaml format for use with sensuctl create.
type String
example
"api_version": "core/v2"
metadata
description Top-level collection of metadata about the check, including the name and namespace as well as custom labels and annotations. The metadata map is always at the top level of the check definition. This means that in wrapped-json and yaml formats, the metadata scope occurs outside the spec scope. See the metadata attributes reference for details.
required Required for check definitions in wrapped-json or yaml format for use with sensuctl create.
type Map of key-value pairs
example
"metadata": {
  "name": "collect-metrics",
  "namespace": "default",
  "labels": {
    "region": "us-west-1"
  },
  "annotations": {
    "slack-channel" : "#monitoring"
  }
}
spec
description Top-level map that includes the check spec attributes.
required Required for check definitions in wrapped-json or yaml format for use with sensuctl create.
type Map of key-value pairs
example
"spec": {
  "command": "/etc/sensu/plugins/check-chef-client.go",
  "interval": 10,
  "publish": true,
  "subscriptions": [
    "production"
  ]
}

Spec attributes

command
description The check command to be executed.
required true
type String
example
"command": "/etc/sensu/plugins/check-chef-client.go"
subscriptions
description An array of Sensu entity subscriptions that check requests will be sent to. The array cannot be empty and its items must each be a string.
required true
type Array
example
"subscriptions": ["production"]
handlers
description An array of Sensu event handlers (names) to use for events created by the check. Each array item must be a string.
required false
type Array
example
"handlers": ["pagerduty", "email"]
interval
description How often the check is executed, in seconds
required true (unless cron is configured)
type Integer
example
"interval": 60
cron
description When the check should be executed, using cron syntax or one of the supported predefined schedules.
required true (unless interval is configured)
type String
example
"cron": "0 0 * * *"
publish
description Whether check requests are published for the check.
required false
default false
type Boolean
example
"publish": false
timeout
description The check execution duration timeout in seconds (hard stop).
required false
type Integer
example
"timeout": 30

ttl
description The time to live (TTL) in seconds until check results are considered stale. If an agent stops publishing results for the check, and the TTL expires, an event will be created for the agent’s entity.

The check ttl must be greater than the check interval and should allow enough time for the check execution and result processing to complete. For example, for a check that has an interval of 60 (seconds) and a timeout of 30 (seconds), the appropriate ttl is at least 90 (seconds).

To use check ttl and round_robin together, your check configuration must also specify a proxy_entity_name. If you do not specify a proxy_entity_name when using check ttl and round_robin together, your check will stop executing. NOTE: Adding TTLs to checks adds overhead, so use the ttl attribute sparingly.
required false
type Integer
example
"ttl": 100
stdin
description Whether the Sensu agent writes JSON-serialized Sensu entity and check data to the command process's STDIN. The command must expect the JSON data via STDIN, read it, and close STDIN. This attribute cannot be used with existing Sensu check plugins or Nagios plugins, because the Sensu agent will wait indefinitely for the check process to read and close STDIN.
required false
type Boolean
default false
example
"stdin": true
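A minimal sketch of a check command written for stdin set to true. It reads the whole JSON document, closes STDIN, and reports OK. The layout it reads the entity name from (entity.metadata.name) is an assumption about the serialized data; inspect the actual input in your environment before relying on specific fields.

```python
import json
import sys
from typing import TextIO


def read_check_input(stream: TextIO) -> dict:
    """Read the JSON document written by the agent, then close the stream.

    The agent waits for the command to read and close STDIN.
    """
    try:
        return json.load(stream)
    finally:
        stream.close()


def entity_name(data: dict) -> str:
    # Assumed layout: {"entity": {"metadata": {"name": ...}}, "check": {...}}
    return data.get("entity", {}).get("metadata", {}).get("name", "unknown")


def main() -> int:
    data = read_check_input(sys.stdin)
    print(f"OK: received check input for entity {entity_name(data)}")
    return 0  # OK, per the check result specification


if __name__ == "__main__":
    sys.exit(main())
```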
low_flap_threshold
description The flap detection low threshold (% state change) for the check. Sensu uses the same flap detection algorithm as Nagios.
required false
type Integer
example
"low_flap_threshold": 20
high_flap_threshold
description The flap detection high threshold (% state change) for the check. Sensu uses the same flap detection algorithm as Nagios.
required true (if low_flap_threshold is configured)
type Integer
example
"high_flap_threshold": 60
runtime_assets
description An array of Sensu assets (names) required at runtime for the execution of the command.
required false
type Array
example
"runtime_assets": ["ruby-2.5.0"]

check_hooks
description An array of check response types with respective arrays of Sensu hook names. Sensu hooks are commands run by the Sensu agent in response to the result of the check command execution. Hooks are executed, in order of precedence, based on their severity type: 1 to 255, ok, warning, critical, unknown, and finally non-zero.
required false
type Array
example
"check_hooks": [
  {
    "0": [
      "passing-hook","always-run-this-hook"
    ]
  },
  {
    "critical": [
      "failing-hook","collect-diagnostics","always-run-this-hook"
    ]
  }
]

proxy_entity_name
description The entity name, used to create a proxy entity for an external resource (for example, a network switch).
required false
type String
validated \A[\w\.\-]+\z
example
"proxy_entity_name": "switch-dc-01"

proxy_requests
description Sensu proxy request attributes allow you to assign the check to run for multiple entities according to their entity_attributes. In the example below, the check executes for all entities whose entity_class is proxy and whose custom proxy_type label is website. Proxy requests are a great way to reuse check definitions for a group of entities. For more information, see the proxy requests specification and the guide to monitoring external resources.
required false
type Hash
example
"proxy_requests": {
  "entity_attributes": [
    "entity.entity_class == 'proxy'",
    "entity.labels.proxy_type == 'website'"
  ],
  "splay": true,
  "splay_coverage": 90
}
silenced
description The silences that apply to this check.
type Array
example
"silenced": ["*:routers"]
env_vars
description An array of environment variables to use with command execution. NOTE: To add env_vars to a check, use sensuctl create.
required false
type Array
example
"env_vars": ["RUBY_VERSION=2.5.0", "CHECK_HOST=my.host.internal"]
output_metric_format
description The metric format generated by the check command. Sensu supports the following metric formats:
nagios_perfdata (Nagios Performance Data)
graphite_plaintext (Graphite Plaintext Protocol)
influxdb_line (InfluxDB Line Protocol)
opentsdb_line (OpenTSDB Data Specification)

When a check includes an output_metric_format, Sensu will extract the metrics from the check output and add them to the event data in Sensu metric format. For more information about extracting metrics using Sensu, see the guide.
required false
type String
example
"output_metric_format": "graphite_plaintext"
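To show what the graphite_plaintext format looks like, the following sketch parses check output in which each line is a metric path, a value, and a Unix timestamp separated by whitespace. It is illustrative only: Sensu performs this extraction itself when output_metric_format is set, and its metric format may carry fields (such as tags) that this sketch omits.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    value: float
    timestamp: int


def parse_graphite_plaintext(output: str) -> list[Point]:
    """Parse lines of the form '<metric path> <value> <unix timestamp>'."""
    points = []
    for line_number, line in enumerate(output.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(f"line {line_number}: expected 3 fields, got {len(parts)}")
        name, value, timestamp = parts
        points.append(Point(name, float(value), int(timestamp)))
    return points


sample = "server1.cpu.idle 93.5 1556000000\nserver1.cpu.user 4 1556000000\n"
for point in parse_graphite_plaintext(sample):
    print(point)
```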
output_metric_handlers
description An array of Sensu handlers to use for events created by the check. Each array item must be a string. output_metric_handlers should be used in place of the handlers attribute if output_metric_format is configured. Metric handlers must be able to process Sensu metric format. For an example, see the Sensu InfluxDB handler.
required false
type Array
example
"output_metric_handlers": ["influx-db"]

round_robin
description When set to true, Sensu executes the check once per interval, cycling through each subscribing agent in turn. See round-robin checks for more information.

Use the round_robin attribute with proxy checks to avoid duplicate events and distribute proxy check executions evenly across multiple agents. See proxy checks for more information.

To use check ttl and round_robin together, your check configuration must also specify a proxy_entity_name. If you do not specify a proxy_entity_name when using check ttl and round_robin together, your check will stop executing.
required false
type Boolean
example
"round_robin": true

subdue
description Check subdues are not yet implemented in Sensu Go. Although the subdue attribute appears in check definitions by default, it is a placeholder and should not be modified.
example
"subdue": null
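Several rules on this page constrain combinations of attributes. The following hypothetical client-side validator collects them in one place so a check spec can be reviewed before it is submitted with sensuctl create. It encodes only rules stated in this reference, treats zero or empty values (as in the metric check example, where ttl and low_flap_threshold are 0) as unset, and is not a substitute for the backend's own validation, which may differ. The proxy_entity_name pattern uses re.ASCII because Go's \w matches only ASCII letters, digits, and underscores.

```python
import re

# Python equivalent of Go's \A[\w\.\-]+\z when used with fullmatch.
NAME_PATTERN = re.compile(r"[\w.\-]+", re.ASCII)


def validate_check_spec(spec: dict) -> list[str]:
    """Return human-readable problems found in a check spec (empty list if none)."""
    problems = []
    if not spec.get("command"):
        problems.append("command is required")
    subs = spec.get("subscriptions")
    if not subs or not all(isinstance(s, str) for s in subs):
        problems.append("subscriptions must be a non-empty array of strings")
    interval, cron = spec.get("interval", 0), spec.get("cron", "")
    if not interval and not cron:
        problems.append("either interval or cron is required")
    ttl = spec.get("ttl", 0)
    if ttl and interval and ttl <= interval:
        problems.append("ttl must be greater than interval")
    if ttl and spec.get("round_robin") and not spec.get("proxy_entity_name"):
        problems.append("ttl with round_robin requires proxy_entity_name")
    if spec.get("low_flap_threshold") and not spec.get("high_flap_threshold"):
        problems.append("high_flap_threshold is required when low_flap_threshold is set")
    name = spec.get("proxy_entity_name")
    if name and not NAME_PATTERN.fullmatch(name):
        problems.append("proxy_entity_name contains invalid characters")
    proxy_requests = spec.get("proxy_requests") or {}
    if proxy_requests.get("splay") and not proxy_requests.get("splay_coverage"):
        problems.append("splay_coverage is required when splay is true")
    return problems
```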

Metadata attributes

name
description A unique string used to identify the check. Check names cannot contain special characters or spaces (validated with Go regex \A[\w\.\-]+\z). Each check must have a unique name within its namespace.
required true
type String
example
"name": "check-cpu"
namespace
description The Sensu RBAC namespace that this check belongs to.
required false
type String
default default
example
"namespace": "production"
labels
description Custom attributes to include with event data, which can be queried like regular attributes. You can use labels to organize checks into meaningful collections that can be selected using filters and tokens.
required false
type Map of key-value pairs. Keys can contain only letters, numbers, and underscores, but must start with a letter. Values can be any valid UTF-8 string.
default null
example
"labels": {
  "environment": "development",
  "region": "us-west-2"
}
annotations
description Arbitrary, non-identifying metadata to include with event data. In contrast to labels, annotations are not used internally by Sensu and cannot be used to identify checks. You can use annotations to add data that helps people or external tools interacting with Sensu.
required false
type Map of key-value pairs. Keys and values can be any valid UTF-8 string.
default null
example
"annotations": {
  "managed-by": "ops",
  "slack-channel": "#monitoring",
  "playbook": "www.example.url"
}

Proxy requests attributes

entity_attributes
description Sensu entity attributes to match entities in the registry, using Sensu query expressions
required false
type Array
example
"entity_attributes": [
  "entity.entity_class == 'proxy'",
  "entity.labels.proxy_type == 'website'"
]
splay
description Whether proxy check requests should be splayed, that is, published evenly over a window of time determined by the check interval and a configurable splay coverage percentage. For example, if a check has an interval of 60 seconds and a configured splay coverage of 90%, its proxy check requests are splayed evenly over a time window of 60 seconds × 90%, or 54 seconds, leaving 6 seconds for the last proxy check execution before the next round of proxy check requests for the same check.
required false
type Boolean
default false
example
"splay": true
splay_coverage
description The percentage of the check interval over which Sensu can execute the check for all applicable entities, as defined in the entity attributes. Sensu uses the splay coverage attribute to determine the amount of time check requests can be published over (before the next check interval).
required true (if splay is set to true)
type Integer
example
"splay_coverage": 90

Check output truncation attributes

max_output_size
description Maximum size, in bytes, of stored check outputs. When this attribute is set to a non-zero value, the Sensu backend truncates check outputs larger than this value before storing to etcd. max_output_size does not affect data sent to Sensu filters, mutators, and handlers.
required false
type Integer
example
"max_output_size": 1024
discard_output
description Discard check output after extracting metrics. No check output will be sent to the Sensu backend.
required false
type Boolean
example
"discard_output": true
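A sketch of byte-based truncation as described for max_output_size, where, per the description, a value of 0 disables truncation. Whether Sensu itself avoids splitting multi-byte UTF-8 characters is not stated here; this sketch chooses to drop a partial trailing character. As the description notes, filters, mutators, and handlers would still receive the untruncated output, so the sketch returns a separate copy intended only for storage.

```python
def truncate_output(output: str, max_output_size: int) -> str:
    """Return the copy of check output to store: at most max_output_size bytes.

    A max_output_size of 0 (or less) disables truncation.
    """
    if max_output_size <= 0:
        return output
    encoded = output.encode("utf-8")
    if len(encoded) <= max_output_size:
        return output
    # errors="ignore" drops a multi-byte character that the cut split in half.
    return encoded[:max_output_size].decode("utf-8", errors="ignore")


print(truncate_output("abcdef", 4))  # abcd
```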

Examples

NOTE: The attribute interval is not required if a valid cron schedule is defined.

type: CheckConfig
api_version: core/v2
metadata:
  name: check_minimum
  namespace: default
spec:
  command: collect.sh
  handlers:
  - slack
  interval: 10
  publish: true
  subscriptions:
  - system
{
  "type": "CheckConfig",
  "api_version": "core/v2",
  "metadata": {
    "namespace": "default",
    "name": "check_minimum"
  },
  "spec": {
    "command": "collect.sh",
    "subscriptions": [
      "system"
    ],
    "handlers": [
      "slack"
    ],
    "interval": 10,
    "publish": true
  }
}

Metric check

type: CheckConfig
api_version: core/v2
metadata:
  annotations:
    slack-channel: '#monitoring'
  labels:
    region: us-west-1
  name: collect-metrics
  namespace: default
spec:
  check_hooks: null
  command: collect.sh
  discard_output: true
  env_vars: null
  handlers: []
  high_flap_threshold: 0
  interval: 10
  low_flap_threshold: 0
  output_metric_format: graphite_plaintext
  output_metric_handlers:
  - influx-db
  proxy_entity_name: ""
  publish: true
  round_robin: false
  runtime_assets: null
  stdin: false
  subscriptions:
  - system
  timeout: 0
  ttl: 0

{
  "type": "CheckConfig",
  "api_version": "core/v2",
  "metadata": {
    "name": "collect-metrics",
    "namespace": "default",
    "labels": {
      "region": "us-west-1"
    },
    "annotations": {
      "slack-channel" : "#monitoring"
    }
  },
  "spec": {
    "command": "collect.sh",
    "handlers": [],
    "high_flap_threshold": 0,
    "interval": 10,
    "low_flap_threshold": 0,
    "publish": true,
    "runtime_assets": null,
    "subscriptions": [
      "system"
    ],
    "proxy_entity_name": "",
    "check_hooks": null,
    "stdin": false,
    "ttl": 0,
    "timeout": 0,
    "round_robin": false,
    "output_metric_format": "graphite_plaintext",
    "output_metric_handlers": [
      "influx-db"
    ],
    "env_vars": null,
    "discard_output": true
  }
}