Intro to Checks

The purpose of this guide is to help Sensu users create monitoring checks. By the end of this guide, you should have several Sensu checks in place to monitor and measure machine resources, applications, and services. Each Sensu monitoring check in this guide demonstrates one or more check definition features. For more information, please refer to the Sensu checks reference documentation.

Objectives

What will be covered in this guide:

  • Creation of standard checks (functional tests)
  • Creation of metric collection checks (server resources, etc.)
  • Creation of metric analysis checks (querying time series data, etc.)

What are Sensu checks?

Sensu checks allow you to monitor server resources, services, and application health, as well as collect and analyze metrics. They are executed on servers running the Sensu client. Checks are essentially commands (or scripts) that output data to STDOUT or STDERR and produce an exit status code to indicate a state. The common exit status codes are 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 or greater for UNKNOWN or CUSTOM. Sensu checks use the same specification as Nagios, so Nagios check plugins may be used with Sensu.
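To make the exit status convention concrete, here is a minimal sketch of a check written as a shell function. The function name check_threshold, the CheckExample prefix, and the thresholds are hypothetical and are not part of Sensu or any plugin. The sketch only illustrates the pattern of printing a message and returning 0, 1, 2, or 3.

```shell
# Hypothetical helper: map an integer reading to a Sensu/Nagios-style status.
# Arguments: current value, warning threshold, critical threshold.
check_threshold() {
  local value="$1" warn="$2" crit="$3"

  # Anything that is not a non-negative integer is reported as UNKNOWN (3).
  case "$value" in
    ''|*[!0-9]*)
      echo "CheckExample UNKNOWN: no numeric reading available"
      return 3
      ;;
  esac

  if [ "$value" -ge "$crit" ]; then
    echo "CheckExample CRITICAL: value is ${value} (critical threshold ${crit})"
    return 2
  elif [ "$value" -ge "$warn" ]; then
    echo "CheckExample WARNING: value is ${value} (warning threshold ${warn})"
    return 1
  fi

  echo "CheckExample OK: value is ${value}"
  return 0
}
```

A real check script would typically finish by exiting with that status, for example check_threshold "$reading" 80 90; exit $?, so that the Sensu client can read the state from the exit code.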

Create a standard check

Standard Sensu checks are used to determine the health of server resources, services, and applications. A standard check queries a resource for information to determine its state. Once it has determined the resource state, the check outputs a human-readable message and exits with the appropriate exit status code to indicate the state or severity (OK, WARNING, etc.).

Monitor the cron service

The following instructions install the check dependencies and configure the Sensu check definition in order to monitor the Cron service.

Install dependencies

The check-process.rb script provided by the Sensu Process Checks Plugin can reliably detect if a service such as Cron is running or not. The following instructions will install the Sensu Process Checks Plugin (version 0.0.6) using Sensu’s embedded Ruby, providing the check-process.rb script.

sudo sensu-install -p process-checks:0.0.6

Create the check definition for Cron

The following is an example Sensu check definition, a JSON configuration file located at /etc/sensu/conf.d/check_cron.json. This check definition uses the check-process.rb script (installed above) to determine if the Cron service is running. The check is named cron and it runs check-process.rb -p cron on Sensu clients with the production subscription, every 60 seconds (interval).

NOTE: Sensu services must be restarted in order to pick up configuration changes. Sensu Enterprise can be reloaded.

{
  "checks": {
    "cron": {
      "command": "check-process.rb -p cron",
      "subscribers": [
        "production"
      ],
      "interval": 60
    }
  }
}

For a full listing of the check-process.rb command line arguments, run /opt/sensu/embedded/bin/check-process.rb -h.

Currently, the Cron check definition requires that check requests be sent to Sensu clients with the production subscription. This is known as a pubsub check. Optionally, a check may use standalone mode, which allows clients to schedule their own check executions. The following is an example of the Cron check using standalone mode ("standalone": true). The Cron check will now be executed every 60 seconds on each Sensu client that has the check definition. A Sensu check definition with "standalone": true does not need to specify subscribers.

{
  "checks": {
    "cron": {
      "command": "check-process.rb -p cron",
      "standalone": true,
      "interval": 60
    }
  }
}

By default, Sensu checks use the default Sensu event handler for events they create. To specify a different Sensu event handler for a check, use the handler attribute. The debug event handler used in this example will log the Sensu event data to the Sensu server (or Sensu Enterprise) log.

{
  "checks": {
    "cron": {
      "command": "check-process.rb -p cron",
      "standalone": true,
      "interval": 60,
      "handler": "debug"
    }
  }
}

Using multiple handlers

To specify multiple Sensu event handlers, use the handlers attribute (plural).

NOTE: If both the handler and handlers (plural) check definition attributes are used, handlers will take precedence.

{
  "checks": {
    "cron": {
      "command": "check-process.rb -p cron",
      "standalone": true,
      "interval": 60,
      "handlers": ["default", "debug"]
    }
  }
}

Create a metric collection check

Metric collection checks are used to collect measurements from server resources, services, and applications. Metric collection checks can output metric data in a variety of metric formats, such as the Graphite plaintext format used in this guide.

Measuring CPU utilization

Install dependencies

The metrics-cpu.rb script provided by the Sensu CPU Checks Plugin collects and outputs CPU metrics in the Graphite plaintext format. The following instructions will install the Sensu CPU Checks Plugin (version 0.0.3) using Sensu’s embedded Ruby, providing the metrics-cpu.rb script.

sudo sensu-install -p cpu-checks:0.0.3

Create the check definition for CPU metrics

The following is an example Sensu check definition, a JSON configuration file located at /etc/sensu/conf.d/cpu_metrics.json. This check definition uses the metrics-cpu.rb script (installed above) to collect CPU metrics and output them in the Graphite plaintext format.

By default, Sensu checks with an exit status code of 0 (for OK) do not create events unless they indicate a change in state from a non-zero status to a zero status (i.e. resulting in a resolve action; see: Sensu Events). Metric collection checks output metric data regardless of the check exit status code, but they usually exit 0. To ensure events are always created for a metric collection check, the check type of metric is used.

The check is named cpu_metrics, and it runs metrics-cpu.rb on Sensu clients with the production subscription, every 10 seconds (interval). The debug handler is used to log the Graphite plaintext CPU metrics to the Sensu server (or Sensu Enterprise) log.

NOTE: Sensu services must be restarted in order to pick up configuration changes. Sensu Enterprise can be reloaded.

{
  "checks": {
    "cpu_metrics": {
      "type": "metric",
      "command": "metrics-cpu.rb",
      "subscribers": [
        "production"
      ],
      "interval": 10,
      "handler": "debug"
    }
  }
}

For a full listing of the metrics-cpu.rb command line arguments, run /opt/sensu/embedded/bin/metrics-cpu.rb -h.
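For orientation, the Graphite plaintext format mentioned above is one line per metric: a dotted metric path, a value, and a Unix timestamp, separated by spaces. The following is a minimal sketch of a script that emits a line in that format. The emit_metric helper and the load_avg.one metric path are made up for illustration and do not reflect the actual output of metrics-cpu.rb.

```shell
# Hypothetical helper: print one metric in Graphite plaintext format,
# i.e. "<metric path> <value> <unix timestamp>".
# The timestamp defaults to the current time when omitted.
emit_metric() {
  local path="$1" value="$2" timestamp="${3:-$(date +%s)}"
  printf '%s %s %s\n' "$path" "$value" "$timestamp"
}

# Example: report the 1-minute load average under a made-up metric path.
# /proc/loadavg is Linux-specific, so the guard keeps the sketch portable.
host="${HOSTNAME:-localhost}"
host="${host%%.*}"   # drop any domain part so it does not add path levels
if [ -r /proc/loadavg ]; then
  read -r load1 _ < /proc/loadavg
  emit_metric "${host}.load_avg.one" "$load1"
fi
```

A script like this would exit 0 after printing its metrics, which is why the metric check type is needed for Sensu to create events from its output.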

Create a metric analysis check

A metric analysis check analyzes metric data, which may or may not have been collected by a metric collection check. By querying external metric stores (e.g. Graphite) to perform data evaluations, metric analysis checks allow you to perform powerful analytics based on trends in metric data rather than a single data point. For example, where monitoring and alerting on a single CPU utilization data point can result in false-positive events caused by momentary spikes, monitoring and alerting on CPU utilization data over a specified period of time will improve alerting accuracy.
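The difference between a single data point and a trend can be sketched with a few lines of shell arithmetic. The average helper and the sample values below are made up for illustration. A real metric analysis check would fetch its data from a metric store rather than use hard-coded numbers.

```shell
# Hypothetical helper: integer mean of the samples passed as arguments.
average() {
  local sum=0 sample
  [ "$#" -gt 0 ] || return 1   # avoid dividing by zero when no samples are given
  for sample in "$@"; do
    sum=$(( sum + sample ))
  done
  echo $(( sum / $# ))
}

# Made-up CPU utilization samples (percent) containing one momentary spike.
spike=95
smoothed=$(average 20 22 21 "$spike" 23 20 22 21 24 22)

echo "single data point: ${spike}%"       # above an 80% warning threshold
echo "10-point average: ${smoothed}%"     # well below the same threshold
```

With an assumed warning threshold of 80%, alerting on the spike alone would raise an event, while alerting on the 10-point average (29%) would not.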

Because metric analysis checks require interaction with an external metric store, providing a functional example is outside of the scope of this guide. However, assuming the existence of a Graphite installation that is populated with metric data, the following example checks could be used.

NOTE: If you have not configured a Graphite instance for these example checks but would like to, Graphite's quick start guides describe how to spin up a Graphite Vagrant box fairly quickly. You can then continue with the examples below.

The following check uses the check-graphite-data.rb script, provided by the Sensu Graphite Plugin, to query the Graphite API at localhost:9001. The check queries Graphite for a calculated moving average (using the last 10 data points) of the load balancer session count. The session count moving average is compared with the provided alert thresholds. A Sensu client running on the Graphite server would be responsible for scheduling and executing this check (standalone mode).

NOTE: Sensu services must be restarted in order to pick up configuration changes. Sensu Enterprise can be reloaded.

{
  "checks": {
    "session_count": {
      "command": "check-graphite-data.rb -s localhost:9001 -t 'movingAverage(lb1.assets_backend.session_current,10)' -w 100 -c 200",
      "standalone": true,
      "interval": 30
    }
  }
}

The following check uses the check-graphite-data.rb script, provided by the Sensu Graphite Plugin, to query the Graphite API at localhost:9001 for disk capacity metrics. The Graphite API query uses highestCurrent() to select only the highest disk capacity metric, which is then compared with the provided alert thresholds. This check will trigger an event (alert) when one or more disks on any machine reach a configured capacity threshold. In this example configuration, the check is configured to warn at 85% capacity (-w 85), and to raise a critical alert at 95% capacity (-c 95).

{
  "checks": {
    "disk_capacity": {
      "command": "check-graphite-data.rb -s localhost:9001 -t 'highestCurrent(*.disk.*.capacity,1)' -w 85 -c 95 -a 120",
      "standalone": true,
      "interval": 30
    }
  }
}
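The logic of the disk capacity check above (pick the single highest series, then compare it with the thresholds) can be approximated locally as a sketch. The highest_series helper and the readings are made up for illustration. In the real check, Graphite performs the selection through highestCurrent() and check-graphite-data.rb performs the comparison.

```shell
# Hypothetical helper: read "<series> <percent>" lines on stdin and print the
# line with the highest value, similar in spirit to highestCurrent(..., 1).
highest_series() {
  sort -k2,2n | tail -n 1
}

# Made-up disk capacity readings (percent used).
readings='web01.disk.root.capacity 41
web01.disk.data.capacity 88
db01.disk.root.capacity 63'

worst=$(printf '%s\n' "$readings" | highest_series)
worst_value=${worst##* }   # keep only the value after the last space

# Same thresholds as the example check: warn at 85, critical at 95.
if [ "$worst_value" -ge 95 ]; then
  status=CRITICAL
elif [ "$worst_value" -ge 85 ]; then
  status=WARNING
else
  status=OK
fi
echo "${status}: ${worst}"
```

With these sample readings, only the fullest disk (88%) is evaluated, so the result is a WARNING even though the other disks are healthy.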

The following instructions will install the Sensu Graphite Plugin (version 0.0.6) using Sensu’s embedded Ruby, providing the check-graphite-data.rb script.

sudo sensu-install -p graphite:0.0.6

Checking on Other Clients

Sensu supports running checks whose results are attributed to a client other than the one executing the check, regardless of whether that other client is a Sensu client or simply a proxy client. There are a number of reasons for this use case, but fundamentally, Sensu handles them all the same way.

Checks are scheduled normally, but when a check specifies a Proxy Request, the check runs once for each client whose definition matches the given client_attributes. Attribute values must normally match exactly as stated; no variables or directives have any special meaning. You can, however, use eval to perform more complicated filtering with Ruby on the attribute value, such as finding clients with a particular subscription (subscriptions are an array, hence include?):

{
  "checks": {
    "...": "...",
    "proxy_requests": {
      "client_attributes": {
        "user_variable": "some_value",
        "subscriptions": "eval: value.include?('a_subscription')"
      }
    }
  }
}
