Monitor server resources with checks

Sensu checks are commands (or scripts) the Sensu agent executes that output data and produce an exit code to indicate a state. Sensu checks use the same specification as Nagios, so you can use Nagios check plugins with Sensu.
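
As a rough illustration of that contract, the following shell sketch defines a check that prints one line of output and returns an exit status. The function name check_path_exists and its output wording are hypothetical and are not part of Sensu or any plugin. The status values assume the Nagios convention of 0 for OK, 2 for CRITICAL, and 3 for UNKNOWN.

```shell
# Hypothetical check: report whether a path exists.
# Exit statuses assume the Nagios convention: 0 = OK, 2 = CRITICAL, 3 = UNKNOWN.
check_path_exists() {
  path="${1:-}"
  if [ -z "$path" ]; then
    echo "CheckPath UNKNOWN: no path given"
    return 3
  fi
  if [ -e "$path" ]; then
    echo "CheckPath OK: $path exists"
    return 0
  fi
  echo "CheckPath CRITICAL: $path is missing"
  return 2
}
```

The printed line plays the role of the check output, and the return value is the state a monitoring agent would read.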

You can use checks to monitor server resources, services, and application health (for example, to check whether NGINX is running) and collect and analyze metrics (for example, to learn how much disk space you have left). This guide includes two check examples to help you monitor server resources (specifically, CPU usage and NGINX status).

To use this guide, you’ll need to install a Sensu backend and have at least one Sensu agent running. Follow the RHEL/CentOS install instructions for the Sensu backend, the Sensu agent, and sensuctl.

Register dynamic runtime assets

You can write shell scripts in the command field of your check definitions, but we recommend using existing check plugins instead. Check plugins must be available on the host where the agent is running for the agent to execute the check. This guide uses dynamic runtime assets to manage plugin installation.

The Sensu CPU Checks dynamic runtime asset includes the check-cpu.rb plugin, which your CPU check will rely on. The Sensu assets packaged from Sensu CPU Checks are built against the Sensu Ruby runtime environment, so you also need to add the Sensu Ruby Runtime dynamic runtime asset. Sensu Ruby Runtime delivers the Ruby executable and supporting libraries the check will need to run the check-cpu.rb plugin.

To register the Sensu CPU Checks dynamic runtime asset, sensu-plugins/sensu-plugins-cpu-checks:4.1.0, run:

sensuctl asset add sensu-plugins/sensu-plugins-cpu-checks:4.1.0 -r cpu-checks-plugins

The response will confirm that the asset was added:

fetching bonsai asset: sensu-plugins/sensu-plugins-cpu-checks:4.1.0
added asset: sensu-plugins/sensu-plugins-cpu-checks:4.1.0

You have successfully added the Sensu asset resource, but the asset will not get downloaded until
it's invoked by another Sensu resource (ex. check). To add this runtime asset to the appropriate
resource, populate the "runtime_assets" field with ["cpu-checks-plugins"].

This example uses the -r (rename) flag to specify a shorter name for the dynamic runtime asset: cpu-checks-plugins.

You can also download dynamic runtime asset definitions from Bonsai and register the asset with sensuctl create --file filename.yml.

Then, use the following sensuctl command to register the Sensu Ruby Runtime dynamic runtime asset, sensu/sensu-ruby-runtime:0.0.10:

sensuctl asset add sensu/sensu-ruby-runtime:0.0.10 -r sensu-ruby-runtime

And use this command to register the nagiosfoundation check plugin collection, which you’ll use later for your webserver check:

sensuctl asset add ncr-devops-platform/nagiosfoundation:0.5.2 -r nagiosfoundation

To confirm that all three dynamic runtime assets are ready to use, run:

sensuctl asset list

The response should list the cpu-checks-plugins, sensu-ruby-runtime, and nagiosfoundation dynamic runtime assets:

         Name                                                      URL                                                 Hash    
 ──────────────────── ────────────────────────────────────────────────────────────────────────────────────────────── ───────── 
  cpu-checks-plugins   //assets.bonsai.sensu.io/.../sensu-plugins-cpu-checks_4.1.0_centos7_linux_amd64.tar.gz         8a01862  
  nagiosfoundation     //assets.bonsai.sensu.io/.../nagiosfoundation-linux-amd64-0.5.2.tgz                            6b4f91b  
  sensu-ruby-runtime   //assets.bonsai.sensu.io/.../sensu-ruby-runtime_0.0.10_ruby-2.4.4_centos_linux_amd64.tar.gz    338b88b 

Because plugins are published for multiple platforms, including Linux and Windows, the output will include multiple entries for each of the dynamic runtime assets.

NOTE: Sensu does not download and install dynamic runtime asset builds onto the system until they are needed for command execution. Read the asset reference for more information about dynamic runtime asset builds.

Configure entity subscriptions

Every Sensu agent has a defined set of subscriptions that determine which checks the agent will execute. For an agent to execute a specific check, you must specify the same subscription in the agent configuration and the check definition. To run the CPU and NGINX webserver checks, you’ll need a Sensu agent with the subscriptions system and webserver.
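
The matching rule can be pictured with a small shell sketch. This is a conceptual illustration only, not how Sensu implements scheduling, and the function name subscriptions_match is hypothetical. It succeeds when the agent's comma-separated subscription list and the check's list share at least one entry.

```shell
# Conceptual sketch only: succeed if the agent and the check share a subscription.
# Both arguments are comma-separated lists, such as "system,webserver".
# The function body runs in a subshell so the IFS change does not leak out.
subscriptions_match() (
  set -f   # disable globbing while the lists are split
  IFS=,
  for agent_sub in $1; do
    for check_sub in $2; do
      if [ "$agent_sub" = "$check_sub" ]; then
        exit 0
      fi
    done
  done
  exit 1
)
```

For example, subscriptions_match "system,webserver" "system" succeeds, while subscriptions_match "webserver" "system" does not.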

NOTE: In production, your CPU and NGINX servers would be different entities, with the system subscription specified for the CPU entity and the webserver subscription specified for the NGINX entity. To keep things streamlined, this guide uses one entity to represent both.

To add the system and webserver subscriptions to the entity the Sensu agent is observing, first find your agent entity name:

sensuctl entity list

The ID column in the response lists the name of your entity.

Replace ENTITY_NAME with the name of your agent entity in the following sensuctl command. Run:

sensuctl entity update ENTITY_NAME
  • For Entity Class, press enter.
  • For Subscriptions, type system,webserver and press enter.

Create a check to monitor a server

Now that the dynamic runtime assets are registered, create a check named check_cpu that runs the command check-cpu.rb -w 75 -c 90 with the cpu-checks-plugins and sensu-ruby-runtime dynamic runtime assets at an interval of 60 seconds for all entities subscribed to the system subscription. This check generates a warning event (-w) when CPU usage reaches 75% and a critical alert (-c) at 90%.

sensuctl check create check_cpu \
--command 'check-cpu.rb -w 75 -c 90' \
--interval 60 \
--subscriptions system \
--runtime-assets cpu-checks-plugins,sensu-ruby-runtime

You should see a confirmation message:

Created
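
To make the -w and -c thresholds concrete, here is a minimal shell sketch of how a usage value could map to a check status. The helper cpu_state is hypothetical, works on whole-number percentages only, and assumes inclusive thresholds; the comparison inside check-cpu.rb may differ.

```shell
# Hypothetical helper: map an integer CPU usage percentage to a check status.
# Assumes inclusive thresholds; check-cpu.rb's own comparison may differ.
# Usage: cpu_state USAGE WARN CRIT
cpu_state() {
  usage=$1
  warn=$2
  crit=$3
  if [ "$usage" -ge "$crit" ]; then
    echo "CRITICAL: CPU usage ${usage}% (critical threshold ${crit}%)"
    return 2
  elif [ "$usage" -ge "$warn" ]; then
    echo "WARNING: CPU usage ${usage}% (warning threshold ${warn}%)"
    return 1
  fi
  echo "OK: CPU usage ${usage}%"
  return 0
}
```

With the thresholds used in this guide, cpu_state 80 75 90 would report a warning (status 1), and cpu_state 95 75 90 would report a critical state (status 2).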

To view the complete resource definition for check_cpu in YAML or wrapped JSON format, run one of the following commands:

sensuctl check info check_cpu --format yaml
sensuctl check info check_cpu --format wrapped-json

The sensuctl response will include the complete check_cpu resource definition in the specified format:

---
type: CheckConfig
api_version: core/v2
metadata:
  created_by: admin
  name: check_cpu
  namespace: default
spec:
  check_hooks: null
  command: check-cpu.rb -w 75 -c 90
  env_vars: null
  handlers: []
  high_flap_threshold: 0
  interval: 60
  low_flap_threshold: 0
  output_metric_format: ""
  output_metric_handlers: null
  proxy_entity_name: ""
  publish: true
  round_robin: false
  runtime_assets:
  - cpu-checks-plugins
  - sensu-ruby-runtime
  secrets: null
  stdin: false
  subdue: null
  subscriptions:
  - system
  timeout: 0
  ttl: 0

{
  "type": "CheckConfig",
  "api_version": "core/v2",
  "metadata": {
    "created_by": "admin",
    "name": "check_cpu",
    "namespace": "default"
  },
  "spec": {
    "check_hooks": null,
    "command": "check-cpu.rb -w 75 -c 90",
    "env_vars": null,
    "handlers": [],
    "high_flap_threshold": 0,
    "interval": 60,
    "low_flap_threshold": 0,
    "output_metric_format": "",
    "output_metric_handlers": null,
    "proxy_entity_name": "",
    "publish": true,
    "round_robin": false,
    "runtime_assets": [
      "cpu-checks-plugins",
      "sensu-ruby-runtime"
    ],
    "secrets": null,
    "stdin": false,
    "subdue": null,
    "subscriptions": [
      "system"
    ],
    "timeout": 0,
    "ttl": 0
  }
}

If you want to share, reuse, and maintain this check just like you would code, you can save it to a file and start building a monitoring as code repository.
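
One possible way to start is sketched below. The monitoring-as-code directory layout and the REPO_DIR variable are hypothetical, and the YAML is a trimmed copy of the check_cpu definition, on the assumption that omitted fields fall back to their defaults. The script registers the file with sensuctl create --file only when sensuctl is installed.

```shell
# Hypothetical repository layout; adjust the path to suit your project.
repo_dir="${REPO_DIR:-monitoring-as-code}"
check_file="$repo_dir/checks/check_cpu.yml"
mkdir -p "$repo_dir/checks"

# A trimmed copy of the check_cpu definition (assumes omitted fields use defaults).
cat > "$check_file" <<'EOF'
---
type: CheckConfig
api_version: core/v2
metadata:
  name: check_cpu
  namespace: default
spec:
  command: check-cpu.rb -w 75 -c 90
  interval: 60
  publish: true
  runtime_assets:
  - cpu-checks-plugins
  - sensu-ruby-runtime
  subscriptions:
  - system
EOF

# Register the check from the file when sensuctl is available on this host.
if command -v sensuctl >/dev/null 2>&1; then
  sensuctl create --file "$check_file"
fi
```

Keeping the definition in a file means changes to the check can be reviewed and versioned like any other code.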

Validate the CPU check

The Sensu agent uses websockets to communicate with the Sensu backend, sending event data as JSON messages. As your checks run, the Sensu agent captures check standard output (STDOUT) or standard error (STDERR). This data will be included in the JSON payload the agent sends to your Sensu backend as the event data.
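
The capture step can be approximated in a few lines of shell. This is a conceptual sketch, not the agent's actual implementation, and run_check is a hypothetical name. It runs a command, collects STDOUT and STDERR together, and records the exit status, which are the pieces that end up in the event data.

```shell
# Conceptual sketch, not the agent's actual implementation:
# run a command, capture STDOUT and STDERR together, and record its exit status.
run_check() {
  status=0
  output=$("$@" 2>&1) || status=$?
  printf 'status=%s output=%s\n' "$status" "$output"
}
```

For instance, run_check sh -c 'echo hello; exit 2' prints status=2 output=hello.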

It might take a few moments after you create the check for the check to be scheduled on the entity and the event to return to the Sensu backend. Use sensuctl to view the event data and confirm that Sensu is monitoring CPU usage:

sensuctl event list

The response should list the check_cpu check, returning an OK status (0):

     Entity          Check                                                                        Output                                                                     Status   Silenced             Timestamp                             UUID                  
 ────────────── ─────────────── ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ──────── ────────── ─────────────────────────────── ────────────────────────────────────── 
  sensu-centos   check_cpu       CheckCPU TOTAL OK: total=1.23 user=1.03 nice=0.0 system=0.21 idle=98.77 iowait=0.0 irq=0.0 softirq=0.0 steal=0.0 guest=0.0 guest_nice=0.0        0   false      2021-03-18 19:33:30 +0000 UTC   64e54ae9-117b-4867-c2d1-9b1682a474ab

Create a check to monitor a webserver

In this section, you’ll create a check to monitor an NGINX webserver, similar to the CPU check you created in the previous section but using the webserver subscription rather than system.

Install and configure NGINX

The webserver check requires a running NGINX service, so you’ll need to install and configure NGINX.

NOTE: You may need to install and update the EPEL repository with sudo yum install epel-release and sudo yum update before you can install NGINX.

Install NGINX:

sudo yum install nginx

Enable and start the NGINX service:

systemctl enable nginx && systemctl start nginx

Verify that NGINX is serving webpages:

curl -sI http://localhost

The response should include HTTP/1.1 200 OK to indicate that NGINX processed your request as expected:

HTTP/1.1 200 OK
Server: nginx/1.16.1
Date: Wed, 17 Mar 2021 20:51:53 GMT
Content-Type: text/html
Content-Length: 4833
Last-Modified: Fri, 16 May 2014 15:12:48 GMT
Connection: keep-alive
ETag: "73762nw0-12e1"
Accept-Ranges: bytes
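
If you would rather script this verification than read the headers by eye, a sketch along these lines may help. The helper http_status_ok is hypothetical. It reads response headers on standard input and succeeds only when the status code on the first line is 200.

```shell
# Hypothetical helper: succeed only if the first header line reports status 200.
# Reads response headers (for example, from curl -sI) on standard input.
http_status_ok() {
  IFS=' ' read -r _proto code _rest || true
  # Strip the carriage return that ends each HTTP header line.
  code=$(printf '%s' "${code:-}" | tr -d '\r')
  [ "$code" = "200" ]
}
```

A typical use would be curl -sI http://localhost | http_status_ok && echo "NGINX responded with 200".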

With your NGINX service running, you can configure the webserver check.

Create the webserver check definition

Create a check that uses the check_service plugin from the nagiosfoundation collection. The nginx_service check will run at an interval of 15 seconds and determine whether the nginx service is running for all entities subscribed to the webserver subscription.

Run the following sensuctl command to create the nginx_service check:

sensuctl check create nginx_service \
--command 'check_service --name nginx' \
--interval 15 \
--subscriptions webserver \
--runtime-assets nagiosfoundation

You should see a confirmation message:

Created

To view the complete resource definition for nginx_service in YAML or wrapped JSON format, run one of the following commands:

sensuctl check info nginx_service --format yaml
sensuctl check info nginx_service --format wrapped-json

The sensuctl response will include the complete nginx_service resource definition in the specified format:

---
type: CheckConfig
api_version: core/v2
metadata:
  created_by: admin
  name: nginx_service
  namespace: default
spec:
  check_hooks: null
  command: check_service --name nginx
  env_vars: null
  handlers: []
  high_flap_threshold: 0
  interval: 15
  low_flap_threshold: 0
  output_metric_format: ""
  output_metric_handlers: null
  proxy_entity_name: ""
  publish: true
  round_robin: false
  runtime_assets:
  - nagiosfoundation
  secrets: null
  stdin: false
  subdue: null
  subscriptions:
  - webserver
  timeout: 0
  ttl: 0

{
  "type": "CheckConfig",
  "api_version": "core/v2",
  "metadata": {
    "created_by": "admin",
    "name": "nginx_service",
    "namespace": "default"
  },
  "spec": {
    "check_hooks": null,
    "command": "check_service --name nginx",
    "env_vars": null,
    "handlers": [],
    "high_flap_threshold": 0,
    "interval": 15,
    "low_flap_threshold": 0,
    "output_metric_format": "",
    "output_metric_handlers": null,
    "proxy_entity_name": "",
    "publish": true,
    "round_robin": false,
    "runtime_assets": [
      "nagiosfoundation"
    ],
    "secrets": null,
    "stdin": false,
    "subdue": null,
    "subscriptions": [
      "webserver"
    ],
    "timeout": 0,
    "ttl": 0
  }
}

As with the check_cpu check, you can share, reuse, and maintain this check just like code.

Validate the webserver check

It might take a few moments after you create the check for the check to be scheduled on the entity and the event to return to the Sensu backend. Use sensuctl to view event data and confirm that Sensu is monitoring the NGINX webserver status:

sensuctl event list

The response should list the nginx_service check, returning an OK status (0):

     Entity          Check                                                                       Output                                                                   Status   Silenced             Timestamp                             UUID                  
 ────────────── ─────────────── ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ──────── ────────── ─────────────────────────────── ────────────────────────────────────── 
  sensu-centos   nginx_service   CheckService OK - nginx in a running state                                                                                                    0   false      2021-03-18 19:38:04 +0000 UTC   ab605f6a-26e2-47c8-a843-765129e74f37

Simulate a critical event

To manually generate a critical event for your nginx_service check, stop the NGINX service. Run:

systemctl stop nginx

When you stop the service, the check will generate a critical event. After a few moments, run:

sensuctl event list

The response should list the nginx_service check, returning a CRITICAL status (2):

     Entity          Check                                                                        Output                                                                     Status   Silenced             Timestamp                             UUID                  
 ────────────── ─────────────── ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ──────── ────────── ─────────────────────────────── ────────────────────────────────────── 
  sensu-centos   nginx_service   CheckService CRITICAL - nginx not in a running state (State: inactive)                                                                           2   false      2021-03-18 19:42:19 +0000 UTC   cb3et55a-1649-43b9-b559-ebu3aa9352b4

Restart the NGINX service to clear the event:

systemctl start nginx

After a moment, you can verify that the event cleared:

sensuctl event list

The response should list the nginx_service check with an OK status (0).

Next steps

Now that you know how to create checks to monitor CPU usage and NGINX webserver status, read the checks reference and assets reference for more detailed information. Or, learn how to monitor external resources with proxy checks and entities.

You can also create a handler to send alerts to email, PagerDuty, or Slack based on the status events your checks are generating.