The power of AIOps: automatic remediation with StackState and Rundeck

Martin van Vliet

Introduction to Rundeck and StackState

Rundeck

Rundeck is a tool to enable self-service in IT Operations, automating actions on your IT infrastructure and making it possible for others to run them, safely, by themselves. Many IT teams use Rundeck to automate recurring procedures so they can focus on more important or challenging tasks. Imagine tasks like removing excess log files if disk space runs low, or restarting services that have become unstable. These relatively simple tasks are not exactly rocket science, but can be a life saver if things go pear-shaped.

StackState

StackState's AIOps platform builds on existing IT investments by combining and analyzing metrics, logs, events and data beyond typical monitoring data, such as data from Google Analytics, CMDBs, CI/CD tools, service registries, automation tools and incident management tools. StackState uses the variety of data it collects to learn about dependencies, allowing it to build and monitor a topology of dynamic IT landscapes in real time. With its flexible filtering mechanism, users can split the IT landscape into separate views: coherent subsets of related IT components that are monitored as a unit. A view can monitor a single application, an architectural layer such as all storage systems, or a combination of both.

Views are monitored by StackState so that any negative change in the metrics of a component in the view is picked up. If the change makes any of the components unhealthy, StackState produces a state change event that can trigger an alert. Alerts can be sent to alerting systems such as Slack, HipChat or PagerDuty. Leveraging StackState's event handler mechanism also makes it possible to integrate with other systems.

Event handlers to trigger remediation jobs

Event handlers are responsible for handling incoming events and handing them off to external systems. Event handlers operate in a fire-and-forget mode, with an event handler being invoked exactly once for a state change. StackState includes several event handlers out of the box and its flexible architecture allows users to define their own handlers as well. In the remainder of this article, I will describe how to integrate StackState with Rundeck using event handlers so that any view state change in StackState triggers a remediation job in Rundeck.

A scenario explained

Imagine the following scenario. You are monitoring one of the critical systems for your company. The application running on it produces so much logging that the system's disk regularly fills up, causing the application to crash. To keep the application available, you log into the machine when disk space is running low and remove the excess logging. When using StackState in combination with Rundeck, this is what happens instead:

  • StackState is monitoring the disk space on the target system in a specially configured view

  • when the disk space runs low, the component representing the system will fail its health check. StackState updates the view health state

  • the view health state triggers an event handler that invokes an automatic remediation job in Rundeck

  • Rundeck executes the job and logs into the target system, then cleans up the excess log data

  • with the disk space cleaned up, the system has enough disk space to continue running. The component in StackState representing the system returns to a clear health state and so does the StackState view
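As an illustration of the cleanup step in the scenario above, here is a minimal Groovy sketch of what such a remediation job might do. It is not taken from an actual Rundeck job: the function name removeOldLogs, the ".log" suffix and the seven-day retention period are assumptions made for the example, and a real job could just as well be a shell command. Groovy is used only to stay consistent with the event handler code in this article.

```groovy
import java.nio.file.Files

// Hypothetical cleanup step: delete *.log files in logDir that were last
// modified more than maxAgeDays ago. Returns the names of the deleted files.
List<String> removeOldLogs(File logDir, int maxAgeDays, long now = System.currentTimeMillis()) {
    long cutoff = now - maxAgeDays * 24L * 60 * 60 * 1000
    def deleted = []
    // listFiles() returns null when logDir is missing or not a directory
    (logDir.listFiles() ?: []).each { File f ->
        if (f.isFile() && f.name.endsWith('.log') && f.lastModified() < cutoff && f.delete()) {
            deleted << f.name
        }
    }
    deleted
}

// Demonstration on a throwaway directory, so no real logs are touched
def demoDir = Files.createTempDirectory('demo-logs').toFile()
def oldLog = new File(demoDir, 'old.log')
oldLog.text = 'stale entries'
oldLog.setLastModified(System.currentTimeMillis() - 10L * 24 * 60 * 60 * 1000)
new File(demoDir, 'fresh.log').text = 'recent entries'
println removeOldLogs(demoDir, 7)
```

Only files older than the cutoff are removed, so the application's current log file stays in place while the disk space is reclaimed.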

The next section describes how to configure StackState and Rundeck to enable the above scenario.

Configuring StackState and Rundeck for automatic remediation

StackState invokes a Rundeck job through an event handler. The event handler itself is a Groovy function that performs an HTTP POST request on the Rundeck server. The URL invoked is similar to the following:

    http://rundeck.acme.com/api/31/job/e33447e0-d02c-4ff3-aedb-1f0b62c2ac0f/run

Rundeck

To ensure StackState can invoke Rundeck, the Rundeck server must be configured to allow programmatic access to its API. This can be done by configuring an API token. Once configured, the easiest way to connect StackState to Rundeck is to configure a proxy such as nginx to pass the right credentials. Here is a sample nginx configuration snippet:

    location /api {
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header 'X-Rundeck-Auth-Token' 'E4rNvVRV378knO9dp3d73O0cs1kd0kCd';
        proxy_set_header 'Content-Type' 'application/json';
        proxy_pass http://localhost:4400;
    }

StackState

Event handlers are defined in the Settings section of StackState. Create a new event handler and keep the event parameter with the StackState event stream as its input type. The following code implements an event handler that connects with Rundeck. It uses the built-in webhookPlugin to perform the HTTP request:

    // Name of the element whose state changed
    def elementNameOpt = event.newStateRef.elementName

    // Rundeck server and the job to run
    def rundeckHost = 'rundeck.acme.com'
    def jobid = 'e33447e0-d02c-4ff3-aedb-1f0b62c2ac0f'
    def url = 'http://' + rundeckHost + '/api/31/job/' + jobid + '/run'

    // JSON payload with the Rundeck node filter
    def json = new groovy.json.JsonBuilder()
    json filter: "tags: " + elementNameOpt.get()

    webhookPlugin.sendMessage(url, json.toString())

Notice the following:

  • the rundeckHost and jobid variables can be passed in as parameters to the event handler function to make it reusable

  • the JSON payload of the request includes the Rundeck node filter that selects which nodes the job will be executed on. In this example, the filter uses the component's name property to refer to a tag on the Rundeck node

  • Rundeck supports resource model sources that can tell Rundeck about nodes in other systems, such as StackState
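To illustrate the first two points, here is a minimal sketch of how the URL and payload construction could be pulled into a reusable function. The function name buildRundeckRequest, the apiVersion parameter and the component name app-server-01 are assumptions made for this example and are not part of StackState or Rundeck. The sketch only builds the request, so it can be tried outside StackState.

```groovy
import groovy.json.JsonBuilder

// Hypothetical helper: builds the Rundeck "run job" URL and the JSON body
// from parameters instead of hard-coded values.
def buildRundeckRequest(String rundeckHost, String jobId, String elementName, String apiVersion = '31') {
    def url = "http://${rundeckHost}/api/${apiVersion}/job/${jobId}/run".toString()
    // Node filter that selects the Rundeck nodes tagged with the component name
    def body = new JsonBuilder([filter: "tags: ${elementName}".toString()]).toString()
    [url: url, body: body]
}

// Example call with the host and job id used earlier in this article;
// 'app-server-01' is a made-up component name
def request = buildRundeckRequest('rundeck.acme.com', 'e33447e0-d02c-4ff3-aedb-1f0b62c2ac0f', 'app-server-01')
println request.url
println request.body
```

Inside an event handler, the result could then be passed on with webhookPlugin.sendMessage(request.url, request.body), with the host and job id supplied as handler parameters.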

Now for each view you want to connect to Rundeck, follow the instructions in the Alerting guide. This is just one example of how StackState's AIOps platform enables automatic remediation. If you would like to learn more about StackState, then request a free guided tour right here.