The power of AIOps: automatic remediation with StackState and Rundeck
Martin van Vliet· 5 min read
To cope with the complexity of today's IT environments, IT Operations must rely more and more on automation to get the job done. On the one hand, events, metrics and logs from the managed systems must be de-duplicated and filtered to reduce noise. StackState, with its focus on topology, is extremely well equipped for this. On the other hand, automating repetitive tasks to remedy common, known problems provides relief. Tools such as Rundeck make this possible. This article describes how StackState can be combined with Rundeck to set up automatic remediation for common issues.
Introduction to Rundeck and StackState
Rundeck is a tool to enable self-service in IT Operations, automating actions on your IT infrastructure and making it possible for others to run them, safely, by themselves. Many IT teams use Rundeck to automate recurring procedures so they can focus on more important or challenging tasks. Imagine tasks like removing excess log files if disk space runs low, or restarting services that have become unstable. These relatively simple tasks are not exactly rocket science, but can be a life saver if things go pear-shaped.
StackState's AIOps platform builds on current IT investments by combining and analyzing metrics, logs and events, as well as data beyond typical monitoring data, such as Google Analytics, CMDBs, CI/CD tools, service registries, automation and incident management tools. StackState uses the variety of data it collects to learn about dependencies, allowing it to build and monitor a topology of dynamic IT landscapes in real time. With its flexible filtering mechanism, users can split the IT landscape into separate views: coherent subsets of related IT components that are monitored as a unit. A view can monitor a single application, an architectural layer such as all storage systems, or a combination of both.
Views are monitored by StackState so that any negative change in the metrics of a component in the view is picked up. If the change makes any of the components unhealthy, StackState produces a state change event that can trigger an alert. Alerts can be sent to alerting systems such as Slack, HipChat or PagerDuty. Leveraging StackState's event handler mechanism also makes it possible to integrate with other systems.
Event handlers to trigger remediation jobs
Event handlers are responsible for handling incoming events and handing them off to external systems. Event handlers operate in a fire-and-forget mode, with an event handler being invoked exactly once for a state change. StackState includes several event handlers out of the box and its flexible architecture allows users to define their own handlers as well. In the remainder of this article, I will describe how to integrate StackState with Rundeck using event handlers so that any view state change in StackState triggers a remediation job in Rundeck.
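To make the fire-and-forget behaviour concrete, here is a minimal Python sketch of a dispatcher that hands each view state change to its handlers exactly once. It is an illustration only, not StackState's implementation: the class name, the handler signature and the state names are assumptions made for this example.

```python
from typing import Callable, Dict, List

# Hypothetical handler signature: (view name, old state, new state).
Handler = Callable[[str, str, str], None]


class StateChangeDispatcher:
    """Invokes each registered handler once per view state change."""

    def __init__(self) -> None:
        self._last_state: Dict[str, str] = {}
        self._handlers: List[Handler] = []

    def register(self, handler: Handler) -> None:
        self._handlers.append(handler)

    def report(self, view: str, state: str) -> bool:
        """Record the latest state; return True if handlers were invoked."""
        previous = self._last_state.get(view)
        if previous == state:
            return False  # no transition, so nothing to hand off
        self._last_state[view] = state
        if previous is None:
            return False  # first observation only establishes a baseline
        for handler in self._handlers:
            try:
                handler(view, previous, state)
            except Exception:
                # Fire-and-forget: a failing handler is not retried.
                pass
        return True
```

Reporting the same state twice in a row does not invoke the handlers again, which is what keeps a remediation job from being triggered repeatedly for one incident.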
A scenario explained
Imagine the following scenario. You are monitoring one of the critical systems for your company. The application running on it produces so much logging that the disk of the system regularly fills up, causing the application to crash. To keep the application available, you log into the machine when the disk space is running low and remove the excess logging. When using StackState in combination with Rundeck, this would happen:
StackState is monitoring the disk space on the target system in a specially configured view
when the disk space runs low, the component representing the system will fail its health check. StackState updates the view health state
the view health state triggers an event handler that invokes an automatic remediation job in Rundeck
Rundeck executes the job and logs into the target system, then cleans up the excess log data
with the disk space cleaned up, the system has enough disk space to continue running. The component in StackState representing the system returns to a clear health state and so does the StackState view
In the remainder of this article, I will describe how to integrate StackState with Rundeck using event handlers to enable the above scenario.
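The cleanup step of the scenario is the kind of script a Rundeck job could run on the target node. The following Python sketch shows one possible version; the function name, the 10% free-space threshold, the seven-day retention period and the `.log` suffix are all assumptions chosen for illustration.

```python
import os
import shutil
import time


def clean_logs(log_dir: str, min_free_ratio: float = 0.10,
               max_age_seconds: float = 7 * 24 * 3600) -> list:
    """Delete old *.log files in log_dir, but only if free space is low.

    Returns the list of removed paths (empty when disk space is sufficient).
    """
    usage = shutil.disk_usage(log_dir)
    if usage.free / usage.total >= min_free_ratio:
        return []  # enough free space, leave the logs alone
    cutoff = time.time() - max_age_seconds
    removed = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        # Only plain files ending in .log that are older than the cutoff.
        if (name.endswith(".log") and os.path.isfile(path)
                and os.path.getmtime(path) < cutoff):
            os.remove(path)
            removed.append(path)
    return removed
```

Checking the free-space ratio first makes the job safe to run at any time: if the disk is not actually under pressure, nothing is deleted.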
Configuring StackState and Rundeck for automatic remediation
StackState invokes a Rundeck job through an event handler. The event handler itself is a Groovy function that performs an HTTP POST request on the Rundeck server. The URL it invokes identifies the Rundeck host and the job to run.
Rundeck
To ensure StackState can invoke Rundeck, the Rundeck server must be configured to allow programmatic access to its API. This can be done by configuring an API token. Once the token is configured, the easiest way to connect StackState to Rundeck is to put a proxy such as nginx in front of Rundeck and have it pass the right credentials.
StackState
Event handlers are defined in the Settings section of StackState. Create a new event handler and keep the event parameter with the StackState event stream as its input type. The handler that connects with Rundeck uses the built-in webhookPlugin to perform the HTTP request. Notice the following:
the rundeckHost and jobid variables can be passed in as parameters to the event handler function to make it reusable
the JSON payload of the request includes the Rundeck node filter that selects which nodes the job will be executed on. In this example, the filter uses the component's name property to refer to a tag on the Rundeck node
the nodes themselves can be supplied to Rundeck by a resource model source, which can tell Rundeck about nodes in other systems, such as StackState
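The request the event handler sends can be sketched outside StackState as well. The Python example below (not the Groovy handler itself) builds the POST without sending it. It assumes Rundeck's job run endpoint, `/api/{version}/job/{id}/run`, with a JSON body and the `X-Rundeck-Auth-Token` header; the API version default of 18 and all function and parameter names are assumptions, so check them against the API documentation for your Rundeck version.

```python
import json
import urllib.request
from typing import Optional


def build_run_request(rundeck_host: str, job_id: str, component_name: str,
                      api_token: Optional[str] = None,
                      api_version: int = 18) -> urllib.request.Request:
    """Build (but do not send) the POST that asks Rundeck to run a job."""
    url = f"{rundeck_host.rstrip('/')}/api/{api_version}/job/{job_id}/run"
    # Node filter: run only on nodes tagged with the component's name.
    payload = {"filter": f"tags: {component_name}"}
    headers = {"Content-Type": "application/json",
               "Accept": "application/json"}
    if api_token:
        # Not needed when a proxy in front of Rundeck adds the token.
        headers["X-Rundeck-Auth-Token"] = api_token
    return urllib.request.Request(url,
                                  data=json.dumps(payload).encode("utf-8"),
                                  headers=headers, method="POST")
```

Passing the result to `urllib.request.urlopen` would send it. Keeping the host and job ID as parameters mirrors the reusability point above: the same function can serve any view and any remediation job.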
Now for each view you want to connect to Rundeck, follow the instructions in the Alerting guide. This is just one example of how StackState's AIOps platform enables automatic remediation. If you would like to learn more about StackState, then request a free guided tour right here.