AIOps for Real: Characteristics of a Platform That Add Value and Drive Change

Andreas Prins, CEO

When you’re investing in automation solutions, tangible results need to follow quickly. Waiting two years for a return on investment (ROI) from an automation project might have been OK in the not-so-distant past, but it is no longer acceptable. With the current speed of change, where new technologies come and go and existing ones evolve at lightning speed, IT teams need much faster time to value on their automation investments.

AIOps is no different: investments in time and tools need to lead to concrete results. To better understand what is needed, I recently reread the Gartner® report Solution Criteria for AIOps Platforms, authored by Gregory Murray (NOTE: you will need access to Gartner research to read the report).

The report is a very useful instrument when selecting the right AIOps solution. Why? Because it defines 13 clear selection criteria to pay attention to. When doing AIOps for real, these criteria matter!

This blog post takes a few of Murray’s criteria and provides some additional color around them.   

The value that an AIOps platform provides

Let’s take a closer look at what AIOps platforms do and what they can bring to your organization. According to the Gartner research report:

“AIOps platforms analyze telemetry and event streams to transform data into meaningful patterns and enable proactive responses that reduce toil and overhead.” [1]

This opening statement immediately raises three important areas of AIOps platforms that you need to think through:

  1. An AIOps platform requires input data, and that data needs to be available in your organization. AIOps platforms obviously can’t perform magic if telemetry and event stream data are not present. In our experience, customers that have already invested in monitoring and observability benefit very quickly from an AIOps layer on top of that telemetry data, which helps them find the data that matters and unify data silos. As Gartner says: “Investments in observability and modern monitoring coverage have improved visibility but have created an overwhelming amount of data and siloed dashboards. AIOps platforms ingest and unify the events across almost every monitoring domain.” [2]

  2. You need to define where you want to apply AIOps. The promise of AIOps is magical. However, having data in a central place still requires you to define what you want to do with that data. What meaningful patterns and proactive, predictive responses do you want to achieve with AIOps capabilities? It is true that the platform must perform a great deal of intelligent processing to make sense of the data and start identifying patterns. Depending on the available data and what it is used for, an AIOps platform can drive several different outcomes. Within the StackState platform these include, for example, probable root cause identification, alert noise reduction, autonomous anomaly detection and proactive problem prevention.

  3. It’s all about the goals you set. If earlier attempts at implementing AIOps have failed, it’s time to rethink your goals before proceeding. What do you want to achieve with AIOps? Where do you want to be after implementing your AIOps platform? Knowing the end game, that is, where you want to be after implementation, is essential. Common goals include reducing toil, augmenting human judgment and preventing issues, reducing MTTR, or bringing the right people to the war room when an outage occurs. Adopting an AIOps platform without a clear purpose won’t bring value to your organization.

Input, processing and output go hand in hand. The same holds when selecting your AIOps platform: think through your data, your goals and your desired end state together.

The need for topology

In IT, a topology describes the set of relationships and dependencies between the discrete components in an environment (for example, business services, microservices, load balancers, containers and databases). In modern environments, topologies evolve quickly as new code is pushed to production continuously and the underlying infrastructure changes rapidly. Managing these dynamic environments requires the ability to track changes in topology over time.
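To make the idea of tracking topology over time more concrete, here is a minimal Python sketch. It assumes a hypothetical model in which each snapshot records a set of component names and a set of (dependent, dependency) pairs; the names TopologySnapshot and diff_topology and the example components are illustrative only and not part of StackState or any other product. Comparing two snapshots shows exactly what changed between them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopologySnapshot:
    """Point-in-time view of an environment (hypothetical model)."""
    timestamp: int            # seconds since epoch
    components: frozenset     # component names
    dependencies: frozenset   # (a, b) pairs meaning "a depends on b"


def diff_topology(old: TopologySnapshot, new: TopologySnapshot) -> dict:
    """Return the components and dependencies added or removed between snapshots."""
    return {
        "added_components": set(new.components - old.components),
        "removed_components": set(old.components - new.components),
        "added_dependencies": set(new.dependencies - old.dependencies),
        "removed_dependencies": set(old.dependencies - new.dependencies),
    }


# Example: a new version of the checkout service replaces the old one.
before = TopologySnapshot(
    timestamp=1_000,
    components=frozenset({"load-balancer", "checkout-svc", "orders-db"}),
    dependencies=frozenset({("load-balancer", "checkout-svc"),
                            ("checkout-svc", "orders-db")}),
)
after = TopologySnapshot(
    timestamp=2_000,
    components=frozenset({"load-balancer", "checkout-svc-v2", "orders-db"}),
    dependencies=frozenset({("load-balancer", "checkout-svc-v2"),
                            ("checkout-svc-v2", "orders-db")}),
)
changes = diff_topology(before, after)
print(changes["added_components"])    # {'checkout-svc-v2'}
print(changes["removed_components"])  # {'checkout-svc'}
```

Keeping each snapshot immutable makes it cheap to store a history of them and answer “what did the environment look like when this issue started?”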

Topology is fundamental to a strong AIOps platform. At the heart of AIOps is the ability to analyze data feeds and, from that analysis, intelligently extract an entire network of dependencies and relationships between the components in your IT environment. When this data is well structured, it becomes much easier to drive automation or apply other AI techniques on top of your IT assets.

“AIOps platforms assemble a unified topology of IT assets across domains, leveraging the data from other tools that map and discover the dependencies and connections between assets. Topology can include physical proximity, logical dependency or other dimension that captures the relationship between IT assets and services.  

“Advanced AIOps solutions — like root cause analysis and augmented remediation — will require an understanding of the relationships between IT assets to understand cascading effects of issues. AIOps platforms analyze data feeds and extract connections and dependencies between IT assets, providing a topographical basis for correlation, prediction and association.”[3]  

- Gartner, “Solution Criteria for AIOps Platforms,” by Gregory Murray  
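The cascading effects described in the quote can be illustrated with a simple graph heuristic. The Python sketch below assumes dependencies are given as hypothetical (a, b) pairs meaning “a depends on b” and treats an unhealthy component as a probable root cause when none of its direct dependencies is unhealthy; it then walks the graph in reverse to list everything that could be affected. This is a simplified illustration, not Gartner’s definition or StackState’s algorithm; for example, it does not handle a cycle of mutually dependent unhealthy components.

```python
from collections import defaultdict, deque


def probable_root_causes(dependencies, unhealthy):
    """Unhealthy components whose direct dependencies are all healthy.

    dependencies: iterable of (a, b) pairs meaning "a depends on b".
    unhealthy: set of component names currently reporting problems.
    """
    depends_on = defaultdict(set)
    for a, b in dependencies:
        depends_on[a].add(b)
    # If none of a component's dependencies is unhealthy, its problem cannot be
    # explained by a failure further down the chain.
    return {c for c in unhealthy if not (depends_on[c] & unhealthy)}


def blast_radius(dependencies, root):
    """All components that depend on `root`, directly or transitively."""
    dependents = defaultdict(set)
    for a, b in dependencies:
        dependents[b].add(a)
    affected, queue = set(), deque([root])
    while queue:  # breadth-first walk "upstream" from the root
        for parent in dependents[queue.popleft()]:
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


deps = {("web", "api"), ("api", "db"), ("api", "cache"), ("batch", "db")}
unhealthy = {"web", "api", "db"}
print(probable_root_causes(deps, unhealthy))  # {'db'}
print(sorted(blast_radius(deps, "db")))       # ['api', 'batch', 'web']
```

Without the topology, the three unhealthy components would look like three separate problems; with it, they collapse into one probable cause and its cascading impact.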

At StackState, topology is at the heart of our solution. It’s not an after-the-fact add-on to AIOps and observability but the foundation for bringing together, in a structured manner, the vast amounts of data our customers have in their environments, including data from other monitoring and observability tools.

The need for unified observability data

The second, and very important, aspect that needs to be well understood when applying AIOps is holistic topology. Gartner writes: “AIOps platforms assemble a unified topology of IT assets... across domains and leveraging data from other tools...”[4]

Why does that matter, you might wonder?   

Let me give you an example. Say you want to do root cause analysis for your online banking environment, where 25 IT teams develop and maintain components and features. Most likely, not all of these teams use the same tools. Moreover, they are probably not in the same organizational unit reporting to the same manager. They may even be unaware that they are using each other’s IT components. Without the ability to bring all of that data together, your hands are tied, yet doing so is a fundamental requirement before you can start applying AI.

Gartner calls this cross-domain data ingestion and integration:  

“Cross-domain data ingestion and integration — an AIOps platform ingests, indexes and normalizes events or telemetry from multiple domains, vendors or sources. The platform can rely on separate domain-centric monitoring solutions to gather and process monitoring data or the AIOps platform can monitor IT assets and services directly.”[5]  
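As a rough illustration of what “normalizes events” from multiple sources can mean in practice, this Python sketch maps events from two hypothetical monitoring tools, each with its own field names, severity scale and timestamp format, onto one common schema. The tool names, payload fields and severity mappings are assumptions invented for the example; real integrations depend on each vendor’s actual event format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class NormalizedEvent:
    source: str
    component: str
    severity: str        # one of "info", "warning", "critical"
    timestamp: datetime  # always timezone-aware
    message: str


# Hypothetical severity vocabularies of two different monitoring tools.
SEVERITY_MAPS = {
    "tool_a": {"P1": "critical", "P2": "warning", "P3": "info"},
    "tool_b": {3: "critical", 2: "warning", 1: "info"},
}


def normalize(source, raw):
    """Translate one raw event from a (hypothetical) tool into the common schema."""
    if source == "tool_a":
        # tool_a sends ISO-8601 timestamps and P1..P3 priorities
        return NormalizedEvent(source, raw["host"],
                               SEVERITY_MAPS[source][raw["priority"]],
                               datetime.fromisoformat(raw["time"]),
                               raw["summary"])
    if source == "tool_b":
        # tool_b sends epoch seconds and numeric levels 1..3
        return NormalizedEvent(source, raw["service"],
                               SEVERITY_MAPS[source][raw["level"]],
                               datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
                               raw["text"])
    raise ValueError(f"unknown source: {source}")


events = sorted(
    [
        normalize("tool_b", {"service": "checkout-svc", "level": 2,
                             "ts": 1622109660, "text": "Latency above threshold"}),
        normalize("tool_a", {"host": "orders-db", "priority": "P1",
                             "time": "2021-05-27T10:00:00+00:00", "summary": "Disk full"}),
    ],
    key=lambda e: e.timestamp,
)
for e in events:
    print(e.timestamp.isoformat(), e.component, e.severity, e.message)
```

Once events share one schema and one clock, they can be ordered, correlated and attached to the same topology regardless of which tool produced them.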

Unifying siloed data is required not only in hybrid IT environments but also in modern SaaS environments. Why? DevOps teams that develop software have become more autonomous and have their own technical components under their control. But what happens when a failure in one part of the chain cascades through the entire chain? The holistic insight that a unified topology brings is crucial for effective root cause detection.

The smartness after data ingestion and model building

The third point I would like to highlight from the Gartner research report is the selection criteria around analytics and detection solutions. Gartner offers very concrete recommendations on what to look for when selecting these solutions. Earlier in this blog post, we talked about “The Goal” (not the book, but your big, hairy, audacious goal(s) when implementing AIOps). The analytics and detection capabilities you need depend on these well-defined goals: if you don’t know what you want to achieve, it’s hard to assess which analytics and detection capabilities you need.

Gartner explains:        

“Select an AIOps platform that provides the right set of analytic techniques for your environment and AIOps vision. Evaluate vendors through your willingness to trust purely empirical analysis and the extent to which their analysis allows for human operators to contribute to the training and analytics.”[6]  

Once you’ve set your goals, you can begin to identify the analytical capabilities needed to achieve them. The report clearly explains many different analytical techniques. Let’s take a closer look at a few of them and what they can bring to your organization as you adopt AIOps:

  • Probable root cause association. This is what many people are looking for when selecting a platform, because it has direct customer impact. As soon as you have an outage or a production issue, you need to be on top of it and resolve it as fast as you can.

  • Change as root cause. Based on input from our own customers and other research, we’ve learned that at least 70% of failures are caused by changes. Capturing changes and making that change data part of your issue detection data, stored in a meaningful way, is therefore essential.

  • Event correlation. Event correlation is vital for reducing toil. In a typical IT organization, many people get distracted by false alarms or by too many confusing, duplicate or meaningless alarms. Correlating these alarms to bring the real issue(s) to the surface is a critical process, and an analytical technique that correlates events in your platform can help you do so.

  • Anomaly detection. Anomalies are often early warning signals that something might go wrong, so it is important for teams to stay on top of them. To dive further into this topic, read “StackState Autonomous Anomaly Detection: AIOps for Real.”
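To show the basic idea behind event correlation, here is a deliberately simple Python sketch: alerts that arrive within a time window of each other are grouped into one incident, and repeated alerts for the same component and check are collapsed. The Alert type, the 300-second default window and the example alerts are hypothetical; production platforms may also use richer signals, such as topology, rather than timing alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    timestamp: int   # seconds since epoch
    component: str
    check: str       # name of the check or rule that fired


def correlate(alerts, window_seconds=300):
    """Group alerts into incidents.

    An alert joins the current incident if it arrives within `window_seconds`
    of the previous alert; otherwise it starts a new incident. Duplicate
    (component, check) pairs within an incident are collapsed.
    """
    incidents = []
    last_ts = None
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if last_ts is None or alert.timestamp - last_ts > window_seconds:
            incidents.append({})  # start a new incident
        # A dict keyed by (component, check) keeps only the first occurrence.
        incidents[-1].setdefault((alert.component, alert.check), alert)
        last_ts = alert.timestamp
    return [list(incident.values()) for incident in incidents]


alerts = [
    Alert(0, "orders-db", "disk_full"),
    Alert(30, "orders-db", "disk_full"),          # duplicate of the first alert
    Alert(60, "checkout-svc", "high_latency"),
    Alert(90, "web-frontend", "error_rate"),
    Alert(4000, "batch-job", "missed_schedule"),  # far apart: separate incident
]
for i, incident in enumerate(correlate(alerts), start=1):
    print(i, [f"{a.component}:{a.check}" for a in incident])
```

Here five raw alerts become two incidents with four distinct alerts, which is the kind of noise reduction the bullet above describes, albeit in a very reduced form.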

Further reading

The Gartner Solution Criteria for AIOps Platforms research report contains many more selection criteria and required capabilities in each of the criteria categories, from data ingestion and integration to platform, deployment and more.    

Attribution and disclaimers  

[1], [2], [3], [4], [5], [6] Gartner, “Solution Criteria for AIOps Platforms,” Greg Murray, Published 27 May 2021.  

Gartner is a registered trademark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.  

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.