Automated Root Cause Analysis & Anomaly Detection in Concert
Artem Grotov· 3 min read
Every day, IT operators work to prevent outages of business-critical applications. When prevention is not possible, they strive to keep the mean time to repair (MTTR) as low as possible. Improving resolution time can be quite a challenge, but IT operators do not have to face it alone: they can use solutions that support Automated Root Cause Analysis and Anomaly Detection. This blog post explains how the two can work in concert to reduce MTTR.
Automated Root Cause Analysis & Anomaly Detection
Automated Root Cause Analysis helps IT operators reduce the mean time to repair. It lets you see the root cause of a problem immediately and focus on solving it. To work properly, though, it needs some form of Artificial Intelligence (AI) behind it that can figure out what is wrong with the system. In other words: how does Automated Root Cause Analysis know what the root cause of a problem is?
To find out what caused a problem, Automated Root Cause Analysis starts at the component where the problem is currently visible and then uses the IT environment's topology to trace the problem back to its origin. It walks through the topological graph and examines each component to determine whether it is misbehaving.
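The walk described above can be sketched roughly as follows. This is a minimal illustration, not StackState's implementation: the `Topology` mapping, the function `find_root_cause_candidates`, and the `is_misbehaving` callback are all hypothetical names, and the sketch assumes that a root cause candidate is a misbehaving component with no misbehaving dependencies of its own.

```python
from collections import deque
from typing import Callable, Dict, List, Set

# Hypothetical topology: each component maps to the components it depends on.
Topology = Dict[str, List[str]]


def find_root_cause_candidates(
    topology: Topology,
    start: str,
    is_misbehaving: Callable[[str], bool],
) -> List[str]:
    """Walk the dependency graph from `start` and return the misbehaving
    components that have no misbehaving dependency of their own."""
    candidates: List[str] = []
    visited: Set[str] = {start}
    queue = deque([start])
    while queue:
        component = queue.popleft()
        if not is_misbehaving(component):
            continue  # healthy component: nothing to trace from here
        bad_deps = [d for d in topology.get(component, []) if is_misbehaving(d)]
        if not bad_deps:
            # Misbehaving, and nothing it depends on is: a likely origin.
            candidates.append(component)
        for dep in bad_deps:
            if dep not in visited:
                visited.add(dep)
                queue.append(dep)
    return candidates


# Illustrative data only: "web" depends on "api", which depends on "db" and "cache".
topology = {"web": ["api"], "api": ["db", "cache"], "db": [], "cache": []}
unhealthy = {"web", "api", "db"}
print(find_root_cause_candidates(topology, "web", lambda c: c in unhealthy))  # ['db']
```

The `is_misbehaving` callback is deliberately left open: how a component is judged healthy or not is a separate concern from how the graph is walked.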
Deciding whether a component is misbehaving is where Anomaly Detection comes in. Anomaly Detection compares a component's current behavior with what it considers normal, and if it sees a significant deviation, it flags the component as a potential root cause. In short, Automated Root Cause Analysis uses Anomaly Detection to decide whether a component can be the root cause of a problem. That is why a solution that aims to prevent outages and reduce MTTR should always have Automated Root Cause Analysis and Anomaly Detection working in concert, rather than only one or the other.
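One simple way to compare current behavior with a baseline is a z-score check on a metric's recent history. The sketch below is only an illustration of that idea: the post does not say which detection algorithms StackState uses, and the function name `is_anomalous` and the default threshold of 3 standard deviations are assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(history: Sequence[float], current: float, threshold: float = 3.0) -> bool:
    """Flag `current` if it deviates from the historical mean by more than
    `threshold` sample standard deviations (a plain z-score test)."""
    if len(history) < 2:
        return False  # not enough data to define "normal"
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is a deviation.
        return current != baseline
    return abs(current - baseline) / spread > threshold


# Illustrative response times in milliseconds.
recent = [100, 102, 98, 101, 99]
print(is_anomalous(recent, 130))  # True
print(is_anomalous(recent, 101))  # False
```

A check like this could serve as the per-component test that root cause analysis calls while it walks the topology.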
StackState's Synergy Effect
How well Automated Root Cause Analysis and Anomaly Detection solve problems can vary a lot. Each has its own level of quality, and the relationship between them affects that quality too. Because the quality of Anomaly Detection determines the quality of Automated Root Cause Analysis, and with it the MTTR, this synergy effect is essential.
How can you get the most out of it?
At StackState, we use Automated Root Cause Analysis to tune our Anomaly Detection. StackState applies Automated Machine Learning to select the Anomaly Detection algorithms that correctly point out the root cause of problems that occurred in the past. In other words, the 'machine' is 'learning' automatically from the past, so it becomes better each time it identifies and solves a new issue. Using StackState's 4T data model, incidents are linked to problems and automatically improve the quality of Anomaly Detection, which results in better Automated Root Cause Analysis and a lower MTTR. Keep an eye on our blog to learn more about this cool feature!
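To make the idea of learning from past incidents concrete, here is a deliberately simplified sketch of selecting a detector by how well it would have isolated known root causes. It is not StackState's Automated Machine Learning pipeline, whose details the post does not describe. `PastIncident`, `zscore_detector`, `score`, and `select_detector` are hypothetical names, and the scoring rule (a hit only when the detector flags the known root cause and nothing else) is an assumption made for illustration.

```python
from statistics import mean, stdev
from typing import Callable, Dict, List, NamedTuple, Sequence

Detector = Callable[[Sequence[float], float], bool]


class PastIncident(NamedTuple):
    # Component name -> metric series; the last value is the reading at incident time.
    metrics: Dict[str, Sequence[float]]
    root_cause: str  # the component later confirmed as the root cause


def zscore_detector(threshold: float) -> Detector:
    """Build a z-score detector with the given threshold (hypothetical candidate)."""
    def detect(history: Sequence[float], current: float) -> bool:
        if len(history) < 2:
            return False
        spread = stdev(history)
        if spread == 0:
            return current != mean(history)
        return abs(current - mean(history)) / spread > threshold
    return detect


def score(detector: Detector, incidents: List[PastIncident]) -> float:
    """Fraction of incidents where the detector flags exactly the known root cause."""
    if not incidents:
        return 0.0
    hits = 0
    for incident in incidents:
        flagged = {
            name
            for name, series in incident.metrics.items()
            if detector(series[:-1], series[-1])
        }
        if flagged == {incident.root_cause}:
            hits += 1
    return hits / len(incidents)


def select_detector(candidates: Dict[str, Detector], incidents: List[PastIncident]) -> str:
    """Return the name of the candidate with the best score on past incidents."""
    return max(candidates, key=lambda name: score(candidates[name], incidents))


# Illustrative data: "db" was the real root cause; "api" only wobbled a little.
incidents = [
    PastIncident(
        metrics={"db": [10, 11, 9, 10, 10, 30], "api": [50, 52, 48, 50, 50, 56]},
        root_cause="db",
    )
]
candidates = {"z>2": zscore_detector(2.0), "z>5": zscore_detector(5.0)}
print(select_detector(candidates, incidents))  # z>5
```

In this toy data the looser detector also flags `api`, so it fails to isolate the root cause, while the stricter one flags only `db` and is selected.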
NOTE: Automated Root Cause Analysis using Anomaly Detection will be available in StackState v4.2.
StackState's Topology and Relationship-Based Observability platform lets you manage your dynamic IT environment more effectively by unifying performance data from your existing monitoring tools into a single topology. This enables you to:
Decrease MTTR: Reduce MTTR by 80% by identifying the root cause and alerting the right teams with the right information.
Fewer Outages: Reduce the number of outages by 65% through real-time unified observability and better planning.
Faster Releases: Increase application releases by 3X by giving time back to developers.