StackState Observability Platform v5.1: Context Is King

Andreas Prins, CEO

StackState v5.1 contains the following improvements:

  • Extensive improvements in the right panel, yielding more contextual insights 

  • Increased accuracy of probable root cause detection

  • Improved detection of daily and weekly patterns (seasonality), which improves the accuracy of our AIOps-based anomaly detection

Our mission is to provide you with an observability platform that makes zero downtime a reality, and every addition in this release contributes to that mission. If you’re part of a DevOps team troubleshooting issues, you’ll love the additional insights in the right panel and the more accurate identification of probable root causes. If you’re on an operations team that uses the StackState 4T® Monitors to gain more control, you’ll benefit from the richer monitor health information. And if you strive to prevent problems, you’ll certainly benefit from our AIOps-based seasonal pattern detection.

“Context is king” in observability means going from insights to conclusions, from a high-level overview to a deep understanding and from an alert to a resolution.

StackState Observability Platform v5.1 enhancements: the right panel

The right panel received many enhancements in this release and now gives you an even richer experience:

  • A relation is a visual representation of a dependency between two components in your stack. As it does for components, StackState shows crucial information on relations, such as trace data, network connection data and golden signals. With v5.1, related problems are now shown for each component and relation. When troubleshooting, this helps you see whether your component is involved in other issues besides the one you’re currently investigating.

  • Events are now also visible in the details of components and relations. Previously, events were shown only in the View Summary tab and in the Event perspective. But when you’re troubleshooting, you need the context of all the events related to what you’re investigating. Event information on components and relations shows you what changes have been made and when your component has failed before.

  • Relations are now shown in the component details panel, so you don’t need to go to the topology view to find them. This is important because you can retrieve the telemetry of a relation directly and see which other components it connects to.

  • To help you configure your monitors effectively, the health section in the component and relation details panels now shows warnings when something is wrong with a monitor (for example, when no data is flowing in).

  • The health section has been restyled to ensure the information you need to understand the situation is clearly visible.

  • To help you find the right information easily, the details tab in the right panel now has a dynamic name: it shows the name of the element you’re looking at, whether that is a component, relation, event or problem.

Increased accuracy of probable root cause detection

Although our mission is to support zero-downtime enterprises, the reality is that incidents and outages still happen. StackState spots problems, tracks the changes that matter in relation to each problem and identifies probable root causes for the user. Our goal is to show you as few probable causes as possible, and to make the ones we do show as accurate as possible. That focus helps you find and fix issues faster, reduce MTTR and minimize customer impact.

With our v5.1 release, we’ve improved the accuracy of our probable root cause detection by extending our algorithm to take external events into account as we calculate the probable root cause.

In the UI, you’ll now notice that probable causes are clustered by similar events, which makes it easier to browse through the various events.

Improved seasonality detection enhances AIOps-driven anomaly detection

An anomaly is a deviation from normal behavior in a metrics stream. This can be a sudden shift in a baseline level or an unexpected spike. Anomalies are often caused by events outside the component, such as a tremendous increase in site traffic or a failing component elsewhere.

In many cases, systems behave differently at different hours of the day or on different days of the week. For example, there may always be a peak on Friday afternoon because many sales happen at that time. That peak should not be flagged as an anomaly, since it recurs every week. However, if the peak is absent, its absence should be flagged as an anomaly.
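To make the Friday example concrete, here is a minimal sketch of one common way to account for weekly seasonality. It is not StackState’s algorithm, which this post does not describe. It assumes hourly samples and builds a separate baseline (mean and standard deviation) for each (weekday, hour) slot, so a Friday 15:00 value is compared only with earlier Friday 15:00 values. The function names `build_baseline` and `is_anomaly` and the parameters `k` (a three-sigma threshold) and `min_std` are hypothetical choices made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev
from typing import Dict, Iterable, Tuple

Slot = Tuple[int, int]  # (weekday 0=Monday..6=Sunday, hour 0..23)


def seasonal_key(ts: datetime) -> Slot:
    # Hypothetical bucketing: one baseline per hour of the week.
    return (ts.weekday(), ts.hour)


def build_baseline(history: Iterable[Tuple[datetime, float]]) -> Dict[Slot, Tuple[float, float]]:
    """Group historical samples by weekly slot and return (mean, std dev) per slot."""
    buckets = defaultdict(list)
    for ts, value in history:
        buckets[seasonal_key(ts)].append(value)
    return {slot: (mean(vals), pstdev(vals)) for slot, vals in buckets.items()}


def is_anomaly(baseline, ts: datetime, value: float, k: float = 3.0, min_std: float = 1.0) -> bool:
    """Flag values more than k standard deviations from the seasonal mean for this slot."""
    slot = seasonal_key(ts)
    if slot not in baseline:
        return False  # no history for this slot yet, so nothing to compare against
    mu, sigma = baseline[slot]
    # min_std keeps a perfectly flat history from turning tiny wobbles into anomalies.
    return abs(value - mu) > k * max(sigma, min_std)


# Synthetic demo data (assumed): four weeks of hourly values, normally ~100,
# with a recurring Friday 15:00 peak of ~1000. START is a Monday.
START = datetime(2022, 10, 3)
history = []
for hour_offset in range(4 * 7 * 24):
    ts = START + timedelta(hours=hour_offset)
    week = hour_offset // (7 * 24)
    is_friday_peak = ts.weekday() == 4 and ts.hour == 15
    history.append((ts, (1000 if is_friday_peak else 100) + 10 * week))

baseline = build_baseline(history)
next_friday_3pm = START + timedelta(weeks=4, days=4, hours=15)
next_tuesday_3pm = START + timedelta(weeks=4, days=1, hours=15)

print(is_anomaly(baseline, next_friday_3pm, 1012))   # False: the usual weekly peak
print(is_anomaly(baseline, next_friday_3pm, 100))    # True: the expected peak is missing
print(is_anomaly(baseline, next_tuesday_3pm, 1000))  # True: a peak on the wrong day
```

Because the baseline is keyed by hour of the week, the same high value is normal on Friday afternoon but anomalous on Tuesday afternoon, and a missing Friday peak is caught as a deviation in the other direction. A production system would likely also handle daily patterns, trends and gaps in the data, which this sketch leaves out.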

Examples of seasonality patterns

It is important to detect that certain daily or weekly patterns are normal, so that these patterns are not reported as anomalies.

Examples of daily patterns are:

  • A traffic increase early in the morning when all workers log in to a particular system

  • At 9:00, the beginning of the trading day, the systems involved come under tremendous load

  • At the end of the day, systems run numerous closing reports that always start at a scheduled time

Examples of weekly patterns are:

  • A customer service group that is only open on working days, causing a peak of calls to the system on Monday morning

  • Systems that facilitate online betting on sporting events held on Saturdays, such as football matches

With the new support for seasonality, our anomaly detection is more accurate and better at pinpointing failures. The result is less noise and fewer distractions.

Context is 🤴

StackState v5.1 brings a ton of new observability capabilities for everyone involved. All new features are centered around bringing accurate information to the user at the time and place it is needed the most. 
