The Business Case for Observability With Context

Guest Blogger: Charles Araujo

My team was not happy with me.

I had just convened a meeting of my direct reports — and the managers that reported to them — to deliver the news personally. “No more new tools,” I told them. “We have everything we need to do our job. Our environment just doesn’t change that fast. So stop bringing me requests for new tools. The answer is no.”

It was unquestionably the right call at the time, but today it would be laughable. Almost twenty-five years ago, my statement that our environment was relatively static and that we had everything we needed was true, even though we were one of the most progressive and technology-forward healthcare organizations in the country.

Today I could never imagine making such a statement. The technology is simply changing too fast and the stakes are rising too high. There is no such thing as a static environment any longer.

As a business or IT leader, you now need to always be on the lookout for something that can give you a leg up — an advantage in a rapidly changing world. And when it comes to managing your technology stack, you can find that advantage in an old adage: the best offense is a good defense.

A world of continuous change

[Image: dynamic topology]

The healthcare organization I worked for twenty-five years ago was at the forefront of both the delivery of care and the use of technology to do so. Despite being an industry leader, however, our technology stack changed slowly.

Today, the opposite is true. Even industry laggards are coping with an unrelenting pace of change.

The cloud, agile development methodologies, continuous deployments, containers, microservices, and a host of other advancements have resulted in a technology stack that is both continually changing and, increasingly, ephemeral.

Enterprise leaders have done their best to keep up, adding new tools to match whatever new technology has hit the market. But it appears to be a losing battle.

A recent PwC study entitled Digital Evolution: What Corporate Directors and Executives Think suggests that these efforts have not been enough to help enterprises keep pace.

In this study, only 35% of corporate directors stated that they believe their organizations encourage rapid enough innovation. And only 29% believe they are effectively integrating new technologies into their business processes.

If you need the memo, here it is: don't expect the rate of change to let up anytime soon. Of course, the challenge is that most of the approaches and tools that IT organizations use to monitor and manage these environments, even newer ones, can trace their monitoring-centric heritage back to the days when technology stacks were much more static.

The combination of exponentially increasing change that shows no sign of abating and a technology monitoring and management apparatus built for another time now threatens to exact a high cost from organizations that don't find a way to adapt.

The cost of continuous change

The challenge for many IT leaders is that the costs associated with the complexity and rate of change inherent in today's technology stack, while significant, can be difficult to quantify.

Moreover, the nature of those costs has become more complex to calculate. To begin with, there is the traditional cost of system downtime, which often includes both the cost of lost productivity and the resources consumed in restoring service. As technology has become ever more entwined with every facet of business operations, this direct cost of downtime has increased exponentially.

The 2020 ITIC study on hourly downtime estimated that the average cost of downtime for a large enterprise now tops $5 million per hour. And expectations are rising, with 87% of organizations now requiring a minimum of 99.99% availability.

But while the internal costs associated with performance impacts are rising, they only tell part of the story — and, arguably, what is now the smaller part of it.

As the application of technology broke out of the back office and became a driver of the customer experience, the cost and impact of a system failure have taken on a whole new dimension.

Another recent PwC study, entitled Experience is Everything, had some shocking findings about the impact of customer experience on an organization's customer relationships and financial performance.

The study found that, even when dealing with a brand they love, 32% of customers would stop doing business with it after a single bad experience. Let it happen a few times and a whopping 59% would leave.

On the flip side, however, 50% of these same customers will pay for more efficient services, and 35% will pay more for something delivered using up-to-date technology. In addition, things like easier payments, convenience, easier mobile experiences, personalization, and automation all drove price premiums, and all of them are technology-enabled.

The intersection becomes plain to see.

The modern enterprise must now grapple with a technology stack that is in a continuous state of change. Yet this highly complex environment now powers almost everything — both internally and externally — and presents dire consequences with every minute it fails to perform.

Therefore, the pressure on the IT operations team comes down to two things that are, essentially, two sides of the same coin: how to find the cause of any disruption, and how to restore service as quickly as possible once they do.

The Intellyx take: The justification for a new approach

Rapidly determining causality and finding a pathway to resolving any performance impact in the highly complex, rapidly changing technology environment of the modern enterprise is a tall order — and one that demands new approaches built for this reality.

The industry has recently begun to recognize that traditional monitoring approaches in which organizations watched for specific, pre-defined exceptions are no longer sufficient — organizations just cannot adjust monitoring profiles fast enough.

This fact has led to the development of a new approach that relies on so-called telemetry: the logs and metrics that elements of the technology stack generate during normal operation. Because this telemetry allows organizations to observe the operational state of the stack continuously, the industry has called the approach observability.
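
To make that concrete, here is a minimal, hypothetical sketch of what such telemetry can look like in practice. It assumes a small Python service and uses only the standard library; the service name, fields, and metric are illustrative, not any particular vendor's API.

```python
import json
import logging
import time

# Structured logs are emitted continuously during normal operation,
# not only when a pre-defined exception fires.
logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("checkout-service")  # hypothetical service name


def handle_request(order_id: str) -> None:
    start = time.monotonic()
    # ... normal request handling would happen here ...
    duration_ms = (time.monotonic() - start) * 1000

    # Metrics: numeric measurements of routine behavior (latency, counts).
    # Logs: structured records of what the service actually did.
    log.info(json.dumps({
        "event": "request_handled",
        "service": "checkout-service",
        "order_id": order_id,
        "duration_ms": round(duration_ms, 2),
        "status": "ok",
    }))


handle_request("order-123")
```

An observability platform ingests streams like this from every element of the stack and infers its operational state from them, rather than waiting for a specific, pre-configured alert to trip.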

As my Intellyx colleague, Jason Bloomberg, explained in a recent blog, observability by itself is not enough.

The reason is that solving the challenges of complexity and ephemerality requires something more: context.

[Image: context]

This context helps IT leaders understand the impact of a systemic failure in relation to other elements of the stack and the business outcomes it supports. This business-oriented context is particularly relevant when it comes to coping with the experiential costs of performance issues and system failures — and it is why enterprise leaders are turning to providers, such as StackState, that can provide it.
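
To give a rough sense of what that context adds, the sketch below assumes a hypothetical dependency map that ties components of the stack to the business capabilities they support; given a failing component, walking that map shows which business outcomes are at risk. The component names and data structures are illustrative only, not StackState's actual model.

```python
from collections import deque

# Hypothetical topology: which components depend on which,
# and which business capabilities each component supports.
DEPENDS_ON = {
    "web-storefront": ["checkout-service", "search-service"],
    "checkout-service": ["payments-db"],
    "search-service": ["search-index"],
}
BUSINESS_CAPABILITY = {
    "web-storefront": "customer shopping experience",
    "checkout-service": "order revenue",
}


def impacted_by(failed: str) -> set[str]:
    """Walk the dependency graph upward from a failed component and
    collect every business capability that could be affected."""
    # Invert the edges: who depends directly on each component?
    dependents: dict[str, list[str]] = {}
    for component, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(component)

    impacted, queue, seen = set(), deque([failed]), {failed}
    while queue:
        component = queue.popleft()
        if component in BUSINESS_CAPABILITY:
            impacted.add(BUSINESS_CAPABILITY[component])
        for parent in dependents.get(component, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return impacted


# A raw alert says "payments-db is down"; context says what that means
# for the business: {'order revenue', 'customer shopping experience'}
print(impacted_by("payments-db"))
```

The point is not the code itself but the translation it performs: a component-level failure becomes a statement about which business outcomes are now at risk, which is exactly the information a leader needs to prioritize a response.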

All of which brings me back to the me of twenty-five years ago. As I imagine an alternate reality in which I had stayed in IT, I would most certainly be singing a different tune today. In fact, I'd be scouring the Internet for new solutions that would help me tame the complexity of my stack and ensure that I could deliver the customer experience that my organization demanded. I have no doubt that solutions that offered me observability with context would be on my shortlist — and I wouldn't hesitate to add them to my arsenal of tools.

Copyright © Intellyx LLC. StackState is an Intellyx client. None of the other companies mentioned in this article are Intellyx clients. Intellyx retains full editorial control over the content of this paper.