Observability Unpacked: 5 Takeaways From KubeCon + CloudNativeCon 2024

Andreas Prins, CEO

StackState had a blast at this year's KubeCon + CloudNativeCon gathering in Paris! The discussions were in-depth, covered a wide array of topics, and lasted much longer than in the past. This year, attendees seemed to have a considerably deeper understanding of the cloud-native ecosystem, probably a result of its rapid growth.

We also noticed a pretty dramatic evolutionary shift in the vendors at the expo hall, who were showcasing some truly progressive specialized solutions. Undoubtedly, these advanced offerings will continue to inspire attendees and their teams back home to rethink and refine their strategies for leveraging cloud-native technologies.

5 key takeaways from KubeCon + CloudNativeCon 2024

With that in mind, we’d like to share our 5 key takeaways — from an observability perspective, of course — so you can consider how to further improve your observability strategy.

1. Breakthrough of standards: OTEL and eBPF

This year could easily be dubbed “The Year of Observability Standardization.” I've never seen an industry adopt standards this rapidly across user communities, commercial software vendors, and enterprises of all sizes.

There is clearly general agreement that these standards will drive greater adoption of observability, clarify the scope of the benefits it provides, and continue to take observability to the next level.

Two standards stand out for their impact and are crucial in improving observability and enhancing our ability to understand and optimize complex systems. They are:

  • OpenTelemetry, also known as OTEL, is a framework that simplifies the collection and transport of telemetry data (metrics, traces, logs) in distributed systems, providing comprehensive observability and a unified view of system performance and behavior. By adopting OpenTelemetry, organizations can establish a standardized approach to telemetry data management, facilitating better observability across their services.

  • eBPF, aka Extended Berkeley Packet Filter, enables kernel-level visibility without the need to instrument code. By facilitating dynamic tracing and performance analysis and providing deep insights into system operations and network traffic, eBPF is revolutionizing system observation and monitoring. 
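To make the idea of a trace more concrete, here is a minimal sketch of how spans in a single request share a trace ID and point to their parent. It is a toy model built only on the Python standard library; `ToyTracer` and `Span` are hypothetical names for illustration and are not the OpenTelemetry API or SDK, whose spans carry many more fields.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Span:
    """A toy span record (hypothetical; real spans carry more fields)."""
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None


class ToyTracer:
    """Hypothetical in-memory tracer, not the OpenTelemetry SDK."""

    def __init__(self) -> None:
        self.finished: List[Span] = []   # spans in the order they ended
        self._stack: List[Span] = []     # currently open spans

    @contextmanager
    def span(self, name: str) -> Iterator[Span]:
        parent = self._stack[-1] if self._stack else None
        current = Span(
            name=name,
            # Child spans reuse the parent's trace ID; a root span gets a new one.
            trace_id=parent.trace_id if parent else uuid.uuid4().hex,
            span_id=uuid.uuid4().hex[:16],
            parent_id=parent.span_id if parent else None,
        )
        self._stack.append(current)
        try:
            yield current
        finally:
            current.end = time.monotonic()
            self._stack.pop()
            self.finished.append(current)


tracer = ToyTracer()
with tracer.span("checkout") as root:
    with tracer.span("charge-card") as child:
        pass  # the instrumented work would happen here
```

The shared trace ID is what lets a backend stitch spans from different services into one request timeline, which is the "unified view" described above.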

StackState’s Takeaway: Embracing these two standards, StackState fully supports OpenTelemetry and harnesses eBPF for advanced observability. Over time, StackState has utilized eBPF to construct a comprehensive dependency map, capturing golden signals and offering profound insights into system health and performance without requiring changes to the application. With complete OTEL support, StackState enhances troubleshooting by pinpointing individual traces causing performance issues across diverse environments.

2. Broadening of the space towards security

With the evolving technology landscape, security is increasingly becoming intertwined with observability, reflecting a trend where it's seen not as a separate discipline but as an integral part of the IT ecosystem. 

Leading solutions like Upwind.io and wiz.io are at the forefront of bringing security to the next level, offering advanced security capabilities that seamlessly integrate with observability platforms. This integration provides comprehensive insights into both system performance and security posture, addressing the growing need for holistic monitoring and protection in modern IT environments.

Additionally, the emergence of open-source solutions like Falco and NeuVector by SUSE underscores the community's commitment to developing robust security tools that seamlessly integrate into observability frameworks. This convergence of security and observability represents a significant shift towards a more holistic approach to system management, where understanding and mitigating security risks becomes a part of the daily workflow, reflecting a heightened emphasis on proactive security measures within operational practices.

StackState's Takeaway: At StackState, we understand the crucial significance of integrating these security tools into our observability solution. By incorporating solutions such as Falco and Prisma Cloud, we empower development teams to not only monitor their systems more effectively but also seamlessly weave security considerations into their observability practices. This holistic approach ensures that both system performance and security are prioritized and managed cohesively within our platform.

Integrating these tools is fundamental to delivering a better developer experience, where teams can focus on innovation because problem-solving becomes much easier. Detecting and resolving issues is no longer a significant task of its own but a built-in part of your broader tool suite.

3. Humanization of the interaction through AI-powered conversations

The rise of AI-driven chat integrations marks a major transition in our approach to incident management and resolution. No longer are alerts mere notifications signaling an issue; they now serve as the inception point for structured, AI-assisted dialogues that go deep into problem-solving.

This shift goes beyond traditional alert routing, ushering in a new era of solutions focused on speeding up resolution processes. The introduction of generative AI mechanisms transforms alerts into dynamic conversations, often in Slack, promoting a more nuanced and effective approach to incident management.

The following two examples are innovations that not only promise to accelerate resolution times but also foster a more engaged and collaborative problem-solving environment. 

  • Rootly distinguishes itself with its AI-powered on-call and incident management solution, seamlessly integrated with Slack. This platform streamlines the response process, from the initial alert to retrospective analysis, while also diminishing repeat incidents and reducing the average time to resolve them.

  • Incident.io provides a platform that reimagines incident management by integrating on-call schedules, incident response workflows, and status pages into a unified system. Designed with automation, consistency, and actionable insights in mind, it enables teams to respond more efficiently while keeping stakeholders informed. Its seamless integration with Slack and other communication tools further streamlines the incident management process.

StackState's Takeaway: Given the significance of these trends, we provide a rich webhook that seamlessly integrates with systems like these, merging StackState's powerful observability with effortless communication.
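As a rough illustration of what such an integration involves, the sketch below turns an alert payload into a chat message body. The alert field names (`component`, `state`, `summary`, `link`) and the example URL are assumptions made up for this example and do not describe StackState's actual webhook schema. The only external fact relied on is that Slack incoming webhooks accept a JSON body with a `text` field.

```python
import json
from typing import Any, Dict


def alert_to_chat_message(alert: Dict[str, Any]) -> Dict[str, str]:
    """Turn a (hypothetical) alert payload into a chat message body."""
    component = alert.get("component", "unknown component")
    state = alert.get("state", "UNKNOWN")
    summary = alert.get("summary", "no summary provided")
    lines = [f"[{state}] {component}: {summary}"]
    link = alert.get("link")
    if link:
        # Link back to the observability tool so the conversation has context.
        lines.append(f"Details: {link}")
    return {"text": "\n".join(lines)}


# Hypothetical payload, for illustration only.
example_alert = {
    "component": "payments-api",
    "state": "CRITICAL",
    "summary": "p95 latency above threshold",
    "link": "https://observability.example.com/alerts/123",
}
message = alert_to_chat_message(example_alert)
body = json.dumps(message)  # this JSON string would be POSTed to the chat webhook URL
```

Keeping the translation step as a pure function makes it easy to test without sending anything to a real chat workspace.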

4. Early steps of AI in observability: context is lacking

From our discussions with KubeCon attendees, it's evident that Kubernetes has evolved beyond being just the operating system of the cloud; it now serves as an ideal platform for AI workloads as well. We highly recommend watching a few presentations that explore this subject to gain deeper insights.

This year, there was a noticeable decrease in the presence of AI in the expo hall. This could be attributed to the initial excitement settling down or perhaps because brands are still in search of compelling use cases.

Some observability vendors have recognized this opportunity and have attempted to offer troubleshooting solutions in this area. However, based on our observations from several demos, the answers provided thus far offer minimal value. This is likely because the larger context is missing. That context includes how components depend on each other, when changes occurred, how metrics, logs, and events evolve over time, and what all of this indicates about a resource.
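To show why this context matters, here is a small sketch of how dependency and change data can narrow down where to look first. It is a simplified heuristic under stated assumptions, not how any particular product or AI model works: the function names, the example topology, and the 15-minute change window are all hypothetical.

```python
from typing import Dict, List, Optional, Set


def root_cause_candidates(
    depends_on: Dict[str, List[str]],
    unhealthy: Set[str],
) -> List[str]:
    """Return unhealthy components with no unhealthy direct dependency."""
    candidates = []
    for component in sorted(unhealthy):
        deps = depends_on.get(component, [])
        if not any(dep in unhealthy for dep in deps):
            candidates.append(component)
    return candidates


def rank_by_recent_change(
    candidates: List[str],
    last_change: Dict[str, float],
    alert_time: float,
    window_seconds: float = 900.0,  # assumed 15-minute window
) -> List[str]:
    """Put candidates that changed shortly before the alert first."""
    def changed_recently(component: str) -> bool:
        changed_at: Optional[float] = last_change.get(component)
        return changed_at is not None and 0 <= alert_time - changed_at <= window_seconds

    return sorted(candidates, key=lambda c: (not changed_recently(c), c))


# Hypothetical topology: each component maps to what it depends on.
depends_on = {
    "frontend": ["checkout"],
    "checkout": ["payments", "inventory"],
    "payments": ["db"],
    "inventory": ["cache"],
}
unhealthy = {"frontend", "checkout", "payments", "db", "inventory"}
last_change = {"inventory": 1000.0, "db": 100.0}  # timestamps in seconds

candidates = root_cause_candidates(depends_on, unhealthy)
ranked = rank_by_recent_change(candidates, last_change, alert_time=1300.0)
```

Of five unhealthy components, the dependency data leaves two candidates, and the change timestamps put the recently modified one first. Without that topology and change history, an assistant only sees five alerts.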

This is a trend worth monitoring, as maturing use cases and the development of dedicated AI models could make this an applicable approach to your data set.

StackState's Takeaway: StackState has begun initial experiments with generative AI to leverage the data within StackState and to assist in Resolution Reasoning. The aim here is to arrive at conclusions about issues more quickly and accurately. If you're interested in seeing this in action through a live demo, please reach out to us.

Additionally, our updated remediation guides are designed to help users identify the root causes of issues more efficiently and accurately. Experience this for yourself in our playground.

5. Multi-cloud observability as a result of rise in cloud adoption

The Kubernetes paradigm has undergone remarkable evolution, transitioning from the simplicity of single-cluster applications to the complexity of multi-cluster architectures spanning various clouds. Initially, managing these distributed clusters posed a significant challenge. However, advancements in tools like SUSE Rancher, Red Hat OpenShift, and Amazon EKS, along with automation technologies such as Argo CD and Fleet, have streamlined deployments across diverse environments, significantly mitigating the complexity once associated with multi-cluster management.

Yet, the question of how to effectively observe and monitor these distributed environments remains. Observability across multiple clouds and clusters is not just a technical necessity but a strategic asset that ensures resilience, scalability, and flexibility. And as enterprises adopt multi-cluster strategies, the demand for comprehensive observability solutions that can navigate this complexity becomes even more important. 

StackState's Takeaway: StackState leads the way in enabling multi-cluster observability, offering an integrated suite of tools tailored to meet the nuanced demands of these complex environments. Our approach is holistic, addressing the core challenges identified throughout this series:

  • Extensive Topology Mapping and End-to-End Chain Visualization offer unparalleled visibility into system-wide interactions, crucial for understanding and managing the extensive web of dependencies present in multi-cluster setups.

  • The Metrics Store centralizes data, simplifying monitoring and analysis across disparate environments, while our Set of Monitors ensures ongoing system health.

  • Remediation Guides and Alerting Possibilities enhance responsiveness and resolution capabilities, ensuring teams can quickly address issues.

  • Extensive Filtering and Fine-Grained RBAC underscore our commitment to providing tailored access and insights, catering to the specific needs of various teams while maintaining security and compliance standards.

Ready to redefine observability in your Kubernetes environments?

KubeCon + CloudNativeCon 2024 has undeniably set a new benchmark for the cloud-native ecosystem, highlighting the important role of observability in today's complex, multi-cluster, and multi-cloud environments.

StackState has been at the heart of these discussions, demonstrating our commitment to advancing observability standards, integrating security within our framework, humanizing AI-powered interactions, and leading the charge toward effective multi-cloud observability. 

Our solutions are designed not just to keep pace with the evolving technology landscape but to set you ahead of the curve, ensuring your systems are resilient, scalable, and effortlessly manageable, no matter how complex.

See it in action!

Get ready to revolutionize observability within your Kubernetes environments! Visit our Kubernetes Observability Page to discover how.