Discover the power of StackState, designed to provide unparalleled Kubernetes observability and troubleshooting for software engineers.


StackState at a Glance


Dependency Maps

StackState's auto-discovery feature dynamically identifies Kubernetes services and resources, then visualizes their dependencies to provide a holistic view of your entire cluster. You can understand complex relationships, track resource changes and monitor service performance, even for services you don't directly manage. This comprehensive view facilitates faster issue identification and resolution. StackState creates these maps for pods, nodes and any other resource type in your cluster.


Remediation Guides

StackState provides guided remediation help to accelerate issue resolution, leveraging advanced algorithms to provide troubleshooting hints and visual assistance. Our remediation guides can be extended or modified to make them even more specific to your environment. They provide an excellent way for Site Reliability Engineers to support their development teams and help maintain high-quality services.

Dynamic Out-of-the-Box Dashboards

Our dashboards aggregate and correlate all relevant metrics, logs, traces and events, providing a unified view of your Kubernetes environment. Troubleshooting a pod or other resource? All the critical data for this resource is brought together in a single screen. This combined data eliminates the need for context switching between multiple tools, letting you focus on more critical tasks.


Slack alert


StackState seamlessly integrates its alerts with popular communication tools such as Slack, Microsoft Teams, PagerDuty and OpsGenie, so your team stays updated on critical issues in the channels you already use daily. By streamlining alerts and notifications, StackState enables swift response and efficient collaboration, enhancing your team's ability to remediate incidents.

Applied Best Practices

Our platform incorporates Kubernetes expert practices by providing pre-configured monitors that detect common issues, ensuring compliance and adherence to industry standards. Combining monitors with our unique troubleshooting intelligence, StackState quickly detects related issues and gives advice on how to remediate them. This proactive approach reduces the risk of undetected problems and helps maintain a healthy and robust Kubernetes environment.



Change Tracking and Topology Intelligence

Change tracking and topology intelligence are essential for effective troubleshooting. StackState excels at tracking all changes and correlating them with topology and relevant metrics. This analysis of topology together with other data, called topology intelligence, allows for a more focused approach to identifying issues. Because Kubernetes landscapes typically change rapidly, tracking changes in an automated way augments human understanding of what is going on. Topology provides the essential foundation for dependency maps. At the same time, it is the anchor for correlating all data and it forms the foundation for the dynamic dashboards.
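
The idea of correlating changes with topology can be illustrated with a minimal Python sketch. It is not StackState's implementation: the component names, the DEPENDS_ON map, the CHANGES log and both helper functions are hypothetical, chosen only to show how a dependency graph narrows the list of changes worth inspecting after an incident.

```python
from datetime import datetime, timedelta

# Hypothetical topology: each component maps to the components it depends on.
DEPENDS_ON = {
    "checkout": ["payments", "cart"],
    "payments": ["postgres"],
    "cart": ["redis"],
    "postgres": [],
    "redis": [],
}

# Hypothetical change log: (timestamp, component, description).
CHANGES = [
    (datetime(2024, 1, 1, 9, 0), "redis", "config map updated"),
    (datetime(2024, 1, 1, 11, 50), "postgres", "image version changed"),
    (datetime(2024, 1, 1, 11, 55), "cart", "replicas scaled 3 -> 2"),
]


def upstream_dependencies(component, depends_on):
    """Return the component plus everything it transitively depends on."""
    seen = set()
    stack = [component]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(depends_on.get(current, []))
    return seen


def suspect_changes(component, incident_time, window, depends_on, changes):
    """List changes on the component or its dependencies within the window before the incident."""
    related = upstream_dependencies(component, depends_on)
    start = incident_time - window
    return [
        change for change in changes
        if change[1] in related and start <= change[0] <= incident_time
    ]


suspects = suspect_changes(
    "checkout", datetime(2024, 1, 1, 12, 0), timedelta(minutes=30), DEPENDS_ON, CHANGES
)
for when, name, what in suspects:
    print(when.isoformat(), name, what)
```

In this sketch the morning change to redis falls outside the 30-minute window, so only the later postgres and cart changes are reported for the checkout incident.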

Time Travel

Time travel is a powerful capability for troubleshooting. StackState stores both changes and topology over time, giving you the ability to “scroll back through time” and see what your resources and services looked like at any given moment. Time travel lets you compare system states before and after a change, showing you how an issue evolved as well as how issues are related. Insights from time travel are essential for conducting blameless postmortems, enabling teams to understand problems' root causes, refine processes and prevent future incidents.
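
A before-and-after comparison of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how StackState stores its history: the TopologyHistory class, the integer timestamps and the component versions are all hypothetical.

```python
import bisect


class TopologyHistory:
    """Keeps timestamped snapshots so past states can be looked up and compared."""

    def __init__(self):
        self._times = []      # snapshot timestamps, in increasing order
        self._snapshots = []  # snapshot dicts, aligned with self._times

    def record(self, timestamp, snapshot):
        """Store a copy of the snapshot; timestamps must strictly increase."""
        if self._times and timestamp <= self._times[-1]:
            raise ValueError("snapshots must be recorded in increasing time order")
        self._times.append(timestamp)
        self._snapshots.append(dict(snapshot))

    def at(self, timestamp):
        """Return the most recent snapshot taken at or before the timestamp."""
        index = bisect.bisect_right(self._times, timestamp) - 1
        if index < 0:
            raise LookupError("no snapshot exists at or before this time")
        return self._snapshots[index]

    def diff(self, before, after):
        """Compare the states visible at two moments in time."""
        old, new = self.at(before), self.at(after)
        return {
            "added": sorted(new.keys() - old.keys()),
            "removed": sorted(old.keys() - new.keys()),
            "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
        }


# Hypothetical snapshots mapping component name to deployed version.
history = TopologyHistory()
history.record(100, {"cart": "v1", "payments": "v3"})
history.record(200, {"cart": "v2", "payments": "v3", "search": "v1"})

result = history.diff(150, 250)
print(result)
```

Looking up the state "at or before" a timestamp is what makes it possible to scroll to any moment, not only to the exact instants when a snapshot was taken.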

4T Telemetry

Powerful Data Collection

Observing a comprehensive range of data types is crucial for software engineers to fully understand their Kubernetes environment. StackState’s approach ensures that you can accurately diagnose and remediate issues, resulting in a more stable environment and less toil. All data is collected and stored in a central, scalable Metrics, Events, Logs and Traces (MELT) store.

  • Metrics – Our tool automatically collects key metrics, retains them for an extended period and leverages PromQL to make metric query writing easy. StackState serves as a scalable and powerful alternative to Prometheus.

  • Events – StackState provides a comprehensive, centralized view of events in your Kubernetes landscape, such as images pulled, containers started or pods terminated due to memory constraints.

  • Logs – Our platform automates log collection, aggregation, analysis and visualization, revealing vital information about resources in your cluster and eliminating the need for time-consuming command-line queries.

  • Traces – StackState automatically derives golden signals – error rate, throughput and latency – to monitor applications running on Kubernetes clusters. You can use trace information to detect slow services and optimize customer experience.
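
To make the golden signals concrete, the following Python sketch derives throughput, error rate and latency from a batch of trace spans. It is a simplified illustration, not StackState's algorithm: the Span fields, the golden_signals function and the choice of a nearest-rank 95th percentile for latency are assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical, minimal view of one traced request."""
    service: str
    duration_ms: float
    is_error: bool


def golden_signals(spans, window_seconds):
    """Summarize spans observed during a time window of the given length."""
    if not spans:
        return {"throughput_rps": 0.0, "error_rate": 0.0, "p95_latency_ms": 0.0}
    durations = sorted(span.duration_ms for span in spans)
    # Nearest-rank percentile: the smallest value with at least 95% of samples at or below it.
    rank = math.ceil(0.95 * len(durations))
    return {
        "throughput_rps": len(spans) / window_seconds,
        "error_rate": sum(span.is_error for span in spans) / len(spans),
        "p95_latency_ms": durations[rank - 1],
    }


# Four hypothetical requests seen in a two-second window; the slowest one failed.
spans = [Span("cart", float(d), d == 400) for d in (20, 30, 40, 400)]
signals = golden_signals(spans, window_seconds=2)
print(signals)
```

A high percentile is used here rather than the mean because a single slow request, like the 400 ms span above, is exactly what a latency signal is meant to surface.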

All Based on Open Standards

StackState supports open standards such as eBPF, OpenTelemetry and OpenMetrics, ensuring seamless data collection and observability in your Kubernetes environment. With these standards, StackState creates a robust foundation for observability. If required, you can add your own Grafana dashboards on top of StackState to display essential business metrics. Data is collected in several ways: through StackState’s own agent, straight from the major cloud providers or through OpenMetrics. All this data is stored in the scalable MELT store.
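
For readers unfamiliar with OpenMetrics, the sketch below renders a single counter in the OpenMetrics text exposition format, which is the kind of payload a collector would scrape from an application endpoint. The render_counter helper, the metric name and the labels are hypothetical, and the sketch covers only a small part of the format (one counter family, no timestamps or exemplars).

```python
def escape_label_value(value):
    """Escape backslashes, double quotes and newlines inside a label value."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name, help_text, samples):
    """Render one counter family as OpenMetrics text.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# TYPE {name} counter", f"# HELP {name} {help_text}"]
    for labels, value in samples:
        label_text = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
        )
        # Counter samples carry the "_total" suffix on the family name.
        lines.append(f"{name}_total{{{label_text}}} {value}")
    # An OpenMetrics exposition ends with an explicit end-of-file marker.
    lines.append("# EOF")
    return "\n".join(lines) + "\n"


text = render_counter(
    "http_requests",
    "Total HTTP requests.",
    [({"pod": "cart-1"}, 3), ({"pod": "cart-2"}, 5)],
)
print(text)
```

In practice an application would normally use an existing client library to produce this output; the hand-written version is shown only to make the format visible.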