Smart IT monitoring and root cause analysis needs big data
Mark Bakker· 4 min read
Our big data engine is ready. We are on track to change the way IT departments work and manage their IT operations.
To understand what happens in an IT stack, we need good in-depth data. And as we all know, we gather this data through all kinds of systems. We monitor, measure and analyze how software applications perform, which new deployments we did, the changes we made in our architecture and the issues we have and are trying to solve. All these pieces together are part of a big jigsaw puzzle. And when you add business processes, services and infrastructure components, including their dependencies and states, you get what we call Full Stack Chain Monitoring. Having a single real-time unified overview is an interesting approach, since it gives DevOps teams, architects and IT service managers insight into how healthy their part of the stack, or the whole stack, is. And it is a great tool for root cause analysis, since it immediately shows where actual failures or service interruptions originate.
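To make the root cause idea a bit more concrete, here is a minimal sketch in Python. It is not how StackState works internally; the component names, the dependency map and the health states are all made up for illustration. It assumes a failure travels upward along dependencies, so the likely root cause is an unhealthy component whose own dependencies are all healthy.

```python
from typing import Dict, List, Set

# Hypothetical stack: each component maps to the components it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "webshop": ["checkout-service"],
    "checkout-service": ["payment-api", "orders-db"],
    "payment-api": ["vm-17"],
    "orders-db": ["vm-18"],
    "vm-17": [],
    "vm-18": [],
}

# Hypothetical current health state per component (True means healthy).
HEALTHY: Dict[str, bool] = {
    "webshop": False,
    "checkout-service": False,
    "payment-api": False,
    "orders-db": True,
    "vm-17": False,
    "vm-18": True,
}


def root_causes(component: str,
                depends_on: Dict[str, List[str]],
                healthy: Dict[str, bool]) -> Set[str]:
    """Follow unhealthy dependencies downward from `component` and return
    the unhealthy components that have no unhealthy dependency themselves."""
    causes: Set[str] = set()
    seen: Set[str] = set()
    to_visit = [component]
    while to_visit:
        node = to_visit.pop()
        if node in seen or healthy[node]:
            continue  # already handled, or healthy and so not part of the failure path
        seen.add(node)
        unhealthy_deps = [dep for dep in depends_on[node] if not healthy[dep]]
        if unhealthy_deps:
            to_visit.extend(unhealthy_deps)  # the problem lies deeper in the stack
        else:
            causes.add(node)  # nothing below it is broken: a likely origin
    return causes


print(root_causes("webshop", DEPENDS_ON, HEALTHY))
```

In this made-up stack the failing webshop is traced down through the checkout service and the payment API to a single virtual machine, while the healthy database branch is ignored.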
The next step is storing all this knowledge in a big database for IT operations analytics.
Big data for a pro-active approach
But what if we could store all this real-time information as big data and use it to make monitoring and root cause analysis super smart? We could use the live IT stack overview as a time machine to go back in time and see what our infrastructure looked like a month ago. Or discover where, deeply hidden in the stack, a small change or failure 9 hours ago “infected” another component, causing a domino effect of failures through the stack that finally hit one of our core services. Or discover abnormalities, be more predictive and repair before critical IT service failures occur. Pro-active root cause analysis or self-healing mechanisms would be the result.
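The time machine idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the StackGraph design: the class name, the component names and the integer timestamps are hypothetical, and state changes are assumed to be recorded in time order per component.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import List, Optional, Tuple


class StateHistory:
    """Keeps every state change per component so past states can be looked up."""

    def __init__(self) -> None:
        self._times = defaultdict(list)   # component -> timestamps, ascending
        self._states = defaultdict(list)  # component -> state at each timestamp

    def record(self, component: str, timestamp: int, state: str) -> None:
        # Assumption: changes for one component arrive in timestamp order.
        self._times[component].append(timestamp)
        self._states[component].append(state)

    def state_at(self, component: str, timestamp: int) -> Optional[str]:
        """Return the state the component had at `timestamp`, or None if unknown."""
        times = self._times.get(component, [])
        index = bisect_right(times, timestamp)  # number of changes up to that moment
        return self._states[component][index - 1] if index else None

    def changes_between(self, start: int, end: int) -> List[Tuple[int, str, str]]:
        """List all (timestamp, component, state) changes in [start, end], oldest first."""
        found = []
        for component, times in self._times.items():
            for timestamp, state in zip(times, self._states[component]):
                if start <= timestamp <= end:
                    found.append((timestamp, component, state))
        return sorted(found)


# Hypothetical history, with timestamps in minutes.
history = StateHistory()
history.record("vm-17", 100, "OK")
history.record("payment-api", 100, "OK")
history.record("vm-17", 460, "CRITICAL")
history.record("payment-api", 520, "CRITICAL")

print(history.state_at("vm-17", 300))
print(history.changes_between(400, 600))
```

Looking up a past moment gives the state of the stack as it was then, and listing the changes in a time window shows the order in which components went wrong, which is exactly the trail you need to follow a domino effect back to its start.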
A huge step forward needed new technology
Taking the StackState concept from just real-time to a combination of real-time and full history required specific big data storage and retrieval capabilities that we couldn’t find in the market. Sometimes great ideas need newer or better technologies. Over the last 6 months we worked hard to create a new big data engine called StackGraph to fulfill our needs. We just embedded it in StackState, and we believe that this will change how IT departments do root cause analysis and manage and control their whole IT stack to improve their service levels.
Open source graph database
Now before you start asking me all kinds of technical graph database questions (I am not an engineer), I have good news for you. We are planning to release StackGraph as an open source project to share these great capabilities with the world. So be patient and stay tuned.