Are all those IT Ops tools driving you crazy?

To answer that question, you’ll first have to answer these:

1) “Which systems do I have?”

2) “Which business processes do I have?”

3) “Which systems do I need to get a complete overview?”

So many questions! In this post, I’ll help you answer all of them, beginning with an overview of the different systems and how they are used. First of all, there are many different types of systems. Some examples:

Real-time systems:

Customer facing applications
CRM systems

Batch processing systems:

Interest calculations
Billing jobs

All of these systems run on some kind of hardware, either on-premise or in the cloud.

An abstraction layer, such as virtualization or containers, may sit on top of the hardware.

Server layer

To ensure correct behavior and make the best use of your hardware or cloud resources, you need a monitoring tool. Examples include Consul, Prometheus, VMware vSphere, or your cloud provider’s own solution. You can also use a provisioning tool like Chef or Puppet, or a container management tool like Kubernetes or Docker Swarm.

To monitor this layer, you need an overview of all systems and their dependencies, which can be found in the provisioning and container management systems. You also need to know how the current runtime state compares with the normal runtime state.
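To make the idea of "runtime state versus normal runtime state" a bit more concrete, here is a minimal Python sketch. It is only an illustration: the host names, the CPU figures, and the tolerance-band rule are all assumptions made for this example, not how any of the tools above (or StackState) actually decide what is normal.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    """Hypothetical 'normal' value for one metric on one host."""
    mean: float        # typical value, e.g. average CPU percentage
    tolerance: float   # allowed relative deviation, e.g. 0.25 = 25%


def deviates(current: float, baseline: Baseline) -> bool:
    """Return True when the current value falls outside the tolerated band."""
    allowed = abs(baseline.mean) * baseline.tolerance
    return abs(current - baseline.mean) > allowed


# Assumed example data: normal CPU usage per host and the latest reading.
baselines = {
    "web-01": Baseline(mean=40.0, tolerance=0.5),
    "db-01": Baseline(mean=60.0, tolerance=0.25),
}
current = {"web-01": 55.0, "db-01": 90.0}

# Hosts whose runtime state differs from their normal state.
abnormal = sorted(host for host, value in current.items()
                  if deviates(value, baselines[host]))
print(abnormal)
```

In practice the baseline would come from historical data in your monitoring tool rather than from hard-coded numbers.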

Middleware layer

In some cases, middleware is used to run applications on top of this server layer. Middleware includes application servers, ESB systems, queues, etc. These all have their own kind of management and monitoring systems.

Lots of tools

The applications that support all the business processes run on top of the middleware. Provisioning or deployment systems can be used to manage them. There are different options to instrument and monitor these applications and all dependent systems, such as application performance management (APM) tools. Metric stores like Graphite or Prometheus and log aggregation systems like Splunk or Elasticsearch can also be used. Systems such as Google Analytics are used to track customer behavior. We also see a lot of scripts used to test whether applications are operating in a normal state.
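The kind of script mentioned above often looks something like the following Python sketch. It is a hedged example, not a recommendation: the function names `probe` and `is_normal`, the one-second latency limit, and the rule that only 2xx responses count as normal are assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def is_normal(status, latency_s, max_latency_s=1.0):
    """Assumed rule: a 2xx status answered within the latency limit is normal."""
    return status is not None and 200 <= status < 300 and latency_s <= max_latency_s


def probe(url, timeout_s=5.0):
    """Request the URL and return (status code or None, elapsed seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code      # the server answered, but with a 4xx/5xx code
    except OSError:
        status = None          # unreachable, refused, or timed out
    return status, time.monotonic() - start
```

A scheduled job could then call `is_normal(*probe(url))` for each application URL and raise an alert when it returns `False`. Scripts like this are easy to write, which is exactly why they tend to multiply.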

What will be the impact?

So, control is clearly a complex issue for any company and requires many tools. Even once you identify the types of systems used to manage and monitor other systems, you still can’t be sure if you’re managing and monitoring all parts. What will be the impact if something breaks or needs to be upgraded?
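One way to reason about that impact question is to treat the stack as a dependency graph and walk it upward from the failing part. The Python sketch below assumes a small, made-up dependency map (the component names are hypothetical) and uses a plain breadth-first search. It illustrates the general idea only and is not StackState’s implementation.

```python
from collections import deque

# Hypothetical dependency map: component -> components it depends on.
DEPENDS_ON = {
    "billing-job": ["queue", "database"],
    "crm": ["app-server"],
    "app-server": ["vm-1"],
    "queue": ["vm-2"],
    "database": ["vm-2"],
}


def impacted_by(failed, depends_on):
    """Return every component that directly or indirectly depends on `failed`."""
    # Invert the map so we can look up who depends on a given component.
    dependents = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(component)

    # Breadth-first walk from the failed component up through its dependents.
    seen = {failed}
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen - {failed}


print(sorted(impacted_by("vm-2", DEPENDS_ON)))
```

The hard part in a real environment is not the traversal but building a dependency map that is complete and current, since the relationships are spread across many tools.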

IT operations platform

At StackState, we’re creating an IT operations platform to manage and monitor the complete state of all these different systems and the relationships between the different parts of an IT stack. With StackState, you can consolidate data from your current IT Ops tools into a single graphical view. This platform will help you run a root-cause analysis quickly and get a clear view of the impact of any changes before you apply them. Automating the analytics is the key to gaining full control, and our approach uses data science. We’ll cover this subject in more depth in a future post.

You can expect more posts about some of the systems mentioned here in the coming weeks. Plus, we’ll explain how we use those systems in combination with StackState to give you a total overview of your entire IT stack. So, if you want to know if you’re in control, stay tuned and subscribe for more updates.
