EP #6: How and Why We Built Our Own Time Series Graph Database With Bram Schuur (StackState)
Bram Schuur - developer and tech lead at StackState
Bram has been with StackState since almost day one, and during his time at StackState he's been focused on our time series graph database, or StackGraph, as we call it internally. The time series graph database is our proprietary technology that is at the core of our observability solution: it enables our customers to go back in time, to see exactly what their topology looked like - for example before an issue popped up. We're psyched to talk to Bram about this and show you what it means to build a technology like this.
In this episode, Bram and Anthony talk about:
Why Lodewijk and Mark decided to build their own time series graph database
What it means to build your own proprietary technology - instead of using an existing one like Neo4j, for example
What were some of the lessons learned
What Bram is currently working on regarding the time series graph database
You can find a written transcript of the episode below. Enjoy the recording!
Bram: [00:00] Having said that, things that I personally learned as an engineer, and especially working on a technology like this is that correctness is always much more important than performance. Especially when you're talking about databases. This is something that always goes like, "I want it to be fast, but I also want it to be correct." It's very easy as an engineer to reverse those two, and just make it fast and incorrect. That lesson has been learned once or twice the hard way.
Annerieke: [00:34] Hey there and welcome to the StackPod. This is a podcast where we talk about all things related to observability - because that's what we do and that's what we're passionate about - but also what it’s like to work in the ever-changing, dynamic tech industry! So if you are interested in that, you are definitely in the right place.
Annerieke: [00:48] For today’s episode, we invited Bram Schuur. Bram is one of the developers who have been with StackState from almost day one, and he has been working on our time series graph database (or our StackGraph, as we call it internally – you might hear Bram call it that during this episode as well). ‘So, what is that?’ you may say. Well, the time series graph database is the database that is the foundation for our topology-based observability solution: it helps our customers to go back in time to see exactly what their topology looked like, for example to find out what specific change in the environment caused an incident.
Annerieke: [01:31] So listen to this episode to hear Bram talk about why we decided to build our own database, instead of just using an existing one like Neo4j, for example, and the challenges of building our own proprietary technology – which can take up a lot of time and effort.
By the way: in order to explain why we started building the time series graph database, Bram talks about our very first customer, but we cannot mention their name. So we had to bleep that out, just so you know.
Well, without further ado, let’s get into it.
Anthony: [02:00] Hi, everybody, Anthony Evans here from StackState, and welcome back to the StackPod today. I'm joined by Bram Schuur. Sorry if I got the last name pronunciation wrong there. But Bram is here today. He's a StackState employee, but he's one of the early StackState employees, and can talk all about the beginnings, how StackState started, why we made decisions, what are some of the lessons learned.
Anthony: [02:33] But we're going to keep it casual today and have a conversation, as Bram is a great technologist, and a technology leader here at StackState. But Bram, why don't you introduce yourself and a bit of your background, and where you're located, and that kind of fun stuff?
Bram: [02:50] Right, yeah. I'm Bram Schuur, that's how you pronounce that in Dutch.
Anthony: [02:54] Oh, how could I?
Bram: [02:57] It's the Dutch G, right?
Anthony: [02:58] Yeah.
Bram: [02:58] That's the very famous Dutch G. I live close to Utrecht, and I've been with StackState for five years. Not all the way at the beginning, but early enough to have been with the company on a big part of the journey so far. I'm super passionate about technology, and about just coding. I would call myself a grassroots coder. And in that sense, it's so great to work at StackState. And as hobbies, to disconnect from the machine, I do woodworks, a bit of playing the piano, and I like cycling.
Anthony: [03:48] That's cool.
Bram: [03:49] Like a proper Dutch guy.
Anthony: [03:52] That's cool. That's cool. Where did you work prior to StackState?
Bram: [03:55] I worked at a company that makes, or that made, the company doesn't exist anymore, but makes a C compiler that tried to find errors in C code that others wrote.
Anthony: [04:12] Okay.
Bram: [04:14] So, compiler technology.
Anthony: [04:15] Okay, that's cool. That's cool. That's true developer-type technology, right?
Bram: [04:21] Yeah, it's IT for IT. You have to talk tech stack, and then you go back a couple of steps, and then you get to those guys.
Anthony: [04:27] Yeah, yeah. I'll tell you, that's a tough sell. All you have to do, you kind of have to hope that the developers like you at all. I saw the other week that, I think it was in GitHub now, they've got the AI that can auto-correct your code as you go through it. And there's nothing that developers love more than to dismantle somebody else's creation.
Bram: [04:54] Exactly.
Anthony: [04:55] And so, crap all over it.
Bram: [04:59] That's what we tried to do there automatically.
Anthony: [05:02] Yeah. Awesome. Awesome. So, you're based out of the Netherlands, you've been with StackState for five years.
Bram: [05:11] Yeah.
Anthony: [05:12] So that is very early days. Outside of the founders, that's a very interesting story, working for a small company and growing out a technology from scratch, right? Because our technology, although it's got some components that may be open source or from third parties, it is effectively our intellectual property. That graph database, the StackGraph, that we call it. So, to have to build all that from scratch is quite an undertaking. It's not something that would be taken lightly. But one of the things that I wanted to ask you is, what was the genesis for the idea? How did that get started, and what was the first customer that really needed it, and why did they need it?
Bram: [06:05] Well, I wasn't there at day zero or day one.
Anthony: [06:10] Yeah.
Bram: [06:12] That's Lodewijk and Mark. The first customer was ***. And well, I think Lodewijk and Mark must've talked about this in another part as well. But the short of it is that that company has many metrics, and, well, and wanted to just figure out what was happening when and where. And had just way too much data to fit in one engineer's or manager's head. And Mark and Lodewijk came up with the idea to organize that with topology, to basically model that with the topology.
Bram: [06:52] Well, that's step one. Step two then is to say, "Okay, I have all these metrics that come from an IT landscape, and that tell me something about what's going on now, but also what went on in the past," which helps tremendously when you're troubleshooting issues. And they said, "Well, if you're going to organize it as topology, I also want to be able to see the topology when I go back."
Bram: [07:29] That is the moment, now you're talking about StackGraph, that's the moment really also StackGraph was born, basically. So yeah, if you're talking about StackGraph, our database technology that we built ourselves, it really comes from that use case. So, it's really driven by the use case that we have. That's how I would put it.
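To make the "see the topology when I go back" idea concrete, here is a minimal sketch of a topology that keeps history. It is not how StackGraph is implemented; the class names, the integer timestamps, and the interval-per-relation model are all assumptions chosen for illustration. Each relation records when it appeared and when it disappeared, so a query for any past moment can be answered.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionedEdge:
    # Hypothetical record: one relation plus the time interval it existed in.
    source: str
    target: str
    valid_from: int
    valid_to: Optional[int] = None  # None means the relation still exists


@dataclass
class TimeTravelTopology:
    edges: list = field(default_factory=list)

    def add_relation(self, source, target, at):
        self.edges.append(VersionedEdge(source, target, at))

    def remove_relation(self, source, target, at):
        # Close the open interval instead of deleting, so history is kept.
        for edge in self.edges:
            if (edge.source, edge.target) == (source, target) and edge.valid_to is None:
                edge.valid_to = at

    def relations_at(self, at):
        # Return every relation whose interval contains the requested moment.
        return sorted(
            (e.source, e.target)
            for e in self.edges
            if e.valid_from <= at and (e.valid_to is None or at < e.valid_to)
        )


topology = TimeTravelTopology()
topology.add_relation("web", "db-old", at=100)
topology.remove_relation("web", "db-old", at=200)
topology.add_relation("web", "db-new", at=200)

before_change = topology.relations_at(150)  # topology before the switch
after_change = topology.relations_at(250)   # topology after the switch
```

Asking for the relations at time 150 and again at time 250 shows the same service pointing at two different databases, which is the kind of change you would want to spot when troubleshooting.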
Anthony: [07:52] And that's a critical component, right? And that's a critical part of our technology. It's not just the single pane of glass database. Because anybody can do that, really. You got all these other tools that give you a single pane of glass, as they like to put it.
Bram: [08:15] For sure, yeah.
Anthony: [08:16] And quote observability. But, why was the Time Travel so important? Because that is really important, right?
Bram: [08:25] It was one of the features that Mark and Lodewijk envisioned at the start of the company. They kind of set the wheels in motion. They started the journey with ***. But then, basically, they were starting at square one. That meant that although they had this vision for getting the Time Travel in, at the moment I joined the company we actually didn't have Time Travel.
Anthony: [08:54] Okay.
Bram: [08:55] Time Travel, or yeah, or a Time Series graph database as we call it now. But we called it Time Travel back then, because it sounds super cool. But, the idea was already in there. It was kind of, well you could say, almost already done, but just not activated yet. But, going further with these customers that we were attracting at the time, and asking their needs, at some point we learned that, yeah, this idea that Mark and Lodewijk initially had, it is something we really need to bring to fruition. That's what happened in the years after that, so the first two years after I joined. So yeah, that's how that went. So, it's also been a journey from kind of going from a vision to actually realizing that in a startup setting, and taking that step-by-step.
Anthony: [10:05] Yeah. Okay. Okay. Okay. So, going through that journey, and going through the creation of the StackGraph, and exposing the new features, what were some of the lessons learned along the way in terms of mistakes maybe we made, or avenues that we shouldn't have taken in hindsight? What were some of those stories and those anecdotal components?
Bram: [10:36] Yeah, yeah, yeah. Well, and I must say that of course we learned lessons. We didn't learn any, I must say, really hard lessons. Because ultimately, I think, the database is today’s success story. So, the hardest lesson like, "Hey, this doesn't work, or this doesn't..." That lesson, it's been very successful so far.
Bram: [10:59] Having said that, things that I personally learned as an engineer, and especially working on a technology like this is that correctness is always much more important than performance. Especially when you're talking about databases. This is something that always goes like, "I want it to be fast, but I also want it to be correct." It's very easy as an engineer to reverse those two, and just make it fast and incorrect. That lesson has been learned once or twice the hard way. Yeah.
Bram: [11:34] And what I also learned, but it's also kind of more a success story, is that to do such an undertaking... Because it is a fully-fledged database technology that we're building. So, to carry that, and to make it go to fruition, it means that the value that it brings has to be real. Otherwise, you don't carry it seven years and make it into the world, basically. That's also what I learned. In this case, we make some proprietary technology.
Bram: [12:18] It's only possible to do that when the thing that's being built is really unique, and really brings value. Otherwise, at some point, somebody's just going to say, "It's so nice that you all made this database here, but it doesn't help anybody." That's been very, very important in that...
Anthony: [12:40] Yeah. I think that's still part of our journey, right? So, ultimately what StackState is, we talk about observability and Time Series. But ultimately, we're a database company. That's the thing that is our secret sauce, we've created a database that is effectively a homegrown graph database in a Time Series, that allows you to query basically versions of the database as it looked seven days ago or a day ago. And so, obviously from an IT perspective, that works great as basically a CMDB on steroids.
Bram: [13:32] For sure. Yeah.
Anthony: [13:32] Because we can really easily detect abnormalities. We're uniting not just the topology, it's the ability to unite the topology, the telemetry, the events, and the tracing data over that axis of time that makes us a very compelling solution from an observability standpoint. But again, it's all built on the database, and then the product team built the use cases, so that they can be consumed by the customers.
Bram: [13:58] Exactly.
Anthony: [13:59] As opposed to them just buying a database, and using it as yet another database as an alternative for Neo4j, for example.
Bram: [14:07] Exactly.
Anthony: [14:08] And I think we're still going to keep growing in that area. Because from what I understand, a lot of our core capabilities from just simply the database standpoint already in existence, we are successfully scaling out the database. Obviously, scalability is always a big thing. As more people consume the product, more people are going to want to push it to the limits.
Anthony: [14:36] And we want that, so that when we onboard another big customer, it's not so daunting for us. We've got several streaming clients, for example, who need us to help run their video streaming services, and that's where the Time Series graph database comes into play.
Anthony: [14:53] However, I did want to drill in a little bit. Because it's not just a database. So in order for us to actually have a Time Series graph database, it's not just a case of taking Neo4j and then just adding a bunch of stuff on top of it. We have actually built everything, or quite a few components ourselves. But then we also, like I said earlier, we do need several components on top of it to help scale it out, so it's a packaged solution.
Anthony: [15:24] If somebody was going out today and wanting to build one of these technologies from scratch, what would be... Could you shed a little light on the complexities around the solution, and the nuances in terms of different programming languages, different products, what we've built ourselves?
Bram: [15:43] Yeah, sure, sure. The plain spec for what it's built with, it's a graph database built in Java. So, that's already a starter. To make a database storage layer is something that's there. What comes also with the database is transactions. Some databases have more transactionality. Transactionality, being able to either successfully write data or fail if something happens.
Anthony: [16:29] Select, insert, update, all that kind of stuff, right?
Bram: [16:33] Yeah. To have some kind of consistency. And that all plays into, also goes with don't corrupt the data. That's the same as the storage layer. We made the choice to take an off-the-shelf storage layer and basically make the graph database on top of that. So, we took HBase for storage, which really kick-started the project. By no means do you already have a graph database with just HBase, but that's one thing less to worry about, and it also gives us a scalability model.
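The two ideas in this answer, a graph layer on top of an off-the-shelf store and writes that either succeed completely or fail, can be sketched in a few lines. This is an illustration under stated assumptions, not StackGraph's design: the `KeyValueStore` here is a plain dictionary standing in for the storage layer (it does not use any HBase API), and the key layout and class names are hypothetical.

```python
class KeyValueStore:
    """Stand-in for an off-the-shelf storage layer (a plain dict, not HBase)."""

    def __init__(self):
        self.rows = {}


class GraphTransaction:
    """Hypothetical graph layer: buffers writes, then applies all or nothing."""

    def __init__(self, store):
        self.store = store
        self.pending = {}

    def put_vertex(self, vertex_id, properties):
        self.pending[("v", vertex_id)] = dict(properties)

    def put_edge(self, source, target):
        self.pending[("e", source, target)] = {}

    def commit(self):
        # Validate first: an edge may only connect vertices that exist,
        # either already stored or written in this same transaction.
        for key in self.pending:
            if key[0] == "e":
                for endpoint in key[1:]:
                    vertex_key = ("v", endpoint)
                    if vertex_key not in self.pending and vertex_key not in self.store.rows:
                        raise ValueError(f"edge refers to unknown vertex {endpoint!r}")
        # Only after every check passes is anything written to the store.
        self.store.rows.update(self.pending)
        self.pending = {}


store = KeyValueStore()

ok = GraphTransaction(store)
ok.put_vertex("web", {"type": "service"})
ok.put_vertex("db", {"type": "database"})
ok.put_edge("web", "db")
ok.commit()

bad = GraphTransaction(store)
bad.put_vertex("cache", {"type": "cache"})
bad.put_edge("cache", "missing")
try:
    bad.commit()
    failed = False
except ValueError:
    failed = True
```

The second transaction fails as a whole, so the "cache" vertex it tried to write never reaches the store. Doing the same across several machines is much harder than this single-process sketch suggests.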
Bram: [17:09] And after that, it's thinking about how do we then scale the database layer that we put on top of that? How do we model the data that we put on top of that such that it scales? That's all the kind of knowledge that's encoded in the database technology itself. And last, are less... how do I say, less sexy topics, but very necessary topics that the database needs to be able to do is things like caching, to optimize hot paths.
Bram: [17:54] And especially to the nature of StackGraph, so it scales out, caching so you can basically have multiple clients on multiple machines writing the same data, such that they all see the same database actually. And that makes caching a very hard problem. But a necessary one to solve, and one that was, well, I can say very fun to solve, but also very challenging to solve within the company, one of the things I did the last couple of years.
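One common way to keep several caching clients consistent is to attach a version to every stored value and only trust a cached copy while its version still matches. The sketch below shows that technique in a single process; it is an assumption for illustration, not a description of how StackGraph solves the problem, and the names `SharedStore` and `CachingClient` are hypothetical.

```python
class SharedStore:
    """Hypothetical shared storage: every write bumps the key's version."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, value):
        version = self.data.get(key, (0, None))[0] + 1
        self.data[key] = (version, value)

    def version_of(self, key):
        return self.data.get(key, (0, None))[0]

    def read(self, key):
        return self.data.get(key, (0, None))


class CachingClient:
    """Client with a local cache that is validated against the store's version."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.hits = 0

    def get(self, key):
        cached = self.cache.get(key)
        # Serve from the cache only if no other client has written since.
        if cached is not None and cached[0] == self.store.version_of(key):
            self.hits += 1
            return cached[1]
        entry = self.store.read(key)
        self.cache[key] = entry
        return entry[1]

    def put(self, key, value):
        self.store.write(key, value)


store = SharedStore()
client_a = CachingClient(store)
client_b = CachingClient(store)

client_a.put("web", "healthy")
first = client_b.get("web")    # cache miss, loaded from the store
second = client_b.get("web")   # cache hit, version unchanged
client_a.put("web", "degraded")
third = client_b.get("web")    # stale entry detected, reloaded
```

The version check still costs a lookup per read. A real system would have to make that check cheap or push invalidations to clients, which is part of what makes the problem hard.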
Anthony: [18:27] Without good data or accurate data, it's useless.
Bram: [18:32] Yep, for sure.
Anthony: [18:34] An algorithm is only as good as the data that it runs on, right? And that's incredibly important.
Bram: [18:40] Yeah. And there's even also a real-time component to it. Because people want to be notified of changes as soon as possible. If something goes wrong, as soon as possible. So, that's something from a scalability and a performance perspective that I'm also working on a lot. How can we get the latency down? If something goes in, we want to see it as soon as possible. But we also want to keep the history on the same thing. So yeah, all those things are important. Yeah.
Anthony: [19:08] Yeah. Because I think that's one of the... Well, like I say, scalability is an ever-growing challenge. To quote a famous film, there's always a bigger fish. Star Wars, right? But once you crack it with one customer, then you've got another customer who is like, "Oh, you know what, I not only want to integrate Dynatrace and your agent, I also want your topology synchronizer to merge them so that I don't have duplicate components in my database."
Anthony: [19:41] So, then you've got another layer of complexity, and the fact that you want to make sure that the scripting layer then can de-duplicate everything so that it only does an insert or an update when it needs to, as opposed to rudimentarily just going in and updating everything. It can get very complex very easily.
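The behavior described here, merging duplicates and writing only when something actually changed, can be sketched as follows. This is a simplified illustration, not StackState's topology synchronizer: the `synchronize` function, the report format, and the assumption that two sources report the same identifier for the same component are all hypothetical.

```python
def synchronize(database, incoming):
    """Merge component reports into `database`, writing only when needed.

    Assumes (for illustration) that reports about the same component
    share an "id", so they can be merged instead of duplicated.
    """
    stats = {"inserted": 0, "updated": 0, "unchanged": 0}

    # Step 1: de-duplicate by merging all reports that share an identifier.
    merged = {}
    for component in incoming:
        merged.setdefault(component["id"], {}).update(component["properties"])

    # Step 2: insert new components, update changed ones, skip the rest.
    for identifier, properties in merged.items():
        existing = database.get(identifier)
        if existing is None:
            database[identifier] = properties
            stats["inserted"] += 1
            continue
        combined = {**existing, **properties}
        if combined != existing:
            database[identifier] = combined
            stats["updated"] += 1
        else:
            stats["unchanged"] += 1
    return stats


database = {}
reports = [
    {"source": "dynatrace", "id": "host-1", "properties": {"cpu": "4"}},
    {"source": "agent", "id": "host-1", "properties": {"os": "linux"}},
    {"source": "agent", "id": "host-2", "properties": {"os": "linux"}},
]

first_run = synchronize(database, reports)
second_run = synchronize(database, reports)
```

The first run inserts two components even though three reports came in, and the second run with identical reports writes nothing at all.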
Bram: [19:59] For sure.
Anthony: [20:01] And it'll only get bigger over time, right?
Bram: [20:05] Yep.
Anthony: [20:07] Cool.
Bram: [20:07] For sure. For sure, yeah.
Anthony: [20:09] Cool. So, are there any projects you're doing outside of work? You're a technologist, and you said you play the piano, and you got a family and stuff. What keeps you going outside of work? Are you working on anything? Are you trying to get through a TV show?
Bram: [20:28] The thing was also, of course, due to the pandemic, things are... But one thing I really like as a project this year I did was making a mini library. My wife, she's in books, and together we had some old wood stashed away in the attic. We just made a small mini library. The idea is that people can put a book there, take a book out. In the city I live in, it's already an established concept, and you're seeing these things popping up everywhere. It's actually already a hit. So, that was a nice project we concluded there.
Anthony: [21:12] My wife did the exact same thing.
Bram: [21:14] Oh, really? Oh, that's awesome.
Anthony: [21:15] Yeah. I'm just looking to see if I've got a picture of it somewhere. But she basically got a little London telephone box.
Bram: [21:24] Oh, that's awesome.
Anthony: [21:26] A little one that hangs outside our house. And yeah, it's a pick one up, drop one off kind of thing.
Bram: [21:31] Yeah.
Anthony: [21:33] It's a Brooklyn thing as well. There's a bunch of houses, and now my daughter is like, "Oh, let's look at this one down here." They also have community gardens as well, so we've got a lot of trees. And then obviously at the bottom they've got a little dirt patch. So, people plant vegetables and stuff.
Bram: [21:55] Oh, cool.
Anthony: [21:56] So it's really nice, just walking around. It adds to the community. It just makes it nicer, if you're doing things for people.
Bram: [22:07] Yeah. Yeah.
Anthony: [22:09] That's really cool. That's really cool. But yeah, no, Bram, I'm really thankful for you taking the time out of your busy schedule to do this. And I hope you had fun.
Bram: [22:18] Yeah, for sure. Well, thanks for having me. I just enjoy talking about this, because I love technology. This is super cool technology. So, you facilitating this and bringing it out, super cool.
Anthony: [22:30] Yeah. We've been talking for half an hour now, so...
Bram: [22:35] Yeah.
Anthony: [22:38] So, yeah. But time flies. But no, no, thanks again for doing this, and enjoy the rest of your day. We'll let you know when this comes out.
Bram: [22:50] Yep. Cool.
Anthony: [22:52] Cool. Thanks, Bram.
Bram: [22:54] Thank you.
Annerieke: [22:55] Thank you so much for listening. We hope you enjoyed it. If you'd like more information about StackState, you can visit stackstate.com and you can also find a written transcript of this episode on our website. So if you prefer to read through what they've said, definitely head over there and also make sure to subscribe if you'd like to receive a notification whenever we launch a new episode. So, until next time...
StackState’s observability platform is built for the fast-changing container-based world. It is built on top of a one-of-a-kind “time-traveling topology” capability that tracks all dependencies, component lifecycles, and configuration changes in your environments over time. Our powerful 4T data model connects Topology with Telemetry and Traces across Time. If something happens, you can “rewind the movie” of your environment to see exactly what changed in your stack and what effects it has on downstream components.