(Photo is licensed under CC BY 2.0, and you can find it on Flickr)
When I first got interested in infrastructure, I worked at a company that deployed applications directly onto servers in physical data centers. Those data centers were enormously expensive, time-consuming to maintain, and constrained by the lead time on procuring new hardware. The advent of cloud platforms and infrastructure as code has given us remarkable new abilities: we can now change a few variables in a configuration to increase or decrease capacity, deploy new services, and deploy replicas of our infrastructure in multiple locations for faster disaster recovery.
The infrastructure as code ecosystem has lowered the barrier to entry in infrastructure management. As a community, we've created a space where tools evolve at an incredible pace to meet the needs of the business. You can even order a Domino's pizza using infrastructure as code. But it hasn't all been smooth sailing. For me, infrastructure as code has brought 3 major problems.
Papercuts. As our infrastructure and applications become more complex, so do the code and tooling needed to drive them. Today, teams need to stitch together many tools to deliver their systems. Every tool introduced brings a "200% problem": the team needs to understand both the behavior of the tool and how that relates to the behavior of the system, which doubles their knowledge burden. Even worse, when cloud vendors release updates, users are kept waiting while the intermediate tooling layer catches up and includes those updates in its own release.
Collaboration. The primary problem DevOps set out to solve was driving collaboration across teams, but infrastructure as code isn't conducive to collaboration. Today, the closest we come to collaboration is a series of code reviews or pull requests. And don't get me wrong: pull requests are a far better system than the days when we created tickets in our bug tracker and passed them between departments, but it's still as difficult to convey context as it was in the ticket-based workflow. Reviewing a pull request can require specific knowledge of tooling or languages, which makes it slow and difficult to work together and get changes through our pipeline.
Source of truth. In the infrastructure as code world, the source of truth is the code, but there's a problem: the code we're working on may not reflect what is in production right now. If the code is out of date, it can be risky to deploy our changes, and that feeling of risk, which can become an undercurrent to all of our work, has a chilling effect on our entire workflow and makes us less inclined to ship value to our users.
These problems (the papercuts in our daily work, the lack of collaboration, and the risks inherent in branching code) can be extremely frustrating and slow down the delivery pipeline for our systems. And slower delivery pipelines mean we are slower to deliver value to our users.
To solve these problems, we fundamentally need to redesign what we are doing. Another cycle of infrastructure as code tooling won't solve them. After all, incrementalism is what got us into this mess. When I was first introduced to System Initiative, I could immediately see that the path they were taking was new and would solve the problems I had experienced. Here was a company that looked at our ecosystem in a different light and set deeply ambitious goals for overhauling how things are done to achieve better outcomes. From where I'm sitting, System Initiative brings 3 major things to the table:
Facilitates collaboration. The multi-player aspect of System Initiative means my team can work together on building simulations to solve the problems we have faced in the past. No specialist programming language or DSL is needed to understand the simulations being built, and collaboration happens in real time.
Understanding the impact of my changes. There are 2 kinds of impact we can understand with System Initiative:
How does my change affect the rest of the components in the simulation? System Initiative is built on a hypergraph of connected functions. When a change is made, the functions that depend on the changed value are invoked in turn, so every dependent part of the model is updated automatically.
How does this affect the outside world? Each change within a simulation is checked against the target infrastructure to show the delta between the model and the real world. The user can then decide how to bridge the gap.
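To make these two kinds of impact a little more concrete, here is a toy Python sketch. It is not System Initiative's implementation or API: the `Model` class, its `add_function`, `set_value`, and `diff_against` methods, and the `region`/`subnet_zone`/`instance_zone` attributes are all hypothetical, and a plain dependency graph stands in for the hypergraph. Setting an input re-runs every function that transitively depends on it, in dependency order; the resulting model is then compared with a dictionary that stands in for the observed state of real infrastructure.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable


class Model:
    """Hypothetical toy model: input values plus values derived by functions."""

    def __init__(self) -> None:
        self.values: dict[str, Any] = {}
        # derived value name -> (function, names of the values it reads)
        self.functions: dict[str, tuple[Callable[..., Any], list[str]]] = {}

    def add_function(self, name: str, fn: Callable[..., Any], inputs: list[str]) -> None:
        self.functions[name] = (fn, inputs)

    def _dependents(self, changed: str) -> set[str]:
        """Find every derived value that transitively reads `changed`."""
        found: set[str] = set()
        frontier = [changed]
        while frontier:
            current = frontier.pop()
            for name, (_, inputs) in self.functions.items():
                if current in inputs and name not in found:
                    found.add(name)
                    frontier.append(name)
        return found

    def set_value(self, name: str, value: Any) -> list[str]:
        """Set an input, then re-run dependent functions in dependency order."""
        self.values[name] = value
        affected = self._dependents(name)
        # Map each affected node to the affected nodes it reads (its predecessors).
        graph = {n: set(self.functions[n][1]) & affected for n in affected}
        order = list(TopologicalSorter(graph).static_order())
        for n in order:
            fn, inputs = self.functions[n]
            self.values[n] = fn(*(self.values[i] for i in inputs))
        return order

    def diff_against(self, observed: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
        """Return (model, observed) pairs for every key where the two disagree."""
        keys = self.values.keys() | observed.keys()
        return {
            k: (self.values.get(k), observed.get(k))
            for k in sorted(keys)
            if self.values.get(k) != observed.get(k)
        }


model = Model()
model.add_function("subnet_zone", lambda region: f"{region}a", ["region"])
model.add_function("instance_zone", lambda zone: zone, ["subnet_zone"])

print(model.set_value("region", "us-east-1"))  # ['subnet_zone', 'instance_zone']

# Hypothetical stand-in for what is actually running: a snapshot of the model.
deployed = dict(model.values)

model.set_value("region", "us-west-2")
print(model.diff_against(deployed))
# {'instance_zone': ('us-west-2a', 'us-east-1a'), 'region': ('us-west-2', 'us-east-1'), 'subnet_zone': ('us-west-2a', 'us-east-1a')}
```

Re-running only the transitive dependents, in topological order, guarantees each function sees up-to-date inputs before it runs; `graphlib` raises `CycleError` if the functions ever form a cycle. The diff is deliberately naive: it only reports the gap and leaves the decision about how to close it to the user.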
I am so excited to be working on an ambitious technology that rethinks the very foundations of DevOps. I'm hopeful that we're about to see a renaissance in thinking about what's possible in our field, and I can't wait to see how these technologies enliven and empower the entire developer ecosystem.
Paul is an engineer who is passionate about the Continuous Delivery and DevOps movements and how they are critical in helping businesses deliver value to their customers.