4 Pillars of the “Infrastructure from Code”

Asher Sterkin
Nov 25, 2022 · 8 min read

Part One: Types of Communication with Cloud Services

In this series, I’m trying to answer the question: “Precisely what kind of problems is IfC trying to solve?”

Acknowledgements

Many thanks to Shruti Kukkar for the valuable feedback on the early draft of this paper.

Introduction

In my previous article, I argued that the “Infrastructure from Code” (IfC) approach is the next logical step in Cloud Infrastructure Automation, focusing more on historical background and the current technology landscape. In this series, I’m trying to answer the question: “Precisely what kind of problems is IfC trying to solve?”

A short answer would be: “Automatic conversion of cloud-neutral application code into a coordinated set of cloud-specific assets using operational feedback as an input wherever appropriate”.

While this answer does capture the essence of the IfC process, it is unlikely to be understood by the uninitiated. To understand what it means, we need to dig deeper and analyze four major types of interaction with cloud services at different levels of cloud API abstraction. At the moment, these four types of cloud service interaction are disconnected from each other, sit at different levels of API abstraction, and are supplied by different vendors, which makes matters worse.

Therefore, the IfC mission is to bring all four types of cloud service interactions into a coherent set directly derived from the application code and to hide, as much as possible, the intricate differences between various levels of API abstraction and vendors.

This, in fact, assumes a detailed analysis of a 4x4 two-dimensional structure illustrated below:

Fig 1: 4 Pillars of Infrastructure From Code

Analyzing a complex multidimensional structure in text, which assumes sequential reading, is always a challenge: whichever axis one starts from, the others will be needed in order to get a complete picture. While I did choose a particular order (types of interactions, then services, vendors, levels of abstraction, and deployment locations), it is by no means the only possible one, so feel free to read the forthcoming chapters in whatever order you find most convenient.

Finding the proper balance between high-level abstract descriptions and concrete examples is another challenge. Most people struggle with an abstract description and will not get it until a concrete example is presented. On the other hand, bringing too many detailed examples inflates the text significantly. Overly specific examples make the whole discussion less general, while overly abstract ones are not clear enough.

Also, providing piecemeal examples here and there alongside the discussion of more abstract concepts does not help with getting the whole picture. On the other hand, finding one coherent example that covers all aspects is close to impossible. In this article, I will use small, focused examples, with full appreciation of how imperfect such an approach is. A walkthrough of one coherent case study might be the subject of a future series of publications.

One may wonder whether such a comprehensive description is truly required. In my opinion, the answer is “yes”: we take too many things for granted without clarifying underlying assumptions, with different parties having different interpretations of seemingly the same concept.

This is the first part of a three-part series organized as follows:

  1. Part One: Types of Communication with Cloud Services
  2. Part Two: Types of Services, Vendors, and APIs
  3. Part Three: Deployment Locations, IfC Mission Elaborated

Types of Interactions with Cloud Services

There are four types of interactions with cloud services, which I decided to call the “Four Pillars of IfC”, namely:

  • Acquisition
  • Configuration
  • Consumption
  • Operation

Let’s briefly look at each one.

Cloud Resources Acquisition

Every cloud resource, be it a Virtual Machine, a Database, or a Network Load Balancer, needs to be somehow acquired, configured and, when not needed any more, released. This is the core essence of cloud resource elasticity: one does not pay for what one does not use. Because of cost, security, and reliability considerations, cloud resource life cycle management is traditionally considered as belonging to the toughest population of a typical organization: system administrators, sometimes disguised as so-called DevOps or Site Reliability Engineers. This keeps infrastructure and application code disconnected (more about this later) and creates tension within the organization. As one wit once educated me: “A good sysadmin hates users”.
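
To make the life cycle concrete, here is a minimal sketch of the acquire/release cycle for the simplest possible resource, assuming boto3 is installed and AWS credentials are configured; the bucket name is hypothetical:

    # A minimal sketch of the acquire/release life cycle; assumes boto3 and
    # configured AWS credentials, the bucket name below is hypothetical.
    import boto3

    s3 = boto3.client("s3")

    # Acquisition: from this point on the resource incurs cost and carries
    # security implications.
    s3.create_bucket(Bucket="my-app-artifacts-example")

    # ... the application uses the bucket ...

    # Release: forgetting this step is exactly the kind of leak that keeps
    # resource life cycle management in the hands of system administrators.
    s3.delete_bucket(Bucket="my-app-artifacts-example")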

Cloud Resources Configuration

Depending on the type of cloud resource to be acquired, some preparation or post-processing might be required. The easiest example to understand is a Virtual Machine instance that comes from a particular image and might need installation and configuration of additional software packages.

Some virtual machines can start from a standard image offered by the cloud platform provider, but some will need a custom-made one. The same happens with Docker images and even databases.

Indeed, once a Database Cluster is created, somebody needs to define a schema for it. Some databases, like AWS Athena, can have a schema associated with them as part of the acquisition template; some others, like AWS Aurora Serverless, require a separate script to be invoked as a post-processing step.

The software installation landscape is highly fragmented and, in the general case, very complex and confusing. I gave a high-level overview of what happens in this area in another article.

The database schema definition landscape is also complex and confusing. Depending on the programming language and personal preferences, one could use plain Data Definition Language, an Object Relational Mapper for the selected programming language, or more advanced Schema Versioning scripts to define the initial version of the database and to control its schema evolution.
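
For instance, the plain-DDL option applied as a post-acquisition step might look like the following minimal sketch; the connection details and the table are hypothetical, and any Python DB API-compatible driver would do:

    # A minimal sketch of initial schema definition with plain DDL, applied as
    # a post-acquisition step; the endpoint, credentials, and table below are
    # hypothetical.
    import pymysql

    connection = pymysql.connect(
        host="my-cluster.cluster-xyz.us-east-1.rds.amazonaws.com",  # hypothetical
        user="admin",
        password="...",      # in practice, fetched from a secrets manager
        database="orders_db",
    )

    with connection.cursor() as cursor:
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS orders (
                order_id  VARCHAR(36) PRIMARY KEY,
                customer  VARCHAR(255) NOT NULL,
                total     DECIMAL(10, 2) NOT NULL
            )
        """)
    connection.commit()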

The biggest challenge with this type of interaction is that if some installation or configuration is not natively supported by the cloud resource acquisition template, it is normally not easy to integrate pre- and post-processing steps with it smoothly: the final solution has a good chance of being clunky and fragile and of failing in all kinds of arcane edge cases. Doable, yet inconvenient.

Cloud Resources Consumption

Once cloud resources are acquired, configured and, optionally, post-processed, one may start using them: invoking cloud functions via an HTTP API, storing new data in a database or cloud storage, retrieving or deleting previously stored data, sending a message to some channel, and so on.

It sounds obvious, but without close coordination with the two previous steps, this one will not work: if the database cluster is not created or its schema is not properly defined, it will be impossible either to store new records or to query existing ones.

Not only this, but if a computation unit (VM instance, container, or cloud function) is not provided with adequate permissions at stage one (and sometimes two), it will fail with something like an “Access Denied” error message. Today, without some form of an IfC solution, this coordination is performed manually.
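
As a small illustration of the failure mode (assuming boto3; the bucket name is hypothetical), the consumption code below is perfectly valid and fails only because the acquisition step did not grant the required permission:

    # A sketch of how a permissions mismatch between acquisition and consumption
    # surfaces at run time; assumes boto3, the bucket name is hypothetical.
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")

    try:
        s3.get_object(Bucket="my-app-artifacts-example", Key="report.csv")
    except ClientError as error:
        # The code is correct; the role it runs under was simply never granted
        # s3:GetObject during acquisition.
        if error.response["Error"]["Code"] == "AccessDenied":
            print("Acquisition and consumption were not coordinated")
        else:
            raise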

To ensure cloud neutrality of the application code, we need to formulate the first IfC rule:

IfC Rule #1: Always wrap cloud resources consumption with a standard interface; if such an interface does not exist, create a new one and contribute it to Open Source.

Even high-level SDKs providing access to cloud resources are very specific to a particular cloud platform and might be quite complicated and unintuitive to use. For example, the AWS RDS Data SDK for Python supports submitting a plain SQL query to a database. While this SDK presumably provides the most optimal option, security- and cost-wise, it is not a standard one. On the other hand, we might consider using a pymysql connection directly while giving up on the native SDK integration. From the IfC perspective, the “right” solution would be to define a db_connection Python Protocol that reflects the Python DB API specification, to contribute it to Open Source (it is strange that the Python standard library does not contain one), and to ensure that the application code uses only this protocol, while providing an automatic implementation on top of the AWS RDS Data SDK for Python behind the scenes.
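
To give a flavor of it, here is a minimal sketch of what such a protocol and an RDS Data API-backed implementation of it could look like; the ARNs are hypothetical, and the real Python DB API specification defines far more than this:

    # A minimal sketch of IfC Rule #1 applied to database access: the application
    # depends only on a small protocol inspired by the Python DB API, while an
    # adapter implements it on top of the AWS RDS Data API. The ARNs are
    # hypothetical, and the real DB API defines much more than shown here.
    from typing import Protocol
    import boto3


    class DbConnection(Protocol):
        def execute(self, sql: str) -> list[tuple]:
            ...


    class RdsDataConnection:
        """Implements the protocol on top of the AWS RDS Data API."""

        def __init__(self, cluster_arn: str, secret_arn: str, database: str):
            self._client = boto3.client("rds-data")
            self._cluster_arn = cluster_arn
            self._secret_arn = secret_arn
            self._database = database

        def execute(self, sql: str) -> list[tuple]:
            response = self._client.execute_statement(
                resourceArn=self._cluster_arn,
                secretArn=self._secret_arn,
                database=self._database,
                sql=sql,
            )
            # Each field comes back as a tagged union such as {"stringValue": "..."};
            # a real adapter would convert types more carefully than this.
            return [
                tuple(next(iter(field.values())) for field in record)
                for record in response.get("records", [])
            ]


    def top_customers(db: DbConnection) -> list[tuple]:
        # Application code depends only on the protocol and stays cloud-neutral.
        return db.execute("SELECT customer, total FROM orders ORDER BY total DESC")

The same protocol could just as well be implemented on top of a pymysql connection for local development, which is exactly the kind of interchangeability the rule is after.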

Access to many cloud resources could be very efficiently wrapped with Python collection abstract interfaces such as Mapping. This by itself is a large topic that deserves a separate publication.
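
As a taste of the idea, here is a hedged sketch of a read-only Mapping view over an S3 bucket, assuming boto3; the bucket name is hypothetical, and pagination and error handling are deliberately omitted:

    # A sketch of wrapping an S3 bucket with the standard Mapping interface;
    # assumes boto3, the bucket name is hypothetical, pagination and error
    # handling are deliberately omitted.
    from collections.abc import Mapping
    from typing import Iterator
    import boto3


    class S3BucketMapping(Mapping):
        """Read-only dict-like view of an S3 bucket: key -> object body (bytes)."""

        def __init__(self, bucket: str):
            self._bucket = bucket
            self._s3 = boto3.client("s3")

        def __getitem__(self, key: str) -> bytes:
            try:
                response = self._s3.get_object(Bucket=self._bucket, Key=key)
            except self._s3.exceptions.NoSuchKey:
                raise KeyError(key)
            return response["Body"].read()

        def __iter__(self) -> Iterator[str]:
            response = self._s3.list_objects_v2(Bucket=self._bucket)
            for item in response.get("Contents", []):
                yield item["Key"]

        def __len__(self) -> int:
            return sum(1 for _ in self)


    # Application code uses it like any other Mapping and stays cloud-neutral:
    reports = S3BucketMapping("my-app-reports-example")
    for name in reports:
        print(name, len(reports[name]))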

Cloud Resources Operation

Once deployed to production, the application code and allocated cloud resources need to be operated:

  • log records need to be generated and, optionally, transferred to a central place for further analysis;
  • metrics need to be calculated, and trends and forecasts need to be derived and compared with targets and baselines;
  • alerts need to be raised and properly disseminated if something goes, or could go, wrong;
  • snapshots of durable data need to be created and kept properly;
  • disaster recovery and failover procedures need to be applied when the system, Heavens forbid, crashes;
  • tickets for detected errors need to be generated and sent to the R&D team;
  • and, last but not least, resource configurations need to be adjusted based on the operational feedback collected so far.
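
To illustrate just one item from this list (assuming boto3; all names, thresholds, and the SNS topic ARN are hypothetical), even publishing a single custom metric and alarming on it is cloud-specific code that has to stay coordinated with everything described above:

    # A sketch of one operational activity: publishing a custom metric and
    # raising an alarm on it; assumes boto3, all names and thresholds are
    # hypothetical.
    import boto3

    cloudwatch = boto3.client("cloudwatch")

    # Emit a metric from the application (or from a log-processing job).
    cloudwatch.put_metric_data(
        Namespace="MyApp/Orders",
        MetricData=[{"MetricName": "FailedOrders", "Value": 3, "Unit": "Count"}],
    )

    # Alert if failures cross a baseline; the SNS topic ARN is hypothetical.
    cloudwatch.put_metric_alarm(
        AlarmName="my-app-failed-orders",
        Namespace="MyApp/Orders",
        MetricName="FailedOrders",
        Statistic="Sum",
        Period=300,
        EvaluationPeriods=1,
        Threshold=10,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:my-app-alerts"],
    )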

As we can see, there is a lot going on in production, and, as with the previous types of activities, without a proper IfC, coordinating cloud resource life cycle management, pre- and post-configuration, and consumption with operation is done manually. The larger the system to be developed and operated, the harder it is to keep the whole picture in the head of even the most gifted person. As time goes by and the system grows, more and more subtle details will fall through the cracks.

References

Publications

Here is the list of all IfC publications I’m aware of, including my own. If there is anything else not included here, drop me a line.

Products

Here is the list of pure IfC products I’m aware of, including my own CAIOS. If there is anything else not included here, drop me a line.

About

The author, Asher Sterkin, is an SVP Engineering and GM at BST LABS. BST LABS is breaking the cloud barrier — making it easier for organizations to realize the full potential of cloud computing through a range of open-source and commercial offerings. We are best known for CAIOS, the Cloud AI Operating System, a development platform featuring Infrastructure-from-Code technology. BST LABS is a software engineering unit of BlackSwan Technologies.

