Permanent vs. Ephemeral Kubernetes Environments: A Comprehensive Guide

We all have at least one permanent environment: production. Beyond that, we need somewhere to test changes and future releases to ensure that all the pieces work together as expected. This is why we have one or more test environments. The intuitive solution is simply to make copies of our production environment; that way, we create “permanent” test environments.

This idea seems reasonable and has been widely adopted. However, it has several drawbacks. For one thing, test environments are not required 24/7, so keeping them running all the time is wasteful. In addition, when several teams deploy their new applications to the same environment, we introduce interdependencies, such as one team's bugs slowing down the others. The latter problem may be solved by having multiple permanent test environments (stage, pre-prod, UAT, feature, demo, sandbox). However, every environment we add increases cost and maintenance effort, slowing down the deployment rate along the way.

Permanent environments are not the only solution; all our test environments could be ephemeral, staying up and running only for as long as we need them. With Kubernetes and modern infrastructure-as-code tooling, creating an environment on demand has never been so simple. For more insights on how Kubernetes can facilitate such environments, you can check out Stakater's Kubernetes Consultancy services.

In this guide, we will explore the differences between permanent and ephemeral environments, evaluating the pros and cons of each. Assuming we all start the journey with permanent environments, we will look at the tooling required to move to ephemeral environments. The considerations between permanent and ephemeral are not specific to Kubernetes. Still, as we will see in the final part of the article, Kubernetes is an enabler in mastering ephemeral environments.

When/Why do we need development environments?

Several types of environments are typically used in the software development life cycle (SDLC). To better understand the difference between permanent and ephemeral environments, let's first categorize the types of environments we may have or need. The definition is pretty loose and may vary from company to company, but here is a general picture of the most common cases:

Pre-production or staging: Test releases before they are sent to production.
User Acceptance Testing (UAT): Dedicated to QA or stakeholders so they can approve releases before they ship to production. Similar to staging or pre-production, but some companies like adding an extra UAT environment.
Feature or demo environments: Used to test new features or demonstrate the product to stakeholders (Sprint demo, partners). They are often used to showcase specific functionality or to provide a space for experimentation without affecting the production environment.
Development or testing environments: Used by developers to quickly test their work before merging their pull request or promoting their code to another environment.

All environments aim to test new code or features before deploying them to production. They allow teams to work on their code without affecting the production environment. This helps to catch bugs and errors early in the development process, reducing the risk of a bad release.

The critical question here is: why do we need different types of environments?

First, environments act as a series of nets that catch different types of bugs, each one looking at a different aspect of the product (applications, components, integrations, etc.). Second, having more environments keeps the late-stage environments stable, which makes testing there easier. For instance, the development environment may receive dozens or hundreds of releases daily because developers work on their applications concurrently, while staging or production will only receive release candidates once or twice a day, depending on our company's QA process and deployment rate.

Permanent Environments: What’s wrong with them?

We strive to create the optimal workflow to deliver software. A tempting solution is to identify the different steps of the software delivery lifecycle and provide a dedicated environment for each step. Having multiple environments should allow teams to work in parallel, thus having multiple efforts progressing through the chain of environments.

Figure: A typical permanent environment flow

But this is a fallacy: mapping each development stage to an environment creates dependencies between teams that ultimately become bottlenecks in your process, turning it into something more Waterfall than Agile.

If you think about it, not all environments are always required.

Pre-production or staging: A permanent one wastes money because we only need it for an hour before a release. Features should be delivered just in time to a production-like environment if we want small releases, a higher deployment frequency, and a lower risk of a faulty release.
Feature/demo environments: We may need to give a demo to stakeholders, but we don’t need a permanent environment for that. This environment should exist only for as long as it is required.
Development environment: It is an aberration to have all developers deploy their work-in-progress applications to the same environment. Our applications should not be tested against an unstable version of the applications they depend on. Instead, developers should work in isolation to ship faster to production. The development environment should be a "fork" of the production environment, and the new code should be tested in that fork.

Not to mention that relying on permanent environments is a worrying sign given all our efforts in infrastructure-as-code (IaC), GitOps, and so on. The ability to recreate an environment should be a given; otherwise, why invest in all that tooling and those best practices if we cannot guarantee that we can rebuild our infrastructure in no time?

Ephemeral Environments: The Way Forward for our Infrastructure?

The capability of having on-demand environments is what ephemeral environments are all about. Developers or testers create environments to reproduce issues, validate releases or test specific scenarios. When working with ephemeral environments, we mainly use:

Feature Branch Environments: For teams that use feature branching, each branch can have its own environment, allowing developers to test their code in isolation.
Pull Request Environments: Similar to feature branch environments, pull request environments can be created automatically when a new pull request is opened or ready to merge, allowing the reviewer to test and approve the changes before merging.
Integration Testing Environments: These are created automatically when a new version of a microservice is ready to be deployed, allowing for integration testing with other microservices.
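The pull request flow described above can be sketched as a small mapping from events to environment actions. This is only an illustration, not any particular CI system's API: the event names (`opened`, `reopened`, `synchronize`, `closed`) and the `pr-<number>` naming scheme are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical event names; real Git hosts and CI systems use their own vocabulary.
CREATE_EVENTS = {"opened", "reopened"}
UPDATE_EVENTS = {"synchronize"}
DESTROY_EVENTS = {"closed"}


@dataclass(frozen=True)
class EnvironmentAction:
    action: str       # "create", "update", "destroy", or "ignore"
    environment: str  # name of the ephemeral environment


def plan_action(event: str, pr_number: int) -> EnvironmentAction:
    """Map a pull request event to an action on that pull request's environment."""
    name = f"pr-{pr_number}"  # assumed convention: one environment per pull request
    if event in CREATE_EVENTS:
        return EnvironmentAction("create", name)
    if event in UPDATE_EVENTS:
        return EnvironmentAction("update", name)
    if event in DESTROY_EVENTS:
        return EnvironmentAction("destroy", name)
    return EnvironmentAction("ignore", name)


if __name__ == "__main__":
    for event in ("opened", "synchronize", "closed"):
        print(event, "->", plan_action(event, 42))
```

Tying the teardown to the `closed` event is what keeps the environment ephemeral: it lives exactly as long as the pull request does.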

When we think about it, ephemeral environments simplify many use cases. Take load testing: an environment can be spun up on demand to simulate traffic and test the scalability of our application. Load tests are sometimes skipped because isolated environments are difficult to create. Likewise, nothing prevents us from spinning up a demo or preview environment for stakeholders to experiment with new features.

Having ephemeral environments is a sign of operational excellence. Ephemeral environments require proper automation that works consistently. DevOps and platform engineers should focus on infrastructure-as-code that is reproducible. Ensuring that rerunning our code produces the same result is crucial. Unfortunately, many companies have nice GitOps and Terraform repositories that would never work if used to create a new copy of an environment. They live in the dream that using those tools gives them that ability, but they never test it, so the dream never comes true.

In general, ephemeral environments can be less expensive to run than permanent environments because they are only spun up when needed and then torn down. This means that we only pay for the computing resources we use. However, setting up and managing ephemeral environments can require more upfront work and technical expertise, which has its own cost in time and resources.
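To make the running-cost argument concrete, here is a rough back-of-the-envelope comparison. Every number in it (the hourly price, the hours of use per day, the number of working days) is a made-up assumption for illustration, and the upfront engineering cost mentioned above is deliberately left out; plug in your own figures.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 hours per year / 12)


def monthly_cost(hourly_rate: float, hours_running: float) -> float:
    """Compute cost of an environment that runs for the given number of hours."""
    return hourly_rate * hours_running


def compare(hourly_rate: float, hours_used_per_day: float, working_days: int) -> dict:
    """Compare an always-on environment with one that only runs while in use."""
    permanent = monthly_cost(hourly_rate, HOURS_PER_MONTH)
    ephemeral = monthly_cost(hourly_rate, hours_used_per_day * working_days)
    return {
        "permanent": permanent,
        "ephemeral": ephemeral,
        "savings_ratio": 1 - ephemeral / permanent,
    }


if __name__ == "__main__":
    # Hypothetical inputs: 2.00 per hour, used 6 hours a day, 21 working days.
    result = compare(hourly_rate=2.0, hours_used_per_day=6, working_days=21)
    print(f"permanent: {result['permanent']:.2f}")
    print(f"ephemeral: {result['ephemeral']:.2f}")
    print(f"savings:   {result['savings_ratio']:.1%}")
```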

Kubernetes and Ephemeral Environments: How to get started

One way to create ephemeral environments is by isolating each environment in a separate Kubernetes namespace. It simplifies resource management and access control, as each environment can have its own set of resources and permissions.
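A minimal sketch of the namespace-per-environment idea, assuming environments are named after Git branches. The helper names, the `environment-type` label, the `env` prefix, and the quota values are all hypothetical. The manifests are built as plain dictionaries and printed as JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json
import re

MAX_LABEL_LENGTH = 63  # namespace names must be valid DNS-1123 labels


def namespace_name(branch: str, prefix: str = "env") -> str:
    """Turn a branch name into a valid namespace name (lowercase, digits, '-')."""
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    name = f"{prefix}-{slug}" if slug else prefix
    # Truncate, then make sure the name does not end with a dash.
    return name[:MAX_LABEL_LENGTH].rstrip("-")


def namespace_manifests(branch: str) -> list:
    """Build a Namespace and a ResourceQuota for one ephemeral environment."""
    name = namespace_name(branch)
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name, "labels": {"environment-type": "ephemeral"}},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "environment-quota", "namespace": name},
        # Illustrative limits only; size them for your own workloads.
        "spec": {"hard": {"requests.cpu": "4", "requests.memory": "8Gi", "pods": "30"}},
    }
    return [namespace, quota]


if __name__ == "__main__":
    manifests = namespace_manifests("feature/JIRA-123_add-login")
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

The output could be piped to `kubectl apply -f -`. Deleting the namespace later removes everything created inside it, which is what makes a namespace a convenient boundary for teardown.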

To simplify the creation of ephemeral environments, adopting a GitOps approach is recommended. GitOps uses version control systems like Git to manage infrastructure and application deployment. In GitOps, all infrastructure and application configurations are stored as code in a Git repository. Creating a new environment should be as simple as duplicating the Kubernetes manifests of an existing environment and replacing some values. Typically, Kustomize or Helm can help by providing a configuration mechanism for each environment. With Kustomize, you can override existing values to make each ephemeral environment unique. With Helm, values files let you provide different parameters to each environment.
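The override mechanism can be illustrated without either tool. The sketch below layers per-environment overrides on top of base values with a recursive merge, which is roughly how Helm combines several values files (maps are merged key by key, everything else is replaced). The keys (`image.tag`, `ingress.host`, `replicas`) and the host name are hypothetical.

```python
import copy


def merge_values(base: dict, overrides: dict) -> dict:
    """Return a new dict where overrides are layered on top of base."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are maps: merge them key by key.
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars and lists replace the base value entirely.
            merged[key] = copy.deepcopy(value)
    return merged


# Values shared by every environment (hypothetical keys).
BASE_VALUES = {
    "replicas": 3,
    "image": {"repository": "registry.example.com/shop", "tag": "stable"},
    "ingress": {"host": "shop.example.com"},
}


def values_for_pull_request(pr_number: int, image_tag: str) -> dict:
    """Derive the values of an ephemeral environment from the base values."""
    overrides = {
        "replicas": 1,  # a preview environment rarely needs more
        "image": {"tag": image_tag},
        "ingress": {"host": f"pr-{pr_number}.preview.example.com"},
    }
    return merge_values(BASE_VALUES, overrides)


if __name__ == "__main__":
    print(values_for_pull_request(42, "sha-abc123"))
```

Only the values that differ are stated per environment; everything else is inherited, so an ephemeral environment stays as close to the base configuration as possible.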

With Kubernetes and GitOps, you are very close to automating ephemeral environment creation. However, we still need a way to provision the shared resources our application needs, such as databases, caches, buckets, and other cloud configurations. This is where the design of your platform comes into play; it should not matter to the developer how this is achieved. Applications do not need to know where they run; they should expect environment variables to tell them what to do: where the database is, how to connect, how to call other services, and so on.
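From the application's side, this contract can be as small as a settings object populated from environment variables. A minimal sketch follows; the variable names (`DATABASE_URL`, `CACHE_URL`, `ORDERS_SERVICE_URL`) are hypothetical and stand for whatever your platform injects.

```python
import os
from dataclasses import dataclass

# Hypothetical variable names; use whatever your platform injects.
REQUIRED_VARIABLES = ("DATABASE_URL", "CACHE_URL", "ORDERS_SERVICE_URL")


@dataclass(frozen=True)
class Settings:
    database_url: str
    cache_url: str
    orders_service_url: str


def load_settings(env=None) -> Settings:
    """Read settings from the environment and fail fast if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARIABLES if name not in env]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        cache_url=env["CACHE_URL"],
        orders_service_url=env["ORDERS_SERVICE_URL"],
    )


if __name__ == "__main__":
    # Example values as an ephemeral environment might provide them.
    example = {
        "DATABASE_URL": "postgres://db.pr-42.svc:5432/shop",
        "CACHE_URL": "redis://cache.pr-42.svc:6379/0",
        "ORDERS_SERVICE_URL": "http://orders.pr-42.svc",
    }
    print(load_settings(example))
```

Because the code never hard-codes a host name, the same image runs unchanged in production and in any number of short-lived environments.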

There are several options for providing the missing infrastructure provisioning:

The simplest one would be to use Terraform. With Terraform, we could create modules that provide all the missing pieces. Variable definitions (.tfvars) files can provide the mechanism to create ephemeral environments.
We can also go 100% Kubernetes by using a Kubernetes operator that leverages custom resource definitions (CRDs) to describe cloud resource provisioning. This comes with the advantage that all configurations are in the same place and format and are applied using our GitOps tooling. Solutions like Crossplane are increasingly popular for offering that capability.
We can also use shared resources to further reduce the cost of resources like databases. In that case, it is advisable to build pipelines that create the database access and permissions each environment needs. You can continue reading our blog to learn more about the best practices for maintaining and managing a Kubernetes test environment.
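As an illustration of the Terraform option above, the following sketch writes one variable definitions file per environment. It relies on Terraform accepting JSON variable files whose names end in `.tfvars.json`; the variable names (`environment`, `db_instance_class`, `ttl_hours`) and their defaults are hypothetical and would have to match the variables declared in your own modules.

```python
import json
from pathlib import Path

# Hypothetical defaults shared by all ephemeral environments.
DEFAULT_VARIABLES = {"db_instance_class": "small", "ttl_hours": 8}


def build_tfvars(environment: str, **overrides) -> dict:
    """Build the variable values for one ephemeral environment."""
    variables = {"environment": environment, **DEFAULT_VARIABLES}
    variables.update(overrides)
    return variables


def write_tfvars(environment: str, directory: Path, **overrides) -> Path:
    """Write <environment>.tfvars.json into the given directory and return its path."""
    path = directory / f"{environment}.tfvars.json"
    path.write_text(json.dumps(build_tfvars(environment, **overrides), indent=2) + "\n")
    return path


if __name__ == "__main__":
    print(json.dumps(build_tfvars("pr-42", ttl_hours=4), indent=2))
```

A file produced this way could then be passed to Terraform with `-var-file=pr-42.tfvars.json`, so the same modules serve every environment and only the variable file changes.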

Figure: Environment architecture for Kubernetes

Overall, the capability to create ephemeral environments is not a far-fetched dream once Kubernetes has been adopted. It is nothing that a team of experienced DevOps or platform engineers should be afraid of.

Choosing the Right Environment for Your Needs

While a permanent environment may seem more straightforward to implement and maintain, it has significant drawbacks, such as increased costs and team interdependencies. Ephemeral environments may require more upfront work and technical expertise, but they can ultimately be less expensive and, most importantly, enable faster delivery.

It's essential to carefully consider the needs of your organization and the technical requirements before deciding which to use. For instance, consider the adoption of Kubernetes with a GitOps approach. The declarative pattern of GitOps makes it much easier to duplicate existing environments. Unfortunately, deploying applications to Kubernetes is not sufficient on its own; you must also consider infrastructure resources such as databases, caches, events, and cloud configuration.

In sum, ephemeral environments are the logical choice, but they will require an upfront investment in creating the provisioning process. Only once that gap is closed can we truly abandon waterfall delivery and adopt a genuine Agile approach.

Stakater Blog
