Manipulating Terraform states for fun, profit, and reusability (github.com/ergomake)
54 points by lucasfcosta on Aug 22, 2023 | 18 comments


> You could also try using the count and for_each meta-arguments, but there's only so much you can do until complexity hits you in the head. And let's be honest, if something is called a "meta-argument", it's probably not a good idea.

Let’s actually be honest: count and for_each are the workable solution (built into the language) to the problem you’re describing, and you’re waving them away to justify whatever hack it is we’re about to be presented with.
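For what it's worth, the for_each version of "a bucket per developer" is pretty tame. A minimal sketch (all names are made up for illustration):

```hcl
variable "developers" {
  type    = set(string)
  default = ["alice", "bob"]
}

# One bucket per developer, keyed by name rather than by fragile
# count indices, so adding/removing a developer doesn't shuffle state.
resource "aws_s3_bucket" "dev" {
  for_each = var.developers
  bucket   = "myorg-dev-${each.key}"
}
```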


It gets worse…

> Besides the practical advantages of not having to modify .tf files every time someone needs a new bucket, this approach also completely detaches the concept of a desired state with the concept of an actual state.

Yeah, who wanted their infrastructure in version control anyway.

But seriously, if you want a new bucket without _all the hassle_ of using Terraform, just use the CDK, or maybe even a CLI one-liner?


As far as I can tell, the built-in way to do almost exactly what's presented here is workspaces: configure once, get multiple states for multiple environments. (Shared config comes via outputs from remote state in another project/stack; it is conceptually separate, after all.)
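A minimal sketch of the workspace approach (the bucket name is illustrative): the same config produces a distinct state, and distinct resources, per workspace.

```hcl
# Create/select a workspace before applying:
#   terraform workspace new dev
#   terraform workspace select dev

resource "aws_s3_bucket" "data" {
  # terraform.workspace interpolates the current workspace name,
  # so each workspace gets its own bucket and its own state file.
  bucket = "myorg-data-${terraform.workspace}"
}
```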


Tools like Atlantis understand workspaces and can apply changes across multiple environments as part of automated infrastructure management.

Last time I had to edit Terraform state, 1) it didn't work (I was trying to downgrade to an older version of the state file after a newer version of Terraform had modified it) and 2) it was more work than recreating the environment from scratch.


I used to be on the product team @ Terraform.

The overwhelming majority of uses for state surgery are break-glass hacks for other issues; the fact that people had to resort to it so frequently was a bug, and something we were working hard to fix. The one you called out was (hopefully) addressed by having a stable schema, or at least one that could be read by newer and older versions in a consistent way, since (IIRC) v0.14. Config-driven moves (`moved` statements in HCL) were a big one, especially for refactoring into or between modules. The recent import functionality is another in that long line. Some of the historical version-upgrade-specific issues we had to address retrospectively, so the fixes were more akin to guided workflows than core changes, and they only existed in Terraform Cloud (though not as paid-for features, last I knew).
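To make the config-driven alternatives concrete, here's a sketch of both (resource and module names are hypothetical): refactors that used to require `terraform state mv` or `terraform import` can now be declared in HCL.

```hcl
# Record a refactor: the resource moved into a module, and Terraform
# updates state on the next apply instead of planning a destroy/create.
moved {
  from = aws_s3_bucket.my_bucket
  to   = module.storage.aws_s3_bucket.my_bucket
}

# Adopt a pre-existing bucket into state (Terraform 1.5+):
import {
  to = aws_s3_bucket.existing
  id = "some-preexisting-bucket-name"
}
```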

I'm sure we didn't get every single use case solved. But I'd recommend that anyone who finds themselves reaching for a state command double-check that there isn't now a better way. If there isn't, it's worth making sure someone on the Terraform team is aware of that fact.


A minor thing, but the article says that the Terraform AWS provider uses the AWS CLI to manage resources. That's not the case: it uses the aws-sdk-go library to talk directly to the AWS APIs.


This feels brittle to me. I'd stick to creating layers by having separate root modules with their own state files (eg: in the example you could have a kubernetes root module shared across developers, and a buckets root module for the S3 buckets).

You'd then typically have multiple state files per root module applied with different tfvars to create multiple copies of the same infrastructure (eg: dev/stage/prod). Workspaces can help with this, especially when using terraform cloud.

As other comments have mentioned, you can use outputs to share data between state files, and to make this nicer you can wrap the remote state blocks in a module that re-exports the outputs, so that you get IntelliSense/autocomplete (though it's a bit annoying that you can't plan speculatively across state files where output changes are involved).
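The wrapper idea looks something like this (a sketch; backend details and output names are made up): a tiny module reads the remote state and re-exposes its values as typed module outputs, which editors can then autocomplete.

```hcl
# modules/core-state/main.tf
data "terraform_remote_state" "core" {
  backend = "s3"
  config = {
    bucket = "myorg-tfstate"
    key    = "core/terraform.tfstate"
    region = "us-east-1"
  }
}

# Re-export the remote output under a stable, discoverable name:
output "vpc_id" {
  value = data.terraform_remote_state.core.outputs.vpc_id
}
```

Consumers then reference `module.core_state.vpc_id` instead of reaching into the raw `terraform_remote_state` data source everywhere.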

This page goes into a bit more detail about separating components of your infrastructure into discrete state files, and talks about when (Terraform CLI) workspaces may or may not be appropriate: https://developer.hashicorp.com/terraform/cli/workspaces

In terms of creating the buckets per developer: I think there's a bigger question of whether they need a bucket each or some larger subset of the infrastructure, and this should guide how you organise things. It'll need to tie into your strategy for production infrastructure (eg: single vs multi-tenancy, and how you want to manage deploying to different regions).

An aside, but I'd probably argue that for development purposes, sharing a bucket but having your applications accept a bucket prefix (giving each developer a namespace of sorts) in addition to a bucket name is simpler, and lends itself to quicker ephemeral deployments that avoid the need to change underlying infrastructure.
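A minimal sketch of that shared-bucket layout (all names are hypothetical): one bucket in Terraform, with the per-developer namespace handled purely as a key prefix the application receives.

```hcl
# One shared bucket for all developers; no per-dev infra changes needed.
resource "aws_s3_bucket" "shared" {
  bucket = "myorg-dev-shared"
}

variable "developer" {
  type = string
}

# The app is given both the bucket and a prefix, and writes all its
# objects under s3://myorg-dev-shared/<developer>/...
output "app_s3_bucket" {
  value = aws_s3_bucket.shared.bucket
}

output "app_s3_prefix" {
  value = "${var.developer}/"
}
```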


This is a really nice write-up and a clear explanation of a real organisational problem with Terraform. It seems more like a design pattern to me, though; I don't quite see the need for a specific tool.

Terraform has the ability to work with remote tfstate files and to reference resources in other tfstate files via terraform_remote_state, and unique instances can be labelled via a tfvar ID used as a prefix/suffix. Is this a polished wrapper around those mechanisms, or is there more going on as well?


First, the main.tf example with my_bucket and my_configs is bad: it's using the same resource names.

Second, I run pretty complex meta-arguments (for_each, etc.) and have never encountered an issue.

Third, using a random string as a bucket name? So we're abstracting the bucket name, which will just confuse someone who runs aws s3 ls or looks at S3 in the console. If Ops is dealing with an outage related to S3, they have to go in and figure out what (I assume) tag of someone's name is attached to what bucket. It's 5am here and my brain is barely churning, but I'm trying to figure out how else you would know whose bucket is whose, aside from tagging the bucket "dalton."

I could possibly see it if you used the SAME ID for a developer for EVERY object; then I could go in and say, "Oh, 112345123145 is Dalton and his state is in the 112345123145 bucket." But you're randomizing all of them.

Then you use random IDs for namespaces as well. This feels like an ops nightmare. It's great if you have devs with the same names, I guess.


You'd examine the state file, which holds the identity of the infrastructure.

You can also export S3 bucket names as an output.

You should also be using tags to pin ownership.
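Both suggestions together look something like this (a sketch; tag values and names are illustrative): the random suffix stays, but ownership is recoverable from either the console tags or `terraform output`.

```hcl
resource "random_id" "suffix" {
  byte_length = 4
}

resource "aws_s3_bucket" "dev" {
  bucket = "dev-${random_id.suffix.hex}"

  # Pin ownership in tags so the console answers "whose bucket is this?"
  tags = {
    Owner = "dalton"
    Team  = "platform"
  }
}

# Export the generated name so nobody has to guess it from aws s3 ls.
output "dev_bucket_name" {
  value = aws_s3_bucket.dev.bucket
}
```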


But your alerts are on AWS. You take away context from the place where it's needed most.


I really want to see the shitshows of TF environments that inspire needless solutions like this.


Despite its other quirks and pain points, I feel that AWS CloudFormation does a better job of separating the ideas of a set of desired resources (Templates) and their deployed instances (Stacks).

I've seen a lot of Terraform best-practices guides for project organisation that are a mess of nested folders for environments and modules, with boilerplate provider and backend configuration in each of them. The built-in solution (Workspaces) can also be helpful, but doesn't work in contexts where deeper segregation between environments is required.

My personal approach to living happily with Terraform is to have a set of standard root modules (network, db, etc.) and have a wrapper like Ansible that generates the backend configuration on-the-fly for whatever environment is being targeted. I'd also have it load variables from a central configuration store and expose them as TF_VAR_... so a standard set of variables is always available.
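The on-the-fly backend trick relies on Terraform's partial backend configuration. A minimal sketch (bucket/key values are whatever the wrapper generates, shown here as examples):

```hcl
terraform {
  # Intentionally empty: the wrapper supplies the per-environment
  # values at init time via -backend-config flags or a generated file.
  backend "s3" {}
}

# The wrapper then runs something like:
#   terraform init \
#     -backend-config="bucket=myorg-tfstate" \
#     -backend-config="key=dev/network.tfstate" \
#     -backend-config="region=us-east-1"
```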


So, workspaces? TFA doesn't mention them at all. Maybe it does something more/different, but it should at least explain how; that's the obvious question/confusion.


The built-in solution is workspaces, stacks, and modules, plus making sure modules have reentrancy.


The TL;DR is that ergomake/layerform is a GPL v3 licensed package/service that lets one create instances of infrastructure sets through composition of Terraform. It presents itself with Kubernetes examples, but its README states the goal of handling any type of infrastructure.

I sort of fail to see why one can't just write reusable Terraform modules to accomplish the same. If you don't want someone modifying a base layer, then that is achieved with OWNERS files in source control, and role-based access control on the identities/credentials the Terraform plan uses against the underlying infra services.

But for those familiar with Kubernetes already, it is essentially Kustomize [1] for Terraform. Although if you are using Kubernetes already, the alternative is to write an Operator yourself to create a self-serve environment provisioner. That way the operators have their own service accounts, roles, and secrets, and quotas/rate limits can be achieved through admission control. There's an existing way for reporting to know that things failed to deploy, and so on.

[1] https://kustomize.io


Does it come with flattening modules and being able to merge/patch resources or child modules' resources?

I fiddled with the idea of recreating this with terranix, but kind of lost interest after the license changes.

The basic motivation is to adjust/extend 3rd party modules without forking them.


Hmm, we did something similar using workspaces in Terraform. If I had known about this before, I might have reevaluated that approach.

It would be cool if we could migrate from workspaces to layerform.



