I have been using Claude Code as my primary coding environment for infrastructure work for six months. This is not a getting-started tutorial — there are plenty of those. This is a practical guide to the workflow patterns that actually work for Terraform and Python DevOps tooling, and an honest account of where the friction still lives.


The IaC workflow

The diagram below shows the session loop I have settled on for Terraform work:

[Figure: Claude Code IaC workflow]

The key discipline is step 4 (iterate on the diff, not on the description). Once you have a concrete diff to react to, the conversation becomes precise. Going back and forth on descriptions produces increasingly vague suggestions.


CLAUDE.md for infrastructure repositories

The most important setup step is a well-written CLAUDE.md in your repository root. Claude Code reads this file as persistent context for every session.

# CLAUDE.md

## Terraform conventions
- Resource names: snake_case.
- All resources tagged: environment, owner, cost_center.
- Variable types must be explicit — never `any`.
- No `count` for resources that differ logically — use `for_each` with a map.
- Do not add error handling or validation that the task did not ask for.
- Do not refactor surrounding code — only change what was explicitly requested.

## Module structure
modules/
  storage/     — Azure Storage Accounts
  networking/  — VNets, subnets, NSGs
  aks/         — AKS cluster and node pools

## Commands
- Validate: terraform validate
- Plan:     terraform plan -out=tfplan
- Apply:    terraform apply tfplan  (human approval required)

The two most important rules are the last two under Terraform conventions. Without them, Claude Code will helpfully refactor surrounding code, add unused variables, and introduce abstractions you did not ask for.
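Conventions like the tagging rule can also be checked mechanically instead of relying on the session to remember them. The following is a minimal sketch, assuming the plan has been exported with `terraform show -json tfplan`; the function name `missing_tags` and the sample plan are hypothetical and written by hand for illustration.

```python
import json

# Tags required by the CLAUDE.md conventions above.
REQUIRED_TAGS = {"environment", "owner", "cost_center"}


def missing_tags(plan: dict) -> dict[str, list[str]]:
    """Map each resource address to the required tags absent from its planned state."""
    result = {}
    for rc in plan.get("resource_changes", []):
        # "after" is null for resources that are being destroyed.
        after = rc["change"].get("after") or {}
        if "tags" not in after:
            # Resource type has no tags attribute, or tags are not known at plan time.
            continue
        tags = after["tags"] or {}
        missing = sorted(REQUIRED_TAGS - tags.keys())
        if missing:
            result[rc["address"]] = missing
    return result


# Hand-written sample in the shape of `terraform show -json` output (illustrative only).
SAMPLE_PLAN = json.loads("""
{
  "resource_changes": [
    {
      "address": "azurerm_storage_account.logs",
      "change": {"actions": ["create"], "after": {"tags": {"environment": "prod"}}}
    }
  ]
}
""")

print(missing_tags(SAMPLE_PLAN))
# {'azurerm_storage_account.logs': ['cost_center', 'owner']}
```

A check like this fits naturally between the plan and apply commands listed in CLAUDE.md, before the human approval step.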


Tutorial: extracting a flat main.tf into modules

Before

A 600-line main.tf with everything in one file:

# main.tf (flat)

resource "azurerm_resource_group" "main" {
  name     = "prod-euw-platform-rg"
  location = "westeurope"
}

resource "azurerm_storage_account" "logs" {
  name                      = "prodeuwlogsasa"
  resource_group_name       = azurerm_resource_group.main.name
  location                  = azurerm_resource_group.main.location
  account_tier              = "Standard"
  account_replication_type  = "LRS"
  min_tls_version           = "TLS1_2"
  enable_https_traffic_only = true
  tags = {
    environment = "prod"
    owner       = "platform-team"
    cost_center = "infra-ops"
  }
}

resource "azurerm_kubernetes_cluster" "main" {
  name                = "prod-euw-aks"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  dns_prefix          = "prod-euw"
  # ... 80 more lines
}

Step 1: Describe the target structure to Claude Code

Extract the storage account resources into modules/storage/.
The module should accept: name, resource_group_name, location, tags.
It should output: id, primary_blob_endpoint.
Only move the storage resources. Do not touch the AKS or RG resources.

Step 2: Review the proposed diff

Claude Code proposes the following new files:

# modules/storage/main.tf
variable "name"                { type = string }
variable "resource_group_name" { type = string }
variable "location"            { type = string }
variable "tags"                { type = map(string) }

resource "azurerm_storage_account" "this" {
  name                      = var.name
  resource_group_name       = var.resource_group_name
  location                  = var.location
  account_tier              = "Standard"
  account_replication_type  = "LRS"
  min_tls_version           = "TLS1_2"
  enable_https_traffic_only = true
  tags                      = var.tags
}

output "id"                    { value = azurerm_storage_account.this.id }
output "primary_blob_endpoint" { value = azurerm_storage_account.this.primary_blob_endpoint }

# main.tf (updated)
module "logs_storage" {
  source              = "./modules/storage"
  name                = "prodeuwlogsasa"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  tags = {
    environment = "prod"
    owner       = "platform-team"
    cost_center = "infra-ops"
  }
}

Step 3: Validate before accepting

terraform plan
Plan: 0 to add, 0 to change, 0 to destroy.

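A refactor that produces a non-empty plan means something changed unintentionally, so stop and investigate. Note that moving the resource changes its address from azurerm_storage_account.logs to module.logs_storage.azurerm_storage_account.this. The plan is only empty if that move is recorded, either with a moved block (Terraform 1.1 or later) or with terraform state mv; otherwise Terraform plans to destroy the existing storage account and create a new one.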
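The empty-plan check can be scripted so that it does not depend on reading the summary line by eye. Below is a minimal sketch that inspects the JSON form of a saved plan (as produced by `terraform show -json tfplan`) and lists every resource with an action other than `no-op` or `read`. The function name `unexpected_changes` and the sample data are hypothetical; the sample shows what the plan would contain if the state did not follow the storage account to its new module address.

```python
import json

# Actions that do not modify real infrastructure.
HARMLESS_ACTIONS = {"no-op", "read"}


def unexpected_changes(plan: dict) -> list[str]:
    """Return "address: actions" for every resource the plan would modify."""
    changed = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if not set(actions) <= HARMLESS_ACTIONS:
            changed.append(f'{rc["address"]}: {"/".join(actions)}')
    return changed


# Hand-written sample in the shape of `terraform show -json` output (illustrative only).
SAMPLE_PLAN = json.loads("""
{
  "resource_changes": [
    {"address": "azurerm_resource_group.main",
     "change": {"actions": ["no-op"]}},
    {"address": "azurerm_storage_account.logs",
     "change": {"actions": ["delete"]}},
    {"address": "module.logs_storage.azurerm_storage_account.this",
     "change": {"actions": ["create"]}}
  ]
}
""")

for line in unexpected_changes(SAMPLE_PLAN):
    print(line)
# azurerm_storage_account.logs: delete
# module.logs_storage.azurerm_storage_account.this: create
```

An empty list means the refactor is safe to accept. A delete/create pair like the one in the sample is the signal to stop before anything is applied.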


Tutorial: writing pytest coverage for operational Python scripts

Claude Code makes writing tests fast enough that I now do it by default for all operational scripts.

The script

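# cost_report.py
from datetime import datetime, timezone

import boto3


def get_monthly_cost(account_id: str, month_offset: int = 0) -> dict:
    """Return AWS Cost Explorer data for the given month offset (0 = current month)."""
    client = boto3.client("ce", region_name="us-east-1")
    now = datetime.now(timezone.utc)  # utcnow() is deprecated since Python 3.12

    # Step back whole calendar months; subtracting 30-day blocks can land in the
    # wrong month (1 March minus 30 days is still in January).
    index = now.year * 12 + (now.month - 1) - month_offset
    start = f"{index // 12:04d}-{index % 12 + 1:02d}-01"
    # End is exclusive in Cost Explorer, so use the first day of the next month.
    # This keeps the request to a single month and avoids Start == End on day 1.
    end = f"{(index + 1) // 12:04d}-{(index + 1) % 12 + 1:02d}-01"

    response = client.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        Filter={"Dimensions": {"Key": "LINKED_ACCOUNT", "Values": [account_id]}},
    )
    return response["ResultsByTime"][0]["Total"]["UnblendedCost"]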

The tests (Claude Code generates these from a single request)

# test_cost_report.py
import pytest
from unittest.mock import patch, MagicMock
from cost_report import get_monthly_cost

MOCK_RESPONSE = {
    "ResultsByTime": [
        {"Total": {"UnblendedCost": {"Amount": "1234.56", "Unit": "USD"}}}
    ]
}

@pytest.fixture
def mock_ce_client():
    with patch("cost_report.boto3.client") as mock:
        client = MagicMock()
        client.get_cost_and_usage.return_value = MOCK_RESPONSE
        mock.return_value = client
        yield client

def test_returns_cost_dict(mock_ce_client):
    result = get_monthly_cost("123456789012")
    assert result["Amount"] == "1234.56"
    assert result["Unit"] == "USD"

def test_calls_correct_account(mock_ce_client):
    get_monthly_cost("123456789012")
    call_kwargs = mock_ce_client.get_cost_and_usage.call_args.kwargs
    assert "123456789012" in call_kwargs["Filter"]["Dimensions"]["Values"]

@pytest.mark.parametrize("offset,expected_month", [
    (0, "current"),
    (1, "previous"),
])
def test_month_offset(mock_ce_client, offset, expected_month):
    # Just verify it runs without error for valid offsets
    result = get_monthly_cost("123456789012", month_offset=offset)
    assert "Amount" in result
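The parametrized test above only confirms that the call succeeds for each offset; it does not check which dates are sent to Cost Explorer. One way to test the date logic itself is to isolate it in a pure function that takes the current date as an argument. This is a minimal sketch, assuming a hypothetical helper `month_window` that is not part of the original script, and treating the end date as exclusive:

```python
from datetime import date


def month_window(today: date, month_offset: int = 0) -> tuple[str, str]:
    """Return (start, end) ISO dates for the calendar month `month_offset` months before `today`.

    `end` is exclusive: it is the first day of the month after `start`.
    """
    # Count months since year 0 so that stepping back across January is plain arithmetic.
    index = today.year * 12 + (today.month - 1) - month_offset
    start = date(index // 12, index % 12 + 1, 1)
    following = index + 1
    end = date(following // 12, following % 12 + 1, 1)
    return start.isoformat(), end.isoformat()


print(month_window(date(2024, 3, 15)))     # ('2024-03-01', '2024-04-01')
print(month_window(date(2024, 3, 15), 1))  # ('2024-02-01', '2024-03-01')
print(month_window(date(2024, 1, 31), 1))  # ('2023-12-01', '2024-01-01')
```

Because the function never reads the clock, the year boundary and the short-February cases can be pinned with ordinary assertions and no mocking.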

Honest productivity numbers

| Task type | Multiplier vs. without Claude Code |
| --- | --- |
| New resource blocks, variable wiring | 4–5× |
| Module extraction and refactoring | 3–4× |
| Writing pytest fixtures | |
| Understanding unfamiliar code | |
| Architectural decisions (what to build) | 1.1× |

Claude Code amplifies execution. It does not replace judgement. The right division of labour: let it handle the expression; you own the architecture.


Where I still hit friction

Large repos and module nesting. On a Terraform monorepo with dozens of modules, Claude Code sometimes loses track of how a variable flows through three levels of nesting. Fix: paste the relevant outputs and variable declarations explicitly into the conversation rather than assuming they are in context.
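Collecting those declarations by hand gets tedious on a large repository. A small script can list each module's variables and outputs so they can be pasted into the conversation. This is a rough sketch, assuming declarations start on their own line in the usual `variable "name"` / `output "name"` form; the function names are hypothetical, and a regular expression is a shortcut here, not a real HCL parser.

```python
import re
from pathlib import Path

# Matches the opening of a variable or output block at the start of a line.
DECLARATION = re.compile(r'^\s*(variable|output)\s+"([^"]+)"', re.MULTILINE)


def declarations(hcl_text: str) -> dict[str, list[str]]:
    """Return the variable and output names declared in a piece of Terraform source."""
    found = {"variable": [], "output": []}
    for kind, name in DECLARATION.findall(hcl_text):
        found[kind].append(name)
    return found


def module_interface(module_dir: Path) -> dict[str, list[str]]:
    """Merge the declarations from every *.tf file directly inside module_dir."""
    merged = {"variable": [], "output": []}
    for tf_file in sorted(module_dir.glob("*.tf")):
        for kind, names in declarations(tf_file.read_text()).items():
            merged[kind].extend(names)
    return merged


# Excerpt from the storage module shown earlier in this post.
SAMPLE = '''
variable "name"                { type = string }
variable "tags"                { type = map(string) }
output "id"                    { value = azurerm_storage_account.this.id }
'''

print(declarations(SAMPLE))
# {'variable': ['name', 'tags'], 'output': ['id']}
```

Running `module_interface(Path("modules/storage"))` for each level of nesting gives a compact summary of how values can flow between modules, which is usually enough context to keep the session on track.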

Overly helpful suggestions. Despite the CLAUDE.md rules, Claude Code occasionally adds error handling or abstractions not requested. The fix is explicit instruction: “Only change what I asked. Do not refactor surrounding code.”

State-dependent operations. Claude Code works on the code, not the deployment. For anything involving terraform state mv, taint, or import, I do it manually with the output in front of me.

Six months in, I am not going back.