I Don't Want AI to Replace DevOps. I Want It to Read the Docs I'm Too Tired to Read

Published on 7 May 2026 07:49 AM

It's 2 AM. The pager went off eleven minutes ago. You're staring at a Kubernetes upgrade advisory that's forty-seven paragraphs long, and somewhere in paragraph thirty-one there's a breaking change about how EKS handles PodIdentity federation with IAM roles. You know it's in there. You read it three months ago. But right now your brain is running on caffeine and cortisol, and the words are blurring into each other.

You could run the upgrade now and hope for the best. Or you could spend forty minutes re-reading the entire changelog, the Terraform provider notes, the Helm chart migration guide, and three different Slack threads from the last time someone did this.

This is the part of DevOps nobody puts in conference talks. Not the elegant GitOps pipelines or the slick dashboards. The part where you're exhausted and you still have to make a decision that affects production, and the information you need is spread across nine browser tabs, a Confluence page from 2023, and a runbook that was last updated when your cluster was on 1.24.

This is where I want AI to help. Not by taking over. Not by running kubectl apply on my behalf while I sleep. By reading the damn docs for me.

The kind of tired that matters

The Google SRE Workbook has a word for what happens when engineers spend too much time on repetitive operational work: toil. They define it as "the repetitive, predictable, constant stream of tasks related to maintaining a service." Rollouts, upgrades, alert triage, manual repairs, ticket-driven provisioning. Google puts a hard cap on it: no more than 50% of an SRE's time should go to operational work.

The reasoning isn't just about efficiency. The workbook makes a point that has always stuck with me: time spent on toil is time not spent where human judgment, creativity, and design thinking matter.

Here's what I think the SRE Workbook doesn't fully capture, at least not in those exact words. There's a specific kind of toil that doesn't look like toil. It doesn't involve clicking buttons or running the same script for the hundredth time. It's cognitive. It's the mental cost of assembling context from scattered sources before you can make a decision.

Reading a Kubernetes release notes page that's 3,000 words long to find the one deprecation that affects your cluster. Comparing two versions of a Helm values.yaml to understand what changed between chart versions 4.2.1 and 5.0.0. Skimming a Terraform provider changelog to see if the aws_eks_cluster resource changed its default behavior. Correlating an incident timeline from last Thursday with the deployment that happened two hours before the spike in 5xx errors.

This work isn't glamorous. It doesn't produce artifacts. Nobody thanks you for spending an hour reading release notes. But if you skip it, you miss the breaking change that takes down a service at 3 AM on a Sunday.

Sometimes the most exhausting part of an incident is not fixing the issue. It is building enough context to feel safe fixing it.

I think of this as cognitive toil, and AI is unusually well suited to help with it.

Why I don't want an AI agent with production access

Before I talk about what I do want, let me be clear about what I don't.

I don't want an AI agent that has kubectl apply access by default. I don't want one that can merge PRs, push to main, modify IAM policies, or restart services without a human in the loop. I've seen enough production incidents caused by humans who were tired, rushed, or copy-pasting from the wrong terminal. Giving that same power to something that hallucinates API flags and invents Kubernetes resources that don't exist is not progress. It's a new category of incident.

In application code, an AI mistake might fail a test. In DevOps, an AI mistake might page five teams, drain the wrong node, rotate the wrong secret, or turn a small incident into a very educational afternoon.

The Stack Overflow 2025 Developer Survey backs this up. 76% of developers don't plan to use AI for deployment or monitoring tasks. Not because they're luddites. Because they know what's at stake. More developers actively distrust AI accuracy (46%) than trust it (33%). Only 3% highly trust it. That is the part that makes people nervous: AI can sound confident even when the answer still needs careful verification.

In DevOps, "almost right" isn't a minor inconvenience. An "almost right" IAM policy is a security incident. An "almost right" Kubernetes manifest is a workload that runs fine until it doesn't, and then you're debugging at 2 AM wondering why the liveness probe path changed. An "almost right" Terraform plan is a production resource that gets destroyed and recreated instead of updated in place.

The problem is not that AI is useless. The problem is that AI is useful enough to make dangerous workflows look reasonable. In DevOps, the gap between "sounds correct" and "safe to execute" is where incidents live.

The hard part of DevOps is rarely knowing the command. kubectl apply -f manifest.yaml isn't the hard part. The hard part is knowing whether that command is safe in this environment, with this version of Kubernetes, with these admission controllers, with this cluster autoscaler configuration, right after that EKS add-on got updated. That requires context, judgment, and accountability. AI is genuinely useful for the first two, but it can't own the third. Not yet. Maybe not ever.

Most production work is not blocked because nobody knows how to type kubectl. It is blocked because nobody is completely sure what is safe to do next.

What I actually want AI to do

I want AI to be the colleague who actually reads the release notes before standup. The one who highlights the three things that matter out of a forty-seven-paragraph changelog. The one who can look at a Terraform plan diff and tell you, in plain language, what's about to change and what might break.

Concretely, here's what that looks like.

When I'm going from Kubernetes 1.29 to 1.30, I want something that tells me what got deprecated, what changed in API versions, and what I need to act on before upgrading. Skip the boilerplate about "improved performance." Focus on the removals and behavioral changes.
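That kind of triage is mechanical enough to sketch. Here's a minimal, hypothetical example of the idea: filter a release-notes blob down to the lines that likely require operator action. The sample notes and keyword list are invented; a real tool would need a much smarter model of what "actionable" means.

```python
import re

# Hypothetical snippet of release notes; real ones run thousands of words.
RELEASE_NOTES = """\
Improved scheduler performance for large clusters.
The flowcontrol.apiserver.k8s.io/v1beta3 API is deprecated; migrate to v1.
Fixed a typo in kubectl help output.
The in-tree azureFile volume plugin has been removed.
New metrics added for pod startup latency.
"""

# Words that usually signal a removal or behavioral change, not boilerplate.
ACTION_WORDS = re.compile(r"deprecat|remov|no longer|migrate", re.IGNORECASE)

def actionable_lines(notes: str) -> list[str]:
    """Keep only the lines that likely require action before upgrading."""
    return [line for line in notes.splitlines() if ACTION_WORDS.search(line)]

for line in actionable_lines(RELEASE_NOTES):
    print("-", line)
```

A keyword filter is obviously crude; the point is that the summarize-and-surface step is automatable, and the decision about what to do with the two surviving lines stays with a human.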

Before I update the VPC CNI add-on, I want to know if this version is compatible with my current Kubernetes version, my node group AMI, and the Calico network policy version we're running. That compatibility matrix is spread across three AWS docs pages, and it changes every quarter.
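Once that matrix is assembled in one place, the check itself is trivial. A sketch, with entirely made-up version numbers, just to show where the human effort actually goes: the hard part is keeping the matrix current, not querying it.

```python
# Hypothetical compatibility matrix; the real one lives across several AWS
# docs pages and changes often, which is exactly why I want a machine to track it.
CNI_MATRIX = {
    "v1.16.x": {"k8s": {"1.27", "1.28", "1.29"}},
    "v1.18.x": {"k8s": {"1.28", "1.29", "1.30"}},
}

def compatible(cni_version: str, k8s_version: str) -> bool:
    """Check whether a CNI add-on version supports a Kubernetes version."""
    entry = CNI_MATRIX.get(cni_version)
    return entry is not None and k8s_version in entry["k8s"]

print(compatible("v1.18.x", "1.29"))  # True
print(compatible("v1.16.x", "1.30"))  # False
```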

When the AWS Terraform provider goes from 5.x to 6.x, I don't want to read the entire migration guide. I want to know which resources I'm actually using that changed behavior. Focus on my code, not the universe of possibilities.
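"Focus on my code" is an intersection problem: the resource types I actually declare, crossed with the resource types the migration guide says changed. A hypothetical sketch, with invented Terraform source and an invented changed-resources list:

```python
import re

# Hypothetical Terraform source; in practice you'd read every *.tf file.
TF_SOURCE = """\
resource "aws_eks_cluster" "main" {
  name = "prod"
}

resource "aws_s3_bucket" "logs" {
  bucket = "my-logs"
}
"""

# Hypothetical set of resource types the 6.x migration guide flags as changed.
CHANGED_IN_V6 = {"aws_eks_cluster", "aws_opensearch_domain", "aws_lb_listener"}

def resources_in_use(source: str) -> set[str]:
    """Extract the resource types declared in Terraform source."""
    return set(re.findall(r'resource\s+"([a-z0-9_]+)"', source))

# Only the migration-guide entries that touch *my* code matter.
affected = resources_in_use(TF_SOURCE) & CHANGED_IN_V6
print(sorted(affected))
```

Two resource types in use, three in the guide, one in the intersection: that one is the only part of the migration guide worth reading closely tonight.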

When I'm upgrading a Helm chart from 4.x to 5.x, show me what changed in the default values: which new keys were introduced, which old keys were removed, which ones changed their default behavior. Better yet, cross-reference my current values.yaml and tell me which of my overrides are now invalid.
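The cross-reference step is a pair of dictionary comparisons. A minimal sketch, assuming values have already been flattened to dotted keys (all three dictionaries here are invented):

```python
# Hypothetical flattened values: chart 4.x defaults, 5.x defaults, my overrides.
OLD_DEFAULTS = {"image.tag": "4.2.1", "replicaCount": 2, "ingress.enabled": False}
NEW_DEFAULTS = {"image.tag": "5.0.0", "replicas": 2, "ingress.enabled": True}
MY_OVERRIDES = {"replicaCount": 5, "ingress.enabled": True}

def review_overrides(old: dict, new: dict, mine: dict):
    """Flag overrides whose keys vanished, and defaults that silently changed."""
    removed = [k for k in mine if k in old and k not in new]
    changed_defaults = {
        k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]
    }
    return removed, changed_defaults

removed, changed = review_overrides(OLD_DEFAULTS, NEW_DEFAULTS, MY_OVERRIDES)
print("overrides now invalid:", removed)
print("defaults that changed:", changed)
```

Here the sketch would flag that my replicaCount override no longer does anything, and that a default I never overrode quietly flipped. Both are exactly the kind of thing I miss when skimming a 5.0.0 changelog at 2 AM.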

If I inherit a cluster with 200 custom resources I've never seen before, help me understand what they do without reading CRD documentation for six hours.

When an incident happens, take the Slack thread, the PagerDuty timeline, and the post-mortem notes, and produce a runbook that the next on-call engineer can actually follow. One that isn't three years stale.

When the error rate spiked at 14:32 and something was deployed at 14:15, pull the deployment diff, the relevant log lines, and the metrics shift into one view so I can see the connection without switching between four tools.
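The correlation step is a time-window join. A hypothetical sketch with invented deploy events, just to show the shape of the question "what changed shortly before the spike?":

```python
from datetime import datetime, timedelta

# Hypothetical event streams; in practice these come from your CD tool and metrics.
DEPLOYS = [("14:15", "checkout v2.4.1"), ("13:02", "cart v1.9.0")]
SPIKE_AT = "14:32"
WINDOW = timedelta(minutes=30)

def suspects(deploys, spike_at, window):
    """Return deploys that landed within `window` before the spike."""
    spike = datetime.strptime(spike_at, "%H:%M")
    out = []
    for ts, what in deploys:
        t = datetime.strptime(ts, "%H:%M")
        if timedelta(0) <= spike - t <= window:
            out.append((ts, what))
    return out

print(suspects(DEPLOYS, SPIKE_AT, WINDOW))
```

Correlation isn't causation, and a real tool would attach the diff and the log lines. But surfacing "checkout was deployed 17 minutes before the spike" as a starting hypothesis is most of what I need at minute one of an incident.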

When five services are throwing errors and the logs are a wall of stack traces, filter out the noise, group the unique errors, and tell me which one started first. That's the one I care about.
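De-duplicating the wall of stack traces and ordering by first occurrence is, again, mechanical. A minimal sketch over invented log lines, grouping identical error signatures and sorting by when each was first seen:

```python
# Hypothetical log lines in "HH:MM:SS <service> <error>" form.
LOGS = [
    "14:15:02 checkout ConnectionRefused: payments:8443",
    "14:15:09 cart TimeoutError: upstream checkout",
    "14:15:11 checkout ConnectionRefused: payments:8443",
    "14:15:20 gateway TimeoutError: upstream cart",
    "14:15:31 cart TimeoutError: upstream checkout",
]

def group_errors(lines):
    """Group identical error signatures; report first timestamp and count."""
    first_seen, counts = {}, {}
    for line in lines:
        ts, signature = line.split(" ", 1)
        counts[signature] = counts.get(signature, 0) + 1
        first_seen.setdefault(signature, ts)
    # Sorted by first occurrence: the earliest unique error is the best
    # root-cause candidate; the rest are usually cascade.
    return sorted((first_seen[s], counts[s], s) for s in first_seen)

for ts, count, sig in group_errors(LOGS):
    print(f"{ts}  x{count}  {sig}")
```

Five lines collapse to three unique errors, and the checkout-to-payments failure comes first. That ordering is the "which one started first" answer I asked for; whether it's actually the root cause is still my call.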

None of these require production access. None require the AI to execute anything. They require it to read, understand, summarize, compare, and present information so I can decide faster.

The best DevOps AI will not feel magical. It will feel like a senior engineer left clean notes before going on vacation.

The data says this approach works

GitHub's study on Copilot found something interesting beyond speed. 87% of developers said AI helped them preserve mental effort during repetitive tasks. 73% said it helped them stay in flow. 60-75% said it helped them focus on more satisfying work. One senior engineer put it simply: with AI, they had to think less about the boring stuff, and when they had to think, it was the fun stuff.

The DORA research on generative AI adds an important nuance. Developers who use AI extensively report higher job satisfaction, more time in flow state, and less burnout. But there's a catch: AI adoption didn't reduce time spent on toilsome, repetitive tasks. It sped up the valuable work developers already enjoyed, but didn't crack the code on automating drudgery. DORA also found that a 25% increase in AI adoption was associated with a decrease in delivery stability, because AI lets teams generate more code and more changes faster than their review and testing processes can handle.

Read that last sentence again. AI doesn't hurt stability because it writes bad code. It hurts stability because it lets teams produce more work than their feedback loops can safely absorb.

This is exactly why the read-summarize-suggest model is the right one for DevOps. It gives engineers better context without adding unreviewed changes to the pipeline. It accelerates understanding without bypassing approval. It reduces the time between "I need to figure this out" and "I understand enough to decide" without collapsing the distance between "I decided" and "it's done."

A boundary that matters

I'm not anti-agent. I think autonomous AI agents will eventually have a role in infrastructure operations. But the keyword is eventually, and the prerequisite is trust, and trust is earned slowly and lost quickly.

Stack Overflow also shows developers are much more cautious with high-responsibility work. Most respondents do not plan to use AI for deployment or monitoring. These are not people who hate AI. These are people who know where the blast radius lives.

The DORA report reinforces this: trust directly drives AI productivity. Developers who trust AI accept more suggestions, submit more changes, and spend less time searching for information. But DORA also found that 39% of developers still trust AI outputs "a little" or "not at all."

In DevOps, trust isn't about vibes. It's about being right when being wrong has consequences. An AI that summarizes a changelog and misses a breaking change is annoying but survivable. An AI that applies a change based on that incomplete summary is a production incident.

The line I draw is simple.

Let AI read. Let AI summarize. Let AI compare. Let AI draft. Let AI suggest.

But make humans approve, execute, and own.

The fatigue I want to replace

DevOps has a burnout problem. This isn't news. The on-call rotations, the incident pressure, the constant context switching between ten different tools and three different cloud providers and a pile of documentation that's always slightly out of date.

The fatigue is real. It accumulates. It's not the dramatic kind where someone screams and quits. It's the quiet kind where you stop reading the full changelog because you've read forty of them and nothing ever breaks, until one Tuesday it does. Where you stop updating the runbook because nobody reads it anyway, including you. Where you start copy-pasting Terraform modules from the last project because you don't have the energy to check if the AWS provider changed the defaults again.

AI can't fix organizational dysfunction. It can't fix understaffed on-call rotations or unreasonable SLAs. But it can reduce the cognitive tax of the work that sits between "I got paged" and "I understand what is happening." It can give you back the thirty minutes you'd have spent re-reading docs you already read once. It can catch the breaking change you'd have missed at 2 AM.

I don't want AI to replace DevOps engineers. I want it to replace the exhaustion that makes us worse at the job we're good at. I want it to be the thing that reads the docs so I can focus on deciding what to do with what they say. I want it to handle the reading so I can handle the thinking.

That's not a smaller vision. It's a more honest one.


