Agentic DevOps: Why I Think MCP Servers Are the Right Abstraction

I gave an AI agent kubectl access once. It deleted the wrong namespace because I worded the prompt poorly. That was the day I stopped believing in “just give it a shell and supervise it.”

I have been building and thinking about infrastructure agents for a while. The part that excites me is not autonomous remediation at scale. It is the idea that an agent can read a log, summarize a failure, and suggest a next step before a human even opens the incident channel.

The part that scares me is giving that agent a shell. I learned that the hard way.

This post is about why I think MCP servers are the right answer to that tension, and what I believe a safe agentic DevOps setup looks like. It is based on things I actually built and things I actually broke.

What Agentic DevOps Means to Me

Agentic DevOps is the use of AI agents to observe, reason about, and act on infrastructure. The shift from traditional automation is not that the agent is smarter. It is that the agent works with ambiguous, incomplete signals and has to choose among several plausible responses.

A traditional script says: if memory is over 90%, scale up. An agent says: memory is climbing, there was a deployment 10 minutes ago, and the error logs mention a memory leak in the new version. It might recommend a rollback. It might recommend a restart. It might recommend escalating to a human.

That ambiguity is useful. It is also dangerous if the agent has too much power. I found this out when I connected Kimi Code to my homelab cluster with a generic “help me manage Kubernetes” prompt. Within ten minutes it had deleted a namespace that contained my monitoring stack. The namespace name was similar to the one I actually wanted to clean up. The agent did not know the difference because I had not told it.
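
To make the contrast between a fixed rule and an agent's view concrete, here is a minimal sketch. Every name in it (threshold_rule, IncidentContext, candidate_actions) and the 30-minute deploy window are hypothetical. The second function does not model how an LLM reasons; it only shows that the input is a bundle of signals and the output is a set of options rather than a single action.

```python
from dataclasses import dataclass


def threshold_rule(memory_pct: float) -> str:
    """Traditional automation: one signal, one fixed response."""
    return "scale_up" if memory_pct > 90 else "no_action"


@dataclass
class IncidentContext:
    """Hypothetical bundle of signals an agent would be handed."""
    memory_pct: float
    minutes_since_deploy: int
    log_excerpt: str


def candidate_actions(ctx: IncidentContext) -> list[str]:
    """List the options on the table; choosing one is left to the agent or a human."""
    options = ["escalate_to_human"]  # always available
    if ctx.memory_pct > 90:
        options.append("restart")
    # A recent deploy plus a leak in the logs makes a rollback plausible.
    if ctx.minutes_since_deploy <= 30 and "memory leak" in ctx.log_excerpt.lower():
        options.append("rollback")
    return options
```

The rule always returns one answer. The context-based version returns several, which is exactly where the usefulness and the risk both come from.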

Why MCP Servers Matter

MCP servers sit between the agent and your infrastructure. They expose a small, well-defined set of tools instead of raw API access. The agent sees get_pod_logs, list_deployments, and restart_deployment. It does not see kubectl exec or cluster-admin.

This matters for three reasons:

Least privilege by design. You decide exactly what each tool can do. I learned this after the namespace incident. Now my agents only see what I explicitly expose.
Auditable actions. Every tool call is a discrete, loggable event. When something goes wrong, I know exactly what the agent tried to do.
Composable infrastructure. The same MCP server works with any compatible agent. I built one for my homelab and tested it with Kimi Code, Continue.dev, and a custom LangGraph agent.

I think of MCP servers as the USB-C port of agentic infrastructure. One protocol, many peripherals, clear boundaries. The boundary is what I was missing when I gave the agent raw kubectl access.
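
The first two points, least privilege and auditable actions, can be illustrated without any MCP library at all. The sketch below is not the MCP SDK; ToolRegistry and its methods are hypothetical names, and the registered tool is a stand-in that returns canned data. It shows the idea only: the agent can call what is registered and nothing else, and every attempt leaves a log record.

```python
import json
import time
from typing import Any, Callable


class ToolRegistry:
    """Hypothetical registry: the agent can call only what is registered here."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., Any]] = {}
        self.audit_log: list[str] = []  # in practice these would go to a log store

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def call(self, name: str, **kwargs: Any) -> Any:
        allowed = name in self._tools
        # Record every attempt, including rejected ones, as one JSON line.
        record = {"ts": time.time(), "tool": name, "args": kwargs, "allowed": allowed}
        self.audit_log.append(json.dumps(record, default=str))
        if not allowed:
            raise PermissionError(f"tool not exposed: {name}")
        return self._tools[name](**kwargs)


registry = ToolRegistry()
# Stand-in implementation; a real tool would query the cluster.
registry.register("list_deployments", lambda namespace: [f"{namespace}/api-gateway"])
```

Calling registry.call("list_deployments", namespace="default") succeeds, while a call to an unregistered name raises PermissionError. Both attempts end up in audit_log.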

The Architecture I Believe In

A production agentic system has four layers:

1. Observability plane: Prometheus, Loki, PagerDuty, whatever you already use. The agent reads from here.
2. MCP server layer: Translates infrastructure APIs into scoped tools.
3. Agent runtime: The LLM, tool calling, memory, and planning layer.
4. Governance layer: Policies, audit logs, rate limits, and approval queues.

Most teams want to skip straight to layer 3, the agent runtime. I think layers 2 and 4, the MCP servers and governance, are where the real work is. I spent two weekends building the agent runtime before I realized I needed the MCP server layer first. That was a mistake I do not want to repeat.
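
As one small piece of the governance layer, here is a sketch of a rate limit on tool calls. It assumes a single in-process limiter; the class name, the sliding-window approach, and the injectable clock are my choices for illustration, not something a particular framework prescribes.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` tool calls per `window_s` seconds."""

    def __init__(
        self,
        max_calls: int,
        window_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._calls: Deque[float] = deque()

    def allow(self) -> bool:
        now = self._clock()
        # Drop timestamps that have fallen out of the window.
        while self._calls and now - self._calls[0] >= self.window_s:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            return False
        self._calls.append(now)
        return True
```

A tool handler would call allow() before doing any work and return an error to the agent when it gets False. An agent stuck in a retry loop then hits the limit instead of hammering the cluster API.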

Supervised Autonomy Is the Only Safe Default

I do not believe in fully autonomous agents touching production. Not yet. The right model for almost everyone is supervised autonomy:

Read-only tools run without approval.
Reversible actions run with notification.
Destructive actions require explicit human approval.

This is not cowardice. It is engineering. The cost of a wrong autonomous action in production is much higher than the cost of waiting five minutes for a human. I learned this when my Kimi Code agent suggested a pod restart that would have interrupted a long-running batch job. The restart was technically correct, since the pod was unhealthy, but the timing would have corrupted the job output. I caught it because I was reviewing the suggestion. If I had enabled auto-execute, I would have spent my Sunday rebuilding the job.
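
The three tiers map naturally onto a small lookup table. This is a sketch under assumptions: the enum names and the decide function are hypothetical, the tool classification is mine, and delete_namespace is included only as an example of a destructive tool (it is not one I expose).

```python
from enum import Enum


class Risk(Enum):
    READ_ONLY = "read_only"
    REVERSIBLE = "reversible"
    DESTRUCTIVE = "destructive"


class Decision(Enum):
    RUN = "run"
    RUN_AND_NOTIFY = "run_and_notify"
    REQUIRE_APPROVAL = "require_approval"


# Hypothetical classification of tools by risk tier.
TOOL_RISK = {
    "get_pod_logs": Risk.READ_ONLY,
    "list_deployments": Risk.READ_ONLY,
    "restart_deployment": Risk.REVERSIBLE,
    "delete_namespace": Risk.DESTRUCTIVE,
}

# The supervised-autonomy policy: one decision per tier.
POLICY = {
    Risk.READ_ONLY: Decision.RUN,
    Risk.REVERSIBLE: Decision.RUN_AND_NOTIFY,
    Risk.DESTRUCTIVE: Decision.REQUIRE_APPROVAL,
}


def decide(tool_name: str) -> Decision:
    """Unclassified tools are treated as destructive, so the default is the strictest tier."""
    risk = TOOL_RISK.get(tool_name, Risk.DESTRUCTIVE)
    return POLICY[risk]
```

The default matters more than the table. A tool nobody remembered to classify should land in the approval queue, not run silently.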

A Simple MCP Server Example

Here is the shape of a Kubernetes log query server I actually built after the namespace incident:

import asyncio

from kubernetes import client, config
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("k8s-logs")

# Load credentials from the local kubeconfig file.
config.load_kube_config()
v1 = client.CoreV1Api()


@mcp.tool()
async def get_pod_logs(
    pod_name: str,
    namespace: str = "default",
    tail_lines: int = 100,
) -> str:
    """Get recent logs from a Kubernetes pod (read-only)."""
    try:
        # The Kubernetes client is synchronous, so run the call in a
        # worker thread instead of blocking the server's event loop.
        return await asyncio.to_thread(
            v1.read_namespaced_pod_log,
            name=pod_name,
            namespace=namespace,
            tail_lines=tail_lines,
        )
    except client.exceptions.ApiException as e:
        # Return the failure as text so the agent can report it.
        return f"Kubernetes API error: {e.status}: {e.reason}"


if __name__ == "__main__":
    mcp.run(transport="stdio")

This is read-only, scoped to logs, and fails cleanly. That is the baseline for any tool I would let an agent use. I run this on my homelab now. The agent can ask about logs but it cannot delete anything.
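
One further restriction fits this baseline, although it is not part of the server above: limiting which namespaces a tool will read from. The sketch below is an assumption about how that could look. ALLOWED_NAMESPACES, its contents, and check_namespace are hypothetical names.

```python
# Hypothetical allowlist; exact names only, no prefix or pattern matching.
ALLOWED_NAMESPACES = frozenset({"default", "production-apps"})


def check_namespace(namespace: str) -> str:
    """Return the namespace if the tool may read it, otherwise raise."""
    if namespace not in ALLOWED_NAMESPACES:
        raise PermissionError(f"namespace not allowed: {namespace}")
    return namespace
```

Exact matching is deliberate. Two similarly named namespaces are what caught me out, so a near miss should be rejected rather than guessed at.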

My Implementation Path

If I were starting from scratch today, I would do this:

Build one read-only MCP server for observability.
Connect it to an agent and ask natural language questions about infrastructure.
Add one reversible action with human approval.
Run that for a month and measure outcomes.
Only then consider expanding autonomy.

The temptation is to automate everything at once. The safe path is to prove one narrow capability at a time. I know because I tried the fast path first. It cost me a namespace and a Sunday.

What I Actually Run Now

On my homelab cluster, I run three MCP servers:

k8s-logs: Read-only log queries. The agent uses this daily.
k8s-metrics: Read-only Prometheus queries. The agent uses this for context.
k8s-safe-restart: Pod restart with a confirmation gate. The agent has used this twice in three months. Both times I approved it manually.

I do not run an MCP server that can edit resources yet. Restarting pods is the only change my agents can request. The gap between “read-only” and “can restart pods” is smaller than the gap between “can restart pods” and “can edit deployments.” I am taking my time crossing that second gap.
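
A confirmation gate can be as simple as a two-step queue: the agent files a request, and nothing runs until a human approves it through a path the agent cannot reach. This is a sketch of that shape, not the code behind k8s-safe-restart. ApprovalQueue and restart_pod are hypothetical, and the restart is a stand-in that only records that it ran.

```python
import secrets
from typing import Callable


class ApprovalQueue:
    """Hypothetical two-step gate: the agent requests, a human approves."""

    def __init__(self) -> None:
        self._pending: dict[str, Callable[[], str]] = {}

    def request(self, action: Callable[[], str]) -> str:
        """Called by the agent-facing tool. Nothing is executed yet."""
        ticket = secrets.token_hex(4)
        self._pending[ticket] = action
        return ticket

    def approve(self, ticket: str) -> str:
        """Called by a human, outside the agent's tool set."""
        action = self._pending.pop(ticket)  # KeyError if unknown or already used
        return action()

    def reject(self, ticket: str) -> None:
        self._pending.pop(ticket, None)


executed: list[str] = []


def restart_pod() -> str:
    executed.append("api-gateway")  # stand-in for the real restart call
    return "restarted api-gateway"
```

Because approve pops the ticket before running the action, an approval cannot be replayed. The agent only ever sees the ticket ID and, later, the result.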

What I Learned About Prompts

The namespace incident was not just about tool scope. It was about prompt ambiguity. I said “clean up the old monitoring namespace” and the agent picked the wrong one because I had two monitoring-related namespaces and did not specify which.

Now I write prompts like this:

Cluster: homelab-k3s
Namespace: production-apps
Task: Check logs for the pod named api-gateway-7d9f4b2c1-x5k9p
Constraints: Read-only. Do not restart, delete, or modify anything.

The constraints line is non-negotiable. Even with MCP servers, I add it because the agent might suggest something outside the tool scope and I want to be explicit.
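
To keep myself from skipping a field, the template can be generated rather than typed. A minimal sketch, assuming the four fields shown above; the TaskPrompt class and its default constraints text are hypothetical conveniences, not part of any agent framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskPrompt:
    cluster: str
    namespace: str
    task: str
    constraints: str = "Read-only. Do not restart, delete, or modify anything."

    def render(self) -> str:
        # Refuse to render a prompt that leaves any field blank.
        for name in ("cluster", "namespace", "task", "constraints"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")
        return (
            f"Cluster: {self.cluster}\n"
            f"Namespace: {self.namespace}\n"
            f"Task: {self.task}\n"
            f"Constraints: {self.constraints}"
        )
```

An empty namespace raises ValueError instead of producing a prompt the agent has to guess its way through.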

Conclusion

Agentic DevOps is not about removing humans. It is about giving humans better tools. MCP servers make that possible by defining clear boundaries between what an agent can see, what it can do, and what needs human judgment.

Start small, read-only, supervised. Increase autonomy only after the agent proves it deserves it. I learned this the hard way so you do not have to.

The namespace I lost was not production. It was my homelab monitoring stack. I rebuilt it in a few hours. If it had been production customer data, I would be writing a very different post.