Why I Actually Centralize My LLM Traffic Through LiteLLM

24 Feb 2026 5 min read Technology
Why I Actually Centralize My LLM Traffic Through LiteLLM

Every new AI tool wants its own API key. After the third or fourth provider, you are no longer managing keys. You are managing regret. I know because I took a screenshot of an env file to send to a colleague, and I spent the next hour rotating keys because I realized the screenshot was on my company Slack.

I use a few different LLM providers. Some are better for coding, some are cheaper for quick tasks, some are just experiments I want to try. The problem is that every tool in my stack, my IDE, my CLI assistant, my automation scripts, wants its own key and its own config. That is how you end up with secrets scattered across laptops, env files, and screenshots.

LiteLLM solves this by putting a single OpenAI-compatible gateway in front of every provider. I point all my tools at one URL. LiteLLM handles the routing, the headers, the retries, and the provider-specific quirks.

What a Gateway Actually Buys You

The value is not just convenience. It is control.

  • One key per client: My tools hold one LiteLLM master key, not five provider keys.
  • Server-side routing: I change the model or provider in one config file, without touching every client.
  • Spend visibility: I can see usage in one place instead of checking five dashboards.
  • Fallbacks: If a provider is down or rate-limited, I can route to another one.

For a solo setup this is nice. For a team it is essential. I do not have a team. But I still like the convenience.

How I Actually Run It

I run LiteLLM on Kubernetes because I already run Kubernetes. It does not have to be on Kubernetes. A Docker container on a homelab server works fine. The important part is that the config lives somewhere persistent and the clients can reach it.

My setup is intentionally simple:

  • A proxy_config.yaml that lists the providers and models I use.
  • Kubernetes Secrets for provider keys and the master key.
  • A Deployment with a NodePort for access. No Ingress because I do not expose it externally.
  • Postgres for LiteLLM’s internal state like spend tracking and key management.

I keep the config on a PVC so I do not lose it when the pod restarts. I keep Postgres outside the gateway pod so database maintenance does not take the gateway down. I learned this the hard way when a Postgres restart killed the gateway pod because they were in the same Deployment.

My Actual Config

Here is my actual proxy_config.yaml:

model_list:
- model_name: kimi-code
litellm_params:
model: openai/kimi-for-coding
api_base: "https://api.kimi.com/coding/v1"
api_key: "os.environ/KIMI_API_KEY"
headers:
User-Agent: "claude-code/0.1.0"
X-Kimi-Client: "Kimi-Code"
- model_name: openrouter-free
litellm_params:
model: openai/openrouter/free
api_key: "os.environ/OPENROUTER_API_KEY"
api_base: "https://openrouter.ai/api/v1"
headers:
HTTP-Referer: "https://blog.eduard3v.com"
X-Title: "LiteLLM-Automation"
- model_name: local-llama
litellm_params:
model: openai/llama3.1
api_base: "http://ollama.ollama.svc.cluster.local:11434/v1"
api_key: "dummy"

The Kimi headers are an example of the kind of provider-specific nonsense a gateway absorbs. Without LiteLLM, every client would need to know about them. The local-llama entry routes to my Ollama instance on the same cluster. This is how I switch between API and local models without changing client config.

Client Configuration

Any OpenAI-compatible client works. I set two values:

  • Base URL: http://litellm.litellm.svc.cluster.local:8000
  • API key: my LiteLLM master key.

Then I choose a model by name. If I want to switch from Kimi to OpenRouter, I change the model name, not the client config. If I want to decommission a provider, I remove it from LiteLLM and no client notices.

What I Actually Watch

Once traffic flows through a gateway, monitoring becomes useful:

  • Request volume and latency per model. I check this weekly.
  • Error rates per provider. This tells me when a provider is flaky.
  • Token spend over time. I have a monthly budget of €100 for API calls. This keeps me honest.
  • Which clients use which models. I discovered my automation script was using the expensive model for simple tasks. I fixed the routing.

LiteLLM exposes Prometheus metrics, and I send them to my existing Grafana setup. The dashboard tells me when a provider is flaky and whether my spend is going where I expect.

What I Actually Pay

ProviderMonthly CostWhat I Use It For
Kimi~€25Coding tasks, complex reasoning
OpenRouter~€15Experiments, fallback models
Local (Ollama)~€0 (electricity only)Simple tasks, privacy-sensitive code
Total~€40/month

Before LiteLLM, I was paying ~€60/month because I was using the wrong model for the wrong task. The gateway helped me see that and fix it. The savings paid for the setup time in two months.

When a Gateway Is Overkill

I would not run LiteLLM for a single client and one provider. If you only use OpenAI from one application, adding a gateway is unnecessary complexity. The value appears when you have multiple clients, multiple providers, or a team that needs shared access.

I have multiple clients and multiple providers. The gateway is worth it for me. If you have one API key and one script, skip it.

My Take on Provider Diversity

I do not believe in betting everything on one LLM provider. Pricing changes, rate limits get tighter, models get deprecated. A gateway lets me treat providers as interchangeable plumbing while keeping my clients stable.

That said, provider diversity has its own cost. More providers means more accounts, more billing, more quirks. I keep my provider list small and add new ones only when they solve a real problem. I have three providers. That is enough.

Conclusion

A centralized LLM gateway is the simplest way I have found to stay in control of AI tooling. It reduces secret sprawl, makes routing decisions server-side, and gives me a single place to watch spend and health.

If you are managing more than one provider or more than one client, I think it is worth the small operational cost. If you are managing one provider and one client, it is probably not.

The screenshot incident taught me that secret sprawl is not just a security risk. It is a time sink. Every key rotation is an hour I could have spent on something else. LiteLLM reduced my key count from five to one. That alone was worth the setup.

#litellm #Kubernetes #AI #Llm #proxy #ai-gateway
Back to Writing End of Post