How to Stop Your AI Agent From Running Up a Bill

TL;DR

AI agent costs go wrong in four ways: metered API runaways, cloud resources spun up without supervision, paid third-party calls, and retry storms. Three precautions catch most of them — a subscription plan or a hard spend cap, agent-level rules against creating billable resources, and session length capped at two to three hours. Set them up before the first session, not after the first surprise bill.

While testing an agent workflow for this site, we watched the agent hit a build error for the eighth time in roughly fifteen minutes. Each retry was a fresh API call. The session was on a fixed-rate subscription, so the daily limit kicked in and the agent paused. On metered API billing with no cap, by our math, we would have burned through several hundred dollars by the time we noticed.

That session is the reason this article exists. AI agent costs do not go wrong gradually. They go wrong in a single afternoon. The fix is a structure you set once, before the first session that can hurt you.

This piece covers the four cost-runaway patterns we have seen, the three precautions that catch most of them, and the safeguards for the cases spend caps cannot reach.

The four cost-runaway patterns

Cost failures fall into four distinct categories. Each has a different cause, and each needs a different control.

| Pattern | What goes wrong | Worst-case cost shape |
| --- | --- | --- |
| Metered API runaway | Agent retries the same failing call hundreds of times on per-token billing | Several thousand dollars in an afternoon |
| Cloud resource creation | Agent provisions infrastructure (servers, databases, storage) on your account | Recurring monthly costs until you find and remove the resources |
| Paid third-party API calls | Agent calls a billable external service without your noticing | Variable, depends on the service |
| Retry storms | Error → fix → new error → new fix loop, billed each iteration | Compounds with metered API to produce the worst incidents |

The retry storm is the pattern behind most of the horror stories. A misconfigured loop can burn through a month's budget in a few hours. The good news is that every category has a known prevention, and the preventions stack: putting all of them in place takes about five minutes.

What we have seen

A near-miss is the easiest way to describe the shape of the risk.

On a small test build for The Executive OS, we asked an agent to clean up a build configuration. The agent got the first part wrong, hit an error, attempted a fix, hit a different error, attempted another fix. Each iteration was a full API call. The cycle ran fast — eight iterations in roughly fifteen minutes.

This stayed a near-miss for one reason: the session was on a fixed-rate subscription. The daily limit kicked in and the agent paused. The cost was zero beyond the subscription we were already paying for.

If the same session had been on per-token API billing without a spend cap, the numbers would have gotten ugly fast. By our math, eight rapid iterations of a configuration cleanup at typical token volumes come to several hundred dollars. Multiply that by the number of times an unsupervised agent could iterate before someone noticed, and you have the four-figure surprise bill that has become the canonical story in this category.
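The arithmetic behind an estimate like this is simple enough to sketch. The following is a minimal illustration, not a pricing calculator: the `Pricing` class, the function names, and every price or token count you pass in are assumptions for the example. Real rates come from your provider's current rate card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in USD (not any provider's real rates)."""
    input_per_mtok: float
    output_per_mtok: float


def iteration_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost in USD of one agent iteration, treated as one billed API call."""
    total = input_tokens * pricing.input_per_mtok + output_tokens * pricing.output_per_mtok
    return total / 1_000_000


def session_cost(iterations: int, input_tokens: int, output_tokens: int,
                 pricing: Pricing) -> float:
    """Cost of a retry loop, assuming every iteration sends roughly the same volume."""
    return iterations * iteration_cost(input_tokens, output_tokens, pricing)
```

Plug in your own token counts and your provider's current prices to get a per-iteration figure, then multiply by the number of iterations that could run before anyone looks. That second number, not the first, is the exposure.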

The control that worked was not "watch the agent carefully." It was "use a billing model with a built-in ceiling." The first depends on attention. The second works even when attention fails.

The three precautions

Three controls catch most of what can go wrong. Set them once, then leave them alone until your usage pattern changes.

1. Use a subscription plan, or set a hard spend cap

| Billing model | Cost behavior | Right for |
| --- | --- | --- |
| Subscription | Predictable. Built-in rate limits double as a cost ceiling. | Newcomers, anyone who prefers a known monthly cost |
| Per-token API (metered) | Pay for what you use. No ceiling unless you set one. | Operators with a clear sense of their consumption |

Either model works. The mistake is running on metered API without setting a spend cap.

For each provider, the spend cap lives in the billing section of the account console and takes a minute to set. The exact path varies and changes occasionally, so confirm against the provider's current documentation, but the pattern is the same:

| Provider | Where the spend cap lives |
| --- | --- |
| Anthropic API | Console → Settings → Billing → Spend Limits |
| OpenAI API | Platform → Settings → Billing → Usage Limits (set both Hard and Soft) |
| AWS / Google Cloud | Each console's Budgets / Billing Alerts panel |
| Third-party APIs | Each service's billing dashboard |

The hard limit prevents disasters. The soft limit notifies you before you get there. Set both.
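If you also track spend on your own side (for example in a wrapper script around the agent), the same two-threshold logic can be mirrored locally. This is a minimal sketch under that assumption. `BudgetStatus` and `check_budget` are hypothetical names, and the code does not call any provider's API. It complements the console setting and does not replace it.

```python
from enum import Enum


class BudgetStatus(Enum):
    OK = "ok"
    SOFT_LIMIT = "soft_limit"   # notify the operator, keep working
    HARD_LIMIT = "hard_limit"   # stop issuing billable calls


def check_budget(spent_usd: float, soft_limit_usd: float,
                 hard_limit_usd: float) -> BudgetStatus:
    """Classify current spend against a soft (warning) and hard (stop) limit."""
    if soft_limit_usd > hard_limit_usd:
        raise ValueError("soft limit must not exceed hard limit")
    if spent_usd >= hard_limit_usd:
        return BudgetStatus.HARD_LIMIT
    if spent_usd >= soft_limit_usd:
        return BudgetStatus.SOFT_LIMIT
    return BudgetStatus.OK
```

The caller decides what each status means in practice: a message on `SOFT_LIMIT`, a refusal to start the next call on `HARD_LIMIT`.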

2. Block cloud-resource creation at the agent level

A spend cap on the AI provider's account limits only that provider's bill. It does not protect AWS, Google Cloud, or any other infrastructure account the agent might touch.

If your agent can touch infrastructure — by running command-line tools, making API calls, or executing shell commands — treat every resource-creation step as something that needs explicit approval. Add this to your standing instructions:

- Do not create cloud resources (instances, databases, storage, queues) without confirmation.
- Do not register or pay for any third-party service.
- Do not modify any billing-related configuration.
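Written instructions can be backed by a mechanical check if your setup lets you inspect shell commands before they run. The sketch below assumes such a hook exists, which depends on your tooling. `CONFIRM_PATTERNS` and `needs_confirmation` are hypothetical names, and the pattern list is illustrative and deliberately incomplete, so treat it as a starting point rather than a complete filter.

```python
import re

# Illustrative patterns for CLI commands that create billable resources.
# Extend this list for the tools your agent can actually reach.
CONFIRM_PATTERNS = [
    r"\baws\s+ec2\s+run-instances\b",
    r"\baws\s+rds\s+create-db-instance\b",
    r"\baws\s+s3\s+mb\b",
    r"\bgcloud\s+(?:\S+\s+)+create\b",
    r"\bterraform\s+apply\b",
]

_COMPILED = [re.compile(pattern) for pattern in CONFIRM_PATTERNS]


def needs_confirmation(command: str) -> bool:
    """Return True if the command matches a known resource-creation pattern."""
    return any(pattern.search(command) for pattern in _COMPILED)
```

A pattern list like this fails open: anything it does not recognize passes through. That is why it sits alongside the standing instructions and account-level budgets instead of replacing them.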

Standing instructions are the right place for this rule because the cost path is structural, not session-specific. We cover the broader pattern in Standing Instructions for AI Agents.

3. Cap session length at two to three hours

Retry storms get worse in long sessions. Past two or three hours, several factors stack:

  • Memory pressure builds
  • Earlier instructions get summarized away
  • Context drift makes the agent more likely to misread the situation
  • Each attempted fix gets layered on top of stale context

The fix is simple: cap each session at two to three hours, take a break, and start fresh. The break is also when you review what happened and update your standing instructions if anything new came up.
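If you drive sessions from a script, the time limit can be enforced by a small timer instead of memory. This is a minimal sketch: `SessionClock` is a hypothetical helper, the 2.5-hour default is just the midpoint of the range above, and the injectable `clock` argument exists only to make the class easy to test.

```python
import time
from typing import Callable


class SessionClock:
    """Tracks elapsed session time against a fixed limit."""

    def __init__(self, max_hours: float = 2.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._started = clock()
        self._max_seconds = max_hours * 3600

    def elapsed_seconds(self) -> float:
        return self._clock() - self._started

    def should_end(self) -> bool:
        """True once the session has reached its time limit."""
        return self.elapsed_seconds() >= self._max_seconds
```

Checking `should_end()` between tasks, not in the middle of one, keeps the break at a natural boundary where a fresh session can pick up cleanly.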

This is the same session-boundary rule that protects against runaway loops in general. We cover it more in Three Ways AI Agents Break Your Work.

Standing-instruction safeguards that close the remaining gaps

Spend caps cap the bill. They do not change the agent's behavior. The agent will still retry the same failing call eighty times if you let it. To stop the behavior at the source, add these to your standing instructions:

- If you hit the same error three times in a row, stop and ask before trying again.
- If a single task has been running for more than thirty minutes, summarize progress and ask whether to continue.
- Do not run commands that install or upgrade dependencies without explicit confirmation.
- Do not call any external service that bills per request without prior approval.

The "three errors and stop" rule is the single highest-leverage line. It costs nothing to add and prevents most of the runaway loops we have seen.
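The same rule can be enforced in code when you control the loop that feeds errors back to the agent. The sketch below is one way to do it, assuming errors arrive as plain strings. `RepeatedErrorGuard` is a hypothetical name, and the normalization (collapsing whitespace and case) is a simplification you may need to tune.

```python
from typing import Optional


class RepeatedErrorGuard:
    """Signals a stop when the same error occurs several times in a row."""

    def __init__(self, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        self._last_error: Optional[str] = None
        self._count = 0

    def record_error(self, message: str) -> bool:
        """Record an error; return True when the agent should stop and ask."""
        key = " ".join(message.split()).lower()  # ignore spacing and case differences
        if key == self._last_error:
            self._count += 1
        else:
            self._last_error = key
            self._count = 1
        return self._count >= self.max_repeats

    def record_success(self) -> None:
        """A successful step breaks the streak."""
        self._last_error = None
        self._count = 0
```

A loop that produces a different error on every pass slips past this counter, which is the case the thirty-minute rule in the list above is there to catch.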

Where spend caps do not catch everything

Even with a spend cap set, three patterns can still produce a bill:

| Pattern | Why the cap does not catch it | Mitigation |
| --- | --- | --- |
| Already-consumed usage | Spend caps apply to future usage; what was already used will still be billed | Set the cap at the start of the month, not after the first big session |
| Billing-time lag | Most providers count usage with a small delay, so the cap can be exceeded by a small margin | Set the cap at 90% of what you actually want to allow |
| Cloud resources spun up by the agent | The AI provider's spend cap does not see your AWS/GCP usage | Block resource creation at the agent level (see standing instructions above) |

These are not reasons to skip the spend cap. They are reasons to layer the agent-level rules on top of it.
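The 90% rule for billing lag is a one-line calculation. The helper below is a small sketch of it. `cap_with_margin` is a hypothetical name, and the 10% default margin is taken from the table above, not from any provider's guidance.

```python
def cap_with_margin(intended_budget_usd: float, margin: float = 0.10) -> float:
    """Return the cap to enter in the console, leaving headroom for billing lag."""
    if not 0 <= margin < 1:
        raise ValueError("margin must be in [0, 1)")
    return round(intended_budget_usd * (1 - margin), 2)
```

For an intended ceiling of 500 dollars, this gives a configured cap of 450 dollars, so usage that is counted late still lands under the number you actually care about.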

The five-minute setup

If you are setting up an agent for the first time, this is the order:

| # | Action | Time |
| --- | --- | --- |
| 1 | Set the monthly spend cap on every billable account | 3 minutes |
| 2 | Add the cost-control rules to your standing instructions | 1 minute |
| 3 | Decide your session-length rule (we use two to three hours) | |
| 4 | Note the "three errors and stop" rule somewhere visible during sessions | 30 seconds |
| 5 | Set a calendar reminder to review your billing dashboard weekly for the first month | 30 seconds |

Five minutes of setup, applied once, prevents most of the cost incidents that ruin agent workflows for newcomers.

A pre-session check

Before any non-trivial agent run, the cost-side equivalent of the pre-flight checklist is three questions:

  1. Is the spend cap still in place? Caps occasionally get reset by billing changes; a quick glance at the dashboard catches this.
  2. Are the standing instructions still loaded? Some tools require restarting the session to pick up changes; confirm the agent acknowledges them.
  3. Is the task scoped tightly enough that a retry storm is unlikely? The looser the task, the more iterations the agent will run, and the more cost exposure compounds.

If any answer is uncertain, fix it before sending the prompt.
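The three questions can be written down as a tiny checklist function, which is useful if you launch sessions from a script. This is a sketch only: `CHECKS` and `unresolved_checks` are hypothetical names, and the answers still come from you, not from any automated probe. An answer of `None` stands for "uncertain" and is treated the same as "no".

```python
from typing import Optional

CHECKS = (
    "spend cap in place",
    "standing instructions loaded",
    "task tightly scoped",
)


def unresolved_checks(answers: dict[str, Optional[bool]]) -> list[str]:
    """Return the checks that are not a definite yes (False, None, or missing)."""
    return [name for name in CHECKS if answers.get(name) is not True]
```

An empty list means the run can start. Anything else is the list of things to fix before sending the prompt.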

The bottom line

| Tier | What it prevents |
| --- | --- |
| Subscription plan or hard spend cap | The four-figure surprise bill |
| Standing-instruction rules against billable actions | Cloud-resource and third-party-API costs that bypass the cap |
| Session length cap at two to three hours | Retry storms compounding into expensive runs |
| "Three errors and stop" rule | Most of the runaway loops we have seen |

None of these controls are difficult. They are just easy to skip when the bill has never surprised you. Set them up before the first incident, and most of the stories you have read about runaway AI bills will not be your stories.

Note. Some links in this article are affiliate links. We only recommend tools we actively use. Tool reviews reflect our own field experience and are not editorial recommendations from the linked vendors.