Olive-One Bill Shock Pattern
Cloud Run Runaway Cost
When an AI workflow scales faster than its margin model.
Executive summary. Cloud Run works well for lightweight APIs and event-driven workloads, but a service can produce bill shock when it scales without max instances, concurrency controls, request throttling, budget ownership, or workflow-level cost attribution.
Use case. An AI support workflow receives a traffic spike. Each request triggers model calls, vector lookups, logging, and retries. The Cloud Run service scales correctly from an engineering perspective, but nobody owns the unit economics of the workflow.
A. The symptom
Finance sees margin compression, but product metrics look healthy.
- Cloud Run spend spikes unexpectedly.
- LLM/API spend rises in parallel.
- Logs increase dramatically.
- Support automation looks successful in product metrics.
- Finance sees margin compression but cannot trace it to a workflow.
B. The hidden mechanism
The service scales, then every downstream dependency scales with it.
- Request volume increases.
- Concurrency is misconfigured or set too low, so each instance serves fewer requests and more instances start.
- Max instances are not capped.
- Retries multiply requests.
- Every request triggers downstream paid services.
- Logs grow with request volume.
- No workflow owner is accountable for cost per resolved ticket or agent run.
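The multiplication described in this list can be sketched as a small calculation. This is a rough model under stated assumptions, not a billing formula: the function names and every number are hypothetical, and it assumes each attempt (including retries) triggers the same set of downstream paid calls.

```python
import math


def instances_needed(in_flight_requests: int, concurrency: int) -> int:
    """Rough number of instances needed to serve simultaneous requests."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return math.ceil(in_flight_requests / concurrency)


def billable_calls(requests: int, avg_attempts: float, calls_per_attempt: int) -> int:
    """Estimate downstream paid calls produced by a batch of user requests.

    avg_attempts is the mean number of attempts per request:
    1.0 means no retries, 1.5 means half the requests are retried once.
    """
    if avg_attempts < 1.0:
        raise ValueError("avg_attempts must be at least 1.0")
    return round(requests * avg_attempts * calls_per_attempt)


# Hypothetical spike: 400 simultaneous requests.
print(instances_needed(400, concurrency=1))   # one request per instance
print(instances_needed(400, concurrency=80))  # many requests per instance

# Hypothetical month: 100,000 requests, 3 paid calls per attempt
# (for example one model call, one vector lookup, one database call).
print(billable_calls(100_000, avg_attempts=1.0, calls_per_attempt=3))
print(billable_calls(100_000, avg_attempts=1.5, calls_per_attempt=3))
```

The point of the sketch is that low concurrency inflates instance count while retries inflate every downstream bill at once, so the two effects compound rather than add.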
C. Example cost shape
A workflow that appears to cost $900/month can become a $3,400/month workflow.
Hypothetical example numbers only.
The visible model cost may look acceptable at first. Once retry amplification, uncapped Cloud Run instances, retained logs, vector lookups, database calls, and downstream LLM/API calls are counted, the same support workflow can move from an apparent $900/month to a hypothetical $3,400/month run-rate.
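One way to picture the gap between the $900 and $3,400 figures is a simple additive model. The split below is an assumption made purely for illustration: the retry multiplier and each line item are invented so that the totals match the hypothetical example, and none of them come from a real bill.

```python
# All figures are hypothetical, in USD per month.
VISIBLE_MODEL_COST = 900.0  # the cost the team believes the workflow has
RETRY_MULTIPLIER = 1.5      # assumed average attempts per request

# Assumed costs that are not attributed to the workflow today.
hidden_costs = {
    "cloud_run_compute": 700.0,
    "logs": 500.0,
    "vector_lookups": 450.0,
    "database_calls": 400.0,
}


def monthly_run_rate(model_cost: float, retry_multiplier: float, other_costs: dict) -> float:
    """Model cost after retry amplification, plus every other line item."""
    amplified_model_cost = model_cost * retry_multiplier
    return amplified_model_cost + sum(other_costs.values())


run_rate = monthly_run_rate(VISIBLE_MODEL_COST, RETRY_MULTIPLIER, hidden_costs)
print(f"apparent: ${VISIBLE_MODEL_COST:,.0f}/month")
print(f"run-rate: ${run_rate:,.0f}/month")
```

For simplicity the sketch applies the retry multiplier only to the model line and treats the other line items as already amplified; a real attribution would measure each item per workflow.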
D. Detection signals
Look for request amplification plus downstream paid service fanout.
- Cloud Run request count spike.
- Instance count spike.
- Concurrency too low.
- Max instances missing.
- 5xx/429 rate increases.
- Retry count increases.
- Log volume increases.
- Downstream LLM/vector/database cost increases.
- Missing owner tag or workflow tag.
- No cost per resolved ticket or per agent run metric.
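Several of the signals above can be checked by comparing two metric windows. The following is a minimal sketch, assuming the metrics have already been exported into a simple record; the `Snapshot` fields, the signal names, and the 2x spike threshold are all hypothetical choices rather than Cloud Run or Cloud Monitoring identifiers.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Metrics for one time window. Field names are hypothetical."""
    requests: int
    instances: int
    error_responses: int    # count of 5xx and 429 responses
    retries: int
    log_bytes: int
    downstream_cost: float  # LLM + vector + database spend, USD


def detect_signals(baseline: Snapshot, current: Snapshot, spike_ratio: float = 2.0) -> list[str]:
    """Return the names of the signals that fired between two windows."""

    def spiked(before: float, after: float) -> bool:
        return before > 0 and after / before >= spike_ratio

    def error_rate(s: Snapshot) -> float:
        return s.error_responses / s.requests if s.requests else 0.0

    checks = [
        ("request count spike", spiked(baseline.requests, current.requests)),
        ("instance count spike", spiked(baseline.instances, current.instances)),
        ("5xx/429 rate increase", error_rate(current) > error_rate(baseline)),
        ("retry count spike", spiked(baseline.retries, current.retries)),
        ("log volume spike", spiked(baseline.log_bytes, current.log_bytes)),
        ("downstream cost spike", spiked(baseline.downstream_cost, current.downstream_cost)),
    ]
    return [name for name, fired in checks if fired]


# Hypothetical windows: a quiet day versus the day of the traffic spike.
quiet = Snapshot(10_000, 4, 50, 100, 2_000_000, 30.0)
spike = Snapshot(40_000, 35, 1_200, 9_000, 11_000_000, 140.0)
for signal in detect_signals(quiet, spike):
    print(signal)
```

A request spike that fires together with a downstream cost spike is the combination this pattern cares about; either one alone may be ordinary growth.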
E. Scanner rule
Cloud Run Runaway Cost
- Risk: High
- Pattern: Cloud Run Runaway Cost
- Cost Shape: Request amplification + downstream paid service fanout.
- Business Risk: Margin compression and unowned AI workflow spend.
- Recommended Action: Add max instances, tune concurrency, cap retries, add workflow cost attribution, add budget ownership, and track cost per successful outcome.
Scanner checklist:
- Cloud Run service without max instances.
- Missing concurrency setting.
- Missing timeout discipline.
- Missing budget labels.
- Missing service ownership labels.
- Missing log retention.
- Public unauthenticated endpoint where not expected.
- Retry policy without cap/backoff.
- Service calling paid AI/model/vector APIs without a budget guardrail.
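A few of the checklist items can be expressed as a static check over a service manifest. This is a sketch, not Olive-One's scanner: it assumes the Knative-style layout that Cloud Run service YAML generally uses (the `autoscaling.knative.dev/maxScale` annotation, `containerConcurrency`, and `timeoutSeconds`), which should be verified against an exported manifest from your own project. The `owner` and `workflow` label keys are hypothetical.

```python
REQUIRED_LABELS = ("owner", "workflow")  # hypothetical label keys
MAX_SCALE_ANNOTATION = "autoscaling.knative.dev/maxScale"  # assumed annotation name


def scan_service(manifest: dict) -> list[str]:
    """Return checklist findings for one service manifest loaded as a dict."""
    findings = []
    template = manifest.get("spec", {}).get("template", {})
    annotations = template.get("metadata", {}).get("annotations", {})
    revision_spec = template.get("spec", {})
    labels = manifest.get("metadata", {}).get("labels", {})

    if MAX_SCALE_ANNOTATION not in annotations:
        findings.append("no max instances cap")
    if "containerConcurrency" not in revision_spec:
        findings.append("no explicit concurrency setting")
    if "timeoutSeconds" not in revision_spec:
        findings.append("no explicit request timeout")
    for key in REQUIRED_LABELS:
        if key not in labels:
            findings.append(f"missing label: {key}")
    return findings


# Hypothetical manifest with no guardrails configured.
unguarded = {
    "metadata": {"name": "support-bot", "labels": {}},
    "spec": {"template": {"metadata": {"annotations": {}}, "spec": {}}},
}
for finding in scan_service(unguarded):
    print(finding)
```

Items such as retry policy, log retention, and budget guardrails live outside the service manifest, so a fuller scanner would need additional inputs for them.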
F. Recommended fix
Put economic guardrails around the workflow, not just the service.
- Set max instances.
- Tune concurrency.
- Set timeouts.
- Cap retries with exponential backoff.
- Add rate limiting.
- Tag by workflow, owner, and environment.
- Add log retention policy.
- Create budget alerts.
- Track cost per business outcome.
- Route low-risk tasks to a cheaper model/API where applicable.
- Define a kill switch.
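Two of the fixes above, capped retries with exponential backoff and a kill switch, can be combined in one wrapper around a paid downstream call. The sketch below is one possible shape under stated assumptions: `call_with_guardrails`, `BudgetExceeded`, `spend_so_far`, and `budget_limit` are hypothetical names, and the spend lookup is left to the caller because the source does not specify how spend is tracked.

```python
import random
import time


class BudgetExceeded(RuntimeError):
    """Raised when the workflow's spend has reached its limit (kill switch)."""


def call_with_guardrails(operation, *, max_attempts=3, base_delay=0.5, max_delay=8.0,
                         spend_so_far=lambda: 0.0, budget_limit=float("inf"),
                         sleep=time.sleep):
    """Run operation() with a hard attempt cap, jittered backoff, and a budget check."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(max_attempts):
        # Kill switch: refuse to spend more once the budget is used up.
        if spend_so_far() >= budget_limit:
            raise BudgetExceeded("workflow budget reached; call refused")
        try:
            return operation()
        except Exception as error:  # real code should retry only transient errors
            last_error = error
            if attempt == max_attempts - 1:
                break
            # Exponential backoff with full jitter, capped at max_delay seconds.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise last_error


# Hypothetical downstream call that fails twice, then succeeds.
calls = []

def flaky_model_call():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(call_with_guardrails(flaky_model_call, base_delay=0.01))
```

Checking the budget before every attempt, rather than once per request, means retries cannot push spend past the limit on their own.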
G. Executive interpretation
This is a workflow economics issue.
This is not merely a Cloud Run configuration issue. The service is scaling, but the business has not defined the acceptable cost per resolved ticket, document, customer, or agent run.
H. Olive-One teardown angle
How Olive-One would diagnose it.
- Map the workflow from request to downstream paid services.
- Identify cost fanout across Cloud Run, logs, model calls, vector lookups, database calls, and retries.
- Estimate unit economics by resolved ticket, agent run, customer, or successful outcome.
- Detect missing controls: max instances, concurrency, retry caps, budget ownership, labels, and kill switches.
- Rank fixes by margin impact, implementation effort, and operational risk.
- Produce an executive decision memo: keep, optimize, cap, reroute, reprice, or kill.
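The unit-economics and ranking steps above can be sketched in a few lines. The scoring rule here (estimated savings divided by effort times risk) is an assumed heuristic chosen for illustration, not Olive-One's actual method, and the fix names, savings, and scores are hypothetical.

```python
from dataclasses import dataclass


def cost_per_outcome(total_cost: float, successful_outcomes: int) -> float:
    """Unit cost per resolved ticket, agent run, or other successful outcome."""
    if successful_outcomes <= 0:
        raise ValueError("need at least one successful outcome")
    return total_cost / successful_outcomes


@dataclass
class Fix:
    name: str
    monthly_savings: float  # estimated margin impact, USD/month
    effort: int             # 1 (low) to 5 (high)
    risk: int               # 1 (low) to 5 (high) operational risk


def rank_fixes(fixes: list[Fix]) -> list[Fix]:
    """Order fixes by savings per unit of effort and risk, best first."""
    return sorted(fixes, key=lambda f: f.monthly_savings / (f.effort * f.risk), reverse=True)


# Hypothetical month: $3,400 run-rate and 2,000 resolved tickets.
print(f"${cost_per_outcome(3400.0, 2000):.2f} per resolved ticket")

candidates = [
    Fix("route low-risk tasks to a cheaper model", 600.0, effort=3, risk=3),
    Fix("set max instances", 800.0, effort=1, risk=1),
    Fix("cap retries with backoff", 450.0, effort=2, risk=1),
]
for fix in rank_fixes(candidates):
    print(fix.name)
```

The unit cost only becomes a decision input once the business has stated what an acceptable cost per outcome is; the ranking then shows which controls close the gap most cheaply.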