What it actually costs to run an AI automation in production
A real cost breakdown for running an AI automation in production in 2026. LLM API costs, infrastructure, monitoring, vector storage, and the hidden ops costs nobody puts in their pitch deck. With realistic ranges by automation shape.
— TL;DR
A moderate-volume B2B AI automation in 2026 costs $200 to $800 per month all-in. LLM API spend is 40 to 60% of that; infrastructure, monitoring, and vector storage make up the rest. Output tokens are the biggest line. Routing to cheap models, caching prefixes, and constraining length cuts 50 to 70% without hurting quality.
A realistic monthly cost for a moderate-volume B2B SaaS AI automation in production in 2026 is $200–$800: LLM API spend, infrastructure, monitoring, and vector storage combined. That's the all-in number, including the boring ops costs that don't make it into pitch decks. The high end of the range is what you pay when you've shipped without cost discipline; the low end is what you pay when the team has built in routing, caching, and monitoring from week one.
This piece breaks down the line items by category, gives realistic ranges by automation shape, and walks through the levers that actually move the cost.
#The shape of the bill
A typical production B2B AI automation cost stack:
| Line item | % of bill | Range (moderate volume) |
|---|---|---|
| LLM API calls | 40–60% | $80–$480/mo |
| Compute (containers / serverless) | 10–20% | $20–$160/mo |
| Vector storage (if RAG) | 5–15% | $10–$120/mo |
| Monitoring + observability | 10–15% | $20–$120/mo |
| Audit log storage | 5–10% | $10–$80/mo |
| Workflow platform fees (if n8n cloud / Zapier / Make) | 0–20% | $0–$160/mo |
| Email / notification services | 1–5% | $2–$40/mo |
| Total all-in | (varies) | $200–$800/mo |
These are real numbers for production B2B automations we've shipped, with "moderate volume" meaning ~10,000 LLM calls per day and monitoring, fallbacks, and audit logging in place. Below that volume the cost falls; above it the cost rises, but typically sub-linearly, because you start hitting volume discounts and amortizing fixed costs.
#LLM API costs
The biggest line item. Within it, two sub-lines that account for most of the variance:
Input vs output tokens. Both OpenAI and Anthropic charge 3–5x more for output tokens than input tokens. A prompt that has 1,000 input tokens and produces 2,000 output tokens costs more than a prompt with 5,000 input tokens that produces 500 output tokens.
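To make the asymmetry concrete, here's the arithmetic behind that comparison. The $3 / $15 per-million-token prices below are illustrative assumptions, not any vendor's actual 2026 rates:

```python
# Illustrative per-token prices (assumed, not real vendor rates):
# $3 per million input tokens, $15 per million output tokens (5x).
PRICE_IN = 3.00 / 1_000_000
PRICE_OUT = 15.00 / 1_000_000

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for a single LLM call."""
    return input_tokens * PRICE_IN + output_tokens * PRICE_OUT

# Short prompt, long response: 1,000 in / 2,000 out -> $0.033
chatty = call_cost(1_000, 2_000)
# Long prompt, short response: 5,000 in / 500 out -> $0.0225
terse = call_cost(5_000, 500)

assert chatty > terse  # the output-heavy call costs more despite 5x less input
```

Five times the input tokens, still cheaper, because output tokens carry the multiplier.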
The single biggest cost-cutting move is constraining output length. Use stop sequences. Set max_tokens. Ask for terse responses in the system prompt. Use structured outputs (JSON schema constraints) which often produce shorter responses than free-form text.
Model tier. Flagship models (GPT-5, Claude Opus 4.7) cost 10–30x what cheap-fast models (gpt-5-nano, Claude Haiku 4.5) cost per token. Most production workloads have a heavy long tail of routine calls that don't need flagship reasoning. Those should route to the cheap tier.
The pattern: ~80% of production LLM calls go to a cheap-fast model; ~20% go to a flagship model when the cheap one fails an evaluation gate or the workload needs flagship-grade reasoning. Teams that route 100% of calls to flagship models are spending 5–10x what they need to.
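A minimal sketch of that routing pattern. `cheap_model`, `flagship_model`, and `passes_eval_gate` are stand-ins for your own client calls and quality check, not real APIs:

```python
from typing import Callable

def route_call(
    prompt: str,
    cheap_model: Callable[[str], str],
    flagship_model: Callable[[str], str],
    passes_eval_gate: Callable[[str], bool],
) -> tuple[str, str]:
    """Try the cheap-fast tier first; escalate only on gate failure.

    Returns (response, tier_used) so the tier mix can be tracked in
    monitoring alongside cost-per-execution.
    """
    response = cheap_model(prompt)
    if passes_eval_gate(response):
        return response, "cheap"
    return flagship_model(prompt), "flagship"

# Toy usage: a gate that rejects empty responses.
resp, tier = route_call(
    "Summarise this ticket",
    cheap_model=lambda p: "summary: billing issue",
    flagship_model=lambda p: "detailed summary",
    passes_eval_gate=lambda r: len(r) > 0,
)
```

The returned tier label is the important part: if the flagship share creeps above ~20%, either the gate is too strict or the cheap model is the wrong fit for the workload.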
Prompt caching. Both vendors offer prompt prefix caching in 2026. If your prompts have a stable system prompt or shared context (RAG, few-shot examples), caching cuts input token costs by 50–90% for the cached portion. The pricing mechanics differ by vendor (Anthropic charges a cache-write premium with deeply discounted reads; OpenAI discounts cached input automatically); both are worth wiring up.
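A back-of-envelope model of the caching win, assuming a flat 90% discount on cached input tokens and ignoring cache-write premiums (both assumptions; check your vendor's current cache pricing):

```python
# Assumed price: $3 per million input tokens. Not a real vendor rate.
PRICE_IN = 3.00 / 1_000_000

def monthly_input_cost(prefix_tokens: int, suffix_tokens: int,
                       calls: int, cache_discount: float = 0.90) -> float:
    """Monthly input-token cost with a cached shared prefix.

    The first call pays full price for the prefix; later calls pay the
    discounted cached rate. Cache-write premiums are ignored.
    """
    first_prefix = prefix_tokens * PRICE_IN
    cached_prefix = prefix_tokens * PRICE_IN * (1 - cache_discount)
    suffix = suffix_tokens * PRICE_IN
    return first_prefix + (calls - 1) * cached_prefix + calls * suffix

# 4k-token shared prefix + 500 fresh tokens/call, 300k calls/month.
with_cache = monthly_input_cost(4_000, 500, 300_000)              # ~$810
without_cache = monthly_input_cost(4_000, 500, 300_000, 0.0)      # ~$4,050
```

Roughly an 80% cut on the input side in this scenario, for a one-time wiring change.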
For more detail on the model-by-model cost trade-offs, see OpenAI vs Anthropic for B2B SaaS automation.
#Compute costs
Where the automation runs. Three common shapes:
- Serverless functions (Vercel, AWS Lambda, Cloudflare Workers): pay-per-execution, no idle cost. Best for low-to-moderate volume automations with bursty traffic. Typical monthly cost: $5–$80 for moderate-volume automations.
- Long-running containers (Fly.io, Railway, AWS Fargate, plain VPS): flat monthly cost regardless of execution count. Best for sustained-traffic automations or workloads that don't fit serverless cold-start tolerance. Typical monthly cost: $20–$200 for a small container.
- Workflow platforms (n8n cloud, Zapier, Make): bundled with execution fees. n8n cloud starts at $24/month; self-hosted n8n costs only the hosting ($10–$30/month for a small server). Zapier / Make per-execution fees can dominate at volume.
For most B2B automations under 100k executions/day, serverless is the right shape. For workflows that need to maintain long-lived connections (websockets, polling-heavy integrations), a container is cleaner.
#Vector storage (for RAG workloads)
If your automation does retrieval-augmented generation, you need a vector store. Options in 2026:
- pgvector in your existing Postgres (Supabase, Neon): $0–$25/month additional cost. Fine for under ~1M vectors. Becomes slow above that without careful indexing.
- Pinecone: $0 free tier, $70/month starter, more at scale. Fully managed, fast, opinionated about index types.
- Qdrant: $0 free tier on Qdrant Cloud, $25–$50/month starter. Open source if you self-host.
- Weaviate: $25–$100/month for managed; self-host on a $5–$20/month VPS.
- Chroma: usually self-hosted, ~$5–$20/month for the VPS.
For most B2B SaaS automations: pgvector if you're already on Postgres and your vector count is under 1M. Pinecone or Qdrant if you cross that threshold or have specific perf requirements.
#Monitoring + observability
The line item teams skip in prototype and pay for in production. Realistic stack:
- Sentry for error monitoring: $0 free tier (5k events/month), $26/month for Team tier
- PostHog or Datadog for usage analytics + dashboards: $0–$45/month at moderate volume
- LangSmith or Langfuse for LLM-specific observability (prompt/response logging, eval traces): $0 free tier, $39–$199/month for paid tiers
- Better Uptime or equivalent for cron / endpoint monitoring: $0–$29/month
For a production-grade automation, monitoring lands at $20–$120/month all-in. That's the cost of being able to diagnose what went wrong when the automation produces weird output at 3 AM. Skipping it is the canonical false economy.
#Audit log storage
Required for any automation that touches customer data, financial records, or regulated workloads. Even when not legally required, it's required operationally. When a customer complains that the automation did the wrong thing, you need to be able to prove what it did and why.
Realistic costs:
- Postgres audit log table (in your existing DB): $0 incremental cost; storage cost is negligible at moderate volume
- S3 / R2 for long-term retention of full prompt + response payloads: $0.02/GB/month; usually $5–$20/month for moderate-volume automations
- Datadog logs or equivalent if you want unified search across application + audit logs: $30–$100/month at moderate volume
For B2B SaaS the right pattern is usually: structured audit log row in Postgres (timestamp, user, action, summary) + full payloads in S3 / R2 for retrieval when needed. Cheap and adequate for most regulatory contexts.
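A minimal sketch of that split. The field names and object-storage key layout are illustrative, not a schema recommendation, and the actual Postgres insert / S3 put are left out:

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class AuditRow:
    """Structured row for the Postgres audit table (summary only)."""
    timestamp: str
    user_id: str
    action: str
    summary: str
    payload_key: str  # points at the full payload in S3 / R2

def build_audit_record(user_id: str, action: str, summary: str,
                       prompt: str, response: str) -> tuple[AuditRow, bytes]:
    """Return the DB row plus the full payload destined for object storage."""
    payload = json.dumps({"prompt": prompt, "response": response}).encode()
    digest = hashlib.sha256(payload).hexdigest()
    ts = datetime.now(timezone.utc)
    # Date-partitioned key keeps retention policies and lookups simple.
    key = f"audit/{ts:%Y/%m/%d}/{digest}.json"
    row = AuditRow(ts.isoformat(), user_id, action, summary, key)
    return row, payload
```

The content hash in the key also gives you free deduplication and a tamper-evidence check when you need to prove what the automation actually saw.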
#Hidden costs
The ones founders don't budget for:
Prompt iteration cost. When a prompt isn't working, you iterate. Each iteration involves running the prompt against an evaluation set (often 50–500 test cases) and comparing outputs. That's not free; running 500 evaluation cases on a flagship model can be $5–$20 per iteration. Across 20–50 iterations during an automation's lifetime, that's $100–$1,000 in eval-only LLM spend.
Budget: ~10% of expected production LLM spend, allocated to evaluation runs.
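The arithmetic behind that range, using the figures above. The per-case flagship costs are assumptions chosen to match the $5–$20-per-iteration estimate:

```python
# Eval-run budget: cases per iteration x iterations x cost per case.
# Per-case costs of $0.01-$0.04 are assumptions for illustration.
def eval_spend(cases: int, iterations: int, cost_per_case: float) -> float:
    return cases * iterations * cost_per_case

low = eval_spend(cases=500, iterations=20, cost_per_case=0.01)   # ~$100
high = eval_spend(cases=500, iterations=50, cost_per_case=0.04)  # ~$1,000
```

If your eval set or iteration count is bigger, scale accordingly before committing to a flagship-only eval loop.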
Fallback model spend. When the primary vendor hits a rate limit or returns 5xx, your automation falls over to the secondary vendor. Most of the time this is fine and rare. Occasionally the primary has a multi-hour outage and 100% of traffic flips to the secondary, which charges different prices. Budget for occasional spikes.
Compliance / audit prep. If your customer is enterprise and you'll be audited (SOC 2, HIPAA, etc.), the audit logging and reporting infrastructure adds engineering time and tooling cost. Not a monthly LLM spend line item, but a one-time $5–25k workstream for the first audit, smaller for subsequent ones.
Drift management. LLM models change. The prompt that worked perfectly on Claude Sonnet 4.6 in April 2026 may behave differently when Anthropic releases Sonnet 4.8 in August 2026. Production automations need an ongoing eval discipline to catch drift; that's engineering time, not a vendor cost line item.
Plan for ~10–20% of LLM API spend going to these hidden costs over the lifetime of an automation.
#Realistic ranges by automation shape
| Shape | Monthly LLM spend | All-in monthly cost |
|---|---|---|
| Daily summary email (1–10 calls/day) | $1–$10 | $20–$50 (mostly monitoring) |
| Customer support triage (1k tickets/day) | $30–$150 | $80–$300 |
| Content moderation (100k items/day) | $200–$800 | $300–$1,000 |
| Sales enrichment (10k leads/day) | $50–$300 | $150–$500 |
| RAG over docs (5k queries/day) | $80–$400 | $150–$600 |
| Multi-step agent workflow (1k runs/day, 10 steps each) | $200–$1,500 | $400–$2,000 |
| High-volume real-time agent (50k+/day) | $1,000–$8,000 | $1,500–$12,000 |
These ranges assume reasonable cost discipline: model routing, prompt caching, output constraints, and monitoring in place. Without those, double the LLM spend. Without monitoring, expect production incidents that cost more than the monitoring would have.
#What we ship for cost discipline
For our AI Automation Sprint engagements, the default cost-control posture in 2026:
- Model routing layer that defaults to the cheap-fast model and escalates to flagship only when needed
- Prompt caching wired up from day 1 (free win)
- Output constraints in every prompt (max_tokens, structured outputs, stop sequences)
- Daily LLM spend dashboard with alerts at 50%, 80%, 100% of expected daily spend
- Hard kill-switch that can pause the automation if cost runs away (typically: a feature flag we can toggle in 30 seconds)
- Cost-per-execution metric tracked in monitoring so we can see if a prompt change spiked the per-call cost
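The spend-alert logic is simple enough to sketch. The thresholds mirror the 50% / 80% / 100% levels above; `expected_daily_spend` is whatever baseline your dashboard tracks:

```python
def spend_alerts(spend_today: float, expected_daily_spend: float,
                 thresholds: tuple[float, ...] = (0.5, 0.8, 1.0)) -> list[str]:
    """Return the alert levels crossed so far today.

    In production this runs on each cost-metric update and notifies once
    per newly crossed threshold (deduplication not shown).
    """
    ratio = spend_today / expected_daily_spend
    return [f"{int(t * 100)}%" for t in thresholds if ratio >= t]

# $45 spent against a $50/day budget: the 50% and 80% alerts have fired.
assert spend_alerts(45.0, 50.0) == ["50%", "80%"]
```

The 100% alert is the trigger for the kill-switch decision; the earlier ones buy you time to diagnose before it comes to that.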
That setup costs ~2–3 days of week-1 build and saves 30–60% of LLM spend over the automation's lifetime. The math is decisive. This is the work that separates "we shipped an automation" from "we shipped an automation that's economically sustainable."
#Bottom line
Production AI automations in 2026 cost real money, but the costs are predictable when you build with discipline. The cost goes up when teams skip the boring infrastructure work (routing, caching, monitoring) in favor of speed. The cost stays bounded when the boring infrastructure work gets shipped in week 1.
Plan for $200–$800/month for a moderate-volume B2B automation. Plan for the ramp from prototype ($5–50/month) to production (10–20x prototype) before shipping. Build the kill-switch before you need it.
— Want this for your SaaS?
AI Automation Sprints, shipped fortnightly ↗
Two-week cycles to ship internal-tool automations that actually save hours. n8n, LangChain, custom code. Opinionated stack, full handoff, paid for by the time it gives back.