LLM Cost Estimator

Compare API costs across models. Enter your usage and see which model fits your budget.

Sorted cheapest first · Prices per 1M tokens
Model              Provider    Per req      Monthly total
Gemini 1.5 Flash   Google      $0.000225    $0.2250 (cheapest)
Gemini 2.0 Flash   Google      $0.000300    $0.3000
Llama 3.3 70B      Meta/OSS    $0.000430    $0.4300
GPT-4o mini        OpenAI      $0.000450    $0.4500
Claude Haiku 3.5   Anthropic   $0.002800    $2.80
o1 mini            OpenAI      $0.003300    $3.30
Gemini 1.5 Pro     Google      $0.003750    $3.75
Mistral Large      Mistral     $0.005000    $5.00
GPT-4o             OpenAI      $0.007500    $7.50
Claude Sonnet 4    Anthropic   $0.0105      $10.50
GPT-4 Turbo        OpenAI      $0.0250      $25.00
o1                 OpenAI      $0.0450      $45.00
Claude Opus 4      Anthropic   $0.0525      $52.50

Prices are approximate and change frequently. Always verify with provider pricing pages before production use.

About This Tool

LLM API pricing is per-token, with separate input and output rates that vary by model. Input tokens (your prompt) are typically 3–10x cheaper than output (the response). At late-2025 pricing, GPT-4o is $2.50/$10 per million tokens, Claude Sonnet $3/$15, Gemini 1.5 Pro $1.25/$5. A token is roughly 4 characters of English.

The estimator takes input/output token counts, model, and request volume and returns total cost.
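The calculation described here can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the `Rate` class, the function names, and the dictionary keys are hypothetical, and the rates are the late-2025 figures quoted in the text, which may be out of date.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """USD price per 1M tokens, split by direction."""
    input_per_million: float
    output_per_million: float


# Snapshot of the rates quoted in the text; the keys are illustrative labels only.
RATES = {
    "gpt-4o": Rate(2.50, 10.00),
    "claude-sonnet": Rate(3.00, 15.00),
    "gemini-1.5-pro": Rate(1.25, 5.00),
}


def cost_per_request(rate: Rate, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request with the given token counts."""
    return (
        input_tokens * rate.input_per_million
        + output_tokens * rate.output_per_million
    ) / 1_000_000


def monthly_cost(rate: Rate, input_tokens: int, output_tokens: int,
                 requests_per_month: int) -> float:
    """Cost in USD of a month of identical requests."""
    return cost_per_request(rate, input_tokens, output_tokens) * requests_per_month


if __name__ == "__main__":
    # Assumed usage: 1,000 input tokens, 500 output tokens, 1,000 requests/month.
    for name, rate in sorted(RATES.items(),
                             key=lambda kv: cost_per_request(kv[1], 1000, 500)):
        print(f"{name}: ${cost_per_request(rate, 1000, 500):.6f} per request, "
              f"${monthly_cost(rate, 1000, 500, 1000):.2f} per month")
```

Keeping input and output rates separate matters because the two are priced differently; a single blended rate would misprice workloads that are heavy on one side.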

In LLM usage, a token is the unit a tokenizer breaks text into; it is not exactly a word, character, or syllable. The most common tokenization scheme, byte pair encoding (BPE), splits text into subword pieces: common words become single tokens, while rare or compound words split into several. "Hello" is typically 1 token; "unhappiness" might split into 3 (for example un, happi, ness, though the exact pieces depend on the tokenizer); a unique product code might tokenize almost character by character. The 4-characters-per-token approximation works for English prose; code, JSON, and other languages tokenize differently. Different model families (GPT, Claude, Gemini, Llama) use different tokenizers, so the same text yields somewhat different counts, often within about 10% for English prose. Output tokens cost more than input tokens because output is generated sequentially (one forward pass per token), while input is processed in parallel.
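When no tokenizer is at hand, the 4-characters-per-token rule of thumb can be applied directly. The sketch below is only that heuristic; `estimate_tokens` is a hypothetical helper, not a real tokenizer, and it will be off for code, JSON, non-English text, and very short strings.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (English-prose heuristic)."""
    if not text:
        return 0
    # Round up so that any non-empty text counts as at least one token.
    return math.ceil(len(text) / chars_per_token)


if __name__ == "__main__":
    prompt = "Summarize the following support ticket in two sentences."
    print(estimate_tokens(prompt))
```

For anything beyond a rough budget, counting with the provider's own tokenizer or reading the usage figures returned by the API is more reliable.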

A worked example: a customer-support chatbot processes 100,000 conversations per month. The average conversation uses 800 input tokens (system prompt + history) and 200 output tokens. Using Claude Sonnet at $3/$15 per million: input cost = (100,000 × 800 / 1,000,000) × $3 = $240. Output cost = (100,000 × 200 / 1,000,000) × $15 = $300. Total: $540 per month. Switching to Claude Haiku ($0.80/$4) would reduce the total to $144 ($64 input + $80 output). Alternatively, staying on Sonnet and adding prompt caching (cached input billed at ~10% of the regular price for reused portions) on a 700-token system prompt drops input cost to roughly $51: 70M cached tokens at $0.30 per million ($21) plus 10M uncached tokens at $3 ($30). This assumes every request hits the cache and ignores any cache-write surcharge. The total then falls to about $351, a reduction of roughly 35% without changing the model.
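The arithmetic in this example can be reproduced with a short script. It is a sketch under stated assumptions: the function names are hypothetical, the rates are the ones quoted in the example, and the caching calculation assumes all 700 system-prompt tokens are billed at 10% of the input rate on every request, with no cache-write surcharge.

```python
def split_cost(conversations: int, input_tokens: int, output_tokens: int,
               input_rate: float, output_rate: float) -> tuple[float, float]:
    """Return (input_cost, output_cost) in USD; rates are USD per 1M tokens."""
    input_cost = conversations * input_tokens / 1_000_000 * input_rate
    output_cost = conversations * output_tokens / 1_000_000 * output_rate
    return input_cost, output_cost


def cached_input_cost(conversations: int, cached_tokens: int, uncached_tokens: int,
                      input_rate: float, cached_fraction: float = 0.10) -> float:
    """Input cost when `cached_tokens` per request are billed at a reduced rate.

    Assumes a 100% cache hit rate and ignores any cache-write surcharge.
    """
    cached = conversations * cached_tokens / 1_000_000 * input_rate * cached_fraction
    uncached = conversations * uncached_tokens / 1_000_000 * input_rate
    return cached + uncached


if __name__ == "__main__":
    sonnet_in, sonnet_out = split_cost(100_000, 800, 200, 3.00, 15.00)
    haiku_in, haiku_out = split_cost(100_000, 800, 200, 0.80, 4.00)
    sonnet_cached_in = cached_input_cost(100_000, 700, 100, 3.00)

    print(f"Sonnet:         ${sonnet_in + sonnet_out:.2f}")
    print(f"Haiku:          ${haiku_in + haiku_out:.2f}")
    print(f"Sonnet + cache: ${sonnet_cached_in + sonnet_out:.2f}")
```

Note that caching only reduces the input side; output cost is unchanged, which limits the achievable saving when output dominates the bill.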

Limitations: pricing changes frequently. Providers add new model versions, deprecate old ones, and shift price tiers, so the estimator's rate table needs periodic updates. Token counts vary by tokenizer and content (code typically produces more tokens per character than prose). Real-world cost includes retries, failures, and tooling overhead that the estimator doesn't capture. For high-volume production use, instrument actual token usage from API responses rather than estimating from prompt content. Caching, batch APIs, and provisioned throughput offer significant discounts that the basic calculator doesn't model. The estimator is for capacity planning, not production billing.
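One way to instrument actual usage is to accumulate the token counts that each API response reports, including calls that were retried. The sketch below is hypothetical and provider-agnostic: `UsageTracker` is not part of any SDK, and the caller is assumed to pull the input and output token counts out of whatever usage field the provider's response exposes.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulates billed tokens across calls; rates are USD per 1M tokens."""
    input_rate: float
    output_rate: float
    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Record one API call, successful or not, using its reported usage."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        self.calls += 1

    def total_cost(self) -> float:
        """Total cost in USD of everything recorded so far."""
        return (
            self.input_tokens * self.input_rate
            + self.output_tokens * self.output_rate
        ) / 1_000_000


if __name__ == "__main__":
    tracker = UsageTracker(input_rate=3.00, output_rate=15.00)
    tracker.record(800, 0)    # e.g. a call that failed before producing output
    tracker.record(800, 200)  # the retry that succeeded
    print(tracker.calls, tracker.input_tokens, tracker.output_tokens)
    print(f"${tracker.total_cost():.6f}")
```

Recording every attempt rather than every logical request is what surfaces the retry overhead that a prompt-based estimate misses.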

The about text and FAQ on this page were drafted with AI assistance and reviewed by a member of the Coherence Daddy team before publishing. See our Content Policy for editorial standards.

Frequently Asked Questions