Predict your GPT, Claude, Gemini, and Llama costs before you ship. Thread Deck's free calculator combines current per-million token pricing with your own usage assumptions so you can spot the right model mix, budget accurately, and defend margins.
Prompt, persona, and context tokens you send to the model each run.
Average response length in tokens. Adjust for larger analyses.
How many production calls you expect across your team each month.
Cushion for unexpected spikes. Most teams land on 10-20% for a comfortable guardrail.
The math mirrors what Stripe or your model provider charges: we convert your input and output tokens into millions, multiply by the published per-million rates, and then factor in your run volume. The optional buffer applies a safety margin so finance and product teams see a realistic upper bound.
1. Capture context scope
Input tokens represent personas, briefs, and any retrieval context you pass to the model before generation.
2. Estimate response size
Output tokens translate to the expected length of each answer, summarisation, or decision packet.
3. Multiply by run cadence
Runs per month let you model steady-state operations and scale up experiments or agents over time.
4. Add guardrails
Keep a buffer so last-minute campaigns, eval sweeps, or customer escalations never surprise finance.
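The four steps above reduce to a single formula. Here's a minimal Python sketch of it; the function name and example rates are illustrative (the rates shown match GPT-4o mini's published pricing, but always confirm against the table below or your provider):

```python
def monthly_cost(input_tokens, output_tokens, runs_per_month,
                 input_rate_per_m, output_rate_per_m, buffer=0.15):
    """Estimate monthly spend in dollars.

    Rates are per one million tokens, matching vendor billing.
    `buffer` is the safety margin (0.15 = 15%).
    """
    per_run = (input_tokens * input_rate_per_m +
               output_tokens * output_rate_per_m) / 1_000_000
    return per_run * runs_per_month * (1 + buffer)

# Example: 2,000 input + 500 output tokens per run, 10,000 runs/month,
# at $0.15 in / $0.60 out per 1M tokens, with the default 15% buffer.
estimate = monthly_cost(2_000, 500, 10_000, 0.15, 0.60)
print(f"${estimate:.2f}")  # roughly $6.90/month
```

This is exactly the computation the calculator runs: per-run cost in millions of tokens, scaled by volume, then padded by the buffer.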
Inside Thread Deck, these numbers sync automatically with every run, so your canvas always shows the model used, tokens burned, and cost impact in context.
Tip
Track experiments in Thread Deck and you'll see real-time token burn alongside every block, making it obvious when to renegotiate with vendors.
Below is a snapshot of current large language model pricing. Rates are listed per one million tokens to align with vendor billing. Always confirm with your provider before provisioning production workloads.
| Model | Provider | Input $ / 1M | Output $ / 1M | Context window |
|---|---|---|---|---|
| GPT-5 (2025-04) | OpenAI | $1.25 | $10.00 | 200,000 tokens |
| GPT-4.1 (2025-04) | OpenAI | $2.00 | $8.00 | 200,000 tokens |
| GPT-4o (2024-11) | OpenAI | $2.50 | $10.00 | 128,000 tokens |
| GPT-4o mini (2024-07) | OpenAI | $0.15 | $0.60 | 128,000 tokens |
| o1 (2024-09) | OpenAI | $15.00 | $60.00 | 128,000 tokens |
| Claude Sonnet 4.5 (2025-09) | Anthropic | $3.00 | $15.00 | 1,000,000 tokens |
| Claude Haiku 4.5 (2025-10) | Anthropic | $1.00 | $5.00 | 200,000 tokens |
| Claude Opus 4.1 (2025-08) | Anthropic | $15.00 | $75.00 | 200,000 tokens |
| Gemini 2.5 Pro (2025-02) | Google | $1.25 | $10.00 | 1,000,000 tokens |
| Gemini 2.5 Flash (2025-02) | Google | $0.30 | $2.50 | 1,000,000 tokens |
| Gemini 2.0 Flash Lite (2025-01) | Google | $0.15 | $1.25 | 1,000,000 tokens |
| Llama 3.3 70B (2024-12) | Groq | $0.59 | $0.79 | 128,000 tokens |
| Llama 4 Scout (2025-03) | Groq | $0.11 | $0.34 | 128,000 tokens |
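To see how those rates translate into budget, here's a quick sketch that runs the same hypothetical workload through three of the models above (rates copied from the table; the workload numbers are made up for illustration):

```python
# Published per-1M rates from the table above: (input, output).
RATES = {
    "GPT-4o mini": (0.15, 0.60),
    "GPT-5": (1.25, 10.00),
    "Claude Opus 4.1": (15.00, 75.00),
}

# Hypothetical workload: 3,000 input + 800 output tokens per run,
# 50,000 runs per month, no buffer.
IN_TOKENS, OUT_TOKENS, RUNS = 3_000, 800, 50_000

for model, (in_rate, out_rate) in RATES.items():
    monthly = (IN_TOKENS * in_rate + OUT_TOKENS * out_rate) / 1_000_000 * RUNS
    print(f"{model:>16}: ${monthly:,.2f}/month")
# GPT-4o mini comes in around $46.50, GPT-5 around $587.50,
# and Claude Opus 4.1 around $5,250.00 -- a 100x spread on identical traffic.
```

The spread is why model mix matters: routing routine calls to a small model and reserving frontier models for hard cases can cut the bill by an order of magnitude.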
Providers bill for input tokens (everything you send: persona, context, conversation history) and output tokens (the model's reply). Higher-capacity models charge multiples of the base rate, and some vendors add surcharges for tool calls or image inputs.
Start with your steady-state demand, then layer in buffer for bursts. Teams typically add 10-20% overhead to cover campaigns, regression sweeps, and unforeseen chat escalations. The calculator's buffer slider handles that math automatically.
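The buffer slider's math is just a multiplier on your steady-state estimate. With a hypothetical $1,200/month baseline, the 10-20% range bounds your budget like this:

```python
steady_state = 1_200.00  # hypothetical steady-state monthly spend in dollars

low = steady_state * 1.10   # 10% buffer
high = steady_state * 1.20  # 20% buffer
print(f"Budget range: ${low:,.2f} - ${high:,.2f}")  # $1,320.00 - $1,440.00
```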
The calculator focuses on inference pricing. Fine-tuning fees, vector storage, and retrieval queries vary by provider. Inside Thread Deck you can track those lines separately so finance sees a full-stack cost picture.
We monitor OpenAI, Anthropic, Google, Meta, and Mistral release notes weekly. When prices shift, we update the calculator and note the change log inside Thread Deck so your team stays ahead of any margin impact.