Predict your GPT, Claude, Gemini, and Llama costs before you ship. Thread Deck's free calculator combines current per-million token pricing with your own usage assumptions so you can spot the right model mix, budget accurately, and defend margins.
Prompt, persona, and context tokens you send to the model each run.
Average response length in tokens. Adjust for larger analyses.
How many production calls you expect across your team each month.
Cushion for unexpected spikes. Most teams land on 10-20% for a comfortable guardrail.
The math mirrors what Stripe or your model provider charges: we convert your input and output tokens into millions, multiply by the published per-million rates, and then factor in your run volume. The optional buffer applies a safety margin so finance and product teams see a realistic upper bound.
1. Capture context scope
Input tokens represent personas, briefs, and any retrieval context you pass to the model before generation.
2. Estimate response size
Output tokens translate to the expected length of each answer, summarisation, or decision packet.
3. Multiply by run cadence
Runs per month let you model steady-state operations and scale up experiments or agents over time.
4. Add guardrails
Keep a buffer so last-minute campaigns, eval sweeps, or customer escalations never surprise finance.
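The four steps above reduce to a single formula. Here's a minimal Python sketch of it; the function name and example rates are illustrative (the rates shown match GPT-4o mini's published pricing, but always confirm against the table below or your provider):

```python
def monthly_cost(input_tokens, output_tokens, runs_per_month,
                 input_rate_per_m, output_rate_per_m, buffer=0.15):
    """Estimate monthly spend in dollars.

    Rates are per one million tokens, matching vendor billing.
    `buffer` is the safety margin (0.15 = 15%).
    """
    per_run = (input_tokens * input_rate_per_m +
               output_tokens * output_rate_per_m) / 1_000_000
    return per_run * runs_per_month * (1 + buffer)

# Example: 2,000 input + 500 output tokens per run, 10,000 runs/month,
# at $0.15 in / $0.60 out per 1M tokens, with the default 15% buffer.
estimate = monthly_cost(2_000, 500, 10_000, 0.15, 0.60)
print(f"${estimate:.2f}")  # roughly $6.90/month
```

This is exactly the computation the calculator runs: per-run cost in millions of tokens, scaled by volume, then padded by the buffer.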
Inside Thread Deck, these numbers sync automatically with every run, so your canvas always shows the model used, tokens burned, and cost impact in context.
Tip
Track experiments in Thread Deck and you'll see real-time token burn alongside every block, making it obvious when to renegotiate with vendors.
Below is a snapshot of current large language model pricing. Rates are listed per one million tokens to align with vendor billing. Always confirm with your provider before provisioning production workloads.
| Model | Provider | Input $ / 1M | Output $ / 1M | Context window |
|---|---|---|---|---|
| GPT-5 (2025-04) | OpenAI | $1.25 | $10.00 | 200,000 tokens |
| GPT-4.1 (2025-04) | OpenAI | $2.00 | $8.00 | 200,000 tokens |
| GPT-4o (2024-11) | OpenAI | $2.50 | $10.00 | 128,000 tokens |
| GPT-4o mini (2024-07) | OpenAI | $0.15 | $0.60 | 128,000 tokens |
| o1 (2024-09) | OpenAI | $15.00 | $60.00 | 128,000 tokens |
| Claude Sonnet 4.5 (2025-09) | Anthropic | $3.00 | $15.00 | 1,000,000 tokens |
| Claude Haiku 4.5 (2025-10) | Anthropic | $1.00 | $5.00 | 200,000 tokens |
| Claude Opus 4.1 (2025-08) | Anthropic | $15.00 | $75.00 | 200,000 tokens |
| Gemini 2.5 Pro (2025-02) | Google | $1.25 | $10.00 | 1,000,000 tokens |
| Gemini 2.5 Flash (2025-02) | Google | $0.30 | $2.50 | 1,000,000 tokens |
| Gemini 2.0 Flash Lite (2025-01) | Google | $0.15 | $1.25 | 1,000,000 tokens |
| Llama 3.3 70B (2024-12) | Groq | $0.59 | $0.79 | 128,000 tokens |
| Llama 4 Scout (2025-03) | Groq | $0.11 | $0.34 | 128,000 tokens |
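To see how those rates translate into budget, here's a quick sketch that runs the same hypothetical workload through three of the models above (rates copied from the table; the workload numbers are made up for illustration):

```python
# Published per-1M rates from the table above: (input, output).
RATES = {
    "GPT-4o mini": (0.15, 0.60),
    "GPT-5": (1.25, 10.00),
    "Claude Opus 4.1": (15.00, 75.00),
}

# Hypothetical workload: 3,000 input + 800 output tokens per run,
# 50,000 runs per month, no buffer.
IN_TOKENS, OUT_TOKENS, RUNS = 3_000, 800, 50_000

for model, (in_rate, out_rate) in RATES.items():
    monthly = (IN_TOKENS * in_rate + OUT_TOKENS * out_rate) / 1_000_000 * RUNS
    print(f"{model:>16}: ${monthly:,.2f}/month")
# GPT-4o mini comes in around $46.50, GPT-5 around $587.50,
# and Claude Opus 4.1 around $5,250.00 -- a 100x spread on identical traffic.
```

The spread is why model mix matters: routing routine calls to a small model and reserving frontier models for hard cases can cut the bill by an order of magnitude.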
Providers bill for input tokens (everything you send: persona, context, conversation history) and output tokens (the model's reply). Higher-capacity models charge multiples of the base rate, and some vendors add surcharges for tool calls or image inputs.
Start with your steady-state demand, then layer in buffer for bursts. Teams typically add 10-20% overhead to cover campaigns, regression sweeps, and unforeseen chat escalations. The calculator's buffer slider handles that math automatically.
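The buffer slider's math is just a multiplier on your steady-state estimate. With a hypothetical $1,200/month baseline, the 10-20% range bounds your budget like this:

```python
steady_state = 1_200.00  # hypothetical steady-state monthly spend in dollars

low = steady_state * 1.10   # 10% buffer
high = steady_state * 1.20  # 20% buffer
print(f"Budget range: ${low:,.2f} - ${high:,.2f}")  # $1,320.00 - $1,440.00
```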
The calculator focuses on inference pricing. Fine-tuning fees, vector storage, and retrieval queries vary by provider. Inside Thread Deck you can track those lines separately so finance sees a full-stack cost picture.
We monitor OpenAI, Anthropic, Google, Meta, and Mistral release notes weekly. When prices shift, we update the calculator and note the change log inside Thread Deck so your team stays ahead of any margin impact.