📡 AI API Radar

Cheapest LLM API by task tier

Models are ranked by blended cost per 1M tokens, with output weighted 3:1 over input (the typical agent mix), that is (input + 3 × output) / 4, and grouped by minimum context window. Live status is shown per row.

Bottom line: the cheapest tracked model overall is Command R7B (12-2024) (Cohere) at $0.04/1M in, $0.15/1M out.
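
The 3:1 output weighting can be checked against the listed prices. The snippet below is a minimal sketch, assuming the blended figure is a weighted average of the input and output prices; the function name `blended_cost` and its `output_weight` parameter are illustrative and not part of the site's API.

```python
def blended_cost(input_per_m: float, output_per_m: float,
                 output_weight: float = 3.0) -> float:
    """Weighted average of input and output price in USD per 1M tokens.

    Assumes the blend is (input + w * output) / (1 + w) with w = 3,
    which matches the listed prices to the cent.
    """
    return (input_per_m + output_weight * output_per_m) / (1.0 + output_weight)


# Prices taken from the listing (USD per 1M tokens: input, output).
listed = [
    ("Command R7B (12-2024)", 0.04, 0.15),
    ("GPT-4o mini (legacy)", 0.15, 0.60),
    ("Gemini 3.1 Flash-Lite", 0.25, 1.50),
]

for name, price_in, price_out in listed:
    print(f"{name}: ${blended_cost(price_in, price_out):.2f} blended")
```

A workload that is heavier on input than on output would call for a smaller `output_weight`, which can change the ranking between models with similar blended prices.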

Cheapest with ≥128K context

| # | Model | Provider | Input /1M | Output /1M | Context | Blended |
|---|---|---|---|---|---|---|
| 1 | Command R7B (12-2024) | Cohere | $0.04 | $0.15 | 128K | $0.12 |
| 2 | GPT-OSS 20B | Groq | $0.08 | $0.30 | 131K | $0.24 |
| 3 | DeepSeek V4 Flash | DeepSeek | $0.14 | $0.28 | 1M | $0.25 |
| 4 | Llama 3.3 70B (via OpenRouter) | OpenRouter | $0.10 | $0.32 | 131K | $0.27 |
| 5 | Llama 4 Scout (17Bx16E) | Groq | $0.11 | $0.34 | 131K | $0.28 |
| 6 | GPT-4.1 nano | OpenAI | $0.10 | $0.40 | 1M | $0.33 |
| 7 | Gemini 2.5 Flash-Lite | Google Gemini | $0.10 | $0.40 | 1.048576M | $0.33 |
| 8 | GPT-4o mini (legacy) | OpenAI | $0.15 | $0.60 | 128K | $0.49 |

Absolute cheapest (any context)

| # | Model | Provider | Input /1M | Output /1M | Context | Blended |
|---|---|---|---|---|---|---|
| 1 | Command R7B (12-2024) | Cohere | $0.04 | $0.15 | 128K | $0.12 |
| 2 | GPT-OSS 20B | Groq | $0.08 | $0.30 | 131K | $0.24 |
| 3 | DeepSeek V4 Flash | DeepSeek | $0.14 | $0.28 | 1M | $0.25 |
| 4 | Llama 3.3 70B (via OpenRouter) | OpenRouter | $0.10 | $0.32 | 131K | $0.27 |
| 5 | Llama 4 Scout (17Bx16E) | Groq | $0.11 | $0.34 | 131K | $0.28 |
| 6 | GPT-4.1 nano | OpenAI | $0.10 | $0.40 | 1M | $0.33 |
| 7 | Gemini 2.5 Flash-Lite | Google Gemini | $0.10 | $0.40 | 1.048576M | $0.33 |
| 8 | GPT-4o mini (legacy) | OpenAI | $0.15 | $0.60 | 128K | $0.49 |

Cheapest with ≥1M context

| # | Model | Provider | Input /1M | Output /1M | Context | Blended |
|---|---|---|---|---|---|---|
| 1 | DeepSeek V4 Flash | DeepSeek | $0.14 | $0.28 | 1M | $0.25 |
| 2 | GPT-4.1 nano | OpenAI | $0.10 | $0.40 | 1M | $0.33 |
| 3 | Gemini 2.5 Flash-Lite | Google Gemini | $0.10 | $0.40 | 1.048576M | $0.33 |
| 4 | DeepSeek V4 Pro | DeepSeek | $0.44 | $0.87 | 1M | $0.76 |
| 5 | DeepSeek V4 Pro (via OpenRouter) | OpenRouter | $0.44 | $0.87 | 1.048576M | $0.76 |
| 6 | Gemini 3.1 Flash-Lite | Google Gemini | $0.25 | $1.50 | 1.048576M | $1.19 |

Automate this: agents call cheapest_model(min_context=…, operational_only=true) over MCP/x402 to route to the cheapest live model per request.
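
The selection that such a call performs can be approximated locally. The following is a hedged sketch, not the service's implementation or client: `Model`, `MODELS`, and `pick_cheapest` are hypothetical names, the token counts for "128K", "131K", and "1M" are rounded from the listing, and every model is assumed to be operational because live status is not included in the rows above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Model:
    name: str
    provider: str
    input_per_m: float        # USD per 1M input tokens
    output_per_m: float       # USD per 1M output tokens
    context: int              # context window in tokens (rounded from the listing)
    operational: bool = True  # assumption: live status is not in the listing

    @property
    def blended(self) -> float:
        # Output weighted 3:1 over input, as in the ranking.
        return (self.input_per_m + 3 * self.output_per_m) / 4


# A few rows copied from the listing; not the full catalogue.
MODELS = [
    Model("Command R7B (12-2024)", "Cohere", 0.04, 0.15, 128_000),
    Model("GPT-OSS 20B", "Groq", 0.08, 0.30, 131_000),
    Model("DeepSeek V4 Flash", "DeepSeek", 0.14, 0.28, 1_000_000),
    Model("GPT-4.1 nano", "OpenAI", 0.10, 0.40, 1_000_000),
]


def pick_cheapest(models: Iterable[Model], min_context: int = 0,
                  operational_only: bool = True) -> Optional[Model]:
    """Return the lowest blended-cost model meeting the constraints, or None."""
    candidates = [
        m for m in models
        if m.context >= min_context and (m.operational or not operational_only)
    ]
    return min(candidates, key=lambda m: m.blended, default=None)


choice = pick_cheapest(MODELS, min_context=1_000_000)
print(choice.name if choice else "no model satisfies the constraints")
```

Returning `None` when nothing qualifies leaves the fallback decision (retry, relax the context requirement, or fail) to the caller instead of silently choosing a model that does not fit.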