LLM Comparison Matrix

Ratings are directional, not absolute. Updated May 31, 2026 with the latest releases and pricing changes. Use the 1-5 ratings to compare models quickly, then sort by the dimensions that matter most for your product.

Latest matrix includes: Claude Opus 4.8 (new frontier), Claude Sonnet 4.6 (1M context fast), GPT-5.5-Pro (highest intelligence), GPT-5.5, GPT-5.4 mini (coding/subagents), Gemini 3.5 Flash (stable), Gemma 4 31B (open reasoning), Llama 4 Maverick (402B MoE), DeepSeek V4-Pro (862B), Qwen3.5-35B-A3B (MoE coding), and Mistral Medium 3.5.

| Model | Overall | Reasoning | Coding | Cost Efficiency | Latency | Context Quality | Deployment Control |
|---|---|---|---|---|---|---|---|
| Claude Opus 4.8 | 4.5/5 ★★★★★ | 5/5 ★★★★★ | 5/5 ★★★★★ | 3/5 ★★★☆☆ | 3/5 ★★★☆☆ | 5/5 ★★★★★ | 2/5 ★★☆☆☆ |
| GPT-5.5-Pro | 4.3/5 ★★★★☆ | 5/5 ★★★★★ | 5/5 ★★★★★ | 2/5 ★★☆☆☆ | 2/5 ★★☆☆☆ | 5/5 ★★★★★ | 2/5 ★★☆☆☆ |
| GPT-5.5 | 4.2/5 ★★★★☆ | 5/5 ★★★★★ | 5/5 ★★★★★ | 3/5 ★★★☆☆ | 3/5 ★★★☆☆ | 5/5 ★★★★★ | 2/5 ★★☆☆☆ |
| GPT-5.4 mini | 4.0/5 ★★★★☆ | 4/5 ★★★★☆ | 5/5 ★★★★★ | 4/5 ★★★★☆ | 4/5 ★★★★☆ | 4/5 ★★★★☆ | 2/5 ★★☆☆☆ |
| Claude Sonnet 4.6 | 4.0/5 ★★★★☆ | 5/5 ★★★★★ | 5/5 ★★★★★ | 3/5 ★★★☆☆ | 4/5 ★★★★☆ | 5/5 ★★★★★ | 2/5 ★★☆☆☆ |
| Claude Haiku 4.5 | 3.5/5 ★★★★☆ | 4/5 ★★★★☆ | 3/5 ★★★☆☆ | 4/5 ★★★★☆ | 5/5 ★★★★★ | 3/5 ★★★☆☆ | 2/5 ★★☆☆☆ |
| Gemini 3.5 Flash | 3.8/5 ★★★★☆ | 4/5 ★★★★☆ | 5/5 ★★★★★ | 4/5 ★★★★☆ | 5/5 ★★★★★ | 4/5 ★★★★☆ | 2/5 ★★☆☆☆ |
| Gemma 4 31B | 4.1/5 ★★★★☆ | 5/5 ★★★★★ | 4/5 ★★★★☆ | 5/5 ★★★★★ | 4/5 ★★★★☆ | 4/5 ★★★★☆ | 5/5 ★★★★★ |
| Llama 4 Maverick | 4.3/5 ★★★★☆ | 5/5 ★★★★★ | 4/5 ★★★★☆ | 4/5 ★★★★☆ | 4/5 ★★★★☆ | 4/5 ★★★★☆ | 5/5 ★★★★★ |
| Llama 4 Scout | 4.0/5 ★★★★☆ | 4/5 ★★★★☆ | 4/5 ★★★★☆ | 5/5 ★★★★★ | 5/5 ★★★★★ | 3/5 ★★★☆☆ | 5/5 ★★★★★ |
| Mistral Medium 3.5 | 3.8/5 ★★★★☆ | 4/5 ★★★★☆ | 5/5 ★★★★★ | 4/5 ★★★★☆ | 4/5 ★★★★☆ | 3/5 ★★★☆☆ | 4/5 ★★★★☆ |
| Qwen3.5-35B-A3B | 4.0/5 ★★★★☆ | 4/5 ★★★★☆ | 5/5 ★★★★★ | 5/5 ★★★★★ | 4/5 ★★★★☆ | 4/5 ★★★★☆ | 4/5 ★★★★☆ |
| DeepSeek V4-Pro | 4.0/5 ★★★★☆ | 5/5 ★★★★★ | 4/5 ★★★★☆ | 4/5 ★★★★☆ | 3/5 ★★★☆☆ | 4/5 ★★★★☆ | 4/5 ★★★★☆ |

Score key: 5 = excellent, 4 = strong, 3 = medium, 2 = low.

Interpreting the Matrix

For customer-facing quality, prioritize the Reasoning and Context Quality columns. For internal automation at scale, prioritize Cost Efficiency and Latency.
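
One way to apply this prioritization is to re-rank the matrix with weights that reflect your use case. The sketch below is illustrative only: it copies four rows from the matrix, and the weight profiles (CUSTOMER_FACING, INTERNAL_AUTOMATION), the rank helper, and the equal 0.5 weights are assumptions made for this example, not part of the published ratings.

```python
# Column scores copied from four rows of the matrix (1-5 scale).
SCORES = {
    "Claude Opus 4.8": {"reasoning": 5, "coding": 5, "cost": 3,
                        "latency": 3, "context": 5, "control": 2},
    "GPT-5.4 mini": {"reasoning": 4, "coding": 5, "cost": 4,
                     "latency": 4, "context": 4, "control": 2},
    "Claude Haiku 4.5": {"reasoning": 4, "coding": 3, "cost": 4,
                         "latency": 5, "context": 3, "control": 2},
    "Llama 4 Scout": {"reasoning": 4, "coding": 4, "cost": 5,
                      "latency": 5, "context": 3, "control": 5},
}

# Hypothetical weight profiles; adjust the weights to your own priorities.
CUSTOMER_FACING = {"reasoning": 0.5, "context": 0.5}
INTERNAL_AUTOMATION = {"cost": 0.5, "latency": 0.5}


def rank(scores, weights):
    """Return (model, weighted score) pairs sorted from best to worst."""
    total = sum(weights.values())
    weighted = {
        model: sum(row[column] * w for column, w in weights.items()) / total
        for model, row in scores.items()
    }
    return sorted(weighted.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, profile in [("customer-facing", CUSTOMER_FACING),
                          ("internal automation", INTERNAL_AUTOMATION)]:
        print(name)
        for model, score in rank(SCORES, profile):
            print(f"  {model}: {score:.2f}")
```

With these assumed weights, Claude Opus 4.8 ranks first for the customer-facing profile and Llama 4 Scout ranks first for internal automation. Because the ratings are directional, treat the resulting order as a shortlist to test rather than a final answer.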