Best AI Model for Math & Scientific Reasoning
Solving math problems, scientific analysis, physics simulations, and formal reasoning. These tasks call for models with top-tier GPQA scores.
Our Verdict
Gemini 3.1 Pro is the clear leader here: its 94.3% GPQA Diamond score is the highest of any model. For budget math, o3 offers the best value, pairing chain-of-thought reasoning and 79.2% GPQA with pricing of $0.40/$1.60 per 1M input/output tokens. DeepSeek R1 ($0.55/$2.19) is another budget reasoning option with strong math scores. Avoid budget non-reasoning models for math: they hallucinate numbers and skip steps.
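Because prices are quoted per 1M tokens, the cost of a workload is tokens × price ÷ 1,000,000, summed over input and output. The Python sketch below compares the three picks under stated assumptions: the prices are this page's figures, the workload (1,000 problems, each with a 2,000-token prompt and 8,000 output tokens) is hypothetical, and hidden reasoning tokens are assumed to be billed as output tokens, which is common for reasoning APIs but worth checking per provider.

```python
# Per-1M-token prices in USD (input, output), taken from this page's Top Picks.
PRICES = {
    "Gemini 3.1 Pro": (2.00, 12.00),
    "o3": (0.40, 1.60),
    "DeepSeek R1": (0.55, 2.19),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at per-1M-token pricing."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1,000 problems, 2,000 prompt tokens and
    # 8,000 output tokens each (assumes reasoning tokens count as output).
    for name in PRICES:
        total = 1_000 * request_cost(name, 2_000, 8_000)
        print(f"{name}: ${total:.2f}")
```

With these numbers the script prints $100.00 for Gemini 3.1 Pro, $13.60 for o3, and $18.62 for DeepSeek R1. In this hypothetical workload output tokens outnumber input tokens 4 to 1, so the output price dominates; estimating your real output length matters more than the input price when comparing.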
Top Picks
Gemini 3.1 Pro (Google)
94.3% GPQA Diamond, the highest of any model
Best for: Graduate-level science and complex math
- Input: $2 per 1M tokens
- Output: $12 per 1M tokens
- Context: 1M tokens
- Max output: 64K tokens
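o3 (OpenAI)
79.2% GPQA with chain-of-thought reasoning, the best value for budget math
- Input: $0.40 per 1M tokens
- Output: $1.60 per 1M tokens
- Context: 200K tokens
- Max output: 100K tokens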
DeepSeek R1 (DeepSeek)
71.5% GPQA at $0.55/$2.19, a low-cost reasoning alternative to o3
Best for: Budget math with reasoning chains
- Input: $0.55 per 1M tokens
- Output: $2.19 per 1M tokens
- Context: 128K tokens
- Max output: 64K tokens
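The Context and Max Output figures bound how much a single request can hold, which matters for long proofs or full-paper analysis. A minimal Python sketch, assuming "K" means 1,000 tokens and that the prompt plus the requested output must fit inside the context window (a simplification; providers count tokens differently). `ModelLimits`, `LIMITS`, and `fits` are hypothetical names, not any provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLimits:
    context_tokens: int     # total context window
    max_output_tokens: int  # cap on tokens generated per response


# Limits from the Top Picks cards (assumes 1M = 1,000,000 and K = 1,000).
LIMITS = {
    "Gemini 3.1 Pro": ModelLimits(1_000_000, 64_000),
    "o3": ModelLimits(200_000, 100_000),
    "DeepSeek R1": ModelLimits(128_000, 64_000),
}


def fits(model: str, prompt_tokens: int, output_budget: int) -> bool:
    """Check a request against a model's limits.

    Simplifying assumption: the output budget shares the context
    window with the prompt.
    """
    lim = LIMITS[model]
    return (output_budget <= lim.max_output_tokens
            and prompt_tokens + output_budget <= lim.context_tokens)
```

For example, `fits("DeepSeek R1", 100_000, 32_000)` is False because 132,000 tokens exceed the 128K window, while the same request fits o3's 200K window.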
What Matters for Math & Science
Key Factors
- GPQA score
- Reasoning chains
- Mathematical accuracy
Tips
- Reasoning models (o3, DeepSeek R1) dominate here thanks to chain-of-thought reasoning.
- Gemini 3.1 Pro has the highest GPQA Diamond score (94.3%).
- Don't use budget non-reasoning models for math: accuracy drops significantly.
Full Ranking (All Compatible Models)
| Rank | Model | Input ($/1M tokens) | Output ($/1M tokens) | GPQA | Score |
|---|---|---|---|---|---|
| #1 | Gemini 3.1 Pro (Google) | $2.00 | $12.00 | 94.3% | 139 |
| #2 | o3 (OpenAI) | $0.40 | $1.60 | 79.2% | 134 |
| #3 | DeepSeek R1 (DeepSeek) | $0.55 | $2.19 | 71.5% | 125 |
| #4 | o4-mini (OpenAI) | $1.10 | $4.40 | 76% | 105 |
| #5 | Gemini 3 Pro (Google) | $2.00 | $12.00 | 77% | 103 |
| #6 | GLM-5 (Zhipu AI) | $1.00 | $3.20 | 72% | 103 |
| #7 | GPT-5.3 Codex (OpenAI) | $2.00 | $16.00 | 78% | 103 |
| #8 | GLM-4.7 (Zhipu AI) | $0.60 | $2.20 | 85.7% | 103 |
| #9 | GPT-5.2 Codex (OpenAI) | $1.75 | $14.00 | 76% | 102 |
| #10 | Gemini 2.5 Pro (Google) | $1.25 | $10.00 | 76% | 93 |
| #11 | Claude Opus 4.6 (Anthropic) | $5.00 | $25.00 | 75.5% | 92 |
| #12 | MiniMax M2.5 (MiniMax) | $0.30 | $1.20 | 86.0% | 87 |
| #13 | GPT-5 (OpenAI) | $1.25 | $10.00 | 73.5% | 87 |
| #14 | Gemini 2.5 Flash (Google) | $0.15 | $0.60 | 82.8% | 86 |
| #15 | Gemini 3 Flash (Google) | $0.50 | $3.00 | 84.0% | 85 |
| #16 | Grok 4 (xAI) | $3.00 | $15.00 | 72% | 84 |
| #17 | Mistral Large 3 (Mistral) | $2.00 | $5.00 | 87.0% | 80 |
| #18 | Claude Sonnet 4.6 (Anthropic) | $3.00 | $15.00 | 70% | 77 |
| #19 | Claude Sonnet 4.5 (Anthropic) | $3.00 | $15.00 | 68.2% | 75 |
| #20 | DeepSeek V3 (DeepSeek) | $0.14 | $0.28 | 83.5% | 74 |
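To shortlist from the table by a hard GPQA floor, a small filter-and-sort is enough. The helper below is hypothetical and is not how this page computes its Score column: the Score order clearly differs from GPQA order (MiniMax M2.5 has 86.0% GPQA but ranks #12), so treat GPQA-only results as a starting point and weigh them against the reasoning-model advice in the Tips.

```python
# (model, input $/1M, output $/1M, GPQA %) copied from the Full Ranking table.
MODELS = [
    ("Gemini 3.1 Pro", 2.00, 12.00, 94.3),
    ("o3", 0.40, 1.60, 79.2),
    ("DeepSeek R1", 0.55, 2.19, 71.5),
    ("o4-mini", 1.10, 4.40, 76.0),
    ("Gemini 3 Pro", 2.00, 12.00, 77.0),
    ("GLM-5", 1.00, 3.20, 72.0),
    ("GPT-5.3 Codex", 2.00, 16.00, 78.0),
    ("GLM-4.7", 0.60, 2.20, 85.7),
    ("GPT-5.2 Codex", 1.75, 14.00, 76.0),
    ("Gemini 2.5 Pro", 1.25, 10.00, 76.0),
    ("Claude Opus 4.6", 5.00, 25.00, 75.5),
    ("MiniMax M2.5", 0.30, 1.20, 86.0),
    ("GPT-5", 1.25, 10.00, 73.5),
    ("Gemini 2.5 Flash", 0.15, 0.60, 82.8),
    ("Gemini 3 Flash", 0.50, 3.00, 84.0),
    ("Grok 4", 3.00, 15.00, 72.0),
    ("Mistral Large 3", 2.00, 5.00, 87.0),
    ("Claude Sonnet 4.6", 3.00, 15.00, 70.0),
    ("Claude Sonnet 4.5", 3.00, 15.00, 68.2),
    ("DeepSeek V3", 0.14, 0.28, 83.5),
]


def cheapest_above(min_gpqa: float, limit: int = 3):
    """Return up to `limit` models with GPQA >= min_gpqa.

    Sorted by output price, then input price, cheapest first.
    """
    eligible = [m for m in MODELS if m[3] >= min_gpqa]
    return sorted(eligible, key=lambda m: (m[2], m[1]))[:limit]
```

`cheapest_above(85.0)` returns MiniMax M2.5, GLM-4.7, and Mistral Large 3, all ranked well below the top picks, which shows why GPQA alone does not settle the choice; `cheapest_above(90.0)` returns only Gemini 3.1 Pro.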