Released in January 2025, o3-mini introduced adjustable reasoning effort levels (low, medium, high) as the first model to let developers explicitly trade inference cost for accuracy per request. At medium effort it matches o1 on AIME and GPQA Diamond while running 24% faster, and at high effort it reaches 97.9% on MATH and 49.3% on SWE-bench Verified. It was also the first small reasoning model to ship with function calling, Structured Outputs, and developer messages from day one.
o3-mini scores 86.9% on MMLU and 97.9% on MATH at high reasoning effort, with AIME 2024 reaching 83.6-87.3% depending on evaluation methodology. Compared to o1-mini on the same sheet, it delivers higher accuracy across all STEM benchmarks while costing 63% less ($1.10/$4.40 vs $3.00/$12.00 per million tokens). At medium effort it matches the full o1 model on GPQA Diamond (~78%), making it the strongest reasoning-per-dollar option on the sheet.
Running o3-mini through Telnyx Inference costs $1.10 per million input tokens and $4.40 per million output tokens. Processing 1,000,000 STEM reasoning tasks at 2,000 tokens each would cost approximately $5,500, a 63% reduction from o1-mini ($15,000) and an 88% reduction from o1-preview ($75,000) for comparable reasoning quality.
Discover the power and diversity of large language models available with Telnyx. Explore the options below to find the perfect model for your project.
| Organization | Model Name | Tasks | Languages Supported | Context Length | Parameters | Model Tier | License |
|---|---|---|---|---|---|---|---|
| deepseek-ai | DeepSeek-R1-Distill-Qwen-14B | text generation | English | 43,000 | 14.8B | medium | deepseek |
| fixie-ai | ultravox-v0_4_1-llama-3_1-8b | audio text-to-text | Multilingual | 8,000 | 8.7B | small | mit |
| gemma-2b-it | text generation | English | 8,192 | 2.5B | small | gemma | |
| gemma-7b-it | text generation | English | 8,192 | 8.5B | small | gemma | |
| meta-llama | Llama-3.3-70B-Instruct | text generation | Multilingual | 99,000 | 70.6B | large | llama3.3 |
| meta-llama | Llama-Guard-3-1B | safety classification | Multilingual | 128,000 | 1.5B | small | llama3.3 |
| meta-llama | Meta-Llama-3.1-70B-Instruct | text generation | Multilingual | 99,000 | 70.6B | large | llama3.1 |
| meta-llama | Meta-Llama-3.1-8B-Instruct | text generation | Multilingual | 131,072 | 8.0B | small | llama3.1 |
| minimaxai | MiniMax-M2.5 | text generation | English | 2,000,000 | 0 | large | minimaxai |
| mistralai | Mistral-7B-Instruct-v0.1 | text generation | English | 8,192 | 7.2B | small | apache-2.0 |
| mistralai | Mistral-7B-Instruct-v0.2 | text generation | English | 32,768 | 7.2B | small | apache-2.0 |
| mistralai | Mixtral-8x7B-Instruct-v0.1 | text generation | Multilingual | 32,768 | 46.7B | medium | apache-2.0 |
| moonshotai | Kimi-K2.5 | text generation | English | 256,000 | 1.0T | large | modified-mit |
| Qwen | Qwen3-235B-A22B | text generation | English | 32,768 | 235.1B | large | apache-2.0 |
| zai-org | GLM-5 | text generation | English | 202,752 | 753.9B | large | mit |
| zai-org | GLM-5.1-FP8 | text generation | English | 202,752 | 753.9B | large | mit |
| anthropic | claude-3-7-sonnet-latest | text generation | Multilingual | 200,000 | 0 | large | anthropic |
| anthropic | claude-haiku-4-5 | text generation | Multilingual | 200,000 | 0 | large | anthropic |
| anthropic | claude-opus-4-6 | text generation | Multilingual | 200,000 | 0 | large | anthropic |
| anthropic | claude-sonnet-4-20250514 | text generation | Multilingual | 200,000 | 0 | large | anthropic |
| gemini-2.0-flash | text generation | Multilingual | 1,048,576 | 0 | large | ||
| gemini-2.5-flash | text generation | Multilingual | 1,048,576 | 0 | large | ||
| gemini-2.5-flash-lite | text generation | Multilingual | 1,048,576 | 0 | large | ||
| groq | gpt-oss-120b | text generation | English | 131,072 | 117.0B | large | groq |
| groq | kimi-k2-instruct | text generation | English | 131,072 | 1.0T | large | groq |
| groq | llama-3.3-70b-versatile | text generation | Multilingual | 131,072 | 70.6B | large | llama3.3 |
| groq | llama-4-maverick-17b-128e-instruct | text generation | Multilingual | 1,000,000 | 400.0B | large | llama4 |
| groq | llama-4-scout-17b-16e-instruct | text generation | Multilingual | 128,000 | 109.0B | large | llama4 |
| openai | gpt-3.5-turbo | text generation | Multilingual | 4,096 | 0 | large | openai |
| openai | gpt-4 | text generation | Multilingual | 128,000 | 0 | large | openai |
| openai | gpt-4-0125-preview | text generation | Multilingual | 128,000 | 0 | large | openai |
| openai | gpt-4-0314 | text generation | Multilingual | 128,000 | 0 | large | openai |
| openai | gpt-4-0613 | text generation | Multilingual | 128,000 | 0 | large | openai |
| openai | gpt-4-1106-preview | text generation | Multilingual | 128,000 | 0 | large | openai |
| openai | gpt-4-32k-0314 | text generation | Multilingual | 128,000 | 0 | large | openai |
| openai | gpt-4-turbo-preview | text generation | Multilingual | 128,000 | 0 | large | openai |
| openai | gpt-4.1 | text generation | Multilingual | 1,047,576 | 0 | large | openai |
| openai | gpt-4.1-mini | text generation | Multilingual | 1,047,576 | 0 | large | openai |
| openai | gpt-4o | text generation | Multilingual | 128,000 | 0 | large | openai |
| openai | gpt-4o-mini | text generation | Multilingual | 128,000 | 0 | large | openai |
| openai | gpt-5 | text generation | Multilingual | 400,000 | 0 | large | openai |
| openai | gpt-5-mini | text generation | Multilingual | 400,000 | 0 | large | openai |
| openai | gpt-5.1 | text generation | Multilingual | 400,000 | 0 | large | openai |
| openai | gpt-5.2 | text generation | Multilingual | 400,000 | 0 | large | openai |
| openai | o1-mini | text generation | Multilingual | 128,000 | 0 | large | openai |
| openai | o1-preview | text generation | Multilingual | 128,000 | 0 | large | openai |
| openai | o3-mini | text generation | Multilingual | 200,000 | 0 | large | openai |
| xai-org | grok-2 | text generation | Multilingual | 131,072 | 0 | large | xai |
| xai-org | grok-2-latest | text generation | Multilingual | 131,072 | 0 | large | xai |
| xai-org | grok-3 | text generation | Multilingual | 131,072 | 0 | large | xai |
| xai-org | grok-3-beta | text generation | Multilingual | 131,072 | 0 | large | xai |
| xai-org | grok-3-fast | text generation | Multilingual | 131,072 | 0 | large | xai |
| xai-org | grok-3-fast-beta | text generation | Multilingual | 131,072 | 0 | large | xai |
| xai-org | grok-3-fast-latest | text generation | Multilingual | 131,072 | 0 | large | xai |
| xai-org | grok-3-latest | text generation | Multilingual | 131,072 | 0 | large | xai |
| xai-org | grok-3-mini | text generation | Multilingual | 131,072 | 0 | large | xai |
| xai-org | grok-3-mini-fast | text generation | Multilingual | 131,072 | 0 | large | xai |
Powered by our own GPU infrastructure, select a large language model, add a prompt, and chat away. For unlimited chats, sign up for a free account on our Mission Control Portal here.
Check out our helpful tools to help get you started.
o3-mini is OpenAI's cost-efficient reasoning model, released January 2025. It uses a private chain-of-thought process to deliberate before answering, with three adjustable effort levels (low, medium, high) that let developers trade speed for accuracy. It scores 97.9% on MATH and 49.3% on SWE-bench Verified at high effort.
Free ChatGPT users can access o3-mini with usage limits. Plus and Team subscribers get triple the rate limits compared to o1-mini, and Pro users have unlimited access. API pricing is $1.10 per million input tokens and $4.40 per million output tokens.
o3-mini scores 86.9% on MMLU at high effort, comparable to GPT-4's 86.4%, but dramatically outperforms it on reasoning benchmarks like AIME 2024 (87.3% vs ~12%) and MATH (97.9% vs ~52%). The tradeoff is that o3-mini is optimized for STEM reasoning rather than general conversation and creative writing.
The high effort setting allocates maximum compute to reasoning, scoring 87.3% on AIME 2024 and 79.7% on GPQA Diamond at the cost of longer response times. It is suited for competition-level math, complex code generation, and scientific analysis where accuracy matters more than speed.
Independent benchmarks show o3-mini outperforming DeepSeek R1 on coding speed while delivering comparable accuracy. o3-mini also supports native function calling and Structured Outputs, features DeepSeek R1 lacks, making it more practical for production tool-use pipelines.
At high effort, o3-mini matches o1 on GPQA Diamond (~78-79%) and approaches it on AIME 2024 (87.3% vs 93.4%). At medium effort it matches o1 on most benchmarks while running 24% faster and costing 63% less, making it the stronger value for most STEM workloads.
o3-mini is specifically a reasoning model trained via reinforcement learning to think before responding. Microsoft describes it as a dedicated reasoning model with adjustable compute allocation, fundamentally different from standard chat models that generate responses token by token.
The reasoning effort parameter controls how much compute o3-mini dedicates to thinking, with three levels: low (fastest, cheapest), medium (default, matches o1 on most tasks), and high (strongest accuracy at 97.9% MATH). Developers set this per request through the API, allowing the same model to handle both simple and complex queries at different cost points.