Released in January 2025, o3-mini introduced adjustable reasoning effort levels (low, medium, high) as the first model to let developers explicitly trade inference cost for accuracy per request. At medium effort it matches o1 on AIME and GPQA Diamond while running 24% faster, and at high effort it reaches 97.9% on MATH and 49.3% on SWE-bench Verified. It was also the first small reasoning model to ship with function calling, Structured Outputs, and developer messages from day one.
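The per-request effort trade-off can be sketched as follows, assuming an OpenAI-style Chat Completions request body with a `reasoning_effort` field; the helper function and payload shape here are illustrative, and authentication and the HTTP call itself are omitted:

```python
# Sketch: selecting a reasoning effort level per request.
# Assumes an OpenAI-style Chat Completions payload with a
# `reasoning_effort` field; field names are illustrative.

def build_request(prompt: str, effort: str = "medium") -> dict:
    """Build a chat request, trading inference cost for accuracy via effort."""
    if effort not in {"low", "medium", "high"}:
        raise ValueError(f"unsupported effort level: {effort}")
    return {
        "model": "o3-mini",
        "reasoning_effort": effort,  # low = cheaper/faster, high = deeper reasoning
        "messages": [{"role": "user", "content": prompt}],
    }

# A quick lookup can run at low effort to save cost and latency,
# while a hard proof gets high effort for deeper reasoning.
cheap = build_request("What is 17 * 24?", effort="low")
deep = build_request("Prove there are infinitely many primes.", effort="high")
```

Because the effort level is a per-request field rather than a model-wide setting, a single deployment can route easy and hard queries at different price points.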
o3-mini excels at STEM reasoning, coding, and mathematical problem-solving tasks. It is OpenAI's recommended reasoning model for applications that need strong analytical capability at lower cost than the full o3.
o3-mini is available with usage limits in ChatGPT's free tier. API access is paid, with pricing documented in OpenAI's model reference.
o3-mini scores 86.9% on MMLU and 97.9% on MATH at high reasoning effort, with AIME 2024 results of 83.6–87.3% depending on evaluation methodology. Compared to o1-mini, it delivers higher accuracy across all STEM benchmarks while costing 63% less ($1.10/$4.40 vs. $3.00/$12.00 per million input/output tokens). At medium effort it matches the full o1 model on GPQA Diamond (~78%), making it one of the strongest reasoning-per-dollar options available.
Running o3-mini through Telnyx Inference costs $1.10 per million input tokens and $4.40 per million output tokens. Processing 1,000,000 STEM reasoning tasks at 2,000 tokens each, assuming an even split between input and output tokens, would cost approximately $5,500. That is a 63% reduction from o1-mini ($15,000) and roughly a 93% reduction from o1-preview ($75,000) for comparable reasoning quality.
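The arithmetic behind these figures can be reproduced with a small helper. The rates come from the pricing above; the 50/50 input/output split per task is an assumption used to arrive at the blended totals:

```python
# Sketch: estimating batch cost from per-million-token rates.
# Rates ($ per million tokens) are the published figures; the even
# input/output split per task is an assumption.

def batch_cost(tasks: int, tokens_per_task: int,
               input_rate: float, output_rate: float,
               input_share: float = 0.5) -> float:
    """Total USD cost for a batch of requests."""
    total_tokens = tasks * tokens_per_task
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens - input_tokens
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

o3_mini = batch_cost(1_000_000, 2_000, 1.10, 4.40)   # -> 5500.0
o1_mini = batch_cost(1_000_000, 2_000, 3.00, 12.00)  # -> 15000.0
savings = 1 - o3_mini / o1_mini                       # -> ~0.633 (63% cheaper)
```

Adjusting `input_share` toward output-heavy workloads (long chains of reasoning with short prompts) pushes the effective blended rate closer to the output price, so the split assumption matters when budgeting.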
o3-mini-high uses more compute per query to improve accuracy on complex reasoning tasks. This configuration trades speed for quality on problems that benefit from deeper thinking, making it suited for technical analysis and code review.
o3-mini is free to use in ChatGPT with rate limits. Through the API, it requires a paid account. Infrastructure providers also offer hosted access for production workloads.
o3-mini and DeepSeek R1 are competitive on reasoning benchmarks, with o3-mini generally leading on math and science tasks. The choice often depends on deployment constraints: R1 is open-weight and self-hostable, while o3-mini is API-only through OpenAI and partner platforms.
o3-mini holds an edge on structured coding benchmarks like SWE-bench, while DeepSeek R1 is strong on code generation from natural language prompts. For production voice AI pipelines, the choice depends on latency and infrastructure requirements rather than raw benchmark scores alone.