o3-mini

OpenAI's cost-efficient reasoning model with three adjustable effort levels, delivering o1-class math and coding performance at 63% lower cost than o1-mini.

about

Released in January 2025, o3-mini introduced adjustable reasoning effort levels (low, medium, high) as the first model to let developers explicitly trade inference cost for accuracy per request. At medium effort it matches o1 on AIME and GPQA Diamond while running 24% faster, and at high effort it reaches 97.9% on MATH and 49.3% on SWE-bench Verified. It was also the first small reasoning model to ship with function calling, Structured Outputs, and developer messages from day one.

License: openai
Context window: 200,000 tokens

Use cases for o3-mini

  1. Cost-controlled STEM reasoning: Three discrete effort levels let developers route competition-level math problems to high effort (87.3% AIME) while keeping simple queries on low effort at a fraction of the cost.
  2. Automated code generation with structured output: Native function calling and Structured Outputs support enables it to generate code and return results in typed JSON schemas, suited for CI/CD pipelines and automated testing.
  3. Batch scientific analysis: Batch API support combined with 97.9% on MATH makes it practical for processing thousands of quantitative research queries overnight at $1.10 per million input tokens.
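The routing idea in use case 1 can be sketched as a small helper. This is a hypothetical heuristic, not an OpenAI or Telnyx API; the difficulty tags and mapping are illustrative assumptions:

```python
# Hypothetical effort-routing helper: map a task's difficulty tag to an
# o3-mini reasoning_effort level so only hard problems pay for high effort.
EFFORT_BY_DIFFICULTY = {
    "simple": "low",        # lookups, short transformations
    "standard": "medium",   # default; matches o1 on most tasks
    "competition": "high",  # AIME-level math, hard code generation
}

def pick_effort(difficulty: str) -> str:
    """Return the reasoning_effort value for a tagged task."""
    return EFFORT_BY_DIFFICULTY.get(difficulty, "medium")

print(pick_effort("competition"))  # high
print(pick_effort("unknown"))      # medium (safe default)
```

In practice the difficulty tag might come from a cheap classifier or from the product surface the query arrived on.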

Quality

Arena Elo: 1337
MMLU: N/A
MT Bench: N/A

o3-mini scores 86.9% on MMLU and 97.9% on MATH at high reasoning effort, with AIME 2024 reaching 83.6-87.3% depending on evaluation methodology. Compared to o1-mini, it delivers higher accuracy across all STEM benchmarks while costing 63% less ($1.10/$4.40 vs $3.00/$12.00 per million tokens). At medium effort it matches the full o1 model on GPQA Diamond (~78%), making it the strongest reasoning-per-dollar option in this comparison.

Arena Elo comparison:

  • gpt-oss-120b: 1354
  • o1-mini: 1337
  • o3-mini: 1337
  • llama-4-17b-128e-instruct: 1327
  • gpt-4-turbo-preview: 1324

pricing

Running o3-mini through Telnyx Inference costs $1.10 per million input tokens and $4.40 per million output tokens. Processing 1,000,000 STEM reasoning tasks at 2,000 tokens each (split evenly between input and output) would cost approximately $5,500, a 63% reduction from o1-mini ($15,000) and an 88% reduction from o1-preview ($75,000) for comparable reasoning quality.
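The arithmetic above can be reproduced with a few lines, assuming each 2,000-token task splits evenly into 1,000 input and 1,000 output tokens:

```python
# Cost sketch for o3-mini batch workloads at the listed per-token rates.
INPUT_PRICE = 1.10   # USD per million input tokens
OUTPUT_PRICE = 4.40  # USD per million output tokens

def batch_cost(tasks: int, in_tokens: int, out_tokens: int) -> float:
    """Total USD cost for a batch of tasks with fixed token counts."""
    total_in = tasks * in_tokens
    total_out = tasks * out_tokens
    return total_in / 1e6 * INPUT_PRICE + total_out / 1e6 * OUTPUT_PRICE

print(f"${batch_cost(1_000_000, 1_000, 1_000):,.0f}")  # $5,500
```

Swapping in o1-mini's rates ($3.00/$12.00) reproduces the $15,000 figure the same way.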

What's Twitter saying?

  • Developer-friendly reasoning: Nikunj Handa from OpenAI calls o3-mini the most feature-complete o-series model released to date, with function calling, structured outputs, and developer messages built in from day one.
  • o1-class performance at mini cost: Developer algo_diver notes the model has reached o1-level performance in the mini class, with the system card confirming significant gains over o1-mini across benchmarks.
  • Mixed coding reception: Reception in the r/ChatGPTCoding community was split; some found o3-mini disappointing for coding tasks, while others praised its one-shot code accuracy and low failure rate.

Explore Our LLM Library

Discover the power and diversity of large language models available with Telnyx. Explore the options below to find the perfect model for your project.

Organization: deepseek-ai
Model Name: DeepSeek-R1-Distill-Qwen-14B
Tasks: text generation
Languages Supported: English
Context Length: 43,000
Parameters: 14.8B
Model Tier: medium
License: deepseek

TRY IT OUT

Chat with an LLM

Powered by our own GPU infrastructure, select a large language model, add a prompt, and chat away. For unlimited chats, sign up for a free account on our Mission Control Portal here.

HOW IT WORKS

Selecting LLMs for Voice AI

RESOURCES

Get started

Check out our helpful tools to help get you started.

  • Test in the portal

    Easily browse and select your preferred model in the AI Playground.

  • Explore the docs

    Don't wait to scale; start today with our public API endpoints.

  • Stay up to date

    Keep an eye on our AI changelog so you don't miss a beat.

Sign up and start building

faqs

What is OpenAI o3-mini?

o3-mini is OpenAI's cost-efficient reasoning model, released January 2025. It uses a private chain-of-thought process to deliberate before answering, with three adjustable effort levels (low, medium, high) that let developers trade speed for accuracy. It scores 97.9% on MATH and 49.3% on SWE-bench Verified at high effort.

Is o3-mini free to use?

Free ChatGPT users can access o3-mini with usage limits. Plus and Team subscribers get triple the rate limits compared to o1-mini, and Pro users have unlimited access. API pricing is $1.10 per million input tokens and $4.40 per million output tokens.

Is o3-mini better than GPT-4?

o3-mini scores 86.9% on MMLU at high effort, comparable to GPT-4's 86.4%, but dramatically outperforms it on reasoning benchmarks like AIME 2024 (87.3% vs ~12%) and MATH (97.9% vs ~52%). The tradeoff is that o3-mini is optimized for STEM reasoning rather than general conversation and creative writing.

What is o3-mini high used for?

The high effort setting allocates maximum compute to reasoning, scoring 87.3% on AIME 2024 and 79.7% on GPQA Diamond at the cost of longer response times. It is suited for competition-level math, complex code generation, and scientific analysis where accuracy matters more than speed.

Is o3-mini better than DeepSeek R1 for coding?

Independent benchmarks show o3-mini outperforming DeepSeek R1 on coding speed while delivering comparable accuracy. o3-mini also supports native function calling and Structured Outputs, features DeepSeek R1 lacks, making it more practical for production tool-use pipelines.

Is o3-mini high as good as o1?

At high effort, o3-mini matches o1 on GPQA Diamond (~78-79%) and approaches it on AIME 2024 (87.3% vs 93.4%). At medium effort it matches o1 on most benchmarks while running 24% faster and costing 63% less, making it the stronger value for most STEM workloads.

Does o3-mini have reasoning capabilities?

o3-mini is specifically a reasoning model trained via reinforcement learning to think before responding. Microsoft describes it as a dedicated reasoning model with adjustable compute allocation, fundamentally different from standard chat models that generate responses token by token.

What is the reasoning effort parameter?

The reasoning effort parameter controls how much compute o3-mini dedicates to thinking, with three levels: low (fastest, cheapest), medium (default, matches o1 on most tasks), and high (strongest accuracy at 97.9% MATH). Developers set this per request through the API, allowing the same model to handle both simple and complex queries at different cost points.
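A minimal sketch of setting the parameter per request with the official openai Python SDK (v1.x). The model name and `reasoning_effort` parameter follow OpenAI's chat completions API; verify both against the current API reference before relying on them:

```python
import os

def build_request(question: str, effort: str = "medium") -> dict:
    """Assemble chat completion parameters with a per-request effort level."""
    return {
        "model": "o3-mini",
        "reasoning_effort": effort,  # "low" | "medium" | "high"
        "messages": [{"role": "user", "content": question}],
    }

params = build_request("Prove that sqrt(2) is irrational.", effort="high")

# Only call the API when credentials are configured.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(**params)
    print(resp.choices[0].message.content)
```

Because the effort level lives in the request rather than the model name, the same deployment can serve both quick lookups at "low" and competition math at "high".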