gpt-oss-120b

OpenAI's first open-weight model under Apache 2.0, activating 5.1B of 117B total parameters per token for efficient reasoning and agentic tool use.

about

OpenAI's first open-weight release uses 128 experts per layer with top-4 routing, keeping 5.1B of 116.8B total parameters active per token, and fits on a single 80GB GPU through MXFP4 post-training quantization. Trained over 2.1 million H100-hours with a STEM and coding focus, it scores 96.6% on AIME 2024 and reaches a Codeforces Elo of 2,622 with configurable low/medium/high reasoning effort.
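The sparse routing described above (128 experts per layer, top-4 active) is what keeps only 5.1B of the total parameters in play per token. A toy NumPy sketch of top-k expert routing, with 8 stand-in experts instead of 128, purely to illustrate the mechanism:

```python
import numpy as np

def moe_layer(x, gate_w, experts, top_k=4):
    """Toy mixture-of-experts layer: route one token to its top-k experts.

    x       : (d,) token activation
    gate_w  : (d, n_experts) router weights
    experts : list of callables, each (d,) -> (d,)
    Only the top_k selected experts are evaluated, so most parameters
    stay inactive for any given token.
    """
    logits = x @ gate_w                   # router score per expert
    top = np.argsort(logits)[-top_k:]     # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# 8 tiny random experts standing in for gpt-oss's 128 per layer
rng = np.random.default_rng(0)
d, n = 16, 8
experts = [lambda x, W=rng.normal(size=(d, d)) / d: x @ W for _ in range(n)]
gate_w = rng.normal(size=(d, n))
y = moe_layer(rng.normal(size=d), gate_w, experts, top_k=4)
print(y.shape)  # (16,)
```

With top-4 of 8 experts, half the expert parameters are skipped per token; at gpt-oss's 4-of-128 ratio the savings are far larger, which is why 116.8B total parameters cost only 5.1B per token.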

License: apache-2.0
Context window: 131,072 tokens

Use cases for gpt-oss-120b

  1. Single-GPU frontier inference: MXFP4 quantization fits the full 116.8B-parameter model on one H100 80GB GPU, making frontier reasoning accessible without multi-node infrastructure.
  2. Configurable reasoning effort: Three reasoning modes (low/medium/high) with visible chain-of-thought let developers trade latency for accuracy per request, scoring 96.6% on AIME 2024 at high effort.
  3. Open-weight competitive coding: With a Codeforces Elo of 2,622 under Apache 2.0, it runs competitive programming and algorithm design workflows on private infrastructure without API dependencies.
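The per-request reasoning-effort knob from use case 2 is set in the system prompt: gpt-oss reads a "Reasoning: low|medium|high" line (some hosted APIs expose the same control as a separate request field instead). A minimal payload builder, with field names that are illustrative rather than any specific provider's schema:

```python
def build_request(prompt: str, effort: str = "medium") -> dict:
    """Build a chat-completions-style payload for gpt-oss-120b with a
    chosen reasoning effort. The model reads the effort level from the
    system prompt; "high" trades latency for accuracy, "low" the reverse.
    """
    assert effort in ("low", "medium", "high")
    return {
        "model": "gpt-oss-120b",
        "messages": [
            {"role": "system", "content": f"Reasoning: {effort}"},
            {"role": "user", "content": prompt},
        ],
    }

req = build_request("Prove there are infinitely many primes.", effort="high")
print(req["messages"][0]["content"])  # Reasoning: high
```

Because the effort level travels with each request, a service can route easy queries at low effort and reserve high effort for the hard cases, paying the extra chain-of-thought tokens only where they matter.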

Quality

Arena Elo: 1354
MMLU: N/A
MT Bench: N/A

GPT-OSS 120B scores 87.2% on MMLU and 90.0% on MMLU-Pro, placing it between GPT-4o (88.7% MMLU) and GPT-4.1 (90.2% MMLU) on the same benchmark. With a Codeforces Elo of 2,622, it outperforms every other open-weight model on competitive coding. As OpenAI's first Apache 2.0 release, it runs on a single H100 GPU with MXFP4 quantization despite its 116.8B total parameters.

Arena Elo comparison:

Gemini-2.5-Flash-Lite: 1374
Gemini-2.0-Flash: 1360
gpt-oss-120b: 1354
o1-mini: 1337
o3-mini: 1337

pricing

Running GPT-OSS 120B through Telnyx Inference costs $0.039 per million input tokens and $0.10 per million output tokens via the open-weight deployment. Processing 10,000,000 reasoning tasks at 1,000 tokens each, split evenly between input and output, would cost approximately $700, making it the cheapest frontier-class reasoning model available under an Apache 2.0 license.

What's Twitter saying?

  • Developers note that GPT-OSS 120B performs well locally on high-end hardware like quad 3090s at 34.7 tokens/second, with strong code generation but "Spartan" design elements.
  • Reviewers find it good but not superior to open-source rivals like Qwen Coder and Kimi K2, calling it overhyped rather than a game-changer.
  • Tech guides praise its coding prowess for reviewing, fixing, and writing code "like magic" on GPUs like H200, emphasizing privacy and no API needs.

Explore Our LLM Library

Discover the power and diversity of large language models available with Telnyx. Explore the options below to find the perfect model for your project.

Organization: deepseek-ai
Model Name: DeepSeek-R1-Distill-Qwen-14B
Tasks: text generation
Languages Supported: English
Context Length: 43,000
Parameters: 14.8B
Model Tier: medium
License: deepseek

TRY IT OUT

Chat with an LLM

Powered by our own GPU infrastructure, select a large language model, add a prompt, and chat away. For unlimited chats, sign up for a free account on our Mission Control Portal.

HOW IT WORKS

Selecting LLMs for Voice AI

RESOURCES

Get started

Check out our helpful tools to help get you started.

  • Test in the portal

    Easily browse and select your preferred model in the AI Playground.

  • Explore the docs

    Don’t wait to scale, start today with our public API endpoints.

  • Stay up to date

    Keep an eye on our AI changelog so you don't miss a beat.

Sign up and start building

faqs

What is GPT-OSS 120B?

GPT-OSS 120B is OpenAI's first open-weight model, released under the Apache 2.0 license. It uses a mixture-of-experts architecture with 117B total parameters, activating only 5.1B per token for efficient inference that fits on a single 80GB GPU.

How much VRAM does GPT-OSS 120B require?

GPT-OSS 120B uses MXFP4 quantization and can run on a single 80GB GPU like an NVIDIA H100 or AMD MI300X. This is possible because only 5.1B of its 117B parameters activate per token.
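A back-of-envelope check on why 117B parameters fit in 80GB: MXFP4 stores 4-bit values in blocks that share an 8-bit scale, so a block size of 32 gives roughly 4.25 bits per parameter. This sketch assumes all weights are quantized, whereas real checkpoints keep some tensors (embeddings, attention) at higher precision, so treat it as an estimate:

```python
def mxfp4_gigabytes(params, block=32, scale_bits=8, value_bits=4):
    """Rough MXFP4 footprint: 4-bit values plus one shared scale per
    block, i.e. about 4.25 bits per parameter at block size 32."""
    bits_per_param = value_bits + scale_bits / block
    return params * bits_per_param / 8 / 1e9

print(round(mxfp4_gigabytes(116.8e9)))  # 62
```

Roughly 62 GB of weights leaves headroom on an 80GB card for the KV cache and activations, which is why a single H100 or MI300X suffices.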

Is GPT-OSS 120B free?

Yes, GPT-OSS 120B is fully open-weight under the Apache 2.0 license with no copyleft restrictions or patent risk. It is free for commercial deployment, experimentation, and customization.

What GPU will run GPT-OSS 120B?

A single NVIDIA H100 80GB, A100 80GB, or AMD MI300X can run GPT-OSS 120B. The model's MXFP4 quantization and sparse activation keep memory requirements manageable despite the large total parameter count.

How much is GPT-OSS 120B?

The model weights are free to download from Hugging Face. For hosted inference, pricing varies by provider. Self-hosting costs depend on GPU infrastructure, with a single H100 being the minimum recommended hardware.