gpt-oss-120b

OpenAI's first open-weight model under Apache 2.0, activating 5.1B of 117B total parameters per token for efficient reasoning and agentic tool use.

about

OpenAI's first open-weight release uses 128 experts per layer with top-4 routing, keeping 5.1B of 116.8B total parameters active per token, and fits on a single 80GB GPU through MXFP4 post-training quantization. Trained over 2.1 million H100-hours with a STEM and coding focus, it scores 96.6% on AIME 2024 and reaches a Codeforces Elo of 2,622 with configurable low/medium/high reasoning effort.
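The sparse routing described above (128 experts per layer, top-4 active) is what keeps only 5.1B of the total parameters in play per token. A toy NumPy sketch of top-k expert routing, with 8 stand-in experts instead of 128, purely to illustrate the mechanism:

```python
import numpy as np

def moe_layer(x, gate_w, experts, top_k=4):
    """Toy mixture-of-experts layer: route one token to its top-k experts.

    x       : (d,) token activation
    gate_w  : (d, n_experts) router weights
    experts : list of callables, each (d,) -> (d,)
    Only the top_k selected experts are evaluated, so most parameters
    stay inactive for any given token.
    """
    logits = x @ gate_w                   # router score per expert
    top = np.argsort(logits)[-top_k:]     # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# 8 tiny random experts standing in for gpt-oss's 128 per layer
rng = np.random.default_rng(0)
d, n = 16, 8
experts = [lambda x, W=rng.normal(size=(d, d)) / d: x @ W for _ in range(n)]
gate_w = rng.normal(size=(d, n))
y = moe_layer(rng.normal(size=d), gate_w, experts, top_k=4)
print(y.shape)  # (16,)
```

With top-4 of 8 experts, half the expert parameters are skipped per token; at gpt-oss's 4-of-128 ratio the savings are far larger, which is why 116.8B total parameters cost only 5.1B per token.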

License: apache-2.0
Context window: 131,072 tokens

Use cases for gpt-oss-120b

  1. Single-GPU frontier inference: MXFP4 quantization fits the full 116.8B-parameter model on one H100 80GB GPU, making frontier reasoning accessible without multi-node infrastructure.
  2. Configurable reasoning effort: Three reasoning modes (low/medium/high) with visible chain-of-thought let developers trade latency for accuracy per request, scoring 96.6% on AIME 2024 at high effort.
  3. Open-weight competitive coding: With a Codeforces Elo of 2,622 under Apache 2.0, it runs competitive programming and algorithm design workflows on private infrastructure without API dependencies.
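The per-request reasoning-effort knob from use case 2 is set in the system prompt: gpt-oss reads a "Reasoning: low|medium|high" line (some hosted APIs expose the same control as a separate request field instead). A minimal payload builder, with field names that are illustrative rather than any specific provider's schema:

```python
def build_request(prompt: str, effort: str = "medium") -> dict:
    """Build a chat-completions-style payload for gpt-oss-120b with a
    chosen reasoning effort. The model reads the effort level from the
    system prompt; "high" trades latency for accuracy, "low" the reverse.
    """
    assert effort in ("low", "medium", "high")
    return {
        "model": "gpt-oss-120b",
        "messages": [
            {"role": "system", "content": f"Reasoning: {effort}"},
            {"role": "user", "content": prompt},
        ],
    }

req = build_request("Prove there are infinitely many primes.", effort="high")
print(req["messages"][0]["content"])  # Reasoning: high
```

Because the effort level travels with each request, a service can route easy queries at low effort and reserve high effort for the hard cases, paying the extra chain-of-thought tokens only where they matter.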

Quality

Arena Elo: 1354
MMLU: N/A
MT Bench: N/A

GPT-OSS 120B scores 87.2% on MMLU and 90.0% on MMLU-Pro, placing it between GPT-4o (88.7% MMLU) and GPT-4.1 (90.2% MMLU) on the same benchmark. With a Codeforces Elo of 2,622, it outperforms every other open-weight model on competitive coding. As OpenAI's first Apache 2.0 release, it runs on a single H100 GPU with MXFP4 quantization despite its 116.8B total parameters.

Arena Elo comparison:

Gemini-2.5-Flash-Lite: 1374
Gemini-2.0-Flash: 1360
gpt-oss-120b: 1354
o1-mini: 1337
o3-mini: 1337

pricing

Running GPT-OSS 120B through Telnyx Inference costs $0.039 per million input tokens and $0.10 per million output tokens via the open-weight deployment. Processing 10,000,000 reasoning tasks at 1,000 tokens each, split evenly between input and output, would cost approximately $700, making it the cheapest frontier-class reasoning model available under an Apache 2.0 license.

What's Twitter saying?

  • Developers note that GPT-OSS 120B performs well locally on high-end hardware like quad 3090s at 34.7 tokens/second, with strong code generation but "Spartan" design elements.
  • Reviewers find it good but not superior to open-source rivals like Qwen Coder and Kimi K2, calling it overhyped rather than a game-changer.
  • Tech guides praise its coding prowess for reviewing, fixing, and writing code "like magic" on GPUs like H200, emphasizing privacy and no API needs.

Explore Our LLM Library

Discover the power and diversity of large language models available with Telnyx. Explore the options below to find the perfect model for your project.

Organization: deepseek-ai
Model Name: DeepSeek-R1-Distill-Qwen-14B
Tasks: text generation
Languages Supported: English
Context Length: 43,000
Parameters: 14.8B
Model Tier: medium
License: deepseek

TRY IT OUT

Chat with an LLM

Powered by our own GPU infrastructure, select a large language model, add a prompt, and chat away. For unlimited chats, sign up for a free account on our Mission Control Portal.

HOW IT WORKS

Selecting LLMs for Voice AI

RESOURCES

Get started

Check out our helpful tools to help get you started.

  • Test in the portal

    Easily browse and select your preferred model in the AI Playground.

  • Explore the docs

    Don’t wait to scale, start today with our public API endpoints.

  • Stay up to date

    Keep an eye on our AI changelog so you don't miss a beat.

Sign up and start building

faqs

What is GPT-OSS 120B?

GPT-OSS 120B is OpenAI's first open-weight model, released under the Apache 2.0 license. It uses a mixture-of-experts architecture with 117B total parameters, activating only 5.1B per token for efficient inference that fits on a single 80GB GPU.

How much VRAM does GPT-OSS 120B require?

GPT-OSS 120B uses MXFP4 quantization and can run on a single 80GB GPU like an NVIDIA H100 or AMD MI300X. This is possible because only 5.1B of its 117B parameters activate per token.
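A back-of-envelope check on why 117B parameters fit in 80GB: MXFP4 stores 4-bit values in blocks that share an 8-bit scale, so a block size of 32 gives roughly 4.25 bits per parameter. This sketch assumes all weights are quantized, whereas real checkpoints keep some tensors (embeddings, attention) at higher precision, so treat it as an estimate:

```python
def mxfp4_gigabytes(params, block=32, scale_bits=8, value_bits=4):
    """Rough MXFP4 footprint: 4-bit values plus one shared scale per
    block, i.e. about 4.25 bits per parameter at block size 32."""
    bits_per_param = value_bits + scale_bits / block
    return params * bits_per_param / 8 / 1e9

print(round(mxfp4_gigabytes(116.8e9)))  # 62
```

Roughly 62 GB of weights leaves headroom on an 80GB card for the KV cache and activations, which is why a single H100 or MI300X suffices.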

Is GPT-OSS 120B free?

Yes, GPT-OSS 120B is fully open-weight under the Apache 2.0 license with no copyleft restrictions or patent risk. It is free for commercial deployment, experimentation, and customization.

What GPU will run GPT-OSS 120B?

A single NVIDIA H100 80GB, A100 80GB, or AMD MI300X can run GPT-OSS 120B. The model's MXFP4 quantization and sparse activation keep memory requirements manageable despite the large total parameter count.

How much is GPT-OSS 120B?

The model weights are free to download from Hugging Face. For hosted inference, pricing varies by provider. Self-hosting costs depend on GPU infrastructure, with a single H100 being the minimum recommended hardware.