Telnyx - Global Communications Platform ProviderHome
Voice AIVoice APIInferenceMobile VoiceSpeech-to-TextText-to-speechSIP TrunkingSMS APIWhatsApp Business APIView all productsHealthcareFinanceTravel and HospitalityLogistics and TransportationContact CenterInsuranceRetail and E-CommerceSales and MarketingServices and DiningView all solutionsVoice AIVoice APIInferenceMobile VoiceSpeech-to-TextText-to-SpeechSIP TrunkingSMS APIWhatsApp Business APIGlobal NumbersIoT SIM CardView all pricingOur NetworkMission Control PortalCustomer storiesGlobal communicationsPartnersCareersEventsResource centerSupport centerAI TemplatesSETIDev DocsIntegrations
Contact usLog in
Contact usLog inSign up

Social

Company

  • Our Network
  • Global Coverage
  • Release Notes
  • Careers
  • Voice AI
  • AI Glossary
  • Shop

Legal

  • Data and Privacy
  • Report Abuse
  • Privacy Policy
  • Cookie Policy
  • Law Enforcement
  • Acceptable Use
  • Trust Center
  • Country Specific Requirements
  • Website Terms and Conditions
  • Terms and Conditions of Service

Compare

  • ElevenLabs
  • Vapi
  • Baseten
  • Together.ai
  • Twilio
  • Bandwidth
  • Vonage
  • Amazon Connect
© Telnyx LLC 2026
ISO • PCI • HIPAA • GDPR • SOC2 Type II

Ask AI

  • GPT
  • Claude
  • Perplexity
  • Gemini
  • Grok

gpt-oss-120b

OpenAI's first open-weight model under Apache 2.0, activating 5.1B of 117B total parameters per token for efficient reasoning and agentic tool use.

Start buildingGET Available Models

about

OpenAI's first open-weight release uses 128 experts per layer with top-4 routing, keeping 5.1B of 116.8B total parameters active per token, and fits on a single 80GB GPU through MXFP4 post-training quantization. Trained over 2.1 million H100-hours with a STEM and coding focus, it scores 96.6% on AIME 2024 and reaches a Codeforces Elo of 2,622 with configurable low/medium/high reasoning effort.

Licensegroq
Context window(in thousands)131,072

Use cases for gpt-oss-120b

  1. Single-GPU frontier inference: MXFP4 quantization fits the full 116.8B-parameter model on one H100 80GB GPU, making frontier reasoning accessible without multi-node infrastructure.
  2. Configurable reasoning effort: Three reasoning modes (low/medium/high) with visible chain-of-thought let developers trade latency for accuracy per request, scoring 96.6% on AIME 2024 at high effort.
  3. Open-weight competitive coding: With a Codeforces ELO of 2,622 under Apache 2.0, it runs competitive programming and algorithm design workflows on private infrastructure without API dependencies.

Quality

Arena Elo1354
MMLUN/A
MT BenchN/A

GPT-OSS 120B scores 87.2% on MMLU and 90.0% on MMLU-Pro, placing it between GPT-4o (88.7% MMLU) and GPT-4.1 (90.2% MMLU) on the same sheet. With a Codeforces ELO of 2,622 it outperforms every other open-weight model on competitive coding. As OpenAI's first Apache 2.0 release, it runs on a single H100 GPU with MXFP4 quantization despite its 116.8B total parameters.

Gemini-2.5-Flash-Lite

1374

Gemini-2.0-Flash

1360

gpt-oss-120b

1354

o1-mini

1337

o3-mini

1337

pricing

Running GPT-OSS 120B through Telnyx Inference costs $0.039 per million input tokens and $0.10 per million output tokens via the open-weight deployment. Processing 10,000,000 reasoning tasks at 1,000 tokens each would cost approximately $700, making it the cheapest frontier-class reasoning model available under an Apache 2.0 license.

What's Twitter saying?

  • Developers note that GPT-OSS 120B performs well locally on high-end hardware like quad 3090s at 34.7 tokens/second, with strong code generation but "Spartan" design elements.
  • Reviewers find it good but not superior to open-source rivals like Qwen Coder and Kimi K2, calling it overhyped rather than a game-changer.
  • Tech guides praise its coding prowess for reviewing, fixing, and writing code "like magic" on GPUs like H200, emphasizing privacy and no API needs.

Explore Our LLM Library

Discover the power and diversity of large language models available with Telnyx. Explore the options below to find the perfect model for your project.

No data available at this time, please try again later.
OrganizationModel NameTasksLanguages SupportedContext LengthParametersModel TierLicense
No data available at this time, please try again later.
HOW IT WORKS

Selecting LLMs for Voice AI

GET Available Models
RESOURCES

Get started

Check out our helpful tools to help get you started.

  • Icon Resources ebook

    Test in the portal

    Easily browse and select your preferred model in the AI Playground.

    Test today
  • Icon Resources Docs

    Explore the docs

    Don’t wait to scale, start today with our public API endpoints.

    Get started
  • Icon Resources Article

    Stay up to date

    Keep an eye on our AI changelog so you don't miss a beat.

    See updates

Sign up and start building

Sign upContact sales

faqs

What is GPT-OSS 120B?

GPT-OSS 120B is OpenAI's first open-weight model, released with 120 billion parameters under a permissive license. It is available on Hugging Face and supported through Telnyx's inference infrastructure.

What is GPT-OSS 120B good for?

GPT-OSS 120B excels at coding, reasoning, and instruction-following tasks, performing competitively with proprietary models. It is supported on Telnyx for production voice AI and inference workloads.

How much VRAM does GPT-OSS 120B require?

GPT-OSS 120B requires approximately 240GB of VRAM at full precision, typically needing multiple A100 GPUs. Hosted inference platforms provide access without managing GPU infrastructure.

Is GPT-OSS free?

GPT-OSS 120B is released under an open-weight license permitting free commercial use. Weights are available on Hugging Face, and API access is available through hosting providers.

How does GPT-OSS compare to GPT-5?

GPT-OSS 120B is competitive with GPT-4o class models but does not match GPT-5's full reasoning capability. It represents OpenAI's commitment to open-weight models and is available through Telnyx alongside GPT-5.

Why did OpenAI release an open model?

OpenAI released GPT-OSS to participate in the open-weight ecosystem and provide a self-hostable alternative to their API-only models. It is documented in OpenAI's announcement.