llama-4-17b-128e-instruct

Meta's largest openly released Llama 4 model, with 17B active parameters across 128 experts, supporting multimodal input and a 1-million-token context window for complex agentic workflows.

about

With 128 routed experts plus one shared expert per layer and 400B total parameters, Maverick has the highest expert count of any Llama model to date. It uses early-fusion multimodality trained on roughly 22 trillion tokens of text and image data, debuted at an LMSYS Chatbot Arena ELO of 1417 (scored by an experimental chat-tuned variant), above both GPT-4o and Gemini 2.0 Flash at the time, and offers a 1-million-token context window.

License: llama4
Context window: 1,000,000 tokens
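
To make the expert layout concrete, below is a minimal, illustrative sketch of how a mixture-of-experts layer routes a single token: a router scores the 128 routed experts, the top-scoring expert runs, and its output is added to the always-on shared expert. The tiny hidden size, single-matrix "experts," and top-1 routing rule are simplifications for illustration, not Meta's implementation.

```python
import numpy as np

HIDDEN = 64        # tiny illustrative hidden size, not the real model dimension
NUM_ROUTED = 128   # routed experts per MoE layer, as described above

rng = np.random.default_rng(0)

# Toy "experts": a single linear map stands in for each full FFN block.
routed_experts = [rng.standard_normal((HIDDEN, HIDDEN)) * 0.02 for _ in range(NUM_ROUTED)]
shared_expert = rng.standard_normal((HIDDEN, HIDDEN)) * 0.02
router = rng.standard_normal((HIDDEN, NUM_ROUTED)) * 0.02

def moe_layer(token: np.ndarray) -> np.ndarray:
    """Route one token: the shared expert always runs, plus one routed expert."""
    logits = token @ router                       # score all 128 routed experts
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    top = int(np.argmax(probs))                   # top-1 routing
    routed_out = probs[top] * (token @ routed_experts[top])
    shared_out = token @ shared_expert            # shared expert sees every token
    return shared_out + routed_out

token = rng.standard_normal(HIDDEN)
print(moe_layer(token).shape)  # (64,) -- only 2 of the 129 expert blocks did any work
```

This is why only about 17B of the 400B total parameters contribute to any given token: most experts sit idle for most tokens.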

Use cases for llama-4-17b-128e-instruct

  1. Multimodal document processing: Early-fusion vision trained on 22 trillion tokens enables native understanding of charts, diagrams, and screenshots alongside text without adapter overhead.
  2. Image-grounded conversation: With native image input, it answers complex visual questions about photographs, UI designs, and technical schematics in multi-turn dialogue.
  3. Efficient large-model inference: 128 experts with only 17B active per token deliver frontier-quality output at a fraction of the compute cost of comparably capable dense models (see the back-of-envelope sketch after this list).
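
As a rough illustration of that efficiency claim, here is a back-of-envelope sketch. It assumes per-token decode cost scales with roughly 2 FLOPs per active parameter and compares against a hypothetical dense 70B model; attention, routing overhead, and memory traffic are ignored.

```python
# Back-of-envelope decode cost: assume ~2 FLOPs per active parameter per token.
ACTIVE_PARAMS = 17e9   # Maverick activates ~17B parameters per token
TOTAL_PARAMS = 400e9   # out of ~400B total parameters
DENSE_70B = 70e9       # reference point: a dense 70B model activates everything

flops_maverick = 2 * ACTIVE_PARAMS
flops_dense_70b = 2 * DENSE_70B

print(f"Active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")                    # roughly 4%
print(f"Per-token FLOPs vs dense 70B: {flops_maverick / flops_dense_70b:.2f}x")  # roughly 0.24x
```

Note that the savings are in per-token compute, not memory: all 400B parameters still need to be resident to serve the model.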

Quality

Arena Elo: 1327
MMLU: 85.5
MT Bench: N/A

Llama 4 Maverick scores 85.5% on MMLU and 80.5% on MMLU-Pro, placing it close to GPT-4o (88.7% MMLU) while using only 17B active parameters. Its LMSYS Arena ELO of 1,327 sits above GPT-4o (1,316), achieved with 128-expert routing that activates only a fraction of its 400B total parameters per token.

Arena Elo comparison:

  • o1-mini: 1337
  • o3-mini: 1337
  • llama-4-17b-128e-instruct: 1327
  • gpt-4-turbo-preview: 1324
  • llama-3.3-70b-versatile: 1318

pricing

Running Llama 4 Maverick through Telnyx Inference follows 70B+ pricing at $0.0006 per 1,000 tokens; although the model carries 400B total parameters, only 17B are active per token. Processing 1,000,000 multimodal queries at 1,500 tokens each would cost approximately $900, delivering GPT-4o-competitive quality with MoE efficiency.
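
The arithmetic behind that estimate, so you can substitute your own volumes (the query count and tokens-per-query below are just the example figures from the paragraph above):

```python
PRICE_PER_1K_TOKENS = 0.0006   # USD, Telnyx 70B+ tier
QUERIES = 1_000_000
TOKENS_PER_QUERY = 1_500       # prompt + completion combined

total_tokens = QUERIES * TOKENS_PER_QUERY
cost = total_tokens / 1_000 * PRICE_PER_1K_TOKENS
print(f"${cost:,.2f}")         # $900.00
```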

What's Twitter saying?

  • Developers note Llama 4 Maverick underperforms in coding tasks, often giving up or producing inferior results compared to DeepSeek v3, making it better for "vibe coding" than serious development.
  • Tech commentators express disappointment with verbosity, as Maverick "faffs around" with long, circular responses and jokes, burying answers and failing to get to the point.
  • Community benchmarks highlight controversy over inflated scores, with LMArena's high ELO from an experimental, non-released chat-tuned version leading to bans on such models.

Explore Our LLM Library

Discover the power and diversity of large language models available with Telnyx. Explore the options below to find the perfect model for your project.

Organization: deepseek-ai
Model Name: DeepSeek-R1-Distill-Qwen-14B
Tasks: text generation
Languages Supported: English
Context Length: 43,000
Parameters: 14.8B
Model Tier: medium
License: deepseek

TRY IT OUT

Chat with an LLM

Powered by our own GPU infrastructure, select a large language model, add a prompt, and chat away. For unlimited chats, sign up for a free account on our Mission Control Portal.

HOW IT WORKS

Selecting LLMs for Voice AI

RESOURCES

Get started

Check out our helpful tools to get you started.

  • Test in the portal

    Easily browse and select your preferred model in the AI Playground.

  • Explore the docs

    Don’t wait to scale, start today with our public API endpoints.

  • Stay up to date

    Keep an eye on our AI changelog so you don't miss a beat.

Sign up and start building

faqs

Is Llama 4 Maverick free to use?

Llama 4 Maverick is released under Meta's community license, making it free for most commercial applications. Weights are available on Hugging Face and through hosted inference providers.

What is Llama 4 Maverick?

Llama 4 Maverick is Meta's mixture-of-experts model with 17 billion active parameters drawn from 128 experts, designed for high-capability reasoning at efficient compute cost. It was released as part of Meta's Llama 4 family alongside Llama 4 Scout.

What provider is Llama 4 Maverick?

Llama 4 Maverick is available through multiple providers including Telnyx, together.ai, Fireworks, and directly from Meta's own infrastructure. It can also be self-hosted using the open weights.
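
For the self-hosted route, here is a minimal sketch using vLLM. The Hugging Face model id and the tensor-parallel degree shown are assumptions for illustration; check the official model card for the exact repository name and the multi-GPU setup needed to hold the 400B total parameters.

```python
# Minimal self-hosting sketch with vLLM; model id and GPU count are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-4-Maverick-17B-128E-Instruct",  # assumed HF model id
    tensor_parallel_size=8,                                  # 400B total params need multiple GPUs
)

params = SamplingParams(max_tokens=256, temperature=0.7)
outputs = llm.generate(["Summarize the attached chart in one paragraph."], params)
print(outputs[0].outputs[0].text)
```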

Is Llama 4 Maverick MoE?

Yes, Llama 4 Maverick uses a mixture-of-experts architecture with 128 routed experts plus a shared expert, activating roughly 17B parameters per token. This MoE design delivers strong performance while keeping per-token compute cost manageable.

How does Maverick compare to Scout?

Maverick is the larger, more capable model with 128 experts, while Scout uses 16 experts for faster, lighter inference. Maverick targets complex reasoning tasks while Scout is better suited for high-throughput production workloads.

Is Llama 4 Maverick good at coding?

Maverick performs well on coding benchmarks, benefiting from its large expert pool for specialized code patterns, and is competitive with GPT-4-class models on code generation, particularly on multi-file reasoning tasks. That said, early community feedback on real-world coding has been mixed (see the Twitter section above), so it is worth evaluating against your own codebase.