Llama-4-Scout-Instruct

Meta's multimodal mixture-of-experts model with 17B active parameters across 16 experts, supporting text and image input with a 10M token context window.

about

Trained on roughly 40 trillion tokens of multimodal data, Scout fuses vision into the transformer backbone from the start of pre-training, using an enhanced MetaCLIP-based encoder rather than attaching vision as a separate module. Its interleaved attention architecture alternates layers with and without positional encodings to reach the 10-million-token context window, and the full 109B-parameter model fits on a single H100 GPU with int4 quantization.
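The single-GPU claim is easy to sanity-check with back-of-envelope arithmetic: int4 weights take half a byte per parameter. The sketch below assumes weight storage only and ignores KV cache, activations, and quantization scales, so real headroom is tighter than the raw numbers suggest.

```python
# Back-of-envelope check: do 109B int4 weights fit in an H100's 80 GB?
# Weight storage only -- KV cache and activation memory are ignored.

TOTAL_PARAMS = 109e9          # Scout's total parameter count
BYTES_PER_PARAM_INT4 = 0.5    # 4 bits per weight
H100_MEMORY_GB = 80

int4_gb = TOTAL_PARAMS * BYTES_PER_PARAM_INT4 / 1e9
bf16_gb = TOTAL_PARAMS * 2 / 1e9   # 2 bytes/param at bf16, for contrast

print(f"int4 weights: {int4_gb:.1f} GB")   # 54.5 GB -> fits
print(f"bf16 weights: {bf16_gb:.1f} GB")   # 218.0 GB -> does not fit

assert int4_gb < H100_MEMORY_GB < bf16_gb
```

At bf16 the same weights would need well over two H100s, which is why the quantized deployment is the headline configuration.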

License: llama4
Context window: 128,000 tokens

Use cases for Llama-4-Scout-Instruct

  1. Extreme-length document processing: The 10-million-token context window accommodates entire codebases, multi-volume legal records, or years of correspondence in a single inference pass.
  2. Native image understanding: Early-fusion multimodal training via MetaCLIP processes images at the transformer backbone level, not as an afterthought, enabling integrated visual reasoning alongside text.
  3. Single-GPU frontier inference: The full 109B-parameter model fits on one H100 with int4 quantization, making frontier multimodal capability accessible without multi-node infrastructure.

Quality

Arena Elo: 1250
MMLU: 79.6
MT Bench: N/A

Llama 4 Scout scores 79.6% on MMLU and 74.3% on MMLU-Pro, placing it between Gemini 2.0 Flash (76.4% MMLU) and GPT-4o mini (82.0% MMLU) on the same benchmark. With only 17B of its 109B parameters active per token across 16 experts, it fits on a single H100 GPU with int4 quantization while supporting a 10-million-token context window.

  • Claude-3-7-Sonnet-Latest: 1268
  • GPT-4 1106 Preview: 1251
  • Llama-4-Scout-Instruct: 1250
  • Llama 3.1 70B Instruct: 1248
  • GPT-4 0125 Preview: 1245

pricing

Running Llama 4 Scout through Telnyx Inference follows the 70B+ pricing tier at $0.0006 per 1,000 tokens, since only 17B of its 109B parameters are active per token. Processing 1,000,000 long-document queries at 5,000 tokens each would cost $3,000, with the 10M-token context window eliminating the need for retrieval augmentation.

What's Twitter saying?

  • Developers praise Llama 4 Scout's 10M context length as a breakthrough for summarization, function calling, and long-context tasks like multi-document processing, calling it best-in-class for cheap, local GPUs.
  • Many tech reviewers criticize its poor coding performance, noting it struggles with basic tasks, underperforms smaller models like Llama 3.3 70B or Gemma 3 27B, and lags in benchmarks.
  • Commentators highlight mixed enterprise results, with strong accuracy in simple info extraction but weaknesses in complex reasoning compared to Maverick or rivals like Claude Haiku.

Explore Our LLM Library

Discover the power and diversity of large language models available with Telnyx. Explore the options below to find the perfect model for your project.

Organization: deepseek-ai
Model Name: DeepSeek-R1-Distill-Qwen-14B
Tasks: text generation
Languages Supported: English
Context Length: 43,000
Parameters: 14.8B
Model Tier: medium
License: deepseek

TRY IT OUT

Chat with an LLM

Powered by our own GPU infrastructure, select a large language model, add a prompt, and chat away. For unlimited chats, sign up for a free account on our Mission Control Portal.

HOW IT WORKS

Selecting LLMs for Voice AI

RESOURCES

Get started

Check out our helpful tools to help get you started.

  • Test in the portal

    Easily browse and select your preferred model in the AI Playground.

  • Explore the docs

    Don’t wait to scale, start today with our public API endpoints.

  • Stay up to date

    Keep an eye on our AI changelog so you don't miss a beat.

Sign up and start building

faqs

llama-4-scout-17b-16e-instruct

What is Llama 4 Scout 17B 16E Instruct?

Llama 4 Scout is Meta's multimodal mixture-of-experts model with 17B active parameters out of 109B total, using 16 specialized experts. It supports text and image input with a 128K context window (up to 10M tokens supported) across 12 languages.

What is Llama 4 Scout good for?

Llama 4 Scout handles assistant-style chat, visual reasoning, code generation, and document analysis. It is optimized for tasks that combine text and image understanding, such as answering questions about images, captioning, and multimodal content generation.
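For mixed text-and-image prompts, multimodal chat models are commonly called with an OpenAI-style messages payload whose `content` field is a list of text and image parts. The sketch below only builds that request body; the image URL is a placeholder, and the exact endpoint and authentication for Telnyx Inference should be taken from the official docs.

```python
# Sketch of an OpenAI-style chat request body for a multimodal prompt.
# No network call is made -- this is just the JSON you would POST to an
# OpenAI-compatible chat-completions endpoint.

payload = {
    "model": "llama-4-scout-17b-16e-instruct",
    "messages": [
        {
            "role": "user",
            # Content is a list mixing text parts and image parts.
            "content": [
                {"type": "text", "text": "What trend does this chart show?"},
                {
                    "type": "image_url",
                    # Placeholder URL -- replace with your own image.
                    "image_url": {"url": "https://example.com/chart.png"},
                },
            ],
        }
    ],
    "max_tokens": 512,
}

print(payload["model"])  # llama-4-scout-17b-16e-instruct
```

The same shape works for captioning or visual Q&A by changing the text part; text-only requests can pass `content` as a plain string instead of a list.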

How much is the token limit for Llama 4 Scout?

Llama 4 Scout has a standard context window of 128K tokens for most deployments, with support for up to 10 million tokens in extended configurations. The large context makes it suitable for processing lengthy documents and maintaining extensive conversation histories.

What GPU is needed for Llama 4 Scout?

Llama 4 Scout requires significant GPU resources due to its 109B total parameters. At minimum, an A100 80GB or H100 GPU is recommended for quantized inference. Full-precision deployment typically requires multi-GPU setups.

Can Llama 4 run on CPU?

Llama 4 Scout can technically run on CPU-only systems using quantized formats, but performance is severely limited with very slow token generation. GPU inference is strongly recommended for practical use.