Llama-3.3-70B-Instruct

Meta's 70B Llama 3.3 model delivering 405B-class performance in coding, reasoning, and instruction-following with a 128k context window across eight languages.

about

The December 2024 release scores 92.1 on IFEval, surpassing both Llama 3.1 405B (88.6) and GPT-4o (84.6) on the same instruction-following benchmark despite being roughly 6x smaller than the 405B. It hits 88.4% on HumanEval for code generation and runs at 276 tokens per second on Groq, making it a strong alternative to the 405B for most production workloads.

License: Llama 3.3
Context window: 128,000 tokens

Use cases for Llama-3.3-70B-Instruct

  1. 405B-class instruction following: Scoring 92.1 on IFEval, higher than both Llama 3.1 405B and GPT-4o, it handles complex multi-constraint instructions that require precise adherence to formatting, tone, and content rules.
  2. Multilingual code generation: With 88.4% on HumanEval and eight supported languages, it generates and explains code in non-English developer contexts without quality loss.
  3. Self-hosted enterprise deployment: Delivering 405B-quality at 70B compute requirements, it runs on standard GPU infrastructure for organizations that need frontier performance without external API dependencies.
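Whether self-hosted or accessed through a provider, the model is typically called through an OpenAI-style chat-completions API. The sketch below builds such a request payload; the endpoint URL and model identifier are assumptions, so check your provider's documentation for the exact values:

```python
import json

# Hypothetical endpoint and model id -- adjust for your host (Telnyx, Groq, vLLM, etc.).
API_URL = "https://api.example.com/v1/chat/completions"
MODEL = "meta-llama/Llama-3.3-70B-Instruct"

def build_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat-completions payload."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "Follow formatting instructions exactly."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
        # Low temperature suits the multi-constraint instruction-following use case.
        "temperature": 0.2,
    }

payload = build_request("Summarize this ticket in exactly three bullet points.")
print(json.dumps(payload, indent=2))
```

The same payload shape works against any OpenAI-compatible endpoint, which is what makes switching between hosted and self-hosted deployments straightforward.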

Quality

Arena Elo: 1318
MMLU: 86.0
MT-Bench: N/A

Llama 3.3 70B Instruct scores 86.0% on MMLU (0-shot CoT) and 88.4% on HumanEval, matching GPT-4 Turbo (86.5% MMLU) on general knowledge while significantly exceeding it on code. Its IFEval score of 92.1 surpasses both Llama 3.1 405B (88.6) and GPT-4o (84.6), making it the strongest instruction-following model at the 70B scale in this comparison.

Arena Elo comparison:

  • gpt-4-turbo-preview: 1324
  • llama-3.3-70b-versatile: 1318
  • Llama-3.3-70B-Instruct: 1318
  • GPT-4 Omni: 1316
  • Claude-3-7-Sonnet-Latest: 1268

pricing

The cost of running Llama 3.3 70B Instruct with Telnyx Inference is $0.0006 per 1,000 tokens. Analyzing 1,000,000 customer chats at 1,000 tokens each would cost $600, the same price as Llama 3.1 70B but with improved instruction-following (IFEval 92.1 vs 88.6).
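The arithmetic above can be checked directly, using the per-token rate and volumes quoted in this section:

```python
# Telnyx Inference rate quoted above: $0.0006 per 1,000 tokens.
PRICE_PER_1K_TOKENS = 0.0006
chats = 1_000_000
tokens_per_chat = 1_000

total_tokens = chats * tokens_per_chat
cost_usd = total_tokens / 1_000 * PRICE_PER_1K_TOKENS
print(f"{total_tokens:,} tokens -> ${cost_usd:,.2f}")
```

At 1,000 tokens per chat, a million chats is a billion tokens, which works out to $600 at this rate.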

What's Twitter saying?

  • Developers praise Llama 3.3 70B Instruct's superior instruction following, scoring 92.1 on IFEval and outperforming Llama 3.1 405B (88.6) and GPT-4o (84.6).
  • Tech reviewers highlight its strong coding performance, with 88.4 on HumanEval (near Llama 3.1 405B's 89.0) and improvements over prior 70B models.
  • Commentators note its efficiency and cost-effectiveness, achieving 276 tokens/sec inference speed (25% faster than Llama 3.1 70B) at $0.10/M input tokens.

Explore Our LLM Library

Discover the power and diversity of large language models available with Telnyx. Explore the options below to find the perfect model for your project.

Organization: deepseek-ai
Model Name: DeepSeek-R1-Distill-Qwen-14B
Tasks: text generation
Languages Supported: English
Context Length: 43,000
Parameters: 14.8B
Model Tier: medium
License: deepseek

TRY IT OUT

Chat with an LLM

Powered by our own GPU infrastructure, select a large language model, add a prompt, and chat away. For unlimited chats, sign up for a free account on our Mission Control Portal.

HOW IT WORKS

Selecting LLMs for Voice AI

RESOURCES

Get started

Check out these helpful tools to get you started.

  • Test in the portal

    Easily browse and select your preferred model in the AI Playground.

  • Explore the docs

    Don't wait to scale; start today with our public API endpoints.

  • Stay up to date

    Keep an eye on our AI changelog so you don't miss a beat.

Sign up and start building

faqs

Is Llama 3.3 70B Instruct free?

Yes. Llama 3.3 70B Instruct is an open-weight model, free for commercial use under Meta's Llama 3.3 Community License (organizations above 700 million monthly active users need a separate license from Meta). Several inference providers also offer free tiers for testing the model through hosted APIs.

How good is Llama 3.3 70B Instruct?

Llama 3.3 70B delivers performance comparable to the much larger Llama 3.1 405B model. It scores 92.1 on IFEval for instruction-following, outperforming both Llama 3.1 405B and GPT-4o on that benchmark while being significantly cheaper to run.

What is the Llama 3.3 instruction model?

Llama 3.3 70B Instruct is Meta's instruction-tuned language model with 70 billion parameters, optimized for multilingual dialogue, coding, reasoning, and tool use. It supports eight languages and a 128K context window with improved JSON output for function calling.
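The improved JSON output for function calling is typically exercised through OpenAI-style tool definitions, which most Llama hosts accept. A minimal sketch, where the tool name and fields are purely illustrative:

```python
import json

# Hypothetical tool definition in the OpenAI-compatible format many Llama hosts accept.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

# When the model decides to call the tool, it returns the arguments as a JSON
# string, which the caller parses before dispatching to the real function:
raw_args = '{"city": "Berlin", "unit": "celsius"}'
args = json.loads(raw_args)
print(args["city"])
```

Reliable JSON in that `arguments` string is exactly what the function-calling improvements target: fewer malformed payloads means less retry logic around `json.loads`.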

How much memory does Llama 3.3 70B need?

Running Llama 3.3 70B at full precision requires approximately 140 GB of VRAM. With 4-bit quantization, it can run on a single GPU with 40+ GB VRAM such as an A100 or H100.
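These figures follow from parameter count times bytes per weight; the estimate below covers weights only and ignores KV-cache and activation overhead, which add several more GB in practice:

```python
def weight_vram_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate VRAM needed just for the model weights, in decimal GB."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

fp16_gb = weight_vram_gb(70, 16)   # 16-bit weights: ~140 GB
int4_gb = weight_vram_gb(70, 4)    # 4-bit quantization: ~35 GB
print(f"fp16: {fp16_gb:.0f} GB, 4-bit: {int4_gb:.0f} GB")
```

This is why 4-bit quantization brings the model within reach of a single 40-80 GB GPU, while 16-bit weights need multi-GPU setups.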

How much RAM is needed for Llama 3.3 70B?

For quantized inference, at least 48 GB of system RAM plus a GPU with 40+ GB VRAM is recommended. Full-precision deployment requires significantly more resources, typically dual A100 80GB GPUs.

Can I run Llama 3.3 70B locally?

Yes, with sufficient hardware. Using 4-bit quantization through tools like llama.cpp or Ollama, you can run it on a single high-VRAM GPU. For home servers, dual RTX 4090s or a single A100 are common configurations.
