Llama-3.3-70B-Instruct

Meta's 70B Llama 3.3 model delivering 405B-class performance in coding, reasoning, and instruction-following with a 128k context window across eight languages.

about

The December 2024 release scores 92.1 on IFEval, surpassing both Llama 3.1 405B (88.6) and GPT-4o (84.6) on the same instruction-following benchmark despite being roughly 6x smaller than the 405B. It hits 88.4% on HumanEval for code generation and runs at 276 tokens per second on Groq, making it a strong alternative to the 405B for most production workloads.

License: Llama 3.3
Context window: 128,000 tokens

Use cases for Llama-3.3-70B-Instruct

  1. 405B-class instruction following: Scoring 92.1 on IFEval, higher than both Llama 3.1 405B and GPT-4o, it handles complex multi-constraint instructions that require precise adherence to formatting, tone, and content rules.
  2. Multilingual code generation: With 88.4% on HumanEval and eight supported languages, it generates and explains code in non-English developer contexts without quality loss.
  3. Self-hosted enterprise deployment: Delivering 405B-quality at 70B compute requirements, it runs on standard GPU infrastructure for organizations that need frontier performance without external API dependencies.
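Whether self-hosted or accessed through a provider, the model is typically called through an OpenAI-style chat-completions API. The sketch below builds such a request payload; the endpoint URL and model identifier are assumptions, so check your provider's documentation for the exact values:

```python
import json

# Hypothetical endpoint and model id -- adjust for your host (Telnyx, Groq, vLLM, etc.).
API_URL = "https://api.example.com/v1/chat/completions"
MODEL = "meta-llama/Llama-3.3-70B-Instruct"

def build_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat-completions payload."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "Follow formatting instructions exactly."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
        # Low temperature suits the multi-constraint instruction-following use case.
        "temperature": 0.2,
    }

payload = build_request("Summarize this ticket in exactly three bullet points.")
print(json.dumps(payload, indent=2))
```

The same payload shape works against any OpenAI-compatible endpoint, which is what makes switching between hosted and self-hosted deployments straightforward.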

Quality

Arena Elo: 1318
MMLU: 86.0
MT-Bench: N/A

Llama 3.3 70B Instruct scores 86.0% on MMLU (0-shot CoT) and 88.4% on HumanEval, matching GPT-4 Turbo (86.5% MMLU) on general knowledge while significantly exceeding it on code. Its IFEval score of 92.1 surpasses both Llama 3.1 405B (88.6) and GPT-4o (84.6), making it the strongest instruction-following model at the 70B scale in this comparison.

Arena Elo comparison:

  • gpt-4-turbo-preview: 1324
  • llama-3.3-70b-versatile: 1318
  • Llama-3.3-70B-Instruct: 1318
  • GPT-4 Omni: 1316
  • Claude-3-7-Sonnet-Latest: 1268

pricing

The cost of running Llama 3.3 70B Instruct with Telnyx Inference is $0.0006 per 1,000 tokens. Analyzing 1,000,000 customer chats at 1,000 tokens each would cost $600, the same price as Llama 3.1 70B but with improved instruction-following (IFEval 92.1 vs 88.6).
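The arithmetic above can be checked directly, using the per-token rate and volumes quoted in this section:

```python
# Telnyx Inference rate quoted above: $0.0006 per 1,000 tokens.
PRICE_PER_1K_TOKENS = 0.0006
chats = 1_000_000
tokens_per_chat = 1_000

total_tokens = chats * tokens_per_chat
cost_usd = total_tokens / 1_000 * PRICE_PER_1K_TOKENS
print(f"{total_tokens:,} tokens -> ${cost_usd:,.2f}")
```

At 1,000 tokens per chat, a million chats is a billion tokens, which works out to $600 at this rate.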

What's Twitter saying?

  • Developers praise Llama 3.3 70B Instruct's superior instruction following, scoring 92.1 on IFEval and outperforming Llama 3.1 405B (88.6) and GPT-4o (84.6).
  • Tech reviewers highlight its strong coding performance, with 88.4 on HumanEval (near Llama 3.1 405B's 89.0) and improvements over prior 70B models.
  • Commentators note its efficiency and cost-effectiveness, achieving 276 tokens/sec inference speed (25% faster than Llama 3.1 70B) at $0.10/M input tokens.

Explore Our LLM Library

Discover the power and diversity of large language models available with Telnyx. Explore the options below to find the perfect model for your project.

Organization: deepseek-ai
Model Name: DeepSeek-R1-Distill-Qwen-14B
Tasks: text generation
Languages Supported: English
Context Length: 43,000
Parameters: 14.8B
Model Tier: medium
License: deepseek

TRY IT OUT

Chat with an LLM

Powered by our own GPU infrastructure, select a large language model, add a prompt, and chat away. For unlimited chats, sign up for a free account on our Mission Control Portal.

HOW IT WORKS

Selecting LLMs for Voice AI

RESOURCES

Get started

Check out these helpful tools to get you started.

  • Test in the portal

    Easily browse and select your preferred model in the AI Playground.

  • Explore the docs

    Don't wait to scale; start today with our public API endpoints.

  • Stay up to date

    Keep an eye on our AI changelog so you don't miss a beat.

Sign up and start building

faqs

Is Llama 3.3 70B Instruct free?

Yes. Llama 3.3 70B Instruct is an open-weight model, free for commercial use under Meta's Llama 3.3 Community License (organizations above 700 million monthly active users need a separate license from Meta). Several inference providers also offer free tiers for testing the model through hosted APIs.

How good is Llama 3.3 70B Instruct?

Llama 3.3 70B delivers performance comparable to the much larger Llama 3.1 405B model. It scores 92.1 on IFEval for instruction-following, outperforming both Llama 3.1 405B and GPT-4o on that benchmark while being significantly cheaper to run.

What is the Llama 3.3 instruction model?

Llama 3.3 70B Instruct is Meta's instruction-tuned language model with 70 billion parameters, optimized for multilingual dialogue, coding, reasoning, and tool use. It supports eight languages and a 128K context window with improved JSON output for function calling.
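The improved JSON output for function calling is typically exercised through OpenAI-style tool definitions, which most Llama hosts accept. A minimal sketch, where the tool name and fields are purely illustrative:

```python
import json

# Hypothetical tool definition in the OpenAI-compatible format many Llama hosts accept.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

# When the model decides to call the tool, it returns the arguments as a JSON
# string, which the caller parses before dispatching to the real function:
raw_args = '{"city": "Berlin", "unit": "celsius"}'
args = json.loads(raw_args)
print(args["city"])
```

Reliable JSON in that `arguments` string is exactly what the function-calling improvements target: fewer malformed payloads means less retry logic around `json.loads`.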

How much memory does Llama 3.3 70B need?

Running Llama 3.3 70B at full precision requires approximately 140 GB of VRAM. With 4-bit quantization, it can run on a single GPU with 40+ GB VRAM such as an A100 or H100.
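These figures follow from parameter count times bytes per weight; the estimate below covers weights only and ignores KV-cache and activation overhead, which add several more GB in practice:

```python
def weight_vram_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate VRAM needed just for the model weights, in decimal GB."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

fp16_gb = weight_vram_gb(70, 16)   # 16-bit weights: ~140 GB
int4_gb = weight_vram_gb(70, 4)    # 4-bit quantization: ~35 GB
print(f"fp16: {fp16_gb:.0f} GB, 4-bit: {int4_gb:.0f} GB")
```

This is why 4-bit quantization brings the model within reach of a single 40-80 GB GPU, while 16-bit weights need multi-GPU setups.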

How much RAM is needed for Llama 3.3 70B?

For quantized inference, at least 48 GB of system RAM plus a GPU with 40+ GB VRAM is recommended. Full-precision deployment requires significantly more resources, typically dual A100 80GB GPUs.

Can I run Llama 3.3 70B locally?

Yes, with sufficient hardware. Using 4-bit quantization through tools like llama.cpp or Ollama, you can run it on a single high-VRAM GPU. For home servers, dual RTX 4090s or a single A100 are common configurations.
