Telnyx - Global Communications Platform ProviderHome
Voice AIVoice APIInferenceMobile VoiceSpeech-to-TextText-to-speechSIP TrunkingSMS APIWhatsApp Business APIView all productsHealthcareFinanceTravel and HospitalityLogistics and TransportationContact CenterInsuranceRetail and E-CommerceSales and MarketingServices and DiningView all solutionsVoice AIVoice APIInferenceMobile VoiceSpeech-to-TextText-to-SpeechSIP TrunkingSMS APIWhatsApp Business APIGlobal NumbersIoT SIM CardView all pricingOur NetworkMission Control PortalCustomer storiesGlobal communicationsPartnersCareersEventsResource centerSupport centerAI TemplatesSETIDev DocsIntegrations
Contact usLog in
Contact usLog inSign up

Social

Company

  • Our Network
  • Global Coverage
  • Release Notes
  • Careers
  • Voice AI
  • AI Glossary
  • Shop

Legal

  • Data and Privacy
  • Report Abuse
  • Privacy Policy
  • Cookie Policy
  • Law Enforcement
  • Acceptable Use
  • Trust Center
  • Country Specific Requirements
  • Website Terms and Conditions
  • Terms and Conditions of Service

Compare

  • ElevenLabs
  • Vapi
  • Baseten
  • Together.ai
  • Twilio
  • Bandwidth
  • Vonage
  • Amazon Connect
© Telnyx LLC 2026
ISO • PCI • HIPAA • GDPR • SOC2 Type II

Ask AI

  • GPT
  • Claude
  • Perplexity
  • Gemini
  • Grok

Llama-3.3-70B-Instruct

Meta's 70B Llama 3.3 model delivering 405B-class performance in coding, reasoning, and instruction-following with a 128k context window across eight languages.

Start buildingGET Available Models

about

The December 2024 release scores 92.1 on IFEval, surpassing both Llama 3.1 405B at 88.6 and GPT-4o at 84.6 on the same instruction-following benchmark despite being roughly 6x smaller than the 405B. It hits 88.4% on HumanEval for code generation and runs at 276 tokens per second on Groq, making the 405B largely redundant for most production workloads.

Licensellama 3.3

Explore Our LLM Library

Discover the power and diversity of large language models available with Telnyx. Explore the options below to find the perfect model for your project.

No data available at this time, please try again later.
OrganizationModel NameTasksLanguages SupportedContext LengthParametersModel TierLicense
No data available at this time, please try again later.
HOW IT WORKS

Selecting LLMs for Voice AI

GET Available Models
RESOURCES

Get started

Check out our helpful tools to help get you started.

  • Icon Resources ebook

    Test in the portal

    Easily browse and select your preferred model in the AI Playground.

Sign up and start building

Sign upContact sales

faqs

What is Llama 3.3 70B Instruct?

Llama 3.3 70B Instruct is Meta's latest instruction-tuned 70B model, achieving performance on par with the much larger Llama 3.1 405B on many tasks. It supports a 128K context window and excels at coding, reasoning, and multilingual generation.

Is Llama 3.3 70B Instruct free?

Yes, Llama 3.3 70B Instruct is released under Meta's community license for free commercial use. Weights are available on Hugging Face and the model can be accessed through various hosted inference providers.

Context window(in thousands)
99,000

Use cases for Llama-3.3-70B-Instruct

  1. 405B-class instruction following: Scoring 92.1 on IFEval, higher than both Llama 3.1 405B and GPT-4o, it handles complex multi-constraint instructions that require precise adherence to formatting, tone, and content rules.
  2. Multilingual code generation: With 88.4% on HumanEval across 8 supported languages, it generates and explains code in non-English developer contexts without quality loss.
  3. Self-hosted enterprise deployment: Delivering 405B-quality at 70B compute requirements, it runs on standard GPU infrastructure for organizations that need frontier performance without external API dependencies.

Quality

Arena Elo1318
MMLU86
MT BenchN/A

Llama 3.3 70B Instruct scores 86.0% on MMLU (0-shot CoT) and 88.4% on HumanEval, matching GPT-4 Turbo (86.5% MMLU) on general knowledge while significantly exceeding it on code. Its IFEval score of 92.1 surpasses both Llama 3.1 405B (88.6) and GPT-4o (84.6), making it the strongest instruction-following model at the 70B scale on the sheet.

gpt-4-turbo-preview

1324

llama-3.3-70b-versatile

1318

Llama-3.3-70B-Instruct

1318

GPT-4 Omni

1316

Claude-3-7-Sonnet-Latest

1268

pricing

The cost of running Llama 3.3 70B Instruct with Telnyx Inference is $0.0006 per 1,000 tokens. Analyzing 1,000,000 customer chats at 1,000 tokens each would cost $600, the same price as Llama 3.1 70B but with improved instruction-following (IFEval 92.1 vs 88.6).

What's Twitter saying?

  • Developers praise Llama 3.3 70B Instruct's superior instruction following, scoring 92.1 on IFEval and outperforming Llama 3.1 405B (88.6) and GPT-4o (84.6).
  • Tech reviewers highlight its strong coding performance, with 88.4 on HumanEval (near Llama 3.1 405B's 89.0) and improvements over prior 70B models.
  • Commentators note its efficiency and cost-effectiveness, achieving 276 tokens/sec inference speed (25% faster than Llama 3.1 70B) at $0.10/M input tokens.
Test today
  • Icon Resources Docs

    Explore the docs

    Don’t wait to scale, start today with our public API endpoints.

    Get started
  • Icon Resources Article

    Stay up to date

    Keep an eye on our AI changelog so you don't miss a beat.

    See updates
  • What is the difference between Llama normal and instruct?

    The base Llama model generates text through next-token prediction, while the Instruct variant is fine-tuned for following instructions and dialogue. For production applications, the Instruct version is recommended.

    How many tokens can you have in Llama 3.3 70B?

    Llama 3.3 70B supports a 128K token context window, enabling long-document processing and extended conversations. This matches the Llama 3.1 series context length.

    Is Llama 3.3 70B good for coding?

    Llama 3.3 70B Instruct delivers strong coding performance, approaching the results of Llama 3.1 405B on many programming benchmarks. It handles code generation, review, and debugging effectively through hosted inference platforms.

    How does Llama 3.3 compare to 3.1?

    Llama 3.3 70B achieves similar performance to Llama 3.1 405B on many benchmarks despite being roughly 6x smaller. It represents Meta's most efficient 70B model to date.