Llama 3.3 70B Instruct, released in December 2024, scores 92.1 on IFEval, surpassing both Llama 3.1 405B (88.6) and GPT-4o (84.6) on the same instruction-following benchmark despite being roughly 6x smaller than the 405B. It reaches 88.4% on HumanEval for code generation and runs at 276 tokens per second on Groq, which makes the 405B hard to justify for most production workloads.
Llama 3.3 70B Instruct scores 86.0% on MMLU (0-shot CoT) and 88.4% on HumanEval, roughly matching GPT-4 Turbo (86.5% MMLU) on general knowledge while significantly exceeding it on code. Its IFEval score of 92.1 surpasses both Llama 3.1 405B (88.6) and GPT-4o (84.6), making it the strongest instruction-following model at the 70B scale among the models compared here.
Running Llama 3.3 70B Instruct with Telnyx Inference costs $0.0006 per 1,000 tokens. Analyzing 1,000,000 customer chats at 1,000 tokens each (1 billion tokens in total) would cost $600. That is the same price as Llama 3.1 70B, but with improved instruction-following: its IFEval score of 92.1 exceeds even the 88.6 of the much larger Llama 3.1 405B.
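The arithmetic behind that estimate can be sketched in a few lines of Python. This is a minimal illustration, not an official pricing calculator: the function name `estimate_cost` and its parameters are hypothetical, and the only figure taken from the text is the $0.0006 per 1,000 tokens rate.

```python
from decimal import Decimal

# Rate quoted in the text above: USD per 1,000 tokens.
PRICE_PER_1K_TOKENS = Decimal("0.0006")


def estimate_cost(
    num_requests: int,
    tokens_per_request: int,
    price_per_1k: Decimal = PRICE_PER_1K_TOKENS,
) -> Decimal:
    """Estimate the total cost in USD for a batch of equally sized requests.

    Hypothetical helper: it assumes every request uses the same number of
    tokens and that all tokens are billed at one flat rate.
    """
    total_tokens = num_requests * tokens_per_request
    # Decimal avoids binary floating-point rounding on small per-token rates.
    return Decimal(total_tokens) / Decimal(1000) * price_per_1k


if __name__ == "__main__":
    cost = estimate_cost(num_requests=1_000_000, tokens_per_request=1_000)
    print(f"Estimated cost: ${cost:,.2f}")
```

Real workloads rarely have uniform chat lengths, so in practice the total token count would come from usage logs rather than a fixed per-chat figure.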
Llama 3.3 70B Instruct is Meta's latest instruction-tuned 70B model, achieving performance on par with the much larger Llama 3.1 405B on many tasks. It supports a 128K context window and excels at coding, reasoning, and multilingual generation.
Llama 3.3 70B Instruct is released under Meta's community license, which permits commercial use free of charge, subject to the license terms. Weights are available on Hugging Face, and the model can be accessed through various hosted inference providers.
The base Llama model generates text through next-token prediction, while the Instruct variant is fine-tuned for following instructions and dialogue. For production applications, the Instruct version is recommended.
Llama 3.3 70B supports a 128K token context window, enabling long-document processing and extended conversations. This matches the Llama 3.1 series context length.
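Because the prompt and the generated reply share the same context window, it can help to check a request against the limit before sending it. The sketch below is illustrative only: the helper names are hypothetical, "128K" is read as 128,000 tokens (an assumption; the exact limit enforced by a given provider may differ), and the token counts are expected to come from the model's own tokenizer, which is not shown here.

```python
# Assumption: "128K" interpreted as 128,000 tokens; providers may enforce
# a slightly different limit.
CONTEXT_WINDOW_TOKENS = 128_000


def fits_in_context(
    prompt_tokens: int,
    max_output_tokens: int,
    context_window: int = CONTEXT_WINDOW_TOKENS,
) -> bool:
    """Return True if the prompt plus the requested output fits in the window."""
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_output_tokens <= context_window


def remaining_output_budget(
    prompt_tokens: int,
    context_window: int = CONTEXT_WINDOW_TOKENS,
) -> int:
    """Return how many tokens are left for the reply (0 if the prompt overflows)."""
    if prompt_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return max(context_window - prompt_tokens, 0)


if __name__ == "__main__":
    # Hypothetical long-document request: 100,000 prompt tokens, 4,000 for the reply.
    print(fits_in_context(100_000, 4_000))
    print(remaining_output_budget(100_000))
```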
Llama 3.3 70B Instruct delivers strong coding performance, approaching the results of Llama 3.1 405B on many programming benchmarks. It handles code generation, review, and debugging effectively, and is available through hosted inference platforms.
Llama 3.3 70B achieves similar performance to Llama 3.1 405B on many benchmarks despite being roughly 6x smaller. It represents Meta's most efficient 70B model to date.