The 70B variant debuted on Chatbot Arena with an Elo rating of roughly 1207, placing it between GPT-4-0613 and GPT-4 Turbo as the first open-weight model to compete directly with GPT-4 on human preference rankings. It scores 81.7% on HumanEval, surpassing GPT-4-0613 on code generation, and 82.0% on MMLU. Architecturally, the model stacks 80 transformer layers.
Llama 3 70B Instruct is released under the Meta Llama 3 Community License, which permits commercial use (with additional terms for services exceeding 700 million monthly active users). Weights are available on Hugging Face and through hosted inference providers.
Llama 3 70B Instruct is Meta's instruction-tuned 70 billion parameter model, optimized for dialogue, code generation, and complex reasoning. It is available through Telnyx Inference and other hosted providers.
Llama 3 70B Instruct scores 82.0% on MMLU (5-shot) and 81.7% on HumanEval, placing it between GPT-4 (86.4% MMLU) and Llama 2 70B Chat (68.9% MMLU) on the same benchmark. On code generation it surpasses GPT-4's originally reported score (67.0% HumanEval), making it the first open-weight model to beat a GPT-4 variant on a major benchmark.
With Telnyx Inference, the model costs $0.0010 per 1,000 tokens. For perspective, analyzing 1,000,000 customer chats, assuming each chat is 1,000 tokens long, would cost $1,000.
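As a back-of-the-envelope check, the arithmetic above is easy to reproduce in a short script. The rate and volumes are the figures quoted here; substitute your own traffic numbers:

```python
# Rough cost estimate for running chats through the model.
# Rate and volumes are taken from the figures above; adjust as needed.
PRICE_PER_1K_TOKENS = 0.0010   # USD per 1,000 tokens
TOKENS_PER_CHAT = 1_000        # assumed average chat length
NUM_CHATS = 1_000_000

total_tokens = NUM_CHATS * TOKENS_PER_CHAT
cost = total_tokens / 1_000 * PRICE_PER_1K_TOKENS
print(f"Estimated cost: ${cost:,.2f}")   # -> Estimated cost: $1,000.00
```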
The base Llama model is trained for next-token prediction, while the Instruct variant is post-trained with supervised fine-tuning and preference optimization (RLHF-style alignment) for instruction following and dialogue. The Instruct version is what most applications should use for chat and task completion.
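A minimal sketch of the practical difference: the Instruct variant expects its chat template, while the base model takes raw text for completion. Assuming access to the gated meta-llama weights on Hugging Face and the `transformers` library:

```python
from transformers import AutoTokenizer

# Load the Instruct tokenizer; requires accepting Meta's license
# on the Hugging Face model page first.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-70B-Instruct")

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize instruction tuning in one sentence."},
]

# The Instruct model expects this special-token chat formatting;
# the base model would instead be prompted with plain text.
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
print(prompt)
```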
Llama 3 Instruct refers to Meta's family of instruction-tuned models available in 8B and 70B sizes. The 70B variant delivers strong reasoning and coding performance, competitive with proprietary models at a fraction of the cost.
Llama 3 70B approaches GPT-4-level performance on many benchmarks, particularly coding and multilingual tasks. For production deployments, it offers a self-hostable alternative to proprietary API-only models.
Running Llama 3 70B in 16-bit precision (FP16/BF16) requires approximately 140GB of VRAM for the weights alone, typically two A100 80GB GPUs. With 4-bit quantization the weights shrink to roughly 35-40GB, so a single 48GB GPU can handle it, or you can use hosted inference to avoid GPU provisioning entirely.
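As a sketch of the quantized path, `transformers` with `bitsandbytes` can load the model in 4-bit on a single large GPU. The settings below are illustrative; actual memory use also depends on context length and activation overhead:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization: ~70B params at ~0.5 bytes each is roughly
# 35-40GB of weights, which fits on a 48GB GPU with some headroom.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers across the available GPU(s)
)
```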