The 70B variant debuted on ChatBot Arena with an ELO of roughly 1207, placing it between GPT-4-0613 and GPT-4 Turbo as the first open-weight model to compete directly with GPT-4 on human preference rankings. It scores 81.7% on HumanEval, surpassing GPT-4-0613 on code generation, and 82.0% on MMLU across 80 transformer layers.
Llama 3 70B Instruct scores 82.0% on MMLU (5-shot) and 81.7% on HumanEval, placing it between GPT-4 (86.4% MMLU) and Llama 2 70B Chat (68.9% MMLU) on the same sheet. On code generation it surpasses GPT-4 (67.0% HumanEval), making it the first open-weight model to beat a GPT-4 variant on a major benchmark.
The cost per 1,000 tokens for utilizing the model with Telnyx Inference stands at $0.0010. To provide a perspective, analyzing 1,000,000 customer chats, presuming each chat is 1,000 tokens long, would cost $1,000.
Discover the power and diversity of large language models available with Telnyx. Explore the options below to find the perfect model for your project.
| Organization | Model Name | Tasks | Languages Supported | Context Length | Parameters | Model Tier | License |
|---|---|---|---|---|---|---|---|
| No data available at this time, please try again later. |
Powered by our own GPU infrastructure, select a large language model, add a prompt, and chat away. For unlimited chats, sign up for a free account on our Mission Control Portal here.
Check out our helpful tools to help get you started.
Llama 3 70B Instruct is released under Meta's community license, which permits free use for commercial applications. Weights are available on Hugging Face and through hosted inference providers.
Llama 3 70B Instruct is Meta's instruction-tuned 70 billion parameter model, optimized for dialogue, code generation, and complex reasoning. It is available through Telnyx's inference platform and other hosted providers.
The base Llama model is trained for next-token prediction, while the Instruct variant is fine-tuned with RLHF for following instructions and dialogue. The Instruct version is what most applications should use for chat and task completion.
Llama 3 Instruct refers to Meta's family of instruction-tuned models available in 8B and 70B sizes. The 70B variant delivers strong reasoning and coding performance, competitive with proprietary models at a fraction of the cost.
Llama 3 70B approaches GPT-4-level performance on many benchmarks, particularly coding and multilingual tasks. For production deployments, it offers a self-hostable alternative to proprietary API-only models.
Running Llama 3 70B at full precision requires approximately 140GB of VRAM, typically two A100 80GB GPUs. With quantization, a single 48GB GPU can handle it, or you can use hosted inference to avoid GPU provisioning entirely.