Llama 3 was trained on 15 trillion tokens, a 7.5x increase over Llama 2; the 8B model roughly matches Llama 2 70B on MMLU (68.4%) with about 9x fewer parameters. The release also introduced a new tiktoken-based tokenizer with a 128K vocabulary that produces up to 15% fewer tokens for English text, and shifted from PPO to Direct Preference Optimization (DPO) for alignment.
Discover the power and diversity of large language models available with Telnyx. Explore the options below to find the perfect model for your project.
Powered by our own GPU infrastructure, select a large language model, add a prompt, and chat away. For unlimited chats, sign up for a free account on our Mission Control Portal.
Llama 3 8B Instruct is Meta's instruction-tuned, 8-billion-parameter model from the original Llama 3 release in April 2024. It is available through Telnyx's inference platform and on Hugging Face.
The base Llama 3 8B is trained for next-token prediction, while the Instruct variant is fine-tuned for following instructions and dialogue. For most applications, the Instruct version is recommended.
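The practical difference shows up in how you prompt the two variants: the base model takes plain text, while the Instruct model expects Llama 3's chat format with special tokens (normally added for you by a tokenizer's chat template). A minimal sketch of that format, with an illustrative helper name:

```python
# Build a raw prompt in the Llama 3 Instruct chat format.
# format_llama3_prompt is an illustrative helper, not a library API;
# in practice, tokenizers' chat templates produce this string for you.

def format_llama3_prompt(messages: list[dict]) -> str:
    """messages: [{"role": "system"|"user"|"assistant", "content": str}, ...]"""
    prompt = "<|begin_of_text|>"
    for m in messages:
        # Each turn is wrapped in header tokens and terminated with <|eot_id|>.
        prompt += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Open an assistant header so the model generates the reply next.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

print(format_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize Llama 3 in one sentence."},
]))
```

The base model, by contrast, would simply continue whatever raw text you pass it, which is why the Instruct variant is the right default for chat and task-following use cases.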
Llama 3 8B Instruct scores 68.4% on MMLU (5-shot), effectively matching Llama 2 70B (68.9%) with roughly 9x fewer parameters. On HumanEval it reaches 62.2% versus Llama 2 70B's 29.9%, a dramatic improvement in code generation. Trained on 15T tokens (7.5x more than Llama 2), it demonstrated that data scale can substitute for parameter count.
The cost per 1,000 tokens for the Llama 3 Instruct (8B) model with Telnyx Inference is $0.0002. For instance, if an enterprise were to analyze 1,000,000 customer chats, each averaging 1,000 tokens, the total cost would be $200.
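The arithmetic above can be sketched as a small helper (the function name and the way the rate is expressed are illustrative):

```python
# Cost estimate for Llama 3 8B Instruct on Telnyx Inference,
# using the $0.0002 per 1,000 tokens rate quoted above.
PRICE_PER_1K_TOKENS = 0.0002  # USD

def inference_cost(num_requests: int, avg_tokens_per_request: int) -> float:
    """Return the total cost in USD for a batch of requests."""
    total_tokens = num_requests * avg_tokens_per_request
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS

# 1,000,000 chats at ~1,000 tokens each = 1B tokens
print(f"${inference_cost(1_000_000, 1_000):,.2f}")  # $200.00
```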
Llama 3 8B Instruct handles conversational AI, code generation, and summarization tasks well at low compute cost. It is a practical choice for production deployments that need efficient inference.
Yes, Llama 3 8B is released under the Meta Llama 3 Community License, which permits commercial use (services exceeding 700 million monthly active users require a separate license from Meta). Weights are available for download on Hugging Face.
Llama 3.1 8B expands the context window from 8K to 128K tokens and adds multilingual support. For new projects, Llama 3.1 8B is the better choice with its improved capabilities.
Llama 3 8B requires about 16GB of VRAM at 16-bit precision, or roughly 8GB with 8-bit quantization. An RTX 3070 or above handles the quantized version for local development.