Hugging Face's H4 team trained this model using distilled alignment instead of human RLHF, fine-tuning on UltraChat dialogues ranked by GPT-4 feedback through Direct Preference Optimization (DPO). It scored a 90.60% win rate on AlpacaEval, the highest of any 7B chat model at release, showing that AI-ranked preference data can substitute for human annotation at this scale.
Discover the power and diversity of large language models available with Telnyx. Explore the options below to find the perfect model for your project.
Powered by our own GPU infrastructure, select a large language model, add a prompt, and chat away. For unlimited chats, sign up for a free account on our Mission Control Portal.
Zephyr 7B Beta is an alignment-tuned model built on Mistral 7B by Hugging Face's H4 team, using DPO training for improved instruction following. It is available on Hugging Face and through inference providers.
Zephyr 7B Beta's main limitations are its 7B parameter size, which constrains complex reasoning, and its older training data. Newer models like Mistral 7B Instruct v0.2 have since surpassed it on most benchmarks.
Zephyr 7B Beta scores 61.1% on MMLU and 7.34 on MT-Bench, outperforming Mistral 7B Instruct v0.1 (56.3% MMLU, 6.84 MT-Bench) on both measures. Its 90.6% win rate on AlpacaEval was the highest of any 7B model at release, achieved through DPO on GPT-4-ranked synthetic feedback rather than human RLHF.
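The DPO objective mentioned above trains the policy directly on preference pairs, with no separate reward model. A minimal sketch of the per-pair loss, assuming summed log-probabilities for the chosen and rejected responses under both the policy and a frozen reference model (the variable names here are illustrative, not from any particular library):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected response under the policy or the frozen reference model.
    beta controls how far the policy may drift from the reference.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log sigmoid(margin): shrinks as the policy favors the chosen answer
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss falls as the policy assigns more mass to the preferred response
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
print(dpo_loss(-8.0, -14.0, -11.0, -11.0))
```

In practice this is computed over batches of (prompt, chosen, rejected) triples, such as the GPT-4-ranked pairs used for Zephyr.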
The cost per 1,000 tokens for running the model with Telnyx Inference is $0.0002. To illustrate, if an organization were to analyze 1,000,000 customer chats, and each chat consisted of 1,000 tokens, the total cost would be $200.
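The arithmetic above can be checked in a few lines (the per-1,000-token price is the Telnyx figure quoted above; the workload numbers are the same illustration):

```python
# Estimate Telnyx Inference cost at $0.0002 per 1,000 tokens
price_per_1k_tokens = 0.0002
chats = 1_000_000
tokens_per_chat = 1_000

total_tokens = chats * tokens_per_chat          # 1 billion tokens
cost = total_tokens / 1_000 * price_per_1k_tokens
print(f"${cost:,.2f}")  # $200.00
```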
Zephyr 7B Beta is released under the MIT license, making it completely free for commercial and research use. It can be downloaded from Hugging Face or accessed through hosted inference.
Zephyr 7B Beta is built on Mistral 7B but adds DPO alignment for better instruction following and chat quality. The base Mistral model is more versatile for fine-tuning, while Zephyr is ready for conversational use out of the box.
Yes, Zephyr 7B Beta runs on consumer GPUs with 8GB+ VRAM using quantized formats. Ollama provides the simplest local deployment path.
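Whether you run it through Ollama or raw weights, Zephyr expects its own chat template, with `<|system|>`, `<|user|>`, and `<|assistant|>` role tags and `</s>` turn terminators (per the model card; most serving tools apply this automatically). A minimal sketch of building a single-turn prompt by hand:

```python
def zephyr_prompt(system: str, user: str) -> str:
    """Build a single-turn prompt in Zephyr 7B Beta's chat format.

    The model was fine-tuned with <|system|>, <|user|>, and
    <|assistant|> role tags, each completed turn ending in </s>.
    """
    return (
        f"<|system|>\n{system}</s>\n"
        f"<|user|>\n{user}</s>\n"
        f"<|assistant|>\n"
    )

print(zephyr_prompt("You are a helpful assistant.", "What is DPO?"))
```

With the Hugging Face `transformers` library, `tokenizer.apply_chat_template` produces the same layout from a list of role/content messages, so manual formatting is only needed for low-level deployments.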