Nous Research applied a two-stage training process to Mistral 7B: first fine-tuning on roughly one million GPT-4-generated synthetic instructions from the OpenHermes dataset, then aligning with Direct Preference Optimization (DPO). The resulting model uses the ChatML prompt format for structured multi-turn dialogue and runs in under 5GB of VRAM with 4-bit quantization.
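For reference, ChatML wraps each turn in `<|im_start|>` and `<|im_end|>` markers. A minimal two-turn prompt looks like this:

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Summarize DPO in one sentence.<|im_end|>
<|im_start|>assistant
```

The model generates its reply after the trailing `<|im_start|>assistant` line.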
Nous Hermes 2 Mistral 7B DPO builds on the strong Mistral 7B base with DPO alignment training, improving instruction following and reducing harmful outputs. It is a competitive open-weight model for chat and reasoning tasks at the 7B scale.
Yes, Mistral 7B uses a decoder-only transformer architecture, which is standard for autoregressive text generation models. Nous Hermes 2 inherits this architecture and adds DPO-based alignment for improved instruction following.
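As a minimal sketch of loading and sampling from the model, assuming the Hugging Face `transformers` and `bitsandbytes` packages and the `NousResearch/Nous-Hermes-2-Mistral-7B-DPO` repository id (adjust to whatever checkpoint you actually deploy):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "NousResearch/Nous-Hermes-2-Mistral-7B-DPO"

# 4-bit quantization keeps the weights under roughly 5GB of VRAM.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)

# ChatML-formatted prompt; the model continues after the assistant marker.
prompt = (
    "<|im_start|>user\nExplain DPO in one sentence.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```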
Nous Hermes 2 Mistral 7B DPO scores 63.4% on MMLU, roughly a 7-point improvement over the base Mistral 7B Instruct v0.1 (56.3%) after DPO alignment on GPT-4-generated instruction data. It trails Gemma 7B IT (64.3%) by about 1 point on MMLU, but runs in under 5GB of VRAM with 4-bit quantization, trading a small amount of benchmark quality for deployment efficiency.
The cost of running the model with Telnyx Inference is $0.0002 per 1,000 tokens. For instance, analyzing 1,000,000 customer chats of 1,000 tokens each (1 billion tokens in total) would cost $200.
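The arithmetic behind that estimate, as a quick sanity check:

```python
# Cost estimate at the published Telnyx Inference rate.
PRICE_PER_1K_TOKENS = 0.0002  # USD
chats = 1_000_000
tokens_per_chat = 1_000

total_tokens = chats * tokens_per_chat           # 1 billion tokens
cost = total_tokens / 1_000 * PRICE_PER_1K_TOKENS
print(f"${cost:,.2f}")                           # $200.00
```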
7B refers to roughly 7 billion parameters, which determines the model's size and capacity. Nous Hermes 2 Mistral 7B DPO retains the base model's full 7.3 billion parameters while adding alignment training that improves its behavior as a conversational assistant.
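If you want to confirm the count yourself, a quick check on a full-precision copy of the weights (quantized loads pack parameters and will under-report):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("NousResearch/Nous-Hermes-2-Mistral-7B-DPO")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.2f}B parameters")  # roughly 7.2B for Mistral 7B derivatives
```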
The base Mistral 7B architecture supports a 32K token context window using sliding window attention. Nous Hermes 2 Mistral 7B DPO inherits this capacity, making it suitable for moderate-length document processing and multi-turn conversations.
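One way to confirm these limits is to inspect the context-related fields the Hugging Face config exposes for Mistral-family models (the values in comments are what we'd expect, not guarantees):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
print(cfg.max_position_embeddings)  # expected: 32768 (32K context window)
print(cfg.sliding_window)           # expected: 4096 (sliding attention window)
```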
Direct Preference Optimization (DPO) is an alignment technique that trains models to prefer helpful, accurate responses over harmful or incorrect ones without needing a separate reward model. Nous Research applied DPO to produce a better-aligned variant of Mistral 7B for instruction-following tasks.
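At its core, DPO reduces alignment to a single logistic loss over preference pairs. A minimal PyTorch sketch of the objective from Rafailov et al. (2023), assuming you have per-response log-probabilities from the policy and a frozen reference model:

```python
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,
    policy_rejected_logps: torch.Tensor,
    ref_chosen_logps: torch.Tensor,
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,
) -> torch.Tensor:
    """Each argument is the summed log-probability of a full response
    under the policy or the frozen reference model."""
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    # Logistic loss on the implicit reward margin; no separate reward model.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```

Maximizing the margin between preferred and dispreferred responses is what nudges the model toward helpful, accurate completions without training a standalone reward model.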
Nous Hermes 2 significantly improves on base Mistral 7B for instruction following, chat quality, and task completion. The DPO alignment makes it more reliable for production conversational applications while maintaining the base model's efficiency.