Built on Mistral's sparse mixture-of-experts architecture that routes each token through 2 of 8 expert networks, this Nous Research fine-tune keeps only 12.9B of its 46.7B total parameters active per forward pass. The DPO alignment stage improved its MT-Bench score over the base Mixtral Instruct while preserving the model's efficiency advantage over dense 70B-class alternatives.
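The top-2 routing described above can be sketched in a few lines. The toy router below is illustrative only, not Mixtral's actual implementation: it softmaxes 8 gate logits for one token, keeps the two largest, renormalizes their weights, and mixes only those two experts' outputs, which is why just a fraction of the total parameters run per token.

```python
import math

NUM_EXPERTS, TOP_K = 8, 2  # Mixtral 8x7B routes each token to 2 of 8 experts

def top2_route(gate_logits, expert_outputs):
    """Toy sparse-MoE gate: weighted mix of the top-2 experts for one token."""
    # numerically stable softmax over the router's logits
    m = max(gate_logits)
    exps = [math.exp(g - m) for g in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # keep only the two highest-probability experts
    top = sorted(range(NUM_EXPERTS), key=lambda i: probs[i], reverse=True)[:TOP_K]
    # renormalize their weights and mix the chosen experts' outputs
    z = sum(probs[i] for i in top)
    return sum((probs[i] / z) * expert_outputs[i] for i in top)

# toy example: each "expert" just returns a scalar instead of a tensor
outputs = [float(i) for i in range(NUM_EXPERTS)]
logits = [0.1, 2.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0]
print(top2_route(logits, outputs))  # mixes experts 1 and 4 only
```

In the real model each expert is a feed-forward network over the token's hidden state, but the gating logic is the same shape: only the selected experts' weights are touched, which is what keeps active parameters far below the total count.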
Powered by our own GPU infrastructure, select a large language model, add a prompt, and chat away. For unlimited chats, sign up for a free account on our Mission Control Portal.
Nous Hermes 2 Mixtral 8x7B DPO is a mixture-of-experts model from Nous Research, built on Mistral's Mixtral 8x7B architecture and fine-tuned with DPO. It has 46.7 billion total parameters, of which only about 12.9 billion (2 of 8 experts per token) are active on each forward pass, providing strong performance at efficient inference cost.
Nous Hermes 2 Mixtral 8x7B DPO performs well on translation and on understanding complex topics, and community comparisons report it beating GPT-4 in narrow areas such as puzzles and roleplay, though GPT-4 remains stronger for general-purpose tasks. The model is available on multiple inference platforms as a free, open-source alternative.
Nous Hermes 2 Mixtral 8x7B DPO scores 72.3% on MMLU, slightly above the base Mixtral 8x7B Instruct (70.6%) under the same evaluation. Its Arena Elo of 1,084 sits about 30 points below the base Mixtral Instruct (1,114), a common pattern in which DPO improves knowledge benchmarks while shifting chat preference. On MMLU it is roughly on par with GPT-3.5 Turbo (70.0%) at the open-weight tier.
The cost of running the model with Telnyx Inference is $0.0003 per 1,000 tokens. To put this into perspective, analyzing 1,000,000 customer chats, assuming each chat is 1,000 tokens long, would cost $300.
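The arithmetic above can be checked with a small helper; the function name and structure are our own, only the $0.0003-per-1,000-tokens price comes from the text.

```python
COST_PER_1K_TOKENS = 0.0003  # Telnyx Inference price quoted above, in USD

def inference_cost(num_requests, tokens_per_request, price_per_1k=COST_PER_1K_TOKENS):
    """Total USD cost for a batch of requests at a flat per-token price."""
    total_tokens = num_requests * tokens_per_request
    return total_tokens / 1000 * price_per_1k

# the example from the text: 1,000,000 chats of 1,000 tokens each
print(f"${inference_cost(1_000_000, 1_000):,.2f}")  # $300.00
```

Swapping in your own traffic estimates (requests per day, average tokens per chat) gives a quick back-of-the-envelope budget before committing to a provider.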
The model excels at multilingual reasoning, content generation, and conversational AI with a 32K context window. Its DPO training gives it strong instruction-following capabilities and improved response quality compared to the base Mixtral model.
Yes, Nous Hermes 2 Mixtral 8x7B DPO is open-source and free to use. It is available on Hugging Face and through inference providers like Ollama and OpenRouter.