Nous Hermes 2 Mixtral 8x7B

Nous Research's mixture-of-experts model built on Mixtral 8x7B with DPO training, offering strong multilingual reasoning and content generation at 32k context.

about

Built on Mistral's sparse mixture-of-experts architecture that routes each token through 2 of 8 expert networks, this Nous Research fine-tune keeps only 12.9B of its 46.7B total parameters active per forward pass. The DPO alignment stage improved its MT-Bench score over the base Mixtral Instruct while preserving the model's efficiency advantage over dense 70B-class alternatives.
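
To make the routing concrete, here is a minimal PyTorch sketch of top-2 expert routing. It is an illustrative toy, not Mixtral's actual code: a learned gate scores all eight experts for each token, the top two are selected, and their outputs are mixed with softmax-normalized weights.

```python
# Minimal sketch of top-2 expert routing, for intuition only.
# This is NOT Mixtral's actual implementation; dimensions are toy-sized.
import torch
import torch.nn as nn

class Top2MoELayer(nn.Module):
    def __init__(self, dim: int = 64, num_experts: int = 8):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim). Score all experts, keep the top 2 per token.
        scores = self.gate(x)
        top_vals, top_idx = torch.topk(scores, k=2, dim=-1)
        weights = torch.softmax(top_vals, dim=-1)  # normalize over the chosen 2
        out = torch.zeros_like(x)
        for slot in range(2):  # each token's 1st and 2nd choice
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = Top2MoELayer()
print(layer(torch.randn(5, 64)).shape)  # torch.Size([5, 64])
```

Because only two of the eight expert MLPs run for each token, per-token compute tracks the roughly 12.9B active parameters rather than the full 46.7B.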

License: apache-2.0
Context window: 32,768 tokens (32K)

Use cases for Nous Hermes 2 Mixtral 8x7B

  1. Cost-efficient multilingual reasoning: With 12.9B active parameters from 46.7B total via sparse expert routing, it delivers strong multilingual performance at a fraction of the compute cost of dense 70B models.
  2. DPO-aligned content generation: Direct Preference Optimization training improves MT-Bench scores over the base Mixtral Instruct, producing more helpful and coherent long-form content.
  3. 32K-context document analysis: The full 32K context window, combined with DPO alignment, enables it to process and summarize lengthy technical documents, research papers, and legal texts in a single pass; see the sketch after this list.
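
As a concrete illustration of the single-pass workflow in use case 3, the sketch below sends a long document to an OpenAI-compatible chat completions endpoint. The base URL, API key, and model slug are placeholder assumptions rather than documented Telnyx values; check your provider's docs before use.

```python
# Hypothetical single-pass summarization call against an OpenAI-compatible
# endpoint. The base_url and model slug below are illustrative placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.example.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

with open("contract.txt") as f:
    document = f.read()  # fits in one pass if under ~32K tokens

response = client.chat.completions.create(
    model="nous-hermes-2-mixtral-8x7b-dpo",  # placeholder slug
    messages=[
        {"role": "system", "content": "You summarize legal and technical documents precisely."},
        {"role": "user", "content": f"Summarize the key points of this document:\n\n{document}"},
    ],
)
print(response.choices[0].message.content)
```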

Quality

Arena Elo: 1084
MMLU: 72.3%
MT Bench: N/A

Nous Hermes 2 Mixtral 8x7B DPO scores 72.3% on MMLU, slightly above the base Mixtral 8x7B Instruct (70.6%) after DPO alignment. Its Arena Elo of 1,084 sits about 30 points below the base Mixtral Instruct (1,114), a common pattern in which DPO lifts knowledge benchmarks while shifting chat-preference rankings. At the open-weight tier it roughly matches GPT-3.5 Turbo (70.0% MMLU).

Arena Elo comparison:

  • GPT-3.5 Turbo: 1105
  • Llama 2 Chat 70B: 1093
  • Nous Hermes 2 Mixtral 8x7B: 1084
  • Hermes 2 Pro Mistral 7B: 1074
  • Mistral 7B Instruct v0.2: 1072

pricing

The cost of running the model with Telnyx Inference is $0.0003 per 1,000 tokens. To put this into perspective, analyzing 1,000,000 customer chats at 1,000 tokens each (1 billion tokens in total) would cost $300.
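
For a quick sanity check, the arithmetic behind that figure:

```python
# Sanity-checking the pricing example above.
price_per_1k_tokens = 0.0003      # USD, Telnyx Inference rate
chats = 1_000_000
tokens_per_chat = 1_000

total_tokens = chats * tokens_per_chat               # 1,000,000,000 tokens
cost = (total_tokens / 1_000) * price_per_1k_tokens  # 1,000,000 x $0.0003
print(f"${cost:,.2f}")                               # $300.00
```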

What's Twitter saying?

  • Strong puzzle-solving performance: Private LLM notes that the Nous Hermes 2 Mixtral 8x7B DPO model outperforms GPT-4 on certain puzzles, highlighting its particular strengths in specific problem-solving tasks.
  • Exceptional benchmark results: The model achieved impressive scores across major benchmarks—75.70% on GPT4All, 46.05% on AGIEval, and 49.70% on BigBench—significantly surpassing the base Mixtral model and even MistralAI's flagship Mixtral Finetune.
  • Quantized versions gaining traction: Teknium announced multiple quantized versions of Hermes Mixtral, including SFT+DPO GGUF and SFT GGUF variants, indicating developer interest in optimized forms of the model for different deployment scenarios.

Explore Our LLM Library

Discover the power and diversity of large language models available with Telnyx. Explore the options below to find the perfect model for your project.

Organization: deepseek-ai
Model Name: DeepSeek-R1-Distill-Qwen-14B
Tasks: text generation
Languages Supported: English
Context Length: 43,000
Parameters: 14.8B
Model Tier: medium
License: deepseek

TRY IT OUT

Chat with an LLM

Powered by our own GPU infrastructure, select a large language model, add a prompt, and chat away. For unlimited chats, sign up for a free account on our Mission Control Portal.

HOW IT WORKS

Selecting LLMs for Voice AI

RESOURCES

Get started

Check out these tools to help you get started.

  • Test in the portal: Easily browse and select your preferred model in the AI Playground.

  • Explore the docs: Don’t wait to scale, start today with our public API endpoints.

  • Stay up to date: Keep an eye on our AI changelog so you don't miss a beat.

Sign up and start building

faqs

What is Nous Hermes 2 Mixtral 8x7B DPO?

Nous Hermes 2 Mixtral 8x7B DPO is a mixture-of-experts model from Nous Research, built on Mistral's Mixtral 8x7B architecture with DPO training. It has 46.7 billion total parameters, of which roughly 12.9B are active per token, providing strong performance at efficient inference cost.

How does Nous Hermes 2 Mixtral compare to GPT-4?

Nous Hermes 2 Mixtral 8x7B performs admirably on translation and complex topic understanding, and outperforms GPT-4 in certain areas like puzzles and roleplay. For general-purpose tasks, GPT-4 remains stronger overall. The model is available on multiple inference platforms as a free open-source alternative.

What is Nous Hermes 2 Mixtral good for?

The model excels at multilingual reasoning, content generation, and conversational AI with a 32K context window. Its DPO training gives it strong instruction-following capabilities and improved response quality compared to the base Mixtral model.

Is Nous Hermes 2 Mixtral free?

Yes, Nous Hermes 2 Mixtral 8x7B DPO is open-source and free to use. It is available on Hugging Face and through inference providers like Ollama and OpenRouter.
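
As one illustration, the snippet below chats with the model through Ollama's local REST API. The model tag shown is an assumption based on the common community name; confirm it with `ollama list` on your machine.

```python
# Local chat through Ollama's REST API. The model tag is an assumption;
# run `ollama list` to confirm what is installed locally.
import requests

response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "nous-hermes2-mixtral",  # assumed tag
        "messages": [
            {"role": "user", "content": "Give three uses for a 32K context window."}
        ],
        "stream": False,
    },
)
print(response.json()["message"]["content"])
```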
