Our Early Benchmarks on ElevenLabs vs MiniMax TTS and Why It Matters

MiniMax Speech 2.6 matched or exceeded ElevenLabs in stability, pacing, and cost, making it a strong option for large-scale Voice AI.

By Abhishek Sharma

Teams keep asking why we added the MiniMax Speech 2.6 text-to-speech model to the Telnyx stack. The short answer is that we tested it, compared it to ElevenLabs, and found a gap in cost and control that is hard to ignore.

This post walks through what we saw in early benchmarks, how the pricing models differ, and why MiniMax is becoming a practical choice for high-volume voice AI workloads.

The setup

We ran side-by-side tests of MiniMax Speech 2.6 and ElevenLabs V3 Alpha.

Both models were evaluated using the same prompts, identical sampling parameters, and comparable voice profiles. The following sections show the prompts and summarize performance differences based on the resulting audio outputs.
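For readers who want to set up a similar comparison, here is a minimal sketch of a controlled test matrix in Python. It is not our internal harness. The provider labels, voice placeholders, and the `speed` and `sample_rate_hz` values are hypothetical stand-ins for "identical sampling parameters," and the prompts are short excerpts from the three scenarios below. The point is only that every provider receives the same prompts and the same shared settings, so differences in the audio come from the model.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class SynthesisJob:
    """One (provider, scenario) pairing with the settings shared by all jobs."""
    provider: str
    voice: str
    scenario: str
    prompt: str
    speed: float
    sample_rate_hz: int


# Short excerpts of the scenario prompts used in this post.
SCENARIOS = {
    "rescheduling": "Do you want to confirm the new time now?",
    "reassurance": "You're all set now.",
    "billing": "What would you like to do next?",
}

# Hypothetical labels and voice placeholders, not real API identifiers.
PROVIDER_VOICES = {
    "minimax-speech-2.6": "placeholder-voice-a",
    "elevenlabs-v3-alpha": "placeholder-voice-b",
}

# Hypothetical shared settings; the actual parameters are not listed in this post.
SHARED_PARAMS = {"speed": 1.0, "sample_rate_hz": 24000}


def build_jobs() -> List[SynthesisJob]:
    """Cross every provider with every scenario, holding parameters fixed."""
    jobs = []
    for (provider, voice), (scenario, prompt) in product(
        PROVIDER_VOICES.items(), SCENARIOS.items()
    ):
        jobs.append(
            SynthesisJob(provider, voice, scenario, prompt, **SHARED_PARAMS)
        )
    return jobs


if __name__ == "__main__":
    for job in build_jobs():
        print(job.provider, job.scenario, job.speed, job.sample_rate_hz)
```

Each job would then be handed to whichever client library a team uses for that provider; that step is left out here because it depends on the specific SDK.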

Scenario 1: Rescheduling an appointment

Prompt:
“Thanks for your patience while I pull up the details. Before we finalize anything, I want to walk you through how this process works so there aren’t any surprises. When we reschedule an appointment, the system automatically checks for conflicts on your account, updates the confirmation settings, and sends a new reminder about two hours before the visit. If you need to cancel again later, that’s completely fine.. just tell me and I’ll take care of it. The important thing is that everything stays accurate so you don’t miss any deadlines. Okay, I’m ready when you are. Do you want to confirm the new time now?”

MiniMax Audio

ElevenLabs Audio

Verdict:
MiniMax uses a steady, single-line delivery for each clause (such as “checks for conflicts”), which turns the paragraph into a structured sequence that is easy for callers to track. ElevenLabs introduces more vocal rise and fall, emphasizing reassurance lines and the final question. MiniMax fits cases where clarity and step-order matter, while ElevenLabs leans toward warmth and expressiveness.

Scenario 2: Support-style reassurance

Prompt:
“I know this has been frustrating, and I appreciate you sticking with it. The good news is that we’ve already fixed most of the issue on our side. Just give me a moment to double-check one last thing. You’re all set now. If something doesn’t look right, just tell me and we’ll sort it out together.”

MiniMax Audio

ElevenLabs Audio

Verdict:
MiniMax uses a restrained emotional range, acknowledging frustration without over-emphasizing sentiment. It sounds like a seasoned support agent focused on brevity and respect. ElevenLabs adds more personality and brightness. For support experiences where tone should remain neutral and the assistant shouldn’t feel stylized, MiniMax offers a natural guardrail.

Scenario 3: Billing and account details

Prompt:
“I’ve pulled up your account. Your current balance is $142.75, and the next automatic payment is scheduled for February 12 at 9 a.m. If you’d like to switch the payment method, I can update it now. I can also email a full statement to you. What would you like to do next?”

MiniMax Audio

ElevenLabs Audio

Verdict:
Both models perform well, but MiniMax’s consistent spacing around numbers (balance, date, time) reduces the number of caller replays. In high-volume billing lines, fewer repeats translate to lower call duration and cost. ElevenLabs sounds more melodic but introduces small variations that may increase replay requests.

Public benchmark performance

MiniMax’s Speech-2.6HD currently holds a leading position on the global text-to-speech leaderboard. In a blind test conducted by Artificial Analysis’ Speech Arena on Hugging Face Spaces, thousands of listeners compared paired samples, and MiniMax consistently ranked above ElevenLabs’ top models for naturalness and prosody stability.

We also reviewed accuracy and quality signals from the Hugging Face leaderboard more broadly. MiniMax performs well across multilingual tasks and maintains coherence in long passages, which matches what we observed in our internal tests.

Pricing: where the gap starts

MiniMax is priced at $30 per million characters. ElevenLabs charges more for equivalent volume once past entry-tier limits. When converted to cost per generated minute, MiniMax becomes more predictable for large deployments.

For teams producing millions of characters per day, this difference becomes material. ElevenLabs’ strong brand recognition does not offset the compounding cost impact when the workload involves continuous real-time generation.
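To make the per-minute conversion concrete, here is a rough sketch of the arithmetic in Python. The $30 per million characters figure comes from the pricing above. The speaking rate of about 900 characters per minute (roughly 150 words per minute at about six characters per word, spaces included) is our assumption for illustration, as is the example volume of 5 million characters per day. Actual rates vary by voice, language, and script.

```python
# Price stated in this post.
MINIMAX_USD_PER_MILLION_CHARS = 30.0

# Assumption: ~150 spoken words per minute * ~6 characters per word.
ASSUMED_CHARS_PER_MINUTE = 900


def cost_for_chars(num_chars: float,
                   usd_per_million: float = MINIMAX_USD_PER_MILLION_CHARS) -> float:
    """Cost in USD to synthesize num_chars characters."""
    return num_chars / 1_000_000 * usd_per_million


def cost_per_minute(chars_per_minute: float = ASSUMED_CHARS_PER_MINUTE,
                    usd_per_million: float = MINIMAX_USD_PER_MILLION_CHARS) -> float:
    """Approximate cost in USD of one minute of generated speech."""
    return cost_for_chars(chars_per_minute, usd_per_million)


def monthly_cost(chars_per_day: float, days: int = 30,
                 usd_per_million: float = MINIMAX_USD_PER_MILLION_CHARS) -> float:
    """Cost in USD of a steady daily character volume over a month."""
    return cost_for_chars(chars_per_day * days, usd_per_million)


if __name__ == "__main__":
    print(f"${cost_per_minute():.3f} per generated minute")      # $0.027
    print(f"${monthly_cost(5_000_000):,.2f} per 30-day month")   # $4,500.00
```

Because the price is a flat rate per character, the cost scales linearly with volume, which is what makes it easy to forecast. To compare another provider, pass its effective per-million rate as `usd_per_million`.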

Telnyx integrates MiniMax as a first-class option in the Voice AI stack, which streamlines routing and reduces the operational overhead of stitching external TTS services into real-time calling flows.

Developers can still use ElevenLabs by bringing their own API key, but MiniMax offers lower cost and consistent output quality in most production scenarios.

Audio quality observations

ElevenLabs is often recognized for its expressive English voices, yet MiniMax Speech 2.6 demonstrated stronger stability and more reliable intonation in our testing. It produced cleaner transitions in multi-turn exchanges, avoided prosody issues in longer passages, and kept emphasis and pacing consistent when handling structured information.

These patterns held across different prompt styles. While ElevenLabs sometimes delivered richer expressiveness, MiniMax’s steadier output is often better suited for agents where clarity and accuracy take precedence.

Latency and performance considerations

MiniMax's Turbo variant supports fast synthesis, often under 250 ms. This is suitable for interactive voice experiences where delays above a few hundred milliseconds break the conversational rhythm.
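If you want to check a latency figure like this in your own environment, one simple approach is to time how long a streaming synthesis call takes to yield its first non-empty audio chunk. The sketch below assumes only that your TTS client exposes the audio as an iterable of byte chunks. `fake_tts_stream` is a hypothetical stand-in for a real client, and the 250 ms budget mirrors the figure quoted above.

```python
import time
from typing import Callable, Iterable, Iterator

# Budget taken from the "often under 250 ms" figure in this post.
BUDGET_MS = 250.0


def time_to_first_chunk_ms(stream_factory: Callable[[], Iterable[bytes]]) -> float:
    """Return milliseconds from request start to the first non-empty chunk."""
    start = time.perf_counter()
    for chunk in stream_factory():
        if chunk:
            return (time.perf_counter() - start) * 1000.0
    raise RuntimeError("stream ended without producing audio")


def fake_tts_stream(delay_s: float = 0.05) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS call (not a real API)."""
    time.sleep(delay_s)          # simulated network + synthesis delay
    yield b"\x00" * 320          # first audio chunk
    yield b"\x00" * 320          # subsequent audio


def main() -> None:
    elapsed = time_to_first_chunk_ms(fake_tts_stream)
    status = "within" if elapsed <= BUDGET_MS else "over"
    print(f"first chunk after {elapsed:.1f} ms ({status} the {BUDGET_MS:.0f} ms budget)")


if __name__ == "__main__":
    main()
```

A single measurement is noisy, so in practice you would repeat the call many times from the region where your calls terminate and look at the median and tail values.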

Telnyx’s routing model reduces the overhead typically introduced by managing multiple external inference calls, and MiniMax’s integration as a supported TTS option removes unnecessary orchestration layers, which improves end-to-end responsiveness for real-time calls.

What we learned

| **Category** | **MiniMax Speech 2.6** | **ElevenLabs V3 Alpha** |
| --- | --- | --- |
| Long-form stability | Consistent pacing and prosody across long passages | Greater variation in pacing over longer outputs |
| Number and date clarity | Clean spacing around numbers, dates, and times | Occasional variation in numeric delivery |
| Expressiveness | Neutral and restrained | More expressive and stylized |
| Best-fit use cases | Support, billing, scheduling, structured voice flows | Branded voices, expressive agents |
| Latency | ~250 ms with Turbo variant | Higher and more variable depending on configuration |
| Pricing model | $30 per million characters | Higher effective cost beyond entry-tier limits |
| Cost predictability at scale | High | Lower as volume increases |
| Native Telnyx integration | Yes | Bring-your-own API key |

MiniMax Speech 2.6 is competitive with ElevenLabs V3 Alpha in audio quality and often performs better in long-form stability and structured information delivery. Its pricing model is significantly more favorable for teams running large-scale voice AI workloads.

This does not replace ElevenLabs. It expands the set of high-quality TTS options available for developers building real-time voice systems. For teams sensitive to scale-driven cost and latency, MiniMax offers a strong alternative.

Want to compare TTS models and results? Join our subreddit.

Abhishek Sharma

Senior Technical Product Marketing Manager
