
MiniMax M3 Runs Best on Telnyx Inference

We benchmarked MiniMax M3 across Telnyx, Together AI, and Fireworks. Same model, same prompts, same streaming setup. Telnyx delivered the fastest E2E latency, 21% higher throughput, and the most consistent performance.

By Sonam Gupta, PhD

We benchmarked MiniMax M3 across three inference providers: Telnyx, Together AI, and Fireworks. Same model. Same prompts. Same streaming setup. Telnyx delivered the fastest end-to-end latency and the most consistent performance of the three, along with higher throughput than Together AI, the only other provider whose throughput we could measure.

This is not a cherry-picked result. Telnyx won on the metric that matters most for production workloads: complete response time. It also won on raw generation speed. And the reliability gap vs. Fireworks was wide enough to be a story on its own.

What we tested

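Six prompt profiles ranging from short-input/short-output (1k tokens in, 100 out) to long-input/long-output (100k in, 1k out). Ten streamed runs per provider per profile, which gives 180 MiniMax M3 requests (3 providers x 6 profiles x 10 runs). The full clean dataset, which also covers the other models listed in the methodology, contains 480 successful requests across Telnyx, Together AI, and Fireworks.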

We measured three things:

  • E2E latency: time from request to final token. The metric that reflects what users actually experience.
  • TTFT: time to first token. Matters for interactive feedback and voice agents.
  • Throughput: tokens per second after the first token arrives. Matters for long generations, batch processing, and cost per token.
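As a rough illustration of how these three metrics relate, here is a minimal Python sketch that derives them from per-token arrival timestamps. The `StreamTrace` structure and the idea of counting streamed tokens client-side are assumptions made for this example; the benchmark harness itself is not shown in this post and may instead rely on token counts reported by the provider.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StreamTrace:
    """Timing record for one streamed request (hypothetical structure)."""
    request_sent: float        # seconds, when the request was sent
    token_times: List[float]   # seconds, arrival time of each output token in order


def e2e_latency(trace: StreamTrace) -> float:
    """Seconds from request to final token."""
    return trace.token_times[-1] - trace.request_sent


def ttft(trace: StreamTrace) -> float:
    """Seconds from request to first token."""
    return trace.token_times[0] - trace.request_sent


def throughput(trace: StreamTrace) -> float:
    """Tokens per second, measured after the first token arrives."""
    generation_time = trace.token_times[-1] - trace.token_times[0]
    tokens_after_first = len(trace.token_times) - 1
    if generation_time <= 0 or tokens_after_first <= 0:
        raise ValueError("need at least two tokens with increasing timestamps")
    return tokens_after_first / generation_time


# Synthetic trace: first token at 2.0s, then one token every 10ms (101 tokens).
trace = StreamTrace(request_sent=0.0,
                    token_times=[2.0 + 0.01 * i for i in range(101)])
print(f"E2E {e2e_latency(trace):.2f}s, TTFT {ttft(trace):.2f}s, "
      f"throughput {throughput(trace):.1f} tok/s")
```

Note that E2E latency includes TTFT, while throughput deliberately excludes it, so a provider can lead on one and trail on the other.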

E2E latency: Telnyx finishes first

[Figure: MiniMax M3 E2E latency comparison across Telnyx, Together AI, and Fireworks]

Telnyx MiniMax M3 posted a 7.56s median E2E across all six profiles. Together AI came in at 7.77s. Fireworks at 8.52s.

Telnyx won 4 of 6 profiles outright, including all three long-output scenarios (1k/1k, 10k/1k, 100k/1k). These are the profiles that reflect real production workloads: agents generating substantive responses, not just returning a single sentence.

p50 E2E by profile:

Provider    | Model      | 1k/100 | 10k/100 | 100k/100 | 1k/1k  | 10k/1k | 100k/1k
Telnyx      | MiniMax M3 | 1.70s  | 2.67s   | 5.85s    | 10.12s | 8.11s  | 13.30s
Together AI | MiniMax M3 | 1.78s  | 1.86s   | 5.21s    | 11.49s | 9.88s  | 16.09s
Fireworks   | MiniMax M3 | 6.29s  | 2.04s   | 4.87s    | 12.78s | 9.26s  | 25.48s

On 1k input with 1k output, Telnyx was 12% faster than Together AI and 21% faster than Fireworks. On 100k input with 1k output, Telnyx was 17% faster than Together AI and 48% faster than Fireworks.
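The percentages above can be reproduced from the p50 table. The sketch below assumes "X% faster" means the reduction in latency relative to the slower provider, (slow - fast) / slow, which is the definition that matches the quoted figures; the function name is ours.

```python
# p50 E2E latency in seconds, copied from the table above.
P50_E2E_SECONDS = {
    "1k/1k":   {"Telnyx": 10.12, "Together AI": 11.49, "Fireworks": 12.78},
    "100k/1k": {"Telnyx": 13.30, "Together AI": 16.09, "Fireworks": 25.48},
}


def percent_faster(fast_s: float, slow_s: float) -> float:
    """Latency reduction of the faster provider, relative to the slower one."""
    return (slow_s - fast_s) / slow_s * 100.0


for profile, row in P50_E2E_SECONDS.items():
    for rival in ("Together AI", "Fireworks"):
        gap = percent_faster(row["Telnyx"], row[rival])
        print(f"{profile}: Telnyx is {gap:.0f}% faster than {rival}")
```

This prints 12% and 21% for 1k/1k, and 17% and 48% for 100k/1k.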

The short-output profiles were more mixed. Telnyx won the 1k/100 profile at 1.70s, Together AI edged Telnyx on the 10k/100 profile (1.86s vs. 2.67s), and Fireworks was fastest on the 100k/100 profile (4.87s vs. 5.85s for Telnyx). Fireworks was slowest on the shortest prompts, at 6.29s for 1k/100.

Throughput: 21% faster token generation

[Figure: MiniMax M3 throughput comparison: Telnyx delivers 21% higher tokens per second]

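Telnyx MiniMax M3 delivered 144.9 tok/s median throughput. Together AI managed 119.8 tok/s. That is a 21% gap in generation speed, and it showed up consistently across profiles. Fireworks is absent from this comparison because its API rejected stream_options and returned no output token counts.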

p50 throughput by profile (tok/s):

Provider    | Model      | 1k/100 | 10k/100 | 100k/100 | 1k/1k | 10k/1k | 100k/1k
Telnyx      | MiniMax M3 | 171.1  | 170.1   | 123.4    | 141.6 | 174.6  | 130.4
Together AI | MiniMax M3 | 110.7  | 144.9   | 124.4    | 118.4 | 157.4  | 96.0

Telnyx led on 5 of 6 profiles. The one exception, 100k/100, was essentially tied (123.4 vs. 124.4 tok/s). On every other profile, the gap ranged from 11% to 55% in Telnyx's favor.
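The per-profile gaps can be checked directly against the throughput table. This sketch assumes "X% higher" means the Telnyx figure divided by the Together AI figure, minus one; note that this uses a different denominator than a "percent faster" latency comparison.

```python
PROFILES = ["1k/100", "10k/100", "100k/100", "1k/1k", "10k/1k", "100k/1k"]

# p50 throughput in tok/s, copied from the table above.
TELNYX = [171.1, 170.1, 123.4, 141.6, 174.6, 130.4]
TOGETHER_AI = [110.7, 144.9, 124.4, 118.4, 157.4, 96.0]


def percent_higher(a: float, b: float) -> float:
    """How much higher a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0


gaps = {
    profile: percent_higher(telnyx, together)
    for profile, telnyx, together in zip(PROFILES, TELNYX, TOGETHER_AI)
}

for profile, gap in gaps.items():
    print(f"{profile}: {gap:+.0f}%")

# Overall medians quoted in the text: 144.9 vs. 119.8 tok/s.
print(f"overall: {percent_higher(144.9, 119.8):+.0f}%")
```

The smallest positive gap is 10k/1k at about 11%, the largest is 1k/100 at about 55%, and 100k/100 comes out at roughly -1%, which is the near-tie noted above.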

Higher throughput means faster responses for long outputs, more requests handled per GPU hour, and lower effective cost per token. This is where infrastructure ownership shows up in the numbers.

Reliability: the gap that matters in production

Speed is half the story. Consistency is the other half.

Telnyx MiniMax M3 had a p95 E2E of 14.72s and a max of 19.85s. Fireworks MiniMax M3 had a p95 of 54.06s and a max of 111.51s. Same model. Same test. Fireworks' worst request took nearly two minutes.

This is not a rounding difference. A 111s max means that, in production, a user would wait almost two minutes for a response. On Telnyx, the worst case was under 20 seconds.

Reliability comparison:

Provider               | p50 E2E | p95 E2E | Max E2E
Telnyx MiniMax M3      | 7.56s   | 14.72s  | 19.85s
Together AI MiniMax M3 | 7.77s   | 19.35s  | 21.42s
Fireworks MiniMax M3   | 8.52s   | 54.06s  | 111.51s

Together AI was close to Telnyx on median latency, but Telnyx had the tighter tail: 14.72s p95 and 19.85s max versus Together's 19.35s p95 and 21.42s max. Fireworks had a comparable p50 at 8.52s, but much wider tail latency, reaching 54.06s p95 and 111.51s max.

For production workloads, the gap between p50 and tail latency is what determines whether performance feels consistently fast or occasionally breaks user expectations. In this MiniMax M3 comparison, Telnyx kept that gap the tightest.
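One way to put a number on that gap is the ratio of p95 to p50. The sketch below computes it from the reliability table and also shows a simple nearest-rank percentile for raw samples. The nearest-rank method is an assumption for illustration; this post does not state which percentile method the benchmark used.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


# (p50, p95) E2E latency in seconds, copied from the reliability table.
REPORTED = {
    "Telnyx": (7.56, 14.72),
    "Together AI": (7.77, 19.35),
    "Fireworks": (8.52, 54.06),
}

# A ratio close to 1 means the slow requests look much like the typical one.
tail_ratio = {name: p95 / p50 for name, (p50, p95) in REPORTED.items()}

for name, ratio in tail_ratio.items():
    print(f"{name}: p95 is {ratio:.2f}x the median")
```

On the reported figures, the ratio is about 1.95x for Telnyx, 2.49x for Together AI, and 6.35x for Fireworks.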

TTFT: honest framing

Together AI MiniMax M3 had the lowest overall median TTFT at 1.57s, compared to Telnyx at 1.87s. Together AI won on a few short-prompt profiles.

But TTFT is a partial metric. It tells you how fast the first token arrives, not how fast the response completes. A provider that returns the first token in 0.5s but takes 30s to finish is worse for user experience than one that starts at 1.8s and finishes in 8s.
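A simple model makes the trade-off concrete: total time is roughly TTFT plus output length divided by throughput. The two providers in this sketch are hypothetical and their numbers are invented for illustration, not measurements from this benchmark.

```python
from typing import Tuple


def estimated_e2e(ttft_s: float, output_tokens: int, tokens_per_s: float) -> float:
    """Approximate E2E latency: wait for the first token, then stream the rest."""
    return ttft_s + output_tokens / tokens_per_s


def break_even_tokens(quick_start: Tuple[float, float],
                      quick_finish: Tuple[float, float]) -> float:
    """Output length beyond which the higher-throughput provider finishes first.

    Each argument is (ttft_seconds, tokens_per_second). Assumes quick_start has
    the lower TTFT and quick_finish has the higher throughput.
    """
    (ttft_a, rate_a), (ttft_b, rate_b) = quick_start, quick_finish
    return (ttft_b - ttft_a) / (1.0 / rate_a - 1.0 / rate_b)


# Hypothetical providers: A starts sooner, B generates faster.
provider_a = (0.5, 40.0)    # 0.5s TTFT, 40 tok/s
provider_b = (1.8, 160.0)   # 1.8s TTFT, 160 tok/s

for tokens in (100, 1000):
    a = estimated_e2e(provider_a[0], tokens, provider_a[1])
    b = estimated_e2e(provider_b[0], tokens, provider_b[1])
    print(f"{tokens} output tokens: A {a:.2f}s, B {b:.2f}s")

print(f"B wins beyond ~{break_even_tokens(provider_a, provider_b):.0f} output tokens")
```

Under these made-up numbers, the provider with the slower first token is already ahead once the response passes roughly 70 tokens, which is why TTFT alone says little about longer outputs.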

Telnyx won E2E where it counts: on the longer-output profiles that reflect real agent and application workloads. Telnyx starts competitively and finishes first.

Owned infrastructure is the differentiator

The same model on three providers produced three very different results. You don't need to optimize around the model; you need to optimize around the infrastructure it runs on.

Most inference providers rent GPU capacity from cloud vendors and resell it. That means shared tenancy, variable scheduling, and no control over the hardware layer. When your workload spikes, you compete with everyone else on the same GPUs. Tail latencies balloon and throughput drops unpredictably. You see it in the data: Fireworks hitting 111s on a single request, Together stalling mid-stream on other model runs.

Telnyx owns its GPU infrastructure. We control the scheduling, the quantization, and the network path from GPU to user. That control is why throughput is 21% higher, E2E is consistently faster, and the worst case stays under 20 seconds instead of approaching two minutes.

If you are running production inference, you already know the pain: unpredictable latency that makes your agents feel broken, throughput ceilings that limit how many requests you can serve per GPU hour, and reliability gaps that force you to build retry logic and fallback chains just to paper over someone else's infrastructure problem.

Get started with Telnyx Inference

Run MiniMax M3, Kimi K2.6, GLM-5.1, and Qwen3-235B on serverless GPU infrastructure with predictable latency and published pricing. Available in US, EU, and APAC regions.

Get started on Telnyx today or contact sales for volume pricing and dedicated endpoints.

Methodology

  • Six prompt profiles: 1k/100, 10k/100, 100k/100, 1k/1k, 10k/1k, 100k/1k
  • 10 streamed runs per provider/model/profile cell
  • 480 successful requests in the clean dataset
  • Telnyx: MiniMaxAI/MiniMax-M3-MXFP8, moonshotai/Kimi-K2.6, zai-org/GLM-5.1-FP8
  • Together AI: MiniMaxAI/MiniMax-M3, moonshotai/Kimi-K2.6
  • Fireworks: accounts/fireworks/models/minimax-m3, accounts/fireworks/models/kimi-k2p6, accounts/fireworks/models/glm-5p1
  • Together AI GLM 5.1 excluded (requires dedicated endpoint, not serverless)
  • Fireworks Tier 1 data excluded due to heavy rate limiting; Tier 2 rerun completed 180/180 with no 429s
  • Fireworks throughput unavailable (rejected stream_options, no output token counts returned)
  • MiniMax M3 is a logical comparison across providers; exact model IDs differ (Telnyx runs MXFP8, others use their own serverless IDs)
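
The request counts follow from the test matrix listed above. As a sanity check, this sketch enumerates every provider/model/profile/run combination, assuming each listed cell contributed exactly 10 successful runs to the clean dataset.

```python
PROFILES = ["1k/100", "10k/100", "100k/100", "1k/1k", "10k/1k", "100k/1k"]
RUNS_PER_CELL = 10

# Model IDs as listed in the methodology above.
MODELS_BY_PROVIDER = {
    "Telnyx": [
        "MiniMaxAI/MiniMax-M3-MXFP8",
        "moonshotai/Kimi-K2.6",
        "zai-org/GLM-5.1-FP8",
    ],
    "Together AI": [
        "MiniMaxAI/MiniMax-M3",
        "moonshotai/Kimi-K2.6",
    ],
    "Fireworks": [
        "accounts/fireworks/models/minimax-m3",
        "accounts/fireworks/models/kimi-k2p6",
        "accounts/fireworks/models/glm-5p1",
    ],
}

# One entry per planned request: (provider, model, profile, run index).
requests = [
    (provider, model, profile, run)
    for provider, models in MODELS_BY_PROVIDER.items()
    for model in models
    for profile in PROFILES
    for run in range(RUNS_PER_CELL)
]

minimax_requests = [r for r in requests if "minimax" in r[1].lower()]
fireworks_requests = [r for r in requests if r[0] == "Fireworks"]

print(len(requests), len(minimax_requests), len(fireworks_requests))
```

The totals come out to 480 requests overall, 180 for MiniMax M3, and 180 for Fireworks, which matches the 180/180 Tier 2 rerun.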