We benchmarked MiniMax M3 across three inference providers: Telnyx, Together AI, and Fireworks. Same model. Same prompts. Same streaming setup. Telnyx delivered the fastest end-to-end latency, the highest throughput, and the most consistent performance of the three.
This is not a cherry-picked result. Telnyx won on the metric that matters most for production workloads: complete response time. It also won on raw generation speed. And the reliability gap vs. Fireworks was wide enough to be a story on its own.
Six prompt profiles ranging from short-input/short-output (1k tokens in, 100 out) to long-input/long-output (100k in, 1k out). Ten streamed runs per provider per profile. 480 total successful requests across Telnyx, Together AI, and Fireworks.
We measured three things: end-to-end (E2E) latency, output throughput in tokens per second, and time to first token (TTFT).

Telnyx MiniMax M3 posted a 7.56s median E2E across all six profiles. Together AI came in at 7.77s. Fireworks at 8.52s.
Telnyx won 4 of 6 profiles outright, including all three long-output scenarios (1k/1k, 10k/1k, 100k/1k). These are the profiles that reflect real production workloads: agents generating substantive responses, not just returning a single sentence.
p50 E2E by profile:
| Provider | Model | 1k/100 | 10k/100 | 100k/100 | 1k/1k | 10k/1k | 100k/1k |
|---|---|---|---|---|---|---|---|
| Telnyx | MiniMax M3 | 1.70s | 2.67s | 5.85s | 10.12s | 8.11s | 13.30s |
| Together AI | MiniMax M3 | 1.78s | 1.86s | 5.21s | 11.49s | 9.88s | 16.09s |
| Fireworks | MiniMax M3 | 6.29s | 2.04s | 4.87s | 12.78s | 9.26s | 25.48s |
On 1k input with 1k output, Telnyx was 12% faster than Together AI and 21% faster than Fireworks. On 100k input with 1k output, Telnyx was 17% faster than Together AI and 48% faster than Fireworks.
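The percentages above can be reproduced from the p50 table. The sketch below assumes "X% faster" means the reduction in response time relative to the slower provider; the helper name `percent_faster` is ours, for illustration only.

```python
# p50 E2E latency in seconds, copied from the table above.
P50_E2E = {
    "Telnyx":      {"1k/1k": 10.12, "100k/1k": 13.30},
    "Together AI": {"1k/1k": 11.49, "100k/1k": 16.09},
    "Fireworks":   {"1k/1k": 12.78, "100k/1k": 25.48},
}


def percent_faster(baseline_s: float, other_s: float) -> float:
    """Time saved by `baseline_s`, as a percentage of `other_s`."""
    return (other_s - baseline_s) / other_s * 100


for profile in ("1k/1k", "100k/1k"):
    for rival in ("Together AI", "Fireworks"):
        gain = percent_faster(P50_E2E["Telnyx"][profile], P50_E2E[rival][profile])
        print(f"{profile}: Telnyx vs {rival}: {gain:.0f}% faster")
```

Dividing by the slower provider's time keeps the figure below 100% and reads naturally as "time saved".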
The short-output profiles were more mixed. Telnyx won the 1k/100 profile at 1.70s, Together AI led the 10k/100 profile (1.86s vs. 2.67s), and Fireworks posted the fastest 100k/100 time at 4.87s. Fireworks was also the slowest on the shortest prompt, at 6.29s for 1k/100.

Telnyx MiniMax M3 delivered 144.9 tok/s median throughput. Together AI managed 119.8 tok/s. That is a 21% gap in generation speed, and it showed up consistently across profiles.
p50 throughput by profile:
| Provider | Model | 1k/100 | 10k/100 | 100k/100 | 1k/1k | 10k/1k | 100k/1k |
|---|---|---|---|---|---|---|---|
| Telnyx | MiniMax M3 | 171.1 | 170.1 | 123.4 | 141.6 | 174.6 | 130.4 |
| Together AI | MiniMax M3 | 110.7 | 144.9 | 124.4 | 118.4 | 157.4 | 96.0 |
Telnyx led on 5 of 6 profiles. The one exception, 100k/100, was essentially tied (123.4 vs. 124.4 tok/s). On every other profile, the gap ranged from 11% to 55% in Telnyx's favor.
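To check the per-profile gaps yourself, here is a small sketch that works from the throughput table above. It assumes "X% higher" is measured relative to the Together AI figure; the helper name `percent_higher` is illustrative.

```python
PROFILES = ["1k/100", "10k/100", "100k/100", "1k/1k", "10k/1k", "100k/1k"]

# p50 throughput in tok/s, copied from the table above.
TELNYX = [171.1, 170.1, 123.4, 141.6, 174.6, 130.4]
TOGETHER = [110.7, 144.9, 124.4, 118.4, 157.4, 96.0]


def percent_higher(a: float, b: float) -> float:
    """How much higher `a` is than `b`, as a percentage of `b`."""
    return (a / b - 1) * 100


gaps = {p: percent_higher(t, o) for p, t, o in zip(PROFILES, TELNYX, TOGETHER)}
telnyx_leads = {p: g for p, g in gaps.items() if g > 0}

for profile, gap in gaps.items():
    print(f"{profile}: {gap:+.1f}%")
print(f"Telnyx leads on {len(telnyx_leads)} of {len(PROFILES)} profiles")
```

Note that this is a ratio of rates, so it is not directly comparable to the "percent faster" latency figures, which are a ratio of times.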
Higher throughput means faster responses for long outputs, more requests handled per GPU hour, and lower effective cost per token. This is where infrastructure ownership shows up in the numbers.
Speed is half the story. Consistency is the other half.
Telnyx MiniMax M3 had a p95 E2E of 14.72s and a max of 19.85s. Fireworks MiniMax M3 had a p95 of 54.06s and a max of 111.51s. Same model. Same test. Fireworks' worst request took nearly two minutes.
This is not a rounding difference. A 111s max means a user in production would wait almost two minutes for a response. On Telnyx, the worst case was under 20 seconds.
Reliability comparison:
| Provider | p50 E2E | p95 E2E | Max E2E |
|---|---|---|---|
| Telnyx MiniMax M3 | 7.56s | 14.72s | 19.85s |
| Together AI MiniMax M3 | 7.77s | 19.35s | 21.42s |
| Fireworks MiniMax M3 | 8.52s | 54.06s | 111.51s |
Together AI was close to Telnyx on median latency, but Telnyx had the tighter tail: 14.72s p95 and 19.85s max versus Together's 19.35s p95 and 21.42s max. Fireworks had a comparable p50 at 8.52s, but much wider tail latency, reaching 54.06s p95 and 111.51s max.
For production workloads, the gap between p50 and tail latency is what determines whether performance feels consistently fast or occasionally breaks user expectations. In this MiniMax M3 comparison, Telnyx kept that gap the tightest.
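One way to put a number on that gap is the ratio of tail latency to the median. The sketch below applies it to the figures in the reliability table. It also includes a nearest-rank percentile helper to show how p50 and p95 could be derived from raw run times; the percentile method is our assumption, not necessarily the one used for this benchmark.

```python
from typing import Sequence


def nearest_rank_percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # ceil(pct / 100 * n) using integers
    return ordered[rank - 1]


# (p50, p95, max) E2E in seconds, copied from the reliability table above.
REPORTED = {
    "Telnyx": (7.56, 14.72, 19.85),
    "Together AI": (7.77, 19.35, 21.42),
    "Fireworks": (8.52, 54.06, 111.51),
}

tail_ratio = {name: p95 / p50 for name, (p50, p95, _) in REPORTED.items()}

for name, (p50, p95, worst) in REPORTED.items():
    print(f"{name}: p95/p50 = {p95 / p50:.2f}x, max/p50 = {worst / p50:.2f}x")
```

A ratio close to 1 means the slowest requests look much like the typical one; the larger it gets, the more often users hit a response that feels out of character.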
Together AI MiniMax M3 had the lowest overall median TTFT at 1.57s, compared to Telnyx at 1.87s. Together AI won on a few short-prompt profiles.
But TTFT is a partial metric. It tells you how fast the first token arrives, not how fast the response completes. A provider that returns the first token in 0.5s but takes 30s to finish is worse for user experience than one that starts at 1.8s and finishes in 8s.
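To make the distinction concrete, here is a minimal sketch of how TTFT, E2E, and throughput can be taken from a single streamed response. It is not the harness used for this benchmark: `measure_stream` and `StreamMetrics` are hypothetical names, chunks stand in for tokens, and the throughput formula (chunks divided by time after the first chunk) is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class StreamMetrics:
    ttft_s: float        # request start until the first chunk arrived
    e2e_s: float         # request start until the last chunk arrived
    output_chunks: int   # chunks received (a stand-in for tokens here)

    @property
    def throughput(self) -> float:
        # Assumed definition: chunks per second during the generation phase.
        gen_time = self.e2e_s - self.ttft_s
        return self.output_chunks / gen_time if gen_time > 0 else float("nan")


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> StreamMetrics:
    """Consume any iterable of streamed chunks and time its arrival."""
    start = clock()
    first: Optional[float] = None
    last = start
    count = 0
    for _ in chunks:
        now = clock()
        if first is None:
            first = now  # first chunk marks TTFT
        last = now       # keeps updating until the stream ends
        count += 1
    if first is None:
        raise ValueError("stream produced no chunks")
    return StreamMetrics(ttft_s=first - start, e2e_s=last - start, output_chunks=count)


# Replay a made-up stream with a scripted clock: request sent at t=0.0,
# chunks arrive at t=1.8, 3.0 and 8.0 seconds.
ticks = iter([0.0, 1.8, 3.0, 8.0])
demo = measure_stream(["Hello", " world", "!"], clock=lambda: next(ticks))
print(f"TTFT={demo.ttft_s:.2f}s  E2E={demo.e2e_s:.2f}s  "
      f"throughput={demo.throughput:.2f} chunks/s")
```

With a real provider you would pass the SDK's streaming iterator as `chunks` and leave `clock` at its default. The point of the sketch is that TTFT and E2E come from the same stream but stop the clock at very different moments.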
Telnyx won E2E where it counts: on the longer-output profiles that reflect real agent and application workloads. A fast first token followed by a slow response is still a bad user experience. Telnyx starts competitively and finishes first.
The same model on three providers produced three very different results. You don't need to optimize around the model; you need to optimize around the infrastructure it runs on.
Most inference providers rent GPU capacity from cloud vendors and resell it. That means shared tenancy, variable scheduling, and no control over the hardware layer. When your workload spikes, you compete with everyone else on the same GPUs. Tail latencies balloon and throughput drops unpredictably. You see it in the data: Fireworks hitting 111s on a single request, Together stalling mid-stream on other model runs.
Telnyx owns its GPU infrastructure. We control the scheduling, the quantization, and the network path from GPU to user. That control is why throughput is 21% higher, E2E is consistently faster, and the worst case stays under 20 seconds instead of approaching two minutes.
If you are running production inference, you already know the pain: unpredictable latency that makes your agents feel broken, throughput ceilings that limit how many requests you can serve per GPU hour, and reliability gaps that force you to build retry logic and fallback chains just to paper over someone else's infrastructure problem.
Run MiniMax M3, Kimi K2.6, GLM-5.1, and Qwen3-235B on serverless GPU infrastructure with predictable latency and published pricing. Available in US, EU, and APAC regions.
Get started on Telnyx today or contact sales for volume pricing and dedicated endpoints.