A head-to-head latency benchmark of Telnyx, Together.ai, and Fireworks.ai across 540 streamed requests on three frontier open-weight models.
We ran 540 streamed chat completions across three inference providers (Telnyx, Together.ai, and Fireworks.ai) on three open-weight models (Kimi K2.6, GLM-5.1, and MiniMax-M2.7) from a single US-region host. Here's what matters:
Time-to-first-token (TTFT) is the most commonly cited inference benchmark. For some workloads, like voice AI or real-time agents, it's the right metric. For others, like batch processing and agentic chains, end-to-end latency (E2E) and throughput matter more. The question isn't which metric is better; it's which metric maps to what you're building.
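All three metrics fall out of the chunk arrival times of a streamed response. A minimal sketch of how they can be derived (our harness is more involved; the throughput definition here, output tokens divided by total wall time, is an assumption, since providers compute it in different ways):

```python
from dataclasses import dataclass

@dataclass
class StreamMetrics:
    ttft_ms: float           # time until the first chunk arrives
    e2e_ms: float            # time until the last chunk arrives
    throughput_tok_s: float  # output tokens per second of wall time

def stream_metrics(request_start: float, chunk_times: list[float],
                   output_tokens: int) -> StreamMetrics:
    """Derive latency metrics from the wall-clock arrival times of streamed chunks."""
    ttft_s = chunk_times[0] - request_start
    e2e_s = chunk_times[-1] - request_start
    return StreamMetrics(
        ttft_ms=ttft_s * 1000,
        e2e_ms=e2e_s * 1000,
        throughput_tok_s=output_tokens / e2e_s,
    )

# First chunk 0.5 s in, last at 2.0 s, 100 tokens emitted:
# stream_metrics(0.0, [0.5, 1.1, 2.0], 100) -> TTFT 500 ms, E2E 2000 ms, 50 tok/s
```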
Our benchmark found a consistent pattern: providers that win on TTFT don't always win on end-to-end latency (E2E).
The clearest example: GLM-5.1 at 10k input, 1k output.
| Provider | TTFT (p50) | E2E (p50) | Throughput |
|---|---|---|---|
| Fireworks | 1,672 ms | 40,156 ms | 31.9 tok/s |
| Together | 1,472 ms | 27,328 ms | 57.4 tok/s |
| Telnyx | 1,346 ms | 15,946 ms | 83.4 tok/s |
Fireworks delivers the first token in 1.7 seconds, but the full response takes over 40 seconds. Telnyx delivers the full response in under 16 seconds: 2.5x faster than Fireworks and 1.7x faster than Together.
If you're building a real-time product, your users don't experience "first token." They experience the full answer. E2E is the metric that maps to user experience. Throughput is the metric that maps to cost-per-token at scale.
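The tradeoff is easy to see with a back-of-the-envelope model: once decode throughput is known, E2E is roughly TTFT plus output tokens divided by decode rate. A sketch with hypothetical numbers (not drawn from our benchmark):

```python
def estimate_e2e_ms(ttft_ms: float, output_tokens: int, decode_tok_s: float) -> float:
    """Rough E2E estimate: first-token wait plus time to decode the remaining tokens."""
    return ttft_ms + (output_tokens / decode_tok_s) * 1000

# Provider A wins TTFT; provider B wins decode throughput (hypothetical numbers):
a = estimate_e2e_ms(ttft_ms=300, output_tokens=1000, decode_tok_s=40)  # 25300.0 ms
b = estimate_e2e_ms(ttft_ms=900, output_tokens=1000, decode_tok_s=90)  # ~12011 ms
# For a 1k-token answer, the provider with 3x worse TTFT finishes in half the time.
```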
When evaluating inference providers, ask:

- Does my product stream partial output to a waiting user? Then TTFT matters.
- Does my user wait for the complete answer? Then E2E is the metric to optimize.
- What does each generated token cost at scale? That's throughput.
Voice AI is the clearest example of why TTFT matters. When a user speaks to an agent, every millisecond of first-token delay is dead air. The response doesn't stream in progressively like a chatbot's; the user is waiting for the agent to start talking.
That's why Kimi K2.6 is the model we recommend for voice and real-time applications. Its non-reasoning mode stays highly intelligent while delivering lower TTFT than GLM-5.1. If you're building voice AI, Kimi K2.6 on Telnyx is the right tool.
This is where the gap is widest. Telnyx wins on E2E latency at every single profile: short and long output, small and large context.
Long-output workloads (1k output target):
| Profile | Telnyx E2E | Together E2E | Fireworks E2E | Telnyx Throughput | Together Throughput |
|---|---|---|---|---|---|
| 1k input, 1k output | 8,331 ms | 36,362 ms | 11,453 ms | 152 tok/s | 33 tok/s |
| 10k input, 1k output | 8,990 ms | 41,094 ms | 10,604 ms | 145 tok/s | 29 tok/s |
| 100k input, 1k output | 11,065 ms | 49,838 ms | 13,924 ms | 124 tok/s | 27 tok/s |
Telnyx completes MiniMax-M2.7 long-output requests 3-6x faster than Together and slightly faster than Fireworks. At 100k input, Together takes nearly a full minute; Telnyx finishes in 11 seconds.
Short-output workloads: same story. Telnyx E2E ranges from 1.2-2.3 seconds, Together from 3-5.6 seconds, and Fireworks from 1.7-2.9 seconds.
The throughput gap: 125-170 tok/s on Telnyx vs 27-42 tok/s on Together. Together's FP4 quantization doesn't compensate; its throughput is a fraction of what Telnyx achieves at FP8.
Verdict: If you're running MiniMax-M2.7, the provider choice isn't close. Telnyx is faster, more consistent, and delivers 3-6x the throughput.
GLM-5.1 tells the "TTFT vs E2E" story best.
Fireworks is consistently the fastest to first token on GLM-5.1 at short contexts. But that early lead evaporates on longer outputs because Fireworks' effective throughput is dramatically lower.
Throughput comparison (tok/s, p50):
| Profile | Telnyx | Together | Fireworks |
|---|---|---|---|
| 1k in, 100 out | 109 | 81 | 44 |
| 1k in, 1k out | 94 | 62 | 36 |
| 10k in, 100 out | 113 | 89 | 51 |
| 10k in, 1k out | 83 | 57 | 32 |
| 100k in, 100 out | 84 | 71 | 59 |
| 100k in, 1k out | 82 | 53 | 39 |
Telnyx delivers 81-113 tok/s on GLM-5.1 vs 32-59 tok/s on Fireworks. That's roughly 2x the throughput at every profile. For workloads generating longer outputs, this compounds into massive E2E differences.
Verdict: Fireworks may give you the first token faster, but Telnyx gives you the full answer faster, by a factor of 2-2.5x on production-length outputs.
Kimi K2.6 is the most evenly matched. Fireworks leads TTFT consistently. E2E is closer:
| Profile | Telnyx E2E | Together E2E | Fireworks E2E |
|---|---|---|---|
| 1k in, 100 out | 1,754 ms | 1,901 ms | 1,242 ms |
| 1k in, 1k out | 10,212 ms | 28,304 ms | 11,026 ms |
| 10k in, 1k out | 10,878 ms | 14,458 ms | 9,582 ms |
| 100k in, 1k out | 13,741 ms | 23,960 ms | 12,602 ms |
Fireworks has a slight edge on short-output E2E. But on long-output, the gap between Telnyx and Fireworks is small (within 10-15%), while Together falls significantly behind.
Throughput is competitive across all three providers on Kimi, with Telnyx and Fireworks trading the lead depending on profile.
Verdict: Kimi K2.6 is the model to reach for when you're building voice agents or real-time applications. Its non-reasoning mode is still highly intelligent, and it delivers lower TTFT than GLM-5.1, which is the metric that matters most when your users are waiting for an agent to speak. For voice AI, the TTFT advantage plus regional availability and data sovereignty make Telnyx the clear choice.
Latency averages tell one story. Tail behavior tells another.
We flagged every cell where a single run exceeded 5x the cell's median:
| Provider | Outlier cells (max > 5x median) | Worst single event |
|---|---|---|
| Together | 15 | 206-second mid-stream stall (GLM-5.1) |
| Telnyx | 4 | 36.7s E2E on MiniMax 100k input (median: 2.3s) |
| Fireworks | 3 | 12.1s TTFT on Kimi 100k input (median: 1.2s) |
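The flagging rule is simple enough to sketch. Something like the following, where the 5x-median threshold matches the table above and the data shape (per-cell lists of run times) is an assumption about how you'd record results:

```python
from statistics import median

def flag_outliers(cells: dict[str, list[float]], factor: float = 5.0) -> dict[str, float]:
    """Map each cell name to its worst-run/median ratio, keeping only cells
    where a single run exceeded `factor` times that cell's median."""
    flagged = {}
    for name, runs in cells.items():
        ratio = max(runs) / median(runs)
        if ratio > factor:
            flagged[name] = ratio
    return flagged

# flag_outliers({"glm/100k": [15.0, 16.0, 15.5, 206.0], "kimi/1k": [1.2, 1.3, 1.1]})
# flags only "glm/100k": the 206 s run is far beyond 5x the ~15.75 s median.
```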
The Together GLM-5.1 event: On a 100k-input, 1k-output request, the stream produced 423 chunks over 194 seconds, including a 143-second gap between chunks 363 and 364, after which streaming resumed normally. This wasn't a connection issue; data flowed on both sides of the gap. It was a mid-stream stall inside Together's infrastructure.
For a chatbot, a 143-second pause is a broken experience. For an agent making sequential LLM calls, it's a cascading delay. For a voice AI pipeline, it's a dropped call.
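Stalls like this don't show up in TTFT or E2E medians; you have to watch inter-chunk gaps. A minimal detector, assuming you record a wall-clock timestamp per chunk:

```python
def worst_gap(chunk_times: list[float]) -> tuple[float, int]:
    """Return (largest gap between consecutive chunks in seconds,
    index of the chunk preceding that gap)."""
    return max(
        (b - a, i) for i, (a, b) in enumerate(zip(chunk_times, chunk_times[1:]))
    )

# A stream that pauses for 143 s mid-way surfaces immediately:
# worst_gap([0.0, 0.5, 1.0, 144.0]) -> (143.0, 2)
```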
Also notable: Together's FP4 quantization was expected to deliver throughput advantages over FP8. It didn't. On both GLM-5.1 and MiniMax-M2.7, Together's FP4 delivered lower throughput than Telnyx's FP8.
| If you care about... | Choose... | Why |
|---|---|---|
| MiniMax-M2.7 performance | Telnyx | 3-6x faster E2E, 3-6x throughput vs Together |
| GLM-5.1 throughput | Telnyx | 2x throughput advantage vs Fireworks at all profiles |
| Voice AI and real-time | Telnyx | Kimi K2.6 has lowest TTFT on our platform + regional availability + data sovereignty |
| Production reliability | Telnyx or Fireworks | Together had 15 outlier cells vs 4 and 3 |
| Long-output workloads | Telnyx | TTFT advantage doesn't carry through to E2E on competitors |
| Regional availability | Telnyx | Serverless in US, EU, APAC (Dubai + São Paulo coming) |
| Data sovereignty | Telnyx | In-region compute by default; competitors are US-concentrated |
| Raw Kimi K2.6 TTFT | Fireworks | Leads TTFT at every profile; Telnyx is within 10-15% on E2E, and the voice AI ecosystem tilts toward Telnyx |
This benchmark was conducted on April 23, 2026, and results reflect provider performance at that time. Inference infrastructure changes frequently; we recommend running your own benchmarks for production decisions. Raw data and methodology are available on request.