How Telnyx Fixed Voice AI Latency with Co-Located Infrastructure

By Abhishek Sharma

Most voice AI platforms are chains of separate services: a telephony provider, a transcription service, an LLM in some cloud region, a text-to-speech engine somewhere else. Each connection adds latency.

This results in awkward pauses that immediately signal "I'm talking to a bot."

[Image: AI stack comparison]

We've run voice infrastructure for over a decade, processing billions of call minutes for hospitals, financial institutions, and global tech platforms. What we found is that the biggest bottleneck in voice AI isn't the models. It's the network hops between the services in the chain.

The Problem

[Image: latency scale]

Humans respond within 200 milliseconds in conversation. When response time exceeds 300-500ms, conversations feel unnatural. Above 1200ms? Users hang up.

Here's where most platforms fail. Each service in the chain adds 20-50ms of network delay before any AI processing happens.

A typical call flow:

  • User speaks to telephony provider (50ms)
  • Audio to transcription service (50ms)
  • Text to LLM in distant cloud (50ms)
  • Response to text-to-speech service (50ms)
  • Audio back to user (50ms)

That's 250ms minimum in network hops alone.

Now add STT processing (100-300ms), LLM inference (350-1000ms), and TTS synthesis (90-200ms). You're at roughly 800ms to 1.5 seconds total.
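The arithmetic above can be sketched as a simple latency budget. The figures mirror the ranges quoted in this post; they are illustrative, not measurements:

```python
# Rough latency budget for a chained voice AI stack.
# Numbers match the illustrative ranges above, not any specific vendor.

NETWORK_HOPS_MS = [50, 50, 50, 50, 50]  # user->telephony, ->STT, ->LLM, ->TTS, ->user

PROCESSING_MS = {
    "stt": (100, 300),   # speech-to-text
    "llm": (350, 1000),  # LLM inference
    "tts": (90, 200),    # text-to-speech synthesis
}

network_total = sum(NETWORK_HOPS_MS)  # 250 ms in hops alone
best_case = network_total + sum(lo for lo, _ in PROCESSING_MS.values())
worst_case = network_total + sum(hi for _, hi in PROCESSING_MS.values())

print(f"network hops: {network_total} ms")
print(f"total: {best_case}-{worst_case} ms")  # roughly 790-1750 ms
```

Note that even with instant AI processing, the chained architecture starts 250ms in the hole before a single model runs.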

Research shows customers hang up 40% more frequently when agents take longer than 1 second to respond.

Contact centers report lower satisfaction scores when delays exceed 500ms.

Recent benchmarks confirm the impact.

[Image: round-trip latency comparison]

Twilio's voice channel shows average latency of 950ms, reflecting the overhead of extensive carrier integrations.
Vonage faces similar challenges, with latency ranging from 800-1200ms.

Why Third-Party Telephony Makes It Worse

Most AI platforms treat telephony as an afterthought: something you bolt on via a third-party CPaaS provider. That extra layer adds its own hops.

Base telephony latency within the same region is around 200ms. For global calls (Asia to US), that jumps to 500ms just for audio transport. If your phone number is registered in a different region, you're adding even more hops as the call routes through your "home" country's network.

When you chain together separate services for each step, delays stack up and unpredictability increases.
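The "unpredictability increases" point is about tail latency, not averages. A quick simulation shows how per-hop jitter compounds across a chain; the 30ms base and ~20ms mean jitter per hop are made-up numbers for illustration:

```python
import random
import statistics

random.seed(42)  # reproducible illustration

def hop_latency_ms():
    # One network hop: a fixed base plus heavy-tailed jitter.
    return 30 + random.expovariate(1 / 20)  # ~30 ms base, ~20 ms mean jitter

def call_latency_ms(hops):
    # Total transport latency is the sum of every hop's latency.
    return sum(hop_latency_ms() for _ in range(hops))

samples_1 = [call_latency_ms(1) for _ in range(10_000)]
samples_5 = [call_latency_ms(5) for _ in range(10_000)]

def p95(xs):
    return statistics.quantiles(xs, n=100)[94]

print(f"1 hop  p95: {p95(samples_1):.0f} ms")
print(f"5 hops p95: {p95(samples_5):.0f} ms")
```

With five hops, both the mean and the jitter accumulate, so the worst 5% of calls drift well past what any single hop would suggest.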

When one service hits rate limits or experiences a regional outage, the entire call fails.

Because different vendors handle different parts of the stack, troubleshooting becomes a finger-pointing exercise across multiple dashboards.

How We Solved It With Co-Located Infrastructure

Instead of stitching together services, we co-located our GPU infrastructure with our telephony network in the same data centers.

Audio from an incoming call hits our transcription models, LLM inference, and text-to-speech engines without leaving our private network.

This means there are zero external API calls, no cross-cloud data transfer, and no unpredictable jitter from the public internet.

This enables a response time of less than one second from the moment a user finishes speaking until they hear our reply.

We've deployed this in our US, Australian, and European (Paris) regions, with expansion to MENA underway.

[Image: private backbone map]

Each region gets dedicated GPU clusters positioned directly adjacent to our telephony core.

We are also a licensed carrier in 30+ markets and operate a private MPLS fiber backbone connecting 17 global points of presence.

So when you deploy a voice AI agent with us, audio enters through PSTN connections we own, gets processed by regional GPUs, and returns over the same network.

A private network like this can deliver up to a 40% reduction in call setup times and improved audio quality in challenging network environments.

With no vendor handoffs, we also get a single observability plane from RTP packets to model outputs.

Compare this to platforms that rent infrastructure. They're routing audio through a CPaaS provider's gateway, public internet, a cloud provider's compute region, then another CPaaS hop back.

Each hop introduces jitter, potential packet loss, and variable latency you can't control.

Where This Actually Matters

If you're building demos or prototyping, latency might not matter yet. But for production voice AI agents handling real interactions at scale, it determines whether users trust the system.

In customer support, 800ms of lag causes users to talk over the agent, which breaks intent recognition and forces conversation loops.

For multi-turn workflows like appointment booking, the pauses make the agent feel slow, leading to abandoned calls.

This is especially critical in healthcare and finance, where agents handle sensitive data and trust is paramount, yet easily eroded by delays. If an agent hesitates before acknowledging a query about an account balance or a transaction, that trust is already slipping away.

How to Test This

If you're evaluating voice AI platforms, here's what we'd benchmark:

  • Real-world RTT.
    Make 100 concurrent calls over actual PSTN (mobile + landline).
    Measure p95 latency under load, not ideal conditions.
  • Barge-in handling.
    Can users interrupt mid-sentence without the agent cutting them off or getting confused?
  • Geographic variance.
    If your business is global, test from different regions.
    Does latency spike when calling from Europe to a US-hosted system?
  • Tool execution latency.
    Configure the agent to look up data in Salesforce and make an API call to your payment processor. Measure end-to-end time. If reading a customer record takes 2 seconds, the agent can't have a natural conversation.
  • Failure recovery.
    Inject 2-5% packet loss. Real mobile networks drop packets. How gracefully does the system recover?
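The first check above, p95 under concurrent load, can be sketched like this. `place_test_call` is a placeholder for whatever your telephony test harness exposes (a hypothetical stub, not a real API), so the skeleton runs standalone:

```python
import statistics
from concurrent.futures import ThreadPoolExecutor

def place_test_call() -> float:
    """Placeholder for your test harness: dial the agent over real PSTN
    and return round-trip latency in ms (end of caller speech to first
    agent audio). Stubbed to 0.0 so this sketch runs standalone."""
    return 0.0

def p95_under_load(n_calls: int = 100) -> float:
    # Fire calls concurrently so the platform is measured under load,
    # not in the idle best case.
    with ThreadPoolExecutor(max_workers=n_calls) as pool:
        latencies = list(pool.map(lambda _: place_test_call(), range(n_calls)))
    # p95: the latency that 95% of calls beat.
    return statistics.quantiles(latencies, n=100)[94]
```

Run it from each region you serve, against both mobile and landline origination, and compare the p95 figures rather than the averages.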

We've measured latencies ranging from 200ms (co-located stacks like ours) to 1500ms+ (platforms relying on multiple third-party services). The difference shows up immediately in user behavior.

Sub-second round-trip latency is the difference between a system users trust and one they hang up on.

We built the only stack that delivers it consistently because we own every layer from the PSTN to the GPU.

How are you benchmarking latency in your own stack?
