
Run Telnyx Voice AI Assistants with Any OpenAI-Compatible LLM

By Abhishek Sharma

Most voice AI platforms lock you into their inference stack. If you already run models on Bedrock or your own GPUs, integrating them usually means rebuilding the whole telephony layer from scratch.

We removed that constraint. You can now run Telnyx Voice AI Assistants on any OpenAI-compatible model endpoint without losing carrier-grade voice performance.

What’s now supported

You can point your AI Assistant at:

  • AWS Bedrock - Use Anthropic, Meta, Cohere, or Mistral models under your existing AWS contract
  • Azure OpenAI Service - Route to Microsoft-hosted GPT models where regional or compliance rules apply
  • Self-hosted inference servers - Run vLLM, sglang, or TGI on your own GPUs
  • Custom fine-tuned models - Use models trained on proprietary data such as support logs, product catalogs, or internal documentation
  • Any OpenAI-compatible endpoint - Platforms like Baseten, Replicate, or together.ai work out of the box

The only requirement: your endpoint must implement the OpenAI Chat Completions API. If it accepts /v1/chat/completions requests, it will work.
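
For a quick compatibility check, any standard OpenAI client can talk to such an endpoint directly. The sketch below uses placeholder values for the base URL, API key, and model name; substitute your own deployment's details:

```python
# Minimal compatibility check: an endpoint that accepts this request
# should work as a custom LLM for a Telnyx AI Assistant.
# The base URL, API key, and model name below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.example.com/v1",  # your OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="my-fine-tuned-model",  # whatever model your server exposes
    messages=[
        {"role": "system", "content": "You are a voice support agent."},
        {"role": "user", "content": "Where is my order?"},
    ],
)

print(response.choices[0].message.content)
```

If this request succeeds, the same base URL, API key, and model name are what you'll enter when configuring the assistant.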

How it works

In the Mission Control Portal, create or edit an AI Assistant. Under the Agent tab, enable Use Custom LLM and provide:

  1. Base URL - The public endpoint for your inference server
  2. API Key - Stored securely as an Integration Secret
  3. Model Name - Auto-populated if your endpoint supports /models, otherwise entered manually

Telnyx validates the connection before saving. Once configured, your assistant routes LLM calls to your endpoint instead of Telnyx’s GPU clusters. Voice synthesis, speech recognition, and call control still run on Telnyx infrastructure.
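
If you want to run a similar check yourself before saving, the /models listing that drives the auto-populated Model Name field is easy to probe. The sketch below reuses the same placeholder base URL and key:

```python
# Pre-flight check before configuring the assistant: confirm the endpoint
# exposes /v1/models (used to auto-populate the Model Name field).
# Base URL and API key are placeholders for your own deployment.
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.example.com/v1",
    api_key="YOUR_API_KEY",
)

for model in client.models.list():
    print(model.id)
```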

You can test immediately in the portal or deploy to production numbers.

The latency trade-off

Our default architecture delivers sub-second round-trip time by running inference on GPUs co-located with our telephony core. Using an external LLM adds network hops.

If your inference server is near a Telnyx PoP (for example, us-east-1), expect an additional 20-50 ms. Endpoints that are farther away or under variable load may add 100-300 ms per turn. That's still faster than most stitched-together voice stacks, but it's no longer guaranteed sub-200 ms.
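
To see where your own endpoint falls in that range, you can time a short completion against it. The sketch below (placeholder URL, key, and model) measures the full network-plus-generation round trip, so treat it as an upper bound on the added hop:

```python
# Rough per-turn latency estimate for a custom LLM endpoint.
# Times one short chat completion; compare the result with the
# 20-50 ms and 100-300 ms ranges discussed above.
# Base URL, API key, and model name are placeholders.
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://inference.example.com/v1",
    api_key="YOUR_API_KEY",
)

start = time.perf_counter()
client.chat.completions.create(
    model="my-fine-tuned-model",
    messages=[{"role": "user", "content": "Say 'ok'."}],
    max_tokens=5,
)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"Round trip to LLM endpoint: {elapsed_ms:.0f} ms")
```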

When response speed is critical, use Telnyx’s built-in inference.

When model control, compliance, or cost take priority, bring your own endpoint.

When this matters

This option fits when:

  • You have existing Bedrock, Azure, or GCP credits and want to use them for inference
  • Inference must stay within a specific region for compliance
  • Your models are fine-tuned on proprietary data that can’t leave your environment
  • You’re running high-volume workloads on self-managed GPU infrastructure

If none of those apply, Telnyx’s built-in LLM library (Llama, Mistral, Gemini, GPT-4) remains the fastest path.

Availability

Custom LLM support is available now for all Telnyx AI Assistant users.

Documentation: developers.telnyx.com/docs/inference/ai-assistants/custom-llm

Whether you’re optimizing for latency, cost, or compliance, you can now choose exactly where your model runs while everything else stays on Telnyx.
