Most voice AI platforms lock you into their inference stack. If you already run models on Bedrock or your own GPUs, integrating them usually means rebuilding the whole telephony layer from scratch.
We removed that constraint. You can now run Telnyx Voice AI Assistants on any OpenAI-compatible model endpoint without losing carrier-grade voice performance.
You can point your AI Assistant at:

- Models hosted on Amazon Bedrock
- Models running on your own GPUs or private infrastructure
- Any other provider that exposes an OpenAI-compatible endpoint
The only requirement: your endpoint must implement the OpenAI Chat Completions API. If it accepts /v1/chat/completions requests, it will work.
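If you want to confirm compatibility before wiring anything up, a quick request against the endpoint is enough. The sketch below uses a hypothetical base URL, API key, and model name; any server that returns a standard Chat Completions response to this call should work.

```python
import requests

# Hypothetical values: replace with your own endpoint, key, and model name.
BASE_URL = "https://llm.example.com/v1"
API_KEY = "sk-your-key"

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "my-model",  # any model name your server recognizes
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    },
    timeout=30,
)
resp.raise_for_status()

# A compatible server returns choices[0].message.content in the response body.
print(resp.json()["choices"][0]["message"]["content"])
```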
In the Mission Control Portal, create or edit an AI Assistant. Under the Agent tab, enable Use Custom LLM and provide:

- Your endpoint's base URL
- An API key for authenticating requests
- The model name, fetched automatically from your endpoint's /models route when available, otherwise entered manually

Telnyx validates the connection before saving. Once configured, your assistant routes LLM calls to your endpoint instead of Telnyx's GPU clusters. Voice synthesis, speech recognition, and call control still run on Telnyx infrastructure.
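To see which model names can be auto-populated, you can query the same /models route yourself. This is a sketch with the same hypothetical URL and key as above; if your server doesn't implement the route, you'll enter the model name manually instead.

```python
import requests

BASE_URL = "https://llm.example.com/v1"  # hypothetical endpoint
API_KEY = "sk-your-key"                  # hypothetical key

resp = requests.get(
    f"{BASE_URL}/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
resp.raise_for_status()

# OpenAI-compatible servers return {"object": "list", "data": [{"id": ...}, ...]}
for model in resp.json().get("data", []):
    print(model["id"])
```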
You can test immediately in the portal or deploy to production numbers.
Our default architecture delivers sub-second round-trip time by running inference on GPUs co-located with our telephony core. Using an external LLM adds network hops.
If your inference server is near a Telnyx PoP (for example, us-east-1), expect an additional 20-50 ms. Remote or variable endpoints may add 100-300 ms per turn. That’s still faster than most stitched-together voice stacks but no longer guaranteed sub-200 ms.
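A rough way to gauge what your endpoint adds per turn is to time a few completion calls from a host close to where your calls will run. The sketch below uses the same hypothetical endpoint details and measures full round-trip time, not model inference alone.

```python
import time
import requests

BASE_URL = "https://llm.example.com/v1"  # hypothetical endpoint
API_KEY = "sk-your-key"                  # hypothetical key

def time_turn(prompt: str) -> float:
    """Return wall-clock seconds for one full chat-completion round trip."""
    start = time.perf_counter()
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "my-model",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 64,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return time.perf_counter() - start

# Average a few turns to smooth out network jitter.
samples = [time_turn("Give me a one-sentence status update.") for _ in range(5)]
print(f"avg per-turn latency: {sum(samples) / len(samples) * 1000:.0f} ms")
```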
When response speed is critical, use Telnyx’s built-in inference.
When model control, compliance, or cost take priority, bring your own endpoint.
This option fits when:

- You already run models on Amazon Bedrock or your own GPUs
- You need control over exactly which model handles your calls
- Compliance requires inference to stay inside infrastructure you control
- Cost considerations favor your own inference over a managed stack
If none of those apply, Telnyx’s built-in LLM library (Llama, Mistral, Gemini, GPT-4) remains the fastest path.
Custom LLM support is available now for all Telnyx AI Assistant users.
Documentation: developers.telnyx.com/docs/inference/ai-assistants/custom-llm
Whether you’re optimizing for latency, cost, or compliance, you can now choose exactly where your model runs and keep everything else running on Telnyx.