Global inference. Local data.

Access GLM-5.2, Kimi K2.6, and MiniMax-M3 on dedicated, globally deployed GPUs. Cost-effective, OpenAI-compatible, no infrastructure management required.

Cisco, OpenAI, Talkdesk, American Red Cross, Zillow, Microsoft, Cosmo, IBM, State of Iowa
AGENT RUNTIME

Frontier models that earn their place

Hosted models are chosen deliberately, not to fill a dropdown: Kimi K2.6 for real-time voice AI, GLM-5.2 for development work, and MiniMax-M3 for cost efficiency.

WHY TELNYX

The shift to open-weight

Open-weight models are leaving closed-source models behind, delivering the same quality at a fraction of the cost. Telnyx hosts open-source models on GPU infrastructure we own, so there is no cloud provider markup in your per-token price. Switch from closed-source models and save up to 75%, with no compromise on quality and no vendor lock-in.

FEATURES

Production-ready inference APIs

OpenAI-compatible endpoints that work with your existing SDK and deploy globally.

  • In-region deployment

    Inference runs in the Americas, Europe, and APAC, with MENA and LATAM coming soon. Your data stays where your users are, and it stays private.

  • OpenAI-compatible API

    Keep your existing OpenAI SDK and change only the base URL to access open-source models, saving up to 75% on your inference bill.

  • Function calling

    Connect LLMs to external tools and APIs to build agents that take action, not just generate text.

  • Autoscaling

    Dedicated GPUs handle concurrent requests and scale automatically with your workload, with no capacity planning or cold starts to worry about.

  • Fine-tuning

    Customize models with your own data via the Fine-Tuning API using the same infrastructure and API key.

  • Structured output

    JSON mode and regex constraints ensure inference output conforms to your schema for production-grade reliability.
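
As a rough sketch of the function-calling flow described above, the Python below builds a chat request that advertises one tool, then executes whatever tool calls come back. It assumes the endpoint accepts the OpenAI-style `tools` field and returns `tool_calls` in the same shape as OpenAI's API, which the compatibility claim implies but this page does not spell out. `get_weather`, `TOOL_REGISTRY`, and the sample assistant message are hypothetical and exist only for illustration.

```python
import json


def get_weather(city):
    """Hypothetical tool; a real one would query an external weather API."""
    return {"city": city, "forecast": "sunny"}


# Maps tool names advertised to the model onto local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}

# Tool description in the OpenAI-style JSON Schema format (assumed to be accepted).
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the forecast for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def build_request(model, user_text):
    """Return the JSON body for a chat completion that may call tools."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "tools": TOOLS,
    }


def run_tool_calls(assistant_message):
    """Execute each requested tool and return the `tool` messages to send back."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        func = TOOL_REGISTRY[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string, not a dict.
        args = json.loads(call["function"]["arguments"])
        results.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(func(**args)),
            }
        )
    return results


# Hand-written sample of what an assistant message with a tool call could look like.
sample_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
        }
    ],
}

tool_messages = run_tool_calls(sample_message)
```

Appending the assistant message and `tool_messages` to the conversation and sending it again lets the model turn the tool result into a final answer.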

HOW IT WORKS

Migrate in minutes

The API is OpenAI-compatible: change your base URL and you're done.

curl -i -X POST "https://api.telnyx.com/v2/ai/chat/completions" \
     -H "Authorization: Bearer $TELNYX_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{
       "model": "kimi-k2-5",
       "messages": [{"role": "user", "content": "Hello, World!"}]
     }'
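
For teams already using the OpenAI Python SDK, the same request might look like the sketch below. It assumes the `openai` package is installed and that the SDK's usual behaviour applies: the client appends `/chat/completions` to `base_url`, so the base URL is the curl endpoint above with that suffix removed. The helper names (`base_url_from_endpoint`, `make_client`, `say_hello`) are hypothetical, and the model ID is copied from the curl example.

```python
import os

# Full endpoint taken from the curl example above.
CHAT_COMPLETIONS_URL = "https://api.telnyx.com/v2/ai/chat/completions"


def base_url_from_endpoint(endpoint):
    """Strip the '/chat/completions' suffix, which the OpenAI SDK adds itself."""
    suffix = "/chat/completions"
    if not endpoint.endswith(suffix):
        raise ValueError(f"expected an endpoint ending in {suffix!r}")
    return endpoint[: -len(suffix)]


def make_client(api_key=None):
    """Build an OpenAI SDK client that points at the Telnyx endpoint."""
    # Imported lazily so the helper above works without the third-party package.
    from openai import OpenAI

    return OpenAI(
        api_key=api_key or os.environ["TELNYX_API_KEY"],
        base_url=base_url_from_endpoint(CHAT_COMPLETIONS_URL),
    )


def say_hello(client, model="kimi-k2-5"):
    """Send the same 'Hello, World!' prompt as the curl example and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Hello, World!"}],
    )
    return response.choices[0].message.content
```

Calling `say_hello(make_client())` with `TELNYX_API_KEY` set in the environment would send the request. The rest of an existing OpenAI integration should not need to change if the compatibility claim holds for the features it uses.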
PRICING

Transparent pricing, massive savings

Starting at $0.21 per 1M tokens. No GPU rental fees, no compute surcharges, no minimums.

$0.21

Starting cost per 1M tokens
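
To put the starting rate in context, here is a small cost-estimate sketch. It assumes a single flat rate of $0.21 per 1M tokens applied to all tokens; the page only quotes a starting price, so actual rates may differ by model or by input versus output tokens. The 5M-token workload is hypothetical.

```python
from decimal import Decimal

# Starting rate quoted above; other models may be priced higher.
STARTING_PRICE_PER_MILLION = Decimal("0.21")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(total_tokens, price_per_million=STARTING_PRICE_PER_MILLION):
    """Return the estimated cost in USD for a given number of tokens."""
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    return Decimal(total_tokens) * price_per_million / TOKENS_PER_MILLION


# Hypothetical workload: 5 million tokens in a month.
monthly_cost = estimate_cost(5_000_000)
print(f"${monthly_cost:.2f}")  # prints $1.05
```

`Decimal` is used instead of floats so that small per-token amounts do not pick up rounding noise when summed across many requests.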

PRODUCTS

Building AI that reaches beyond the chat?

Your AI doesn't have to stop at text. Telnyx runs text-to-speech, voice AI, and telephony on the same infrastructure. Same API key, same network, same bill.

Sign up and start building.

Test frontier models running on edge compute. Telnyx gives you the infrastructure and support to deploy inference workloads globally from one platform.

Sign up for Telnyx Inference

FAQ

Inference APIs let you send prompts to a deployed model and get predictions back over HTTP, without managing GPU hardware yourself. They wrap model serving behind a standard chat completions interface so any application can generate text, embeddings, or function calls on demand.
