Many factors can influence how well AI models perform, including the hardware they run on. Top-tier model performance often demands substantial computational resources, creating a balancing act between cost efficiency and speed.
Our network of owned GPUs delivers rapid inference without excessive costs or extended timelines. Combined with Telnyx Storage, you can easily upload your data into buckets for instant summarization and automatic embedding. Use your data across proprietary and open-source models for the balance of control, cost efficiency, and speed your business needs to stay ahead.
Use custom data with proprietary and open-source models, or build your own on dedicated GPU infrastructure for fast, low-cost inference.
Talk to an expert
Select a large language model, add a prompt, and chat away, all powered by our own GPU infrastructure. For unlimited chats, sign up for a free account on our Mission Control Portal.
Confidently implement AI into your applications with dedicated infrastructure and distributed storage.
Data in AI-enabled storage buckets can be vectorized in seconds to feed LLMs for fast, contextualized inference.
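To illustrate how vectorized data feeds contextualized inference, here is a minimal retrieval sketch: given a query embedding, find the closest stored document vector by cosine similarity. The tiny 3-dimensional vectors are toy stand-ins for the embeddings a bucket would produce automatically; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_match(query_vec: list[float], doc_vecs: list[list[float]]) -> int:
    """Return the index of the stored vector closest to the query."""
    return max(range(len(doc_vecs)),
               key=lambda i: cosine_similarity(query_vec, doc_vecs[i]))

# Toy "embeddings" standing in for vectors produced when documents
# land in an AI-enabled bucket.
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query = [0.9, 0.1, 0.0]
print(top_match(query, docs))  # index of the nearest document
```

The matched document's text would then be placed in the prompt context before inference, which is what makes the inference "contextualized."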
Count on our dedicated GPUs to handle a high volume of requests concurrently and scale automatically based on your workload to ensure optimal performance at all times.
Ensure your inference output conforms to a regular expression or JSON schema for specific applications.
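The constraint itself is enforced server-side during generation; as a client-side sketch of what "conforms" means, here is a stdlib-only validator that checks raw output against a regular expression or a minimal JSON shape. The phone-number pattern and the required-keys check are illustrative assumptions, not part of the Telnyx API.

```python
import json
import re

def matches_pattern(output: str, pattern: str) -> bool:
    """Check that the raw model output fully matches a regular expression."""
    return re.fullmatch(pattern, output) is not None

def conforms_to_shape(output: str, required: dict[str, type]) -> bool:
    """Minimal JSON check: output parses, and each required key has the right type."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return all(isinstance(data.get(k), t) for k, t in required.items())

# E.164-style phone number pattern a constrained model might be asked to emit.
print(matches_pattern("+13125550123", r"\+\d{7,15}"))  # True
print(conforms_to_shape('{"name": "Ada", "age": 36}', {"name": str, "age": int}))  # True
```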
Choose the best model for your use case. We currently support models from OpenAI, Meta, and MosaicML—with more on the way.
Go from data to inference in near-real time with the co-location of Telnyx GPUs and Storage.
>4K
Leverage our dedicated network of GPUs to scale your AI-powered services effortlessly.
40%
Thanks to our dedicated infrastructure, Telnyx users can save over 40% compared to OpenAI and MosaicML on embeddings alone.
60+
Access the latest open-source LLMs on one platform within days of release. Easily switch between models for ultimate flexibility.
Take a look at our helpful tools to get started
We post the latest updates from our AI platform on the changelog page, so you can stay in the know.
Create accurate READMEs using Telnyx's AI platform for seamless data management and inference.
Explore 20+ large language models ready for testing and integration into your AI projects.
Find tips, best practices, and guides for Inference
Tutorial for AI Playground Quickstart. Start building on Telnyx today.
In this tutorial, you'll learn how to connect large language models to external tools using our chat completions API.
In this tutorial, you'll learn how to configure a voice assistant with Telnyx. You won't have to write a single line of code or create an account with anyone besides Telnyx. You'll be able to talk to your assistant over the phone in under five minutes.
This endpoint returns a list of the open-source and OpenAI models available for use.
Chat with a language model. This endpoint is consistent with the OpenAI Chat Completions API and may be used with the OpenAI JS or Python SDK.
Transcribe speech to text. This endpoint is consistent with the OpenAI Transcription API and may be used with the OpenAI JS or Python SDK.
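Because the chat completions endpoint is OpenAI-compatible, requests take the familiar OpenAI shape and the official OpenAI SDKs can be pointed at it. Below is a stdlib-only sketch that builds (but does not send) such a request; the base URL, API key placeholder, and model name are illustrative assumptions, so substitute the values from your account and the models endpoint.

```python
import json
import urllib.request

# Illustrative values; replace with your own base URL, key, and model.
BASE_URL = "https://api.telnyx.com/v2/ai"
API_KEY = "YOUR_TELNYX_API_KEY"

def build_chat_request(model: str, user_message: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completions request (constructed, not sent)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("meta-llama/Meta-Llama-3-8B-Instruct", "Hello!")
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` (or swapping in the OpenAI SDK with a custom base URL) returns a standard chat completion response.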
Find data-driven research, comprehensive guides and all things SIP trunking, voice and SMS APIs, wireless and more.
Discover Telnyx's unified AI platform, combining storage and inference. Streamline your AI workflows, enjoy cost-effective GPUs and rapid insights.
Inference APIs drive AI adoption by enabling real-time applications, multimodal systems, and personalized solutions with speed and scalability.
Aptly named, inference engines are what make AI run. Learn what they are, how they work, and how you can use them in your AI applications.
We built Telnyx Inference as a platform where developers can easily harness the power of AI with fast, contextualized inference.
Telnyx Inference is built on a Telnyx-owned GPU network, resulting in lower costs and accelerated time to market for AI applications.
If you want to use AI and ML effectively, you have to use inference models. Learn what they are and how they can work for your business.
AI systems are changing the world. But where did these systems originate, and where are they headed next?
You’ve heard of AI, but have you heard of machine learning inference? Learn what ML inference is and how you can apply it to innovate in your industry.
Discover top ElevenLabs alternatives and why Telnyx offers a better voice AI stack with lower latency, real-time control, and LLM flexibility.
See why Telnyx beats ElevenLabs. Get better pricing, built-in telecom stack, and full AI infrastructure control. Switch to Telnyx for better voice AI.
This article provides a guide to setting up Telnyx Storage on your account.
Get Started with a Mission Control Account. Start building on Telnyx today.
Step-by-step guide to integrate Telnyx with ElevateAI for transcription and recording.
Here you will find a collection of FAQs and guides on all things Telnyx Storage.
Telnyx's technical specs: Whitelisting, SIP protocols, STUN server, DTMF, and more.
In this collection you will find helpful links that explain Mission Control Portal features, plus troubleshooting tips.
See how AI and machine learning can enhance your projects. Explore Telnyx use cases today.
Boost engagement and efficiency through Telnyx's Conversational AI. Start integrating now.