Global inference. Local data.

OpenAI-compatible inference with in-region deployment. Data stays where your users are, with no hyperscaler markup.

Why Telnyx for Inference

Inference in-region, not routed cross-country

Most inference providers run in one or two US data centers. Your European users hit us-east-1. Your APAC traffic crosses the Pacific. Latency stacks up. Data leaves the region. Compliance gets complicated.

Telnyx runs inference in-region across the Americas, Europe, and APAC, so requests stay local and data never crosses borders unnecessarily. Because we own the GPU infrastructure, there's no cloud provider margin in the pricing.

When you're ready to expand beyond inference, voice AI, speech-to-text, and text-to-speech all run on the same infrastructure. No new vendor, no integration overhead.

FEATURES

OpenAI-compatible endpoints that work with your existing SDK and deploy globally.

  • In-region deployment

    Inference runs in the Americas, Europe, and APAC, with MENA and LATAM coming soon. Your data stays where your users are.

  • OpenAI-compatible API

    Use your existing OpenAI SDK by changing the base URL, as shown in the example under How it works.

  • Function calling

    Connect LLMs to external tools and APIs to build agents that take action, not just generate text. A sketch follows this list.

  • Autoscaling

    Dedicated GPUs handle concurrent requests and scale automatically with your workload, so there's no capacity planning and no cold starts to worry about.

  • Fine-tuning

    Customize models with your own data via the Fine-Tuning API, using the same infrastructure and API key.

  • Structured output

    JSON mode and regex constraints ensure inference output conforms to your schema for production-grade reliability. A sketch follows this list.
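
For example, here is a minimal function-calling sketch using the OpenAI Python SDK. It assumes the standard OpenAI tools schema carries over to the Telnyx endpoint; the base_url is inferred from the curl example under How it works, and get_weather is a hypothetical tool.

import os

from openai import OpenAI

# Point the stock OpenAI client at the Telnyx endpoint
# (base_url inferred from the curl example under How it works).
client = OpenAI(
    api_key=os.environ["TELNYX_API_KEY"],
    base_url="https://api.telnyx.com/v2/ai",
)

# Describe a hypothetical tool the model is allowed to call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="kimi-k2-5",
    messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
    tools=tools,
)

# If the model decided to call the tool, the tool name and its
# JSON arguments come back on the message instead of plain text.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)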
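
Structured output follows the same pattern. A minimal JSON-mode sketch, assuming the OpenAI response_format parameter carries over to the Telnyx endpoint:

import json
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TELNYX_API_KEY"],
    base_url="https://api.telnyx.com/v2/ai",
)

# JSON mode constrains the model to emit valid JSON,
# so the reply parses without cleanup.
response = client.chat.completions.create(
    model="kimi-k2-5",
    messages=[{
        "role": "user",
        "content": "Describe Lisbon as a JSON object with keys 'city' and 'summary'.",
    }],
    response_format={"type": "json_object"},
)

print(json.loads(response.choices[0].message.content))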

WHY TELNYX

The edge advantage

Run inference where your users are, not where your cloud provider decides. Lower latency, better experiences, no vendor lock-in.

  • Ultra-low latency

    Run models at the edge close to your users. Sub-100ms response times without cross-country routing.

  • No vendor lock-in

    OpenAI-compatible endpoints work with your existing SDK. Switch providers without rewriting code.

  • Autoscaling by default

    From zero to thousands of requests per second without capacity planning. Pay only for what you use.

PRICING

Transparent pricing, no cloud tax

Starting at $0.10 per 1M tokens with flat per-token pricing by model tier. No GPU rental fees, no compute surcharges, no minimums.

HOW IT WORKS

Build in minutes

Test in the portal or integrate with your tools.

curl -X POST https://api.telnyx.com/v2/ai/chat/completions \
  -H "Authorization: Bearer $TELNYX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2-5",
    "messages": [{"role": "user", "content": "Hello, World!"}]
  }'
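
The same request through the OpenAI Python SDK only needs the base URL changed. A minimal sketch, with base_url inferred from the endpoint above:

import os

from openai import OpenAI

# Your existing OpenAI SDK, pointed at the Telnyx endpoint.
client = OpenAI(
    api_key=os.environ["TELNYX_API_KEY"],
    base_url="https://api.telnyx.com/v2/ai",
)

response = client.chat.completions.create(
    model="kimi-k2-5",
    messages=[{"role": "user", "content": "Hello, World!"}],
)

print(response.choices[0].message.content)

Switching back to another provider is the same one-line change.
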
PRODUCTS

See what you can build with our suite of AI APIs

Sign up and start building

FAQ

What is an inference API?

Inference APIs let you send prompts to a deployed model and get predictions back over HTTP, without managing GPU hardware yourself. They wrap model serving behind a standard chat completions interface so any application can generate text, embeddings, or function calls on demand.
