# Telnyx AI: Inference — Full Documentation
> Complete page content for Inference (AI section) of the Telnyx developer docs (https://developers.telnyx.com).
> Root index: https://developers.telnyx.com/llms.txt · Lightweight index for this subsection: https://telnyx.com/llms/ai/inference.txt

## Models

### Models

> Source: https://developers.telnyx.com/docs/inference/models.md

Open-source LLMs hosted on Telnyx GPU infrastructure. All models are accessible via the OpenAI-compatible [Chat Completions API](/api-reference/openai-chat/create-a-chat-completion-openai-compatible).

## Chat Models

| Model ID | Parameters | Context Length | Best For |
|----------|:----------:|:--------------:|----------|
| `moonshotai/Kimi-K2.6` | 1.0T | 256K | Highest intelligence, voice AI (with thinking disabled) **(Recommended)** |
| `zai-org/GLM-5.1-FP8` | 753.9B | 202K | Most efficient reasoning, function calling |
| `MiniMaxAI/MiniMax-M2.7` | — | 2M | Cheapest while maintaining high intelligence |

## Embedding Models

| Model ID | Dimensions | Best For |
|----------|:----------:|----------|
| `thenlper/gte-large` | 1024 | Text embeddings |

---

### Regions & Availability

> Source: https://developers.telnyx.com/docs/inference/models/regions.md

GPU infrastructure across four regions on three continents. Requests are processed on a best-effort basis in the region nearest the ingress domain you call.

## Current Regions

| Region | Location |
|--------|----------|
| US East | Atlanta |
| US West | Denver |
| EU | Paris |
| Asia-Pacific | Sydney |

## Routing

Inference **processing is latency-based and best-effort, influenced by the ingress domain you call**, not by your account's data locality setting:

| Ingress domain | Preferred region |
|----------------|--------|
| `api.telnyx.com` | US |
| `api.telnyx.eu` | EU |
| `api.telnyx.com.au` | APAC |

Calling a regional ingress domain (for example, `api.telnyx.eu`) directs requests to the nearest GPU region for that domain under normal conditions. Telnyx does **not guarantee** processing location: during failover or capacity events, requests are processed at the next-lowest-latency region rather than failing. A region-selection API parameter is on the roadmap.
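
The routing table above can be written as a small lookup. This is a minimal sketch, assuming the OpenAI-compatible path `/v2/ai/openai` (documented for `api.telnyx.com`) is also served on the regional domains, which this page does not confirm. `INGRESS_DOMAINS`, `OPENAI_COMPAT_PATH`, and `base_url_for` are hypothetical names used only for illustration.

```python
# Hypothetical helper: map a preferred region to a Telnyx ingress base URL.
# The domains come from the routing table; reusing the same path on the
# regional domains is an assumption.
INGRESS_DOMAINS = {
    "US": "api.telnyx.com",
    "EU": "api.telnyx.eu",
    "APAC": "api.telnyx.com.au",
}

OPENAI_COMPAT_PATH = "/v2/ai/openai"


def base_url_for(preferred_region: str) -> str:
    """Return the base URL whose ingress domain prefers the given region."""
    try:
        domain = INGRESS_DOMAINS[preferred_region.upper()]
    except KeyError:
        raise ValueError(f"unknown region: {preferred_region!r}") from None
    return f"https://{domain}{OPENAI_COMPAT_PATH}"


print(base_url_for("eu"))
```

The returned string can be passed as `base_url` to an OpenAI-compatible client. It expresses a routing preference only; processing location remains best-effort.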

## Data Residency

Processing location and storage location are controlled separately:

- **Processing in transit** is latency-based and best-effort, influenced by the ingress domain you call (see [Routing](#routing) above). It is not a guaranteed processing location.
- **Storage at rest** depends on the endpoint. The **chat completions** endpoint does not store request or response data. The **responses** endpoint stores conversations, and that storage is governed by your [Data Locality](/docs/account-setup/data-locality) setting.

For a full cross-product breakdown (including Voice AI Assistants), see the [Data Residency & Compliance FAQ](/docs/inference/data-residency).

## Roadmap

- Region selection API parameter
- Per-region model status and latency metrics
- Edge inference for sub-50ms response times

---

### Pricing

> Source: https://developers.telnyx.com/docs/inference/models/pricing.md

Pay-per-token. No minimums, no commitments.

For current per-model pricing, see [telnyx.com/pricing/inference-api](https://telnyx.com/pricing/inference-api).

| Category | Basis | Notes |
|----------|-------|-------|
| Text generation | Per 1M tokens (input + output) | Input and output priced separately; cached input tokens at a discount |
| Audio transcription | Per second of audio | Varies by model |
| Text-to-speech | Per 1M characters | Varies by voice/model |
| Embeddings | Per 1M tokens | Single rate |
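
The per-1M-token basis for text generation works out as simple arithmetic. The sketch below uses placeholder rates, not Telnyx prices, and assumes cached tokens are counted as a subset of the input tokens; check the pricing page for real rates and the exact accounting. `text_generation_cost` is a hypothetical helper.

```python
def text_generation_cost(
    input_tokens: int,
    cached_input_tokens: int,
    output_tokens: int,
    *,
    input_rate: float,
    cached_input_rate: float,
    output_rate: float,
) -> float:
    """Estimate cost in currency units; every rate is per 1M tokens.

    Assumption: cached_input_tokens is the part of input_tokens that is
    billed at the discounted rate.
    """
    uncached = input_tokens - cached_input_tokens
    if uncached < 0:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    total = (
        uncached * input_rate
        + cached_input_tokens * cached_input_rate
        + output_tokens * output_rate
    )
    return total / 1_000_000


# Placeholder rates for illustration only.
cost = text_generation_cost(
    200_000, 50_000, 20_000,
    input_rate=1.00, cached_input_rate=0.25, output_rate=3.00,
)
print(f"{cost:.4f}")
```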

---

### Data Residency & Compliance FAQ

> Source: https://developers.telnyx.com/docs/inference/data-residency.md

This page answers common customer questions about where data is processed and stored, which providers are used, retention, and model training.

Telnyx AI spans two products that handle **processing location** differently:

- **Inference API** — chat completions, the responses endpoint, and related model APIs.
- **Voice AI Assistants** — telephony-based conversational agents.

The single most important thing to understand:

Telnyx offers **hard controls for data at rest** (storage location and retention), but **does not offer hard controls for processing location**. Where a request is *processed* is **latency-based and best-effort** — it is influenced, not guaranteed. Under failover or capacity events, processing shifts to the next-best region rather than failing.

This FAQ describes how the product works technically. It is not a legal commitment. Contractual terms — including data processing terms, training opt-outs, and any region commitments — are handled through your account team, a Data Processing Agreement (DPA), and applicable Telnyx terms. For written confirmations, [contact support](mailto:support@telnyx.com) or your Telnyx account manager.

---

## Processing vs. storage: the key distinction

| | Processing in transit (best-effort, not guaranteed) | Storage at rest (hard control) |
| --- | --- | --- |
| **Inference API** | Latency-based, influenced by the **ingress domain** you call (`api.telnyx.com`, `api.telnyx.eu`, `api.telnyx.com.au`). Not tied to data locality. Not guaranteed. | Chat completions: **not stored**. Responses endpoint (stores conversations): governed by your **data locality** setting. |
| **Voice AI Assistants** | Influenced by the **anchorsite** on the assistant's TeXML application. Best-effort, not guaranteed. | Governed by your **data locality** flag, plus the **data-retention** setting for conversation content. |

[Data Locality](/docs/account-setup/data-locality) governs **storage at rest** for covered data types. Neither the data locality flag nor the anchorsite is a hard guarantee of where live **processing** happens.

---

## Inference API

### Where is Inference processing performed?

Inference **processing in transit is latency-based and best-effort, influenced by the ingress domain you call**, not by your data locality setting:

| Ingress domain | Preferred region |
| --- | --- |
| `api.telnyx.com` | US |
| `api.telnyx.eu` | EU |
| `api.telnyx.com.au` | APAC |

Calling a regional ingress domain (for example, `api.telnyx.eu`) directs requests to the nearest GPU region for that domain on a best-effort basis. Telnyx does **not guarantee** the processing location: during failover or capacity events, requests are processed at the next-lowest-latency region rather than failing. See [Inference Regions & Availability](/docs/inference/models/regions) for the underlying GPU regions.

### Does Inference store my data?

It depends on the endpoint:

- **Chat completions endpoint** — **does not store** request or response data.
- **Responses endpoint** — **stores conversations**. For stored data, your [Data Locality](/docs/account-setup/data-locality) setting dictates the storage region.

### Can Inference traffic be pinned to a specific region?

Not as a hard guarantee. Routing is **latency-based and best-effort**: calling a regional ingress domain (for example, `api.telnyx.eu`) directs requests to that region under normal conditions, but Telnyx does **not guarantee** processing location. During failover or capacity events, requests are processed at the next-lowest-latency region rather than failing. If you have a strict compliance requirement for guaranteed processing location, [contact support](mailto:support@telnyx.com) to discuss what is possible for your account.

---

## Voice AI Assistants

### Where is Voice AI Assistant processing performed?

For Voice AI Assistants, processing location is **influenced by the anchorsite** configured on the assistant's **TeXML application** — not by the data locality flag. Setting the anchorsite (for example, Frankfurt for the EU) directs media/processing to that region under normal conditions.

The anchorsite is **best-effort, not a hard control**. Telnyx does not guarantee processing location: under failover or capacity events, processing can shift to another region rather than failing the call.

### Where is Voice AI Assistant data stored?

**Storage location at rest is a hard control, governed by your [Data Locality](/docs/account-setup/data-locality) flag.** Retention of conversation content is further controlled by the **data-retention** setting (see [Data retention](#data-retention-and-model-training)). Recording storage can also be directed to your own storage destination, which Telnyx respects.

### Are call audio, transcripts, prompts, responses, summaries, or recordings ever handled outside the configured region?

- **Processing** location is influenced by the assistant's anchorsite, but is **best-effort and not guaranteed**.
- **Storage at rest** is a hard control, following your **data locality** flag. Recordings can be directed to a customer-controlled storage destination, which Telnyx respects.

Telnyx does **not** contractually guarantee blanket "EU-only processing." Processing controls are best-effort only, "processing" is defined very broadly, and some components — for example, third-party STT/TTS providers, or operational/security/fraud handling — may involve activity outside a single region. The specifics depend on the providers and features you enable. Confirm written data commitments with your account team and DPA before making representations to your own customers.

### Example: EU-focused Voice AI setup

A typical EU-oriented configuration combines:

- **Data locality:** EU (Germany) — a hard control over storage at rest
- **Anchorsite on the TeXML app:** an EU site (for example, Frankfurt) — best-effort influence over media/processing location
- **Voice API endpoint:** `api.telnyx.eu`
- **SIP endpoint:** `sip.telnyx.eu`

This keeps storage in the EU (a hard control via data locality) and steers processing toward the EU (best-effort via the anchorsite). STT/TTS provider choice also matters — some providers are self-hosted by Telnyx and some are third parties (see below).

---

## STT, TTS, and LLM providers (Voice AI)

For Voice AI Assistants, the STT, TTS, and LLM providers in use depend on the models and voices **you select**. Some are **self-hosted by Telnyx** (run on Telnyx-operated infrastructure); others are **third-party** services that Telnyx integrates with. This distinction matters for compliance: self-hosted models keep that processing step within Telnyx infrastructure, whereas third-party models route that step to the vendor.

Hosting (self-hosted vs. third-party) is about *which infrastructure* performs the step, not a guarantee of *region*. Processing region is best-effort for all providers — see the [processing vs. storage](#processing-vs-storage-the-key-distinction) note above.

### Speech-to-text (STT)

| Model | Provider | Hosting |
| --- | --- | --- |
| `deepgram/flux` | Deepgram | Self-hosted by Telnyx |
| `deepgram/nova-3` | Deepgram | Self-hosted by Telnyx |
| `deepgram/nova-2` | Deepgram | Self-hosted by Telnyx |
| `assemblyai/universal-streaming` | AssemblyAI | Self-hosted by Telnyx |
| `speechmatics/standard` | Speechmatics | Self-hosted by Telnyx |
| `distil-whisper/distil-large-v2` | Whisper (English-only) | Self-hosted by Telnyx |
| `azure/fast` | Azure | Third-party |
| `soniox/stt-rt-v4` | Soniox | Third-party |
| `xai/grok-stt` | xAI | Third-party |
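
Teams that gate assistant configuration on hosting can encode the table above as a lookup. This is a sketch only: the dictionary is a snapshot of the table on this page and will drift as the catalog changes, and `STT_HOSTING` and `self_hosted_stt_models` are hypothetical names.

```python
# Snapshot of the STT table above; not fetched from any API.
STT_HOSTING = {
    "deepgram/flux": "self-hosted",
    "deepgram/nova-3": "self-hosted",
    "deepgram/nova-2": "self-hosted",
    "assemblyai/universal-streaming": "self-hosted",
    "speechmatics/standard": "self-hosted",
    "distil-whisper/distil-large-v2": "self-hosted",
    "azure/fast": "third-party",
    "soniox/stt-rt-v4": "third-party",
    "xai/grok-stt": "third-party",
}


def self_hosted_stt_models() -> list[str]:
    """Return the STT model IDs that run on Telnyx-operated infrastructure."""
    return sorted(m for m, hosting in STT_HOSTING.items() if hosting == "self-hosted")


print(self_hosted_stt_models())
```

Hosting describes which infrastructure performs the step. It says nothing about region, which stays best-effort for every provider.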

### Text-to-speech (TTS)

TTS is delivered through Telnyx's TTS gateway, which integrates multiple providers. The provider depends on the voice you select:

| Provider | Hosting |
| --- | --- |
| Telnyx (in-house voices, including Telnyx Ultra) | Self-hosted by Telnyx |
| Rime | Self-hosted by Telnyx |
| Resemble | Self-hosted by Telnyx |
| ElevenLabs | Third-party |
| AWS | Third-party |
| Azure | Third-party |
| Minimax | Third-party |
| Inworld | Third-party |
| xAI | Third-party |

See [Text to Speech voices](/docs/tts-stt/tts-available-voices) for the current voice catalog.

### Large language model (LLM)

The assistant's model is served through Telnyx's inference platform. The model in use is the one you configure on the assistant.

**Self-hosted by Telnyx** (open models served on Telnyx infrastructure) include the **Qwen** and **Moonshot (Kimi)** model families — for example, `Qwen/Qwen3-235B-A22B`, `moonshotai/Kimi-K2.5`, and `moonshotai/Kimi-K2.6`.

**Third-party** models — including those from **Anthropic** (Claude), **OpenAI** (GPT), and **Google** (Gemini) — are **not self-hosted**. When you select one of these, the prompt is sent to that external provider to generate the response.

The available models evolve over time — for the current catalog and which models are recommended for assistants, see [Models](/docs/inference/models).

If data residency or third-party data sharing is a concern, choose a self-hosted model (a Qwen or Moonshot/Kimi model) to keep prompt and response generation on Telnyx infrastructure. Region remains best-effort even for self-hosted models.

### Can STT, TTS, or LLM processing be restricted to the EU?

There is **no hard guarantee** of processing region for any provider — processing is best-effort. In addition:

- **Self-hosted** providers keep that processing step on Telnyx infrastructure, but region remains best-effort.
- **Third-party** providers route that step to the vendor, whose own region behavior applies.

If you need STT, TTS, or LLM processing constrained to a specific region, [contact support](mailto:support@telnyx.com) so we can advise which self-hosted provider/model combinations best fit your requirement. Hard region guarantees are not offered for processing.

---

## Recordings

### Are call recordings disabled by default?

No. For Voice AI Assistants, **call recordings are enabled by default**, and you can turn them off. When recordings are enabled, the recording is stored in Media Storage, which is subject to your [Data Locality](/docs/account-setup/data-locality) setting. Disable recording on the assistant (or per call) if you do not want recordings retained.

---

## Data retention and model training

### What does the data-retention setting control?

Voice AI Assistants expose a **data-retention** privacy setting (`privacy_settings.data_retention`). It is **enabled by default**. When you disable it, the assistant stops persisting conversation **content** while continuing the minimum processing needed to run and bill the call.

When `data_retention` is **disabled**, conversation content is **not retained**:

| Item | Behavior when retention is off |
| --- | --- |
| Conversation messages / transcripts | Not persisted to the conversations store |
| Insights | Not retained. An insight may be computed transiently in-memory to support live conversation behavior, but the conversation and its insights are not stored |
| Transcript and assistant answer in observability logs | Not retained; replaced with placeholders (for example, `[transcript not available]` / `[answer not available]`) |
| LLM request/response content logging | Disabled |
| TTS cache | Disabled, so synthesized audio is not cached |

A limited set of records is still **retained** even when conversation retention is off, because they are required to operate and bill the service:

| Item | Behavior when retention is off |
| --- | --- |
| Latency / timing metrics | Retained (timing only, no conversation content) |
| Billing, security, and fraud-prevention records | Retained as required for legitimate business and compliance purposes |

The data-retention flag governs retention of **conversation content** for Voice AI Assistants. Disabling it stops persistence of conversation content and insights; it does not change where data that *is* retained lives — storage region is controlled by [Data Locality](/docs/account-setup/data-locality). Recordings are governed separately by the recording setting (see [Recordings](#recordings) above). For a guarantee tailored to your exact configuration (audio, tool inputs/outputs, memory, observability traces, and third-party provider logs), confirm in writing with your account team and DPA.
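
The setting name `privacy_settings.data_retention` suggests a nested field on the assistant. The sketch below builds only that fragment, assuming the dotted name maps to a nested JSON object with a boolean value. The endpoint and the rest of the assistant payload are not described on this page, so they are left out; `retention_patch` is a hypothetical helper.

```python
import json


def retention_patch(enabled: bool) -> dict:
    """Build the assumed request-body fragment for the data-retention flag."""
    return {"privacy_settings": {"data_retention": enabled}}


# Fragment that would turn conversation-content retention off.
print(json.dumps(retention_patch(False)))
```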

### Can a customer opt out of model improvement / training / evaluation?

Customer data handling for model training is governed by Telnyx's applicable terms and DPA. If you require an opt-out from model improvement, training, or evaluation — for both input and output data, and covering Telnyx and any third-party AI providers in your configuration — [contact your account team](mailto:support@telnyx.com) to confirm the governing terms and document the opt-out.

---

## Usage reporting and billing

### Can usage be broken down by assistant, phone number, or metadata/tag?

Usage and conversation data can be attributed using identifiers such as the assistant, the associated phone number, and metadata. For subscriber-level or per-tag billing breakdowns, [contact support](mailto:support@telnyx.com) to confirm which dimensions are available and how to structure metadata/tags for clean attribution. See [Agent Observability](/docs/inference/ai-assistants/agent-observability) and [Session Analysis](/docs/reporting/session-analysis).

---

## Related resources

- [Data Locality](/docs/account-setup/data-locality)
- [Inference Regions & Availability](/docs/inference/models/regions)
- [Models](/docs/inference/models)
- [Transcription Settings](/docs/inference/ai-assistants/transcription-settings)
- [Text to Speech voices](/docs/tts-stt/tts-available-voices)
- [Agent Observability](/docs/inference/ai-assistants/agent-observability)

---

## Integrations

### Integrations

> Source: https://developers.telnyx.com/docs/inference/integrations.md

OpenAI-compatible API. Swap `base_url` and `api_key` in any framework that supports OpenAI.

## Quick Reference

| Framework | Swap Method | Guide |
|-----------|-------------|-------|
| OpenAI SDK | `base_url` in client constructor | [OpenAI Migration](/docs/inference/openai) |
| LangChain | `base_url` in `ChatOpenAI` | [LangChain](/docs/inference/langchain-integration) |
| LlamaIndex | `api_base` in `OpenAILike` | [LlamaIndex](/docs/inference/llama-index) |
| CrewAI | `OPENAI_BASE_URL` env var or `base_url` in LLM | [CrewAI](/docs/inference/crewai) |
| LiveKit | Telnyx as LLM provider | [LiveKit](/docs/inference/livekit) |

## Environment Variables

Route all OpenAI SDK calls through Telnyx with no code changes:

```shell
export OPENAI_API_KEY=your_telnyx_api_key
export OPENAI_BASE_URL=https://api.telnyx.com/v2/ai/openai
```

---

### OpenAI Migration

> Source: https://developers.telnyx.com/docs/inference/openai.md

Swap two environment variables and change the model name. That's it.

```shell
export OPENAI_BASE_URL='https://api.telnyx.com/v2/ai/openai'
export OPENAI_API_KEY='KEY***'
```

```python
from openai import OpenAI

client = OpenAI()  # picks up env vars
chat_completion = client.chat.completions.create(
    model="zai-org/GLM-5.1-FP8",
    messages=[{"role": "user", "content": "Tell me about Telnyx"}],
    temperature=0.0,
    stream=True,
)
```

Or pass explicitly:

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("TELNYX_API_KEY"),
    base_url="https://api.telnyx.com/v2/ai/openai",
)
chat_completion = client.chat.completions.create(
    model="zai-org/GLM-5.1-FP8",
    messages=[{"role": "user", "content": "Tell me about Telnyx"}],
    temperature=0.0,
    stream=True,
)
```
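
Both snippets above set `stream=True`, so `create()` returns an iterator of chunks rather than a finished message. A minimal sketch of consuming it follows; `collect_stream` is a hypothetical helper, and skipping chunks with an empty `choices` list is a defensive assumption rather than documented Telnyx behavior.

```python
def collect_stream(stream) -> str:
    """Concatenate the `content` deltas of a streamed chat completion."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            # Defensive: skip chunks that carry no choices (e.g. usage-only).
            continue
        text = getattr(chunk.choices[0].delta, "content", None)
        if text:
            parts.append(text)
    return "".join(parts)


# Usage with the client call shown above:
# print(collect_stream(chat_completion))
```

Collecting the text is convenient for logging or tests. For interactive output, print each delta as it arrives instead.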

## Reasoning models

Reasoning models such as `zai-org/GLM-5.1-FP8` add a `reasoning_content` field alongside
the usual `content`. It holds the model's chain-of-thought and appears on `message`
(non-streaming) or `delta` (streaming). Read it the same way you read `content`:

```python
chat_completion = client.chat.completions.create(
    model="zai-org/GLM-5.1-FP8",
    messages=[{"role": "user", "content": "Tell me about Telnyx"}],
)
message = chat_completion.choices[0].message

# Reasoning models populate reasoning_content; other models leave it None.
if getattr(message, "reasoning_content", None):
    print("reasoning:", message.reasoning_content)
print("answer:", message.content)
```

## Chat Completions Compatibility

| Parameter | Telnyx | OpenAI |
|-----------|:------:|:------:|
| `messages` | ✅ | ✅ |
| `model` | ✅ | ✅ |
| `stream` | ✅ | ✅ |
| `max_tokens` | ✅ | ✅ |
| `temperature` | ✅ | ✅ |
| `top_p` | ✅ | ✅ |
| `frequency_penalty` | ✅ | ✅ |
| `presence_penalty` | ✅ | ✅ |
| `n` | ✅ | ✅ |
| `stop` | ✅ | ✅ |
| `logit_bias` | ✅ | ✅ |
| `logprobs` | ✅ | ✅ |
| `top_logprobs` | ✅ | ✅ |
| `seed` | ✅ | ✅ |
| `response_format` | ✅ | ✅ |
| `tool_choice` | ✅ | ✅ |
| `tools` | ✅ | ✅ |
| `function` | ✅ | ✅ |
| `retrieval` | ✅ | ❌ |
| `guided_json` | ✅ | ❌ |
| `guided_regex` | ✅ | ❌ |
| `guided_choice` | ✅ | ❌ |
| `min_p` | ✅ | ❌ |
| `use_beam_search` | ✅ | ❌ |
| `best_of` | ✅ | ❌ |
| `length_penalty` | ✅ | ❌ |
| `early_stopping` | ✅ | ❌ |
| `user` | ❌ | ✅ |
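
The OpenAI Python SDK rejects keyword arguments it does not know, so the Telnyx-only parameters in this table generally have to travel through the SDK's `extra_body` argument. The sketch below separates the two groups. `TELNYX_ONLY` and `split_params` are hypothetical names, and the value shapes in the example (a list of strings for `guided_choice`, a float for `min_p`) are assumptions not confirmed by this page.

```python
# Parameters the table marks as supported by Telnyx but not by OpenAI.
TELNYX_ONLY = {
    "retrieval", "guided_json", "guided_regex", "guided_choice", "min_p",
    "use_beam_search", "best_of", "length_penalty", "early_stopping",
}


def split_params(params: dict) -> dict:
    """Return kwargs for `create()`: Telnyx-only fields move under `extra_body`."""
    kwargs = {k: v for k, v in params.items() if k not in TELNYX_ONLY}
    extra = {k: v for k, v in params.items() if k in TELNYX_ONLY}
    if extra:
        kwargs["extra_body"] = extra
    return kwargs


kwargs = split_params({
    "model": "zai-org/GLM-5.1-FP8",
    "messages": [{"role": "user", "content": "Is the sky blue? Answer yes or no."}],
    "guided_choice": ["yes", "no"],  # assumed shape
    "min_p": 0.05,                   # assumed shape
})
print(sorted(kwargs))
# Then: client.chat.completions.create(**kwargs)
```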

## Transcriptions Compatibility

| Parameter | Telnyx | OpenAI |
|-----------|:------:|:------:|
| `file` | ✅ | ✅ |
| `model` | ✅ | ✅ |
| `response_format` | ✅ | ✅ |
| `timestamp_granularities[]` → `segment` | ✅ | ✅ |
| `timestamp_granularities[]` → `word` | ❌ | ✅ |
| `language` | ❌ | ✅ |
| `prompt` | ❌ | ✅ |
| `temperature` | ❌ | ✅ |
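
Code migrated from OpenAI may still pass parameters that the table above marks as unsupported. A pre-flight check can flag them before a request is sent. This is a sketch under the assumption that you want to catch them client-side; how the API reacts to an unsupported parameter is not stated here, and `unsupported_transcription_params` is a hypothetical helper.

```python
# Parameters the table marks as supported on Telnyx.
SUPPORTED = {"file", "model", "response_format", "timestamp_granularities"}


def unsupported_transcription_params(params: dict) -> list[str]:
    """List the entries in `params` that the table marks as unsupported."""
    problems = sorted(set(params) - SUPPORTED)
    granularities = params.get("timestamp_granularities") or []
    if "word" in granularities:
        # Only segment-level timestamps are marked as supported.
        problems.append("timestamp_granularities=word")
    return problems


request = {
    "file": "call.wav",              # placeholder
    "model": "your-stt-model",       # placeholder
    "language": "en",
    "timestamp_granularities": ["segment", "word"],
}
print(unsupported_transcription_params(request))
```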

---

### LangChain

> Source: https://developers.telnyx.com/docs/inference/langchain-integration.md

OpenAI-compatible. Use `ChatOpenAI` with a `base_url` swap.

## Setup

```shell
pip install langchain-openai
```

## Usage

```python
import os
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://api.telnyx.com/v2/ai/openai",
    api_key=os.getenv("TELNYX_API_KEY"),
    model="zai-org/GLM-5.1-FP8",
)

for chunk in llm.stream("Help me plan my vacation"):
    print(chunk.content, end="", flush=True)
```

## Function Calling

```python
import os
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

@tool
def get_weather(location: str) -> str:
    """Get the current weather for a location."""
    return f"The weather in {location} is sunny and 72°F."

llm_with_tools = ChatOpenAI(
    base_url="https://api.telnyx.com/v2/ai/openai",
    api_key=os.getenv("TELNYX_API_KEY"),
    model="zai-org/GLM-5.1-FP8",
).bind_tools([get_weather])

result = llm_with_tools.invoke("What's the weather in Chicago?")
print(result.tool_calls)
```
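
`result.tool_calls` only describes what the model wants to run; executing it is up to your code. In LangChain each entry is a dict with `name`, `args`, and `id` keys. The sketch below dispatches those entries to plain Python callables; `run_tool_calls` and the `registry` mapping are hypothetical names, and the output shape is one reasonable choice rather than a LangChain requirement.

```python
def run_tool_calls(tool_calls: list[dict], registry: dict) -> list[dict]:
    """Execute each requested tool and pair its output with the call ID."""
    results = []
    for call in tool_calls:
        func = registry.get(call["name"])
        if func is None:
            output = f"unknown tool: {call['name']}"
        else:
            output = func(**call["args"])
        results.append({"tool_call_id": call["id"], "content": str(output)})
    return results


# Stand-in for the model's request, in the shape LangChain uses.
requested = [{"name": "get_weather", "args": {"location": "Chicago"}, "id": "call_1"}]
registry = {"get_weather": lambda location: f"Sunny in {location}"}
print(run_tool_calls(requested, registry))
```

With the `@tool`-decorated function from the example above, the registry entry could wrap `get_weather.invoke`, for example `lambda **kwargs: get_weather.invoke(kwargs)`.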

## Streaming

```python
from langchain_core.messages import HumanMessage

messages = [HumanMessage(content="Explain quantum computing in 3 sentences")]
for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)
```

---

### LlamaIndex

> Source: https://developers.telnyx.com/docs/inference/llama-index.md

OpenAI-compatible. Use `OpenAILike` with an `api_base` swap.

## Setup

```shell
pip install llama-index-core llama-index-llms-openai-like
```

## Usage

```python
import os
from llama_index.llms.openai_like import OpenAILike
from llama_index.core.llms import ChatMessage

llm = OpenAILike(
    api_base="https://api.telnyx.com/v2/ai/openai",
    api_key=os.getenv("TELNYX_API_KEY"),
    model="zai-org/GLM-5.1-FP8",
    is_chat_model=True,
)

chat = llm.stream_chat([ChatMessage(role="user", content="Help me plan my vacation")])
for chunk in chat:
    print(chunk.delta, end="")
```

## RAG with Embeddings

Combine with [Telnyx Embeddings](/docs/inference/embeddings) for retrieval-augmented generation. See the [Embeddings guide](/docs/inference/embeddings) for document upload and indexing.

---

### CrewAI

> Source: https://developers.telnyx.com/docs/inference/crewai.md

OpenAI-compatible. Use it as the LLM backend for CrewAI agents.

## Setup

```shell
pip install crewai
```

## Usage

Set environment variables for global routing:

```shell
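export TELNYX_API_KEY=your_telnyx_api_key
# OpenAI-compatible clients read the key from OPENAI_API_KEY
export OPENAI_API_KEY="$TELNYX_API_KEY"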
export OPENAI_BASE_URL=https://api.telnyx.com/v2/ai/openai
```

Or configure per-agent:

```python
import os
from crewai import Agent, Task, Crew, LLM

llm = LLM(
    model="zai-org/GLM-5.1-FP8",
    base_url="https://api.telnyx.com/v2/ai/openai",
    api_key=os.getenv("TELNYX_API_KEY"),
)

researcher = Agent(
    role="Research Analyst",
    goal="Find and analyze information",
    backstory="You are an experienced research analyst.",
    llm=llm,
)

writer = Agent(
    role="Technical Writer",
    goal="Write clear, accurate reports",
    backstory="You are a skilled technical writer.",
    llm=llm,
)

research_task = Task(
    description="Research the latest trends in AI infrastructure",
    agent=researcher,
)

write_task = Task(
    description="Write a summary report based on the research findings",
    agent=writer,
)

crew = Crew(agents=[researcher, writer], tasks=[research_task, write_task])
result = crew.kickoff()
print(result)
```

## Tool Calling

```python
from crewai.tools import tool

@tool("Search the web")
def search_web(query: str) -> str:
    """Search the web for information."""
    return f"Results for: {query}"

researcher = Agent(
    role="Research Analyst",
    goal="Find and analyze information",
    backstory="You are an experienced research analyst.",
    llm=llm,
    tools=[search_web],
)
```

---

### LiveKit

> Source: https://developers.telnyx.com/docs/inference/livekit.md

LiveKit's [agent framework](https://docs.livekit.io/agents/overview/) lets you build real-time, programmable voice agents. Telnyx integrates with LiveKit through the OpenAI plugin for LLM inference and through `livekit-plugins-telnyx` for native STT and TTS.

## Voice assistant example

This example is based on LiveKit's [agents examples repo](https://github.com/livekit/agents/tree/main/examples), modified to use Telnyx for LLM inference.

### Set up and activate a virtual env

```bash
python -m venv venv
source venv/bin/activate
```

### Install requirements

```bash
pip install -r requirements.txt
pip install livekit-plugins-telnyx
```

### Download files

This downloads model weights for voice-activity detection:

```bash
python agent.py download-files
```

### Agent code

The following code uses Telnyx for LLM inference via `openai.LLM.with_telnyx()`, with Telnyx STT and TTS.

```python
from dotenv import load_dotenv
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, voice, llm
from livekit.plugins import openai, silero, telnyx

load_dotenv()

async def entrypoint(ctx: JobContext):
    initial_ctx = llm.ChatContext().append(
        role="system",
        text=(
            "You are a helpful voice assistant powered by Telnyx. "
            "Keep responses short and conversational."
        ),
    )

    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    session = voice.AgentSession(
        llm=openai.LLM.with_telnyx(
            model="zai-org/GLM-5.1-FP8",
        ),
        vad=silero.VAD.load(),
        stt=telnyx.STT(),
        tts=telnyx.TTS(voice="Telnyx.NaturalHD.astra"),
        chat_ctx=initial_ctx,
    )
    session.start(ctx.room)

    await session.say("Hey, how can I help you today?", allow_interruptions=True)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

### Set environment variables

```bash
export TELNYX_API_KEY=
export LIVEKIT_URL=
export LIVEKIT_API_KEY=
export LIVEKIT_API_SECRET=
```
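
A worker started with an empty variable tends to fail later and less clearly, so it can help to check the environment first. A minimal sketch, assuming the four variables above are all required; `missing_env` is a hypothetical helper, and values supplied through a `.env` file are visible only after `load_dotenv()` has run.

```python
import os

REQUIRED = ("TELNYX_API_KEY", "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")


def missing_env(names=REQUIRED, env=None) -> list[str]:
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```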

### Run the agent worker

```bash
python agent.py dev
```

### Test with a LiveKit frontend

Use the [LiveKit Agents Playground](https://agents-playground.livekit.io) to test your agent without building a frontend.

## Telnyx STT & TTS plugin

The `livekit-plugins-telnyx` package provides native Telnyx STT and TTS plugins for LiveKit agents.

```bash
pip install livekit-plugins-telnyx
```

### STT

Use `telnyx.STT()` for real-time speech-to-text via Telnyx's WebSocket streaming API:

```python
from livekit.agents import voice
from livekit.plugins import telnyx

session = voice.AgentSession(
    stt=telnyx.STT(),
    # ... other plugins
)
```

### TTS

Use `telnyx.TTS()` for real-time text-to-speech. Pass a `voice` parameter to select a specific voice:

```python
from livekit.agents import voice
from livekit.plugins import telnyx

session = voice.AgentSession(
    tts=telnyx.TTS(voice="Telnyx.NaturalHD.astra"),
    # ... other plugins
)
```

See [TTS available voices](/docs/tts-stt/tts-available-voices) for the full list of voice options.

## Related resources

- [LiveKit Telnyx LLM plugin docs](https://docs.livekit.io/agents/models/llm/plugins/telnyx/)
- [SIP trunk configuration for LiveKit](/docs/voice/sip-trunking/livekit-configuration-guide)
- [Inference getting started](/docs/inference)

---

## Tutorials

### Inference API

> Source: https://developers.telnyx.com/docs/inference/getting-started.md

## Prerequisites

- [Telnyx account](https://telnyx.com/sign-up)
- [API Key](https://portal.telnyx.com/#/app/auth/v2)
- Python 3.8+

Install the OpenAI SDK:

```shell
pip install openai
```

The Inference API is OpenAI-compatible. Any OpenAI SDK works with a `base_url` swap.

## Python

```python
import os
from openai import OpenAI

client = OpenAI(
  api_key=os.getenv("TELNYX_API_KEY"),
  base_url="https://api.telnyx.com/v2/ai/openai",
)

chat_completion = client.chat.completions.create(
  messages=[
    {"role": "user", "content": "Tell me about Telnyx"}
  ],
  model="zai-org/GLM-5.1-FP8",
  stream=True
)

# GLM-5.1 is a reasoning model: it streams its thinking in `reasoning_content`
# before the final answer in `content`. Print both so you can see the reasoning.
reasoning_started = False
content_started = False
for chunk in chat_completion:
  delta = chunk.choices[0].delta
  if getattr(delta, "reasoning_content", None):
    if not reasoning_started:
      print("--- reasoning ---")
      reasoning_started = True
    print(delta.reasoning_content, end="", flush=True)
  if delta.content:
    if not content_started:
      print("\n--- answer ---")
      content_started = True
    print(delta.content, end="", flush=True)
```

Reasoning models such as `zai-org/GLM-5.1-FP8` return their chain-of-thought in a
separate `reasoning_content` field (on `message` for non-streaming responses, or
`delta` when streaming). Models without reasoning simply omit it, so the
`getattr(..., "reasoning_content", None)` guard works for every model.

## Core Concepts

### Messages
Chat history passed to the model.

### Roles
Every message has a role: **system**, **user**, **assistant**, or **tool**.
- **system** — model behavior instructions
- **user** — end-user input
- **assistant** — model output
- **tool** — function call results. See [Function Calling](/docs/inference/functions).
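
The four roles can be seen together in one conversation. The sketch below assumes the OpenAI message format, which is reasonable for an OpenAI-compatible API but is not spelled out on this page; the call ID and the weather values are made up for illustration.

```python
import json

messages = [
    # system: instructions that shape model behavior
    {"role": "system", "content": "You are a concise weather assistant."},
    # user: end-user input
    {"role": "user", "content": "How is the weather in Chicago?"},
    # assistant: model output, here a request to call a function
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_123",  # hypothetical ID
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "arguments": json.dumps(
                        {"location": "Chicago", "unit": "fahrenheit"}
                    ),
                },
            }
        ],
    },
    # tool: the function result, linked back by tool_call_id
    {
        "role": "tool",
        "tool_call_id": "call_123",
        "content": json.dumps({"temperature": 72, "unit": "fahrenheit"}),
    },
]

print([m["role"] for m in messages])
```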

### Models
[Available Models](/docs/inference/models) lists all hosted LLMs with context lengths and capabilities.

### Streaming
Server-sent events, same as OpenAI.
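
When you read the stream without an SDK, each event arrives as a `data:` line. The sketch below parses such lines, assuming OpenAI's wire format (a JSON payload per `data:` line and a final `data: [DONE]` sentinel) since the stream is described as the same as OpenAI's. `iter_sse_payloads` is a hypothetical helper and expects already-decoded text lines.

```python
import json


def iter_sse_payloads(lines):
    """Yield the decoded JSON payload of each `data:` line until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        yield json.loads(data)


sample = [
    'data: {"choices": [{"delta": {"content": "Hi"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "!"}}]}',
    "",
    "data: [DONE]",
]
text = "".join(p["choices"][0]["delta"].get("content", "") for p in iter_sse_payloads(sample))
print(text)
```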

## What Next?

| I want to... | Go to |
|:-------------|:------|
| Build a voice assistant | [No-Code Voice Assistant](/docs/inference/ai-assistants/no-code-voice-assistant) |
| Call custom code from the model | [Function Calling](/docs/inference/functions) / [Streaming Functions](/docs/inference/streaming-functions) |
| Ground responses in documents | [Embeddings](/docs/inference/embeddings) |
| Identify themes in data | [Clusters](/docs/inference/clusters) |
| Migrate from OpenAI | [OpenAI Migration](/docs/inference/openai) |
| Browse all models | [Available Models](/docs/inference/models) |

---

### Function Calling

> Source: https://developers.telnyx.com/docs/inference/functions.md

In this tutorial, you'll learn how to connect large language models to external tools using our [chat completions API](/api-reference/openai-chat/create-a-chat-completion-openai-compatible). This includes:
- Defining a function
- Enabling the language model to choose the function
- Executing the function
- Sharing the results with the language model

## Introduction
Using the `tools` field, you can enable a language model to choose functions to call. The [chat completions API](/api-reference/openai-chat/create-a-chat-completion-openai-compatible) does not call the function itself. It will return the arguments you need to execute the function yourself.

Of the open-source language models hosted on Telnyx, `zai-org/GLM-5.1-FP8` is especially good at calling functions. While we recommend you start with this model, every model in our API supports the `tools` interface.

## Simple `get_current_weather` example

A popular toy example for function calling is `get_current_weather`.

The following code defines a function and passes it to the language model via the `tools` field.

Make sure you have set the `TELNYX_API_KEY` environment variable.

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("TELNYX_API_KEY"),
    base_url="https://api.telnyx.com/v2/ai/openai",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use",
                    },
                },
                "required": ["location", "unit"],
            },
        }
    }
]

messages = [
    {"role": "user", "content": "How is the weather in Chicago?"}
]

chat_completion = client.chat.completions.create(
    model="zai-org/GLM-5.1-FP8",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)
print(chat_completion.choices[0].message)
```

A `tool_choice` of `auto` lets the language model decide to call a function (or not).

The options for `tool_choice` are:
- `required`: this forces the language model to choose a tool
- `none`: this forces the language model to NOT choose a tool
- `auto`: this lets the language model decide

If the language model chooses a function, the above will result in a response like this, with the `tool_calls` field populated.

```
ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_c31258d2-78a8-4566-b716-e3b2a774cbdb', function=Function(arguments='{"location": "Chicago", "unit": "fahrenheit"}', name='get_current_weather'), type='function')])
```

## Defining functions programmatically
In the next example, we will implement and execute `get_current_weather`. To do this cleanly, we are first going to define a helper function `func_to_tool` that extends the `schema` function from Jeremy Howard's [A Hacker's Guide to Language Models](https://github.com/fastai/lm-hackers/blob/main/lm-hackers.ipynb).

```python
import inspect
import os
import json
from typing import Literal

from openai import OpenAI
from pydantic import create_model

def func_to_tool(f):
    kw = {
        n: (o.annotation, ... if o.default==inspect.Parameter.empty else o.default)
        for n, o in inspect.signature(f).parameters.items()
        }
    s = create_model(f.__name__, **kw).model_json_schema()
    tool_json = {
        "type": "function",
        "function": {
            "name": s["title"],
            "description": inspect.getdoc(f),
            "parameters": s
        }
    }
    return tool_json

TempUnit = Literal['celsius', 'fahrenheit']

def get_current_weather(location: str, unit: TempUnit = 'fahrenheit'):
    """Get the current weather in a given location"""
    if "tokyo" in location.lower():
        return json.dumps({"location": "Tokyo", "temperature": "10", "unit": unit})
    elif "san francisco" in location.lower():
        return json.dumps({"location": "San Francisco", "temperature": "72", "unit": unit})
    elif "chicago" in location.lower():
        return json.dumps({"location": "Chicago", "temperature": "22", "unit": unit})
    else:
        return json.dumps({"location": location, "temperature": "unknown"})

client = OpenAI(
    api_key=os.getenv("TELNYX_API_KEY"),
    base_url="https://api.telnyx.com/v2/ai/openai",
)
tools = [func_to_tool(get_current_weather)]
messages = [
    {"role": "user", "content": "How is the weather in Chicago?"}
]
chat_completion = client.chat.completions.create(
    model="zai-org/GLM-5.1-FP8",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)
print(chat_completion.choices[0].message)
```

This is now functionally equivalent to the first example, but instead of writing and maintaining the verbose JSON definition ourselves, we can generate it programmatically from the executable Python function.
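
If you would rather not depend on `pydantic`, a similar schema can be derived with the standard library alone. The sketch below is a swapped-in alternative, not the `func_to_tool` helper above: `simple_func_to_tool` is a hypothetical name, it only understands `str`, `int`, `float`, `bool`, and `Literal` annotations (assumed to hold strings), and its output is not byte-for-byte identical to what `pydantic` generates.

```python
import inspect
from typing import Literal, get_args, get_origin

# Map plain Python annotations to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def simple_func_to_tool(f):
    """Build a tool definition from a function signature (stdlib-only sketch)."""
    properties, required = {}, []
    for name, param in inspect.signature(f).parameters.items():
        annotation = param.annotation
        if get_origin(annotation) is Literal:
            # Literal['a', 'b'] becomes a string enum (assumes string values).
            prop = {"type": "string", "enum": list(get_args(annotation))}
        else:
            # Unknown or missing annotations fall back to "string".
            prop = {"type": JSON_TYPES.get(annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
        else:
            prop["default"] = param.default
        properties[name] = prop
    return {
        "type": "function",
        "function": {
            "name": f.__name__,
            "description": inspect.getdoc(f),
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def get_current_weather(location: str, unit: Literal["celsius", "fahrenheit"] = "fahrenheit"):
    """Get the current weather in a given location"""


tool = simple_func_to_tool(get_current_weather)
print(tool["function"]["parameters"]["required"])
```

Parameters without a default are marked as required, which mirrors how the `pydantic` version treats them.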

## Executing functions
OK, it's nice that the language model wants to call `get_current_weather`, but how do we actually execute it and incorporate the results back into the interaction?

Continuing from the `chat_completion` response in the previous example:

```python
assistant_message = chat_completion.choices[0].message
tool_calls = assistant_message.tool_calls
content = assistant_message.content
if tool_calls:
    messages.append(assistant_message)
    available_functions = {"get_current_weather": get_current_weather}
    for tool_call in tool_calls:
        function_name = tool_call.function.name
        function_to_call = available_functions[function_name]
        function_args = json.loads(tool_call.function.arguments)
        function_response = function_to_call(**function_args)
        messages.append(
            {
                "tool_call_id": tool_call.id,
                "role": "tool",
                "name": function_name,
                "content": function_response,
            }
        )
    second_chat_completion = client.chat.completions.create(
        model="zai-org/GLM-5.1-FP8",
        messages=messages,
    )
    print(second_chat_completion.choices[0].message.content)
else:
    print(content)
```

Now, we will get our answer from the language model, incorporating the output from the function call.

```
It's 22°F in Chicago.
```
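
The loop above assumes the model always names a known function and supplies arguments that match its signature. A more defensive variant, sketched here with a hypothetical `run_tool_call` helper, reports such problems back as the tool result instead of raising. Whether you want the model to see these errors is a design choice, not a Telnyx requirement.

```python
import json


def run_tool_call(name, arguments, available_functions):
    """Run one tool call and always return a string for the `tool` message.

    `name` and `arguments` correspond to tool_call.function.name and
    tool_call.function.arguments. This helper is a hypothetical sketch.
    """
    func = available_functions.get(name)
    if func is None:
        return json.dumps({"error": f"unknown function: {name}"})
    try:
        kwargs = json.loads(arguments)
        return func(**kwargs)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that do not match the signature.
        return json.dumps({"error": str(exc)})


def get_current_weather(location, unit="fahrenheit"):
    """Stand-in for the function defined earlier."""
    return json.dumps({"location": location, "temperature": "22", "unit": unit})


functions = {"get_current_weather": get_current_weather}
print(run_tool_call("get_current_weather", '{"location": "Chicago"}', functions))
print(run_tool_call("get_forecast", "{}", functions))
```

The returned string can be used directly as the `content` of the `tool` message, so the second request still has one result per tool call.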

---

### Streaming and Parallel Calls

> Source: https://developers.telnyx.com/docs/inference/streaming-functions.md

In the [previous tutorial](/docs/inference/functions), we learned the basics for defining and executing functions using our [chat completions API](/api-reference/openai-chat/create-a-chat-completion-openai-compatible).

In this tutorial, we will introduce more advanced use cases:
- Streaming function calls
- Passing multiple functions
- Executing function calls in parallel

For low-latency contexts, streaming and parallel calls are especially helpful.

## Defining our functions

First, we will define two functions we want to execute in parallel: `sleep` and `dream`.

Our goal is to use the `dream` function to make an API call to the Telnyx chat completions endpoint while we `sleep`.

We will also re-use the `func_to_tool` helper function we defined in the [previous tutorial](/docs/inference/functions) to easily convert between our Python functions and the JSON we need to pass to the `tools` field for our chat completions API.

Make sure you have set the `TELNYX_API_KEY` environment variable.

```python
import asyncio
import inspect
import json
import os
from openai import AsyncOpenAI
from pydantic import create_model

# Configuration
API_KEY = os.getenv("TELNYX_API_KEY")
BASE_URL = "https://api.telnyx.com/v2/ai/openai"
MODEL = "zai-org/GLM-5.1-FP8"

client = AsyncOpenAI(api_key=API_KEY, base_url=BASE_URL)

async def sleep(seconds: int):
    """Sleep for a given number of seconds."""
    await asyncio.sleep(seconds)
    return f"I slept for {seconds} seconds!"

async def dream(subject: str):
    """Dream about a given subject."""
    chat_completion = await client.chat.completions.create(
        model=MODEL,
        messages=[
            {
                "role": "user",
                "content": f"BRIEFLY (one sentence max) describe a dream about {subject}"
            }
        ]
    )
    return chat_completion.choices[0].message.content

def func_to_tool(f):
    """Convert a function to a tool JSON schema."""
    kw = {
        n: (o.annotation, ... if o.default == inspect.Parameter.empty else o.default)
        for n, o in inspect.signature(f).parameters.items()
    }
    schema = create_model(f.__name__, **kw).model_json_schema()
    tool_json = {
        "type": "function",
        "function": {
            "name": schema["title"],
            "description": inspect.getdoc(f),
            "parameters": schema
        }
    }
    return tool_json
```

## Parsing Streaming Tools + Executing Tasks in Parallel
Next we will define a few functions to help us parse and execute tasks in parallel.

### handle_tool_calls
The `handle_tool_calls` function will iterate over streamed chunks from the chat completions endpoint. The language model may invoke multiple tool calls to be executed in parallel and will differentiate them using the `index` attribute on the chunk.

As we progress through the stream, we will build our local copy of this list of function calls in the `tool_calls` list.

The first chunk of a new tool call will contain the `name` of the function. This enables you to give early feedback to users that a function will be executed. In this example, we simply print the name of the function when it is detected.

As we build the arguments from the streamed chunks, we attempt to parse what we have built as JSON. Once we have a valid JSON object, we create an async task to be scheduled for execution (if we have not already done so).

**NB: Telnyx guarantees valid JSON is returned for tool calls, so you don't have to worry about lengthy retries or fuzzy matching.**

### execute_tasks
This function executes the tasks from the previous function and returns the results as they are completed, enabling users to receive feedback as soon as possible.

### func_wrapper
This is a trivial helper function that exposes the tool call ID and function name to `execute_tasks`.

```python
async def func_wrapper(func, tool_call_id, **kwargs):
    """Wrap a function to return its ID + name when executed."""
    result = await func(**kwargs)
    return tool_call_id, func.__name__, result

async def execute_tasks(tasks):
    """Execute asynchronous tasks and collect their results."""
    results = []
    for task in asyncio.as_completed(tasks):
        tool_call_id, func_name, result = await task
        print(f"Executed {func_name}, results: {result}")
        results.append(
            {
                "tool_call_id": tool_call_id,
                "role": "tool",
                "name": func_name,
                "content": result,
            }
        )
    return results

async def handle_tool_calls(chat_completion, function_map):
    """Handle streaming tool calls from chat completion."""
    tool_calls = []
    tasks = []
    tasked_tool_ids = set()

    async for chunk in chat_completion:
        delta = chunk.choices[0].delta
        if delta and delta.tool_calls:
            # We have detected tool calls from the LLM
            tcchunklist = delta.tool_calls
            for tcchunk in tcchunklist:
                index = tcchunk.index or 0
                if len(tool_calls) <= index:
                    # Based on the index, we have a new tool call
                    tool_calls.append(
                        {
                            "id": "",
                            "type": "function",
                            "function": {
                                "name": "",
                                "arguments": ""
                            }
                        }
                    )
                tc = tool_calls[index]

                if tcchunk.id:
                    tc["id"] += tcchunk.id
                if tcchunk.function.name:
                    tc["function"]["name"] += tcchunk.function.name
                    print(f"Detected function: {tcchunk.function.name}")
                if tcchunk.function.arguments:
                    tc["function"]["arguments"] += tcchunk.function.arguments
                    try:
                        kwargs = json.loads(tc["function"]["arguments"])
                    except json.JSONDecodeError:
                        # We don't have the full arguments JSON yet
                        continue
                    else:
                        if tc["id"] not in tasked_tool_ids:
                            func_name = tc["function"]["name"]
                            print(f"Executing {func_name} with {kwargs}")
                            wrapped_func = func_wrapper(function_map[func_name], tc["id"], **kwargs)
                            task = asyncio.create_task(wrapped_func)
                            tasks.append(task)
                            tasked_tool_ids.add(tc["id"])

    return tool_calls, tasks
```
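
To see the accumulation step in isolation, the sketch below replays a hand-written list of fragments instead of a live stream. The `(index, name, arguments)` tuples are a simplification invented for this example; real chunks carry the same fields on `delta.tool_calls`.

```python
import json


def accumulate(fragments):
    """Rebuild tool calls from streamed fragments, keyed by index.

    Returns (name, kwargs) pairs in the order their arguments first
    parsed as complete JSON.
    """
    calls = {}    # index -> {"name": str, "arguments": str}
    ready = []    # (name, kwargs) pairs, in completion order
    done = set()  # indexes that have already been emitted
    for index, name, arguments in fragments:
        call = calls.setdefault(index, {"name": "", "arguments": ""})
        call["name"] += name or ""
        call["arguments"] += arguments or ""
        if index in done or not call["arguments"]:
            continue
        try:
            kwargs = json.loads(call["arguments"])
        except json.JSONDecodeError:
            continue  # arguments are still incomplete
        ready.append((call["name"], kwargs))
        done.add(index)
    return ready


# Hand-written fragments: two tool calls, arguments split across chunks.
fragments = [
    (0, "sleep", ""),
    (0, None, '{"seconds"'),
    (0, None, ": 10}"),
    (1, "dream", ""),
    (1, None, '{"subject": "Tel'),
    (1, None, 'nyx"}'),
]
print(accumulate(fragments))
```

Each call becomes ready on the fragment that closes its JSON object, which is the point where `handle_tool_calls` schedules the task.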

## Putting it all together

With our helper functions defined, we are ready to stream and execute multiple function calls in parallel. In this code, we:
- Ask the language model to `sleep` and `dream` at the same time
- Execute the returned tool calls in parallel
- Provide the results back to the language model and get a final response

```python
async def main():
    prompt = "Take a quick 10 second power nap and dream about Telnyx. Then write a haiku about it!"
    messages = [{"role": "user", "content": prompt}]
    print(f"Prompt: {prompt}")
    
    functions = [sleep, dream]
    function_map = {f.__name__: f for f in functions}
    tools = [func_to_tool(func) for func in functions]

    chat_completion = await client.chat.completions.create(
        model=MODEL,
        messages=messages,
        tools=tools,
        tool_choice="required",
        stream=True
    )

    tool_calls, tasks = await handle_tool_calls(chat_completion, function_map)

    messages.append(
        {
            "role": "assistant",
            "tool_calls": tool_calls,
        }
    )

    task_results = await execute_tasks(tasks)
    messages.extend(task_results)

    print("Sending results back to LLM...")
    print()
    second_chat_completion = await client.chat.completions.create(
        model=MODEL,
        messages=messages,
        stream=True,
    )

    # GLM-5.1 is a reasoning model: stream reasoning_content (its thinking) and
    # content (the final answer) separately. Non-reasoning models omit the former.
    async for chunk in second_chat_completion:
        delta = chunk.choices[0].delta
        if getattr(delta, "reasoning_content", None):
            print(delta.reasoning_content, end="", flush=True)
        if delta.content:
            print(delta.content, end="", flush=True)
    print()

if __name__ == "__main__":
    asyncio.run(main())
```

The output of the print statements in this script will look something like this.

Notice that `sleep` was detected and executed first, but `dream` still returned results first.

```
Prompt: Take a quick 10 second power nap and dream about Telnyx. Then write a haiku about it!
Detected function: sleep
Executing sleep with {'seconds': 10}
Detected function: dream
Executing dream with {'subject': 'Telnyx'}
Executed dream, results: In my dream, I was walking through a futuristic cityscape where Telnyx's logo was emblazoned on skyscrapers, and I could hear the hum of millions of concurrent voice calls and messages being transmitted seamlessly through their network.
Executed sleep, results: I slept for 10 seconds!
Sending results back to LLM...

Here is a haiku about Telnyx:

Telnyx city glows
Voices whisper through the air
Connected we stand
```
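
That ordering comes from `asyncio.as_completed`, which yields tasks as they finish rather than in the order they were created. A minimal offline sketch, with a hypothetical `fake_tool` coroutine and short delays standing in for `sleep` and `dream`:

```python
import asyncio


async def fake_tool(name, delay):
    """Stand-in for a tool call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return name


async def run_in_completion_order():
    # "sleep" is scheduled first but takes longer than "dream".
    tasks = [
        asyncio.create_task(fake_tool("sleep", 0.2)),
        asyncio.create_task(fake_tool("dream", 0.05)),
    ]
    finished = []
    for next_done in asyncio.as_completed(tasks):
        finished.append(await next_done)
    return finished


order = asyncio.run(run_in_completion_order())
print(order)
```

If you need results in call order instead, `asyncio.gather(*tasks)` preserves the order of its arguments at the cost of waiting for the slowest task.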

---

### PR Reviewer - GitHub Action

> Source: https://developers.telnyx.com/docs/inference/pr-reviewer.md

## Introduction

Welcome to the PR Reviewer by Telnyx GitHub Action! This guide will teach you how to set up and use the PR Reviewer By Telnyx, which leverages open-source language models running on Telnyx GPUs to automatically review your pull requests.

## Prerequisites

- [Sign up for a free Telnyx account](https://telnyx.com/sign-up) if you haven't already.

## Setup guide

### Step 1: Obtain Your Telnyx API Key

1. Log in to your [Telnyx account](https://portal.telnyx.com/).
2. Navigate to the **API Keys** section in the Telnyx portal.
3. Click on **Create API Key**.
4. Copy the generated API key and store it in a secure location.

### Step 2: Add Your Telnyx API Key as a Secret on GitHub

1. In your GitHub repository, go to **Settings** > **Secrets and variables** > **Actions**.
2. Click on **New repository secret**.
3. Name the secret `TELNYX_API_KEY`.
4. Paste your Telnyx API key in the **Value** field and click **Add secret**.

### Step 3: Create the GitHub workflow file

To integrate the Telnyx PR Reviewer into your project, follow these steps:

1. In your repository, create a new file at `.github/workflows/review_pr.yml`.
2. Copy and paste the following configuration into the file:

   ```yaml
   name: PR Review

   on:
     pull_request:
       types: [opened, synchronize, reopened]

   permissions:
     pull-requests: write

   jobs:
     review:
       runs-on: ubuntu-latest

       steps:
         - name: PR Review
           uses: team-telnyx/reviewpr@main
           with:
             telnyx_api_key: ${{ secrets.TELNYX_API_KEY }}
             model_name: "zai-org/GLM-5.1-FP8"
   ```

3. Commit the file to your repository.

### Step 4: Optional Configuration

The `model_name` parameter in the workflow file is optional. If omitted, the action will use a default language model. If you wish to specify a different model, replace `"zai-org/GLM-5.1-FP8"` with your desired model from the Telnyx [LLM Library](https://telnyx.com/products/llm-library).

## Core Concepts

### GitHub Actions

GitHub Actions automate workflows directly in your GitHub repository. In this case, the PR Reviewer By Telnyx is triggered by pull request events, such as when a PR is opened or updated.

### Telnyx Inference API

The PR Reviewer By Telnyx uses the Telnyx Inference API to analyze and review the content of pull requests. This API allows interaction with large language models (LLMs) hosted on Telnyx infrastructure.

### Model Selection

Your choice of LLM will affect the quality and behavior of the reviews. You can experiment with different models from the Telnyx [LLM Library](https://telnyx.com/products/llm-library) to find the best fit for your project.

### Automatic PR Reviews

Once configured, the PR Reviewer By Telnyx automatically generates a review for every pull request based on the content, providing suggestions or feedback powered by the chosen language model.

## Not sure how to get started?

| I want to...                    | Relevant Tutorial                                                  |
| :------------------------------ | :----------------------------------------------------------------- |
| Learn more about GitHub Actions | [GitHub Actions Documentation](https://docs.github.com/en/actions) |
| Explore more Telnyx models      | [Telnyx LLM Library](https://telnyx.com/products/llm-library)      |

## Additional references

- Dive into our [Telnyx Inference API documentation](https://developers.telnyx.com/docs/inference)
- Explore our full [API reference](/api-reference/openai-chat/create-a-chat-completion-openai-compatible)
- Review our [OpenAI Compatibility Matrix](/docs/inference/openai)
- Check out our [pricing page](https://telnyx.com/pricing/inference-api)

---

### AI SMS Outfit Recommender with OpenMeteo

> Source: https://developers.telnyx.com/docs/inference/ai-outfit-recommender.md

Today we will be making a fun script to text us a nice recommendation for outfits based on the weather every morning. It will look something like this at the end:

![Weather Recommendation Screenshot](/img/telnyx-weather-sms-rec.jpg)

This project has 3 main components:
1. Check the weather using the free OpenMeteo API
2. Pass the weather info to Telnyx Inference using the model of our choice
3. Send the recommendation to the user using Telnyx SMS

Let's get started!

## Checking the weather with OpenMeteo
[OpenMeteo](https://open-meteo.com/) is a great free API that allows you to retrieve the forecast for the current day. They have a ton of options for what you can retrieve, but for this demo, we will stick with just the temperature and weather code (although feel free to experiment with other things like humidity!).

The following functions can be used to retrieve a `weather_description` that we can feed into our Telnyx Inference model:

```python
import requests


def get_weather(latitude, longitude):
    # OpenMeteo needs no API key, only coordinates.
    url = f"https://api.open-meteo.com/v1/forecast?latitude={latitude}&longitude={longitude}&current=temperature_2m,weathercode&temperature_unit=fahrenheit&timezone=auto"
    response = requests.get(url, timeout=10)

    # Parse the body only after confirming the request succeeded.
    if response.status_code == 200:
        current = response.json()["current"]
        temperature = current["temperature_2m"]
        weathercode = current["weathercode"]
        weather_description = get_weather_description(weathercode)

        return f"Temperature: {temperature}°F, {weather_description}"
    else:
        return "Failed to fetch weather data"

def get_weather_description(code):
    weather_codes = {
        0: "Clear sky",
        1: "Mainly clear",
        2: "Partly cloudy",
        3: "Overcast",
        45: "Fog",
        48: "Depositing rime fog",
        51: "Light drizzle",
        53: "Moderate drizzle",
        55: "Dense drizzle",
        61: "Slight rain",
        63: "Moderate rain",
        65: "Heavy rain",
        71: "Slight snow fall",
        73: "Moderate snow fall",
        75: "Heavy snow fall",
        77: "Snow grains",
        80: "Slight rain showers",
        81: "Moderate rain showers",
        82: "Violent rain showers",
        85: "Slight snow showers",
        86: "Heavy snow showers",
        95: "Thunderstorm",
        96: "Thunderstorm with slight hail",
        99: "Thunderstorm with heavy hail",
    }
    return weather_codes.get(code, "Unknown")

```
It looks like a lot of code, but the majority is just converting the weather codes to a nicely formatted string that the model will be able to understand. Note also that OpenMeteo needs no API key, only a latitude and longitude, so we will use Chicago's coordinates, 41.9 and -87.6, for this example.
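
If you prefer not to assemble the query string by hand, the same request can be described as a parameter dictionary. This is a sketch using only the standard library; `build_forecast_url` is a name chosen for this example, and the parameters are the ones already used in `get_weather`.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.open-meteo.com/v1/forecast"


def build_forecast_url(latitude, longitude):
    """Return the forecast URL used by get_weather (hypothetical helper)."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "current": "temperature_2m,weathercode",
        "temperature_unit": "fahrenheit",
        "timezone": "auto",
    }
    # safe="," keeps the comma-separated `current` list readable.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


print(build_forecast_url(41.9, -87.6))
```

With `requests`, passing the same dictionary as `params=` to `requests.get` has a similar effect.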

## Getting our recommendation text from Telnyx Inference

Now that we have the current weather, let's make a call to Telnyx Inference to generate a text to send to our user. There are [many state-of-the-art open source models available through the Telnyx API](https://telnyx.com/products/llm-library), so for this one we will select [GLM-5.1-FP8 from Zhipu AI](https://developers.telnyx.com/docs/inference/models).

Let's write a function to retrieve a good outfit recommendation from Telnyx Inference. We will use the `requests` library to avoid adding another pip requirement, but the Telnyx LLM API is also compatible with the OpenAI Python and JS SDKs; see the [OpenAI Migration Guide](https://developers.telnyx.com/docs/inference/openai).

```python
import json
import os

import requests


def get_clothing_recommendation(weather_data):
    url = "https://api.telnyx.com/v2/ai/openai/chat/completions"
    payload = json.dumps(
        {
            "messages": [
                {
                    "role": "system",
                    "content": "You are a helpful assistant that texts me every morning with a brief outfit recommendation based on the weather. Be friendly and brief.",
                },
                {
                    "role": "user",
                    "content": f"The weather today is: {weather_data}. What should I wear?",
                },
            ],
            "model": "zai-org/GLM-5.1-FP8",
            "max_tokens": 100,
        }
    )
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": f"Bearer {os.getenv('TELNYX_API_KEY')}",
    }
    response = requests.post(url, headers=headers, data=payload)
    return response.json()["choices"][0]["message"]["content"]
```

Make sure `TELNYX_API_KEY` is set in your environment so that it can be loaded with `os.getenv('TELNYX_API_KEY')`. If you would prefer to use HTTP requests instead of the OpenAI client, or want to experiment with some of the LLM parameters offered, see the [API spec for chat completions](/api-reference/openai-chat/create-a-chat-completion-openai-compatible).
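
`get_clothing_recommendation` indexes straight into the response body, so an error payload would surface as a `KeyError`. A slightly more defensive extraction is sketched below. `extract_content` is a hypothetical helper, and the error body shown in the usage is an assumption for illustration only; check the API reference for the actual error format.

```python
def extract_content(body):
    """Return the assistant text from a chat completion body, or raise."""
    choices = body.get("choices") if isinstance(body, dict) else None
    if not choices:
        raise ValueError(f"no choices in response: {body!r}")
    content = choices[0].get("message", {}).get("content")
    if not content:
        raise ValueError("response contained no message content")
    return content.strip()


# Illustrative bodies only, not captured from the API.
ok_body = {"choices": [{"message": {"role": "assistant", "content": " Wear a jacket. "}}]}
print(extract_content(ok_body))

try:
    extract_content({"errors": [{"detail": "hypothetical error"}]})
except ValueError as exc:
    print(f"Request failed: {exc}")
```

In `get_clothing_recommendation`, this would replace the final line with `return extract_content(response.json())`.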

The output of `get_clothing_recommendation` will be a string with our recommendation, for example:

"Good morning. Perfect day ahead. Why not try a light, pastel-colored short-sleeved shirt, paired with some beige or light-gray shorts? Add some loafers or sneakers, and you're all set for a sunny day. Have a great one!"

## Sending our text to the user

Great! Now that we have our weather and text recommendation, we can send the text to the user. Sending a message with Telnyx SMS is easy; if you have not set up a Telnyx number yet, follow [this tutorial](https://developers.telnyx.com/docs/messaging/messages/send-message) first. We can use the following snippet to send a text using Telnyx:

```python
import os

import telnyx
telnyx.api_key = os.getenv("TELNYX_API_KEY")

def send_sms(to_number, message):
    return telnyx.Message.create(
        from_=os.getenv("TELNYX_PHONE_NUMBER"),
        to=to_number,
        text=message,
    )
```
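
Before sending, it can help to check that the destination looks like an E.164 number (a `+` followed by up to 15 digits), which is the format the placeholder `+1YOUR_DESTINATION_NUMBER` in the final script stands for. `is_e164` is a hypothetical helper for this guide; it checks shape only, not whether the number exists or can receive SMS.

```python
import re

# "+" then a non-zero leading digit, 2 to 15 digits in total.
E164_PATTERN = re.compile(r"^\+[1-9]\d{1,14}$")


def is_e164(number):
    """Return True if `number` has the shape of an E.164 phone number."""
    return bool(E164_PATTERN.match(number))


print(is_e164("+13125550100"))
print(is_e164("+1YOUR_DESTINATION_NUMBER"))
```

Calling `is_e164(to_number)` before `send_sms` lets the script fail early if the placeholder was never replaced.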

## Putting it all together

Now that we have all the pieces in place, let's run the script! We can use the following sequence to chain everything together:

```python
latitude = 41.8781  # Chicago latitude
longitude = -87.6298  # Chicago longitude

# Get weather data
print("Getting weather data...")
weather_description = get_weather(latitude, longitude)
print(f"Description received: {weather_description}")
to_number = "+1YOUR_DESTINATION_NUMBER"  # Example phone number

# Get clothing recommendation
print("Getting clothing recommendation...")
recommendation = get_clothing_recommendation(weather_description)
print(f"Recommendation received: {recommendation}")

full_text = f"{recommendation}\n\n{weather_description}"

# Send SMS
print("Sending SMS...")
res = send_sms(to_number, full_text)
print("SMS sent!")
```

And we see the following output:
```
Getting weather data...
Description received: Temperature: 81.8°F, Clear sky

Getting clothing recommendation...
Recommendation received: Good morning. Perfect day ahead. Why not try a light, pastel-colored short-sleeved shirt, paired with some beige or light-gray shorts? Add some loafers or sneakers, and you're all set for a sunny day. Have a great one!

Sending SMS...
SMS sent!
```

Great work! We now have a script that texts the user outfit recommendations based on the weather. To improve it, try other weather attributes such as humidity to influence the recommendations, or adjust the prompt so the model knows what you generally like to wear or what you have in your closet. You could also run this script automatically every day at a certain time using a cron job or the task scheduler of your choice. Thanks for following along!
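
If no scheduler is available, one alternative is a long-running Python process that sleeps until the next send time. The sketch below only computes the wait; `seconds_until` is a hypothetical helper, and it uses naive local datetimes, so it does not account for daylight saving changes.

```python
from datetime import datetime, timedelta


def seconds_until(hour, minute=0, now=None):
    """Seconds from `now` until the next occurrence of hour:minute (local time)."""
    now = now or datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # Today's slot has passed, so aim for tomorrow.
        target += timedelta(days=1)
    return (target - now).total_seconds()


# Example: at 06:30, the next 07:00 run is 30 minutes away.
print(seconds_until(7, now=datetime(2024, 1, 1, 6, 30)))
```

A daily loop would then call `time.sleep(seconds_until(7))` before each run of the script.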

---

## API Reference (Inference)

### OpenAI Chat

- [Create a chat completion (OpenAI-compatible)](https://developers.telnyx.com/api-reference/openai-chat/create-a-chat-completion-openai-compatible.md): Chat with a language model. This endpoint is consistent with the OpenAI Chat Completions API and may be used with the OpenAI JS or Python SDK by setting the ba…
- [Get available models (OpenAI-compatible)](https://developers.telnyx.com/api-reference/openai-chat/get-available-models-openai-compatible.md): Lists every model currently available to your account on Telnyx Inference, including SOTA open-source LLMs hosted on Telnyx GPUs (for example `moonshotai/Kimi-…
- [Create an OpenAI-compatible response](https://developers.telnyx.com/api-reference/openai-chat/create-an-openai-compatible-response.md): Create a response using Telnyx's OpenAI-compatible Responses API. This endpoint is compatible with the OpenAI Responses API and may be used with the OpenAI JS…

### Fine Tuning

- [List fine tuning jobs](https://developers.telnyx.com/api-reference/fine-tuning/list-fine-tuning-jobs.md): Retrieve a list of all fine tuning jobs created by the user.
- [Create a fine tuning job](https://developers.telnyx.com/api-reference/fine-tuning/create-a-fine-tuning-job.md): Create a new fine tuning job.
- [Get a fine tuning job](https://developers.telnyx.com/api-reference/fine-tuning/get-a-fine-tuning-job.md): Retrieve a fine tuning job by `job_id`.
- [Cancel a fine tuning job](https://developers.telnyx.com/api-reference/fine-tuning/cancel-a-fine-tuning-job.md): Cancel a fine tuning job.

### OpenAI Embeddings

- [Create embeddings](https://developers.telnyx.com/api-reference/openai-embeddings/create-embeddings.md): Creates an embedding vector representing the input text. This endpoint is compatible with the OpenAI Embeddings API and may be used with the OpenAI JS or Pytho…
- [List embedding models](https://developers.telnyx.com/api-reference/openai-embeddings/list-embedding-models.md): Returns a list of available embedding models. This endpoint is compatible with the OpenAI Models API format.

### Chat

- [Summarize file content](https://developers.telnyx.com/api-reference/chat/summarize-file-content.md): Generate a summary of a file's contents.