How to wire real PSTN phone calls into GetStream's Vision Agents framework using the Telnyx media streaming plugin for inbound and outbound AI voice agents.
The Telnyx plugin for Vision Agents adds real PSTN phone calling to GetStream's open voice AI framework. AI voice agents built with Vision Agents can receive inbound calls and place outbound calls on real phone numbers using Telnyx Call Control and bidirectional Media Streaming, with PCMU, PCMA, and L16 audio conversion built in. Install it with uv add "vision-agents[telnyx]" and run the inbound or outbound example to get a phone agent live in under ten minutes.
Vision Agents is an open source Python framework from GetStream for building voice and vision AI agents. It abstracts the messy parts of realtime AI: edge transport, WebRTC negotiation, model provider selection, audio framing, and conversation state. Developers pick a transport (Stream, local, Tencent), a realtime model (Gemini, OpenAI, Inworld, Qwen, xAI, AWS Bedrock), and an optional telephony provider. The framework wires them together so an agent can speak, hear, and reason across a single call session.
The repo lives at github.com/GetStream/Vision-Agents. Plugins extend the framework with carriers, model providers, and infrastructure. The Telnyx plugin is one of two telephony plugins in the official catalog, the other being Twilio.
The plugin exposes four primitives that together cover the full PSTN bridge:
- CallRegistry: tracks active calls, validation tokens, and optional async prepare tasks that pre-warm the agent and Stream call before the media WebSocket connects.
- TelnyxCall: dataclass for a call session with from_number, to_number, and await_prepare().
- MediaStream: WebSocket handler for Telnyx Media Streaming. Parses connected, start, media, stop, error, mark, and dtmf events, and exposes audio_track plus send_audio() for bidirectional audio.
- attach_phone_to_call: bridges Telnyx RTP audio bidirectionally to the Stream WebRTC call participant.

Audio conversion helpers handle PCMU and PCMA at 8 kHz for default inbound, plus L16 at 16 kHz for bidirectional RTP. The plugin source, tests, and examples total roughly 2,700 lines across 19 files, and the PR merged into main on June 24, 2026 as PR #594.
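To make the registry idea concrete, here is a minimal asyncio sketch of what tracking calls, validation tokens, and prepare tasks involves. SketchRegistry and PendingCall are hypothetical names, not the plugin's CallRegistry or TelnyxCall, and every method signature here is an assumption; only the overall shape (a per-call token, an optional pre-warm task, await_prepare()) follows the description above.

```python
import asyncio
import secrets
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class PendingCall:
    """Hypothetical stand-in for a TelnyxCall-style session record."""
    call_id: str
    from_number: str
    to_number: str
    token: str
    prepare_task: Optional[asyncio.Task] = None

    async def await_prepare(self) -> None:
        # Block until the pre-warm work (agent + Stream call setup) finishes.
        if self.prepare_task is not None:
            await self.prepare_task


class SketchRegistry:
    """Illustrative registry: tracks calls, tokens, and prepare tasks."""

    def __init__(self) -> None:
        self._calls: dict[str, PendingCall] = {}

    def register(
        self,
        call_id: str,
        from_number: str,
        to_number: str,
        prepare: Optional[Callable[[], Awaitable[None]]] = None,
    ) -> PendingCall:
        # The token is embedded in the media WebSocket URL, so only a party
        # that was handed that URL can attach audio to this call.
        call = PendingCall(call_id, from_number, to_number, secrets.token_urlsafe(16))
        if prepare is not None:
            # Start pre-warming now, before Telnyx opens the media WebSocket.
            # create_task needs a running event loop, so call this from async code.
            call.prepare_task = asyncio.create_task(prepare())
        self._calls[call_id] = call
        return call

    def validate(self, call_id: str, token: str) -> Optional[PendingCall]:
        # Constant-time comparison avoids leaking token prefixes via timing.
        call = self._calls.get(call_id)
        if call is None or not secrets.compare_digest(call.token, token):
            return None
        return call

    def remove(self, call_id: str) -> None:
        # Drop the call and cancel any pre-warm work that is still running.
        call = self._calls.pop(call_id, None)
        if call and call.prepare_task and not call.prepare_task.done():
            call.prepare_task.cancel()
```

Starting the prepare task at registration time is the point of the design: agent and Stream call setup overlap with the carrier's call setup instead of running after the WebSocket connects.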

A browser-based voice agent is fine for a hackathon. Production voice AI has different constraints.
Inbound at scale. Customer-facing voice agents need real phone numbers with E.164 routing, carrier-grade termination, and answer supervision. A user dials a 1-800 number, the call lands on a Telnyx Call Control App, the webhook fires, the agent answers. This is the same call path a contact center uses.
Outbound at scale. Outbound voice AI use cases, such as appointment reminders, IVR replacement, debt recovery, and AI-driven follow-up, all need a programmable dial API with a real from number, a connection_id, and media streaming back to the agent. The Vision Agents plugin uses the Telnyx Call Control Dial API with stream_url set to a public WebSocket URL.
Co-located inference. Telnyx runs STT and TTS inference on its own GPU infrastructure. For voice AI, that means the speech pipeline sits close to the call path instead of bouncing across regions: lower latency, more predictable jitter, and fewer cold starts. This matters because every 100 ms of added voice latency shows up as conversational awkwardness.
Webhook signature verification. Telnyx signs every webhook with an Ed25519 key. The plugin's example helpers include parse_verified_telnyx_webhook, which verifies the Telnyx-Signature-Ed25519 header (together with Telnyx-Timestamp) before any call is registered. This is application-level code rather than a public API of the plugin, but because the pattern ships in the examples, production callers have a copy-paste path.
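For readers who want to see what such a check involves, here is a minimal sketch. It is not the plugin's parse_verified_telnyx_webhook. It assumes Telnyx's documented telnyx-signature-ed25519 and telnyx-timestamp headers, a signed message of the form "<timestamp>|<raw body>", a base64-encoded public key (TELNYX_PUBLIC_KEY), and a 300-second replay window that you should tune yourself. The Ed25519 check uses the third-party cryptography package, imported only once the cheap header and timestamp checks pass.

```python
import base64
import time
from typing import Mapping, Optional

# Assumed header names, per Telnyx's webhook-signing documentation.
SIGNATURE_HEADER = "telnyx-signature-ed25519"
TIMESTAMP_HEADER = "telnyx-timestamp"


def signed_payload(timestamp: str, body: bytes) -> bytes:
    # The signature covers "<timestamp>|<raw request body>".
    return timestamp.encode("ascii") + b"|" + body


def is_fresh(timestamp: str, now: float, tolerance_s: int = 300) -> bool:
    # Reject replayed webhooks whose timestamp is too far from "now".
    try:
        ts = int(timestamp)
    except ValueError:
        return False
    return abs(now - ts) <= tolerance_s


def verify_telnyx_webhook(
    public_key_b64: str,
    headers: Mapping[str, str],
    body: bytes,
    now: Optional[float] = None,
    tolerance_s: int = 300,
) -> bool:
    """Return True only if the webhook is fresh and correctly signed."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    signature_b64 = lowered.get(SIGNATURE_HEADER)
    timestamp = lowered.get(TIMESTAMP_HEADER)
    if not signature_b64 or not timestamp:
        return False
    if not is_fresh(timestamp, time.time() if now is None else now, tolerance_s):
        return False

    # Deferred import: the dependency is only needed for the signature check.
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

    try:
        key = Ed25519PublicKey.from_public_bytes(base64.b64decode(public_key_b64))
        key.verify(base64.b64decode(signature_b64), signed_payload(timestamp, body))
    except (InvalidSignature, ValueError):  # binascii.Error is a ValueError
        return False
    return True
```

Verify against the raw request bytes, not a re-serialized JSON object. Re-encoding can change whitespace or key order and break the signature.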
The Telnyx plugin is intentionally parallel to the existing Twilio plugin in Vision Agents. Same public surface (CallRegistry, MediaStream, attach_phone_to_call), same Stream edge transport underneath, same examples structure (inbound and outbound). Developers who already know the Twilio plugin can swap carriers without rewriting their agent code.
| Capability | Telnyx Plugin | Twilio Plugin |
|---|---|---|
| PSTN calling | Telnyx Call Control | Twilio Voice API |
| Media transport | Telnyx Media Streaming (WebSocket) | Twilio Media Streams (WebSocket) |
| Audio codecs | PCMU, PCMA, L16 | PCMU, L16 |
| Webhook signatures | Ed25519 (TELNYX_PUBLIC_KEY) | Twilio signature header |
| Inference proximity | STT/TTS on Telnyx GPUs | Provider-dependent |
| Outbound dial | Call Control Dial API with connection_id | Twilio REST Dial |
| Self-serve credits | Telnyx promo codes supported | Twilio trial credits |
The choice between them usually comes down to which carrier you already have a relationship with, which regions you need coverage in, and whether you want your speech inference on the same network as your call path.
Before you run the examples, you need:
- A Telnyx API key (TELNYX_API_KEY).
- A Telnyx phone number (TELNYX_PHONE_NUMBER).
- A Telnyx Call Control App. It receives the call.initiated, call.answered, and call.hangup webhooks, and its ID is the connection_id used by the outbound Dial API.
- The Telnyx public key (TELNYX_PUBLIC_KEY) for webhook signature verification in production.
- Stream API credentials (STREAM_API_KEY, STREAM_API_SECRET) for the edge transport.
- A Google API key (GOOGLE_API_KEY) for the Gemini Realtime model used in the default examples.

A common setup mistake is using a forwarding-only phone number connection instead of a Call Control App. The plugin explicitly validates this in its preflight checks and tells you what is missing.
From the Vision Agents repo root, or any project that already uses the framework, add the Telnyx extra with uv add "vision-agents[telnyx]".
Or install the plugin as a standalone package:
Create a .env at the repo root with the variables from the prerequisites section. The examples load it automatically with python-dotenv.
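Before starting the server, it can help to fail fast on a half-filled .env. The sketch below is a hypothetical helper, not the plugin's preflight: it checks the variable names from the prerequisites list and applies a basic E.164 shape check to TELNYX_PHONE_NUMBER. The Call Control App ID is not checked because the prerequisites do not name an environment variable for it.

```python
import os
import re
from typing import Mapping

REQUIRED_VARS = (
    "TELNYX_API_KEY",
    "TELNYX_PHONE_NUMBER",
    "STREAM_API_KEY",
    "STREAM_API_SECRET",
    "GOOGLE_API_KEY",
)
# Only strictly needed once webhook signatures are verified (production).
PRODUCTION_VARS = ("TELNYX_PUBLIC_KEY",)

# E.164: a leading "+", then at most 15 digits, the first of which is not 0.
E164 = re.compile(r"\+[1-9]\d{1,14}")


def check_env(env: Mapping[str, str], production: bool = False) -> list[str]:
    """Return human-readable problems; an empty list means ready to run."""
    names = REQUIRED_VARS + (PRODUCTION_VARS if production else ())
    problems = [f"missing {name}" for name in names if not env.get(name)]
    number = env.get("TELNYX_PHONE_NUMBER", "")
    if number and not E164.fullmatch(number):
        problems.append(f"TELNYX_PHONE_NUMBER {number!r} is not E.164 (e.g. +15551234567)")
    return problems


if __name__ == "__main__":
    # Report problems in the current process environment.
    for problem in check_env(os.environ):
        print(problem)
```

In the examples' setup you would call check_env(os.environ) after python-dotenv has loaded the .env file.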
Start ngrok:
Run the inbound example with the --setup-telnyx flag. The example creates a temporary Call Control App, sets the webhook URL to your ngrok tunnel, routes your Telnyx number to the app, and cleans up on shutdown:
Dial your Telnyx number from any phone. The webhook fires, the call is registered, the agent answers, and you should hear the Gemini Realtime model respond. Hang up to end the call. The example restores your previous number routing on shutdown.
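Under the hood, "the webhook fires, the call is registered, the agent answers" boils down to reading a call.initiated event and issuing an answer command that points Telnyx at the media WebSocket. The sketch below builds that request without sending it. The webhook field names (data.event_type, data.payload.call_control_id, from, to, direction), the /v2/calls/{call_control_id}/actions/answer endpoint, and the stream_url and stream_bidirectional_mode body fields are assumptions based on Telnyx's Call Control API docs, not the plugin's code. InboundCall and both functions are hypothetical.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Optional

API_BASE = "https://api.telnyx.com/v2"


@dataclass
class InboundCall:
    call_control_id: str
    from_number: str
    to_number: str


def parse_call_initiated(body: bytes) -> Optional[InboundCall]:
    """Return an InboundCall for incoming call.initiated events, else None."""
    data = json.loads(body).get("data", {})
    payload = data.get("payload", {})
    if data.get("event_type") != "call.initiated" or payload.get("direction") != "incoming":
        return None
    return InboundCall(payload["call_control_id"], payload["from"], payload["to"])


def build_answer_request(api_key: str, call: InboundCall, stream_url: str) -> urllib.request.Request:
    """Build (but do not send) an answer command that starts media streaming."""
    # stream_url is the article's wss://<NGROK_URL>/telnyx/media/{call_id}/{token}.
    body = {"stream_url": stream_url, "stream_bidirectional_mode": "rtp"}
    return urllib.request.Request(
        f"{API_BASE}/calls/{call.call_control_id}/actions/answer",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is a urllib.request.urlopen(req) call. Verify the webhook signature before doing any of this.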
Without --setup-telnyx, the example runs preflight checks against your existing Call Control App and exits with a clear error if the webhook URL does not match, the number is not routed, or the app is not active.
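The three preflight conditions reduce to simple comparisons once the app and number records have been fetched. Here is a hypothetical sketch of that comparison step, not the example's actual code. The field names (active, webhook_event_url, id on the Call Control App, connection_id on the phone number) are assumptions based on Telnyx's API objects, and fetching the records is left out.

```python
from typing import Any, Mapping


def preflight_problems(
    app: Mapping[str, Any],
    number: Mapping[str, Any],
    expected_webhook_url: str,
) -> list[str]:
    """Compare fetched Telnyx records against what the example needs."""
    problems = []
    if not app.get("active", False):
        problems.append("Call Control App is not active")
    # Ignore a trailing slash so equivalent URLs compare equal.
    actual = (app.get("webhook_event_url") or "").rstrip("/")
    if actual != expected_webhook_url.rstrip("/"):
        problems.append(f"webhook URL is {actual or '<unset>'!r}, expected {expected_webhook_url!r}")
    # A number is routed to the app when its connection_id is the app's ID.
    if number.get("connection_id") != app.get("id"):
        problems.append("phone number is not routed to this Call Control App")
    return problems
```

Returning every problem at once, rather than stopping at the first, matches the goal of a single clear error that lists everything to fix.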
The outbound example starts the same FastAPI server, then places a call to a destination number you specify:
The outbound flow pre-registers a call ID in the registry, dials through the Call Control API with connection_id set to the Call Control App ID, and waits for Telnyx to connect to the WebSocket media endpoint. From there, the media handler is the same as inbound.
On restricted Telnyx accounts, verify the destination number before dialing. On unrestricted accounts, outbound works to any valid E.164 number.
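The dial step described above can be sketched as a single request. This builds, but does not send, a POST to Telnyx's /v2/calls endpoint with connection_id set to the Call Control App ID and stream_url pointing at the pre-registered media URL. The JSON field names follow Telnyx's Dial API docs as we understand them and should be treated as assumptions. build_dial_request is a hypothetical helper, not the plugin's code. The E.164 check only validates the number's shape, not whether a restricted account may call it.

```python
import json
import re
import urllib.request

# E.164: a leading "+", then at most 15 digits, the first of which is not 0.
E164 = re.compile(r"\+[1-9]\d{1,14}")


def build_dial_request(
    api_key: str,
    connection_id: str,
    to_number: str,
    from_number: str,
    stream_url: str,
) -> urllib.request.Request:
    """Build (but do not send) a Call Control Dial request."""
    for label, number in (("to", to_number), ("from", from_number)):
        if not E164.fullmatch(number):
            raise ValueError(f"{label} number {number!r} is not E.164")
    body = {
        "connection_id": connection_id,  # the Call Control App ID
        "to": to_number,
        "from": from_number,
        "stream_url": stream_url,  # wss://<NGROK_URL>/telnyx/media/{call_id}/{token}
        "stream_bidirectional_mode": "rtp",
    }
    return urllib.request.Request(
        "https://api.telnyx.com/v2/calls",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

Because the call ID and token are registered before dialing, the media WebSocket that Telnyx opens later can be matched to the waiting call immediately.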
Once an inbound call is answered or an outbound call is placed, Telnyx opens a WebSocket to your server at wss://<NGROK_URL>/telnyx/media/{call_id}/{token}. The plugin's MediaStream.accept() handles the WebSocket handshake, and stream.run() processes inbound Telnyx events until the call ends.
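To show roughly what processing those events means, here is a hypothetical per-frame handler, not MediaStream's implementation. It assumes Telnyx's JSON text frames carry an event field. A start frame is assumed to carry start.media_format with encoding and sample_rate, a media frame media.track and a base64 media.payload, and a dtmf frame dtmf.digit. These field names come from our reading of Telnyx's media streaming docs and should be checked against them. Treating error as terminal is a simplification.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StreamState:
    """Hypothetical per-connection state for one media WebSocket."""
    encoding: Optional[str] = None
    sample_rate: Optional[int] = None
    inbound_audio: bytearray = field(default_factory=bytearray)
    digits: list[str] = field(default_factory=list)
    stopped: bool = False


def handle_frame(state: StreamState, text: str) -> None:
    """Apply one Telnyx media-stream text frame to `state`."""
    msg = json.loads(text)
    event = msg.get("event")
    if event == "start":
        # The negotiated codec and rate decide how payloads get decoded.
        fmt = msg.get("start", {}).get("media_format", {})
        state.encoding = fmt.get("encoding")
        state.sample_rate = fmt.get("sample_rate")
    elif event == "media":
        media = msg.get("media", {})
        # Keep only caller audio; skip any outbound track echoed back.
        if media.get("track", "inbound") == "inbound":
            state.inbound_audio += base64.b64decode(media.get("payload", ""))
    elif event == "dtmf":
        state.digits.append(msg.get("dtmf", {}).get("digit", ""))
    elif event in ("stop", "error"):
        # Simplification: treat errors as terminal for this stream.
        state.stopped = True
    # "connected" and "mark" need no state change in this sketch.
```

A real loop would call handle_frame for each text message received on the WebSocket and hand the decoded audio to the codec layer as it arrives, instead of buffering it.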
Inbound audio from the PSTN caller arrives as PCMU payloads in media events. The plugin converts them to PCM at 8 kHz, feeds them to the Stream edge transport, and the realtime model receives them as the agent's audio input. Outbound audio from the agent goes the other way: PCM from the Stream call back through the plugin, converted to PCMU, and sent to Telnyx as outbound RTP frames via stream_bidirectional_mode=rtp.
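The PCMU leg of that conversion is G.711 μ-law companding. Below is a minimal pure-Python sketch of it, not the plugin's own helper. It decodes 8-bit μ-law to 16-bit little-endian PCM, encodes back, and slices outbound PCMU into 20 ms frames (160 bytes at 8 kHz) wrapped in media messages. The outbound message shape {"event": "media", "media": {"payload": ...}} is an assumption based on Telnyx's media streaming docs, and resampling between 8 kHz and the Stream call's sample rate is not shown.

```python
import base64
import json
import struct

BIAS = 0x84    # G.711 mu-law bias (132)
CLIP = 32635   # largest magnitude encoded before clipping
FRAME_BYTES = 160  # 20 ms of 8 kHz PCMU: 8000 samples/s * 0.02 s * 1 byte


def ulaw_to_linear(u: int) -> int:
    """Decode one mu-law byte to a signed 16-bit sample."""
    u = ~u & 0xFF  # mu-law bytes are stored bit-inverted
    sign, exponent, mantissa = u & 0x80, (u >> 4) & 0x07, u & 0x0F
    sample = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -sample if sign else sample


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit sample as a mu-law byte."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    # The exponent is the position of the highest set bit above bit 7.
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcmu_to_pcm16(data: bytes) -> bytes:
    samples = [ulaw_to_linear(b) for b in data]
    return struct.pack(f"<{len(samples)}h", *samples)


def pcm16_to_pcmu(data: bytes) -> bytes:
    count = len(data) // 2  # ignore a trailing odd byte
    samples = struct.unpack(f"<{count}h", data[: count * 2])
    return bytes(linear_to_ulaw(s) for s in samples)


def outbound_media_messages(pcmu: bytes) -> list[str]:
    """Split PCMU audio into 20 ms frames, one JSON media message each."""
    return [
        json.dumps({"event": "media", "media": {
            "payload": base64.b64encode(pcmu[i:i + FRAME_BYTES]).decode("ascii")}})
        for i in range(0, len(pcmu), FRAME_BYTES)
    ]
```

Per-sample Python loops are fine for illustration. A production bridge would more likely use 256-entry lookup tables or NumPy to keep the per-frame cost negligible.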

This bidirectional bridge is what makes the call feel like a real conversation: both sides hear each other in real time, and a human caller can interrupt the agent mid-sentence, which is the bar a production voice agent has to clear.