Voice AI agents don't have to interrupt humans on group calls. With Telnyx's Skip Turn, the AI tracks speakers and stays silent when not addressed.

Telnyx's Skip Turn tool gives its voice AI agents speaker awareness on multi-participant calls (calls where two or more humans share the line with the assistant). The assistant stays silent when the humans are talking to each other, then resumes when someone addresses it directly. Use cases include warm transfer to a specialist, scheduling between two humans, and live conferencing where the assistant captures the conversation.
Most voice AI is trained on calls with two parties: one human, one assistant. The assistant responds after every human turn because that's the only useful behavior in a 1:1 call.
Drop a third person into the call and the same behavior becomes a bug. A scheduling assistant inviting a colleague to compare availability talks over both of them. A support assistant connecting a customer to a specialist cuts in mid-explanation. A sales assistant looping in a technical lead on a discovery call answers every question pointed at the engineer.
The fix is not better silence detection or longer voice activity detection (VAD) thresholds. The assistant has to understand who is speaking and who they are addressing, then choose whether to respond, defer, or stay silent. That capability is speaker awareness.
Speaker awareness has three components.
First, identifying the current speaker. Most voice stacks treat audio as a single stream attributed to whichever participant has the loudest channel. Multi-participant calls require per-participant identity that is carried into the LLM context window, not handled only at the transcription layer.
Second, identifying who they are addressing. A speaker turning to another participant uses different language than a speaker addressing the assistant. "Can you check Thursday at 3?" is ambiguous out of context. With speaker context, the assistant can tell whether "you" means another human or the assistant itself.
Third, deciding whether to respond, defer, or stay silent. The assistant needs an explicit instruction for "I am not the addressee, do not generate a response this turn." Without that, the model defaults to its training behavior, which is to respond after every turn.
Skip Turn is a tool the assistant can call when it determines that a turn is between participants and not addressed to it. The assistant takes no spoken action. It does not end the call, mute itself, or disable any other tool. It simply chooses not to speak for that turn.
Skip Turn pairs with a feature called Keyterm Boost on the Voice tab. Keyterm Boost is a transcription setting that tells the speech-to-text engine to give extra weight to specific words you supply, so they get recognized more accurately when spoken. Adding the participant names to Keyterm Boost (e.g., Telnyx, Amber, Enzo) lifts transcription accuracy for those exact words, which makes name-based addressing rules more reliable. Keyterm Boost is supported by deepgram/flux and deepgram/nova-3.
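A transcription configuration with boosted keyterms might look like the sketch below. The field names are assumptions for illustration; in practice Keyterm Boost is set on the Voice tab in the Telnyx portal, with deepgram/flux or deepgram/nova-3 as the transcription model.

```python
# Hypothetical transcription config sketch. Field names are
# assumptions; Keyterm Boost is actually configured on the Voice tab.
transcription_config = {
    "model": "deepgram/nova-3",          # one of the models that supports keyterms
    "keyterms": ["Telnyx", "Amber", "Enzo"],  # brand + participant names
}
```

Boosting the names that addressing rules depend on ("Amber, book that...") means a misheard name is less likely to make the assistant skip a turn it was meant to take.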
The result is a call where the assistant invites a third party via the Invite tool, confirms the join, then stays silent while the participants compare notes. When one of them turns back and says "Amber, book that for 3pm," the assistant resumes and books the meeting.
Listen to a multi-participant call: the James + Enzo + Amber sample, embedded in the Multi-Participant Calls dev guide.
Most voice AI stacks treat turn-taking as a voice activity detection (VAD) problem layered on top of a 1:1 conversation model. The assistant decides when to speak based on silence thresholds, end-of-utterance detection, and barge-in heuristics. None of those signals encode who the speaker is or who they are addressing.
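The limitation is visible in how little information a silence-threshold detector actually produces. A minimal sketch of the kind of end-of-turn check most 1:1 stacks use (the threshold value is an illustrative assumption):

```python
# Minimal silence-threshold turn detector of the kind 1:1 voice stacks
# layer on top of VAD. Note its output: a bare boolean meaning "the
# audio went quiet," with no notion of who spoke or who was addressed.

END_OF_TURN_SILENCE_MS = 700  # illustrative threshold

def turn_ended(silence_ms: int) -> bool:
    return silence_ms >= END_OF_TURN_SILENCE_MS

# In a 1:1 call, True means "respond now." In a multi-participant call
# the same True fires in the gap between two humans' sentences, which
# is exactly where the assistant should stay silent.
print(turn_ended(800))  # True, even when the turn was human-to-human
```

No amount of threshold tuning fixes this, because the missing information (speaker identity and addressee) is never present in the signal.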
Public documentation across the major voice agent platforms (Vapi, Retell, Bland, ElevenLabs) shows the same shape today. Each is built around a 1:1 conversational call (one human, one assistant). Warm transfer to a human is typically supported by handing the call off and the agent leaving the line. Multi-party scenarios are handled outside the agent runtime, through raw call control APIs.
The major platforms do not describe a built-in capability for the assistant to remain on a call with two or more humans and stay silent during their cross-talk.
Telnyx ships speaker-aware turn-taking as a built-in capability, on a single network that handles call control and AI inference.
Three use cases show the pattern. The first is warm transfer with a third party staying on the line. A support assistant connects a customer to a specialist, stays on to document the resolution, and only resumes speaking to confirm the next step. With Skip Turn the assistant defers to the specialist's explanation instead of interrupting it.
The second is scheduling between two humans. An assistant invites a colleague to compare availability. The two humans negotiate ("how about Thursday?" "no, Friday works better") while the assistant waits. When they confirm, the assistant books the meeting and reads back the time.
The third is live negotiation with the assistant as scribe. A real estate agent and a counterpart agent talk. The assistant tracks the conversation, looks up comparable sales when asked, and confirms the final terms. The back-and-forth between the two humans stays clean.
These are three examples. Any workflow that puts two or more humans on a call with an AI assistant is a candidate for multi-participant voice AI.
Add a Skip Turn tool to your assistant and update instructions to describe when to defer. Full setup is in the Multi-Participant Calls dev guide. New to voice AI assistants? Start with the Voice Assistant Quickstart.
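The instruction update is the part you author yourself. The snippet below is one illustrative way to phrase the deferral rules; the wording, names, and rules are assumptions to adapt, not Telnyx-prescribed text.

```python
# Example system-instruction snippet describing when to defer.
# Wording and names are illustrative; adapt them to your assistant.
skip_turn_instructions = """\
You are Amber, an assistant on a call with multiple human participants.
- If an utterance names another participant, or continues an exchange
  between two humans, call the Skip Turn tool instead of speaking.
- Only respond when you are addressed by name ("Amber, ...") or when a
  request can only be fulfilled by you (e.g., booking a meeting).
- Never interrupt two humans negotiating with each other."""
```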
Optionally add participant names to Keyterm Boost on the Voice tab for higher transcription accuracy on those names. See the Transcription Settings guide for the full list of supported models.
Speaker awareness is not a feature added on top of an LLM. It is a capability that requires the call control layer and the AI inference layer to share state in real time. Telnyx ships it because both layers run on the same network, on the same stack, with sub-200ms round-trip latency end to end. This is what AI Agent Infrastructure looks like.
Add Skip Turn and Invite to your assistant in the Telnyx Mission Control Portal, or follow the Multi-Participant Calls developer guide.
Working on a multi-participant use case that doesn't fit the patterns above? Talk to sales.