ultravox-v0_4_1-llama-3_1-8b

Name: Ultravox v0.4.1 Llama 3.1 8B: Real-Time Voice AI Model
Brand: Telnyx
Price: 1 USD
Availability: InStock

A speech-language model from Fixie AI that pairs Llama 3.1 8B with a Whisper encoder, enabling direct audio understanding and speech-to-text reasoning.

Start building GET Available Models

about

Fixie AI's Ultravox replaces the traditional ASR-then-LLM pipeline by fusing a frozen Whisper large-v3-turbo encoder with Llama 3.1 8B through a trained multi-modal adapter. Audio embeddings are injected directly into the LLM's input space via a special <|audio|> pseudo-token, achieving roughly 150ms time-to-first-token on A100 hardware without requiring a separate transcription step.

LicenseMIT

Context window(in thousands)8000

Use cases for ultravox-v0_4_1-llama-3_1-8b

Low-latency voice agents: By fusing audio directly into the LLM embedding space, Ultravox eliminates the separate ASR step and achieves roughly 150ms time-to-first-token for spoken input.
Spoken language understanding: The Whisper encoder processes audio semantics rather than just transcription, enabling the model to interpret tone, emphasis, and intent alongside words.
Audio-grounded retrieval: The special audio token mechanism allows it to answer questions about spoken content without generating an intermediate transcript.

Quality

Arena EloN/A

MMLUN/A

MT BenchN/A

Ultravox v0.4 is a speech-language model, so standard text benchmarks like MMLU do not apply directly. Its Llama 3.1 8B backbone scores 69.4% on MMLU (5-shot), but the model's value is in audio processing: it achieves roughly 150ms time-to-first-token on spoken input by fusing Whisper and Llama without a separate ASR step, unlike traditional cascaded pipelines.

Claude-Opus-4-6

1501

GLM-5

1456

gpt-5.1

1455

Kimi-K2.5

1454

gpt-5.2

1440

pricing

The cost of running Ultravox v0.4 with Telnyx Inference is $0.0002 per 1,000 tokens for the text component. The Whisper encoder processes audio at $0.003 per minute. A voice agent handling 100,000 one-minute calls would cost approximately $300 for audio processing plus $200 for text generation.

What's Twitter saying?

Developers praise Ultravox's low latency and human-like conversations, with benchmarks showing it matches Whisper Large v3 + Llama 3.1 8B and outperforms GPT-4o Realtime in speech understanding and accuracy.
Tech reviewers highlight superior voice quality and vast options (over 1,000 models), calling it the best platform with amazing support and community.
Commentators note limited LLM customization and control as a downside, though its end-to-end audio understanding excels in real-time interaction.

Explore Our LLM Library

Discover the power and diversity of large language models available with Telnyx. Explore the options below to find the perfect model for your project.

Organizationdeepseek-ai

Model NameDeepSeek-R1-Distill-Qwen-14B

Taskstext generation

Languages SupportedEnglish

Context Length43,000

Parameters14.8B

Model Tiermedium

Licensedeepseek

Organization	Model Name	Tasks	Languages Supported	Context Length	Parameters	Model Tier	License
deepseek-ai	DeepSeek-R1-Distill-Qwen-14B	text generation	English	43,000	14.8B	medium	deepseek
fixie-ai	ultravox-v0_4_1-llama-3_1-8b	audio text-to-text	Multilingual	8,000	8.7B	small	mit
google	gemma-2b-it	text generation	English	8,192	2.5B	small	gemma
google	gemma-7b-it	text generation	English	8,192	8.5B	small	gemma
meta-llama	Llama-3.3-70B-Instruct	text generation	Multilingual	99,000	70.6B	large	llama3.3
meta-llama	Llama-Guard-3-1B	safety classification	Multilingual	128,000	1.5B	small	llama3.3
meta-llama	Meta-Llama-3.1-70B-Instruct	text generation	Multilingual	99,000	70.6B	large	llama3.1
meta-llama	Meta-Llama-3.1-8B-Instruct	text generation	Multilingual	131,072	8.0B	small	llama3.1
minimaxai	MiniMax-M2.5	text generation	English	2,000,000	0	large	minimaxai
minimaxai	MiniMax-M2.7	text generation	English	200,000	0	large	minimaxai
mistralai	Mistral-7B-Instruct-v0.1	text generation	English	8,192	7.2B	small	apache-2.0
mistralai	Mistral-7B-Instruct-v0.2	text generation	English	32,768	7.2B	small	apache-2.0
mistralai	Mixtral-8x7B-Instruct-v0.1	text generation	Multilingual	32,768	46.7B	medium	apache-2.0
moonshotai	Kimi-K2.5	text generation	English	256,000	1.0T	large	modified-mit
Qwen	Qwen3-235B-A22B	text generation	English	32,768	235.1B	large	apache-2.0
zai-org	GLM-5.1-FP8	text generation	English	202,752	753.9B	large	mit
anthropic	claude-3-7-sonnet-latest	text generation	Multilingual	200,000	0	large	anthropic
anthropic	claude-haiku-4-5	text generation	Multilingual	200,000	0	large	anthropic
anthropic	claude-opus-4-6	text generation	Multilingual	200,000	0	large	anthropic
anthropic	claude-sonnet-4-20250514	text generation	Multilingual	200,000	0	large	anthropic
google	gemini-2.0-flash	text generation	Multilingual	1,048,576	0	large	google
google	gemini-2.5-flash	text generation	Multilingual	1,048,576	0	large	google
google	gemini-2.5-flash-lite	text generation	Multilingual	1,048,576	0	large	google
groq	gpt-oss-120b	text generation	English	131,072	117.0B	large	groq
groq	kimi-k2-instruct	text generation	English	131,072	1.0T	large	groq
groq	llama-3.3-70b-versatile	text generation	Multilingual	131,072	70.6B	large	llama3.3
groq	llama-4-maverick-17b-128e-instruct	text generation	Multilingual	1,000,000	400.0B	large	llama4
groq	llama-4-scout-17b-16e-instruct	text generation	Multilingual	128,000	109.0B	large	llama4
openai	gpt-3.5-turbo	text generation	Multilingual	4,096	0	large	openai
openai	gpt-4	text generation	Multilingual	128,000	0	large	openai
openai	gpt-4-0125-preview	text generation	Multilingual	128,000	0	large	openai
openai	gpt-4-0314	text generation	Multilingual	128,000	0	large	openai
openai	gpt-4-0613	text generation	Multilingual	128,000	0	large	openai
openai	gpt-4-1106-preview	text generation	Multilingual	128,000	0	large	openai
openai	gpt-4-32k-0314	text generation	Multilingual	128,000	0	large	openai
openai	gpt-4-turbo-preview	text generation	Multilingual	128,000	0	large	openai
openai	gpt-4.1	text generation	Multilingual	1,047,576	0	large	openai
openai	gpt-4.1-mini	text generation	Multilingual	1,047,576	0	large	openai
openai	gpt-4o	text generation	Multilingual	128,000	0	large	openai
openai	gpt-4o-mini	text generation	Multilingual	128,000	0	large	openai
openai	gpt-5	text generation	Multilingual	400,000	0	large	openai
openai	gpt-5-mini	text generation	Multilingual	400,000	0	large	openai
openai	gpt-5.1	text generation	Multilingual	400,000	0	large	openai
openai	gpt-5.2	text generation	Multilingual	400,000	0	large	openai
openai	o1-mini	text generation	Multilingual	128,000	0	large	openai
openai	o1-preview	text generation	Multilingual	128,000	0	large	openai
openai	o3-mini	text generation	Multilingual	200,000	0	large	openai
xai-org	grok-2	text generation	Multilingual	131,072	0	large	xai
xai-org	grok-2-latest	text generation	Multilingual	131,072	0	large	xai
xai-org	grok-3	text generation	Multilingual	131,072	0	large	xai
xai-org	grok-3-beta	text generation	Multilingual	131,072	0	large	xai
xai-org	grok-3-fast	text generation	Multilingual	131,072	0	large	xai
xai-org	grok-3-fast-beta	text generation	Multilingual	131,072	0	large	xai
xai-org	grok-3-fast-latest	text generation	Multilingual	131,072	0	large	xai
xai-org	grok-3-latest	text generation	Multilingual	131,072	0	large	xai
xai-org	grok-3-mini	text generation	Multilingual	131,072	0	large	xai
xai-org	grok-3-mini-fast	text generation	Multilingual	131,072	0	large	xai

TRY IT OUT

Chat with an LLM

Powered by our own GPU infrastructure, select a large language model, add a prompt, and chat away. For unlimited chats, sign up for a free account on our Mission Control Portal here.

HOW IT WORKS

Selecting LLMs for Voice AI

GET Available Models

RESOURCES

Get started

Check out our helpful tools to help get you started.

Test in the portal
Easily browse and select your preferred model in the AI Playground.
Test today
Explore the docs
Don’t wait to scale, start today with our public API endpoints.
Get started
Stay up to date
Keep an eye on our AI changelog so you don't miss a beat.
See updates

Sign up and start building

faqs

What is Ultravox?

Ultravox is a speech-language model from Fixie AI that directly understands spoken audio without requiring a separate speech-to-text step. It pairs a Llama 3.1 8B backbone with a Whisper encoder to process audio input and generate text responses in a single model.

How does Ultravox work?

Ultravox uses a frozen Whisper large v3 turbo encoder to process audio and a multi-modal adapter to translate audio features into the Llama 3.1 8B language model's embedding space. Only the adapter is trained while both Whisper and Llama remain frozen, making training efficient.

What is Ultravox good for?

Ultravox is designed for real-time voice agent applications, speech-to-speech translation, and spoken audio analysis. Its time-to-first-token of approximately 150ms makes it suitable for low-latency voice interactions where traditional STT-then-LLM pipelines would be too slow.

How fast is Ultravox?

Ultravox v0.4.1 achieves a time-to-first-token of approximately 150ms and generates 50-100 tokens per second on an A100 40GB GPU. This speed makes it practical for real-time conversational applications that require immediate audio understanding.

ultravox-v0_4_1-llama-3_1-8b

about

Use cases for ultravox-v0_4_1-llama-3_1-8b

Quality

pricing

What's Twitter saying?

Explore Our LLM Library

Chat with an LLM

Selecting LLMs for Voice AI

Create an account

Choose ultravox-v0_4_1-llama-3_1-8b

Enter your API key

Prompt the LLM

Get started

Test in the portal

Explore the docs

Stay up to date

Sign up and start building

faqs

What is Ultravox?

How does Ultravox work?

What is Ultravox good for?

How fast is Ultravox?

ultravox-v0_4_1-llama-3_1-8b

about

Use cases for ultravox-v0_4_1-llama-3_1-8b

Quality

pricing

What's Twitter saying?

Explore Our LLM Library

DeepSeek-R1-Distill-Qwen-14B