Qwen3 235B A22B

Name: Qwen3 235B A22B: Mixture-of-Experts Model with Thinking Capabilities
Brand: Telnyx
Price: 1 USD
Availability: InStock

Alibaba's flagship 235B mixture-of-experts model activating 22B parameters, supporting dual thinking modes, 119 languages, and native MCP tool integration.

Start building GET Available Models

about

Alibaba's flagship routes each token through 2 of 128 experts, keeping 22B of 235B total parameters active, trained on 36 trillion tokens spanning 119 languages and dialects. It switches between thinking and non-thinking modes within a single model without requiring separate reasoning variants, scoring 85.7 on AIME 2024 and 70.7 on LiveCodeBench v5, with native MCP tool integration for agentic workflows.

Licenseapache-2.0

Context window(in thousands)32,768

Use cases for Qwen3 235B A22B

Hybrid thinking without model switching: The dual thinking/non-thinking mode operates within a single model, eliminating the need to route between separate reasoning and fast-response variants in production.
119-language content processing: With the broadest language coverage of any frontier MoE, it handles translation, summarization, and Q&A across languages that most models do not support.
MCP-native tool orchestration: Built-in Model Context Protocol integration enables it to connect directly to external tools, databases, and APIs without custom function-calling wrappers.

Quality

Arena Elo1422

MMLUN/A

MT BenchN/A

Qwen3 235B scores 88.4% on MMLU and 93.1% on MMLU-Redux, with AIME 2024 at 85.7%. Compared to GPT-5 (92.5% MMLU) on the same sheet, it trails by about 4 points on MMLU but covers 119 languages, the broadest language coverage of any model at this tier. Its dual thinking/non-thinking mode operates within a single model, unlike competitors that require separate reasoning variants.

gpt-5.2

1440

gpt-5

1426

Qwen3 235B A22B

1422

gpt-4.1

1413

Gemini-2.5-Flash

1411

pricing

Running Qwen3 235B through Telnyx Inference costs $0.70 per million input tokens and $2.80 per million output tokens. Processing 1,000,000 multilingual queries across 119 languages at 1,000 tokens each would cost approximately $1,750, significantly less than comparable frontier models like GPT-4.1 ($5,000).

What's Twitter saying?

Developers praise Qwen3 235B-A22B for its strong benchmark performance matching top models like DeepSeek R1 and o1-mini, with impressive local runs on modest hardware like quad 3090s at 2.5-3 tokens/sec in BF16.
Users appreciate the switchable thinking/non-thinking modes for balancing reasoning on complex tasks (e.g., coding, math) with fast conversational speed, though some note high RAM usage (~90GB).
Critics highlight limitations in coding and novel tasks, with excessive thinking slowing local use, failures in evals like micromanager, and underwhelming real-world results despite benchmarks.

Explore Our LLM Library

Discover the power and diversity of large language models available with Telnyx. Explore the options below to find the perfect model for your project.

No data available at this time, please try again later.

Organization	Model Name	Tasks	Languages Supported	Context Length	Parameters	Model Tier	License
No data available at this time, please try again later.

TRY IT OUT

Chat with an LLM

Powered by our own GPU infrastructure, select a large language model, add a prompt, and chat away. For unlimited chats, sign up for a free account on our Mission Control Portal here.

HOW IT WORKS

Selecting LLMs for Voice AI

GET Available Models

RESOURCES

Get started

Check out our helpful tools to help get you started.

Test in the portal
Easily browse and select your preferred model in the AI Playground.
Test today
Explore the docs
Don’t wait to scale, start today with our public API endpoints.
Get started
Stay up to date
Keep an eye on our AI changelog so you don't miss a beat.
See updates

Sign up and start building

faqs

What is Qwen3 235B A22B?

Qwen3 235B A22B is Alibaba Cloud's flagship mixture-of-experts model with 235 billion total parameters, activating 22 billion per inference pass. It is available on Hugging Face and through hosted inference providers including Telnyx.

What is the difference between Qwen2.5 7B and Qwen3 8B?

Qwen3 8B introduces "thinking mode" for extended reasoning and improved multilingual support over Qwen2.5 7B. The Qwen3 architecture also adds better tool calling and structured output generation.

What hardware is needed for Qwen3 235B?

Running the full Qwen3 235B at full precision requires multiple high-end GPUs (400GB+ total VRAM). The MoE architecture activates only 22B parameters per pass, making it efficient through hosted inference platforms that handle GPU provisioning.

Is Qwen 3 better than DeepSeek?

Qwen3 235B is competitive with DeepSeek R1 on reasoning and coding benchmarks, with advantages in multilingual support and tool calling. The comparison depends on specific tasks, and both models are available through inference providers.

Is Qwen3 open source?

Qwen3 models are released under the Apache 2.0 license, making them fully open for commercial use. Weights for all sizes are available on Hugging Face.

What is Qwen3's thinking mode?

Qwen3 introduces a "thinking mode" that enables step-by-step reasoning before generating a final response, similar to OpenAI's o1 approach. This can be toggled on or off through the API, allowing users to trade latency for accuracy on complex tasks.

about

Use cases for Qwen3 235B A22B

Hybrid thinking without model switching: The dual thinking/non-thinking mode operates within a single model, eliminating the need to route between separate reasoning and fast-response variants in production.
119-language content processing: With the broadest language coverage of any frontier MoE, it handles translation, summarization, and Q&A across languages that most models do not support.
MCP-native tool orchestration: Built-in Model Context Protocol integration enables it to connect directly to external tools, databases, and APIs without custom function-calling wrappers.

pricing

What's Twitter saying?

Developers praise Qwen3 235B-A22B for its strong benchmark performance matching top models like DeepSeek R1 and o1-mini, with impressive local runs on modest hardware like quad 3090s at 2.5-3 tokens/sec in BF16.
Users appreciate the switchable thinking/non-thinking modes for balancing reasoning on complex tasks (e.g., coding, math) with fast conversational speed, though some note high RAM usage (~90GB).
Critics highlight limitations in coding and novel tasks, with excessive thinking slowing local use, failures in evals like micromanager, and underwhelming real-world results despite benchmarks.

Organization

Model Name

Tasks

Languages Supported

Context Length

Parameters

Model Tier

License

No data available at this time, please try again later.

faqs