Inference

The Efficient Frontier: How to Choose an Inference Model

Most inference platforms stock 100+ models. We carry four.

By Fiona McDonnell

Here's why, and how to pick the right one for your workload.

The Problem with Model Menus

Browse any inference platform and you'll see the same thing: a wall of models. Fifty. A hundred. More. The implicit pitch is that more choice means better outcomes.

It doesn't. It means more evaluation work, more "which model do I use?" paralysis, and more chances to pick something mediocre. Most of those models are filler: legacy weights that shouldn't be running in production, proprietary APIs that lock you in, or mid-tier models that are neither the smartest nor the cheapest.

You didn't want a menu. You wanted the right answer.

The Efficient Frontier

In economics, the efficient frontier is the set of portfolios that deliver the maximum return for a given level of risk. Anything below the frontier is suboptimal: you could get more return for the same risk, or the same return for less risk.

The same concept applies to inference models. Plot intelligence on the Y-axis and cost on the X-axis. Draw the line where you get the most intelligence for a given price. That's the efficient frontier.
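The frontier idea can be sketched as a simple dominance check. The sketch below uses made-up model names and numbers purely for illustration: a model is off the frontier if some other model is at least as cheap and strictly smarter (or at least as smart and strictly cheaper).

```python
from typing import List, Tuple

def efficient_frontier(models: List[Tuple[str, float, float]]) -> List[str]:
    """Return the names of models on the cost/intelligence efficient frontier.

    Each model is (name, cost_per_million_tokens, intelligence_score).
    A model is dropped if another model dominates it: cheaper-or-equal and
    strictly smarter, or smarter-or-equal and strictly cheaper.
    """
    frontier = []
    for name, cost, iq in models:
        dominated = any(
            (c <= cost and q > iq) or (c < cost and q >= iq)
            for n, c, q in models
            if n != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Illustrative, made-up catalog: (name, $/M tokens, benchmark score)
catalog = [
    ("model-a", 0.30, 62),
    ("model-b", 0.60, 75),
    ("model-c", 0.60, 70),  # dominated by model-b: same cost, lower score
    ("model-d", 2.00, 74),  # dominated by model-b: pricier and less smart
]
print(efficient_frontier(catalog))  # → ['model-a', 'model-b']
```

Every model that survives the check is the best available at some price point; everything else is, by definition, filler.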

Anything below the line is either overpriced for its intelligence or underpowered for its price. The models on or above that line are the only ones worth running.

We only carry models on or above that line. No filler. No legacy. No lock-in.

The Lineup

Every model on Telnyx Inference is open-weight and best-in-class at something. Here's what each one is built for and when to reach for it.

Kimi K2.6: Voice AI and Real-Time Applications

Kimi K2.6 is the model to reach for when you're building voice agents or real-time applications. Its non-reasoning mode is still highly intelligent, so you don't have to trade smarts for speed, and it delivers a lower time to first token (TTFT) than GLM-5.1, which is the metric that matters most for voice. When a user speaks to an agent, every millisecond of first-token delay is dead air. Kimi minimizes that gap without dumbing down the output.

In our benchmarks, Kimi is the most competitive model across providers; the race is close, and that's fine. We don't need to win every cell. We need to offer the right models for the right jobs, and Kimi is the right model when you need speed and intelligence in a real-time context.

Best for: Voice AI, real-time conversational agents, any workload where TTFT determines whether the experience works.
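Since TTFT is the metric that decides whether a voice experience works, it's worth being precise about what it measures: the gap between issuing a request and receiving the first streamed token. A minimal, provider-agnostic sketch (the `fake_stream` generator below stands in for a real streaming response and its timings are invented):

```python
import time
from typing import Iterable, Iterator, List, Tuple

def stream_with_ttft(token_stream: Iterable[str]) -> Tuple[float, List[str]]:
    """Consume a token stream and return (ttft_seconds, tokens).

    TTFT is measured from the moment we start waiting to the arrival of
    the first token -- the "dead air" a voice caller would hear.
    """
    start = time.monotonic()
    ttft = 0.0
    tokens: List[str] = []
    for tok in token_stream:
        if not tokens:
            ttft = time.monotonic() - start
        tokens.append(tok)
    return ttft, tokens

def fake_stream() -> Iterator[str]:
    """Stand-in for a streaming model response: ~120 ms to first token."""
    time.sleep(0.12)
    yield "Hello"
    for t in [",", " how", " can", " I", " help", "?"]:
        time.sleep(0.01)
        yield t

ttft, toks = stream_with_ttft(fake_stream())
print(f"TTFT: {ttft * 1000:.0f} ms over {len(toks)} tokens")
```

The same wrapper works against any streaming client: only the first-token timestamp matters for the dead-air question; everything after that is throughput.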

GLM-5.1-FP8: Highest Throughput for Reasoning and Function Calling

GLM-5.1 is not the right model for real-time voice: its TTFT is higher than Kimi K2.6's, and in a voice context that gap is felt as dead air. But for workloads where throughput and structured output matter more than first-token speed, GLM-5.1 is the strongest option on the platform.

It excels at function calling, tool use, and batch reasoning: tasks where you need tokens moving fast and reliably, and where end-to-end (E2E) throughput is the bottleneck, not TTFT. In our head-to-head benchmarks, it delivers 81-113 tokens per second, roughly 2x the throughput of the next closest provider on the same model.

Best for: Function calling, high-throughput reasoning, batch and agentic workloads where E2E latency and throughput are the priority. Not recommended for real-time voice.

MiniMax-M2.7: Best Intelligence-Per-Dollar in the Fleet

MiniMax-M2.7 is the value play. On our benchmarks, it runs 3-6x faster on Telnyx than on competing providers.

This is the model that proves the efficient frontier concept. It delivers high intelligence at a fraction of the cost of models that score similarly. If you're running high-volume production inference and cost-per-token matters, MiniMax-M2.7 is the answer.

Best for: High-volume production deployments, cost-sensitive workloads, any scenario where intelligence-per-dollar is the primary metric.

Qwen3-235B-A22B: MoE Efficiency for Balanced Workloads

Qwen3-235B-A22B uses a mixture-of-experts architecture with 235B total parameters but only 22B active per token. That MoE design means you get near-frontier intelligence at a fraction of the compute cost: it sits on the efficient frontier as our best option for balanced workloads where you need strong reasoning without the price tag of running a dense 200B+ model.
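The back-of-envelope arithmetic behind that claim, using the common rule of thumb that a transformer forward pass costs roughly 2 FLOPs per active parameter per token (an approximation, not a measured figure):

```python
total_params = 235e9   # Qwen3-235B-A22B: total parameters
active_params = 22e9   # parameters active per token

# Rule of thumb: forward-pass compute ≈ 2 FLOPs per ACTIVE parameter per token.
dense_flops_per_token = 2 * total_params   # a hypothetical dense 235B model
moe_flops_per_token = 2 * active_params    # the MoE, only 22B active

print(f"active fraction:   {active_params / total_params:.1%}")   # → 9.4%
print(f"compute reduction: {dense_flops_per_token / moe_flops_per_token:.1f}x")  # → 10.7x
```

So per token, the MoE does roughly a tenth of the compute a dense model of the same total size would, which is why it can sit on the frontier.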

MoE activation keeps costs low while output quality stays high, and on Telnyx infrastructure, that efficiency compounds with our throughput advantage.

Best for: Balanced workloads, MoE efficiency, strong reasoning at moderate cost.

Why No Filler

Every model on Telnyx Inference is open-weight. That's not a side note; it's the whole point.

When you build on proprietary APIs, you're building on someone else's infrastructure with someone else's switching costs. Your prompts, your agents, and your workflows depend on a model you can't run anywhere else. The provider can change pricing, deprecate the model, or alter behavior, and your only option is to re-engineer everything.

Open weights mean you can take your workloads anywhere. We earn your inference business on performance and flexibility, not on the cost of leaving.

We'd rather add one model that shifts the frontier than ten that don't. When a new model lands above the line, we add it. When something better arrives at the same price point, the old one goes.

How to Pick

Still not sure which model to use? Here's the decision framework:

| If you need... | Use... | Because... |
|---|---|---|
| Voice AI and real-time responses | Kimi K2.6 | Lowest TTFT on our platform; non-reasoning mode stays intelligent |
| High-throughput reasoning and function calling | GLM-5.1-FP8 | 2x throughput vs. competitors; best for batch and agentic workloads |
| Best intelligence-per-dollar | MiniMax-M2.7 | 3-6x faster than competitors; highest throughput per dollar |
| MoE efficiency for balanced workloads | Qwen3-235B-A22B | 235B total / 22B active params; strong reasoning at low compute cost |

You don't have to pick just one. Most production systems route different tasks to different models. A voice AI pipeline might use Kimi K2.6 for real-time responses and GLM-5.1 for complex follow-up analysis. An autonomous agent might use GLM-5.1 for the main reasoning loop and MiniMax-M2.7 for high-volume sub-tasks.
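A multi-model pipeline like that usually reduces to a small routing table. A minimal sketch, where the task-type keys and model identifier strings are hypothetical (check your provider's docs for the actual model IDs):

```python
# Task-to-model routing table. Keys and model ID strings are illustrative,
# not real API identifiers.
ROUTES = {
    "realtime_voice": "kimi-k2.6",      # lowest TTFT: no dead air
    "function_calling": "glm-5.1-fp8",  # highest E2E throughput
    "bulk_subtask": "minimax-m2.7",     # best intelligence-per-dollar
    "balanced": "qwen3-235b-a22b",      # MoE efficiency
}

def pick_model(task_type: str) -> str:
    """Map a task type to the frontier model best suited for it."""
    try:
        return ROUTES[task_type]
    except KeyError:
        raise ValueError(f"unknown task type: {task_type!r}")

print(pick_model("realtime_voice"))  # → kimi-k2.6
print(pick_model("bulk_subtask"))    # → minimax-m2.7
```

A voice pipeline would route user turns through `realtime_voice` and hand post-call analysis to `function_calling` or `bulk_subtask`, so each request lands on the model that is best at that job.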

The efficient frontier isn't about finding one model that does everything. It's about only using models that are the best at something.

The Bottom Line

Model count is a vanity metric. What matters is whether every model you're paying for sits on the efficient frontier: the line where you get the most intelligence for the cost.

Every model we host on our dedicated infrastructure is open-weight. Every one is best-in-class at something. No filler, no legacy, no lock-in.


Try Telnyx Inference — All four models, serverless, with regional availability in the US, EU, and APAC. Sign up and start building, or talk to our team about production workloads.