GLM-5.2 by Z.ai is now available on Telnyx Inference, hosted on our owned GPU infrastructure.
GLM-5.2 is the highest-ranked open-weight model on Artificial Analysis, with an Intelligence Index of 51, outpacing MiniMax-M3 (44), DeepSeek V4 Pro (44), and Kimi K2.6 (43). The model scores 99.2 on AIME 2026 and 62.1 on SWE-bench Pro, placing it in the top tier for both reasoning and coding.
The model uses Dynamic Sparse Attention (DSA) with IndexShare, which reuses sparse attention indexers across every four transformer layers to reduce per-token FLOPs by 2.9x at 1M context. The result is stable long-context performance for codebase analysis, document processing, and multi-turn agent sessions, without the performance degradation that hits standard attention at long sequence lengths.
reasoning_effort settings, from faster responses to maximum depth on complex reasoning tasks.Telnyx owns the B300 GPUs running GLM-5.2, which means no cloud provider markup baked into every token, no rate limits set by a third party, and no rented GPU fleet introducing variable performance. We control the hardware and the network end to end, so throughput and latency are predictable, not best-effort.
Open-weight models are matching or beating closed-source on quality. The difference is price. Customers switching from closed-source APIs are seeing 75%+ cost reductions with no compromise on quality and no vendor lock-in.
Send your first request through the OpenAI-compatible Telnyx Inference API:
Get started with Inference documentation or sign up in the portal.