GLM-5.2 is now available on Telnyx Inference

23 Jun 2026

GLM-5.2 by Z.ai is now available on Telnyx Inference, hosted on our owned GPU infrastructure.

GLM-5.2 is the highest-ranked open-weight model on Artificial Analysis, with an Intelligence Index of 51, outpacing MiniMax-M3 (44), DeepSeek V4 Pro (44), and Kimi K2.6 (43). The model scores 99.2 on AIME 2026 and 62.1 on SWE-bench Pro, placing it in the top tier for both reasoning and coding.

The model uses Dynamic Sparse Attention (DSA) with IndexShare, which reuses each sparse attention indexer across a group of four transformer layers, cutting per-token FLOPs by 2.9x at 1M context. The result is stable long-context performance for codebase analysis, document processing, and multi-turn agent sessions, without the quality degradation that standard attention shows at long sequence lengths.
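As a rough illustration of the sharing pattern described above, the toy sketch below maps each layer to the layer whose indexer output it would reuse. It only models the grouping. The layer count of 12 and the choice of the first layer in each group as the one that runs the indexer are assumptions made for the example, not details of the GLM-5.2 implementation, and the sketch does not derive the 2.9x figure.

```python
GROUP_SIZE = 4  # "every four transformer layers", per the description above


def indexer_source_layer(layer, group_size=GROUP_SIZE):
    """Return the layer whose indexer output this layer would reuse.

    Assumption: the first layer of each group runs the indexer.
    """
    return (layer // group_size) * group_size


def indexer_runs(num_layers, group_size=GROUP_SIZE):
    """Number of indexer passes needed when sharing within groups (ceiling division)."""
    return -(-num_layers // group_size)


num_layers = 12  # hypothetical layer count, for illustration only
print([indexer_source_layer(i) for i in range(num_layers)])
# [0, 0, 0, 0, 4, 4, 4, 4, 8, 8, 8, 8]
print(indexer_runs(num_layers))
# 3
```

In this toy setup, 12 layers need 3 indexer passes instead of 12.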

What's new

  • 1M token context window: Process up to 1,048,576 tokens in a single request, enough for entire codebases, long documents, or extended multi-turn conversations without chunking.
  • Adjustable thinking effort: Control reasoning depth per request with reasoning_effort settings, from faster responses to maximum depth on complex reasoning tasks.
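
A minimal sketch of setting thinking effort per request. It assumes reasoning_effort is passed as a top-level field in the chat completions body, as in other OpenAI-compatible APIs. The values "low" and "high" are illustrative placeholders; check the Inference documentation for the accepted values. build_chat_payload is a hypothetical helper, not part of any Telnyx SDK.

```python
import json

MODEL = "zai-org/GLM-5.2-FP8"   # model ID from the request example in this post
MAX_CONTEXT_TOKENS = 1_048_576  # the 1M-token window (2**20 tokens)


def build_chat_payload(prompt, reasoning_effort=None):
    """Build a chat completions request body as a plain dict."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    if reasoning_effort is not None:
        # Assumption: top-level field; the value names here are illustrative.
        payload["reasoning_effort"] = reasoning_effort
    return payload


# Lower effort for a quick answer, higher effort for a harder task.
fast = build_chat_payload("Summarize this changelog.", reasoning_effort="low")
deep = build_chat_payload("Find the race condition in this module.", reasoning_effort="high")
print(json.dumps(deep, indent=2))
```

Leaving reasoning_effort out of the body keeps whatever default the API applies.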

GLM-5.2 hosted on Telnyx infrastructure

Telnyx owns the B300 GPUs running GLM-5.2, which means no cloud provider markup baked into every token, no rate limits set by a third party, and no rented GPU fleet introducing variable performance. We control the hardware and the network end to end, so throughput and latency are predictable, not best-effort.

Open-weight models now match or beat closed-source models on quality. The difference is price. Customers switching from closed-source APIs are seeing 75%+ cost reductions with no compromise on quality and no vendor lock-in.

Getting started

Send your first request through the OpenAI-compatible Telnyx Inference API:

curl https://api.telnyx.com/v2/ai/chat/completions \
  -H "Authorization: Bearer $TELNYX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org/GLM-5.2-FP8",
    "messages": [
      {"role": "user", "content": "Explain the IndexShare optimization in GLM-5.2."}
    ]
  }'
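
The same request can be sketched in Python using only the standard library. The endpoint, headers, and model ID mirror the curl command above. The response parsing assumes the OpenAI-style shape (choices[0].message.content), which is inferred from the API being OpenAI-compatible and is not confirmed here. build_request and send_chat are hypothetical helper names.

```python
import json
import os
import urllib.request

API_URL = "https://api.telnyx.com/v2/ai/chat/completions"
MODEL = "zai-org/GLM-5.2-FP8"


def build_request(prompt, api_key):
    """Build the POST request without sending it."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def send_chat(prompt):
    """Send the request and return the reply text."""
    req = build_request(prompt, os.environ["TELNYX_API_KEY"])
    with urllib.request.urlopen(req, timeout=60) as resp:
        data = json.load(resp)
    # Assumption: OpenAI-style response body.
    return data["choices"][0]["message"]["content"]


# Example call (needs TELNYX_API_KEY set and network access):
# print(send_chat("Explain the IndexShare optimization in GLM-5.2."))
```

Keeping request construction separate from sending makes it possible to inspect the payload without making a network call.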

Get started with the Inference documentation or sign up in the portal.