Z.ai launched GLM-5.2 on June 13, 2026, alongside a subscription service called the GLM Coding Plan. The plan gives developers access to one of the strongest open-source coding models available, but the sticker prices are higher than what early reports suggested.
What Is GLM-5.2?
GLM-5.2 is a coding-specific AI model built by Z.ai, a Chinese AI company formerly known as Zhipu AI. It uses a Mixture-of-Experts (MoE) architecture, a design in which the model has 753 billion total parameters but activates only 40 billion of them for any given request. This keeps per-token compute costs down without losing capacity.
The model has a 1-million-token context window. In plain terms, it can hold an entire mid-sized code repository in memory at once. Output is capped at 131,072 tokens per response, which is enough for multi-file code rewrites in a single pass.
Z.ai released the model weights under an MIT license, meaning anyone can download, modify, and self-host the model for commercial use. That’s a rare move for a model at this level of performance.
On benchmarks, GLM-5.2 scored 81.0 on Terminal-Bench 2.1 and 62.1% on SWE-bench Pro. It trails Claude Opus 4.8 (85.0 on Terminal-Bench) by a few points and outperforms GPT-5.5 (58.6% on SWE-bench Pro). One caveat: Z.ai shipped the model without published benchmarks at launch. These scores came later from vendor data and third-party testing, and haven’t had broad independent verification yet.

The Tiers
The GLM Coding Plan has three subscription levels: Lite, Pro, and Max. Each tier grants you a pool of prompts that refreshes every five hours.
The base (undiscounted) prices are $18 per month for Lite, $72 per month for Pro, and $160 per month for Max. Z.ai applies discounts depending on your billing cycle: 10% off for monthly billing, 20% off for quarterly, and 30% off for yearly. On the yearly plan, Lite drops to $12.60 per month, Pro to $50.40, and Max to $112. All three tiers show a “2nd year” renewal price that appears identical to the first-year discounted rate.
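The discount arithmetic above can be checked with a short sketch. The base prices and discount rates come from the figures in this article; the function and dictionary names are illustrative only, not anything Z.ai publishes.

```python
from decimal import Decimal

# Base (undiscounted) monthly prices in USD, as quoted in the article.
BASE_PRICES = {"Lite": Decimal("18"), "Pro": Decimal("72"), "Max": Decimal("160")}

# Discount by billing cycle, as quoted in the article.
DISCOUNTS = {
    "monthly": Decimal("0.10"),
    "quarterly": Decimal("0.20"),
    "yearly": Decimal("0.30"),
}


def monthly_price(tier: str, cycle: str) -> Decimal:
    """Effective per-month price for a tier after the billing-cycle discount."""
    price = BASE_PRICES[tier] * (Decimal("1") - DISCOUNTS[cycle])
    return price.quantize(Decimal("0.01"))


if __name__ == "__main__":
    for tier in BASE_PRICES:
        row = ", ".join(f"{cycle}: ${monthly_price(tier, cycle)}" for cycle in DISCOUNTS)
        print(f"{tier}: {row}")
```

Decimal is used instead of float so the cents come out exact. The same function shows that Pro on monthly billing works out to $64.80, between the $72 base price and the $50.40 yearly rate.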

The Lite plan is built for lightweight iteration on small repositories. Z.ai’s page describes it as including a “base usage allowance” with rolling access to the latest flagship models. It supports 20+ coding tools, including Claude Code. Based on earlier documentation, Lite gives you roughly 80 prompts per five-hour cycle. That’s enough for light tinkering, but one intense coding session with GLM-5.2 can drain it fast. The flagship model burns through quota at a higher rate than older models like GLM-4.7.
The Pro plan, marked “Popular” on the pricing page, gives you everything in Lite plus 5x the usage. That works out to roughly 400 prompts per five-hour cycle. It’s built for day-to-day development on mid-sized repositories. Pro subscribers get priority access to new flagship models, a curated set of MCP tools (web search, web reader, vision), and faster generation speeds. For developers who code daily with an AI agent, this is the tier Z.ai is pushing.
The Max plan gives you everything in Pro plus 20x the Lite usage, roughly 1,600 prompts per five-hour cycle. It’s built for advanced users working on mid-to-large repositories. Max subscribers get first access to new flagship models and dedicated resources during peak times, meaning your throughput shouldn’t drop when the service is under heavy load.
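The per-tier quotas follow from the Lite baseline and the usage multipliers. A minimal sketch, assuming the roughly 80-prompt Lite figure from earlier documentation still holds (the constant and function names are hypothetical):

```python
# Approximate Lite allowance per five-hour cycle, per earlier documentation.
LITE_PROMPTS_PER_CYCLE = 80

# Usage relative to Lite, as described on the pricing page.
USAGE_MULTIPLIER = {"Lite": 1, "Pro": 5, "Max": 20}


def prompts_per_cycle(tier: str) -> int:
    """Approximate prompts available in one five-hour cycle for a tier."""
    return LITE_PROMPTS_PER_CYCLE * USAGE_MULTIPLIER[tier]


if __name__ == "__main__":
    for tier in USAGE_MULTIPLIER:
        print(f"{tier}: ~{prompts_per_cycle(tier)} prompts per five-hour cycle")
```

These are nominal figures. How far they stretch depends on which model you call and when.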
Is the Pricing Competitive?
At full price, these plans are not cheap. The undiscounted Pro at $72 per month costs more than three times what Claude Code Pro charges ($20 per month). Max at $160 per month is more expensive than Claude Code’s $100 Max tier. GitHub Copilot Pro costs $10 per month. Cursor Pro costs $20 per month.
The yearly discounts change the math. Lite at $12.60 per month puts it close to GitHub Copilot Pro. Pro at $50.40 per month (yearly) is still about 2.5x the price of Claude Code Pro, but Z.ai’s argument is that you get more usable throughput. Max at $112 per month (yearly) is comparable to Claude Code Max at $100.
The real value question is how much you can actually do with the quota. The Coding Plan operates on prompt-based quotas that refresh every five hours, not monthly credits like most competitors. A developer who works through multiple five-hour cycles per day can get significantly more total throughput than a flat monthly allocation would allow. Z.ai claims the plan gives you “tens of billions of tokens” per month for “roughly 1% of standard API costs.” The standalone API prices GLM-5.2 at $1.40 per million input tokens and $4.40 per million output tokens. Those are already below Anthropic and OpenAI rates.
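To put the standalone API rates in concrete terms, here is a rough cost estimator. The per-million-token prices are the ones quoted above; the function name and the example token counts are made up for illustration.

```python
# Standalone API prices for GLM-5.2 in USD per million tokens, as quoted above.
INPUT_USD_PER_MILLION = 1.40
OUTPUT_USD_PER_MILLION = 4.40


def api_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated standalone API cost in USD for a given token volume."""
    total = input_tokens * INPUT_USD_PER_MILLION + output_tokens * OUTPUT_USD_PER_MILLION
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10M input tokens and 1M output tokens.
    print(f"${api_cost_usd(10_000_000, 1_000_000):.2f}")
```

Running your own monthly token volume through a calculation like this and comparing the result with the plan price is one way to sanity-check the “1% of standard API costs” claim for your workload.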
The caveat is the multiplier system. During peak hours (14:00–18:00 UTC+8), GLM-5.2 burns quota at 3x the standard rate. Outside peak hours, it’s normally 2x, though a promotion running through September 2026 drops off-peak to 1x. A developer working during US or European business hours that land in Z.ai’s off-peak window gets the best deal right now. If that promotion expires and off-peak jumps back to 2x, the cost-per-token advantage shrinks.
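The multiplier rules can be sketched as a small function. This assumes that an N-times multiplier means each GLM-5.2 prompt counts as N prompts against your pool, and that the peak window is half-open (14:00 inclusive, 18:00 exclusive); neither detail is confirmed by Z.ai’s documentation as summarized here. All names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

UTC8 = timezone(timedelta(hours=8))
PEAK_START_HOUR, PEAK_END_HOUR = 14, 18  # peak window in UTC+8


def quota_multiplier(when: datetime, promo_active: bool = True) -> int:
    """Quota burn rate for GLM-5.2 at a given timezone-aware moment."""
    local = when.astimezone(UTC8)
    if PEAK_START_HOUR <= local.hour < PEAK_END_HOUR:
        return 3
    # Off-peak: 1x while the promotion runs, otherwise the normal 2x.
    return 1 if promo_active else 2


def effective_prompts(base_prompts: int, when: datetime, promo_active: bool = True) -> int:
    """GLM-5.2 prompts you can actually send from a pool (assumed interpretation)."""
    return base_prompts // quota_multiplier(when, promo_active)


if __name__ == "__main__":
    peak = datetime(2026, 7, 1, 7, 0, tzinfo=timezone.utc)       # 15:00 UTC+8
    off_peak = datetime(2026, 7, 1, 15, 0, tzinfo=timezone.utc)  # 23:00 UTC+8
    print(effective_prompts(400, peak))             # Pro pool during peak
    print(effective_prompts(400, off_peak))         # Pro pool off-peak, promo on
    print(effective_prompts(400, off_peak, False))  # Pro pool off-peak, promo off
```

Pass timezone-aware datetimes; a naive datetime would be interpreted in the machine’s local timezone. Under these assumptions, a Pro subscriber’s nominal 400 prompts shrink to about 133 GLM-5.2 prompts during peak hours and to 200 off-peak once the promotion ends.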
So the pricing is competitive on throughput, especially for heavy users on the yearly plan who work off-peak. On sticker price alone, it’s more expensive than Claude Code and Cursor at the Pro and Max levels.
How the Plan Works in Practice
A few operational details matter if you’re considering signing up.
Quotas refresh every five hours, and there’s a separate weekly limit that resets every seven days from your subscription date. If you hit your five-hour limit, you wait. The system won’t pull from your separate Z.ai account balance to cover Coding Plan overages. You simply stop until the next cycle.
GLM-5.2 has two reasoning modes: High and Max. High mode is faster and cheaper on quota; use it for routine edits. Max mode runs deeper reasoning passes and costs more, but it’s where the model performs best on hard problems: complex refactors, multi-step plans, and architectural decisions. Tools like Claude Code map their /effort commands to these modes directly.
To get the full 1-million-token context window, you need to manually select the “glm-5.2[1m]” model variant. The default configuration doesn’t enable it automatically.
The plan works with major coding agents, including Claude Code, Cline, OpenCode, ZCode, and Roo Code; the pricing page lists 20+ supported tools. Z.ai built a dedicated API endpoint that mirrors Anthropic’s protocol, so developers can swap from Claude to GLM-5.2 by changing an environment variable. Existing project configurations, CLAUDE.md files, slash commands, and MCP server setups keep working without changes.
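As a rough illustration of the swap, the sketch below builds an environment for launching Claude Code against a different endpoint. It assumes the variables involved are ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN, which Claude Code reads; the article does not name the variable. The endpoint URL is a placeholder, not Z.ai’s real address, and the helper name is hypothetical. Check Z.ai’s setup guide for the actual values.

```python
import os


def glm_env(api_key: str, base_url: str) -> dict:
    """Copy the current environment and point an Anthropic-compatible client elsewhere."""
    env = dict(os.environ)
    env["ANTHROPIC_BASE_URL"] = base_url   # assumed variable; endpoint is caller-supplied
    env["ANTHROPIC_AUTH_TOKEN"] = api_key  # assumed variable; your Coding Plan key
    return env


if __name__ == "__main__":
    # Placeholder values for illustration only.
    env = glm_env("YOUR_CODING_PLAN_KEY", "https://example.invalid/anthropic")
    print(env["ANTHROPIC_BASE_URL"])
    # To launch the agent with this environment, something like:
    #   import subprocess
    #   subprocess.run(["claude"], env=env)
```

Copying the environment instead of mutating os.environ keeps the override scoped to the launched process, so switching back to Anthropic’s endpoint requires no cleanup.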
One strict rule: the Coding Plan is limited to officially supported tools. Using your plan quota through unauthorized third-party tools or custom integrations is prohibited. If you need to build your own agent, you should use the standalone API instead.
Subscriptions auto-renew. You can upgrade immediately by paying the price difference, but downgrades take effect after the current billing cycle. Plans are non-refundable.
Who Should Buy This?
The GLM Coding Plan makes the most sense for two groups: developers who need high-volume AI coding throughput and are willing to pay for it, and developers who need an alternative to Anthropic’s models, given the ongoing regulatory uncertainty around US export controls.
The Lite tier on a yearly plan ($12.60/month) is a low-risk way to test whether GLM-5.2 works for your workflow. The Pro tier is where most daily users will land, but at $50.40 to $64.80 per month after billing-cycle discounts ($72 undiscounted), it’s a real commitment compared to Claude Code Pro at $20. The Max tier is for developers running autonomous multi-hour agent sessions where quota limits are a real constraint.
Three things to watch: the benchmark scores haven’t had wide independent verification yet, the multiplier system means your effective quota depends heavily on when you work and which model you pick, and the off-peak promotion expires in September 2026. Try the off-peak window, route routine tasks to GLM-4.7, and save GLM-5.2 for the hard stuff.