Why is the API for GLM-5.1 more expensive than GLM-5 when the model size is the same?
Hi team and community,
I noticed that the API pricing for GLM-5.1 is higher than GLM-5 on the Z.ai platform:
- GLM-5.1: Input $1.40 / Output $4.40
- GLM-5: Input $1.00 / Output $3.20
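To make the gap concrete, here is a minimal sketch comparing what one request costs under each price list. It assumes the listed prices are USD per 1M tokens (the usual API convention; the post does not state the unit), and the example token counts are made up for illustration.

```python
# Assumed: prices are USD per 1M tokens (not stated explicitly in the post).
PRICES = {
    "GLM-5.1": {"input": 1.4, "output": 4.4},
    "GLM-5":   {"input": 1.0, "output": 3.2},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Billed cost in USD for a single request."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 10k-token prompt with a 2k-token completion.
cost_51 = request_cost("GLM-5.1", 10_000, 2_000)  # 0.0228 USD
cost_50 = request_cost("GLM-5", 10_000, 2_000)    # 0.0164 USD
```

For this input/output mix, GLM-5.1 comes out roughly 39% more expensive per request; the exact ratio depends on how output-heavy your traffic is, since the output price gap (4.4 vs 3.2) is larger than the input gap.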
As far as I know, both models share the same architecture and parameter size (744B total, 40B active MoE).
So my question is: Why the price increase?
Is the inference efficiency worse due to defaults like Thinking Mode or agentic optimizations? Or is it purely a business decision (value-based pricing) because GLM-5.1 is highly optimized and much smarter via post-training?
What puzzles me most is that since GLM-5 and GLM-5.1 share the same architecture and parameter size, the inference cost (hardware requirement) should be identical. In an open-source ecosystem, anyone hosting the model would simply replace 5 with 5.1 at zero additional operational cost.
Therefore, choosing 5 over 5.1 just because it's 'cheaper' seems fundamentally irrational from a purely technical standpoint. Is this API pricing strictly a business strategy (value-based pricing to recover R&D costs), or is there an invisible technical overhead in 5.1 that I'm missing?
I'd love to hear the technical or strategic reasons behind this. Thanks!
Yes, I also wonder about this. GLM-5 shares the core DSA technology with DeepSeek-V3.2 and has a comparable size (744B-A40B vs 671B-A37B), yet it costs several times as much as the latter, so it might be purely commercial considerations. (As you can notice, almost all providers on OpenRouter match their prices to the official one.)
I suspect there might be (not sure) two reasons for this:
1) Chinese computation is much cheaper (due to an abundance of energy and subsidies), even though American chips are better, so American servers (like those on OpenRouter) easily get undercut by Chinese computation.
2) Data war: using point (1) as leverage, Chinese companies are aggressively selling their own API/Openclaw services (even at a loss); that's one of the reasons some Chinese models are going proprietary (like the glm-turbo series). So if you don't want to pay a premium, grab their coding plan 🤓.
What puzzles me most is that since GLM-5 and GLM-5.1 share the same architecture and parameter size, the inference cost (hardware requirement) should be identical.
This assumption is not correct, which might be where the confusion comes from. Here is a short explanation from ChatGPT:
Since GLM-5 and GLM-5.1 appear to be very similar MoE models with roughly the same active parameter count, their baseline per-token compute and minimum weight-memory requirements should be broadly similar. But their real inference cost is not guaranteed to be identical, because serving cost also depends on exact parameter count, routing behavior, attention implementation, context/output lengths, quantization, batching, cache behavior, inference framework, and any “thinking”/agentic usage patterns.
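One of those factors, "thinking" usage patterns, is easy to quantify in isolation. A hedged sketch with made-up numbers: even if the hardware cost per token were identical, a model that emits a hidden reasoning trace before its answer bills more output tokens per request, so the effective cost per useful answer rises.

```python
def effective_cost_per_answer(output_price_per_1m: float,
                              answer_tokens: int,
                              thinking_tokens: int) -> float:
    """Billed output cost (USD) for one answer, including any reasoning tokens."""
    billed_tokens = answer_tokens + thinking_tokens
    return billed_tokens * output_price_per_1m / 1_000_000

# Same 500-token visible answer, with and without a hypothetical
# 1500-token reasoning trace, at the same $3.20/1M output price.
plain    = effective_cost_per_answer(3.2, 500, 0)     # 0.0016 USD
thinking = effective_cost_per_answer(3.2, 500, 1500)  # 0.0064 USD
```

In this toy case the thinking-mode answer costs 4x as much despite identical per-token pricing, which is one way a provider's real serving cost (and hence its price) can diverge between two architecturally identical models.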
GLM-5.1 is straight-up a disaster. I genuinely believe they intentionally made it dumber alongside the new licensing plans; it chokes on basic, mediocre everyday tasks just to enforce the usage limits.
Not trying to sound like a whiny b*, but stuff that took it over an hour, Grok finished in one go. You'll probably think I'm exaggerating… until you try it yourself.
