GLM-5.2, an open-weight model released under MIT license, is tested locally for coding, math, logic, and 3D generation tasks. The model shows significant improvements over version 5.1 but suffers from high verbosity and slower inference.
Benchmarks and Licensing ⏱ 0:00
•GLM 5.2 is heralded as Claude Fable's competitor in the open source realm•Their shares jumped by a third from this release•Intelligence improved from 58 to 62 in cuding (just a 0.1 update)•First open-weight model to get over 80 in Terminal Bench, up there with GPT 5.5 and Claude Opus•MCP Atlas is up there with Claude and Humanity's last exam destroying Shack Jipper relevant•AIM for international maths: 99.2% (up from 95%), beating Claude Opus•Released under MIT license with no extra terms and conditionsCoding and 3D Generation Tests ⏱ 0:32
•WebGL face generation: 5.1 produced a decent face with eyes pointing to sky and nose morphed into chin; 5.2 with high thinking mode produced 3,000 tokens and an egg-like shape; using INF edition (4.8 bits) produced 15,000 tokens giving a potato-like face with hair, ear, forehead, nose, mouth, and expression controls (smile, pensive, surprise)•Piano test (Twinkle Twinkle Little Star): 5.1 made a legible piano with words; 5.2 (4.5 bit) made a nice view with falling notes•Minecraft: 5.1 produced a broken world; 5.2 (4.8 bit INF) generated a working world with blocks, water, and block interactions•3D city scene: 5.1 produced a beautiful generation with sun, moon, shadows, and cars (but sluggish); 5.2 generated a night scene with cars and a person•Flappy Birds in 3D: 5.1 converted bird to plane (wings inverted); 5.2 (INF) had correct wings but game too hard to play•Golden hour drive: 5.1 basic car not on road; 5.2 (INF) produced a futuristic backward supercar•Microsoft Word clone: 5.1 generated 15,000 tokens with full functionality; 5.2 (INF) generated 30,000 tokens with more menus and features (help, share, comment section)•Procedural planet generator: 5.1 generated 15,000 tokens (15 tokens/s) with planet and asteroid; 5.2 generated 29,000 tokens with nicer interface, sea level, terrain roughness controlsMathematics and Logic ⏱ 17:36
•International Maths Olympiad question: 5.2 with thinking disabled gave wrong answer; with high thinking mode (6,700 tokens) got correct answer•Logic: 'Car wash is 50 m from my house, should I drive or walk?' - 5.2 correctly said drive•Logic: 'Surgeon is the boy's father' - correct•Logic: 'If I could, then I would. I'll go wherever.' - recognized as 'Calling wherever you will go'•Logic: 'What does Nazi mean?' - correct answer•Orange distribution: 8 oranges, 4 children, 1 knife. 5.2 gave step-by-step solution: 'cut the eight oranges in half' (5.1 just said give two oranges). Token inspector showed 3% probability of saying 'four children' instead of 'eight oranges'New Features and Performance ⏱ 13:00
•GLM 5.2 introduces max and high thinking modes•New feature: shared indexer - runs indexer only on specific layers, shares results between layers, reducing processing (should run slightly faster, use less memory)•Inference speed: around 12-15 tokens per second locally; each inference test took about half an hour•Verbosity doubled: typical token count increased from 15,000 to 30,000 tokens for same prompts•5.2 slower than Kim K 2.7 (coding specialist model also recently tested)Key Takeaways
•GLM 5.2 is released under MIT license with open weights, showing substantial benchmark improvements over 5.1 (e.g., AIM math 99.2% beating Claude Opus).•Coding tests (WebGL face, Minecraft, 3D city, Word clone, planet generator) show better quality but token usage doubles (~30,000 tokens) requiring longer wait times.•Math reasoning requires high thinking mode to get correct answers; thinking disabled produced wrong results.•The model exhibits a 3% probability (observed) of misinterpreting 'cut the eight oranges' as 'cut the four children', raising safety concerns.•New shared indexer feature reduces processing but overall performance is slower than Kim K 2.7 and local inference is time-consuming (hours per test).Conclusion
GLM 5.2 demonstrates clear intelligence improvements and open licensing, but high verbosity and slower inference make it better suited for tasks where quality outweighs speed. The model's occasional dangerous misinterpretations warrant caution in autonomous settings.