Published June 2026 · Infrastructure notes from 3DN
This is not a political manifesto and it is not written to shock anyone. It is a note about electricity — the kind of boring constraint that decides whether an AI strategy is buildable or fantasy. Europe already consumes more energy than it produces. Import dependency on fossil fuels remains high. Adding frontier-model training on power-hungry American GPU clusters on top of that baseline is not a detail; it is the bill.
Recent releases make that bill impossible to ignore. Independent analysts at Epoch AI estimate that training Grok 4, xAI’s strongest contender on public benchmarks to date, consumed on the order of 310 gigawatt-hours (GWh) of electricity across roughly 246 million NVIDIA H100 GPU-hours on the Colossus supercomputer. That is equivalent to the annual electricity use of tens of thousands of households, with carbon and water footprints to match.
Meanwhile, Zhipu AI (Z.ai) trained GLM-5 — the foundation behind the GLM 5.2 model we run in production — entirely on Huawei Ascend accelerators: 100,000 Ascend 910B chips, 28.5 trillion training tokens, no NVIDIA hardware in the stack (technical report). Zhipu does not publish a single audited GWh figure. Applying the same physical accounting Epoch uses for Grok — average draw below peak TDP, plus non-GPU overhead and datacentre PUE — a defensible back-of-the-envelope range for GLM-5 pre-training lands around 15–60 GWh, versus Grok 4’s 310 GWh. Not a rounding error: roughly five- to twenty-fold less electricity for a model in the same performance league on software-engineering and agentic benchmarks.
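The accounting behind these numbers can be sketched in a few lines. This is a rough illustration, not Epoch AI's or Zhipu's actual method. It uses only figures quoted in this note (246 million H100 GPU-hours, 310 GWh, 100,000 Ascend 910B chips, the 15 to 60 GWh range, and the 700 W and 400 W chip ratings). It also assumes, purely for illustration, that the Ascend cluster carries the same combined overhead multiplier that the Grok 4 estimate implies, and that all 100,000 chips run simultaneously. Function and variable names are ours.

```python
def energy_gwh(chip_hours: float, rated_watts: float, multiplier: float) -> float:
    """Facility energy in GWh: chip-hours x rated power x combined multiplier.

    `multiplier` folds together average draw relative to rated power,
    non-accelerator server overhead, and datacentre PUE.
    """
    return chip_hours * rated_watts * multiplier / 1e9  # Wh -> GWh


def days_for_budget(budget_gwh: float, chips: int, rated_watts: float,
                    multiplier: float) -> float:
    """Days of continuous full-cluster running that an energy budget covers."""
    cluster_gw = chips * rated_watts * multiplier / 1e9  # W -> GW
    return budget_gwh / cluster_gw / 24.0  # GWh / GW = hours, then days


# Figures quoted in this note.
GROK4_GPU_HOURS = 246e6
GROK4_GWH = 310.0
H100_WATTS = 700.0
ASCEND_CHIPS = 100_000
ASCEND_WATTS = 400.0

# Energy if every H100 hour ran at exactly rated power with no overhead.
grok4_at_rated = energy_gwh(GROK4_GPU_HOURS, H100_WATTS, 1.0)

# Combined multiplier implied by the 310 GWh estimate.
implied_multiplier = GROK4_GWH / grok4_at_rated

# ASSUMPTION: the Ascend cluster has the same multiplier as the Grok 4 estimate.
low_days = days_for_budget(15.0, ASCEND_CHIPS, ASCEND_WATTS, implied_multiplier)
high_days = days_for_budget(60.0, ASCEND_CHIPS, ASCEND_WATTS, implied_multiplier)

print(f"Grok 4 at rated power: {grok4_at_rated:.1f} GWh")
print(f"Implied combined multiplier: {implied_multiplier:.2f}")
print(f"15-60 GWh covers {low_days:.1f} to {high_days:.1f} days on 100,000 chips")
```

Under these assumptions the 15 to 60 GWh range corresponds to roughly 9 to 35 days of continuous running on the full cluster. A longer run, a smaller active cluster, or a different overhead multiplier would move the estimate accordingly.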
Why the gap is real, not marketing
Three engineering facts explain most of the distance:
- Per-chip power. NVIDIA H100 SXM parts used in Colossus-class training are rated up to 700 W. Huawei Ascend 910B is rated at 400 W. The same wall-clock hour on each chip does not consume the same number of kilowatt-hours.
- Architecture efficiency. GLM-5 is a 744B-parameter mixture-of-experts model that activates only 40B parameters per token (about 5% of the total), and it adopts DeepSeek Sparse Attention (DSA) to cut attention cost on long contexts. Fewer wasted FLOPs per useful token.
- Scale choices. Frontier American labs are competing on the largest possible training runs: Colossus is now measured in hundreds of thousands of GPUs and gigawatt ambitions. That buys capability, but at an energy cost that scales linearly with GPU-hours.
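A quick arithmetic check of the first two points, as a sketch. It uses the figures quoted in this note plus one assumption that is not from this note: the common rule of thumb that training costs about 6 FLOPs per active parameter per token. That rule ignores attention cost, so it says nothing about the DSA savings.

```python
# Figures quoted in this note.
H100_WATTS = 700.0
ASCEND_910B_WATTS = 400.0
TOTAL_PARAMS = 744e9
ACTIVE_PARAMS = 40e9
TRAINING_TOKENS = 28.5e12

# Energy for one chip-hour at rated power, in kWh.
h100_kwh_per_hour = H100_WATTS / 1000.0
ascend_kwh_per_hour = ASCEND_910B_WATTS / 1000.0
power_ratio = H100_WATTS / ASCEND_910B_WATTS

# Share of the model's parameters that is active for each token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS


def training_flops(params: float, tokens: float) -> float:
    """ASSUMPTION: rule-of-thumb estimate of 6 FLOPs per parameter per token."""
    return 6.0 * params * tokens


moe_flops = training_flops(ACTIVE_PARAMS, TRAINING_TOKENS)
# Hypothetical dense model of the same total size, for comparison only.
dense_flops = training_flops(TOTAL_PARAMS, TRAINING_TOKENS)

print(f"kWh per chip-hour at rated power: H100 {h100_kwh_per_hour}, 910B {ascend_kwh_per_hour}")
print(f"Rated power ratio (H100 / 910B): {power_ratio:.2f}")
print(f"Active parameter fraction: {active_fraction:.1%}")
print(f"Estimated training compute (MoE): {moe_flops:.2e} FLOPs")
print(f"Dense-equivalent compute would be {dense_flops / moe_flops:.1f}x higher")
```

The rated-power ratio compares chips hour for hour, not FLOP for FLOP. Per-chip throughput also differs, which is why the sparsity and scale points carry much of the argument.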
None of this says Ascend is “better” in every dimension. Interconnect bandwidth, software maturity, and inference throughput still favour NVIDIA in many deployments. But training energy per frontier-capable model is no longer an American monopoly — and that matters if your continent is already short on electrons.
Hardware cost and the software stack
Capital cost, not just watts. Electricity is only part of the bill. Industry reporting on integrated Ascend 910B training and inference systems in China consistently cites 60–70% lower prices than comparable NVIDIA H100 bundles — TrendForce and Chinese trade press quote all-in-one Atlas and FusionCube configurations from roughly ¥2–10 million against H100-class solutions at around ¥20 million. Individual H100 SXM modules trade in the $25,000–$40,000 range on the open market; Ascend 910B is not sold retail in the West, but bundled Ascend servers undercut NVIDIA on yuan per FLOP while delivering roughly 60–70% of H100 FP16 throughput per chip. For a continent that already imports energy and faces GPU export-control premiums, that capex gap matters as much as the GWh gap.
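To see how a lower price and lower per-chip throughput combine, here is a rough sketch. It assumes, as a simplification not made in the reporting cited above, that the 60 to 70% bundle discount and the 60 to 70% per-chip throughput figure can be applied to the same unit. Real bundles differ in chip count, memory, and networking, so treat the result as an order-of-magnitude indication only.

```python
def relative_cost_per_flop(price_discount: float, relative_throughput: float) -> float:
    """Ascend cost per FLOP as a fraction of the H100 cost per FLOP.

    price_discount:      fraction by which the Ascend price is lower (0.6 = 60%)
    relative_throughput: Ascend throughput as a fraction of H100 throughput
    """
    relative_price = 1.0 - price_discount
    return relative_price / relative_throughput


# Most favourable corner of the quoted ranges: 70% cheaper, 70% of the throughput.
best_case = relative_cost_per_flop(0.70, 0.70)
# Least favourable corner: 60% cheaper, 60% of the throughput.
worst_case = relative_cost_per_flop(0.60, 0.60)

print(f"Relative cost per FLOP: {best_case:.2f} to {worst_case:.2f} of the H100 figure")
```

Under these assumptions, Ascend capacity costs roughly 0.43 to 0.67 of the H100 figure per unit of FP16 throughput.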
No CUDA dependency. GLM-5 was trained without a single NVIDIA GPU. That means no CUDA runtime, no cuDNN, no NVLink-centric cluster stack. Huawei’s software layer is CANN (Compute Architecture for Neural Networks) — the Ascend programming environment at the same stack level as CUDA — paired with MindSpore, Huawei’s native deep-learning framework. In practice, frontier Chinese labs increasingly run PyTorch through torch_npu, an Ascend backend adapter, rather than rewriting models in MindSpore from scratch. ONNX Runtime’s CANN execution provider offers another path for inference portability. The operational point: frontier-class training and serving do not require NVIDIA’s proprietary programming model.
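A minimal sketch of what targeting Ascend from PyTorch can look like, assuming the torch_npu adapter is installed alongside a matching CANN toolkit. The helper name pick_device is hypothetical, and the exact availability call may differ between torch_npu releases, so this is an illustration rather than Huawei's documented recipe. It falls back to CUDA or CPU so the same script runs on any machine.

```python
import importlib.util


def pick_device() -> str:
    """Return "npu", "cuda", or "cpu", depending on what this machine offers."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch itself is missing; nothing to accelerate

    import torch

    if importlib.util.find_spec("torch_npu") is not None:
        try:
            # Importing the adapter registers the "npu" device type with PyTorch.
            import torch_npu  # noqa: F401

            # ASSUMPTION: torch.npu.is_available() mirrors torch.cuda.is_available().
            if torch.npu.is_available():
                return "npu"
        except Exception:
            pass  # adapter present but unusable, e.g. CANN toolkit missing

    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


device = pick_device()
print(f"Selected device: {device}")
```

Model code then stays backend-agnostic: a call such as `model.to(device)` works unchanged whether the string is "npu", "cuda", or "cpu".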
Is that freedom, or a different lock-in? Honest answer: both. CANN is proprietary to Ascend NPUs, just as CUDA runs only on NVIDIA GPUs. Moving from CUDA to CANN is not vendor-neutral — you exchange one software moat for another. What differs in practice:
- PyTorch portability. torch_npu lets existing PyTorch code target Ascend without a full rewrite, lowering migration friction compared to a greenfield MindSpore port (ChinaTalk).
- Open weights. GLM-5 ships under an MIT licence. The stack is not tied to a closed API product the way many frontier American models are.
- Ecosystem maturity. CUDA still has deeper libraries, larger community forums, and more battle-tested kernels. CANN developers routinely report rough edges, sparse public troubleshooting, and dependence on Huawei field engineers for hard problems — a maturity gap documented through 2025 in practitioner forums and trade press.
- Geopolitical exposure. Ascend lock-in is Huawei and SMIC supply-chain lock-in. CUDA lock-in is NVIDIA and US export-control lock-in. Neither stack is neutral; the question is which dependency your organisation can actually procure, power, and maintain.
For European operators evaluating Ascend, the realistic posture is not “escape all lock-in” but “choose which stack you can actually run”, knowing that efficient frontier training already happens without CUDA, on hardware that costs materially less per FLOP and draws fewer kilowatt-hours per training run.
Europe’s energy context
Eurostat is blunt: the EU is a net importer of energy. Domestic production does not cover consumption. Fossil fuels still dominate the mix; electrification of industry, heat, and transport is rising; and datacentre load is growing on top. The European Commission’s own energy publications describe import dependency as a structural fact, not a temporary squeeze.
Against that backdrop, copying the xAI playbook — hundred-thousand-GPU H100/H200 megaclusters fed by new gas turbines — is not “innovation.” It is a demand shock Europe can ill afford. Every GWh committed to training is a GWh not available for grid stability, manufacturing, or household use.
What “having sense” would look like
If Europe intends to be a serious AI actor rather than a reseller of American APIs, it needs a hardware strategy aligned with its energy reality. That implies:
- Evaluate Ascend and other non-NVIDIA stacks on merit — power draw, training efficiency, open weights, and operational fit — instead of treating export-control alignment as a substitute for engineering.
- Build partnerships with Chinese suppliers and labs where the leading efficient-training evidence now lives. GLM-5 is MIT-licensed. The science is published. Pretending it does not exist does not save electricity.
- Stop conflating “allied” with “optimal.” Dependence on a single American GPU vendor is still dependence — and for a net energy importer, it is dependence with a steep marginal cost per training run.
- Fund sovereign inference first. Most European value is in deployment, auditability, and domain data — not in out-spending Musk on Memphis power contracts. Train where it is efficient; run where you are sovereign.
Leaving the US and NVIDIA behind does not mean abandoning transatlantic trade or picking ideological sides. It means refusing to lock decades of industrial policy to the most energy-expensive path when a credible, measurably leaner one is already shipping — on chips Washington tried to keep out of China’s hands, and on models released to the public.
Caveats we are not skipping
Export controls, entity lists, supply-chain security, and geopolitical risk are real management problems. They belong in procurement and legal review. They do not erase physics. A continent that imports energy cannot hand-wave 310 GWh training runs as someone else’s problem forever.
We run GLM 5.2 on our own metal because sovereignty and scheduling matter for operations. We watch Ascend because the energy math matters for Europe. The two arguments rhyme.
Bottom line
GLM 5.2 descends from a frontier model trained on Huawei Ascend at a fraction of the electricity budget estimated for Grok 4 on NVIDIA. Europe’s consumption already exceeds its domestic supply. If policymakers have any sense, they will treat efficient Chinese AI hardware and open models as a bootstrap opportunity — not a taboo — and stop betting the grid on a single American chipmaker.
Questions about our infrastructure or hosting approach? Get in touch.