The Layer Cake — Mapping the AI Hardware Stack
Nvidia's Jensen Huang introduced the layer cake framing to describe AI's economic architecture from the ground up. It is simple, correct, and underappreciated by investors who focus exclusively on the application layer. Each layer has distinct market size, margin characteristics, competitive structure, and investment duration. Reading from bottom to top:
| Layer | 2024 Market Size (est.) | 2027–2030 Estimate | Growth Driver | Margin Character |
|---|---|---|---|---|
| Chips (AI semis) | ~$70B (AI GPU revenue) | ~$200B by 2027 | Training + inference demand; model size scaling | Very high — 60–75% gross margin at design leaders |
| Data Centre Build-Out | ~$250B global DC capex | ~$500B+ by 2030 | Hyperscaler capex; AI-specific density requirements | Moderate — construction/equipment margins; REITs earn stable yield |
| Networking (DC) | ~$25B | ~$60B by 2028 | GPU cluster scale-out; 400G/800G Ethernet adoption | High — 50–60% gross margin for leading vendors |
| Energy / Power Infra | Difficult to isolate; AI adds ~$30B/yr to grid capex | ~$90B+ incremental by 2030 | Grid bottleneck; AI campus power requirements | Moderate — regulated utilities; equipment makers higher margin |
| Models (API / training) | ~$20B API revenue; $40B+ training capex | ~$100B+ by 2028 | Enterprise AI adoption; inference volume growth | Declining — commoditisation pressure; open-source competing |
| Applications | Nascent — most value unrealised | $500B–1T+ (highly uncertain) | Productivity gains; AI feature embedding | Variable — incumbents gain most; AI-native apps unproven at scale |
*Market size estimates are directional; sources include Morgan Stanley, Goldman Sachs, SemiAnalysis, and company disclosures. Nvidia data centre revenue is used as the primary proxy for AI GPU revenue.*
The layer cake is not just a conceptual framework — it is a capital allocation map. The highest gross margins and most durable competitive positions sit in the chip layer (Nvidia, TSMC, ASML, SK Hynix). The largest total capital deployment is at the infrastructure layer (data centres, power), but margin per dollar of investment is lower. The application layer commands the highest speculative valuations but faces the most uncertain long-term margin structure. For a research framework, this means the analytical priority should be the chip layer — and that is exactly what Part B addresses.
Energy & Power Infrastructure — The Next Binding Constraint
The US electrical grid was designed around a world of relatively predictable, geographically dispersed demand growth. AI data centres are the opposite: sudden, enormous, geographically concentrated loads appearing in areas whose grid was sized for agricultural or light industrial use. The mismatch between what hyperscalers need and what the grid can deliver is the primary real-world constraint on AI infrastructure buildout — not chip supply, not permitting for data centre construction, but transformer availability and transmission capacity.
The numbers are stark. A single large EHV (Extra High Voltage) transformer — the type required to step down transmission-level power for a hyperscale campus — costs $3–7M, weighs 200–400 tonnes, and has a lead time of 18–24 months from order to delivery. Only a handful of manufacturers globally can produce them at the required voltage class: ABB, Siemens Energy, and Hitachi Energy, plus SPX Transformer Solutions in the US. Combined global production capacity is estimated at 700–900 units per year against rising demand. Transformer supply has become one of the least-discussed but most practically constraining bottlenecks in the AI infrastructure buildout.
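To make the scale concrete, a back-of-envelope sketch of how a single AI campus maps onto transformer capacity. Every input here is an illustrative assumption chosen for order of magnitude, not a sourced figure:

```python
import math

# Back-of-envelope: grid draw of a hypothetical 100,000-GPU AI campus and
# the EHV transformer count it implies. All inputs are illustrative
# assumptions, not sourced data.

gpus = 100_000
gpu_power_kw = 1.0      # ~700 W accelerator plus server overhead (assumed)
pue = 1.3               # power usage effectiveness: cooling + losses (assumed)

it_load_mw = gpus * gpu_power_kw / 1_000    # IT load in MW
grid_draw_mw = it_load_mw * pue             # what the campus pulls at the meter

# Assume ~100 MVA of usable capacity per transformer bank, with N+1 redundancy
bank_mva = 100
banks = math.ceil(grid_draw_mw / bank_mva) + 1

print(f"IT load: {it_load_mw:.0f} MW")
print(f"Grid draw at PUE {pue}: {grid_draw_mw:.0f} MW")
print(f"Transformer banks (N+1): {banks}")
```

On these assumptions a single campus needs several EHV banks; against a global production pool of ~700–900 units per year serving utilities, renewables interconnection, and data centres simultaneously, the queue forms quickly.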
| Sub-Segment | What It Does | Key Players | Bottleneck? | Investment Angle |
|---|---|---|---|---|
| EHV Transformers | Step down transmission voltage to usable campus power. Critical single point; not substitutable. | Hitachi Energy, ABB, Siemens Energy, SPX (US) | Yes — 18–24 month lead times; global capacity ~700–900 units/yr | High — pricing power emerging; order books extended 2–3 years. Hitachi Energy most directly exposed to AI DC demand. |
| Thermal / Liquid Cooling | Remove heat generated by high-density GPU racks. Air cooling inadequate above ~40 kW/rack; AI racks run 40–100 kW. | Vertiv, Schneider Electric, Asetek, CoolIT, Alibaba (internal) | Partial — supply scaling but technology transition from air to liquid is not trivial | High — liquid cooling is a structural shift, not a cycle. Vertiv is the most liquid public exposure. Direct liquid cooling (DLC) adoption rate is the key metric. |
| UPS / Backup Power | Uninterruptible power supply and backup generation. Data centres require N+1 or 2N redundancy. | Eaton, Schneider Electric, Cummins (diesel gensets), Bloom Energy (fuel cells) | Moderate — genset lead times extended but not severe | Moderate — growing with DC construction but less differentiated. Fuel cell backup (Bloom) is interesting as grid reliability declines. |
| Grid Switchgear / Controls | Electrical switching and protection equipment within the campus power distribution system. | ABB, Eaton, Schneider Electric, Siemens | Moderate — same supply chain constraints as transformers but lower unit value | Moderate — benefits from same grid upgrade cycle but more commoditised than transformers. |
| Nuclear (Baseload) | 24/7 carbon-free baseload power. The only renewable-adjacent source that can deliver firm GW-scale power without storage. | Constellation Energy (existing fleet), Kairos Power, NuScale, X-energy (SMR), TerraPower | Long-dated — SMRs are 2030+ at scale; existing nuclear PPAs being competed for now | High structural, long dated — Microsoft Three Mile Island PPA is the template. Constellation is the most liquid near-term exposure. SMR developers are pre-revenue venture-stage. |
| Renewables + Storage | Solar/wind for partial load coverage; battery storage for short-duration grid firming. | NextEra, AES, Sunrun; battery: QuantumScape, Form Energy (long-duration) | No — renewable capacity is abundant; the issue is intermittency and transmission | Lower for pure AI thesis — renewables can't solve the 24/7 firm power requirement alone. Long-duration storage (10–100hr) is the missing piece and largely pre-commercial. |
*The power infrastructure buildout is constrained not by capital availability but by manufacturing capacity (transformers, switchgear), permitting timelines (grid upgrades, nuclear), and engineering talent. These constraints are structurally slower to relieve than semiconductor supply constraints.*
In the prior generation of data centre buildout, site selection was driven by fibre connectivity, real estate cost, and tax incentives. For AI-era hyperscale campuses, the dominant variable has shifted to available power and grid headroom. Virginia — historically the world's largest data centre market — is facing a power availability crisis: Dominion Energy has effectively paused new large-load interconnection approvals in parts of Northern Virginia due to transmission constraints. This has redirected investment to new geographies: the Pacific Northwest (hydro-abundant), Texas (ERCOT grid flexibility), the upper Midwest (cheap wind + available transmission), and internationally, the Nordic countries (hydro + cold climate) and the Middle East (sovereign capital + land + willingness to build dedicated generation).
The implication for the investment thesis is that geographic power availability is becoming a durable competitive advantage for data centre operators who secured land and power agreements early. Equinix and Digital Realty — the dominant colocation REITs — have decades of site relationships that are difficult to replicate. Hyperscalers building their own campuses are racing to secure long-term power purchase agreements before available capacity is fully contracted.
Nuclear power has been commercially stagnant in the US and Europe for decades — the combination of Chernobyl, Fukushima, cost overruns (Vogtle), and cheap gas made new nuclear economically unviable. The AI power demand shock is changing that calculus in a narrow but important way: hyperscalers need 24/7 carbon-free power at GW scale, and no combination of solar, wind, and battery storage can reliably deliver that at current technology costs. Microsoft's deal to restart Three Mile Island (announced 2024, a 20-year PPA with Constellation for 835 MW), Google's PPA with Kairos Power for SMR output, and Amazon's acquisition of a nuclear-powered data centre campus signal that the tech sector has concluded that nuclear is the only near-term answer for firm baseload power at data centre scale. This is not a speculative thesis — it is procurement behaviour by the world's most data-driven buyers.
The energy and power infrastructure layer is underweighted in most AI investment frameworks because it is less intellectually exciting than chip architecture debates and less visible than Nvidia's earnings. But it is the layer where the physical constraints are most severe, the lead times are longest, and the incumbent advantages (existing nuclear fleet, long-term transmission rights, cooling IP) are most durable. As chip supply normalises through 2026–2027, the power constraint will replace it as the primary bottleneck — and the companies positioned on that constraint will be the next wave of AI infrastructure beneficiaries. Transformer manufacturers and liquid cooling specialists are the near-term picks; nuclear operators and long-duration storage will be the medium-term ones.
Where Value Accrues — The Margin Map & Moat Inventory
Understanding the stack architecturally is one thing. Understanding where durable economics accumulate is another. The history of technology platform buildouts — railways, telephony, the internet — suggests that infrastructure layers produce durable returns for the components where switching costs are highest and replication is hardest, while applications and services above the platform compete away margins unless they develop independent network effects or data advantages. AI is following an analogous pattern.
Three structural questions determine where value accrues in any technology value chain: Who is irreplaceable? Who benefits from rising volume without proportional cost? And who controls the bottleneck that constrains everyone else? Mapping the AI hardware stack against these questions produces a clear hierarchy.
| Company / Segment | Layer | Gross Margin (FY2024) | Operating Margin | Key Margin Driver |
|---|---|---|---|---|
| Nvidia (Data Centre) | Chips — Design | ~75–78% | ~55% | Pricing power from scarcity + CUDA ecosystem lock-in |
| ASML | Chips — Equipment | ~51–53% | ~31% | Monopoly on EUV; service contracts on installed base |
| TSMC | Chips — Fab | ~53–56% | ~42% | Leading-edge fab monopoly; 3nm/2nm premium pricing |
| SK Hynix | Chips — Memory (HBM) | ~40–45% (HBM blended) | ~25% | HBM3E scarcity; 2–3 year technology lead over peers |
| Broadcom (semi) | Chips — Networking/ASIC | ~65–68% | ~35% | Custom ASIC design lock-in; networking software/IP |
| Arista Networks | Infrastructure — Networking | ~60–63% | ~37% | Software-defined networking; EOS platform stickiness |
| Vertiv | Infrastructure — Cooling/Power | ~36–38% | ~17% | Data centre thermal management; liquid cooling growth |
| Super Micro | Infrastructure — Servers | ~14–17% | ~8% | AI server assembly; direct procurement; thin margin model |
| Intel (Data Centre) | Chips — CPU/GPU | ~39–42% (blended, declining) | Negative (2024) | Margin compression from AMD competition + fab transition losses |
*Margins are approximate FY2024 figures sourced from company reports and consensus estimates. Nvidia data centre segment margins are based on blended company disclosure. HBM margins for SK Hynix are estimated — the company does not separately disclose HBM economics.*
The transition from CPU-centric to GPU/accelerator-centric data centres is the underlying force that has reshaped the entire hardware stack since 2022. A traditional data centre processes tasks sequentially on a small number of powerful general-purpose processors. An AI training cluster processes matrix operations in parallel across thousands of specialist chips. This architectural shift has: (1) made Nvidia the most valuable company in the world; (2) made HBM memory bandwidth — not raw compute — the primary performance bottleneck; (3) made TSMC's leading-edge packaging capabilities as important as its fabrication; and (4) made power consumption per rack 10–20x what it was in the prior generation, triggering the grid infrastructure crisis that is now the next binding constraint.
The margin map makes clear that the chip layer is where the value chain's structural economics sit. But within chips, the hierarchy matters: equipment (ASML) and design (Nvidia) generate the highest margins; fabrication (TSMC) is high-margin but capex-intensive; memory (SK Hynix) is lucrative during scarcity but cyclical. The infrastructure layer is large in total dollar terms but margin-thin in most segments except networking. Part B now interrogates each dimension of the chip layer in depth.
The Semiconductor Value Chain — From Sand to System
A semiconductor chip is one of the most complex manufactured objects in human history — a product requiring the coordinated output of dozens of specialist industries, across multiple geographies, under tolerances measured in atoms. The value chain that produces it is not a linear assembly line but an intricate lattice of interdependent specialisations, each representing decades of accumulated knowledge that cannot be transferred quickly or cheaply.
Understanding the structure of this value chain — who does what, where the margin sits, and where the chokepoints are — is prerequisite to understanding why certain companies are structurally irreplaceable. It also explains why the geopolitical contest over semiconductor supply chains has become the central economic and strategic conflict of the 2020s.
The semiconductor industry organises itself along a fundamental structural choice: whether to design chips, fabricate chips, or both. This choice — made once and expensive to reverse — shapes everything about a company's economics, risk profile, and competitive position.
| Model | Definition | Economics | Examples | AI Winners/Losers |
|---|---|---|---|---|
| Fabless | Designs chips; outsources all fabrication to foundries | Asset-light; very high gross margins (50–78%); R&D intensive; no fab capex | Nvidia, AMD, Qualcomm, Apple Silicon, Broadcom (mostly), Marvell | Big winner — fabless model captured most AI design value; low fixed cost base amplifies margin leverage |
| Pure-Play Foundry | Fabricates chips for others; no proprietary chip designs | Extremely capex-intensive ($20–30B per fab); high gross margin at leading edge once scale achieved; long payback periods | TSMC (dominant), GlobalFoundries, SMIC (China) | Big winner — TSMC specifically; foundry model concentrates fab expertise, producing better yields than IDM fabs at equivalent nodes |
| IDM (Integrated Device Manufacturer) | Designs and fabricates own chips; may also sell foundry capacity | Highest capital intensity; lower margins than fabless for design; internal fab often runs below leading edge efficiency; strategic flexibility | Intel, Samsung (semi), Micron (memory IDM) | Mixed — Samsung benefits from memory IDM; Intel is the prominent casualty, struggling to compete at leading edge while carrying full fab cost structure |
*The fabless model's dominance in AI reflects a structural truth: design intelligence is the source of value; fabrication is the enabler. The separation of these two activities — pioneered by TSMC in the late 1980s — is one of the most consequential structural changes in the history of the technology industry.*
Morris Chang's founding insight at TSMC in 1987 was that chip designers were constrained by having to manage fab operations, and that a dedicated foundry — with no competing chip designs — could serve all designers without conflict and invest purely in fabrication excellence. The result 35 years later: fabless companies like Nvidia can design the most complex chips in history without owning a single clean room, while TSMC runs fabs at yields and node advances no IDM has matched. The division of labour that Adam Smith described for pin-making applies, at extraordinary technological intensity, to semiconductors.
The Compute Architecture Battle — GPU vs ASIC vs CPU
The most consequential product decision in AI hardware is the choice of compute architecture. Training and running AI models requires processing enormous volumes of matrix multiplication — an operation that different chip architectures handle with very different efficiency, cost, and flexibility trade-offs. The dominant architecture today is the GPU. The challengers are custom ASICs. And the incumbent general-purpose CPU is now largely irrelevant as the engine of AI compute at scale.
| Architecture | How It Works | Strengths for AI | Weaknesses | Market Position |
|---|---|---|---|---|
| GPU (Graphics Processing Unit) | Massively parallel processor with thousands of small cores optimised for matrix operations. Originally designed for graphics; repurposed for AI compute. | Extreme flexibility — runs any AI workload without code rewrite. Massive software ecosystem (CUDA). Best-in-class tools and developer familiarity. Rapid iteration — new Nvidia generation every 12–18 months. | Power-hungry. Not maximally efficient for any specific task — optimised for breadth, not depth. Very expensive per unit (~$30–40K for H100). | ~80% AI accelerator market share (Nvidia). The default choice for training and general inference. |
| ASIC (Application-Specific Integrated Circuit) | Custom chip designed for one specific task. Google's TPU is designed specifically for TensorFlow/JAX matrix operations; Amazon's Trainium targets Transformer training and Inferentia targets inference. | 10–30% more energy efficient than GPUs for the specific workload they target. Lower total cost of ownership at hyperscaler scale once volume justifies design investment. No royalty paid to Nvidia. | Inflexible — useless outside the target workload. Requires massive investment to design (>$500M). Takes 2–4 years from design to volume production. Limited software ecosystem. | Growing rapidly within hyperscalers (Google, Amazon, Microsoft, Meta all have custom silicon programs). ~10–15% of AI compute at hyperscalers currently; projected to grow to 25–30% by 2028. |
| CPU (Central Processing Unit) | Sequential general-purpose processor. The traditional data centre workhorse. AMD Epyc and Intel Xeon dominate data centre CPUs. | Unmatched for complex sequential logic. Essential for orchestration, data preprocessing, inference serving (small batches). Still required in every AI server alongside the GPU. | Fundamentally inefficient for the parallel matrix operations that dominate AI training and large-batch inference. GPU performs the same AI computation 100–1,000x faster per watt. | Remains necessary but is no longer the primary value driver. AMD is gaining CPU share from Intel. Arm-based CPUs (Ampere, AWS Graviton) growing for cloud workloads. |
*The architecture war is not zero-sum — all three will coexist. The question is market share at the margin and which architecture captures the incremental unit of AI spend as the market grows.*
Nvidia's dominance in AI is not primarily a hardware story. It is a software story. CUDA — Nvidia's parallel computing platform launched in 2006 — is the reason. When AI researchers began experimenting with GPU training in the early 2010s, they did so on CUDA. Every framework (TensorFlow, PyTorch), every library (cuDNN, cuBLAS), every optimised model was built for and tested on CUDA-enabled hardware. By the time AI demand exploded in 2023, the software ecosystem lock-in was so deep that switching away from Nvidia was not a hardware decision — it was a software, tooling, and retraining decision affecting every ML engineer in the world.
This is the key insight that gets lost in the GPU vs ASIC debate: Nvidia's moat is CUDA, not the H100. The H100 will be superseded by the H200, B100, B200, and whatever comes next. But the CUDA ecosystem — 4 million+ developers, 3,500+ GPU-accelerated applications, 15+ years of optimisation — is extremely hard to displace regardless of hardware competition.
As of 2024, PyTorch — the dominant AI research and production framework — shows CUDA as the default backend in over 95% of production deployments. AMD's ROCm (the CUDA competitor) has made meaningful progress but remains 2–3 years behind in software maturity and library optimisation. Google's JAX runs on TPUs natively but has ~5% of the production deployment base vs PyTorch/CUDA. The economic consequence: developers optimise for CUDA because that is where the tools work best; this creates demand for Nvidia hardware; this generates revenue that funds further CUDA investment. The flywheel has been spinning for 15 years and is now self-sustaining.
The custom silicon programs at Google (TPU v5), Amazon (Trainium 2, Inferentia 3), Microsoft (Maia 100), and Meta (MTIA) represent the most credible threat to Nvidia's market share. The economics are compelling at hyperscaler scale: a custom chip designed specifically for your inference workload can deliver 20–40% better performance-per-watt, which at a million-chip deployment translates into hundreds of millions of dollars annually in energy savings and foregone Nvidia procurement spend.
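The back-of-envelope version of that economics can be sketched as follows. All inputs are illustrative assumptions (the deployment size and design cost echo figures in the text; the power, price, and utilisation numbers are invented for the sketch):

```python
# Sketch of the ASIC-vs-GPU economics at hyperscaler scale.
# Every input is an illustrative assumption, not a disclosed figure.

chips = 1_000_000             # deployment size (from the text)
accel_power_kw = 0.7          # per accelerator, assumed
perf_per_watt_gain = 0.30     # midpoint of the 20-40% range above
power_price = 0.06            # $/kWh industrial rate, assumed
hours = 8_760 * 0.8           # one year at 80% utilisation, assumed

# Annual energy bill for the GPU fleet, and for ASICs doing the same work
gpu_energy_cost = chips * accel_power_kw * hours * power_price
asic_energy_cost = gpu_energy_cost / (1 + perf_per_watt_gain)
energy_saving = gpu_energy_cost - asic_energy_cost

design_cost = 500e6           # one-off design investment (from the text)

print(f"GPU fleet energy bill:  ${gpu_energy_cost/1e6:,.0f}M/yr")
print(f"Energy saving on ASICs: ${energy_saving/1e6:,.0f}M/yr")
print(f"Years of energy saving to repay design: {design_cost/energy_saving:.1f}")
```

Note what the sketch shows: on these assumptions, energy savings alone repay the design cost slowly; the larger term in practice is the foregone accelerator procurement spend, which is why the case only closes at million-unit scale.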
But the threat is structurally bounded. Custom ASICs require: (1) sufficient volume to justify $500M+ design investment; (2) workload stability — the chip must be designed for a workflow that will not change before the chip ships; (3) internal silicon engineering teams of 500–1,000+ engineers (rare outside the Big 4 hyperscalers); and (4) 2–4 years of lead time from design to production. These requirements confine viable ASIC programs to a handful of the world's most resource-endowed companies. The long tail of AI companies — thousands of enterprises, startups, and research institutions — will buy Nvidia for the foreseeable future.
The compute architecture battle is ultimately a question of software ecosystem durability, not hardware performance. Nvidia's moat is unusually deep precisely because it is not primarily a hardware moat — hardware can be copied or improved upon; 15 years of software ecosystem development cannot. The base case is gradual, manageable share erosion at the very largest customers, while Nvidia retains dominant share everywhere else. For investors, the relevant risk is not GPU displacement but Nvidia's ability to sustain 70%+ gross margins as supply normalises — the margin question is more pressing near-term than the share question.
The Fab Moat — Why TSMC Is Nearly Impossible to Replicate
TSMC is the most consequential company that most investors outside the technology sector have never deeply analysed. It fabricates approximately 90% of the world's most advanced semiconductors. It is the reason Nvidia's H100 exists, the reason Apple's M-series chips are competitive, and the reason the US government has spent $52 billion via the CHIPS Act trying to build domestic alternatives. Understanding why TSMC's moat exists — and why it is so hard to replicate — requires engaging with physics, economics, and organisational knowledge simultaneously.
TSMC's primary fabs are in Taiwan — a geopolitical fact that has become the semiconductor industry's most discussed and least resolved risk. Approximately 92% of the world's leading-edge (sub-5nm) chip production capacity is located in Taiwan as of 2024. The US CHIPS Act, Japan's semiconductor subsidies, and the EU Chips Act are all attempts to redistribute this concentration — but the pace of geographic diversification is structurally constrained by the same factors that created the concentration in the first place.
The geographic diversification of semiconductor manufacturing is slower and harder than policymakers publicly acknowledge. TSMC's Arizona fabs have faced significant challenges: skilled technician shortages, yield rates initially below Taiwan counterparts, construction delays, and cost overruns. The reported 50% cost premium for chips fabricated in Arizona vs Taiwan reflects these structural disadvantages — higher labour costs, less dense supplier ecosystems, and the absence of the deep talent pool that Taiwan's semiconductor education system has spent 40 years building. Intel's domestic US fabs, receiving the largest single CHIPS Act allocation (~$8.5B), remain behind TSMC at the leading edge despite a decade of aggressive investment. Meaningful geographic rebalancing of chip fabrication is a 10–20 year project, not a 3–5 year one.
TSMC's moat is among the most durable in the global economy precisely because it is multi-dimensional — physical, capital, organisational, and ecosystem-based simultaneously. Any single dimension could theoretically be addressed by a well-resourced competitor; the combination of all four is what makes catch-up take decades. For investors, the relevant debate is not "will TSMC's moat erode?" but "what is TSMC worth relative to its irreplaceability?" — a question that intersects with Taiwan geopolitical risk, capex cycle timing, and whether leading-edge chip demand grows fast enough to justify continued node investment. All three variables are in TSMC's favour through at least 2027.
Memory — HBM, DRAM, and the Bandwidth Bottleneck
Memory is the least glamorous part of the semiconductor value chain — and, since 2023, the most important bottleneck in the AI hardware stack. The reason is simple: AI inference and training are not compute-bound at the frequencies that matter; they are memory-bandwidth-bound. The limiting factor in running a large language model is not the speed at which the GPU performs matrix multiplications — it is the speed at which it can feed data to itself from memory. This is why HBM (High Bandwidth Memory) became the critical component, and why SK Hynix — the company that delivers it at volume — became arguably as important to the AI supply chain as TSMC.
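The bandwidth-bound claim can be made concrete with a standard roofline-style estimate: at batch size 1, generating each token requires streaming every model weight from memory once, so peak HBM bandwidth sets a hard ceiling on tokens per second. The model size, precision, and compute throughput below are illustrative assumptions; the 3.35 TB/s figure is the H100's, as in the table that follows:

```python
# Why LLM inference is memory-bandwidth-bound, not compute-bound.
# Illustrative numbers: a 7B-parameter model in FP16 on a single H100.

params = 7e9                  # 7B-parameter model (assumed)
bytes_per_param = 2           # FP16/BF16 weights
hbm_bandwidth = 3.35e12       # H100 SXM HBM bandwidth, bytes/s
compute_flops = 990e12        # approx. H100 dense BF16 throughput (assumed)

weight_bytes = params * bytes_per_param          # bytes streamed per token
mem_ceiling = hbm_bandwidth / weight_bytes       # bandwidth-limited tokens/s
flops_per_token = 2 * params                     # one multiply-add per weight
compute_ceiling = compute_flops / flops_per_token

print(f"Weights streamed per token: {weight_bytes/1e9:.0f} GB")
print(f"Bandwidth-limited ceiling:  {mem_ceiling:.0f} tokens/s")
print(f"Compute-limited ceiling:    {compute_ceiling:,.0f} tokens/s")
```

The compute ceiling is orders of magnitude above the bandwidth ceiling, which is exactly why adding HBM capacity and bandwidth, rather than arithmetic throughput, is what moves inference performance.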
| Memory Type | What It Is | Bandwidth | AI Role | Key Players | Investment Relevance |
|---|---|---|---|---|---|
| HBM3E (High Bandwidth Memory) | Stacked DRAM dies connected to the GPU via silicon interposer. Physically mounted on the same package as the compute chip. | ~1.2 TB/s per stack; the H100's previous-generation HBM3 delivers 3.35 TB/s total from 5 stacks | The critical AI bottleneck. GPU performance is limited by how fast HBM can supply weights and activations. More HBM = faster inference for large models. | SK Hynix (~50% share), Samsung (~30%), Micron (~20%) | High — HBM supply is currently constrained; SK Hynix has a 2-year technology lead; HBM ASPs are 5–8x standard DRAM. Tight supply expected through 2026. |
| DRAM (Standard) | Standard volatile memory used in servers, PCs, and phones. DDR5 is the current generation. | ~50–80 GB/s (DDR5) | Server memory for CPU workloads; smaller AI inference deployments. Less relevant for large model training but required in every server. | Samsung (~40%), SK Hynix (~30%), Micron (~25%) | Cyclical — DRAM markets swing between oversupply and shortage. AI demand adds a structural growth layer but does not eliminate the cycle. Samsung has historically been the price-setting swing producer. |
| NAND Flash | Non-volatile storage. Used for SSDs, data storage in data centres. | Low relative to DRAM | Training data storage; checkpoint storage. Not on the critical AI latency path. Demand grows with data centre scale-out but is less AI-specific than DRAM/HBM. | Samsung (~30%), SK Hynix (~20%), Kioxia (~18%), Micron (~12%), WD (~13%) | Lower near-term — NAND cycle is independent of the AI upcycle; oversupply has compressed margins since 2022. Recovery is underway but pricing remains below 2021 peak. |
*HBM content per GPU is growing with each generation: the H100 has 80GB of HBM3, the H200 has 141GB of HBM3E, and the B200 (Blackwell) has 192GB of HBM3E. Each generational increase in HBM content per GPU amplifies the revenue opportunity for HBM suppliers without requiring proportional production capacity increases.*
The reason HBM became scarce — and why SK Hynix captured a disproportionate share of AI memory value — is a story about a technology bet made years before the AI boom. In 2018–2020, SK Hynix made a strategic decision to invest heavily in HBM3 development, betting that the convergence of GPU compute and memory-intensive ML workloads would create demand that standard DRAM could not serve. Samsung, the larger company, hedged more conservatively. Micron focused on standard DRAM and NAND cycles.
When Nvidia began designing the H100 in 2020–2021, it turned to SK Hynix as the only supplier with both the HBM3 technology and the production ramp capability at the necessary scale. By the time the H100 launched in 2022 and AI demand exploded in 2023, SK Hynix was effectively the sole-source supplier for the critical memory component of the world's most sought-after chip. The resulting pricing power — HBM3E sells at 5–8x the price of equivalent-capacity standard DRAM — has driven SK Hynix's DRAM gross margins to levels not seen in a decade.
Each successive generation of Nvidia's flagship AI GPU carries more HBM capacity: H100 (80GB) → H200 (141GB) → B200 Blackwell (192GB). The GB200 NVL72 rack system — Nvidia's most advanced AI infrastructure product — contains 13.5TB of HBM3E memory across 72 B200 GPUs, representing approximately 70kg of HBM stacks per rack. This is not just a technology progression: it is a structural revenue escalator for HBM suppliers. Even if Nvidia sells the same number of GPU units in 2025 as in 2024, the total HBM content — and total HBM revenue — grows automatically because each chip requires more of it.
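The escalator arithmetic is easy to see in a toy sketch. The GB-per-GPU figures come from the text; the unit volume and $/GB price are illustrative assumptions, not market data:

```python
# The HBM "revenue escalator": flat GPU unit volume still grows HBM revenue
# because memory content per GPU rises each generation.

hbm_gb = {"H100": 80, "H200": 141, "B200": 192}   # GB per GPU (from the text)
units = 4_000_000          # annual GPU units, held flat (assumed)
price_per_gb = 15          # $/GB for HBM, illustrative assumption

for gpu, gb in hbm_gb.items():
    revenue = units * gb * price_per_gb
    print(f"{gpu}: {gb} GB/GPU -> ${revenue/1e9:.1f}B HBM revenue at flat volume")
```

Under these assumptions, HBM revenue grows ~2.4x from the H100 to the B200 generation with zero growth in GPU units sold — content growth alone carries it.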
| Company | HBM Share (2024 est.) | HBM Generation | Technology Gap vs Leader | Strategic Position |
|---|---|---|---|---|
| SK Hynix | ~50% | HBM3E in volume; HBM4 in development | Leader — sets the pace | Sole qualified HBM3E supplier for Nvidia H200/B200 at launch. 2-year technology lead over Samsung at the time of the AI boom. Benefits from the technology bet made in 2019–2021. Key risk: Samsung catching up on HBM4 for the next GPU generation. |
| Samsung | ~30% | HBM3E ramping; yield issues delayed market entry | ~12–18 months behind SK Hynix | Faced yield qualification failures for Nvidia H100/H200 HBM3E in 2023. Recovered with improved processes. Expected to gain HBM share in 2025 with Blackwell-era products. Scale advantage in DRAM production is a tailwind for cost-down. Key risk: continued yield lagging in advanced packaging for HBM stacks. |
| Micron | ~20% | HBM3E qualified; ramping for Nvidia | ~18–24 months behind SK Hynix at HBM entry | Qualified for Nvidia HBM supply in late 2024. Receives meaningful volume from Nvidia for Blackwell GB200. US-domiciled company benefits from CHIPS Act incentives and US government preference for domestic supply. Fastest-growing HBM share trajectory. Key opportunity: US customers (hyperscalers, defence) may preference Micron for supply chain security reasons. |
*HBM market share will shift with each new GPU generation qualification cycle. The current hierarchy (Hynix → Samsung → Micron) is not fixed — qualification for each new Nvidia platform resets the competitive landscape.*
Memory is the AI hardware supply chain's most underappreciated value pool. HBM is structurally different from commodity DRAM: it is technology-intensive, supply-constrained, and growing in content per GPU with every product cycle. SK Hynix's current position is the result of a contrarian technology bet made when AI was not yet a consensus investment theme — a useful reminder that the most valuable positions in technology supply chains are often built years before the demand they serve materialises. The key forward question is whether SK Hynix's lead is durable enough to persist through the HBM4 transition, or whether Samsung's scale closes the gap on the next technology node.
Geopolitics & Cycle — The US-China Chip War, Taiwan Risk & Where We Are
The semiconductor industry has always been globally interdependent — chips designed in California, fabricated in Taiwan, assembled in Malaysia, sold worldwide. That interdependence is now the central theatre of the US-China strategic competition. Since 2019, the US has progressively restricted China's access to advanced semiconductors and the tools to make them. The restrictions have escalated with each administration, producing a supply chain fragmentation that is reshaping where chips are made, who can buy them, and what the long-run cost structure of the industry looks like.
| Year | Action | Effect |
|---|---|---|
| 2019 | Huawei Entity List addition — US suppliers require licences to sell to Huawei | TSMC, Qualcomm, Google cut Huawei supply; Huawei loses access to leading-edge process nodes for Kirin chips |
| 2020 | TSMC fabrication ban for Huawei — US foreign direct product rule extended to TSMC | Huawei loses ability to manufacture its own mobile chips; Kirin 9000 is last leading-edge SoC |
| 2022 | Comprehensive export controls — ban on sale of advanced chips (>A100 performance) and related equipment to China | Nvidia forced to develop downgraded China-specific products (A800, H800); ASML restricted from shipping DUV machines to China |
| 2023 | Controls extended — H800 and A800 also banned; 40+ Chinese chip entities added to Entity List | Nvidia H20 (further downgraded) becomes the China-compliant product; ~$12–15B of annual Nvidia revenue redirected |
| 2023–24 | Huawei Mate 60 Pro launch with SMIC-fabricated 7nm chip (Kirin 9000S) | Demonstration that China can make near-leading-edge chips domestically, though at lower yields and higher cost; significant shock to US policymakers |
| 2025 | H20 banned outright; BIS rules tightened further; ASML DUV sales to China halted | Near-total severance of China from US-ecosystem advanced chips; Chinese companies redirecting investment to self-sufficiency |
| The US-China chip war is one of the most consequential structural forces in the semiconductor industry. Its effects compound over time — each restriction accelerates China's domestic investment, which creates long-run supply competition even if near-term Chinese capabilities remain limited. | ||
The geopolitics section in v1 of this document recorded what happened chronologically but did not commit to an analytical view on what it means. That is an evasion. The US-China chip restrictions produce two materially different futures with genuinely different investment implications: Scenario A, in which export controls durably preserve the Western technology lead, and Scenario B, in which they accelerate Chinese self-sufficiency to near-parity. A serious framework should be explicit about which is more probable and what the second-order effects of each are.
Scenario A is assigned the higher probability (roughly 60%) not because China lacks the intent or capital — it demonstrably has both — but because the physics of advanced semiconductor manufacturing impose real constraints that money alone cannot quickly solve. EUV is not just an export-controlled product; it is a system requiring 100,000+ precision components, many of which have their own supply chains that are also export-controlled. Reproducing it from scratch is a decade-plus project even with full state resources. However, Scenario B deserves 40% weight because history repeatedly shows that technology restrictions accelerate rather than prevent indigenous development when the prize is large enough. China's semiconductor industry in 2025 is more capable than US policymakers assumed it would be in 2019. Extrapolating that trajectory is not wishful thinking — it is reading the evidence.
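The 60/40 weighting can be made explicit as a probability-weighted expected value. A minimal sketch, assuming the probabilities above; the impact multipliers are hypothetical placeholders for illustration, not forecasts:

```python
# Probability-weighted framing of the two chip-war scenarios.
# Probabilities follow the text (A: 60%, B: 40%); the "impact"
# multipliers on chip-layer value are assumed placeholders.
scenarios = {
    "A: controls hold, Western lead persists": {"p": 0.60, "impact": 1.0},
    "B: indigenous Chinese catch-up by ~2030": {"p": 0.40, "impact": 0.7},
}

# Sanity check: probabilities must sum to 1.
assert abs(sum(s["p"] for s in scenarios.values()) - 1.0) < 1e-9

expected = sum(s["p"] * s["impact"] for s in scenarios.values())
print(f"probability-weighted impact factor: {expected:.2f}")
```

The point of the exercise is not the number itself but the discipline: any position sized off Scenario A alone implicitly assigns Scenario B a probability of zero, which the evidence does not support.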
One structural shift that the chip architecture analysis in Section 05 raised but did not fully resolve: the workload mix is shifting from training to inference, and this shift has different implications for the competitive landscape than a simple "more AI = more Nvidia" extrapolation.
Training workloads are large, parallelisable, and batch-processed. They run for days or weeks on clusters of thousands of GPUs. Flexibility across model architectures matters because researchers iterate frequently. This is Nvidia's home turf — CUDA + H100/B100 is optimised for exactly this workload.
Inference workloads are different in character: they require low latency (milliseconds, not minutes), high throughput at lower batch sizes, and relentless cost-per-token optimisation. A frontier LLM serving 2.5 billion queries per day is an inference problem, not a training problem. The optimal chip architecture for inference prioritises energy efficiency and memory bandwidth utilisation over raw floating-point performance. This is where the architecture competition opens up:
| Workload | % of AI Compute (2024 est.) | % by 2030 (est.) | Optimal Architecture | Competitive Implication |
|---|---|---|---|---|
| Training (frontier models) | ~70–80% | ~25–30% | GPU clusters (Nvidia H/B series) | Nvidia dominant; ASIC alternatives limited by flexibility requirements during research iteration |
| Inference (production serving) | ~20–30% | ~70–75% | Specialised inference ASICs, efficient GPUs, edge chips | Most competitive battleground — Google TPU, Amazon Inferentia, Groq LPU, and Nvidia's own inference-optimised products all competing; CUDA advantage is weaker here |
| Fine-tuning / adaptation | Small but growing | Meaningful share of enterprise AI | Mid-range GPUs; cloud-based fine-tuning services | Nvidia H100 overkill for most fine-tuning; AMD MI300 gaining traction for cost efficiency |
| The inference growth projection is the single most important structural variable for the next phase of the AI chip market. Inference-optimised products — including Nvidia's own L40S and the upcoming inference-specific lines — will be the primary battleground for chip market share through 2028. | ||||
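The scale of the inference problem can be sanity-checked with a back-of-envelope calculation. This is an illustrative sketch, not a sourced estimate: the query volume comes from the text, but tokens per query, per-GPU serving throughput, and GPU cost are assumed placeholder values.

```python
# Back-of-envelope inference economics for a frontier LLM.
# QUERIES_PER_DAY is from the text; everything else is an
# assumed placeholder for illustration.
QUERIES_PER_DAY = 2.5e9       # serving load (from the text)
TOKENS_PER_QUERY = 500        # assumed average tokens generated per query
GPU_TOKENS_PER_SEC = 1500     # assumed sustained per-GPU serving throughput
GPU_HOURLY_COST = 2.50        # assumed all-in $/GPU-hour (capex + power)

tokens_per_sec = QUERIES_PER_DAY * TOKENS_PER_QUERY / 86_400
gpus_needed = tokens_per_sec / GPU_TOKENS_PER_SEC
daily_cost = gpus_needed * GPU_HOURLY_COST * 24
cost_per_million_tokens = daily_cost / (QUERIES_PER_DAY * TOKENS_PER_QUERY / 1e6)

print(f"sustained throughput: {tokens_per_sec:,.0f} tokens/s")
print(f"GPUs required: {gpus_needed:,.0f}")
print(f"cost per 1M tokens: ${cost_per_million_tokens:.2f}")
```

Under these assumptions the serving fleet runs to roughly ten thousand GPUs operating continuously, which is why cost-per-token, not peak FLOPS, is the metric that decides which architecture wins the inference segment: a chip that halves `GPU_HOURLY_COST` per token served halves the entire serving bill.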
The v1 document mentioned hyperscaler custom ASICs as a competitive threat to Nvidia. The fuller picture is more structural: hyperscalers are not just building chips — they are vertically integrating across the entire stack in a way that progressively reduces their dependence on third-party vendors at every layer.
Google designs its own TPUs, runs them in its own data centres, on its own grid-connected power infrastructure, serving its own model (Gemini) through its own cloud platform. The only external dependencies are TSMC (fabrication) and ASML (via TSMC). Amazon has Trainium chips, Graviton CPUs, its own DC construction capability, AWS Outposts for edge, and long-term nuclear PPAs. Microsoft has Maia ASICs, Azure infrastructure, and the Constellation Energy nuclear PPA. Meta has MTIA inference chips, its own massive DC build programme, and is the largest single customer for Nvidia — while simultaneously building the alternative.
This vertical integration dynamic is not a short-term threat to chip layer economics — the volumes required and the complexity of the transition mean hyperscalers will remain large Nvidia customers for years. But it is the structural force that will gradually compress Nvidia's pricing power in the hyperscaler segment and push Nvidia's growth increasingly toward enterprise and cloud customers, where CUDA lock-in is strongest and custom silicon economics are least compelling.
| Segment | Current Cycle Position | Key Risk | Outlook 2025–2027 |
|---|---|---|---|
| AI GPUs (Nvidia) | Supply-constrained through 2025; Blackwell ramp expected to ease shortage in 2026 | Hyperscaler capex moderation; ASIC share shift at margin | Strong demand, supply normalising; pricing power moderates from peak but remains elevated. Margin risk is compression from current ~75% gross margin toward ~65–68%. |
| HBM Memory | Severely supply-constrained; SK Hynix fully allocated through 2025 | Samsung qualification closing gap; next-gen HBM4 resets pecking order | Tight through 2025; modest easing in 2026 as Samsung/Micron ramp. ASP premium likely to compress but remains well above standard DRAM. SK Hynix maintains lead on HBM4 timing. |
| Advanced Logic (TSMC 3nm/2nm) | Fully allocated; Nvidia, Apple, AMD competing for capacity | Arizona ramp slower/costlier than expected; geopolitical Taiwan risk | TSMC pricing power intact through at least 2026. 2nm node ramp (2025–2026) expected to sustain premium. Long-term Taiwan geopolitical risk is an investor concern but not a near-term operational constraint. |
| Standard DRAM | Recovering from 2022 oversupply; prices improving in 2024 | HBM capacity conversion reducing standard DRAM supply | Structural support from server demand and AI-adjacent server builds. Samsung's swing capacity now partially committed to HBM, tightening standard DRAM supply. |
| NAND Flash | Still recovering; prices below 2021 peak | Oversupply risk if data centre storage demand disappoints | Weaker than DRAM/HBM near-term. Recovery in 2025 partially underway but NAND is less directly AI-driven. Watch Samsung capex discipline as the key variable. |
| Cycle views as of early 2026. Semiconductor cycle dynamics shift rapidly; these assessments are directional, not precise forecasts. The most important variable is hyperscaler capex guidance, which provides the most reliable forward signal for AI chip demand. | |||
The paradox at the centre of AI hardware investing is this: the companies with the most durable structural moats (ASML, TSMC) are also the most geopolitically exposed. The company with the most visible near-term earnings power (Nvidia) faces the most uncertain long-run competitive structure as inference growth shifts architecture dynamics. And the memory companies (SK Hynix, Micron) that benefit most from the AI cycle also carry the highest cyclicality risk when the upcycle eventually normalises. There is no position in the AI hardware stack that captures structural moat, cyclical safety, and geopolitical insulation simultaneously. The investment framework should therefore be portfolio-level: own the moat (ASML, TSMC) for structural compounding; own the cycle leader (Nvidia) for near-term earnings; own the memory optionality (SK Hynix, Micron) for cycle leverage; own the power constraint (Vertiv, Hitachi Energy, Constellation) for the next bottleneck — with explicit position sizing to reflect each risk dimension.
Across the six sections of this study, a clear hierarchy of competitive positioning has emerged. The moat scores below reflect structural durability (not near-term earnings), assessed across switching cost, replication difficulty, and time to catch up.
The AI hardware value chain is not a sector — it is a collection of structurally distinct businesses connected by supply chain relationships. ASML and TSMC are infrastructure monopolies compounding quietly. Nvidia is a platform business with a software moat dressed up as a hardware company, facing a gradual workload-mix headwind as inference growth reshapes optimal chip architecture. SK Hynix and Micron are technology-cycle businesses with near-term tailwinds but long-run cyclicality. The power infrastructure layer — transformers, cooling, nuclear baseload — is the emerging constraint that most chip-focused AI frameworks have underweighted. The unifying insight across all layers: the binding physical constraint is always where the value goes, and that constraint migrates as each bottleneck is relieved.
What Breaks the Thesis — Tail Risks and Architectural Discontinuities
A rigorous investment framework must ask not just what makes the thesis work but what would cause it to fail entirely. The AI hardware thesis — own the physical constraint, compound with the moat — rests on a set of assumptions that are well-grounded in current evidence but are not immutable laws. The following are the scenarios that would materially break the thesis rather than merely slow it down. They are assigned low probability but deserve explicit framing because their consequences are high enough to warrant position sizing that accounts for them.
None of these four risks is high probability over a 3–5 year investment horizon. But their combination — and the correlation between them in a stress scenario (a Taiwan contingency simultaneously triggers a capex freeze and an accelerated push for alternative architectures) — means that concentrated positions in AI hardware warrant explicit tail risk management. The practical implication is not to avoid the sector but to size positions such that the portfolio can survive a 40–60% drawdown in AI hardware names without permanent capital impairment. The structural thesis is intact; the entry price and position sizing matter as much as the thesis quality.
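The sizing discipline described above can be made concrete. A minimal sketch: the stress drawdown comes from the text's 40–60% range; the tolerable portfolio loss is an assumed policy parameter, not a recommendation.

```python
# Sizing sketch: how large can an AI-hardware sleeve be if the
# portfolio must survive a correlated stress drawdown intact?
# STRESS_DRAWDOWN uses the worst case of the text's 40-60% range;
# MAX_PORTFOLIO_LOSS is an assumed risk-policy parameter.
STRESS_DRAWDOWN = 0.60      # assumed worst-case sleeve drawdown
MAX_PORTFOLIO_LOSS = 0.15   # assumed tolerable hit to total portfolio

# A sleeve weight w loses w * drawdown of the portfolio in stress,
# so the binding constraint is w <= MAX_PORTFOLIO_LOSS / STRESS_DRAWDOWN.
max_sleeve_weight = MAX_PORTFOLIO_LOSS / STRESS_DRAWDOWN
print(f"max AI-hardware sleeve: {max_sleeve_weight:.0%}")
```

Under these placeholder inputs the sleeve caps out at a quarter of the portfolio; tightening either parameter (a 70% stress case, or a 10% loss budget) shrinks it proportionally, which is the practical meaning of "size for the tail rather than time it."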
The AI hardware thesis is strong — the physical constraints are real, the moats are deep, and the demand drivers are secular. But strong theses break in specific, identifiable ways, and intellectual honesty requires naming them rather than burying them in footnotes. The most actionable of the four risks is the capex air pocket — it is the most near-term, the most monitorable (hyperscaler capex guidance is public), and the one most likely to produce a buying opportunity rather than a thesis break. The architectural discontinuity and Taiwan risks are genuine but long-dated and unactionable as trading signals. Size for them; do not obsess over timing them.
China's response to US chip restrictions has been a massive, state-directed investment in domestic semiconductor capability. The numbers are significant: over $150B in committed government funding for semiconductor industry development through 2030, channelled through the China Integrated Circuit Industry Investment Fund (the "Big Fund"). SMIC — China's leading foundry — has been ramping 7nm production using DUV lithography (not EUV) through multi-patterning techniques that achieve similar geometry at lower yield and higher cost. Huawei has rebuilt its chip design capabilities from scratch, developing its own AI accelerators (Ascend series) as a CUDA-alternative ecosystem.
The honest assessment: China is building a parallel semiconductor ecosystem that is 3–5 years behind the global frontier today, cannot make sub-5nm chips without EUV, and operates at substantially higher cost per wafer than TSMC. But it is building — and the restrictions are accelerating rather than stopping that effort. The long-run scenario where China has a credible 5nm foundry by 2030 is not implausible, even if it requires continued sacrifice of economic efficiency for strategic independence.