Ornn

On Sovereign Compute

Ornn — Wed, 25 Mar 2026 19:41:17 GMT

In the world of compute, a lot has changed in the last twelve months. The U.S. imposed, and then reversed, an export ban on chip design software within a six-week span. China retaliated against semiconductor restrictions by freezing rare earth shipments. Over 100 countries signed a declaration committing to AI sovereignty. And governments across the world collectively pledged hundreds of billions of dollars to build domestic chip capacity they don’t yet have. Clearly, the compute landscape is changing fast.

What ties all of this together is a single underlying shift. Compute, the GPU infrastructure that powers artificial intelligence, is no longer just a commercial input. It is now a sovereign commodity. Importantly, this isn’t unprecedented — it follows a pattern that has repeated across many commodities as soon as countries begin viewing them as critical to sovereignty.

Indeed, when a critical resource becomes geopolitically contested, governments move to secure access, and the market structure around that commodity is permanently reshaped. Oil followed this arc in the 1970s, rare earths followed it in the 2010s, and liquefied natural gas followed it over two decades. Compute is following the same pattern right now.

Compute is Now a Sovereign Resource

Regardless of political system, ideology, or geography, governments are converging on the conclusion that compute is a strategic reserve. The policies driving this shift span the semiconductor industry broadly, but the most consequential battleground is advanced AI chips — the GPUs and accelerators that power machine learning, and the memory and packaging infrastructure required to build them.

In the U.S., the CHIPS and Science Act appropriated $52.7 billion to boost domestic semiconductor manufacturing, including $39 billion in direct manufacturing subsidies and a 25% investment tax credit. Simultaneously, export controls on advanced AI chips to China have been tightened across successive rounds since October 2022. The Biden administration proposed the AI Diffusion Rule in January 2025, which the Trump administration rescinded before replacing it with a case-by-case licensing framework in January 2026, allowing limited H200 exports under a 25% fee and security review structure. Enforcement has escalated in parallel — in early 2026, the Bureau of Industry and Security (BIS) reached a $252.5 million settlement with Applied Materials for unauthorized semiconductor equipment transfers to China, while the DOJ announced Operation Gatekeeper, disrupting over $160 million worth of AI chip exports. In support of this, Congress approved a 23% increase in the Bureau of Industry and Security’s enforcement budget for the 2026 fiscal year.

China, in response, is accelerating its drive toward semiconductor self-sufficiency, backed by tens of billions in state-directed investment funds. Researchers at Peking University announced a 2D transistor operating 40% faster than TSMC’s 3nm devices, and DeepSeek demonstrated frontier-level AI capabilities built with constrained compute access. The European Union responded to both the U.S. and China as well, mobilizing roughly €43 billion through its EU Chips Act to double Europe’s share of global chip production by 2030.

The pattern extends well beyond the major global powers — Canada has announced a $2 billion sovereign compute strategy, India is building a national compute pool exceeding 38,000 GPUs, and Japan has committed ¥10 trillion in AI infrastructure investment through 2030. Additionally, Saudi Arabia, the UAE, France, and Germany have each launched sovereign AI infrastructure programs. And, as mentioned, over 100 countries adopted a declaration to pursue AI sovereignty at the AI for Developing Countries Forum in February.

The convergence on sovereign compute is unmistakable.

The Concentration Problem

Governments are increasingly viewing compute as a critical commodity because the AI chip supply chain is concentrated at every major chokepoint to a degree that is, at the very least, historically unusual for any commodity of this importance.

TSMC fabricates roughly 90% of the world’s most advanced logic chips — that is, chips at the sub-5nm nodes that power AI training and inference — while commanding close to a 75% share of the overall foundry market. Nearly all of this capacity sits in Taiwan, a small island roughly 100 miles off the coast of mainland China that both the U.S. and China consider strategically vital.

The risks of concentration are not theoretical. A confidential report commissioned in 2022 by the Semiconductor Industry Association and prepared by McKinsey found that cutting the supply of AI chips from Taiwan would trigger the largest economic crisis since the Great Depression, with U.S. economic output falling roughly 11% and China’s declining 16%. Bloomberg Economics estimated the global cost of conflict in Taiwan at over $10 trillion. Senior U.S. intelligence officials, including the CIA director and the Director of National Intelligence, have delivered classified briefings to the CEOs of Apple, NVIDIA, AMD, and Qualcomm specifically about this risk. And, in public, Treasury Secretary Scott Bessent has called Taiwan’s chip concentration “the single biggest point of single failure” for the world economy.

Importantly, chip fabrication is not the only chokepoint. High Bandwidth Memory (HBM), which has become the main price-setting input for compute (accounting for over 60% of the production cost of NVIDIA’s B200 GPU), is produced by just three companies: Samsung, Micron, and SK Hynix. At least one of these major suppliers has confirmed that its entire 2026 HBM output is already fully priced and volume-locked, even as the HBM market is projected to nearly triple from $35 billion in 2025 to $100 billion by 2028. As such, because supply is concentrated, pre-committed, and repriced at discrete rather than continuous contract boundaries, the conditions for sudden cost shocks in deploying compute at scale are present.

Therefore, a single disruption at any one of these nodes would cascade through the entire global compute supply chain within weeks, if not days. Historically, this level of geographic and supplier concentration for commodities that matter to national security has always triggered sovereign action. Compute will not be an exception.

From Embargo to Exchange

The pattern compute is currently entering — one in which critical commodities become geopolitically contested, governments intervene, and market structures thus transform — is nothing novel. Many historical episodes follow this pattern, but three modern examples illustrate it especially well.

[1] Oil, 1970s-80s. Before 1973, global oil pricing was controlled by a small group of major companies — the so-called “Seven Sisters” — through long-term bilateral contracts. Pricing was opaque and relationship-driven. Then, the 1973 Arab oil embargo began, weaponizing supply and causing prices to quadruple almost overnight. The crisis exposed how vulnerable the global economy was to a commodity controlled by a small number of stakeholders willing to use supply as a political instrument.

Within a decade, the NYMEX WTI crude futures contract launched, fundamentally changing how oil was priced globally. Before futures existed, industry participants argued that oil was too complex and relationship-driven for standardized financial products. But the oil crisis proved otherwise — when price volatility became intolerable, financial infrastructure emerged as a necessity.

[2] Rare Earth Minerals, 2010. China controlled approximately 97% of global rare earth production when, in 2010, it restricted exports following a diplomatic dispute with Japan. As a result, prices of key rare earth elements spiked by a factor of ten or more within months. Geographic concentration of supply in a single jurisdiction had made the entire global supply chain vulnerable to a political decision by one government. This crisis triggered national stockpiling programs worldwide and diversification efforts, including new mining operations in the U.S. and Australia.

The rare earth crisis demonstrated that even commodities that most people have never heard of can become flashpoints when supply is concentrated enough, and that once governments recognize this vulnerability, the response is swift, expensive, and permanent. The parallel to compute is striking.

[3] Liquefied Natural Gas, 2000s-2020s. For decades, LNG was traded exclusively through long-term bilateral contracts — 20-year deals negotiated behind closed doors, with pricing often indexed to oil. There was no transparent spot market and no independent benchmark. As a result, buyers were locked in with minimal flexibility.

Starting in the 2010s, oversupply from U.S. shale gas production and new export terminals, combined with buyer resistance to rigid contract structures, drove the gradual emergence of spot trading and eventually benchmark pricing with the entrance of the Japan Korea Marker (JKM) in Asia and the Title Transfer Facility (TTF) in Europe. Importantly, the catalyst for LNG’s financial evolution was, unlike oil and rare earths, not a single crisis but a structural change — new supply entrants and buyers who demanded flexibility broke the old model. In other words, financial infrastructure came to the LNG market because it had grown too large and too important to operate without transparent price discovery.

The throughline across all three of these historical examples is consistent. Whenever nations begin contesting a critical commodity, supply volatility spikes and governments intervene in the market to secure sovereign access. That intervention compounds price volatility through stockpiling, trade restrictions, and fragmented markets until financial infrastructure becomes a necessity. And while the timeline often varies — oil’s transformation took a decade, LNG’s took two, and rare earths triggered an immediate scramble — the endpoint is always the same: benchmarks, price discovery, and risk-transfer mechanisms emerge because the market cannot function without them.

Compute’s Missing Market

Compute is now well into this arc. Governments are already restricting trade through escalating export controls, stockpiling capacity through national GPU reserves, and subsidizing domestic production through the CHIPS Act, EU Chips Act, and sovereign compute programs spanning dozens of other countries. Moreover, the sources of volatility are only multiplying and becoming increasingly driven by governments rather than free markets. Entire national markets can be opened or closed to specific chip products on the basis of a single government decision.

But unlike oil in the early 1980s or LNG in the 2010s, compute’s financial infrastructure barely exists right now. There is no universally trusted transparent benchmark for the price of a GPU-hour,¹ no liquid hedging market, and no standardized mechanism for managing the price risk that comes from supply disruptions.

Thus, the gap between how geopolitically contested compute has become and how underdeveloped its financial infrastructure remains is historically unusual. In every other commodity that followed this arc, financial tools (i.e. benchmarks, futures, hedging instruments) emerged in direct response to exactly this kind of volatility. If anything, compute is behind schedule.

Implications for Market Participants

For data center operators and AI companies, sovereign compute policies are fragmenting the global market. Compute pricing increasingly varies by jurisdiction, not just by hardware and provider. And when an export control decision can shift overnight, budgeting and procurement become exercises in geopolitical risk management, not just supply chain logistics.

For lenders and investors financing AI infrastructure, geopolitical risk is now embedded in the collateral. A GPU cluster’s economics can change not because of technology cycles but because of a trade policy shift. This is a category of risk that traditional equipment finance and infrastructure lending frameworks were not built to capture.

Lastly, for the market more broadly, the price of compute in one jurisdiction may diverge sharply from another as supply becomes politicized and fragmented across national boundaries. And, in this environment, independent benchmarks and risk-transfer mechanisms become more valuable, not less.

Looking Forward

Compute is following a well-documented historical arc. When nations begin treating a commodity as a matter of sovereignty, price volatility increases, supply becomes less predictable, and the absence of financial infrastructure goes from inconvenient to untenable. Compute is going through this now, but faster, because the capital deployed is larger, the supply chain is more concentrated, and the geopolitical stakes are arguably higher.

Ultimately, the structural forces — everything from export controls to sovereign buildouts — are already in motion and accelerating on every front. As such, the question now is not whether the financial infrastructure for compute will emerge, but whether it will emerge fast enough to keep pace with the geopolitical forces reshaping the market.

¹ Ornn’s indices are the first attempts at these benchmarks. See them at data.ornnai.com

The Basis Gap in Compute Acquisition Structures

Ornn — Tue, 27 Jan 2026 18:37:11 GMT

Why Basis Risk Defines Every Procurement Decision

Every decision to procure GPUs or compute capacity in today’s AI landscape is, at its core, a risk-reward trade-off - a strategic decision on how to bundle and allocate financial exposures among the parties involved: the customer / workload owner, the operator, the lessor / financier, the provider, or the broader market. In this piece, the decision-maker is the compute fleet owner - the entity responsible for ensuring GPUs are available to run workloads and absorbing the economic consequences if capacity, pricing, or hardware assumptions prove wrong. Buying, leasing, and reserving capacity each represent a different way of packaging these risks, with no “free lunch” - you’re always choosing who ends up holding the downside when things don’t go exactly as planned.

The procurement economics hinge primarily on three key risks:

Spot / replacement risk - the chance that market prices spike or better hardware becomes available when the compute fleet owner is locked in to older terms, negatively affecting their unit economics due to higher refresh costs or outdated performance. This risk includes two distinct but related dynamics: (i) exposure to market pricing when capacity can’t be re-procured quickly, and (ii) exposure to hardware performance curves when new generations improve the performance-per-dollar.
Utilization risk (“short utilization”) - paying for capacity that goes unused. This is one of the most common hidden components - costs are fixed or committed but demand and workload timing are uncertain. The fleet owner is “short utilization” not simply because lower utilization is undesirable, but because the downside from under-utilization is asymmetric and nonlinear: when utilization falls, fixed costs are spread across fewer GPU-hours, causing unit costs to rise sharply, while the upside from higher utilization is capped by total available capacity. For example, if the compute fleet owner commits $1,000 for 400 GPU-hours but only runs 200, their effective cost jumps to $5 / hour from the original $2.50 / hour. In many agreements this is explicit via minimum commits or take-or-pay terms, meaning the fleet owner owes the dollars whether the workload shows up or not.
Residual value risk - what the GPU or cluster is worth at the end of its term. If the compute fleet owner owns hardware, it’s the literal resale value; if they don’t own hardware, the risk still exists economically through contract terms, whether via embedded optionality or by locking them into a particular spec. It depends on timing, location, and tech cycles.
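The take-or-pay arithmetic in the utilization example can be sketched in a few lines. This is a minimal illustration using the $1,000 and 400 GPU-hour figures from the text; the function name is hypothetical and not taken from any real contract or tool.

```python
def effective_cost_per_gpu_hour(committed_spend: float, used_gpu_hours: float) -> float:
    """Unit cost when the committed dollars are owed regardless of usage (take-or-pay)."""
    if used_gpu_hours <= 0:
        raise ValueError("used_gpu_hours must be positive")
    return committed_spend / used_gpu_hours


committed_spend = 1_000.0  # dollars committed (figure from the text)
committed_hours = 400.0    # GPU-hours the commitment was sized for

planned_rate = effective_cost_per_gpu_hour(committed_spend, committed_hours)
realized_rate = effective_cost_per_gpu_hour(committed_spend, 200.0)  # only half is used

print(f"planned:  ${planned_rate:.2f} / GPU-hour")
print(f"realized: ${realized_rate:.2f} / GPU-hour")
```

Halving the hours actually run doubles the effective rate, because the numerator is fixed by the commitment and only the denominator moves.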

The key concept that ties these together - explaining how “locking in A” is not the same as “locking in B” - and the north star of this article, is the idea of basis risk:

Basis = (what a contract guarantees) - (what a workload actually needs)

When the gap widens, the compute fleet owner pays the difference. You can be “hedged” on paper but still wrong in practice - basis risk is why two seemingly identical deals can end up with vastly different costs.

Compute fleet owners have valid, varying objectives, whether minimizing upfront capital, maximizing uptime certainty, or retaining asset control, and no single structure is inherently superior. Basis risk, however, is the common thread that cuts across all of them, turning a theoretically sound choice into an expensive one when the contracted guarantee diverges from actual need. This risk-trading lens explains why procurement feels high stakes today, as rapid obsolescence, volatile demand, and massive capex mean small contract mismatches can explode into large P&L outcomes.

The Commitment vs. Reservation Split

Although commitment and reservation are frequently discussed in the same breath as ways to lock in compute, they serve different purposes - price and deliverability.

A commitment is a pricing contract. The capacity buyer commits to a minimum spend or usage level in exchange for a lower rate (e.g. 50% off spot rates for 3 years). It’s a hedge that reduces the buyer’s unit rate, but doesn’t necessarily guarantee that exact capacity will be available where or when the workload needs it. A reservation is an availability contract. The capacity buyer reserves a determined pool of capacity (specific GPUs, region, spec, etc.) so it is there when the workload needs to run. It’s a hedge on deliverability, but usually at full or near-market rates with no built-in discount. The split is where the basis risk starts, as the buyer can win on one leg and still lose on the other.

Two quick examples:

A mid-tier customer looking to buy capacity commits to discounted H100s at $1.50 / hr vs $2.50 / hr spot (figures are purely illustrative). The price looks solid, but a regional shortage means the buyer can’t actually get the GPUs during the launch window. They’re forced to either delay the launch or burst onto on-demand elsewhere at $2.50 / hr - so the “discount” only applies to capacity they can’t access when it matters.

A customer wanting to buy capacity reserves a block of capacity - say a specific region and networking tier - to guarantee availability. A few months later, the workload shifts to require a different GPU class. The reserved block is now the wrong deliverable. The customer still pays for it (or eats penalties), but to run the workload they now also have to procure new capacity in the correct GPU class / spec (and sometimes region) - effectively paying twice: once for stranded reserved capacity and again for incremental spot or new reservations.

The bottom line: commitment without reservation leaves the capacity buyer exposed to being right on price but wrong on access. Reservation without commitment leaves the buyer exposed to being right on access but wrong on the workload’s actual deliverable. Either way, the mismatch shows up as basis, because what the contract guarantees is not what the workload ends up needing.
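The first example (right on price, wrong on access) can be made concrete with a rough sketch. The $1.50 and $2.50 rates come from the illustrative figures above; the 1,000-hour workload, the function name, and the `take_or_pay` flag are assumptions added purely for illustration.

```python
def procurement_cost(needed_hours: float, committed_hours: float,
                     accessible_hours: float, commit_rate: float,
                     on_demand_rate: float, take_or_pay: bool = True) -> float:
    """Total spend when only part of the committed capacity can actually be used."""
    # Committed capacity only helps to the extent it is reachable when needed.
    used_committed = min(needed_hours, committed_hours, accessible_hours)
    # Whatever the commitment cannot deliver is bought on demand.
    shortfall = needed_hours - used_committed
    # Under take-or-pay the full commitment is billed even if it sits idle.
    billed_committed = committed_hours if take_or_pay else used_committed
    return billed_committed * commit_rate + shortfall * on_demand_rate


needed = 1_000.0  # hypothetical GPU-hours needed in the launch window

full_access = procurement_cost(needed, 1_000.0, 1_000.0, 1.50, 2.50)
no_access = procurement_cost(needed, 1_000.0, 0.0, 1.50, 2.50)
no_access_pay_as_used = procurement_cost(needed, 1_000.0, 0.0, 1.50, 2.50,
                                         take_or_pay=False)

for label, cost in [("full access", full_access),
                    ("no access, take-or-pay", no_access),
                    ("no access, pay-as-used", no_access_pay_as_used)]:
    print(f"{label}: ${cost / needed:.2f} / GPU-hour")
```

Under these assumptions the discounted commitment yields $1.50 per hour only when the capacity is reachable. With no access, the buyer lands at the $2.50 on-demand rate at best, and above it if the commitment is take-or-pay.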

Three Structures as Synthetic Risk Positions

Buying, leasing, and reserving are three different contracts with three different failure modes. The structures look different on paper, but economically, they behave like synthetic positions: each one leaves the compute fleet owner long some key variables and short others, with a counterparty holding the other side, and the spread between what’s contracted and what’s needed shows up as basis.

Here’s a quick table to help the exposures resonate:

A quick intuition on the “long” and “short” framing: this is simply operating leverage. When capacity costs are fixed or committed, higher utilization improves unit economics gradually, but lower utilization hurts disproportionately. The downside from under-utilization is larger than the upside from over-utilization. Consider a hotel with $8,000 of fixed nightly costs and 100 rooms. At 80% occupancy, costs average $100 per occupied room-night. If occupancy rises to 100%, costs fall to $80 per room-night. But if occupancy falls to 60%, costs rise to $133 per room-night. A 20-point increase improves unit economics modestly (by $20), while a 20-point decrease worsens them much more (by $33). GPU fleets behave the same way: fixed capacity costs are spread across fewer GPU-hours when utilization falls, which is why the exposure is described as “short utilization.”
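The asymmetry in the hotel example can be checked with a short sketch. It assumes a fixed cost of $8,000 per night, which is the figure that reproduces the $100, $80, and $133 per room-night quoted in the text; the function name is hypothetical.

```python
def unit_cost(fixed_cost: float, capacity: float, utilization: float) -> float:
    """Fixed cost spread over the units actually sold (rooms or GPU-hours)."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return fixed_cost / (capacity * utilization)


FIXED_COST = 8_000.0  # assumed fixed cost per night
ROOMS = 100

base = unit_cost(FIXED_COST, ROOMS, 0.80)
up = unit_cost(FIXED_COST, ROOMS, 1.00)
down = unit_cost(FIXED_COST, ROOMS, 0.60)

gain_from_upside = base - up      # benefit of +20 points of occupancy
loss_from_downside = down - base  # penalty of -20 points of occupancy

print(f"80%: ${base:.2f}  100%: ${up:.2f}  60%: ${down:.2f}")
print(f"upside gain ${gain_from_upside:.2f} vs downside loss ${loss_from_downside:.2f}")
```

Because unit cost is fixed cost divided by utilization, the curve is convex: equal moves in utilization cost more on the way down than they save on the way up. Swapping rooms for GPU-hours gives the same shape.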

Owning hardware is the cleanest legally but the messiest economically because the fleet owner takes the full stack. The fleet owner is short utilization as costs are fixed while demand is uncertain, and long residual value because they own the asset’s future price - whether it holds or collapses. If the market reprices the hardware curve or if a new generation of hardware slashes the $ / FLOP faster than expected, costs rise and resale values fall. Basis arises when the deliverable the fleet owner bought - whether GPUs, interconnects, or clusters - doesn’t match the evolved workload. Ownership then forces the owner to absorb the mismatch as their own problem to solve.

Leasing may shift the ownership but it doesn’t remove the exposures that dominate the day-to-day economics: utilization and cost of capital. The compute fleet owner remains short utilization as payments are fixed but revenue is not, so keeping busy is imperative. The primary difference is that residual value risk is pushed toward the financier / lessor, and procurement costs can inherit rate sensitivity. Even if GPU economics are solid, lease economics can worsen when the stack reprices, which is why the structure of the lease matters. Look at the triple-net (NNN) concept, for example: the lessor wants a clean yield stream backed by the asset, while the lessee absorbs operating variability (power, maintenance, hosting) and execution risk. We can see this logic in large AI infrastructure financings - e.g., Apollo’s January 2026 $3.5 billion capital solution backing Valor Equity Partners’ $5.4 billion acquisition and triple-net lease of NVIDIA GB200 GPUs to xAI. What’s important here is the allocation: leases force the operator to carry utilization risk while the capital stack prices residual value and credit, and this creates a new basis channel if rates move or residuals underperform.

Reserved capacity changes the feel of the contract as the capacity buyer is explicitly buying delivery. This delivery tends to be defined by a region / zone and an instance / GPU class, occasionally with a performance or networking tier baked in. When the buyer reserves capacity, they lock in the shape of what is being purchased - GPU type, region, spec - so they are trading off flexibility for clarity. As a result, this creates a technology lock-in: if a workload shifts region or the “best-fit” GPU changes as the work advances, the pivot to new capacity brings friction (penalties, repricing). The basis failure here is “right on capacity, wrong on fit”, as the capacity buyer can end up paying for reserved capacity that doesn’t match the workload, while still needing to procure incremental capacity elsewhere.

No single structure solves everything; each one simply redistributes the risks in its own way. The meaningful question is which redistribution best matches the compute fleet owner’s tolerance for the basis that will inevitably appear when the hardware, workloads, or utilization diverge from what was contracted. In practice, that alignment is what separates sustainable economics from gradual erosion.

The Institutional Financing Stack

When GPU fleets scale to thousands of units across sites, procurement becomes structured finance. The workhorse tool is the special purpose vehicle (SPV): a separate legal entity created to own a defined pool of GPUs (or full cluster), borrow against that pool, and keep the debt / risk isolated from the compute fleet owner’s main balance sheet.

For example, Lambda Labs structured a $500M loan in 2024 as a GPU-backed asset-backed securitization (ABS) via a dedicated SPV. The SPV held the physical NVIDIA chips plus rental contracts / cash flows from cloud customers, allowing investors to fund the vehicle while Lambda used proceeds to buy more GPUs. CoreWeave has taken this further with multiple delayed-draw term loans (DDTLs) through subsidiaries / SPVs (e.g., $2.6B facility closed in July 2025), collateralizing fleets to finance OpenAI and other commitments without loading its own balance sheet.

Lenders don’t underwrite “a pile of GPUs” in isolation - they underwrite a complete setup: hardware resale value is paired with a contracted revenue stream and supported by enforceable legal terms. The hardware provides recovery in default, but the contracts dictate reliable cash flows over time - who is paying (counterparty), how long (term), and under what pricing and cancellation terms. Enforcement mechanics are key because the GPUs operate in live clusters - lenders need step-in rights and access if default hits.

The core basis is that collateral value tracks the volatile GPU secondary market, while revenue ties to slower AI demand, utilization, and fixed contracts. When the two drift (e.g., oversupply crashes residuals while contracts stay steady), basis risk widens.

It is valuable to reframe lender underwriting as two parallel questions:

How strong are the cash flows? - Counterparty quality (e.g., investment-grade tenant or major AI lab), term / cancellation (non-cancelable 3 - 5+ years, plus hell-or-high-water clauses), pricing mechanism (e.g., fixed rates), concentration (no single tenant > 50 - 70%), and deliverability milestones (e.g., full funding only after power-up and testing).
How real is the collateral? - Timing adds another mismatch: DDTLs commit capital upfront but draw in tranches as GPUs arrive / deploy / energize. For example, your first draw funds Gen N GPUs and you later draw funds for Gen N+1, but your customer contract is fixed for 24 months - essentially the economics of the fleet and revenue are misaligned on different clocks.
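The concentration test in the first question lends itself to a simple check. The sketch below assumes a revenue breakdown by tenant and uses the 50% and 70% bounds mentioned above as thresholds; the tenant names, revenue figures, and function name are hypothetical.

```python
def max_tenant_share(revenue_by_tenant: dict[str, float]) -> tuple[str, float]:
    """Return the largest tenant and its share of total contracted revenue."""
    total = sum(revenue_by_tenant.values())
    if total <= 0:
        raise ValueError("total contracted revenue must be positive")
    tenant = max(revenue_by_tenant, key=revenue_by_tenant.get)
    return tenant, revenue_by_tenant[tenant] / total


# Hypothetical annual contracted revenue per tenant, in $ millions.
contracts = {"tenant_a": 60.0, "tenant_b": 25.0, "tenant_c": 15.0}

largest, share = max_tenant_share(contracts)
breaches_strict_limit = share > 0.50  # lower bound of the range in the text
breaches_loose_limit = share > 0.70   # upper bound of the range in the text

print(f"{largest} holds {share:.0%} of contracted revenue")
print(f"breaches 50% limit: {breaches_strict_limit}, breaches 70% limit: {breaches_loose_limit}")
```

A pool like this one would pass a lender with a 70% cap and fail one with a 50% cap, which is why the same fleet can price differently across facilities.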

Importantly, repossession value is not simply “what is a B200 worth?” - it’s “what is a B200 worth where it sits.” In practice, GPUs are wired into live clusters (InfiniBand, cooling, power) - extraction costs thousands per rack, takes weeks or months, and causes downtime. This is why SPV structures obsess over hosting and access terms: control rights and hosting agreements can matter just as much as the equipment itself.

At scale, this is project finance - lending against a defined asset pool and contracted cash flows - rather than unsecured corporate debt against the compute fleet owner’s balance sheet. The true risk is alignment: whether the packaged collateral stays aligned with the revenue stream you’re reliant on.

Managing Basis at Scale

In real-world execution, basis risk isn’t an abstract footnote; it’s the difference between a fleet that generates predictable returns and one that bleeds capital. The structures examined aren’t competing solutions - they are different ways of slicing the same pie of exposures. The question is never “which one is best,” but “which risks are you structurally willing to own, and which can you afford to let someone else carry?”

What makes this current moment different from prior cycles is the velocity. Generations turn over in 12 - 18 months, not 3 - 5 years. Utilization can swing 30 points in a quarter based on model efficiency or training / inference mix, and residuals can lose meaningful value depending on supply. The basis gap opens faster and wider than ever, and contracts that look bulletproof on day one can become anchors in just a few months’ time.

Taking a step back, the compute fleet owners and financiers who will win are the ones who treat basis as the primary variable and not an afterthought. They define the deliverable with precision and map exposures, all the while building in optionality - whether through tranche-based commitments or emerging hedging tools. As compute acts more like oil, the edge belongs to those who procure compute intelligently.

The future of mid-tier data center economics will be defined less by who can build the biggest fleet and more by who can manage the basis most effectively. In this game, the real risk isn’t GPU scarcity - it’s watching your margins evaporate when the market reprices faster than your contract can follow.

Compute in Space. Why, How, and When?

Ornn — Fri, 16 Jan 2026 05:01:26 GMT

The emergence of AI has led to tremendous demand for compute. Today, hyperscalers report overwhelming utilization of chips in their datacenters and anticipate that this demand will grow exponentially as GPU-intensive processes cement themselves across every vertical of our economy. However, providing the compute needed to support such a world requires enormous infrastructure development, especially considering cloud providers’ current struggle to meet existing capacity requirements. Wall Street analysts project $500B-$600B in capex (primarily from companies like Amazon, Microsoft, Alphabet, Meta, and Oracle) just in 2026 to build datacenters. Some dub this massive spend the “AI arms race” because big players pressure each other to deploy massive investments into new capacity out of fear of falling behind. Despite these ambitious buildout plans, doubts remain on whether infrastructure deployment can keep pace with future requirements. Datacenters take years to build and outfit with the necessary hardware, networking, and cooling, and the GPU supply chain is complex and subject to delays. On top of that, connecting something as power-intensive as a massive datacenter to the power grid can add a multi-year lag.

Market consensus holds that rapid datacenter buildout is crucial to keep pace with AI growth over the coming decades. However, expansion faces significant challenges. Facilities need three core inputs: GPUs, power, and cooling, each of which brings its own challenges.

Datacenters need access to vast quantities of reliable (and cost-efficient) electricity to power the chips and water to cool them. Geography matters too: hyperscalers look for cool climates, flat and stable topography, minimal natural disaster risk, and sometimes proximity to densely populated areas for delay-sensitive inference. These requirements place a limit on the amount of land suitable for datacenters.

Datacenters also impose substantial environmental costs, raising questions about their long-term sustainability. Compute operations strain power grids and drive increased fossil fuel consumption. Hyperscalers cool GPUs by evaporating massive amounts of fresh water. The vapor escapes into the atmosphere, permanently removing the water from the local ecosystem and requiring datacenters to continuously source more. Rivers dry up, agriculture withers, and droughts intensify.

Building data centers in space might sound like a CEO’s empty PR promise, but we’re rapidly approaching the point where it becomes the economically viable choice. With SpaceX slashing launch costs and Starcloud already operating real hardware in orbit, what once sounded like sci-fi fantasy could become reality in the next 5–10 years.

Bull Case

Let’s zoom in on the latter two inputs: power and cooling.

Electricity in space is harnessed through solar energy, which is unlimited, free, and easily accessed above the atmosphere. Satellites can follow an orbital path that ensures the system has access to sunlight for up to 99% of the time. Furthermore, because the atmosphere dissipates ~50% of solar energy, and because panels in orbit avoid the night and weather downtime that idles panels on the ground, panels in space produce around 8x more energy per square meter than on Earth.

Orbital datacenters require radiative cooling, as traditional methods involving conduction and convection fail in space’s vacuum. Heat pipes, sealed tubes containing water or ammonia, act as conduits that transfer heat from the chips to large external radiator panels, where the heat dissipates into the vacuum. The fluid then returns to the chips to absorb more heat and the cycle continues. The ISS uses radiative cooling, but since its systems consume less power it employs much smaller panels than a datacenter would need. Engineers have developed clever origami-like folding techniques to pack flaps of maximum surface area into the limited space available on rockets.
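To get a feel for why the radiator panels must be so large, a rough sizing can be sketched with the Stefan-Boltzmann law, P = εσAT⁴. This is a minimal sketch rather than a thermal design: the 1 MW heat load, the radiator temperatures, and the 0.9 emissivity are illustrative assumptions that do not come from this article, and the calculation ignores absorbed sunlight and Earth’s infrared, so real panels would be larger.

```python
# Rough radiator sizing from the Stefan-Boltzmann law: P = emissivity * sigma * A * T^4.
# All inputs below are illustrative assumptions, not figures from the article.

STEFAN_BOLTZMANN = 5.670374419e-8  # W / (m^2 * K^4), physical constant


def radiator_area_m2(heat_watts, temp_kelvin, emissivity=0.9, sides=1):
    """Area needed to radiate `heat_watts` into deep space.

    Ignores absorbed sunlight and Earth infrared, so this is a lower bound.
    `sides` is the number of panel faces that radiate (1 or 2).
    """
    flux = emissivity * STEFAN_BOLTZMANN * temp_kelvin ** 4  # W per m^2 per face
    return heat_watts / (flux * sides)


if __name__ == "__main__":
    for temp in (280, 300, 350):
        area = radiator_area_m2(1e6, temp)  # hypothetical 1 MW heat load
        print(f"{temp} K radiator: ~{area:,.0f} m^2 for 1 MW")
```

Because radiated power scales with the fourth power of temperature, running the radiators hotter shrinks the required area quickly.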

Space offers what Earth cannot: infinite room to scale and boundless, reliable energy without consuming land, water, or fossil fuels.

Key Players

Starcloud and SpaceX have partnered and emerged as early leaders in this market. On November 2, 2025, Starcloud launched an NVIDIA H100 GPU into orbit using SpaceX’s Falcon 9 rocket. The system was the size of a fridge and weighed in at 60kg, marking a milestone as the H100 is 100x more powerful than any computing system previously deployed in space. Starcloud plans to test their platform’s model training and fine-tuning capabilities as well as inference capabilities on Gemini. They planned the next iteration for sometime in 2026: a micro datacenter housing multiple H100s and at least one NVIDIA Blackwell chip.

While Starcloud’s progress is encouraging, the ultimate determinant of space datacenter viability is launch cost. SpaceX’s ability to rapidly produce and reuse rockets has given it overwhelming market share. Competitors like Blue Origin and Rocket Lab trail far behind in launch frequency, reusability, and manufacturing capability. Elon Musk and his team have helped reduce the cost of launching to space by 95%, from $50,000 per kilogram in the 1970s to $2,500 today, and aim to eventually reach $50 per kilogram with Starship, their soon-to-be flagship vessel.
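The launch-cost figures above can be checked with a few lines of arithmetic. The sketch below applies the quoted per-kilogram prices to the 60 kg Starcloud payload mentioned earlier; treating launch price as strictly linear in mass is a simplifying assumption, and the function names are illustrative.

```python
# Launch-cost arithmetic using the per-kilogram figures quoted in the text.
COST_PER_KG_USD = {
    "1970s": 50_000,
    "today": 2_500,
    "Starship target": 50,
}


def reduction(old_cost, new_cost):
    """Fractional cost reduction going from old_cost to new_cost."""
    return 1 - new_cost / old_cost


def launch_cost(mass_kg, cost_per_kg):
    """Launch cost assuming price scales linearly with payload mass (an assumption)."""
    return mass_kg * cost_per_kg


if __name__ == "__main__":
    pct = reduction(COST_PER_KG_USD["1970s"], COST_PER_KG_USD["today"])
    print(f"Reduction since the 1970s: {pct:.0%}")
    for era, price in COST_PER_KG_USD.items():
        print(f"60 kg payload at {era} prices: ${launch_cost(60, price):,}")
```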

Rapid production and refurbishment timelines drive SpaceX’s cost advantage. Their factory in Hawthorne, California assembles a Falcon 9 booster in weeks, and a booster can be launched and relaunched in under 9 days. Musk plans to mass-produce Starships with 1,100 cubic meter cargo bays capable of delivering 100-ton payloads to orbit.

Challenges & Limitations

Despite the current optimism, questions around the practicality and feasibility of these datacenters persist. Radiation usually absorbed by the atmosphere interferes with chips’ circuits and memory in space, and long-term exposure will significantly shorten hardware lifespan. Radiation-resistant materials exist but command premiums. Moreover, how will hyperscalers replace or upgrade chips in orbit? Operating existing datacenters is extremely labor-intensive; failures occur frequently and require on-site troubleshooting.

Will hardware systems become more robust over time? Will advancements in robotics enable the automation of maintenance and prevent the need to return faulty systems to Earth or send personnel into space? These questions remain unanswered but will be central to the viability of space-based datacenters.

Lastly, routing compute to an orbital datacenter increases latency for most users on Earth. Inference workloads, like a self-driving car deciding how to avoid a pedestrian, tend to be delay-sensitive and comprise around 70% of current AI compute demand. That said, while orbital routing should typically add tens of milliseconds of latency, some inference requests originating far enough from terrestrial datacenters can actually be faster when routed through orbital infrastructure. For the majority of use cases, the marginally higher latency should not matter.

Ornn Adds the Financial Layer

Ornn is cementing its position in the AI ecosystem by offering instruments that allow compute consumers, providers, and lenders financing infrastructure to hedge against price volatility. Its partnerships with cloud providers give it access to extensive GPU pricing data unavailable to the public, which enables Ornn to precisely price futures that allow AI consumers to lock in future costs. Given the scale of corporate compute spending, disruptions like energy price spikes or semiconductor supply chain issues could significantly erode profit margins. These instruments also provide data centers and hyperscalers with revenue certainty, which in turn secures the more favorable financing terms crucial for the infrastructure expansion necessary to meet AI demand.

Ornn’s derivatives will be a necessity with the deployment of orbital datacenters. The massive capital expenditure required to develop infrastructure in space makes Ornn’s role as the leader in compute downside protection crucial. These derivatives also hedge terrestrial hyperscalers against the risk of their datacenters becoming obsolete if compute migrates to space and legacy hardware proves incompatible with orbital operations.

Looking Forward

Major innovations from SpaceX and Starcloud make orbital datacenters feasible within the next 5-10 years, though significant uncertainty remains around how they would ultimately manifest. Specific datacenter architectures and operational designs are still undetermined. Given Elon Musk’s track record of vertical integration, SpaceX could build orbital compute infrastructure independently rather than partnering with Starcloud. Scenarios also exist in which chip efficiency gains keep ground-based compute economically and environmentally superior. Overall, orbital datacenters represent a promising development worth monitoring over the next few years, but several variables and technical challenges must be resolved before they become an operational reality.

On GPU Depreciation

Ornn — Thu, 15 Jan 2026 00:56:41 GMT

Over the last couple of months there has been strong commentary from investors, many of whom we admire, surrounding the current depreciation schedules of GPUs, specifically whether current 5-year depreciation schedules are accurate. The prevailing argument amongst value-skewed investors is that these schedules are at worst completely out of distribution and at best at the very edge of the feasible range. Fear coursed through FinTwit and the AI/semis community alike, especially on the back of articles like this one by the Economist that warn that if server lives were reduced to one to two years, it could shave $2 trillion to $4 trillion of market cap off current big tech valuations. That is before considering the effect on smaller datacenters and neoclouds that own clusters. Many of these players don’t have locked-in take-or-pay contracts and don’t have the same balance sheet flexibility that hyperscalers have to update their existing compute base.

We believe many of these fears are misplaced and stem from a core misunderstanding of the concern in question — namely around what depreciation truly means.

Depreciation at its Core

Depreciation at its core is an accounting model for the periodic economic cost of your capital expenditures. In layman’s terms, it’s a proxy for the cost of using your “fixed” resources (GPUs, networking equipment, etc.). This is then factored into a company’s income statement, the point of which is to map the current economic standing of a business. To that end, operating profit (EBIT) and earnings are both proxies for economic profit, before and after taxes and financing costs respectively. And because it is in theory our best model for economic profit, investors care a lot about it, particularly in public markets. This is part of the reason why stocks tend to trade on an EV/EBIT(DA) or P/E multiple basis. So, when depreciation schedules are longer than the true useful life, they overstate earnings and therefore inflate valuations in public markets. To put some meat on the bones, consider a company that generates $10 in annual revenue and registers a $2 annual depreciation cost (with no other costs) over 3 years. Say this public company A trades at a multiple of 5x EV/EBIT; it would then be valued at $40. Now, if I told you that the depreciation cost was actually understated and that it should have been $3 a year over 2 years, the company is all of a sudden (on the same multiple) worth $35. Same company, same multiple, different valuation.
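The worked example above can be reproduced in a few lines. This is a minimal sketch: the $6 of capex is inferred from the example ($2 a year over 3 years, or $3 a year over 2 years), straight-line depreciation is assumed, and the function names are illustrative.

```python
def annual_straight_line(capex, useful_life_years):
    """Annual depreciation charge under a straight-line schedule."""
    return capex / useful_life_years


def enterprise_value(revenue, annual_depreciation, ev_ebit_multiple, other_costs=0.0):
    """Value a company as EBIT times a fixed EV/EBIT multiple."""
    ebit = revenue - other_costs - annual_depreciation
    return ebit * ev_ebit_multiple


if __name__ == "__main__":
    capex = 6.0  # inferred from the example: $2 x 3 years = $3 x 2 years
    for life in (3, 2):
        dep = annual_straight_line(capex, life)
        ev = enterprise_value(revenue=10.0, annual_depreciation=dep, ev_ebit_multiple=5.0)
        print(f"{life}-year life: depreciation ${dep:.0f}/yr, EV ${ev:.0f}")
```

Shortening the assumed useful life from 3 years to 2 moves the valuation from $40 to $35 without any change in the underlying business.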

We have strong conviction that the way depreciation is being thought about in public markets today is disconnected from the hardware reality. Namely, we suspect that the hardware depreciation itself is less a driver of the decrease in GPU value than the technical obsolescence from new architecture releases and training regimes. Per Applied Conjecture, “an older, fully depreciated A100, while slower than a new B200 for a single, latency-sensitive query, can be highly cost-effective for throughput-sensitive workloads. When running large, batched workloads, the A100 can be driven to high utilization delivering a lower TCO for that workload than a brand new, expensive B200 that might be under-utilized”. Yes, A100s decrease in value, but it’s new tech that drives this, not key hardware failure. We see similar real-world examples today as well. “Azure announced the retirement of its original NC, NCv2, and ND-series VMs (powered by Nvidia K80, P100, and P40 GPUs) for August/September 2023. Given these GPUs were launched between 2014 and 2016, this implies a useful service life of 7-9 years. And more recently, the retirement of the NCv3-series (powered by Nvidia V100 GPUs) was announced for September 2025, approximately 7.5 years after the V100’s launch”.

All Models Are Wrong, But Some Are Useful

Many operators and investors alike have taken the usefulness of depreciation from an accounting standpoint and extrapolated the idea that accounting depreciation is at least an approximate proxy for the economic depreciation of a GPU as you use it over time. But as mentioned earlier, the reality is that most of the economic depreciation comes from tech obsolescence, which is extraordinarily difficult to predict. Depreciation (the accounting metric) then has no real grounding in the GPU use case; it’s more of a hand-wavy relationship companies are required to slap on in the name of GAAP accounting and to appease financiers. Where depreciation is useful is for equipment that doesn’t suffer from the same idiosyncratic tech risk that GPUs do: equipment whose value is driven by usage.

This is not to say that GPU accounting depreciation is completely meaningless. We already know that in public markets it can help drive earnings and therefore prices. But even on the private side, depreciation critically provides a tax shield. That is, it is recognized as an accounting cost and is therefore deductible. Accelerated depreciation in early years increases the present value of tax shields, improving cash flow timing and investment returns. Datacenters are then incentivized towards depreciation schedules skewed towards what they believe is most beneficial from a fiscal reporting standpoint. And indeed, GPU depreciation is a massive risk for datacenters. But attempting to use depreciation schedules to accurately model this risk is an impossible challenge.
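To illustrate why front-loading depreciation raises the present value of the tax shield, here is a small sketch comparing a straight-line schedule with an accelerated one. The 21% tax rate, 10% discount rate, $100 of capex, and the accelerated schedule itself are all hypothetical inputs chosen for illustration; they are not taken from any company’s filings or from a statutory schedule.

```python
def tax_shield_pv(capex, schedule, tax_rate, discount_rate):
    """Present value of depreciation tax shields.

    `schedule` lists the fraction of capex depreciated in year 1, 2, ...
    and must sum to 1. Each year's deduction saves (deduction * tax_rate) in tax.
    """
    if abs(sum(schedule) - 1.0) > 1e-9:
        raise ValueError("depreciation schedule must sum to 1")
    return sum(
        capex * fraction * tax_rate / (1 + discount_rate) ** year
        for year, fraction in enumerate(schedule, start=1)
    )


if __name__ == "__main__":
    straight_line = [0.20] * 5                     # equal charge every year
    accelerated = [0.40, 0.30, 0.15, 0.10, 0.05]   # hypothetical front-loaded schedule
    for name, schedule in (("straight-line", straight_line), ("accelerated", accelerated)):
        pv = tax_shield_pv(capex=100.0, schedule=schedule, tax_rate=0.21, discount_rate=0.10)
        print(f"{name}: PV of tax shield = ${pv:.2f}")
```

Both schedules deduct the same total amount; the accelerated one simply claims it sooner, so less of the benefit is lost to discounting.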

Concerns over GPU depreciation initially arose largely from public investors’ worries about overstated earnings. This then spiraled into a debate on the useful life of GPUs. We believe both these debates miss the point: depreciation is simply an accounting metric, and the useful life of GPUs today is driven principally by future tech, not hardware malfunction. As a result, depreciation can be used for the benefit of datacenters (within reason) from an accounting standpoint. And should datacenters be concerned about their true GPU economic depreciation, they should explore financial products to manage this risk.

Memory: How It Works and Why It Sets the Cost of Compute

Ornn — Mon, 12 Jan 2026 21:11:38 GMT

The AI systems of today are incredibly powerful - they can write code better than most software engineers, analyze documents, generate images, and reason through complex problems to reach coherent solutions. However, when a long conversation slows down output or a model forgets important information, the system appears fragile and frustrates the user. Today, GPUs can perform trillions of operations per second, but large language models actually spend most of their time waiting. The bottleneck is no longer arithmetic, but access to information. In modern inference workloads, memory - not compute - determines how fast and how reliably models can respond.

Why AI Systems Remember Instead of Recompute

Memory is simply any form of information that an AI system can recall instead of having to recompute or guess. In AI systems, memory exists to preserve intermediate results and contextual information, so the model can reuse prior work instead of repeating computations at each step.

The actual memory chips are attached to the GPU. These chips are High Bandwidth Memory (HBM), a specialized form of DRAM that sits very close to the GPU die via advanced packaging. This memory holds the model weights, key-value cache, and activations used during inference. HBM is very fast (its bandwidth is measured in terabytes per second) but is limited in capacity. Additionally, the server contains system DRAM attached to the CPU on the server’s motherboard, which is accessed over slower interconnects and cannot support high-throughput inference. As a result, GPU-resident HBM is the key limiting memory resource.

At the core of how modern language models operate is a mechanism called a key-value cache, or KV cache. When a model is given a prompt and begins processing text, it computes and stores two vectors in HBM for every token it encounters: a key and a value. These vectors are later used in matrix multiplications to determine which prior tokens are relevant for the current computation. The classic analogy is two sticky notes: as you read each word in a sentence, you jot down two sticky notes. The first sticky note is the key, which tells you what kind of word it is and how it might be relevant later. The second sticky note is the value, which tells you what information the token actually contains. Take the name “Drew” for example: the key tells you that it is a noun, a subject, a person, and likely an important part of the sentence, whereas the value is simply the information of “Drew.” The model goes through a sentence token by token and logs each token’s key and value into a cache, which helps it decide whether and how that token should influence future output tokens. As the sentence grows, the notes accumulate and remain available, because these entries stay in memory for the duration of the request.

Caching happens during the prefill stage, which is when the model parallel-processes the user’s prompt and builds up its memory of tokens. This relationship is linear - the longer a prompt, the longer the KV cache in the memory. The next stage is decoding, where the model references the current entire KV cache to generate a token. After a token is generated and added to the cache, the decoding process is repeated, now looking at an N+1-length cache.
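The prefill and decode loop described above can be sketched with a toy cache. This is an illustration of the mechanism only, not how any production model is implemented: the `embed` function is a hypothetical stand-in for learned key and value projections, keys and values are set equal for simplicity, and the generated tokens are placeholders rather than real model output.

```python
import math


def embed(token, dim=4):
    """Hypothetical toy embedding; stands in for a real model's learned projections."""
    seed = sum(ord(ch) for ch in token) % 97
    return [math.sin((i + 1) * seed) for i in range(dim)]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over every cached key/value pair."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(weights)
    dim = len(values[0])
    return [sum(w / total * v[i] for w, v in zip(weights, values)) for i in range(dim)]


class KVCache:
    """Holds one key vector and one value vector per token seen so far."""

    def __init__(self):
        self.keys, self.values = [], []

    def append(self, token):
        # Toy simplification: key and value are both the raw embedding.
        vec = embed(token)
        self.keys.append(vec)
        self.values.append(vec)

    def __len__(self):
        return len(self.keys)


def prefill(prompt_tokens):
    """Build the cache from the prompt (a real model does this in parallel)."""
    cache = KVCache()
    for token in prompt_tokens:
        cache.append(token)
    return cache


def decode(cache, last_token, steps):
    """Generate `steps` placeholder tokens, recording cache entries read per step."""
    reads = []
    for step in range(steps):
        reads.append(len(cache))  # the whole cache is read for every new token
        attend(embed(last_token), cache.keys, cache.values)
        last_token = f"<gen{step}>"  # placeholder; a real model samples a token here
        cache.append(last_token)
    return reads


if __name__ == "__main__":
    cache = prefill(["Drew", "went", "home"])
    print(decode(cache, "home", steps=3), len(cache))
```

Each decode step reads one more entry than the last, which is the N, N+1, N+2 pattern described above.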

As the KV cache grows, it imposes a hard capacity ceiling at the GPU level. Since the cache is resident in memory, it stays for the full request or conversation. GPU HBM is finite, so each active request consumes a slice of HBM and only so many requests can fit, just as a bookshelf can only hold a fixed number of books of a certain size. Capacity is then governed not by operations per second (FLOPs) but by the memory consumed by each request.
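The bookshelf analogy can be made concrete with a back-of-the-envelope sizing. The sketch below uses the common estimate of 2 (key and value) × layers × KV heads × head dimension × bytes per element for the cache footprint of one token. The model shape (32 layers, 8 KV heads, head dimension 128, FP16) and the 16 GB of weights are hypothetical values chosen for illustration; only the 80 GB HBM figure corresponds to the H100 capacity cited in this article.

```python
def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_element=2):
    """KV cache footprint of a single token: one key and one value per layer and head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_element


def max_concurrent_requests(hbm_bytes, weight_bytes, context_tokens, per_token_bytes):
    """How many full-length requests fit in the HBM left over after model weights."""
    free_bytes = hbm_bytes - weight_bytes
    per_request = context_tokens * per_token_bytes
    return free_bytes // per_request


if __name__ == "__main__":
    per_token = kv_bytes_per_token(layers=32, kv_heads=8, head_dim=128)  # hypothetical model
    hbm = 80 * 10**9       # 80 GB of HBM
    weights = 16 * 10**9   # assumption: ~8B parameters at 2 bytes each
    for context in (8_192, 32_768):
        fit = max_concurrent_requests(hbm, weights, context, per_token)
        print(f"{context:>6}-token contexts: {fit} concurrent requests")
```

Quadrupling the context length cuts the number of requests that fit by roughly a factor of four, with no change in the GPU’s compute capability.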

The decoding step is structurally memory-bound as well. During decode, the GPU isn’t limited by the number of tensor cores but by the speed at which they can be fed from HBM. For every generated token, it has to pull the model weights and the KV cache, which isn’t arithmetically intensive but rather a constant movement of large amounts of data, meaning the GPU usually spends more time waiting for data to arrive than performing the computations themselves. More generally, the achievable compute constraint can be expressed as the standard roofline bound:

$$\text{Achievable FLOP/s} \le \min\left(\text{Peak FLOP/s},\ \text{Arithmetic intensity} \times \text{Memory bandwidth}\right)$$

Because decode has low arithmetic intensity, this bound is almost always set by memory bandwidth, not by the GPU’s theoretical compute capability. In training, the constraint is memory capacity, as about three to four times the model size must fit in HBM, and model sizes have grown by about a factor of 400 while accelerator memory has only doubled. In inference, by contrast, even where the model fits, GPUs may achieve only a small percentage of peak FLOP utilization, because weights and KV cache must be streamed from HBM for each token and bandwidth is the limit. The imbalance is structural: over the past two decades, peak FLOPs have grown roughly 60,000 times over, whereas DRAM and interconnect bandwidth have only increased 100 and 30 times over respectively, a disparity that underlies the modern “memory wall” in AI.
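A roofline-style estimate, min(peak compute, arithmetic intensity × memory bandwidth), shows how low the utilization can get. This is a rough sketch under stated assumptions: the 3.3 TB/s bandwidth is the H100 figure used in this article, while the 989 TFLOP/s peak (dense 16-bit) and the intensity of about 1 FLOP per byte for single-request decode (two operations per 2-byte weight) are assumptions introduced here for illustration.

```python
def achievable_flops(peak_flops, bandwidth_bytes_per_s, arithmetic_intensity):
    """Roofline bound: limited by compute or by how fast memory can feed it.

    `arithmetic_intensity` is FLOPs performed per byte moved from memory.
    """
    return min(peak_flops, arithmetic_intensity * bandwidth_bytes_per_s)


if __name__ == "__main__":
    peak = 989e12       # assumption: dense 16-bit peak FLOP/s for an H100-class GPU
    bandwidth = 3.3e12  # bytes per second of HBM bandwidth
    print(f"ridge point: ~{peak / bandwidth:.0f} FLOPs per byte")
    for intensity in (1, 16, 64, 512):  # rises roughly with batch size during decode
        bound = achievable_flops(peak, bandwidth, intensity)
        print(f"intensity {intensity:>3} FLOP/byte: {bound / peak:.1%} of peak")
```

Below the ridge point, adding compute does nothing; only more bandwidth or higher arithmetic intensity (for example, larger batches) raises the bound.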

Where Inference Actually Bottlenecks

Inference doesn’t fail because models lack compute; it slows when they run out of fast memory to work with, and that has shaped the priorities of successive accelerator generations. H100s, for example, ship with about 80 GB of HBM and about 3.3 TB/s of bandwidth per GPU. The H200 raised that to 141 GB and 4.8 TB/s, and the Blackwell-generation B200 extends it to 180-192 GB and around 8 TB/s per GPU. This aggregates to 1.44 TB of memory and about 64 TB/s of bandwidth in an eight-GPU DGX B200 system. The idea is simple: compute tells you how capable a GPU is, while memory caps how much of that capability can be exercised and how quickly you can put it to use. If memory is the limiting factor, additional compute delivers diminishing returns unless paired with greater memory and bandwidth. Expanding memory unlocks higher economic value for all parties, as it allows for greater context windows and more users per accelerator. Those constraints surface clearly in how LLMs behave under load.

For example, when your chatbot conversation starts to pause or slow after a long exchange, that is because long prompts lengthen the prefill stage, where the model allocates GPU memory to build a KV cache for the tokens in that context. During the decode stage, the model then has to reread a growing cache for every newly generated token, which leads to sustained bandwidth pressure. The larger your KV cache, the more memory is allocated and the more data must be moved for each step of generation. That’s why short prompts feel instantaneous: the cache is small. More memory consumed per request means fewer users served concurrently, which forces operators to continually trade off individual context length, concurrent users, latency, and throughput.
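The slowdown on long conversations can be approximated with a simple bandwidth bound: each decode step must move at least the model weights plus the current KV cache, so time per token is at least bytes moved divided by memory bandwidth. The sketch below uses the per-GPU bandwidth figures cited above; the 16 GB of weights, the 128 KiB of cache per token, and the single-request setting are hypothetical assumptions for illustration.

```python
GPU_BANDWIDTH_TBPS = {"H100": 3.3, "H200": 4.8, "B200": 8.0}  # TB/s per GPU

WEIGHT_BYTES = 16e9           # assumption: ~8B parameters at 2 bytes each
KV_BYTES_PER_TOKEN = 131_072  # assumption: 128 KiB of cache per token


def min_decode_ms(context_tokens, bandwidth_tbps,
                  weight_bytes=WEIGHT_BYTES, kv_bytes_per_token=KV_BYTES_PER_TOKEN):
    """Lower bound on milliseconds per generated token for a single request."""
    bytes_moved = weight_bytes + context_tokens * kv_bytes_per_token
    return bytes_moved / (bandwidth_tbps * 1e12) * 1e3


if __name__ == "__main__":
    for gpu, bandwidth in GPU_BANDWIDTH_TBPS.items():
        short = min_decode_ms(1_000, bandwidth)
        long_ctx = min_decode_ms(128_000, bandwidth)
        print(f"{gpu}: >= {short:.2f} ms/token at 1k context, "
              f">= {long_ctx:.2f} ms/token at 128k context")
```

Under these assumptions the per-token floor roughly doubles between a short and a very long context on the same GPU, and falls as bandwidth rises from one generation to the next.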

As a result, conversations with a long cache might be truncated or reset by automatic capacity management. When your context is truncated, the model tries to probabilistically infer missing information as opposed to directing attention towards earlier tokens. Similarly, when the cache is extremely long, the decode stage must attend to a large number of competing key-value pairs. This dilutes attention over relevant tokens and degrades the consistency of generated outputs. This combination of truncation and noisy attention culminates in what frustrates many chatbot users far too often: hallucinations. These are not model malfunctions as some might interpret, but direct consequences of memory constraints under load.

Memory Stacks - and Prices Jump

HBM does not scale like conventional DRAM. It is produced by stacking multiple DRAM dies vertically using through-silicon vias (TSVs), bonding them to electrically connect each die, underfilling with insulation to manage thermal stress, testing, and finally integrating the stack alongside a GPU through advanced packaging. Since each step introduces yield loss and other capacity constraints, the viable supply base collapses to three primary manufacturers: Micron, Samsung, and SK Hynix.

As the HBM stack grows taller, yield losses compound. For example, if each individual die has a 99% yield, stacking eight dies together lowers usable output to ~92%, twelve dies to about 89%, and sixteen dies to ~85% before accounting for any additional process losses. Independent of high per-die yields, moving from eight-high to twelve-high to sixteen-high stacks reduces usable output as TSV defects and interconnect failures accumulate across layers. At these heights, bonding and underfill become first-order constraints because failures at these stages invalidate the entire stack rather than a single die - much like a multi-story building falls if the foundation of one floor is unstable. Additionally, HBM is harder to qualify than commodity DRAM, as dies require testing before stacking as well as after the stack is completed, which makes testing a gating factor. These constraints cause HBM capacity to expand in discrete increments, shaping pricing downstream.
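The compounding in the example above is simply the per-die yield raised to the power of the stack height. A minimal sketch, assuming die failures are independent and ignoring the bonding, underfill, and test losses mentioned in the text:

```python
def stack_yield(per_die_yield, stack_height):
    """Fraction of stacks in which every die is good, assuming independent dies."""
    return per_die_yield ** stack_height


if __name__ == "__main__":
    for height in (8, 12, 16):
        print(f"{height}-high stack at 99% per-die yield: {stack_yield(0.99, height):.1%}")
```

Because one bad die invalidates the whole stack, even a small drop in per-die yield is amplified as stacks get taller.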

Even when HBM dies are available, they cannot be deployed without advanced packaging capacity. High-end accelerators need CoWoS packaging, where the GPU and multiple HBM stacks are mounted on a silicon interposer and then integrated onto an organic substrate, a process that is very slow to expand and not easily substituted. C.C. Wei, the CEO of TSMC, has repeatedly stated that CoWoS capacity is “very tight” and essentially sold out through 2026, with expansions lagging demand. The majority of this capacity is pre-allocated to large customers like Nvidia, which is estimated to consume more than 50% of available CoWoS capacity to support Blackwell. As a result, supply is gated by access to finite packaging slots rather than fabrication alone.

HBM does not clear through a liquid spot market but rather is allocated through long- and short-term contracts. Importantly, this means pricing doesn’t adjust continuously but resets at negotiation boundaries. By late 2025, Micron stated that its entire 2026 HBM supply was fully priced and volume-locked. Additionally, SK Hynix and Samsung stated they are intentionally moving away from multi-year contracts in order to capitalize on expected stepwise price increases through 2027. Under this structure, hyperscalers are prioritized while smaller buyers face rationing or delayed access. Industry reporting in late 2025 and early 2026 showed that Samsung raised memory prices by up to 60%, alongside commentary pointing to 20%+ year-over-year increases for HBM under new contracts. HBM4, the next-generation HBM standard, has been reported to carry about 50% higher ASPs than HBM3. The result isn’t gradual inflation but rather discrete price jumps tied to contract resets. Once HBM capacity and key packaging slots are committed, the marginal cost of expanding compute is dictated by memory allocation and not silicon capability. Price here is a step function, not a slope.

The economics make the shift explicit: according to Epoch AI’s BOM estimates, HBM alone contributes $2,900 and advanced packaging $1,000 out of a $6,400 total cost of production for a B200, together roughly 60% of the total cost. Additionally, packaging yield losses add roughly $1,000 more, eating into margins. This means that the marginal cost of deploying the next unit of compute isn’t driven primarily by transistors or logic efficiency, but by access to memory. In this space, $ / usable-FLOP is dictated by memory economics, which makes memory the price-setting input for scalable AI compute.

Let’s consider a cost waterfall where there is a 30% increase in HBM pricing, well within contract step changes. Using the rough prices stated above, this raises the HBM cost per GPU by about $870, or around $1,100 to $1,200 a unit once higher dollar-value packaging yield costs are included. At the system level, an eight-GPU server absorbs about $8,000 to $10,000 of incremental cost, and a 10,000-GPU cluster requires roughly $10 to $12 million of additional capital. Because the demand for frontier accelerators is relatively price-inelastic, these cost increases are rarely absorbed at the margin. The three primary memory suppliers capture the initial price reset, GPU vendors attempt to preserve high gross margins through higher ASPs, and the cost trickles down to hyperscalers and data-center operators. Since pricing and procurement happen at discrete boundaries, these adjustments reprice deployments all at once instead of smoothing over time, creating abrupt step changes in $ / usable-FLOP and making memory pricing a first-order financial exposure.
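The waterfall can be traced step by step. The sketch below uses the $2,900 HBM, $1,000 packaging, and $6,400 total figures quoted above, together with the article’s $1,100 to $1,200 all-in per-GPU range; the function names are illustrative. The server and cluster results land inside the rounded ranges given in the text.

```python
HBM_COST_USD = 2_900        # per-GPU HBM cost from the BOM estimate
PACKAGING_COST_USD = 1_000  # per-GPU advanced packaging cost
TOTAL_BOM_USD = 6_400       # total per-GPU cost of production


def hbm_increase_per_gpu(hbm_cost, pct_increase):
    """Extra HBM cost per GPU for a given fractional price increase."""
    return hbm_cost * pct_increase


def scale_range(per_gpu_low, per_gpu_high, gpu_count):
    """Scale a per-GPU incremental cost range up to a server or cluster."""
    return per_gpu_low * gpu_count, per_gpu_high * gpu_count


if __name__ == "__main__":
    share = (HBM_COST_USD + PACKAGING_COST_USD) / TOTAL_BOM_USD
    print(f"HBM + packaging share of BOM: {share:.0%}")
    print(f"HBM increase per GPU at +30%: ${hbm_increase_per_gpu(HBM_COST_USD, 0.30):,.0f}")
    # All-in per-GPU range including higher packaging yield costs, per the article.
    for label, count in (("8-GPU server", 8), ("10,000-GPU cluster", 10_000)):
        low, high = scale_range(1_100, 1_200, count)
        print(f"{label}: ${low:,} to ${high:,}")
```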

The Risk Beneath the Stack

What emerges from the prior sections is less a cost story than a risk one. Memory has become price-setting for usable compute. It is supplied through allocation to larger players, repriced at contract boundaries, and constrained by physical bottlenecks that do not scale smoothly. This combination produces volatility that is discrete rather than continuous. When costs reset, the impact propagates through GPU pricing and deployment budgets all at once. In that sense, memory no longer behaves like a background component cost, but like a volatile input whose price determines whether compute can be deployed on acceptable terms.

That exposure is broadly unhedged. Hyperscalers and data-center operators commit to multi-year build plans and fixed deployment targets, yet face step-function swings in memory-driven capital intensity. Neo-clouds and GPU service providers sell compute under relatively stable pricing models while their upstream costs reprice episodically. Lenders and financing partners underwrite assets and cash flows assuming predictable replacement costs, even as those costs increasingly hinge on memory availability and contract timing. In this environment, HBM begins to behave economically less like a depreciating component and more like a scarce commodity input: its value is sustained by constrained supply and rising demand from successive compute generations rather than eroded by them.

When a non-substitutable input becomes volatile and price-setting, it ceases to be an engineering variable and becomes a financial one. Markets typically respond to such conditions by creating mechanisms to price, transfer, and hedge that risk, separating operational execution from exposure to upstream shocks. As memory increasingly determines the economics of scalable compute, the absence of such mechanisms becomes a growing mismatch between how AI infrastructure is built and how its risks are managed.

The AI Chip Wars: From Supplier Dominance to Contested Equilibrium

Ornn — Mon, 05 Jan 2026 17:03:15 GMT

Recent reporting that OpenAI has begun running a portion of its training workloads on Amazon’s Trainium has been widely interpreted as a challenge to NVIDIA’s technical leadership. On top of that, just last week, NVIDIA entered into a technology licensing agreement with Groq focused on inference-related chip designs - another signal that competition in AI compute is broadening beyond a single architectural path.

The important story isn’t that anyone “beat” NVIDIA on performance. It’s that large buyers are building credible outside options - enough to change bargaining power, contract structure, and where margins accrue across the stack.

The real shift: control over compute economics

For much of the current AI cycle, the stack was cleanly hierarchical. NVIDIA supplied the critical hardware, hyperscalers distributed it, and model developers consumed it. NVIDIA’s position was reinforced not only by performance leadership, but by deep software lock-in and limited substitutability. For hyperscalers, GPUs were a necessary input but not a controllable one.

That arrangement becomes unstable at scale.

As AI workloads have grown, compute costs have moved from being a pass-through expense to a strategic variable. At that point, hyperscalers face a structural problem: continued dependence on a single external supplier limits their ability to manage margins, plan capacity, and negotiate pricing. This is the context in which in-house silicon matters.

Trainium and Google’s TPUs are best understood not as attempts to displace NVIDIA outright, but as mechanisms to reintroduce bargaining power into the system. Hyperscalers do not need their silicon to be universally superior. They need it to be viable enough, at a sufficient scale, to constrain unilateral pricing power upstream.

This is why the “chip wars” are less about performance leadership and more about who controls the economics of compute. NVIDIA’s objective is to preserve value capture through differentiation and ecosystem depth. Hyperscalers’ objective is to ensure that no single supplier can dictate terms indefinitely.

Why OpenAI’s choices matter disproportionately

Not all customers exert the same influence in contested markets. OpenAI is among the largest marginal buyers of frontier training compute, and its infrastructure decisions shape behavior across the stack.

When OpenAI runs meaningful workloads on non-NVIDIA silicon, it does two things simultaneously. First, it validates alternative architectures at scale, encouraging further investment and optimization. Second, it establishes optionality as a credible negotiating position. Even partial workload portability changes how future capacity contracts are structured and priced.

This does not imply abandonment of GPUs. NVIDIA hardware remains central to frontier training. But it does mark the end of single-vendor dependence as an acceptable default. In mature infrastructure markets, large buyers do not optimize for maximum performance alone. They optimize for resilience, leverage, and long-term cost control. AI compute is beginning to exhibit the same behavior.

Contested equilibria do not emerge through abrupt displacement. They emerge when credible alternatives make dominance harder to sustain.

Where competition actually manifests

One reason this shift is easy to misread is that its effects are indirect and delayed.

Competition will not show up first in NVIDIA’s reported revenues or in headline performance metrics. Merchant suppliers often retain leadership even as pricing power erodes. The early signs appear instead in cloud pricing, in contract flexibility, and in the share of economics captured at different layers of the stack.

Hyperscalers are willing to absorb complexity and near-term inefficiency in exchange for strategic control. NVIDIA continues to defend margins through system-level integration and software depth. These forces can coexist for years, producing gradual rather than abrupt change.

What matters is not immediate displacement, but the redistribution of leverage. Over time, that redistribution tends to compress excess rents, even if market shares move slowly.

Why this competition ultimately benefits end users

For end consumers, the chip wars are invisible in form but meaningful in consequence. Users do not care which silicon runs an AI workload. They care about cost, reliability, and speed of deployment.

As compute becomes less dependent on a single supplier, effective costs tend to fall, and capacity constraints loosen. That enables more aggressive pricing, faster iteration, and broader deployment of AI features. These benefits typically accrue downstream and with a lag, but history suggests they are durable once competition takes hold.

As architectures fragment, “the price of compute” stops being a single number and becomes a surface - varying by chip, region, network, and contract shape. That fragmentation increases basis risk and makes budgeting harder, which is why independent benchmarks and risk-transfer layers (e.g., Ornn) become more valuable over time.

This is not unique to AI. Similar patterns have played out in telecom equipment, cloud storage, and networking infrastructure. When dominance gives way to competition, incumbents face margin pressure, but users gain choice and affordability.

Bottom line

OpenAI’s use of Trainium is not a verdict on NVIDIA’s technology. It is evidence that the AI compute market is entering a new phase.

The defining shift is structural, not technical: AI compute is moving from supplier dominance to a contested equilibrium. NVIDIA remains a central player, but it no longer operates in an environment where alternatives are theoretical. Hyperscalers are no longer passive distributors; they are active participants in shaping compute economics.

This transition will be gradual and uneven. But historically, when infrastructure markets move toward contested equilibria, pricing power becomes harder to sustain, and the benefits flow downstream. Not because any single firm wins outright, but because no single firm can dictate terms alone.

Contributors: Ornn, Archegos Intern

Circular Financing in AI

Ornn — Fri, 19 Dec 2025 17:34:19 GMT

If you follow AI infrastructure at all, you’ve likely seen this Bloomberg graphic mapping the dense web of relationships between NVIDIA, OpenAI, Microsoft, Oracle, CoreWeave, and a growing list of model labs and GPU clouds. Capital flows in one direction, compute flows in another, equity stakes loop back, and long-term contracts seem to anchor everything in place.

To some, the image looks like an “AI money machine” - a closed loop where companies appear to fund each other’s growth. In reality, it is something far more familiar to infrastructure and project finance veterans: a market improvising under extreme supply constraints.

What’s being called circular financing is not financial sleight of hand. It is a way to pull future compute demand into the present, allowing physical infrastructure to be financed and built before traditional markets for pricing, hedging, and risk transfer fully exist.

Circular Financing: what it is, how it works, and who’s involved

In the AI ecosystem, circular financing refers to interlocking capital and commercial arrangements among the same set of counterparties (chip suppliers, frontier model labs, cloud providers, and infrastructure builders) that collectively finance the production and consumption of compute.

Rather than a single transaction, it’s a system of reinforcing commitments:

equity investments, warrants, or credits
long-dated take-or-pay compute contracts
prepayments for hardware or cloud capacity
utilization guarantees that de-risk lender underwriting
and revenue-sharing or pricing concessions that substitute for traditional financing costs

The unifying idea is simple: compute demand arrives faster than compute can be built. GPUs, power, cooling, land, and grid interconnections all move on multi-year timelines. But frontier AI labs cannot wait years for capacity. Delays compound into weaker models, slower deployment, and a durable competitive disadvantage. Finance fills the gap.

How the loop actually works

At the center of the ecosystem sit frontier model developers (OpenAI, Anthropic, xAI, and others) whose willingness to commit to massive, long-term compute usage turns abstract AI demand into something banks and builders can underwrite.

Surrounding them are chip suppliers, led by NVIDIA and increasingly AMD. These firms are not just selling hardware. They are stabilizing demand. Strategic investments or incentives at the model layer help ensure that expensive capacity expansions translate into real deployments.

Next are cloud and infrastructure integrators like Microsoft and Oracle. These firms increasingly function less like “cloud resellers” and more like builders and financiers of physical infrastructure, taking on exposure to power pricing, construction timelines, and balance-sheet leverage in exchange for long-term contracted demand.

Then come GPU-focused clouds and intermediaries, such as CoreWeave, which sit between silicon and end users. These players often operate with thinner margins and higher utilization sensitivity, but they play a crucial role in translating long-term commitments into near-term capacity.

Behind all of this sit data center developers, utilities, and lenders, underwriting projects whose viability depends on one central assumption: that contracted compute demand materializes and persists.

What makes the structure feel “circular” is that the same counterparties appear on both sides of financing and consumption. But economically, this is no different from how power plants, LNG terminals, or telecom networks have long been financed - forward contracts make future cash flows bankable today.

Where circular financing breaks: the real risks

Circular structures don’t fail because they’re circular. They fail when assumptions embedded in long-term commitments stop holding.

1) Utilization and demand risk

If AI spending slows, inference economics disappoint, or model scaling plateaus, fixed take-or-pay obligations become painful quickly. The question isn’t whether compute demand fluctuates. It’s who is locked into paying when it does.
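A minimal sketch of why take-or-pay obligations become painful when utilization drops. The contract terms here are hypothetical, and the sketch assumes any usage above the commitment is billed at the same price, which real contracts may not do.

```python
def take_or_pay_cost(committed_hours, used_hours, price_per_hour):
    """Return (total bill, effective price per GPU-hour actually used)."""
    # The buyer pays for the commitment even if it goes unused.
    billed_hours = max(committed_hours, used_hours)
    total = billed_hours * price_per_hour
    effective = total / used_hours if used_hours else float("inf")
    return total, effective

# Hypothetical contract: 1,000 GPU-hours committed at $2.00 per GPU-hour.
full_use = take_or_pay_cost(1000, 1000, 2.0)
half_use = take_or_pay_cost(1000, 500, 2.0)
print(full_use, half_use)
```

At half utilization the bill is unchanged, so the effective price per hour of useful compute doubles. The buyer absorbs the demand shortfall, not the seller.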

2) Balance-sheet and funding risk

Long-term commitments pull capital expenditure forward. That can stress leverage ratios and credit metrics well before cash flows fully arrive, inviting market scrutiny and tighter financing conditions.

3) Construction and power risk

Delays in grid connections, cooling infrastructure, or permitting push revenue out while interest and fixed costs continue to accrue. In a capital-intensive system, timing matters as much as price.

4) Technology obsolescence risk

GPU generations advance quickly. Long-dated contracts can strand older hardware or force renegotiation if performance-per-dollar improves faster than expected.

5) Governance and disclosure risk

Because these arrangements span equity, contracts, and financing, poor disclosure can make sound economics look suspect. Investors tend to discount what they can’t clearly underwrite.

Across all these risks, the same question keeps reappearing: when conditions worsen, which balance sheet absorbs the shock?

What actually fixes this: how the system matures

Circular financing is not a permanent solution. It is a bridge - a way to clear supply and demand before markets have the tools to do so efficiently. As the AI ecosystem matures, several changes become inevitable.

Standardization of compute contracts: Today’s bespoke agreements bundle pricing, delivery, uptime, and utilization in idiosyncratic ways. Over time, markets will push toward standardized terms that make contracts comparable and transferable.

Better price discovery: Right now, compute pricing is opaque and highly negotiated. Transparent benchmarks (by workload type, duration, and performance tier) are a prerequisite for efficient capital allocation.

Risk separation: In a mature system, utilization risk, price risk, construction risk, and technology risk do not all sit on the same balance sheets. Financial instruments emerge to unbundle and transfer them to the parties best suited to bear them.

Less reliance on balance sheets: As markets develop, fewer projects need to be justified by strategic equity, cross-investments, or utilization guarantees. Capital can flow on the basis of expected returns, not ecosystem loyalty.

We at Ornn are building the market primitives that achieve these ends.

The bottom line

The Bloomberg diagram is not evidence of an AI bubble propped up by self-referential financing. It reflects a system under strain, relying on balance-sheet engineering to compensate for missing market infrastructure. Circular financing accelerates capacity build-outs by tying together capital formation, demand commitments, and risk allocation across a narrow set of counterparties. But in doing so, it concentrates exposure, obscures true pricing, and weakens the signals that normally discipline capital deployment.

These structures persist not because they are robust or sustainable, but because alternatives remain underdeveloped. Circular financing substitutes contractual and balance-sheet complexity for transparent markets and works only so long as demand growth remains strong and assumptions hold. Over time, durable systems must rely less on counterparties underwriting one another’s demand and more on standardized contracts, clear price discovery, and explicit risk transfer. Until then, circular financing remains a fragile stopgap - useful under constraint, but poorly suited as a long-term foundation.

Contributors: Ornn, Archegos Intern

Compute Futures

Ornn — Tue, 16 Dec 2025 22:26:14 GMT

Right now if you were to open up Bloomberg, the Journal, or your business news source of choice, chances are that you’d see something about a new investment in compute infrastructure. It’s clear that AI is becoming a more integral part of everyday life, and capital markets have been quick to deliver the financing for its proliferation. Despite this massive influx of capital, however, there remains significant opacity in the precise value of compute. Markets have been writing blank checks, as it were.

At Ornn we’re building a futures market in which participants can explicitly trade on this value of compute. Anyone with exposure to the AI value chain can hedge their risks by assuming a futures position. So how does it work?

Cash-settled to a Benchmark

A central unlock of a futures contract is that it allows for cash exposure to compute without dealing with physical delivery. As with any other commodity, many involved parties don’t want to deal with inventory. In the case of compute, financial parties have no interest in managing SLAs, API endpoints, SSH details, etc. Instead, futures are cash-settled to a benchmark tracking the spot value of compute; they pay out the difference between your trade price and the underlying index.

Payout Structure

In our first post in this series, we argued that compute is a commodity analogous to electricity. Taking inspiration from power markets, compute futures are “Asian-style”: the payout is based on the arithmetic average of the index values observed over the tenor of the contract. Moreover, because index values are published daily, settlement can be realized incrementally throughout the life of the contract rather than only at expiry. A monthly future, then, would consist of 30 daily payouts, each based on the difference between that day’s observed index value and the trade price.

Sample Monthly Future outcome from Ornn. The payout for a long position is equal to a scaled difference of the index (blue curve) and the trade price (white dashed line). Here that’s 30 * 24 * (1.856 - 1.854) = 1.44.

The payout of an oil future—say, ICE Brent Crude—is given by the benchmark value at the end of its tenor. Terminal settlement works well for storable commodities like oil, where economic exposure is dominated by the spot price at expiry. Compute, however, is a flow commodity. What market participants care about is not the price of compute on a single day, but the average price paid over the period of use.

“Asian-style” settlement aligns the financial payoff of the contract with this economic reality. By settling on the arithmetic average of daily index values, a compute future effectively locks in a fixed price per GPU-hour across the tenor of the contract, mirroring how compute is actually consumed and billed in practice.
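The settlement arithmetic can be sketched in a few lines. This is a minimal illustration, not Ornn's contract specification: it assumes a contract size of one GPU (24 GPU-hours per day), which is inferred from the sample figure's arithmetic, and it uses a made-up index path that averages 1.856.

```python
from statistics import mean

HOURS_PER_DAY = 24  # assumed contract size: one GPU, priced per GPU-hour

def daily_payouts(index_values, trade_price, hours_per_day=HOURS_PER_DAY):
    """Long-side cash flow for each day: (index - trade price) * hours."""
    return [(v - trade_price) * hours_per_day for v in index_values]

def asian_payout(index_values, trade_price, hours_per_day=HOURS_PER_DAY):
    """Total long-side payout computed from the arithmetic average index."""
    n_days = len(index_values)
    return n_days * hours_per_day * (mean(index_values) - trade_price)

# Hypothetical 30-day index path with an average of 1.856.
index = [1.846, 1.856, 1.866] * 10
trade_price = 1.854

total_from_average = asian_payout(index, trade_price)
total_from_daily = sum(daily_payouts(index, trade_price))
print(round(total_from_average, 2), round(total_from_daily, 2))
```

Summing the 30 daily payouts gives the same total as settling once on the average, which is why incremental daily settlement and an average-price payoff are two views of the same contract. A short position receives the negative of these amounts.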

Margin and Counterparty Risk

This sounds all well and good. I can hedge my compute exposure via a simple cash future? Great! Where can I sign up?

A cautious operator will surely ask about counterparty risk. Who’s to guarantee that the loser of a trade will pay out the winner? Beyond the clear reputational cost of defaulting, the answer is margin: counterparties must put up cash as collateral in order to guarantee their positions.

So at a high level, margin serves two purposes. First, it protects market participants from the risk of counterparty default by requiring that sufficient collateral be posted before losses are realized. Second, it ensures that daily gains and losses can be settled as they accrue, preventing losses from compounding unchecked over the life of a contract. Margin protects both sides transacting a compute future.
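A rough sketch of how the two purposes of margin interact day to day. The initial and maintenance margin mechanics below follow a common futures-exchange convention and are an assumption, not Ornn's published rules. All numbers are hypothetical, and the position is a long contract on one GPU (24 GPU-hours per day).

```python
def run_margin_account(index_values, trade_price, initial_margin,
                       maintenance_margin, hours_per_day=24):
    """Track a long position's collateral under daily settlement.

    Returns a list of (day, daily_pnl, balance, top_up) tuples.
    """
    balance = initial_margin
    history = []
    for day, value in enumerate(index_values, start=1):
        # Daily gain or loss is settled against the posted collateral.
        pnl = (value - trade_price) * hours_per_day
        balance += pnl
        top_up = 0.0
        if balance < maintenance_margin:
            # Margin call: restore the account to the initial margin level.
            top_up = initial_margin - balance
            balance = initial_margin
        history.append((day, pnl, balance, top_up))
    return history

# Hypothetical path: the index falls for two days, then recovers.
history = run_margin_account(
    index_values=[1.80, 1.70, 1.90],
    trade_price=1.85,
    initial_margin=5.0,
    maintenance_margin=3.0,
)
for day, pnl, balance, top_up in history:
    print(day, round(pnl, 2), round(balance, 2), round(top_up, 2))
```

Losses are paid out of collateral as they accrue, and the margin call on the second day forces the losing side to replenish that collateral before the shortfall can grow.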

A Price Discovery Layer

Compute is quietly becoming one of the most critical inputs to the modern economy, and yet it remains one of the least financially transparent. Capital is deployed at colossal scales, exposure accumulates continuously, and participants are left managing GPU price risk with little more than hand-wavy balance sheet intuitions.

Compute futures change that. By anchoring settlement to a transparent benchmark, structuring payouts around the way compute is consumed as a flow, and managing counterparty risk through margin, these contracts make the value of compute explicitly tradable. Free markets converge on a value, and these contracts allow consumers, producers, and investors to hedge real economic exposure without taking on delivery risk or operational complexity.

The result is not financialization for its own sake, but a foundation—a market structure that supports price discovery, risk transfer, and long-term planning in an ecosystem where compute has become essential infrastructure. As AI continues to scale, the ability to trade compute risk will become as fundamental as the capability to trade power or fuel.

At Ornn, we’re ushering in that world.


Compute as a commodity isn't oil. It's electricity.

Ornn — Tue, 09 Dec 2025 01:12:44 GMT

At Ornn we believe that compute will be the most important commodity of the 21st century. It’s the essential input to the training and deployment of artificial intelligence, and our business is built on the hypothesis that AI companies and datacenters alike need to hedge their compute exposure, just like any other commodity.

Coming from more financial/trading backgrounds, we immediately grouped compute with the current biggest commodity in the world—crude oil. The analogy was obvious: consumers of oil (compute) want to lock in a price they can purchase at to reduce future uncertainty, and producers of oil (compute) want to lock in a price they can sell at for the same fundamental reason. A future, the financial mechanism which enables these hedges, is an agreement to do just that: two parties decide to transact oil (compute) at a contracted price sometime in the future. You get to fix tomorrow’s price right now.

We’re building a futures market for compute, and we started by simply porting over the battle-tested oil model. But in due time we arrived at an intrinsic difference between the two commodities, one which we believe illuminates the foundational nature of compute as a commodity. Namely, compute isn’t a stock resource, it’s a flow good.

Stock vs Flow Goods

Fundamentally, compute is temporal. This is baked into the very unit of price: compute is quoted in dollars per GPU-hour. You rent a GPU for some period of time, paying for access to its computational resources (for training or inference or whatever) over that very same period. Consumption happens over time, and any unused capacity can never be recovered. So when you purchase or sell compute, you’re really transacting a GPU’s flow.

Oil, however, is a stock commodity. It’s quoted in barrels, and there’s no intrinsic relationship with time. Indeed, oil can be stored; a barrel today is exactly the same as a barrel tomorrow. Flow goods capture a rate, while stock goods are invariant and fungible across time.
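The stock versus flow distinction can be illustrated with a toy model. The quantities are hypothetical and the units arbitrary. The only difference between the two functions is whether unsold output carries over to the next period.

```python
def sold_as_flow(capacity, demand):
    """Flow good: capacity not used in a period is lost for good."""
    return sum(min(capacity, d) for d in demand)

def sold_as_stock(production, demand):
    """Stock good: unsold output is stored and can be sold later."""
    inventory = 0
    sold = 0
    for d in demand:
        inventory += production
        sale = min(inventory, d)
        sold += sale
        inventory -= sale
    return sold

# Same supply (10 units per period) and same demand in both cases.
demand = [4, 16]
flow_total = sold_as_flow(10, demand)
stock_total = sold_as_stock(10, demand)
print(flow_total, stock_total)  # 14 20
```

With storage, the slack from the quiet first period covers the busy second one. Without it, six units of capacity simply expire, which is the sense in which a GPU-hour cannot be saved for later.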

Electricity as an Analog

So, if compute is fundamentally a flow good, then oil is the wrong mental model. The right comparison—the one whose economics and financial structure mirror compute almost exactly—is electricity. Most importantly, they’re both services delivered across time.

Electricity, like compute, is not something you can stockpile. A megawatt-hour not generated or consumed at 2:00pm is gone forever. Power markets therefore price electricity as a rate over time, and all market structures (from real-time dispatch to forward curves to futures settlement) are built around this temporal nature. We believe compute should behave the same way.

And we’ve already seen the primary market for compute inherit some of electricity’s design. In particular, we see:

Real-time spot pricing that updates based on utilization
Location-dependent pricing that can vary significantly with where the compute sits
Long-term bespoke contracts for delivery of continuous compute resources

This shared status as a flow commodity is why we believe that electricity is the right analog for compute. So we think that market participants should be able to hedge their compute just like electricity, and our futures exchange is built to do exactly that.