Every AI startup CEO is running the same calculation in the back of their head. Compute is the largest line item on the P&L. It is also the input that determines how fast the team can ship, how predictably the product performs, and how long the runway lasts. When the line that funds the product becomes the line that threatens the company, the question stops being “how do we optimize cloud spend” and starts being “are we on the right kind of infrastructure at all?”
Andreessen Horowitz, in its widely cited “Navigating the High Cost of AI Compute” analysis, frames the choice plainly: cloud is the right starting point for most AI startups, but at scale the premium paid for elasticity becomes a margin problem the business cannot grow out of. As of April 2026, an on-demand NVIDIA H100 lists at:
- $3.00 per GPU-hour on GCP
- $3.90 per GPU-hour on AWS
- $6.98 per GPU-hour on Azure
Those are CloudZero’s published comparison rates. Specialized GPU providers list comparable H100 SXM5 capacity at roughly $2.00 per GPU-hour. For a CEO, that is roughly 1.5x to 3.5x more spend for the same chip. A margin problem is a runway problem.
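The 1.5x to 3.5x range follows directly from the rates quoted above. A minimal sketch of the arithmetic, assuming the roughly $2.00 specialized-provider figure as the baseline (the function and variable names are illustrative only):

```python
# On-demand H100 rates quoted above, in USD per GPU-hour (April 2026).
HYPERSCALER_RATES = {"GCP": 3.00, "AWS": 3.90, "Azure": 6.98}
SPECIALIZED_RATE = 2.00  # approximate specialized-provider rate quoted above


def premium_multiples(rates: dict, baseline: float) -> dict:
    """Express each provider's rate as a multiple of the baseline rate."""
    return {name: rate / baseline for name, rate in rates.items()}


multiples = premium_multiples(HYPERSCALER_RATES, SPECIALIZED_RATE)
for name, multiple in multiples.items():
    print(f"{name}: {multiple:.2f}x the specialized rate")
```

The low end of the range is GCP and the high end is Azure; AWS sits just under 2x.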
This is a framework for deciding when to move off the hyperscalers. The honest version of the answer is that it depends on the workload, not the company. Moving too early costs capital. Staying too long costs the company. The goal here is to make the decision on evidence rather than on the architecture the team happened to pick when it had five engineers and no customers.

What Does “Moving Off the Hyperscalers” Actually Mean?
Moving off the hyperscalers means migrating GPU workloads from public cloud (AWS, Azure, Google Cloud) to dedicated infrastructure the startup owns or leases directly: on-premises hardware, a colocation facility, or a private GPU cloud. It does not mean abandoning cloud principles like elasticity or self-service. It means matching the infrastructure to the predictability of the workload and the economics of the business. Done well, the result feels like cloud to the engineers using it and like owned infrastructure to the CFO funding it.
The decision sequence:
- Start on hyperscalers. Fastest path to a working product.
- Stay until the workload is no longer experimental. Workload, customer base, and unit economics need to stabilize first.
- Move when the math flips. That is what the rest of this framework is for.
What Is the Real Cost of Staying on Hyperscalers Too Long?
The real cost of staying on hyperscalers too long is compounding margin loss. Per Barclays’ Q4 2024 CIO Survey, 86% of CIOs plan to move at least some workloads off public cloud, the highest rate the survey has recorded. Andreessen Horowitz found that repatriation typically cuts compute spend by one-third to one-half, and estimated cloud margin drag at roughly $100 billion of market value across 50 top public software companies. The point is not that cloud is a mistake. It is that the premium you happily paid for elasticity keeps being charged long after the workload stopped being unpredictable.
For an AI startup, the cost shows up in three places:
- The visible cost. $3.00 to $6.98 per H100 GPU-hour across the three hyperscalers as of April 2026 (CloudZero).
- The hidden cost. AWS publishes outbound internet data transfer at up to $0.09/GB. For an AI workload moving 10 TB per month, that single line is roughly $900 before NAT, load balancer, cross-zone, or storage charges.
- The opportunity cost. Every dollar spent renting compute is a dollar not extending runway, hiring engineers, or improving the product.
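The hidden-cost figure is simple to reproduce. The sketch below assumes decimal terabytes and treats the "up to" rate as flat, so it is an upper bound rather than a quote; `egress_cost` is a hypothetical helper, not part of any AWS tooling:

```python
GB_PER_TB = 1000  # decimal terabytes; using 1024 would raise the figure slightly


def egress_cost(tb_per_month: float, rate_per_gb: float = 0.09) -> float:
    """Monthly outbound-transfer charge at a flat per-GB rate.

    Assumption: the top published rate applies to every gigabyte, so the
    result overstates a bill that reaches cheaper volume tiers.
    """
    return tb_per_month * GB_PER_TB * rate_per_gb


monthly_egress = egress_cost(10)
print(f"10 TB/month at $0.09/GB: about ${monthly_egress:,.0f}")
```

Swapping in your own monthly transfer volume gives a quick first estimate before NAT, load balancer, cross-zone, or storage charges are added.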
The Six Signals: A CEO-Level Framework
When startup leadership teams ask how to know it is time to move, the answer is the same set of signals every time. None of them is sufficient on its own. Two or three appearing together usually mean the spreadsheet has already moved past the cloud, whatever the team’s instinct says.
Signal 1: Hyperscaler GPU spend has crossed $25,000 per month and keeps growing
This is the threshold where capex on owned hardware starts beating opex on rentals on a cash basis alone, before any of the tax or balance-sheet advantages discussed later.
- H100 SXM5 GPU price: $27,000 to $40,000 per unit (IntuitionLabs / TRG Datacenters)
- Full 8-GPU HGX H100 system: $200,000 to $400,000 (chassis, CPUs, networking, integration)
- Comparable hyperscaler rental: roughly $17,000 to $41,000 per node-month at the on-demand rates above (8 GPUs running around the clock)
- Break-even at $25K/month cloud spend: an owned node amortizes in roughly 8 to 16 months on hardware alone, depending on system price
- At $40K/month: the math is no longer something the CFO needs to model twice
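The break-even arithmetic behind this signal can be sketched in a few lines. This is a hardware-only model under stated assumptions: an 8-GPU node running around the clock, 730 hours in an average month, and no allowance for power, colocation, staffing, financing, or resale value. The function names are illustrative.

```python
HOURS_PER_MONTH = 730  # average month: 8,760 hours / 12
GPUS_PER_NODE = 8


def node_month_cost(rate_per_gpu_hour: float) -> float:
    """On-demand cost of one 8-GPU node running 24/7 for a month."""
    return rate_per_gpu_hour * GPUS_PER_NODE * HOURS_PER_MONTH


def breakeven_months(system_price: float, monthly_cloud_spend: float) -> float:
    """Months of displaced cloud spend needed to cover the hardware price."""
    return system_price / monthly_cloud_spend


for provider, rate in {"GCP": 3.00, "AWS": 3.90, "Azure": 6.98}.items():
    print(f"{provider}: ${node_month_cost(rate):,.0f} per node-month")

for spend in (25_000, 40_000):
    low = breakeven_months(200_000, spend)   # low end of the system price range
    high = breakeven_months(400_000, spend)  # high end of the system price range
    print(f"${spend:,}/month: break-even in {low:.0f} to {high:.0f} months")
```

A fuller model would add operating costs on the owned side and egress on the cloud side, which push the break-even point in opposite directions.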
Signal 2: The workload is now predictable
Hyperscalers earn their premium when demand is unknowable. The day a CEO can forecast next quarter’s GPU demand within a reasonable range, the elasticity premium has stopped being insurance and started being a tax.
Strong repatriation candidates:
- Production inference on a stable model
- Continuous fine-tuning loops on a known cadence
- 24/7 serving workloads with diurnal traffic patterns
Not yet ready:
- Exploratory pre-PMF training
- Sporadic research jobs with no fixed cadence
Signal 3: Egress and ancillary fees are a meaningful share of the bill
Per AWS public pricing pages:
- Outbound internet transfer: up to $0.09/GB
- Cross-AZ transfer: $0.01/GB each way
- Plus NAT gateways, application load balancers, storage API calls, and cross-region traffic
When those line items together become a meaningful share of the bill, the public cloud has stopped being a compute service and started being a tollbooth. Owned infrastructure or a direct colocation arrangement removes that tax and replaces it with a fixed bandwidth commit you can actually forecast.
Signal 4: GPU utilization is sustained above 40%
Owned GPUs only pay back when they are being used. According to the Uptime Institute’s May 2025 report on GPU utilization, GPU servers in training are operational roughly 80% of the time, and even well-tuned training jobs reach just 35% to 45% of peak silicon performance while running. The relevant number for this decision is sustained cluster-level utilization, not peak.
The thresholds:
- Below 30% sustained: cloud is usually cheaper. Stay.
- 40% to 70% sustained: owned hardware wins decisively.
- Above 70% sustained: every month on the hyperscaler is margin set on fire.
Utilization is also the objection people raise against owning, so it is worth being precise about it. The fear is stranded capacity: GPUs you paid for sitting idle. Two things answer that fear, and both are covered below under the objections. First, you only repatriate the workloads that clear this bar. Second, idle capacity on owned infrastructure is not dead weight the way an over-provisioned cloud reservation is, because spare cycles can be monetized rather than simply wasted.
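The thresholds in this signal, and the reason utilization dominates the economics of owned hardware, can be sketched as follows. Two things here are assumptions rather than anything from the sources: the label for the 30% to 40% range, which the thresholds leave unassigned, and the $1.00 fixed hourly cost, which is a placeholder chosen only to make the arithmetic easy to read.

```python
def utilization_band(sustained: float) -> str:
    """Map sustained cluster-level utilization (0.0 to 1.0) to a verdict."""
    if not 0.0 <= sustained <= 1.0:
        raise ValueError("utilization must be a fraction between 0 and 1")
    if sustained < 0.30:
        return "stay on cloud"
    if sustained < 0.40:
        return "gray zone"  # assumption: the thresholds above skip this range
    if sustained <= 0.70:
        return "owned hardware wins"
    return "owned hardware wins, and waiting is expensive"


def cost_per_utilized_hour(fixed_cost_per_hour: float, sustained: float) -> float:
    """Spread a fixed hourly cost over only the hours doing useful work."""
    return fixed_cost_per_hour / sustained


FIXED_COST = 1.00  # hypothetical all-in owned cost per GPU-hour, not a quote
for u in (0.25, 0.50, 0.80):
    effective = cost_per_utilized_hour(FIXED_COST, u)
    print(f"{u:.0%} utilized: {utilization_band(u)}, ${effective:.2f} per useful hour")
```

The second function is the whole argument in one line: a fixed cost divided by a small utilization number produces an expensive useful hour, which is why the measurement has to be sustained and cluster-wide.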
Signal 5: Customer SLAs or compliance now require infrastructure control
There are two ideas inside the phrase “data sovereignty,” and keeping them separate is what makes this signal useful. Data residency is about where data physically sits. Data sovereignty is about which government can compel access to it. They are related, but they are not the same, and the difference is what matters for startups selling into regulated markets.
Residency is largely a solved problem on public cloud. Hyperscalers like AWS let customers keep data and its backups in a chosen Region, and have gone further with sovereign offerings such as Dedicated Local Zones and a generally available European Sovereign Cloud. If the requirement is simply to keep data inside a given country, a hyperscaler can meet it.
Sovereignty is where a gap remains. A cloud provider headquartered in or ultimately controlled by a US parent stays subject to the US CLOUD Act, which can require it to hand over data under its control regardless of where that data physically lives. Choosing an in-region data center does not remove that exposure, because the question is no longer where the data sits but whose law reaches the company holding it.
For an AI startup selling into healthcare, finance, government, or EU buyers, that distinction shows up directly in deals. A European customer governed by GDPR, or a regulated institution under DORA, often has to guarantee that its data cannot be reached by a foreign legal order. If the underlying infrastructure is operated by a US-controlled provider, that guarantee is difficult to make no matter where the servers are. It is not hypothetical: in 2021 Portugal’s data protection authority ordered its national statistics institute to stop sending census data to a US provider, precisely because US surveillance law could override the contract meant to protect it.
The regulatory drivers behind this:
- HIPAA (US healthcare)
- DORA (EU financial services), now in force
- The GDPR Article 48 conflict with the US CLOUD Act
- The EU Data Act
This is why infrastructure control becomes a sales question and not just an internal preference. The Nutanix Enterprise Cloud Index 2026 reports that 57% of IT leaders now require infrastructure within a single country, which is the residency half of the question. The sovereignty half, a guarantee that no foreign jurisdiction can reach customer data, is one that dedicated infrastructure outside that chain of control can answer cleanly.
Signal 6: The roadmap for this workload is firm for 18 to 24 months
GPU hardware amortizes over three to five years. The decision rule follows from that:
- Workload that might pivot in 6 months: stay rentable.
- Workload tied to a proven product line with an 18-month-plus roadmap: lock into infrastructure economics, not per-hour pricing.
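Taken together, the six signals work as a per-workload checklist. The sketch below is one hypothetical way to encode them, not a scoring method from any of the cited sources. The 10% cutoff for ancillary fees is an assumption (the signal only says "meaningful share"), the spend check ignores whether spend is still growing, and the rule of two or more signals mirrors the guidance that no single signal is sufficient.

```python
from dataclasses import dataclass

# Hypothetical cutoff: Signal 3 only says "a meaningful share of the bill".
ANCILLARY_SHARE_THRESHOLD = 0.10


@dataclass
class WorkloadSignals:
    monthly_gpu_spend: float      # Signal 1: USD per month on hyperscaler GPUs
    demand_is_forecastable: bool  # Signal 2
    ancillary_share: float        # Signal 3: egress and related fees / total bill
    sustained_utilization: float  # Signal 4: cluster-level, 0.0 to 1.0
    needs_infra_control: bool     # Signal 5: SLA or compliance driven
    roadmap_months: int           # Signal 6: months the roadmap is firm


def signals_present(w: WorkloadSignals) -> list[str]:
    """Return the names of the signals this workload currently trips."""
    checks = {
        "spend": w.monthly_gpu_spend >= 25_000,
        "predictable": w.demand_is_forecastable,
        "ancillary fees": w.ancillary_share >= ANCILLARY_SHARE_THRESHOLD,
        "utilization": w.sustained_utilization > 0.40,
        "control": w.needs_infra_control,
        "roadmap": w.roadmap_months >= 18,
    }
    return [name for name, hit in checks.items() if hit]


def recommendation(w: WorkloadSignals) -> str:
    """No single signal decides; two or more together warrant a real model."""
    hits = signals_present(w)
    return "run the break-even model" if len(hits) >= 2 else "stay and re-check later"


example = WorkloadSignals(32_000, True, 0.04, 0.55, False, 12)  # made-up workload
print(signals_present(example))
print(recommendation(example))
```

Scoring each workload separately, rather than the company as a whole, keeps the exploratory jobs on cloud while the steady production load gets evaluated on its own numbers.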
The Cost Comparison That Matters at the Board Level
The table below consolidates the numbers AI startup CEOs and CFOs should be modeling when they run their own break-even scenario. Figures reflect 2026 published rate cards and analyst sources.
Two points are easy to miss inside a per-hour comparison:
- Dedicated infrastructure is not the opposite of cloud. Most AI startups that move off the hyperscalers run hybrid: dedicated infrastructure for the predictable production load, a small public cloud footprint for spikes or geographic reach.
- The savings do not come from the hardware line alone. They come from eliminating egress, sustaining higher utilization on fewer GPUs, recovering value from idle capacity, and giving finance a forecastable run-rate instead of a variable one.
What This Looks Like in Practice: Boson AI
In 2025, Boson AI, a fast-moving LLM company building voice agents, faced exactly the conditions described above. Cloud bills were scaling faster than the business. The team needed visibility into networking, storage, and node topology that hyperscalers abstracted away. Standard OEM lead times for the hardware they needed exceeded 12 weeks.
Boson partnered with Arc Compute. The result:
- 65-node NVIDIA HGX H100 cluster on a 400G Quantum-2 InfiniBand fabric
- 520 ConnectX-7 NICs hand-installed on site
- Under 4 weeks delivery vs. the 12-week industry norm
- 100% cloud-to-on-prem migration the same week the cluster went live
The economics flipped from a recurring spend that “priced every new experiment” into fixed infrastructure costs that scaled with the business, not against it. The full Boson AI case study walks through the BOM-to-rack execution. The lesson is not that on-prem is cheaper. The lesson is that the right partner makes the move from rented compute to owned compute fit inside a startup’s product timeline rather than fighting against it.
Answering the Objections You Will Hear in the Room
Every CEO who runs this calculation hits the same set of objections, usually from a thoughtful CFO or a cautious board member. They are reasonable. They also have answers.
“Owning hardware is a huge capital commitment.”
It can be, but it does not have to be a binary. There are two paths, and the path between them matters as much as either endpoint. The CAPEX path suits startups with predictable, high-utilization workloads and a clear case for reducing per-unit compute cost, and it is common at Series B and beyond. The OPEX path, leasing or consumption-based access to dedicated infrastructure, suits earlier-stage teams or companies whose investors prefer operating expense to capital outlay. Many startups start on OPEX to prove that dedicated infrastructure works for their workload, then convert to ownership once the economics are demonstrated. The financial commitment should not be the thing that decides whether the move is technically right.
“Owned GPUs will sit idle and waste money.”
This is the strongest objection, and it has two answers. First, you only repatriate workloads that clear Signal 4, so the baseline utilization is high by construction. Second, idle capacity on owned infrastructure is not stranded the way an over-provisioned cloud reservation is. Spare cycles can be sold. Through controlled marketplace monetization, excess capacity can be rented externally without touching internal workloads, directly offsetting infrastructure cost. In Arc Compute’s own modeling of a representative cluster, this is the difference between a good outcome and an exceptional one: three-year ROI versus public cloud ranges from roughly 198% when only renting out zero-downtime spare capacity to as high as 350% when recovering around 30% of otherwise-idle utilization. Those are modeled figures and depend on the workload, but the direction is the point: on owned infrastructure, idle time is an asset you can choose to monetize, not a sunk cost.
“We lose the elasticity that made cloud attractive.”
You lose it only if you treat the decision as all-or-nothing, and almost nobody does. The mainstream pattern is hybrid: dedicated infrastructure carries the steady-state production load, and a public cloud footprint absorbs spikes and provides geographic reach. You are not giving up elasticity. You are paying the elasticity premium only on the slice of the workload where it is actually worth paying, instead of on all of it.
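A rough sketch of the hybrid arithmetic makes the point concrete. Everything in it is illustrative: the demand profile is invented, the $3.90 cloud rate is the AWS figure quoted earlier, and the $2.00 dedicated rate borrows the specialized-provider figure as a stand-in for an all-in dedicated cost.

```python
def all_cloud_cost(demand: list[int], cloud_rate: float) -> float:
    """Pay the on-demand rate for every GPU-hour actually used."""
    return sum(demand) * cloud_rate


def hybrid_cost(demand: list[int], dedicated_gpus: int,
                dedicated_rate: float, cloud_rate: float) -> float:
    """Dedicated capacity is paid for every hour; only overflow goes to cloud."""
    fixed = dedicated_gpus * dedicated_rate * len(demand)
    burst_gpu_hours = sum(max(0, d - dedicated_gpus) for d in demand)
    return fixed + burst_gpu_hours * cloud_rate


# Hypothetical day: 16 GPUs of steady load, rising to 24 for a 4-hour peak.
demand = [16] * 20 + [24] * 4

cloud_only = all_cloud_cost(demand, cloud_rate=3.90)
hybrid = hybrid_cost(demand, dedicated_gpus=16, dedicated_rate=2.00, cloud_rate=3.90)
print(f"All cloud: ${cloud_only:,.2f} per day")
print(f"Hybrid:    ${hybrid:,.2f} per day")
```

In this made-up profile the premium is paid on 32 burst GPU-hours instead of all 416. Sizing the dedicated tier to the steady floor rather than the peak is what keeps its capacity from idling.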
“We do not have the team to run GPU infrastructure.”
This is a real constraint, and it is the reason many teams stay on cloud longer than the math justifies. It is also solvable without hiring an infrastructure org. A turnkey AI factory is delivered as a managed solution covering the full lifecycle: design and logistics, deployment, burn-in and validation, integration with existing identity and networking, then day-to-day remote operations, monitoring, end-user support, and break-fix. A cloud-like operating layer (a self-service portal, multi-tenancy, GPU scheduling, health monitoring) sits on top of the owned hardware, so engineers consume it the way they consumed cloud. The operational complexity is real; the point is that you can buy it as a service rather than build it as a department.
“GPUs depreciate fast, so the asset is worthless in a few years.”
Accounting depreciation and market value are not the same thing. A GPU cluster can be largely depreciated on the balance sheet while still retaining meaningful resale value; enterprise AI infrastructure can often preserve a substantial share of its original value years in, and demand for prior-generation accelerators has stayed strong as supply has remained tight. Ownership also turns spend into a balance-sheet asset rather than pure operating expense, which creates annual depreciation deductions that reduce taxable income, and can improve borrowing capacity and lender confidence. When you fold tax treatment, residual value, and the avoided cloud premium into the model, ownership frequently shows a lower true economic cost than renting the equivalent capacity. That is a CFO argument, not just an engineering one.
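The depreciation deduction is the easiest part of that argument to put numbers on. The sketch below uses straight-line depreciation with placeholder inputs: a $300,000 system (inside the price range quoted earlier), a five-year schedule, and a 21% tax rate. All three are assumptions for illustration. Real schedules, accelerated methods, and the tax treatment of any resale proceeds vary by jurisdiction and belong with an accountant.

```python
def straight_line_tax_shield(price: float, years: int, tax_rate: float):
    """Return (annual deduction, annual tax saving) under straight-line depreciation."""
    annual_deduction = price / years
    annual_tax_saving = annual_deduction * tax_rate
    return annual_deduction, annual_tax_saving


# Placeholder inputs, not figures from the cited sources.
deduction, saving = straight_line_tax_shield(price=300_000, years=5, tax_rate=0.21)
print(f"Annual deduction:  ${deduction:,.0f}")
print(f"Annual tax saving: ${saving:,.0f}")
print(f"Saving over the schedule: ${saving * 5:,.0f}")
```

The saving only materializes if the company has taxable income to offset, which is one reason this argument tends to land later in a startup's life than the utilization argument does.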
“We will just wait and rent it cheaper later.”
That bet is not paying off right now. SemiAnalysis reports H100 one-year rental contract pricing rose roughly 40%, from $1.70 per GPU-hour in October 2025 to $2.35 per GPU-hour by March 2026, with on-demand capacity sold out across most major providers and Blackwell lead times stretching into mid-2026. And the structural pressure runs the other way: per analyst reporting compiled by IEEE ComSoc, the Big Five hyperscalers are set to spend more than $600 billion on infrastructure in 2026, roughly three-quarters of it on AI. That capital has to be recovered from customers. “Wait it out and rent cheaper” is, for now, a strategy that is losing money.
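For scale, the rental price move can be restated per GPU. A small sketch using only the two contract prices quoted above, assuming a GPU held for a full 8,760-hour year:

```python
HOURS_PER_YEAR = 8760


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


old_rate, new_rate = 1.70, 2.35  # USD per GPU-hour, one-year H100 contracts
increase_pct = percent_change(old_rate, new_rate)
extra_per_gpu_year = (new_rate - old_rate) * HOURS_PER_YEAR

print(f"Increase: {increase_pct:.1f}%")
print(f"Extra cost per GPU over a one-year contract: ${extra_per_gpu_year:,.0f}")
```

Multiplying the per-GPU figure by cluster size shows what a five-month delay in signing would have cost on the same contract.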
What Owning Compute Actually Unlocks for AI Startups
For CEOs running this calculation, the savings number is the headline. The strategic effects are bigger:
- Faster iteration. Production inference on dedicated infrastructure delivers predictable latency and throughput, which means the team ships features instead of debugging cloud queuing or noisy-neighbor variance. Training on owned clusters means experiments are no longer rationed by per-hour pricing.
- Forecastable margin. Replacing a variable hyperscaler bill with a fixed run-rate gives the CFO numbers that hold for a year. That changes board conversations, and it changes the customer contracts a startup can sign without writing margin guarantees into its pricing.
- A defensible architecture. For startups in regulated industries, infrastructure control is a sales question, not an ops question. The architecture that closes a HealthTech, FinTech, or government contract is rarely the architecture that closed the seed round.
- Optionality. Once a startup owns or has dedicated access to its compute, hybrid becomes a real choice. Cloud burst for spikes, owned capacity for the steady state. The startup, not the hyperscaler, decides where each workload lives.
What the Next 12 to 24 Months Look Like
Three forces will sharpen this decision further through 2027:
- Hyperscaler AI capex is at unprecedented levels. Per analyst reporting compiled by IEEE ComSoc, the Big Five hyperscalers will spend more than $600 billion on infrastructure in 2026, roughly 75% of it on AI. That spend has to be recovered. Expect AI compute pricing on the hyperscalers to stay premium.
- GPU rental capacity is not getting cheaper. SemiAnalysis reports H100 one-year rental pricing rose roughly 40% between October 2025 and March 2026, with on-demand capacity sold out across most major providers and Blackwell lead times stretching into mid-2026. The “wait it out” strategy is not working.
- Regulation is forcing the conversation. DORA is live in the EU. Residency is now broadly solvable on public cloud, but the jurisdictional conflict between the US CLOUD Act and EU law cannot be contracted away inside a hyperscaler agreement when the provider is US-controlled. Startups selling into regulated buyers are already being asked these questions on procurement calls.
The AI startups that compound the fastest over the next two years are the ones that stop treating “stay on hyperscalers” as the safe default and start treating infrastructure as a per-workload decision that gets re-examined every six months. The underlying logic is the oldest one in finance: rent what you use briefly or unpredictably, own what you use heavily and continuously.
The window to make this decision on the company’s own terms is open. Moving too early costs capital. Moving too late costs the margin that funds everything else. What does not work is staying on hyperscalers because that is the architecture the team picked when there were five engineers, no customers, and no SLAs.
Arc Compute partners with AI startup CEOs and infrastructure leads on the move from public cloud to dedicated GPU infrastructure, structured around runway, revenue model, and growth trajectory. We have done this for LLM teams like Boson AI, for GPU cloud builders, and for AI-native product companies, and the playbook is the same: right-size the deployment, fold procurement and facility planning into one timeline, and get to production on a startup’s calendar, not a hyperscaler’s. If you are running your own six-signal score and want a second set of eyes on the numbers, that is a conversation we have often.
Sources
- Barclays CIO Survey Q4 2024. 86% of CIOs plan to repatriate some workloads, the highest rate on record.
- Andreessen Horowitz: The Cost of Cloud, a Trillion Dollar Paradox. Repatriation cuts cloud compute spend by one-third to one-half. $100B market value impact across 50 top public software companies.
- Andreessen Horowitz: Navigating the High Cost of AI Compute. Framework for AI startup compute decisions.
- CloudZero: Cloud GPU Pricing Comparison: AWS Vs Azure Vs GCP For AI Workloads (2026). H100 on-demand rates: AWS $3.90, Azure $6.98, GCP $3.00 per GPU-hour as of April 2026.
- SemiAnalysis: The Great GPU Shortage, Rental Capacity (March 2026). H100 1-year rental contract pricing rose ~40% from $1.70 to $2.35 per GPU-hour between October 2025 and March 2026.
- Uptime Institute: GPU Utilization Is a Confusing Metric (UII Update 362, May 2025). GPU training servers operational ~80% of time. 35% to 45% of peak silicon performance when running.
- IntuitionLabs: NVIDIA AI GPU Pricing Guide. H100 SXM5 pricing $27K-$40K per unit; 8-GPU systems $200K-$400K. Sources: TRG Datacenters, GDep, market reseller quotes.
- Nutanix Enterprise Cloud Index 2026. 57% of IT leaders require infrastructure within a single country.
- IEEE ComSoc Technology Blog: Hyperscaler Capex >$600B in 2026. Big Five hyperscaler infrastructure spend in 2026, ~75% allocated to AI. Sources: Goldman Sachs, CNBC, CreditSights.
- AWS Pricing (public): outbound internet data transfer up to $0.09/GB; cross-AZ transfer $0.01/GB each way.
- Arc Compute: Inference-Focused Scalable AI Factory (solution brief) and internal ROI modeling. Modeled three-year ROI of roughly 198% to 350% versus public cloud depending on idle-capacity monetization; turnkey managed-services model.
- Arc Compute: Boson AI Case Study. 65-node NVIDIA HGX H100 cluster, 400G InfiniBand fabric, 520 ConnectX-7 NICs, delivered in under four weeks against a 12-week industry norm; 100% cloud-to-on-prem migration.

