When Should an AI Startup Move Off the Cloud? A Practical Framework
Every AI startup CEO is running the same calculation in the back of their head. Compute is the largest line item on the P&L. Compute is also the input that determines how fast the team can ship, how predictably the product performs, and how long the runway lasts. When the line that funds the product becomes the line that threatens the company, the question stops being “how do we optimize cloud spend” and starts being “are we on the right kind of infrastructure at all?”
Andreessen Horowitz, in its widely cited “Navigating the High Cost of AI Compute” analysis, frames the choice plainly: cloud is the right starting point for most AI startups, but at scale the same elasticity that justified the premium becomes a margin problem the business cannot grow out of. As of April 2026, an on-demand NVIDIA H100 lists at:
- $3.00 per GPU-hour on GCP
- $3.90 per GPU-hour on AWS
- $6.98 per GPU-hour on Azure
These rates come from CloudZero’s pricing comparison. Specialized GPU providers list comparable H100 SXM5 capacity at roughly $2.00 per GPU-hour. For a CEO, that means paying roughly 1.5x to 3.5x as much for the same chip. A margin problem is a runway problem.
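The multiple quoted above can be checked with a few lines of arithmetic. The sketch below uses only the list prices cited in this section; the function name `premium_multiples` is illustrative, and treating the rough $2.00 specialized-provider figure as the baseline is an assumption.

```python
# On-demand H100 list prices quoted above (USD per GPU-hour, April 2026).
HYPERSCALER_RATES = {"GCP": 3.00, "AWS": 3.90, "Azure": 6.98}
# Rough specialized-provider rate quoted above; treated here as the baseline.
SPECIALIZED_RATE = 2.00


def premium_multiples(rates: dict[str, float], baseline: float) -> dict[str, float]:
    """Return each provider's rate as a multiple of the baseline rate."""
    return {name: round(rate / baseline, 2) for name, rate in rates.items()}


multiples = premium_multiples(HYPERSCALER_RATES, SPECIALIZED_RATE)
for name, multiple in multiples.items():
    print(f"{name}: {multiple:.2f}x the specialized-provider rate")
```

Swapping in negotiated or reserved rates in place of list prices will narrow the spread, so the result is best read as an upper bound on the premium.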
This is a framework for deciding when to move off the hyperscalers. Moving too early costs capital. Staying too long costs the company.

What Does “Moving Off the Hyperscalers” Actually Mean?
Moving off the hyperscalers means migrating GPU workloads from public cloud (AWS, Azure, Google Cloud) to dedicated infrastructure the startup owns or leases directly: on-premises hardware, a colocation facility, or a private GPU cloud. The move does not abandon cloud principles like elasticity. It matches infrastructure to the predictability of the workload and the economics of the business.
The decision sequence:
- Start on hyperscalers. Fastest path to a working product.
- Stay until the workload is no longer experimental. Workload, customer base, and unit economics need to stabilize first.
- Move when the math flips. That is what the rest of this framework is for.
What Is the Real Cost of Staying on Hyperscalers Too Long?
The real cost of staying on hyperscalers too long is compounding margin loss. Per Barclays’ Q4 2024 CIO Survey, 86% of CIOs plan to move some workloads off public cloud, the highest rate on record. Andreessen Horowitz found that repatriation typically cuts cloud compute spend by one-third to one-half, and estimated that cloud margin drag costs roughly $100 billion in market value across 50 top public software companies.
For an AI startup, the cost shows up in three places:
- The visible cost. $3.00 to $6.98 per H100 GPU-hour across the three hyperscalers as of April 2026 (CloudZero).
- The hidden cost. AWS publishes outbound internet data transfer at up to $0.09/GB. For an AI workload moving 10 TB per month, that single line is roughly $900, before NAT, ALB, cross-zone, or storage charges.
- The opportunity cost. Every dollar spent renting compute is a dollar not extending runway, hiring engineers, or improving the product.
The Six Signals: A CEO-Level Framework
When startup leadership teams ask how to know it is time to move, the answer is the same set of signals every time. None of them is sufficient on its own. Two or three appearing together usually mean the spreadsheet has already moved past the cloud.
Signal 1: Hyperscaler GPU spend has crossed $25,000 per month and keeps growing
This is the threshold where capex on owned hardware starts beating opex on rentals on a cash basis alone.
- H100 SXM5 GPU price: $27,000 to $40,000 per unit (IntuitionLabs / TRG Datacenters)
- Full 8-GPU HGX H100 system: $200,000 to $400,000 (chassis, CPUs, networking, integration)
- Comparable hyperscaler rental: ~$17,500 to $40,800 per node-month at the on-demand rates above (8 GPUs, roughly 730 hours per month)
- Break-even at $25K/month cloud spend: an owned node pays back in 8 to 16 months on hardware alone, depending on where the system price falls in the range above
- At $40K/month: the math is no longer something the CFO needs to model
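The arithmetic behind these bullets is simple enough to rerun with a team's own numbers. The sketch below is a hardware-only model under stated assumptions: 730 hours per month, 8 GPUs per node, and no allowance for power, colocation, staffing, or financing. The function names are illustrative.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12
GPUS_PER_NODE = 8


def node_month_rental(rate_per_gpu_hour: float) -> float:
    """On-demand cost of renting one 8-GPU node for a full month."""
    return rate_per_gpu_hour * GPUS_PER_NODE * HOURS_PER_MONTH


def payback_months(system_price: float, monthly_cloud_spend: float) -> float:
    """Months of cloud spend needed to equal the hardware purchase price."""
    if monthly_cloud_spend <= 0:
        raise ValueError("monthly_cloud_spend must be positive")
    return system_price / monthly_cloud_spend


# Node-month rental at the three hyperscaler list prices quoted earlier.
for rate in (3.00, 3.90, 6.98):
    print(f"${rate:.2f}/GPU-hr -> ${node_month_rental(rate):,.0f} per node-month")

# Payback at both ends of the $200,000 to $400,000 system price range.
for spend in (25_000, 40_000):
    low = payback_months(200_000, spend)
    high = payback_months(400_000, spend)
    print(f"${spend:,}/month: payback in {low:.0f} to {high:.0f} months")
```

Because operating costs are excluded, the real payback period will be longer than this model suggests. The output is useful mainly for seeing how sensitive the answer is to system price.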
Signal 2: The workload is now predictable
Hyperscalers earn their premium when demand is unknowable. The day a CEO can forecast next quarter’s GPU demand within a reasonable range, the elasticity premium has stopped being insurance and started being a tax.
Strong repatriation candidates:
- Production inference on a stable model
- Continuous fine-tuning loops on a known cadence
- 24/7 serving workloads with diurnal traffic patterns
Not yet ready:
- Exploratory pre-PMF training
- Sporadic research jobs with no fixed cadence
Signal 3: Egress and ancillary fees are a meaningful share of the bill
Per AWS public pricing pages:
- Outbound internet transfer: up to $0.09/GB
- Cross-AZ transfer: $0.01/GB each way
- Plus: NAT gateways, application load balancers, storage API calls, cross-region traffic
When those line items together become a meaningful share of the bill, the public cloud has stopped being a compute service and started being a tollbooth. Owned infrastructure or a direct colocation arrangement removes that tax entirely.
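One way to put a number on "meaningful share" is to total the transfer charges and divide by the bill. The sketch below uses only the two AWS per-GB rates quoted above. The traffic volumes and the $30,000 bill are hypothetical, and the model assumes a cross-AZ gigabyte is billed on both sides of the transfer.

```python
INTERNET_EGRESS_PER_GB = 0.09    # AWS published rate, upper bound ("up to")
CROSS_AZ_PER_GB_EACH_WAY = 0.01  # AWS published rate, charged in each direction


def transfer_charges(internet_gb: float, cross_az_gb: float) -> float:
    """Estimate monthly data-transfer charges in USD.

    Assumes every cross-AZ gigabyte is billed once on each side of the
    transfer, so it costs 2 x $0.01. NAT, load balancer, and storage API
    charges are deliberately left out.
    """
    internet = internet_gb * INTERNET_EGRESS_PER_GB
    cross_az = cross_az_gb * 2 * CROSS_AZ_PER_GB_EACH_WAY
    return internet + cross_az


def share_of_bill(charges: float, total_bill: float) -> float:
    """Fraction of the total monthly bill taken up by the given charges."""
    if total_bill <= 0:
        raise ValueError("total_bill must be positive")
    return charges / total_bill


# Hypothetical month: 10 TB out to the internet, 20 TB between zones,
# on a $30,000 total bill. Only the per-GB rates come from the article.
charges = transfer_charges(internet_gb=10_000, cross_az_gb=20_000)
print(f"Transfer charges: ${charges:,.2f}")
print(f"Share of bill: {share_of_bill(charges, 30_000):.1%}")
```

Tracking this share month over month is more informative than any single reading, since the signal is about the trend as workloads grow.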
Signal 4: GPU utilization is sustained above 40%
Owned GPUs only pay back when they are being used. According to the Uptime Institute’s May 2025 report on GPU utilization, GPU servers in training are operational ~80% of the time, and even well-tuned training jobs reach just 35% to 45% of peak silicon performance when running.
The thresholds:
- Below 30% sustained: cloud is cheaper. Stay.
- 40% to 70% sustained: owned hardware wins decisively.
- Above 70% sustained: every month spent on the hyperscaler is money set on fire.
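To see why utilization drives the decision, it helps to convert a purchase price into a cost per utilized GPU-hour. The sketch below is an illustration, not the derivation of the thresholds above. It assumes a $300,000 system (the midpoint of the range quoted earlier), a three-year amortization, and hardware cost only.

```python
HOURS_PER_YEAR = 8760


def owned_cost_per_utilized_gpu_hour(
    system_price: float, utilization: float, gpus: int = 8, years: int = 3
) -> float:
    """Hardware cost per GPU-hour actually used, amortized over `years`."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    total_gpu_hours = gpus * years * HOURS_PER_YEAR
    return system_price / (total_gpu_hours * utilization)


def breakeven_utilization(
    system_price: float, cloud_rate: float, gpus: int = 8, years: int = 3
) -> float:
    """Utilization at which owned hardware matches the cloud per-hour rate."""
    return system_price / (gpus * years * HOURS_PER_YEAR * cloud_rate)


# Assumed $300,000 system; utilization levels mirror the thresholds above.
for utilization in (0.30, 0.40, 0.70, 1.00):
    cost = owned_cost_per_utilized_gpu_hour(300_000, utilization)
    print(f"{utilization:.0%} utilized: ${cost:.2f} per used GPU-hour")

print(f"Break-even vs $3.90/GPU-hr: {breakeven_utilization(300_000, 3.90):.1%}")
```

Adding power, facility, and staffing costs raises the break-even utilization, and a longer amortization period lowers it, so both parameters are worth varying before drawing a conclusion.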
Signal 5: Customer SLAs or compliance now require infrastructure control
The Nutanix Enterprise Cloud Index 2026 reports that 57% of IT leaders now require infrastructure within a single country. Driving forces:
- HIPAA (US healthcare)
- DORA (EU financial services)
- GDPR + the US CLOUD Act conflict
- The EU Data Act
For AI startups selling into healthcare, finance, government, or EU buyers, infrastructure control is no longer an internal preference. It is a contract condition.
Signal 6: The roadmap for this workload is firm for 18 to 24 months
GPU hardware amortizes over three to five years. The decision rule:
- Workload that might pivot in 6 months: stay rentable.
- Workload tied to a proven product line with an 18+ month roadmap: lock into infrastructure economics, not per-hour pricing.
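The six signals can be written down as a simple checklist. The sketch below is one possible encoding, not a scoring method from the sources. The numeric cutoffs come from the signal headings, the field names are hypothetical, and the default of two signals reflects the "two or three appearing together" guidance. Signal 3 is a yes/no judgment because no numeric cutoff is given for it.

```python
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    monthly_gpu_spend_usd: float
    spend_is_growing: bool
    demand_is_forecastable: bool
    ancillary_fees_are_meaningful: bool  # judgment call; no cutoff is given
    sustained_utilization: float         # fraction between 0 and 1
    contracts_require_infra_control: bool
    firm_roadmap_months: int


def signals_present(p: WorkloadProfile) -> dict[str, bool]:
    """Map each of the six signals to whether this workload shows it."""
    return {
        "1. spend over $25K/month and growing":
            p.monthly_gpu_spend_usd > 25_000 and p.spend_is_growing,
        "2. predictable workload": p.demand_is_forecastable,
        "3. egress and ancillary fees": p.ancillary_fees_are_meaningful,
        "4. utilization above 40%": p.sustained_utilization > 0.40,
        "5. SLA or compliance control": p.contracts_require_infra_control,
        "6. roadmap firm for 18+ months": p.firm_roadmap_months >= 18,
    }


def worth_modeling(p: WorkloadProfile, min_signals: int = 2) -> bool:
    """True when enough signals appear together to justify a break-even model."""
    return sum(signals_present(p).values()) >= min_signals


# Hypothetical workload used only to exercise the checklist.
example = WorkloadProfile(30_000, True, True, False, 0.55, False, 12)
for name, present in signals_present(example).items():
    print(f"{'yes' if present else 'no ':3} {name}")
print("Run the break-even model:", worth_modeling(example))
```

The checklist only flags when the detailed math is worth doing. It does not replace the cost model, and it should be scored per workload, not per company.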
The Cost Comparison That Matters at the Board Level
The table below consolidates the numbers AI startup CEOs and CFOs should be modeling when they run their own break-even scenario. Figures reflect 2026 published rate cards and analyst sources.
Two points are easy to miss inside a per-hour comparison:
- Dedicated infrastructure is not the opposite of cloud. Most AI startups that move off the hyperscalers run hybrid: dedicated infrastructure for the predictable production load, a small public cloud footprint for spikes or geographic reach.
- The savings do not come from the hardware line alone. They come from eliminating egress, sustaining higher utilization on fewer GPUs, and giving finance a forecastable run-rate instead of a variable one.
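The hybrid pattern in the first point can be modeled as a steady fleet on dedicated capacity plus burst hours bought on demand. The sketch below is a rough comparison under loud assumptions: the dedicated rate borrows the roughly $2.00 per GPU-hour specialized-provider figure quoted earlier, the cloud rate is the AWS list price, and the fleet size and burst volume are hypothetical.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12


def hybrid_monthly_cost(
    steady_gpus: int,
    burst_gpu_hours: float,
    dedicated_rate: float = 2.00,  # assumed effective dedicated rate
    cloud_rate: float = 3.90,      # on-demand list price used for bursts
) -> float:
    """Steady load on dedicated capacity, spikes on public cloud."""
    dedicated = steady_gpus * HOURS_PER_MONTH * dedicated_rate
    burst = burst_gpu_hours * cloud_rate
    return dedicated + burst


def all_cloud_monthly_cost(
    steady_gpus: int, burst_gpu_hours: float, cloud_rate: float = 3.90
) -> float:
    """The same total GPU-hours bought entirely on demand."""
    return (steady_gpus * HOURS_PER_MONTH + burst_gpu_hours) * cloud_rate


# Hypothetical fleet: 64 always-on GPUs plus 2,000 burst GPU-hours a month.
hybrid = hybrid_monthly_cost(64, 2_000)
cloud_only = all_cloud_monthly_cost(64, 2_000)
print(f"Hybrid:    ${hybrid:,.0f} per month")
print(f"All cloud: ${cloud_only:,.0f} per month")
```

The model charges for dedicated capacity whether or not it is used, which is why the hybrid advantage shrinks quickly if the steady fleet sits idle.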
What This Looks Like in Practice: Boson AI
In 2025, Boson AI, a fast-moving LLM company building voice agents, faced exactly the conditions described above. Cloud bills were scaling faster than the business. The team needed visibility into networking, storage, and node topology that hyperscalers abstracted away. Standard OEM lead times for the hardware they needed exceeded 12 weeks.
Boson partnered with Arc Compute. The result:
- 65-node NVIDIA HGX H100 cluster on a 400G Quantum-2 InfiniBand fabric
- 520 ConnectX-7 NICs hand-installed on site
- Delivery in under 4 weeks, against standard OEM lead times of more than 12 weeks
- 100% cloud-to-on-prem migration the same week the cluster went live
The economics flipped from a recurring spend that “priced every new experiment” into fixed infrastructure costs that scaled with the business, not against it. The full Boson AI case study walks through the BOM-to-rack execution.
The lesson is not that on-prem is cheaper. The lesson is that the right partner makes the move from rented compute to owned compute fit inside a startup’s product timeline rather than fighting against it.
What Owning Compute Actually Unlocks for AI Startups
For CEOs running this calculation, the savings number is the headline. The strategic effects are bigger:
- Faster iteration. Production inference on dedicated infrastructure delivers predictable latency and throughput, which means the team ships features instead of debugging cloud queuing or noisy-neighbor variance. Foundation model training on owned clusters means experiments are no longer rationed by per-hour pricing. The procurement and design pain points behind that shift, including the GPU infrastructure challenges most AI and HPC teams hit, are the same patterns we see across startups making this move.
- Forecastable margin. Replacing a variable hyperscaler bill with a fixed run-rate gives the CFO numbers that hold for a year. That changes board conversations. It changes the kinds of customer contracts a startup can sign without writing margin guarantees into the pricing.
- A defensible architecture. For startups in regulated industries, infrastructure control is no longer an “ops” question. It is a sales question. The architecture that closes a HealthTech, FinTech, or government contract is rarely the same architecture that closed the seed round.
- Optionality. Once a startup owns or has dedicated access to its compute, hybrid becomes a real option. Cloud burst for spikes. Owned capacity for the steady state. The startup, not the hyperscaler, decides where each workload lives.
The Capital Question
The other reason CEOs hesitate is the capital outlay. That is a real concern, and it is also a solvable one. Two paths, depending on stage:
- CAPEX path. Best for startups with predictable workloads, high utilization, and a clear case for reducing per-unit compute cost. Common at Series B+ with sustained, high-utilization workloads.
- OPEX path. Best for earlier-stage teams or companies whose investors prefer operating expenses over capital outlays. Access dedicated infrastructure through leasing or consumption-based models without the upfront commitment.
Many AI startups start on OPEX to validate that dedicated infrastructure works for their workload, then transition to ownership once the economics are proven. Arc Compute structures both, and structures the path between them, which separates the financing question from the question of whether the move is technically right.
What the Next 12 to 24 Months Look Like
Three forces will sharpen this decision further through 2027:
- Hyperscaler AI capex is at unprecedented levels. Per Goldman Sachs and CNBC reporting compiled in an IEEE ComSoc Technology Blog analysis, the Big Five hyperscalers will spend over $600 billion on infrastructure in 2026, roughly 75% of it on AI. That spend has to be recovered from somewhere. Customers should expect AI compute pricing on hyperscalers to remain premium.
- GPU rental capacity is not getting cheaper. SemiAnalysis reports H100 1-year rental contract pricing rose roughly 40% from $1.70 per GPU-hour in October 2025 to $2.35 per GPU-hour by March 2026, with on-demand capacity sold out across most major providers and Blackwell lead times stretching into mid-2026. The “wait it out and rent cheaper” strategy is no longer working.
- Regulation is forcing the conversation. DORA is live in the EU. The US CLOUD Act and EU Data Act create sovereignty conflicts that cannot be contracted away inside a hyperscaler MSA. Startups selling into regulated buyers are already being asked infrastructure questions on procurement calls.
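The rental price movement cited in the second point translates directly into budget impact. The sketch below recomputes the percentage change from the two quoted contract rates and applies it to a hypothetical 64-GPU one-year contract. The fleet size is an assumption chosen only for illustration.

```python
HOURS_PER_YEAR = 8760


def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


def annual_contract_cost(gpus: int, rate_per_gpu_hour: float) -> float:
    """Cost of holding `gpus` GPUs for one year at a fixed hourly rate."""
    return gpus * rate_per_gpu_hour * HOURS_PER_YEAR


# Contract rates quoted above: October 2025 and March 2026.
increase = percent_change(1.70, 2.35)
# Hypothetical 64-GPU fleet on a one-year contract.
extra = annual_contract_cost(64, 2.35) - annual_contract_cost(64, 1.70)
print(f"Rate increase: {increase:.1f}%")
print(f"Added annual cost for 64 GPUs: ${extra:,.0f}")
```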
The AI startups that compound the fastest over the next two years are the ones that stop treating “stay on hyperscalers” as the safe default and start treating infrastructure as a per-workload decision that gets re-examined every six months.
The window to make this decision on the company’s own terms is open. Moving too early costs capital. Moving too late costs the margin that funds everything else. What does not work is staying on hyperscalers because that is the architecture the team picked when there were five engineers, no customers, and no SLAs.
Arc Compute partners with AI startup CEOs and infrastructure leads on the move from public cloud to dedicated GPU infrastructure, structured around runway, revenue model, and growth trajectory. We have done this for LLM teams like Boson AI, GPU cloud builders, and AI-native product companies, and the playbook is the same: right-size the deployment, fold procurement and facility planning into one timeline, and get to production on a startup’s calendar, not a hyperscaler’s. If you are running your own six-signal score and want a second set of eyes on the numbers, that is a conversation we have often.
Sources
- Barclays CIO Survey Q4 2024. 86% of CIOs plan to repatriate some workloads, the highest rate on record.
- Andreessen Horowitz: The Cost of Cloud, a Trillion Dollar Paradox. Repatriation cuts cloud compute spend by one-third to one-half. $100B market value impact across 50 top public software companies.
- Andreessen Horowitz: Navigating the High Cost of AI Compute. Framework for AI startup compute decisions.
- CloudZero: Cloud GPU Pricing Comparison: AWS Vs Azure Vs GCP For AI Workloads (2026). H100 on-demand rates: AWS $3.90, Azure $6.98, GCP $3.00 per GPU-hour as of April 2026.
- SemiAnalysis: The Great GPU Shortage, Rental Capacity (March 2026). H100 1-year rental contract pricing rose ~40% from $1.70 to $2.35 per GPU-hour between October 2025 and March 2026.
- Uptime Institute: GPU Utilization Is a Confusing Metric (UII Update 362, May 2025). GPU training servers operational ~80% of time. 35% to 45% of peak silicon performance when running.
- IntuitionLabs: NVIDIA AI GPU Pricing Guide. H100 SXM5 pricing $27K-$40K per unit; 8-GPU systems $200K-$400K. Sources: TRG Datacenters, GDep, market reseller quotes.
- Nutanix Enterprise Cloud Index 2026. 57% of IT leaders require infrastructure within a single country.
- IEEE ComSoc Technology Blog: Hyperscaler Capex >$600B in 2026. Big Five hyperscaler infrastructure spend in 2026, ~75% allocated to AI. Sources: Goldman Sachs, CNBC, CreditSights.
- AWS public pricing pages. Outbound internet data transfer up to $0.09/GB; cross-AZ transfer $0.01/GB each way.






