GPU Utilization & the TCO of Infrastructure

How CTOs and HPC Managers Are Increasing GPU Utilization and Lowering the Total Cost of Ownership of Their On-Premise Infrastructure

Anton Allen
Vice President of Sales
March 2, 2023

Under-utilized Resources

IT decision-makers must rapidly adapt to new frameworks and technologies to best serve their clients when building, managing, and optimizing on-premise infrastructure. One of the primary issues faced across industries is the under-utilization of computing resources, especially GPUs. 

When working with AI/ML models, organizations must invest considerably in GPU servers to provide the environments needed to test and train complex algorithms. These environments span hundreds to tens of thousands of GPUs, and teams aim to squeeze as many PetaFLOPS as possible out of the underlying chipsets. With average GPU utilization at only 10%, key decision-makers have been wary of adopting new HPC hardware while their current resources sit idle.
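
A utilization figure like 10% is only meaningful once you measure it on your own cluster. The sketch below is one hedged way to do that. It assumes you periodically run `nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits` on each node and keep the text output. The `average_utilization` helper and the sample data are hypothetical, for illustration only. Note that `utilization.gpu` reports the share of the sample period during which at least one kernel was running. It is a coarse signal, not a measure of how much of each chip's compute or VRAM is actually busy.

```python
from collections import defaultdict


def average_utilization(samples: list[str]) -> dict[int, float]:
    """Average utilization.gpu (%) per GPU index across sampled nvidia-smi outputs.

    Each sample is the text printed by:
      nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits
    i.e. one "index, percent" line per GPU.
    """
    totals: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for sample in samples:
        for line in sample.strip().splitlines():
            index_str, util_str = (field.strip() for field in line.split(","))
            totals[int(index_str)] += float(util_str)
            counts[int(index_str)] += 1
    # Mean utilization per GPU, ordered by GPU index.
    return {idx: totals[idx] / counts[idx] for idx in sorted(totals)}


# Hypothetical samples from a 2-GPU node, taken a few minutes apart.
samples = [
    "0, 5\n1, 0",
    "0, 20\n1, 0",
    "0, 35\n1, 20",
]
per_gpu = average_utilization(samples)
fleet_avg = sum(per_gpu.values()) / len(per_gpu)
print(per_gpu)              # {0: 20.0, 1: 6.666666666666667}
print(round(fleet_avg, 1))  # 13.3
```

Averaging per GPU first and then across GPUs keeps one heavily sampled node from dominating the fleet-wide number.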

Are Job Schedulers the Solution?

Job schedulers such as Slurm have been one of the few tools available for addressing utilization issues. They are great for queueing and organizing jobs but fall short at maximizing utilization: greedy code, human error, and static resource allocation all undermine them. Without intensive professional intervention, GPUs consistently remain under-utilized. These utilization issues can only be thoroughly addressed by Arc Compute's GPU/CPU hypervisor, GVM Server.

GVM Server enables "Real Utilization" by reclaiming and repurposing idle or under-utilized compute resources, such as execution capabilities and VRAM, during runtime. This allows up to 100% utilization as long as there are workloads available to process, which translates into faster training times and a far lower opportunity cost from idle resources. GVM Server can be fully integrated under most job schedulers within an organization's tech stack.

New & Improved GPUs

The explosive performance gains of NVIDIA's H100 over the A100, along with Intel's Data Center GPU Max Series, bring new excitement (and new problems) to exascale and supercomputing operators as they try to double, triple, quadruple, and even quintuple their PetaFLOPS. Breakthrough technology looks great, but many ask, "How do we ensure we get the most out of it, given the total cost of ownership and the technical investment required?" Spending hundreds of thousands of dollars on new GPUs can be hard to justify when overall utilization is so low.
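
To see why low utilization makes new hardware hard to justify, spread the cost of a GPU over the hours it actually does useful work. The sketch below is simple arithmetic under assumed inputs. The purchase price, annual operating cost, and three-year service life are hypothetical placeholders, not vendor pricing or Arc Compute figures. A real TCO model would also include facilities, networking, staffing, and financing.

```python
HOURS_PER_YEAR = 24 * 365


def cost_per_useful_gpu_hour(capex: float, opex_per_year: float,
                             years: float, utilization: float) -> float:
    """Total cost of one GPU over its service life divided by its useful hours.

    utilization is a fraction in (0, 1], e.g. 0.10 for 10%.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    total_cost = capex + opex_per_year * years
    useful_hours = HOURS_PER_YEAR * years * utilization
    return total_cost / useful_hours


# Hypothetical inputs: $30,000 per GPU, $5,000/year power and upkeep, 3-year life.
for util in (0.10, 0.50, 0.90):
    cost = cost_per_useful_gpu_hour(30_000, 5_000, 3, util)
    print(f"{util:.0%} utilization -> ${cost:.2f} per useful GPU-hour")
```

With these placeholder inputs, the loop prints roughly $17.12, $3.42, and $1.90 per useful GPU-hour. Because the total cost is fixed once the hardware is bought, the cost per useful hour scales with 1/utilization. Raising utilization from 10% to 90% therefore cuts it ninefold, whatever the actual prices are.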

The Solution: GVM Server + Job Scheduler

To maximize utilization, a job orchestration and scheduling tool is necessary to keep a consistent funnel of work flowing to HPC infrastructure, but without GVM Server you're only addressing part of the underlying issue. Pairing a job scheduler with GVM Server forms a complete solution for lowering the total cost of ownership of next-generation infrastructure and makes considerable investments far more justifiable to key decision-makers. When both technologies are present in the tech stack, users can address both ends of the utilization problem, minimizing the complexity of job schedulers and maximizing the ROI of new hardware.

Because GVM Server automatically provisions and reprovisions compute resources, breaking down the barriers that leave compute siloed and idle, it has never been easier to maximize utilization and lower the total cost of ownership of on-premise GPU-accelerated infrastructure.

Looking to learn more about Arc Compute?
Read our latest white papers and case studies.
GVM Server - 100% Utilization POC
The following results come from tests we ran to demonstrate the performance benefits and limitations of GVM Server, and they provide a starting point for further proof-of-concept testing within your organization's infrastructure.
GVM Server - Organization-Level Provisioning with Nested Roles
Organization-level provisioning is a nested roles feature that allows organizations to manage data and resources for teams/projects hierarchically.
GVM Server - Solution Brief
GVM Server, Arc Compute's GPU/CPU hypervisor, is an all-in-one GPU utilization and virtualization solution.
Arc Compute - Company Summary
Arc Compute's customers have one thing in common: they are all large consumers of GPUs who are tired of the current cloud business models and are looking for better, more transparent pricing along with better performance and security.
Arc Compute Powers GPU Cloud Offering with Liqid
"Arc Compute, the only cloud service provider to offer Liqid’s revolutionary composable disaggregated infrastructure (CDI) as a service, proposed a GPU cloud option that offered the immersive video company a far more flexible and cost-effective solution".
GVM Server - Superior GPU Utilization and Performance
As the following benchmarks show, workloads running on GVM Server can train up to 80% faster thanks to improved utilization of GPU resources.