
GPU Job Scheduler Vs. ArcHPC


What is a GPU Job Scheduler?

A GPU job scheduler is a tool that manages and schedules the allocation of GPUs in a cluster environment. It enables efficient utilization of GPU resources by allocating them to the jobs that need them, and it provides a unified interface for submitting, monitoring, and controlling the execution of GPU jobs across a cluster.
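As a toy illustration of that core bookkeeping, the sketch below shows the essential loop any GPU job scheduler performs: free GPUs are handed to queued jobs, and reclaimed when jobs complete. The class and method names are hypothetical, not the interface of any particular scheduler.

```python
# Illustrative sketch only -- GpuScheduler and its methods are made-up names,
# not a real scheduler's API. It models the basic allocate/queue/reclaim cycle.
from collections import deque

class GpuScheduler:
    def __init__(self, num_gpus):
        self.free_gpus = set(range(num_gpus))
        self.queue = deque()          # (job, gpus_needed) waiting for resources
        self.running = {}             # job name -> set of assigned GPU ids

    def submit(self, job, gpus_needed):
        """Queue a job; it starts as soon as enough GPUs are free."""
        self.queue.append((job, gpus_needed))
        self._dispatch()

    def finish(self, job):
        """Reclaim a finished job's GPUs, then try to start queued jobs."""
        self.free_gpus |= self.running.pop(job)
        self._dispatch()

    def _dispatch(self):
        # FIFO dispatch: start queued jobs while enough GPUs are available
        while self.queue and len(self.free_gpus) >= self.queue[0][1]:
            job, n = self.queue.popleft()
            self.running[job] = {self.free_gpus.pop() for _ in range(n)}

sched = GpuScheduler(num_gpus=2)
sched.submit("train-a", 1)
sched.submit("train-b", 1)
sched.submit("train-c", 1)    # waits in the queue: no free GPU yet
sched.finish("train-a")       # the freed GPU is handed to train-c
print(sorted(sched.running))  # train-b and train-c are now running
```

Note that in this model a job only ever holds the GPUs it was initially granted, which is exactly the limitation discussed below.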

Although schedulers can be very useful to systems administrators, they have drawbacks when it comes to truly maximizing utilization and performance. Arc Compute's ArcHPC addresses these issues.

To increase GPU utilization when multiple jobs need to be trained, many schedulers rely on multi-tenancy through MIG (Multi-Instance GPU). Let's run through an example to highlight the differences between a scheduler using MIG and Arc Compute using GVM Server's SMVGPU (Simultaneous Multi-Virtual GPU) feature.
GVM Server sits within the management layer of data center nodes, offering lower-level GPU/CPU optimizations than any job scheduler can.

MIG Vs. SMVGPU

[Chart: GPU utilization with a GPU job scheduler (MIG) vs. GPU utilization with SMVGPU]
In both instances, three jobs share a single virtualized GPU to train their workloads, with no additional jobs queued.
GPU Allocation Across Jobs
In this scenario, the job scheduler has allocated a third of a virtualized GPU to each job. When Job #1 finishes after 2 hours, its allocated resources remain idle; the same goes for Job #2 when it finishes after 4 hours. Jobs are therefore limited to the resources they are initially allocated when using a scheduler. This isn't the case with GVM Server.

It should be noted that these idle resources would be reallocated to queued jobs, if there were any.

Unlike MIG, SMVGPU can allocate VRAM on both a contiguous and a non-contiguous basis, enabling the automated redistribution of idle resources at runtime.

With ArcHPC, when Job #1 finishes after 2 hours, its GPU resources are automatically reallocated to Jobs #2 and #3, enabling both remaining jobs to train more efficiently. Job #2 finishes an hour faster than it would under MIG. When Job #2 finishes, its resources are reallocated to Job #3, which finishes 2 hours faster because it is allocated the entire GPU for its last hour of training. This automated reallocation of GPU resources at runtime makes high levels of GPU optimization, up to 100% utilization, achievable.


While the premier version of ArcHPC doesn't completely replace all of the functionality of a GPU scheduler, it sits below schedulers in the data center tech stack, meaning a scheduler can be seamlessly integrated with ArcHPC.

Job Scheduler: increases GPU utilization. ArcHPC: completely optimizes utilization.

| Feature | GPU Job Scheduler | ArcHPC |
| --- | --- | --- |
| Reduces workflow bottlenecks | Yes | Yes |
| GPU pooling | Yes | Yes |
| Ensures VRAM assignment | Yes | Yes |
| Cluster management | Yes | Yes |
| Allocates VRAM on a non-contiguous basis | No | Yes |
| Granular control over low-level GPU resources | No | Yes |
| Automated optimization | No | Yes |
| VDI capabilities | No | Yes |
| No command line required | No | Yes |
| GPU architecture backwards compatibility | No | Yes |
| Enables 100% GPU utilization | No | Yes |

Optimize Your GPUs with ArcHPC
ArcHPC takes control of your GPU infrastructure and optimizes GPU resource utilization, leading to increased throughput and improved performance through in-depth analysis and the mapping of low-level utilization and optimization points.
Optimized GPU Utilization
Takes granular control of all the components within a GPU, repurposing under-utilized resources during runtime to reduce cycle times.
Accelerated AI Advancements
Develop AI models faster with optimized utilization of GPU resources. Run jobs simultaneously on the same GPUs and increase performance by up to 206%.
Reduced Hardware Requirements
It's more important than ever to get the most out of your HPC infrastructure. Get more out of your GPUs and reduce your future hardware needs.