The Short Answer:
GVM Server, Arc's GPU hypervisor, has an exclusive feature called Simultaneous Multi-Virtual GPU (SMVGPU). SMVGPU allows multiple multiplexed virtual GPUs to be passed into a single virtual machine, something that no other hypervisor can do. Combined with GVM Server's superior ability to allocate GPU memory at run-time* (more on this at the end of the post), this means that when you use virtual GPUs in the Arc cloud you'll always be allocated the resources you signed up for, and often more. To illustrate this, let's take a look at an example.
Let's say you need access to 2 x A100 40 GB GPUs to train a complex neural network model. The standard way to do this is to reserve a cloud instance (VM) with 2 full A100 GPUs passed into it. This configuration is shown in Configuration 1 below. You'll get this type of configuration with any other cloud provider.
If you were to request 2 x A100 40 GB GPUs in the Arc cloud, you wouldn't simply be allocated two whole graphics cards. Instead, you would be allocated 4 half vGPUs, each one being half of a multiplexed A100 on a separate physical card. This configuration is illustrated in Configuration 2.
You're allocated the exact same amount of resources in both situations (the equivalent of 2 x A100 40 GB GPUs), but in Configuration 1 you're limited to just the resources of those two cards. This isn't the case for Configuration 2. Thanks to GVM Server's ability to allocate GPU resources at run-time, if the other halves of the 4 cards you're using are idle (or under-utilized) by someone else's workloads, you'll be allocated some of the resources of those halves as well. Because workloads are staggered (especially across different time zones), you'll be allocated more resources than you signed up for 99% of the time, which increases performance and reduces the time it takes for your workload to run.
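To make the arithmetic concrete, here is a minimal sketch in Python. It is a toy model, not GVM Server's actual allocation logic: the names `PhysicalGPU`, `guaranteed_gb`, and `upper_bound_gb` are hypothetical, memory stands in for GPU resources in general, and the model assumes that all idle capacity on the other half of a card could be borrowed, so the second figure is an upper bound rather than a prediction.

```python
from dataclasses import dataclass


@dataclass
class PhysicalGPU:
    """One physical card, as seen by a single tenant (hypothetical model)."""
    memory_gb: float = 40.0            # A100 40 GB, as in the example
    reserved_fraction: float = 0.5     # share of the card guaranteed to this tenant
    neighbor_utilization: float = 0.0  # how much of the other share is in use (0..1)


def guaranteed_gb(gpus):
    """Memory the tenant signed up for: the reserved share of every card."""
    return sum(g.memory_gb * g.reserved_fraction for g in gpus)


def upper_bound_gb(gpus):
    """Reserved memory plus whatever the neighbors leave idle.

    Assumption: every idle gigabyte on the other share is borrowable.
    """
    total = 0.0
    for g in gpus:
        own = g.memory_gb * g.reserved_fraction
        idle = g.memory_gb * (1.0 - g.reserved_fraction) * (1.0 - g.neighbor_utilization)
        total += own + idle
    return total


# Configuration 1: two full cards passed through, nothing left to borrow.
config_1 = [PhysicalGPU(reserved_fraction=1.0) for _ in range(2)]

# Configuration 2: four half vGPUs on four cards, other halves idle.
config_2 = [PhysicalGPU(reserved_fraction=0.5, neighbor_utilization=0.0) for _ in range(4)]

print(guaranteed_gb(config_1), upper_bound_gb(config_1))  # 80.0 80.0
print(guaranteed_gb(config_2), upper_bound_gb(config_2))  # 80.0 160.0
```

Both configurations guarantee the same 80 GB. Only the second has headroom, and that headroom shrinks toward zero as `neighbor_utilization` approaches 1.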
An added bonus of this performance boost is that you can often get away with reserving fewer GPU resources in the Arc cloud than you're used to. In the above example, instead of 2 x A100 40 GB GPUs to train your workloads, you would likely only need the equivalent of 1 x A100 40 GB GPU (2 half vGPUs), assuming someone else's workloads aren't fully utilizing the other halves of those cards.
*GVM Server's allocation of GPU resources at run-time is more advanced than that of other hypervisors because it can virtualize GPUs into more complex configurations. When multiple multiplexed virtual GPUs are passed through into a single VM, that instance benefits from load-balanced resources across all of the GPUs it's utilizing (including any part of those GPUs that it's not technically allocated).