Win the AI & HPC race.
Optimized kernel scheduling.
Higher GPU throughput.
Max performance per watt.
ArcHPC revolutionizes the operation and management of your GPU resources with advanced kernel scheduling. By leveraging memory-level parallelism, it significantly increases GPU throughput, improving your infrastructure's performance per watt.
Nexus is responsible for creating and managing your HPC environment while optimizing GPU utilization and performance, enabling administrators to increase user and task density.
Oracle automates task matching and deployment across your cluster, increasing accelerated-hardware performance through enterprise-wide, scalable run-time controls.
Optimizes warp and thread arrangement and kernel execution to reduce latency and task-completion times, cutting computational waste.
By utilizing memory-level parallelism, ArcHPC keeps Streaming Multiprocessors (SMs) continuously engaged, enhancing compute efficiency.
Installed at the hypervisor level, capturing tasks and optimizing them as they are deployed to your existing job scheduler.
Fine-tunes the task environment, reducing compute times and automating the deployment of tasks to maximize GPU throughput.
Infrastructure maintainers can set policies aligning with business objectives, ensuring a balance between performance and organizational goals.
Monitors performance and adapts to changing workloads, reducing start-up costs and latencies, achieving significant energy savings.
Accelerating AI & HPC one kernel at a time.
On NVIDIA GPUs, "L2 cache crossbar latency" refers to the delay in accessing data from the L2 cache via the crossbar, the GPU's internal communication mechanism. This latency wastes GPU cycles and is especially significant for data requests from "far" Streaming Multiprocessors (SMs), those farther from the L2 cache or less efficiently connected to it. Reducing this latency is crucial for improving computational efficiency and data throughput while minimizing bottlenecks.
ArcHPC overcomes this latency by managing which SMs are presented to kernels at any given time. This management lets you control where the data is loaded, ensuring it remains "local" to the SMs in its fast-path network, avoiding unnecessary GPU cycles caused by data access through the crossbar.
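As a back-of-envelope illustration of why locality matters, the sketch below blends hypothetical near- and far-SM access latencies by the fraction of traffic that stays local. The cycle counts are illustrative placeholders, not measured NVIDIA figures, and the function is not part of any ArcHPC API.

```python
# Hypothetical model: average L2 access latency as a function of how much
# traffic stays on "near" SMs. Cycle counts are illustrative placeholders.

def avg_latency(near_fraction, near_cycles=200, far_cycles=260):
    """Blend near- and far-SM L2 latencies by traffic fraction."""
    return near_fraction * near_cycles + (1 - near_fraction) * far_cycles

# Keeping 90% of accesses local vs. 50% lowers the average latency:
print(avg_latency(0.9), avg_latency(0.5))  # 206.0 230.0
```

Even with made-up numbers, the model shows the lever ArcHPC pulls: raising the fraction of accesses served over the fast-path network directly lowers the average cost of every L2 request.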
In shared computing environments that use job schedulers like SLURM, a common challenge is the inability to fractionalize GPU resources effectively. Most job schedulers cannot allocate partial GPU resources to multiple jobs simultaneously, so each job claims an entire GPU even when it doesn't fully utilize the hardware.
ArcHPC overcomes this limitation in virtualized and bare-metal environments by enabling fine-grained resource allocation, allowing multiple jobs to share a single GPU efficiently. This fractionalization maximizes GPU utilization and enhances overall system performance.
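A minimal sketch of the fractionalization idea, using first-fit packing of jobs with fractional GPU demands. The job names and fractions are invented for illustration; ArcHPC's actual placement logic is not shown here.

```python
# Hypothetical sketch: first-fit packing of jobs onto fractional GPU slices.
# Job names and sizes are illustrative, not real workloads.

def pack_jobs(jobs, num_gpus):
    """Assign each job (name, gpu_fraction) to the first GPU with room.

    Returns a list of per-GPU job lists; raises if a job cannot fit.
    """
    free = [1.0] * num_gpus                      # remaining capacity per GPU
    placement = [[] for _ in range(num_gpus)]
    for name, frac in jobs:
        for g in range(num_gpus):
            if free[g] >= frac - 1e-9:           # tolerate float rounding
                free[g] -= frac
                placement[g].append(name)
                break
        else:
            raise RuntimeError(f"no capacity for {name}")
    return placement

# Whole-GPU scheduling would need 4 GPUs for these jobs;
# fractional packing fits them on 2.
jobs = [("train-a", 0.5), ("infer-b", 0.25), ("infer-c", 0.25), ("etl-d", 0.5)]
print(pack_jobs(jobs, 2))
```

The same four jobs that would each reserve a whole GPU under a conventional scheduler share two GPUs here, which is the utilization gain fractionalization is after.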
HPC and deep learning applications often consist of multiple kernels, each needing different GPU resources at various stages. This variability can lead to underutilization when GPUs are allocated based on peak demands, as less intensive kernels don't fully use the hardware. Conversely, allocating based on minimal needs can cause bottlenecks during demanding kernels. This mismatch complicates scheduling, increases latency, and makes performance tuning difficult, as static resource allocation doesn't adapt to dynamic workloads.
ArcHPC enables dynamic resource provisioning during run-time, eliminating these bottlenecks.
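To see how much static peak-demand allocation can waste, the sketch below computes the fraction of reserved SM-seconds that actually do work for a made-up three-kernel job. The SM counts and durations are invented numbers for illustration only.

```python
# Hypothetical sketch: utilization of a multi-kernel job when GPUs are
# reserved for its peak SM demand. Kernel figures are made up.

def static_utilization(kernels):
    """Fraction of reserved SM-seconds used when allocating for peak demand.

    kernels: list of (sms_needed, seconds) per kernel stage.
    """
    peak = max(sms for sms, _ in kernels)            # reservation size
    used = sum(sms * t for sms, t in kernels)        # SM-seconds doing work
    total = peak * sum(t for _, t in kernels)        # SM-seconds reserved
    return used / total

# A short 108-SM kernel, a long 16-SM kernel, a short 64-SM kernel:
kernels = [(108, 10), (16, 40), (64, 10)]
print(f"{static_utilization(kernels):.0%}")  # 36%
```

Sizing for the peak leaves roughly two-thirds of the reservation idle in this toy case; per-kernel dynamic provisioning, by contrast, only holds what each stage needs while it runs.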
Regulating power/thermal is crucial due to the significant energy demands of GPUs. High power draw can increase costs, strain electrical infrastructure, and cause instability. Fluctuating workloads complicate energy management, requiring advanced strategies to prevent overloads like GPUs “falling off the bus” and maintain performance. Inefficient power management also intensifies cooling needs, making it essential to balance performance with energy efficiency and cost.
ArcHPC addresses these challenges by optimizing power distribution and managing workloads to reduce energy consumption and heat generation, ensuring stable and efficient HPC operations.
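One simple power-management policy is proportional capping: scale every GPU's requested draw so the rack never exceeds a facility limit. The sketch below is a toy version of that idea with invented wattages; real enforcement would go through NVIDIA's management interfaces (e.g. NVML power limits) rather than this function.

```python
# Hypothetical sketch: scale per-GPU power requests so their total stays
# under a facility cap. Wattages are illustrative, not vendor specs.

def apply_power_cap(requested_watts, cap_watts):
    """Return per-GPU limits, scaled proportionally if the total exceeds the cap."""
    total = sum(requested_watts)
    if total <= cap_watts:
        return list(requested_watts)     # cap not binding; grant as requested
    scale = cap_watts / total
    return [round(w * scale, 1) for w in requested_watts]

# Four GPUs asking for 700 W against a 2400 W cap each get 600 W:
print(apply_power_cap([700, 700, 700, 700], 2400))  # [600.0, 600.0, 600.0, 600.0]
```

Proportional scaling keeps every workload running at reduced draw instead of hard-stopping some of them, which is the stability-versus-throughput balance the paragraph above describes.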
Nexus manages your HPC environment's compute resources while communicating and executing live adjustments based on Oracle interactions. Together, these solutions respond to dynamic changes during runtime while maximizing the utilization and performance thresholds of the underlying infrastructure.
Oracle automates task matching and deployment by analyzing machine code and latencies in your accelerated hardware architecture. This optimizes code deployment, ensuring tasks are executed in the most optimally tuned and calibrated HPC environments while factoring in user-defined governance aligned with business objectives across your entire HPC ecosystem.
| | Job Scheduler | Manual Task Matching | ArcHPC |
|---|---|---|---|
| Widely available | | | |
| Ease of implementation/use | | | |
| Increases GPU performance | | | |
| Addresses latency issues | | | |
| Controls code optimization cycle | | | |
| No chance of performance degradation | | | |
| Granularly manages HPC compute environment | | | |
| Doesn't require large investment in human capital | | | |
| Scalable across entire organization | | | |
| Highly secure, regardless of task strength | | | |
| Automated task and kernel matching | | | |
| Accessible to organizations of all sizes | | | |