InfiniBand’s advantage over Ethernet lies in its lower latency, achieved through a streamlined protocol and hardware offloads, delivering microsecond-scale communication critical for AI clusters. While InfiniBand often offers high bandwidth, Ethernet can match or exceed it (e.g., 400 GbE), and Ethernet supports RDMA via RoCE, making latency the standout differentiator.
(Reference: NVIDIA Networking Documentation, Section on InfiniBand vs. Ethernet)
Question 2
What is a common tool for container orchestration in AI clusters?
Options:
A. Kubernetes
B. MLOps
C. Slurm
D. Apptainer
Answer: A
Explanation:
Kubernetes is the industry-standard tool for container orchestration in AI clusters, automating deployment, scaling, and management of containerized workloads. Slurm manages job scheduling, Apptainer (formerly Singularity) runs containers, and MLOps is a practice, not a tool, making Kubernetes the clear leader in this domain.
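As an illustration of what Kubernetes orchestration looks like for a GPU workload, here is a minimal pod manifest sketch. It assumes a cluster with the NVIDIA device plugin installed (which exposes the `nvidia.com/gpu` resource); the pod name and image tag are examples, not prescribed values.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test            # example name
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    # Example CUDA base image from NVIDIA's NGC registry; pin a tag that
    # matches your cluster's driver version.
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1        # requires the NVIDIA device plugin
```

Applying this with `kubectl apply -f pod.yaml` lets Kubernetes schedule the pod onto a node with a free GPU, which is exactly the placement and lifecycle automation the question refers to.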
(Reference: NVIDIA AI Infrastructure and Operations Study Guide, Section on Container Orchestration)
Question 3
What NVIDIA tool should a data center administrator use to monitor NVIDIA GPUs?
Options:
A. NVIDIA System Monitor
B. NetQ
C. DCGM
Answer: C
Explanation:
The NVIDIA Data Center GPU Manager (DCGM) is the recommended tool for data center administrators to monitor NVIDIA GPUs. It provides real-time health monitoring, telemetry (e.g., utilization, temperature), and diagnostics, and is designed for large-scale deployments. NetQ focuses on network monitoring, and "NVIDIA System Monitor" is not an NVIDIA data center product, making DCGM the correct choice.
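A few representative `dcgmi` invocations (the CLI that ships with DCGM) illustrate typical administrator use. This is a sketch: it assumes DCGM is installed and the `nv-hostengine` service is running on a GPU node, and the field IDs passed to `dmon` are examples; consult `dcgmi dmon -l` for the field list on your version.

```shell
# List the GPUs DCGM has discovered on this host.
dcgmi discovery -l

# Stream device-level telemetry (field IDs are examples; see `dcgmi dmon -l`).
dcgmi dmon -e 203,252

# Run a quick (level 1) diagnostic pass across the GPUs.
dcgmi diag -r 1
```

These commands require GPU hardware and a running host engine, so they are shown for illustration rather than as a copy-paste recipe.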