In terms of architecture requirements, what is the main difference between training and inference?
Which technology partitions a single GPU into isolated instances for parallel workloads?
What is a direct benefit of using GPUDirect RDMA for multi-server workloads?
When monitoring a GPU-based workload, what is GPU utilization?
What NVIDIA tool should a data center administrator use to monitor NVIDIA GPUs?
When deploying high-density workloads in a data center, what are the three main resource constraints that need to be considered?
A simulation is bottlenecked by memory transfer speeds. Which GPU architectural feature addresses this?
What is a key advantage of dynamic, priority-based job scheduling in an AI cluster?
What are three key features of InfiniBand networking technology?
An engineer is training an autonomous robot to interact with the real world, completing tasks like moving objects from one place to another. Which type of machine learning should be used?
Which NVIDIA parallel computing platform and programming model allows developers to program in popular languages and express parallelism through extensions?
What should an AI operations team do to maintain consistency when scaling workloads across different environments?
Which phase of deep learning benefits most from a multi-node architecture?
What aspect of AI infrastructure design is MOST critical for ensuring high availability of production AI services during hardware or node failures?
Which of the following NVIDIA tools is primarily used for monitoring and managing AI infrastructure in the enterprise?
What is a key benefit of using NVIDIA NIMs?
What is the importance of a job scheduler in a resource-constrained AI cluster?
An IT professional is considering whether to implement an on-prem or cloud infrastructure. Which of the following is a key advantage of on-prem infrastructure?
How many distinct network fabrics are in an AI cluster?
Engineers are troubleshooting slow step time and poor scaling efficiency in a multi-rack distributed AI training cluster. Which infrastructure change is MOST likely to improve end-to-end training performance?
How many 1 Gb Ethernet in-band network connections are in a DGX H100 system?