Spring Sale 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: save70

Free Access NVIDIA NCP-AII New Release

Page: 5 / 9
Total 123 questions

NVIDIA AI Infrastructure Questions and Answers

Question 17

When configuring an out-of-core HPL burn-in for a 40B matrix on 8x H100 nodes, which environment variable prevents GPU out-of-memory errors while reserving space for drivers?

Options:

A.

export HPL_OOC_SAFE_SIZE=4.0

B.

export HPL_OOC_MODE=0

C.

export HPL_OOC_NUM_STREAMS=8

D.

export HPL_OOC_MAX_GPU_MEM=90

Question 18

An enterprise is deploying an AI Factory using NVIDIA DGX BasePOD architecture. The infrastructure team must ensure high availability and efficient data transfer between compute nodes. Which network topology should they implement for the InfiniBand fabric?

Options:

A.

Simple ring topology connecting all nodes in a loop.

B.

Fat-Tree topology with rail-optimized design.

C.

Single flat Ethernet network for all traffic.

D.

Star topology with all nodes connected to a single central switch.

Question 19

During a 48-hour NeMo question-answering model burn-in test, GPU memory errors occur when processing large datasets. Which configuration strategy prevents Out-of-Memory (OOM) errors while maintaining processing efficiency?

Options:

A.

Set blocksize= " 1GB " for data loading and enable RMM asynchronous allocation.

B.

Switch from FP16 to FP32 precision for numerical stability.

C.

Disable add_filename for Parquet files to reduce metadata.

D.

Increase files_per_partition to 1000 for larger batch processing.

Question 20

A network engineer is tasked with configuring the management, storage, and compute networks for a new DGX BasePOD deployment. Which statement best describes the network segmentation required for optimal operation?

Options:

A.

A single VLAN for all types of network traffic.

B.

Two networks: one for management and one for compute.

C.

Four networks: compute, storage, out-of-band, and management.

Page: 5 / 9
Total 123 questions