Free and Premium NVIDIA NCP-AIO Dumps Questions Answers

Page: 1 / 5
Total 66 questions

NVIDIA AI Operations Questions and Answers

Question 1

A system administrator needs to lower latency for an AI application by utilizing GPUDirect Storage.

Which two bottlenecks are avoided with this approach? (Choose two.)

Options:

A. PCIe
B. CPU
C. NIC
D. System Memory
E. DPU

Question 2

You are managing a deep learning workload on a Slurm cluster with multiple GPU nodes, but you notice that jobs requesting multiple GPUs are waiting for long periods even though there are available resources on some nodes.

How would you optimize job scheduling for multi-GPU workloads?

Options:

A. Reduce memory allocation per job so more jobs can run concurrently, freeing up resources faster for multi-GPU workloads.
B. Ensure that job scripts use --gres=gpu: and configure Slurm’s backfill scheduler to prioritize multi-GPU jobs efficiently.
C. Set up separate partitions for single-GPU and multi-GPU jobs to avoid resource conflicts between them.
D. Increase time limits for smaller jobs so they don’t interfere with multi-GPU job scheduling.

Question 3

An instance of NVIDIA Fabric Manager service is running on an HGX system with KVM. A System Administrator is troubleshooting NVLink partitioning.

By default, what is the GPU polling subsystem set to?

Options:

A. Every 1 second
B. Every 30 seconds
C. Every 60 seconds
D. Every 10 seconds

Question 4

A GPU administrator needs to virtualize AI/ML training in an HGX environment.

How can the NVIDIA Fabric Manager be used to meet this demand?

Options:

A. Video encoding acceleration
B. Enhance graphical rendering
C. Manage NVLink and NVSwitch resources
D. GPU memory upgrade

Question 5

An administrator is troubleshooting issues with an NVIDIA Unified Fabric Manager Enterprise (UFM) installation and notices that the UFM server is unable to communicate with InfiniBand switches.

What step should be taken to address the issue?

Options:

A. Reboot the UFM server to refresh network connections.
B. Install additional GPUs in the UFM server to boost connectivity.
C. Disable the firewall on the UFM server to allow communication.
D. Verify the subnet manager configuration on the InfiniBand switches.

Question 6

A Fleet Command system administrator wants to create an organization user that will have the following rights:

For Locations - read only
For Applications - read/write/admin
For Deployments - read/write/admin
For Dashboards - read only

What role should the system administrator assign to this user?

Options:

A. Fleet Command Operator
B. Fleet Command Admin
C. Fleet Command Supporter
D. Fleet Command Viewer

Question 7

You have noticed that users can access all GPUs on a node even when they request only one GPU in their job script using --gres=gpu:1. This is causing resource contention and inefficient GPU usage.

What configuration change would you make to restrict users’ access to only their allocated GPUs?

Options:

A. Increase the memory allocation per job to limit access to other resources on the node.
B. Enable cgroup enforcement in cgroup.conf by setting ConstrainDevices=yes.
C. Set a higher priority for jobs requesting fewer GPUs, so they finish faster and free up resources sooner.
D. Modify the job script to include additional resource requests for CPU cores alongside GPUs.

Question 8

An administrator is troubleshooting a bottleneck in a deep learning runtime and needs consistent data feed rates to GPUs.

Which storage metric should be used?

Options:

A. Disk I/O operations per second (IOPS)
B. Disk free space
C. Sequential read speed
D. Disk utilization in performance manager

Question 9

Your organization is running multiple AI models on a single A100 GPU using MIG in a multi-tenant environment. One of the tenants reports a performance issue, but you notice that other tenants are unaffected.

What feature of MIG ensures that one tenant's workload does not impact others?

Options:

A. Hardware-level isolation of memory, cache, and compute resources for each instance.
B. Dynamic resource allocation based on workload demand.
C. Shared memory access across all instances.
D. Automatic scaling of instances based on workload size.

Question 10

What is the primary purpose of assigning a provisioning role to a node in NVIDIA Base Command Manager (BCM)?

Options:

A. To configure the node as a container orchestration manager
B. To enable the node to monitor GPU utilization across the cluster
C. To allow the node to manage software images and provision other nodes
D. To assign the node as a storage manager for certified storage

Question 11

An administrator requires full access to the NGC Base Command Platform CLI.

Which command should be used to accomplish this action?

Options:

A. ngc set API
B. ngc config set
C. ngc config BCP

Question 12

A system administrator needs to scale a Kubernetes Job to 4 replicas.

What command should be used?

Options:

A. kubectl stretch job --replicas=4
B. kubectl autoscale deployment job --min=1 --max=10
C. kubectl scale job --replicas=4
D. kubectl scale job -r 4

Question 13

A system administrator is experiencing issues with Docker containers failing to start due to volume mounting problems. They suspect the issue is related to incorrect file permissions on shared volumes between the host and containers.

How should the administrator troubleshoot this issue?

Options:

A. Use the docker logs command to review the logs for error messages related to volume mounting and permissions.
B. Reinstall Docker to reset all configurations and resolve potential volume mounting issues.
C. Disable all shared folders between the host and container to prevent volume mounting errors.
D. Reduce the size of the mounted volumes to avoid permission conflicts during container startup.
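
For context on the scenario in this question, the suspected permission mismatch on a shared volume can be checked from the host side. The following is a minimal sketch, not a Docker feature: the function names permitted and check_host_path are hypothetical, and the check covers only the classic owner/group/other read and write bits. It ignores ACLs, root privileges, directory execute bits, and user-namespace remapping, all of which can change the real outcome.

```python
import os
import stat


def permitted(mode, owner_uid, owner_gid, uid, gids, need_write=False):
    """Classic POSIX owner/group/other check on read (and optionally write) bits.

    Simplified on purpose: ignores ACLs, root, and user namespaces.
    """
    if uid == owner_uid:
        read_bit, write_bit = stat.S_IRUSR, stat.S_IWUSR
    elif owner_gid in gids:
        read_bit, write_bit = stat.S_IRGRP, stat.S_IWGRP
    else:
        read_bit, write_bit = stat.S_IROTH, stat.S_IWOTH
    allowed = bool(mode & read_bit)
    if need_write:
        allowed = allowed and bool(mode & write_bit)
    return allowed


def check_host_path(path, uid, gids, need_write=False):
    """Apply the check to a host directory that is bind-mounted into a container."""
    st = os.stat(path)
    return permitted(st.st_mode, st.st_uid, st.st_gid, uid, gids, need_write)
```

Calling check_host_path with the UID and GIDs that the container process runs as shows whether the host directory's mode bits alone would block the mount's use.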

Question 14

An administrator is troubleshooting issues with NVIDIA GPUDirect Storage and must ensure optimal data transfer performance.

What step should be taken first?

Options:

A. Increase the GPU's core clock frequency.
B. Upgrade the CPU to a higher clock speed.
C. Check for compatible RDMA-capable network hardware and configurations.
D. Install additional GPU memory (VRAM).

Question 15

You are deploying AI applications at the edge and want to ensure they continue running even if one of the servers at an edge location fails.

How can you configure NVIDIA Fleet Command to achieve this?

Options:

A. Use Secure NFS support for data redundancy.
B. Set up over-the-air updates to automatically restart failed applications.
C. Enable high availability for edge clusters.
D. Configure Fleet Command's multi-instance GPU (MIG) to handle failover.

Question 16

A new researcher needs access to GPU resources but should not have permission to modify cluster settings or manage other users.

What role should you assign them in Run:ai?

Options:

A. L1 Researcher
B. Department Administrator
C. Application Administrator
D. Research Manager

Question 17

If a Magnum IO-enabled application experiences delays during the ETL phase, what troubleshooting step should be taken?

Options:

A. Disable NVLink to prevent conflicts between GPUs during data transfer.
B. Reduce the size of datasets being processed by splitting them into smaller chunks.
C. Increase the swap space on the host system to handle larger datasets.
D. Ensure that GPUDirect Storage is configured to allow direct data transfer from storage to GPU memory.

Question 18

A data scientist is training a deep learning model and notices slower-than-expected training times. The data scientist asks a system administrator to inspect the issue. The system administrator suspects that disk I/O is the cause.

What command should be used?

Options:

A. tcpdump
B. iostat
C. nvidia-smi
D. htop
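
To make the disk I/O suspicion in this scenario concrete, per-device throughput can be estimated from two snapshots of the Linux kernel's /proc/diskstats counters. This is a rough sketch under stated assumptions: the function names are hypothetical, the host is assumed to be Linux, and only the sectors-read and sectors-written fields are used (the kernel reports them in 512-byte units).

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text):
    """Map device name -> (sectors_read, sectors_written) from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or malformed lines
        # fields: major, minor, name, reads, reads merged, sectors read,
        #         ms reading, writes, writes merged, sectors written, ...
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def throughput_mb_s(before, after, interval_s):
    """Per-device (read MB/s, write MB/s) between two snapshots."""
    rates = {}
    for dev, (r1, w1) in after.items():
        if dev not in before:
            continue
        r0, w0 = before[dev]
        rates[dev] = (
            (r1 - r0) * SECTOR_BYTES / 1e6 / interval_s,
            (w1 - w0) * SECTOR_BYTES / 1e6 / interval_s,
        )
    return rates


def sample(interval_s=1.0, path="/proc/diskstats"):
    """Take two snapshots interval_s apart and return the per-device rates."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return throughput_mb_s(before, after, interval_s)
```

Running sample() while the training job is active and comparing the read rate with the storage device's rated throughput gives a first indication of whether the disk is saturated.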

Question 19

A system administrator wants to run these two commands in Base Command Manager.

main showprofile

device status apc01

What command should the system administrator use from the management node system shell?

Options:

A. cmsh -c "main showprofile; device status apc01"
B. cmsh -p "main showprofile; device status apc01"
C. system -c "main showprofile; device status apc01"
D. cmsh-system -c "main showprofile; device status apc01"
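
When several cluster-management shell commands need to run from a script, it helps to build the invocation programmatically. The sketch below assumes that cmsh accepts a semicolon-separated command string through its -c option; the helper name cmsh_argv is hypothetical, and nothing is executed here, only the argument list is constructed.

```python
import shlex


def cmsh_argv(commands, binary="cmsh"):
    """Join cmsh commands with '; ' and pass them as a single -c argument.

    Assumption: the binary takes a semicolon-separated string via -c.
    """
    return [binary, "-c", "; ".join(commands)]


argv = cmsh_argv(["main showprofile", "device status apc01"])
# Show the shell-quoted form; hand argv to subprocess.run on a real head node.
print(shlex.join(argv))
```

Passing the list to subprocess.run rather than a single string avoids a second layer of shell quoting around the command string.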
