Practice Free NCP-AII Exam Online Questions
An engineer needs to validate NVLink Switch functionality on a DGX H100 system with 8 GPUs.
Which NCCL command verifies intra-node NVLink bandwidth?
- A . broadcast_perf -b 8 -e 16G -f 2 -g 8 without split configuration
- B . all_reduce_perf -b 8 -e 16G -f 2 -g 4 with NCCL_TESTS_SPLIT="MOD 2"
- C . all_reduce_perf -b 8 -e 16G -f 2 -g 1 repeated 8 times
- D . all_reduce_perf -b 8 -e 16G -f 2 -g 8 with NCCL_TESTS_SPLIT="OR 0x7"
D
Explanation:
The NVIDIA Collective Communications Library (NCCL) "Tests" are used to verify the maximum achievable bandwidth of the interconnects. On a DGX H100, the GPUs are connected via a dedicated
high-bandwidth NVLink Switch fabric (NVLink 4), which provides significantly higher throughput than PCIe. To validate the intra-node (within a single server) performance, the all_reduce_perf test is used. The command in Option D is specifically designed to stress all 8 GPUs (-g 8) across a wide range of message sizes (8 bytes to 16G). The use of the environment variable NCCL_TESTS_SPLIT with the bitwise "OR" or "AND" masks allows the engineer to isolate specific traffic patterns or groups of GPUs to ensure the NVLink switches are distributing the load evenly. For a standard 8-GPU H100 tray, achieving a "Bus Bandwidth" of ~450 GB/s to 900 GB/s (depending on the precision and message size) confirms that the NVLink fabric is operating at its theoretical peak. Using only 4 GPUs (Option B) or 1 GPU (Option C) would not provide a complete picture of the NVLink switch bisection bandwidth.
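The "Bus Bandwidth" figure the NCCL tests report is derived from the measured time rather than read from a hardware counter. A minimal sketch of that conversion, using the nccl-tests convention that all-reduce bus bandwidth scales algorithm bandwidth by 2(n-1)/n:

```python
def allreduce_bus_bw(size_bytes: int, time_s: float, n_gpus: int) -> float:
    """Convert an all_reduce_perf measurement into bus bandwidth (GB/s).

    Algorithm bandwidth is data size / time; for all-reduce, nccl-tests
    scale it by 2*(n-1)/n to estimate the per-link "bus" bandwidth.
    """
    alg_bw = size_bytes / time_s / 1e9          # GB/s
    return alg_bw * 2 * (n_gpus - 1) / n_gpus

# e.g. an 8 GB message across 8 GPUs completing in 1 s:
bw = allreduce_bus_bw(8e9, 1.0, 8)
```

Comparing this derived figure against the expected NVLink 4 range is how the pass/fail judgment is made.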
You are tasked with setting up network fabric ports to connect several servers, each with multiple NVIDIA GPUs, to an InfiniBand switch. Each server has two ConnectX-6 adapters.
What is the best strategy to maximize bandwidth and redundancy between the servers and the InfiniBand fabric?
- A . Connect only one adapter from each server to the switch to minimize cable clutter.
- B . Connect both adapters from each server to the same switch, but do not configure link aggregation.
- C . Connect both adapters from each server to the same switch and configure link aggregation (LACP or static LAG) on both the server and the switch.
- D . Connect one adapter from each server to one switch, and the second adapter to a different switch, without link aggregation.
- E . Connect one adapter from each server to one switch, and the second adapter to a different switch, and configure multi-pathing on the servers.
E
Explanation:
Connecting each adapter to a different switch and configuring multi-pathing provides the highest level of bandwidth and redundancy. Link aggregation to the same switch improves bandwidth but doesn’t provide redundancy if that switch fails. Connecting only one adapter obviously limits bandwidth. Multi-pathing allows the servers to use both adapters simultaneously, increasing bandwidth, and provides automatic failover if one of the switches or links fails.
A server with 8 NVIDIA A100 GPUs is experiencing unexpected shutdowns under heavy load. The IPMI logs show a ‘Power Supply Deasserted’ event immediately preceding each shutdown. After replacing the PSU, the issue persists.
What is the MOST likely cause of the continued shutdowns?
- A . Incompatible GPU driver version.
- B . Overcurrent protection (OCP) tripping due to excessive inrush current during GPU startup.
- C . Insufficient system memory (RAM).
- D . Network congestion causing system instability.
- E . A faulty CMOS battery.
B
Explanation:
The ‘Power Supply Deasserted’ event, even after replacing the PSU, strongly suggests that overcurrent protection (OCP) is being triggered. OCP is a safety mechanism that shuts down the PSU if it detects excessive current draw. This is particularly likely with multiple high-power GPUs, as the inrush current during startup can momentarily exceed the PSU’s capacity. A driver issue or insufficient memory is less likely to cause this specific event.
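The arithmetic behind an OCP trip is simple to sketch. The wattages below are illustrative assumptions, not NVIDIA or PSU-vendor specifications; the point is that transient draw per GPU can briefly exceed rated TDP:

```python
def ocp_headroom(n_gpus: int, gpu_transient_w: float,
                 platform_w: float, psu_capacity_w: float) -> float:
    """Worst-case transient draw vs. PSU capacity.

    A negative result means the momentary load exceeds the supply's
    rating, which is when overcurrent protection can trip.
    """
    peak_draw = n_gpus * gpu_transient_w + platform_w
    return psu_capacity_w - peak_draw

# 8 GPUs spiking to an assumed 600 W each (1.5x of a 400 W TDP)
# plus an assumed 1 kW of CPU/fan/drive load on a 5.4 kW supply:
headroom = ocp_headroom(8, 600.0, 1000.0, 5400.0)
```

With these assumed numbers the headroom is negative, so a brand-new PSU of the same rating would trip in exactly the same way, matching the observed symptom.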
After running a 24-hour stress test on a DGX node, the administrator should verify which two key metrics to ensure system stability?
- A . Average CPU usage >80% and Docker container uptime.
- B . No thermal throttling events and consistent GPU utilization >95% throughout the test.
- C . SSD write endurance and RAM capacity.
- D . Total energy consumption and NVLink bandwidth.
B
Explanation:
A 24-hour stress test (using tools like HPL or NCCL) is designed to push the thermal and electrical limits of a DGX system. To verify a "Pass," the administrator must ensure that the hardware maintained its performance targets without degradation. Consistent GPU utilization >95% confirms that the workload successfully saturated the compute cores for the entire duration. Crucially, the absence of thermal throttling events (verified via nvidia-smi -q -d PERFORMANCE) ensures that the system’s cooling solution (fans and heatsinks) is adequate for the environment; if throttling occurred, the GPUs would have slowed down to protect themselves, indicating a potential cooling failure or environmental heat issue. While power consumption (Option D) and CPU usage (Option A) are interesting, they are not the primary indicators of "Stability" under extreme AI training loads. System stability is defined by the ability to run at peak speeds indefinitely without hardware-level interventions or slowdowns.
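The throttling check can be automated by scanning the text report. A minimal sketch, assuming an abbreviated sample of the `nvidia-smi -q -d PERFORMANCE` output format (real field names and layout vary between driver versions):

```python
# Abbreviated, assumed sample of `nvidia-smi -q -d PERFORMANCE` output.
SAMPLE = """\
    Clocks Throttle Reasons
        Idle                          : Not Active
        SW Power Cap                  : Not Active
        HW Thermal Slowdown           : Not Active
        SW Thermal Slowdown           : Active
"""

def thermal_throttle_events(report: str) -> list[str]:
    """Return the names of any thermal slowdown reasons marked Active."""
    events = []
    for line in report.splitlines():
        name, _, value = line.partition(":")
        if "Thermal Slowdown" in name and value.strip() == "Active":
            events.append(name.strip())
    return events
```

An empty result over the full 24-hour window is the "no thermal throttling events" criterion from Option B.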
You are deploying a multi-node NVIDIA GPU cluster for distributed deep learning. Each node has a different ambient operating temperature due to varying airflow patterns within the data center.
To ensure optimal performance and longevity of the GPUs across all nodes, which approach is MOST effective for managing GPU power limits?
- A . Set a uniform power limit for all GPUs across the entire cluster based on the GPU’s Thermal Design Power (TDP) specification.
- B . Disable power capping altogether to allow GPUs to operate at their maximum potential performance.
- C . Implement dynamic power management using NVIDIA’s Data Center GPU Manager (DCGM) to adjust power limits on a per-GPU basis, taking into account real-time temperature readings and workload characteristics.
- D . Rely on the default power management settings provided by the GPU driver.
- E . Manually adjust the fan speeds of each GPU to ensure they are all running at maximum RPM.
C
Explanation:
Option C, using DCGM for dynamic power management, is the most effective approach. It allows for per-GPU power limit adjustments based on real-time conditions, optimizing performance while ensuring thermal safety and longevity across nodes with different operating temperatures. A uniform power limit (A) might be too restrictive for some nodes or insufficient for others. Disabling power capping (B) risks overheating and damage. Default settings (D) may not be optimal. Manually adjusting fan speeds (E) can help, but doesn’t address power limits directly.
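The per-GPU adjustment that DCGM enables can be pictured with a simple policy. This is an illustrative sketch only; the thresholds and the linear taper are assumptions, not DCGM’s actual algorithm:

```python
def target_power_limit(temp_c: float, min_w: float = 300.0,
                       max_w: float = 700.0,
                       t_low: float = 60.0, t_high: float = 85.0) -> float:
    """Illustrative temperature-aware power-limit policy.

    Run at the full limit below t_low, taper linearly down to min_w at
    t_high, so GPUs in hot aisles get capped while cool ones run free.
    All thresholds are assumed example values.
    """
    if temp_c <= t_low:
        return max_w
    if temp_c >= t_high:
        return min_w
    frac = (temp_c - t_low) / (t_high - t_low)
    return max_w - frac * (max_w - min_w)
```

A management loop would read each GPU’s temperature (e.g., via DCGM telemetry) and apply the computed cap per device, which is exactly the per-GPU granularity that a uniform cluster-wide limit (Option A) cannot provide.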
You are troubleshooting performance issues in an AI training cluster. You suspect network congestion.
Which of the following network monitoring tools would be MOST helpful in identifying the source of the congestion?
- A . Ping
- B . Traceroute
- C . iPerf/Netperf
- D . tcpdump/Wireshark
- E . netstat
D
Explanation:
tcpdump/Wireshark can capture and analyze network packets, providing detailed insight into traffic patterns, retransmissions, and potential bottlenecks, which makes them the most helpful tools for identifying the source of congestion. iPerf/Netperf can measure bandwidth and latency between two endpoints and are useful for quantifying available throughput, but they generate synthetic traffic rather than revealing what is congesting the production fabric. Ping and Traceroute are useful for basic connectivity testing but not for detailed congestion analysis, and netstat shows connection statistics without packet-level detail.
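When quantifying throughput between two nodes, iperf3’s machine-readable mode (`iperf3 -J`) is convenient to post-process. A small sketch, assuming a trimmed example of the JSON shape iperf3 emits:

```python
import json

# Trimmed, assumed example of `iperf3 -J` output structure.
SAMPLE = json.dumps({
    "end": {"sum_received": {"bits_per_second": 9.4e10}}
})

def received_gbps(iperf_json: str) -> float:
    """Extract receiver-side throughput in Gbit/s from iperf3 JSON."""
    data = json.loads(iperf_json)
    return data["end"]["sum_received"]["bits_per_second"] / 1e9
```

Comparing this figure against the link’s line rate tells you whether the path is saturated before you move on to packet captures.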
During HPL execution on a DGX cluster, the benchmark fails with "not enough memory" errors despite sufficient physical RAM.
Which HPL.dat parameter adjustment is most effective?
- A . Reduce the problem size while maintaining the same block size.
- B . Set PMAP to 1 to enable process mapping.
- C . Increase block size to 6144 to maximize GPU utilization.
- D . Disable double-buffering via BCAST parameter.
A
Explanation:
High-Performance Linpack (HPL) is a memory-intensive benchmark that allocates a large portion of available GPU memory to store an N x N double-precision matrix. While a server may have 2 TB of physical system RAM, the "not enough memory" error usually refers to the HBM (High Bandwidth Memory) on the GPUs themselves. In a DGX H100 system, each GPU has 80 GB of HBM3. If the problem size (N) specified in the HPL.dat file is too large, the memory required for the matrix will exceed the aggregate capacity of the GPU memory. Reducing the problem size (N) while maintaining the optimal block size (NB) ensures that the problem fits within the GPU memory limits while still pushing the computational units to their peak performance. Increasing the block size (Option C) would actually increase the memory footprint of certain internal buffers, potentially worsening the issue. Reducing N is the standard procedure to stabilize the run during the initial tuning phase of an AI cluster bring-up.
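The sizing rule can be sketched directly: an N x N double-precision matrix needs 8*N^2 bytes, so the largest safe N follows from a fraction of aggregate HBM, rounded down to a multiple of NB. The 0.85 margin here is an assumed tuning value, not an HPL requirement:

```python
import math

def hpl_problem_size(gpu_mem_gib: float, n_gpus: int, nb: int,
                     mem_fraction: float = 0.85) -> int:
    """Largest HPL problem size N that fits in GPU memory.

    Solves 8*N^2 <= mem_fraction * aggregate HBM, then rounds N down
    to a multiple of the block size NB as HPL tuning guides suggest.
    """
    total_bytes = gpu_mem_gib * 2**30 * n_gpus * mem_fraction
    n = int(math.sqrt(total_bytes / 8))
    return (n // nb) * nb

# 8 GPUs with 80 GiB HBM3 each, an assumed NB of 1024:
n_max = hpl_problem_size(80, 8, 1024)
```

If the configured N in HPL.dat exceeds this estimate, the "not enough memory" failure is expected regardless of how much system RAM is installed.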
A data scientist reports slow data loading times when training a large language model. The data is stored in a Ceph cluster. You suspect the client-side caching is not properly configured.
Which Ceph configuration parameter(s) should you investigate and potentially adjust to improve data loading performance? Select all that apply.
- A . client cache size
- B . client quota
- C . mds cache size
- D . fuse_client_max_background
A,D
Explanation:
Client-side caching in Ceph is primarily controlled by ‘client cache size’, which determines the amount of memory the Ceph client uses for caching data, and ‘fuse_client_max_background’, which controls the maximum number of background requests a FUSE client can issue, influencing concurrency. ‘mds cache size’ tunes the metadata server’s cache, affecting metadata operations rather than client-side data caching, and ‘client quota’ limits storage usage, not caching.
You are tasked with configuring a BlueField-2 DPU to offload network virtualization functions.
Which Mellanox OFED command is MOST crucial for verifying that SR-IOV is correctly enabled and functional on the host and the DPU, prior to configuring vPorts?
- A . ‘mlxlink’ to check link speed and FEC settings.
- B . ‘mst status’ to verify the status of Mellanox devices and their firmware.
- C . ‘lspci -vv’ to check for the presence of Virtual Functions (VFs) and their assignment.
- D . ‘ifconfig’ or ‘ip addr show’ to verify IP address assignments to network interfaces.
- E . ‘ethtool -i’ to check driver information.
C
Explanation:
‘lspci -vv’ is the most crucial command. It shows the presence of Virtual Functions (VFs), which confirms SR-IOV is enabled. It also shows the device IDs assigned, allowing verification that the host and DPU are correctly seeing and assigning VFs to the network interfaces. While the other commands are useful for general network diagnostics, they do not directly verify SR-IOV functionality the way ‘lspci -vv’ does.
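The VF check can be scripted by counting matching device lines. The sample lspci lines below are an assumed, abridged format for a ConnectX-class device, not verbatim output:

```python
# Assumed, abridged sample of `lspci` output for a ConnectX-class NIC.
SAMPLE = """\
3b:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
3b:00.2 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
3b:00.3 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
"""

def count_vfs(lspci_output: str) -> int:
    """Count SR-IOV Virtual Function entries in lspci output."""
    return sum("Virtual Function" in line
               for line in lspci_output.splitlines())
```

A count of zero after enabling SR-IOV means the VFs were never created, which is exactly the failure this verification step is meant to catch before any vPort configuration.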
You’re profiling the performance of a PyTorch model running on an AMD server with multiple NVIDIA GPUs. You notice significant overhead in the data loading pipeline.
Which of the following strategies can help optimize data loading and improve GPU utilization? Select all that apply.
- A . Using the ‘torch.utils.data.DataLoader’ with multiple worker processes.
- B . Loading the entire dataset into RAM before training.
- C . Implementing asynchronous data prefetching using ‘torch.Generator’.
- D . Using a faster storage system (e.g., NVMe SSD instead of HDD).
- E . Reducing the batch size to decrease the amount of data loaded per iteration.
A,D
Explanation:
Using multiple worker processes in ‘DataLoader’ enables parallel data loading, and a faster storage system (e.g., NVMe SSD) reduces the I/O bottleneck. ‘torch.Generator’ controls random number generation and does not implement asynchronous prefetching, so Option C does not address the problem. Loading the entire dataset into RAM might not be feasible for large datasets, and reducing the batch size reduces the amount of data loaded per iteration but could decrease overall GPU utilization.
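The overlap that ‘DataLoader’ worker processes provide can be sketched without PyTorch: prefetch the next batch on a background thread while the current batch is consumed. ‘load_batch’ and ‘train_step’ are hypothetical stand-ins for disk I/O and GPU compute:

```python
from concurrent.futures import ThreadPoolExecutor

def load_batch(i):            # stand-in for disk read + decode work
    return list(range(i, i + 4))

def train_step(batch):        # stand-in for GPU compute on the batch
    return sum(batch)

def run(n_batches: int) -> list[int]:
    """Overlap loading batch i+1 with processing batch i."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(load_batch, 0)      # prefetch first batch
        for i in range(n_batches):
            batch = future.result()              # wait for prefetched batch
            if i + 1 < n_batches:
                future = pool.submit(load_batch, i + 1)  # prefetch next
            results.append(train_step(batch))    # runs while next batch loads
    return results
```

This is the same pipelining idea ‘DataLoader’ applies with ‘num_workers > 0’: the accelerator never waits for storage as long as loading a batch takes less time than processing one.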
