Practice Free NCP-AII Exam Online Questions
You are tasked with validating the NVLink performance between GPUs in an NVIDIA DGX A100 system.
Which tool is the most appropriate for measuring the bandwidth and latency of NVLink interconnections under a synthetic workload?
- A . nvidia-smi
- B . NCCL tests (e.g., nccl-tests/net_send_recv)
- C . iostat
- D . memtest86+
- E . dmesg
B
Explanation:
NCCL tests are designed specifically for benchmarking communication performance between GPUs over NVLink. ‘nvidia-smi’ provides GPU monitoring information but not detailed bandwidth/latency tests. ‘iostat’ reports I/O statistics. ‘memtest86+’ tests system memory. ‘dmesg’ displays kernel messages.
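As a rough illustration, the snippet below sketches how the open-source nccl-tests suite could be built and run on a single multi-GPU node to measure NVLink bandwidth and latency; the paths, CUDA location, and GPU count are assumptions and may need adjusting for your system.

```bash
# Minimal sketch: build and run nccl-tests on one 8-GPU node (paths are assumptions).
git clone https://github.com/NVIDIA/nccl-tests.git
cd nccl-tests
make CUDA_HOME=/usr/local/cuda        # add NCCL_HOME=... if NCCL lives elsewhere

# Collective bandwidth across all 8 GPUs, sweeping message sizes from 8 B to 256 MB.
./build/all_reduce_perf -b 8 -e 256M -f 2 -g 8

# Point-to-point send/receive check between GPU pairs.
./build/sendrecv_perf -b 8 -e 256M -f 2 -g 8
```

The reported ‘busbw’ column approximates per-GPU link bandwidth; values well below the expected NVLink rate usually point to a topology or cabling issue.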
You are tasked with deploying a cluster of NVIDIA A100 GPUs in a high-density server environment. The server chassis has a limited power budget and cooling capacity.
Which of the following strategies is MOST effective in validating that the power and cooling infrastructure can adequately support the GPU workload during peak performance, minimizing the risk of thermal throttling and system instability?
- A . Rely solely on the GPU manufacturer’s stated Thermal Design Power (TDP) specifications and allocate power based on these values.
- B . Monitor GPU temperature using ‘nvidia-smi’ during a sustained compute-intensive workload and compare it to the GPU’s thermal threshold. If the temperature remains below the threshold, the cooling is adequate.
- C . Employ a power monitoring tool (e.g., IPMI, Redfish) to measure the actual power consumption of the server during a stress test that mimics the intended AI workload. Cross-reference this with the power supply unit’s (PSU) rating and the cooling system’s capacity.
- D . Simulate the AI workload with a synthetic benchmark (e.g., Linpack) and extrapolate power consumption based on the benchmark’s performance metrics.
- E . Observe the GPU clock speeds during a workload. If the clock speeds are at the maximum rated speed, the power and cooling are sufficient.
C
Explanation:
Option C provides the most comprehensive approach. TDP is a theoretical maximum and doesn’t reflect real-world power consumption. Monitoring temperature is important but doesn’t account for total power draw. Synthetic benchmarks may not accurately represent the AI workload. Monitoring actual power consumption and comparing it to the PSU rating and cooling capacity offers the most accurate validation.
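For illustration, the hedged sketch below shows one way to cross-check chassis power against per-GPU telemetry during a stress run; it assumes ‘ipmitool’ is installed and that the BMC exposes the DCMI power interface, which is not universal.

```bash
# Minimal sketch: compare measured power against PSU and cooling headroom.
# Assumes ipmitool is installed and the BMC supports DCMI power readings.

# Chassis-level power as reported by the BMC (compare with the PSU rating).
ipmitool dcmi power reading

# Per-GPU power draw, limit, and temperature sampled every 5 seconds during the test.
nvidia-smi --query-gpu=index,power.draw,power.limit,temperature.gpu \
           --format=csv -l 5
```

Run these while the representative workload is executing, not at idle, so the readings reflect peak draw.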
You are deploying a BlueField-2 DPU-based server in a VMware vSphere environment.
Which network virtualization technology is most commonly used in conjunction with the DPU to provide accelerated networking and security features within the virtualized environment?
- A . VXLAN (Virtual Extensible LAN)
- B . GRE (Generic Routing Encapsulation)
- C . IPsec (Internet Protocol Security)
- D . SR-IOV (Single Root I/O Virtualization)
- E . LACP (Link Aggregation Control Protocol)
A
Explanation:
VXLAN is a widely adopted network virtualization technology that is frequently used with BlueField DPUs in vSphere environments. DPUs can offload VXLAN encapsulation and decapsulation, improving performance and reducing the CPU load on the host. SR-IOV provides direct access to the NIC for VMs, but it’s not a network virtualization technology in the same sense as VXLAN. GRE and IPsec are tunneling protocols but less common in vSphere for this specific use case. LACP is for link aggregation, not virtualization.
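As a hedged check, the commands below sketch how to confirm from the host side that the DPU-backed uplink advertises VXLAN (UDP tunnel) segmentation offloads; the interface name ‘ens2f0’ is a placeholder.

```bash
# Minimal sketch: verify UDP tunnel (VXLAN) offload features on the uplink.
# 'ens2f0' is an assumed interface name; substitute the actual DPU uplink.
ethtool -k ens2f0 | grep -i udp_tnl

# Offload is typically indicated by lines such as:
#   tx-udp_tnl-segmentation: on
#   tx-udp_tnl-csum-segmentation: on
```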
Consider the following ‘ibroute’ command used on an InfiniBand host: ‘ibroute add dest 0x1a dev ib0’.
What is the MOST likely purpose of this command?
- A . To add a default route for all traffic destined outside the InfiniBand subnet.
- B . To create a static route for traffic destined to LID 0x1a, using the InfiniBand interface ib0.
- C . To configure the MTU size on the ib0 interface to 0x1a bytes.
- D . To disable routing on the ib0 interface.
- E . To configure a static route for traffic destined to IP address 0x1a, using the InfiniBand interface ib0.
B
Explanation:
The ‘ibroute add dest 0x1a dev ib0’ command creates a static route for traffic destined for the InfiniBand LID (Local Identifier) 0x1a, using the InfiniBand interface named ‘ib0’. InfiniBand routing is primarily based on LIDs, not IP addresses directly (though IP over IB is possible). The ‘dest’ parameter specifies the destination LID.
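For context, the hedged sketch below shows how the local LID and the fabric path toward LID 0x1a could be inspected with standard OFED diagnostics; the source LID value is illustrative.

```bash
# Minimal sketch: inspect local LIDs and trace the fabric path to LID 0x1a.
# Assumes the InfiniBand diagnostic tools (ibstat, ibtracert) are installed.

# Show local HCA port state and its assigned base LID.
ibstat

# Trace the hop-by-hop route from a local LID (0x4 here, illustrative) to LID 0x1a.
# ibtracert uses LID addressing by default.
ibtracert 0x4 0x1a
```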
You are tasked with setting up network fabric ports to connect several servers, each with multiple NVIDIA GPUs, to an InfiniBand switch. Each server has two ConnectX-6 adapters.
What is the best strategy to maximize bandwidth and redundancy between the servers and the InfiniBand fabric?
- A . Connect only one adapter from each server to the switch to minimize cable clutter.
- B . Connect both adapters from each server to the same switch, but do not configure link aggregation.
- C . Connect both adapters from each server to the same switch and configure link aggregation (LACP or static LAG) on both the server and the switch.
- D . Connect one adapter from each server to one switch, and the second adapter to a different switch, without link aggregation.
- E . Connect one adapter from each server to one switch, and the second adapter to a different switch, and configure multi-pathing on the servers.
E
Explanation:
Connecting each adapter to a different switch and configuring multi-pathing provides the highest level of bandwidth and redundancy. Link aggregation to the same switch improves bandwidth but doesn’t provide redundancy if that switch fails. Connecting only one adapter obviously limits bandwidth. Multi-pathing allows the servers to use both adapters simultaneously, increasing bandwidth, and provides automatic failover if one of the switches or links fails.
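Before relying on multi-pathing, it is worth confirming that both ports are up and linked to their respective switches; the hedged sketch below assumes MLNX_OFED tools are installed and uses example device names.

```bash
# Minimal sketch: confirm both ConnectX-6 ports are active (device names are examples).
ibdev2netdev                                   # map mlx5_0 / mlx5_1 to netdevs and link state
ibstat mlx5_0 | grep -E 'State|Rate|Base lid'  # port toward switch A
ibstat mlx5_1 | grep -E 'State|Rate|Base lid'  # port toward switch B
```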
You are deploying a new NVLink Switch based cluster. The GPUs are installed in different servers but need to be configured to utilize the NVLink interconnect.
Which of the following should be performed during the installation phase to confirm correct configuration?
- A . Run NCCL tests to verify the GPU-to-GPU bandwidth and latency between servers.
- B . Verify that GPUDirect RDMA is enabled and functioning correctly.
- C . Check that the ‘nvidia-smi’ command shows the correct NVLink topology.
- D . Run standard TCP/IP network bandwidth tests to check inter-server communication.
- E . All the GPUs are in the same IP subnet.
A,B,C
Explanation:
NCCL tests are specifically designed to test GPU-to-GPU communication. Ensuring GPUDirect RDMA is functioning is essential for low-latency communication. ‘nvidia-smi’ should display the NVLink topology. TCP/IP tests do not exercise the NVLink connection. It does not matter whether GPUs on different servers are in the same IP subnet, because NVLink communication occurs directly between the GPUs using RDMA mechanisms. Subnetting affects traditional network-layer communication, not this low-level device-to-device path.
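The hedged sketch below ties the three checks together; hostnames, GPU counts, and the nccl-tests path are assumptions, and the multi-node run assumes nccl-tests was built with MPI support.

```bash
# Minimal sketch of the three validation steps (names and paths are assumptions).

# 1. NVLink/NVSwitch topology as reported by the driver.
nvidia-smi topo -m

# 2. Confirm the GPUDirect RDMA kernel module is loaded.
lsmod | grep nvidia_peermem

# 3. Two-node NCCL bandwidth/latency test across the NVLink Switch fabric.
mpirun -np 16 -H node01:8,node02:8 ./build/all_reduce_perf -b 8 -e 1G -f 2 -g 1
```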
You are attempting to install NGC CLI on a CentOS 7 system, but the ‘pip install nvidia-cli’ command fails with a ‘Could not find a version that satisfies the requirement nvidia-cli’ error. You have confirmed that ‘pip’ is installed and working.
What could be the cause of this issue?
- A . The CentOS 7 system does not have the required Python version installed. NGC CLI requires Python 3.6 or later.
- B . The system’s package manager (YUM) is not configured correctly, preventing ‘pip’ from finding the NGC CLI package.
- C . The ‘pip’ version is outdated and incompatible with the NGC CLI package. Upgrade ‘pip’ using ‘pip install --upgrade pip’.
- D . The system’s firewall is blocking access to the Python Package Index (PyPI).
- E . CentOS 7 is not supported by NGC CLI.
A,C
Explanation:
A likely reason is an outdated Python version (A), as NGC CLI requires Python 3.6 or later. Another potential issue is an outdated ‘pip’ version (C), which could be incompatible with the NGC CLI package. Confirming the correct Python version and an up-to-date pip usually resolves this issue.
Option E is incorrect; CentOS 7 is supported with the correct configuration.
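As a hedged illustration, the commands below check the interpreter and pip versions before retrying the install; the package name is taken from the question as stated.

```bash
# Minimal sketch: verify Python and pip before retrying (package name as given in the question).
python3 --version                      # NGC CLI requires Python 3.6 or later
python3 -m pip --version
python3 -m pip install --upgrade pip   # option C: bring pip up to date
python3 -m pip install nvidia-cli      # retry the install from the question
```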
Consider a scenario where you need to run two different deep learning models, Model A and Model B, within separate Docker containers on the same NVIDIA GPU. Model A requires CUDA 11.2, while Model B requires CUDA 11.6.
How can you achieve this while minimizing conflicts and ensuring each model has access to its required CUDA version?
- A . Install both CUDA 11.2 and CUDA 11.6 on the host system and use ‘CUDA_VISIBLE_DEVICES’ to isolate each model to a specific CUDA version.
- B . Use separate Docker images for each model, each based on the appropriate ‘nvidia/cuda’ image (e.g., ‘nvidia/cuda:11.2-base-ubuntu20.04’ and ‘nvidia/cuda:11.6-base-ubuntu20.04’).
- C . Install both CUDA 11.2 and CUDA 11.6 inside each Docker container and use ‘LD_LIBRARY_PATH’ to switch between the CUDA versions for each model.
- D . Create a single Docker image with both CUDA versions and dynamically link the correct CUDA libraries at runtime using environment variables.
- E . Mount the CUDA libraries from the host machine into both containers using Docker volumes, ensuring each container has access to both CUDA versions.
B
Explanation:
The recommended and most straightforward approach is to use separate Docker images (B), each based on the specific ‘nvidia/cuda’ image version needed. This creates isolated environments, avoiding conflicts and ensuring each model has the correct CUDA toolkit. Installing multiple CUDA versions on the host (A) can lead to conflicts and isn’t necessary with Docker. Installing multiple CUDA versions within a single container (C, D) adds complexity and potential conflicts. Mounting CUDA libraries from the host (E) might work, but it’s less isolated and can create dependency management issues.
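The hedged sketch below illustrates the per-image approach; the exact tags and mounted paths are assumptions and should be replaced with the images and code your models actually require.

```bash
# Minimal sketch: one container per CUDA version (tags and paths are illustrative).
docker run --rm --gpus all \
    -v "$PWD/model_a:/workspace" \
    nvidia/cuda:11.2.2-base-ubuntu20.04 nvidia-smi

docker run --rm --gpus all \
    -v "$PWD/model_b:/workspace" \
    nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
```

Here ‘nvidia-smi’ stands in for each model’s actual entrypoint; it simply confirms that each container sees the GPUs with its own CUDA user-space stack.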
A data scientist reports that training performance on a DGX A100 server has significantly degraded over the past week. ‘nvidia-smi’ shows all GPUs functioning, but ‘nvprof’ reveals substantially increased ‘cudaMemcpy’ times.
What is the MOST likely bottleneck?
- A . The CPU is heavily loaded, causing contention for system memory bandwidth.
- B . The PCIe bus is saturated, limiting data transfer speeds between the CPU and GPUs.
- C . The NVLink connections between GPUs are failing, forcing data transfers through PCIe.
- D . The GPUs are overheating, causing thermal throttling and slower memory transfers.
- E . The storage system is slow, delaying data loading and preprocessing.
A
Explanation:
Increased ‘cudaMemcpy’ times indicate a bottleneck in data transfer between the CPU and GPUs or within GPU memory itself. While PCIe saturation or failing NVLink connections could contribute, a heavily loaded CPU is often the primary culprit. CPU-bound preprocessing tasks or general system load can create contention for system memory bandwidth, slowing down data movement to the GPUs. Failing NVLink is less likely because ‘nvidia-smi’ would typically report errors or reduced link speeds between the GPUs.
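To confirm the diagnosis, a hedged sketch of the kind of correlation that could be run alongside training is shown below; sampling intervals are arbitrary.

```bash
# Minimal sketch: correlate CPU load with host<->GPU transfer activity during training.
top -b -n 1 | head -20        # snapshot of CPU utilization and load average
nvidia-smi dmon -s put -d 1   # per-GPU power, utilization, and PCIe RX/TX, sampled each second
nvidia-smi topo -m            # confirm the expected NVLink/PCIe topology is still reported
```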