Practice Free NCP-AII Exam Online Questions
You are tasked with installing NVIDIA GPUs into a server that supports both single and double-width cards. You want to maximize GPU density.
What is the MOST important factor to consider when choosing between single and double-width cards?
- A . The clock speed of the GPUs.
- B . The amount of VRAM on the GPUs.
- C . The available PCIe slots and their spacing within the server chassis, and the server’s cooling capacity.
- D . The price of the GPUs.
- E . The brand of the GPUs.
C
Explanation:
While clock speed, VRAM, price, and brand are all relevant, the physical constraints of the server (PCIe slot availability/spacing and cooling capacity) are paramount when deciding between single- and double-width cards. Double-width cards offer more performance but occupy two slot positions and require more airflow. If the slot spacing or cooling is inadequate, cards packed too closely will thermally throttle, and their raw performance becomes irrelevant.
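As a rough illustration of the density trade-off, a double-width card consumes two adjacent slot positions, so the chassis slot count bounds how many cards can physically fit before cooling is even considered. A minimal sketch (the slot counts are hypothetical, not taken from any specific server):

```python
def max_gpus(slot_positions: int, card_width: int) -> int:
    """Upper bound on cards that fit, ignoring cooling limits.

    slot_positions: physical slot positions in the chassis
    card_width: slot positions one card occupies (1 = single, 2 = double)
    """
    if card_width < 1:
        raise ValueError("card width must be >= 1")
    return slot_positions // card_width

# Hypothetical 8-slot chassis: 8 single-width cards vs 4 double-width cards.
print(max_gpus(8, 1))  # 8
print(max_gpus(8, 2))  # 4
```

Cooling capacity then caps the usable count further, which is why option C bundles both constraints together.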
You have created MIG instances on an A100 GPU and want to dynamically adjust their size based on workload demands.
Which of the following methods is the most appropriate for automatically resizing MIG instances in response to changing resource requirements?
- A . Use ‘nvidia-smi’ to manually destroy and recreate MIG instances with different sizes as needed.
- B . Implement a script that monitors GPU utilization and automatically adjusts Kubernetes resource quotas to match.
- C . Leverage a GPU virtualization platform with dynamic resource allocation capabilities that integrates with MIG.
- D . Utilize CUDA MPS to dynamically allocate GPU resources to different processes.
- E . Adjust the application code to use less GPU memory dynamically.
C
Explanation:
Dynamically resizing MIG instances requires a mechanism that can automatically adjust the underlying GPU partitioning based on workload demands. The most appropriate method is leveraging a GPU virtualization platform (C) that offers dynamic resource allocation and integrates with MIG. These platforms can monitor resource utilization and automatically resize MIG instances accordingly. Manually resizing (A) is impractical for dynamic adjustments. Kubernetes resource quotas (B) control container resource limits, not the underlying MIG configuration. CUDA MPS (D) allows sharing a single GPU but doesn’t resize MIG instances. Adjusting application code (E) doesn’t address the need for dynamic MIG resizing.
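To see why the manual approach in option A is impractical at scale, the destroy-and-recreate cycle it requires can be sketched as command construction. This is a dry-run sketch: the flags mirror the real `nvidia-smi mig` syntax, the profile IDs are illustrative (9 corresponds to a 3g.20gb instance on an A100 40GB), and nothing is executed here:

```python
def mig_resize_commands(gpu: int, new_profile_ids: list[int]) -> list[str]:
    """Build the nvidia-smi command sequence for a manual MIG 'resize'.

    MIG instances cannot be resized in place: existing compute and GPU
    instances must be destroyed, then new ones created with other profiles.
    """
    return [
        f"nvidia-smi mig -i {gpu} -dci",  # destroy compute instances
        f"nvidia-smi mig -i {gpu} -dgi",  # destroy GPU instances
        # create new GPU instances (with default compute instances, -C)
        f"nvidia-smi mig -i {gpu} -cgi {','.join(map(str, new_profile_ids))} -C",
    ]

for cmd in mig_resize_commands(0, [9, 9]):
    print(cmd)
```

Every resize tears down running work on the affected instances, which is exactly what a virtualization platform with dynamic allocation (option C) automates and schedules around.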
You’re designing a data center network for inference workloads. The primary requirement is high availability.
Which of the following considerations are MOST important for your topology design?
- A . Minimizing hop count
- B . Implementing redundant paths
- C . Using the cheapest possible switches
- D . Prioritizing north-south bandwidth over east-west bandwidth
- E . Centralized routing
B
Explanation:
High availability necessitates redundant paths to avoid single points of failure. Minimizing hop count reduces latency and congestion, but it does not by itself provide availability. Using the cheapest possible switches is detrimental to reliability. Inference workloads typically involve both north-south and east-west traffic, but redundancy is the most crucial factor for availability. Centralized routing can itself become a single point of failure and is generally less resilient than distributed routing.
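Redundancy can be sanity-checked programmatically: a topology has no single link of failure if it stays connected after removing any one link. A minimal sketch with hypothetical switch names, comparing a redundant ring against a chain:

```python
def connected(nodes, links):
    """Depth-first connectivity check over an undirected link set."""
    if not nodes:
        return True
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        for a, b in links:
            if a == n and b not in seen:
                stack.append(b)
            elif b == n and a not in seen:
                stack.append(a)
    return seen == set(nodes)

def survives_any_single_link_failure(nodes, links):
    """True if the fabric stays connected after any one link fails."""
    return all(connected(nodes, [l for l in links if l != failed])
               for failed in links)

# Hypothetical topologies: a ring has redundant paths, a chain does not.
nodes = {"sw1", "sw2", "sw3", "sw4"}
ring = [("sw1", "sw2"), ("sw2", "sw3"), ("sw3", "sw4"), ("sw4", "sw1")]
chain = ring[:-1]
print(survives_any_single_link_failure(nodes, ring))   # True
print(survives_any_single_link_failure(nodes, chain))  # False
```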
You suspect a faulty NVIDIA ConnectX-6 network adapter in a server used for RDMA-based distributed training.
Which commands or tools can you use to diagnose potential issues with the adapter’s hardware and connectivity?
- A . lspci -v to verify the adapter is detected and its resources are allocated correctly.
- B . ibstat to check the adapter’s status, link speed, and active ports.
- C . ethtool to examine the adapter’s Ethernet settings and statistics.
- D . ping to test basic network connectivity.
- E . nvsmimonitord to monitor GPU metrics and detect anomalies.
A,B,C
Explanation:
All options except E are relevant for diagnosing network adapter issues. ‘lspci -v’ (A) verifies hardware detection. ‘ibstat’ (B) checks InfiniBand-specific details. ‘ethtool’ (C) examines Ethernet settings. ‘ping’ (D) tests basic connectivity. ‘nvsmimonitord’ (E) focuses on GPU monitoring, not network adapters.
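The checks from options A-D form a natural bottom-up sequence, from PCIe detection to end-to-end reachability. The sketch below only assembles the commands for review; the device, interface, and host names are placeholders, and nothing is executed:

```python
def nic_diagnostic_plan(ib_device: str, eth_iface: str, peer_host: str) -> list[str]:
    """Ordered checks for a suspected faulty RDMA NIC, from hardware up.

    Each step narrows the fault domain: PCIe detection, InfiniBand link
    state, Ethernet statistics, then basic reachability to a peer.
    """
    return [
        "lspci -v | grep -A 8 -i mellanox",  # A: adapter detected, resources mapped
        f"ibstat {ib_device}",                # B: port state, link speed, LID
        f"ethtool -S {eth_iface}",            # C: interface statistics / error counters
        f"ping -c 4 {peer_host}",             # D: basic reachability to a peer
    ]

for step in nic_diagnostic_plan("mlx5_0", "ens1f0", "node02"):
    print(step)
```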
You’re troubleshooting a DGX-1 server exhibiting performance degradation during a large-scale distributed training job. ‘nvidia-smi’ shows all GPUs are detected, but one GPU consistently reports significantly lower utilization than the others. Attempts to reschedule workloads to that GPU frequently result in CUDA errors.
Which of the following is the MOST likely cause and the BEST initial troubleshooting step?
- A . A driver issue affecting only one GPU; reinstall NVIDIA drivers completely.
- B . A software bug in the training script utilizing that specific GPU’s resources inefficiently; debug the training script.
- C . A hardware fault with the GPU, potentially thermal throttling or memory issues; run ‘nvidia-smi -i &lt;GPU_index&gt; -q’ to check temperatures, power limits, and error counts.
- D . Insufficient cooling in the server rack; verify adequate airflow and cooling capacity for the rack.
- E . Power supply unit (PSU) overload, causing reduced power delivery to that GPU; monitor PSU load and check PSU specifications.
C
Explanation:
While all options are possibilities, the consistently lower utilization and CUDA errors point strongly to a hardware fault. Running ‘nvidia-smi -i &lt;GPU_index&gt; -q’ provides detailed telemetry data, including temperature, power limits, and ECC error counts, which is crucial for diagnosing GPU hardware issues.
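The relevant fields can be pulled out of the ‘nvidia-smi -q’ report with a small parser. The sample text below is an abbreviated, illustrative fragment of the report format, not telemetry captured from real hardware:

```python
import re

SAMPLE_REPORT = """\
    GPU Current Temp                  : 91 C
    Power Draw                        : 297.50 W
    SRAM Uncorrectable                : 14
"""  # illustrative fragment only, not real telemetry

def parse_gpu_health(report: str) -> dict:
    """Pull temperature, power draw, and uncorrectable ECC count out of an
    'nvidia-smi -q'-style report."""
    def grab(pattern, cast):
        m = re.search(pattern, report)
        return cast(m.group(1)) if m else None
    return {
        "temp_c": grab(r"GPU Current Temp\s*:\s*(\d+)\s*C", int),
        "power_w": grab(r"Power Draw\s*:\s*([\d.]+)\s*W", float),
        "ecc_uncorrectable": grab(r"Uncorrectable\s*:\s*(\d+)", int),
    }

health = parse_gpu_health(SAMPLE_REPORT)
print(health)  # a hot GPU with uncorrectable ECC errors points at hardware
```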
You’re optimizing a deep learning model for deployment on NVIDIA Tensor Cores. The model uses a mix of FP32 and FP16 precision. During profiling with NVIDIA Nsight Systems, you observe that the Tensor Cores are underutilized.
Which of the following strategies would MOST effectively improve Tensor Core utilization?
- A . Increase the batch size to fully utilize the available GPU memory.
- B . Ensure that all matrix multiplications are performed using FP16 precision.
- C . Pad the input tensors to dimensions that are multiples of 8 for optimal Tensor Core alignment.
- D . Enable CUDA graph capture to reduce kernel launch overhead.
- E . Decrease the learning rate to improve training stability and reduce the need for gradient clipping.
C
Explanation:
Padding input tensors (C) to multiples of 8 is crucial for optimal Tensor Core performance, as Tensor Cores operate most efficiently on data with these dimensions. Using FP16 (B) is important, but proper alignment is key for full utilization. Increasing batch size (A) can improve overall throughput but doesn’t directly address Tensor Core utilization. CUDA graph capture (D) reduces kernel launch overhead, not Tensor Core utilization directly. Decreasing learning rate (E) is unrelated to Tensor Core performance.
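The padding rule in option C is simple to express: round each matrix dimension up to the next multiple of 8 for FP16 Tensor Core math. A minimal sketch in plain Python:

```python
def pad_to_multiple(dim: int, multiple: int = 8) -> int:
    """Round a tensor dimension up to the nearest multiple (8 for FP16
    Tensor Core alignment)."""
    return ((dim + multiple - 1) // multiple) * multiple

def tensor_core_friendly_shape(shape: tuple) -> tuple:
    """Pad every dimension of a shape for Tensor Core alignment."""
    return tuple(pad_to_multiple(d) for d in shape)

# A (batch, seq, hidden) activation of (30, 50, 768): 768 is already
# aligned, while 30 pads to 32 and 50 pads to 56.
print(tensor_core_friendly_shape((30, 50, 768)))  # (32, 56, 768)
```

In practice the padding is applied to the tensors themselves (extra rows or columns of zeros); this sketch only computes the target dimensions.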
An AI cluster needs to transmit data at 200Gbps over a distance of 2km using single-mode fiber. Considering cost and performance, which transceiver type is the most appropriate?
- A . 200GBASE-SR4
- B . 200GBASE-LR4
- C . 200GBASE-ER4
- D . 200GBASE-CR4
- E . 200GBASE-DR4
B
Explanation:
200GBASE-LR4 is the most appropriate. ‘LR’ designates Long Reach, typically up to 10km on single-mode fiber. SR4 is Short Reach over multimode fiber (roughly 100m). ER4 is Extended Reach (up to 40km) and is more expensive than this link requires. CR4 is a copper direct-attach interface for very short distances. DR4 typically reaches about 500m over parallel single-mode fiber, so LR4 is the better fit for a guaranteed 2km link.
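The selection logic can be encoded as a lookup: pick the cheapest transceiver class whose media matches and whose rated reach covers the link. The reach figures and cost ranks below are typical order-of-magnitude values for illustration; always confirm against the specific module’s datasheet:

```python
# (name, media, typical reach in metres, relative cost rank) - illustrative
TRANSCEIVERS = [
    ("200GBASE-CR4", "copper DAC", 3, 1),
    ("200GBASE-SR4", "multimode fiber", 100, 2),
    ("200GBASE-DR4", "single-mode fiber", 500, 3),
    ("200GBASE-LR4", "single-mode fiber", 10_000, 4),
    ("200GBASE-ER4", "single-mode fiber", 40_000, 5),
]

def pick_transceiver(distance_m: int, media: str) -> str:
    """Cheapest transceiver whose media matches and reach covers the link."""
    candidates = [t for t in TRANSCEIVERS
                  if t[1] == media and t[2] >= distance_m]
    if not candidates:
        raise ValueError("no transceiver covers this link")
    return min(candidates, key=lambda t: t[3])[0]

print(pick_transceiver(2_000, "single-mode fiber"))  # 200GBASE-LR4
```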
In an InfiniBand fabric, what is the primary role of the Subnet Manager (SM) with respect to routing?
- A . To forward packets based on destination IP addresses, similar to a traditional IP router.
- B . To discover the network topology, calculate routing paths, and program the forwarding tables (LID tables) in the switches.
- C . To monitor the network for congestion and dynamically adjust packet priorities using Quality of Service (QOS) mechanisms.
- D . To provide a command-line interface for users to manually configure routing tables on each InfiniBand switch.
- E . To act as a firewall, blocking unauthorized traffic based on pre-defined rules.
B
Explanation:
The Subnet Manager (SM) is responsible for discovering the InfiniBand topology, calculating routes, and programming the forwarding tables (LID tables) within the switches. This is crucial for establishing connectivity and ensuring efficient data transfer within the fabric. InfiniBand uses LID (Local Identifier) based routing, not IP addresses directly.
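The SM’s job can be sketched as graph work: discover the topology, assign LIDs, and compute a forwarding entry (next hop per destination LID) for every switch. A toy sketch over a hypothetical three-switch fabric, using breadth-first search for shortest paths:

```python
from collections import deque

def compute_lid_tables(links):
    """Toy Subnet Manager: assign LIDs, then compute per-switch forwarding
    tables of {destination LID: next hop} via BFS shortest paths."""
    nodes = sorted({n for link in links for n in link})
    adj = {n: [] for n in nodes}
    for a, b in links:
        adj[a].append(b)
        adj[b].append(a)
    lids = {n: i + 1 for i, n in enumerate(nodes)}  # assign LIDs 1..N
    tables = {}
    for src in nodes:
        next_hop = {}
        queue = deque((nbr, nbr) for nbr in adj[src])
        seen = {src}
        while queue:
            node, first = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            next_hop[lids[node]] = first  # remember the first hop taken
            queue.extend((nbr, first) for nbr in adj[node])
        tables[src] = next_hop
    return lids, tables

lids, tables = compute_lid_tables([("swA", "swB"), ("swB", "swC")])
print(lids)            # {'swA': 1, 'swB': 2, 'swC': 3}
print(tables["swA"])   # all traffic from swA reaches LIDs 2 and 3 via swB
```

A real SM additionally handles multiple paths, LMC, QoS levels, and fabric sweeps; this sketch only shows the discover-compute-program pattern.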
You’re monitoring the storage I/O for an AI training workload and observe high disk utilization but relatively low CPU utilization.
Which of the following actions is LEAST likely to improve the performance of the training job?
- A . Switching from HDDs to NVMe SSDs.
- B . Implementing data prefetching to load data into memory before it’s needed.
- C . Increasing the batch size of the training job.
- D . Adding more RAM to the system.
- E . Reducing the number of parallel data loading threads.
E
Explanation:
High disk utilization and low CPU utilization indicate an I/O bottleneck. Switching to faster storage (A), prefetching data (B), increasing the batch size (C), and adding more RAM (D) can all help alleviate the I/O bottleneck. Reducing the number of parallel data loading threads (E) would likely worsen the bottleneck by underutilizing the available I/O bandwidth.
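Prefetching (option B) hides I/O latency by loading the next batch while the current one is being processed. A minimal sketch using a bounded queue and a loader thread, where the `load_batch` function stands in for a real disk read:

```python
import threading
import queue

def load_batch(index):
    """Stand-in for a slow disk read returning one training batch."""
    return f"batch-{index}"

def prefetching_loader(num_batches, depth=2):
    """Yield batches while a background thread keeps `depth` batches ahead."""
    buf = queue.Queue(maxsize=depth)
    sentinel = object()

    def producer():
        for i in range(num_batches):
            buf.put(load_batch(i))  # blocks when the buffer is full
        buf.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while (item := buf.get()) is not sentinel:
        yield item

print(list(prefetching_loader(4)))  # ['batch-0', 'batch-1', 'batch-2', 'batch-3']
```

The bounded `maxsize` is the key design choice: it keeps the loader ahead of the consumer without letting memory use grow unbounded.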
Which of the following are key benefits of using NVIDIA NVLink Switch in a multi-GPU server setup for AI and deep learning workloads?
- A . Increased GPU-to-GPU communication bandwidth.
- B . Reduced latency in inter-GPU data transfers.
- C . Simplified GPU resource management.
- D . Support for larger GPU memory pools than a single server can physically accommodate.
- E . Enhanced security features compared to PCIe based interconnections.
A,B
Explanation:
NVLink provides significantly higher bandwidth and lower latency than PCIe, enabling faster communication between GPUs; these are the key benefits (A and B). NVLink Switch systems can additionally extend memory addressing across GPUs for very large models, but that is a capability of the broader system rather than the benefit tested here. NVLink does not inherently simplify GPU resource management or provide enhanced security compared to PCIe for data transfer; management features may exist on top of the NVLink infrastructure.
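The bandwidth gap behind options A and B can be put in rough numbers. The sketch below uses commonly cited figures (PCIe Gen4 x16 at about 2 GB/s per lane per direction; A100-class NVLink at 25 GB/s per link per direction across 12 links); treat the exact values as illustrative:

```python
def link_bandwidth_gbs(per_lane_gbs: float, lanes: int, duplex: bool = True) -> float:
    """Aggregate bandwidth of a multi-lane/multi-link interconnect in GB/s."""
    return per_lane_gbs * lanes * (2 if duplex else 1)

pcie4_x16 = link_bandwidth_gbs(2.0, 16)     # ~64 GB/s bidirectional
nvlink_a100 = link_bandwidth_gbs(25.0, 12)  # ~600 GB/s bidirectional
print(pcie4_x16, nvlink_a100, round(nvlink_a100 / pcie4_x16, 1))
```

On these figures the NVLink fabric offers roughly an order of magnitude more inter-GPU bandwidth than a PCIe Gen4 x16 slot.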
