Practice Free NCP-AII Exam Online Questions
A financial services firm is deploying an AI model for fraud detection that requires rapid inference and data retrieval across multiple sites.
Which feature should their storage system prioritize?
- A . Multi-protocol data access with low latency.
- B . High capacity with moderate speed.
- C . Tape backup systems.
- D . Low-cost HDD solutions.
A
Explanation:
Fraud detection in financial services is a real-time AI workload. The system must ingest transaction data, retrieve historical customer profiles, and perform inference in milliseconds. This requires a storage architecture that supports multi-protocol access (such as S3 for ingestion and POSIX/NFS for inference engines) combined with low latency. In these environments, storage latency directly impacts the "Time to Decision". An All-Flash storage tier is mandatory, as traditional HDD solutions (Option D) or moderate speed systems (Option B) introduce "Tail Latency" that can cause the fraud detection model to time out during peak transaction windows. Additionally, multi-site synchronization ensures that the latest model weights and historical data are available across different geographic data centers for high availability and localized inference.
You’ve installed a new NVIDIA GPU in your AI server. After the installation and driver setup, you notice that while ‘nvidia-smi’ recognizes the GPU, the available memory reported is significantly lower than the GPU’s specifications.
What are the potential root causes and how would you systematically troubleshoot this?
- A . The GPU is faulty and needs to be replaced.
- B . The system BIOS is incorrectly configured, limiting GPU memory allocation.
- C . The integrated graphics is using a significant amount of system memory, reducing what’s available to the GPU. Disable the integrated graphics in the BIOS.
- D . The driver is not correctly installed. Reinstall the latest NVIDIA driver.
- E . The reported memory is the currently allocated memory, not the total available. Run a CUDA program to allocate more memory and observe the change.
C
Explanation:
Integrated graphics stealing system memory is a common cause, and disabling it frees up resources for the dedicated GPU. While a faulty GPU, BIOS settings, or driver issues are possibilities, integrated graphics is a more likely and easily verifiable cause. The reported memory is the total usable, not just allocated.
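To distinguish total from allocated memory, one might query both fields explicitly and compare the total against the card’s advertised capacity. The sample string below stands in for output of ‘nvidia-smi --query-gpu=memory.total,memory.used --format=csv,noheader’; the 80 GB spec figure is an illustrative assumption:

```python
# Hypothetical sample line from:
#   nvidia-smi --query-gpu=memory.total,memory.used --format=csv,noheader
# In practice this would come from running the command on the server.
sample = "81559 MiB, 1024 MiB"

total_str, used_str = (field.strip() for field in sample.split(","))
total_mib = int(total_str.split()[0])
used_mib = int(used_str.split()[0])

spec_mib = 81920  # assumption: an 80 GB card's advertised capacity
missing = spec_mib - total_mib
print(f"total={total_mib} MiB, used={used_mib} MiB, below spec by {missing} MiB")
```

A small gap between spec and reported total is normal (ECC and driver reservations); a large gap points at the causes discussed above.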
You have a server with two NVIDIA GPUs connected via NVLink. You want to verify that NVLink is functioning correctly.
Which command(s) or tool(s) can you use to check the NVLink status and bandwidth?
- A . ‘nvidia-smi nvlink --status’
- B . ‘lspci’
- C . ‘nvcc --version’
- D . ‘nvidia-smi topo -m’
- E . ‘nvidia-settings’ (GUI tool)
A, D
Explanation:
‘nvidia-smi nvlink --status’ provides a direct overview of the NVLink status, including link speed and errors. ‘nvidia-smi topo -m’ shows the topology of the GPUs and how they are connected, including NVLink connections. ‘lspci’ lists PCI devices but doesn’t provide NVLink-specific information. ‘nvcc --version’ checks the CUDA compiler version. ‘nvidia-settings’ is a GUI tool that can display some information, but it is less precise than ‘nvidia-smi’ for NVLink status.
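As a sketch of what verifying the output might look like, the excerpt below is a hypothetical, simplified sample of ‘nvidia-smi nvlink --status’ output (real output includes GPU UUIDs and per-link speeds for every GPU); the parsing separates healthy links from inactive ones:

```python
# Hypothetical excerpt of `nvidia-smi nvlink --status` output; real output
# lists each link's speed per GPU, with "<inactive>" for links that are down.
sample_output = """\
GPU 0: NVIDIA A100-SXM4-80GB
\t Link 0: 25 GB/s
\t Link 1: 25 GB/s
\t Link 2: <inactive>
"""

active, inactive = [], []
for line in sample_output.splitlines():
    line = line.strip()
    if line.startswith("Link"):
        link_id, _, state = line.partition(":")
        (inactive if "<inactive>" in state else active).append(link_id.strip())

print(f"active links: {active}, inactive links: {inactive}")
```

Any unexpectedly inactive link between NVLink-connected GPUs is grounds for further hardware inspection.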
You are running a distributed training job on a multi-GPU server. After several hours, the job fails with a NCCL (NVIDIA Collective Communications Library) error. The error message indicates a failure in inter-GPU communication. ‘nvidia-smi’ shows all GPUs are healthy.
What is the MOST probable cause of this issue?
- A . A bug in the NCCL library itself; downgrade to a previous version of NCCL.
- B . Incorrect NCCL configuration, such as an invalid network interface or incorrect device affinity settings.
- C . Insufficient inter-GPU bandwidth; reduce the batch size to decrease communication overhead.
- D . A faulty network cable connecting the server to the rest of the cluster.
- E . Driver incompatibility issue between NCCL and the installed NVIDIA driver version.
B, E
Explanation:
NCCL errors during inter-GPU communication often stem from configuration issues (B) or driver incompatibilities (E). Incorrect network interface or device affinity settings can prevent proper communication. Driver versions might not fully support the NCCL version being used. Reducing batch size (C) might alleviate symptoms but doesn’t address the root cause. A faulty network cable (D) would likely cause broader network issues beyond NCCL. Downgrading NCCL (A) is a potential workaround but not the ideal first step.
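As a starting point for investigating cause (B), NCCL exposes environment variables for debugging and for pinning the communication path. This is a minimal sketch; the interface name ‘eth0’ and HCA name ‘mlx5_0’ are placeholders for this system’s actual devices:

```shell
# Sketch of NCCL debugging/configuration variables for checking option (B).
# Interface and HCA names are placeholders for the system's actual devices.
export NCCL_DEBUG=INFO             # log topology detection and chosen transports
export NCCL_DEBUG_SUBSYS=INIT,NET  # focus logs on initialization and networking
export NCCL_SOCKET_IFNAME=eth0     # force the intended network interface
export NCCL_IB_HCA=mlx5_0          # restrict NCCL to the intended InfiniBand HCA
```

With ‘NCCL_DEBUG=INFO’ set, the job’s logs show which interfaces and transports NCCL selected, which usually confirms or rules out a misconfiguration quickly.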
You notice that one of the fans in your GPU server is running at a significantly higher RPM than the others, even under minimal load. ‘ipmitool sensor’ output shows a normal temperature for that GPU.
What could be the potential causes?
- A . The fan’s PWM control signal is malfunctioning, causing it to run at full speed.
- B . The fan bearing is wearing out, causing increased friction and requiring higher RPM to maintain airflow.
- C . The fan is attempting to compensate for restricted airflow due to dust buildup.
- D . The server’s BMC (Baseboard Management Controller) has a faulty temperature sensor reading, causing it to overcompensate.
- E . A network connectivity issue is causing higher CPU utilization, leading to increased system-wide heat.
A, C
Explanation:
A malfunctioning PWM control signal (A) or restricted airflow from dust buildup (C) can cause a fan to run at high RPM even when temperatures are normal. Worn bearings (B) more typically produce noise and vibration than a sustained high commanded speed. A faulty BMC sensor (D) is unlikely here because the question states that ‘ipmitool sensor’ shows a normal temperature, and a network connectivity issue (E) would not cause a single isolated fan to run high while the GPU temperature is normal.
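Spotting the anomaly described above amounts to flagging an RPM outlier among the fan readings. The readings below are hypothetical values in the style of ‘ipmitool sensor’ output:

```python
# Hypothetical fan RPM readings in the style of `ipmitool sensor` output;
# flag any fan running far above the median, as in the scenario described.
from statistics import median

fan_rpm = {"FAN1": 4200, "FAN2": 4150, "FAN3": 9800, "FAN4": 4300}

med = median(fan_rpm.values())
outliers = [name for name, rpm in fan_rpm.items() if rpm > 1.5 * med]
print(f"median={med} RPM, suspect fans: {outliers}")
```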
A GPU in your AI server consistently overheats during inference workloads. You’ve ruled out inadequate cooling and software bugs.
Running ‘nvidia-smi’ shows high power draw even when idle.
Which of the following hardware issues are the most likely causes?
- A . Degraded thermal paste between the GPU die and the heatsink.
- B . A failing voltage regulator module (VRM) on the GPU board, causing excessive power leakage.
- C . Incorrectly seated GPU in the PCle slot, leading to poor power delivery.
- D . A BIOS setting that is overvolting the GPU.
- E . Insufficient system RAM.
B
Explanation:
The key clue is high power draw even at idle. A failing VRM that leaks power (B) directly explains both the elevated idle draw and the resulting heat. Degraded thermal paste (A) causes overheating under load but does not increase power consumption. An incorrectly seated GPU (C) tends to cause instability or link errors rather than high idle draw, and a BIOS overvolting setting (D) would raise power primarily under load. Insufficient system RAM (E) can cause performance issues but is unlikely to lead to overheating.
An AI server equipped with multiple NVIDIA GPUs experiences frequent reboots during peak workload periods. The system event logs indicate ‘Uncorrectable Machine Check Exception’ errors. You suspect a power delivery issue.
Besides checking the PSUs, what other hardware component(s) should be thoroughly inspected to identify potential causes?
- A . The CPU and system memory.
- B . The motherboard VRMs (Voltage Regulator Modules) responsible for supplying power to the GPUs.
- C . The network interface cards (NICs).
- D . The storage drives (SSDs/HDDs).
- E . The server’s CMOS battery.
B
Explanation:
While ‘Uncorrectable Machine Check Exception’ errors can have various causes, a power delivery issue to the GPUs is a strong possibility in this scenario. The motherboard VRMs are responsible for regulating and supplying power to the GPUs. If they are failing or inadequate, it can lead to power instability and these types of errors during high load.
After installing a new NVIDIA GPU, you attempt to run a CUDA application, but you encounter the following error: ‘CUDA error: CUDA driver version is insufficient for CUDA runtime version’. You have verified the driver and CUDA toolkit are installed.
What is the MOST likely reason for this error, and how do you resolve it?
- A . The CUDA toolkit is too old. Update the CUDA toolkit.
- B . The NVIDIA driver is too old for the CUDA toolkit. Update the NVIDIA driver to a version that supports the CUDA toolkit.
- C . The GPU is not compatible with the CUDA toolkit. Install a different GPU.
- D . The CUDA_VISIBLE_DEVICES environment variable is not set correctly.
- E . The CUDA runtime libraries are missing from the system path. Add them to the PATH variable.
B
Explanation:
This error indicates an incompatibility between the driver and the CUDA toolkit. The most common reason is an outdated driver. The driver must be at least as new as the CUDA toolkit’s minimum required driver version. CUDA_VISIBLE_DEVICES relates to GPU selection, not driver version.
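The check behind this error is a simple version comparison. The sketch below is illustrative only: the minimum driver for a given toolkit comes from NVIDIA’s release notes, and both version strings here are hypothetical examples:

```python
# Minimal sketch of the compatibility check behind this error. The minimum
# driver version for a given CUDA toolkit comes from NVIDIA's release notes;
# the figures below are illustrative examples, not authoritative values.
def parse_version(v: str) -> tuple[int, ...]:
    return tuple(int(part) for part in v.split("."))

installed_driver = "535.104.05"  # hypothetical: as reported by `nvidia-smi`
minimum_driver = "550.54.14"     # hypothetical minimum for the installed toolkit

compatible = parse_version(installed_driver) >= parse_version(minimum_driver)
print("driver OK" if compatible else "driver too old: update the NVIDIA driver")
```

Here the installed driver predates the toolkit’s minimum, reproducing the scenario in the question; updating the driver (Option B) resolves it.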
You are designing an AI infrastructure cluster for training large language models (LLMs). The dataset consists of 10TB of image data and 5TB of text data. You estimate that intermediate training data (checkpoints, temporary files) will require an additional 20TB of storage. You want to use a parallel file system for optimal performance.
Considering a replication factor of 2 for data redundancy and a 20% overhead for file system metadata, what is the minimum raw storage capacity you should provision?
- A . 42 TB
- B . 84 TB
- C . 92.4 TB
- D . 70 TB
- E . 100.8 TB
B
Explanation:
Total data size: 10TB + 5TB + 20TB = 35TB. With a replication factor of 2, the replicated capacity is 35TB × 2 = 70TB. The 20% metadata overhead is applied to the replicated capacity: 70TB × 1.2 = 84TB. Option D (70TB) omits the overhead, and Option C (92.4TB) incorrectly adds a second overhead term on top of 84TB. Therefore, the minimum raw storage capacity is 84TB.
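The arithmetic above can be reproduced directly:

```python
# Reproducing the capacity calculation: replicate first, then apply the
# file system's metadata overhead to the replicated capacity.
image_tb, text_tb, intermediate_tb = 10, 5, 20
replication_factor = 2
metadata_overhead = 0.20

logical_tb = image_tb + text_tb + intermediate_tb     # 35 TB of logical data
replicated_tb = logical_tb * replication_factor       # 70 TB after replication
raw_tb = replicated_tb * (1 + metadata_overhead)      # 84 TB minimum raw capacity
print(f"minimum raw capacity: {raw_tb:.0f} TB")
```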
After ClusterKit reports "GPU-Host latency exceeds threshold," which NVIDIA diagnostic tool should be used to isolate hardware faults?
- A . Re-run ClusterKit with --stress=gpu -Y 60 to extend test duration
- B . nvidia-smi topo -m to inspect GPU topology connections
- C . DCGM Diags dcgmi diag -r 2
- D . ib_write_bw to measure InfiniBand bandwidth between nodes
B
Explanation:
"GPU-Host latency" issues in NVIDIA DGX or HGX systems are frequently caused by incorrect PCIe affinity or sub-optimal NUMA (Non-Uniform Memory Access) mapping. If a GPU is forced to communicate with a CPU core or an HCA that is not on its local PCIe switch/root complex, latency increases significantly as data must cross the QPI/UPI inter-processor links. The command nvidia-smi topo -m provides a detailed matrix of the system’s internal topology, showing how GPUs, CPUs, and
NICs are connected. It identifies whether the connection is via a single PCIe switch (PIX), multiple switches (PXB), or across the CPU (SYS). By inspecting this map, an administrator can identify if a software process is pinned to the wrong NUMA node or if a hardware path is unexpectedly degraded. While DCGM (Option C) is good for checking component health, it doesn’t map the logical-to-physical affinity paths that cause specific latency "threshold" warnings.
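The topology matrix can be read programmatically to flag latency-suspect paths. The matrix below is a hypothetical, simplified rendering of ‘nvidia-smi topo -m’ output for two GPUs and one NIC; real output also includes CPU and NUMA affinity columns:

```python
# Hypothetical, simplified `nvidia-smi topo -m` matrix for two GPUs and one
# NIC; real output includes headers, CPU affinity, and NUMA node columns.
topo = {
    ("GPU0", "GPU1"): "NV12",  # direct NVLink connection (12 links)
    ("GPU0", "NIC0"): "PIX",   # same PCIe switch: low-latency path
    ("GPU1", "NIC0"): "SYS",   # crosses the CPU interconnect: high latency
}

# Paths that traverse the inter-processor link (SYS) are latency suspects.
suspect_paths = [pair for pair, link in topo.items() if link == "SYS"]
print(f"paths crossing the CPU interconnect: {suspect_paths}")
```

A process pinned so that its GPU-to-NIC traffic takes a SYS path is a common explanation for the latency-threshold warning described above.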
