Practice Free NCP-AII Exam Online Questions
You are evaluating different parallel file systems for an AI training cluster. You need a file system that is POSIX-compliant and offers high bandwidth and low latency.
Which of the following options are viable candidates?
- A . BeeGFS
- B . GlusterFS
- C . Ceph
- D . Lustre
- E . NFS
A,D
Explanation:
BeeGFS and Lustre are designed for high-performance computing and AI workloads, offering high bandwidth, low latency, and POSIX compliance. GlusterFS and Ceph are more general-purpose distributed file systems. NFS is generally not suitable for demanding AI workloads due to its performance limitations.
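As a quick illustration of their POSIX compliance, both Lustre and BeeGFS present a standard mount point on client nodes. Below is a minimal sketch of a Lustre client mount; the MGS hostname, file system name, and mount point are placeholders for your environment.

```bash
# Hypothetical Lustre client mount; replace the MGS NID, fsname, and mount
# point with values from your own cluster.
sudo mkdir -p /mnt/lustre
sudo mount -t lustre mgs01@tcp:/aifs /mnt/lustre

# Once mounted, ordinary POSIX tools work against it like a local file system.
df -h /mnt/lustre
```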
An AI infrastructure relies on a liquid cooling system to dissipate heat from multiple NVIDIA GPUs. After a recent software update, users report intermittent performance degradation and system crashes. You suspect a cooling issue.
Which TWO of the following checks are the MOST critical in diagnosing the root cause?
- A . Verify the pump speed and coolant flow rate within the liquid cooling system.
- B . Check the CPU temperature using the ‘sensors’ command.
- C . Analyze the system logs for GPU-related errors, specifically those indicating thermal throttling or power capping.
- D . Examine the ambient temperature in the data center.
- E . Run a memory test on the host system.
A,C
Explanation:
Verifying pump speed and flow rate (A) is crucial for liquid cooling systems. Reduced flow can lead to inadequate cooling and thermal issues. Analyzing system logs for GPU-related errors (C) will directly indicate whether thermal throttling or power capping are occurring, which are common symptoms of cooling problems.
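For example, the GPU-side checks from option C can be run from the shell; the commands below are a sketch, and the exact log strings and BMC sensor names vary by platform.

```bash
# Report GPU temperatures and any active throttle reasons (thermal, power cap).
nvidia-smi -q -d TEMPERATURE,PERFORMANCE

# Look for GPU Xid or thermal messages in the kernel log.
journalctl -k --since "2 hours ago" | grep -iE "xid|thermal|throttl"

# Pump and fan tachometer readings are usually exposed through the BMC;
# sensor names are vendor-specific, so treat this as an illustration.
sudo ipmitool sdr type Fan
```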
You are managing an NVIDIA DGX A100 server and notice that GPU utilization fluctuates significantly during a supposedly constant training workload. You suspect power capping might be the cause.
How can you definitively determine if power capping is active and affecting GPU performance?
- A . Monitor GPU temperature using ‘nvidia-smi’. If the temperature is consistently below the thermal threshold, power capping is likely active.
- B . Examine the ‘pstate’ value reported by ‘nvidia-smi’. A lower ‘pstate’ indicates power capping.
- C . Use ‘nvidia-smi’ to query the ‘Clocks Throttle Reasons’. If ‘SW Power Cap’ is listed as active, power capping is in effect.
- D . Check the server’s BIOS settings for any power management configurations that might be limiting GPU power consumption.
- E . Monitor the voltage supplied to the GPUs. If the voltage is consistently lower than the maximum rated voltage, power capping is active.
C
Explanation:
The most direct way to determine whether power capping is active is to use ‘nvidia-smi’ and query the ‘Clocks Throttle Reasons’. If ‘SW Power Cap’ is listed as active, it definitively indicates that power capping is in effect and limiting the GPU’s performance. The ‘pstate’ value can be suggestive, but it is not conclusive, and temperature alone does not confirm power capping.
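A minimal check from the CLI, using query fields that current nvidia-smi releases expose (field names can differ slightly between driver versions):

```bash
# Human-readable view: look for "SW Power Cap : Active" under Clocks Throttle Reasons.
nvidia-smi -q -d PERFORMANCE

# Machine-readable view: compare draw against the enforced limit while the
# power-cap throttle reason is reported, refreshing every 5 seconds.
nvidia-smi --query-gpu=index,power.draw,power.limit,clocks_throttle_reasons.sw_power_cap \
           --format=csv -l 5
```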
You are configuring network fabric ports for NVIDIA GPUs in a server. The GPUs are connected to the network via PCIe.
What is the primary factor that determines the maximum achievable bandwidth between the GPUs and the network?
- A . The clock speed of the CPU.
- B . The amount of system RAM.
- C . The PCIe generation and number of lanes connecting the GPUs to the network adapter (e.g., PCIe 4.0 x16).
- D . The speed of the system’s hard drives or SSDs.
- E . The color of the Ethernet cables.
C
Explanation:
The PCIe generation (e.g., PCIe 4.0, PCIe 5.0) and the number of lanes (e.g., x8, x16) directly determine the maximum theoretical bandwidth available between the GPUs and the network adapter. Higher PCIe generations and more lanes provide greater bandwidth. For example, PCIe 4.0 x16 offers significantly more bandwidth than PCIe 3.0 x8. All other options are either irrelevant or have a negligible impact on this particular bottleneck.
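The rough arithmetic, plus a way to confirm the negotiated link from the CLI (the figures below are theoretical per-direction maxima before protocol overhead):

```bash
# PCIe 3.0 ≈ 0.985 GB/s per lane, PCIe 4.0 ≈ 1.97 GB/s per lane (128b/130b encoding).
#   PCIe 3.0 x8  ≈  7.9 GB/s per direction
#   PCIe 4.0 x16 ≈ 31.5 GB/s per direction

# Confirm the link each GPU actually negotiated versus its maximum capability.
nvidia-smi --query-gpu=index,name,pcie.link.gen.current,pcie.link.gen.max,pcie.link.width.current,pcie.link.width.max --format=csv
```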
Which of the following statements are correct regarding the use of NVIDIA GPUs with Docker containers?
- A . The NVIDIA Container Toolkit allows you to run GPU-accelerated applications in Docker containers without modifying the container image.
- B . You must install NVIDIA drivers inside the Docker container to enable GPU support.
- C . The ‘nvidia-smi’ command can only be run on the host machine, not inside a Docker container.
- D . CUDA libraries are required inside the container if your application uses CUDA.
- E . Using environment variables like ‘CUDA_VISIBLE_DEVICES’ within the container can influence which GPUs are accessible to the application.
A,D,E
Explanation:
The NVIDIA Container Toolkit allows GPU-accelerated apps to run in Docker without altering the image; the host’s drivers are leveraged. CUDA libraries are necessary inside the container if your app uses CUDA, and ‘CUDA_VISIBLE_DEVICES’ is used to control GPU visibility within the container. Drivers are not needed inside the container because they’re managed by the host (making B incorrect), and ‘nvidia-smi’ can be run inside containers if the NVIDIA Container Toolkit is properly set up (making C incorrect).
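A quick way to verify this setup on a host with the NVIDIA Container Toolkit installed (the CUDA image tag is only an example; any CUDA base image works):

```bash
# Run nvidia-smi from inside a container using the host's driver;
# no driver install is needed in the image for this check.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

# Expose only specific GPUs to the container.
docker run --rm --gpus '"device=0,1"' nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

# Inside the container, CUDA applications additionally honor CUDA_VISIBLE_DEVICES,
# e.g. CUDA_VISIBLE_DEVICES=0 limits a CUDA app to the first visible GPU.
```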
After configuring MIG on an NVIDIA A100 GPU, you run ‘nvidia-smi’ and observe that all MIG instances are in the ‘Disabled’ state.
Which of the following are potential reasons for this issue? (Select all that apply)
- A . The necessary NVIDIA drivers are not installed or are incompatible with the GPU.
- B . The GPU is not in MIG mode.
- C . The MIG instances are correctly configured but have not been allocated to any processes.
- D . The ‘nvidia-persistenced’ service is not running.
- E . The system’s power supply is insufficient.
A,B,D
Explanation:
MIG instances being in a ‘Disabled’ state indicates a fundamental problem with the MIG configuration. Incompatible drivers (A) will prevent the instances from being properly initialized. If the GPU is not explicitly placed into MIG mode (B), no MIG instances will be available. The ‘nvidia-persistenced’ service (D) ensures that driver state persists across reboots, and its absence can cause MIG instances to revert to a disabled state. While unallocated instances (C) will exist, they should be in an ‘Idle’ state, not ‘Disabled’. An insufficient power supply (E) might prevent the GPU from functioning correctly, but it’s less likely to specifically cause a ‘Disabled’ MIG state.
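A hedged sketch of the enablement steps on an A100; profile IDs vary by GPU model, so list them before creating instances.

```bash
# Keep driver state loaded across client exits and reboots.
sudo systemctl enable --now nvidia-persistenced

# Enable MIG mode on GPU 0 (may require draining workloads and a GPU reset).
sudo nvidia-smi -i 0 -mig 1

# List available GPU instance profiles, then create instances from one of them.
sudo nvidia-smi mig -lgip
sudo nvidia-smi mig -cgi 19,19 -C   # e.g. two 1g.5gb instances on an A100 40GB

# Verify: MIG devices should now be listed alongside the parent GPU.
nvidia-smi -L
```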
When installing multiple NVIDIA GPUs, which of the following factors are MOST important to consider regarding PCIe slot configuration? (Choose two)
- A . Ensure all GPUs are installed in slots of the same color.
- B . Ensure each GPU is installed in a slot with sufficient PCIe lanes (e.g., x16).
- C . Ensure the PCIe slots are directly connected to the CPU for optimal bandwidth.
- D . Install the GPUs in the lowest numbered slots first.
- E . Ensure all GPUs have the same PCIe generation (e.g. Gen4).
B,C
Explanation:
The number of PCIe lanes directly impacts bandwidth. Direct CPU connection minimizes latency. Slot color and numbering are usually irrelevant. Same PCIe Gen isn’t critical as long as minimum requirements are met.
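Two quick post-installation checks from the shell (the PCI address below is hypothetical):

```bash
# Show how each GPU attaches to the CPUs/PCIe switches and its NUMA affinity.
nvidia-smi topo -m

# Confirm the negotiated link speed and width for a given GPU
# (replace 3b:00.0 with the device's actual PCI address from lspci).
sudo lspci -s 3b:00.0 -vv | grep -i "LnkCap\|LnkSta"
```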
You are developing a distributed deep learning application that uses multiple GPUs across several Docker containers running on different physical servers.
How do you ensure that each container can access and utilize the GPUs on its respective host?
- A . Install the same version of NVIDIA drivers on all host machines and configure a network file system (NFS) to share the CUDA libraries between the containers.
- B . Ensure the NVIDIA Container Toolkit is installed and configured on each host machine, and use a container orchestration platform like Kubernetes to manage the deployment and GPU assignment.
- C . Manually configure each container to use the ‘CUDA_VISIBLE_DEVICES’ environment variable to specify the GPUs it should use on its respective host.
- D . Use Docker Swarm and specify GPU resource constraints in the ‘docker-compose.yml’ file to allocate GPUs to each service.
- E . Create a custom Docker network and configure each container to use the network’s gateway as the default GPU device.
B
Explanation:
The most robust solution for distributed GPU utilization is to leverage a container orchestration platform like Kubernetes (B) along with the NVIDIA Container Toolkit. Kubernetes handles scheduling, resource allocation (including GPUs), and networking across multiple nodes.
The NVIDIA Container Toolkit ensures that each container can access the GPUs on its host. While (C) is useful, it’s not sufficient for multi-server deployments. Docker Swarm (D) can work but lacks the sophisticated GPU scheduling capabilities of Kubernetes. NFS sharing (A) is unnecessary and can introduce performance bottlenecks. A custom Docker network (E) doesn’t directly address GPU access.
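As a sketch: with the NVIDIA device plugin (or GPU Operator) deployed on each node, a pod requests GPUs through the ‘nvidia.com/gpu’ resource and Kubernetes schedules it onto a host with free GPUs. The pod name and image tag below are examples.

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1   # the device plugin maps this to a physical GPU on the node
EOF
```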
An AI infrastructure uses a combination of air-cooled and liquid-cooled NVIDIA GPUs. You want to optimize cooling performance based on the specific thermal characteristics of each GPU type and their location within the server rack.
How can you achieve granular cooling control and monitoring to address these heterogeneous cooling requirements effectively? SELECT TWO.
- A . Implement rack-level airflow management solutions, such as blanking panels and cable management, to improve overall airflow uniformity.
- B . Use a centralized monitoring system to track GPU temperatures and power consumption, but apply the same cooling profile to all GPUs regardless of type.
- C . Deploy per-server cooling solutions with independent fan control for each server node, allowing for tailored airflow adjustments.
- D . Employ liquid cooling only for the highest TDP GPUs and rely on ambient air cooling for all other components.
- E . Implement dynamic fan speed control based on individual GPU temperatures, leveraging tools like ‘nvidia-smi’ and custom scripts, for air-cooled GPUs.
A,E
Explanation:
Implementing rack-level airflow management (A) improves overall airflow uniformity, which benefits all GPUs regardless of cooling type. Implementing dynamic fan speed control based on individual GPU temperatures for air-cooled GPUs (E) allows fine-grained adjustments to cooling performance. Per-server cooling solutions (C) can help, but they are less scalable and practical in most data centers. Using the same cooling profile for all GPUs (B) is ineffective. Cooling only high-TDP GPUs (D) may not be sufficient.
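A minimal sketch of option E for air-cooled GPUs: poll temperatures with ‘nvidia-smi’ and hand the decision to a vendor-specific fan interface. The threshold and the BMC command are placeholders, since datacenter GPU fans are normally driven through the BMC rather than through nvidia-smi.

```bash
#!/usr/bin/env bash
# Poll GPU 0 every 10 seconds and react when it crosses an example threshold.
THRESHOLD=80
while true; do
  temp=$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits -i 0)
  if [ "$temp" -ge "$THRESHOLD" ]; then
    echo "$(date -Is) GPU0 at ${temp}C - raising fan duty via BMC"
    # Placeholder: the actual fan command is platform-specific, e.g. an ipmitool
    # raw write understood by your server's BMC.
  fi
  sleep 10
done
```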