Practice Free NCP-AII Exam Online Questions
During NVLink Switch configuration, you encounter issues where certain GPUs are not being recognized by the system.
Which of the following troubleshooting steps are most likely to resolve this problem?
- A . Verify that all NVLink cables are securely connected and properly seated.
- B . Check the system BIOS settings to ensure that NVLink is enabled and configured correctly.
- C . Ensure that the NVLink Switch firmware is compatible with the installed GPUs.
- D . Reinstall the operating system.
- E . Check that the power supply has sufficient capacity and stability.
A,B,C
Explanation:
Physical connection issues (A), BIOS configuration (B), and firmware incompatibility (C) are the most common causes of GPUs not being recognized. Reinstalling the operating system (D) is a drastic measure that is unlikely to solve the problem. Checking the power supply (E) may also be worthwhile to confirm the system has sufficient capacity and stability, but it is not among the most likely fixes here.
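As a quick sanity check before reseating cables or touching firmware, the shell commands below (a minimal sketch; exact output formats vary by driver version) confirm whether the GPUs and their NVLink links are visible to the host:

```bash
# Confirm the GPUs enumerate on the PCIe bus at all
lspci | grep -i nvidia

# Confirm the driver sees every installed GPU
nvidia-smi -L

# Inspect per-GPU NVLink link state and the GPU-to-GPU topology
nvidia-smi nvlink --status
nvidia-smi topo -m   # matrix showing NVLink vs. PCIe paths between GPUs
```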
You are deploying a new AI cluster using RoCEv2 over a lossless Ethernet fabric.
Which of the following QoS (Quality of Service) mechanisms is critical for ensuring reliable RDMA communication?
- A . DSCP (Differentiated Services Code Point) marking
- B . ECN (Explicit Congestion Notification)
- C . PFC (Priority Flow Control)
- D . ACL (Access Control List)
- E . Rate Limiting
C
Explanation:
PFC (Priority Flow Control) is essential for RoCEv2 over Ethernet. It prevents packet loss due to congestion by pausing traffic for a specific priority class on a link, ensuring reliable RDMA communication. DSCP is used for traffic prioritization but doesn’t prevent loss. ECN signals congestion but relies on endpoints to react. ACLs are for security. Rate limiting throttles traffic but does not provide the lossless behavior RDMA requires.
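On the host side, PFC must also be enabled on the NIC for the priority that carries RoCE traffic. A minimal sketch, assuming a ConnectX adapter with the MLNX_OFED ‘mlnx_qos’ tool installed and RoCE traffic mapped to priority 3 (a common but not universal convention; interface name is a placeholder):

```bash
# Enable PFC only on priority 3, where RoCE traffic is assumed to be mapped
mlnx_qos -i eth0 --pfc 0,0,0,1,0,0,0,0

# Verify the interface's current PFC, trust, and priority-to-TC settings
mlnx_qos -i eth0
```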
An AI inference server using NVIDIA Triton Inference Server experiences intermittent crashes under peak load. The logs reveal CUDA out-of-memory (OOM) errors despite sufficient system RAM. You suspect a GPU memory leak within one of the models.
Which strategy BEST addresses this issue?
- A . Increase the system RAM to accommodate the growing memory footprint.
- B . Implement CUDA memory pooling within the Triton Inference Server configuration to reuse memory allocations efficiently.
- C . Reduce the batch size and concurrency of the offending model in the Triton configuration.
- D . Upgrade the GPUs to models with larger memory capacity.
- E . Disable other models running on the same GPU to free up memory.
B,C
Explanation:
Options B and C directly address the OOM issue. CUDA memory pooling enables efficient reuse of GPU memory, minimizing allocations and deallocations. Reducing batch size and concurrency decreases the memory footprint of the model, alleviating the pressure on GPU memory. While upgrading GPUs (D) is a solution, it is more costly than optimizing the current configuration. Increasing system RAM (A) does not solve GPU memory issues. Disabling other models (E) reduces load but doesn’t address the underlying memory leak.
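A sketch of what B and C might look like in practice, assuming a hypothetical model named ‘resnet50’ and illustrative size values (the right pool size and batch limits depend on the model and GPU):

```bash
# (B) Size Triton's CUDA memory pool at launch (format is <gpu-id>:<bytes>)
tritonserver --model-repository=/models \
             --cuda-memory-pool-byte-size=0:268435456   # e.g. 256 MiB pool on GPU 0

# (C) Relevant fragment to merge into the model's existing config.pbtxt
# (shown as comments here, not a complete model configuration):
#   max_batch_size: 8                                   # smaller batches, smaller activations
#   instance_group [ { count: 1, kind: KIND_GPU } ]     # fewer concurrent model instances
```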
You are using Docker Compose to define a multi-container application that includes a GPU-accelerated service.
How would you configure the service in the ‘docker-compose.yml’ file to leverage the NVIDIA runtime?
- A . Add ‘runtime: nvidia’ to the service definition.
- B . Set the environment variable ‘NVIDIA_VISIBLE_DEVICES’ in the service definition.
- C . Add ‘deploy: resources: reservations: devices: - driver: nvidia count: all capabilities: [gpu]’ to the service definition.
- D . Add ‘command: --gpus all’ to the service definition.
- E . Add ‘nvidia: all’ to the service definition.
A
Explanation:
To enable the NVIDIA runtime for a service in a ‘docker-compose.yml’ file, you should use the ‘runtime: nvidia’ directive within the service definition. The ‘deploy’ section is relevant for Swarm deployments, not standard Docker Compose. Environment variables like ‘NVIDIA_VISIBLE_DEVICES’ can further control GPU visibility, but the ‘runtime’ directive is fundamental for enabling the NVIDIA runtime itself. The ‘--gpus’ flag is a ‘docker run’ option, not a Compose configuration, and ‘nvidia: all’ is not a valid Compose option.
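A minimal sketch of such a Compose file, with the ‘runtime: nvidia’ directive plus the optional visibility variable (the service name and image tag are placeholders):

```bash
# Write a minimal docker-compose.yml for a GPU-accelerated service
cat > docker-compose.yml <<'EOF'
services:
  inference:
    image: nvcr.io/nvidia/pytorch:24.01-py3     # hypothetical NGC image tag
    runtime: nvidia                             # enable the NVIDIA container runtime
    environment:
      - NVIDIA_VISIBLE_DEVICES=all              # optionally restrict which GPUs are exposed
EOF

docker compose up -d
```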
You’ve installed the NGC CLI, but when you run ‘ngc registry model list’ you get an error indicating authentication failure. You’re sure your API key is correct.
What could be the cause, and how would you diagnose this?
- A . The NGC CLI version is outdated. Upgrade to the latest version using ‘pip install --upgrade nvidia-cli’.
- B . The environment variables ‘NGC_API_KEY’ or ‘NGC_CLI_API_KEY’ are set incorrectly or not set at all. Verify and set them correctly.
- C . Your organization might be behind a proxy that is blocking the NGC CLI from accessing the internet. Configure the proxy settings for the NGC CLI.
- D . Your account lacks the necessary permissions to access the NGC registry. Contact your NVIDIA administrator.
- E . The host machine’s clock is not synchronized, causing authentication issues. Synchronize the clock using ‘ntpd’ or ‘chronyd’.
C,D,E
Explanation:
Authentication failures can be caused by proxy issues (C), insufficient account permissions (D), or clock synchronization problems (E). While an outdated CLI version (A) could potentially cause issues, it’s less likely to manifest as an authentication failure. Environment variables (B) are generally not the primary source of error when using ‘ngc config set’ to configure authentication.
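A few diagnostic commands for the likely causes (a sketch; the proxy hostname is a placeholder and only needed if your network requires one):

```bash
# Confirm which API key, org, and team the NGC CLI is actually configured to use
ngc config current

# Check whether an outbound HTTPS proxy is (or should be) in effect
env | grep -i proxy
export https_proxy=http://proxy.example.com:3128   # placeholder proxy, if your site requires it

# Check clock synchronization, since a skewed clock can break token validation
timedatectl status
chronyc tracking
```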
Which of the following is a primary benefit of using a CLOS network topology (e.g., Spine-Leaf) in a data center?
- A . Reduced capital expenditure (CAPEX)
- B . Increased network diameter
- C . Improved scalability and bandwidth utilization
- D . Simplified network management
- E . Enhanced security
C
Explanation:
CLOS networks like Spine-Leaf provide excellent scalability due to their non-blocking architecture, allowing for increased bandwidth utilization and easy expansion. CAPEX might be higher due to more switches. The network diameter can be larger compared to traditional topologies. While CLOS networks can be managed effectively, the management complexity can be higher. Security benefits are not a primary characteristic of the CLOS topology itself.
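For intuition on the bandwidth-utilization point, a back-of-the-envelope oversubscription calculation for a hypothetical leaf switch (all port counts and speeds are illustrative):

```bash
# Hypothetical leaf: 48 x 25 GbE server-facing ports, 6 x 100 GbE spine uplinks
downlink_gbps=$((48 * 25))   # 1200 Gb/s toward servers
uplink_gbps=$((6 * 100))     # 600 Gb/s toward spines
echo "oversubscription ratio: ${downlink_gbps}:${uplink_gbps} (2:1)"   # adding spines or uplinks drives this toward 1:1
```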
You are deploying a new AI inference service using Triton Inference Server on a multi-GPU system. After deploying the models, you observe that only one GPU is being utilized, even though the models are configured to use multiple GPUs.
What could be the possible causes for this?
- A . The model configuration file does not specify the ‘instance_group’ parameter correctly to utilize multiple GPUs.
- B . The Triton Inference Server is not configured to enable CUDA Multi-Process Service (MPS).
- C . Insufficient CPU cores are available for the Triton Inference Server, limiting its ability to spawn multiple inference processes.
- D . The models are not optimized for multi-GPU inference, resulting in a single GPU bottleneck.
- E . The GPUs are not of the same type and Triton cannot properly schedule across them.
A,B
Explanation:
The ‘instance_group’ parameter in the model configuration dictates how Triton distributes the model across GPUs. Without proper configuration, it may default to a single GPU. CUDA MPS allows multiple CUDA applications (in this case, Triton inference processes) to share a single GPU, improving utilization. Insufficient CPU cores or non-optimized models could limit performance, but wouldn’t necessarily restrict usage to a single GPU. While dissimilar GPUs can affect performance, Triton will attempt to schedule across them if configured correctly.
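A sketch of an ‘instance_group’ fragment that pins instances to two GPUs, for a hypothetical model named ‘resnet50’ (GPU IDs and counts are illustrative; merge with the model’s existing config.pbtxt rather than replacing it):

```bash
# Relevant config.pbtxt fragment for answer A (shown as comments, not a complete file):
#   instance_group [
#     { count: 1, kind: KIND_GPU, gpus: [ 0 ] },
#     { count: 1, kind: KIND_GPU, gpus: [ 1 ] }
#   ]
# Omitting 'gpus' lets Triton place instances on all visible GPUs instead.

# Verify the configuration Triton actually loaded via its HTTP API
curl -s localhost:8000/v2/models/resnet50/config
```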
You have a Kubernetes cluster with nodes running different versions of the NVIDIA driver. You need to ensure that your containerized AI applications are always compatible with the specific driver version running on the node where they are scheduled.
How can you achieve this driver version compatibility in a cloud-native way?
- A . Manually create different container images for each driver version and use node selectors to schedule the correct image on the appropriate nodes.
- B . Use the NVIDIA driver capabilities to detect the driver version at runtime and dynamically load the correct libraries.
- C . Use the NVIDIA Operator to automatically manage driver installations and updates on the nodes, ensuring a consistent driver version across the cluster.
- D . Implement a webhook that inspects the node labels and injects the appropriate NVIDIA libraries into the pod at runtime.
- E . Use a shared volume to mount drivers into a container.
C
Explanation:
The NVIDIA GPU Operator is designed to manage the lifecycle of NVIDIA drivers and related components within a Kubernetes cluster, including automated installation, updates, and version management. This ensures a consistent and compatible driver version across all nodes, simplifying application deployment and management. The GPU Operator is the cloud-native approach to managing the NVIDIA software stack.
Option A is manual and inflexible.
Option B relies on proper NVIDIA Driver setup on the host.
Option D is a complex undertaking.
Option E is insecure.
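A typical way to deploy the GPU Operator is via Helm, sketched below (chart values such as driver version pinning vary by environment):

```bash
# Add NVIDIA's Helm repository and install the GPU Operator, which manages the
# driver, container toolkit, and device plugin on each GPU node in the cluster
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace --wait
```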
You are managing an AI infrastructure based on NVIDIA Spectrum-X switches. A new application requires strict Quality of Service (QoS) guarantees for its traffic. Specifically, you need to ensure that this application’s traffic receives preferential treatment and minimal latency.
What combination of Spectrum-X features and configurations would be MOST effective in achieving this?
- A . Configure DiffServ Code Point (DSCP) marking on the application’s traffic, map these DSCP values to specific traffic classes within the Spectrum-X switch, and configure Weighted Fair Queueing (WFQ) or Strict Priority Queueing on the egress ports.
- B . Increase the MTU size on all interfaces to reduce packet fragmentation and overall latency.
- C . Disable Adaptive Routing (AR) to ensure that traffic always takes the shortest path.
- D . Use VLAN tagging to isolate the application’s traffic into a separate virtual network.
- E . Enable broadcast storm protection.
A
Explanation:
DSCP marking, traffic class mapping, and WFQ/Strict Priority Queueing are fundamental QoS mechanisms. DSCP marking allows you to classify traffic based on application requirements. Traffic classes within the switch provide different levels of service. WFQ and Strict Priority Queueing ensure that high-priority traffic receives preferential treatment on egress ports. The other options are less relevant to QoS guarantees for a specific application.
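On the host side, the DSCP-marking half of answer A might look like the sketch below (the destination port and DSCP class are illustrative; the matching traffic-class mapping and queueing policy are then configured on the Spectrum-X switch with the vendor CLI):

```bash
# Mark the application's egress traffic (hypothetical TCP port 50051) with DSCP class AF41
iptables -t mangle -A OUTPUT -p tcp --dport 50051 -j DSCP --set-dscp-class AF41

# Tell a ConnectX NIC to trust DSCP markings when mapping packets to priorities
mlnx_qos -i eth0 --trust dscp
```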