Practice Free PCAD-31-02 Exam Online Questions
Which actions are valid techniques for handling erroneous categorical values in a dataset? (Choose two)
- A . Converting all values to integers
- B . Replacing inconsistent labels with a standardized value
- C . Removing rows with invalid labels
- D . Normalizing using min-max scaling
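As a minimal sketch of how categorical clean-up might look in pandas, the example below standardizes inconsistent labels and then drops rows whose label is still invalid. The `status` column, its values, and the set of valid labels are all hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: the 'status' column and its labels are illustrative only.
df = pd.DataFrame(
    {"status": ["active", "Active", "ACTIVE ", "inactive", "??", "inactive"]}
)

# Standardize inconsistent labels: trim whitespace and lower-case everything.
df["status"] = df["status"].str.strip().str.lower()

# Remove rows whose label is not in the (assumed) set of valid categories.
valid_labels = {"active", "inactive"}
cleaned = df[df["status"].isin(valid_labels)].reset_index(drop=True)
```

Whether to standardize or to drop a given value depends on whether the intended category can be recovered from the bad label.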
Which SQL clause is most appropriate when you need to filter records that meet a specific condition during data retrieval in an analytics pipeline?
- A . ORDER BY
- B . GROUP BY
- C . HAVING
- D . WHERE
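To illustrate the difference between row-level and group-level filtering, here is a small sketch using Python's built-in `sqlite3` module. The `orders` table and its contents are made up for this example.

```python
import sqlite3

# In-memory database with a hypothetical 'orders' table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 120.0), (2, "US", 80.0), (3, "EU", 40.0)],
)

# WHERE filters individual rows before any grouping happens.
large_orders = conn.execute(
    "SELECT id, amount FROM orders WHERE amount > 50 ORDER BY id"
).fetchall()

# HAVING filters groups after aggregation.
big_regions = conn.execute(
    "SELECT region, SUM(amount) FROM orders "
    "GROUP BY region HAVING SUM(amount) > 100 ORDER BY region"
).fetchall()

conn.close()
```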
Which element is essential to justify a conclusion drawn from a dataset?
- A . The use of color in plots
- B . The origin of the dataset
- C . Logical reasoning and supported metrics
- D . File format of the source data
Why is it important to adjust data presentations based on the audience’s background?
- A . To avoid using charts altogether
- B . To simplify all metrics to percentages only
- C . To ensure the data is understood and supports actionable insights
- D . To include as many technical terms as possible
Which technique would be most appropriate to handle missing numerical values in a dataset intended for machine learning?
- A . Replacing with NULL
- B . Dropping all columns
- C . Imputation using mean or median
- D . Filling with random values
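A minimal sketch of median imputation in pandas is shown below. The `age` column and its values are hypothetical; in a real machine learning workflow the statistic would normally be computed on the training split only.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column with two missing entries.
df = pd.DataFrame({"age": [25.0, np.nan, 40.0, 31.0, np.nan]})

# Series.median() skips NaN values by default.
median_age = df["age"].median()

# Fill the gaps with the median; mean() could be used the same way.
df["age_imputed"] = df["age"].fillna(median_age)
```

The median is usually the safer choice when the column contains outliers, since the mean is pulled toward extreme values.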
What is the most appropriate way to ensure that a column used as a foreign key contains only valid references to a parent DataFrame in Pandas?
- A . Use df.dropna() on the foreign key column
- B . Use .isin() to compare the foreign key column against the parent key column
- C . Use df.sort_values() to sort both columns before merging
- D . Use df.merge() with how='outer'
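One way a referential check could be sketched with `.isin()` is shown below. The `customers` and `orders` DataFrames and their column names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical parent and child tables.
customers = pd.DataFrame({"customer_id": [1, 2, 3]})
orders = pd.DataFrame(
    {"order_id": [10, 11, 12, 13], "customer_id": [1, 2, 4, 3]}
)

# True where the foreign key has a matching parent key.
is_valid = orders["customer_id"].isin(customers["customer_id"])

# Rows that reference a non-existent parent ("orphans").
orphans = orders[~is_valid]
```

Here the order with `customer_id` 4 is flagged, because no such customer exists in the parent table.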
Which operations are valid when working with Pandas Series? (Choose two)
- A . Arithmetic vector operations
- B . Merging by index using merge()
- C . Applying NumPy universal functions
- D . Using .columns to rename
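The short sketch below, using made-up values, shows element-wise arithmetic and a NumPy universal function applied to a Series.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a labeled index.
values = pd.Series([4.0, 9.0, 16.0], index=["a", "b", "c"])

# Arithmetic is vectorized: applied element-wise, index preserved.
doubled = values * 2

# NumPy ufuncs accept a Series and return a Series with the same index.
roots = np.sqrt(values)
```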
When performing bootstrapping on a dataset with 500 observations, what is a typical procedure?
- A . Creating samples by removing all duplicates
- B . Generating multiple datasets of the same size by randomly sampling with replacement
- C . Scaling all values between 0 and 1 before resampling
- D . Drawing one sample and calculating the mean only once
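As a rough sketch of bootstrapping with NumPy, the example below resamples a synthetic dataset of 500 observations with replacement and collects the mean of each resample. The data, the number of resamples, and the 95% percentile interval are illustrative choices, not prescribed values.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for a dataset with 500 observations.
data = rng.normal(loc=50.0, scale=10.0, size=500)

n_resamples = 1000  # assumed number of bootstrap datasets
boot_means = np.empty(n_resamples)

for i in range(n_resamples):
    # Each resample has the same size as the original and is drawn
    # with replacement, so some observations repeat and others are left out.
    resample = rng.choice(data, size=data.size, replace=True)
    boot_means[i] = resample.mean()

# Percentile-based 95% interval for the mean.
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
```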
Which techniques can be used to select a subset of rows and columns from a DataFrame using labels? (Choose two)
- A . df.loc[:, 'col1']
- B . df.iloc[0:5, 'col2']
- C . df.loc[2:7, ['col1', 'col2']]
- D . df['col1', 'col2']
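A small sketch of label-based selection with `.loc` follows. The DataFrame and its column names are hypothetical and use the default integer index as row labels.

```python
import pandas as pd

# Hypothetical DataFrame with a default integer index (labels 0..9).
df = pd.DataFrame(
    {"col1": range(10), "col2": range(10, 20), "col3": range(20, 30)}
)

# All rows, one column selected by label: returns a Series.
one_col = df.loc[:, "col1"]

# Label slices with .loc include both endpoints, so 2:7 yields six rows.
subset = df.loc[2:7, ["col1", "col2"]]
```

Note that `.iloc` is position-based and expects integer positions rather than column names.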
Which term best describes the process of combining customer data from multiple systems into a single unified dataset?
- A . Data binning
- B . Data warehousing
- C . Data normalization
- D . Data integration
