Practice Free D-DS-FN-23 Exam Online Questions
You have the following corpus of texts:
“The cat hit the dog.”
“The dog bit the mail carrier.”
“The mail carrier chased the truck.”
“The truck hit the wall while avoiding the dog that chased the cat.”
“The cat climbed the wall.”
If the tf-idf metric is used to score relevance for search and retrieval, which term has the highest discriminatory power?
- A . Dog
- B . Chased
- C . Bit
- D . Truck
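The document frequencies behind this question can be checked with a short sketch. It assumes lowercase word tokenization and the unsmoothed definition idf = log(N / df); real libraries often apply smoothed variants, so absolute values may differ even though the ranking is the same. The helper names (tokenize, idf) are illustrative only.

```python
import math
import re

corpus = [
    "The cat hit the dog.",
    "The dog bit the mail carrier.",
    "The mail carrier chased the truck.",
    "The truck hit the wall while avoiding the dog that chased the cat.",
    "The cat climbed the wall.",
]

def tokenize(text):
    # Assumption: lowercase alphabetic tokens, punctuation dropped.
    return re.findall(r"[a-z]+", text.lower())

# One set of distinct terms per document, for document-frequency counting.
docs = [set(tokenize(text)) for text in corpus]

def document_frequency(term):
    return sum(term in doc for doc in docs)

def idf(term):
    # Unsmoothed inverse document frequency: log(N / df).
    return math.log(len(docs) / document_frequency(term))

candidates = ["dog", "chased", "bit", "truck"]
scores = {term: idf(term) for term in candidates}

for term in sorted(scores, key=scores.get, reverse=True):
    print(f"{term}: df={document_frequency(term)}, idf={scores[term]:.3f}")
```

Under these assumptions, a term that occurs in fewer documents receives a larger idf. Among the four options, "bit" occurs in one document, "chased" and "truck" in two each, and "dog" in three.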
You have fit a decision tree classifier using 12 input variables. The resulting tree used 7 of the 12 variables, and is 5 levels deep. Some of the nodes contain only 3 data points. The AUC of the model is 0.85.
What is your evaluation of this model?
- A . The tree is probably overfit. Try fitting shallower trees and using an ensemble method.
- B . The AUC is high, and the small nodes are all very pure. This is an accurate model.
- C . The tree did not split on all the input variables. You need a larger data set to get a more accurate model.
- D . The AUC is high, so the overall model is accurate. It is not well-calibrated, because the small nodes will give poor estimates of probability.
What does the R code z <- f[1:10, ] do?
- A . Assigns the first 10 rows of f to the vector z
- B . Assigns the first 10 columns of the first row of f to z
- C . Assigns a sequence of values from 1 to 10 to z
- D . Assigns the first 10 columns to z
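In R, the expression f[1:10, ] uses 1-based, inclusive indexing: the part before the comma selects rows, and the empty part after the comma keeps every column. A rough Python analogue, assuming f is stored as a list of rows (the 12-by-3 contents below are made up for illustration):

```python
# Hypothetical stand-in for the R object f: 12 rows, 3 columns.
f = [[row, row * 10, row * 100] for row in range(1, 13)]

# R's f[1:10, ] selects rows 1 through 10 (inclusive) and all columns.
# Python slices are 0-based and exclude the end index, so the match is f[0:10].
z = f[0:10]

print(len(z), "rows,", len(z[0]), "columns")
```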
When would you use the GROUP BY ROLLUP clause in an OLAP query?
- A . When all subtotals and grand totals are to be included in the output
- B . When only the subtotals are to be included in the output
- C . When only the grand totals are to be included in the output
- D . When only specific subtotals and grand totals for a combination of variables are to be included in the output
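To make the ROLLUP behavior concrete: in SQL, GROUP BY ROLLUP(region, product) produces the detail groups (region, product), one subtotal per region, and a single grand total. The sketch below imitates that in plain Python. The sales rows and the rollup function are hypothetical, and None stands in for the NULL that SQL places in the rolled-up columns.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, amount).
sales = [
    ("East", "Widget", 10),
    ("East", "Gadget", 5),
    ("West", "Widget", 7),
    ("West", "Gadget", 3),
]

def rollup(rows):
    totals = defaultdict(int)
    for region, product, amount in rows:
        totals[(region, product)] += amount  # detail level
        totals[(region, None)] += amount     # subtotal per region
        totals[(None, None)] += amount       # grand total
    return dict(totals)

result = rollup(sales)
for key, total in result.items():
    print(key, total)
```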
Which activity is performed in the Operationalize phase of the data analytics lifecycle?
- A . Try different variables
- B . Try different analytical techniques
- C . Assess the benefits
- D . Transform existing variables
While analyzing the output of a K-means clustering run, you notice that the splits you expected to see on certain variables are absent.
What action should be taken?
- A . Use the value of K where the value of WSS given for K represents the overall dispersion of the data
- B . Decrease the value of K
- C . Decrease the number of variables in the model
- D . Increase the value of K
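The relationship between K and the within-cluster sum of squares (WSS) can be explored with a small sketch. It assumes one-dimensional data, a deterministic initialization, and a fixed number of Lloyd iterations; the data values and the kmeans_1d function are made up for illustration and are not a substitute for a library implementation.

```python
def kmeans_1d(points, k, iterations=50):
    pts = sorted(points)
    # Deterministic initialization: spread the starting centers across the sorted data.
    centers = [pts[(i * (len(pts) - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in pts:
            nearest = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            clusters[nearest].append(x)
        # Move each center to its cluster mean; keep the old center if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
    # WSS: squared distance from each point to its nearest center.
    wss = sum(min((x - c) ** 2 for c in centers) for x in pts)
    return centers, wss

# Hypothetical data with three visible groups.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]

wss_by_k = {k: kmeans_1d(data, k)[1] for k in range(1, 5)}
for k, wss in wss_by_k.items():
    print(f"K={k}: WSS={wss:.2f}")
```

With too few clusters, distinct groups are merged and the separation between them is hidden. Raising K lets finer splits appear, and the WSS curve shows where additional clusters stop paying off.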
Which term is used to describe separating a database into separate parts that can be processed in parallel?
- A . Deduplication
- B . Replication
- C . Sharding
- D . Reconstituting
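As an illustration of splitting records into parts that can be processed independently, here is a minimal hash-partitioning sketch. The function names, the record keys, and the choice of CRC32 as the hash are assumptions for the example; real systems may use range-based or directory-based schemes instead.

```python
import zlib

def shard_for(key, num_shards):
    # CRC32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % num_shards

def partition(records, num_shards):
    shards = [[] for _ in range(num_shards)]
    for key, value in records:
        shards[shard_for(key, num_shards)].append((key, value))
    return shards

# Hypothetical records keyed by user id.
records = [(f"user{i}", i * i) for i in range(20)]
shards = partition(records, 4)

for index, shard in enumerate(shards):
    print(f"shard {index}: {len(shard)} records")
```

Each shard holds a disjoint subset of the keys, so separate workers can process the shards in parallel without coordinating on individual records.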
Consider these itemsets:
(hat, scarf, coat)
(hat, scarf, coat, gloves)
(hat, scarf, gloves)
(hat, gloves)
(scarf, coat, gloves)
What is the confidence of the rule (gloves -> hat)?
- A . 75%
- B . 60%
- C . 66%
- D . 80%
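The confidence of a rule X -> Y is the number of itemsets containing both X and Y divided by the number containing X. A minimal sketch of that arithmetic for the itemsets above (the confidence function is illustrative, not taken from a library):

```python
transactions = [
    {"hat", "scarf", "coat"},
    {"hat", "scarf", "coat", "gloves"},
    {"hat", "scarf", "gloves"},
    {"hat", "gloves"},
    {"scarf", "coat", "gloves"},
]

def confidence(antecedent, consequent, itemsets):
    # Count itemsets containing the antecedent, and those containing both sides.
    with_antecedent = [t for t in itemsets if antecedent <= t]
    with_both = [t for t in with_antecedent if consequent <= t]
    return len(with_both) / len(with_antecedent)

rule_confidence = confidence({"gloves"}, {"hat"}, transactions)
print(f"confidence(gloves -> hat) = {rule_confidence:.0%}")
```

Gloves appear in four of the five itemsets, and three of those four also contain a hat, so the ratio is 3/4.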
What are considerations in a data science and Big Data analytics project?
- A . Ignoring executive stakeholders and business users
- B . Applying the latest technologies to demonstrate technical skills
- C . Analysis flexibility and decision making
- D . Building data silos and bypassing data privacy rules