Practice Free D-DS-FN-23 Exam Online Questions
You build a decision tree to classify five different types of customers based on their browsing history from a sample of 500. The resulting decision tree has 17 layers. One of the leaf nodes has only three customers.
What do you conclude?
- A . The decision tree needs to be rebuilt without the three customers
- B . The decision tree needs to be rebuilt to see if the results change
- C . The sample size is too small, so the classes may not be accurate
- D . Due to the large number of layers, there may be an overfitting problem
Determine the frequency of calls by both product type and customer language.
Which goals are suitable to be completed with MapReduce?
- A . Goal 2 and 4
- B . Goal 1 and 3
- C . Goals 1, 2, 3, 4
- D . Goals 2, 3, 4
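The one goal stated above, counting calls by product type and customer language, has the shape of a key-based aggregation. A minimal sketch in plain Python (not Hadoop code) of how such a count could be split into map, shuffle, and reduce steps follows; the call records and the function names are hypothetical and only serve the illustration.

```python
from collections import defaultdict

# Hypothetical call records: (product_type, customer_language).
calls = [
    ("router", "en"),
    ("router", "es"),
    ("modem", "en"),
    ("router", "en"),
]

def map_phase(records):
    """Emit a ((product, language), 1) pair for every call record."""
    for product, language in records:
        yield (product, language), 1

def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values for each key to get the call frequency."""
    return {key: sum(values) for key, values in grouped.items()}

call_counts = reduce_phase(shuffle(map_phase(calls)))
for key, count in sorted(call_counts.items()):
    print(key, count)
```

Because each record contributes independently to its own key, the map step can run on separate chunks of the data and the sums can be combined afterwards.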
Consider a database with 4 transactions:
Transaction 1: {cheese, bread, milk}
Transaction 2: {soda, bread, milk}
Transaction 3: {cheese, bread}
Transaction 4: {cheese, soda, juice}
You decide to run the association rules algorithm where minimum support is 50%.
Which rule has a confidence of at least 50%?
- A . {cheese} => {bread}
- B . {juice} => {cheese}
- C . {milk} => {soda}
- D . {soda} => {milk}
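The arithmetic for the four candidate rules can be checked with a short script. This is a sketch that assumes the minimum support threshold applies to the combined itemset of each rule (left side plus right side); the helper names are illustrative.

```python
# The four transactions from the question.
transactions = [
    {"cheese", "bread", "milk"},
    {"soda", "bread", "milk"},
    {"cheese", "bread"},
    {"cheese", "soda", "juice"},
]

MIN_SUPPORT = 0.5

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """Fraction of transactions containing lhs that also contain rhs."""
    return support(lhs | rhs) / support(lhs)

# Candidate rules keyed by their option letter.
rules = {
    "A": ({"cheese"}, {"bread"}),
    "B": ({"juice"}, {"cheese"}),
    "C": ({"milk"}, {"soda"}),
    "D": ({"soda"}, {"milk"}),
}

# For each rule: (support of the combined itemset, confidence).
results = {
    label: (support(lhs | rhs), confidence(lhs, rhs))
    for label, (lhs, rhs) in rules.items()
}

for label, (s, c) in results.items():
    status = "meets" if s >= MIN_SUPPORT else "is below"
    print(f"{label}: support={s:.2f} ({status} minimum support), confidence={c:.2f}")
```

A rule has to pass the support filter before its confidence is considered, so both numbers are reported for each option.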
Which Hadoop service is responsible for requesting resources for, and monitoring the completion of, MapReduce processes?
- A . Application Manager
- B . NameNode
- C . Application Master
- D . DataNode
In association rules, given X -> Y, what is confidence?
- A . Difference in the probability of X and Y appearing together compared with expectations if they were statistically independent
- B . Percentage of transactions that contain the itemset
- C . How many times more often X and Y occur together than expected if they were statistically independent, expressed as a ratio
- D . Percentage of transactions with X that also contain Y
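The four descriptions in the options correspond to four commonly used association-rule measures: support, confidence, lift, and leverage. The sketch below computes each one on a small, hypothetical set of baskets so that the definitions can be compared side by side; the data and the function names are assumptions made for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, x, y):
    """Fraction of transactions with X that also contain Y."""
    return support(transactions, x | y) / support(transactions, x)

def lift(transactions, x, y):
    """Ratio of observed co-occurrence to that expected under independence."""
    expected = support(transactions, x) * support(transactions, y)
    return support(transactions, x | y) / expected

def leverage(transactions, x, y):
    """Difference between observed co-occurrence and that expected under independence."""
    expected = support(transactions, x) * support(transactions, y)
    return support(transactions, x | y) - expected

# Hypothetical baskets, not taken from any question above.
baskets = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
x, y = {"a"}, {"b"}

metrics = {
    "support": support(baskets, x | y),
    "confidence": confidence(baskets, x, y),
    "lift": lift(baskets, x, y),
    "leverage": leverage(baskets, x, y),
}
print(metrics)
```

Lift and leverage both compare the rule against statistical independence, one as a ratio and the other as a difference, while confidence is a conditional frequency.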
What is the optimal usage scenario for the Hadoop Distributed File System?
- A . Small files and low latency
- B . Small files and high throughput
- C . Large files and high throughput
- D . Large files and low latency
What converts SQL-like commands into either Tez, Spark, or MapReduce jobs that are submitted to the Hadoop cluster?
- A . Pig
- B . HBase
- C . Hive
- D . Mahout
What does the R code nv <- v[v < 1000] do?
- A . Selects the values in vector v that are less than 1000 and assigns them to the vector nv
- B . Sets nv to TRUE or FALSE depending on whether all elements of vector v are less than 1000
- C . Removes elements of vector v less than 1000 and assigns the elements >= 1000 to nv
- D . Selects values of vector v less than 1000, modifies v, and makes a copy to nv
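For readers more familiar with Python, a rough analog of the R expression (not R code itself) is a filtering list comprehension. The sample values below are hypothetical.

```python
# Hypothetical sample values.
v = [250, 1000, 999, 4200, 12]

# Rough Python analog of R's nv <- v[v < 1000]:
# build a new list from the elements of v that are below 1000.
nv = [x for x in v if x < 1000]

print(nv)  # elements of v below 1000
print(v)   # the original list is left as it was
```

In this analog, as in R's logical indexing, the filter produces a new object and does not modify the original.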
In which phase of the analytic lifecycle would you expect to spend most of the project time?
- A . Discovery
- B . Data preparation
- C . Communicate Results
- D . Operationalize