Practice Free D-DS-FN-23 Exam Online Questions
Which word or phrase completes the statement: "The intersection of two sets is to INNER JOIN as the union of two sets is to __________"?
- A . FULL OUTER JOIN
- B . FULL CROSS JOIN
- C . CROSS JOIN
- D . OUTER JOIN
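The set analogy in this question can be illustrated with a small sketch. It assumes two hypothetical tables, `orders` and `payments`, stored as dictionaries keyed by a customer id; the names and data are invented for illustration. It only models which keys survive each join, not full SQL row semantics such as duplicate keys.

```python
# Hypothetical tables keyed by customer id (names and values are made up).
orders = {1: "book", 2: "lamp", 3: "desk"}
payments = {2: "card", 3: "cash", 4: "wire"}


def inner_join(left, right):
    """Keep only keys present in both tables (intersection of the key sets)."""
    return {k: (left[k], right[k]) for k in left.keys() & right.keys()}


def full_outer_join(left, right):
    """Keep every key from either table (union of the key sets).

    A side with no matching row is filled with None, much like SQL NULL.
    """
    return {k: (left.get(k), right.get(k)) for k in left.keys() | right.keys()}


inner = inner_join(orders, payments)
outer = full_outer_join(orders, payments)

print(sorted(inner))  # keys found in both tables
print(sorted(outer))  # keys found in at least one table
```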
You have the data from a popular e-commerce website. You are exploring the time spent (in seconds) on the website by 100,000 customers across 14 different product categories.
What visualization can be used to represent the relationship between time spent and product category?
- A . Rug plot
- B . Scatter plot
- C . Box and whisker plot
- D . Hexbin plot
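When a continuous measurement is compared across a handful of categories, a per-category five-number summary (minimum, quartiles, maximum) is the information a box and whisker plot draws. The sketch below computes it on synthetic data; the category names, the sample size, and the exponential distribution are assumptions made only for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the synthetic data is reproducible

# Hypothetical categories and synthetic "time spent" values in seconds.
categories = ["books", "toys", "garden"]
time_spent = {
    c: [random.expovariate(1 / 120) for _ in range(1000)] for c in categories
}


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


summaries = {c: five_number_summary(v) for c, v in time_spent.items()}
for category, summary in summaries.items():
    print(category, [round(x, 1) for x in summary])
```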
How is dimensionality defined in a "bag of words" document representation?
- A . Average number of words per sentence in the document
- B . Total number of words in the document
- C . Number of unique terms in the document
- D . Frequency of repeated words in the document
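A minimal sketch of a bag-of-words representation may help here. It assumes a very simple tokenizer (lowercase runs of letters and apostrophes) and a single document; in practice the vocabulary is often built over a whole corpus. The helper name `bag_of_words` is hypothetical.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count how often each lowercase token occurs in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


doc = "The cat sat on the mat. The mat was flat."
bag = bag_of_words(doc)

# One vector position per distinct term; the value is that term's count.
vocabulary = sorted(bag)
vector = [bag[term] for term in vocabulary]

print(vocabulary)
print(vector)
print("vector length:", len(vector), "total tokens:", sum(vector))
```

The vector has one entry per distinct term, so its length differs from the total number of tokens whenever a term repeats.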
Which method is used to solve for the coefficients b0, b1, ..., bn in the linear regression model Y = b0 + b1x1 + b2x2 + ... + bnxn?
- A . Ordinary Least Squares
- B . Apriori Algorithm
- C . Ridge and Lasso
- D . Integer programming
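For intuition about how regression coefficients can be estimated, here is a sketch of the closed-form least-squares fit for the single-predictor case Y = b0 + b1x1. It chooses b0 and b1 to minimize the sum of squared residuals. The function name `ols_simple` and the sample data are hypothetical, and the multi-predictor case would normally be handled with matrix algebra or a library.

```python
def ols_simple(xs, ys):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Made-up sample data, roughly following y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = ols_simple(xs, ys)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```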
You have created a scatterplot of two continuous variables for 2000 records. You want to add a line to the scatterplot to check linearity of the data.
Which function would best address this need?
- A . abline()
- B . glm()
- C . hist()
- D . lm()
What is an appropriate data visualization to use in a presentation for an analyst audience?
- A . Pie chart
- B . Area chart
- C . Stacked bar chart
- D . ROC curve
Refer to the exhibit.
What is the approximate R-squared value for a linear regression model fitted to the data associated with this scatterplot?
- A . 0.01
- B . 0.96
- C . 4
- D . 16
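Since the exhibit is not reproduced here, the following sketch only shows how R-squared is computed and how it behaves, using two made-up data sets: one tightly clustered around a line and one with almost no linear trend. For a least-squares line with an intercept, evaluated on the data it was fitted to, R-squared falls between 0 and 1. The function name `r_squared` is hypothetical.

```python
def r_squared(xs, ys):
    """R-squared of the least-squares line y = b0 + b1 * x fitted to the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Residual sum of squares versus total sum of squares around the mean.
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = list(range(1, 11))

# Made-up data: points very close to the line y = 2x + 1.
tight = [2 * x + 1 + (0.1 if x % 2 else -0.1) for x in xs]
# Made-up data: values that bounce around with almost no linear trend.
loose = [5, 1, 4, 2, 5, 1, 4, 2, 5, 1]

r2_tight = r_squared(xs, tight)
r2_loose = r_squared(xs, loose)
print(f"tight: {r2_tight:.4f}, loose: {r2_loose:.4f}")
```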
You have been assigned to run a linear regression model for each of 5,000 distinct districts, and all the data is currently stored in a PostgreSQL database.
Which tool/library would you use to produce these models with the least effort?
- A . MADlib
- B . Mahout
- C . R
- D . HBase
The web analytics team uses Hadoop to process access logs. They now want to correlate this data with structured user data residing in a production single-instance JDBC database. They collaborate with the production team to import the data into Hadoop.
Which tool should they use?
- A . Sqoop
- B . Pig
- C . Chukwa
- D . Scribe