Practice Free DP-600 Exam Online Questions
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have a Fabric tenant that contains a new semantic model in OneLake.
You use a Fabric notebook to read the data into a Spark DataFrame.
You need to evaluate the data to calculate the min, max, mean, and standard deviation values for all the string and numeric columns.
Solution: You use the following PySpark expression:
df.explain().show()
Does this meet the goal?
- A . Yes
- B . No
B
Explanation:
Correct Solution: You use the following PySpark expression:
df.summary()
summary(*statistics)
Computes the specified statistics for numeric and string columns. Available statistics are: count, mean, stddev, min, max, and arbitrary approximate percentiles specified as a percentage (e.g., 75%).
If no statistics are given, this function computes count, mean, stddev, min, approximate quartiles (percentiles at 25%, 50%, and 75%), and max.
Note: This function is meant for exploratory data analysis, as no guarantee is made about the backward compatibility of the schema of the resulting DataFrame.
>>> df.summary().show()
+-------+------------------+-----+
|summary|               age| name|
+-------+------------------+-----+
…
| stddev|2.1213203435596424| null|
…
+-------+------------------+-----+
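To meet the stated goal directly, you can pass the required statistic names to summary(). A minimal sketch, assuming df is the DataFrame already loaded in the notebook:
>>> # Compute only min, max, mean, and stddev for all string and numeric columns
>>> df.summary("min", "max", "mean", "stddev").show()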
Incorrect:
* df.show()
* df.explain().show()
* df.explain()
explain(extended=False)
Prints the (logical and physical) plans to the console for debugging purposes.
Parameters: extended – boolean, default False. If False, prints only the physical plan.
>>> df.explain()
== Physical Plan ==
Scan ExistingRDD[age#0,name#1]
>>> df.explain(True)
== Parsed Logical Plan ==
…
== Analyzed Logical Plan ==
…
== Optimized Logical Plan ==
…
== Physical Plan ==
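This is why the proposed expression fails: explain() only prints the plans and returns None, so chaining .show() onto it raises an error instead of producing any statistics. A minimal sketch of the failure, using the same df:
>>> result = df.explain()  # prints the physical plan to the console
>>> print(result)          # None - explain() has no return value
>>> df.explain().show()    # AttributeError: 'NoneType' object has no attribute 'show'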
Reference: https://spark.apache.org/docs/2.3.0/api/python/pyspark.sql.html
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have a Fabric tenant that contains a lakehouse named Lakehouse1. Lakehouse1 contains a Delta table named Customer.
When you query Customer, you discover that the query is slow to execute. You suspect that maintenance was NOT performed on the table.
You need to identify whether maintenance tasks were performed on Customer.
Solution: You run the following Spark SQL statement:
DESCRIBE HISTORY customer
Does this meet the goal?
- A . Yes
- B . No
A
Explanation:
Correct Solution: You run the following Spark SQL statement:
DESCRIBE HISTORY customer
DESCRIBE HISTORY
Applies to: Databricks SQL, Databricks Runtime
Returns provenance information, including the operation, user, and so on, for each write to a table. Table history is retained for 30 days.
Syntax
DESCRIBE HISTORY table_name
Note: Work with Delta Lake table history
Each operation that modifies a Delta Lake table creates a new table version. You can use history information to audit operations, rollback a table, or query a table at a specific point in time using time travel.
Retrieve Delta table history
You can retrieve information including the operations, user, and timestamp for each write to a Delta table by running the history command. The operations are returned in reverse chronological order.
DESCRIBE HISTORY '/data/events/'          -- get the full history of the table
DESCRIBE HISTORY delta.`/data/events/`
DESCRIBE HISTORY '/data/events/' LIMIT 1  -- get the last operation only
DESCRIBE HISTORY eventsTable
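From a Fabric notebook, you can run the same statement with PySpark and inspect the operation column for maintenance tasks such as OPTIMIZE and VACUUM. A minimal sketch, assuming the Customer table in Lakehouse1:
>>> history = spark.sql("DESCRIBE HISTORY customer")
>>> history.select("version", "timestamp", "operation").show(truncate=False)
>>> # Keep only maintenance operations; an empty result suggests no maintenance ran
>>> history.filter(history["operation"].isin("OPTIMIZE", "VACUUM START", "VACUUM END")).show()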
Incorrect:
* DESCRIBE DETAIL customer
DESCRIBE DETAIL returns detailed table metadata, such as the format, location, number of files, and size in bytes, but not the write history. (The related DESCRIBE TABLE statement returns only basic metadata: column name, column type, and column comment, optionally scoped to a specific partition or column.)
* EXPLAIN TABLE customer
* REFRESH TABLE
The REFRESH TABLE statement invalidates the cached entries, which include the data and metadata of the given table or view. The invalidated cache is repopulated lazily the next time the cached table, or a query associated with it, is executed.
Syntax
REFRESH [TABLE] tableIdentifier
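For contrast, REFRESH TABLE can be issued the same way from a notebook, but it only invalidates the cache and returns no history. A minimal usage sketch on the same customer table:
>>> spark.sql("REFRESH TABLE customer")  # clears cached data/metadata; reveals nothing about maintenance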
Reference:
https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/delta-describe-history
https://docs.gcp.databricks.com/en/delta/history.html
https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-aux-refresh-table.html
