Practice Free Professional Data Engineer Exam Online Questions
You have developed three data processing jobs. One executes a Cloud Dataflow pipeline that transforms data uploaded to Cloud Storage and writes results to BigQuery. The second ingests data from on-premises servers and uploads it to Cloud Storage. The third is a Cloud Dataflow pipeline that gets information from third-party data providers and uploads the information to Cloud Storage. You need to be able to schedule and monitor the execution of these three workflows and manually execute them when needed.
What should you do?
- A . Create a Directed Acyclic Graph in Cloud Composer to schedule and monitor the jobs.
- B . Use Stackdriver Monitoring and set up an alert with a Webhook notification to trigger the jobs.
- C . Develop an App Engine application to schedule and request the status of the jobs using GCP API calls.
- D . Set up cron jobs in a Compute Engine instance to schedule and monitor the pipelines using GCP API calls.
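A
Explanation:
Cloud Composer is Google Cloud's managed Apache Airflow service. Defining the three jobs as tasks in a Directed Acyclic Graph (DAG) lets you schedule them, monitor their execution in the Airflow UI, and trigger them manually when needed, covering all three requirements. Stackdriver alerts (B), a custom App Engine scheduler (C), and cron jobs on Compute Engine (D) would each require building and maintaining custom orchestration logic.
As a rough illustration, here is a minimal Airflow DAG sketch. The task IDs and bash commands are hypothetical placeholders for the real triggers (for example, Dataflow template launches or transfer scripts):
```python
# Minimal Cloud Composer (Apache Airflow) DAG sketch; the commands below
# are hypothetical placeholders, not the actual pipeline launch logic.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="three_workflows",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # scheduled runs; manual runs via "Trigger DAG"
    catchup=False,
) as dag:
    ingest_onprem = BashOperator(
        task_id="ingest_onprem_to_gcs",
        bash_command="echo 'upload on-prem data to Cloud Storage'",
    )
    ingest_third_party = BashOperator(
        task_id="ingest_third_party_to_gcs",
        bash_command="echo 'launch Dataflow pipeline for third-party data'",
    )
    transform_to_bq = BashOperator(
        task_id="transform_gcs_to_bigquery",
        bash_command="echo 'launch Dataflow pipeline writing to BigQuery'",
    )

    # Ingestion tasks must finish before the transform runs; the Airflow
    # UI provides the monitoring and manual-execution capabilities.
    [ingest_onprem, ingest_third_party] >> transform_to_bq
```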
The data analyst team at your company uses BigQuery for ad-hoc queries and scheduled SQL pipelines in a Google Cloud project with a slot reservation of 2000 slots. However, with the recent introduction of hundreds of new non-time-sensitive SQL pipelines, the team is encountering frequent quota errors. You examine the logs and notice that approximately 1500 queries are being triggered concurrently during peak time. You need to resolve the concurrency issue.
What should you do?
- A . Update SQL pipelines and ad-hoc queries to run as interactive query jobs.
- B . Increase the slot capacity of the project with baseline as 0 and maximum reservation size as 3000.
- C . Update SQL pipelines to run as a batch query, and run ad-hoc queries as interactive query jobs.
- D . Increase the slot capacity of the project with baseline as 2000 and maximum reservation size as 3000.
C
Explanation:
To resolve the concurrency issue in BigQuery caused by the introduction of hundreds of non-time-sensitive SQL pipelines, the best approach is to differentiate the types of queries based on their urgency and resource requirements. Here’s why option C is the best choice:
SQL Pipelines as Batch Queries:
Batch queries in BigQuery are designed for non-time-sensitive operations. They are queued and start when idle slots become available, so they do not compete with interactive queries for slots during peak times, and they do not count toward the concurrent query limit.
By converting non-time-sensitive SQL pipelines to batch queries, you can significantly alleviate the pressure on slot reservations.
Ad-Hoc Queries as Interactive Queries:
Interactive queries are prioritized to run immediately and are suitable for ad-hoc analysis where users expect quick results.
Running ad-hoc queries as interactive jobs ensures that analysts can get their results without delay, improving productivity and user satisfaction.
Concurrency Management:
This approach helps balance the workload by leveraging BigQuery’s ability to handle different types of queries efficiently, reducing the likelihood of encountering quota errors due to slot exhaustion.
Steps to Implement:
Identify Non-Time-Sensitive Pipelines:
Review and identify SQL pipelines that are not time-critical and can be executed as batch jobs.
Update Pipelines to Batch Queries:
Modify these pipelines to run as batch queries by setting the priority of the query job to BATCH, as shown in the sketch after this list.
Ensure Ad-Hoc Queries are Interactive:
Ensure that all ad-hoc queries are submitted as interactive jobs, allowing them to run with higher priority and immediate slot allocation.
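A minimal sketch of both priorities using the google-cloud-bigquery Python client; the project, dataset, and table names are hypothetical:
```python
from google.cloud import bigquery

client = bigquery.Client()

# Non-time-sensitive pipeline query: BATCH priority queues the job until
# idle slots are available instead of competing for slots immediately.
batch_job = client.query(
    "SELECT * FROM `my_project.my_dataset.daily_events`",  # hypothetical table
    job_config=bigquery.QueryJobConfig(priority=bigquery.QueryPriority.BATCH),
)

# Ad-hoc analyst query: INTERACTIVE is the default priority, shown
# explicitly here for contrast; it runs as soon as possible.
interactive_job = client.query(
    "SELECT COUNT(*) FROM `my_project.my_dataset.daily_events`",
    job_config=bigquery.QueryJobConfig(
        priority=bigquery.QueryPriority.INTERACTIVE
    ),
)
for row in interactive_job.result():  # blocks until the query finishes
    print(row)
```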
Reference:
BigQuery Batch Queries
BigQuery Slot Allocation and Management
Cloud Dataproc is a managed Apache Hadoop and Apache _____ service.
- A . Blaze
- B . Spark
- C . Fire
- D . Ignite
B
Explanation:
Cloud Dataproc is a managed Apache Spark and Apache Hadoop service that lets you use open source data tools for batch processing, querying, streaming, and machine learning.
Reference: https://cloud.google.com/dataproc/docs/
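As a brief illustration of the managed service, here is a sketch that submits a PySpark job to an existing cluster with the google-cloud-dataproc client library; the project, region, cluster, and script URI are hypothetical:
```python
from google.cloud import dataproc_v1

region = "us-central1"  # hypothetical region
client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# Describe a PySpark job that runs a script stored in Cloud Storage.
job = {
    "placement": {"cluster_name": "my-cluster"},  # hypothetical cluster
    "pyspark_job": {"main_python_file_uri": "gs://my-bucket/job.py"},
}

operation = client.submit_job_as_operation(
    request={"project_id": "my-project", "region": region, "job": job}
)
result = operation.result()  # waits for the job to finish
print(result.status.state)
```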
In order to securely transfer web traffic data from your computer’s web browser to the Cloud Dataproc cluster you should use a(n) _____.
- A . VPN connection
- B . Special browser
- C . SSH tunnel
- D . FTP connection
C
Explanation:
To connect to the web interfaces, it is recommended to use an SSH tunnel to create a secure connection to the master node.
Reference: https://cloud.google.com/dataproc/docs/concepts/cluster-web-interfaces#connecting_to_the_web_interfaces
Which is not a valid reason for poor Cloud Bigtable performance?
- A . The workload isn’t appropriate for Cloud Bigtable.
- B . The table’s schema is not designed correctly.
- C . The Cloud Bigtable cluster has too many nodes.
- D . There are issues with the network connection.
C
Explanation:
Having too many nodes is not a valid cause of poor performance; the documented issue is the opposite: the Cloud Bigtable cluster doesn't have enough nodes. If your cluster is overloaded, adding more nodes can improve performance. Use the monitoring tools to check whether the cluster is overloaded.
Reference: https://cloud.google.com/bigtable/docs/performance
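If monitoring shows the cluster is overloaded, nodes can be added programmatically; a minimal sketch with the google-cloud-bigtable admin client follows, where the project, instance, and cluster IDs are hypothetical:
```python
from google.cloud import bigtable

# An admin client is required for cluster-management operations.
client = bigtable.Client(project="my-project", admin=True)  # hypothetical project
instance = client.instance("my-instance")
cluster = instance.cluster("my-cluster")

cluster.reload()  # fetch current state, including the node count
print("current nodes:", cluster.serve_nodes)

cluster.serve_nodes += 2  # scale up to relieve an overloaded cluster
cluster.update()          # apply the change
```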
You currently have transactional data stored on-premises in a PostgreSQL database. To modernize your data environment, you want to run transactional workloads and support analytics needs with a single database. You need to move to Google Cloud without changing database management systems, and minimize cost and complexity.
What should you do?
- A . Migrate your workloads to AlloyDB for PostgreSQL.
- B . Migrate to BigQuery to optimize analytics.
- C . Migrate and modernize your database with Cloud Spanner.
- D . Migrate your PostgreSQL database to Cloud SQL for PostgreSQL.
A
Explanation:
The key requirements are:
On-premises PostgreSQL database.
Run transactional workloads AND support analytics needs with a single database.
Move to Google Cloud without changing database management systems (i.e., remain PostgreSQL-compatible).
Minimize cost and complexity.
AlloyDB for PostgreSQL (Option A) is the best fit for these requirements.
PostgreSQL-Compatible: AlloyDB is fully PostgreSQL-compatible, meaning minimal to no application changes are required ("without changing database management systems").
Transactional and Analytical Workloads: AlloyDB is designed to handle demanding transactional workloads while also providing significantly faster analytical query performance compared to standard PostgreSQL. It achieves this through its intelligent, database-optimized storage layer and columnar engine integration. This addresses the "single database" for both needs.
Cost and Complexity: As a managed service, it reduces operational complexity. Its performance benefits for both OLTP and OLAP can lead to better cost-efficiency by handling mixed workloads effectively on a single system.
Let’s analyze why other options are less suitable:
B (Migrate to BigQuery): BigQuery is an analytical data warehouse, not designed for transactional workloads. This violates the "single database" for both types of workloads and "without changing database management systems" (as BigQuery is not PostgreSQL).
C (Migrate to Cloud Spanner): Cloud Spanner is a globally distributed, horizontally scalable relational database. While excellent for high-availability transactional workloads, it has its own SQL dialect (ANSI 2011 with extensions, not fully PostgreSQL wire-compatible without tools like PGAdapter, which adds complexity) and a different architecture. This would involve more significant changes than moving to a PostgreSQL-compatible system. The requirement was "without changing database management systems."
D (Migrate to Cloud SQL for PostgreSQL): Cloud SQL for PostgreSQL is a fully managed PostgreSQL service. It’s excellent for transactional workloads and simpler analytical queries. However, for more demanding analytical needs on the same database instance, AlloyDB is specifically optimized to provide superior performance due to its architectural enhancements (like the columnar engine). If the analytical needs are significant, AlloyDB offers a better converged experience. While Cloud SQL is PostgreSQL-compatible, AlloyDB is positioned for superior performance on mixed workloads.
Reference: Google Cloud Documentation: AlloyDB for PostgreSQL > Overview. "AlloyDB for PostgreSQL is a fully managed, PostgreSQL-compatible database service for your most demanding transactional and analytical workloads… AlloyDB offers full PostgreSQL compatibility, so you can migrate your existing PostgreSQL applications with no code changes."
Google Cloud Documentation: AlloyDB for PostgreSQL > Key benefits. Highlights include "Industry-leading performance: …up to 100x faster analytical queries than standard PostgreSQL." and "Support for transactional and analytical workloads: AlloyDB is designed to efficiently handle both transactional and analytical queries, allowing you to use a single database for a wide range of applications."
You want to use Google Stackdriver Logging to monitor Google BigQuery usage. You need an instant notification to be sent to your monitoring tool when new data is appended to a certain table using an insert job, but you do not want to receive notifications for other tables.
What should you do?
- A . Make a call to the Stackdriver API to list all logs, and apply an advanced filter.
- B . In the Stackdriver logging admin interface, enable a log sink export to BigQuery.
- C . In the Stackdriver logging admin interface, enable a log sink export to Google Cloud Pub/Sub, and subscribe to the topic from your monitoring tool.
- D . Using the Stackdriver API, create a project sink with advanced log filter to export to Pub/Sub, and subscribe to the topic from your monitoring tool.
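D
Explanation:
A project sink with an advanced log filter matches only the audit log entries for insert jobs against the specific table, and exporting the sink to Cloud Pub/Sub delivers each matching entry to your monitoring tool as soon as it is logged. Listing all logs via the API (A) is polling, not instant notification; a sink to BigQuery (B) is for storage and analysis, not push delivery; and a sink without an advanced filter (C) would also forward events for other tables.
A minimal sketch of creating such a sink with the google-cloud-logging Python client; the sink name, table ID, topic, and exact filter expression are hypothetical and should be adapted to your audit log format:
```python
from google.cloud import logging

client = logging.Client()  # uses the default project from the environment

# Advanced filter: only completed BigQuery jobs that wrote to one table.
# The exact field paths depend on the audit log schema; treat this as an
# illustrative placeholder, not a verbatim production filter.
log_filter = (
    'resource.type="bigquery_resource" '
    'protoPayload.methodName="jobservice.jobcompleted" '
    'protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.'
    'load.destinationTable.tableId="my_table"'
)

sink = client.sink(
    "bq-insert-notifications",  # hypothetical sink name
    filter_=log_filter,
    destination="pubsub.googleapis.com/projects/my-project/topics/bq-inserts",
)
sink.create()  # the monitoring tool then subscribes to the Pub/Sub topic
```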
