Exam preparation guide - Excellent Databricks-Certified-Data-Engineer-Associate Japanese training exam - Complete Databricks-Certified-Data-Engineer-Associate study materials

Our Xhs1991 products are an accumulation of professional knowledge worth practicing and remembering. Many experts join us to contribute to the success of the Databricks-Certified-Data-Engineer-Associate guide quizzes tailored to customers' needs, and our responsible, patient staff are thoroughly trained before they begin working with customers. Once you practice with the Databricks-Certified-Data-Engineer-Associate exam preparation and experience its quality, you will appreciate its reliability and usefulness. That is why the Databricks-Certified-Data-Engineer-Associate practice materials have helped more than 98% of exam candidates obtain the certificate of their dreams, and we believe you can obtain it too.

Databricks Certified Data Engineer Associate Exam certification Databricks-Certified-Data-Engineer-Associate exam questions (Q54-Q59):

Question # 54
Which of the following benefits of using the Databricks Lakehouse Platform is provided by Delta Lake?
A. The ability to collaborate in real time on a single notebook
B. The ability to support batch and streaming workloads
C. The ability to distribute complex data operations
D. The ability to set up alerts for query failures
E. The ability to manipulate the same data using a variety of languages
Correct answer: B
Explanation:
Delta Lake is the optimized storage layer that provides the foundation for storing data and tables in the Databricks lakehouse. Delta Lake is fully compatible with Apache Spark APIs and was developed for tight integration with Structured Streaming, allowing you to easily use a single copy of data for both batch and streaming operations while providing incremental processing at scale. Delta Lake supports upserts using the merge operation, which enables you to efficiently update existing data or insert new data into your Delta tables. Delta Lake also provides time travel capabilities, which allow you to query previous versions of your data or roll back to a specific point in time. References: What is Delta Lake? | Databricks on AWS; Upsert into a table using merge | Databricks on AWS; Query an older snapshot of a table (time travel) | Databricks on AWS
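As a minimal PySpark sketch of the points above, the following assumes an active Databricks Spark session named spark; the table path, schema, and the updates_df DataFrame are hypothetical. It shows a single Delta table serving both batch and streaming reads, an upsert with merge, and a time-travel read:

    from delta.tables import DeltaTable

    # Hypothetical Delta table location.
    events_path = "/delta/events"

    # The same single copy of the data serves both batch and streaming reads.
    batch_df = spark.read.format("delta").load(events_path)
    stream_df = spark.readStream.format("delta").load(events_path)

    # Upsert new or changed rows with MERGE (hypothetical updates).
    updates_df = spark.createDataFrame([(1, "click"), (2, "view")], ["id", "event_type"])
    target = DeltaTable.forPath(spark, events_path)
    (target.alias("t")
        .merge(updates_df.alias("u"), "t.id = u.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

    # Time travel: read an older version of the table.
    v0_df = spark.read.format("delta").option("versionAsOf", 0).load(events_path)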
Question # 55
Which of the following must be specified when creating a new Delta Live Tables pipeline?
A. A location of a target database for the written data
B. At least one notebook library to be executed
C. A path to cloud storage location for the written data
D. A key-value pair configuration
E. The preferred DBU/hour cost
Correct answer: B
Explanation:
Option B is the correct answer because it is the only mandatory requirement when creating a new Delta Live Tables pipeline. A pipeline is a data processing workflow that contains materialized views and streaming tables declared in Python or SQL source files. Delta Live Tables infers the dependencies between these tables and ensures updates occur in the correct order. To create a pipeline, you must specify at least one notebook library to be executed, which contains the Delta Live Tables syntax; you can also include multiple libraries in different languages within the same pipeline. The other options are optional or not applicable when creating a pipeline. Option D is not required, but you can optionally provide a key-value pair configuration to customize pipeline settings such as the storage location, the target schema, notifications, and the pipeline mode.
Option E is not applicable, as the DBU/hour cost is determined by the cluster configuration, not by the pipeline creation. Option C is not required, but you can optionally specify a cloud storage location for the output data from the pipeline; if you leave it empty, the system uses a default location. Option A is not required, but you can optionally specify the location of a target database for the written data, either in the Hive metastore or in Unity Catalog.
References: Tutorial: Run your first Delta Live Tables pipeline; What is Delta Live Tables?; Create a pipeline; Pipeline configuration.
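To illustrate what such a notebook library might contain, here is a minimal Delta Live Tables sketch in Python; the source path, file format, and column name are hypothetical:

    import dlt
    from pyspark.sql.functions import col

    @dlt.table(comment="Raw events ingested with Auto Loader")
    def raw_events():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/raw/events")
        )

    @dlt.table(comment="Events with a non-null event_type")
    def clean_events():
        return dlt.read_stream("raw_events").where(col("event_type").isNotNull())

When this notebook is attached as the pipeline's library, Delta Live Tables infers that clean_events depends on raw_events and updates them in that order.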
Question # 56
In order for Structured Streaming to reliably track the exact progress of the processing so that it can handle any kind of failure by restarting and/or reprocessing, which of the following two approaches is used by Spark to record the offset range of the data being processed in each trigger?
A. Replayable Sources and Idempotent Sinks
B. Structured Streaming cannot record the offset range of the data being processed in each trigger.
C. Checkpointing and Write-ahead Logs
D. Checkpointing and Idempotent Sinks
E. Write-ahead Logs and Idempotent Sinks
Correct answer: C
Explanation:
Structured Streaming uses checkpointing and write-ahead logs to record the offset range of the data being processed in each trigger. This ensures that the engine can reliably track the exact progress of the processing and handle any kind of failure by restarting and/or reprocessing. Checkpointing is the mechanism of saving the state of a streaming query to fault-tolerant storage (such as HDFS) so that it can be recovered after a failure. Write-ahead logs are files that record the offset range of the data being processed in each trigger and are written to the checkpoint location before the processing starts. These logs are used to recover the query state and resume processing from the last processed offset range in case of a failure. References: Structured Streaming Programming Guide, Fault Tolerance Semantics
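A minimal sketch of how this looks in practice (paths are hypothetical, assuming an active Databricks Spark session named spark): the checkpointLocation option points at fault-tolerant storage where Spark keeps the write-ahead log of offset ranges and any query state, so a restarted query resumes from the last committed offsets instead of reprocessing everything.

    query = (
        spark.readStream.format("delta").load("/delta/events")
        .writeStream
        .format("delta")
        .option("checkpointLocation", "/checkpoints/events_sink")
        .outputMode("append")
        .start("/delta/events_sink")
    )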
Question # 57
A data engineer needs to create a table in Databricks using data from a CSV file at location /path/to/csv.
They run the following command:
Which of the following lines of code fills in the above blank to successfully complete the task?
A. USING DELTA
B. FROM CSV
C. USING CSV
D. FROM "path/to/csv"
E. None of these lines of code are needed to successfully complete the task
Correct answer: C
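The exact command from the question is not reproduced above, so the following is only a hypothetical reconstruction (table name, schema, and options invented for illustration) showing where USING CSV fits in such a statement, issued through spark.sql:

    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_csv
        (id INT, amount DOUBLE)
        USING CSV
        OPTIONS (header = 'true')
        LOCATION '/path/to/csv'
    """)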
Question # 58
A data engineer needs to parse only .png files in a directory that contains files with different suffixes. Which code should the data engineer use to achieve this task?
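One common PySpark approach, shown here as a sketch with a hypothetical directory path rather than as the wording of any particular answer choice, is to combine the binaryFile reader with the generic pathGlobFilter file-source option:

    # pathGlobFilter keeps only files whose names match the glob pattern,
    # regardless of the other suffixes present in the directory.
    png_df = (
        spark.read.format("binaryFile")
        .option("pathGlobFilter", "*.png")
        .load("/path/to/source_dir")
    )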