Associate-Developer-Apache-Spark-3.5 Free Sample, Associate-Developer-Apache-Spark-3.5 Certification Practice Questions
According to many people in the IT industry, preparing for a Databricks certification exam takes a great deal of time and energy, and without a training class or an online course it is relatively difficult to pass, with a very low first-attempt pass rate. Jpexam is a highly trusted training tool: it provides practice test software for the Databricks Associate-Developer-Apache-Spark-3.5 certification exam together with practice questions and answers, and its latest Databricks Associate-Developer-Apache-Spark-3.5 "Databricks Certified Associate Developer for Apache Spark 3.5 - Python" question set is updated for one year.
Databricks Certified Associate Developer for Apache Spark 3.5 - Python Certification Associate-Developer-Apache-Spark-3.5 Exam Questions (Q100-Q105):
Question # 100
A data engineer wants to create a Streaming DataFrame that reads from a Kafka topic called feed.
Which code fragment should be inserted in line 5 to meet the requirement?
Code context:
spark
.readStream
.format("kafka")
.option("kafka.bootstrap.servers", "host1:port1,host2:port2")
.[LINE 5]
.load()
Options:
A. .option("topic", "feed")
B. .option("kafka.topic", "feed")
C. .option("subscribe", "feed")
D. .option("subscribe.topic", "feed")
Correct Answer: C
Explanation:
To read from a specific Kafka topic using Structured Streaming, the correct syntax is:
.option("subscribe", "feed")
This is explicitly defined in the Spark documentation:
"subscribe - The Kafka topic to subscribe to. Only one topic can be specified for this option." (Source: Apache Spark Structured Streaming + Kafka Integration Guide)
"subscribe - The Kafka topic to subscribe to. Only one topic can be specified for this option." (Source: Apache Spark Structured Streaming + Kafka Integration Guide) B . "subscribe.topic" is invalid.
C . "kafka.topic" is not a recognized option.
D . "topic" is not valid for Kafka source in Spark.
Question # 101
A data engineer is building an Apache Spark™ Structured Streaming application to process a stream of JSON events in real time. The engineer wants the application to be fault-tolerant and resume processing from the last successfully processed record in case of a failure. To achieve this, the data engineer decides to implement checkpoints.
Which code snippet should the data engineer use?
A. query = streaming_df.writeStream
.format("console")
.outputMode("append")
.option("checkpointLocation", "/path/to/checkpoint")
.start()
B. query = streaming_df.writeStream
.format("console")
.outputMode("complete")
.start()
C. query = streaming_df.writeStream
.format("console")
.option("checkpoint", "/path/to/checkpoint")
.outputMode("append")
.start()
D. query = streaming_df.writeStream
.format("console")
.outputMode("append")
.start()
Correct Answer: A
Explanation:
Comprehensive and detailed explanation:
To enable fault tolerance and ensure that Spark can resume from the last committed offset after a failure, you must configure a checkpoint location using the correct option key: "checkpointLocation".
From the official Spark Structured Streaming guide:
"To make a streaming query fault-tolerant and recoverable, a checkpoint directory must be specified using.
option("checkpointLocation", "/path/to/dir")."
Explanation of options:
Option A is correct: it sets "checkpointLocation" properly.
Option B lacks checkpointing (and uses "complete" output mode), so it will not resume after a failure.
Option C uses an invalid option name: "checkpoint" (it should be "checkpointLocation").
Option D also lacks any checkpointing configuration.
Reference: Apache Spark 3.5 Documentation → Structured Streaming → Fault Tolerance Semantics
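As a minimal runnable sketch (assuming streaming_df already exists and /path/to/checkpoint points to a durable location such as DBFS or HDFS), the recoverable query from option A looks like this in full:
# Checkpointing persists offsets and state so the query can resume after a failure
query = (streaming_df.writeStream
    .format("console")
    .outputMode("append")
    .option("checkpointLocation", "/path/to/checkpoint")
    .start())
query.awaitTermination()  # block the driver until the query stops or fails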
Question # 102
An engineer notices a significant increase in the job execution time during the execution of a Spark job. After some investigation, the engineer decides to check the logs produced by the Executors.
How should the engineer retrieve the Executor logs to diagnose performance issues in the Spark application?
A. Use the Spark UI to select the stage and view the executor logs directly from the stages tab.
B. Locate the executor logs on the Spark master node, typically under the /tmp directory.
C. Fetch the logs by running a Spark job with the spark-sql CLI tool.
D. Use the command spark-submit with the --verbose flag to print the logs to the console.
Correct Answer: A
Explanation:
The Spark UI is the standard and most effective way to inspect executor logs, task time, input size, and shuffles.
From the Databricks documentation:
"You can monitor job execution via the Spark Web UI. It includes detailed logs and metrics, including task-level execution time, shuffle reads/writes, and executor memory usage." (Source: Databricks Spark Monitoring Guide) Option A is incorrect: logs are not guaranteed to be in /tmp, especially in cloud environments.
B . -verbose helps during job submission but doesn't give detailed executor logs.
D . spark-sql is a CLI tool for running queries, not for inspecting logs.
Hence, the correct method is using the Spark UI → Stages tab → Executor logs.
Question # 103
A data engineer is working with a large JSON dataset containing order information. The dataset is stored in a distributed file system and needs to be loaded into a Spark DataFrame for analysis. The data engineer wants to ensure that the schema is correctly defined and that the data is read efficiently.
Which approach should the data engineer use to efficiently load the JSON data into a Spark DataFrame with a predefined schema?
A. Use spark.read.format("json").load() and then use DataFrame.withColumn() to cast each column to the desired data type.
B. Use spark.read.json() to load the data, then use DataFrame.printSchema() to view the inferred schema, and finally use DataFrame.cast() to modify column types.
C. Use spark.read.json() with the inferSchema option set to true
D. Define a StructType schema and use spark.read.schema(predefinedSchema).json() to load the data.
Correct Answer: D
Explanation:
The most efficient and correct approach is to define a schema using StructType and pass it to spark.read.schema(...).
This avoids schema inference overhead and ensures proper data types are enforced during read.
Example:
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

schema = StructType([
    StructField("order_id", StringType(), True),
    StructField("amount", DoubleType(), True),
    ...
])
df = spark.read.schema(schema).json("path/to/json")
- Source: Databricks Guide - Read JSON with predefined schema
Question # 104
A Data Analyst is working on employees_df and needs to add a new column where a 10% tax is calculated on the salary.
Additionally, the DataFrame contains the column age, which is not needed.
Which code fragment adds the tax column and removes the age column?
A. employees_df = employees_df.withColumn("tax", lit(0.1)).drop("age")
B. employees_df = employees_df.dropField("age").withColumn("tax", col("salary") * 0.1)
C. employees_df = employees_df.withColumn("tax", col("salary") * 0.1).drop("age")
D. employees_df = employees_df.withColumn("tax", col("salary") + 0.1).drop("age")
Correct Answer: C
Explanation:
To create a new calculated column in Spark, use the .withColumn() method.
To remove an unwanted column, use the .drop() method.
Correct syntax:
from pyspark.sql.functions import col
employees_df = employees_df.withColumn("tax", col("salary") * 0.1).drop("age")
.withColumn("tax", col("salary") * 0.1) → adds a new column where tax = 10% of salary.
.drop("age") → removes the age column from the DataFrame.
Why the other options are incorrect:
A: lit(0.1) creates a constant column with the value 0.1, not a calculated tax.
B: .dropField() is not a DataFrame API method (it is used only for struct field manipulation).
D: Adds 0.1 to the salary instead of calculating 10% of it.
Reference:
PySpark DataFrame API - withColumn(), drop(), and col().
Databricks Exam Guide (June 2025): Section "Developing Apache Spark DataFrame/DataSet API Applications" - manipulating, renaming, and dropping columns.