Valuable Associate-Developer-Apache-Spark-3.5 | Excellent Associate-Developer-Apache-Spark-3.5 Japanese Exam Answers | Exam Preparation Method: Databricks Certified Associate Developer for Apache Spark 3.5 - Python Prerequisites
In an era when everything keeps rising, don't you want to break through your own limits? Even doubling your salary is not impossible. Pass the Databricks Associate-Developer-Apache-Spark-3.5 exam and you can make that dream a reality. Xhs1991 provides you with the best training materials and guarantees a 100 percent pass rate. This is true. Don't hesitate; get Xhs1991's Databricks Associate-Developer-Apache-Spark-3.5 exam training materials right away.
Databricks Certified Associate Developer for Apache Spark 3.5 - Python Certification Associate-Developer-Apache-Spark-3.5 Exam Questions (Q37-Q42):
Question # 37
45 of 55.
Which feature of Spark Connect should be considered when designing an application that plans to enable remote interaction with a Spark cluster?
A. It can be used to interact with any remote cluster using the REST API.
B. It allows for remote execution of Spark jobs.
C. It is primarily used for data ingestion into Spark from external sources.
D. It provides a way to run Spark applications remotely in any programming language.
Correct answer: B
Explanation:
Spark Connect enables remote execution of Spark jobs by decoupling the client from the driver using the Spark Connect protocol (gRPC).
It allows users to run Spark code from different environments (like notebooks, IDEs, or remote clients) while executing jobs on the cluster.
Key Features:
Enables remote interaction between client and Spark driver.
Supports interactive development and lightweight client sessions.
Improves developer productivity without needing driver resources locally.
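For illustration, here is a minimal PySpark sketch of opening a remote session with Spark Connect; the endpoint host and port are placeholders, not values from the exam material:
from pyspark.sql import SparkSession
# "sc://..." is a Spark Connect (gRPC) endpoint; host and port below are placeholders.
spark = SparkSession.builder.remote("sc://spark-cluster-host:15002").getOrCreate()
df = spark.range(10)   # the plan is built on the client
print(df.count())      # the action is executed on the remote cluster via Spark Connect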
Why the other options are incorrect:
A: Spark Connect communicates over gRPC, not a REST API.
C: Spark Connect is not primarily a data-ingestion feature; it is a client-server protocol for running Spark code remotely.
D: Client libraries exist for multiple languages (Python, Scala, etc.) via the Spark Connect API, but it does not let applications run remotely in any arbitrary programming language.
Reference:
Databricks Exam Guide (June 2025): Section "Using Spark Connect to Deploy Applications" - describes Spark Connect architecture and remote execution model.
Spark 3.5 Documentation - Spark Connect overview and client-server protocol.
Question # 38
A data engineer noticed improved performance after upgrading from Spark 3.0 to Spark 3.5. The engineer found that Adaptive Query Execution (AQE) was enabled.
Which operation is AQE implementing to improve performance?
A. Optimizing the layout of Delta files on disk
B. Improving the performance of single-stage Spark jobs
C. Collecting persistent table statistics and storing them in the metastore for future use
D. Dynamically switching join strategies
Correct answer: D
Explanation:
Adaptive Query Execution (AQE) is a Spark 3.x feature that dynamically optimizes query plans at runtime. One of its core features is:
Dynamically switching join strategies (e.g., from sort-merge to broadcast) based on runtime statistics.
Other AQE capabilities include:
Coalescing shuffle partitions
Skew join handling
Option D is correct: AQE dynamically switches join strategies at runtime (for example, replacing a sort-merge join with a broadcast join based on observed data sizes).
Option C describes collecting persistent table statistics for the metastore, which is not what AQE does; AQE relies on runtime shuffle statistics for the current query only.
Option B is incorrect because AQE's optimizations act at shuffle/stage boundaries and therefore benefit multi-stage jobs, not single-stage jobs.
Option A refers to optimizing the layout of Delta Lake files on disk, which is a Delta Lake feature unrelated to AQE.
Final answer: D
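For context, a minimal sketch of the AQE-related settings in Spark 3.5 (AQE is enabled by default since Spark 3.2; setting the keys explicitly is shown only for illustration):
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.adaptive.enabled", "true")                     # turn AQE on
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")  # coalesce small shuffle partitions
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")            # split skewed partitions in joins
# With these enabled, Spark can replace a planned sort-merge join with a broadcast
# join at runtime once it sees that one side's shuffle output is small enough.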
Question # 39
A data engineer wants to create an external table from a JSON file located at /data/input.json with the following requirements:
Create an external table named users
Automatically infer schema
Merge records with differing schemas
Which code snippet should the engineer use?
Options:
A. CREATE EXTERNAL TABLE users USING json OPTIONS (path '/data/input.json', schemaMerge 'true')
B. CREATE EXTERNAL TABLE users USING json OPTIONS (path '/data/input.json', mergeSchema 'true')
C. CREATE TABLE users USING json OPTIONS (path '/data/input.json')
D. CREATE EXTERNAL TABLE users USING json OPTIONS (path '/data/input.json')
Correct answer: B
Explanation:
To create an external table and enable schema merging, the correct syntax is:
CREATE EXTERNAL TABLE users
USING json
OPTIONS (
path '/data/input.json',
mergeSchema 'true'
)
mergeSchema is the correct option key (not schemaMerge)
EXTERNAL allows Spark to query files without managing their lifecycle
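For reference, the same DDL from the answer above can be submitted from PySpark with spark.sql(); this is a usage sketch only, reusing the statement and path from the question:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
# Issue the DDL from the answer above through the SQL interface.
spark.sql("""
CREATE EXTERNAL TABLE users
USING json
OPTIONS (path '/data/input.json', mergeSchema 'true')
""")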
Question # 40
Given the code:
df = spark.read.csv("large_dataset.csv")
filtered_df = df.filter(col("error_column").contains("error"))
mapped_df = filtered_df.select(split(col("timestamp"), " ").getItem(0).alias("date"), lit(1).alias("count"))
reduced_df = mapped_df.groupBy("date").sum("count")
reduced_df.count()
reduced_df.show()
At which point will Spark actually begin processing the data?
A. When the groupBy transformation is applied
B. When the show action is applied
C. When the filter transformation is applied
D. When the count action is applied
Correct answer: D
Explanation:
Spark uses lazy evaluation. Transformations like filter, select, and groupBy only define the DAG (Directed Acyclic Graph). No execution occurs until an action is triggered.
The first action in the code is reduced_df.count(), so Spark starts processing the data at that line.
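As a minimal, self-contained illustration of lazy evaluation (the data here is generated with spark.range and is not from the question):
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
spark = SparkSession.builder.getOrCreate()
df = spark.range(1_000_000)                       # no job runs yet
doubled = df.select((col("id") * 2).alias("x"))   # still no job: Spark only builds the plan
print(doubled.count())                            # first action: Spark now executes the DAG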
Reference: Apache Spark Programming Guide - Lazy Evaluation
Question # 41
37 of 55.
A data scientist is working with a Spark DataFrame called customerDF that contains customer information.
The DataFrame has a column named email with customer email addresses.
The data scientist needs to split this column into username and domain parts.
Which code snippet splits the email column into username and domain columns?
A. customerDF = customerDF.withColumn("username", regexp_replace(col("email"), "@", ""))
B. customerDF = customerDF.select("email").alias("username", "domain")
C. customerDF = customerDF.withColumn("domain", col("email").split("@")[1])
D. customerDF = customerDF
.withColumn("username", split(col("email"), "@").getItem(0))
.withColumn("domain", split(col("email"), "@").getItem(1))
Correct answer: D
Explanation:
The split() function in PySpark splits strings into an array based on a given delimiter.
Then, .getItem(index) extracts a specific element from the array.
Correct usage:
from pyspark.sql.functions import split, col
customerDF = (
    customerDF
    .withColumn("username", split(col("email"), "@").getItem(0))
    .withColumn("domain", split(col("email"), "@").getItem(1))
)
This creates two new columns derived from the email field:
"username" → text before @
"domain" → text after @
Why the other options are incorrect:
A: regexp_replace only replaces text; it does not split the value into separate columns.
B: .select("email").alias(...) cannot create two derived columns this way; alias just renames a single column.
C: A Column object is not a native Python string, so the standard str.split() cannot be used on it; use pyspark.sql.functions.split instead.
Reference:
PySpark SQL Functions - split() and getItem().
Databricks Exam Guide (June 2025): Section "Developing Apache Spark DataFrame/DataSet API Applications" - manipulating and splitting column data.