
Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Dumps

Databricks Certified Associate Developer for Apache Spark 3.0 Exam Questions and Answers

Question 1

Which of the following code blocks performs an inner join between DataFrame itemsDf and DataFrame transactionsDf, using columns itemId and transactionId as join keys, respectively?

Options:

A.

itemsDf.join(transactionsDf, "inner", itemsDf.itemId == transactionsDf.transactionId)

B.

itemsDf.join(transactionsDf, itemId == transactionId)

C.

itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.transactionId, "inner")

D.

itemsDf.join(transactionsDf, "itemsDf.itemId == transactionsDf.transactionId", "inner")

E.

itemsDf.join(transactionsDf, col(itemsDf.itemId) == col(transactionsDf.transactionId))
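For reference, DataFrame.join() accepts the other DataFrame, a join expression, and the join type, in that order. A minimal sketch, assuming both DataFrames are defined in the session:

itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.transactionId, "inner")  # "inner" is also the default join type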

Question 2

Which of the following describes Spark's standalone deployment mode?

Options:

A.

Standalone mode uses a single JVM to run Spark driver and executor processes.

B.

Standalone mode means that the cluster does not contain the driver.

C.

Standalone mode is how Spark runs on YARN and Mesos clusters.

D.

Standalone mode uses only a single executor per worker per application.

E.

Standalone mode is a viable solution for clusters that run multiple frameworks, not only Spark.

Question 3

The code block shown below should return only the average prediction error (column predError) of a random subset, without replacement, of approximately 15% of rows in DataFrame transactionsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__, __3__).__4__(avg('predError'))

Options:

A.

1. sample

2. True

3. 0.15

4. filter

B.

1. sample

2. False

3. 0.15

4. select

C.

1. sample

2. 0.85

3. False

4. select

D.

1. fraction

2. 0.15

3. True

4. where

E.

1. fraction

2. False

3. 0.85

4. select
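For reference, DataFrame.sample(withReplacement, fraction) draws an approximate fraction of rows, and wrapping avg() in select() (or agg()) reduces the sample to its average. A minimal sketch, assuming transactionsDf is defined:

from pyspark.sql.functions import avg

transactionsDf.sample(False, 0.15).select(avg('predError'))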

Question 4

Which of the following DataFrame operators is never classified as a wide transformation?

Options:

A.

DataFrame.sort()

B.

DataFrame.aggregate()

C.

DataFrame.repartition()

D.

DataFrame.select()

E.

DataFrame.join()
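For reference, one way to check whether an operator is wide is to look for an Exchange (shuffle) node in the physical plan. A minimal sketch, assuming a DataFrame df with a column value:

df.select("value").explain()  # no Exchange node: select() is always narrow
df.sort("value").explain()    # plan contains Exchange: sort() shuffles data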

Question 5

Which of the following code blocks returns a single-row DataFrame that only has a column corr which shows the Pearson correlation coefficient between columns predError and value in DataFrame transactionsDf?

Options:

A.

transactionsDf.select(corr(["predError", "value"]).alias("corr")).first()

B.

transactionsDf.select(corr(col("predError"), col("value")).alias("corr")).first()

C.

transactionsDf.select(corr(predError, value).alias("corr"))

D.

transactionsDf.select(corr(col("predError"), col("value")).alias("corr"))

(Correct)

E.

transactionsDf.select(corr("predError", "value"))

Question 6

Which of the following code blocks reads in the JSON file stored at filePath as a DataFrame?

Options:

A.

spark.read.json(filePath)

B.

spark.read.path(filePath, source="json")

C.

spark.read().path(filePath)

D.

spark.read().json(filePath)

E.

spark.read.path(filePath)
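For reference, spark.read is a property (not a method) that returns a DataFrameReader with format-specific loaders. A minimal sketch, assuming filePath points to a JSON file:

df = spark.read.json(filePath)
df = spark.read.format("json").load(filePath)  # equivalent generic form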

Question 7

The code block displayed below contains an error. The code block is intended to perform an outer join of DataFrames transactionsDf and itemsDf on columns productId and itemId, respectively.

Find the error.

Code block:

transactionsDf.join(itemsDf, [itemsDf.itemId, transactionsDf.productId], "outer")

Options:

A.

The "outer" argument should be eliminated, since "outer" is the default join type.

B.

The join type needs to be appended to the join() operator, like join().outer() instead of listing it as the last argument inside the join() call.

C.

The term [itemsDf.itemId, transactionsDf.productId] should be replaced by itemsDf.itemId == transactionsDf.productId.

D.

The term [itemsDf.itemId, transactionsDf.productId] should be replaced by itemsDf.col("itemId") == transactionsDf.col("productId").

E.

The "outer" argument should be eliminated from the call and join should be replaced by joinOuter.

Question 8

Which of the following describes the most efficient way to resize a DataFrame from 16 to 8 partitions?

Options:

A.

Use operation DataFrame.repartition(8) to shuffle the DataFrame and reduce the number of partitions.

B.

Use operation DataFrame.coalesce(8) to fully shuffle the DataFrame and reduce the number of partitions.

C.

Use a narrow transformation to reduce the number of partitions.

D.

Use a wide transformation to reduce the number of partitions.

E.

Use operation DataFrame.coalesce(0.5) to halve the number of partitions in the DataFrame.
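For reference, coalesce() merges existing partitions without a full shuffle, making it cheaper than repartition() when only reducing the partition count. A minimal sketch, assuming a 16-partition DataFrame df:

df.coalesce(8)     # narrow: merges partitions in place, no shuffle
df.repartition(8)  # wide: triggers a full shuffle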

Question 9

Which of the following code blocks returns a new DataFrame in which column attributes of DataFrame itemsDf is renamed to feature0 and column supplier to feature1?

Options:

A.

itemsDf.withColumnRenamed(attributes, feature0).withColumnRenamed(supplier, feature1)

B.

1.itemsDf.withColumnRenamed("attributes", "feature0")

2.itemsDf.withColumnRenamed("supplier", "feature1")

C.

itemsDf.withColumnRenamed(col("attributes"), col("feature0"), col("supplier"), col("feature1"))

D.

itemsDf.withColumnRenamed("attributes", "feature0").withColumnRenamed("supplier", "feature1")

E.

itemsDf.withColumn("attributes", "feature0").withColumn("supplier", "feature1")

Question 10

Which of the following describes characteristics of the Spark driver?

Options:

A.

The Spark driver requests the transformation of operations into DAG computations from the worker nodes.

B.

If set in the Spark configuration, Spark scales the Spark driver horizontally to improve parallel processing performance.

C.

The Spark driver processes partitions in an optimized, distributed fashion.

D.

In a non-interactive Spark application, the Spark driver automatically creates the SparkSession object.

E.

The Spark driver's responsibility includes scheduling queries for execution on worker nodes.

Question 11

Which of the following statements about RDDs is incorrect?

Options:

A.

An RDD consists of a single partition.

B.

The high-level DataFrame API is built on top of the low-level RDD API.

C.

RDDs are immutable.

D.

RDD stands for Resilient Distributed Dataset.

E.

RDDs are great for precisely instructing Spark on how to do a query.

Question 12

Which of the following code blocks returns approximately 1000 rows, some of them potentially being duplicates, from the 2000-row DataFrame transactionsDf that only has unique rows?

Options:

A.

transactionsDf.sample(True, 0.5)

B.

transactionsDf.take(1000).distinct()

C.

transactionsDf.sample(False, 0.5)

D.

transactionsDf.take(1000)

E.

transactionsDf.sample(True, 0.5, force=True)
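For reference, the withReplacement=True flag allows sample() to pick the same source row more than once. A minimal sketch:

transactionsDf.sample(True, 0.5)  # roughly 1000 of 2000 rows, duplicates possible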

Question 13

Which of the following statements about Spark's DataFrames is incorrect?

Options:

A.

Spark's DataFrames are immutable.

B.

Spark's DataFrames are equal to Python's DataFrames.

C.

Data in DataFrames is organized into named columns.

D.

RDDs are at the core of DataFrames.

E.

The data in DataFrames may be split into multiple chunks.

Question 14

The code block displayed below contains multiple errors. The code block should return a DataFrame that contains only columns transactionId, predError, value and storeId of DataFrame transactionsDf. Find the errors.

Code block:

transactionsDf.select([col(productId), col(f)])

Sample of transactionsDf:

1.+-------------+---------+-----+-------+---------+----+
2.|transactionId|predError|value|storeId|productId|   f|
3.+-------------+---------+-----+-------+---------+----+
4.|            1|        3|    4|     25|        1|null|
5.|            2|        6|    7|      2|        2|null|
6.|            3|        3| null|     25|        3|null|
7.+-------------+---------+-----+-------+---------+----+

Options:

A.

The column names should be listed directly as arguments to the operator and not as a list.

B.

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all column names should be expressed as strings without being wrapped in a col() operator.

C.

The select operator should be replaced by a drop operator.

D.

The column names should be listed directly as arguments to the operator and not as a list and, following the pattern of how column names are expressed in the code block, columns productId and f should be replaced by transactionId, predError, value and storeId.

E.

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all col() operators should be removed.
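For reference, select() accepts plain column-name strings directly as arguments. A minimal sketch of the corrected code block:

transactionsDf.select("transactionId", "predError", "value", "storeId")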

Question 15

The code block shown below should return a one-column DataFrame where the column storeId is converted to string type. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__.__3__(__4__))

Options:

A.

1. select

2. col("storeId")

3. cast

4. StringType

B.

1. select

2. col("storeId")

3. as

4. StringType

C.

1. cast

2. "storeId"

3. as

4. StringType()

D.

1. select

2. col("storeId")

3. cast

4. StringType()

E.

1. select

2. storeId

3. cast

4. StringType()
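For reference, Column.cast() accepts either a DataType instance or a type-name string. A minimal sketch:

from pyspark.sql.functions import col
from pyspark.sql.types import StringType

transactionsDf.select(col("storeId").cast(StringType()))
transactionsDf.select(col("storeId").cast("string"))  # equivalent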

Question 16

Which of the following code blocks returns a DataFrame with approximately 1,000 rows from the 10,000-row DataFrame itemsDf, without any duplicates, returning the same rows even if the code block is run twice?

Options:

A.

itemsDf.sampleBy("row", fractions={0: 0.1}, seed=82371)

B.

itemsDf.sample(fraction=0.1, seed=87238)

C.

itemsDf.sample(fraction=1000, seed=98263)

D.

itemsDf.sample(withReplacement=True, fraction=0.1, seed=23536)

E.

itemsDf.sample(fraction=0.1)
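For reference, a fixed seed makes sample() deterministic across runs, and leaving withReplacement at its default of False avoids duplicates. A minimal sketch:

itemsDf.sample(fraction=0.1, seed=87238)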

Question 17

Which of the following code blocks selects all rows from DataFrame transactionsDf in which column productId is zero or smaller, or equal to 3?

Options:

A.

transactionsDf.filter(productId==3 or productId<1)

B.

transactionsDf.filter((col("productId")==3) or (col("productId")<1))

C.

transactionsDf.filter(col("productId")==3 | col("productId")<1)

D.

transactionsDf.where("productId"=3).or("productId"<1))

E.

transactionsDf.filter((col("productId")==3) | (col("productId")<1))

Question 18

Which of the following code blocks shows the structure of a DataFrame in a tree-like way, containing both column names and types?

Options:

A.

1.print(itemsDf.columns)

2.print(itemsDf.types)

B.

itemsDf.printSchema()

C.

spark.schema(itemsDf)

D.

itemsDf.rdd.printSchema()

E.

itemsDf.print.schema()
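For reference, printSchema() renders column names, types, and nullability as a tree. A minimal sketch, with hypothetical output for an itemsDf holding two columns:

itemsDf.printSchema()
# root
#  |-- itemId: integer (nullable = true)
#  |-- itemName: string (nullable = true)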

Question 19

Which of the following statements about Spark's configuration properties is incorrect?

Options:

A.

The maximum number of tasks that an executor can process at the same time is controlled by the spark.task.cpus property.

B.

The maximum number of tasks that an executor can process at the same time is controlled by the spark.executor.cores property.

C.

The default value for spark.sql.autoBroadcastJoinThreshold is 10MB.

D.

The default number of partitions to use when shuffling data for joins or aggregations is 300.

E.

The default number of partitions returned from certain transformations can be controlled by the spark.default.parallelism property.
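For reference, runtime configuration can be inspected through spark.conf; the shuffle-partition default is 200 rather than 300. A minimal sketch:

spark.conf.get("spark.sql.shuffle.partitions")  # returns "200" by default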

Question 20

Which of the following describes characteristics of the Spark UI?

Options:

A.

Via the Spark UI, workloads can be manually distributed across executors.

B.

Via the Spark UI, stage execution speed can be modified.

C.

The Scheduler tab shows how jobs that are run in parallel by multiple users are distributed across the cluster.

D.

There is a place in the Spark UI that shows the property spark.executor.memory.

E.

Some of the tabs in the Spark UI are named Jobs, Stages, Storage, DAGs, Executors, and SQL.

Question 21

Which of the following code blocks returns a new DataFrame with the same columns as DataFrame transactionsDf, except for columns predError and value which should be removed?

Options:

A.

transactionsDf.drop(["predError", "value"])

B.

transactionsDf.drop("predError", "value")

C.

transactionsDf.drop(col("predError"), col("value"))

D.

transactionsDf.drop(predError, value)

E.

transactionsDf.drop("predError & value")

Question 22

The code block shown below should add a column itemNameBetweenSeparators to DataFrame itemsDf. The column should contain arrays of maximum 4 strings. The arrays should be composed of the values in column itemName, which are separated at - or whitespace characters. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Sample of DataFrame itemsDf:

1.+------+----------------------------------+-------------------+
2.|itemId|itemName                          |supplier           |
3.+------+----------------------------------+-------------------+
4.|1     |Thick Coat for Walking in the Snow|Sports Company Inc.|
5.|2     |Elegant Outdoors Summer Dress     |YetiX              |
6.|3     |Outdoors Backpack                 |Sports Company Inc.|
7.+------+----------------------------------+-------------------+

Code block:

itemsDf.__1__(__2__, __3__(__4__, "[\s\-]", __5__))

Options:

A.

1. withColumn

2. "itemNameBetweenSeparators"

3. split

4. "itemName"

5. 4

(Correct)

B.

1. withColumnRenamed

2. "itemNameBetweenSeparators"

3. split

4. "itemName"

5. 4

C.

1. withColumnRenamed

2. "itemName"

3. split

4. "itemNameBetweenSeparators"

5. 4

D.

1. withColumn

2. "itemNameBetweenSeparators"

3. split

4. "itemName"

5. 5

E.

1. withColumn

2. itemNameBetweenSeparators

3. str_split

4. "itemName"

5. 5
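For reference, split() takes a column, a regular-expression pattern, and (since Spark 3.0) a limit on the resulting array length. A minimal sketch:

from pyspark.sql.functions import split

itemsDf.withColumn("itemNameBetweenSeparators", split("itemName", r"[\s\-]", 4))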

Question 23

The code block shown below should set the number of partitions that Spark uses when shuffling data for joins or aggregations to 100. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

__1__.__2__.__3__(__4__, 100)

Options:

A.

1. spark

2. conf

3. set

4. "spark.sql.shuffle.partitions"

B.

1. pyspark

2. config

3. set

4. spark.shuffle.partitions

C.

1. spark

2. conf

3. get

4. "spark.sql.shuffle.partitions"

D.

1. pyspark

2. config

3. set

4. "spark.sql.shuffle.partitions"

E.

1. spark

2. conf

3. set

4. "spark.sql.aggregate.partitions"

Question 24

Which of the following describes the role of the cluster manager?

Options:

A.

The cluster manager schedules tasks on the cluster in client mode.

B.

The cluster manager schedules tasks on the cluster in local mode.

C.

The cluster manager allocates resources to Spark applications and maintains the executor processes in client mode.

D.

The cluster manager allocates resources to Spark applications and maintains the executor processes in remote mode.

E.

The cluster manager allocates resources to the DataFrame manager.

Question 25

Which is the highest level in Spark's execution hierarchy?

Options:

A.

Task

B.

Executor

C.

Slot

D.

Job

E.

Stage

Question 26

The code block shown below should return a DataFrame with two columns, itemId and col. In this DataFrame, for each element in column attributes of DataFrame itemsDf there should be a separate row in which the column itemId contains the associated itemId from DataFrame itemsDf. The new DataFrame should only contain rows for rows in DataFrame itemsDf in which the column attributes contains the element cozy.

A sample of DataFrame itemsDf is below.

Code block:

itemsDf.__1__(__2__).__3__(__4__, __5__(__6__))

Options:

A.

1. filter

2. array_contains("cozy")

3. select

4. "itemId"

5. explode

6. "attributes"

B.

1. where

2. "array_contains(attributes, 'cozy')"

3. select

4. itemId

5. explode

6. attributes

C.

1. filter

2. "array_contains(attributes, 'cozy')"

3. select

4. "itemId"

5. map

6. "attributes"

D.

1. filter

2. "array_contains(attributes, cozy)"

3. select

4. "itemId"

5. explode

6. "attributes"

E.

1. filter

2. "array_contains(attributes, 'cozy')"

3. select

4. "itemId"

5. explode

6. "attributes"

Question 27

The code block shown below should return a new 2-column DataFrame that shows one attribute from column attributes per row next to the associated itemName, for all suppliers in column supplier whose name includes Sports. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Sample of DataFrame itemsDf:

1.+------+----------------------------------+-----------------------------+-------------------+
2.|itemId|itemName                          |attributes                   |supplier           |
3.+------+----------------------------------+-----------------------------+-------------------+
4.|1     |Thick Coat for Walking in the Snow|[blue, winter, cozy]         |Sports Company Inc.|
5.|2     |Elegant Outdoors Summer Dress     |[red, summer, fresh, cooling]|YetiX              |
6.|3     |Outdoors Backpack                 |[green, summer, travel]      |Sports Company Inc.|
7.+------+----------------------------------+-----------------------------+-------------------+

Code block:

itemsDf.__1__(__2__).select(__3__, __4__)

Options:

A.

1. filter

2. col("supplier").isin("Sports")

3. "itemName"

4. explode(col("attributes"))

B.

1. where

2. col("supplier").contains("Sports")

3. "itemName"

4. "attributes"

C.

1. where

2. col(supplier).contains("Sports")

3. explode(attributes)

4. itemName

D.

1. where

2. "Sports".isin(col("Supplier"))

3. "itemName"

4. array_explode("attributes")

E.

1. filter

2. col("supplier").contains("Sports")

3. "itemName"

4. explode("attributes")
