Databricks Certified Associate Developer for Apache Spark 3.0 Exam Questions and Answers
Which of the following code blocks performs an inner join between DataFrame itemsDf and DataFrame transactionsDf, using columns itemId and transactionId as join keys, respectively?
Which of the following describes Spark's standalone deployment mode?
The code block shown below should return only the average prediction error (column predError) of a random subset, without replacement, of approximately 15% of rows in DataFrame transactionsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__(__2__, __3__).__4__(avg('predError'))
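As a rough illustration of what this question asks for, the plain-Python sketch below models sampling without replacement as an independent coin flip per row: each row is kept with probability 0.15, so no row can appear twice and the sample is only approximately 15% of the input. It does not call Spark; the rows, the seed, and the helper name bernoulli_sample are hypothetical. In PySpark, one plausible way to fill the blanks is transactionsDf.sample(False, 0.15).select(avg('predError')).

```python
import random


def bernoulli_sample(rows, fraction, seed):
    """Keep each row independently with probability `fraction` (hypothetical helper)."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


# Hypothetical stand-in for transactionsDf: 1000 rows with a predError field.
rows = [{"transactionId": i, "predError": i % 7} for i in range(1000)]

sample = bernoulli_sample(rows, fraction=0.15, seed=42)

# Average predError over the sampled rows (None if the sample happens to be empty).
avg_pred_error = (
    sum(r["predError"] for r in sample) / len(sample) if sample else None
)

print(len(sample), avg_pred_error)
```

The sample size varies from seed to seed, which is why the question says "approximately 15%".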
Which of the following DataFrame operators is never classified as a wide transformation?
Which of the following code blocks returns a single-row DataFrame that only has a column corr which shows the Pearson correlation coefficient between columns predError and value in DataFrame transactionsDf?
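For reference, the Pearson correlation coefficient that this question refers to can be computed by hand as the covariance of the two columns divided by the product of their standard deviations. The sketch below is plain Python with hypothetical input values and a hypothetical helper name; it is not Spark code. In PySpark, a plausible equivalent is transactionsDf.select(corr('predError', 'value').alias('corr')), using corr from pyspark.sql.functions.

```python
import math


def pearson_corr(xs, ys):
    """Pearson correlation of two equally long numeric sequences (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical column values, chosen so the result is easy to check by eye.
print(pearson_corr([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, increasing
print(pearson_corr([1, 2, 3], [3, 2, 1]))        # perfectly linear, decreasing
```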
Which of the following code blocks reads in the JSON file stored at filePath as a DataFrame?
The code block displayed below contains an error. The code block is intended to perform an outer join of DataFrames transactionsDf and itemsDf on columns productId and itemId, respectively.
Find the error.
Code block:
transactionsDf.join(itemsDf, [itemsDf.itemId, transactionsDf.productId], "outer")
Which of the following describes the most efficient way to resize a DataFrame from 16 to 8 partitions?
Which of the following code blocks returns a new DataFrame in which column attributes of DataFrame itemsDf is renamed to feature0 and column supplier to feature1?
Which of the following describes characteristics of the Spark driver?
Which of the following statements about RDDs is incorrect?
Which of the following code blocks returns approximately 1000 rows, some of them potentially being duplicates, from the 2000-row DataFrame transactionsDf that only has unique rows?
Which of the following statements about Spark's DataFrames is incorrect?
The code block displayed below contains multiple errors. The code block should return a DataFrame that contains only columns transactionId, predError, value and storeId of DataFrame transactionsDf. Find the errors.
Code block:
transactionsDf.select([col(productId), col(f)])
Sample of transactionsDf:
+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
+-------------+---------+-----+-------+---------+----+
The code block shown below should return a one-column DataFrame where the column storeId is converted to string type. Choose the answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__(__2__.__3__(__4__))
Which of the following code blocks returns a DataFrame with approximately 1,000 rows from the 10,000-row DataFrame itemsDf, without any duplicates, returning the same rows even if the code block is run twice?
Which of the following code blocks selects all rows from DataFrame transactionsDf in which column productId is zero or smaller or equal to 3?
Which of the following code blocks shows the structure of a DataFrame in a tree-like way, containing both column names and types?
Which of the following statements about Spark's configuration properties is incorrect?
Which of the following describes characteristics of the Spark UI?
Which of the following code blocks returns a new DataFrame with the same columns as DataFrame transactionsDf, except for columns predError and value which should be removed?
The code block shown below should add a column itemNameBetweenSeparators to DataFrame itemsDf. The column should contain arrays of at most 4 strings. The arrays should be composed of the values in column itemName which are separated at - or whitespace characters. Choose the answer that correctly fills the blanks in the code block to accomplish this.
Sample of DataFrame itemsDf:
+------+----------------------------------+-------------------+
|itemId|itemName                          |supplier           |
+------+----------------------------------+-------------------+
|1     |Thick Coat for Walking in the Snow|Sports Company Inc.|
|2     |Elegant Outdoors Summer Dress     |YetiX              |
|3     |Outdoors Backpack                 |Sports Company Inc.|
+------+----------------------------------+-------------------+
Code block:
itemsDf.__1__(__2__, __3__(__4__, "[\s\-]", __5__))
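To make the "at most 4 strings" requirement concrete, the plain-Python sketch below applies the same regular expression to the sample item names with a cap of four pieces, so that everything after the third separator stays in the last element. It uses Python's re module rather than Spark, and the helper name split_with_limit is hypothetical. For these inputs it should match what a limited split in Spark produces; in PySpark 3.0, a plausible way to fill the blanks is itemsDf.withColumn("itemNameBetweenSeparators", split("itemName", "[\s\-]", 4)).

```python
import re


def split_with_limit(text, pattern, limit):
    """Split `text` on `pattern` into at most `limit` pieces (limit > 0).

    Hypothetical helper: Python counts the number of splits, so a cap of
    `limit` pieces corresponds to `maxsplit=limit - 1`.
    """
    return re.split(pattern, text, maxsplit=limit - 1)


item_names = [
    "Thick Coat for Walking in the Snow",
    "Elegant Outdoors Summer Dress",
    "Outdoors Backpack",
]

for name in item_names:
    print(split_with_limit(name, r"[\s\-]", 4))
# ['Thick', 'Coat', 'for', 'Walking in the Snow']
# ['Elegant', 'Outdoors', 'Summer', 'Dress']
# ['Outdoors', 'Backpack']
```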
The code block shown below should set the number of partitions that Spark uses when shuffling data for joins or aggregations to 100. Choose the answer that correctly fills the blanks in the code block to accomplish this.
spark.sql.shuffle.partitions
__1__.__2__.__3__(__4__, 100)
Which of the following describes the role of the cluster manager?
Which is the highest level in Spark's execution hierarchy?
The code block shown below should return a DataFrame with two columns, itemId and col. In this DataFrame, for each element in column attributes of DataFrame itemsDf there should be a separate row in which the column itemId contains the associated itemId from DataFrame itemsDf. The new DataFrame should only contain rows for rows in DataFrame itemsDf in which the column attributes contains the element cozy.
A sample of DataFrame itemsDf is below.
Code block:
itemsDf.__1__(__2__).__3__(__4__, __5__(__6__))
The code block shown below should return a new 2-column DataFrame that shows one attribute from column attributes per row next to the associated itemName, for all suppliers in column supplier whose name includes Sports. Choose the answer that correctly fills the blanks in the code block to accomplish this.
Sample of DataFrame itemsDf:
+------+----------------------------------+-----------------------------+-------------------+
|itemId|itemName                          |attributes                   |supplier           |
+------+----------------------------------+-----------------------------+-------------------+
|1     |Thick Coat for Walking in the Snow|[blue, winter, cozy]         |Sports Company Inc.|
|2     |Elegant Outdoors Summer Dress     |[red, summer, fresh, cooling]|YetiX              |
|3     |Outdoors Backpack                 |[green, summer, travel]      |Sports Company Inc.|
+------+----------------------------------+-----------------------------+-------------------+
Code block:
itemsDf.__1__(__2__).select(__3__, __4__)
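The "one attribute per row" shape asked for here can be sketched in plain Python over the sample rows: keep the rows whose supplier contains Sports, then emit one (itemName, attribute) pair per array element. This models the filter-then-explode pattern only and does not call Spark; the helper name explode_attributes is hypothetical. In PySpark, a plausible way to fill the blanks is itemsDf.filter(col("supplier").contains("Sports")).select("itemName", explode("attributes")).

```python
# Sample rows of itemsDf from the question, as plain dictionaries.
items = [
    {"itemId": 1, "itemName": "Thick Coat for Walking in the Snow",
     "attributes": ["blue", "winter", "cozy"], "supplier": "Sports Company Inc."},
    {"itemId": 2, "itemName": "Elegant Outdoors Summer Dress",
     "attributes": ["red", "summer", "fresh", "cooling"], "supplier": "YetiX"},
    {"itemId": 3, "itemName": "Outdoors Backpack",
     "attributes": ["green", "summer", "travel"], "supplier": "Sports Company Inc."},
]


def explode_attributes(rows, supplier_substring):
    """Return one (itemName, attribute) pair per attribute of each matching row."""
    result = []
    for row in rows:
        if supplier_substring in row["supplier"]:  # filter step
            for attribute in row["attributes"]:    # explode step
                result.append((row["itemName"], attribute))
    return result


exploded = explode_attributes(items, "Sports")
for item_name, attribute in exploded:
    print(item_name, "|", attribute)
```

With the sample data this yields six rows: three for the coat and three for the backpack, while the YetiX dress is filtered out.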