Dell Data Science Foundations Questions and Answers
In addition to quantitative and technical skills, what is a key aspect of the profile of a data scientist?
What metrics are used to help calculate relevance in text analysis?
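Relevance in text analysis is commonly scored with TF-IDF, the product of term frequency (TF) and inverse document frequency (IDF). The sketch below uses one common variant: raw counts for TF and log(N / df) for IDF. The function name and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def tf_idf(term, doc, corpus):
    """TF-IDF weight of `term` in `doc` (a list of tokens) relative to `corpus`.

    TF  = raw count of the term in the document.
    IDF = log(N / df), where N is the number of documents and df is the
          number of documents that contain the term.
    """
    tf = Counter(doc)[term]
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0  # term never appears, so it carries no relevance signal
    return tf * math.log(len(corpus) / df)

# Hypothetical toy corpus: each document is already tokenized.
corpus = [
    ["data", "science", "data"],
    ["science", "fiction"],
    ["data", "engineering"],
]
for term in ("data", "science", "fiction"):
    print(term, round(tf_idf(term, corpus[0], corpus), 3))
```

A term that appears in every document gets IDF = log(1) = 0, so very common words contribute nothing to relevance.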
In time series analysis, what function is examined to identify the order of the autoregressive component of an ARIMA model?
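For an autoregressive process of order p, the partial autocorrelation function (PACF) drops to roughly zero after lag p. This is why it is used to choose the AR order of an ARIMA model, once any differencing has made the series stationary. The sketch below computes the sample PACF with the Durbin-Levinson recursion in plain Python and applies it to a simulated AR(1) series. The simulation settings (coefficient 0.7, 5,000 points, seed 42) are assumptions for illustration.

```python
import random

def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (biased estimator, divides by n)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]

def pacf(x, max_lag):
    """Partial autocorrelations for lags 1..max_lag via Durbin-Levinson."""
    r = acf(x, max_lag)
    result = []
    phi_prev = []  # AR coefficients of the order-(k-1) fit
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den  # the lag-k partial autocorrelation
        phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
               for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
        phi_prev = phi
    return result

# Simulated AR(1) series: x_t = 0.7 * x_{t-1} + noise.
rng = random.Random(42)
x = [0.0]
for _ in range(4999):
    x.append(0.7 * x[-1] + rng.gauss(0, 1))
print([round(v, 2) for v in pacf(x, 4)])
```

For this series the lag-1 value should be close to 0.7 and the later lags close to zero, pointing to p = 1. In practice a library routine such as `pacf` in statsmodels would normally be used instead of hand-rolled code.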
Which tools are categorized as cluster and workflow management tools for Hadoop?
How should project results be communicated to executives and the project sponsor?
Refer to the exhibit, which shows pairwise counts for items purchased together.
Consider the following association rule: Milk -> Eggs
What is the value of the lift?
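Lift compares how often two items are bought together with how often they would co-occur if they were independent: lift(X -> Y) = support(X ∪ Y) / (support(X) × support(Y)). The exhibit's counts are not reproduced here, so the numbers in this sketch are hypothetical, and it assumes the total number of transactions is known.

```python
def lift(count_xy, count_x, count_y, n_transactions):
    """Lift of the rule X -> Y from raw transaction counts."""
    support_xy = count_xy / n_transactions  # fraction containing both X and Y
    support_x = count_x / n_transactions
    support_y = count_y / n_transactions
    return support_xy / (support_x * support_y)

# Hypothetical counts for Milk -> Eggs (NOT the exhibit's values).
print(lift(count_xy=30, count_x=40, count_y=50, n_transactions=100))
```

A lift above 1 suggests the two items are bought together more often than chance would predict; a lift of exactly 1 indicates independence.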
Refer to the exhibit.
To predict whether or not a customer will renew their annual property insurance policy, an insurance company built and operationalized a naïve Bayes classification model. In the model, there are two class labels, renewal and non-renewal, that are assigned to each customer based on their attributes.
A subset of the key attributes, their values, and corresponding conditional probabilities are provided in the exhibit.
A customer has the following attributes:
● Age is greater than 65 years
● Owns their own home
● Renewal month is August
If 20% of customers do not renew their policy each year, what is the score for renewal in the naïve Bayes model for the customer described above?
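In naïve Bayes, the score for a class is the class prior multiplied by the conditional probability of each observed attribute given that class. Since 20% of customers do not renew, P(renewal) = 0.8. The conditional probabilities come from the exhibit, which is not reproduced here, so the values below are hypothetical placeholders to be replaced with the exhibit's figures.

```python
from math import prod

def naive_bayes_score(prior, conditionals):
    """Unnormalized naïve Bayes score: P(class) * product of P(attribute | class)."""
    return prior * prod(conditionals)

p_renewal = 1 - 0.20  # 20% do not renew, so P(renewal) = 0.8

# HYPOTHETICAL conditional probabilities P(attribute | renewal);
# replace them with the values given in the exhibit.
conditionals_given_renewal = {
    "age > 65": 0.5,
    "owns home": 0.5,
    "renewal month = August": 0.25,
}

score = naive_bayes_score(p_renewal, conditionals_given_renewal.values())
print(round(score, 4))
```

The same calculation with P(non-renewal) = 0.2 and the non-renewal conditionals gives the competing score; the customer is assigned the class with the higher score.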
MapReduce is designed to process data in which way?
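MapReduce processes large data sets in parallel across the nodes of a cluster: map tasks emit key/value pairs, the framework groups those pairs by key (the shuffle), and reduce tasks aggregate each group. The single-process word count below only simulates the three stages; it is not Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word, 1) for word in line.lower().split()]

def shuffle_phase(pairs):
    """Shuffle: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: aggregate all values that share a key."""
    return key, sum(values)

# Hypothetical input records; on a cluster each would go to a map task.
lines = ["the quick brown fox", "the lazy dog", "The end"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(counts)
```

Because each map call depends only on its own record and each reduce call only on its own key, both stages can run on many machines at once, which is what lets MapReduce scale out.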
You have been given the task of improving your organization's sales force compensation. As a result of a study, your team decides to classify personnel as follows:
● Did not meet quota
● Met quota
● Exceeded 150% of quota
In which data analytics lifecycle phase should you define these categories for analysis purposes?
When should you consider using multinomial logistic regression over binary logistic regression?
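Multinomial logistic regression applies when the outcome has more than two categories; binary logistic regression is the two-class special case. The sketch below shows the link: softmax turns one linear score per class into class probabilities, and with two classes, fixing the reference class's score at 0, it reduces to the logistic (sigmoid) function. The scores are hypothetical and no model is fitted.

```python
import math

def softmax(scores):
    """Turn one linear score per class into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Logistic function used by binary logistic regression."""
    return 1 / (1 + math.exp(-z))

# Hypothetical linear scores for a three-class outcome (multinomial case).
print([round(p, 3) for p in softmax([0.2, -1.0, 2.5])])

# Two classes with the reference class fixed at score 0: softmax matches sigmoid.
z = 1.3
print(softmax([0.0, z])[1], sigmoid(z))
```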
After which phase of the data analytics lifecycle should you determine if the model needs any recalibration?
What action occurs during feature selection in the model building phase of the data analytics lifecycle?
What is the similarity between the matrix and array data structures in R?
In K-means clustering, what is a graph of the WSS versus the value of K used to help determine?
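The within-cluster sum of squares (WSS) always falls as K grows, so the usual heuristic is to look for the "elbow" where adding another cluster stops reducing WSS substantially. This sketch runs a basic Lloyd's k-means on hypothetical 2-D data with three well-separated groups and records WSS for K = 1 to 5. It swaps in a deterministic farthest-first initialization instead of random starts so the output is repeatable.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def farthest_first_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers

def kmeans_wss(points, k, max_iter=100):
    """Run Lloyd's k-means and return the final WSS."""
    centers = farthest_first_init(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as its cluster mean; keep the old center if a cluster is empty.
        new_centers = [mean(cl) if cl else centers[i] for i, cl in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(dist2(p, c) for c in centers) for p in points)

# Hypothetical data: a 3x3 grid of points around each of three centers.
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 10), (20, 0)]
          for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
wss = {k: kmeans_wss(points, k) for k in range(1, 6)}
for k, w in wss.items():
    print(k, round(w, 1))
```

On this data WSS drops sharply up to K = 3 and only slightly after that, so the elbow points to three clusters.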
Refer to the exhibit.
What is the approximate R-squared value for a linear regression model fitted to the data associated with this scatterplot?
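R-squared is the share of the variance in y explained by the fitted line, R² = 1 − SS_res / SS_tot. For simple linear regression it equals the squared Pearson correlation, so how tightly the points hug a straight line gives a rough visual estimate. The sketch fits an ordinary least-squares line and computes R² on hypothetical data, since the exhibit's points are not available.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def r_squared(xs, ys):
    """R² = 1 - SS_res / SS_tot for the least-squares line."""
    b0, b1 = fit_line(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data, not the exhibit's scatterplot.
print(round(r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))
```

For these five points the fitted line is y = 2.2 + 0.6x and R² works out to 0.6; points lying exactly on a line give R² = 1, and a shapeless cloud gives a value near 0.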
When using association rules, what is an itemset?
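An itemset is a collection of one or more items, such as {Milk, Eggs}; a k-itemset contains k items. The sketch below counts the support of every k-itemset in a few hypothetical transactions, using frozensets so that item order does not matter.

```python
from collections import Counter
from itertools import combinations

def itemset_support(transactions, size):
    """Support of every itemset with `size` items: fraction of transactions containing it."""
    counts = Counter()
    for t in transactions:
        for combo in combinations(sorted(t), size):
            counts[frozenset(combo)] += 1
    n = len(transactions)
    return {itemset: c / n for itemset, c in counts.items()}

# Hypothetical market-basket transactions (each is a set of items).
transactions = [
    {"milk", "eggs", "bread"},
    {"milk", "eggs"},
    {"bread", "butter"},
    {"milk", "bread"},
]
for itemset, support in itemset_support(transactions, 2).items():
    print(sorted(itemset), support)
```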
What are three built-in data types in the R programming language?