SKILL BENCHMARK

# Statistics for Data Analysis Competency (Intermediate Level)

• 20m
• 20 questions
The Statistics for Data Analysis Competency (Intermediate Level) benchmark measures your ability to recall and relate the statistics and probability concepts that underlie data analysis, and to perform statistical analysis and probability calculations using Python and pandas. You will be evaluated on your ability to recognize concepts such as data types, descriptive and inferential statistics and their applications, marginal and joint probabilities, and Bayes' rule, and on your ability to perform statistical and probability calculations with Python. A learner who scores high on this benchmark demonstrates good statistics and probability skills and can work on data analysis projects with minimal supervision.

## Topics covered

• analyze and visualize data using box plots
• calculate joint probabilities associated with the rolling of a die
• calculate marginal and conditional probability on dependent variables
• calculate the mean and median of a distribution using your own function and compare it with the built-in pandas function
• compute and visualize the standard deviation and variance of a distribution
• compute conditional probabilities
• compute marginal probabilities
• create a balanced sample using random undersampling and oversampling
• define and understand Bayes' theorem
• define joint, marginal, and conditional probability
• describe different types of probability distributions and where they occur
• describe the architecture of Bayesian networks
• explore the probability tables of nodes in a Bayesian network
• identify what different statistical terms represent
• implement simple random and stratified sampling on a data frame
• link the definitions of marginal and conditional probability
• outline the chain rule of probability
• query Bayesian networks to measure probabilities
• recognize how data is distributed using histograms and violin plots
• use pandas to generate a sample using cluster and systematic sampling
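
To give a feel for the kind of calculation several of the topics above refer to, here is a minimal Python sketch. It is not taken from the benchmark itself: the sample values in `scores`, the helper names `mean`, `median`, and `prob`, and the chosen die events are all assumptions made for illustration. It compares hand-written mean and median functions with the pandas built-ins, then computes joint, marginal, and conditional probabilities for two rolls of a fair die and checks them against Bayes' theorem.

```python
from fractions import Fraction
from itertools import product

import pandas as pd


def mean(values):
    """Arithmetic mean computed by hand."""
    values = list(values)
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical sample data, chosen only for illustration.
scores = pd.Series([2, 4, 4, 5, 7, 9, 11])

hand_mean = mean(scores.tolist())
hand_median = median(scores.tolist())

# Compare the hand-written results with the built-in pandas methods.
same_mean = abs(hand_mean - scores.mean()) < 1e-12
same_median = hand_median == scores.median()

# Sample space for two independent rolls of a fair six-sided die.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Exact probability of an event over the equally likely outcomes."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


# Joint probability: first roll is a six AND second roll is even.
p_joint = prob(lambda o: o[0] == 6 and o[1] % 2 == 0)

# Marginal probabilities of each event on its own.
p_first_six = prob(lambda o: o[0] == 6)
p_second_even = prob(lambda o: o[1] % 2 == 0)

# Conditional probability: P(second even | first six) = joint / marginal.
p_cond = p_joint / p_first_six

# Bayes' theorem: P(first six | second even).
p_bayes = p_cond * p_first_six / p_second_even

print(hand_mean, hand_median, same_mean, same_median)
print(p_joint, p_first_six, p_second_even, p_cond, p_bayes)
```

Using `Fraction` keeps the probabilities exact, so the relationship between the joint, marginal, and conditional values can be checked without floating-point rounding.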