SKILL BENCHMARK

# Statistics for Data Analysis Proficiency (Advanced Level)

• 20 minutes
• 20 questions
The Statistics for Data Analysis Proficiency (Advanced Level) benchmark measures your ability to recall and relate the statistics and probability concepts that underlie data analysis, and to perform statistical analyses and probability calculations using Python and pandas. You will be evaluated on your ability to recognize deeper statistics and probability concepts, such as descriptive and inferential statistics, probability distributions, Bayesian networks, and hypothesis tests. A learner who scores high on this benchmark demonstrates the strong statistics and probability skills required to work independently on data analytics projects.

## Topics covered

• analyze a uniform distribution by using cumulative distribution and probability density functions
• compute marginal probabilities
• define a Bayesian model in Python
• describe and compare skewness and kurtosis
• describe the Poisson distribution and its applications
• estimate a population's mean with confidence intervals
• explore cumulative distribution, probability mass, and survival functions with binomial data
• explore the probability tables of nodes in a Bayesian network
• invoke functions available in SciPy to work with Poisson distributions
• outline one-way ANOVA and linear regression
• outline the relationship between type I errors and alpha levels
• outline the relationship between type II errors and alpha levels
• predict values with Bayesian models
• recall the central limit theorem and recognize its applications
• recognize the use of the Mann-Whitney U-test
• simulate the rolling of two dice to test joint probability
• use SciPy to generate uniformly distributed samples
• use the cumulative distribution function (CDF) of a normal distribution and recognize how the mean and standard deviation (SD) influence it
• use two-way ANOVA with interaction between the independent variables
• visualize the cumulative distribution function (CDF) for different standard deviations
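Several of the topics above rely on SciPy's probability distributions. The following is a minimal sketch, assuming SciPy and NumPy are installed. It shows how `scipy.stats` exposes the PMF, CDF, survival function, PDF, and random sampling for the Poisson, binomial, uniform, and normal distributions, and it adds a simple two-dice simulation. All parameter values (the Poisson rate, trial count, interval bounds, sample sizes, and random seeds) are arbitrary illustrative choices and are not taken from the benchmark.

```python
import numpy as np
from scipy import stats

# Poisson distribution with a hypothetical mean rate of 3 events per interval
poisson = stats.poisson(mu=3)
p_exactly_2 = poisson.pmf(2)    # P(X = 2)
p_at_most_2 = poisson.cdf(2)    # P(X <= 2)

# Binomial distribution: hypothetical 10 trials, success probability 0.5
binom = stats.binom(n=10, p=0.5)
p_exactly_5 = binom.pmf(5)      # probability mass function: P(X = 5)
p_at_most_5 = binom.cdf(5)      # cumulative distribution function: P(X <= 5)
p_more_than_5 = binom.sf(5)     # survival function: P(X > 5) = 1 - CDF(5)

# Continuous uniform distribution on [0, 10]; SciPy uses loc (lower bound) and scale (width)
uniform = stats.uniform(loc=0, scale=10)
density = uniform.pdf(4)        # constant density 1/10 inside the interval
p_below_4 = uniform.cdf(4)      # P(X <= 4)
samples = uniform.rvs(size=1000, random_state=42)  # uniformly distributed samples

# Normal CDF evaluated at x = 1 for a fixed mean of 0 and several standard deviations
cdf_at_1 = {sd: stats.norm(loc=0, scale=sd).cdf(1.0) for sd in (0.5, 1.0, 2.0)}

# Simulate rolling two fair dice and estimate the joint probability of a double six
rng = np.random.default_rng(0)
die1 = rng.integers(1, 7, size=100_000)  # upper bound is exclusive, so values are 1..6
die2 = rng.integers(1, 7, size=100_000)
p_double_six = np.mean((die1 == 6) & (die2 == 6))  # theoretical value is 1/36

print(f"Poisson P(X=2)={p_exactly_2:.4f}, P(X<=2)={p_at_most_2:.4f}")
print(f"Binomial P(X=5)={p_exactly_5:.4f}, P(X<=5)={p_at_most_5:.4f}, P(X>5)={p_more_than_5:.4f}")
print(f"Uniform pdf(4)={density:.2f}, cdf(4)={p_below_4:.2f}")
print("Normal CDF at x=1 by SD:", {sd: round(float(v), 4) for sd, v in cdf_at_1.items()})
print(f"Simulated P(double six)={p_double_six:.4f} vs 1/36={1/36:.4f}")
```

Because x = 1 lies above the mean of 0, the normal CDF at that point shrinks as the standard deviation grows: a wider distribution places more probability mass beyond x = 1. The binomial `cdf` and `sf` values at the same point always sum to 1, which is a quick sanity check when exploring survival functions.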