A career in data science requires a fundamental understanding of statistics and the ability to apply and visualize it. Discover how to use the NumPy, Pandas, and SciPy libraries to perform various statistical summary operations on real datasets and how to visualize your datasets in the context of these summaries using Matplotlib.

Data Science Statistics: Using Python to Compute & Visualize Statistics

Course Overview

create and configure simple graphs with lines and markers using the Matplotlib data visualization library

use the NumPy library to manipulate arrays and the Pandas library to load and analyze a dataset

generate histograms and pie charts to analyze distributions and create scatter plots to plot the relationship between two variables in a dataset

apply built-in Python functions such as max() and sum() to summarize distributions and visualize these values using Matplotlib

use NumPy to compute statistics such as the mean and median on your data

calculate statistics such as the mode and the standard error of the mean using the SciPy library and compute additional statistics such as the variance and values at various percentiles using NumPy

use NumPy to compute the correlation and covariance of two distributions and visualize their relationship with scatter plots

standardize a distribution to express its values as z-scores and use Pandas to generate a correlation and covariance matrix for your dataset

create and configure a graph using Matplotlib, enumerate the details conveyed in a box plot, compute statistical values using NumPy functions, and compute the correlations between all pairs of columns in a Pandas DataFrame
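As a rough illustration of the kind of summary operations listed above, the following minimal sketch computes several of the named statistics with NumPy, SciPy, and Pandas. The `heights` and `weights` arrays and the column names are hypothetical values made up for this example; they are not taken from the course datasets, and the course itself may approach these steps differently.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical sample data, invented purely for illustration.
heights = np.array([150.0, 160.0, 160.0, 170.0, 180.0])
weights = np.array([50.0, 58.0, 62.0, 68.0, 80.0])

# Central tendency and spread with NumPy.
mean_height = np.mean(heights)
median_height = np.median(heights)
variance_height = np.var(heights, ddof=1)  # ddof=1 gives the sample variance
p75_height = np.percentile(heights, 75)    # value at the 75th percentile

# Mode and standard error of the mean with SciPy.
# np.atleast_1d handles SciPy versions that return either a scalar or an array.
mode_height = np.atleast_1d(stats.mode(heights).mode)[0]
sem_height = stats.sem(heights)

# Correlation and covariance between two distributions with NumPy.
corr = np.corrcoef(heights, weights)[0, 1]
cov = np.cov(heights, weights)[0, 1]

# Standardize a distribution so its values are expressed as z-scores
# (here using the population standard deviation, NumPy's default).
z_scores = (heights - heights.mean()) / heights.std()

# Correlation and covariance matrices for all column pairs with Pandas.
df = pd.DataFrame({"height": heights, "weight": weights})
corr_matrix = df.corr()
cov_matrix = df.cov()
```

The off-diagonal entries of `corr_matrix` and `cov_matrix` should agree with `corr` and `cov`, since Pandas uses the Pearson correlation and the sample covariance by default.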