Core Statistical Concepts: Statistics & Sampling with Python

Statistics    |    Beginner
  • 11 Videos | 1h 40m 26s
  • Includes Assessment
  • Earns a Badge
Data is one of the most valuable assets a business has, but it's only as valuable as the methods used to interpret it. Data science, which at its core includes statistics and sampling, is the key to data interpretation. In this course, practice using the pandas library in Python to work with statistics and sampling. Practice loading data from a CSV file into a pandas DataFrame. Compute a variety of statistics, such as the mean, median, variance, and standard deviation. While doing so, see how to visualize the relationship between a distribution and the statistics computed from it. Moving along, implement several sampling techniques, such as stratified sampling and cluster sampling. Then, explore how a balanced sample can be created from an imbalanced dataset using the imblearn (imbalanced-learn) module in Python. Upon completion, you'll be able to generate samples and compute statistics using various tools and methods.
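
The description mentions loading a CSV file into a pandas DataFrame and computing statistics on it. The following is a minimal sketch of that workflow, not the course's own code: the CSV_TEXT string and its score column are hypothetical data held in memory so the example is self-contained. It compares hand-written mean, median, and variance functions with the built-in pandas methods.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a real data file (assumption).
CSV_TEXT = """name,score
a,2
b,4
c,4
d,4
e,5
f,5
g,7
h,9
"""


def my_mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    values = list(values)
    return sum(values) / len(values)


def my_median(values):
    """Middle value of the sorted data; for an even count, the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def my_variance(values, ddof=1):
    """Sum of squared deviations from the mean divided by (n - ddof).

    ddof=1 gives the sample variance, which matches the pandas default;
    ddof=0 gives the population variance.
    """
    values = list(values)
    m = my_mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - ddof)


# Load the CSV text into a DataFrame, exactly as pd.read_csv would load a file.
df = pd.read_csv(io.StringIO(CSV_TEXT))
scores = df["score"]

# Initial analysis: summary statistics for the numeric column.
print(df.describe())

# Hand-written results next to the built-in pandas equivalents.
print("mean:", my_mean(scores), scores.mean())
print("median:", my_median(scores), scores.median())
print("mode:", scores.mode().tolist())
print("variance:", my_variance(scores), scores.var())
print("std dev:", my_variance(scores) ** 0.5, scores.std())
```

Note that `Series.var()` and `Series.std()` default to `ddof=1` (sample statistics), so a hand-written version must use the same divisor for the two results to agree.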

WHAT YOU WILL LEARN

  • discover the key concepts covered in this course
  • install the latest versions of pandas and the visualization modules used to analyze data
  • load data from a CSV file into a pandas DataFrame and perform some initial analysis
  • calculate the mean and median of a distribution using your own functions and compare them with the built-in pandas functions
  • use Seaborn and Matplotlib to visualize a distribution and show where the mean, median, and mode fall within it
  • compute and visualize the variance and standard deviation of a distribution
  • implement simple random and stratified sampling on a DataFrame
  • use pandas to generate samples using cluster and systematic sampling
  • create a balanced sample using random undersampling and oversampling
  • generate synthetic data to create a balanced sample using the Synthetic Minority Over-sampling Technique (SMOTE)
  • summarize the key concepts covered in this course
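
The objectives above name several sampling and balancing techniques. The following is a minimal pandas-only sketch of one common way to express each of them; it is not necessarily how the course implements them. The 20-row DataFrame, its region (cluster) and label (class) columns, and the sample sizes are all hypothetical. Stratified sampling here draws the same fraction from each label group, which is one choice among several.

```python
import numpy as np
import pandas as pd

# Hypothetical imbalanced dataset (assumption): 'region' acts as the cluster
# column and 'label' as the class column (16 rows of class 0, 4 rows of class 1).
df = pd.DataFrame({
    "id": range(20),
    "region": ["north", "south", "east", "west"] * 5,
    "label": [1 if i % 5 == 0 else 0 for i in range(20)],
})
rng = np.random.default_rng(42)

# Simple random sampling: every row has the same chance of being chosen.
simple = df.sample(n=5, random_state=42)

# Stratified sampling: draw the same fraction from each class so the
# sample keeps the original class proportions (4 zeros, 1 one).
stratified = df.groupby("label").sample(frac=0.25, random_state=42)

# Cluster sampling: pick whole clusters at random and keep all of their rows.
chosen = rng.choice(df["region"].unique(), size=2, replace=False)
cluster = df[df["region"].isin(chosen)]

# Systematic sampling: take every k-th row, starting at a random offset below k.
k = 4
start = int(rng.integers(k))
systematic = df.iloc[start::k]

# Split by class for the balancing techniques.
counts = df["label"].value_counts()
minority_label = counts.idxmin()
minority = df[df["label"] == minority_label]
majority = df[df["label"] != minority_label]

# Random undersampling: shrink the majority class to the minority class size.
under = pd.concat([majority.sample(n=len(minority), random_state=42), minority])

# Random oversampling: duplicate minority rows (with replacement) until the
# classes are the same size; the majority class is kept unchanged.
extra = minority.sample(n=len(majority) - len(minority), replace=True, random_state=42)
over = pd.concat([majority, minority, extra]).reset_index(drop=True)

for name, sample in [("simple", simple), ("stratified", stratified),
                     ("cluster", cluster), ("systematic", systematic),
                     ("undersampled", under), ("oversampled", over)]:
    print(name, len(sample), sample["label"].value_counts().to_dict())
```

For SMOTE, the imblearn package provides `imblearn.over_sampling.SMOTE`, whose `fit_resample(X, y)` method returns a resampled feature matrix and label vector. Unlike the random oversampling above, SMOTE creates new synthetic rows by interpolating between a minority sample and its nearest minority neighbors (the default `k_neighbors` is 5), so it needs a minority class larger than this toy example provides.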

IN THIS COURSE

  1. Course Overview (2m 43s)
  2. Installing pandas and Data Visualization Modules (6m 24s)
  3. Loading and Analyzing Data Using pandas (11m 42s)
  4. Computing the Mean and Median of a Distribution (8m 59s)
  5. Visualizing Distributions with Seaborn & Matplotlib (11m 16s)
  6. Computing Variance and Standard Deviation (12m 26s)
  7. Generating Random and Stratified Samples (12m 31s)
  8. Implementing Cluster and Systematic Sampling (10m 38s)
  9. Implementing Undersampling and Oversampling (10m 45s)
  10. Oversampling with SMOTE (6m 42s)
  11. Course Summary (1m 51s)

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft gives you the opportunity to earn a digital badge upon successful completion of this course. The badge can be shared on any social network or business platform.

Digital badges are yours to keep, forever.