There are numerous options available to scale and encode features and labels in data sets to get the best out of machine learning (ML) algorithms. In this 10-video course, explore techniques such as standardizing, normalizing, and one-hot encoding. Learners begin by learning how to use the Pandas library to load a data set in the form of a CSV file and perform exploratory analysis on its features. Then use scikit-learn's Binarizer to transform the continuous data in a series to binary values; apply the MinMaxScaler on a data set to get two similar columns to have the same range of values; and standardize multiple columns in data sets with scikit-learn's StandardScaler. Examine differences between the Normalizer and other scaling techniques, and learn how to represent values in a column as a proportion of the maximum absolute value by using the MaxAbsScaler. Finally, discover how to use the Pandas library to one-hot encode one or more features of your data set and distinguish between this technique and label encoding. The concluding exercise involves building ML training sets.
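As a rough illustration of the arithmetic behind the scaling and encoding techniques named above, the following sketch uses only the Python standard library rather than scikit-learn or Pandas. The function names and sample data are hypothetical, and the functions assume the input values are not all identical.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def max_abs_scale(values):
    """Express each value as a proportion of the maximum absolute value."""
    largest = max(abs(v) for v in values)
    return [v / largest for v in values]


def one_hot(labels):
    """Map each label to a 0/1 vector with one position per distinct category."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


areas = [50.0, 75.0, 100.0, 150.0]  # hypothetical continuous feature
cities = ["Oslo", "Rome", "Oslo"]   # hypothetical categorical feature

print(min_max_scale(areas))              # values now lie in [0, 1]
print(standardize(areas))                # mean 0, standard deviation 1
print(max_abs_scale([-4.0, 2.0, 1.0]))   # values now lie in [-1, 1]
print(one_hot(cities))                   # one column per city
```

Label encoding, by contrast, would replace each category with a single integer instead of a vector of zeros and ones.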
This 7-video course helps learners discover how to implement machine learning scaling techniques such as standardizing and min-max scaling on continuous data and one-hot encoding on categorical features to improve performance of linear regression models. In the first tutorial, you will use the Pandas library to load a CSV file into a data frame and analyze its contents by using Pandas and Matplotlib. You will then learn how to create a linear regression model with scikit-learn to predict the sale price of a house and evaluate this model by using metrics such as mean squared error and R-squared. Next, learners will examine the application of min-max scaling on continuous fields and one-hot encoding on the categorical columns of a data set. Then analyze the effects of preprocessing, recognizing the benefits of scaling and encoding data sets by evaluating the performance of a regression model built with preprocessed data. Also, learn how to use scikit-learn's StandardScaler on a data set's continuous features and compare its effects with those of min-max scaling. The concluding exercise involves preprocessing data for regression.
In this course, learners can explore how to implement machine learning scaling techniques such as standardizing and normalizing on continuous data and label encoding on the target, in order to get the best out of machine learning algorithms. Examine dimensionality reduction by using Principal Component Analysis (PCA). Start this 6-video course by using the Pandas library to load a CSV data set into a data frame and scale continuous features by using a standard scaler. You will then learn how to build and evaluate a support vector classifier in scikit-learn; use Pandas and Seaborn to generate a heat map; and spot the correlations between features in a data set. Discover how to apply the technique of PCA to reduce the number of dimensions in your input data and obtain the explained variance of each principal component. In the course's final tutorial, you will explore how to apply normalization and PCA on data sets and build a classification model with the principal components of scaled data. The concluding exercise involves processing data for classification.
Machine learning (ML) is everywhere these days, often invisible to most of us. In this course, you will discover one of the fundamental problems in the world of ML: linear regression. Explore how this is solved with classic ML as well as neural networks. Key concepts covered here include how regression can be used to represent a relationship between two variables; applications of regression, and why it is used to make predictions; and how to evaluate the quality of a regression model by measuring its loss. Next, learn techniques used to make predictions with regression models; compare classic ML and deep learning techniques to perform a regression; and observe various components of a neural network and how they fit together. You will learn the two types of functions used in a neuron and their individual roles; how to calculate the optimal weights and biases of a neural network; and how to find the optimal parameters for a neural network.
Learn how to use the scikit-learn and Keras libraries to build a linear regression model to predict a house price. This course reviews the steps needed to prepare data and configure regression models. It shows how to prepare a data set to feed a linear regression model; how to use the Pandas library to load a CSV data set file; and how to configure, train, and validate linear regression models. The course also shows how to visualize metrics with Matplotlib; how to prepare data for a Keras model; how to learn the architecture for a Keras sequential model and initialize it; and finally, how to train it to use optimal weights and biases for machine learning solutions.
Discrete mathematics is the study of objects that take on distinct, separated values. The study of discrete mathematics is important in the field of Computer Science as computers can only understand discrete binary numbers. Use this course to learn more about the use and importance of discrete mathematics in the world of computer science. Examine the use of sets and perform common operations on them in Python. These operations include union, intersection, difference, and symmetric difference. When you are finished with this course, you will have the skills to use and work with sets in the real world using Python.
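The four set operations mentioned above can be sketched with Python's built-in `set` type. The set names and members below are hypothetical.

```python
# Two hypothetical sets of user names.
python_users = {"ana", "ben", "cho"}
sql_users = {"ben", "cho", "dev"}

either = python_users | sql_users       # union: members of at least one set
both = python_users & sql_users         # intersection: members of both sets
only_python = python_users - sql_users  # difference: in the first set only
exactly_one = python_users ^ sql_users  # symmetric difference: in one set, not both

print(sorted(either))
print(sorted(both))
print(sorted(only_python))
print(sorted(exactly_one))
```

The same operations are also available as methods: `union`, `intersection`, `difference`, and `symmetric_difference`.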
Probability is a branch of mathematics that deals with uncertainty, specifically with numerical estimates of how likely an event is to occur and what might happen if that event does or does not occur. Probability has many applications in statistics, engineering, finance, machine learning, and computer science. Get acquainted with the basic constructs of probability through this course. Start by examining different types of events, outcomes, and the complement of an event. You will then simulate various probabilistic experiments in Python and note how the outcomes of these experiments tend to converge towards theoretically expected outcomes as the number of trials increases. By the time you finish this course, you will be able to define and measure probabilities of common events and simulate probabilistic experiments using Python.
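One way such a simulation might look is sketched below, assuming a fair six-sided die as the experiment. The helper names are hypothetical, and a fixed seed is used only to make the run repeatable.

```python
import random


def estimate_probability(event, trials, seed=0):
    """Run the experiment `trials` times and return the fraction of successes."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if event(rng))
    return hits / trials


def rolls_a_six(rng):
    """Event: a fair six-sided die shows a six."""
    return rng.randint(1, 6) == 6


def does_not_roll_a_six(rng):
    """Complement of the event above."""
    return not rolls_a_six(rng)


theoretical = 1 / 6
for trials in (100, 1_000, 100_000):
    estimate = estimate_probability(rolls_a_six, trials)
    print(trials, round(estimate, 4), "theoretical:", round(theoretical, 4))
```

As the number of trials grows, the estimate tends to settle closer to the theoretical value of 1/6, and the estimate for the complement tends toward 5/6.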
Linear algebra comes in handy when we need to work with a set of points represented in multi-dimensional space. Use this course to explore how systems of linear functions and equations can be represented using linear algebra. Examine how to define and compute the addition, scalar multiplication, dot product, and cross product operations on vectors, and discover how these operations are required while working with matrices. This course will also help you explore matrix multiplication, the inverse and transpose of a matrix, and computing the determinant of a matrix. By the time you finish this course, you will be able to express a system of linear functions as a matrix and perform fundamental operations on matrices, including matrix multiplication and the computation of inverses and determinants.
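The operations listed above can be sketched in plain Python, using nested lists as matrices. The helper names and the example system (2x + y = 5, x + 3y = 10) are hypothetical, and the inverse shown here handles only the 2x2 case.

```python
def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    """Cross product of two 3-dimensional vectors."""
    return [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]


def transpose(m):
    """Swap the rows and columns of a matrix."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Matrix product: each entry is a row of `a` dotted with a column of `b`."""
    return [[dot(row, col) for col in transpose(b)] for row in a]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inverse2(m):
    """Inverse of a 2x2 matrix; fails if the determinant is zero."""
    d = det2(m)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


# The system 2x + y = 5, x + 3y = 10 written as A @ [x, y] = b.
A = [[2, 1], [1, 3]]
b = [[5], [10]]
solution = matmul(inverse2(A), b)  # column vector holding x and y
print(solution)
```

Substituting the solution back into both equations is a quick way to check the result.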
Simple to use yet efficient and reliable, support vector machines (SVMs) are supervised learning methods popularly used for classification tasks. This course uncovers the math behind SVMs, focusing on how an optimum SVM hyperplane for classification is computed. Explore the representation of data in a feature space, finding a hyperplane to separate the data linearly. Then, learn how to separate non-linear data. Investigate the optimization problem for SVM classifiers, looking at how the weights of the model can be adjusted during training to get the best hyperplane separating the data points. Furthermore, apply gradient descent to solve the optimization problem for SVMs. When you're done, you'll have the foundational knowledge you need to start building and applying SVMs for machine learning.
Users marvel at a system's ability to recommend items they're likely to appreciate. As someone working with machine learning, implementing these recommendation systems (also called recommender systems) can dramatically increase user engagement and goodwill towards your products or brand. Use this course to comprehend the math behind recommendation systems and how to apply latent factor analysis to make recommendations to users. Examine the intuition behind recommender systems before investigating two of the main techniques used to build them: content-based filtering and collaborative filtering. Moving on, explore latent factor analysis by decomposing a ratings matrix into its latent factors using the gradient descent algorithm and implementing this technique to decompose a ratings matrix using the Python programming language. By the end of this course, you'll be able to build a recommendation system model that best suits your products and users.
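To make the idea of decomposing a ratings matrix concrete, here is a minimal sketch of latent factor analysis using stochastic gradient descent in plain Python. It is not the course's implementation: the ratings, hyperparameters, and function names are all hypothetical, and `None` marks a rating the user has not given.

```python
import random


def predict(P, Q, user, item):
    """Predicted rating: dot product of the user's and the item's factors."""
    return sum(pu * qi for pu, qi in zip(P[user], Q[item]))


def factorize(ratings, n_factors=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn user factors P and item factors Q from the known ratings."""
    rng = random.Random(seed)
    n_users, n_items = len(ratings), len(ratings[0])
    P = [[rng.uniform(0, 1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u in range(n_users):
            for i in range(n_items):
                rating = ratings[u][i]
                if rating is None:
                    continue  # unknown ratings do not contribute to the loss
                err = rating - predict(P, Q, u, i)
                for k in range(n_factors):
                    pu, qi = P[u][k], Q[i][k]
                    # Step against the gradient of the regularized squared error.
                    P[u][k] += lr * (err * qi - reg * pu)
                    Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q


# Hypothetical ratings: rows are users, columns are items.
ratings = [
    [5, 3, None, 1],
    [4, None, None, 1],
    [1, 1, None, 5],
    [1, None, None, 4],
    [None, 1, 5, 4],
]

P, Q = factorize(ratings)
# Predicted score for an item the first user has not rated yet.
print(round(predict(P, Q, 0, 2), 2))
```

Predictions for the `None` cells are what a recommender would rank to decide which items to suggest.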
This 13-video course explores recommendation engines, systems which provide various users with items or products that they may be interested in by observing their previous purchasing, search, and behavior histories. They are used in many industries to help users find or explore products and content; for example, to find movies, news, insurance, and a myriad of other products and services. Learners will examine the three main types of recommendation systems: item-based, user-based or collaborative, and content-based. The course next examines how to collect data to be used for learning, training, and evaluation. You will learn how to use RStudio, an open-source IDE (integrated development environment) to import, filter, and massage data into data sets. Learners will create an R function that will give a score to an item based on other user ratings and similarity scores. You will learn to use R to create a function called compareUsers, to create an item-to-item similarity or content score. Finally, learn to validate and score by using the built-in R function RMSE (root mean square error).
Data wrangling, an increasingly popular tool among today's top firms, has become more complex as data become even more unstructured and varied in their source. In this 13-video Skillsoft Aspire course, you will learn how to simplify the task by organizing and cleaning disparate data to present your data in the best format possible with Trifacta, which accelerates data wrangling to enhance productivity for data scientists. Learn to reshape data, look up data, and pivot data. Explore essential methods for wrangling data, including how to use Trifacta to standardize, format, filter, and extract data. Also covered are other key topics: how to split and merge columns; utilize conditional aggregation; apply transforms to reshape data; and join two data sets into one by using join operations. In the concluding exercise, learners will be asked to start by loading a data set into Trifacta; to replace any missing values, if necessary; and to use a row filter operation, use a group by operation, and use an aggregate function in the group by operation.
With data now being one of the most valuable assets to tap into, the demand for data science skills increases by the day. Statistics and sampling are at the core of data science. Use this course as a theoretical introduction to using samples to reveal various statistics. Examine what exactly is meant by statistics and samples. Explore descriptive statistics, namely measures of central tendency and of dispersion. Study probability sampling techniques, including simple random sampling and cluster sampling. Investigate how undersampling and oversampling are used to generate more balanced datasets. Upon completion, you'll know the best way to use statistics and samples for your specific goals and needs.
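The sampling ideas above can be illustrated with a small standard-library sketch. The helper names and the toy data set are hypothetical; random oversampling and undersampling are shown here in their simplest form.

```python
import random
from collections import Counter, defaultdict


def simple_random_sample(items, k, seed=0):
    """Draw k items without replacement, each equally likely to be chosen."""
    return random.Random(seed).sample(items, k)


def group_by_label(rows, label_of):
    """Collect rows into one list per class label."""
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    return groups


def random_oversample(rows, label_of, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal."""
    rng = random.Random(seed)
    groups = group_by_label(rows, label_of)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


def random_undersample(rows, label_of, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = group_by_label(rows, label_of)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))
    return balanced


# Hypothetical imbalanced data set: 8 majority rows and 2 minority rows.
rows = [(i, "majority") for i in range(8)] + [(i, "minority") for i in range(8, 10)]


def label_of(row):
    return row[1]


print(Counter(label_of(r) for r in random_oversample(rows, label_of)))
print(Counter(label_of(r) for r in random_undersample(rows, label_of)))
```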
Data is one of the most valuable assets a business has, but it's only as valuable as the methods used to interpret it. Data science, which at its core includes statistics and sampling, is the key to data interpretation. In this course, practice using the pandas library in Python to work with statistics and sampling. Practice loading data from a CSV file into a pandas DataFrame. Compute a variety of statistics on data. While doing so, see how to visualize the relationship between data and computed statistics. Moving along, implement several sampling techniques, such as stratified sampling and cluster sampling. Then, explore how a balanced sample can be created from an imbalanced dataset using the imblearn module in Python. Upon completion, you'll be able to generate samples and compute statistics using various tools and methods.
Learners continue their exploration of data science in this 10-video course, which deals with using NumPy, Pandas, and SciPy libraries to perform various statistical summary operations on real data sets. This beginner-level course assumes some prior experience with Python programming and an understanding of basic statistical concepts such as mean, standard deviation, and correlation. The course opens by exploring different ways to visualize data by using the Matplotlib library, including univariate and bivariate distributions. Next, you will move to computing descriptive statistics for distributions, such as variance and standard error, by using the NumPy, Pandas, and SciPy libraries. Learn about the concept of the z-score, in which every value in a distribution is expressed in terms of the number of standard deviations from the mean value. Then cover the computation of the z-score for a series using SciPy. In the closing exercise, you will use the Matplotlib data visualization library to plot a line through three points represented by given coordinates, then enumerate all of the details which are conveyed in a box plot.
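The z-score and standard error described above can be sketched with Python's `statistics` module instead of SciPy. The sample values and function names are hypothetical; the z-score here uses the population standard deviation, which is an assumption of this sketch.

```python
from math import sqrt
from statistics import mean, pstdev, stdev, variance


def z_scores(values):
    """Express each value as a number of standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def standard_error(values):
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return stdev(values) / sqrt(len(values))


values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical series with mean 5

print(z_scores(values))        # 9 lies two standard deviations above the mean
print(variance(values))        # sample variance
print(standard_error(values))
```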
Along the career path to Data Science, a fundamental understanding of statistics and modeling is required. The goal of all modeling is generalizing as well as possible from a sample to the population of big data as a whole. In this 10-video Skillsoft Aspire course, learners explore the first step in this process. Key concepts covered here include the objectives of descriptive and inferential statistics, and distinguishing between the two; objectives of population and sample, and distinguishing between the two; and objectives of probability and non-probability sampling and distinguishing between them. Learn to define the average of a data set and its properties; the median and mode of a data set and their properties; and the range of a data set and its properties. Then study the inter-quartile range of a data set and its properties; the variance and standard deviation of a data set and their properties; and how to differentiate between inferential and descriptive statistics, the two most important types of descriptive statistics, and the formula for standard deviation.
Data science is an interdisciplinary field that seeks to find interesting generalizable insights within data and then puts those insights to monetizable use. In this 8-video Skillsoft Aspire course, learners can explore the first step in obtaining a representative sample from which meaningful generalizable insights can be obtained. Examine basic concepts and tools in statistical theory, including the two most important approaches to sampling (probability and nonprobability sampling) and common sampling techniques used for both approaches. Learn about simple random sampling, systematic random sampling, and stratified random sampling, including their advantages and disadvantages. Next, explore sampling bias. Then consider what is probably the most popular type of nonprobability sampling technique: the case study, used in medical education, business education, and other fields. A concluding exercise on efficient sampling invites learners to review their new knowledge by defining the two properties of all probability sampling techniques; enumerating the three types of probability sampling techniques; and listing two types of nonprobability sampling.
In this Skillsoft Aspire course on data science, learners can explore hypothesis testing, which finds wide applications in data science. This beginner-level, 10-video course builds upon previous coursework by introducing simple inferential statistics, called the backbone of data science, because they seek to posit and prove or disprove relationships within data. You will start by learning steps in simple hypothesis testing: the null and alternative hypotheses, test statistic, and p-value, as each term is introduced and explained. Next, listen to an informative discussion of a specific family of hypothesis tests, the t-test. Then learn to describe their applications, and become familiar with their use cases, including linear regression. Learn about Gaussian distribution and the related concepts of correlation, which measures relationships between any two variables, and autocorrelation, a special form used in the context of time-series analysis. In the closing exercise, review your knowledge by differentiating between the null and the alternative hypotheses in a hypothesis testing procedure, then enumerating four distinct uses for different types of t-tests.
Explore how statistical analysis can turn raw data into insights, and then examine how to use the data to improve business intelligence, in this 10-video course. Learn how to scrutinize and perform analytics on the collected data. The course explores several approaches for identifying values and insights from data by using various standard and intuitive principles, including data exploration and data ingestion, along with the practical implementation by using R. First, you will learn how to detect outliers by using R, and how to compare simple linear regression models, with and without outliers, to improve the quality of the data. Because today's data are available in diversified formats, with large volume and high velocity, this course next demonstrates how to use a variety of technologies to ingest data: Apache Kafka, Apache NiFi, Apache Sqoop, and Wavefront. Finally, you will learn how these tools can help users in data extraction, scalability, integration support, and security.
In this 12-video course, learners will explore the concept of computational theory and its models by discovering how to model and implement computational theory on formal language, automata theory, and context-free grammar. Begin by examining the computational theory fundamentals and the prominent branches of computation, and also the prominent models of computation for machine learning. Then look at the concept of automata theory and list the prominent automata classes. Next, explore the finite state machine principles, and recognize the essential principles driving formal language theory and the automata theory principles. Learners will recall the formal language elements; define the concept of regular expressions; and list the theorems used to manage the semantics. Examine the concept of regular grammar and list the essential grammars used to generate regular languages. Also, examine regular language closure properties, and define and list the prominent features of context-free grammar. The concluding exercise involves identifying practical usage, branches, and models of computational theory, specifying notations of formal language, and listing types of context-free grammar.
Discover the concepts of pushdown automata, Turing machines, and finite transducers in this 12-video course, in which learners can examine how to identify limitations and complexities in computation and how to apply P and NP classes to manage them. Begin by recalling the machine learning analytical capabilities of grammar, then look at context-free grammar normal forms, using Chomsky normal forms and Greibach normal forms to manage context-free grammars. Describe pushdown automata and features of nondeterministic pushdown automata. This leads on to Turing machines, their capabilities, and the prominent variations in the building themes of Turing machines. Learners explore the concept of finite transducers, and the types of finite transducers. Recall the underlying limitations of computations and the limitations of computational theory, and the complexities of computation, computational theory complexities, and how it can impact Turing machine models and language families. Learn about handling computation complexities with P class and handling computation complexities with NP class. The concluding exercise involves describing properties and variations of Turing machines, types of finite transducers, and properties of recursively enumerable languages.
Several factors usually influence an outcome, and users need to consider all of those by using regression. Regression models help us mathematically evaluate our hunches. This course explores machine learning techniques and the risks involved with multiple factor linear regression. Key concepts covered here include reasons to use multiple features in a regression, and how to configure, train, and evaluate the linear regression model. Next, learn to create a data set with multiple features in a form that can be fed to a neural network for training and validation. Review Keras sequential model architecture, its training parameters, and ways to test its predictions. Learn how to use Pandas and Seaborn to view correlations and enumerate risks. Conclude by applying parsimonious regression to rebuild linear regression models.
Logistic regression is a technique used to estimate the probability of an outcome for machine learning solutions. In this 10-video course, learners discover the concepts and explore how logistic regression is used to predict categorical outcomes. Key concepts covered here include the qualities of a logistic regression S-curve and the kind of data it can model; learning how a logistic regression can be used to perform classification tasks; and how to compare logistic regression with linear regression. Next, you will learn how neural networks can be used to perform a logistic regression; how to prepare a data set to build, train, and evaluate a logistic regression model in Scikit Learn; and how to use a logistic regression model to perform a classification task and evaluate the performance of the model. Learners observe how to prepare a data set to build, train, and evaluate a Keras sequential model, and how to build, train, and validate Keras models by defining various components, including activation functions, optimizers and the loss function.
This 6-video course focuses on understanding Google's TensorFlow estimators, and showing learners how they simplify the task of building simple linear and logistic regression models for machine learning solutions. As a prerequisite, learners should have a basic understanding of ML (machine learning), and basic experience programming in Python. Though not required, familiarity with the scikit-learn library and the Keras API will simplify the labs part of this course. First, you will learn how TensorFlow estimators abstract many of the details in creating a neural network, so that you no longer need to define the type of neural network model or define its layers. When using an estimator, learners only need to feed in training and validation data. In the course labs, you will build both a linear regression model and a classifier by using TensorFlow estimators. Finally, you will learn how to evaluate your model using the prebuilt methods available in the estimator.
Explore the concept of machine learning linear models, classifications of linear models, and prominent statistical approaches used to implement linear models. This 11-video course also explores the concepts of bias, variance, and regularization. Key concepts covered here include learning about linear models and various classifications used in predictive analytics; learning different statistical approaches that are used to implement linear models [single regression, multiple regression and analysis of variance (ANOVA)]; and various essential components of a generalized linear model (random component, linear predictor and link function). Next, discover differences between the ANOVA and analysis of covariance (ANCOVA) approaches of statistical testing; learn about implementation of linear regression models by using Scikit-learn; and learn about the concepts of bias, variance, and regularization and their usages in evaluating predictive models. Learners explore the concept of ensemble techniques and illustrate how bagging and boosting algorithms are used to manage predictions, and learn to implement bagging algorithms with the approach of random forest by using Scikit-learn. Finally, observe how to implement boosting ensemble algorithms by using Adaboost classifier in Python.
Explore the features of simple and multiple regression, implement simple and multiple regression models, and explore concepts of gradient descent and regularization and different types of gradient descent and regularization. Key concepts covered in this 12-video course include characteristics of the prominent types of linear regression; essential features of simple and multiple regressions and how they are used to implement linear models; and how to implement simple regression models by using Python libraries for machine learning solutions. Next, observe how to implement multiple regression models in Python by using Scikit-learn and StatsModels; learn the different types of gradient descent; and see how to classify the prominent gradient descent optimization algorithms from the perspective of their mathematical representation. Learn how to implement a simple representation of gradient descent using Python; how to implement linear regression by using mini-batch gradient descent to compute hypothesis and predictions; and learn the benefits of regularization and the objectives of L1 and L2 regularization. Finally, learn how to implement L1 and L2 regularization of linear models by using Scikit-learn.
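A simple version of linear regression trained with mini-batch gradient descent, with an optional L2 penalty, might look like the sketch below. The data, hyperparameters, and function name are hypothetical, and the L2 term is applied to the weight only, which is one common convention rather than the only one.

```python
import random


def minibatch_gd(xs, ys, lr=0.02, epochs=1000, batch_size=2, l2=0.0, seed=0):
    """Fit y = w * x + b by mini-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit the points in a new order every epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradients of the batch's mean squared error, plus the L2 penalty on w.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_w += 2 * l2 * w
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Hypothetical noiseless data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w_plain, b_plain = minibatch_gd(xs, ys)
w_ridge, b_ridge = minibatch_gd(xs, ys, l2=1.0)
print(round(w_plain, 3), round(b_plain, 3))  # close to the true slope and intercept
print(round(w_ridge, 3), round(b_ridge, 3))  # the penalty shrinks the slope
```

An L1 penalty would instead add a term proportional to the sign of `w`, which tends to push small weights all the way to zero.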
Explore the fundamentals of linear algebra, including characteristics and its role in machine learning, in this 13-video course. Learners can examine important concepts associated with linear algebra, such as the class of spaces, types of vector space, vector norms, linear product vector and theorems, and various operations that can be performed on matrices. Key concepts examined in this course include important classes of spaces associated with linear algebra; features of vector spaces and the different types of vector spaces and their application in distribution and Fourier analysis; and inner product spaces and the various theorems that are applied on inner product spaces. Next, you will learn how to implement vector arithmetic by using Python; learn how to implement vector scalar multiplication with Python; and learn the concept and different types of vector norms. Finally, learn how to implement matrix-matrix multiplication, matrix-vector multiplication, and matrix-scalar multiplication by using Python; and learn about matrix decomposition and the roles of eigenvectors and eigenvalues in machine learning.
Learners will discover how to apply advanced linear algebra and its principles to derive machine learning implementations in this 14-video course. Explore PCA, tensors, decomposition, and singular-value decomposition, as well as how to reconstruct a rectangular matrix from singular-value decomposition. Key concepts covered here include how to use Python libraries to implement principal component analysis with matrix multiplication; sparse matrix and its operations; tensors in linear algebra and arithmetic operations that can be applied; and how to implement Hadamard product on tensors by using Python. Next, learn how to calculate singular-value decomposition and reconstruct a rectangular matrix; learn the characteristics of probability applicable in machine learning; and study probability in linear algebra and its role in machine learning. You will learn types of random variables and functions used to manage random numbers in probability; examine the concept and characteristics of central limit theorem and means and learn common usage scenarios; and examine the concept of parameter estimation and Gaussian distribution. Finally, learn the characteristics of binomial distribution with real-time examples.
The graph data structure plays a significant role in modeling entities in the real world. A graph comprises nodes and edges that are used to represent entities and relationships, respectively. A graph can be used to model a social network or a professional network, roads and rail infrastructure, and telecommunication and telephone networks. Through this course, you'll explore graph data structure, graph components, and different types of graphs and their use cases. Start by discovering how to represent directed, undirected, weighted, and unweighted graphs in NetworkX. You'll then learn more about visualizing nodes and connections in graphs using Matplotlib. This course will also help you examine how to implement graph algorithms on all graph types using NetworkX. Upon completing this course, you will have the skills and knowledge to create and work with graphs using NetworkX in Python.
Probability is all about estimating the likeliness of the occurrence of specific events. Use this course to learn more about defining and measuring joint, marginal, and conditional probabilities of events. Start by exploring the chain rule of probability and then use this rule to compute conditional probabilities of multiple events. You'll also investigate the steps involved in measuring the expected value of a random variable as the weighted sum of all outcomes, with each outcome weighted by its probability. By the time you finish this course, you will be able to compute joint, marginal, and conditional probabilities and the expected value of a random variable, as well as effectively utilize the chain rule of probability.
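The quantities named above can be sketched with exact fractions. The joint distribution below is a made-up example (weather versus traffic), and the helper names are assumptions chosen for illustration.

```python
from fractions import Fraction as F

# Hypothetical joint distribution P(weather, traffic); values sum to 1.
joint = {
    ("rain", "heavy"): F(3, 10),
    ("rain", "light"): F(1, 10),
    ("dry", "heavy"): F(2, 10),
    ("dry", "light"): F(4, 10),
}


def marginal_weather(w):
    """P(weather = w), obtained by summing the joint over traffic."""
    return sum(p for (wi, _), p in joint.items() if wi == w)


def conditional_traffic(t, w):
    """P(traffic = t | weather = w) = P(w, t) / P(w)."""
    return joint[(w, t)] / marginal_weather(w)


p_rain = marginal_weather("rain")
p_heavy_given_rain = conditional_traffic("heavy", "rain")

# Chain rule: P(rain, heavy) = P(rain) * P(heavy | rain)
chain = p_rain * p_heavy_given_rain

# Expected value of a fair six-sided die: outcomes weighted by probability.
expected_die = sum(F(1, 6) * face for face in range(1, 7))
```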
Decision trees are an effective supervised learning technique for predicting the class or value of a target variable. Unlike some other supervised learning methods, they are well-suited to both classification and regression tasks. Use this course to learn how to work with decision trees and classification, distinguishing between rule-based and ML-based approaches. As you progress through the course, investigate how to work with entropy, Gini impurity, and information gain. Practice implementing both rule-based and ML-based decision trees and leveraging powerful Python visualization libraries to construct intuitive graphical representations of decision trees. Upon completion, you'll be able to create, use, and share rule-based and ML-based decision trees.
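Entropy, Gini impurity, and information gain can be computed in a few lines. This is a minimal sketch with a hypothetical label set and split, not the course's own implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: chance of mislabeling a randomly drawn item."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes"] * 4 + ["no"] * 4        # perfectly mixed node
split = [["yes"] * 4, ["no"] * 4]        # a split that separates the classes
```

A split that yields pure child nodes recovers all of the parent's entropy as information gain, which is why tree-building algorithms prefer it.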
Linear regression analysis is a simple yet powerful technique for quantifying cause-and-effect relationships. Use this course to get your head around linear regression as the process of fitting a straight line through a set of points. Learn how to define residuals and use the least squares error. Define and measure the R-squared, implement regression analysis, visualize your data by computing a correlation matrix and plotting it in the form of a correlation heatmap, and use scatter plots as a prelude to performing the regression analysis. Finish by implementing the regression analysis first using functions that you write yourself and then using the scikit-learn Python library. By the end of the course, you'll be able to identify the need for linear regression and implement it effectively.
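A hand-written least squares fit, in the spirit of the "functions that you write yourself" step, might look like the sketch below. The data points are invented, and the function names are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up, roughly y = 2x
slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
```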
Gradient descent is an extremely powerful numerical optimization technique widely used to find optimal values of model parameters during the model training phase of machine learning. Use this course as an introduction to gradient descent, examining how it can be used in a wide variety of optimization problems. Explore how it can be used to perform linear regression, carefully studying the matrix equations used to compute the gradients and updating the model parameters using the gradients as well as the learning rate hyperparameter. Finally, apply a form of gradient descent known as stochastic gradient descent to fit an S-curve, thus implementing logistic regression on a data set. By the end of the course, you'll be able to assuredly implement logistic regression using gradient descent.
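The parameter update described above (gradient times learning rate) can be sketched for simple linear regression as follows. This is batch gradient descent on mean squared error, not the stochastic variant or the logistic model; the learning rate, step count, and data are assumptions chosen so the loop converges.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by minimizing mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient, scaled by the learning rate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]       # noiseless line y = 2x + 1
w, b = gradient_descent(xs, ys)
```

With too large a learning rate the updates overshoot and diverge; with too small a rate the loop needs many more steps to reach the same parameters.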
Machine learning (ML) is widely used across all industries, meaning engineers need to be confident in using it. Pre-built libraries are available to start using ML with little knowledge. However, to get the most out of ML, it's worth taking the time to learn the math behind it. Use this course to learn how distances are measured in ML. Investigate the types of ML problems distance-based models can solve. Examine different distance measures, such as Euclidean, Manhattan, and Cosine. Learn how the distance-based ML algorithms K Nearest Neighbors (KNN) and K-means work. Lastly, use Python libraries and various metrics to compute the distance between a pair of points. Upon completion, you'll have a solid foundational knowledge of the mechanisms behind distance-based machine learning algorithms.
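The three distance measures named above can be written directly from their definitions. The sketch below uses only the standard library; the function names are illustrative.

```python
from math import sqrt


def euclidean(p, q):
    """Straight-line distance between two points."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def cosine_distance(p, q):
    """1 minus the cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = sqrt(sum(a * a for a in p))
    norm_q = sqrt(sum(b * b for b in q))
    return 1 - dot / (norm_p * norm_q)
```

Cosine distance ignores vector length and compares direction only, so `(1, 2)` and `(2, 4)` are at distance zero even though they are far apart in the Euclidean sense.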
Knowing the math behind machine learning (ML) opens up many exciting avenues. There is a vast number of ML algorithms you could learn. However, the distance-based algorithms K Nearest Neighbors and K-means clustering are arguably the most popular due to their simplicity and efficacy. In this course, practice building a classification model using the K Nearest Neighbors algorithm. Build upon this algorithm to perform regression. Then, perform a clustering operation by implementing the K-means algorithm. In doing so, explore the techniques involved in converging the centroids towards their optimal positions. Upon completion, you'll be able to perform classification, regression, and clustering using the KNN and K-means algorithms.
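A K Nearest Neighbors classifier reduces to sorting by distance and taking a majority vote. The sketch below assumes Euclidean distance and a tiny made-up training set.

```python
from collections import Counter
from math import dist  # Euclidean distance, available from Python 3.8


def knn_predict(train, query, k=3):
    """Majority label among the k training points closest to the query."""
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (point, label) pairs forming two clusters.
train = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
    ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b"),
]
```

For regression, the same neighbor search applies, with the vote replaced by the average of the neighbors' target values.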
Support vector machines (SVMs) are a popular tool for machine learning enthusiasts at any level. They offer speed and accuracy, are computationally uncomplicated, and work well with small datasets. In this course, learn how to implement a soft-margin SVM classifier using gradient descent in the Python programming language and the LIBSVM library to build a support vector classifier and regressor. For your first task, generate synthetic data that can be linearly separated by an SVM binary classifier, implement the classifier by applying gradient descent, and train and evaluate the model. Moving on, learn how to use a pre-built SVM classifier supplied by the LIBSVM module. Then use LIBSVM to train a support vector regressor, evaluate it, and use it for predictions. Upon completion, you'll know how to work with custom SVM classifiers and pre-built SVM classification and regression models.
Hypothesis testing is the bedrock of inferential statistics, allowing us to draw inferences reliably about the population as a whole. Use this course to learn more about the distinction between descriptive and inferential statistics and how the latter seeks to generalize from the sample to the population as a whole. Examine the components of a typical hypothesis test, such as the null and alternative hypothesis, the test statistic, and the p-value. You'll also explore type-I and type-II errors and the use cases and conceptual underpinnings of t-tests and ANOVA. By the time you finish this course, you will be able to identify use-cases for hypothesis testing and conceptually construct the appropriate null and alternative hypotheses for such tests.
One-sample T-tests are probably the single most commonly used type of hypothesis test. Through this course, learn to manually implement the one-sample T-test to know exactly how the p-value and test statistic are calculated. You'll examine various library implementations of the one-sample T-test and apply the test on data drawn from several different distributions. This course will also help you explore the non-parametric Wilcoxon signed-rank test, which is conceptually very similar to the one-sample T-test and helps estimate the median rather than the mean of that population without making assumptions about the population distribution. Upon completion of this course, you will be able to use the one-sample T-test as well as its non-parametric equivalent to evaluate both one-sided and two-sided hypotheses about the population mean or median.
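The test statistic of a one-sample T-test can be computed by hand as the sample mean's distance from the hypothesized mean, measured in standard errors. The sketch below stops at the statistic and degrees of freedom, because the p-value needs the t-distribution, which the standard library does not provide. The sample values are invented.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean = mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)   # stdev uses the n - 1 divisor
    return (mean(sample) - mu0) / standard_error, n - 1


sample = [5.1, 4.9, 5.3, 5.2, 5.0]   # made-up measurements
t_stat, dof = one_sample_t(sample, mu0=5.0)
```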
In situations where two independent samples are drawn from different populations or where paired samples are available, such as in a before-after scenario, two-sample and paired T-tests are needed, respectively. Use this course to explore how two-sample T-tests can be used to test the null hypothesis that two independent samples have been drawn from populations with equal means. You'll examine type I and type II errors and the use of paired samples T-tests. By the time you finish this course, you will be able to test whether two samples - either drawn independently or explicitly linked - are drawn from populations with equal means.
Two-sample T-tests are great for comparing population means given two samples. However, if the number of samples increases beyond two, we need a much more versatile and powerful technique - analysis of variance (ANOVA). Use this course to learn more about non-parametric tests and ANOVA. In this course, you'll explore the different use cases for Mann-Whitney U-tests and the non-parametric paired Wilcoxon signed-rank test, and perform pairwise T-tests and ANOVA. You'll also get a chance to try your hand at the non-parametric variant of ANOVA, the Kruskal-Wallis test, and at post hoc tests, such as Tukey's honestly significant difference (HSD) test. After completing this course, you will be able to account for the effect of one or two independent categorical variables, each having an arbitrary number of levels, on a dependent variable using ANOVA.
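One-way ANOVA compares the variation between group means with the variation inside the groups. The following sketch computes the F statistic only (no p-value) for three small invented groups.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]   # made-up samples
f_stat = one_way_anova_f(groups)
```

A large F value indicates that the group means differ by more than the spread within each group would explain.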
Calculus is a branch of mathematics that deals with continuous change and with how the output of a function changes when the inputs into that function change by vanishingly small amounts. Calculus has wide-ranging applications - in optimization, machine learning, economics, and medicine. You will start this course by defining a derivative in terms of its mathematical formula and interpreting that derivative of a function at a point in two ways: as the slope of the tangent line to the function at that point or as the instantaneous rate of change of that function at that point. You will also apply these concepts to a constant function, verify that its derivative is zero, and understand the reason behind it. By the time you finish this course, you'll have a good foundation in the basics of differential calculus.
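The "vanishingly small change" idea can be approximated numerically with a central difference. The step size `h` and the two sample functions below are assumptions for illustration.

```python
def derivative(f, x, h=1e-5):
    """Central-difference estimate of the slope of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


def square(x):
    return x ** 2


def constant(x):
    return 7.0


slope_of_square_at_3 = derivative(square, 3.0)   # analytic answer: 2 * 3 = 6
slope_of_constant = derivative(constant, 3.0)    # output never changes
```

The constant function returns the same value on both sides of `x`, so the numerator is exactly zero, which mirrors the reason its derivative vanishes.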
Linear functions change at a constant rate, and that in turn, makes the rate of change of a linear function a constant. This and other related insights into linear and other mathematical functions can be quantified using calculus. Through this course, you'll examine the steps involved in applying the differentiation operation to study a moving particle. You'll then understand how the partial derivative of a function that depends on multiple independent variables is computed with respect to one of those independent variables by holding all other independent variables constant. This course will also allow you to investigate how partial derivatives play a crucial role in the training phase of building a machine learning (ML) model. Upon completion of this course, you will be able to compute the partial derivative of a function that depends on multiple independent variables and better understand the training process of a machine learning model.
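Holding all other variables constant while nudging one of them can be sketched numerically. The function `f` below is a made-up example whose partial derivatives are 2xy and x^2 + 3.

```python
def partial(f, point, index, h=1e-5):
    """Central-difference partial derivative of f with respect to one input."""
    up, down = list(point), list(point)
    up[index] += h      # nudge only the chosen variable
    down[index] -= h    # every other variable stays fixed
    return (f(*up) - f(*down)) / (2 * h)


def f(x, y):
    return x ** 2 * y + 3 * y


df_dx = partial(f, (2.0, 5.0), index=0)   # analytic: 2 * 2 * 5 = 20
df_dy = partial(f, (2.0, 5.0), index=1)   # analytic: 2 ** 2 + 3 = 7
```

Collecting the partial derivatives for every parameter gives the gradient vector that training algorithms follow when adjusting a model.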
Integral calculus is a major branch of calculus that deals with integrating - i.e., aggregating - an infinite number of infinitesimal increments to a function. Integration is the inverse operation of differentiation and has wide-ranging applications across science, engineering, and social sciences. Begin this course by understanding how integration can be used to compute the area under a curve. You'll then explore the relationship between derivatives and integrals and discover how differentiation and integration are inverse operations. You'll wrap up the course by investigating the steps involved in computing the integral of several different types of functions and visualize these integrals using a combination of SymPy, Seaborn, and Matplotlib. By the time you finish this course, you'll be able to solve definite as well as indefinite integrals and visualize such integrals as the area under a curve in Python.
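The area-under-a-curve view of a definite integral can be approximated by summing thin trapezoids. This sketch is a numerical stand-in using only the standard library, not the SymPy workflow the course uses; the interval count `n` is an assumption.

```python
def trapezoid(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with n trapezoids."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


area = trapezoid(lambda x: x ** 2, 0.0, 1.0)   # exact value is 1/3
```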
To master data science, you must learn the techniques surrounding data research. In this 10-video course, learners will discover how to apply essential data research techniques, including JMP measurement, and how to evaluate data by using descriptive and inferential methods. Begin by recalling the fundamental concept of data research that can be applied on data inference. Then learners look at steps that can be implemented to draw data hypothesis conclusions. Examine values, variables, and observations that are associated with data from the perspective of quantitative and classification variables. Next, view the different scales of standard measurements with a critical comparison between generic and JMP models. Then learn about the key features of nonexperimental and experimental research approaches when using real-time scenarios. Compare differences between descriptive and inferential statistical analysis and explore the prominent usage of different types of inferential tests. Finally, look at the approaches and steps involved in the implementation of clinical data research and sales data research using real-time scenarios. The concluding exercise involves implementing data research.
This course explores EDA (exploratory data analysis) and data research techniques necessary to communicate with data management professionals involved in application, implementation, and facilitation of the data research mechanism. You will examine EDA as an important way to analyze extracted data by applying various visual and quantitative methods. In this 10-video course, learners acquire data exploration techniques for deriving different data dimensions and extracting value from the data. You will learn proper methodologies and principles for various data exploration techniques, analysis, decision-making, and visualizations to gain valuable insights from the data. This course covers how to practically implement data exploration by using R random number generator, Python, linear algebra, and plots. You will use EDA to build learning sets which can be utilized by various machine learning algorithms or even statistical modeling. You will learn to apply univariate visualization, and to use multivariate visualizations to identify the relationship among the variables. Finally, the course explores dimensionality reduction, applying different dimension reduction algorithms to reduce the data to a state that is useful for analytics.
This 12-video course explores implementation of statistical data research algorithms using R to generate random numbers from standard distributions, and visualizations using R to graphically represent the outcome of data research. You will learn to apply statistical algorithms like PDF (probability density function), CDF (cumulative distribution function), binomial distribution, and interval estimation for data research. Learners will be able to identify the relevance of discrete versus continuous distribution in simplifying data research. This course then demonstrates how to plot visualizations by using R to graphically predict the outcomes of data research. Next, learn to use interval estimation to derive an estimate for an unknown population parameter, and learn to implement point and interval estimation by using R. Learn data integration techniques to aggregate data from different administrative sources. Finally, you will learn to use Python libraries to create histograms, scatter plots, and box plots, and use Python to handle missing values and outliers. The concluding exercise involves loading data in R, generating a scatter chart, and deleting points outside the limit of x vector and y vector.
Explore how different t-tests can be performed by using the SciPy library for hypothesis testing in this 10-video course, which continues your explorations of data science. This beginner-level course assumes prior experience with Python programming, along with an understanding of such terms as skewness and kurtosis and concepts from inferential statistics, such as t-tests and regression. Begin by learning how to perform three different t-tests (the one-sample t-test, the independent or two-sample t-test, and the paired t-test) on various samples of data using the SciPy library. Next, learners explore how to interpret results to accept or reject a hypothesis. The course covers, as an example, how to fit a regression model on the returns on an individual stock, and on the S&P 500 Index, by using the scikit-learn library. Finally, watch demonstrations of measuring skewness and kurtosis in a data set. The closing exercise asks you to list three different types of t-tests, identify values which are returned by t-tests, and write code to calculate the percentage returns from time series data using Pandas.
Explore essential approaches of deriving value from existing data in this 12-video course. Learn to produce meaningful information by implementing certain techniques such as data cleansing, data wrangling, and data categorization. The course goal is to teach learners how to derive appropriate data dimension, and apply data wrangling, cleansing, classification, and clustering by using Python. You will examine such useful data discovery and exploration techniques as pivoting, de-identification, analysis, and data tracing. Learn how to assess the quality of target data by determining accuracy of the data being captured or ingested; data completeness; and data reliability. Other key topics covered include data exploration tools; Knime data exploration; data transformation techniques; and data quality analysis techniques. The concluding exercise asks learners to list prominent tools for data exploration; recall some of the essential types of data transformation that can be implemented; specify essential tasks that form the building block to finding data with data; and recall essential approaches of implementing data tracing.
Data professionals working with various data management systems must be able to implement data correction by using R and have a good understanding of data and data management systems. In this 12-video course, learners explore how to apply and implement various essential data correction techniques; to follow transformation rules; and to use deductive correction techniques and predictive modeling by using critical data and analytical approaches. Learn more about data wrangling, essentially the process of transforming and mapping data into another format to ensure that data are appropriate for analytical requirements. Along the way, you will learn key terms and concepts, including how to design data dimension; dimensional data design; cleansing data, and cleansing data with Python; data operations for fact finding; and common data operations for fact-finding. Next, learn about data categorization with Python; data visualization in general; and data visualization with Python. In a concluding exercise, you create a series data set by using Python; create a data frame using the series data; and, finally, calculate the standard deviation of the data frame.
In this 9-video course, learners examine statistical and machine learning implementation methods and how to manage anomalies and improve data for better data insights and accuracy. The course opens with a thorough look at the sources of data anomaly and comparing differences between data verification and validation. You will then learn about approaches to facilitating data decomposition and forecasting, and steps and formulas used to achieve the desired outcome. Next, recall approaches to data examination and use randomization tests, null hypothesis, and Monte Carlo. Learners will examine anomaly detection scenarios and categories of anomaly detection techniques and how to recognize prominent anomaly detection techniques. Then learn how to facilitate contextual data and collective anomaly detection by using scikit-learn. After moving on to tools, you will explore the most prominent anomaly detection tools and their key components, and recognize the essential rules of anomaly detection. The concluding exercise shows how to implement anomaly detection with scikit-learn, R, and boxplot.
Discover how to use machine learning methods and visualization tools to manage anomalies and improve data for better data insights and accuracy. This 10-video course begins with an overview of machine learning anomaly detection techniques, focusing on the supervised and unsupervised approaches of anomaly detection. Then learners compare the prominent anomaly detection algorithms, learning how to detect anomalies by using R, RCP, and the devtools package. Take a look at the components of general online anomaly detection systems and then explore the approaches of using time series and windowing to detect online or real-time anomalies. Examine prominent real-world use cases of anomaly detection, along with learning the steps and approaches adopted to handle the entire process. Learn how to use boxplot and scatter plot for anomaly detection. Look at the mathematical approach to anomaly detection and implementing anomaly detection using a K-means machine learning approach. Conclude your coursework with an exercise on implementing anomaly detection with visualization, cluster, and mathematical approaches.
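The rule behind box plot anomaly detection flags values lying more than 1.5 interquartile ranges beyond the quartiles. The sketch below assumes that conventional 1.5 multiplier and uses invented readings; it is written in Python rather than R.

```python
from statistics import quantiles


def iqr_outliers(values, whisker=1.5):
    """Values outside [Q1 - whisker * IQR, Q3 + whisker * IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]


readings = [10, 12, 11, 13, 12, 11, 10, 95]   # made-up data with one spike
outliers = iqr_outliers(readings)
```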
Mathematical optimization models allow us to represent our objectives, decision variables, and constraints in mathematical terms, and solving these models gives us the optimal solution to our problems. Linear programming is an optimization model that can be used when our objective function and constraints can be represented using linear terms. Use this course to learn how decision-making can be represented using mathematical optimization models. Begin by examining how optimization problems can be formulated using objective functions, decision variables, and constraints. You'll then recognize how to find an optimal solution to a problem from amongst feasible solutions through a case study. This course will also help you investigate the pros and cons of the assumptions made by linear programming and the steps involved in solving linear programming problems graphically as well as by using the Simplex method. When you are done with this course, you will have the skills and knowledge to apply linear programming to solve optimization problems.
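The graphical method rests on the fact that a linear objective over a bounded feasible region attains its optimum at a corner point. The sketch below enumerates the corners of a small, made-up two-variable problem (maximize 3x + 2y subject to x + y <= 4, x + 3y <= 6, x >= 0, y >= 0); it is not the Simplex method.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint is (a, b, c), meaning a*x + b*y <= c.
constraints = [(1, 1, 4), (1, 3, 6), (-1, 0, 0), (0, -1, 0)]


def objective(point):
    x, y = point
    return 3 * x + 2 * y


def intersection(c1, c2):
    """Point where two constraint boundary lines cross, or None if parallel."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    return (Fraction(r1 * b2 - r2 * b1, det), Fraction(a1 * r2 - a2 * r1, det))


def feasible(point):
    x, y = point
    return all(a * x + b * y <= c for a, b, c in constraints)


corners = set()
for c1, c2 in combinations(constraints, 2):
    point = intersection(c1, c2)
    if point is not None and feasible(point):
        corners.add(point)

best = max(corners, key=objective)
```

Enumerating corners is only practical for tiny problems; the Simplex method walks from corner to corner instead of listing them all.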
Integer programming is a mathematical optimization model that helps find optimal solutions to our problems. Integer programming problems find more applications than linear programming and are an important tool in a developer's toolkit. Examine how to solve optimization problems using integer programming through this course. Start by comparing the integer programming optimization model and linear programming. You'll then move on to the LP relaxation technique and how it can be used to obtain the starting point of an integer programming solution. You'll also explore the PuLP Python library through different case studies consisting of integer programming problems. Upon completing this course, you'll be able to apply integer programming to solve optimization problems.
Bayesian models are the perfect tool for use-cases where there are multiple easily observable outcomes and hard-to-diagnose underlying causes, using a combination of graph theory and Bayesian statistics. Use this course to learn more about stating and interpreting the Bayes theorem for conditional probabilities. Discover how to use Python to create a Bayesian network and calculate several complex conditional probabilities using a Bayesian machine learning model. You'll also examine and use naive Bayes models, which are a category of Bayesian models that assume that the explanatory variables are all independent of each other. Once you have completed this course, you will be able to identify use cases for Bayesian models and construct and effectively employ such models.
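Bayes theorem turns an easily observed outcome into a probability for a hidden cause. The sketch below uses a hypothetical screening test; the prevalence, sensitivity, and false positive rate are invented numbers.

```python
from fractions import Fraction as F


def posterior(prior, likelihood, false_positive_rate):
    """P(cause | evidence) = P(evidence | cause) * P(cause) / P(evidence)."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical values: 1% prevalence, 95% sensitivity, 5% false positives.
p_condition_given_positive = posterior(F(1, 100), F(95, 100), F(5, 100))
```

Even with an accurate test, the posterior stays well below one half here because the condition is rare, so most positive results come from the much larger healthy group.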
Matrix decomposition refers to the process of expressing a matrix as the product of other matrices. These factorized matrices are a lot easier to work with than the original matrix, as they usually possess specific properties desirable in the contexts of various mathematical procedures. Use this course to learn how to use matrix decomposition. Explore precisely what matrices and vectors are and how they're used. Then, study various matrix operations, such as computing the transpose and the inverse of a matrix. Moving on, identify why matrices are great for expressing linear transformations of points in a coordinate space. Work with important transformations, such as shearing, reflection, and rotation. Implement the LU, QR, and Cholesky decompositions and examine their applicability and restrictions. Upon completion, you'll know when and how to implement various matrix decompositions.
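As one concrete instance, the Cholesky decomposition factors a symmetric positive-definite matrix A into L times the transpose of L, with L lower triangular. The sketch below is a plain-Python version with no input validation, applied to a standard textbook matrix.

```python
from math import sqrt


def cholesky(a):
    """Lower-triangular L such that L times its transpose equals a.

    Assumes a is symmetric and positive definite; no checks are performed.
    """
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial_sum = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = sqrt(a[i][i] - partial_sum)
            else:
                lower[i][j] = (a[i][j] - partial_sum) / lower[j][j]
    return lower


A = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
L = cholesky(A)
```

The restriction to symmetric positive-definite input is what distinguishes Cholesky from LU and QR, which apply to broader classes of matrices.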
Eigenvalues, eigenvectors, and the Singular Value Decomposition (SVD) are the foundation of many important techniques, including the widely used method of Principal Components Analysis (PCA). Use this course to learn when and how to use these methods in your work. To start, investigate precisely what eigenvectors and eigenvalues are. Then, explore various examples of eigendecomposition in practice. Moving on, use eigenvalues and eigenvectors to diagonalize a matrix, noting why diagonalizing matrices is extremely efficient in computing matrix higher powers. By the end of the course, you'll be able to apply eigendecomposition and Singular Value Decomposition to diagonalize different types of matrices and efficiently compute higher powers of matrices in this manner.
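The efficiency claim about higher powers follows from A^k = P D^k P^-1, where only the diagonal entries of D need to be raised to the power k. The sketch below hard-codes the eigenvalues and eigenvectors of one small symmetric matrix (an assumption for illustration, not a general eigensolver) and compares the result with repeated multiplication.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


A = [[2, 1], [1, 2]]
eigenvalues = [3, 1]                 # known for this particular matrix
P = [[1, 1], [1, -1]]                # columns are the matching eigenvectors
P_inv = [[Fraction(1, 2), Fraction(1, 2)], [Fraction(1, 2), Fraction(-1, 2)]]


def power_by_diagonalization(k):
    """A**k via P * D**k * P_inv; D**k only powers the diagonal entries."""
    d_k = [[eigenvalues[0] ** k, 0], [0, eigenvalues[1] ** k]]
    return matmul(matmul(P, d_k), P_inv)


def power_by_repeated_multiplication(k):
    result = [[1, 0], [0, 1]]
    for _ in range(k):
        result = matmul(result, A)
    return result
```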
First conceived in the 1940s, it wasn't until the early 2010s that artificial neurons showed their true potential as layered entities in the form of neural networks. When big data processing using distributed computing became mainstream, the computational capacity was now available to train these neural networks on huge datasets. Knowing this is one thing, but understanding how it all works is where the true potential lies. Use this course to gain an intuitive understanding of how neural networks work. Explore the mathematical operations performed by a single neuron. Recognize the potential of thousands of neurons connected together in a well-architected design. Finally, implement code to mathematically perform the operations in a single layer of neurons working on batch input. When you're finished, you'll have a solid grasp of the mechanisms behind neural networks and the math behind neurons.
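The operations in a single layer of neurons working on batch input amount to a weighted sum plus a bias per neuron, followed by an activation. The sketch below assumes a ReLU activation and uses invented weights; `weights[i][j]` is taken to be the weight from input i to neuron j.

```python
def relu(z):
    return max(0.0, z)


def dense_layer(batch, weights, biases):
    """Forward pass of one layer over a batch of input rows."""
    outputs = []
    for row in batch:
        activations = []
        # zip(*weights) yields one tuple of incoming weights per neuron.
        for neuron_weights, bias in zip(zip(*weights), biases):
            z = sum(x * w for x, w in zip(row, neuron_weights)) + bias
            activations.append(relu(z))
        outputs.append(activations)
    return outputs


batch = [[1.0, 2.0], [0.0, -1.0]]        # two samples, two features each
weights = [[0.5, -1.0], [0.25, 1.0]]     # two inputs feeding two neurons
biases = [0.0, 0.5]
layer_output = dense_layer(batch, weights, biases)
```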
Because neural networks comprise thousands of neurons and interconnections, one can assume training a neural network involves millions of computations. This is where a general-purpose optimization algorithm called gradient descent comes in. Use this course to gain an intuitive and visual understanding of how gradient descent and the gradient vector work. As you advance, examine three neural network activation functions, ReLU, sigmoid, and hyperbolic tangent functions, and two variants of the ReLU function, Leaky ReLU and ELU. In examining variants of the ReLU activation function, learn how to use them to deal with deep neural network training issues. Finally, implement a neural network from scratch using TensorFlow and basic Python. When you're done, you'll be able to illustrate the mathematical intuition behind neural networks and be prepared to tackle more complex machine learning problems.
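The activation functions named above are short enough to write out directly. The `alpha` defaults below are common choices and are assumptions, not values taken from the course.

```python
from math import exp, tanh


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def hyperbolic_tangent(z):
    return tanh(z)


def leaky_relu(z, alpha=0.01):
    """Like ReLU, but keeps a small slope for negative inputs."""
    return z if z > 0 else alpha * z


def elu(z, alpha=1.0):
    """Smoothly saturates towards -alpha for large negative inputs."""
    return z if z > 0 else alpha * (exp(z) - 1.0)
```

Plain ReLU outputs zero (and has zero gradient) for every negative input, which can leave neurons permanently inactive; Leaky ReLU and ELU keep a non-zero response on that side.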
EARN A DIGITAL BADGE WHEN YOU COMPLETE THESE COURSES
Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.
Exploring mathematical statistics in its entirety, from the fundamentals to modern methods, this book introduces readers to point estimation, confidence intervals, and statistical tests.
Presenting both the conventional and less common uses of linear regression in today's cutting-edge scientific research, this book blends theory and application to equip readers with an understanding of the basic principles needed to apply regression model-building techniques in various fields of study, including engineering, management, and the health sciences.
10h 54m By Douglas C. Montgomery, Elizabeth A. Peck, G. Geoffrey Vining
Clearly balancing theory with applications, this book describes both the conventional and less common uses of linear regression in the practical context of today's mathematical and scientific research.
46m By Douglas C. Montgomery, Elizabeth A. Peck, G. Geoffrey Vining
Presenting an authoritative guide to statistical hypothesis testing with examples in SAS and R, this book provides an overview of the most common statistical test problems in a comprehensive way, making it easy to find and perform an appropriate statistical test.
Showing you how to use Python to delve into high school-level math topics like statistics, geometry, probability, and calculus, this book will start you with simple projects before moving on to more complex projects once you've gotten the hang of things.
Providing practicable and use case-based experience, this book discusses several years of in-depth industry research and presents vendor tools, approaches, and methodologies in discovery, visualization, and visual analytics.
Presenting a diverse blend of original illustrations and real-world examples - both classical and cutting-edge, this thought-provoking book offers multi-disciplinary perspectives and useful information about how visualizations can open your eyes to data.
The Math for Data Science Literacy benchmark will measure your ability to recall and relate the underlying math concepts in data science and machine learning solutions. You will be evaluated on your ability to recognize the foundational concepts of math for data science like the basics of statistics, probability, algebra, and calculus. A learner who scores high on this benchmark demonstrates that they have the basic math skills to understand and grasp the data analysis process and machine learning algorithms.
The Math for Data Science Proficiency benchmark will measure your ability to recall, relate, analyze, and apply the underlying math concepts in data science solutions and machine learning algorithms. You will be evaluated on your ability to recognize, analyze, and apply the advanced math concepts in machine learning algorithms such as statistics, probability, linear algebra and calculus, algorithm tuning techniques and optimization techniques, and the math behind complex algorithms like decision trees and recommendation systems. A learner who scores high on this benchmark demonstrates that they have the proficiency to apply advanced math concepts to develop efficient machine learning solutions.
The Statistics for Data Analysis Literacy (Beginner Level) benchmark will measure your ability to recall and relate the underlying statistics concepts for data analysis. You will be evaluated on your ability to recognize the foundational concepts of statistics, such as data types, descriptive and inferential statistics, and their applications. A learner who scores high on this benchmark demonstrates that they have the basic statistics skills to understand and grasp the data analysis process.
The Statistics for Data Analysis Competency (Intermediate Level) benchmark will measure your ability to recall and relate the underlying statistics and probability concepts for data analysis, as well as perform statistical analysis and probability calculations using Python and Pandas. You will be evaluated on your ability to recognize the concepts of statistics and probability, such as data types, descriptive and inferential statistics and their applications, probability concepts, marginal and joint probabilities, Bayes rule, and performing statistical and probability calculations with Python. A learner who scores high on this benchmark demonstrates that they have good statistics and probability skills and can work on data analysis projects with minimal supervision.
The Statistics for Data Analysis Proficiency (Advanced Level) benchmark will measure your ability to recall and relate the underlying statistics and probability concepts for data analysis and perform statistical analysis, as well as probability calculations using Python and pandas. You will be evaluated on your ability to recognize the deeper concepts of statistics and probability, such as descriptive and inferential statistics, probability concepts, probability distributions, Bayesian networks, hypothesis tests, etc. A learner who scores high on this benchmark demonstrates that they have the strong statistics and probability skills required to work independently on data analytics projects.
The Math for Data Science Literacy benchmark will measure your ability to recall and relate the underlying math concepts in data science and machine learning solutions. You will be evaluated on your ability to recognize the foundational concepts of math for data science, such as the basics of statistics, probability, algebra, and calculus. A learner who scores high on this benchmark demonstrates that they have the basic math skills to understand the data analysis process and machine learning algorithms.