Aspire Journeys

Data Analyst to Data Scientist

  • 100 Courses | 88h 11m 16s
  • 4 Labs | 32h
 
This Skillsoft Aspire journey first provides a foundation in data architecture, statistics, and data analysis programming skills using Python and R, the first step in acquiring the knowledge to transition away from disparate and legacy data sources. You will then learn to wrangle data using Python and R and integrate that data with Spark and Hadoop. Next, you will learn how to operationalize and scale data while considering compliance and governance. To complete the journey, you will learn how to take that data and visualize it to inform smart business decisions.

Track 1: Data Analyst

In this track of the data science Skillsoft Aspire journey, the focus is on the data analyst role, covering Python, R, architecture, statistics, and Spark.

  • 26 Courses | 24h 43m 11s
  • 1 Lab | 8h

Track 2: Data Wrangler

In this track of the data science Skillsoft Aspire journey, the focus will be on the data wrangler role. We will explore areas such as: wrangling with Python, Mongo, and Hadoop.

  • 25 Courses | 22h 18m 48s
  • 1 Lab | 8h

Track 3: Data Ops

For this track of the data science Skillsoft Aspire journey, the focus will be on the Data Ops role. Here we will explore areas such as: governance, security, and harnessing volume and velocity.

  • 23 Courses | 17h 35m 9s
  • 1 Lab | 8h

Track 4: Data Scientist

For this track of the data science Skillsoft Aspire journey, the focus will be on the Data Scientist role. Here we will explore areas such as: visualization, APIs, and ML and DL algorithms.

  • 26 Courses | 23h 34m 8s
  • 1 Lab | 8h

COURSES INCLUDED

Data Architecture Getting Started
In this 12-video course, learners explore how to define data, its lifecycle, the importance of privacy, SQL and NoSQL database solutions, and key data management concepts as they relate to big data. First, look at the relationship between data, information, and analysis. Learn to recognize personally identifiable information (PII), protected health information (PHI), and common data privacy regulations. Then, study the six phases of the data lifecycle. Compare and contrast SQL and NoSQL database solutions and look at using Visual Paradigm to create a relational database ERD (entity-relationship diagram). Implement a SQL solution by deploying Microsoft SQL Server in the Amazon Web Services (AWS) cloud, and a NoSQL solution by deploying DynamoDB in the AWS cloud. Explore definitions of big data and governance. Learners will examine various types of data architecture, including TOGAF (The Open Group Architecture Framework) enterprise architecture. Finally, learners study data analytics and reporting, and how organizations can derive value from the data they have. The concluding exercise looks at implementing effective data management solutions.
13 videos | 1h | Assessment available | Badge
Data Engineering Getting Started
Data engineering is the area of data science that focuses on practical applications of data collection and analysis. This 12-video course helps learners explore distributed systems, batch versus in-memory processing, NoSQL uses, and the various tools available for data management/big data and the ETL (extract, transform, and load) process. Begin with an overview of distributed systems from a data perspective. Then look at differences between batch and in-memory processing. Learn about NoSQL stores and their use, and tools available for data management. Explore ETL—what it is, the process, and the different tools available. Learn to use Talend Open Studio to showcase the ETL concept. Next, examine data modeling and creating a data model in Talend Open Studio. Explore the hierarchy of needs when working with AI and machine learning. In another tutorial, learn how to create a data partition. Then move on to data engineering and best practices, with a look at approaches to building and using data reporting tools. Conclude with an exercise designed to create a data model.
13 videos | 46m | Assessment available | Badge
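To make the ETL idea from the Data Engineering course concrete, here is a minimal sketch in plain Python. It is not taken from the course, which demonstrates ETL in Talend Open Studio; the sample data, the `payments` table, and all variable names are hypothetical and chosen only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: inconsistent casing, stray spaces, one missing amount.
raw = "name,amount\n alice ,10\nBOB,\ncarol,7\n"

# Extract: read the source rows into dictionaries.
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: normalize names, skip rows with no amount, cast amounts to int.
clean = [
    (r["name"].strip().lower(), int(r["amount"]))
    for r in rows
    if r["amount"].strip()
]

# Load: write the cleaned rows into an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO payments VALUES (?, ?)", clean)
conn.commit()

total = conn.execute("SELECT SUM(amount) FROM payments").fetchone()[0]
```

The three stages are kept as separate steps so that each can be replaced independently, for example by swapping the in-memory string for a real file or the SQLite target for another store.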
Python - Introduction to NumPy for Multi-dimensional Data
This Skillsoft Aspire course explores NumPy, a Python library used in data science and big data. NumPy provides a framework to express data in the form of arrays, and is the fundamental building block for several other Python libraries. For this course, you will need to know the basics of programming in Python 3, and should also have some familiarity with Jupyter notebooks. You will learn how to create NumPy arrays and perform basic mathematical operations on them. Next, you will see how to modify, index, slice, and reshape the arrays, and examine the NumPy library's universal array functions, which operate on an element-by-element basis. Conclude by learning the various options for iterating through NumPy arrays.
11 videos | 59m | Assessment available | Badge
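The operations named in the NumPy introduction (creating, reshaping, slicing, universal functions, and iteration) can be sketched as follows. This is an illustrative example rather than course material; the variable names are arbitrary.

```python
import numpy as np

# Create a 1-D array of six values and reshape it to 2 rows x 3 columns.
a = np.arange(6).reshape(2, 3)

# Basic arithmetic is applied element by element.
doubled = a * 2

# Slicing: take every row, second column only.
second_col = a[:, 1]

# A universal function (ufunc) also works element by element.
roots = np.sqrt(a)

# One way to iterate over every element, regardless of shape.
total = 0
for value in np.nditer(a):
    total += int(value)
```

`np.nditer` is only one of the iteration options; a plain `for row in a:` loop iterates over the first axis instead.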
Python - Advanced Operations with NumPy Arrays
NumPy is one of the fundamental packages for scientific computing and allows data to be represented in multidimensional arrays. This course covers array operations such as image manipulation, fancy indexing, and broadcasting. To take this Skillsoft Aspire course, you should be comfortable with how to create, index, and slice NumPy arrays, and apply aggregate and universal functions. Among the topics, you will learn about the several options available in NumPy to split arrays. You will learn how to use NumPy to work with digital images, which are multidimensional arrays. Next, you will observe how to manipulate a color image, perform slicing operations to view sections of the image, and use a SciPy package for image manipulation. You will learn how to use masks and arrays of index values to access multiple elements of an array simultaneously, a technique referred to as fancy indexing. Finally, this course covers broadcasting to perform operations between arrays of mismatched shapes.
13 videos | 1h | Assessment available | Badge
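As a rough illustration of the advanced NumPy topics above (slicing an image-like array, splitting, masks, fancy indexing, and broadcasting), consider the sketch below. The small synthetic `image` array stands in for a real color image and is an assumption made for the example, not something from the course.

```python
import numpy as np

# Stand-in for a tiny RGB image: 4 pixels high, 6 wide, 3 color channels.
image = np.arange(4 * 6 * 3, dtype=np.uint8).reshape(4, 6, 3)

# Slicing views a section of the image: the top-left 2 x 3 pixel patch.
patch = image[:2, :3]

# Splitting along the width axis yields two equal halves.
left, right = np.split(image, 2, axis=1)

data = np.array([10, 20, 30, 40, 50])

# Boolean mask: select every element greater than 25.
selected = data[data > 25]

# Fancy indexing: select several elements at once by position.
picked = data[[0, 2, 4]]

# Broadcasting: a (3, 1) column and a (4,) row combine into a (3, 4) grid.
col = np.array([[0], [10], [20]])
row = np.array([1, 2, 3, 4])
grid = col + row
```

Broadcasting stretches each array along its size-1 (or missing) dimension, so no explicit copies of `col` or `row` are needed.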
Python - Introduction to Pandas and DataFrames
Simplify data analysis with Pandas DataFrames. Pandas is a Python library that enables you to work with series and tabular data, including initializing and populating them. For this course, learners do not need prior experience working with Pandas, but should be familiar with Python 3 and Jupyter Notebooks. Topics include the following: define your own index for a Pandas series object; load data from a CSV (comma-separated values) file to create a Pandas DataFrame; add and remove data from your Pandas DataFrame; analyze a portion of your DataFrame; and examine how to reshape or reorient data and create a pivot table. Finally, represent multidimensional data in two-dimensional DataFrames with multi-level or hierarchical indexes.
14 videos | 1h | Assessment available | Badge
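The Pandas topics listed above can be sketched in a few lines. The CSV text, column names, and prices below are hypothetical sample data invented for this illustration; the course itself may use different datasets.

```python
import io
import pandas as pd

# A Series with a user-defined index instead of the default 0, 1, 2...
prices = pd.Series([1.5, 0.5], index=["apple", "banana"])

# Load tabular data from CSV text (a file path would work the same way).
csv_text = """region,product,units
north,apple,10
north,banana,5
south,apple,7
south,banana,3
"""
df = pd.read_csv(io.StringIO(csv_text))

# Add a column, then produce a copy with that column removed again.
df["revenue"] = df["units"] * df["product"].map(prices)
trimmed = df.drop(columns=["revenue"])

# Analyze a portion of the DataFrame.
north = df[df["region"] == "north"]

# Reshape into a pivot table: regions as rows, products as columns.
pivot = df.pivot_table(index="region", columns="product",
                       values="units", aggfunc="sum")

# A hierarchical (multi-level) index represents two dimensions in the rows.
indexed = df.set_index(["region", "product"])
south_apple_units = indexed.loc[("south", "apple"), "units"]
```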
Python - Manipulating & Analyzing Data in Pandas DataFrames
Explore advanced data manipulation and analysis with Pandas DataFrames, the tabular data structure of the Pandas Python library, which shares similarities with relational database tables. To take this course, you need basic prior experience with Pandas DataFrames, data loading, and data manipulation in Jupyter Notebook. You will learn to iterate over data in your DataFrame. See how to export data to Excel files, JSON (JavaScript Object Notation) files, and CSV (comma-separated values) files. Sort the contents of a DataFrame and manage missing data. Group data with a multi-index. Merge disparate data into a single DataFrame through join and concatenate operations. Finally, you will determine when and where to integrate data with structured queries, similar to SQL.
10 videos | 44m | Assessment available | Badge
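A minimal sketch of the manipulation steps described above follows. The `orders` and `customers` tables are hypothetical sample data. Export is shown to in-memory strings only; writing Excel files with `DataFrame.to_excel` additionally requires an Excel engine such as openpyxl, so it is left out here.

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer": ["ann", "bob", "ann", "cy"],
    "amount": [20.0, None, 35.0, 15.0],
})
customers = pd.DataFrame({
    "customer": ["ann", "bob", "cy"],
    "city": ["Oslo", "Lima", "Oslo"],
})

# Manage missing data: treat an unknown amount as zero.
orders["amount"] = orders["amount"].fillna(0.0)

# Sort the contents, largest amount first.
by_amount = orders.sort_values("amount", ascending=False)

# Iterate over rows as lightweight named tuples.
labels = [f"{row.order_id}:{row.customer}"
          for row in orders.itertuples(index=False)]

# Join two tables on a shared key, much like a SQL LEFT JOIN.
joined = orders.merge(customers, on="customer", how="left")

# Group by two keys; the result carries a multi-index of (city, customer).
totals = joined.groupby(["city", "customer"])["amount"].sum()

# Filter with a query string, similar to a SQL WHERE clause.
large = joined.query("amount > 18")

# Concatenate extra rows onto an existing table.
extra = pd.DataFrame({"order_id": [5], "customer": ["bob"], "amount": [8.0]})
all_orders = pd.concat([orders, extra], ignore_index=True)

# Export to CSV and JSON text.
csv_text = joined.to_csv(index=False)
json_text = joined.to_json(orient="records")
```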
R Data Structures
R is a programming language that is an essential skill for statistical computing and graphics. It is the tool of choice for data science professionals in every industry and field, not only to create reproducible high-quality analyses, but also to take advantage of R's great graphic and charting capabilities. In this 11-video Skillsoft Aspire course, you will explore the fundamental data structures used in R, including working with vectors, lists, matrices, factors, and data frames. The key concepts in this course include: creating vectors in R and manipulating and performing operations on them; how to sort vectors in R; and how to use lists in R, stepping through example code line by line with the run current line command. You will also examine creating matrices and performing matrix operations in R; creating factors and data frames in R; performing data frame operations in R; and how to create and use a data frame.
11 videos | 52m | Assessment available | Badge
Importing & Exporting Data using R
An essential skill for statistical computing and graphics, the programming language R is the tool of choice for data science professionals in every industry and field, both to take advantage of R's great graphic and charting capabilities and to create reproducible high-quality analyses. In this 8-video Skillsoft Aspire course, you will discover how to use R to import and export tabular data in CSV (comma-separated values), Excel, and HTML format. The key concepts covered in this course include how to read data from a CSV formatted text file and from an Excel spreadsheet; how to read tabular data from an HTML file; and how to export tabular data from R to a CSV file and to an Excel spreadsheet. In addition, learners will explore exporting tabular data from R to an HTML table; how to read data from an HTML table and export to CSV; and how to confirm that the contents of the CSV file were written correctly.
8 videos | 33m | Assessment available | Badge
Data Exploration using R
The tool of choice for data science professionals in every modern industry and field, the programming language R has become an essential skill for statistical computing and graphics. It lets users create reproducible high-quality analyses and take advantage of superior graphic and charting capabilities. In this 10-video Skillsoft Aspire course, you will explore data in R by using the dplyr library, including working with tabular data, piping data, mutating data, summarizing data, combining datasets, and grouping data. Key concepts covered in this course include using the dplyr library to load data frames; selecting subsets of data by using dplyr; and how to filter tabular data using dplyr. You will also learn to perform multiple operations by using the pipe operator; how to create new columns with the mutate function; and how to summarize data using summary functions. Next, use the dplyr join functions to combine data. Then learn how to use the group_by function from the dplyr library, and how to query data with various dplyr library functions.
10 videos | 40m | Assessment available | Badge
R Regression Methods
The programming language R has become an essential skill for statistical computing and graphics, and is the tool of choice for data science professionals in every industry and field. R creates reproducible high-quality analyses, and allows users to take advantage of its great graphic and charting capabilities. In this 8-video Skillsoft Aspire c