Aspire Journeys

Data Analyst to Data Scientist

  • 100 Courses | 96h 31m 12s
  • 4 Labs | 32h
  • Includes Test Prep
This Skillsoft Aspire journey first provides a foundation in data architecture, statistics, and data analysis programming with Python and R, the first step toward transitioning away from disparate and legacy data sources. You will then learn to wrangle the data using Python and R and integrate that data with Spark and Hadoop. Next, you will learn how to operationalize and scale data while considering compliance and governance. To complete the journey, you will learn how to take that data and visualize it to inform smart business decisions.

Track 1: Data Analyst

In this track of the data science Skillsoft Aspire journey, the focus is on the data analyst role, covering Python, R, architecture, statistics, and Spark.

  • 26 Courses | 26h 50m 14s
  • 1 Lab | 8h

Track 2: Data Wrangler

In this track of the data science Skillsoft Aspire journey, the focus will be on the data wrangler role. We will explore areas such as wrangling with Python, MongoDB, and Hadoop.

  • 25 Courses | 24h 24m 7s
  • 1 Lab | 8h

Track 3: Data Ops

For this track of the data science Skillsoft Aspire journey, the focus will be on the Data Ops role. Here we will explore areas such as governance, security, and harnessing data volume and velocity.

  • 23 Courses | 19h 33m
  • 1 Lab | 8h

Track 4: Data Scientist

For this track of the data science Skillsoft Aspire journey, the focus will be on the Data Scientist role. Here we will explore areas such as visualization, APIs, and machine learning (ML) and deep learning (DL) algorithms.

  • 26 Courses | 25h 43m 51s
  • 1 Lab | 8h


Data Architecture Getting Started
In this 12-video course, learners explore how to define data, its lifecycle, the importance of privacy, SQL and NoSQL database solutions, and key data management concepts as they relate to big data. First, look at the relationship between data, information, and analysis. Learn to recognize personally identifiable information (PII), protected health information (PHI), and common data privacy regulations. Then, study the six phases of the data lifecycle. Compare and contrast SQL and NoSQL database solutions, and see how to use Visual Paradigm to create a relational database ERD (entity-relationship diagram). Implement an SQL solution by deploying Microsoft SQL Server in the Amazon Web Services (AWS) cloud, and a NoSQL solution by deploying DynamoDB in the AWS cloud. Explore definitions of big data and governance. Learners will examine various types of data architecture, including the TOGAF (The Open Group Architecture Framework) enterprise architecture. Finally, learners study data analytics and reporting, and how organizations can derive value from the data they have. The concluding exercise looks at implementing effective data management solutions.
13 videos | 1h | Assessment available | Badge
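As a rough illustration of the SQL versus NoSQL contrast this course covers, the sketch below stores the same customer and order data two ways: as normalized relational tables, using Python's built-in `sqlite3` module as a stand-in for Microsoft SQL Server, and as a single denormalized item looked up by a partition key, loosely in the style of a key-value store such as DynamoDB. The table names, columns, and the plain-dict `nosql_store` are hypothetical; this is not the course's AWS setup and does not use any AWS API.

```python
import sqlite3

# Relational (SQL) model: two normalized tables linked by a foreign key,
# as an ERD with a one-to-many customer -> orders relationship would describe.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        total       REAL NOT NULL
    );
    """
)
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(100, 1, 25.0), (101, 1, 40.5)],
)

# A join reassembles the related rows at query time.
rows = conn.execute(
    """
    SELECT c.name, o.order_id, o.total
    FROM customer AS c JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
    """
).fetchall()
print(rows)  # [('Ada', 100, 25.0), ('Ada', 101, 40.5)]

# Key-value / document (NoSQL-style) model: a hypothetical in-memory store
# where each item is fetched by its partition key and already embeds the
# related data, so no join is needed at read time.
nosql_store = {
    "customer#1": {
        "name": "Ada",
        "orders": [
            {"order_id": 100, "total": 25.0},
            {"order_id": 101, "total": 40.5},
        ],
    }
}
item = nosql_store["customer#1"]
print([o["order_id"] for o in item["orders"]])  # [100, 101]
```

The trade-off in miniature: the relational version avoids duplicating customer data and relies on joins, while the key-value item is shaped around one known access pattern ("get a customer with their orders").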
Data Engineering Getting Started
Data engineering is the area of data science that focuses on practical applications of data collection and analysis. This 12-video course helps learners explore distributed systems, batch versus in-memory processing, NoSQL uses, the various tools available for data management and big data, and the ETL (extract, transform, and load) process. Begin with an overview of distributed systems from a data perspective. Then look at the differences between batch and in-memory processing. Learn about NoSQL stores and their uses, and the tools available for data management. Explore ETL: what it is, how the process works, and the different tools available. Learn to use Talend Open Studio to demonstrate the ETL concept. Next, examine data modeling and create a data model in Talend Open Studio. Explore the hierarchy of needs when working with AI and machine learning. In another tutorial, learn how to create a data partition. Then move on to data engineering and best practices, with a look at approaches to building and using data reporting tools. Conclude with an exercise designed to create a data model.
13 videos | 50m | Assessment available | Badge
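The course demonstrates ETL with Talend Open Studio. As a tool-agnostic sketch of the same three steps, the Python below extracts rows from CSV text, transforms them (casting types, normalizing a text field, dropping records that fail validation), and loads the result into a SQLite table. The sample data, column names, and validation rule are hypothetical, chosen only to make each step visible.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a file or an API.
RAW_CSV = """id,region,amount
1, east ,10.50
2,WEST,not_a_number
3,west,7.25
"""

def extract(text):
    """Extract: parse CSV text into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Transform: cast types, normalize region names, drop invalid rows."""
    clean = []
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed as a number
        clean.append((int(rec["id"]), rec["region"].strip().lower(), amount))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into a target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT * FROM sales ORDER BY id").fetchall()
print(loaded)  # [(1, 'east', 10.5), (3, 'west', 7.25)]
```

Keeping each stage a separate function mirrors how ETL tools present the pipeline as distinct components, and makes each step testable on its own.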
Python - Introduction to NumPy for Multi-dimensional Data
This Skillsoft Aspire course explores NumPy, a Python library used in data science and big data. NumPy provides a framework to express data in the form of arrays and is the fundamental building block for several other Python libraries. For this course, you will need to know the basics of programming in Python 3 and should also have some familiarity with Jupyter notebooks. You will learn how to create NumPy arrays and perform basic mathematical operations on them. Next, you will see how to modify, index, slice, and reshape arrays, and examine the NumPy library's universal array functions, which operate on an element-by-element basis. Conclude by learning the various options for iterating through NumPy arrays.
11 videos | 1h | Assessment available | Badge
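A short sketch of the operations this course lists, in order: creating arrays, element-wise arithmetic, modifying, indexing and slicing, reshaping, applying a universal function, and iterating. It assumes NumPy is installed; the array values are arbitrary.

```python
import numpy as np

# Create arrays from a Python list and with a helper function.
a = np.array([1, 2, 3, 4, 5, 6])
b = np.arange(6)            # array([0, 1, 2, 3, 4, 5])

# Basic arithmetic operates element by element.
total = a + b               # array([ 1,  3,  5,  7,  9, 11])
scaled = a * 2              # array([ 2,  4,  6,  8, 10, 12])

# Modify, index, and slice.
a[0] = 10                   # a is now [10, 2, 3, 4, 5, 6]
first_three = a[:3]         # array([10, 2, 3]); a view into a, not a copy

# Reshape the 6-element array into 2 rows x 3 columns.
grid = a.reshape(2, 3)      # [[10, 2, 3], [4, 5, 6]]

# Universal functions (ufuncs) apply element-wise.
roots = np.sqrt(grid)

# Iterate: row by row with a for loop, or over every element with np.nditer.
for row in grid:
    print(row)
flat_values = [int(x) for x in np.nditer(grid)]
print(flat_values)          # [10, 2, 3, 4, 5, 6]
```

Note that `total` and `scaled` were computed before `a[0] = 10`, so they keep the original values, while `first_three` and `grid` reflect the change.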
Python - Advanced Operations with NumPy Arrays
NumPy is one of the fundamental packages for scientific computing, allowing data to be represented in multidimensional arrays. This course covers the array operations you can undertake, such as image manipulation, fancy indexing, and broadcasting. To take this Skillsoft Aspire course, you should be comfortable creating, indexing, and slicing NumPy arrays, and applying aggregate and universal functions. Among the topics, you will learn about the several options NumPy offers for splitting arrays. You will learn how to use NumPy to work with digital images, which are multidimensional arrays. Next, you will observe how to manipulate a color image, perform slicing operations to view sections of the image, and use a SciPy package for image manipulation. You will learn how to use masks and arrays of index values to access multiple elements of an array simultaneously, a technique referred to as fancy indexing. Finally, this course covers broadcasting, which performs operations between arrays of different but compatible shapes.
13 videos | 1h | Assessment available | Badge
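The sketch below touches each technique the course names: splitting an array, treating a color image as a multidimensional array and slicing a region of it, boolean masks, fancy indexing with an array of indices, and broadcasting. To stay self-contained it builds a tiny synthetic "image" instead of loading a file, and it does not use SciPy; the shapes and pixel values are illustrative assumptions. It assumes NumPy is installed.

```python
import numpy as np

# Splitting: divide an array into equal parts, or at chosen indices.
arr = np.arange(12)
halves = np.split(arr, 2)             # two arrays of 6 elements each
pieces = np.split(arr, [3, 8])        # arr[:3], arr[3:8], arr[8:]

# A color image is a 3-D array: height x width x channels (R, G, B).
# Here, a hypothetical 4x4 image with 8-bit pixel values.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[:, :, 0] = 200                  # set the red channel of every pixel
top_left = image[:2, :2]              # slice a 2x2 region (all channels)

# Boolean mask: select elements that satisfy a condition.
values = np.array([5, 12, 7, 20, 3])
mask = values > 6                     # array([False, True, True, True, False])
big = values[mask]                    # array([12,  7, 20])

# Fancy indexing: pass an array of indices to pick several elements at once.
picked = values[[0, 3, 4]]            # array([ 5, 20,  3])

# Broadcasting: a (3, 1) column and a (4,) row combine into a (3, 4) result.
col = np.array([[0], [10], [20]])
row = np.array([1, 2, 3, 4])
table = col + row
print(table)
# [[ 1  2  3  4]
#  [11 12 13 14]
#  [21 22 23 24]]
```

Broadcasting works here because each dimension pair is either equal or 1 once the shapes are aligned from the right; shapes such as `(3,)` and `(4,)` would raise an error instead.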
Python - Introduction to Pandas and DataFrames
Simplify data analysis with Pandas DataFrames. Pandas is a Python library that enables you to work with series and tabular data, including initializing and populating them. For this course, learners do not need prior experience with Pandas, but should be familiar with Python 3 and Jupyter Notebooks. Topics include the following: define your own index for a Pandas Series object; load data from a CSV (comma-separated values) file to create a Pandas DataFrame; add and remove data from your Pandas DataFrame; analyze a portion of your DataFrame; and examine how to reshape or reorient data and create a pivot table. Finally, represent multidimensional data in two-dimensional DataFrames with multi-level or hierarchical indexes.
14 videos | 1h | Assessment available | Badge
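A minimal sketch of the listed topics: a Series with a custom index, a DataFrame loaded from CSV, adding and removing a column, selecting a portion of the data, a pivot table, and a hierarchical (MultiIndex) index. It assumes pandas is installed and reads a small inline CSV through `io.StringIO` in place of a file on disk; the store and sales figures are made up.

```python
import io

import pandas as pd

# A Series with a user-defined index instead of the default 0..n-1.
prices = pd.Series([1.5, 0.5, 3.0], index=["apple", "banana", "cherry"])
print(prices["banana"])  # 0.5

# Load CSV data into a DataFrame (inline text stands in for a .csv file).
csv_text = """store,month,sales
A,Jan,100
A,Feb,120
B,Jan,90
B,Feb,95
"""
df = pd.read_csv(io.StringIO(csv_text))

# Add a derived column, then remove it again.
df["sales_k"] = df["sales"] / 1000
df = df.drop(columns=["sales_k"])

# Analyze a portion: rows for store A, selected columns only.
store_a = df.loc[df["store"] == "A", ["month", "sales"]]

# Reshape: stores become rows and months become columns.
pivot = df.pivot_table(index="store", columns="month", values="sales", aggfunc="sum")

# Hierarchical index: the two-level (store, month) data in one 2-D frame.
multi = df.set_index(["store", "month"]).sort_index()
print(multi.loc[("B", "Feb"), "sales"])  # 95
```

Sorting the MultiIndex with `sort_index()` is not strictly required for a full-key lookup, but it keeps label-based selection on hierarchical indexes predictable and efficient.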
Python - Manipulating & Analyzing Data in Pandas DataFrames
Explore advanced data manipulation and analysis with Pandas DataFrames. Pandas is a Python library whose DataFrames share similarities with relational database tables. To take this course, you need basic prior experience with Pandas DataFrames, data loading, and data manipulation in Jupyter Notebook. You will learn to iterate over the data in your DataFrame. See how to export data to Excel, JSON (JavaScript Object Notation), and CSV (comma-separated values) files. Sort the contents of a DataFrame and manage missing data. Group data with a multi-index. Merge disparate data into a single DataFrame through join and concatenate operations. Finally, you will determine when and where to integrate data with structured queries, similar to SQL.
10 videos | 48m | Assessment available | Badge
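One possible sketch of the workflow this course describes: iterating over rows, exporting, handling missing values, sorting, grouping into a MultiIndex, joining and concatenating, and a SQL-like filter with `DataFrame.query`. It assumes pandas is installed. Excel export is only mentioned in a comment because `DataFrame.to_excel` needs an additional engine package such as openpyxl. The order and customer data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: orders with some missing values, plus customers.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "cust_id": [10, 20, 10, 30],
    "region": ["east", "west", "east", None],
    "amount": [250.0, None, 75.0, 40.0],
})
customers = pd.DataFrame({"cust_id": [10, 20, 30], "name": ["Ada", "Bo", "Cy"]})

# Iterate row by row (readable, but vectorized operations are usually faster).
ids = [row.order_id for row in orders.itertuples(index=False)]

# Export: with no path argument, to_json and to_csv return strings.
# (to_excel also exists but needs an Excel engine such as openpyxl installed.)
as_json = orders.to_json(orient="records")
as_csv = orders.to_csv(index=False)

# Manage missing data: fill unknown regions, drop rows with no amount.
cleaned = orders.fillna({"region": "unknown"}).dropna(subset=["amount"])

# Sort by amount, largest first.
ranked = cleaned.sort_values("amount", ascending=False)

# Group by two keys; the result carries a (region, cust_id) MultiIndex.
totals = cleaned.groupby(["region", "cust_id"])["amount"].sum()

# Join: attach customer names via the shared cust_id column.
merged = cleaned.merge(customers, on="cust_id", how="left")

# Concatenate: stack another batch of orders underneath.
more = pd.DataFrame({"order_id": [5], "cust_id": [20], "region": ["west"], "amount": [60.0]})
combined = pd.concat([cleaned, more], ignore_index=True)

# SQL-like filtering, roughly: SELECT * FROM merged WHERE amount > 50
big = merged.query("amount > 50")
print(big[["order_id", "name", "amount"]])
```

`merge` plays the role of a SQL join on a key column, while `concat` simply appends rows; choosing between them depends on whether the second table adds columns or adds records.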