Data Science Track 2: Data Wrangler

In this Skillsoft Aspire track of the data science journey, the focus will be on the data wrangler role. We will explore areas such as wrangling with Python, Mongo, and Hadoop.

Technologies: Amazon Web Services 2019, Apache Hadoop 2.9, Apache Hive 2.3.2, Apache Spark 2.3, Big Data, Data Science, Machine Learning, MongoDB 4.0, Python 3, RStudio 1.1.4, Trifacta Wrangler
In the overview, Chris will explain what will be covered in the Wrangler track of the Data Science journey, as well as show you how to use code assets, use the lab, and browse the different assets in the track.
A key component to wrangling data is the data lake framework. In this Skillsoft Aspire course, you will discover how to implement data lakes for real-time data management. Explore data ingestion, data processing, and data life-cycle management using AWS and other open-source ecosystem products.
To become proficient at data wrangling, you need a foundation in using various data tools and technologies. In this Skillsoft Aspire course, you will explore the technology landscape and the tools used to implement data management. Discover how to use machine learning in data analytics and the capabilities of machine learning implementation in the cloud.
To become proficient at data wrangling, you need a foundation in using various data tools and technologies. In this Skillsoft Aspire course, you will explore machine learning solutions provided by AWS and Microsoft. Compare the prominent tools and frameworks that can be used to implement machine learning and deep learning.
For effective data wrangling, you need an architecture that will enable you to meet your goals. In this Skillsoft Aspire course, you will explore the concepts of serverless and Lambda architectures and how processes are implemented with them. We will also explore the various types of data architecture, data risks, and the essential data discovery processes.
Python has become the preferred programming language for data science. In this Skillsoft Aspire course, you will visualize and explore data in Pandas using popular chart types like the bar graph, histogram, pie chart, and box plot. Discover how to work with time series and string data in datasets.
Python has become the preferred programming language for data science. In this Skillsoft Aspire course, you will discover how to perform advanced grouping, aggregations, and filtering operations on DataFrames. Working with masks and indexes, cleaning duplicated data, and assigning columns as categorical to perform operations are also covered.
Python has become the preferred programming language for data science. In this Skillsoft Aspire course, you will discover how to perform data transformations, data cleaning, and statistical aggregations using Pandas DataFrames.
For effective data wrangling, you need an architecture that will enable you to meet your goals. In this Skillsoft Aspire course, you will explore various types of data architecture and the implementation of strategies that use NoSQL, the CAP theorem, and partitioning to improve performance.
A key component to wrangling data is the data lake framework. In this Skillsoft Aspire course, you will discover how to design and implement data lakes in the cloud and on-premises using standard reference architectures and patterns that can help identify the proper data architecture.
Apache Hadoop is a collection of open-source software utilities that facilitates solving data science problems. In this Skillsoft Aspire course, you will explore how MapReduce can be used to extract the five most expensive vehicles in a dataset and to build an inverted index for the words appearing in a set of text files.
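As an illustration of the inverted-index idea mentioned above, here is a minimal sketch that simulates the map, shuffle, and reduce steps in plain Python. It does not use the Hadoop API; the function names and the two sample documents are hypothetical and exist only for this example.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit one (word, doc_id) pair for every word occurrence in a document."""
    for word in text.lower().split():
        yield word, doc_id


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, doc_ids):
    """Collapse the document IDs for one word into a sorted, de-duplicated list."""
    return word, sorted(set(doc_ids))


def build_inverted_index(documents):
    """Run the simulated map, shuffle, and reduce steps over all documents."""
    pairs = (
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    )
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())


# Hypothetical sample input: file name -> file contents.
documents = {
    "a.txt": "big data tools",
    "b.txt": "data wrangling tools",
}
index = build_inverted_index(documents)
```

In this sketch, `index["data"]` lists both files, while `index["big"]` lists only `a.txt`. A real Hadoop job would express the same logic as Mapper and Reducer classes and let the framework handle the grouping.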
Apache Hadoop is a collection of open-source software utilities that facilitates solving data science problems. Hadoop enables speedy analysis of large datasets by distributing them on a cluster and processing them in parallel. In this Skillsoft Aspire course, you will explore the use of Combiners to make MapReduce applications more efficient by minimizing data transfers.
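To make the benefit of a Combiner concrete, the following sketch simulates a word count in plain Python and compares how many key-value pairs would be transferred with and without local aggregation. It is not Hadoop code; the function names and the two input splits are assumptions made for illustration.

```python
from collections import Counter


def map_words(text):
    """Emit (word, 1) for every word in one input split."""
    for word in text.lower().split():
        yield word, 1


def combine(pairs):
    """Sum counts locally on one mapper's output before it is shuffled."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def reduce_counts(pairs):
    """Sum all counts per word, as the reducer would after the shuffle."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


# Each string stands in for the input split handled by one mapper (hypothetical data).
splits = ["data data data lake", "lake lake data"]

without_combiner = [pair for split in splits for pair in map_words(split)]
with_combiner = [pair for split in splits for pair in combine(map_words(split))]

pairs_without = len(without_combiner)
pairs_with = len(with_combiner)
totals_without = reduce_counts(without_combiner)
totals_with = reduce_counts(with_combiner)
```

Both paths produce the same totals, but the combined path sends fewer pairs to the reducer because each mapper has already summed its own repeated words. This only works because addition is associative and commutative; an operation without those properties generally cannot be used as a combiner unchanged.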
Apache Hadoop is a collection of open-source software utilities that facilitates solving data science problems. Extracting only the meaningful information from a dataset can be painstaking, especially if it is very large. In this Skillsoft Aspire course, you will examine how Hadoop's MapReduce can be used to speed up this operation.
R is a programming language for statistical computing and graphics, and it is an essential skill for data science. In this Skillsoft Aspire course, you will explore the essential methods for wrangling and cleaning data with R.
MongoDB is a NoSQL database program that uses JSON-like documents with optional schemas, and it has become a popular tool for data wrangling and data science. To carry out data wrangling tasks, you need to gather, filter, modify, and query data. This course will show you how to perform MongoDB actions related to data wrangling using Python with the PyMongo library.…
Trifacta allows you to discover, wrangle, and visualize complex data quickly and has become an essential tool for data science. In this Skillsoft Aspire course, you will discover the essential methods for wrangling data with Trifacta. You will learn how to standardize, format, and filter data and then how to extract and wrangle that data.
MongoDB is a NoSQL database program that uses JSON-like documents with optional schemas, and it has become a popular tool for data wrangling and data science. To carry out data wrangling tasks, you need to gather, filter, modify, and query data. In this Skillsoft Aspire course, you will learn to perform MongoDB actions related to data wrangling through the PyMongo library.
Apache Hive is one of the most popular data warehouses on the market for data science. In this Skillsoft Aspire course, you will explore how Hive query executions can be optimized, including techniques like bucketing datasets. Using window functions to extract meaningful insights from data is also covered.
Apache Hive is one of the most popular data warehouses on the market for data science. Hive allows processing of big data in parallel by means of a simple query interface. In this Skillsoft Aspire course, you will explore the ways query executions can be optimized, including the powerful technique of partitioning datasets.
Apache Hive is one of the most popular data warehouses on the market for data science. Hive simplifies working with large datasets in files by representing them as tables and allowing them to be queried with a simple and intuitive query language. In this Skillsoft Aspire course, you will explore working with complex data types in Hive.
Apache Hive is one of the most popular data warehouses on the market for data science. It simplifies working with large datasets in files by representing them as tables. This allows them to be queried with a simple and intuitive query language. In this Skillsoft Aspire course, you will explore how to create, load, and query Hive…
Apache Hive is one of the most popular data warehouses on the market for data science. It allows the processing of big data in parallel in a cluster using a simple and intuitive query language. In this Skillsoft Aspire course, you will discover the fundamental concepts of Hive.
Apache Hive is one of the most popular data warehouses on the market for data science. Hive allows analysis of big data by means of a simple query interface. In this Skillsoft Aspire course, you will explore the optimizations that allow Hive to handle parallel processing of data, and the ways users can still contribute to improving query performance.
Apache Spark is an open-source cluster-computing framework used for data science, and it has become the de facto big data framework. In this Skillsoft Aspire course, you will learn how to analyze a Spark DataFrame by treating it as though it were a relational database table. Discover how to create a view from a Spark DataFrame and run SQL queries against…
Apache Spark is an open-source cluster-computing framework used for data science, and it has become the de facto big data framework. In this Skillsoft Aspire course, you will explore how to analyze real datasets using the DataFrame API methods. Discover how to optimize operations using shared variables and combine data from multiple DataFrames using joins.