Final Exam: Data Wrangler

Intermediate

1 video | 32s
Includes Assessment
Earns a Badge

Final Exam: Data Wrangler will test your knowledge and application of the topics presented throughout the Data Wrangler track of the Skillsoft Aspire Data Analyst to Data Scientist Journey.

WHAT YOU WILL LEARN

change column values by applying functions

recognize the capabilities of microsoft machine learning tools

implement deep learning using keras

perform statistical operations on dataframes

create and configure a pandas series object

load multiple sheets from an excel document

change date formats to the iso 8601 standard

identify and troubleshoot missing data

recognize the machine learning tools provided by aws for data analysis

identify and work with time-series data

plot pie charts, box plots, and scatter plots using pandas

create and configure pandas dataframe objects

use a regular expression to extract data into a new column

extract subsets of data using filtering

handle common errors encountered when reading csv data

work with scikit-learn to implement machine learning

identify kinds of masking operations

build and run the application and confirm the output using hdfs from both the command line and the web application

describe the different primitive and complex data types available in hive

apply grouping and aggregation operations on a dataframe to analyze categories of data in a dataset

perform create, read, update, and delete operations on a mongodb document

work with data in the form of key-value pairs - map data structures in hive

use a spark accumulator as a counter

list the various frameworks that can be used to process data from data lakes

install mongodb and implement data partitioning using mongodb

recognize the read and write optimizations in mongodb

use createindex to build an index on a collection

create and analyze categories of data in a dataset using windows

split columns based on a pattern

create and instantiate a directed acyclic graph in airflow

load a few rows of data into a table and query it with simple select statements

use the find operation to select documents from a collection

define the mapper for a mapreduce application to build an inverted index from a set of text files

describe the data processing strategies provided by mapreduce v2, hive, pig, and yarn for processing data with data lakes

configure the reducer and the driver for the inverted index application

describe the beneficial features that we can achieve using serverless and lambda architectures

create the driver program for the mapreduce application

configure and test pymongo in a python program

describe data ingestion approaches and compare avro and parquet file format benefits

test airflow tasks using the airflow command line utility

define and run a join query involving two related tables

use the alter table statement to change the definition of a hive table

use the union and union all operations on table data and distinguish between the two

apply a group by transformation to aggregate with a conditional value

use the mongoexport tool to export data from mongodb to json and csv

define what a window is in the context of spark dataframes and when it can be used

set up and install apache airflow

define a vehicle type that can be used to represent automobiles to be stored in a java priorityqueue

implement data lakes using aws

create a spark dataframe from the contents of a csv file and apply some simple transformations on the dataframe

list the prominent distributed data models along with their associated implementation benefits

use maven to create a new project for a mapreduce application and plan out the map and reduce phases by examining the auto prices dataset

code up a combiner for the mapreduce application and configure the driver to use it for a partial reduction on the mapper nodes of the cluster

recall the prominent data pattern implementation in microservices

demonstrate how to ingest data using sqoop

implement a multi-stage aggregation pipeline

flatten multi-dimensional data structures by chaining lateral views

trim and clean a dataframe before a view is created as a precursor to running sql queries on it

use the mongoimport tool to import from json and csv

compare managed and external tables in hive and how they relate to the underlying data
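
As an illustration of two of the objectives listed above (changing date formats to the ISO 8601 standard and using a regular expression to extract data into a new column), here is a minimal sketch that uses only the Python standard library. The sample records, the field names, and the assumed MM/DD/YYYY input format are hypothetical and are not taken from the exam; the course material may well use pandas for these tasks instead.

```python
import re
from datetime import datetime

# Hypothetical sample records; field names and values are illustrative only.
rows = [
    {"order": "A-1001 widget", "date": "03/15/2021"},
    {"order": "B-2002 gadget", "date": "12/01/2020"},
]


def to_iso8601(text, fmt="%m/%d/%Y"):
    """Parse a date string in the assumed input format and return YYYY-MM-DD."""
    return datetime.strptime(text, fmt).date().isoformat()


def extract_code(text):
    """Return a leading code such as 'A-1001', or None if there is no match."""
    match = re.match(r"([A-Z]-\d+)", text)
    return match.group(1) if match else None


# Build new records with the date normalized and the code in its own field.
cleaned = [
    {**row, "date": to_iso8601(row["date"]), "code": extract_code(row["order"])}
    for row in rows
]
```

Each entry in cleaned keeps the original order text, carries the date in ISO 8601 form, and gains a code field holding the extracted value.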

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.

Digital badges are yours to keep, forever.
