Final Exam: Data Analyst

Beginner

1 video | 32s
Includes Assessment
Earns a Badge

(13)

Final Exam: Data Analyst will test your knowledge and application of the topics presented throughout the Data Analyst track of the Skillsoft Aspire Data Analyst to Data Scientist Journey.

WHAT YOU WILL LEARN

list the six phases of the data lifecycle

use the ggplot2 library to visualize data using r

create vectors in r

crawl data stored in a dynamodb table

compare and contrast sql and nosql database solutions

standardize a distribution to express its values as z-scores and use pandas to generate a correlation and covariance matrix for your dataset

specify the configurations of the mapreduce applications in the driver program and the project's pom.xml file

explain the concept of a hierarchical index or multi-index and why it can be useful

use the numpy library to manipulate arrays and the pandas library to load and analyze a dataset

delete a google cloud dataproc cluster and all of its associated resources

configure and view permissions for individual files and directories using the getfacl and chmod commands

recall how apache zookeeper enables the hdfs namenode and yarn resourcemanager to run in high-availability mode

load data into a redshift cluster from s3 buckets

describe the options available when iterating over 1-dimensional and multi-dimensional arrays

run the application and examine the outputs generated to get the word frequencies in the input text document

use the get and getmerge functions to retrieve one or multiple files from hdfs

export the contents of a dataframe into files of various formats

compare samples using the independent t-test, and compare related samples using a paired t-test, with the scipy library

create data frames in r

create and configure simple graphs with lines and markers using the matplotlib data visualization library

configure hdfs using the hdfs-site.xml file and identify the properties which can be set from it

work with the yarn cluster manager and hdfs namenode web applications that come packaged with hadoop

install pandas and create a pandas series

recognize and deal with missing data in r

use numpy to compute the correlation and covariance of two distributions and visualize their relationship with scatterplots

recognize the challenges involved in processing big data and the options available to address them such as vertical and horizontal scaling

create and configure a hadoop cluster on the google cloud platform using its cloud dataproc service

define the inter-quartile range of a dataset and enumerate its properties

draw the shape of a gaussian distribution and enumerate its defining properties

execute the application and verify that the filtering has worked correctly; examine the job and the output files using the yarn cluster manager and hdfs namenode web uis

set up a jdbc connection on glue to the redshift cluster

import and export data in r

deploy dynamodb in the amazon web services cloud

use the mutate method

create matrices in r

identify different tools available for data management

describe the etl process and different tools available

initialize a spark dataframe from the contents of an rdd

configure a jdbc connection on glue to the redshift cluster

describe nosql stores and how they are used

create and load data into an rdd

read data from an excel spreadsheet

run etl scripts using glue

use fancy indexing with arrays using an index mask

write a simple bash script

identify the various gcp services used by dataproc when provisioning a cluster

read data from files and write data to files using the python pandas library

define linear regression

edit individual cells and entire rows and columns in a pandas dataframe

define the mean of a dataset and enumerate its properties

retrieve specific parts of an array using row and column indices

recall the steps involved in building a mapreduce application and the specific workings of the map phase in processing each row of data in the input file

use numpy to compute statistics such as the mean and median on your data

describe the concept of a hierarchical index or multi-index and why it can be useful

define the contents of a dataframe using the sqlcontext

describe and apply the different techniques involved in handling datasets where some information is missing

use the dplyr library to load data frames

build and run the application and confirm the output using hdfs from both the command line and the web application

transfer files from your local file system to hdfs using the copyfromlocal command
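
Several of the objectives above concern standardizing a distribution into z-scores and computing covariance and correlation. The following is a minimal sketch of those calculations, not material taken from the exam itself. It assumes only the Python standard library rather than the pandas or numpy calls the track covers, uses population (divide by n) formulas where pandas and numpy default to the sample (divide by n - 1) covariance, and runs on made-up data; the function and variable names are hypothetical.

```python
import statistics


def z_scores(values):
    """Standardize values: subtract the mean, divide by the standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumption)
    return [(v - mean) / std for v in values]


def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / (statistics.pstdev(xs) * statistics.pstdev(ys))


# Made-up illustrative data, not from the course.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 64.0, 68.0]

standardized = z_scores(hours)   # has mean 0 and standard deviation 1
r = correlation(hours, scores)   # close to 1 for this nearly linear data
```

After standardization the values have mean 0 and standard deviation 1, which is what makes z-scores comparable across datasets measured in different units.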

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.

Digital badges are yours to keep, forever.
