Final Exam: Data Wrangler

  • 1 Video | 35s
  • Includes Assessment
  • Earns a Badge
Final Exam: Data Wrangler will test your knowledge and application of the topics presented throughout the Data Wrangler track of the Skillsoft Aspire Data Analyst to Data Scientist Journey.

WHAT YOU WILL LEARN

  • apply a group by transformation to aggregate with a conditional value
  • apply grouping and aggregation operations on a DataFrame to analyze categories of data in a dataset
  • build and run the application and confirm the output using HDFS from both the command line and the web application
  • change column values by applying functions
  • change date formats to the ISO 8601 standard
  • code up a Combiner for the MapReduce application and configure the Driver to use it for a partial reduction on the Mapper nodes of the cluster
  • compare managed and external tables in Hive and how they relate to the underlying data
  • configure and test PyMongo in a Python program
  • configure the Reducer and the Driver for the inverted index application
  • create and analyze categories of data in a dataset using windows
  • create and configure pandas DataFrame objects
  • create and configure pandas Series objects
  • create and instantiate a directed acyclic graph in Airflow
  • create a Spark DataFrame from the contents of a CSV file and apply some simple transformations on the DataFrame
  • create the driver program for the MapReduce application
  • define and run a join query involving two related tables
  • define a vehicle type that can be used to represent automobiles to be stored in a Java PriorityQueue
  • define the Mapper for a MapReduce application to build an inverted index from a set of text files
  • define what a window is in the context of Spark DataFrames and when they can be used
  • demonstrate how to ingest data using Sqoop
  • describe data ingestion approaches and compare Avro and Parquet file format benefits
  • describe the beneficial features that we can achieve using serverless and lambda architectures
  • describe the data processing strategies provided by MapReduce V2, Hive, Pig, and YARN for processing data with data lakes
  • describe the different primitive and complex data types available in Hive
  • extract subsets of data using filtering
  • flatten multi-dimensional data structures by chaining lateral views
  • handle common errors encountered when reading CSV data
  • identify and troubleshoot missing data
  • identify and work with time-series data
  • identify kinds of masking operations
  • implement a multi-stage aggregation pipeline
  • implement data lakes using AWS
  • implement deep learning using Keras
  • install MongoDB and implement data partitioning using MongoDB
  • list the prominent distributed data models along with their associated implementation benefits
  • list the various frameworks that can be used to process data from data lakes
  • load a few rows of data into a table and query it with simple select statements
  • load multiple sheets from an Excel document
  • perform create, read, update, and delete operations on a MongoDB document
  • perform statistical operations on DataFrames
  • plot pie charts, box plots, and scatter plots using pandas
  • recall the prominent data pattern implementations in microservices
  • recognize the capabilities of Microsoft machine learning tools
  • recognize the machine learning tools provided by AWS for data analysis
  • recognize the read and write optimizations in MongoDB
  • set up and install Apache Airflow
  • split columns based on a pattern
  • test Airflow tasks using the airflow command line utility
  • trim and clean a DataFrame before a view is created as a precursor to running SQL queries on it
  • use a regular expression to extract data into a new column
  • use a Spark accumulator as a counter
  • use createIndex to build an index on a collection
  • use Maven to create a new project for a MapReduce application and plan out the Map and Reduce phases by examining the auto prices dataset
  • use the alter table statement to change the definition of a Hive table
  • use the find operation to select documents from a collection
  • use the mongoexport tool to export data from MongoDB to JSON and CSV
  • use the mongoimport tool to import from JSON and CSV
  • use the UNION and UNION ALL operations on table data and distinguish between the two
  • work with data in the form of key-value pairs (map data structures) in Hive
  • work with scikit-learn to implement machine learning

IN THIS COURSE

  • 1. Data Wrangler | 36s

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of this course, which can be shared on any social network or business platform.

Digital badges are yours to keep, forever.
