Final Exam: Data Wrangler
- 1 Video | 30m 32s
- Includes Assessment
- Earns a Badge
Final Exam: Data Wrangler will test your knowledge and application of the topics presented throughout the Data Wrangler track of the Skillsoft Aspire Data Analyst to Data Scientist Journey.
WHAT YOU WILL LEARN
- apply a group by transformation to aggregate with a conditional value
- apply grouping and aggregation operations on a DataFrame to analyze categories of data in a dataset
- build and run the application and confirm the output using HDFS from both the command line and the web application
- change column values by applying functions
- change date formats to the ISO 8601 standard
- code up a Combiner for the MapReduce application and configure the Driver to use it for a partial reduction on the Mapper nodes of the cluster
- compare managed and external tables in Hive and how they relate to the underlying data
- configure and test PyMongo in a Python program
- configure the Reducer and the Driver for the inverted index application
- create and analyze categories of data in a dataset using windows
- create and configure pandas DataFrame objects
- create and configure pandas Series objects
- create and instantiate a directed acyclic graph in Airflow
- create a Spark DataFrame from the contents of a CSV file and apply some simple transformations on the DataFrame
- create the driver program for the MapReduce application
- define and run a join query involving two related tables
- define a vehicle type that can be used to represent automobiles to be stored in a Java PriorityQueue
- define the Mapper for a MapReduce application to build an inverted index from a set of text files
- define what a window is in the context of Spark DataFrames and when windows can be used
- demonstrate how to ingest data using Sqoop
- describe data ingestion approaches and compare Avro and Parquet file format benefits
- describe the beneficial features that we can achieve using serverless and lambda architectures
- describe the data processing strategies provided by MapReduce V2, Hive, Pig, and YARN for processing data with data lakes
- describe the different primitive and complex data types available in Hive
- extract subsets of data using filtering
- flatten multi-dimensional data structures by chaining lateral views
- handle common errors encountered when reading CSV data
- identify and troubleshoot missing data
- identify and work with time-series data
- identify kinds of masking operations
- implement a multi-stage aggregation pipeline
- implement data lakes using AWS
- implement deep learning using Keras
- install MongoDB and implement data partitioning using MongoDB
- list the prominent distributed data models along with their associated implementation benefits
- list the various frameworks that can be used to process data from data lakes
- load a few rows of data into a table and query it with simple select statements
- load multiple sheets from an Excel document
- perform create, read, update, and delete operations on a MongoDB document
- perform statistical operations on DataFrames
- plot pie charts, box plots, and scatter plots using pandas
- recall the prominent data pattern implementation in microservices
- recognize the capabilities of Microsoft machine learning tools
- recognize the machine learning tools provided by AWS for data analysis
- recognize the read and write optimizations in MongoDB
- set up and install Apache Airflow
- split columns based on a pattern
- test Airflow tasks using the airflow command line utility
- trim and clean a DataFrame before a view is created as a precursor to running SQL queries on it
- use a regular expression to extract data into a new column
- use a Spark accumulator as a counter
- use createIndex to build an index on a collection
- use Maven to create a new project for a MapReduce application and plan out the Map and Reduce phases by examining the auto prices dataset
- use the alter table statement to change the definition of a Hive table
- use the find operation to select documents from a collection
- use the mongoexport tool to export data from MongoDB to JSON and CSV
- use the mongoimport tool to import from JSON and CSV
- use the UNION and UNION ALL operations on table data and distinguish between the two
- work with data in the form of key-value pairs - map data structures in Hive
- work with scikit-learn to implement machine learning
IN THIS COURSE
1. Data Wrangler (33s)
EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE
Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of this course, which can be shared on any social network or business platform. Digital badges are yours to keep, forever.