Data Science Core Concepts: Amazon Web Services 2019 intermediate

Rating: 5.0 (1 user)
 
Explore data science, a multi-disciplinary field used to analyze large amounts of data to detect relationships and uncover meaning.

COURSES INCLUDED

Data Gathering
In data science, gathering data is a critical first step in the pipeline: extracting, parsing, and scraping data from various internal and external sources. Explore examples of practical tools for data gathering.
14 videos | 1h 7m | Assessment available | Badge
Data Filtering
Once data is gathered for data science, it is often in an unstructured or raw format and must be filtered for content and validity. Explore examples of practical tools and techniques for data filtering.
11 videos | 56m | Assessment available | Badge
Data Transformation
After filtering data, the next step is to transform it into a usable format. Explore examples of practical tools and techniques for data transformation.
11 videos | 42m | Assessment available | Badge
Data Integration
Data integration is the last step in the data wrangling process, where data is put into a usable, structured format for analysis. Explore examples of practical tools and techniques for data integration.
10 videos | 37m | Assessment available | Badge
Estimates & Measures
To effectively use the software and programming tools available for data scientists, you must understand underlying concepts. Discover how to use estimates and measures in data analysis.
7 videos | 32m | Assessment available | Badge
Clustering, Errors, & Validation
Machine learning is a particular area of data science that uses techniques to create models from data without being explicitly programmed. Examine clustering, errors, and validation in machine learning.
10 videos | 33m | Assessment available | Badge
Data Communication & Visualization
The final step in the data science pipeline is to communicate the results or findings. Explore communication and visualization concepts needed by data scientists.
15 videos | 1h 12m | Assessment available | Badge
Streaming Data Architectures: An Introduction to Streaming Data in Spark
Learn the fundamentals of streaming data with Apache Spark. During this course, you will discover the differences between batch and streaming data. Observe the types of streaming data sources. Learn how to process streaming data, transform the stream, and materialize the results. Decouple a streaming application from the data sources with a message transport. Next, learn about techniques used in Spark 1.x to work with streaming data and how they contrast with processing batch data; how structured streaming in Spark 2.x eases the task of stream processing for the app developer; and how stream processing works in both Spark 1.x and 2.x. Finally, learn how triggers can be set up to periodically process streaming data, and the key aspects of working with structured streaming in Spark.
9 videos | 50m | Assessment available | Badge
Streaming Data Architectures: Processing Streaming Data with Spark
Process streaming data with Spark, an analytics engine that can run on Hadoop clusters. In this course, you will discover how to develop applications in Spark to work with streaming data and generate output. Topics include the following: configure a streaming data source; use Netcat and write applications to process the data stream; learn the effects of using the Update mode on your stream processing application's output; write a monitoring application that listens for new files added to a directory; compare the Append output mode with the Update mode; develop applications to limit the files processed in each trigger; use Spark's Complete mode for output; perform aggregation operations on streaming data with the DataFrame API; and process streaming data with Spark SQL queries.
11 videos | 52m | Assessment available | Badge
Data Science Tools
Explore a variety of new data science tools available today, the different uses for these tools, and the benefits and challenges in deploying them in this 12-video course. First, examine a data science platform, the nucleus of technologies used to perform data science tasks. You will then explore the analysis process to inspect, clean, transform, and model data. Next, the course surveys integrating and exploring data, coding and building models using that data, deploying the models to production, and delivering results through applications or by generating reports. You will see how a great data science platform should be flexible and scalable, and how it should combine multiple features and capabilities that effectively centralize data science efforts. You will learn the six sequential steps of a typical data science workflow, from defining the objective for the project to reporting the results. Finally, explore DevOps: the people, processes, and infrastructure that allow developers and IT to work together in harmony, and its typical functions, including integration, testing, packaging, and deployment.
13 videos | 47m | Assessment available | Badge
The Four Vs of Data
The four Vs (volume, variety, velocity, and veracity) of big data and data science are a popular paradigm used to extract meaning and value from massive data sets. In this course, learners discover the four Vs, their purpose and uses, and how to extract value by using the four Vs. Key concepts covered here include the four Vs, their roles in big data analytics, and the overall principle of the four Vs; and ways in which the four Vs relate to each other. Next, study variety and data structure and how they relate to the four Vs; validity and volatility and how they relate to the four Vs; and how the four Vs should be balanced in order to implement a successful big data strategy. Learners are shown the various use cases of big data analytics and the four Vs of big data, and how the four Vs can be leveraged to extract value from big data. Finally, review the four Vs of big data analytics, their differences, and how balance can be achieved.
13 videos | 39m | Assessment available | Badge

COURSES INCLUDED

Automation Design & Robotics
In this 12-video course, you will examine the different uses of data science tools and the overall platform, as well as the benefits and challenges of machine learning deployment. The first tutorial explores what automation is and how it is implemented. This is followed by a look at the tasks and processes best suited for automation. This leads learners into exploring automation design, including what Display Status is, and also the Human-Computer Collaboration automation design principle. Next, you will examine the Human Intervention automation design principle; automated testing in software design and development; and also the role of task runners in software design and development. Task runners are used to automate repeatable tasks in the build process. Delve into DevOps and automated deployment in software design, development, and deployment. Finally, you will examine process automation using robotics, and in the last tutorial in the course, recognize how modern robotics and AI designs are applied. The concluding exercise involves recognizing automation and robotics design application.
13 videos | 34m | Assessment available | Badge

COURSES INCLUDED

Applied Data Analysis
In this 14-video course, learners discover how to perform data analysis by using Anaconda, Python, R, and related analytical libraries and tools. Begin by learning how to install and configure Python with Anaconda, and how R is installed by using Anaconda. Jupyter Notebook will be launched to explore data. Next, learn about the import and export of data in Python, how to read data from and write data to files with the Python Pandas library, and how to import and export data in R. Learn to recognize and handle missing data in R and to use the dplyr package to transform data in R. Then learners examine the Python data analysis libraries NumPy and Pandas. Next, perform exploratory data analysis in R by using mean, median, and mode. Discover how to use the Python data analysis library Pandas to analyze data and how to use the ggplot2 library to visualize data with R. Learn about the built-in Pandas data visualization tools for visualizing data in Python. The closing exercise deals with performing data analysis with R and Python.
15 videos | 1h 24m | Assessment available | Badge
Math for Data Science & Machine Learning
Explore the machine learning application of key mathematical topics related to linear algebra with the Python programming language in this 13-video course. The programming demonstrated in this course requires access to Jupyter with a Python 3 kernel. First, you will learn to work with vectors, ordered lists of numbers, in Python, and then examine how to use Python's NumPy library when working with linear algebra. Next, you will enlist the NumPy library and the array object to create a vector. Learners will continue by learning how to use the NumPy library to create a matrix, a multidimensional array, or a list of vectors. Then examine matrix multiplication and division, and linear transformations. You will learn how to use Gaussian elimination, determinants, and orthogonal matrices to solve a system of linear equations. This course examines the concepts of eigenvalues, eigenvectors, and eigendecomposition, a factorization of a matrix into a canonical form. Finally, you will learn how to work with the pseudo inverse in Python.
14 videos | 1h 1m | Assessment available | Badge
Raw Data to Insights: Data Management & Decision Making
To master data science, it is important to turn raw data into insights. In this 12-video course, you will learn to apply and implement various essential data correction techniques, transformation rules, deductive correction techniques, and predictive modeling using critical data analytical approaches by using R. The key concepts in this course include: the capabilities and advantages of the application of data-driven decision making; loading data from databases using R; preparing data for analysis; and the concept of data correction, using the essential approaches of simple transformation rules and deductive correction. Next, examine implementing data correction using simple transformation rules and deductive correction; the various essential distributed data management frameworks used to handle big data; and the approach of implementing data analytics using machine learning. Finally, learn how to implement exploratory data analysis by using R; to implement predictive modeling by using machine learning; how to correct data with deductive correction; and how to analyze data in R and facilitate predictive modeling with machine learning.
12 videos | 56m | Assessment available | Badge
Data Driven Organizations
Examine data-driven organizations, how they use data science, and the importance of prioritizing data in this 13-video course. Data-driven organizations are committed to gathering and utilizing data necessary for a business holistically to gain competitive advantage. You will explore how to create a culture within an organization by involving management and training employees. You will examine analytic maturity as a metric to measure an organization's progress. Next, learn how to analyze data quality; how it is measured in a relative manner, not an absolute manner; and how it should be measured, weighed and appropriately applied to determine the value or quality of a data set. You will learn the potential business effects of missing data and the three main reasons why data are not included in a collection: missing at random, missing due to data collection, and missing not at random. This course explores the wide range of impacts when there is duplicate data. You will examine how truncated or censored data have inconsistent results. Finally, you will explore data provenance and record-keeping.
13 videos | 1h 14m | Assessment available | Badge
Data Sources: Integration from the Edge
In this 11-video course, you will examine the architecture of IoT (Internet of Things) solutions and the essential approaches of integrating data sources. Begin by examining the required elements for deploying IoT solutions and its prominent service categories. Take a look at the capabilities provided and the maturity models of IoT solutions. Explore the critical design principles that need to be implemented when building IoT solutions and the cloud architectures of IoT from the perspective of Microsoft Azure, Amazon Web Services, and GCP (Google Cloud Platform). Compare the features and capabilities provided by the MQTT (Message Queuing Telemetry Transport) and XMPP (Extensible Messaging and Presence Protocol) protocols for IoT solutions. Identify key features and applications that can be implemented by using IoT controllers; learn to recognize the concept of IoT data management and the applied lifecycle of IoT data. Examine the list of essential security techniques that can be implemented to secure IoT solutions. The concluding exercise focuses on generating data streams.
11 videos | 39m | Assessment available | Badge
Data Sources: Implementing Edge Data on the Cloud
To become proficient in data science, users have to understand edge computing. This is where data is processed near the source or at the edge of the network while in a typical cloud environment, data processing happens in a centralized data storage location. In this 7-video course, learners will explore the implementation of IoT (Internet of Things) on prominent cloud platforms like AWS (Amazon Web Services) and GCP (Google Cloud Platform). Discover how to work with IoT Device Simulator and generate data streams with MQTT (Message Queuing Telemetry Transport). You will next examine the approaches and steps involved in setting up AWS IoT Greengrass, and the essential components of GCP IoT Edge. Then learn how to connect a web application to AWS IoT by using MQTT over WebSockets. The next tutorial demonstrates the essential approach of using IoT Device Simulator, then on to generating streams of data by using the MQTT messaging protocol. The concluding exercise involves creating a device type, a user, and a device by using IoT Device Simulator.
7 videos | 30m | Assessment available | Badge
Data Architecture Deep Dive - Design & Implementation
This 11-video Skillsoft Aspire course explores the numerous types of data architecture that can be used when working with big data; how to implement strategies by using NoSQL (not only structured query language); CAP theorem (consistency, availability, and partition tolerance); and partitioning to improve performance. Learners examine the core activities essential for data architectures: data security, privacy, integrity, quality, regulatory compliance, and governance. You will learn different methods of partitioning, and the criteria for implementing data partitioning. Next, you will install and explore MongoDB, a cross-platform document-oriented database system, and learn about read and write optimizations in MongoDB. You will learn to identify various important components of hybrid data architecture and to adapt it to your data needs. You will learn how to implement DAG (Directed Acyclic Graph) by using the Elasticsearch search engine. You will evaluate your needs to determine whether to implement batch processing or stream processing. This course also covers process implementation by using serverless and Lambda architecture. Finally, you will examine types of data risk when implementing data modeling and design.
12 videos | 35m | Assessment available | Badge
Data Architecture Deep Dive - Microservices & Serverless Computing
Explore numerous types of data architecture that are effective data wrangling tools when working with big data in this 9-video Skillsoft Aspire course. Learn the strategies, design, and constraints involved in implementing data architecture. You will learn the concepts of data partitioning, CAP theorem (consistency, availability, and partition tolerance), and process implementation using serverless and Lambda data architecture. This course examines Saga, newly introduced in data management pattern catalog of microservices; API (application programming interface) composition; CQRS (Command Query Responsibility Segregation); event sourcing; and application event. This course explores the differences in traditional data architecture and serverless architecture which allows you to use client-side logic and third-party services. You will learn how to use AWS (Amazon Web Services) Lambda to implement a serverless architecture. This course then explores batch processing architecture, which processes data files by using long running batch jobs to filter actual content, real-time architecture, and machine learning at scale architecture built to serve machine learning algorithms. Finally, you will explore how to build a successful data POC (proof of concept).
10 videos | 25m | Assessment available | Badge
Harnessing Data Volume & Velocity: Turning Big Data into Smart Data
In this course, you will explore the concept of smart data and its associated lifecycle and benefits and the frameworks and algorithms that can help transition big data to smart data. Begin by comparing big data and smart data from the perspective of volume, variety, velocity, and veracity. Look at smart data capabilities for machine learning and artificial intelligence. Examine how to turn big data into smart data and how to use data volumes; list applications of smart data and smart process, and recall use cases for smart data application. Then explore the lifecycle of smart data and the associated impacts and benefits. Learn steps involved in transforming big data into smart data by using k-NN (K Nearest Neighbor algorithm), and look at various smart data solution implementation frameworks. Recall how to turn smart data into business by using data sharing and algorithms and how to implement clustering on smart data. Finally, learn about integrating smart data and its impact on optimization of data strategy. The exercise concerns transforming big data into smart data.
13 videos | 38m | Assessment available | Badge
Data Mining and Decision Making: Modern Data Science Lifecycle
Data mining and data science are rapidly transforming decision-making business practices. For these activities to be worthwhile, raw data needs to be transformed into insights relevant to your business's goals. In this course, you'll walk through each stage of the data mining pipeline covering all requirements for reaching a conclusive and relevant business decision. You'll examine data preparation, descriptive and predictive analytics, and predictive modeling. You'll also investigate the role of model validation and implementation in machine learning. On completion, you'll have a solid grasp of how data-driven decision-making has helped other businesses succeed and how it can help yours, too, if you employ the right methods.
12 videos | 53m | Assessment available | Badge
Data Mining and Decision Making: Data Preparation & Predictive Analytics
Data preparation transforms raw data into datasets with stable structures suitable for predictive analytics. This course shows you how to produce clean datasets with valid data to ensure accurate insights for sound business decision-making. Examine the role data sources, systems, and storage play in descriptive analytics. Explore best practices used for data preparation, including data collection, validation, and cleaning. Additionally, investigate some more advanced data exploration and visualization techniques, including the use of different chart types, summary statistics, and feature engineering. Upon completing this course, you'll know how to gather, store, and analyze data to make reliable predictions and smart business decisions.
12 videos | 41m | Assessment available | Badge
Data Mining and Decision Making: Data Mining for Answering Business Questions
The data mining process provides the opportunity for businesses to collect additional information and insights that are unavailable through other everyday operations of the company. Use this course to learn more about how utilizing data mining effectively may provide a competitive advantage and additional knowledge about the market and competitors. Start by examining the essential concepts in data exploration using summary statistics and visuals and discover different data mining techniques. This course will also help you develop an understanding of the complete data mining process - data gathering, cleaning, exploration, and mining. After completing this course, you'll be able to use data mining to answer in-depth questions about any business.
12 videos | 58m | Assessment available | Badge

COURSES INCLUDED

DevOps for Data Scientists: Data DevOps Concepts
To carry out DevOps for data science, you need to extend the ideas of DevOps to be compatible with the processes of data science and machine learning (ML). In this 12-video course, learners explore the concepts behind integrating data and DevOps. Begin by looking at applications of DevOps for data science and ML. Then examine topological considerations for data science and DevOps. This leads into applying the high-level organizational and cultural strategies for data science with DevOps, and taking a look at day-to-day tasks of DevOps for data science. Examine the technological risks and uncertainties when implementing DevOps for data science and scaling approaches to data science in terms of DevOps computing elements. Learn how DevOps can improve communication for data science workflows and how it can also help overcome ad hoc approaches to data science. The considerations for ETL (Extract, Transform, and Load) pipeline workload improvements through DevOps and the microservice approach to ML are also covered. The exercise involves creating a diagram of data science infrastructure.
12 videos | 44m | Assessment available | Badge
DevOps for Data Scientists: Data Science DevOps
In this 16-video course, learners discover the steps involved in applying DevOps to data science, including integration, packaging, deployment, monitoring, and logging. You will begin by learning how to install a Cookiecutter project for data science, then look at its structure, and discover how to modify a Cookiecutter project to train and test a model. Examine the steps in the data model lifecycle and the benefits of version control for data science. Explore the tools and approaches to continuous integration for data models, to data and model security for Data DevOps, and the approaches to automated model testing for Data DevOps. Learn about the Data DevOps considerations for data science tools and IDEs (integrated development environments) and the approaches to monitoring data models and logging for data models. You will examine ways to measure model performance in production and look at data integration with Cookiecutter. Then learn how to implement a data integration task with both Jenkins and Travis CI (continuous integration). The concluding exercise involves implementing a Cookiecutter project.
16 videos | 1h 12m | Assessment available | Badge
DevOps for Data Scientists: Deploying Data DevOps
In this course, learners will explore deploying data models into production through serialization, packaging, deployment, and rollback. You will begin by watching how to serialize models using Python and Pandas. Then the 8-video course takes a look at the tools and approaches to model packaging and deployment. Next, you will explore the concept of the blue-green deployment strategy for data DevOps, which is a strategy for upgrading running software. This leads into examining the concepts behind the canary deployment strategy in terms of data DevOps. Canary deployments can be regarded as a phased or test rollout of updates and new features. Then take a look at versioning and approaches to rolling back models for machine learning with DevOps. Finally, you will learn about some of the considerations for deploying models to web APIs (application programming interfaces). The concluding exercise involves creating a model by using Python and Pandas, then serializing the results of the model to a file.
8 videos | 33m | Assessment available | Badge
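As a rough illustration of the serialization step described above, the sketch below writes a fitted "model" to a file and reads it back with Python's standard pickle module. The model here is a hypothetical stand-in (a dictionary holding a mean and a version tag), and the function names and file name are assumptions made for this example, not material from the course.

```python
import pickle
import tempfile
from pathlib import Path


def fit_mean_model(values):
    """Hypothetical 'model': the mean of the training values plus a version tag."""
    return {"version": 1, "mean": sum(values) / len(values)}


def save_model(model, path):
    """Serialize the model object to a binary file."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Deserialize a model from a file. Only load files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


model = fit_mean_model([2.0, 4.0, 6.0])
# Hypothetical file name; the version tag in the name makes rollback easier.
model_path = Path(tempfile.gettempdir()) / "mean_model_v1.pkl"
save_model(model, model_path)
restored = load_model(model_path)
```

Keeping a version number both inside the serialized object and in the file name is one simple way to support the rollback approaches the course mentions.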
DevOps for Data Scientists: Containers for Data Science
In this 16-video course, explore the use of containers in deploying data science solutions by using Docker with R, Python, Jupyter, and Anaconda. Begin with an introduction to containers and their use for deployment and data science. Then examine approaches to infrastructure as code for data deployment, and concepts behind Ansible and Vagrant approaches to data science deployment. Explore the main features of provisioning tools used in data science. You will learn how to use Docker to build data models, then use it to perform model testing for deployment, to manage R deployments, and for a PostgreSQL deployment. Also, discover how to use Docker for persistent volumes. Next, learners look at using Jupyter Docker Stacks to get up and running with Jupyter and using the Anaconda Distribution to run a Jupyter Notebook. This leads into using Jupyter Notebooks with a Cookiecutter data science project. Then learn about using Docker Compose with PostgreSQL and Jupyter Notebook, and using a container deployment for Jupyter Notebooks with R. The concluding exercise involves deploying Jupyter.
16 videos | 58m | Assessment available | Badge

COURSES INCLUDED

Data Research Techniques
To master data science, you must learn the techniques surrounding data research. In this 10-video course, learners will discover how to apply essential data research techniques, including JMP measurement, and how to valuate data by using descriptive and inferential methods. Begin by recalling the fundamental concept of data research that can be applied on data inference. Then learners look at steps that can be implemented to draw data hypothesis conclusions. Examine values, variables, and observations that are associated with data from the perspective of quantitative and classification variables. Next, view the different scales of standard measurements with a critical comparison between generic and JMP models. Then learn about the key features of nonexperimental and experimental research approaches when using real-time scenarios. Compare differences between descriptive and inferential statistical analysis and explore the prominent usage of different types of inferential tests. Finally, look at the approaches and steps involved in the implementation of clinical data research and sales data research using real-time scenarios. The concluding exercise involves implementing data research.
11 videos | 32m | Assessment available | Badge
Data Research Exploration Techniques
This course explores EDA (exploratory data analysis) and data research techniques necessary to communicate with data management professionals involved in application, implementation, and facilitation of the data research mechanism. You will examine EDA as an important way to analyze extracted data by applying various visual and quantitative methods. In this 10-video course, learners acquire data exploration techniques for deriving different data dimensions and extracting value from the data. You will learn proper methodologies and principles for various data exploration techniques, analysis, decision-making, and visualizations to gain valuable insights from the data. This course covers how to practically implement data exploration by using R random number generator, Python, linear algebra, and plots. You will use EDA to build learning sets that can be utilized by various machine learning algorithms or even statistical modeling. You will learn to apply univariate visualization, and to use multivariate visualizations to identify the relationships among the variables. Finally, the course explores dimensionality reduction, applying different dimension reduction algorithms to reduce the data to a form that is useful for analytics.
11 videos | 49m | Assessment available | Badge
Data Research Statistical Approaches
This 12-video course explores implementation of statistical data research algorithms using R to generate random numbers from standard distributions, and visualizations using R to graphically represent the outcome of data research. You will learn to apply statistical concepts like PDF (probability density function), CDF (cumulative distribution function), binomial distribution, and interval estimation for data research. Learners become able to identify the relevance of discrete versus continuous distributions in simplifying data research. This course then demonstrates how to plot visualizations by using R to graphically predict the outcomes of data research. Next, learn to use interval estimation to derive an estimate for an unknown population parameter, and learn to implement point and interval estimation by using R. Learn data integration techniques to aggregate data from different administrative sources. Finally, you will learn to use Python libraries to create histograms, scatter plots, and box plots, and use Python to handle missing values and outliers. The concluding exercise involves loading data in R, generating a scatter chart, and deleting points outside the limits of the x vector and y vector.
13 videos | 42m | Assessment available | Badge
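To make point and interval estimation concrete, here is a minimal sketch in Python (the course itself uses R for this topic). It assumes a normal approximation with z = 1.96 for an approximate 95% interval; the function name and sample values are made up for illustration, and for small samples a t-based interval would usually be more appropriate.

```python
import math
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Return the point estimate of the mean and an approximate interval.

    Assumes a normal approximation; z=1.96 corresponds to roughly 95%.
    """
    n = len(sample)
    point_estimate = statistics.mean(sample)
    # Standard error of the mean uses the sample standard deviation (n - 1).
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    margin = z * standard_error
    return point_estimate, (point_estimate - margin, point_estimate + margin)


# Hypothetical sample values, chosen only for illustration.
sample = [4.0, 5.0, 6.0, 5.0, 5.0]
estimate, (low, high) = mean_confidence_interval(sample)
```

The point estimate is the single best guess for the unknown population mean, while the interval expresses the uncertainty around that guess.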

COURSES INCLUDED

Data Lake Framework & Design Implementation
A key component to wrangling data is the data lake framework. In this 9-video Skillsoft Aspire course, discover how to design and implement data lakes in the cloud and on-premises by using standard reference architectures and patterns to help identify the proper data architecture. Learners begin by looking at architectural differences between data lakes and data warehouses, then identifying the features that data lakes provide as part of the enterprise architecture. Learn how to use data lakes to democratize data and look at design principles for data lakes, identifying the design considerations. Explore the architecture of Amazon Web Services (AWS) data lakes and their essential components, then look at implementing data lakes using AWS. You will examine the prominent architectural styles used when implementing data lakes on-premises and on multiple cloud platforms. Next, learners will see the various frameworks that can be used to process data from data lakes. Finally, the concluding exercise compares data lakes and the data warehouse, showing how to specify data lake design patterns, and implement data lakes by using AWS.
10 videos | 33m has Assessment available Badge
Data Lake Architectures & Data Management Principles
A key component to wrangling data is the data lake framework. In this 9-video Skillsoft Aspire course, learners discover how to implement data lakes for real-time management. Explore data ingestion, data processing, and data lifecycle management with Amazon Web Services (AWS) and other open-source ecosystem products. Begin by examining real-time big data architectures, and how to implement Lambda and Kappa architectures to manage real-time big data. View the benefits of adopting the Zaloni data lake reference architecture. Examine the essential approach to data ingestion and the comparative benefits provided by the Avro and Parquet file formats. Explore data ingestion with Sqoop, and the various data processing strategies provided by MapReduce V2, Hive, Pig, and YARN for processing data with data lakes. Learn how to derive value from data lakes and describe the benefits of critical roles. Learners will explore the steps involved in the data lifecycle and the significance of archival policies. Finally, learn how to implement an archival policy to transition between S3 and Glacier, depending on adopted policies. Close the course with an exercise on ingesting data and archival policy.
10 videos | 34m has Assessment available Badge

COURSES INCLUDED

Data Pipeline: Process Implementation Using Tableau & AWS
In this 11-video course, explore the concept of data pipelines, the processes and stages involved in building them, and technologies such as Tableau and Amazon Web Services (AWS) that can be used to implement them. Learners begin with an initial look at the data pipeline and its features, and then the steps involved in building one. You will go on to learn about the processes involved in building data pipelines, the different stages of a pipeline, and the various essential technologies that can be used to implement one. Next, learners explore the various types of data sources that are involved in the data pipeline transformation phases. Then you will learn to define scheduled data pipelines and list all the associated components, tasks, and attempts. You will learn how to install Tableau Server and command line utilities and then build data pipelines using the Tableau command line utilities. Finally, take a look at the steps involved in building data pipelines on AWS. The closing exercise involves building data pipelines with Tableau.
11 videos | 38m has Assessment available Badge
Data Pipeline: Using Frameworks for Advanced Data Management
Discover how to implement data pipelines using Python Luigi, integrate Spark and Tableau to manage data pipelines, use Dask arrays, and build data pipeline visualization with Python in this 10-video course. Begin by learning about features of Celery and Luigi that can be used to set up data pipelines, then how to implement Python Luigi to set up data pipelines. Next, turn to working with Dask library, after listing the essential features provided by Dask from the perspective of task scheduling and big data collections. Learn about implementation of Dask arrays to manage NumPy application programming interfaces (APIs). Explore frameworks that can be used to implement data exploration and visualization in data pipelines. Integrate Spark and Tableau to manage data pipelines. Move on to streaming data visualization with Python, using Python to build visualizations for streaming data. Then learn about the data pipeline building capabilities provided by Kafka, Spark, and PySpark. The concluding exercise involves setting up Luigi to implement data pipelines, Spark and Tableau integration, and building pipelines with Python.
10 videos | 32m has Assessment available Badge

EARN A DIGITAL BADGE WHEN YOU COMPLETE THESE COURSES

Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.

Digital badges are yours to keep, forever.

BOOKS INCLUDED

Book

Thinking Data Science: A Data Science Practitioner's Guide
This definitive guide to Machine Learning projects answers the problems an aspiring or experienced data scientist frequently has: Confused on what technology to use for your ML development? Should I use GOFAI, ANN/DNN or Transfer Learning? Can I rely on AutoML for model development? What if the client provides me Gig and Terabytes of data for developing analytic models? How do I handle high-frequency dynamic datasets? This book provides the practitioner with a consolidation of the entire data science process in a single "Cheat Sheet".
book Duration 3h 44m book Authors By Poornachandra Sarang

Book

Data Science
A concise introduction to the emerging field of data science, explaining its evolution, relation to machine learning, current uses, data infrastructure issues, and ethical challenges.
book Duration 3h 2m book Authors By Brendan Tierney, John D. Kelleher

Book

Data Science for Dummies, 2nd Edition
Showing you how data science can help you gain in-depth insight into your business, this practical book is the perfect starting point for IT professionals and students who want a quick primer on all areas of the expansive data science space.
book Duration 5h 27m book Authors By Lillian Pierson

Book

Think Like a Data Scientist: Tackle the Data Science Process Step-by-Step
This book presents a step-by-step approach to data science, combining analytic, programming, and business perspectives into easy-to-digest techniques and thought processes for solving real world data-centric problems.
book Duration 7h 38m book Authors By Brian Godsey

Book

The Data Science Handbook
Giving extensive coverage to computer science and software engineering since they play such a central role in the daily work of a data scientist, this comprehensive book provides a crash course in data science, combining all the necessary skills into a unified discipline.
book Duration 6h 43m book Authors By Field Cady

Book

Data Science: Concepts and Practice, Second Edition
Whether you are brand new to data science or working on your tenth project, this book will show you how to analyze data and uncover hidden patterns and relationships to aid important decisions and predictions.
book Duration 8h 44m book Authors By Bala Deshpande, Vijay Kotu

Book

Getting Started with Data Science: Making Sense of Data with Analytics
Through a powerful narrative packed with unforgettable stories, this easy-to-read book offers informative, jargon-free coverage of basic theory and technique, backed with plenty of vivid examples and hands-on practice opportunities.
book Duration 8h 28m book Authors By Murtaza Haider

Book

Statistical Data Analytics: Foundations for Data Mining, Informatics, and Knowledge Discovery
Presenting a coherent, technical introduction to modern statistical learning and analytics, starting from the core foundations of statistics and probability, this book includes an overview of probability and statistical distributions, basics of data manipulation and visualization, and the central components of standard statistical inferences.
book Duration 10h 39m book Authors By Walter W. Piegorsch

Book

Perspectives on Data Science for Software Engineering
Presenting the best practices of seasoned data miners in software engineering, this book offers unique insights into the wisdom of the community's leaders gathered to share hard-won lessons from the trenches.
book Duration 6h 23m book Authors By Laurie Williams, Thomas Zimmermann (eds), Tim Menzies

Book

Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro
Presenting an applied and interactive approach to data mining, this book uses engaging, real-world examples to build a theoretical and practical understanding of key data mining methods, especially predictive models for classification and prediction.
book Duration 7h 56m book Authors By Galit Shmueli, Mia L. Stephens, Nitin R. Patel, Peter C. Bruce

Book

Handbook of Computational Sciences: A Multi and Interdisciplinary Approach
The Handbook of Computational Sciences is a comprehensive collection of research chapters that brings together the latest advances and trends in computational sciences and addresses the interdisciplinary nature of computational sciences, which require expertise from multiple disciplines to solve complex problems.
book Duration 5h 54m book Authors By Ahmed A. Elngar, Krishna Kant Singh, Vigneshwar. M., Zdzislaw Polkowski

Book

Data Wrangling: Concepts, Applications and Tools
Data wrangling is the process of cleaning and unifying messy and complex data sets for easy access and analysis.
book Duration 5h 9m book Authors By Geetika Dhand, Kavita Sheoran, M. Niranjanamurthy, Prabhjot Kaur

BOOKS INCLUDED

Book

A Data Scientist's Guide to Acquiring, Cleaning, and Managing Data in R
Providing both introductory and advanced techniques, this single-source guide to R data and its preparation offers a unified, systematic approach to acquiring, modeling, manipulating, cleaning, and maintaining data in R.
book Duration 6h 18m book Authors By Lyn R. Whitaker, Samuel E. Buttrey

BOOKS INCLUDED

Book

Learn Data Analysis with Python: Lessons in Coding
A quick and practical hands-on guide to learning and using Python in data analysis, this book includes three exercises and a case study on getting data in and out of Python code in the right format.
book Duration 46m book Authors By A.J. Henley, Dave Wolf

Book

R for Data Analysis in Easy Steps
For anyone who wants to produce graphic visualizations to gain insights from gathered data, this book will give you a sound understanding of R programming so you will be able to write your own scripts that can be executed to produce graphic visualizations for data analysis.
book Duration 2h 7m book Authors By Mike McGrath

Book

Beginning Data Science in R: Data Analysis, Visualization, and Modelling for the Data Scientist
Presenting best practices for data analysis and software development in R, this comprehensive book teaches you techniques for both data manipulation and visualization and shows you the best way for developing new software packages for R.
book Duration 6h 29m book Authors By Thomas Mailund

Book

Data Analysis for Scientists and Engineers
A modern, graduate-level text on data analysis techniques, this detailed resource emphasizes the principles behind various techniques so that practitioners can adapt them to their own problems, or develop new techniques when necessary.
book Duration 5h 42m book Authors By Edward L. Robinson

BOOKS INCLUDED

Book

Practical Enterprise Data Lake Insights: Handle Data-Driven Challenges in an Enterprise Big Data Lake
Use this practical guide to successfully handle the challenges encountered when designing an enterprise data lake and learn industry best practices to resolve issues.
book Duration 3h 25m book Authors By Saurabh Gupta, Venkata Giri

Book

Practical Data Science: A Guide to Building the Technology Stack for Turning Data Lakes into Business Assets
Demonstrating how to build and provision a technology stack to yield repeatable results, this detailed guide shows you how to apply practical methods to extract actionable business knowledge from data lakes consisting of data from a polyglot of data types and dimensions.
book Duration 7h 7m book Authors By Andreas François Vermeulen

AUDIOBOOKS INCLUDED

Audiobook

Data Science For Dummies, 2nd Edition
This audio edition is the perfect starting point for IT professionals and students who want a quick primer on all areas of the expansive data science space.
audiobook Duration 10h 29m 1s audiobook Authors By Lillian Pierson

SKILL BENCHMARKS INCLUDED

Data Analytics Literacy (Beginner Level)
The Data Analytics Literacy benchmark measures whether a learner has exposure to data analytics concepts, including what data analytics is and why it's required, the various data analytics tools and frameworks available, and the different types of data analytics. A learner who scores high on this benchmark demonstrates that they have the foundational knowledge to start working on data analytics projects with training and supervision.
18m    |   18 questions

YOU MIGHT ALSO LIKE

Channel Statistics
Rating 5.0 of 1 users (1)
Channel Big Data
Rating 4.0 of 1 users (1)