Data Science Track 3: Data Ops

For this track of the data science journey, the focus will be on the Data Ops role. Here we will explore areas such as governance, security, and harnessing data volume and velocity.

Technologies: Apache Airflow 1.10, Big Data, Cloud Architecture, Data Architecture, Data BCP, Data Pipeline, Data Science, Data Visualization
In the overview, Chris will explain what will be covered in the DataOps track of the Data Science journey, as well as show you how to use the code assets and the lab and how to browse the different assets in the track.
Discover how to implement data pipelines using Python's Luigi, integrate Spark and Tableau to manage data pipelines, use Dask arrays, and build data pipeline visualizations with Python.
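
As a concrete taste of the pipeline material, here is a minimal two-stage Luigi pipeline sketch; the file names and the transform logic are illustrative assumptions, not course code.

```python
# Minimal two-stage Luigi pipeline: extract raw text, then transform it.
import luigi


class Extract(luigi.Task):
    def output(self):
        return luigi.LocalTarget("raw.txt")  # hypothetical output file

    def run(self):
        with self.output().open("w") as f:
            f.write("hello world\n")


class Transform(luigi.Task):
    def requires(self):
        return Extract()  # declares the upstream dependency

    def output(self):
        return luigi.LocalTarget("clean.txt")

    def run(self):
        # Read the upstream output and write an upper-cased copy.
        with self.input().open() as src, self.output().open("w") as dst:
            dst.write(src.read().upper())


if __name__ == "__main__":
    # Luigi resolves the dependency graph and runs Extract before Transform.
    luigi.build([Transform()], local_scheduler=True)
```
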
Explore the concept of data pipelines, the processes and stages involved in building them, and technologies such as Tableau and AWS that can be used to build them.
Discover how to implement cloud architecture for large-scale applications, serverless computing, adequate storage, and analytical platforms using DevOps tools and cloud resources.
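
To make the serverless idea concrete, the sketch below shows a function written in the AWS Lambda handler style; the event shape it reads is a hypothetical assumption.

```python
# Sketch of a serverless function in the AWS Lambda handler style.
import json


def handler(event, context):
    # Lambda passes the triggering event as a dict; summarize and return it.
    records = event.get("Records", [])  # hypothetical event shape
    return {
        "statusCode": 200,
        "body": json.dumps({"records_received": len(records)}),
    }
```
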
Using a hands-on lab approach, explore how to use Amazon Redshift to set up and configure a data warehouse on the cloud. Discover how to interact with the Redshift service using both the console and the AWS CLI.
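
For a flavor of the programmatic route, here is a sketch of provisioning a single-node cluster with boto3, the Python counterpart of the console and CLI steps; all identifiers and credentials are placeholders.

```python
# Sketch: provision a single-node Redshift cluster with boto3.
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

redshift.create_cluster(
    ClusterIdentifier="demo-cluster",     # placeholder name
    ClusterType="single-node",
    NodeType="dc2.large",
    DBName="demo",
    MasterUsername="admin",
    MasterUserPassword="REPLACE_ME_1a",   # never hard-code in real code
)

# Block until the cluster reaches the "available" state.
redshift.get_waiter("cluster_available").wait(ClusterIdentifier="demo-cluster")
```
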
Explore the theoretical foundations of scalable data architectures: why they are needed and what characterizes them. Using data warehouses to store, process, and analyze big data is also covered.
Examine the security risks related to modern data capture and processing methods such as streaming analytics, the techniques and tools employed to mitigate security risks, and best practices related to securing big data.
Explore the loading of data from an external source such as Amazon S3 into a Redshift cluster, as well as the configuration of snapshots and the resizing of clusters. Discover how to use Amazon QuickSight to visualize data.
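
As a sketch of the S3-to-Redshift load, the snippet below issues a Redshift COPY command over a standard PostgreSQL connection; the endpoint, table, bucket, and IAM role ARN are placeholders.

```python
# Sketch: load CSV data from S3 into a Redshift table with COPY.
import psycopg2

conn = psycopg2.connect(
    host="demo-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder
    port=5439, dbname="demo", user="admin", password="REPLACE_ME",
)
with conn, conn.cursor() as cur:
    cur.execute("""
        COPY sales
        FROM 's3://my-bucket/sales/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        FORMAT AS CSV
        IGNOREHEADER 1;
    """)
# Exiting the `with` block commits the transaction on success.
```
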
Explore the concept of smart data and the associated life cycle and benefits afforded by smart data. Frameworks and algorithms that can help transition big data to smart data are also covered.
Explore the role played by dashboards in data exploration and deep analytics. Examine the essential patterns of dashboard design and how to implement appropriate dashboards using Kibana, Tableau, and QlikView.
Explore the concept of dashboards and the best practices that can be adopted to build effective dashboards. Implementing dashboards and visualizations using PowerBI and ELK, along with the concepts of leaderboards and scorecards, is also covered.
Explore the concepts of transactions, transaction management policies, and rollbacks. Discover how to implement transaction management and rollbacks using SQL Server.
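
A minimal sketch of that pattern against SQL Server from Python, using pyodbc; the connection string and the accounts table are placeholders.

```python
# Sketch: explicit transaction with commit/rollback on SQL Server via pyodbc.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=demo;UID=sa;PWD=REPLACE_ME",
    autocommit=False,  # statements join an explicit transaction
)
cursor = conn.cursor()
try:
    cursor.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
    cursor.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
    conn.commit()      # both updates persist together
except pyodbc.Error:
    conn.rollback()    # undo any partial work on failure
    raise
```
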
Explore the differences in transaction management when using NoSQL databases such as MongoDB. Discover how to implement change data capture in relational databases and NoSQL.
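
One way MongoDB exposes change data capture is through change streams; the PyMongo sketch below assumes a replica set deployment and placeholder database and collection names.

```python
# Sketch: change data capture with a MongoDB change stream (PyMongo).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")  # placeholder
orders = client.demo.orders

# watch() yields one event per insert/update/delete on the collection.
with orders.watch() as stream:
    for change in stream:
        print(change["operationType"], change.get("documentKey"))
```
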
Explore data pipelines and methods of processing them with and without ETL. Creating data pipelines using Apache Airflow is also covered.
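
As a taste of the Airflow material, here is a minimal DAG sketch in the Airflow 1.10 style listed under this track's technologies; the task logic is an illustrative placeholder.

```python
# Sketch: a two-task Airflow 1.10 DAG where extract runs before load.
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def extract():
    print("pulling data from the source")   # placeholder logic


def load():
    print("writing data to the warehouse")  # placeholder logic


dag = DAG(
    "demo_pipeline",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
)

extract_task = PythonOperator(task_id="extract", python_callable=extract, dag=dag)
load_task = PythonOperator(task_id="load", python_callable=load, dag=dag)

extract_task >> load_task  # set the dependency: extract, then load
```
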
To become proficient in data science, you have to understand edge computing, where data is processed near the source, at the edge of the network, rather than in the centralized storage location typical of a cloud environment. In this course you will explore the implementation of IoT on prominent cloud platforms like AWS and…
To become proficient in data science, you have to understand edge computing, where data is processed near the source, at the edge of the network, rather than in the centralized storage location typical of a cloud environment. In this course you will examine the architecture of IoT solutions and the essential approaches to integrating…
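
To make the edge pattern concrete, the sketch below filters sensor readings locally and publishes only a summary upstream over MQTT (paho-mqtt 1.x API); the readings, broker host, and topic are hypothetical.

```python
# Sketch: edge-style processing — filter locally, publish only the aggregate.
import json

import paho.mqtt.client as mqtt

readings = [21.3, 21.4, 55.0, 21.2]       # hypothetical raw sensor samples
clean = [r for r in readings if r < 50]   # drop outliers at the edge
summary = {"avg_temp": sum(clean) / len(clean)}

client = mqtt.Client()
client.connect("broker.example.com", 1883)  # placeholder broker
client.publish("sensors/room1/summary", json.dumps(summary))
client.disconnect()
```
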
With the popularity of data science, there has been an increase in the number of tools available. In this course you will discover the different uses of data science tools and the benefits and challenges of deploying them.
As organizations become more data science aware, it's critical to understand the role of governance in big data implementation. In this course you will examine governance and its relationship with big data, and how to plan and design a big data governance strategy.
As organizations start to master data science and the volume of data collected increases, reports of data sensitivity and security breaches are common in the news media. In this course, you will explore how a structured data access governance framework reduces the likelihood of data security breaches.
As organizations start to master data science and the volume of data collected increases, reports of data sensitivity and security breaches are common in the news media. Before data can be sufficiently protected, its sensitivity must be known. In this course you will explore how data classification determines which security measures apply to varying classes of data.
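
Purely as an illustration of the idea (not course material), a classification scheme might map each class of data to the controls that apply to it:

```python
# Illustrative sketch: map data classification levels to security controls.
CONTROLS = {
    "public":       {"encrypt_at_rest": False, "access": "anyone"},
    "internal":     {"encrypt_at_rest": True,  "access": "employees"},
    "confidential": {"encrypt_at_rest": True,  "access": "need-to-know"},
    "restricted":   {"encrypt_at_rest": True,  "access": "named individuals"},
}


def controls_for(classification: str) -> dict:
    """Return the security controls that apply to a class of data."""
    return CONTROLS[classification.lower()]


print(controls_for("Confidential"))
```
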
Spark is an analytics engine that runs on Hadoop and can be used for big data, data science, and processing batch and streaming data. In this course you will explore the fundamentals of working with streams using Spark.
As organizations learn to master data science, it's crucial that they remain compliant with their big data implementations. In this course you will examine compliance and its relationship with big data, as well as popular resources for developing compliance strategies.
Spark is an analytics engine that runs on Hadoop and can be used for big data, data science, and processing batch and streaming data. In this course you will discover how to develop applications in Spark to work with streaming data and explore the different ways to process streams and generate output.
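
The classic starting point for Spark streaming applications is a word count over a socket source; the sketch below uses Structured Streaming with a placeholder host and port (feed it text locally with `nc -lk 9999`).

```python
# Sketch: Structured Streaming word count from a socket, output to console.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("StreamDemo").getOrCreate()

lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")   # placeholder source
         .option("port", 9999)
         .load())

words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# "complete" mode re-emits the full aggregate on every trigger.
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```
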