In the Data Infrastructure with Apache Cassandra journey, you will explore Apache Cassandra, create Cassandra clusters, and learn the Cassandra Query Language (CQL). You will install Cassandra and make it available with Docker, create Cassandra clusters and inspect them using the nodetool utility, and leverage data centers in Cassandra. You will also use CQL to organize tables into keyspaces, specify replication strategies and configurations, create tables, and insert and update data. Finally, you will work with CSV and JSON data, use table partitioning and clustering key columns, perform aggregations on data, and leverage user-defined functions.
In this track of the Data Infrastructure with Apache Cassandra Skillsoft Aspire journey, the focus will be on data infrastructure in an organization, data mesh architecture, data tools, messaging platforms, and data stores.
In this course, you will look into the data mesh architecture and the process of selecting data platforms that best fulfill the needs of a large, data-driven organization. Begin by delving into two approaches to managing the data in an organization: a centralized data team and a data mesh architecture, which is a more federated approach. Explore how a data mesh allows individual domain teams in an organization to manage their own data, as long as that data is made available to other teams and adheres to certain standards. Next, discover the various considerations for selecting data-related tools in an organization. You will get a glimpse into Apache Kafka and RabbitMQ, two widely used messaging tools, and will see use cases where each of them excels. Finally, you will look into two use cases for data stores: one for a web or mobile app and another for a team performing data analysis. Here, you will look into the use of Apache Cassandra and the Snowflake platform.
Apache Cassandra is a decentralized, distributed, wide-column store that provides great performance at petabyte scale for specific types of data and operations. Cassandra is great for data that can be accessed via unique keys and where each row has potentially very different column attributes. In this course, learn how to install Cassandra and make it available for use with Docker. Next, discover how to create Cassandra clusters and inspect them using the nodetool utility. Finally, explore how to leverage a datacenter in Cassandra and how to correctly use a snitch, setting the snitch to GossipingPropertyFileSnitch. Upon completion, you'll be able to enumerate the defining attributes of Apache Cassandra and identify when to use Cassandra and when not to.
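To make the datacenter idea concrete, here is a minimal CQL sketch, assuming two hypothetical datacenters named dc1 and dc2 (the names each node would report via GossipingPropertyFileSnitch from its cassandra-rackdc.properties file); the keyspace name is illustrative, not from the course:

```sql
-- Hypothetical sketch: with GossipingPropertyFileSnitch, each node reports the
-- datacenter and rack named in its cassandra-rackdc.properties file. A keyspace
-- can then pin a replica count to each datacenter by name.
CREATE KEYSPACE IF NOT EXISTS app_data
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 3,   -- three replicas in datacenter "dc1"
    'dc2': 2    -- two replicas in datacenter "dc2"
  };
```

Running nodetool status against such a cluster lists the nodes grouped by datacenter and rack, which is how the course has you verify the topology.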
Apache Cassandra provides partition tolerance via its decentralized design and lets you configure the trade-off between consistency and availability. This is why Cassandra is said to support tunable consistency. In this course, learn how Cassandra organizes tables into keyspaces and how to specify the replication strategy and factor at the keyspace level. Next, practice configuring various read and write consistency levels and explore the trade-offs between consistency and availability. Finally, discover how to run various CQL queries to create tables, insert or update data, and query data. Upon completion, you'll be able to configure tunable consistency in Cassandra, configure different replication strategies, and create and use tables.
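A minimal sketch of these pieces, assuming a single-datacenter test setup (the demo keyspace and users table are illustrative names, not from the course):

```sql
-- Replication strategy and factor are set at the keyspace level.
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

CREATE TABLE IF NOT EXISTS demo.users (
  user_id uuid PRIMARY KEY,
  name    text,
  email   text
);

-- In cqlsh, the consistency level is a per-session setting:
CONSISTENCY QUORUM;   -- a majority of the 3 replicas must acknowledge

INSERT INTO demo.users (user_id, name, email)
VALUES (uuid(), 'Ada', 'ada@example.com');

CONSISTENCY ONE;      -- favor availability: any single replica suffices
SELECT * FROM demo.users;
```

Raising the consistency level (ONE → QUORUM → ALL) strengthens consistency at the cost of availability when replicas are down; this is the tunable trade-off the course explores.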
Apache Cassandra does not support joins, which means that data is inherently denormalized. That gives rise to the need for collection fields such as sets, maps, and lists, as well as for user-defined types that allow the table creator to encapsulate related fields. Begin this course by working with set, map, and list types. Then, focus on user-defined types and counter fields. Finally, you'll learn how to work with CSV and JSON data, including reading data from and writing data to a CSV file and displaying query results in JSON format. Upon completion, you'll be able to enumerate and contrast collection fields in Cassandra; define and use set, map, and list types; leverage user-defined types and counters; and work with JSON and CSV data and the COPY command.
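The collection and user-defined types might be sketched as follows (all table, type, and column names here are hypothetical, and the UUID literal is a placeholder):

```sql
-- Illustrative collection fields and a user-defined type (UDT).
CREATE TYPE IF NOT EXISTS demo.address (
  street text,
  city   text
);

CREATE TABLE IF NOT EXISTS demo.profiles (
  user_id  uuid PRIMARY KEY,
  emails   set<text>,           -- unordered, no duplicates
  prefs    map<text, text>,     -- key/value pairs
  visits   list<text>,          -- ordered, duplicates allowed
  home     frozen<address>      -- UDT stored as a single value
);

UPDATE demo.profiles
SET emails = emails + {'new@example.com'},
    prefs['theme'] = 'dark'
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- Counters live in dedicated tables whose non-key columns are all counters.
CREATE TABLE IF NOT EXISTS demo.page_views (
  page  text PRIMARY KEY,
  views counter
);
UPDATE demo.page_views SET views = views + 1 WHERE page = '/home';
```

For the CSV and JSON material, cqlsh's COPY command exports a table to and imports it from a CSV file, and SELECT JSON returns each row as a JSON document.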
Primary keys play a special role in Apache Cassandra. Not only are they used to uniquely identify a row in a table, they are also used to decide where and how data is stored in the underlying cluster. Begin by creating tables with different combinations of partitioning and clustering key columns, querying the tables, and confirming that the keys take effect. Then, explore the exact semantics of queries on partition and clustering key columns. Finally, learn how to use the nodetool and grep utilities to view properties of partitions and to verify how rows are mapped to partitions on the basis of the token ranges assigned to each node. Upon completion, you will be able to contrast primary keys in Cassandra with those in other data technologies, differentiate between clustering and partition keys, and identify types of queries that are and are not allowed in Cassandra.
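A sketch of a composite primary key and the query semantics it implies (the sensor_readings table is a hypothetical example, not taken from the course):

```sql
-- Composite primary key: a partition key plus a clustering column.
CREATE TABLE IF NOT EXISTS demo.sensor_readings (
  sensor_id  text,        -- partition key: hashed to a token to place the row
  reading_ts timestamp,   -- clustering column: sorts rows within a partition
  value      double,
  PRIMARY KEY ((sensor_id), reading_ts)
) WITH CLUSTERING ORDER BY (reading_ts DESC);

-- Allowed: equality on the full partition key, then a range on clustering columns.
SELECT * FROM demo.sensor_readings
WHERE sensor_id = 's-42' AND reading_ts > '2023-01-01';

-- Not allowed without the partition key (Cassandra rejects this
-- unless you append ALLOW FILTERING):
-- SELECT * FROM demo.sensor_readings WHERE reading_ts > '2023-01-01';
```

This is the restriction the course's query-semantics demos illustrate: the partition key determines which node owns a row, so queries must normally pin it down before filtering or ordering by clustering columns.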
Apache Cassandra is a distributed NoSQL technology meant for large-scale data, so programmatic access to Cassandra is especially important. Cassandra supports client libraries in several major programming languages, including Java, Python, and C#. Developers use these to connect to Cassandra and to work with it from code. Begin by creating and using indexes in Cassandra. Then, define and invoke user-defined functions (UDFs) to perform aggregations. Finally, you'll create a Java Maven project with the DataStax Java driver as a dependency and connect to a Cassandra database using that library. You will create a Cassandra session, execute various operations using the DataStax driver APIs, and confirm that these queries went through successfully. Upon completion, you will be able to create indexes on Cassandra tables, perform grouping and aggregation operations, leverage UDFs, and work programmatically with Cassandra from a Java client.
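The index and UDF portions might look like the following CQL sketch (the demo.users and demo.readings tables and the fahrenheit function are hypothetical examples; UDFs also require enabling user-defined functions in cassandra.yaml):

```sql
-- Illustrative secondary index: lets you query a non-key column.
CREATE INDEX IF NOT EXISTS users_by_name ON demo.users (name);
SELECT * FROM demo.users WHERE name = 'Ada';

-- Illustrative scalar UDF; the function body is written in Java.
CREATE OR REPLACE FUNCTION demo.fahrenheit (celsius double)
  RETURNS NULL ON NULL INPUT
  RETURNS double
  LANGUAGE java
  AS 'return celsius * 9.0 / 5.0 + 32.0;';

SELECT sensor_id, demo.fahrenheit(value) FROM demo.readings;
```

Custom aggregations build on the same mechanism via CREATE AGGREGATE, which combines a state-transition UDF with an optional final UDF; the Java driver then executes these same statements from code through a CqlSession.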