Data Infrastructures with Apache Kafka Proficiency (Advanced Level)

  • 16m
  • 16 questions
The Data Infrastructures with Apache Kafka Proficiency (Advanced Level) benchmark measures your ability to build Apache Spark applications that process streaming data published to Kafka. You will be evaluated on your skills in applying transformations based on aggregations and window operations, setting up and managing a Kafka environment using Confluent, and defining an ETL pipeline involving Kafka, Spark, and Cassandra. A learner who scores high on this benchmark demonstrates the skills to work with Kafka without supervision.
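
The Kafka-to-Spark-to-Cassandra pipeline mentioned above can be sketched in PySpark as follows. This is a minimal illustration, not a reference implementation: the broker address (`localhost:9092`), topic name (`events`), and the Cassandra keyspace and table names (`etl`, `events`) are all hypothetical, and it assumes the `spark-sql-kafka` and `spark-cassandra-connector` packages are supplied at launch (e.g. via `--packages`).

```python
# Hedged sketch of the Kafka -> Spark -> Cassandra leg of an ETL pipeline.
# All connection details and names below are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("KafkaToCassandra")
    # Assumes Cassandra is reachable on localhost and that the Kafka and
    # Cassandra connector packages were passed on the spark-submit command line.
    .config("spark.cassandra.connection.host", "localhost")
    .getOrCreate()
)

# Read the Kafka stream; key and value arrive as binary and are cast to strings.
stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
    .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS payload")
)

# foreachBatch lets each micro-batch be written with the batch Cassandra sink.
def write_to_cassandra(batch_df, batch_id):
    (batch_df.write
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace="etl", table="events")
        .mode("append")
        .save())

query = stream.writeStream.foreachBatch(write_to_cassandra).start()
query.awaitTermination()
```

The `foreachBatch` sink is used here because it lets a streaming DataFrame reuse the connector's batch write path, one micro-batch at a time.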

Topics covered

  • build a Spark application that reads from a Kafka topic
  • create a Spark app that connects to both Cassandra and Kafka
  • describe how to use Confluent to manage Kafka
  • describe what windows are in the context of Spark streaming and define them using DataFrames
  • distinguish between Spark standalone and local deployment modes
  • download and install the Confluent platform
  • execute apps on a Spark standalone cluster
  • execute Spark commands and monitor jobs with the Spark web UI
  • manipulate streaming data and publish the output to the console
  • perform aggregations on Spark DataFrames and order their contents
  • run a job on the PySpark shell and view its details from the Spark web user interface (UI)
  • set up an environment to stream files, and build an app to process them in real time
  • subscribe to multiple Kafka topics from a Spark application
  • transform streaming data with Spark SQL
  • use the Confluent user interface and CLI to set up and work with a Kafka topic
  • write an app that generates data to periodically send to a Kafka topic
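
Several of the topics above (reading from a Kafka topic, windowed aggregations on DataFrames, publishing output to the console) come together in a single Structured Streaming job. The sketch below assumes a local broker at `localhost:9092` and a topic named `events` (both hypothetical), and that the `spark-sql-kafka` package is on the classpath.

```python
# Minimal sketch: read a Kafka topic with Spark Structured Streaming,
# count records per 1-minute tumbling window, and print to the console.
# Broker address and topic name are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = (
    SparkSession.builder
    .appName("KafkaWindowedCounts")
    .getOrCreate()
)

# Subscribe to the topic; Kafka records arrive with binary key/value columns
# plus metadata columns such as "timestamp".
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
)

# Cast the value to a string and count records per 1-minute tumbling window.
counts = (
    raw.selectExpr("CAST(value AS STRING) AS value", "timestamp")
    .groupBy(window(col("timestamp"), "1 minute"))
    .count()
)

# Publish the running aggregation to the console sink.
query = (
    counts.writeStream
    .outputMode("complete")
    .format("console")
    .option("truncate", "false")
    .start()
)
query.awaitTermination()
```

To subscribe to more than one topic, the `subscribe` option accepts a comma-separated list (e.g. `"events,alerts"`); swapping the console sink for another output format leaves the aggregation logic unchanged.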