Big Data Literacy (Beginner Level)

  • 22m
  • 22 questions
The Big Data Literacy benchmark measures whether a learner has been exposed to core big data concepts: what big data is; its sources and formats; the applications and use cases of big data analytics; distributed architectures for handling big data; and the tools, technologies, and frameworks used to carry out big data analytics. A learner who scores high on this benchmark has enough foundational knowledge to begin learning and working with big data technologies under supervision and with training.

Topics covered

  • compare and contrast parallel and distributed computing systems
  • define a non-relational database and describe how it does not use the schema of rows and columns found in traditional database systems
  • define the role of the data processing layer and specify how information captured in the previous layer is processed
  • define the role of the data storage layer using HDFS as an example of commonly used primary data storage
  • describe batch processing, its use cases, and common reasons for using it
  • describe how the NoSQL approach facilitates the horizontal distribution of large volumes of structured and unstructured data and specify when to use NoSQL and SQL databases
  • describe how to add structure to raw data and name big data tools that aid this process
  • describe in-memory storage systems and their use cases and advantages using examples
  • describe the challenges in current data analytics models and system designs, such as scalability, consistency, reliability, efficiency, and maintainability
  • describe the concept of columnar databases, which store data in a column-wise format
  • describe the Hadoop system and name its main features, benefits, and use cases
  • describe the main advantages of big data analytics, including cost reduction and better decision-making
  • describe the process of deciphering correlations, market trends, patterns, and customer behavior using big data
  • describe the subcomponents of Hadoop, such as MapReduce and HDFS
  • list and describe five main challenges when dealing with big data
  • list top domains that are exploring and utilizing big data technologies, including process automation, security, and credit scoring
  • name and describe the features of Hadoop HDFS and identify common in-memory storage systems including Kudu, Elasticsearch, and CockroachDB
  • outline how the HBase architecture works and compare column-wise and row-wise storage of data
  • outline how stream processing enables quick decision-making by creating actionable real-time insights
  • outline the main pillars and components of big data architecture
  • recognize the role of NoSQL databases in the horizontal distribution of large volumes of structured and unstructured data
  • specify why unstructured data comes from varied sources and describe how it moves from its origin to storage and is then analyzed and visualized