Big Data Processing Beyond Hadoop and MapReduce

  • 23m
  • Ravi Sharda
  • EMC
  • 2015

Hadoop and MapReduce (MR) have long been de facto standards for Big Data processing, so much so that many see them as synonymous with “Big Data”. With the MR data processing model and the Hadoop Distributed File System at its core, Hadoop is great at storing and processing large amounts of data. It allows a programmer to divide a larger problem into smaller mapper and reducer tasks that can be executed, mostly in parallel, over a network of machines. Hadoop’s runtime hides many of the gory details of distributed and parallel data processing from the programmer, such as partitioning input data, breaking down MR jobs into individual tasks, scheduling the tasks for parallel execution, co-locating processing with the data (to the extent possible), monitoring the progress of tasks and jobs, handling partial errors and providing fault tolerance on unreliable commodity hardware, synchronizing results and tasks when necessary, and so on.
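To make the mapper/reducer division concrete, here is a minimal single-machine sketch of the MR model in plain Python, using word counting as the problem. It is not Hadoop's API: the function names (split_input, mapper, shuffle, reducer, word_count) are hypothetical, and a thread pool stands in for the network of machines that Hadoop's runtime would schedule tasks on.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_input(lines, num_splits):
    """Partition the input into splits, one per mapper task (illustrative only)."""
    return [lines[i::num_splits] for i in range(num_splits)]


def mapper(split):
    """Map task: emit a (word, 1) pair for every word in this split."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(mapped_outputs):
    """Group all emitted values by key so each reduce call sees one key."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce task: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines, num_splits=2):
    """Run map, shuffle, and reduce; the thread pool stands in for a cluster."""
    splits = split_input(lines, num_splits)
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(mapper, splits))      # map tasks run in parallel
        groups = shuffle(mapped)                     # synchronization point
        reduced = pool.map(lambda kv: reducer(*kv), groups.items())
        return dict(reduced)
```

Calling word_count(["big data big", "data processing"]) counts each word across both input lines. Everything that Hadoop's runtime adds on top of this (data locality, task monitoring, fault tolerance) is deliberately left out of the sketch.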

In this Book

  • Big Data Processing Beyond Hadoop and MapReduce
  • Introduction
  • Interactive Querying over Hadoop
  • Iterative Computations
  • Moving Beyond MapReduce
  • Conclusion
  • References