MapReduce Map


                          MapReduce Map


MapReduce is a programming model and processing technique for computing big data sets with a parallel, distributed algorithm on a cluster. It is associated with processing and generating large datasets that can be used in various tasks.

The process is divided into two main parts: the Map phase and the Reduce phase.

  1. Map Phase: The input dataset is divided into smaller sub-parts called chunks in this phase. A map function transforms these chunks into a set of intermediate key-value pairs. This phase involves filtering and sorting the data.
  2. The typical function signature for the Map function is:
  3. pythonCopy code
  4. map(key1, value1) -> list(key2, value2)
  5. Here, key1 and value1 are the input key and value, and key2 and value2 are the intermediate key and value.
  6. Reduce Phase: The Reduce function takes the intermediate key-value pairs produced by the Map function and reduces them to smaller values. This is where the summarization of the data occurs.
  7. The typical function signature for the Reduce function is:
  8. pythonCopy code
  9. reduce(key2, list(value2)) -> list(value3)
  10. Here, key2 is the intermediate key, value2 is the list of intermediate values, and value3 is the output value.

Together, the Map and Reduce phases allow for the efficient processing of large datasets across multiple machines in a cluster.

Hadoop Training Demo Day 1 Video:

You can find more information about Hadoop Training in this Hadoop Docs Link



Unogeeks is the No.1 IT Training Institute for Hadoop Training. Anyone Disagree? Please drop in a comment

You can check out our other latest blogs on Hadoop Training here – Hadoop Blogs

Please check out our Best In Class Hadoop Training Details here – Hadoop Training

💬 Follow & Connect with us:


For Training inquiries:

Call/Whatsapp: +91 73960 33555

Mail us at:

Our Website ➜

Follow us:





Leave a Reply

Your email address will not be published. Required fields are marked *