MapReduce Map
MapReduce is a programming model and processing technique for working with big data sets using a parallel, distributed algorithm on a cluster. It is used both to process existing large datasets and to generate new ones.
The process is divided into two main parts: the Map phase and the Reduce phase.
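- Map Phase: In this phase, the input dataset is divided into smaller sub-parts called chunks (known as input splits in Hadoop). A map function is applied to each record in a chunk and emits a set of intermediate key-value pairs. This phase typically filters and transforms the data; the framework then sorts and groups the intermediate pairs by key before handing them to the Reduce phase.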
- The typical function signature for the Map function is:
- map(key1, value1) -> list(key2, value2)
- Here, key1 and value1 are the input key and value, and key2 and value2 are the intermediate key and value.
- Reduce Phase: The Reduce function takes the intermediate key-value pairs produced by the Map function, grouped by key, and combines the values for each key into a smaller set of output values. This is where the summarization of the data occurs.
- The typical function signature for the Reduce function is:
- reduce(key2, list(value2)) -> list(value3)
- Here, key2 is the intermediate key, list(value2) is the list of all intermediate values that share that key, and value3 is the output value.
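To make the two signatures concrete, here is a minimal single-process sketch of a word count in plain Python. It is not Hadoop code: the names map_fn, reduce_fn, and run_mapreduce are hypothetical helpers made up for this illustration, and the grouping step in the middle stands in for the shuffle-and-sort work that a real framework does between the two phases.

```python
from collections import defaultdict


def map_fn(key1, value1):
    """Map: (line number, line text) -> list of (word, 1) pairs."""
    return [(word.lower(), 1) for word in value1.split()]


def reduce_fn(key2, values2):
    """Reduce: (word, [1, 1, ...]) -> list holding the total count."""
    return [sum(values2)]


def run_mapreduce(records, map_fn, reduce_fn):
    """Hypothetical in-memory driver; a real cluster spreads this across nodes."""
    # Map phase: apply map_fn to every input record.
    intermediate = []
    for key1, value1 in records:
        intermediate.extend(map_fn(key1, value1))

    # Shuffle and sort: group all intermediate values by their key.
    groups = defaultdict(list)
    for key2, value2 in intermediate:
        groups[key2].append(value2)

    # Reduce phase: apply reduce_fn once per distinct key, in sorted key order.
    return {key2: reduce_fn(key2, groups[key2]) for key2 in sorted(groups)}


records = [(0, "the quick brown fox"), (1, "the lazy dog"), (2, "The fox")]
counts = run_mapreduce(records, map_fn, reduce_fn)
print(counts)
```

Here key1 is a line number, value1 is the line text, key2 is a word, each value2 is the number 1, and value3 is the total count for that word.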
Together, the Map and Reduce phases allow for the efficient processing of large datasets across multiple machines in a cluster.
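The parallelism can be sketched on one machine as well. In the example below, a thread pool stands in for the cluster nodes (an assumption made purely for illustration; a real cluster runs map tasks on separate machines), and map_task and reduce_task are hypothetical names. Each map task also pre-aggregates its own chunk, similar in spirit to a combiner, so less intermediate data has to be merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_task(split):
    """Process one chunk of lines and return partial word counts for it."""
    partial = Counter()
    for line in split:
        partial.update(line.lower().split())
    return partial


def reduce_task(partials):
    """Merge the partial counts from all map tasks into final totals."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


lines = ["a b a", "b c", "a c c", "d"]

# Divide the input into chunks of two lines each.
split_size = 2
splits = [lines[i:i + split_size] for i in range(0, len(lines), split_size)]

# Run the map tasks concurrently; each worker handles one chunk.
with ThreadPoolExecutor(max_workers=2) as pool:
    partials = list(pool.map(map_task, splits))

totals = reduce_task(partials)
print(dict(totals))
```

Because each chunk is processed independently, adding more workers (or machines, in a real cluster) lets more chunks be handled at the same time.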