MapReduce Using Python
MapReduce is a programming model and processing framework for parallel, distributed data processing. Although it was introduced by Google and is best known through its open-source implementation in Hadoop, you can write a simple MapReduce-style program in plain Python without Hadoop. Here's a basic example of how to perform a MapReduce operation using Python:
Suppose you have a list of words and you want to count the frequency of each word using the MapReduce model.
Map Phase:
- First, you define a mapper function that takes a list of words as input and emits key-value pairs, where the key is the word and the value is the count (initially set to 1).
def mapper(words):
    # Emit one (word, 1) pair per occurrence; all counting is left to the reducer.
    return [(word, 1) for word in words]
Shuffle and Sort (Grouping) Phase:
- In the MapReduce model, the framework automatically groups and sorts the key-value pairs by key. In this simple Python example you can skip this step, because the reducer accumulates the counts per key itself, but in a distributed MapReduce system this phase is critical.
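Even in a single-process script, the grouping step can be written out explicitly to see what a framework would do. The following is a minimal sketch, assuming the mapper output is an iterable of (word, count) pairs; shuffle_and_sort is a hypothetical helper name for illustration, not part of Hadoop or any library.

```python
from collections import defaultdict

def shuffle_and_sort(pairs):
    """Group values by key and return the groups ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Sorting by key mimics the ordering a framework hands to its reducers.
    return sorted(groups.items())

# Hypothetical mapper output used only for this illustration.
pairs = [("banana", 1), ("apple", 1), ("banana", 1)]
grouped = shuffle_and_sort(pairs)
print(grouped)  # [('apple', [1]), ('banana', [1, 1])]
```

Each key now appears once, paired with the list of all values emitted for it.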
Reduce Phase:
- Next, you define a reducer function that takes the key-value pairs emitted by the Map Phase and reduces them by summing the values for each unique key (word).
def reducer(mapped_items):
    word_count = {}
    for key, value in mapped_items:
        # Add this pair's value to the running total for its key.
        word_count[key] = word_count.get(key, 0) + value
    return word_count.items()
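In Hadoop, the reduce function is invoked once per key and receives all of that key's values together. A sketch of that shape in Python, assuming the input has already been grouped into (word, [counts]) pairs; reduce_group and the sample grouped data are hypothetical and exist only for this illustration.

```python
def reduce_group(key, values):
    """Collapse all counts for one word into a single total."""
    return key, sum(values)

# Hypothetical grouped input, shaped like the output of a shuffle step.
grouped = [("apple", [1, 1]), ("banana", [1, 1]), ("cherry", [1])]
totals = [reduce_group(key, values) for key, values in grouped]
print(totals)  # [('apple', 2), ('banana', 2), ('cherry', 1)]
```

Because each call sees only one key, separate keys could in principle be reduced on different machines.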
Driver Code:
- Finally, you can write the driver code to execute the MapReduce process on your data.
# Sample input data
input_data = ["apple", "banana", "cherry", "apple", "banana", "date"]
# Map Phase
mapped_items = mapper(input_data)
# Reduce Phase
reduced_items = reducer(mapped_items)
# Output the result
for key, value in reduced_items:
    print(f"{key}: {value}")
When you run the driver code, it will count the frequency of each word in the input data and produce the following output:
apple: 2
banana: 2
cherry: 1
date: 1
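Because each mapper call is independent of the others, the input can be split into chunks that are mapped separately, which is roughly what a distributed system does across machines. The sketch below simulates this in a single process with a thread pool. The names map_chunk, reduce_pairs, and word_count and the chunk size of 2 are assumptions chosen for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def map_chunk(words):
    """Mapper: emit one (word, 1) pair per word in this chunk."""
    return [(word, 1) for word in words]

def reduce_pairs(pairs):
    """Reducer: sum the counts for each word."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

def word_count(words, chunk_size=2):
    # Split the input into fixed-size chunks, one per simulated mapper.
    chunks = [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]
    with ThreadPoolExecutor() as pool:
        mapped = pool.map(map_chunk, chunks)  # results come back in chunk order
        # Flatten the per-chunk outputs into a single stream of pairs.
        pairs = [pair for chunk_pairs in mapped for pair in chunk_pairs]
    return reduce_pairs(pairs)

input_data = ["apple", "banana", "cherry", "apple", "banana", "date"]
print(word_count(input_data))  # {'apple': 2, 'banana': 2, 'cherry': 1, 'date': 1}
```

Threads are used here only to show the structure. In CPython they do not speed up CPU-bound work, so a real workload would call for a process pool or an actual cluster.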