MapReduce Using Python


MapReduce is a programming model and processing framework for parallel and distributed data processing. It was introduced at Google and later popularized by the open-source Hadoop framework, but you can implement a simple MapReduce-style program in plain Python without Hadoop. Here's a basic example of how to perform a MapReduce operation using Python:

Suppose you have a list of words and you want to count the frequency of each word using the MapReduce model.

Map Phase:

  1. First, you define a mapper function that takes a list of words as input and emits key-value pairs, where the key is the word and the value is a count of 1 for each occurrence.

python
def mapper(words):
    # Emit one (word, 1) pair per occurrence; the counting is left to the reducer.
    return [(word, 1) for word in words]

Shuffle and Sort (Grouping) Phase:

  2. In the MapReduce model, the framework automatically groups and sorts the key-value pairs by key before they reach the reducers. In this simple Python example you can skip this step, because the reducer below does its own grouping with a dictionary, but in a distributed MapReduce system this phase is critical.
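To make the grouping step concrete, here is a minimal single-process sketch of what a shuffle-and-sort step does. It is only an illustration, not how Hadoop implements it: the helper name `shuffle_and_sort` is hypothetical, and the input is assumed to be an iterable of (word, count) pairs.

```python
from collections import defaultdict


def shuffle_and_sort(pairs):
    """Group values by key and return the groups sorted by key.

    Hypothetical helper for illustration; a real framework performs
    this across machines rather than in one process.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Sort by key so reducers see the keys in a deterministic order.
    return sorted(groups.items())


pairs = [("banana", 1), ("apple", 1), ("banana", 1), ("cherry", 1)]
grouped = shuffle_and_sort(pairs)
print(grouped)
# [('apple', [1]), ('banana', [1, 1]), ('cherry', [1])]
```

Each key now appears once, paired with every value that was emitted for it, which is the shape a reducer in a distributed system receives.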

Reduce Phase:

  3. Next, you define a reducer function that takes the key-value pairs from the Map Phase and reduces them by summing the values for each unique key (word).

python
def reducer(mapped_items):
    # Sum the values for each key.
    word_count = {}
    for key, value in mapped_items:
        word_count[key] = word_count.get(key, 0) + value
    return word_count.items()
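In frameworks such as Hadoop, the reduce function is typically called once per key with all of that key's values, rather than once with the full list of pairs. A minimal sketch of that shape follows, assuming the input has already been grouped by key; `reduce_word` and the sample `grouped` data are hypothetical names introduced for this illustration.

```python
def reduce_word(word, counts):
    """Sum all the counts emitted for a single word."""
    return word, sum(counts)


# Assumed output of a grouping step: each key paired with its list of values.
grouped = [("apple", [1, 1]), ("banana", [1, 1]), ("cherry", [1])]

totals = [reduce_word(word, counts) for word, counts in grouped]
print(totals)
# [('apple', 2), ('banana', 2), ('cherry', 1)]
```

Since each call only needs the values for one key, different keys can be reduced independently of one another.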

Driver Code:

  4. Finally, you can write the driver code to execute the MapReduce process on your data.

python
# Sample input data
input_data = ["apple", "banana", "cherry", "apple", "banana", "date"]

# Map Phase
mapped_items = mapper(input_data)

# Reduce Phase
reduced_items = reducer(mapped_items)

# Output the result
for key, value in reduced_items:
    print(f"{key}: {value}")

When you run the driver code, it will count the frequency of each word in the input data and produce the following output:

apple: 2
banana: 2
cherry: 1
date: 1
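Because each chunk of input can be mapped independently, the map phase lends itself to parallel execution. The sketch below illustrates this on a single machine with a thread pool from the standard library's `concurrent.futures`. It is not a distributed system, and the names `map_chunk` and `run_word_count`, along with the chunk size and worker count, are assumptions made for this example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(words):
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word, 1) for word in words]


def run_word_count(words, chunk_size=2, max_workers=2):
    # Split the input into chunks, much as a framework splits a large file.
    chunks = [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]

    # Map phase: process the chunks concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        mapped_chunks = list(pool.map(map_chunk, chunks))

    # Shuffle phase: group the values by key across all chunks.
    groups = defaultdict(list)
    for chunk in mapped_chunks:
        for word, count in chunk:
            groups[word].append(count)

    # Reduce phase: sum the values for each key, in sorted key order.
    return {word: sum(counts) for word, counts in sorted(groups.items())}


input_data = ["apple", "banana", "cherry", "apple", "banana", "date"]
result = run_word_count(input_data)
print(result)
# {'apple': 2, 'banana': 2, 'cherry': 1, 'date': 1}
```

Threads are used here only to keep the example simple. For CPU-heavy mapping in CPython a process pool is usually the better fit, and for data that does not fit on one machine a framework such as Hadoop takes over the splitting, shuffling, and scheduling.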

