MapReduce Using Python

MapReduce is a programming model and processing framework for parallel and distributed data processing. It was introduced at Google and is best known today through Apache Hadoop, which provides an open-source implementation, but you can implement a simple MapReduce program in plain Python without Hadoop. Here’s a basic example of how to perform a MapReduce operation using Python:

Suppose you have a list of words and you want to count the frequency of each word using the MapReduce model.

Map Phase:

First, you define a mapper function that takes a list of words as input and emits key-value pairs, where the key is the word and the value is a count of 1 for each occurrence.

```python
def mapper(words):
    # Emit one (word, 1) pair per occurrence; the totals are computed in the reduce phase.
    return [(word, 1) for word in words]
```
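Real word-count jobs usually receive lines of text rather than a ready-made list of words. As a minimal sketch, here is a line-oriented mapper that follows the mapper description above (key = word, value = 1). The name `line_mapper` is hypothetical, and splitting on whitespace and lowercasing every word are illustrative assumptions, not part of the original example.

```python
def line_mapper(lines):
    """Map phase over raw text: emit (word, 1) for every word on every line.

    Assumptions: words are separated by whitespace and counted case-insensitively.
    """
    pairs = []
    for line in lines:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


print(line_mapper(["Apple banana", "apple"]))
# [('apple', 1), ('banana', 1), ('apple', 1)]
```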

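Shuffle and Sort (Grouping) Phase:

In the MapReduce model, the framework automatically groups and sorts the key-value pairs by key before they reach the reducers. In this simple single-process Python example, you can skip this step because the reducer groups the pairs itself, but in a distributed MapReduce system this phase is critical, since it routes all values for the same key to the same reducer.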
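To make the grouping step visible, here is a minimal sketch of a local shuffle-and-sort, assuming all mapped pairs fit in memory. The function name `shuffle_and_sort` is hypothetical. It sorts the pairs by key and then uses `itertools.groupby` to collect each key's values into a list, mirroring how a framework hands a reducer one key together with all of its values.

```python
from itertools import groupby
from operator import itemgetter


def shuffle_and_sort(mapped_items):
    """Group (key, value) pairs by key, returning [(key, [values...]), ...] sorted by key."""
    # groupby only merges *adjacent* equal keys, so the pairs must be sorted first.
    ordered = sorted(mapped_items, key=itemgetter(0))
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


pairs = [("apple", 1), ("banana", 1), ("cherry", 1), ("apple", 1), ("banana", 1), ("date", 1)]
print(shuffle_and_sort(pairs))
# [('apple', [1, 1]), ('banana', [1, 1]), ('cherry', [1]), ('date', [1])]
```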

Reduce Phase:

Next, you define a reducer function that takes the key-value pairs emitted in the Map Phase and reduces them by summing the values for each unique key (word). Because it accumulates the totals in a dictionary, it also performs the grouping step implicitly.

```python
def reducer(mapped_items):
    # Sum the values for each key; the dictionary also does the grouping.
    word_count = {}
    for key, value in mapped_items:
        word_count[key] = word_count.get(key, 0) + value
    return word_count.items()
```
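In frameworks that perform the shuffle for you, the reducer is typically called once per key with all of that key's values, rather than with the raw pairs. A minimal sketch of that style, assuming grouped input shaped like `[(key, [values...]), ...]`; the function name `grouped_reducer` and the sample `grouped` list are hypothetical.

```python
def grouped_reducer(key, values):
    """Reduce one key: sum all the counts collected for it."""
    return key, sum(values)


# Hypothetical grouped input, as a shuffle-and-sort step would produce it.
grouped = [("apple", [1, 1]), ("banana", [1, 1]), ("cherry", [1]), ("date", [1])]

for key, values in grouped:
    word, total = grouped_reducer(key, values)
    print(f"{word}: {total}")
```

Because each call only needs one key's values, different keys can be reduced independently, which is what lets a framework spread the reduce phase across machines.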

Driver Code:

Finally, you can write the driver code to execute the MapReduce process on your data.

```python
# Sample input data
input_data = ["apple", "banana", "cherry", "apple", "banana", "date"]

# Map Phase
mapped_items = mapper(input_data)

# Reduce Phase
reduced_items = reducer(mapped_items)

# Output the result
for key, value in reduced_items:
    print(f"{key}: {value}")
```

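When you run the driver code, it counts the frequency of each word in the input data and prints the following output (words appear in first-seen order because dictionaries preserve insertion order in Python 3.7 and later):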

```
apple: 2
banana: 2
cherry: 1
date: 1
```
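The driver runs every phase in a single process. To sketch how the work might be split the way a framework splits input across mapper tasks, the version below divides the input into chunks (a chunk size of 2 is an arbitrary assumption), maps each chunk in a worker, and then reduces the combined pairs. `mapper` and `reducer` follow the emit-(word, 1) and sum-by-key logic described above, with the reducer returning a plain dict; `chunked` and `parallel_word_count` are hypothetical helpers. It uses `concurrent.futures.ThreadPoolExecutor` so it runs anywhere without a `__main__` guard; because of CPython's global interpreter lock this will not speed up CPU-bound counting, and `ProcessPoolExecutor` is the usual swap when real parallelism is needed.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def mapper(words):
    """Map phase: emit a (word, 1) pair for every word."""
    return [(word, 1) for word in words]


def reducer(mapped_items):
    """Reduce phase: sum the values for each key."""
    word_count = {}
    for key, value in mapped_items:
        word_count[key] = word_count.get(key, 0) + value
    return word_count


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_word_count(words, chunk_size=2, workers=3):
    chunks = chunked(words, chunk_size)
    # Each chunk is mapped independently, like one mapper task per input split.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped_chunks = list(pool.map(mapper, chunks))
    # Flatten the per-chunk outputs, then reduce them all together.
    all_pairs = [pair for chunk in mapped_chunks for pair in chunk]
    return reducer(all_pairs)


input_data = ["apple", "banana", "cherry", "apple", "banana", "date"]
result = parallel_word_count(input_data)
assert result == dict(Counter(input_data))  # cross-check against the standard library
for key, value in result.items():
    print(f"{key}: {value}")
```

If you switch to `ProcessPoolExecutor`, move the driver lines under `if __name__ == "__main__":` so worker processes started with the spawn method do not re-run them.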


Conclusion:

Unogeeks is the No.1 IT Training Institute for Hadoop Training. Anyone disagree? Please drop a comment.

You can check out our other latest blogs on Hadoop Training here – Hadoop Blogs

Please check out our Best In Class Hadoop Training Details here – Hadoop Training

Follow & Connect with us:

———————————-

For Training inquiries:

Call/Whatsapp: +91 73960 33555

Mail us at: info@unogeeks.com

Our Website ➜ https://unogeeks.com

Instagram: https://www.instagram.com/unogeeks

Facebook: https://www.facebook.com/UnogeeksSoftwareTrainingInstitute

Twitter: https://twitter.com/unogeeks
