Map Reduce Hadoop Python

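MapReduce is a programming model for the distributed processing of large data sets across clusters of computers. Hadoop is a widely used framework that implements this model. Although Hadoop itself is written in Java, its Streaming utility lets you write the mapper and reducer in Python, as scripts that read from standard input and write to standard output.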
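To make the model concrete, here is a minimal single-process sketch of the three phases (map, shuffle/sort, reduce) applied to word counting. It runs in plain Python with no Hadoop involved, and the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are illustrative only, not part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Return the groups in key order, mirroring the framework's sort step
    return sorted(groups.items())


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped}


def word_count(lines):
    """Chain the three phases together."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "the end"]
    print(word_count(sample))
```

On a real cluster the map and reduce phases run in parallel on many machines and the framework handles the shuffle. The sketch only shows how data flows between the phases.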

Here’s a high-level guide for writing and executing a MapReduce job using Python and Hadoop:

  1. Write the Mapper and Reducer Code in Python

Mapper.py

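#!/usr/bin/env python3
# The f-string below requires Python 3

import sys

# Read input from standard input, one line at a time
for line in sys.stdin:
    # Strip surrounding whitespace and split the line into words
    line = line.strip()
    words = line.split()
    for word in words:
        # Emit each word with a count of 1, separated by a tab
        print(f'{word}\t1')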

Reducer.py

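#!/usr/bin/env python3
# The f-strings below require Python 3

import sys

current_word = None
current_count = 0
word = None

# Read input from standard input. Hadoop sorts the mapper output by key,
# so all lines for the same word arrive consecutively.
for line in sys.stdin:
    line = line.strip()
    # Split into word and count at the first tab
    word, count = line.split('\t', 1)
    count = int(count)
    if current_word == word:
        current_count += count
    else:
        # A new word has started: emit the total for the previous one
        if current_word:
            print(f'{current_word}\t{current_count}')
        current_word = word
        current_count = count

# Emit the total for the last word (skipped when the input was empty)
if current_word is not None:
    print(f'{current_word}\t{current_count}')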
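Before submitting anything to a cluster, it can help to check the logic locally. The sketch below imitates what Hadoop Streaming does with the two scripts: run the map step, sort the tab-separated records by key, then run the reduce step. It reimplements the logic as functions rather than invoking the scripts, and it uses `itertools.groupby` in place of the manual `current_word` tracking. The names `run_mapper`, `run_reducer`, and `simulate` are hypothetical helpers introduced for illustration.

```python
from itertools import groupby


def run_mapper(lines):
    """Same logic as Mapper.py: one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"


def run_reducer(sorted_records):
    """Same result as Reducer.py, grouping consecutive keys with groupby."""
    parsed = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def simulate(lines):
    """Map, sort the records (standing in for the shuffle), then reduce."""
    return list(run_reducer(sorted(run_mapper(lines))))


if __name__ == "__main__":
    for record in simulate(["hello world", "hello hadoop"]):
        print(record)
```

A similar check can usually be done with the real scripts by piping a small local file through the mapper, `sort`, and the reducer in a shell.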

  2. Set Execute Permissions

chmod +x Mapper.py

chmod +x Reducer.py
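If you prefer to set the permission from a Python setup script instead of the shell, the following is a rough equivalent, assuming a POSIX file system. The helper name `make_executable` is hypothetical. Unlike `chmod +x`, it does not consult the umask.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for user, group, and others, keeping other bits."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


if __name__ == "__main__":
    # Only touch the scripts that are actually present
    for script in ("Mapper.py", "Reducer.py"):
        if os.path.exists(script):
            make_executable(script)
```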

  3. Run the MapReduce Job

You can use the Hadoop streaming utility to run your MapReduce job:

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
-files Mapper.py,Reducer.py \
-mapper Mapper.py \
-reducer Reducer.py \
-input input_directory \
-output output_directory

Replace input_directory and output_directory with the paths to your input and output directories in HDFS. The output directory must not already exist, or Hadoop will refuse to run the job.
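If you launch jobs from Python, the same command can be assembled as an argument list. The following is a minimal sketch: `build_streaming_command` is a hypothetical helper, the jar path shown is a placeholder you would replace with the location of the streaming jar in your installation, and only the options already used in the command above appear.

```python
import shlex


def build_streaming_command(streaming_jar, input_dir, output_dir,
                            mapper="Mapper.py", reducer="Reducer.py"):
    """Return the Hadoop Streaming invocation as an argument list."""
    return [
        "hadoop", "jar", streaming_jar,
        # Ship both scripts to the cluster nodes
        "-files", f"{mapper},{reducer}",
        "-mapper", mapper,
        "-reducer", reducer,
        "-input", input_dir,
        "-output", output_dir,
    ]


if __name__ == "__main__":
    cmd = build_streaming_command(
        "/path/to/hadoop-streaming.jar",  # placeholder, not a real path
        "input_directory",
        "output_directory",
    )
    # Print the command for inspection; to submit it for real, you could
    # pass the list to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building a list rather than a single string avoids shell quoting problems when paths contain spaces.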

Please note that the code above is intended to serve as a general guide and might need adjustments depending on your specific environment and data.



