Map Reduce Hadoop Python
MapReduce is a programming model for the distributed processing of large data sets across clusters of computers. Hadoop is a popular open-source framework that implements this model, and its streaming utility lets you write the map and reduce steps in Python.
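To make the model concrete, here is a minimal in-memory sketch of the three stages (map, shuffle, reduce) applied to word counting. It runs on a single machine and does not use Hadoop at all; the function names map_phase, shuffle_phase, reduce_phase and word_count are made up for this illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: collect all values that share a key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Chain the three stages into one word-count pipeline."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "the fox"]
    print(word_count(sample))
```

On a real cluster the map and reduce stages run in parallel on many machines and the framework performs the shuffle between them; the sketch only shows how the data changes shape at each stage.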
Here’s a high-level guide for writing and executing a MapReduce job using Python and Hadoop:
- Write the Mapper and Reducer Code in Python
Mapper.py
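#!/usr/bin/env python3
import sys

# Read input from standard input, one line at a time
for line in sys.stdin:
    # Strip surrounding whitespace and split the line into words
    line = line.strip()
    words = line.split()
    for word in words:
        # Emit a tab-separated (word, 1) pair for each word
        print(f'{word}\t1')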
Reducer.py
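#!/usr/bin/env python3
import sys

current_word = None
current_count = 0
word = None

# Read the sorted mapper output from standard input
for line in sys.stdin:
    line = line.strip()
    # Split each line into the word and its count
    word, count = line.split('\t', 1)
    count = int(count)
    # Hadoop sorts mapper output by key, so equal words arrive consecutively
    if current_word == word:
        current_count += count
    else:
        # A new word starts: emit the finished total for the previous one
        if current_word:
            print(f'{current_word}\t{current_count}')
        current_word = word
        current_count = count

# Emit the total for the last word (skipped when there was no input)
if current_word:
    print(f'{current_word}\t{current_count}')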
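The reducer above only works because Hadoop sorts the mapper output by key before handing it over, so all lines for one word arrive together. As an alternative sketch of the same idea, the grouping can be expressed with itertools.groupby from the standard library. The helper names parse and reduce_sorted are hypothetical, and the sample data stands in for what Hadoop would supply on standard input.

```python
from itertools import groupby


def parse(lines):
    """Turn 'word<TAB>count' lines into (word, int) pairs, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            word, count = line.split("\t", 1)
            yield word, int(count)


def reduce_sorted(lines):
    """Sum counts per word; assumes the lines are already sorted by word."""
    for word, group in groupby(parse(lines), key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


if __name__ == "__main__":
    # Simulated mapper output; sorted() plays the role of Hadoop's sort step
    mapper_output = ["fox\t1", "the\t1", "dog\t1", "the\t1"]
    for word, total in reduce_sorted(sorted(mapper_output)):
        print(f"{word}\t{total}")
```

In an actual streaming job the function would be called as reduce_sorted(sys.stdin). If the input is not sorted, groupby starts a new group every time the key changes, so the same word can be emitted more than once with partial totals.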
- Set Permissions for Executing
chmod +x Mapper.py
chmod +x Reducer.py
- Run the MapReduce Job
You can use the Hadoop streaming utility to run your MapReduce job:
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -files Mapper.py,Reducer.py \
    -mapper Mapper.py \
    -reducer Reducer.py \
    -input input_directory \
    -output output_directory
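Replace input_directory and output_directory with the paths to your input and output directories, which typically live in HDFS. The output directory must not already exist, or Hadoop will refuse to start the job.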
Please note that the code above is intended to serve as a general guide and might need adjustments depending on your specific environment and data.
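If you submit the job from a Python script rather than typing the command by hand, the same streaming invocation can be assembled as an argument list. The sketch below only builds and prints the command; it does not run Hadoop. The function name build_streaming_command and the /opt/hadoop fallback path are assumptions made for this example, while the flags are the ones used in the command above.

```python
import glob
import os
import shlex


def build_streaming_command(jar, mapper, reducer, input_dir, output_dir):
    """Return the argument list for a Hadoop streaming job (nothing is run)."""
    return [
        "hadoop", "jar", jar,
        # Generic options such as -files must come before the streaming options
        "-files", f"{mapper},{reducer}",
        "-mapper", os.path.basename(mapper),
        "-reducer", os.path.basename(reducer),
        "-input", input_dir,
        "-output", output_dir,
    ]


if __name__ == "__main__":
    # Assumed fallback location; adjust to your installation
    hadoop_home = os.environ.get("HADOOP_HOME", "/opt/hadoop")
    pattern = os.path.join(
        hadoop_home, "share/hadoop/tools/lib/hadoop-streaming-*.jar"
    )
    # Resolve the wildcard ourselves, since no shell will expand it for us
    matches = sorted(glob.glob(pattern))
    jar = matches[0] if matches else pattern
    cmd = build_streaming_command(
        jar, "Mapper.py", "Reducer.py", "input_directory", "output_directory"
    )
    print(shlex.join(cmd))
```

To actually launch the job you could pass the list to subprocess.run(cmd, check=True) on a machine where the hadoop command is available. Building a list instead of a single string avoids shell quoting problems with paths that contain spaces.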