Hive Map Reduce

Hive is a data warehouse system for Hadoop with a SQL-like query language. It provides a high-level interface for querying and managing large datasets stored in the Hadoop Distributed File System (HDFS). You write queries in Hive Query Language (HiveQL, often abbreviated HQL), and Hive hands them to an execution engine to run. MapReduce is the classic engine, although newer Hive releases can also use other engines such as Apache Tez. Here’s how Hive and MapReduce are related:

  1. Hive Query Execution:

    • When you run a HiveQL query with MapReduce as the execution engine, Hive’s query compiler typically translates it into one or more MapReduce jobs.
    • The MapReduce jobs generated by Hive consist of mappers and reducers that work on distributed data stored in HDFS.
  2. MapReduce Behind the Scenes:

    • Hive abstracts the complexity of writing low-level MapReduce code, allowing users to focus on SQL-like queries.
    • When you submit a query, Hive compiles it into an execution plan of MapReduce jobs, which are then run on the Hadoop cluster.
  3. Custom MapReduce in Hive:

    • While Hive generates the MapReduce jobs for most queries itself, you can also plug in custom logic: user-defined functions (UDFs) written in Java, or external mapper and reducer scripts invoked through the TRANSFORM clause.
    • This allows you to implement specific processing logic that cannot easily be expressed using regular SQL-like queries.
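
As a rough illustration of custom logic, Hive's TRANSFORM clause can stream rows through an external script: each row arrives on standard input as tab-separated columns, and the script writes tab-separated rows back on standard output. The sketch below is a minimal example under assumptions: the file name normalize_sales.py, the two-column layout (product_category, sales_amount), the cleanup rules, and the --stream flag are all hypothetical and chosen only for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical streaming script for use with Hive's TRANSFORM clause."""
import sys

NULL = r"\N"  # Hive streaming writes NULL column values as \N by default


def normalize_row(line):
    """Turn one tab-separated input row into one tab-separated output row."""
    # Assumes exactly two columns: product_category and sales_amount.
    category, amount = line.rstrip("\n").split("\t")
    # Clean up the category so 'Books ' and 'books' group together later.
    category = category.strip().lower()
    # Treat NULL or non-numeric amounts as 0.0 (an illustrative choice).
    try:
        value = 0.0 if amount == NULL else float(amount)
    except ValueError:
        value = 0.0
    return f"{category}\t{value:.2f}"


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read rows from stdin and write the normalized rows to stdout."""
    for line in stdin:
        if line.strip():  # skip blank lines
            stdout.write(normalize_row(line) + "\n")


# The --stream flag is a hypothetical guard: the script only reads stdin
# when asked to, so importing it or running it bare does nothing.
if __name__ == "__main__" and sys.argv[1:] == ["--stream"]:
    main()
```

In a Hive session, a script like this would usually be shipped to the cluster with ADD FILE and then called with something along the lines of SELECT TRANSFORM(product_category, sales_amount) USING 'python3 normalize_sales.py --stream' AS (product_category, sales_amount) FROM sales_data. Treat that statement as a sketch to adapt, not a tested recipe for your cluster.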

Here’s a basic example of how Hive and MapReduce work together:

Suppose you have a Hive table containing sales data and you want to find the total sales amount for each product category.

```sql
SELECT product_category, SUM(sales_amount)
FROM sales_data
GROUP BY product_category;
```

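Behind the scenes, with MapReduce as the execution engine, Hive compiles this query into a MapReduce job (a simple GROUP BY like this one typically needs only one) that performs the following steps: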

  • Map phase: Mappers read and process the data, emitting key-value pairs where the key is the product category and the value is the sales amount.
  • Shuffle and Sort phase: The framework sends all pairs with the same product category to the same reducer and sorts them by key.
  • Reduce phase: Reducers calculate the sum of sales amounts for each product category.
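
The three phases above can be sketched in a few lines of plain Python. This is a single-process toy model and not what Hive actually runs: the sample rows and all function names are made up for illustration, and on a real cluster the mappers and reducers run in parallel on different nodes.

```python
from collections import defaultdict

# Hypothetical rows from sales_data: (product_category, sales_amount).
ROWS = [
    ("books", 12.50),
    ("toys", 8.00),
    ("books", 7.50),
    ("garden", 20.00),
    ("toys", 2.00),
]


def map_phase(rows):
    """Emit one (key, value) pair per row: category -> sales amount."""
    for category, amount in rows:
        yield category, amount


def shuffle_and_sort(pairs):
    """Collect the values for each key, then order the groups by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return [(key, sum(values)) for key, values in grouped]


def total_sales_by_category(rows):
    """Chain the three phases, mirroring the GROUP BY query."""
    return reduce_phase(shuffle_and_sort(map_phase(rows)))


if __name__ == "__main__":
    for category, total in total_sales_by_category(ROWS):
        print(f"{category}\t{total:.2f}")
```

The toy model returns categories in sorted order because everything passes through one sorted list. Hive itself does not guarantee the order of GROUP BY results unless the query adds an ORDER BY.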
