MapReduce Examples with Python

In this part of the document, we will work with the movie rating dataset and use Python libraries to run a MapReduce job. Let's look at the dataset (u.data) first. It has 1725 observations of 4 columns (variables): the first is user_id, the second is movie_id, the third is rating, and the last is time.

Now, using Python code, we are going to count how many ratings fall under each rating value. The code runs as a MapReduce job inside the Hadoop environment and has two parts: map and reduce. In the sections below, I will explain the Python code for this MapReduce job.

Code for the map phase:

The key is the rating, and the value is always 1. So, for every record, the mapper generates a pair (rating, 1).
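A minimal mapper sketch, assuming the job runs through Hadoop Streaming, which feeds input lines to the script on standard input and reads tab-separated key/value pairs from standard output. The helper name map_line is hypothetical, and the u.data fields are assumed to be whitespace-separated in the order user_id, movie_id, rating, time.

```python
import sys


def map_line(line):
    """Turn one u.data record into a (rating, 1) pair.

    Assumes whitespace-separated fields: user_id, movie_id, rating, time.
    """
    _user_id, _movie_id, rating, _time = line.split()
    return rating, 1


def main():
    # Hadoop Streaming passes each input record to the mapper on stdin.
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rating, one = map_line(line)
        # Emit "key<TAB>value"; Hadoop groups and sorts by key before reducing.
        print(f"{rating}\t{one}")


if __name__ == "__main__":
    main()
```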

Code for the reduce phase:

In the reduce phase, the output is the sum of the 1’s for each rating: the reducer adds up all the 1’s for rating 1, then for rating 2, and so on.
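A matching reducer sketch under the same Hadoop Streaming assumption. Hadoop sorts the mapper output by key before the reduce phase, so all pairs for one rating arrive on consecutive lines, and itertools.groupby can sum each run. The function names parse_pair and reduce_sorted are hypothetical.

```python
import sys
from itertools import groupby


def parse_pair(line):
    # Each mapper output line looks like "rating<TAB>1".
    key, value = line.rstrip("\n").split("\t")
    return key, int(value)


def reduce_sorted(lines):
    """Yield (rating, count) by summing the 1's for each rating.

    Assumes the input is sorted by key, so equal ratings are adjacent.
    """
    pairs = (parse_pair(line) for line in lines if line.strip())
    for rating, group in groupby(pairs, key=lambda kv: kv[0]):
        yield rating, sum(value for _, value in group)


def main():
    # Hadoop Streaming passes the sorted mapper output on stdin.
    for rating, count in reduce_sorted(sys.stdin):
        print(f"{rating}\t{count}")


if __name__ == "__main__":
    main()
```

Before submitting to Hadoop, both scripts can be checked locally by piping u.data through the mapper, a sort, and then the reducer.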

The example uses a small chunk of the data to explain how the job works.
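To follow the whole flow on a small chunk of data without a cluster, the map, shuffle/sort, and reduce steps can be simulated in plain Python. This is a local sketch, not how Hadoop actually executes the job; the five records in SAMPLE are made-up values in the u.data layout, and run_local is a hypothetical name.

```python
from itertools import groupby

# Made-up records in the u.data layout: user_id, movie_id, rating, time.
SAMPLE = [
    "1\t10\t4\t1000",
    "2\t11\t5\t1001",
    "3\t10\t4\t1002",
    "1\t12\t3\t1003",
    "2\t10\t4\t1004",
]


def run_local(records):
    # Map phase: emit (rating, 1) for every record.
    mapped = [(line.split()[2], 1) for line in records]
    # Shuffle/sort phase: sort by key so equal ratings become adjacent.
    mapped.sort(key=lambda kv: kv[0])
    # Reduce phase: add up the 1's for each rating.
    return {
        rating: sum(value for _, value in group)
        for rating, group in groupby(mapped, key=lambda kv: kv[0])
    }


print(run_local(SAMPLE))  # {'3': 1, '4': 3, '5': 1}
```

The counts always add up to the number of input records, because each record contributes exactly one 1.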
