
R Programming Complete Tutorial

plyrmr: Data Manipulation with MapReduce

The plyrmr package lets R users perform common data manipulation operations, of the kind found in popular packages such as plyr and reshape2, on large data sets stored on Hadoop. Like rmr, it relies on Hadoop MapReduce to perform its tasks, but it provides a familiar plyr-like interface while hiding many of the MapReduce details. plyrmr provides:

Setting Up Environment:

Before installing the package, set up the Hadoop environment so that the R session can locate the Hadoop installation.
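
One way to do this from within R is sketched below. It assumes the `HADOOP_CMD` and `HADOOP_STREAMING` environment variables that rmr2 reads; both paths are placeholders, not values from this tutorial, and must be adjusted to the local Hadoop installation.

```r
# Placeholder paths: adjust both to match the local Hadoop installation.
hadoop_home <- "/usr/local/hadoop"   # assumed install location

Sys.setenv(
  # Path to the hadoop executable.
  HADOOP_CMD = file.path(hadoop_home, "bin", "hadoop"),
  # The streaming jar's name and location vary by Hadoop version (assumed here).
  HADOOP_STREAMING = file.path(hadoop_home, "share", "hadoop", "tools", "lib",
                               "hadoop-streaming.jar")
)

# Confirm the variables are visible to the current R session.
Sys.getenv(c("HADOOP_CMD", "HADOOP_STREAMING"))
```

Variables set with `Sys.setenv` last only for the current R session, so the call has to be repeated (or placed in a startup file) for each new session.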

Install plyrmr Package:

Release versions of plyrmr can be obtained from GitHub. Assuming an internet connection is available, download the required package and install it using R's Install Packages option.
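
The installation can also be done from the R console. The sketch below assumes a release archive has already been downloaded to the working directory; the file name `plyrmr_x.y.z.tar.gz` is a placeholder rather than an actual release name, and `install_local_pkg` is a hypothetical helper written for this illustration. Since plyrmr builds on rmr2, that package should be installed first.

```r
# Hypothetical helper: install a source package from a downloaded archive.
install_local_pkg <- function(archive) {
  if (!file.exists(archive)) {
    message("Archive not found: ", archive)
    return(invisible(FALSE))
  }
  # repos = NULL tells install.packages() to treat `archive` as a local file.
  install.packages(archive, repos = NULL, type = "source")
  invisible(TRUE)
}

# Placeholder file name: substitute the archive actually downloaded.
installed <- install_local_pkg("plyrmr_x.y.z.tar.gz")

# Load the package only if it is actually available afterwards.
if (installed && requireNamespace("plyrmr", quietly = TRUE)) {
  library(plyrmr)
}
```

The helper returns `FALSE` without attempting anything when the archive is missing, so a wrong file name produces a message rather than a failed installation.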