
R Programming Complete Tutorial

rmr2: MapReduce job in R

rmr2 is a package that allows R developers to perform statistical analysis in R using Hadoop MapReduce functionality on a Hadoop cluster. One of the essential prerequisites for installing this package is rJava: we can install rmr2 if rJava is already installed in R.

The rmr2 package is an excellent way to perform data analysis in the Hadoop ecosystem. Its advantages are its flexibility and its integration within the R environment. Its drawbacks are the need for a deep understanding of the MapReduce paradigm and the large amount of time needed to write code. I think it is beneficial to customize algorithms only after having first used some existing ones. For instance, the first stage of an analysis may consist of aggregating data through Hive and performing machine learning through Mahout. Afterward, rmr2 allows us to modify the algorithms in order to improve performance and better fit the problem. The goal of the rmr2 package is to give map-reduce programmers the easiest, most productive, and most elegant way to write map-reduce jobs.
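To give a feel for what a map-reduce job written with rmr2 looks like, here is a minimal sketch. It assumes rmr2 is already installed and uses the package's local backend, so that it can be tried without a running Hadoop cluster. The input data and variable names are illustrative only.

```r
library(rmr2)

# Assumption: run with the local backend so no Hadoop cluster is needed.
rmr.options(backend = "local")

# Write a small vector of integers to the (local) file system used by rmr2.
small_ints <- to.dfs(1:10)

# Map-only job: emit each integer as the key and its square as the value.
squares <- mapreduce(
  input = small_ints,
  map   = function(k, v) keyval(v, v^2)
)

# Read the result back into the R session and inspect it.
result <- from.dfs(squares)
print(keys(result))
print(values(result))
```

On a real cluster the same code should run after switching the backend back to "hadoop", provided the Hadoop environment variables are set.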

Setting Up Environment:

Before installing the package, we have to set up the environment for Hadoop and Java. This is typically done by setting environment variables such as HADOOP_CMD, HADOOP_STREAMING, and JAVA_HOME so that R can locate both installations.
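One way to set the Hadoop and Java environment is from inside the R session with Sys.setenv. The sketch below is only an illustration: every path in it is a hypothetical placeholder and must be replaced with the locations used on your own system. In particular, the Hadoop streaming jar normally carries a version number in its file name. rmr2 reads HADOOP_CMD and HADOOP_STREAMING; JAVA_HOME and HADOOP_HOME are included here on the assumption that rJava and Hadoop need them on your machine.

```r
# Hypothetical installation paths; replace them with the ones on your system.
Sys.setenv(
  JAVA_HOME        = "/usr/lib/jvm/default-java",
  HADOOP_HOME      = "/usr/local/hadoop",
  HADOOP_CMD       = "/usr/local/hadoop/bin/hadoop",
  HADOOP_STREAMING = "/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming.jar"
)

# Confirm that the variables are visible to the current R session.
print(Sys.getenv(c("JAVA_HOME", "HADOOP_CMD", "HADOOP_STREAMING")))
```

Variables set this way last only for the current R session. To make them permanent they can instead be placed in a shell profile or in an .Renviron file.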

Install rmr2 Package:

Release versions of rmr2 can be obtained from GitHub. Assuming an internet connection is available, download the required package and install it using R's Install Packages option.
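As an alternative to the Install Packages menu option, a downloaded archive can also be installed from the R console. The following is a minimal sketch under stated assumptions. The archive path and file name are hypothetical placeholders. The dependency list is the one commonly given for rmr2 and should be checked against the release you downloaded, since some of these packages may no longer be available from your CRAN mirror.

```r
# Packages commonly listed as rmr2 dependencies (assumption: verify this list
# against the documentation of the release you downloaded).
deps <- c("rJava", "Rcpp", "RJSONIO", "bitops", "digest",
          "functional", "reshape2", "stringr", "plyr", "caTools")
install.packages(deps)

# Hypothetical path to the downloaded rmr2 source archive.
rmr2_archive <- "~/Downloads/rmr2_x.y.z.tar.gz"

# repos = NULL tells install.packages() to install from the local file
# rather than from a repository.
install.packages(rmr2_archive, repos = NULL, type = "source")

# Loading the package is a quick check that the installation succeeded.
library(rmr2)
```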