
R Programming Complete Tutorial

rhdfs: Integrate R with HDFS

This R package provides basic connectivity to the Hadoop Distributed File System. R programmers can browse, read, write, and modify files stored in HDFS. The following functions are part of this package:

  • File Manipulations: hdfs.copy, hdfs.move, hdfs.rename, hdfs.delete, hdfs.rm, hdfs.del, hdfs.chown, hdfs.put, hdfs.get

  • File Read/Write: hdfs.file, hdfs.write, hdfs.close, hdfs.flush, hdfs.read, hdfs.seek, hdfs.tell, hdfs.line.reader, hdfs.read.text.file

  • Directory: hdfs.dircreate, hdfs.mkdir

  • Utility: hdfs.ls, hdfs.list.files, hdfs.file.info, hdfs.exists

  • Initialization: hdfs.init, hdfs.defaults
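
To make the function groups above more concrete, here is a minimal sketch of a write/read round trip using one or two functions from each group. It assumes a running Hadoop cluster, an installed rhdfs package, and the HADOOP_CMD environment variable already set. The function name hdfs_round_trip and the HDFS path /tmp/rhdfs_demo are hypothetical and chosen only for illustration.

```r
# Sketch only: needs a working Hadoop installation and the rhdfs package.
# hdfs_round_trip and the default path are hypothetical names for this example.
hdfs_round_trip <- function(hdfs_dir = "/tmp/rhdfs_demo") {
  library(rhdfs)
  hdfs.init()                                # Initialization

  if (!hdfs.exists(hdfs_dir)) {              # Utility
    hdfs.mkdir(hdfs_dir)                     # Directory
  }

  target <- paste(hdfs_dir, "cars.bin", sep = "/")

  # File Read/Write: hdfs.write serializes the R object into the open file
  con <- hdfs.file(target, "w")
  hdfs.write(head(mtcars), con)
  hdfs.close(con)

  print(hdfs.ls(hdfs_dir))                   # list what is now in the directory

  # Read the raw bytes back and turn them into an R object again
  con <- hdfs.file(target, "r")
  raw_bytes <- hdfs.read(con)
  hdfs.close(con)
  restored <- unserialize(raw_bytes)

  hdfs.delete(target)                        # File Manipulations: clean up
  restored
}
```

Calling hdfs_round_trip() on a configured machine should return the first rows of mtcars as read back from HDFS. Nothing runs until the function is called, so the definition itself is safe to source on a machine without Hadoop.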

Setting Up Environment:

Before installing the package, we have to set up the Hadoop environment. We can execute the following command to do so.
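
As a sketch of this step, the environment can be set from within an R session. rhdfs looks for the HADOOP_CMD environment variable when it loads. The path used here is an assumption and must be replaced with the actual location of the hadoop executable on your system.

```r
# Assumed location of the hadoop executable; adjust for your installation.
hadoop_cmd <- "/usr/local/hadoop/bin/hadoop"

# Make the location visible to rhdfs for the current R session.
Sys.setenv(HADOOP_CMD = hadoop_cmd)

# Confirm that the variable is set as intended.
Sys.getenv("HADOOP_CMD")
```

The setting lasts only for the current session. To make it permanent, the same variable can be defined in the shell profile or in an .Renviron file.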

Install all the required packages from CRAN:
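
A minimal sketch of this step follows. rhdfs depends on the rJava package; any further packages your setup needs are not listed in this section, so treat the package vector as an assumption to extend. The helper install_if_missing and the mirror URL are illustrative choices, not part of rhdfs.

```r
# Hypothetical helper: installs only the packages that are not yet present
# and invisibly returns the names of the packages it had to install.
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs, repos = repos)
  }
  invisible(missing_pkgs)
}
```

For example, install_if_missing("rJava") installs rJava from CRAN if it is not already available and does nothing otherwise.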

Download and Install rhdfs Package:

Release versions of rhdfs can be obtained from github.com. Assuming an internet connection is available, download the required package and install it using the Install Packages option in RStudio:

Download Link:

Ubuntu Download Folder:

Installation from RStudio:
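
As an alternative to the graphical installer, a downloaded source archive can also be installed from the R console. The sketch below assumes the archive was saved to the Downloads folder; the file name and path are hypothetical and should match whatever release you actually downloaded.

```r
# Hypothetical path to the downloaded archive; adjust name and version.
pkg_file <- "~/Downloads/rhdfs_1.0.8.tar.gz"

if (file.exists(pkg_file)) {
  # repos = NULL tells R to install from the local file, not from CRAN.
  install.packages(pkg_file, repos = NULL, type = "source")
} else {
  message("Archive not found: ", pkg_file)
}
```

The HADOOP_CMD environment variable and the rJava package should be in place before running this, since rhdfs relies on both.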