Hadoop Tutorial

YARN

Yet Another Resource Negotiator (YARN) is the resource management layer of Hadoop. YARN's fundamental principle is to split resource management and job scheduling/monitoring into separate daemons. The framework has two kinds of daemons: a single global Resource Manager and a Node Manager on each node. The Resource Manager arbitrates resources such as CPU, memory, disk, and network among all of the system's competing applications, where an application can be either a single job or a DAG of jobs. The Node Manager monitors the resource usage of the containers on its node and reports it to the Resource Manager. In addition, each application has an Application Master, which negotiates resources with the Resource Manager and works with the Node Managers to execute and monitor the job.
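The division of roles described above can be sketched as a toy simulation. This is not the YARN API: the class names simply mirror the daemons, and the first-fit placement rule, node names, and container sizes are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeManager:
    """Toy stand-in for the per-node daemon that tracks container usage."""
    name: str
    memory_mb: int
    vcores: int
    used_memory_mb: int = 0
    used_vcores: int = 0

    def can_fit(self, memory_mb, vcores):
        # True if the node still has room for a container of this size.
        return (self.memory_mb - self.used_memory_mb >= memory_mb
                and self.vcores - self.used_vcores >= vcores)

    def launch_container(self, memory_mb, vcores):
        self.used_memory_mb += memory_mb
        self.used_vcores += vcores

    def report(self):
        # Usage figures the node would report to the Resource Manager.
        return {"node": self.name,
                "used_memory_mb": self.used_memory_mb,
                "used_vcores": self.used_vcores}


class ResourceManager:
    """Toy global arbiter: grants containers on the first node with room."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def allocate(self, app_id, memory_mb, vcores):
        # First-fit placement is an assumption made for this sketch.
        for node in self.nodes:
            if node.can_fit(memory_mb, vcores):
                node.launch_container(memory_mb, vcores)
                return (app_id, node.name, memory_mb, vcores)
        return None  # the cluster has no room for this request


class ApplicationMaster:
    """Toy per-application negotiator that asks the Resource Manager for containers."""

    def __init__(self, app_id, resource_manager):
        self.app_id = app_id
        self.resource_manager = resource_manager

    def request_containers(self, count, memory_mb, vcores):
        granted = []
        for _ in range(count):
            container = self.resource_manager.allocate(
                self.app_id, memory_mb, vcores)
            if container is None:
                break  # stop once the cluster is full
            granted.append(container)
        return granted


# Two applications compete for a small, hypothetical two-node cluster.
nodes = [NodeManager("node-1", 4096, 4), NodeManager("node-2", 2048, 2)]
rm = ResourceManager(nodes)
granted_a = ApplicationMaster("app-A", rm).request_containers(3, 1024, 1)
granted_b = ApplicationMaster("app-B", rm).request_containers(4, 1024, 1)
print(len(granted_a), len(granted_b))
print([node.report() for node in nodes])
```

In this run the second application asks for four containers but receives only three, because the cluster's memory is exhausted by then. That is the arbitration role the Resource Manager plays among competing applications.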

YARN's implementation significantly expanded the potential uses of Hadoop. Hadoop's original incarnation combined the Hadoop Distributed File System (HDFS) closely with the batch-oriented MapReduce programming system and processing engine, which also served as a resource manager and task scheduler for the big data platform. As a result, only MapReduce applications could be run by Hadoop 1.0 systems, a restriction eliminated by Hadoop YARN.

YARN was informally called NextGen MapReduce or MapReduce 2 before it received its official name. It introduced an approach that decoupled cluster resource management and scheduling from the data processing component of MapReduce, allowing Hadoop to support more varied processing styles and a broader range of applications.

For example, Hadoop clusters can now run interactive querying, streaming data and real-time analytics applications on Apache Spark and other processing engines simultaneously with MapReduce batch jobs.

YARN also allows various data processing engines, such as interactive, graph, stream, and batch processing engines, to process the data stored in HDFS, which makes the system much more efficient. Through its components, it dynamically allocates resources and schedules the processing of each application. For large-volume data processing, the available resources must be managed properly so that each application can use them.
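One way to picture sharing resources so that each application gets a usable portion is a max-min fair split. The sketch below is a simplified illustration only, not how YARN's schedulers are implemented. The function name, the application names, and the memory figures are all hypothetical.

```python
def max_min_fair_share(capacity, demands):
    """Split `capacity` across applications without exceeding any demand.

    Applications that ask for less than an equal share get their full
    demand; what they leave unused is divided among the rest.
    """
    allocation = {app: 0 for app in demands}
    remaining = capacity
    unsatisfied = dict(demands)

    while unsatisfied and remaining > 0:
        equal_share = remaining / len(unsatisfied)
        # Applications whose demand fits inside the equal share.
        satisfied = {app: need for app, need in unsatisfied.items()
                     if need <= equal_share}
        if not satisfied:
            # Everyone wants more than the equal share: split evenly.
            for app in unsatisfied:
                allocation[app] = equal_share
            break
        for app, need in satisfied.items():
            allocation[app] = need
            remaining -= need
            del unsatisfied[app]

    return allocation


# Hypothetical cluster with 12 GB of memory and three competing workloads.
shares = max_min_fair_share(12, {"batch": 8, "interactive": 2, "streaming": 4})
print(shares)
```

Here the interactive and streaming workloads receive everything they asked for (2 GB and 4 GB), and the batch job receives the remaining 6 GB instead of the 8 GB it requested.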