
Hadoop Tutorial

Read Operations in HDFS

In this section, I will explain the HDFS read operation step by step.

Fig: HDFS Read Operation

1. An HDFS client initiates a read request by calling the 'open()' method of a FileSystem object, which is of type DistributedFileSystem.

2. This object communicates with the NameNode using RPC and gets metadata information such as the locations of the blocks of the file.

3. In reply to this metadata request, the NameNode returns, for each block, the addresses of the DataNodes that hold a copy of that block.

4. Once the DataNode addresses are received, an object of type FSDataInputStream is returned to the client. FSDataInputStream wraps a DFSInputStream, which manages the DataNode and NameNode interactions. In the 4th step of the diagram, the client invokes the 'read()' method, and DFSInputStream establishes a connection with the first DataNode for the first block of the file.

5. Data is read as a stream: the client repeatedly invokes the 'read()' method. This continues until the end of the block is reached.

6. After reaching the end of a block, DFSInputStream closes the connection to that DataNode and locates the next DataNode for the next block.

7. Once the client has finished reading, it calls the 'close()' method.
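
The steps above can be sketched as a small in-memory model. This is only an illustration of the control flow, not the Hadoop implementation: it does not use the Hadoop libraries, and every class name in it (ToyDataNode, ToyNameNode, ToyInputStream, HdfsReadSketch) is hypothetical. ToyInputStream plays the role that DFSInputStream plays in the description, and picking the first DataNode in the returned list is a simplifying assumption.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a DataNode: stores block bytes keyed by block id. */
class ToyDataNode {
    final String address;
    final Map<Integer, byte[]> blocks = new HashMap<>();
    ToyDataNode(String address) { this.address = address; }
}

/** Hypothetical stand-in for the NameNode: holds metadata only, never file data. */
class ToyNameNode {
    final Map<String, List<Integer>> fileBlocks = new HashMap<>();          // path -> ordered block ids
    final Map<Integer, List<ToyDataNode>> blockLocations = new HashMap<>(); // block id -> replicas

    List<Integer> getBlockIds(String path) {
        List<Integer> ids = fileBlocks.get(path);
        if (ids == null) throw new IllegalArgumentException("no such file: " + path);
        return ids;
    }
    List<ToyDataNode> getLocations(int blockId) { return blockLocations.get(blockId); }
}

/** Hypothetical stand-in for DFSInputStream: reads one block at a time. */
class ToyInputStream {
    private final ToyNameNode nameNode;
    private final List<Integer> blockIds;
    private int blockIndex = 0;
    private int offsetInBlock = 0;
    private byte[] currentBlock;                       // null means no open "connection"
    private boolean closed = false;
    final List<String> contacted = new ArrayList<>();  // DataNodes contacted, in order

    ToyInputStream(ToyNameNode nameNode, String path) {
        this.nameNode = nameNode;
        this.blockIds = nameNode.getBlockIds(path);    // steps 2-3: metadata lookup
    }

    /** Returns the next byte (0-255), or -1 at end of file. */
    int read() {
        if (closed) throw new IllegalStateException("stream is closed");
        while (true) {
            if (currentBlock == null) {
                if (blockIndex >= blockIds.size()) return -1;       // no blocks left
                int id = blockIds.get(blockIndex);
                ToyDataNode dn = nameNode.getLocations(id).get(0);  // step 4: first DataNode
                contacted.add(dn.address);
                currentBlock = dn.blocks.get(id);
                offsetInBlock = 0;
            }
            if (offsetInBlock < currentBlock.length) {
                return currentBlock[offsetInBlock++] & 0xFF;        // step 5: stream the block
            }
            currentBlock = null;                                    // step 6: end of block,
            blockIndex++;                                           // move on to the next one
        }
    }

    void close() { closed = true; currentBlock = null; }            // step 7
}

class HdfsReadSketch {
    /** Builds a two-DataNode toy cluster holding one two-block file, /demo.txt. */
    static ToyNameNode demoCluster() {
        ToyDataNode dn1 = new ToyDataNode("dn1");
        ToyDataNode dn2 = new ToyDataNode("dn2");
        byte[] b0 = "Hello, ".getBytes(StandardCharsets.UTF_8);
        byte[] b1 = "HDFS!".getBytes(StandardCharsets.UTF_8);
        dn1.blocks.put(0, b0); dn2.blocks.put(0, b0);
        dn1.blocks.put(1, b1); dn2.blocks.put(1, b1);
        ToyNameNode nn = new ToyNameNode();
        nn.fileBlocks.put("/demo.txt", Arrays.asList(0, 1));
        nn.blockLocations.put(0, Arrays.asList(dn1, dn2));
        nn.blockLocations.put(1, Arrays.asList(dn2, dn1));
        return nn;
    }

    /** Opens the file, reads until end of file, and always closes the stream. */
    static String readAll(ToyNameNode nn, String path) {
        ToyInputStream in = new ToyInputStream(nn, path);           // step 1: open()
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            int b;
            while ((b = in.read()) != -1) out.write(b);
        } finally {
            in.close();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(readAll(demoCluster(), "/demo.txt"));    // prints "Hello, HDFS!"
    }
}
```

In this toy cluster the first block is served by dn1 and the second by dn2, so a full read contacts each DataNode once. With the real Hadoop client, the same flow is driven through FileSystem.open(), which returns an FSDataInputStream, followed by repeated read() calls and a final close().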