
Hadoop Tutorial

Hadoop Distributed File System (HDFS)

HDFS is a distributed file system designed to run on commodity hardware. It is built on the principle of storing a small number of large files rather than a large number of small files, which makes it well suited to large-scale datasets. It has many similarities with existing distributed file systems, but the differences are significant: HDFS is highly fault-tolerant, is designed to run on low-cost hardware, and provides high-throughput access to application data. Data replication is what makes this possible: HDFS keeps redundant copies of each block on different machines, so data remains available even when hardware fails.
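As a minimal sketch of how applications interact with HDFS, the Java snippet below uses Hadoop's FileSystem API to write a small file and adjust its replication factor. The NameNode address and the file path are hypothetical, and the code assumes a configured cluster (core-site.xml/hdfs-site.xml on the classpath):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        // Load cluster settings; fs.defaultFS is assumed to point at the
        // NameNode, e.g. hdfs://namenode:9000 (hypothetical address).
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical path for illustration.
        Path file = new Path("/user/demo/hello.txt");

        // Write a small file; HDFS replicates its blocks across DataNodes
        // (3 copies by default, controlled by dfs.replication).
        try (FSDataOutputStream out = fs.create(file)) {
            out.writeUTF("Hello, HDFS!");
        }

        // The replication factor can also be changed per file after creation.
        fs.setReplication(file, (short) 2);

        fs.close();
    }
}
```

Because replication is handled by the file system itself, the application only writes the file once; HDFS takes care of placing the extra copies and re-replicating blocks if a DataNode fails.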