Hadoop Tutorial

Why Apache Hive?

Hive was developed by Facebook, and it is one of the technologies used to address Facebook's data requirements. Before implementing Hive, Facebook faced a number of challenges: the volume of data being generated grew explosively, making it very difficult to handle, and traditional relational databases could not cope with the load. Looking for a better option, Facebook first tried the MapReduce framework directly, but its programming complexity and the mandatory Java knowledge made it an impractical approach. After a long effort, Apache Hive allowed them to overcome these challenges. They now perform the following jobs using Hive:

Apache Hive tables are defined directly in HDFS, and Hive offers:

1. Schema flexibility and evolution

2. Tables can be partitioned and bucketed

3. Apache Hive saves developers from writing complex Hadoop MapReduce jobs for ad-hoc requirements

4. JDBC/ODBC drivers are available

Hence, Hive provides analysis, summarization, and querying of data. Hive is scalable and fast. Since Apache Hive offers a SQL-like language, it is straightforward for SQL programmers to learn and write Hive queries for data processing.
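To make the partitioning and bucketing point concrete, here is a minimal HiveQL sketch; the table and column names (`page_views`, `user_id`, and so on) are hypothetical, not from any specific deployment:

```sql
-- Hypothetical table: partitioned by date, bucketed by user id.
-- Partitions prune whole directories at query time; buckets split
-- each partition's data into a fixed number of files by hash.
CREATE TABLE page_views (
  user_id  BIGINT,
  url      STRING,
  referrer STRING
)
PARTITIONED BY (view_date STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC;
```

Queries that filter on `view_date` then read only the matching partition directories rather than scanning the whole table.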

Hive eliminates MapReduce's complexity by offering an interface through which users can submit SQL queries. Business analysts can therefore work with Big Data using Hive and generate meaningful insights. It also provides access to files in various data stores such as HDFS and HBase. The most crucial feature of Apache Hive is that we do not have to learn Java in order to use it.
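As an illustration of the kind of SQL query an analyst might submit instead of writing a MapReduce job, here is a hedged sketch of a summarization query (it assumes the hypothetical `page_views` table described above; names and the date literal are illustrative only):

```sql
-- Top 10 URLs by view count for one day; Hive compiles this
-- declarative query into the underlying execution jobs
-- (MapReduce, Tez, or Spark, depending on the engine).
SELECT url, COUNT(*) AS views
FROM page_views
WHERE view_date = '2024-01-15'   -- illustrative partition value
GROUP BY url
ORDER BY views DESC
LIMIT 10;
```

The same aggregation hand-written as a MapReduce program would require custom mapper and reducer classes in Java, which is exactly the complexity Hive hides.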