i

Hadoop Tutorial

Apache Hive Definition

Apache Hive is an open-source data warehouse system built on top of Hadoop. It is used for querying and analyzing large datasets stored in Hadoop files. It processes structured and semi-structured data in Hadoop. Initially, we had to write complex Map-Reduce jobs, but now with the help of the Hive, we need to submit a SQL like queries. Hive is primarily aimed at users who are familiar with SQL. Hive uses a SQL like language called HiveQL (HQL). HiveQL converts SQL-like queries into MapReduce jobs automatically. The key thing to notice is that there is no need for Hive to know java. The Hive generally runs on a workstation and converts SQL query into a series of jobs to execute on a Hadoop cluster. Hive organizes data into Hive tables.  This provides a means for storing the data in HDFS in a structured way.