Hadoop Tutorial

Hive Installation

In this section, we go through the step-by-step installation of Hive.

Step 1: First, we have to check that Java and Hadoop are correctly installed. We can check the installed versions with the java -version and hadoop version commands.
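
The version check in Step 1 can be sketched as below. The helper name show_version is hypothetical; it is used here only so that a missing tool gives a readable message instead of a shell error.

```shell
# show_version is a hypothetical helper, not part of Java or Hadoop.
# It runs the given version command only if the tool is on the PATH.
show_version() {
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>&1          # java prints its version on stderr, so merge streams
  else
    echo "$1 is not installed or not on PATH"
  fi
}

show_version java -version
show_version hadoop version
```

If either line reports a missing tool, fix that installation before continuing with Hive.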

Step 2: Download Hive from the Apache website (this tutorial uses apache-hive-3.1.2-bin.tar.gz): https://hive.apache.org

After the download finishes, go to the Downloads directory and confirm that the .tar.gz file is there.

Step 3: Move the downloaded file to a suitable location. In my case, I move it to /usr/local with the mv command (sudo may be needed for that directory).
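
A minimal sketch of the move, assuming the browser saved the archive under ~/Downloads. The helper name move_archive is hypothetical; it calls sudo only when the target directory is not writable by the current user.

```shell
# move_archive is a hypothetical helper: move SRC into DEST_DIR,
# using sudo only when DEST_DIR is not writable by the current user.
move_archive() {
  src="$1"
  dest_dir="$2"
  if [ ! -f "$src" ]; then
    echo "archive not found: $src" >&2
    return 1
  fi
  if [ -w "$dest_dir" ]; then
    mv "$src" "$dest_dir/"
  else
    sudo mv "$src" "$dest_dir/"
  fi
}

# The Downloads path is an assumption; adjust it to your download location.
move_archive "$HOME/Downloads/apache-hive-3.1.2-bin.tar.gz" /usr/local ||
  echo "nothing moved; check the archive path"
```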

Step 4: In this step, we extract the Hive tar file. It is recommended to extract it in the same directory as the archive. Go to the /usr/local folder and extract the apache-hive-3.1.2-bin.tar.gz file with tar.
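
The extraction can be sketched as follows. The helper name extract_archive is hypothetical; it unpacks the archive into the directory that contains it, which matches the recommendation above.

```shell
# extract_archive is a hypothetical helper: unpack ARCHIVE into the
# directory that contains it, using sudo only when that directory is
# not writable by the current user.
extract_archive() {
  archive="$1"
  dest_dir=$(dirname "$archive")
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  if [ -w "$dest_dir" ]; then
    tar -xzf "$archive" -C "$dest_dir"
  else
    sudo tar -xzf "$archive" -C "$dest_dir"
  fi
}

extract_archive /usr/local/apache-hive-3.1.2-bin.tar.gz ||
  echo "nothing extracted; check the archive path"
```

On success this should leave a /usr/local/apache-hive-3.1.2-bin directory next to the archive.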

Step 5: Change the owner of the extracted directory to hduser and give it suitable permissions.
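
One way to do this is sketched below. The tutorial does not say which permissions to use, so the mode here (owner read/write, everyone else read-only) is an assumption, and set_owner_and_mode is a hypothetical helper name.

```shell
SUDO="${SUDO-sudo}"   # set SUDO= (empty) if you are already root

# set_owner_and_mode is a hypothetical helper. It hands DIR to OWNER and
# makes it writable by the owner and readable by everyone else.
# Capital X adds execute only to directories and already-executable files.
set_owner_and_mode() {
  dir="$1"
  owner="$2"
  if [ ! -d "$dir" ]; then
    echo "directory not found: $dir" >&2
    return 1
  fi
  $SUDO chown -R "$owner" "$dir" &&
    $SUDO chmod -R u+rwX,go+rX "$dir"
}

set_owner_and_mode /usr/local/apache-hive-3.1.2-bin hduser ||
  echo "ownership unchanged; check the directory and user"
```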

Step 6: Add HIVE_HOME to the .bashrc file to update the environment variables.
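
The lines to append to ~/.bashrc would look roughly like this. The path assumes the archive was extracted under /usr/local as in Step 4; change it if you installed Hive elsewhere.

```shell
# Lines to append to ~/.bashrc.
# The path assumes the archive was extracted under /usr/local (Step 4).
export HIVE_HOME=/usr/local/apache-hive-3.1.2-bin
export PATH="$PATH:$HIVE_HOME/bin"
```

Adding $HIVE_HOME/bin to PATH is what lets the later steps call hive and schematool without a full path.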

After saving and closing the .bashrc file, run source ~/.bashrc so that the changes take effect in the current terminal.

Step 7: Now check the Hive version with the hive --version command.

Step 8: Start all the Hadoop services with the start-all.sh command, then create the Hive directories in HDFS. The 'warehouse' directory is where Hive stores its tables and their data. Set read/write permissions on that directory.

Special note: if HDFS does not have a /tmp directory yet, create it and grant the same permissions on it.
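
The HDFS part of Step 8, including the /tmp directory from the note, can be sketched as below. It assumes Hadoop is already running (start-all.sh) and uses group write permission (g+w), which is what Hive's Getting Started guide applies to both directories. The helper name create_hive_dirs and the HDFS variable are hypothetical.

```shell
# HDFS holds the command prefix; override it (for example HDFS=echo)
# to print the operations instead of running them.
HDFS="${HDFS:-hdfs dfs}"

# create_hive_dirs is a hypothetical helper name.
create_hive_dirs() {
  for dir in /tmp /user/hive/warehouse; do
    $HDFS -mkdir -p "$dir"    # -p: no error if the directory already exists
    $HDFS -chmod g+w "$dir"   # allow group members to write
  done
}

if command -v hdfs >/dev/null 2>&1; then
  create_hive_dirs
else
  echo "hdfs not found on PATH; start Hadoop and check PATH first"
fi
```

Afterwards, hdfs dfs -ls /user/hive should list the warehouse directory.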

Step 9: In this step, we set the Hadoop path in hive-env.sh. The apache-hive-3.1.2-bin/conf directory contains a hive-env.sh.template file; I copied that file to create a new file named hive-env.sh.

Now open this file and set HADOOP_HOME in it.
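
Step 9 can also be done from the command line, as in the sketch below. The Hadoop location /usr/local/hadoop is an assumption (the tutorial does not state it), and setup_hive_env is a hypothetical helper name.

```shell
# setup_hive_env is a hypothetical helper.
# $1 is Hive's conf directory, $2 is the Hadoop installation directory.
setup_hive_env() {
  conf_dir="$1"
  hadoop_dir="$2"
  # Create hive-env.sh from the shipped template unless it already exists.
  if [ ! -f "$conf_dir/hive-env.sh" ]; then
    cp "$conf_dir/hive-env.sh.template" "$conf_dir/hive-env.sh" || return 1
  fi
  # Append the HADOOP_HOME line once, even if the helper is re-run.
  grep -q '^export HADOOP_HOME=' "$conf_dir/hive-env.sh" ||
    echo "export HADOOP_HOME=$hadoop_dir" >> "$conf_dir/hive-env.sh"
}

# /usr/local/hadoop is an assumed Hadoop location; use your own path.
setup_hive_env /usr/local/apache-hive-3.1.2-bin/conf /usr/local/hadoop ||
  echo "hive-env.sh not written; check the conf directory"
```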

Step 10: Now we edit hive-site.xml, which should be under the apache-hive-3.1.2-bin/conf directory. If the file is not there, do not worry; simply create a new one.

Fill in the hive-site.xml file with the following properties.

 

 

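<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/usr/local/apache-hive-3.1.2-bin/metastore_db;create=true</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide a database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value/>
    <description>The Thrift URI for the remote metastore, used by metastore client to connect to remote metastore.</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
    <description>This is the Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.PersistenceManagerFactoryClass</name>
    <value>org.datanucleus.api.jdo.JDOPersistenceManagerFactory</value>
    <description>class implementing the jdo persistence</description>
  </property>
</configuration>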

 

Step 11: By default, Hive uses the Derby database for its metastore. Initialize the Derby schema with the schematool -dbType derby -initSchema command.
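
A guarded version of the initialization is sketched below. It assumes $HIVE_HOME/bin is on the PATH (Step 6); the helper name init_metastore and the SCHEMATOOL variable are hypothetical.

```shell
# SCHEMATOOL holds the command; override it (for example SCHEMATOOL=echo)
# to print the arguments instead of running the tool.
SCHEMATOOL="${SCHEMATOOL:-schematool}"

# init_metastore is a hypothetical helper name.
# $1 is the database type, derby for this tutorial.
init_metastore() {
  $SCHEMATOOL -dbType "$1" -initSchema
}

if command -v "$SCHEMATOOL" >/dev/null 2>&1; then
  init_metastore derby
else
  echo "schematool not found; check that \$HIVE_HOME/bin is on PATH"
fi
```

Run the initialization only once; the metastore database is created at the databaseName path given in hive-site.xml.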

After successful execution, the tool prints an acknowledgement that the schema initialization has completed.

Step 12: Now we launch Hive by running the hive command.

Hive is now ready to execute queries.

Step 13: We can run some queries in Hive.

show databases;

create table student (id string, name string, dept string) row format delimited fields terminated by '\t' stored as textfile;

show tables;

Step 14: Browse the data files through the NameNode web UI at http://localhost:9870

Step 15: Exit Hive with the exit; command.