Characteristics of Big Data
Application of Big Data Processing
Introduction to BIG DATA
Where to get Big Data?
Types of Big Data
Storage layer - HDFS (Hadoop Distributed File System)
MapReduce
YARN
How Hadoop Works?
Hadoop Ecosystem
Hadoop Architecture
Hadoop Installation & Environment Setup
Setting Up A Single Node Hadoop Cluster
Ubuntu User Configuration
SSH Setup With Key Generation
Disable IPv6
Download and Install Hadoop 3.1.2
Working with Configuration Files
Start The Hadoop instances
Hadoop Distributed File System (HDFS)
HDFS Features and Goals
HDFS Architecture
Read Operations in HDFS
Write Operations In HDFS
HDFS Operations
YARN
YARN Features
YARN Architecture
Resource Manager
Node Manager
Application Master
Container
Application Workflow in Hadoop YARN
Hadoop MapReduce
How MapReduce Works?
MapReduce Examples with Python
Running The MapReduce Program & Storing The Data File To HDFS
Create A Python Script
Hadoop Environment Setup
Execute The Script
Apache Hive Definition
Why Apache Hive?
Features Of Apache Hive
Hive Architecture
Hive Metastore
Hive Query Language
SQL vs Hive
Hive Installation
Apache Pig Definition
MapReduce vs. Apache Pig vs. Hive
Apache Pig Architecture
Installation Process Of Apache Pig
Execute Apache Pig Script
Hadoop Ecosystem Components
NoSQL Data Management
Apache HBase
Apache Cassandra
MongoDB
Introduction To Kafka
The Architecture of Apache Flume
Apache Spark Ecosystem
In this section, we will go through the step-by-step installation process of Hive.
Step 1: First, check whether Java and Hadoop are correctly installed. We can check the installed Java and Hadoop versions with the commands below.
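The original command listing was lost; as a sketch, the checks might look like this (guarded with command -v so each check is skipped with a message rather than failing on a machine where the tool is missing):

```shell
# Print the installed Java and Hadoop versions.
# Guarded so the script degrades gracefully if either tool is absent.
if command -v java >/dev/null 2>&1; then
  java -version
else
  echo "java not found - install Java first"
fi
if command -v hadoop >/dev/null 2>&1; then
  hadoop version
else
  echo "hadoop not found - install Hadoop first"
fi
```

Both tools must report a version before continuing, since Hive runs on top of Hadoop and the JVM.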
Step 2: Download the latest version of Hive from the Apache website: https://hive.apache.org
After the download finishes, go to the Downloads directory and check for the .tar.gz file.
Step 3: Move the downloaded file to a suitable location. In my case, I will move it to /usr/local. Use the command below to move the file.
Step 4: In this step, we will extract the Hive tar file. It is recommended to extract the archive in the same location as the tarball. Now go to the /usr/local folder and extract the apache-hive-3.1.2-bin.tar.gz file.
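Steps 3 and 4 can be sketched as the commands below; the tarball name matches this chapter's Hive version, and the ~/Downloads location is an assumption, so adjust both to your machine:

```shell
# Move the Hive tarball to /usr/local and extract it there.
# Guarded: does nothing if the tarball is not in ~/Downloads.
TARBALL=apache-hive-3.1.2-bin.tar.gz
if [ -f "$HOME/Downloads/$TARBALL" ]; then
  sudo mv "$HOME/Downloads/$TARBALL" /usr/local/   # Step 3: move
  cd /usr/local
  sudo tar -xzf "$TARBALL"                         # Step 4: extract
fi
```

After extraction you should see the directory /usr/local/apache-hive-3.1.2-bin.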
Step 5: Change the owner of the directory to hduser and set the proper permissions on it.
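A minimal sketch of this step, assuming the hduser account and hadoop group created during the earlier Ubuntu user configuration:

```shell
# Hand the Hive directory over to hduser (the Hadoop user assumed
# throughout this setup) and make it group-writable.
HIVE_DIR=/usr/local/apache-hive-3.1.2-bin
if [ -d "$HIVE_DIR" ]; then
  sudo chown -R hduser:hadoop "$HIVE_DIR"
  sudo chmod -R g+w "$HIVE_DIR"
fi
```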
Step 6: Add HIVE_HOME to the “.bashrc” file to update the environment variables.
After saving and exiting the .bashrc file, execute the command source ~/.bashrc to make the changes take effect in the same terminal.
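The lines to append to ~/.bashrc would look like this, assuming the /usr/local install location used above:

```shell
# Hive environment variables for ~/.bashrc
export HIVE_HOME=/usr/local/apache-hive-3.1.2-bin
export PATH=$PATH:$HIVE_HOME/bin
```

Putting $HIVE_HOME/bin on the PATH is what lets you run hive and schematool from any directory.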
Step 7: Now check the Hive version using the hive --version command.
Step 8: Start all the Hadoop services using the start-all.sh command. Then create the Hive directories within HDFS. The 'warehouse' directory is where Hive stores its tables and related data. Set read/write permissions on that directory.
Special note: if you do not have a /tmp directory in HDFS, create it and set the same permissions on it.
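The HDFS commands for this step can be sketched as follows; the warehouse path matches the hive.metastore.warehouse.dir value configured later in hive-site.xml:

```shell
# Start the Hadoop daemons, then create the Hive warehouse and /tmp
# directories in HDFS and open them for group writes.
# Guarded: skipped entirely when hdfs is not on the PATH.
WAREHOUSE=/user/hive/warehouse
if command -v hdfs >/dev/null 2>&1; then
  start-all.sh
  hdfs dfs -mkdir -p /tmp "$WAREHOUSE"
  hdfs dfs -chmod g+w /tmp "$WAREHOUSE"
fi
```

The -p flag makes mkdir create the intermediate /user/hive directory as well, so the special note about a missing /tmp is covered by the same command.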
Step 9: In this step, we set the Hadoop path in hive-env.sh. There is a hive-env.sh.template file under the apache-hive-3.1.2-bin/conf directory. Copy that file to create a new file named hive-env.sh.
Now open this file and add HADOOP_HOME to it.
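As a sketch, the copy and edit can be done from the terminal; /usr/local/hadoop is an assumed Hadoop install path here, so substitute your own:

```shell
# Create hive-env.sh from the shipped template and point it at Hadoop.
CONF_DIR=/usr/local/apache-hive-3.1.2-bin/conf
if [ -d "$CONF_DIR" ]; then
  cp "$CONF_DIR/hive-env.sh.template" "$CONF_DIR/hive-env.sh"
  echo "export HADOOP_HOME=/usr/local/hadoop" >> "$CONF_DIR/hive-env.sh"
fi
```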
Step 10: Now we are going to edit hive-site.xml (if this file is not available, do not worry, create a new one), which should be under the apache-hive-3.1.2-bin/conf directory.
Populate the hive-site.xml file with the statements below.
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/usr/local/apache-hive-3.1.2-bin/metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore. To use SSL to encrypt/authenticate the connection, provide a database-specific SSL flag in the connection URL. For example, jdbc:postgresql://myhost/db?ssl=true for a PostgreSQL database.</description>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>Location of the default database for the warehouse.</description>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value/>
    <description>The Thrift URI for the remote metastore, used by the metastore client to connect to the remote metastore.</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
    <description>This is the driver class name for a JDBC metastore.</description>
  </property>
  <property>
    <name>javax.jdo.PersistenceManagerFactoryClass</name>
    <value>org.datanucleus.api.jdo.JDOPersistenceManagerFactory</value>
    <description>Class implementing the JDO persistence layer.</description>
  </property>
</configuration>
Step 11: By default, Hive uses the embedded Derby database for its metastore. Initialize the Derby schema using the command below.
After successful execution, it prints an acknowledgement that the schema initialization completed.
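The initialization command can be sketched as follows; schematool ships in $HIVE_HOME/bin, so it is on the PATH after the .bashrc change in Step 6:

```shell
# Initialize the embedded Derby metastore schema for Hive.
# Guarded: skipped when schematool is not on the PATH.
DBTYPE=derby
if command -v schematool >/dev/null 2>&1; then
  schematool -initSchema -dbType "$DBTYPE"
fi
```

Run this only once; re-running -initSchema against an already initialized metastore fails.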
Step 12: Now we launch Hive using just the hive command.
Hive is now ready to execute queries.
Step 13: We can run some queries in Hive.

show databases;
create table student (id string, name string, dept string)
  row format delimited
  fields terminated by '\t'
  stored as textfile;
show tables;
Step 14: Browse the data files from the NameNode web UI at http://localhost:9870
Step 15: Exit Hive with the exit; command.