Characteristics of Big Data
Application of Big Data Processing
Introduction to BIG DATA
Where to get Big Data?
Types of Big Data
Storage layer - HDFS (Hadoop Distributed File System)
MapReduce
YARN
How Hadoop works?
Hadoop Ecosystem
Hadoop Architecture
Hadoop Installation & Environment Setup
Setting Up A Single Node Hadoop Cluster
Ubuntu User Configuration
SSH Setup With Key Generation
Disable IPv6
Download and Install Hadoop 3.1.2
Working with Configuration Files
Start The Hadoop instances
Hadoop Distributed File System (HDFS)
HDFS Features and Goals
HDFS Architecture
Read Operations in HDFS
Write Operations In HDFS
HDFS Operations
YARN
YARN Features
YARN Architecture
Resource Manager
Node Manager
Application Master
Container
Application Workflow in Hadoop YARN
Hadoop MapReduce
How MapReduce Works?
MapReduce Examples with Python
Running The MapReduce Program & Storing The Data File To HDFS
Create A Python Script
Hadoop Environment Setup
Execute The Script
Apache Hive Definition
Why Apache Hive?
Features Of Apache Hive
Hive Architecture
Hive Metastore
Hive Query Language
SQL vs Hive
Hive Installation
Apache Pig Definition
MapReduce vs. Apache Pig vs. Hive
Apache Pig Architecture
Installation Process Of Apache Pig
Execute Apache Pig Script
Hadoop Eco Components
NoSQL Data Management
Apache HBase
Apache Cassandra
MongoDB
Introduction To Kafka
The Architecture of Apache Flume
Apache Spark Ecosystem
Big data is the field of study concerned with ways to analyze, and systematically extract information from, data sets that are too large or too complex to be handled by traditional data-processing application software.
Suppose we have data on individuals across India and want to analyze it by attributes such as state, age, and income. A study of this kind falls under big data analysis.
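As a rough illustration of the kind of question such a study asks, the sketch below groups a handful of records by state and computes the average income for each. The records, field names, and the `average_income_by_state` helper are hypothetical and exist only for this example. A real big data workload would involve far more rows than fit comfortably on one machine, which is why distributed tools are needed.

```python
from collections import defaultdict

# Hypothetical sample records; the names and values are illustrative only.
records = [
    {"state": "Kerala", "age": 34, "income": 52000},
    {"state": "Kerala", "age": 41, "income": 48000},
    {"state": "Punjab", "age": 29, "income": 61000},
]


def average_income_by_state(rows):
    """Group rows by state and return the mean income for each state."""
    totals = defaultdict(lambda: [0, 0])  # state -> [income sum, row count]
    for row in rows:
        totals[row["state"]][0] += row["income"]
        totals[row["state"]][1] += 1
    return {state: total / count for state, (total, count) in totals.items()}


result = average_income_by_state(records)
print(result)
```

The same group-then-aggregate pattern stays conceptually unchanged at scale; what changes is that the grouping and summing are spread across many machines.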
Big data is produced in volumes of many terabytes. It changes rapidly and comes in various forms that are hard to handle and process with an RDBMS or other traditional techniques. Big Data solutions provide the tools, methodologies, and techniques used to capture, store, search, and analyze information in seconds, in order to uncover insights and relationships that lead to innovation and competitive gain. Eighty per cent of the data generated today is unstructured, and traditional technologies cannot handle it. Earlier, the quantity of information produced was not that large, so data could simply be archived and retrieved whenever historical analysis was needed. Today, however, data is generated in petabytes, a volume at which it cannot practically be archived and then retrieved again whenever analysis requires it.
To understand what Big Data is, we first need to understand what data is. Data is the characters, quantities, or symbols on which a computer performs operations; it can be stored and transmitted as electrical signals and recorded on magnetic, optical, or mechanical recording media. Big Data, the buzzword, is also data, but of enormous size. The term describes a collection of information that is already enormous and keeps growing exponentially over time. In short, such data is so large and complex that none of the traditional data management tools can store or process it effectively.
Big data is a collection of huge data sets that cannot be processed using traditional computing methods. It is not a single technique or tool but a complete subject that involves various tools, techniques, and frameworks.
A good example is the data collected by large online platforms such as Facebook and Google. Google, for instance, can track our location, our activities, our images, and many other things. This data comes in many different types, and it is generated in massive amounts.