Hadoop Tutorial

Introduction to BIG DATA

Big data is the field that studies ways to analyze and systematically extract information from data sets that are too large or too complex to be handled by traditional data-processing application software.

Suppose we have data on individuals across India and want to analyze it by particular attributes such as state, age, and income. At the scale of a national population, this kind of study falls under big data analysis.
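The kind of analysis described above can be sketched on a tiny scale in Python. The records, field names, and the function `average_income_by_state` are made up for illustration and do not come from any real data set; this is only a sketch of grouping people by state and averaging their income.

```python
from collections import defaultdict

# Hypothetical sample records; a real data set would hold many millions of rows.
people = [
    {"state": "Kerala", "age": 34, "income": 52000},
    {"state": "Kerala", "age": 41, "income": 48000},
    {"state": "Punjab", "age": 29, "income": 61000},
    {"state": "Punjab", "age": 53, "income": 39000},
    {"state": "Goa", "age": 47, "income": 70000},
]


def average_income_by_state(records):
    """Return a dict mapping each state to the mean income of its records."""
    totals = defaultdict(lambda: [0, 0])  # state -> [income sum, record count]
    for row in records:
        totals[row["state"]][0] += row["income"]
        totals[row["state"]][1] += 1
    return {state: total / count for state, (total, count) in totals.items()}


averages = average_income_by_state(people)
for state, avg in sorted(averages.items()):
    print(f"{state}: {avg:.0f}")
```

On a single machine this works only while the records fit in memory. When the data grows to the volumes discussed in this section, the same grouping has to be spread across many machines, which is the problem big data frameworks are meant to address.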

Big data is produced in volumes of many terabytes. It changes rapidly and comes in various forms that are hard to handle and process using an RDBMS or other traditional techniques. Big data solutions provide the tools, methodologies, and techniques used to capture, store, search, and analyze information in seconds, in order to discover relationships and insights that lead to innovation and competitive advantage. By a commonly cited estimate, eighty percent of the data generated today is unstructured, and traditional technologies cannot handle it. Earlier, the quantity of information produced was not that large, so it could be archived and retrieved whenever historical analysis was needed. Today, however, information is generated at petabyte scale, and data of that volume cannot simply be archived and then retrieved again each time an analysis requires it.

To understand what big data is, we first need to understand what data is. Data is the characters, quantities, or symbols on which a computer performs operations, and which can be stored and transmitted as electrical signals and recorded on magnetic, optical, or mechanical recording media. Big data is also data, but of enormous size. The term describes a collection of data that is already enormous and keeps growing exponentially over time. In short, such data is so large and complex that none of the traditional data management tools can store or process it effectively.

Big data is a collection of huge data sets that cannot be processed using traditional computing methods. It is not a single technique or tool but a complete subject area involving various tools, techniques, and frameworks.

A good example is the data collected by social media and web platforms such as Facebook and Google. Google, for instance, can track our location, our activities, our images, our phone conversations, and much more. This data comes in many different types, and it is generated in massive amounts.