i

Hadoop Tutorial

Apache Pig Definition

Pig is a high-level programming language for analyzing large data sets, usually in a Hadoop environment. A pig was a development effort at Yahoo! In a MapReduce framework, where programs need to be translated into a series of MapReduce phases. However, this is not a direct programming model that data analysts are familiar with. So, to bridge this gap, an abstraction called Pig was built on top of Hadoop.

We generally integrate Pig with Hadoop. All the data manipulation operations are performed in Hadoop using Apache Pig. To develop data analysis programs, Pig provides a high-level language known as Pig Latin. This language offers various operators using which programmers can develop their functions for reading, writing, and processing data.

To analyze data, programmers need to write Pig scripts using the Pig Latin language. Pig Engine accepts the Pig Latin scripts as input and internally converts to Map and Reduce jobs.

Apache Pig enables us to focus more on analyzing huge data sets and to spend less writing time for Map-Reduce programs. The Pig language is designed to work upon any kind of data. That's why it is named after Pig!