Pig 0.15.0 Installation on Single Node Cluster Hadoop 2.7.0 on Ubuntu 14.10

11:31:00 PM 11:25:02 PM

Steps to Install Apache Pig

Download Latest Pig installation From http://mirrors.sonic.net/apache/pig/

$ wget -c http://mirrors.sonic.net/apache/pig/pig-0.15.0/pig-0.15.0.tar.gz

$ sudo tar -zxvf pig-0.15.0.tar.gz

$ sudo mv pig-0.15.0 /usr/local/pig

$ sudo gedit ~/.bashrc

export PIG_HOME=/usr/local/pig
export PIG_CONF_DIR=$PIG_HOME/conf
export PIG_CLASS_PATH=$PIG_CONF_DIR
export PATH=$PIG_HOME/bin:$PATH

$ source ~/.bashrc

$ pig

you will get grunt shell

Introduction

Apache Pig is a high-level scripting platform for processing large data sets in Hadoop. It simplifies complex operations using Pig Latin, which abstracts low-level MapReduce operations. In this guide, we will walk you through installing Pig 0.15.0 on a single-node Hadoop 2.7.0 cluster running Ubuntu 14.10.

Prerequisites

Before proceeding with Pig installation, ensure you have:

A machine running Ubuntu 14.10 (or a virtual machine)
Java installed (JDK 1.7 or later)
Hadoop 2.7.0 installed and configured as a single-node cluster

If you haven’t set up Hadoop yet, follow these steps before installing Pig.

Step 1: Install Java (If Not Installed)

Since Hadoop and Pig require Java, verify the installation with:

bash

java -version

If Java is not installed, install OpenJDK:

bash

sudo apt update
sudo apt install openjdk-7-jdk -y

After installation, confirm:

bash

java -version

Ensure Java is properly set in the environment variables:

bash

echo "export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))" >> ~/.bashrc
source ~/.bashrc

Step 2: Install Hadoop 2.7.0 (If Not Installed)

If you haven’t set up Hadoop, follow these steps:

Download Hadoop 2.7.0

bash

cd /usr/local
sudo wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.0/hadoop-2.7.0.tar.gz

Extract the Archive

bash

sudo tar -xzf hadoop-2.7.0.tar.gz
sudo mv hadoop-2.7.0 hadoop

Configure Environment Variables

Edit ~/.bashrc and add:

bash

export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))

Apply changes:

bash

source ~/.bashrc

Verify Hadoop Installation

bash

hadoop version

If successful, you’re ready to proceed with Pig installation.

Step 3: Download and Install Apache Pig 0.15.0

Download Pig 0.15.0

bash

cd /usr/local
sudo wget https://archive.apache.org/dist/pig/pig-0.15.0/pig-0.15.0.tar.gz

Extract the Archive

bash

sudo tar -xzf pig-0.15.0.tar.gz
sudo mv pig-0.15.0 pig

Set Up Environment Variables

Edit ~/.bashrc and add:

bash

export PIG_HOME=/usr/local/pig
export PATH=$PIG_HOME/bin:$PATH
export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop

Apply changes:

bash

source ~/.bashrc

Step 4: Verify Pig Installation

To confirm Pig is installed properly, check the version:

bash

pig -version

To test Pig in interactive mode:

bash

pig -x local

If you see the grunt> prompt, Pig is working correctly. Exit by typing:

bash

quit;

Step 5: Run a Simple Pig Script

Create a Sample Input File

bash

echo -e "1\tAlice\n2\tBob\n3\tCharlie" > input.txt

Write a Pig Script

Create a script file named script.pig:

bash

nano script.pig

Add the following Pig Latin commands:

pig

data = LOAD 'input.txt' USING PigStorage('\t') AS (id:int, name:chararray);
DUMP data;

Save and exit (CTRL+X, Y, Enter).

Run the Pig Script

bash

pig -x local script.pig

You should see the output:

scss

(1, Alice)  
(2, Bob)  
(3, Charlie)

This confirms Pig is correctly installed and running.

Step 6: Running Pig in MapReduce Mode

To run Pig on Hadoop, use:

bash

pig -x mapreduce

This ensures Pig utilizes the Hadoop framework instead of local mode. Ensure Hadoop is running before executing any script.

Conclusion

You have successfully installed Apache Pig 0.15.0 on a single-node Hadoop 2.7.0 cluster on Ubuntu 14.10. Now you can process large datasets using Pig Latin, making complex data transformations easier. Experiment with different Pig commands and explore its powerful features for big data processing.

If you encounter any issues, ensure Java, Hadoop, and Pig are properly configured and paths are correctly set. Happy coding!

Pig installation. Single node Cluster setup. Pig Configuration.