Pig 0.15.0 Installation on Single Node Cluster Hadoop 2.7.0 on Ubuntu 14.10
Download the latest Pig release from http://mirrors.sonic.net/apache/pig/
$ wget -c http://mirrors.sonic.net/apache/pig/pig-0.15.0/pig-0.15.0.tar.gz
$ sudo tar -zxvf pig-0.15.0.tar.gz
$ sudo mv pig-0.15.0 /usr/local/pig
$ gedit ~/.bashrc
export PIG_HOME=/usr/local/pig
export PIG_CONF_DIR=$PIG_HOME/conf
export PIG_CLASSPATH=$PIG_CONF_DIR
export PATH=$PIG_HOME/bin:$PATH
$ source ~/.bashrc
$ pig
You should now see the grunt> prompt of the Grunt shell.
Introduction
Apache Pig is a high-level scripting platform for processing large data sets in Hadoop. It simplifies complex operations using Pig Latin, which abstracts low-level MapReduce operations. In this guide, we will walk you through installing Pig 0.15.0 on a single-node Hadoop 2.7.0 cluster running Ubuntu 14.10.
Prerequisites
Before proceeding with Pig installation, ensure you have:
- A machine running Ubuntu 14.10 (or a virtual machine)
- Java installed (JDK 1.7 or later)
- Hadoop 2.7.0 installed and configured as a single-node cluster
If you haven’t set up Java or Hadoop yet, complete Steps 1 and 2 below before installing Pig.
Step 1: Install Java (If Not Installed)
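Since Hadoop and Pig both require Java, first check whether it is already installed by running java -version.
If Java is not installed, install OpenJDK from the Ubuntu repositories (for example, the openjdk-7-jdk package).
After installation, run java -version again to confirm.
Ensure JAVA_HOME is set in your environment (for example in ~/.bashrc) and points to the JDK directory.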
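The checks in this step could look roughly like the sketch below. The package name openjdk-7-jdk and the JAVA_HOME path are assumptions based on a default 64-bit OpenJDK 7 install on Ubuntu; adjust both to what your system actually provides.

```shell
# Check whether Java is already available on the PATH.
if command -v java >/dev/null 2>&1; then
    java -version
else
    echo "Java not found. Install it with:"
    echo "  sudo apt-get update && sudo apt-get install -y openjdk-7-jdk"
fi

# Assumed location of OpenJDK 7 on 64-bit Ubuntu; verify with: ls /usr/lib/jvm
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
```

To make the two export lines permanent, add them to ~/.bashrc and run source ~/.bashrc.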
Step 2: Install Hadoop 2.7.0 (If Not Installed)
If you haven’t set up Hadoop, follow these steps:
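- Download Hadoop 2.7.0: fetch the hadoop-2.7.0.tar.gz release archive from an Apache mirror or the Apache archive.
- Extract the Archive: unpack it with tar and move the resulting directory to a permanent location such as /usr/local/hadoop.
- Configure Environment Variables: edit ~/.bashrc, set HADOOP_HOME to the install directory, add its bin and sbin directories to PATH, then apply the changes with source ~/.bashrc.
- Verify Hadoop Installation: run hadoop version.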
If successful, you’re ready to proceed with Pig installation.
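As a rough sketch of the Hadoop steps above: the download URL (the Apache archive) and the /usr/local/hadoop install directory are assumptions, and install_hadoop is a hypothetical helper name. Configuring the single-node cluster itself (core-site.xml and related files) is not covered here.

```shell
# Assumed download location and version; adjust to your setup.
HADOOP_VERSION=2.7.0
HADOOP_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"

# Download, extract, and move Hadoop into place (call this function once).
install_hadoop() {
    wget -c "$HADOOP_URL" &&
    tar -zxf "hadoop-${HADOOP_VERSION}.tar.gz" &&
    sudo mv "hadoop-${HADOOP_VERSION}" /usr/local/hadoop
}

# Lines to add to ~/.bashrc so the hadoop commands are on the PATH.
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Verify the installation once Hadoop is in place.
if command -v hadoop >/dev/null 2>&1; then
    hadoop version
else
    echo "hadoop not found on PATH; run install_hadoop first"
fi
```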
Step 3: Download and Install Apache Pig 0.15.0
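- Download Pig 0.15.0: fetch pig-0.15.0.tar.gz from http://mirrors.sonic.net/apache/pig/pig-0.15.0/ with wget.
- Extract the Archive: unpack it with tar -zxvf pig-0.15.0.tar.gz and move the pig-0.15.0 directory to /usr/local/pig.
- Set Up Environment Variables: edit ~/.bashrc and add the Pig export lines shown at the top of this post, then apply the changes with source ~/.bashrc.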
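Put together, the Pig steps might look like the sketch below. The mirror URL and the /usr/local/pig location come from the quick-reference commands at the top of this post; install_pig is a hypothetical helper name. The classpath variable is spelled PIG_CLASSPATH here, which is the name Pig's launcher script reads.

```shell
PIG_VERSION=0.15.0
# Mirror URL from the quick-reference commands; it may have moved since.
PIG_URL="http://mirrors.sonic.net/apache/pig/pig-${PIG_VERSION}/pig-${PIG_VERSION}.tar.gz"

# Download, extract, and move Pig into place (call this function once).
install_pig() {
    wget -c "$PIG_URL" &&
    tar -zxf "pig-${PIG_VERSION}.tar.gz" &&
    sudo mv "pig-${PIG_VERSION}" /usr/local/pig
}

# Lines to add to ~/.bashrc.
export PIG_HOME=/usr/local/pig
export PIG_CONF_DIR=$PIG_HOME/conf
export PIG_CLASSPATH=$PIG_CONF_DIR
export PATH=$PIG_HOME/bin:$PATH
```

After adding the export lines to ~/.bashrc, run source ~/.bashrc (or open a new terminal) so they take effect.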
Step 4: Verify Pig Installation
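To confirm Pig is installed properly, check the version by running pig -version.
To test Pig in interactive mode, run pig to start the Grunt shell (add -x local to run without Hadoop).
If you see the grunt> prompt, Pig is working correctly. Exit by typing quit.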
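A small check along these lines can confirm that the pig command is reachable before you start the interactive shell. The helper name have is hypothetical and only tests whether a command is on the PATH.

```shell
# Succeeds only if the named command is on the PATH (helper name is hypothetical).
have() { command -v "$1" >/dev/null 2>&1; }

if have pig; then
    pig -version    # prints the installed Pig version
    echo "Run 'pig -x local' for an interactive shell; type 'quit' at grunt> to exit."
else
    echo "pig not found on PATH; check PIG_HOME and run: source ~/.bashrc"
fi
```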
Step 5: Run a Simple Pig Script
- Create a Sample Input File
- Write a Pig Script
Create a script file named script.pig in a text editor such as nano.
Add your Pig Latin commands, typically a LOAD statement followed by transformations and a final DUMP or STORE.
Save and exit (CTRL+X, Y, Enter).
- Run the Pig Script
If the script ends with a DUMP statement, the resulting tuples are printed to the console once the job finishes.
This confirms Pig is correctly installed and running.
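The whole of Step 5 can be sketched end to end as follows. The file name sample.txt, its contents, and the Pig Latin statements are made up for illustration and are not the original post's data; only the script name script.pig comes from the text above.

```shell
# Work in a scratch directory so nothing existing is overwritten.
WORKDIR=$(mktemp -d)
cd "$WORKDIR" || exit 1

# Hypothetical sample input: one comma-separated record per line (id, name, age).
cat > sample.txt <<'EOF'
1,Alice,30
2,Bob,25
3,Carol,35
EOF

# Pig Latin script: load the file, keep rows with age over 26, print them.
cat > script.pig <<'EOF'
people = LOAD 'sample.txt' USING PigStorage(',') AS (id:int, name:chararray, age:int);
adults = FILTER people BY age > 26;
DUMP adults;
EOF

# Run in local mode so the script reads sample.txt from the local file system.
if command -v pig >/dev/null 2>&1; then
    pig -x local script.pig
else
    echo "pig not found on PATH; skipping the run"
fi
```

With this data, the filter keeps the Alice and Carol records and drops Bob, so those two tuples are what DUMP should print after the job log.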
Step 6: Running Pig in MapReduce Mode
To run Pig on Hadoop, start it with the -x mapreduce option, for example pig -x mapreduce script.pig.
This ensures Pig utilizes the Hadoop framework instead of local mode. Ensure Hadoop is running before executing any script.
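One way to wrap this up is a small helper that refuses to run unless Hadoop looks alive. The function name run_pig_mapreduce is hypothetical, and the check assumes that a NameNode process in the jps listing means HDFS is up. In MapReduce mode relative paths in a script resolve against HDFS rather than the local disk, so the sketch optionally uploads an input file first; it assumes your HDFS home directory already exists.

```shell
# Run a Pig script on the cluster (function name is hypothetical).
# Usage: run_pig_mapreduce script.pig [input-file-to-upload]
run_pig_mapreduce() {
    script=$1
    input=$2
    # jps lists running JVMs; a NameNode entry suggests HDFS is up.
    if ! jps 2>/dev/null | grep -q NameNode; then
        echo "Hadoop does not appear to be running; try start-dfs.sh and start-yarn.sh" >&2
        return 1
    fi
    # Upload the input to the HDFS home directory, overwriting any old copy.
    if [ -n "$input" ]; then
        hdfs dfs -put -f "$input" . || return 1
    fi
    pig -x mapreduce "$script"
}
```

For example, run_pig_mapreduce script.pig sample.txt would upload a local sample.txt and then run script.pig against it on the cluster.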
Conclusion
You have successfully installed Apache Pig 0.15.0 on a single-node Hadoop 2.7.0 cluster on Ubuntu 14.10. Now you can process large datasets using Pig Latin, making complex data transformations easier. Experiment with different Pig commands and explore its powerful features for big data processing.
If you encounter any issues, ensure Java, Hadoop, and Pig are properly configured and paths are correctly set. Happy coding!