Apache Pig has two execution modes:
- Local Mode
- MapReduce Mode
Local Mode
- In local mode, Pig runs in a single JVM and accesses the local file system.
- This mode is suitable only for small data sets and for prototyping Pig scripts.
- All files are installed and run on your local host and local file system.
The command to start the Grunt shell in local mode is:
$ pig -x local
MapReduce Mode
- MapReduce mode is also known as Hadoop mode.
- It is the default mode.
- In this mode, Pig translates Pig Latin statements into MapReduce jobs and executes them on the cluster.
- It can run against a fully distributed Hadoop installation or against a pseudo-distributed (single-node) installation.
- Input and output data reside on HDFS.
The command for MapReduce mode is:
$ pig
or
$ pig -x mapreduce
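When Pig is launched from a Python script, choosing the mode comes down to whether `-x` is passed. The helper below is a hypothetical sketch (the name `pig_argv` is ours, not part of Pig): it only assembles the command lines shown above and does not run Pig, and it deliberately accepts just the two modes described here.

```python
import shlex

# Hypothetical helper: build the pig command line for a given execution mode.
# Only the "-x local" and "-x mapreduce" forms described above are accepted.
VALID_MODES = {"local", "mapreduce"}


def pig_argv(mode=None, script=None):
    """Return the argv list that starts Pig in the requested mode.

    mode=None relies on Pig's default (MapReduce) mode, i.e. plain `pig`.
    script=None starts the interactive Grunt shell; a path runs that script.
    """
    argv = ["pig"]
    if mode is not None:
        if mode not in VALID_MODES:
            raise ValueError(f"unknown Pig mode: {mode!r}")
        argv += ["-x", mode]
    if script is not None:
        argv.append(script)
    return argv


print(shlex.join(pig_argv("local")))                 # pig -x local
print(shlex.join(pig_argv()))                        # pig
print(shlex.join(pig_argv("mapreduce", "id.pig")))   # pig -x mapreduce id.pig
```

Passing the returned list to `subprocess.run` would start Pig, assuming the `pig` launcher is on the PATH.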
Ways to Execute a Pig Program
A Pig program can be executed in three ways, in both local mode and MapReduce mode:
- Interactive Mode
- Batch Mode
- Embedded Mode
Interactive Mode
To use Pig in interactive mode, work in the Grunt shell:
- Invoke the Grunt shell with the "pig" command.
- Enter Pig Latin statements and Pig commands at the grunt> prompt, for example:
grunt> A = load 'password' using PigStorage(':');
grunt> B = foreach A generate $0 as id;
grunt> dump B;
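To make the data flow of this session concrete, here is a plain-Python sketch of what the three statements compute. It does not use Pig; the function names are hypothetical, and it assumes `password` is a colon-delimited file in the /etc/passwd layout, where field $0 is the user name.

```python
# Plain-Python sketch of the Grunt session above; it does not call Pig.
# Assumption: 'password' is colon-delimited, like /etc/passwd.


def load_pigstorage(lines, delimiter=":"):
    """Mimic `load ... using PigStorage(':')`: one tuple of fields per line."""
    return [tuple(line.rstrip("\n").split(delimiter)) for line in lines]


def project_first_field(relation):
    """Mimic `foreach A generate $0 as id`: keep only field $0."""
    return [(fields[0],) for fields in relation]


def dump(relation):
    """Mimic `dump B`: print each tuple in parentheses."""
    for t in relation:
        print("(" + ",".join(t) + ")")


sample = [
    "root:x:0:0:root:/root:/bin/bash\n",
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n",
]
A = load_pigstorage(sample)
B = project_first_field(A)
dump(B)  # prints (root) then (daemon), one per line
```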
Batch Mode
In batch mode, Pig runs a script file of Pig Latin statements passed to the "pig" command (for example, $ pig -x local id.pig), in either local or MapReduce (Hadoop) mode. A sample script, id.pig:
/* id.pig */
A = load 'password' using PigStorage(':');  -- load the password file
B = foreach A generate $0 as id;  -- extract the user IDs
store B into 'id.out';  -- write the results to id.out
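The store step of id.pig can be sketched in plain Python as well. This is an illustration under assumptions, not Pig itself: `store_ids` is a hypothetical name, and because Pig usually writes a STORE location as a directory of part files, the sketch writes a single part file (named `part-00000` here purely for illustration) inside `id.out`.

```python
import os
import tempfile


def store_ids(password_lines, out_dir):
    """Sketch of id.pig: split colon-delimited lines, keep $0, store the result.

    Writes one hypothetical part file inside out_dir, loosely mirroring
    how Pig's STORE produces a directory rather than a single file.
    """
    os.makedirs(out_dir, exist_ok=True)
    part_path = os.path.join(out_dir, "part-00000")
    with open(part_path, "w") as out:
        for line in password_lines:
            user_id = line.rstrip("\n").split(":")[0]  # field $0
            out.write(user_id + "\n")  # one ID per line
    return part_path


with tempfile.TemporaryDirectory() as tmp:
    path = store_ids(
        ["root:x:0:0:root:/root:/bin/bash\n", "bin:x:2:2:bin:/bin:/usr/sbin/nologin\n"],
        os.path.join(tmp, "id.out"),
    )
    with open(path) as f:
        print(f.read(), end="")  # prints root and bin, one per line
```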
Embedded Mode
In embedded mode, Pig Latin statements are run from a host programming language such as Java or Python. Pig also lets us define our own functions, called UDFs (User Defined Functions), in languages such as Java and Python.
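A UDF written in Python can be sketched as follows, assuming Pig's Jython UDF support. The module name `udfs.py`, the function `normalize_id`, and the alias `myfuncs` are hypothetical. Inside Pig, the `@outputSchema` decorator is supplied by the Jython runtime; the fallback definition only exists so the file can be imported and tested with ordinary Python.

```python
# udfs.py: a hypothetical Jython UDF for Pig.
# Pig's Jython runtime supplies @outputSchema; the fallback below is only
# defined when that name is missing, e.g. when testing with plain Python.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorate(func):
            func.output_schema = schema  # record the schema for inspection
            return func
        return decorate


@outputSchema("id:chararray")
def normalize_id(user_id):
    """Trim whitespace and lower-case a user ID; pass nulls through unchanged."""
    if user_id is None:
        return None
    return user_id.strip().lower()
```

The function could then be registered and called from Pig Latin, for example:
REGISTER 'udfs.py' USING jython AS myfuncs;
A = load 'password' using PigStorage(':');
B = foreach A generate myfuncs.normalize_id($0) as id;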