Apache Pig Run Modes

Apache Pig has two main execution modes:

  • Local Mode
  • MapReduce Mode

Local Mode

  • In local mode, Pig runs in a single JVM and accesses the local file system.
  • This mode is suitable only for small data sets and for prototyping Pig scripts.
  • Everything is installed and run on your local host and local file system; no Hadoop cluster or HDFS is required.

The command to start the Grunt shell in local mode is:

$ pig -x local

MapReduce Mode

  • It is also known as Hadoop Mode.
  • It is the default mode.
  • In this mode, Pig translates Pig Latin statements into MapReduce jobs and executes them on the Hadoop cluster.
  • It can run against either a fully distributed or a pseudo-distributed Hadoop installation.
  • The input and output data reside on HDFS.

The command for MapReduce mode is:

$ pig

or

$ pig -x mapreduce

Ways to Execute a Pig Program

There are three ways to execute a Pig program, in either local mode or MapReduce mode:

  • Interactive Mode
  • Batch Mode
  • Embedded Mode

Interactive Mode

Interactive mode uses the Grunt shell:

  1. Start the Grunt shell with the "pig" command (pig -x local for local mode).
  2. Enter Pig Latin statements and Pig commands interactively at the grunt> prompt.

Example:

grunt> A = load 'password' using PigStorage(':');
grunt> B = foreach A generate $0 as id;
grunt> dump B;

Batch Mode

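In batch mode, we write Pig Latin statements into a script file and pass it to the "pig" command; the script can run in either local or MapReduce (Hadoop) mode. For example, if the script below is saved as id.pig, it can be run in local mode with:

$ pig -x local id.pig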

Example:

/* id.pig */
A = load 'password' using PigStorage(':');  -- load the password file
B = foreach A generate $0 as id;  -- extract the user IDs
store B into 'id.out';  -- write the results to the output directory id.out
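To make the semantics of the script above concrete, here is a plain-Python illustration (a sketch of what the statements compute, not how Pig executes them): PigStorage(':') splits each input line into fields on ':', and $0 refers to the first field. The function name extract_ids and the sample records are hypothetical.

```python
def extract_ids(lines, delimiter=":"):
    """Return the first field of each delimited line (like $0 in Pig)."""
    ids = []
    for line in lines:
        # Like PigStorage(':'): split each line into fields on the delimiter.
        fields = line.rstrip("\n").split(delimiter)
        # Like "generate $0 as id": keep only the first field.
        ids.append(fields[0])
    return ids


# Hypothetical /etc/passwd-style sample records.
sample = [
    "root:x:0:0:root:/root:/bin/bash",
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin",
]

print(extract_ids(sample))  # ['root', 'daemon']
```

In Pig itself, each result is a one-field tuple, so dump B prints lines such as (root), while store writes the IDs as part files inside the id.out directory.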

Embedded Mode

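In embedded mode, Pig Latin statements are embedded in and run from a host program, for example a Java program that uses the PigServer class, or a Python (Jython) script. Pig also lets us define our own functions, called UDFs (User Defined Functions), in programming languages such as Java and Python.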
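As an illustration of a Python UDF, here is a minimal sketch assuming Pig's Jython support (registering the file with REGISTER ... USING jython). The file name myudfs.py, the function extract_id, and the schema name id are hypothetical. The outputSchema decorator declares the UDF's return schema; as we understand it, Pig's Jython runtime makes it available to the script automatically, so the try/except fallback below exists only so the file can also be run and tested with plain Python.

```python
# myudfs.py: a hypothetical Python UDF for Pig (executed by Pig through Jython).

try:
    outputSchema  # provided by Pig when the file is registered USING jython
except NameError:
    # Fallback for running outside Pig: a stand-in decorator that only
    # records the declared schema on the function and returns it unchanged.
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema
            return func
        return decorator


@outputSchema("id:chararray")
def extract_id(line):
    """Return the first ':'-separated field of a line, or None for null input."""
    if line is None:
        return None  # Pig passes nulls through as None
    return line.split(":")[0]


# Usage from Pig Latin (sketch):
#   REGISTER 'myudfs.py' USING jython AS myudfs;
#   A = LOAD 'password' AS (line:chararray);
#   B = FOREACH A GENERATE myudfs.extract_id(line);
#   DUMP B;
```

Handling None explicitly matters because Pig fields can be null, and a UDF that assumes a string would fail on such records.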