Data Processing with Pig MCQ question and answer with solution

71.
Which of the following scripts finds scripts that generate more than three MapReduce jobs?

A. a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = group a by (j#'PIG_SCRIPT_ID', j#'USER', j#'JOBNAME');
c = for b generate group.$1, group.$2, COUNT(a);
d = filter c by $2 > 3;
dump d;

B. a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = display a by (j#'PIG_SCRIPT_ID', j#'USER', j#'JOBNAME');
c = foreach b generate group.$1, group.$2, COUNT(a);
d = filter c by $2 > 3;
dump d;

C. a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = group a by (j#'PIG_SCRIPT_ID', j#'USER', j#'JOBNAME');
c = foreach b generate group.$1, group.$2, COUNT(a);
d = filter c by $2 > 3;
dump d;

D. None of the mentioned

72.
Point out the correct statement.

A. LoadPredicatePushdown is the same as LoadMetadata.setPartitionFilter

B. getOutputFormat() is called by Pig to get the InputFormat used by the loader

C. Pig works with data from many sources

D. None of the mentioned

73.
Which of the following scripts finds the running time of each script (in seconds)?

A. a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name,
(Long) j#'SUBMIT_TIME' as start, (Long) j#'FINISH_TIME' as end;
c = group b by (id, user, script_name);
d = foreach c generate group.user, group.script_name, (MAX(b.end) - MIN(b.start)) / 1000;
dump d;

B. a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = for a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name,
(Long) j#'SUBMIT_TIME' as start, (Long) j#'FINISH_TIME' as end;
c = group b by (id, user, script_name);
d = for c generate group.user, group.script_name, (MAX(b.end) - MIN(b.start)) / 1000;
dump d;

C. a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'QUEUE_NAME' as queue;
c = group b by (id, user, queue) parallel 10;
d = foreach c generate group.user, group.queue, COUNT(b);
dump d;

D. All of the mentioned
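
The running-time calculation this question asks about can be sketched outside Pig. The Python below is only an illustrative analogue, not Pig code: the job records, the `running_time_seconds` helper, and the assumption that `SUBMIT_TIME` and `FINISH_TIME` are millisecond timestamps (suggested by the division by 1000) are all hypothetical. It groups jobs by script, then subtracts the earliest start from the latest finish before converting to seconds.

```python
from collections import defaultdict

# Hypothetical job records: (script_id, submit_time_ms, finish_time_ms).
jobs = [
    ("script-1", 1_000, 61_000),
    ("script-1", 30_000, 121_000),
    ("script-2", 5_000, 8_000),
]


def running_time_seconds(records):
    """Return {script_id: seconds from earliest submit to latest finish}."""
    starts = defaultdict(list)
    ends = defaultdict(list)
    for script_id, start_ms, end_ms in records:
        starts[script_id].append(start_ms)
        ends[script_id].append(end_ms)
    # Subtract first, then divide: (max end - min start) / 1000.
    return {sid: (max(ends[sid]) - min(starts[sid])) / 1000 for sid in starts}


result = running_time_seconds(jobs)
print(result)
```

Grouping before aggregating matters because one script may launch several MapReduce jobs, so its running time spans all of them.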

74.
Which of the following scripts determines the number of scripts run by each user and queue on a cluster?

A. a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate (Chararray) j#'STATUS' as status, j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, j#'JOBID' as job;
c = filter b by status != 'SUCCESS';
dump c;

B. a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, (Long) r#'NUMBER_REDUCES' as reduces;
c = group b by (id, user, script_name) parallel 10;
d = foreach c generate group.user, group.script_name, MAX(b.reduces) as max_reduces;
e = filter d by max_reduces == 1;
dump e;

C. a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'QUEUE_NAME' as queue;
c = group b by (id, user, queue) parallel 10;
d = foreach c generate group.user, group.queue, COUNT(b);
dump d;

D. None of the mentioned

75.
Point out the wrong statement.

A. Pig can invoke code in languages like Java only

B. Pig enables data workers to write complex data transformations without knowing Java

C. Pig's simple SQL-like scripting language is called Pig Latin, and appeals to developers already familiar with scripting languages and SQL

D. Pig is complete, so you can do all required data manipulations in Apache Hadoop with Pig

76.
Which of the following scripts is used to find scripts that have failed jobs?

A. a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate (Chararray) j#'STATUS' as status, j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, j#'JOBID' as job;
c = filter b by status != 'SUCCESS';
dump c;

B. a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, (Long) r#'NUMBER_REDUCES' as reduces;
c = group b by (id, user, script_name) parallel 10;
d = foreach c generate group.user, group.script_name, MAX(b.reduces) as max_reduces;
e = filter d by max_reduces == 1;
dump e;

C. a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'QUEUE_NAME' as queue;
c = group b by (id, user, queue) parallel 10;
d = foreach c generate group.user, group.queue, COUNT(b);
dump d;

D. None of the mentioned

77.
Which of the following scripts is used to find scripts that use only the default parallelism?

A. a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate (Chararray) j#'STATUS' as status, j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, j#'JOBID' as job;
c = filter b by status != 'SUCCESS';
dump c;

B. a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, (Long) r#'NUMBER_REDUCES' as reduces;
c = group b by (id, user, script_name) parallel 10;
d = foreach c generate group.user, group.script_name, MAX(b.reduces) as max_reduces;
e = filter d by max_reduces == 1;
dump e;

C. a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'QUEUE_NAME' as queue;
c = group b by (id, user, queue) parallel 10;
d = foreach c generate group.user, group.queue, COUNT(b);
dump d;

D. None of the mentioned

78.
Pig Latin is . . . . . . . . and fits very naturally in the pipeline paradigm while SQL is instead declarative.

A. functional

B. procedural

C. declarative

D. all of the mentioned

79.
In comparison to SQL, Pig uses . . . . . . . .

A. Lazy evaluation

B. ETL

C. Pipeline splits

D. All of the mentioned

80.
Which of the following is an entry in jobconf?

A. pig.job

B. pig.input.dirs

C. pig.feature

D. none of the mentioned
