MapReduce Interview Questions Part 2

Q11 What is the purpose of RecordReader in Hadoop?
Answer: In Hadoop, the RecordReader loads the data from its source and converts it into key-value pairs suitable for reading by the Mapper.
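As a rough illustration of the answer above, the sketch below imitates in plain Python what a line-oriented RecordReader does: it walks over a chunk of input bytes and produces (byte offset, line) pairs for the mapper. The function name read_records is hypothetical and is not part of the Hadoop API.

```python
from typing import Iterator, Tuple


def read_records(data: bytes) -> Iterator[Tuple[int, str]]:
    """Yield (byte offset, line text) pairs, one per line of input."""
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        # Key: offset of the line's first byte. Value: the line without its newline.
        yield offset, raw_line.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw_line)


records = list(read_records(b"hello world\nhello hadoop\n"))
```

Each pair in records is what a mapper would receive as one (key, value) input.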

Q12 Explain MapReduce and why it is needed when programming with Apache Pig
Answer: Programs in Apache Pig are usually written in a query language called Pig Latin, which has some similarity to SQL. To get a query executed, you need an execution engine. The Pig engine converts queries into MapReduce jobs, so MapReduce acts as the execution engine required to run the programs.
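To illustrate the idea of a query becoming a MapReduce job, here is a hedged plain-Python sketch of how a "group by word, then count" style query could break down into a map phase, a shuffle and a reduce phase. It only imitates the concept; the function names are hypothetical and this is not how the Pig engine itself is implemented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Emit (word, 1) for every word in every line."""
    return [(word, 1) for line in lines for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as happens between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values of each group, similar to a COUNT over grouped data."""
    return {key: sum(values) for key, values in grouped.items()}


counts = reduce_phase(shuffle(map_phase(["pig latin", "pig engine"])))
```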
Q13 What are some typical functions of the JobTracker?
Answer: The following are some typical tasks of the JobTracker:
  • Client applications submit MapReduce jobs to the JobTracker
  • The JobTracker talks to the Name node to determine the location of the data
  • The JobTracker locates TaskTracker nodes with available slots at or near the data
  • The JobTracker submits the work to the chosen TaskTracker nodes
  • The TaskTracker nodes are monitored. If they do not submit heartbeat signals, they are deemed to have failed and the work is scheduled on a different TaskTracker
  • When the work is completed, the JobTracker updates its status
  • Client applications can poll the JobTracker for information
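The scheduling and monitoring steps listed above can be sketched in plain Python. This is a simplified model under stated assumptions, not JobTracker code: the TaskTracker dataclass, the function names, the host names and the 30-second timeout are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskTracker:
    host: str
    free_slots: int
    last_heartbeat: float  # seconds on an arbitrary clock


def pick_tracker(trackers: List[TaskTracker], data_hosts: List[str]) -> Optional[TaskTracker]:
    """Prefer a tracker with a free slot on a host holding the data, else any free one."""
    candidates = [t for t in trackers if t.free_slots > 0]
    local = [t for t in candidates if t.host in data_hosts]
    if local:
        return local[0]
    return candidates[0] if candidates else None


def failed_trackers(trackers: List[TaskTracker], now: float, timeout: float) -> List[str]:
    """Return hosts whose last heartbeat is older than the timeout."""
    return [t.host for t in trackers if now - t.last_heartbeat > timeout]


trackers = [
    TaskTracker("node1", free_slots=0, last_heartbeat=100.0),
    TaskTracker("node2", free_slots=2, last_heartbeat=100.0),
    TaskTracker("node3", free_slots=1, last_heartbeat=100.0),
    TaskTracker("node4", free_slots=1, last_heartbeat=40.0),
]
# Assume the NameNode reported that the data lives on node1 and node3.
chosen = pick_tracker(trackers, data_hosts=["node1", "node3"])
failed = failed_trackers(trackers, now=105.0, timeout=30.0)
```

Here node1 holds the data but has no free slot, so the work goes to node3; node4 has been silent for longer than the assumed timeout and is treated as failed.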
Q14 What are the four basic parameters of a mapper?
Answer: The four basic parameters of a mapper are LongWritable, Text, Text and IntWritable. The first two represent the input key and value types, and the second two represent the intermediate output key and value types.
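As a hedged analogy in plain Python, the four type parameters correspond to the types in the signature below: int stands in for LongWritable (the byte offset of the line), str for Text, and int for IntWritable. The function name word_count_map is hypothetical; a real mapper would be written in Java against the Hadoop API.

```python
from typing import Iterator, Tuple


def word_count_map(key: int, value: str) -> Iterator[Tuple[str, int]]:
    """Input: (byte offset, line). Output: one (word, 1) pair per word in the line."""
    for word in value.split():
        yield word, 1


pairs = list(word_count_map(0, "to be or not to be"))
```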
Q15 How can we change the split size if our commodity hardware has less storage space?
Answer: If our commodity hardware has less storage space, we can change the split size by writing a ‘custom splitter’. Hadoop supports this kind of customization, and it can be invoked from the main method.
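The sketch below does not show a custom splitter. Instead it illustrates, in plain Python, the arithmetic that file-based input formats in Hadoop use to derive a split size from a minimum size, a maximum size and the block size, and how a smaller split size leads to more splits. It is simplified and ignores details of the real implementation; the file and block sizes are made-up values and plan_splits is a hypothetical helper.

```python
from typing import List, Tuple


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    """Clamp the block size between the configured minimum and maximum split size."""
    return max(min_size, min(max_size, block_size))


def plan_splits(file_length: int, split_size: int) -> List[Tuple[int, int]]:
    """Return (start offset, length) for each split of the file."""
    return [(start, min(split_size, file_length - start))
            for start in range(0, file_length, split_size)]


MB = 1024 * 1024
# Assumed values: 128 MB blocks, maximum split size capped at 32 MB.
split_size = compute_split_size(block_size=128 * MB, min_size=1, max_size=32 * MB)
splits = plan_splits(file_length=100 * MB, split_size=split_size)
```

With these assumed numbers the 100 MB file is cut into four splits, the last one being 4 MB long.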
Q16 What is a TaskInstance?
Answer: The actual Hadoop MapReduce tasks that run on each slave node are referred to as task instances. Every task instance has its own JVM process. By default, a new JVM process is spawned for each new task instance.
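The "one process per task" idea can be imitated in plain Python by starting a fresh interpreter for every task, much as a fresh JVM is started for every task instance. This is only an analogy; run_task_in_new_process is a hypothetical helper and nothing here touches Hadoop.

```python
import os
import subprocess
import sys
from typing import List, Tuple


def run_task_in_new_process(task_id: int) -> Tuple[int, int]:
    """Run one task in a fresh interpreter and return (task id, child process id)."""
    # The child reports which task it ran and its own process id.
    child_code = "import os, sys; print(sys.argv[1], os.getpid())"
    result = subprocess.run(
        [sys.executable, "-c", child_code, str(task_id)],
        capture_output=True, text=True, check=True,
    )
    reported_id, child_pid = result.stdout.split()
    return int(reported_id), int(child_pid)


task_runs: List[Tuple[int, int]] = [run_task_in_new_process(i) for i in range(3)]
parent_pid = os.getpid()
```

Each task runs isolated from the parent and from the other tasks, at the cost of process start-up time for every task.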
Q17 What do the master class and the output class do?
Answer: The master class is defined to update the master, that is, the JobTracker, and the output class is defined to write data to the output location.
Q18 What is the input type/format in MapReduce by default?
Answer: By default, the input type in MapReduce is ‘text’ (TextInputFormat).
Q19 Is it mandatory to set input and output type/format in MapReduce?
Answer: No, it is not mandatory to set the input and output type/format in MapReduce. By default, the cluster takes the input and the output type as ‘text’.
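For the output side, the default text format writes each key-value pair as one line, with key and value separated by a tab. A minimal plain-Python sketch of that layout follows; format_text_output is a hypothetical name and the sample pairs are made up.

```python
from typing import Iterable, List, Tuple


def format_text_output(pairs: Iterable[Tuple[object, object]],
                       separator: str = "\t") -> List[str]:
    """Render each (key, value) pair as one 'key<separator>value' line."""
    return [f"{key}{separator}{value}" for key, value in pairs]


lines = format_text_output([("hadoop", 2), ("pig", 1)])
```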
Q20 How is Hadoop different from other data processing tools?
Answer: In Hadoop, based upon your requirements, you can increase or decrease the number of mappers without bothering about the volume of data to be processed. This is the beauty of parallel processing in contrast to the other data processing tools available.
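The point that the number of mappers can vary independently of the data can be sketched in plain Python: the same input is dealt out to 1, 4 or 10 workers and the combined result does not change. This is an illustration only; split_evenly and run_job are hypothetical names, and Python threads stand in for mappers without giving real cluster-style parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_evenly(items: List[int], num_mappers: int) -> List[List[int]]:
    """Deal the items out into num_mappers roughly equal chunks."""
    return [items[i::num_mappers] for i in range(num_mappers)]


def sum_of_squares(chunk: List[int]) -> int:
    """The per-mapper work: square every item in the chunk and add them up."""
    return sum(x * x for x in chunk)


def run_job(items: List[int], num_mappers: int) -> int:
    """Process the chunks with num_mappers workers, then combine the partial sums."""
    chunks = split_evenly(items, num_mappers)
    with ThreadPoolExecutor(max_workers=num_mappers) as pool:
        return sum(pool.map(sum_of_squares, chunks))


data = list(range(1, 101))
results = {n: run_job(data, n) for n in (1, 4, 10)}
```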