Insufficient Parallel Tasks

Solution 5: Adjust Number of Map Tasks & Reduce Tasks & Memory
Both over-allocation and under-allocation of map and reduce tasks degrade performance, so we need to find the optimal values by trial and error to keep cluster resource utilization balanced.
mapreduce.tasktracker.map.tasks.maximum (default: 2)
The maximum number of map tasks that will be run simultaneously by a task tracker.
mapreduce.tasktracker.reduce.tasks.maximum (default: 2)
The maximum number of reduce tasks that will be run simultaneously by a task tracker.
mapreduce.map.memory.mb (default: 1024)
The amount of memory to request from the scheduler for each map task.
mapreduce.map.cpu.vcores (default: 1)
The number of virtual cores to request from the scheduler for each map task.
mapreduce.reduce.memory.mb (default: 1024)
The amount of memory to request from the scheduler for each reduce task.
mapreduce.reduce.cpu.vcores (default: 1)
The number of virtual cores to request from the scheduler for each reduce task.
yarn.app.mapreduce.am.resource.mb (default: 1536)
The amount of memory the MR AppMaster needs.
yarn.app.mapreduce.am.resource.cpu-vcores (default: 1)
The number of virtual CPU cores the MR AppMaster needs.
The values above are the defaults from the mapred-default.xml file; they can be overridden in mapred-site.xml to make fuller use of each NodeManager's resources.
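We can also adjust the JVM heap for tasks in mapred-site.xml, for example with the property mapred.child.java.opts = -Xmx2048M (mapreduce.map.java.opts and mapreduce.reduce.java.opts set it separately for map and reduce tasks). The heap has to fit inside the container size requested through mapreduce.map.memory.mb and mapreduce.reduce.memory.mb, so keep -Xmx below those values to leave room for non-heap memory.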
For example, if a NodeManager has 16 CPU cores and 32 GB of RAM, we can tune these properties to allow up to 8 map tasks and 4 reduce tasks, each with at most 2048 MB of memory. With one core per task, that uses 12 cores and 24 GB, leaving 4 CPU cores and 8 GB as a buffer for other tasks and operations running on the same NodeManager.
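The arithmetic behind this example can be checked with a few lines of Java. This is only a sketch: the class and method names are made up for illustration, and it assumes one vcore per task (the default shown above).

```java
// Worked check of the 16-core / 32 GB example. All names here are illustrative.
class NodeSizing {

    /**
     * Returns {free cores, free memory in MB} left on a node after running
     * the given mix of map and reduce tasks at the same per-task size.
     */
    static int[] headroom(int nodeCores, int nodeMemoryMb,
                          int mapTasks, int reduceTasks,
                          int vcoresPerTask, int memoryMbPerTask) {
        int tasks = mapTasks + reduceTasks;
        int freeCores = nodeCores - tasks * vcoresPerTask;
        int freeMemoryMb = nodeMemoryMb - tasks * memoryMbPerTask;
        return new int[] { freeCores, freeMemoryMb };
    }

    public static void main(String[] args) {
        // 16 cores, 32 GB RAM, 8 map + 4 reduce tasks, 1 vcore and 2048 MB each.
        int[] free = headroom(16, 32 * 1024, 8, 4, 1, 2048);
        System.out.println("free cores = " + free[0] + ", free memory MB = " + free[1]);
    }
}
```

A negative value in either slot would mean the chosen task mix over-commits the node.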
We can also set these properties at the job level, on the job's Configuration object.
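A minimal sketch of a job-level override is shown here. It assumes the Hadoop MapReduce client libraries are on the classpath. The class name, the job name and the -Xmx1638m heap (roughly 80% of the 2048 MB container, a common rule of thumb rather than a value from this article) are illustrative choices. mapreduce.map.java.opts and mapreduce.reduce.java.opts are the per-task-type counterparts of mapred.child.java.opts.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Illustrative only: overrides the resource properties for a single job.
class TunedJobExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Container size requested from the scheduler for each task.
        conf.setInt("mapreduce.map.memory.mb", 2048);
        conf.setInt("mapreduce.reduce.memory.mb", 2048);
        conf.setInt("mapreduce.map.cpu.vcores", 1);
        conf.setInt("mapreduce.reduce.cpu.vcores", 1);

        // Assumption: keep the JVM heap below the container size (about 80% here).
        conf.set("mapreduce.map.java.opts", "-Xmx1638m");
        conf.set("mapreduce.reduce.java.opts", "-Xmx1638m");

        // The job takes its own copy of the configuration at creation time,
        // so set the properties before calling Job.getInstance.
        Job job = Job.getInstance(conf, "tuned-job");

        // Mapper, reducer, input and output paths are omitted in this sketch.
        System.out.println("map memory (MB): "
                + job.getConfiguration().get("mapreduce.map.memory.mb"));
    }
}
```

Values set this way apply only to the submitted job and leave the cluster-wide settings in mapred-site.xml untouched.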
Below are some additional shuffle, reduce-side and speculative execution tuning properties:

mapreduce.reduce.shuffle.parallelcopies (default: 5)
The default number of parallel transfers run by reduce during the copy (shuffle) phase.
mapreduce.shuffle.max.threads (default: 0)
Max allowed threads for serving shuffle connections. Set to zero to indicate the default of 2 times the number of available processors (as reported by Runtime.availableProcessors()). Netty is used to serve requests, so a thread is not needed for each connection.
mapreduce.shuffle.transferTo.allowed (default: not set)
This option can enable/disable using the NIO transferTo method in the shuffle phase. NIO transferTo does not perform well on Windows in the shuffle phase. Thus, with this configuration property it is possible to disable it, in which case a custom transfer method will be used. The recommended value is false when running Hadoop on Windows. For Linux, it is recommended to set it to true. If nothing is set, the default value is false for Windows and true for Linux.
mapreduce.shuffle.transfer.buffer.size (default: 131072)
This property is used only if mapreduce.shuffle.transferTo.allowed is set to false. In that case, this property defines the size of the buffer used in the buffer copy code for the shuffle phase. The size of this buffer determines the size of the IO requests.
mapreduce.reduce.markreset.buffer.percent (default: 0.0)
The percentage of memory (relative to the maximum heap size) to be used for caching values when using the mark-reset functionality.
mapreduce.map.speculative (default: true)
If true, then multiple instances of some map tasks may be executed in parallel.
mapreduce.reduce.speculative (default: true)
If true, then multiple instances of some reduce tasks may be executed in parallel.
mapreduce.job.speculative.speculative-cap-running-tasks (default: 0.1)
The max percent (0-1) of running tasks that can be speculatively re-executed at any time.
mapreduce.job.speculative.speculative-cap-total-tasks (default: 0.01)
The max percent (0-1) of all tasks that can be speculatively re-executed at any time.
mapreduce.job.speculative.minimum-allowed-tasks (default: 10)
The minimum allowed tasks that can be speculatively re-executed at any time.
mapreduce.job.speculative.retry-after-no-speculate (default: 1000)
The waiting time (ms) before the next round of speculation if no task was speculated in this round.
mapreduce.job.speculative.retry-after-speculate (default: 15000)
The waiting time (ms) before the next round of speculation if tasks were speculated in this round.
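Two of the descriptions above imply small calculations, sketched below in plain Java. The class and method names are hypothetical. The shuffle thread rule comes directly from the property description. The way the three speculation limits combine (the largest of them wins) is an assumption about the default speculator, so verify it against the Hadoop version in use.

```java
// Illustrative helpers; none of these names come from Hadoop itself.
class TuningArithmetic {

    /** mapreduce.shuffle.max.threads: 0 means 2 x the available processors. */
    static int effectiveShuffleThreads(int configured, int availableProcessors) {
        return configured == 0 ? 2 * availableProcessors : configured;
    }

    /**
     * Upper bound on concurrently speculated tasks.
     * Assumption: the largest of the three configured limits applies.
     */
    static int allowedSpeculativeTasks(int totalTasks, int runningTasks,
                                       double capTotal, double capRunning,
                                       int minimumAllowed) {
        int byTotal = (int) (capTotal * totalTasks);       // speculative-cap-total-tasks
        int byRunning = (int) (capRunning * runningTasks); // speculative-cap-running-tasks
        return Math.max(minimumAllowed, Math.max(byTotal, byRunning));
    }

    public static void main(String[] args) {
        int processors = Runtime.getRuntime().availableProcessors();
        System.out.println("shuffle threads: " + effectiveShuffleThreads(0, processors));

        // Defaults from the list above: 0.01, 0.1 and 10.
        System.out.println("small job: " + allowedSpeculativeTasks(40, 20, 0.01, 0.1, 10));
        System.out.println("large job: " + allowedSpeculativeTasks(2000, 500, 0.01, 0.1, 10));
    }
}
```

Under this assumption, the minimum-allowed-tasks value dominates for small jobs with the default settings, and the two percentage caps only start to matter once a job has hundreds of tasks.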