Insufficient Parallel Tasks

Solution 5: Adjust Number of Map Tasks, Reduce Tasks & Memory
Both over-allocation and under-allocation of map and reduce tasks will degrade performance, so we need to find the optimal values by trial and error to keep cluster resource utilization balanced.
| Property | Default | Description |
|---|---|---|
| mapreduce.tasktracker.map.tasks.maximum | 2 | The maximum number of map tasks that will be run simultaneously by a task tracker. |
| mapreduce.tasktracker.reduce.tasks.maximum | 2 | The maximum number of reduce tasks that will be run simultaneously by a task tracker. |
| mapreduce.map.memory.mb | 1024 | The amount of memory to request from the scheduler for each map task. |
| mapreduce.map.cpu.vcores | 1 | The number of virtual cores to request from the scheduler for each map task. |
| mapreduce.reduce.memory.mb | 1024 | The amount of memory to request from the scheduler for each reduce task. |
| mapreduce.reduce.cpu.vcores | 1 | The number of virtual cores to request from the scheduler for each reduce task. |
| yarn.app.mapreduce.am.resource.mb | 1536 | The amount of memory the MR AppMaster needs. |
| yarn.app.mapreduce.am.resource.cpu-vcores | 1 | The number of virtual CPU cores the MR AppMaster needs. |
The above are the default values from the mapred-default.xml file; they can be overridden in mapred-site.xml to utilize the Node Managers' resources more completely.
We can also adjust the JVM heap for tasks with the property mapred.child.java.opts = -Xmx2048M in mapred-site.xml (on newer releases, mapreduce.map.java.opts and mapreduce.reduce.java.opts set the map and reduce heaps separately).
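As a sketch, such an override in mapred-site.xml might look like the following (the values are illustrative, not recommendations for every cluster):

```xml
<!-- mapred-site.xml: illustrative overrides, tune per cluster -->
<configuration>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>      <!-- container size requested per map task -->
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>      <!-- container size requested per reduce task -->
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1638M</value> <!-- JVM heap for each task -->
  </property>
</configuration>
```

In practice the task JVM heap (-Xmx) is usually kept somewhat below the container size (roughly 80% is a common rule of thumb), so the container is not killed by YARN for exceeding its memory limit.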
If we have 16 CPU cores and 32 GB RAM on each Node Manager, we can tune these properties up to 8 map tasks and 4 reduce tasks, with at most 2048 MB of memory allocated to each task, leaving 4 CPU cores as a buffer for other tasks/operations running on the same Node Manager.
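The capacity math above can be checked with a small back-of-the-envelope calculation (plain Java, no Hadoop dependencies; the node size and the 4-core buffer are just the figures from this example):

```java
// Back-of-the-envelope container math for one 16-core / 32 GB Node Manager.
public class NodeCapacitySketch {
    public static void main(String[] args) {
        int nodeCores = 16;
        int nodeMemoryMb = 32 * 1024;     // 32 GB
        int bufferCores = 4;              // left for other processes on the node
        int taskMemoryMb = 2048;          // mapreduce.{map,reduce}.memory.mb

        int tasksByCpu = nodeCores - bufferCores;          // 12 tasks fit by CPU
        int tasksByMemory = nodeMemoryMb / taskMemoryMb;   // 16 tasks fit by memory
        int concurrentTasks = Math.min(tasksByCpu, tasksByMemory);

        // CPU is the bottleneck here: 12 tasks, e.g. split as 8 map + 4 reduce.
        System.out.println("concurrent tasks per node = " + concurrentTasks);
    }
}
```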
We can also set these properties at the job level on the Configuration object.
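At the job level this looks roughly like the sketch below, using the standard org.apache.hadoop Configuration/Job API (not compilable without the Hadoop libraries on the classpath; the class and job names are hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Sketch: per-job overrides take effect only for this job,
// without touching the cluster-wide mapred-site.xml.
Configuration conf = new Configuration();
conf.setInt("mapreduce.map.memory.mb", 2048);
conf.setInt("mapreduce.reduce.memory.mb", 2048);
conf.set("mapred.child.java.opts", "-Xmx1638M");
Job job = Job.getInstance(conf, "tuned-job"); // hypothetical job name
```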
Below are some additional reduce-side tuning properties:
| Property | Default | Description |
|---|---|---|
| mapreduce.reduce.shuffle.parallelcopies | 5 | The default number of parallel transfers run by reduce during the copy (shuffle) phase. |
| mapreduce.shuffle.max.threads | 0 | Max allowed threads for serving shuffle connections. Set to zero to indicate the default of 2 times the number of available processors (as reported by Runtime.availableProcessors()). Netty is used to serve requests, so a thread is not needed for each connection. |
| mapreduce.shuffle.transferTo.allowed | | This option can enable/disable using the NIO transferTo method in the shuffle phase. NIO transferTo does not perform well on Windows in the shuffle phase, so this property makes it possible to disable it, in which case a custom transfer method will be used. The recommended value is false when running Hadoop on Windows and true on Linux. If nothing is set, the default is false for Windows and true for Linux. |
| mapreduce.shuffle.transfer.buffer.size | 131072 | Used only if mapreduce.shuffle.transferTo.allowed is set to false. In that case, this property defines the size of the buffer used in the buffer-copy code for the shuffle phase; the size of this buffer determines the size of the IO requests. |
| mapreduce.reduce.markreset.buffer.percent | 0.0 | The percentage of memory (relative to the maximum heap size) to be used for caching values when using the mark-reset functionality. |
| mapreduce.map.speculative | true | If true, then multiple instances of some map tasks may be executed in parallel. |
| mapreduce.reduce.speculative | true | If true, then multiple instances of some reduce tasks may be executed in parallel. |
| mapreduce.job.speculative.speculative-cap-running-tasks | 0.1 | The max percent (0-1) of running tasks that can be speculatively re-executed at any time. |
| mapreduce.job.speculative.speculative-cap-total-tasks | 0.01 | The max percent (0-1) of all tasks that can be speculatively re-executed at any time. |
| mapreduce.job.speculative.minimum-allowed-tasks | 10 | The minimum number of tasks allowed to be speculatively re-executed at any time. |
| mapreduce.job.speculative.retry-after-no-speculate | 1000 | The waiting time (ms) before the next round of speculation if no task was speculated in this round. |
| mapreduce.job.speculative.retry-after-speculate | 15000 | The waiting time (ms) before the next round of speculation if tasks were speculated in this round. |
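For example, on a heavily loaded cluster where speculative attempts compete with regular tasks for containers, speculation can be turned off in mapred-site.xml (illustrative fragment; whether this helps depends on how common straggler tasks are on the cluster):

```xml
<!-- mapred-site.xml: disable speculative execution for map and reduce tasks -->
<configuration>
  <property>
    <name>mapreduce.map.speculative</name>
    <value>false</value>
  </property>
  <property>
    <name>mapreduce.reduce.speculative</name>
    <value>false</value>
  </property>
</configuration>
```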