Answers to Hadoop Real-time Questions, Part 2
11. Hadoop performance tuning
Please go through Performance Tuning.
12. Planning a Hadoop cluster
Two main aspects need to be considered:
- The number of machines
- The specification of the machines (RAM, storage, and processor)
Cluster specification:
Production cluster size:
Base your answer on the daily incoming data size, how long the project will run, and the hard-disk capacity of each machine.
Important aspects to be considered while planning:
- Hardware requirements for the NameNode
- Hardware requirements for the JobTracker/ResourceManager
- Memory sizing: depends on the size of the data
- Processors: number of cores
- Hardware requirements for the slave nodes
- Capacity planning (see the sizing sketch below)
- Number of nodes: total capacity divided by the capacity per node, e.g. 120 TB / (12 × 1 TB disks per node) = 10 nodes
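As a rough formula: multiply the daily ingest by the retention period and the replication factor, add some headroom for intermediate data, and divide by the usable storage per node. The Python sketch below works through this with hypothetical numbers; none of the figures come from a real cluster.

```python
# A rough cluster-sizing sketch; every number here is a hypothetical
# assumption used for illustration, not a value from this post.
import math

daily_ingest_tb = 0.1      # ~100 GB of new data per day (assumption)
retention_days = 3 * 365   # keep data for three years (assumption)
replication = 3            # HDFS default replication factor
headroom = 1.25            # ~25% extra for intermediate/temporary data

disks_per_node = 12        # as in the 120 TB example above
disk_size_tb = 1

raw_tb = daily_ingest_tb * retention_days * replication * headroom
node_capacity_tb = disks_per_node * disk_size_tb
nodes = math.ceil(raw_tb / node_capacity_tb)

print(f"~{raw_tb:.0f} TB raw storage -> {nodes} slave nodes")
```

With these assumed inputs the sketch works out to roughly 411 TB of raw storage, or 35 slave nodes; plug in your own ingest rate, retention, and disk layout.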
13. What is Ranger?
Ranger is a framework to enable, monitor, and manage comprehensive data security across the Hadoop platform. The vision of Ranger is to provide comprehensive security across the Apache Hadoop ecosystem.
- A Ranger tutorial can be found here
14. Have you written any UDFs?
Yes or no. Generally, Pig and Hive have built-in functions that we can use in our programs without adding any extra code, but sometimes the required logic is not available among those built-in functions. In that case, the user has to write their own custom function, called a UDF (user-defined function).
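If the interviewer pushes for a concrete example, a minimal Pig UDF written in Python (Jython) looks like the sketch below; the function and schema names are hypothetical. Pig supplies the outputSchema decorator when the script is registered with USING jython.

```python
# upper_udf.py -- a minimal Pig (Jython) UDF sketch; names are hypothetical.
# Register it in a Pig script with:
#   REGISTER 'upper_udf.py' USING jython AS myudfs;
# then call it as: myudfs.to_upper(name)

@outputSchema("name_upper:chararray")   # Pig injects outputSchema for Jython UDFs
def to_upper(name):
    # Pass nulls through unchanged, otherwise upper-case the field.
    if name is None:
        return None
    return name.upper()
```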
15. Have you written any scripts to automate cluster setup?
I've seen this done very nicely using Foreman, Chef, and Ambari Blueprints: Foreman was used to provision the VMs, and Chef scripts were used to install Ambari, configure the Ambari blueprint, and create the cluster using the blueprint.
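Ambari Blueprints themselves are driven through Ambari's REST API. Below is a hedged Python sketch of that flow; the host, credentials, and file names are assumptions for illustration, not details from any real setup.

```python
# A sketch of registering an Ambari blueprint and creating a cluster from it
# via Ambari's REST API; host, credentials, and file names are assumptions.
import requests

AMBARI = "http://ambari-server:8080/api/v1"
AUTH = ("admin", "admin")               # default Ambari credentials (assumption)
HEADERS = {"X-Requested-By": "ambari"}  # header required by the Ambari API

# 1) Register the blueprint (stack layout plus service configurations).
with open("my_blueprint.json") as f:
    requests.post(f"{AMBARI}/blueprints/my-blueprint",
                  auth=AUTH, headers=HEADERS, data=f.read())

# 2) Create the cluster from the blueprint using a host-mapping template.
with open("my_hostmapping.json") as f:
    r = requests.post(f"{AMBARI}/clusters/my-cluster",
                      auth=AUTH, headers=HEADERS, data=f.read())
print(r.status_code, r.text)
```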
16. Kerberos installation and configuration
On the terminal, run this command:
>>user@ubuntu:~$ sudo apt-get install krb5-user
Press Y when asked, and then press Enter when prompted for the package configuration.
Done.
17. How well does Hadoop scale?
Hadoop scales by adding or removing nodes/machines in the cluster.
Types: scale-up vs. scale-out, or commissioning and decommissioning.
18. Node upgrades and changing the cluster size: commissioning and decommissioning
Commissioning: adding nodes.
Decommissioning: removing nodes (typically by listing the node in the excludes file referenced by dfs.hosts.exclude and running hdfs dfsadmin -refreshNodes).
19. Have you used metrics?
No.
Metrics are statistical information exposed by the Hadoop daemons, used for monitoring, performance tuning, and debugging. Many metrics are available by default, and they are very useful for troubleshooting.
You can read about metrics here.
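Hadoop daemons also expose their metrics as JSON over the /jmx HTTP endpoint, which is handy for quick checks. The sketch below assumes a Hadoop 2.x NameNode web UI on port 50070 (Hadoop 3.x uses 9870); adjust the host and port for your cluster.

```python
# Read NameNode metrics from the JMX HTTP endpoint (host/port are assumptions).
import requests

url = "http://namenode:50070/jmx"
params = {"qry": "Hadoop:service=NameNode,name=FSNamesystem"}
beans = requests.get(url, params=params, timeout=10).json()["beans"]

for bean in beans:
    # Capacity and block counts are among the FSNamesystem metrics.
    print("CapacityUsed:", bean.get("CapacityUsed"),
          "BlocksTotal:", bean.get("BlocksTotal"))
```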
20. How do you decide a cluster size based on data size? Can you tell which formula we use?
See question 12.
21. Can you explain the complete Hadoop ecosystem and how it works?
Tell everything you know about the Hadoop stack.
22. In a 15-node cluster, how many DataNodes are there?
23. How much data did you process in the 15-node cluster?
24. How much data do you process every day?
Please look at the example given in question 12 and here; the answers to all of the above questions are covered there.
- See the question list here