Understanding Big Data | Hadoop Developer Self Learning

roshan

2:30:00 AM 3:02:12 PM

Following up on my earlier post, Hadoop Developer Self Learning Outline, I am going to write a short and simple tutorial on it.

Free Hadoop Tutorial for you

Explore big data systems and get your hands dirty.

This post covers the topics below.

Understanding Big Data

3V (Volume-Variety-Velocity) characteristics
Structured and Unstructured Data
Application and use cases of Big Data
Limitations of traditional large Scale systems

A. 3V (Volume-Variety-Velocity) characteristics

These three V's are known as the characteristics of 'Big Data'.

1. Volume – The name 'Big Data' itself refers to a size that is enormous. The size of the data plays a crucial role in determining how much value can be drawn from it.

Data volume is increasing exponentially.

2. Variety – The next aspect of 'Big Data' is its variety.

Variety refers to heterogeneous sources and the nature of data, both structured and unstructured.

Various formats, types, and structures: text, numerical data, images, audio, video, sequences, time series, social media data, multi-dimensional arrays, etc.

A single application can generate or collect many types of data.

3. Velocity – The term 'velocity' refers to the speed at which data is generated. How fast the data is generated and processed to meet demand determines the real potential in the data.

Data is being generated fast and needs to be processed fast.

A fourth 'V' is often added to the list. IBM's fourth V is usually cited as Veracity (the trustworthiness of the data); the one described here is Variability.

4. Variability – This refers to the inconsistency that data can show at times, which hampers the ability to handle and manage it effectively.

B. Structured and Unstructured Data

Structured data

Information stored in a database
Strict format

Limitation

Not all data collected is structured

Semi-structured data

Data may have certain structure but not all information collected has identical structure
Some attributes may exist in some of the entities of a particular type but not in others
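
To make the distinction concrete, here is a minimal Python sketch. The records and field names (`name`, `email`, `phone`, `twitter`) are made up for illustration and do not come from any real dataset. It shows records of the same type carrying different attributes, and what happens when they are forced into a strict format.

```python
# Hypothetical customer records, for illustration only.
# Semi-structured: every record is a customer, but the attributes vary.
customers = [
    {"name": "Asha", "email": "asha@example.com"},
    {"name": "Ben", "phone": "555-0100", "twitter": "@ben"},
    {"name": "Chen", "email": "chen@example.com", "phone": "555-0101"},
]


def attribute_coverage(records):
    """Count how many records carry each attribute."""
    coverage = {}
    for record in records:
        for key in record:
            coverage[key] = coverage.get(key, 0) + 1
    return coverage


def to_structured_rows(records, columns):
    """Force records into a strict format: one value per column, None where missing."""
    return [tuple(record.get(col) for col in columns) for record in records]


coverage = attribute_coverage(customers)
rows = to_structured_rows(customers, ["name", "email", "phone"])
print(coverage)
print(rows)
```

Only `name` appears in every record. Forcing the records into a strict three-column format leaves gaps (`None`) and drops the `twitter` attribute entirely, which is the kind of loss a rigid schema imposes on semi-structured data.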

Unstructured data

“Unstructured data refers to information that either does not have a pre-defined data model and/or is not organized in a predefined manner.”

It is commonly estimated that structured data makes up only about 20% of all data; the rest is unstructured or semi-structured.

C. Application and use cases of Big Data

Below are the major sectors where big data is widely used:

Public Sector Services.
Healthcare contributions.
Learning Services.
Insurance Services.
Industrialized and Natural Resources.
Transportation Services.
Banking Sectors and Fraud Detection.

D. Limitations of traditional large Scale systems

Traditional large scale computing involved complex processing on small amounts of data
Exponential growth in data drove development of distributed computing
Distributed computing is difficult!
Hadoop addresses distributed computing challenges

Bring the computation to the data
Fault tolerance
Scalability
Hadoop hides the ‘plumbing’ so developers can focus on the data
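
The idea of bringing the computation to the data can be sketched in plain Python. This is not Hadoop's API: the `partitions` list stands in for blocks of data stored on different machines, and `count_words_locally` and `merge_counts` are hypothetical helper names chosen for this example. Each partition is processed where it lives, and only the small per-partition summaries are combined.

```python
from collections import Counter

# Stand-ins for data blocks stored on three different nodes (made-up text).
partitions = [
    ["big data is big", "data is everywhere"],
    ["hadoop handles big data"],
    ["data data data"],
]


def count_words_locally(lines):
    """Runs 'next to' one partition: turns many lines into one small summary."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def merge_counts(partial_counts):
    """Combines the small per-partition summaries into a final result."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


# Each partition is summarized independently; only the summaries are merged.
partial_results = [count_words_locally(p) for p in partitions]
word_counts = merge_counts(partial_results)
print(word_counts.most_common(3))
```

In a real cluster the per-partition step would run on the machines that hold the data, and the framework would handle scheduling, retries after failures, and moving the partial results around.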

It will also be helpful to go through the difference between Operational and Analytical Systems.

Please comment if you have any doubts or if a correction is required.
