The key to Hadoop's capability is bringing the data and the processing together.
There are two basic Hadoop components:
1. Hadoop Distributed File System (HDFS): HDFS is where we store the data. It is a distributed file system that provides built-in redundancy and fault tolerance for all Hadoop processing.
2. MapReduce framework: a programming model for large-scale data processing in a distributed manner. There are two major steps in MapReduce: map and reduce.
Map: In the map step, the input problem is divided into smaller sub-problems, which the master node distributes to the slave (worker) nodes as map tasks.
Reduce: In the reduce step, the outputs of the map tasks serve as the input to the reduce tasks, which combine them to form the final output (a minimal code sketch follows this list).
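To make the map and reduce steps concrete, below is a minimal sketch of the classic WordCount job written against Hadoop's Java MapReduce API (org.apache.hadoop.mapreduce). The class name WordCount, the mapper and reducer names, and the command-line input/output paths are illustrative assumptions, not details taken from this text.

// Minimal WordCount sketch: the map step emits (word, 1) pairs from each
// input split; the reduce step sums the counts for each word.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map step: each map task receives one split of the input and emits
  // an intermediate (word, 1) pair for every token it sees.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce step: all intermediate values for the same word arrive together;
  // summing them gives the final count for that word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

In a typical setup, this would be packaged as a JAR and launched with a command such as "hadoop jar wordcount.jar WordCount /input /output" (paths hypothetical), where both paths live in HDFS; the map tasks are scheduled near the blocks that hold the input data, which is exactly the "bring the processing to the data" idea described above.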
Hadoop stack:
[Diagram: the original version of the Hadoop stack]
Pig, Hive, and others are applications that allow us to …
There is a shortage of the talent organizations need in order to deal with all of this big data.
"By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know- how to use the analysis of bigdata to make effective decisions" -Mckinsey Global Institute "Bigdata report"
"The unemployment rate in U.S. continues to be abysmal (9.1% in July), but the tech world has spawned as new kind of highly skilled, nerdy-cool job that companies are scrambling to fill.....Data scientist" -Fortune Magazine
Data science jobs are growing rapidly. The figure below shows how much these jobs are in demand.