I. The origin and development history of Hadoop
Doug Cutting designed the architecture of Lucene, a full-text search engine. When processing massive amounts of data, he ran into the same problems Google had faced.
Google published the ideas behind GFS and MapReduce.
Doug Cutting spent 2 years of his spare time implementing the HDFS and MapReduce mechanisms.
Google -> Hadoop
File system: GFS -> HDFS
Computation: MapReduce -> MapReduce
Large table: BigTable -> HBase
Doug Cutting: the father of Hadoop
II. Hadoop modules
Common module
HDFS:
namenode
datanode
secondarynamenode
YARN:
resourcemanager
nodemanager
application master
container
Hadoop's major components: the distributed file system HDFS and the MapReduce computational model
HDFS:
NameNode: manages metadata (file name, size, number of replicas, location of each replica on the nodes, ...).
DataNode: stores the actual data.
SecondaryNameNode: synchronizes the metadata.
Client: issues data requests (upload, read, write, ...).
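To make the split of responsibilities concrete, here is a minimal sketch of the kind of per-file record a NameNode keeps. It is a toy model, not Hadoop's actual data structures: the class names `BlockInfo`, `FileMetadata`, and `ToyNameNode`, the node names, and the example path are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BlockInfo:
    """One block of a file and the DataNodes holding its replicas."""
    block_id: int
    size_bytes: int
    replica_nodes: List[str] = field(default_factory=list)


@dataclass
class FileMetadata:
    """Per-file record of the kind a NameNode keeps in memory (toy model)."""
    name: str
    replication: int
    blocks: List[BlockInfo] = field(default_factory=list)

    @property
    def size_bytes(self) -> int:
        # File size is the sum of its block sizes.
        return sum(b.size_bytes for b in self.blocks)


class ToyNameNode:
    """Hypothetical in-memory namespace: path -> metadata. Stores no file data."""

    def __init__(self) -> None:
        self.namespace: Dict[str, FileMetadata] = {}

    def add_file(self, meta: FileMetadata) -> None:
        self.namespace[meta.name] = meta

    def locate(self, path: str) -> List[List[str]]:
        """Return, per block, the DataNodes a client should contact."""
        return [b.replica_nodes for b in self.namespace[path].blocks]


nn = ToyNameNode()
nn.add_file(FileMetadata(
    name="/logs/app.log",  # hypothetical path
    replication=2,
    blocks=[
        BlockInfo(1, 128 * 1024 * 1024, ["dn1", "dn2"]),
        BlockInfo(2, 10 * 1024 * 1024, ["dn2", "dn3"]),
    ],
))
print(nn.locate("/logs/app.log"))
print(nn.namespace["/logs/app.log"].size_bytes)
```

The point of the sketch is that the NameNode only answers "where are the blocks?"; the client then reads the block contents from the DataNodes directly.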
YARN:
ResourceManager: global task scheduling and resource management (CPU, memory).
NodeManager: manages a single node.
Client: submits requests to start tasks.
ApplicationMaster: manages one application: requests resources for it, assigns its internal tasks, and handles their monitoring and fault tolerance.
Container: an abstraction of the execution environment that encapsulates CPU, memory, and other multi-dimensional resources.
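The idea of a container as a bundle of resources handed out by a global scheduler can be sketched as follows. This is only an illustration under simplifying assumptions: `ToyResourceManager` and `Container` are hypothetical names, and the first-fit policy used here is a stand-in, not the scheduling policy real YARN uses.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Container:
    """A bundle of resources on one node (toy model of a YARN container)."""
    node: str
    vcores: int
    memory_mb: int


class ToyResourceManager:
    """Hypothetical scheduler: tracks free resources per node and hands out containers."""

    def __init__(self, nodes: Dict[str, Dict[str, int]]) -> None:
        # nodes maps node name -> {"vcores": ..., "memory_mb": ...}
        self.free = {name: dict(res) for name, res in nodes.items()}

    def allocate(self, vcores: int, memory_mb: int) -> Optional[Container]:
        """First-fit allocation; returns None when no node has enough free resources."""
        for name, res in self.free.items():
            if res["vcores"] >= vcores and res["memory_mb"] >= memory_mb:
                res["vcores"] -= vcores
                res["memory_mb"] -= memory_mb
                return Container(name, vcores, memory_mb)
        return None

    def release(self, c: Container) -> None:
        """Give a finished container's resources back to its node."""
        self.free[c.node]["vcores"] += c.vcores
        self.free[c.node]["memory_mb"] += c.memory_mb


rm = ToyResourceManager({
    "node1": {"vcores": 4, "memory_mb": 8192},
    "node2": {"vcores": 2, "memory_mb": 4096},
})
# An application master would request containers like this for its tasks.
first = rm.allocate(vcores=4, memory_mb=4096)
second = rm.allocate(vcores=2, memory_mb=2048)
third = rm.allocate(vcores=2, memory_mb=2048)  # cluster is out of vcores by now
print(first, second, third)
```

Because a request names both CPU and memory, a node with plenty of free memory but no free cores cannot serve it, which is why the third request fails here.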
III. NameNode startup process
How is metadata synchronized?
The NameNode writes metadata changes to the edits file. When the edits file reaches a threshold (3600 seconds elapsed, or a size of 64 MB), the merge process starts.
Merge process
1. When the merge begins, the SecondaryNameNode copies edits and fsimage into the memory of its own server and merges them into a file named fsimage.ckpt.
2. The fsimage.ckpt file is copied to the NameNode, the old fsimage is deleted, and fsimage.ckpt is renamed to fsimage.
3. Once the SecondaryNameNode has copied edits and fsimage, the NameNode creates an edits.new file to record new metadata changes. When the merge completes, the original edits file is deleted and edits.new is renamed to edits, ready for the next cycle.
4. Configuration in hdfs-site.xml:
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>
  <description>The number of seconds between two periodic checkpoints.</description>
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>
</property>
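The merge process above can be sketched in a few lines. This is a toy model under stated assumptions, not Hadoop's implementation: the image is represented as a plain dict of path to size, the `(op, path, size)` edit record format is invented for illustration, and the two constants simply mirror the values in the configuration shown above.

```python
from typing import Dict, List, Tuple

CHECKPOINT_PERIOD_S = 3600     # mirrors dfs.namenode.checkpoint.period
CHECKPOINT_TXNS = 1_000_000    # mirrors dfs.namenode.checkpoint.txns

Edit = Tuple[str, str, int]    # (op, path, size): hypothetical record format


def should_checkpoint(seconds_since_last: float, pending_txns: int) -> bool:
    """Checkpoint when either the time or the transaction threshold is reached."""
    return (seconds_since_last >= CHECKPOINT_PERIOD_S
            or pending_txns >= CHECKPOINT_TXNS)


def merge(fsimage: Dict[str, int], edits: List[Edit]) -> Dict[str, int]:
    """Replay the edit log on a copy of the image, yielding the new image (fsimage.ckpt)."""
    ckpt = dict(fsimage)  # the old image is left untouched until the rename step
    for op, path, size in edits:
        if op == "create":
            ckpt[path] = size
        elif op == "delete":
            ckpt.pop(path, None)
    return ckpt


fsimage = {"/a": 10}
edits = [("create", "/b", 20), ("delete", "/a", 0)]
ckpt = merge(fsimage, edits)
print(ckpt)
print(should_checkpoint(seconds_since_last=120, pending_txns=1_000_000))
```

Merging into a copy is what lets the NameNode keep serving requests and logging to edits.new while the checkpoint is built; the swap to the new image happens only at the final rename.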
IV. HDFS characteristics
Advantages:
1. Handling large files. "Large" here usually means files from hundreds of MB up to hundreds of TB. In practice, HDFS is already used to store and manage PB-scale data.
2. Streaming data access. HDFS is designed around "write once, read many" workloads. In most cases an analysis task touches most of the data in a data set, so reading the whole data set efficiently matters more than reading a single record quickly.
3. Running on clusters of cheap commodity machines. Hadoop places low demands on hardware and does not need expensive high-availability machines. Low-cost commodity machines have a higher probability of failure, so the design of HDFS has to take data reliability, security, and high availability fully into account.
Disadvantages:
1. Not suitable for low-latency data access.
2. Inefficient at storing large numbers of small files. HDFS is designed for streaming large data sets. Because the NameNode keeps the file system's metadata in memory, the number of files the file system can hold is bounded by the NameNode's memory size. As a rule of thumb, every file, directory, and block takes about 150 bytes, so storing fewer, larger files uses that memory more efficiently.
3. Random modification of files is not supported.
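The small-files problem is easy to see with a back-of-the-envelope calculation based on the 150-byte figure above. The function below is a rough sketch: `namenode_memory_bytes` is a hypothetical helper, the 128 MB block size is an assumed setting, and real NameNode memory use depends on more than this one rule of thumb.

```python
import math

BYTES_PER_OBJECT = 150  # rule-of-thumb figure from the text: per file, directory, or block


def namenode_memory_bytes(num_files: int, file_size_bytes: int,
                          block_size_bytes: int = 128 * 1024 * 1024,
                          num_dirs: int = 0) -> int:
    """Estimate NameNode memory for num_files files of equal size (rough sketch)."""
    # Every file occupies at least one block, however small it is.
    blocks_per_file = max(1, math.ceil(file_size_bytes / block_size_bytes))
    objects = num_files * (1 + blocks_per_file) + num_dirs
    return objects * BYTES_PER_OBJECT


MIB = 1024 * 1024
GIB = 1024 * MIB

# The same 1 TiB of data, stored two different ways.
small = namenode_memory_bytes(num_files=1024 * 1024, file_size_bytes=MIB)
large = namenode_memory_bytes(num_files=1024, file_size_bytes=GIB)
print(small, large, small // large)
```

Under these assumptions, storing the data as 1 MiB files costs the NameNode a couple of hundred times more memory than storing it as 1 GiB files, even though the DataNodes hold the same number of bytes.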