<> One, the origin and development history of Hadoop

While working on the architecture of the Lucene full-text search engine, Doug Cutting ran into the same problems Google had faced in processing massive amounts of data.
Google published the ideas behind GFS and MapReduce.
Doug Cutting and his collaborators spent about two years of their spare time implementing the HDFS and MapReduce mechanisms.
Google -> Hadoop
File system: GFS -> HDFS

Computation: MapReduce -> MapReduce

Big table: BigTable -> HBase

Doug Cutting: the father of Hadoop

<> Two, Hadoop modules:

<> Common modules:




application master

<> Hadoop major components: the distributed file system HDFS and the MapReduce computational model

NameNode: manages metadata (file name, size, number of replicas, location of each replica on the nodes, ...).
DataNode: stores the actual data.
SecondaryNameNode: synchronizes the metadata.
Client: issues data requests (upload, read, write, ...).
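To make the NameNode/DataNode split concrete, here is a minimal sketch of the kind of metadata a NameNode tracks for one file. It is a toy model, not the HDFS implementation: `FileMeta`, `place_file`, the 128 MB block size, and the round-robin replica placement are all assumptions made for illustration (real HDFS uses a rack-aware placement policy).

```python
import math
from dataclasses import dataclass, field

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size (128 MB); configurable in real clusters


@dataclass
class FileMeta:
    """Toy version of the per-file metadata a NameNode tracks."""
    name: str
    size: int
    replication: int
    # block index -> names of the DataNodes holding a replica of that block
    block_locations: dict = field(default_factory=dict)


def place_file(name, size, replication, datanodes):
    """Split a file into blocks and assign replicas round-robin (toy policy)."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    meta = FileMeta(name, size, replication)
    num_blocks = max(1, math.ceil(size / BLOCK_SIZE))
    for b in range(num_blocks):
        meta.block_locations[b] = [
            datanodes[(b + r) % len(datanodes)] for r in range(replication)
        ]
    return meta


# A 300 MB file with 3 replicas on a hypothetical 4-node cluster.
meta = place_file("/logs/app.log", 300 * 1024 * 1024, 3, ["dn1", "dn2", "dn3", "dn4"])
print(meta.block_locations)
```

The NameNode only holds this bookkeeping; the block contents themselves live on the DataNodes listed for each block.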

ResourceManager: global task scheduling and resource management (CPU, memory).
NodeManager: manages a single node.
Client: submits requests to start tasks.
ApplicationMaster: manages one application; it requests resources for the application, assigns its internal tasks, and handles monitoring and fault tolerance.

Container: an abstraction of the execution environment that encapsulates CPU, memory, and other multi-dimensional resources.
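The relationship between these roles can be sketched with a toy allocator. This is only an illustration under stated assumptions: `Node`, `Container`, and `allocate` are hypothetical names, and first-fit is a stand-in policy, not the scheduler YARN actually uses.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy NodeManager view: free resources on one node."""
    name: str
    free_vcores: int
    free_mem_mb: int


@dataclass
class Container:
    """A container bundles a slice of CPU and memory on one node."""
    node: str
    vcores: int
    mem_mb: int


def allocate(nodes, vcores, mem_mb):
    """Toy ResourceManager: first-fit allocation; returns None if nothing fits."""
    for node in nodes:
        if node.free_vcores >= vcores and node.free_mem_mb >= mem_mb:
            node.free_vcores -= vcores
            node.free_mem_mb -= mem_mb
            return Container(node.name, vcores, mem_mb)
    return None


cluster = [Node("nm1", 2, 4096), Node("nm2", 8, 16384)]
# An ApplicationMaster would issue requests like these on behalf of its tasks.
c1 = allocate(cluster, vcores=2, mem_mb=2048)
c2 = allocate(cluster, vcores=2, mem_mb=2048)
c3 = allocate(cluster, vcores=16, mem_mb=1024)  # larger than any node can offer
print(c1, c2, c3)
```

The first request fits on `nm1`, the second spills over to `nm2` because `nm1` has no cores left, and the third cannot be satisfied by any node.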

<> Three, NameNode startup process

How is metadata synchronized?
The NameNode writes metadata changes to the edits file. When a threshold is reached (3600 seconds have elapsed, or the edits file has grown to 64 MB), the merge process is started.
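The trigger condition can be written as a small predicate. This is a sketch of the rule as stated in the text, not Hadoop code; `should_checkpoint` and the constant names are hypothetical.

```python
CHECKPOINT_PERIOD_S = 3600            # seconds between checkpoints (from the text)
EDITS_SIZE_LIMIT = 64 * 1024 * 1024   # 64 MB edits size threshold (from the text)


def should_checkpoint(seconds_since_last, edits_size_bytes):
    """Return True when either threshold has been reached."""
    return (seconds_since_last >= CHECKPOINT_PERIOD_S
            or edits_size_bytes >= EDITS_SIZE_LIMIT)


print(should_checkpoint(3600, 0))                  # time threshold reached
print(should_checkpoint(10, 64 * 1024 * 1024))     # size threshold reached
print(should_checkpoint(10, 1024))                 # neither threshold reached
```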

Merge process
1. When the merge begins, the SecondaryNameNode copies edits and fsimage to its own server, merges them in memory, and produces a file named fsimage.ckpt.
2. The fsimage.ckpt file is copied to the NameNode, the old fsimage is deleted, and fsimage.ckpt is renamed to fsimage.
3. Once the SecondaryNameNode has copied edits and fsimage, the NameNode creates an edits.new file to record new metadata changes. When the merge is complete, the original edits file is deleted and edits.new is renamed to edits, ready for the next cycle.
4. Configure the checkpoint thresholds in hdfs-site.xml:
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>
  <description>The number of seconds between two periodic checkpoints.</description>
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>
</property>
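The merge steps above can be simulated in memory. This is a toy model under stated assumptions, not the real on-disk format: `ToyNameNode`, `apply_edits`, and the `("create" | "delete", path, meta)` operation tuples are hypothetical, and the fsimage is modeled as a plain dictionary.

```python
def apply_edits(fsimage, edits):
    """Replay edit-log operations onto a copy of the fsimage (the merge)."""
    merged = dict(fsimage)
    for op, path, meta in edits:
        if op == "create":
            merged[path] = meta
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return merged


class ToyNameNode:
    def __init__(self):
        self.fsimage = {}       # path -> metadata snapshot
        self.edits = []         # operations since the last checkpoint
        self.edits_new = None   # only exists while a checkpoint is running

    def log(self, op, path, meta=None):
        # While a checkpoint is running, new operations go to edits.new.
        target = self.edits_new if self.edits_new is not None else self.edits
        target.append((op, path, meta))

    def start_checkpoint(self):
        # Open edits.new and hand copies of fsimage and edits to the secondary.
        self.edits_new = []
        return dict(self.fsimage), list(self.edits)

    def finish_checkpoint(self, fsimage_ckpt):
        # fsimage.ckpt replaces fsimage; edits.new is renamed to edits.
        self.fsimage = fsimage_ckpt
        self.edits = self.edits_new
        self.edits_new = None


nn = ToyNameNode()
nn.log("create", "/a", {"size": 10})
nn.log("create", "/b", {"size": 20})
nn.log("delete", "/a")
image, edits = nn.start_checkpoint()
nn.log("create", "/c", {"size": 30})             # arrives while the merge is running
nn.finish_checkpoint(apply_edits(image, edits))  # the SecondaryNameNode's merge
print(nn.fsimage)  # {'/b': {'size': 20}}
print(nn.edits)    # [('create', '/c', {'size': 30})]
```

The operation logged during the checkpoint is not lost: it ends up in the new edits file and will be folded into the fsimage on the next cycle.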

<> Four, HDFS characteristics

Advantages:
1. Handles large files. "Large" here typically means files from hundreds of MB up to hundreds of TB. In practice, HDFS is already used to store and manage PB-scale data.
2. Streaming data access. HDFS is designed around "write once, read many" workloads. In most cases an analysis task touches most of the data in a data set, so reading the whole data set efficiently matters more than the latency of reading a single record.
3. Runs on clusters of inexpensive commodity machines. Hadoop makes modest demands on hardware and does not need expensive high-availability machines. Because low-cost commodity machines fail more often, HDFS has to be designed with data reliability, safety, and high availability fully in mind.

Disadvantages:
1. Not suitable for low-latency data access.
2. Not suitable for storing large numbers of small files. HDFS is designed to stream large data sets, and the NameNode keeps the file system metadata in memory, so the number of files the file system can hold is limited by the NameNode's memory size. As a rule of thumb, every file, directory, and block takes about 150 bytes, so storing larger files uses that memory more efficiently.
3. Random modification is not supported.
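The 150-byte figure makes the small-file problem easy to quantify. The sketch below is a back-of-the-envelope estimate only: `namenode_memory_bytes` is a hypothetical helper, and the 128 MB block size is an assumption, not something stated in the text.

```python
import math

BYTES_PER_OBJECT = 150            # per file, directory, or block (from the text)
BLOCK_SIZE = 128 * 1024 * 1024    # assumed block size (128 MB)


def namenode_memory_bytes(num_files, avg_file_size, num_dirs=0):
    """Estimate NameNode memory: one object per file, per block, and per directory."""
    blocks_per_file = max(1, math.ceil(avg_file_size / BLOCK_SIZE))
    objects = num_files * (1 + blocks_per_file) + num_dirs
    return objects * BYTES_PER_OBJECT


# Roughly the same total data (about 10 TB) stored two different ways.
small = namenode_memory_bytes(10_000_000, 1024 * 1024)  # ten million 1 MB files
large = namenode_memory_bytes(10_000, 1024 ** 3)        # ten thousand 1 GB files
print(small)  # 3000000000 bytes, about 3 GB
print(large)  # 13500000 bytes, about 13.5 MB
```

Under these assumptions the same volume of data costs the NameNode a few hundred times more memory when it is stored as small files.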
