The high reliability of the Hadoop Distributed File System (HDFS) is achieved mainly through a combination of strategies and mechanisms. The main ones are:
Redundant replica policy
You can specify the number of replicas of each data file; the default is 3.
Keeping replicas of every data block ensures that data is not lost when a single datanode goes down.
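The replication factor is set cluster-wide in hdfs-site.xml (this is the real `dfs.replication` property; it can also be overridden per file, for example with `hdfs dfs -setrep`):

```xml
<!-- hdfs-site.xml -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```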
Rack awareness
Cluster nodes are generally spread across different racks, and inter-rack bandwidth is smaller than intra-rack bandwidth.
HDFS is "rack aware": it automatically stores one replica on the local rack and another replica on a different rack. This prevents data loss when an entire rack fails, and also improves bandwidth utilization.
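As a rough illustration, the default placement policy puts the first replica on the writer's node, the second on a node in a different rack, and the third on another node in that second rack. This is a minimal sketch of that logic, not the real namenode code; the node and rack names in the usage below are made up:

```python
import random

def place_replicas(writer_node, nodes_by_rack):
    """Sketch of HDFS's default 3-replica placement:
    1st replica on the writer's node, 2nd on a node in a
    different rack, 3rd on another node in that same remote rack."""
    writer_rack = next(r for r, ns in nodes_by_rack.items() if writer_node in ns)
    placement = [(writer_rack, writer_node)]
    # second replica: pick a node on some other rack
    remote_rack = random.choice([r for r in nodes_by_rack if r != writer_rack])
    second = random.choice(nodes_by_rack[remote_rack])
    placement.append((remote_rack, second))
    # third replica: a different node on the same remote rack
    third = random.choice([n for n in nodes_by_rack[remote_rack] if n != second])
    placement.append((remote_rack, third))
    return placement
```

With two racks of two datanodes each, the three replicas always land on three distinct nodes spanning both racks, so losing either rack leaves at least one copy.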
Heartbeat mechanism
The namenode periodically receives a heartbeat signal and a block report from each datanode, and validates its metadata against the block reports.
A datanode that fails to send heartbeats on time is marked as down by the namenode and receives no further I/O requests.
When a datanode failure drops the replica count of some blocks below the preset threshold, the namenode detects those blocks and re-replicates them at an appropriate time.
Re-replication can also be triggered by corruption of a replica, a disk error, or an increase in the replication factor.
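The detection side of this can be sketched as two checks: which datanodes are still live, and which blocks have fallen below their target replica count. This is an illustrative model only; the timeout value and the data structures are assumptions, not Hadoop's actual internals:

```python
HEARTBEAT_TIMEOUT = 600  # seconds; illustrative (HDFS derives its own timeout)

def live_datanodes(last_heartbeat, now):
    """Datanodes whose last heartbeat arrived within the timeout."""
    return {dn for dn, t in last_heartbeat.items()
            if now - t <= HEARTBEAT_TIMEOUT}

def under_replicated(block_locations, live, min_replicas):
    """Blocks whose count of replicas on live datanodes fell below target."""
    return {blk for blk, dns in block_locations.items()
            if len(set(dns) & live) < min_replicas}
```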
Safe mode
When the namenode starts, it goes through a "safe mode" phase, during which no data writes are allowed.
In safe mode the namenode collects block reports from the datanodes; a data block is considered "safe" once it has reached its minimum number of replicas.
After a certain (configurable) proportion of the data blocks has been confirmed "safe", the namenode waits a few more minutes and then exits safe mode.
Any blocks found to have too few replicas are then replicated until they reach the minimum replica count.
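The exit condition can be sketched as a single check. The defaults mirror the real Hadoop settings `dfs.namenode.replication.min` (default 1) and `dfs.namenode.safemode.threshold-pct` (default 0.999); everything else is a simplified model:

```python
def safe_to_leave(block_replica_counts, min_replicas=1, threshold=0.999):
    """Leave safe mode once the fraction of blocks that reached their
    minimum replica count passes the configured threshold."""
    if not block_replica_counts:
        return True
    safe = sum(1 for c in block_replica_counts.values() if c >= min_replicas)
    return safe / len(block_replica_counts) >= threshold
```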
Checksums
When a file is created, a checksum is computed for each block; the checksums are stored in a separate .meta file.
When a client reads the data, it recomputes the checksums and compares them with the stored values to discover whether a block is corrupted.
If the block being read is corrupted, the client continues by reading another replica.
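The idea can be shown with per-chunk checksums. HDFS computes one checksum per `dfs.bytes-per-checksum` bytes (default 512) and uses CRC32C; this sketch substitutes Python's plain `zlib.crc32` for simplicity:

```python
import zlib

CHUNK = 512  # bytes per checksum, matching the HDFS default

def chunk_checksums(data):
    """One checksum per fixed-size chunk of the block."""
    return [zlib.crc32(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]

def verify(data, stored):
    """Recompute checksums and compare against the stored ones."""
    return chunk_checksums(data) == stored
```

A single flipped byte makes the affected chunk's checksum mismatch, which is what lets the client fall back to another replica.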
Recycle bin
When a file is deleted, it is actually moved into a recycle bin (/trash), from which it can be quickly recovered.
You can set a time threshold; once a file has stayed in the recycle bin longer than this threshold, it is permanently removed and its data blocks are freed.
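The purge step amounts to deleting entries older than the retention interval (in Hadoop this is governed by the real `fs.trash.interval` setting, expressed in minutes). A minimal sketch, with the trash modeled as a path-to-deletion-time mapping:

```python
def purge_trash(trash, now, interval):
    """Permanently remove trash entries older than the interval."""
    expired = [path for path, deleted_at in trash.items()
               if now - deleted_at > interval]
    for path in expired:
        del trash[path]  # in HDFS this is when the blocks are freed
    return expired
```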
Metadata protection
The image file (fsimage) and the transaction log (edit log) are the namenode's core data and can be configured to have multiple copies.
Extra copies slow down the namenode's processing but increase safety.
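Multiple copies are configured by giving the real `dfs.namenode.name.dir` property a comma-separated list of directories; the namenode then writes fsimage and edits to all of them. The directory paths below are placeholders:

```xml
<!-- hdfs-site.xml: store namenode metadata in two directories -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/1/dfs/nn,/data/2/dfs/nn</value>
</property>
```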
Snapshot mechanism
HDFS supports storing an image of the file system at a point in time, so that the data can be rolled back to that state when necessary.