Before we get to the point, let's first understand what the CAP theorem for distributed systems is.

According to Baidu Encyclopedia, the CAP theorem, also known as the CAP principle, states that a distributed system can have at most two of the following three properties at the same time, never all three: Consistency, Availability, and Partition tolerance.

One, Definition of CAP

Consistency:

"All nodes see the same data at the same time." After an update succeeds and returns to the client, all nodes hold identical data at the same moment. This is distributed consistency. Consistency problems are unavoidable in concurrent systems. From the client's point of view, consistency is about how updated data is obtained during concurrent access. From the server's point of view, it is about how updates are replicated and distributed across the whole system so that the data eventually becomes consistent.

Availability:

Availability means "reads and writes always succeed". That is, the service is always available and responds within a normal response time. Good availability means the system serves users well, without bad experiences such as failed operations or access timeouts.

Partition tolerance:

When the distributed system encounters a node failure or a network partition, it can still provide service to the outside world that satisfies consistency or availability.

Partition tolerance is what allows an application to be distributed while still appearing to be a single working whole. For example, if one or several machines in a distributed system go down, the remaining machines keep working normally and meet the needs of the system, with no impact on the user experience.

Two, Proof of the CAP theorem

Now let's show why the three properties cannot all be satisfied at the same time.

Suppose there are two servers, N1 and N2. N1 runs application A with a copy of database V, and N2 runs application B with a copy of database V. The network between them is connected, so together they form the two parts of a distributed system.

When consistency is satisfied, the data on the two servers N1 and N2 is the same; at the start, both hold DB0. When availability is satisfied, a request gets an immediate response whether it is sent to N1 or to N2. When partition tolerance is satisfied, if either N1 or N2 goes down, or the network between them becomes unavailable, the other server keeps operating normally.

When a user sends a data update request through application A on N1, the database on N1 changes from DB0 to DB1. The distributed system then performs its data synchronization, and the database on N2 is also updated to DB1. At this point, a user who sends a read request through application B on N2 gets the freshly updated data DB1.
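The normal case above can be sketched in a few lines of Python. This is only an illustrative model, not how any real database replicates: the `Node` class, its `peer` attribute, and the `write`/`read` methods are hypothetical names introduced here, and synchronization is reduced to a direct assignment.

```python
class Node:
    """One server holding a copy of the database value (hypothetical model)."""

    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.peer = None  # the other server in the two-node system

    def write(self, value):
        # Apply the update locally, then synchronize it to the peer
        # before acknowledging the client.
        self.value = value
        self.peer.value = value
        return "ok"

    def read(self):
        # Return the local copy of the data.
        return self.value


n1 = Node("N1", "DB0")
n2 = Node("N2", "DB0")
n1.peer, n2.peer = n2, n1

n1.write("DB1")     # update sent through application A on N1
result = n2.read()  # read sent through application B on N2
print(result)
```

With the network healthy, the read on N2 sees the value written on N1, so consistency and availability hold together.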

That is the normal case. But in a distributed system the biggest problem is network transmission. Now consider an extreme situation: the network between N1 and N2 is disconnected, and the system must still tolerate this kind of network failure, which is to say it must satisfy partition tolerance. Can consistency and availability both be satisfied in that case?

Suppose the network between N1 and N2 suddenly fails, and a user sends a data update request to N1. The data on N1 is updated from DB0 to DB1, but because the network is disconnected, the database on N2 is still DB0.

If a user now sends a read request to N2, the data has not been synchronized, so the application cannot immediately return the latest data DB1. What can it do? There are two options. First, sacrifice consistency and respond with the old data DB0. Second, sacrifice availability and block until the network connection is restored and the update has been synchronized, then respond with the latest data DB1.
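The two options can be made concrete with a small simulation. This is a minimal sketch under stated assumptions: the `Cluster` class, the `prefer` switch, the `heal` method, and the `Unavailable` exception are all hypothetical names introduced for illustration, and "blocking until the network is restored" is modeled as raising an error instead of actually waiting.

```python
class Unavailable(Exception):
    """Raised when a node refuses to answer rather than return stale data."""


class Cluster:
    """Two-node system N1/N2 with a switchable trade-off (hypothetical model)."""

    def __init__(self, prefer):
        self.data = {"N1": "DB0", "N2": "DB0"}
        self.partitioned = False
        self.prefer = prefer  # "consistency" or "availability"

    def write(self, node, value):
        # The update is applied on the receiving node and only reaches
        # the other node if the network is up.
        self.data[node] = value
        if not self.partitioned:
            for other in self.data:
                self.data[other] = value

    def read(self, node):
        # During a partition the node cannot know whether its copy is
        # current, so it either refuses or answers from local data.
        if self.partitioned and self.prefer == "consistency":
            raise Unavailable(f"{node} cannot confirm its data is current")
        return self.data[node]

    def heal(self, source):
        # Network restored: copy the data from `source` to every node.
        self.partitioned = False
        for node in self.data:
            self.data[node] = self.data[source]


# Option 1: sacrifice consistency, answer with old data.
ap = Cluster(prefer="availability")
ap.partitioned = True
ap.write("N1", "DB1")
stale = ap.read("N2")

# Option 2: sacrifice availability, refuse until the network is back.
cp = Cluster(prefer="consistency")
cp.partitioned = True
cp.write("N1", "DB1")
try:
    cp.read("N2")
    refused = False
except Unavailable:
    refused = True
cp.heal("N1")
fresh = cp.read("N2")

print(stale, refused, fresh)
```

In the first run the read on N2 succeeds but returns DB0; in the second it fails during the partition and returns DB1 only after the network is restored.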

The argument above is simplified, but it shows that a distributed system that must satisfy partition tolerance can only choose one of consistency and availability. In other words, a distributed system cannot satisfy all three properties at the same time. This forces us to make a choice when building a system. So which choice is the better strategy?

Three, Trade-off strategy

Since only two of the three CAP properties can be satisfied, there are three strategies to choose from:

CA without P:
If P is not required (partitions are not allowed), then C (strong consistency) and A (availability) can both be guaranteed. But giving up P also means giving up the scalability of the system: the number of distributed nodes is limited and new child nodes cannot be deployed, which goes against the original purpose of designing a distributed system.

CP without A:
If A (availability) is not required, every request must be kept consistent across servers, and a partition can make synchronization take arbitrarily long, because the service can only be accessed normally once data synchronization has completed. When the network fails or messages are lost, the user experience is sacrificed: users are allowed to access the system only after all data is consistent again. Many systems are designed as CP. The most typical are distributed databases, with Redis and HBase commonly cited as examples. For these distributed databases, data consistency is the most basic requirement. If a system cannot even meet that standard, it would be better to use a relational database directly instead of spending resources on deploying a distributed one.

AP without C:
To be highly available and tolerate partitions, consistency has to be given up. Once a partition occurs, nodes may lose contact with each other. To stay highly available, each node can only serve requests from its local data, which leads to globally inconsistent data. A typical example is shopping for a certain brand's mobile phone: a few seconds ago, while you were browsing, the page said the product was in stock, but when you try to place the order, the system tells you the order failed because the item is sold out. The system puts A (availability) first to keep the service running normally, and makes some sacrifice in data consistency. This affects the user experience somewhat, but it does not seriously block the user's shopping flow.
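The shopping scenario can be sketched as follows. This is only an illustration of the idea, assuming a product page that reads from a possibly stale local copy while orders are checked against an authoritative count; the `Inventory` class and all of its attribute and method names are hypothetical.

```python
class Inventory:
    """Stock with a stale-tolerant page view (hypothetical model)."""

    def __init__(self, stock):
        self.primary_stock = stock  # authoritative count, checked at order time
        self.cached_stock = stock   # local copy used by product pages

    def page_shows_in_stock(self):
        # Always answers immediately from local data, even if it is stale.
        return self.cached_stock > 0

    def place_order(self):
        # The order itself is validated against the authoritative count.
        if self.primary_stock <= 0:
            return "sold out"
        self.primary_stock -= 1
        return "ordered"

    def refresh_cache(self):
        # Synchronization catches up some time later.
        self.cached_stock = self.primary_stock


shop = Inventory(stock=1)
first = shop.place_order()                # another buyer takes the last item
shown = shop.page_shows_in_stock()        # page still says "in stock"
second = shop.place_order()               # your order fails
shop.refresh_cache()
shown_after = shop.page_shows_in_stock()  # page is correct again

print(first, shown, second, shown_after)
```

The page never blocks, which is the availability side of the trade-off; the price is the short window in which it shows stock that no longer exists.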

Four, Summary

Today, most large Internet applications run on many hosts in scattered deployments, and clusters keep growing with more and more nodes. Node failures and network failures are therefore normal, and partition tolerance becomes something a distributed system cannot avoid. The choice is then only between C and A. Traditional projects may be different. Take a bank's transfer system: where money is involved there can be no compromise on data consistency, so C must be guaranteed and the choice is between A and P. If the network fails, it is better to stop the service.

To make a long story short, there is no single best strategy. A good system should be designed according to its business scenario; the strategy that fits is the best one.

©2019-2020 Toolsou. All rights reserved.