Nowadays, as the number of Internet users grows, some large websites have begun to use database clusters to improve the reliability and performance of their databases. Before introducing database clusters, we need to clarify several questions.
1. Why use a database cluster?
(1) A database cluster makes it possible to separate reads from writes, which improves the performance of the database system.
As is well known, MySQL can be deployed in a distributed fashion. One of the most powerful features of MySQL Proxy is read/write splitting (Read/Write Splitting). The basic principle is to let the master database handle transactional queries while the slave databases handle SELECT queries. Database replication synchronizes the changes caused by transactional queries to the slave databases in the cluster, so that the data on the slaves stays consistent with the master. Of course, the master server can also serve queries.
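The routing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how MySQL Proxy is implemented: the class name ReadWriteRouter, the server labels, and the round-robin choice of slave are all hypothetical, and statement classification is reduced to looking at the first keyword.

```python
import itertools


class ReadWriteRouter:
    """Toy router (hypothetical): SELECTs outside a transaction go to a slave,
    everything else goes to the master."""

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves; None means "no slaves configured".
        self._slaves = itertools.cycle(slaves) if slaves else None
        self.in_transaction = False

    def route(self, sql):
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""

        # Track transaction boundaries: transactional queries stay on the master.
        if verb in ("BEGIN", "START"):
            self.in_transaction = True
            return self.master
        if verb in ("COMMIT", "ROLLBACK"):
            self.in_transaction = False
            return self.master

        # Plain reads are spread across the slaves.
        if verb == "SELECT" and not self.in_transaction and self._slaves is not None:
            return next(self._slaves)

        # Writes, reads inside a transaction, and anything unrecognized.
        return self.master


router = ReadWriteRouter("master", ["slave1", "slave2"])
for statement in ("SELECT * FROM t", "UPDATE t SET a = 1", "SELECT * FROM t"):
    print(statement, "->", router.route(statement))
```

Because replication is asynchronous, a read sent to a slave may briefly return stale data; the sketch keeps reads inside a transaction on the master for that reason.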
The biggest benefit of read/write splitting is simply that it relieves pressure on the servers. Take a look at this picture:
Why can read/write splitting improve database performance? (Excerpted from the Internet)
1. More physical servers means more capacity to carry load.
2. The master and the slaves are each responsible only for their own writes and reads, which greatly reduces contention between X (exclusive) locks and S (shared) locks.
3. The slaves can be configured to use the MyISAM engine, which improves query performance and saves system overhead.
4. Synchronizing data to a slave is not the same as writing directly on the master: the slave rebuilds the data by replaying the binlog sent by the master. The most important difference, however, is that the master sends the binlog to the slave asynchronously, and the slave applies it asynchronously as well.
5. Read/write splitting suits scenarios where reads far outnumber writes. With only one server and a large number of SELECTs, UPDATE and DELETE statements are blocked by the data those SELECTs are accessing and must wait for them to finish, so concurrency is poor. For applications where the ratio of writes to reads is roughly even, dual-master replication should be deployed instead.
6. You can pass some options when starting a slave to improve its read performance, for example --skip-innodb, --skip-bdb, --low-priority-updates, and --delay-key-write=ALL. Of course, whether to use these settings depends on the specific business requirements; they are not mandatory.
7. Reads are spread out. Suppose we have 1 master and 3 slaves, and ignore the slave-side settings mentioned above. Suppose that in 1 minute there are 10 writes and 150 reads. With 1 master and 3 slaves, every write is applied on all four servers, which amounts to 40 writes in total, while the total number of reads does not change. On average, each server therefore handles 10 writes, and each slave handles 50 reads (the master does not take on read operations). So although the write load per server has not changed, the reads are shared out substantially and system performance improves. In addition, once the reads are spread out, write performance improves indirectly as well. Overall performance goes up; to put it bluntly, you are trading machines and bandwidth for performance. The official MySQL documentation has a formula for this: see FAQ section 6.9, "When and how much can MySQL replication improve system performance?"
8. Another major purpose of MySQL replication is to add redundancy and improve availability. When one database server goes down, service can be restored as quickly as possible by switching to another slave. So you cannot look at performance alone; in other words, a setup with 1 master and 1 slave is also perfectly fine.
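The arithmetic in item 7 can be checked with a short Python sketch. The function name per_server_load and its parameters are hypothetical, chosen only for this illustration; the model assumes every write is replayed on every server and that reads are split evenly among the servers that accept reads.

```python
def per_server_load(writes, reads, slaves, master_serves_reads=False):
    """Rough per-server load for one master plus `slaves` replicas.

    Assumptions (illustrative only): each write is applied on every server,
    and reads are divided evenly among the servers that accept reads.
    """
    servers = slaves + 1
    readers = slaves + (1 if master_serves_reads else 0)
    if readers == 0:
        raise ValueError("at least one server must accept reads")
    return {
        "total_write_ops": writes * servers,  # each write is replicated everywhere
        "writes_per_server": writes,          # unchanged by adding slaves
        "reads_per_reader": reads / readers,  # shared among the reading servers
    }


# The example from the text: 10 writes and 150 reads per minute, 1 master and 3 slaves.
print(per_server_load(writes=10, reads=150, slaves=3))
# A single server for comparison: it takes all 10 writes and all 150 reads.
print(per_server_load(writes=10, reads=150, slaves=0, master_serves_reads=True))
```

The comparison shows why the gain comes entirely from reads: adding slaves never lowers the number of writes each server has to apply.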
2. What is the difference between a database cluster and a distributed database?
In a word: a distributed system works in parallel, while a cluster works in series.
1: "Distributed" means that different services are spread across different places. "Cluster" means that several servers are grouped together to provide the same service. Every node in a distributed system can itself be a cluster, but a cluster is not necessarily distributed.
For example, take Sina. As traffic grows, it can build a cluster: a response server sits in front, and the servers behind it all perform the same service. When a request comes in, the response server checks which back-end server is lightly loaded and hands the request to that one. A distributed system, understood in the narrow sense, is similar to a cluster, but it is more loosely organized. A cluster is organized so that when one server crashes, the other servers can take over. In a distributed system, each node performs a different service, so when one node goes down, that service becomes inaccessible.
2: In brief, a distributed system improves efficiency by shortening the execution time of a single task, while a cluster improves efficiency by increasing the number of tasks completed per unit of time.
For example, suppose a task consists of 10 subtasks, and each subtask takes 1 hour to run on its own. Running the whole task on one server takes 10 hours.
With a distributed scheme, we provide 10 servers and make each server responsible for only one subtask. Ignoring dependencies between subtasks, the task finishes in just 1 hour. (A typical example of this working mode is Hadoop's Map/Reduce distributed computing model.)
With a cluster scheme, we also provide 10 servers, and each server can handle the whole task independently. Suppose 10 tasks arrive at the same time. The 10 servers work simultaneously, and 10 hours later all 10 tasks finish together. Taken as a whole, that is still one task completed per hour!
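The two schemes can be compared numerically with a minimal Python sketch. The function names distributed and clustered are hypothetical, and the model is deliberately simplified: subtasks are independent, there is no coordination overhead, and in the distributed case tasks are processed one after another.

```python
import math


def distributed(tasks, subtasks, hours_per_subtask, servers):
    """Each task's subtasks are spread across the servers; tasks run one after another."""
    rounds = math.ceil(subtasks / servers)
    latency = rounds * hours_per_subtask   # hours to finish one task
    total_hours = tasks * latency          # hours to finish all tasks
    return latency, tasks / total_hours    # (hours per task, tasks per hour)


def clustered(tasks, subtasks, hours_per_subtask, servers):
    """Each server runs a whole task alone; tasks are handled in waves of `servers`."""
    latency = subtasks * hours_per_subtask  # hours to finish one task
    waves = math.ceil(tasks / servers)
    total_hours = waves * latency           # hours to finish all tasks
    return latency, tasks / total_hours     # (hours per task, tasks per hour)


# 10 tasks, each made of 10 one-hour subtasks, on 10 servers.
print("distributed:", distributed(10, 10, 1, 10))
print("clustered:  ", clustered(10, 10, 1, 10))
```

Under these assumptions both schemes average one task per hour; the difference is how long any single task takes to finish, which is 1 hour in the distributed case and 10 hours in the cluster case.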
Look at the picture below: