The difference between MyISAM and InnoDB, and when to choose MyISAM
InnoDB is the default storage engine of current mainstream MySQL versions (5.6, 5.7, 8.0). It supports transactions, foreign keys, and row-level locks, and maintains data consistency under concurrency, so it suits scenarios with high requirements for data accuracy.
MyISAM supports only table-level locks, and stores data in insertion order with no sorting rule. It suits applications that are mostly queries and inserts with only a few updates and deletes, and scenarios with low requirements for transactional integrity and concurrency.
Many people pick InnoDB as the storage engine without thinking. The reasonable part of that choice is: if the data accuracy requirements are not high, you might as well use NoSQL directly; MySQL is used precisely for its reliability.
But in practice, my designs usually include several MyISAM tables, typically used to store history, while InnoDB stores the real-time records.
For example, take logistics information: the current logistics status, such as "out for delivery", is stored in a real-time table, while the states the shipment has passed through historically ("the courier company is being notified to pick up the package", "collected by XXX", and so on) go into a record table. That record table is essentially insert-and-query only, and losing an intermediate state does not affect the current result, so it is a good fit for MyISAM.
A sketch of MySQL's MVCC (multi-version concurrency control)
MVCC is the optimistic-lock-style implementation behind the Read Committed (RC) and Repeatable Read (RR) transaction isolation levels. Under LBCC (lock-based concurrency control), RC, RR, and Serializable are achieved with pessimistic row locks, gap locks, and table locks. The optimistic-lock principle instead generates a snapshot at a specific point in time (in RC, at every read; in RR, at the start of the transaction), reads from that snapshot, and checks for conflicts only at commit time, similar to git branches and commits.
Under MVCC, when a new transaction is opened, each row version the transaction touches carries the creating transaction ID and a rollback pointer. The transaction also gets a Read View, which saves the list of currently active transactions: only row versions whose transaction ID is smaller than everything in that list are visible. This ensures a transaction reads only committed data.
MVCC is not only a database technique; it is a very common means of concurrency control in general. Take an order state machine driven by a finite state automaton: query the current status before updating it. For example, if the current status is "order not submitted", update it with update XXX set status='order submitted' where status='order not submitted'. If the status has already changed by the time this statement executes, it matches zero rows and the update fails. This is the MVCC idea, an optimistic-lock design applied in business logic.
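The conditional-update pattern above can be sketched in Python, with SQLite standing in for MySQL; the table and status names are illustrative, not from any real system:

```python
import sqlite3

# Minimal sketch of the optimistic-lock conditional update described above.
# SQLite stands in for MySQL; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'not_submitted')")

def submit_order(order_id):
    # The WHERE clause re-checks the expected current state; if another
    # writer changed it first, rowcount is 0 and this update "loses".
    cur = conn.execute(
        "UPDATE orders SET status = 'submitted' "
        "WHERE id = ? AND status = 'not_submitted'",
        (order_id,),
    )
    return cur.rowcount == 1  # True only if we won the race

print(submit_order(1))  # True: first attempt succeeds
print(submit_order(1))  # False: status has already changed
```

No lock is ever held between the read and the write; the conflict check happens atomically inside the single UPDATE statement.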
How distributed locks are implemented
There are three mainstream approaches:
1> Database-based
1.1> Based on a database primary key: insert a row with a fixed primary key. If two clients insert concurrently, the second insert hits a primary key conflict and fails, so only one of the concurrent attempts succeeds.
1.2> Based on a database exclusive lock: run an update inside a transaction. Until that transaction commits, other updates within the lock range block, which serializes concurrent access.
2> Cache-based, for example Redis's SETNX
3> ZooKeeper-based
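A minimal sketch of approach 1.1, with SQLite in place of a shared MySQL instance; the table, lock name, and owner strings are all hypothetical:

```python
import sqlite3

# The lock is "held" by whoever manages to insert a row whose primary key
# is the lock's name. A second inserter gets a primary key conflict.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dist_lock (name TEXT PRIMARY KEY, owner TEXT)")

def try_lock(name, owner):
    try:
        conn.execute("INSERT INTO dist_lock VALUES (?, ?)", (name, owner))
        return True
    except sqlite3.IntegrityError:  # primary key conflict: lock is held
        return False

def unlock(name, owner):
    conn.execute("DELETE FROM dist_lock WHERE name = ? AND owner = ?",
                 (name, owner))

print(try_lock("job:rebuild", "worker-1"))  # True: lock acquired
print(try_lock("job:rebuild", "worker-2"))  # False: worker-1 holds it
```

A production version would also need a timeout or cleanup path, since a crashed holder otherwise leaves the row (and the lock) behind forever.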
Presumably many people would choose the second kind of distributed lock. The third has worse concurrency, but if ZooKeeper has already been introduced, there is no cache in the stack, and distributed locks are not requested very often, then choosing zk to avoid the risk and maintenance cost of introducing a new component is also reasonable. Many people also believe they have never used a database-based distributed lock, but if they have used MVCC-style conditional updates, that is not actually the case.
When doing business development with Spring, a common pattern is Spring-configured transactions, whose default isolation level is Repeatable Read. In that case, under LBCC, an exclusive lock is taken as soon as the business logic starts, for example via insert, update, delete, or select XXX for update, and then other work is done, such as an RPC call. Once concurrency occurs, only one transaction can proceed and all the others block. In effect, this is equivalent to using a distributed lock.
Why use a B+ tree as the index structure?
If a hash table were used, range lookups would require a full table scan. If a binary search tree were used, it has no balancing and may degenerate into a linked list. A balanced binary tree solves the balance problem through rotation, but the rotation operations are too costly. A red-black tree is still too tall, requiring many I/Os. An ordinary B-tree stores data in every node, so each memory page holds only a little key data, and range lookups still need many I/Os.
By contrast, a B+Tree has three characteristics:
1> Non-leaf nodes store only keys (redundantly), not data, so each page can hold more index entries
2> Leaf nodes contain all the index fields: every key appears at the leaf level together with its data
3> Leaf nodes are linked with pointers, which improves range query performance
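Characteristic 3 is what makes range scans cheap: locate the first qualifying leaf, then follow the leaf chain instead of re-descending the tree for each key. A toy sketch, modeling only the leaf level (the internal-node level is omitted):

```python
from bisect import bisect_left

# Toy model of B+ tree leaves: each leaf holds sorted keys and a pointer
# to the next leaf, so a range scan is a linear walk along the chain.
class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None

# A chain of 8 leaves holding keys 0..31, four keys per leaf.
leaves = [Leaf(list(range(i, i + 4))) for i in range(0, 32, 4)]
for a, b in zip(leaves, leaves[1:]):
    a.next = b

def range_scan(first_leaf, lo, hi):
    """Walk the leaf chain, yielding keys in [lo, hi]."""
    leaf = first_leaf
    while leaf and leaf.keys[0] <= hi:
        for k in leaf.keys[bisect_left(leaf.keys, lo):]:
            if k > hi:
                return
            yield k
        leaf = leaf.next

print(list(range_scan(leaves[0], 5, 11)))  # [5, 6, 7, 8, 9, 10, 11]
```

In a real engine the starting leaf is found by descending the internal nodes in O(log n); here the scan simply starts from the first leaf to keep the sketch short.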
In distributed scenarios, our business IDs are globally unique strings. Considering the business alone, using the business ID as the database primary key would be enough. Yet DBAs often require an auto-increment integer primary key, which from the business's point of view is a waste: it carries no business meaning.
Given the underlying structure of indexes, the reasons are not hard to understand:
1> An integer takes less space than a string
2> Integer comparison is also very fast
3> As for auto-increment: each new record is appended in order after the current last entry in the leaf node, and when a page fills up, a new page is opened automatically. With a non-auto-increment primary key, data must be moved to make room at the insertion point, and the target page may even have been written back to disk and evicted from the cache, forcing a read back in. The resulting page splits cause heavy fragmentation, and the table must later be rebuilt (OPTIMIZE TABLE) to repack the pages.
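A rough illustration of point 3, using a Python list as a stand-in for a single index page: sequential keys always append at the tail, while random keys force existing entries to shift on almost every insert (in a real engine, add page splits and re-reads of evicted pages on top of that):

```python
import random
from bisect import bisect_left

random.seed(42)

def count_shifts(keys):
    """Insert keys one by one into a sorted 'page', counting how many
    existing entries each insert has to move to make room."""
    page, shifts = [], 0
    for k in keys:
        pos = bisect_left(page, k)   # insertion point within the page
        shifts += len(page) - pos    # entries after it must shift right
        page.insert(pos, k)
    return shifts

sequential = list(range(1000))                 # auto-increment style keys
scattered = random.sample(range(10**6), 1000)  # UUID-like random keys

print(count_shifts(sequential))  # 0: every insert appends at the end
print(count_shifts(scattered))   # large: most inserts land mid-page
```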
What is a covering index?
A query where all the column data the SQL needs can be obtained from the secondary index tree alone, with no need to go back to the table.
Some persistence-layer frameworks, such as MyBatis's generator plugin, can auto-generate SQL configuration files, and these files are often inefficient. Many fresh graduates never modify them: for example, even when only a few columns are needed, they fetch everything and process it with Java lambda expressions, which leads to performance problems.
When I do a range search on some condition and only need to return the ID or a few columns, I edit the files auto-generated by MyBatis's generator myself, so that the query can use a covering index, which is faster than going back to the table.
To verify that a covering index is used, run the EXPLAIN execution plan and look at the Extra field. If it shows only Using index, the covering index is working correctly. If Extra is empty, the query had to go back to the table; Using filesort indicates an additional sort pass was needed.
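The same check can be illustrated with SQLite's EXPLAIN QUERY PLAN, whose "USING COVERING INDEX" corresponds to MySQL's "Using index" in Extra; the table and index names here are made up for the demo:

```python
import sqlite3

# SQLite standing in for MySQL to show covering vs non-covering plans.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
conn.execute("CREATE INDEX idx_a_b ON t (a, b)")

def plan(sql):
    # The plan detail is the last column of each EXPLAIN QUERY PLAN row.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(str(r[-1]) for r in rows)

# Every selected column lives in the index: no need to touch the table.
print(plan("SELECT b FROM t WHERE a = 1"))  # ... USING COVERING INDEX ...
# Column c is not in the index, so the engine must fetch the table row.
print(plan("SELECT c FROM t WHERE a = 1"))
```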
When does a query stop using the index?
There are three main situations:
1> The preconditions for using the index are not met. Common cases:
1.1> The leftmost-prefix matching principle is violated
1.2> The query condition applies a function to the indexed column
1.3> An OR condition includes a field that has no index
1.4> A LIKE condition starts with %
2> Using the index would be less efficient than a full table scan. Common cases:
2.1> The query condition checks for NULL, and NULL values are numerous
2.2> The field has very low selectivity, such as gender or status
3> The result set requiring back-to-table lookups is too large, exceeding the configured threshold
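Two of the failure modes above, a function applied to the indexed column (1.2) and a leading-wildcard LIKE (1.4), can be reproduced with SQLite's EXPLAIN QUERY PLAN; table and index names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_name ON users (name)")

def plan(sql):
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(str(r[-1]) for r in rows)

# A plain equality on the indexed column produces an index SEARCH.
print(plan("SELECT id FROM users WHERE name = 'bob'"))
# Wrapping the column in a function defeats the B-tree: full SCAN.
print(plan("SELECT id FROM users WHERE lower(name) = 'bob'"))
# A leading % wildcard cannot use the key order either: full SCAN.
print(plan("SELECT id FROM users WHERE name LIKE '%bob'"))
```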
Indexes are used to optimize queries, and measuring the optimization effect requires data, so we need tools. Common ones are:
1> The slow query log
Enable the slow query log, then analyze the slow SQL it records to see what can be optimized with indexes.
2> show processlist shows the SQL currently executing. If some SQL runs slowly and blocks other SQL, this is a good tool for spotting it.
3> show profile to analyze SQL
This tool shows which stage of execution the time goes to. First check whether the server supports profiling (select @@have_profiling); if it does, use select @@profiling to see whether it is enabled (a result of 0 means it is off), and enable it with set @@profiling=1;
After that, show profiles lists the elapsed time of each SQL statement, and
show profile for query XXID shows where the time was spent, stage by stage.
4> Trace: analyze the optimizer's execution plan
set optimizer_trace='enabled=on',end_markers_in_json=on; turns on trace analysis. To see the optimizer's concrete execution plan, just run
select * from `information_schema`.optimizer_trace
and expand each step for a very detailed analysis.
Knowledge can only be applied flexibly once it is thoroughly understood, and its effect should be measured when it is applied. A common mistake is developers adding a cache layer on top of MySQL without thinking, hoping to improve efficiency. But caching only suits read-heavy workloads. In a financial trading system, for example, the read/write ratio may be 1:1, with data updated the moment after it is read; in that case a cache only adds load and complexity to the system.
In such cases, we can first use tools to measure the database's read/write ratio. For example, show global status like 'Com_______' shows how many times select, update, insert, and delete have each been executed.
Alternatively, show global status like 'Innodb_row_%' shows InnoDB's row reads and writes, and also the status of locks.
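Assuming the counters returned by show global status like 'Com_______' have been fetched into a dict, a small helper can compute the ratio; the sample numbers below are made up:

```python
# Compute the read/write ratio from MySQL's Com_* status counters.
def read_write_ratio(status):
    reads = int(status.get("Com_select", 0))
    writes = sum(int(status.get(k, 0))
                 for k in ("Com_insert", "Com_update", "Com_delete"))
    return reads / writes if writes else float("inf")

# Hypothetical counter values, as strings, the way a client returns them.
sample = {"Com_select": "9000", "Com_insert": "500",
          "Com_update": "400", "Com_delete": "100"}
print(f"read/write ratio: {read_write_ratio(sample):.1f}")  # 9.0
```

A ratio well above 1 suggests a read-heavy workload where caching may pay off; a ratio near 1, as in the trading-system example above, suggests it will not.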
Finally, search online for the "58 MySQL military regulations" and think about the theoretical rationale behind each rule.