The company assigned me a job: build a site that Kong Gateway calls to fetch its settings. Kong Gateway boasts tens of thousands of QPS, so I was a little flustered; if my site ended up dragging Kong down, I'd be the one to blame.
To work alongside Kong, my site had to be performance tested, and the test surfaced a very interesting phenomenon. Pressing my site with 25 threads, the results came out like this.

Pressing it with 50 threads, they came out like this.

The phenomenon: I doubled the number of concurrent clients, yet my QPS essentially did not change, while my average response time per request roughly doubled. This is actually not hard to explain, but first we need a rough picture of how IIS processes requests.
IIS actually maintains quite a bit of machinery. The first piece is a queue, used to raise the number of requests the server can accept at once. Say I have a very primitive program that can only process one request at a time. While I'm in the middle of handling the first request, a second one arrives. Obviously I shouldn't tell it, "I'm busy right now, no time for you"; I should tell it, "wait a bit, I'll deal with you shortly." Making it wait is exactly putting it into the queue, to be handled later.

The other concept is the simultaneous processing count. I just assumed I could only process one piece of data at a time, but I have multiple cores. Even if one core can only process one piece of data at any instant, a 4-core machine should be able to handle at least 4. Say it takes 4 at a time: those 4 can be considered processed simultaneously, but if 8 arrive at once, then 4 are processing and 4 are waiting.
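This queue-plus-workers model is easy to sketch. The Go snippet below (Go purely for illustration; this is not how IIS is implemented internally) uses a buffered channel as the waiting queue and a fixed pool of goroutines as the simultaneous processing count. The worker and job counts are made-up numbers matching the 4-core example:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runPool pushes `jobs` requests through a waiting queue drained by
// `workers` goroutines, and reports the peak number handled at once.
func runPool(workers, jobs int) int64 {
	queue := make(chan int, jobs) // the waiting queue
	var inFlight, peak int64
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range queue {
				n := atomic.AddInt64(&inFlight, 1)
				for { // record the highest in-flight count seen
					p := atomic.LoadInt64(&peak)
					if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
						break
					}
				}
				time.Sleep(10 * time.Millisecond) // simulate handling a request
				atomic.AddInt64(&inFlight, -1)
			}
		}()
	}

	for i := 0; i < jobs; i++ { // all requests arrive at once
		queue <- i
	}
	close(queue)
	wg.Wait()
	return peak
}

func main() {
	// 8 requests, 4 workers: at most 4 are ever processed together,
	// the other 4 wait their turn in the queue.
	fmt.Println("peak simultaneous:", runPool(4, 8))
}
```

However many requests pile into the queue, the pool never processes more than `workers` of them at the same instant; the rest simply wait.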
Now we can explain why 50 concurrent clients wait longer than 25, yet QPS does not improve. I think it goes like this: QPS was already close to its limit at 25 concurrency. How is that limit calculated? Roughly:

max QPS ≈ 1 second × simultaneous processing count / actual processing time per request

Notice that this limit has no direct relationship to the number of concurrent clients. Then why do 50 concurrent clients wait longer? Because once client concurrency exceeds the server's simultaneous processing count, a fixed surplus of requests always sits in the queue; each one has to wait for the requests already inside the processing logic to finish before its own turn comes. Hence: QPS flat, response time longer.
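Plugged into code, the ceiling looks like this. The 4 handlers and 20 ms per request are hypothetical numbers for illustration, not my measured ones:

```go
package main

import "fmt"

// qpsLimit: requests each handler finishes per second, times the number
// of requests handled simultaneously. Client-side concurrency does not
// appear in the formula at all.
func qpsLimit(simultaneous int, perRequestMs float64) float64 {
	return float64(simultaneous) * (1000.0 / perRequestMs)
}

func main() {
	// Hypothetical: 4 simultaneous handlers, 20 ms of real work each.
	fmt.Println(qpsLimit(4, 20)) // 200
}
```

Doubling client threads from 25 to 50 changes nothing in this formula, which is exactly why QPS stayed flat.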
Because the pattern stacks multiplicatively, you will sometimes notice that as long as your interface takes only a few milliseconds, everyone is fast, but once it slows down, response time grows almost exponentially. The reason is simple; mainly:

* The waiting queue grows longer (processing up front is slower, so more and more requests pile up waiting)
* Each individual wait grows longer (it's not just your request that got slower, so did everything queued ahead of you)

Put these together and the effects no longer add, they multiply, and that is the exponential-looking growth.
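A toy model of that multiplication. Everything here is invented for illustration (the queue lengths and service times are not measurements), but it shows how the two effects compound:

```go
package main

import "fmt"

// responseMs: a request's response time is its own service time plus the
// time to drain everyone queued ahead of it across the workers. When
// service time rises, the queue ahead also grows (arrivals keep coming),
// so the two effects multiply instead of add.
func responseMs(serviceMs, queuedAhead, workers float64) float64 {
	return serviceMs + (queuedAhead/workers)*serviceMs
}

func main() {
	// Hypothetical: 4 workers.
	// Fast service (5 ms) keeps the queue short (2 ahead).
	fmt.Println(responseMs(5, 2, 4)) // 7.5
	// Service slows 4x (20 ms); the queue is also ~4x longer (8 ahead),
	// so response time grows 8x (7.5 ms -> 60 ms), not 4x.
	fmt.Println(responseMs(20, 8, 4)) // 60
}
```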
Question: so how much concurrency is the ideal state?

When I first thought about it, I took it for granted that this should track the number of CPU cores, with one thread per core as the optimal solution. Reality slapped me in the face. Measured, the efficient state is generally more threads than CPU cores while the CPU is still not maxed out. Why does this happen? I think it's a big question that needs taking apart.
1. Is one thread per core really the fastest?

Good question, and the answer is both yes and no. The "yes" side: with multiple threads, switching between them also consumes resources. But then what is the point of multithreading? I think the question splits in two, and before splitting it we need two concepts: compute-intensive and IO-intensive. If your thread spends its time computing, say arithmetic, comparisons, encryption and decryption, work that leans mainly on the CPU, it's a compute-intensive thread. If your thread spends most of its time reading network data, reading local data, or driving hardware and waiting for it to return, it's IO-intensive.
1.1 Is one thread per core really fastest for compute-intensive work?

Yes. Threads themselves consume resources, and frequent switching is bad for compute-intensive threads, because it doesn't reduce the computation, it adds to it.
1.2 Is one thread per core really fastest for IO-intensive work?

No. Multiple IO-intensive threads are definitely faster than one, because most of the time the CPU has nothing to do; it's mostly just waiting. So letting the CPU drive several at once actually comes out ahead.
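A quick way to see this is to overlap simulated IO waits. Here sleeping goroutines stand in for threads blocked on the network; the timings are approximate:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// ioCall stands in for an IO-bound operation: the thread just waits
// while the CPU does essentially nothing.
func ioCall() { time.Sleep(50 * time.Millisecond) }

// sequential runs n IO calls one after another on a single goroutine.
func sequential(n int) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
		ioCall()
	}
	return time.Since(start)
}

// concurrent overlaps the same n IO calls; the CPU was idle anyway,
// so stacking the waits costs almost nothing extra.
func concurrent(n int) time.Duration {
	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); ioCall() }()
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	fmt.Println("one goroutine: ", sequential(4)) // ~200ms
	fmt.Println("four goroutines:", concurrent(4)) // ~50ms
}
```

Four overlapped waits finish in roughly the time of one, which is the whole advantage of multithreading IO-intensive work.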
2. Why is concurrency equal to the simultaneous processing count still not the optimum?

In real business scenarios a thread is rarely pure IO or pure computation; mostly it's a mix of both. For such threads, one per core is not the fastest, because the IO portions exist; each core has to juggle a few more of them to come out ahead.

There is also the problem of the server spending too long waiting for client requests. If client concurrency equals the simultaneous processing count, then each unit of concurrency runs in lockstep: the client makes a request, sends network data, the server processes, sends network data back, the client receives it, and only then starts the next round. Client and server are effectively doing single-threaded work on one shared timeline, with plenty of waiting on both sides, so the server's QPS can't climb.
The ideal state should look like this:

* The waiting queue always has data in it (so processing threads never sit idle waiting for client requests)
* A client's request is processed soon after entering the waiting queue (so response time isn't inflated by other requests making the queue ahead of it too long)
Following the multithreading conclusions above: in general each core should handle multiple threads, and the waiting queue should hold only a small amount of data.

So the best-concurrency conclusion should be: number of cores * N (threads each core runs simultaneously) + M (a few requests waiting in the queue).
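As a sketch of that rule of thumb. N and M here are placeholders you would find by load-testing your own service; the 3 and 8 below are arbitrary illustration values, not recommendations:

```go
package main

import (
	"fmt"
	"runtime"
)

// idealConcurrency: cores*N + M, where N is how many threads each core
// can usefully run (>1 for mixed IO/compute work) and M is how many
// requests you want waiting in the queue so workers never go idle.
func idealConcurrency(cores, n, m int) int {
	return cores*n + m
}

func main() {
	cores := runtime.NumCPU()
	// Hypothetical tuning values: N=3 threads per core, M=8 queued.
	fmt.Println("suggested concurrency:", idealConcurrency(cores, 3, 8))
}
```

The point is not these exact numbers but the shape of the formula: the answer scales with cores, exceeds them because of IO, and adds a small cushion for the queue.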
Digression: why does Golang claim that goroutines make better use of the CPU for higher efficiency?

My guess: for IO-heavy threads, the overhead of switching OS threads is saved and handed back to the CPU for actual computation, improving overall performance.