Click blue above “ Code ape technology column ”, choice “ Set as star ”

One , theoretical analysis

Two , practical application

To speed up program processing , We will decompose the problem into several concurrent tasks . And create a thread pool , Delegate tasks to threads in the thread pool , So that they can execute concurrently . Using thread pool under high concurrency , It can effectively reduce the time and resource cost of thread creation and release , If thread pool is not used , It may cause the system to create a large number of threads and consume the system memory “ Over switching ”( stay JVM The processing mechanism used in is time slice rotation , Reduces thread to thread switching )

But we have a big problem , That is, we want to create as many tasks as possible , But because of the limited resources, we can't create too many threads . In the case of high concurrency , How can we choose the optimal number of threads ? What is the principle of choice ?

One , theoretical analysis

How to calculate the number of concurrent threads , There are two ways of saying it .

The first ,《Java Concurrency in Practice》 Namely 《java Concurrent programming practice 》8.2 section 170 page

For computing intensive tasks , One has Ncpu A system of processors usually uses a Ncpu +
1 Thread pool of threads for optimal utilization ( Compute intensive threads are suspended at exactly the right time due to a page error or for some other reason , There's just one “ additional ” Thread of , Can ensure that in this case CPU Cycle does not interrupt work ).

For the
I/O And other blocking tasks , Not all threads are scheduled at all times , So you need a bigger pool . To set the length of the thread pool correctly , You have to estimate the ratio of the time the task spends waiting to the time it takes to calculate ; This estimate doesn't have to be very accurate , And it can be obtained through some monitoring tools . You can also choose another way to adjust the size of the thread pool , Under a reference load , use
Several thread pools of different sizes run your application , And observe CPU Utilization level .

Given the following definitions :
Ncpu = CPU Number of Ucpu =  target CPU Usage rate of , 0 <= Ucpu <= 1 W/C =  Ratio of waiting time to calculation time
In order to maintain the expected utilization rate of the processor , The optimal pool size is equal to : Nthreads = Ncpu x Ucpu x (1 + W/C)
You can use Runtime To get CPU Number of :
int N_CPUS = Runtime.getRuntime().availableProcessors();

of course ,CPU Cycles are not the only resources you can use for thread pool management . Other resources that can constrain resource pool size include : Memory , File handle , Socket handle and database connection, etc . Calculating the size constraints for these types of resource pools is very simple : First, the total number of resources required for each task is accumulated , Then divide by the total amount available . The result is the maximum pool size .

When tasks need to use pooled resources , For example, database connection , The length of thread pool and resource pool will affect each other . If each task requires a database connection , The size of the connection pool limits the effective size of the thread pool ; Similarly , When the task in the thread pool is the only consumer of the connection pool , The size of thread pool will limit the effective size of connection pool .

As above , stay 《Java Concurrency in Practice》 In a Book , The formula for estimating the size of thread pool is given :
Nthreads = Ncpu x Ucpu x (1 + W/C), among Ncpu = CPU Number of cores Ucpu = CPU Utilization rate ,0~1
W/C =  Ratio of waiting time to calculation time
The second ,《Programming Concurrency on the JVM Mastering》 Namely 《Java Concurrent programming of virtual machine 》2.1 section 12 page

To solve the above problems , We want to create at least as many threads as there are processor cores . This ensures that as many processor cores as possible can be devoted to problem solving . Through the following code , We can easily get the number of processor cores available in the system :

therefore , The minimum number of threads for an application should be equal to the number of processor cores available . If all tasks are computationally intensive , So many threads can be created . under these circumstances , Creating more threads is not good for program performance . Because when more than one Qianwu is ready , Processor core needs frequent context switching between processes , And this kind of switch has a great loss on program performance . But if the tasks are all IO Intensive , Then we need to open more threads to improve performance .

When a task is executed IO During operation , Its thread will be blocked , The processor can then immediately context switch to handle other ready threads . If we only have as many threads as the number of cores available to the processor , Even the tasks to be performed cannot be handled , Because we can't get more threads for the processor to schedule .

If the task has 50% Time of is blocked , The number of threads required by the program is twice the number of cores available to the processor . If the task is blocked for less than 50%, That is, these tasks are computationally intensive , The number of threads required by the program will be reduced , But at least not less than the number of cores of the processor . If the task is blocked longer than the execution time , That is, the task is IO Intensive , We need to create threads several times larger than the number of processor cores . We can calculate the total number of threads needed by the program , The summary is as follows :

Number of threads = CPU Number of cores available /(1 - Blocking coefficient ), The value of blocking coefficient is 0 and 1 between .

The blocking coefficient for computing intensive tasks is 0, and IO The blocking coefficient of intensive tasks is close to 1. A completely blocked task is doomed to die , So we don't have to worry about the blocking coefficient reaching 1.

To better determine the number of threads required by the program , We need to know the following two key parameters :

Number of cores available for processor ;

Blocking factor of task ;

The first parameter is easy to determine , We can even find this value at runtime using the previous method . But it's a little more difficult to determine the blocking coefficient . We can try to guess first , Or use some performance analysis tools or
API To determine the threads spent on the system IO Operation time and CPU Ratio of time spent on intensive tasks . As above , stay 《Programming Concurrency on the JVM
Mastering》 In a Book , The formula for estimating the size of thread pool is given :

Number of threads = Ncpu /(1 - Blocking coefficient )

For statement 1 , hypothesis CPU 100% work , I.e. leave it alone CPU Utilization factor , Number of threads = Ncpu x (1 + W/C).

Now let's assume that the formula of method 2 is equal to that of method 1 , Namely Ncpu /(1 - Blocking coefficient )= Ncpu x (1 + W/C), Derivation : Blocking coefficient = W / (W +
C), Blocking coefficient = Blocking time /( Blocking time + computing time ), This conclusion is confirmed in the follow-up of method 2 , as follows :

Because of the Web Service requests spend most of their time waiting for the server to respond , So the blocking coefficient will be quite high , Therefore, the number of threads a program needs to open may be several times the number of processor cores . Suppose the blocking coefficient is 0.9, Every task 90% Time is blocked and only 10% I'm working , On a dual core processor, we need to turn on the 20 Threads ( Use the 2.1 Formula calculation of section ). If there are a lot of stocks to deal with , We can 8 On the core processor 80 Threads to process the task .

thus it can be seen , Saying one and saying two are actually a formula .

Two , practical application

How to set the number of concurrent threads in actual use ? Let's look at a topic first :

Suppose a systematic TPS(Transaction Per Second perhaps Task Per
Second) At least 20, Then suppose that each Transaction Done by one thread , Continue to assume that on average, each thread processes one Transaction At 4s. So the problem turns into :

How to design thread pool size , Make it possible to 1s Internal treatment completed 20 individual Transaction?

The calculation process is very simple , The processing power of each thread is 0.25TPS, So to achieve 20TPS, Obviously 20/0.25=80 Threads .

This theory holds , But in reality , The fastest part of a system is CPU, So the upper limit of system throughput is determined by CPU. enhance CPU processing capacity , It can increase the upper limit of system throughput . In consideration, we need to CPU Throughput plus .

The analysis is as follows ( Let's take formula one as an example ):

Nthreads = Ncpu x (1 + W/C)

That is, the higher the percentage of thread waiting time , More threads needed . thread CPU The higher the proportion of time , Fewer threads required . This can be divided into two types of tasks :

IO Intensive Normally , If present IO, So sure W/C >
1( The blocking time is usually many times the computing time ), But we need to consider the limited memory of the system ( Every thread needs memory space ), We need to test how many threads are suitable on the server (CPU Proportion , Number of threads , Total time , Memory consumption ). If you don't want to test , Conservative point 1 that will do ,Nthreads
= Ncpu x (1 + 1) = 2Ncpu. This setting is generally OK.

Computing intensive Assuming no waiting W = 0, be W/C = 0.Nthreads = Ncpu.

According to the short plate effect , Real system throughput cannot be based on CPU To calculate . That will increase system throughput , You need to “ System short board ”( Like network latency ,IO) to start :

Try to improve the parallelism ratio of short board operation , For example, multi thread download technology ;

Enhance the short board ability , For example NIO replace IO;

First, you can contact Amdahl Law , This law defines the formula for calculating the acceleration ratio of a serial system after parallelization : Acceleration ratio = System time before optimization / System time after optimization
The greater the acceleration ratio , It shows that the optimization effect of system parallelization is better .Addahl The law also gives the parallelism of the system ,CPU Relationship between number and acceleration ratio , The acceleration ratio is Speedup, System serialization ratio ( The ratio of serial execution codes ) by F,CPU The number is N:Speedup
<= 1 / (F + (1-F)/N)

When N When big enough , Serialization ratio F The smaller , Acceleration ratio Speedup The bigger .

At this time, the question of whether thread pool is more efficient than thread pool is raised ?

The answer is no , such as Redis It's single threaded , But it's very efficient , Basic operations can reach 100000 /s. From the perspective of threads , Part of the reason is :

Multithreading brings thread context switching overhead , There is no such overhead for a single thread ;

lock ;

of course “Redis Soon ” The more essential reason is :

Redis It's basically memory operations , In this case, single thread can be used efficiently CPU. The multithreading scenario is generally : There is a considerable proportion of IO And network operations .

in general , Different applications , Take multithreading / Different single thread strategies ; In the case of thread pool , Different estimates , The purpose and the starting point are the same .

So far, the conclusion is :

IO Intensive = 2Ncpu( You can control the size after testing ,2Ncpu Generally no problem )( Common in threads : Database data interaction , File upload and download , Network data transmission, etc )

Computing intensive = Ncpu( Common in threads : Complex algorithm )

Of course, there is another saying in the first one :

For computing intensive tasks , One has Ncpu A system of processors usually uses a Ncpu +
1 Thread pool of threads for optimal utilization ( Compute intensive threads are suspended at exactly the right time due to a page error or for some other reason , There's just one “ additional ” Thread of , Can ensure that in this case CPU Cycle does not interrupt work ).

Namely , Computing intensive = Ncpu + 1, But this led to one more CPU Is context switching worth it , Not considered here . Readers can consider it for themselves .

There are hot recommendations ???? Previous recommendation One HTTP The tortuous experience of the request :Redis Distributed lock , Is it really perfect ?Nginx
Why is it so fast that it can't stop ? interview :HashMap Life taking 21 questions !JAVA Online troubleshooting complete set ! Cattle breaking ! Introduction to one article jvm virtual machine If the article helps , Looking , Forward it .
Thank you for your support (*^__^*)

©2019-2020 Toolsou All rights reserved,
Solve in servlet The Chinese output in is a question mark C String function and character function in language MySQL management 35 A small coup optimization Java performance —— Concise article Seven sorting algorithms (java code ) use Ansible Batch deployment SSH Password free login to remote host according to excel generate create Build table SQL sentence Spring Source code series ( sixteen )Spring merge BeanDefinition Principle of Virtual machine installation Linux course What are the common exception classes ?