We usually use Redis as a cache to absorb high concurrency, or for certain specific business scenarios, as covered earlier in the posts on the usage scenarios of the Redis data types and on Redis Sentinel and Cluster.

This post collects the problems that come up when Redis is used as a cache, together with ways to mitigate them.


The basic flow is simple: a request first tries the cache, and if the cache has no entry, it goes to the back-end database.
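The read path described above can be sketched as follows. This is a minimal illustration, not a production client: the in-memory map stands in for Redis, and `CacheAsideReader`, `dbLoader`, and `dbHits` are hypothetical names introduced only for this example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside read path. The map stands in for Redis and the loader for the database.
final class CacheAsideReader {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> dbLoader; // returns null when the row does not exist
    int dbHits = 0; // demo counter of database queries (not thread-safe)

    CacheAsideReader(Function<String, String> dbLoader) {
        this.dbLoader = dbLoader;
    }

    Optional<String> get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return Optional.of(cached);       // cache hit
        }
        dbHits++;
        String fromDb = dbLoader.apply(key);  // cache miss: fall back to the database
        if (fromDb != null) {
            cache.put(key, fromDb);           // populate the cache for later requests
        }
        return Optional.ofNullable(fromDb);
    }
}
```

Note that a key missing from the database is never cached here, so every lookup for it reaches the database.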

1. Cache penetration

Cache penetration means querying data that does not exist at all. The cache has no entry for it, so every such request goes through to the storage layer behind the cache. If a large number of malicious requests of this kind arrive, they all land on the storage layer, which cannot bear that pressure. The cache then no longer serves its purpose of protecting the storage behind it.

Solutions:

  1. Cache empty objects

One way to handle cache penetration is to cache empty objects: the first time a key is found in neither the cache nor the DB, put an empty object in the cache for it. But if a flood of malicious requests arrives, this makes the number of cached keys grow sharply, so it is clearly not a good solution on its own.
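A minimal sketch of caching empty objects, again with an in-memory map in place of Redis. The sentinel value `EMPTY` and the class name `NullCachingReader` are assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Caches a sentinel for keys that are absent from the database.
final class NullCachingReader {
    static final String EMPTY = "__EMPTY__"; // hypothetical sentinel meaning "not in the DB"
    final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> dbLoader; // returns null when the row does not exist
    int dbHits = 0; // demo counter of database queries (not thread-safe)

    NullCachingReader(Function<String, String> dbLoader) {
        this.dbLoader = dbLoader;
    }

    String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return EMPTY.equals(cached) ? null : cached; // sentinel hit: known to be absent
        }
        dbHits++;
        String fromDb = dbLoader.apply(key);
        cache.put(key, fromDb == null ? EMPTY : fromDb); // cache the miss as well
        return fromDb;
    }
}
```

Repeated lookups for the same missing key now stop at the cache, but every distinct missing key adds one more cache entry, which is the key growth described above.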

  2. Bloom filter

A Bloom filter can generally screen out data that does not exist, so those requests never reach the back end. When a Bloom filter says a value exists, the value may in fact not exist; but when it says a value does not exist, it definitely does not exist. A Bloom filter consists of a large bit array and several different unbiased hash functions. Unbiased means that the hash values are spread evenly. When a key is added to the Bloom filter, each hash function hashes the key to an integer, which is then taken modulo the length of the bit array to get a position; each hash function generally yields a different position. Setting all of these positions in the bit array to 1 completes the add operation.

Asking the Bloom filter whether a key exists works the same way as add: compute all the hash positions and check whether every one of them in the bit array is 1. If any bit is 0, the key is definitely not in the Bloom filter. If they are all 1, that does not prove the key exists, only that it may exist, because those bits could have been set to 1 by other keys.
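The add and lookup steps above can be sketched with a plain `java.util.BitSet`. This is a teaching sketch under stated assumptions, not how any particular library implements it: the seeded FNV-1a style hash and the class name `SimpleBloomFilter` are choices made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

final class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;       // length of the bit array
    private final int hashCount;  // number of hash functions

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Seeded FNV-1a style hash; each seed plays the role of a different hash function.
    private int position(String key, int seed) {
        int h = 0x811C9DC5 ^ (seed * 0x9E3779B9);
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return Math.floorMod(h, size); // modulo the bit-array length
    }

    void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(key, i)); // set every computed position to 1
        }
    }

    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(key, i))) {
                return false; // a single 0 bit: the key was definitely never added
            }
        }
        return true; // all bits are 1: the key may have been added
    }
}
```

A key that was added always reports true; a key that was never added usually reports false, but can report true when other keys happen to have set the same bits.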


Using Guava's Bloom filter. Add the dependency:

<dependency> <groupId>com.google.guava</groupId> <artifactId>guava</artifactId> </dependency>

Pseudo code :
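import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

public void bloomFilterTest() {
    BloomFilter<CharSequence> bloomFilter = BloomFilter.create(
            Funnels.stringFunnel(StandardCharsets.UTF_8),
            1000,    // expected number of entries
            0.001);  // acceptable false-positive rate

    // Add the known keys to the Bloom filter
    String[] keys = new String[1000];
    for (int i = 0; i < keys.length; i++) {
        keys[i] = "key" + i; // placeholder keys; put(null) would throw
        bloomFilter.put(keys[i]);
    }

    String key = "key";
    boolean exist = bloomFilter.mightContain(key);
    if (!exist) {
        return; // definitely absent: skip the cache and the database
    }
    // TODO: query the cache only when the key may exist
}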
Stepping into the implementation, you can see that several hash values are computed for each key.

Redisson also provides a Bloom filter implementation.



2. Cache invalidation

Cache invalidation here means a large number of keys expiring at the same time, so that a flood of requests hits the database simultaneously and puts it under heavy pressure, possibly bringing it down. When writing to the cache in bulk, set each timeout to a fixed time plus a random time, so that the expirations are staggered.
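A minimal sketch of the "fixed time plus random time" idea. The class and method names are hypothetical, and the base and jitter values used with it are only examples.

```java
import java.util.concurrent.ThreadLocalRandom;

final class TtlJitter {
    // Returns baseSeconds plus a random offset in [0, maxJitterSeconds].
    static long ttlSeconds(long baseSeconds, long maxJitterSeconds) {
        return baseSeconds + ThreadLocalRandom.current().nextLong(maxJitterSeconds + 1);
    }
}
```

For instance, `TtlJitter.ttlSeconds(3600, 300)` yields an expiry between 60 and 65 minutes, which would then be passed as the expiry when each key is written, so that keys loaded in the same batch do not all expire in the same second.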

3. Cache avalanche

A cache avalanche means that once the cache layer goes down, all requests go to the database. The database cannot cope and may go down too, the services that depend on it then fail, and the upstream callers are affected as well. It is a cascading failure: like an avalanche, it starts small, keeps growing, and ends up crashing the whole service.

Solutions:

  1. Keep the cache layer highly available, for example with Redis Sentinel or Redis Cluster.

  2. Apply rate limiting, circuit breaking, and degradation between dependent services, for example with Hystrix or Alibaba Sentinel.

4. Cache consistency

Once a cache is introduced, the next problem is that when data in the DB is updated, the cached data becomes inconsistent with the DB. So should we update the cache first, or the DB first?

If you update the cache first and the DB update then fails, later requests read cached data that does not match the DB, and the DB is what we ultimately treat as authoritative.

If you update the DB first and then the cache, requests that read while the DB update is in progress get data that is not up to date.

Evict the cache, update the DB, then refresh the cache: while the DB is being updated the cache holds no data, so requests go to the DB. Under concurrency, several requests may hit the DB at the same time, so a lock is needed.

Lock, evict the cache, update the DB, then refresh the cache: this order is relatively safe.
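The lock, evict, update, refresh order can be sketched as follows. This is only an illustration under simplifying assumptions: both stores are in-memory maps, and the lock is an in-process `ReentrantLock`, which would have to be replaced by a distributed lock when several application instances share the cache. All names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

final class ConsistentWriter {
    final Map<String, String> cache = new ConcurrentHashMap<>(); // stand-in for Redis
    final Map<String, String> db = new ConcurrentHashMap<>();    // stand-in for the database
    private final ReentrantLock lock = new ReentrantLock();      // in-process lock only

    void update(String key, String value) {
        lock.lock();                 // 1. lock
        try {
            cache.remove(key);       // 2. evict the cached entry
            db.put(key, value);      // 3. update the DB
            cache.put(key, value);   // 4. refresh the cache with the new value
        } finally {
            lock.unlock();
        }
    }

    String read(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        lock.lock();                 // readers that miss wait for an in-flight update
        try {
            String value = db.get(key);
            if (value != null) {
                cache.put(key, value);
            }
            return value;
        } finally {
            lock.unlock();
        }
    }
}
```

Taking the lock on the read-miss path is one possible design choice: it keeps a reader from repopulating the cache with a stale value while an update is between steps 2 and 4, at the cost of serializing misses.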

5. The bigkey problem

What is a bigkey? In Redis, a string can be at most 512 MB; a hash, list, set, or zset can hold up to 2^32 - 1 elements.

As a rule of thumb, a string should not exceed 10 KB, and the other types should not hold more than 5000 elements.

You can use src/redis-cli --bigkeys
to look for bigkeys. I set a string of a little over 30 K and looked at the scan result: it found one bigkey of type string, 4084 bytes.



What harm do bigkeys cause? First, deleting one blocks other requests. A bigkey may cause no trouble in normal use, but if it has an expiration time, its deletion on expiry may block other requests. From Redis 4.0 onward, the lazyfree-lazy-* options can be set to yes (lazyfree-lazy-expire covers deletion on expiry) so that deletion happens asynchronously. Second, network congestion. For example, if one key holds 1 MB of data and it is read with a concurrency of 1000, that produces 1000 MB of traffic. On gigabit Ethernet, whose peak rate is roughly 125 MB/s, the link cannot sustain that concurrency, and the key also takes up bandwidth that other traffic needs.

For very large lists, sets, and the like, we can split the data and store it under a series of keys. In a Redis Cluster these keys are naturally distributed across different master nodes. On a single instance, you can implement a routing algorithm yourself that decides which key in the series to use.
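One simple routing scheme is to hash the member and take it modulo the number of shards. The sketch below assumes a `baseKey:shardIndex` naming convention, which is a hypothetical choice for this example.

```java
final class BigKeySharding {
    // Maps a member of one large logical collection to one of shardCount smaller keys.
    static String shardKey(String baseKey, String member, int shardCount) {
        int shard = Math.floorMod(member.hashCode(), shardCount); // always in [0, shardCount)
        return baseKey + ":" + shard;
    }
}
```

Reads and writes for a given member always go to the same small key, for example `BigKeySharding.shardKey("followers:42", userId, 16)`. Changing the shard count later moves members between keys, so it is worth choosing the count up front.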

6. Client usage

  1. Avoid having multiple services share one Redis instance. If they do, look at how the business can be split, and expose the shared data as a service of its own.

  2. Use a connection pool. It bounds the number of connections and also improves efficiency. Important pool parameters:

    1 maxActive: maximum number of connections in the pool. Default: 8.

    2 maxIdle: maximum number of idle connections allowed in the pool. Default: 8.

    3 minIdle: minimum number of idle connections the pool guarantees. Default: 0.

    4 blockWhenExhausted: whether callers wait when the pool is exhausted. maxWaitMillis below only takes effect when this is true. Default: true. Keeping the default is recommended.

    5 maxWaitMillis: maximum time a caller waits when the pool is exhausted, in milliseconds; -1 means never time out. Default: -1. Keeping the default is not recommended.

    6 testOnBorrow: validate the connection (ping) when borrowing it from the pool; invalid connections are removed. Default: false. When traffic is heavy, set it to false to save the cost of an extra ping.

    7 testOnReturn: validate the connection (ping) when returning it to the pool; invalid connections are removed. Default: false. When traffic is heavy, set it to false to save the cost of an extra ping.

    8 jmxEnabled: whether to enable JMX monitoring. Default: true. Enabling it is recommended, but JMX must also be enabled in the application itself.

   The first three parameters matter most, so they are discussed separately:

     Maximum connections, maxActive:

       Set it by weighing the concurrency the business expects, the time the client spends per command, and the resource limits of Redis: number of application instances (how many nodes are deployed) * maxActive <= maxclients (the maximum number of connections Redis accepts, set in the Redis configuration), and so on.

     For example, if one command takes 2 ms on the client, a single connection can serve 500 QPS. If the business expects 3000 QPS, the theoretical pool size is 3000 / 500 = 6. In practice other factors come into play, so the value is usually set slightly above the theoretical one. But bigger is not always better. On the one hand, too many connections consume client and server resources. On the other hand, for a high-QPS server like Redis, one large blocking command stalls requests no matter how large the pool is.
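The sizing arithmetic can be written down as a small helper. With 2 ms per command, one connection serves 1000 / 2 = 500 QPS, so 3000 QPS needs 3000 / 500 = 6 connections in theory. The class and method names below are hypothetical, and the second method simply restates the constraint instances * maxActive <= maxclients.

```java
final class PoolSizing {
    // Theoretical pool size = expected QPS / QPS that a single connection can serve.
    static int theoreticalPoolSize(int expectedQps, double avgCommandMillis) {
        double qpsPerConnection = 1000.0 / avgCommandMillis;
        return (int) Math.ceil(expectedQps / qpsPerConnection);
    }

    // Largest maxActive per instance that keeps all instances within Redis maxclients.
    static int maxActiveLimit(int redisMaxClients, int appInstances) {
        return redisMaxClients / appInstances;
    }
}
```

The result of `theoreticalPoolSize` is a lower bound to round up from, while `maxActiveLimit` is the ceiling that the chosen value must stay under.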

     Maximum idle connections, maxIdle:

      maxIdle is in effect the maximum number of connections the business needs: idle connections are created ahead of time and kept ready, so an incoming request can use one directly. maxActive caps the total. So maxIdle should not be set too small; otherwise, when the idle connections run out, new ones have to be created, which adds overhead. The best setting is maxActive = maxIdle, which avoids the performance jitter caused by the pool growing. But if concurrency is low or maxActive is set too high, connection resources are wasted. The general recommendation is to set maxIdle to the theoretical connection count calculated from the expected QPS as above, and to make maxActive somewhat larger.

     Minimum idle connections, minIdle:

       The minimum number of idle connections to keep. While the pool is in use, if the number of connections exceeds minIdle, new connections continue to be created as needed; once it exceeds maxIdle, connections that finish their work are removed from the pool and released.

  3. Cache preheating

     For example, when a flash sale launches, there will be a flood of requests from the very start. The data can be preheated in advance: initialize the connection pool and load the data into the cache ahead of time.
