How an ES client reads data
client -> shard -> filesystem cache -> disk file
Optimizing query performance for massive data retrieval
If memory is large enough, the filesystem cache holds the index data. A query served from filesystem cache completes in milliseconds; a query that has to go to the disk files takes seconds at the very least.
Suppose the index data files are spread across 3 machines and occupy 1T of disk in total: ES holds 1T of data, roughly 300G per machine. For ES to perform at its best, the machines' memory should be able to hold at least half of the total data.
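The sizing rule above can be checked with a little arithmetic. The following is only a rough sketch: the function name and the figure of 64G of cache memory per machine are assumptions for illustration, not numbers from the setup described here.

```python
def cache_coverage(total_data_gb, machines, cache_gb_per_machine):
    """Return the fraction of index data that fits in filesystem cache.

    cache_gb_per_machine is the memory left for the OS filesystem cache on
    each machine (total RAM minus the ES JVM heap and other processes).
    """
    total_cache_gb = machines * cache_gb_per_machine
    return min(1.0, total_cache_gb / total_data_gb)


# Hypothetical numbers: 1T (1024G) of index data on 3 machines,
# each with 64G of memory left for the filesystem cache.
coverage = cache_coverage(1024, 3, 64)
print(f"{coverage:.0%} of the index data fits in cache")
meets_rule = coverage >= 0.5  # the "at least half" rule of thumb
```

With these assumed numbers the cluster falls well short of the rule, so most queries would end up reading from disk.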
In production, it is best to store only a small amount of data in ES: the indexes you actually search on. If the memory left for filesystem cache is 100G, keep the data volume within 100G. Then almost every query is served from memory, performance is very high, and almost all searches return within 1 second.
One more thing to note: the fields stored in ES should be only the fields you need to query. Do not put every field of a record into ES. If you do, the data takes up a lot of filesystem cache space on your machines, many records can only be read from the disk files, and query performance suffers.
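A minimal sketch of this idea: split each record into the fields that are searched and everything else. The field names are hypothetical, and where the remaining fields are kept is not specified here, so the sketch simply returns them to the caller.

```python
# Hypothetical field names; only these are searched, so only these go to ES.
SEARCH_FIELDS = {"id", "name", "age"}


def split_record(record):
    """Split a full record into the part indexed in ES and the rest."""
    es_doc = {k: v for k, v in record.items() if k in SEARCH_FIELDS}
    other = {k: v for k, v in record.items() if k not in SEARCH_FIELDS}
    # Keep the id on both sides so the full record can be looked up by id
    # after ES returns the matching ids.
    if "id" in record:
        other["id"] = record["id"]
    return es_doc, other
```

The smaller the document written to ES, the more documents fit in the same amount of filesystem cache.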
A background system can automatically search for the hot data at regular intervals, which loads that data into filesystem cache. When a client then queries it, the data is found directly in filesystem cache and performance is very high.
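The warming job described above might look roughly like the sketch below. The `search` callable is an assumption: it stands for whatever function sends one query to ES, and it is injected so the sketch does not depend on a particular client library.

```python
import time


def warm_cache(search, hot_queries, rounds=1, pause_seconds=0.0):
    """Run each hot query so the data it touches is pulled into filesystem cache.

    search is any callable that sends one query to ES and returns its response.
    Returns the number of queries issued.
    """
    issued = 0
    for _ in range(rounds):
        for query in hot_queries:
            search(query)  # the result is discarded; reading the data is the point
            issued += 1
        time.sleep(pause_seconds)
    return issued
```

How the list of hot queries is chosen (access logs, business rules, and so on) is left open here.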
Hot and cold separation
This is similar to hot/cold separation in MySQL. Put the large volume of infrequently accessed data into one index and the frequently queried data into another, which improves query performance.
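One way to sketch the routing decision at write time is shown below. The index names and the 30-day window are hypothetical, and using "recently accessed" as a stand-in for "frequently accessed" is an assumption of this example.

```python
from datetime import date, timedelta

# Hypothetical index names and threshold.
HOT_INDEX = "orders_hot"
COLD_INDEX = "orders_cold"
HOT_WINDOW = timedelta(days=30)


def choose_index(last_accessed, today):
    """Route a document to the hot or cold index by how recently it was accessed."""
    return HOT_INDEX if today - last_accessed <= HOT_WINDOW else COLD_INDEX
```

Keeping the hot index small means it is more likely to stay entirely in filesystem cache.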
When writing to an index, write the associated data directly into the document instead of joining at search time, because complex queries in ES are very expensive.
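As an illustration, the associated data can be merged into the document before it is indexed. The order/user shape and the field names below are hypothetical.

```python
def denormalize(order, users_by_id):
    """Return an order document with the associated user data embedded."""
    user = users_by_id[order["user_id"]]
    doc = dict(order)  # copy so the input is left unchanged
    doc["user_name"] = user["name"]
    doc["user_city"] = user["city"]
    return doc
```

A search on `user_name` then needs no join, at the cost of rewriting the affected documents when the user data changes.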
Because ES is distributed, fetching page 100 at 10 records per page means pulling a batch of data from every shard (each shard has to return its first 1000 records, not just 10), then merging and paging it in memory. The deeper the page, the worse the query performance.
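The cost of deep paging can be seen in a small simulation. This is not ES code: plain sorted lists stand in for shards, and the shard count and sizes are made up for the example.

```python
import heapq


def fetch_page(shards, page, size):
    """Simulate how a coordinating node serves one page across shards.

    shards is a list of lists, each sorted ascending by sort key.
    Returns (page_items, number_of_items_pulled_from_shards).
    """
    start = (page - 1) * size
    end = start + size
    # A shard cannot know which of its items land on the requested page,
    # so it must return its first `end` items.
    per_shard = [shard[:end] for shard in shards]
    pulled = sum(len(batch) for batch in per_shard)
    # The coordinating node merges the sorted batches and keeps one page.
    merged = list(heapq.merge(*per_shard))
    return merged[start:end], pulled


# 5 shards of 2000 items each; values are interleaved so every shard matters.
shards = [list(range(s, 10000, 5)) for s in range(5)]
items, pulled = fetch_page(shards, page=100, size=10)
```

The number of items pulled grows with both the page depth and the number of shards, even though only 10 items are returned to the client.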
Optimization strategies:
1. Do not allow deep paging.
2. For pull-down style paging, use the scroll API. It generates a snapshot once and then moves a cursor through it one batch at a time, so performance stays in the millisecond range no matter how many pages are read. scroll only moves forward one page at a time, which makes it a natural fit for feeds such as Weibo, where the user keeps pulling down for more.
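A scroll loop can be sketched as follows. The request paths and the `_scroll_id` / `hits.hits` response fields follow the ES scroll API; the `send(method, path, body)` transport callable is hypothetical and stands for whatever HTTP client is in use.

```python
def scroll_all(send, index, query, page_size=100, keep_alive="1m"):
    """Yield every hit for a query using the scroll API.

    send(method, path, body) is a hypothetical transport callable that
    performs the HTTP request against ES and returns the parsed JSON response.
    """
    # The first request creates the snapshot and returns the first batch.
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": page_size, "query": query})
    while resp["hits"]["hits"]:
        yield from resp["hits"]["hits"]
        # Each follow-up request advances the cursor by one batch.
        resp = send("POST", "/_search/scroll",
                    {"scroll": keep_alive, "scroll_id": resp["_scroll_id"]})
    # Release the snapshot once the last (empty) batch has been seen.
    send("DELETE", "/_search/scroll", {"scroll_id": resp["_scroll_id"]})
```

Because the cursor only moves forward, this pattern suits "load more" feeds and does not support jumping straight to an arbitrary page.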