How an ES client reads data


client -> shard -> filesystem cache -> disk file

Optimizing query performance when searching massive data sets


If memory is large enough, the filesystem cache will hold the index data. A query served from the filesystem cache typically returns in milliseconds. A query that has to go to the disk files takes at least on the order of a second.

Suppose the index data files sit on 3 machines with 1 TB of disk capacity in total, and ES holds 1 TB of data, so each machine holds roughly 300 GB. For ES to perform at its best, the machines' memory should be able to hold at least half of the total data.

Testing in a production environment suggests it is best to store only a small amount of data in ES, namely the indexes used for search. If the memory left for the filesystem cache is 100 GB, keep the data volume within 100 GB. Almost all queried data is then served from memory, performance is very high, and almost every search returns within 1 second.
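The sizing rule above is simple arithmetic, and a small helper makes it easy to check a cluster against it. This is a minimal sketch: the function name `cache_coverage` and the 64 GB of free cache memory per machine are assumptions for illustration, not figures from the text.

```python
def cache_coverage(total_index_gb: float, cache_gb_per_node: float, nodes: int) -> float:
    """Return the fraction of index data that fits in the filesystem cache, capped at 1.0."""
    if total_index_gb <= 0:
        raise ValueError("total_index_gb must be positive")
    total_cache_gb = cache_gb_per_node * nodes
    return min(1.0, total_cache_gb / total_index_gb)


# 3 machines and about 1 TB of index data, as in the text.
# The 64 GB of cache memory per machine is a hypothetical value.
coverage = cache_coverage(total_index_gb=1024, cache_gb_per_node=64, nodes=3)
meets_half_rule = coverage >= 0.5  # the "at least half of the data in memory" guideline
print(f"coverage={coverage:.2%}, meets half rule: {meets_half_rule}")
```

With these assumed numbers the cluster falls well short of the half-in-memory guideline, so either the memory has to grow or the amount of data kept in ES has to shrink.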

Another point to note: the fields actually stored in ES should be only the fields you need to search on. Do not put every field of a record into ES. If you do, the data takes up a lot of space in the filesystem cache, many queries have to fall back to the disk files, and query performance suffers.
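One way to apply this is to strip a record down to its searchable fields before indexing it. The sketch below assumes the full record is kept in some other primary store, which the text does not specify. The helper name `split_record` and the field names are hypothetical.

```python
from typing import Any


def split_record(record: dict[str, Any], search_fields: set[str], id_field: str = "id"):
    """Return (slim document for ES, full record for the primary store).

    Only the id and the searchable fields go to ES; everything else stays out
    of the index so it does not compete for filesystem cache.
    """
    es_doc = {k: v for k, v in record.items() if k in search_fields or k == id_field}
    return es_doc, record


# Hypothetical record: only name and age are ever searched on.
record = {"id": 1, "name": "Alice", "age": 30, "bio": "long text", "address": "long text"}
es_doc, full_record = split_record(record, search_fields={"name", "age"})
print(es_doc)
```

A search then returns ids from ES, and the application looks up the remaining fields by id wherever the full record lives.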

Data preheating

A background system automatically searches for the hot data at intervals, which loads that data into the filesystem cache. When a client then queries it, the data is found directly in the filesystem cache, and performance is very high.
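A warm-up job of this kind can be sketched as a loop that issues one search per hot document. This is an illustration under assumptions: `warm_up` is a hypothetical helper, and `search` stands for whatever function your application uses to send a search body to ES. The `ids` query in the body is standard ES query DSL.

```python
from typing import Callable, Iterable


def warm_up(hot_ids: Iterable[str], search: Callable[[dict], object]) -> int:
    """Issue one search per hot id and return how many searches were sent.

    Each search makes ES read the relevant index files, which lets the
    operating system keep them in the filesystem cache.
    """
    sent = 0
    for doc_id in hot_ids:
        body = {"query": {"ids": {"values": [doc_id]}}, "size": 1}
        search(body)
        sent += 1
    return sent


# Stand-in for a real client call: it only records the request bodies.
recorded: list[dict] = []
sent = warm_up(["post-1", "post-2"], recorded.append)
print(sent, recorded[0])
```

In a real deployment the job would run on a schedule and take its list of hot ids from access statistics.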

Hot and cold separation


This is similar to hot and cold separation in MySQL. Put the large volume of infrequently accessed data into one index and the frequently queried data into another index, so that the hot data stays in the filesystem cache and query performance improves.
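The routing decision can be sketched as a function that picks an index name per document. The index names `items_hot` and `items_cold`, the `access_count` field, and the threshold of 100 are all hypothetical. How "hot" is measured depends on your own access statistics.

```python
from collections import defaultdict


def choose_index(access_count: int, hot_threshold: int = 100) -> str:
    """Pick the target index from how often a document is accessed."""
    return "items_hot" if access_count >= hot_threshold else "items_cold"


def partition_by_index(docs: list[dict]) -> dict[str, list[str]]:
    """Group document ids by the index each one should be written to."""
    groups: dict[str, list[str]] = defaultdict(list)
    for doc in docs:
        groups[choose_index(doc["access_count"])].append(doc["id"])
    return dict(groups)


docs = [
    {"id": "a", "access_count": 5000},
    {"id": "b", "access_count": 3},
    {"id": "c", "access_count": 100},
]
routed = partition_by_index(docs)
print(routed)
```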

Model design

When writing to the index, write the associated data directly into each document. Do not join at search time, because complex queries in ES are very expensive.
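Writing the associated data directly means doing the join in application code before indexing. A minimal sketch, with hypothetical order and user records and a hypothetical helper `denormalize_order`:

```python
def denormalize_order(order: dict, users_by_id: dict[int, dict]) -> dict:
    """Copy the user fields needed for search into the order document.

    The join happens once, at write time, so searches never need one.
    """
    user = users_by_id[order["user_id"]]
    return {**order, "user_name": user["name"], "user_city": user["city"]}


users_by_id = {7: {"name": "Alice", "city": "Hangzhou"}}
order = {"order_id": 1001, "user_id": 7, "amount": 25}
order_doc = denormalize_order(order, users_by_id)
print(order_doc)
```

The cost is duplicated data: if a user's name changes, the affected order documents have to be re-indexed.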

Paging query

ES is distributed. To fetch page 100 with 10 items per page, a batch of data has to be fetched from every shard, then merged and paged in memory. The deeper the page, the worse the query performance.
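The cost of deep paging can be put into numbers. With from/size paging, each shard has to return its top `from + size` hits so the coordinating node can merge them. The helper `docs_collected` and the shard count of 5 below are assumptions for illustration.

```python
def docs_collected(page: int, page_size: int, shards: int) -> int:
    """Number of hits the coordinating node gathers for one from/size page."""
    if page < 1 or page_size < 1 or shards < 1:
        raise ValueError("page, page_size and shards must all be >= 1")
    offset = (page - 1) * page_size          # the "from" value of the request
    return shards * (offset + page_size)     # every shard returns its top from + size hits


# Page 100 with 10 items per page, on a hypothetical index with 5 shards.
deep = docs_collected(page=100, page_size=10, shards=5)
first = docs_collected(page=1, page_size=10, shards=5)
print(deep, first)
```

Only 10 of the collected hits are returned to the client. The rest are sorted and discarded, which is why the cost grows with page depth.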

Optimization strategies:

1. Deep paging is not allowed

2. For pull-down style paging, you can use the scroll API. It works by generating a snapshot once and then scrolling through it with a cursor, one page at a time. No matter how many pages there are, each fetch takes milliseconds. scroll can only move forward page by page, which naturally suits a feed like Weibo, where the user keeps pulling down for more.
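The scroll flow described above can be sketched as a generator. The request paths and bodies follow the ES scroll API (`_search?scroll=`, `_search/scroll`, `_scroll_id`). The `send` transport function, the index name `weibo_posts`, and the in-memory `make_fake_send` used to run the example are hypothetical.

```python
from typing import Callable, Iterator

Send = Callable[[str, str, dict], dict]  # (method, path, body) -> parsed JSON response


def scroll_pages(send: Send, index: str, page_size: int = 10, keep_alive: str = "1m") -> Iterator[list]:
    """Yield one page of hits at a time until the scroll is exhausted."""
    # The first request takes the snapshot and returns the first page plus a cursor.
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": page_size, "query": {"match_all": {}}})
    scroll_id = resp["_scroll_id"]
    try:
        while resp["hits"]["hits"]:
            yield resp["hits"]["hits"]
            # Each later request only moves the cursor forward by one page.
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = resp["_scroll_id"]
    finally:
        # Release the snapshot on the server once we are done.
        send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})


def make_fake_send(docs: list, page_size: int) -> Send:
    """In-memory stand-in for a real HTTP transport, for demonstration only."""
    state = {"pos": 0}

    def send(method: str, path: str, body: dict) -> dict:
        if method == "DELETE":
            return {"succeeded": True}
        start = state["pos"]
        state["pos"] = start + page_size
        return {"_scroll_id": "fake-id", "hits": {"hits": docs[start:start + page_size]}}

    return send


pages = list(scroll_pages(make_fake_send(list(range(25)), 10), "weibo_posts"))
print([len(p) for p in pages])
```

There is no way to jump to an arbitrary page with this pattern: the cursor only moves forward, which is the trade-off for the cheap per-page cost.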

©2019-2020 Toolsou All rights reserved.