1. Rational use of index
Index is an important data structure in database , Its basic purpose is to improve the query efficiency . Now most database products adopt IBM First proposed ISAM index structure .
Use the index just right , Its use principle is as follows ：
* Connect frequently , However, there is no index on the column specified as a foreign key , Fields that are not frequently joined are automatically indexed by the optimizer .
* To sort or group frequently （ Immediately group by or order by operation ） Index on the column of .
In the conditional expression often used in the different values of the column to establish a search , Do not index columns with fewer values . For example, in the employee table “ Gender ” There are only “ male ” And “ female ” Two different values , So there's no need to index . If the index is built, it will not improve the query efficiency , It will seriously slow down the update speed .
* If there are more than one column to sort , Composite indexes can be built on these columns （compound index）.
* Using system tools . as Informix The database has one tbcheck tool , You can check on suspicious indexes . On some database servers , The index may be invalid or due to frequent operations
It reduces the reading efficiency , If a query using an index slows down , You can try it tbcheck The tool checks the integrity of the index , Repair if necessary . in addition , When a large number of database tables are updated
After data , Deleting and rebuilding indexes can improve query speed .
(1) In the next two select Statement :
SELECT * FROM table1 WHERE field1<=10000 AND field1>=0;
SELECT * FROM table1 WHERE field1>=0 AND field1<=10000;
If the data in the data table field1 all >=0, The first one select The statement is better than the second one select The sentence efficiency is much higher , Because of the second one select The first condition of the statement consumes a lot of system resources .
First principle ： stay where The most restrictive condition should be put first in the clause .
(2) Below select Statement :
SELECT * FROM tab WHERE a=… AND b=… AND c=…;
If there is an index index(a,b,c), be where The order of the fields in the clause should be consistent with the order of the fields in the index .
The second principle ：where The order of the fields in the clause should be consistent with the order of the fields in the index .
The following assumptions are field1 There is a unique index on I1, stay field2 Non unique index on I2.
(3) SELECT field3,field4 FROM tb WHERE field1='sdf' fast
SELECT * FROM tb WHERE field1='sdf' slow [/cci]
Because the latter is one step more after index scanning ROWID Table access .
(4) SELECT field3,field4 FROM tb WHERE field1>='sdf' fast
SELECT field3,field4 FROM tb WHERE field1>'sdf' slow
Because the former can quickly locate the index .
(5) SELECT field3,field4 FROM tb WHERE field2 LIKE 'R%' fast
SELECT field3,field4 FROM tb WHERE field2 LIKE '%R' slow ,
Because the latter does not use indexes .
(6) Use functions such as ：
SELECT field3,field4 FROM tb WHERE upper(field2)='RMN' Do not use index .
If a table has 20000 records , Functions are not recommended ; If a table has more than 50000 records , Functions are strictly prohibited ! There is no limit to 20000 records .
(7) Null values are not stored in the index , therefore
SELECT field3,field4 FROM tb WHERE field2 IS[NOT] NULL Do not use index .
(8) Inequalities such as
SELECT field3,field4 FROM tb WHERE field2!='TOM' Do not use index .
SELECT field3,field4 FROM tb WHERE field2 NOT IN('M','P') Do not use index .
(9) Multi column index , Only if the first column of the index in the query is used for a condition , The index can only be used .
(10) MAX,MIN Equifunction , Use index .
SELECT max(field2) FROM tb therefore , If the max,min,sum etc. , It should be quoted .
Use only one aggregate function at a time , as ：
SELECT “min”=min(field1), “max”=max(field1) FROM tb
Not as good as ：SELECT “min”=(SELECT min(field1) FROM tb) , “max”=(SELECT max(field1) FROM
(11) Indexes with too many duplicate values are not used by the query optimizer . And because it's indexed , When you change the value of the field, you also change the index , So updating the field is slower than not having an index .
(12) The index value is too large （ As in a char(40) Index on the field of ）, Will cause a lot of I/O expenses （ It will even exceed the table scan I/O expenses ）. therefore , Try to use integer index .
Sp_estspace Table and index overhead can be calculated .
(13) For multi column indexes ,ORDER BY Must be in the same order as the fields in the index .
(14) stay sybase in , If ORDER BY The fields of a cluster form a cluster index , Then there's no need to do it ORDER BY. The order of records is consistent with the cluster index .
(15) Multi table join （ The specific query scheme needs to be tested ）
where Use the associated fields as much as possible for the qualification conditions in the clause , And try to put the associated fields first .
SELECT a.field1,b.field2 FROM a,b WHERE a.field3=b.field3
* field3 If there is no index on the :
Yes a Perform a full table scan , Sorting results
Yes b Perform a full table scan , Sorting results
Results consolidation .
It is suitable for very small tables or large tables .
* field3 Index on
According to the order of table join ,b Is the driver table ,a Is the driven table
Yes b Perform a full table scan
Yes a Scan index range
If it matches , adopt a Of rowid visit
(16) Avoid one to many join. as ：
SELECT tb1.field3,tb1.field4,tb2.field2 FROM tb1,tb2 WHERE
tb1.field2=tb2.field2 AND tb1.field2=‘BU1032’ AND tb2.field2= ‘aaa’
Not as good as ：
declare @a varchar(80)
SELECT @a=field2 FROM tb2 WHERE field2=‘aaa’
SELECT tb1.field3,tb1.field4,@a FROM tb1 WHERE field2= ‘aaa’
use exists/not exists replace in/not in operation
SELECT a.field1 FROM a WHERE a.field2 IN(SELECT b.field1 FROM b WHERE
SELECT a.field1 FROM a WHERE EXISTS( SELECT 1 FROM b WHERE a.field2=b.field1
SELECT field1 FROM a WHERE field1 NOT IN( SELECT field2 FROM b)
SELECT field1 FROM a WHERE NOT EXISTS( SELECT 1 FROM b WHERE b.field2=a.field1)
(17) main , Foreign keys are mainly used for data constraints ,sybase The index is automatically created when the primary key is created in , Foreign keys have nothing to do with indexes , To improve performance, index must be rebuilt .
(18) char The field of type is not indexed int It's worse not to index fields of type . The performance of the index is only slightly worse .
(19) use count(*) Don't use it count(column_name), Avoid use count(DISTINCT column_name).
(20) Try not to use field names to the right of the equal sign , as ：
SELECT * FROM tb WHERE field1 = field3
(21) Avoid use or condition , because or Do not use index .
2. Avoid use order by and group by Words and sentences .
Because the use of these two clauses will take up a lot of temporary space (tempspace), If you have to use , Available views , The method of generating temporary table manually .
If you have to use , Check first memory,tempdb Size of .
Test certificate , In particular, avoid using both in a query join Use it again group by, It's going to be very slow !
3. Try to use less subqueries , Especially the related subquery . Because this will lead to a decline in efficiency .
The label of a column is in both the main query and the where Appears in the query in clause , It is likely that when the column values in the main query change , Subquery must be queried again . More levels of query nesting , The lower the efficiency , Therefore, subqueries should be avoided as much as possible . If subqueries are unavoidable , Filter out as many rows as possible in the subquery .
4． Eliminating sequential access to large table row data
stay Nested query , Sequential access to tables can have a fatal impact on query efficiency .
For example, sequential access strategy is adopted , A nest 3 Layer query , If every layer queries 1000 That's ok , Then this query is to query 10 Billion data .
The main way to avoid this is to index the joined columns .
for example , Two tables ： Student list （ Student number , full name , Age ……） And course schedule （ Student number , Course number , achievement ）. If two Tables should be connected , It's going to be in “ Student number ” Index this join field .
Union can also be used to avoid sequential access . Although there are indexes on all the check columns , But some forms of where Clause forces the optimizer to use sequential access .
The following query forces the orders Table execution sequence operation ：
SELECT ＊ FROM orders WHERE (customer_num=104 AND order_num>1001) OR
Although in the customer_num and order_num Index built on , However, in the above statement, the optimizer still uses the sequential access path to scan the entire table . Because this statement is retrieving a collection of detached rows , So it should be changed to the following statement ：
SELECT ＊ FROM orders WHERE customer_num=104 AND order_num>1001
SELECT ＊ FROM orders WHERE order_num=1008
This allows the query to be processed using the index path .
5． Avoiding difficult normal expressions
MATCHES and LIKE Keywords support wildcard matching , Technically, it's called a normal expression . But this matching is particularly time-consuming . for example ：SELECT ＊ FROM customer WHERE
zipcode LIKE “98_ _ _”
Even in zipcode Field is indexed , In this case, sequential scanning is also used . If you change the sentence to SELECT ＊ FROM customer WHERE
zipcode >“98000”, The index is used to query when the query is executed , Obviously, it will greatly increase the speed .
in addition , Also avoid non starting substrings . For example, statements ：SELECT ＊ FROM customer WHERE zipcode[2,3] >“80”
, stay where A non starting substring is used in clause , Therefore, the statement does not use an index .
6． Using temporary tables to speed up queries
Sort a subset of the table and create a temporary table , Sometimes it can speed up the query . It helps to avoid multiple sort operations , And there are other ways to simplify the work of the optimizer . for example ：
SELECT cust.name,rcvbles.balance,……other COLUMNS
WHERE cust.customer_id = rcvlbes.customer_id
ORDER BY cust.name
If the query is to be executed more than once, but not more than once , You can find all outstanding customers and put them in a temporary file , And sort by customer's name ：
SELECT cust.name,rcvbles.balance,……other COLUMNS
WHERE cust.customer_id = rcvlbes.customer_id
ORDER BY cust.name
INTO TEMP cust_with_balance
Then query in the temporary table in the following way ：
SELECT ＊ FROM cust_with_balance
There are fewer rows in the temporary table than in the main table , And the physical order is the order required , Reduced disk I/O, Therefore, the query workload can be greatly reduced .
be careful ： Changes to the main table will not be reflected after the temporary table is created . In the case of frequent data changes in the main table , Be careful not to lose data .
7． Replacing non sequential access with sorting
Non sequential disk access is the slowest operation , This is reflected in the back and forth movement of the disk access arm .SQL The statement hides this situation , This makes it easy to write queries that require access to a large number of non sequential pages when writing applications .