hello hello everyone , Long time no see. ! Today's 《 Database system of teaching and learning 》 To learn query optimization in database system
. Teaching and learning database , I haven't seen such a cool title ?“ Language startle ”, you 're right , The title is so cool .

My sister, xiaoburi 18 year , The existence of goddess in Campus , Excellent performance, versatile sports , Gentle personality, honest and kind . however , Only I know , The bright burying in the eyes of everyone , It used to be a hamster Cape , Rolling all over the place , Except for eating, sleeping and playing . And the transformation of all this , From that night .

Since then , Xiaoburi often asked me to help her with her lessons . Today she wants to know about query optimization in database system . This tutorial talks about query optimization through my dialogue with xiaoburi .

<> Logical query optimization

<> Algebraic expression of equivalence relation

If two relational algebra expressions have the same result on any database instance , Then these two relational algebraic expressions are equivalent (equivalent)

<> Equivalent transformation rule

<> The equivalent transformation rules about selection

,

<> Select push down

Relation algebra expression tree (expression tree) Select operation in push down (push down) Generally, it can improve the efficiency of query execution

*
Select push down to filter out tuples irrelevant to the result as early as possible

Selectivity (selectivity) Highest selection first

*
Selectivity : Proportion of tuples satisfying the condition

*
Filter out more result independent tuples as early as possible

Decompose complex selection conditions , Then select and push down

<> Override of selection conditions

Rewriting inefficient selection conditions before rewriting : X=Y AND X=3

After rewriting :X=3 AND X=3

Remove unnecessary selection conditions

Before rewriting : SELECT * FROM R WHERE 1=1; After rewriting :SELECT * FROM R;

Check for selection conditions that cannot be met
SELECT * FROM R WHERE 1=0;
Merge selection criteria

Before rewriting :
SELECT*FROMR WHEREABETWEEN1AND100 ORABETWEEN50AND150;
After rewriting :
SELECT*FROMR WHEREABETWEEN1AND150;
When the selection operation can be pushed down to multiple branches of the expression tree , You need to consider which branch to push down

* Index will affect the scheme selected to be pushed down

<> On the equivalent transformation rules of projection

<> Projection pushdown

Push down the projection operation in the expression tree of relational algebra (pushdown) Generally, query execution efficiency can be improved

* Projection pushdown can reduce the size of tuples
In some cases , The projection operation will waste the opportunity of query optimization. There is no index on the result of projection operation

* Hypothetical relationship Student Built on property Sname Secondary index on
* ΠSno,Sname(Student) Make the index unavailable for subsequent selection and connection operations

<> query rewrite

Remove unnecessary projections

Remove unnecessary connections

Suppose all the students have taken courses

<> Cost estimation of logical query plan

<> Measure of query plan cost

The cost of a query plan is measured by the number of tuples of intermediate results generated during the execution of the query plan .

<> Result size estimation of operation

DBMS According to the size of the input relation of the relation algebra operation ( Number of tuples ) To estimate the size of the operation result ( Number of tuples )

*
accuracy

*
Easy to calculate

*
Logical consistency (logicallyconsistent)

Logical consistency

*
Monotonicity : The larger the input of an operation , The larger the estimate of the size of the result of the operation

*
Order independent : When performing the same operation satisfying the exchange law and the Union Law on multiple relationships ( as on,×,∪,∩), The estimated size of the final result is independent of the order in which the operations are performed

<> Statistics required to estimate the size of operation results

System catalog (systemcatalog) Some statistics related to the size of estimated operation results are recorded in

*
T( R ): relationship R Tuples of

*
V(R,A): relationship R Property set for A Number of different values of

*
Assumption of uniform distribution of attribute values , The values of each attribute are uniformly distributed

*
Attribute independence hypothesis , All attributes of a relationship are independent of each other

Collect statistics manually
DB2:RUNSTATS
Estimation of the result size of Cartesian product
T(R×S)=T(R)T(S)
Estimation of projection result size
T(ΠL(R))=T(R)( Projection without de duplication ) T(ΠL(R))=V(R,L)( Projection with de duplication )
Select an estimate of the result size

Is the latter two different

<> Estimation of the result size of two natural connections

Consider two relationships R and S Natural connection of R on S

* Property value contains assumptions
For connection properties K, If V(R,K) ≤ V(S,K), be R Of K Property value set is S Of K Subset of attribute value set

* Property value retention assumptions
about R Any non connection property in A, Yes V(R on S,A) = V(R,A)

situation 2: R and S Yes 2 Connection properties X and Y

Natural connections do not have the same properties , Degenerate to Cartesian connection

<> Logical consistency of size estimation of natural connection results

Two way connection in different order (2-wayjoin), Still going 1 Secondary multiplexing (multi-wayjoin), The estimated values of the results obtained by the above methods are the same

<> Estimation of the result size of set operation

<> Estimation of the size of de duplication results

<> histogram

<> equi-width histogram

* The width of each interval of attribute value is the same
* The occurrence times of attribute values in each interval are different

<> Contour histogram

*
The width of each interval of attribute value is different

*
The occurrence times of attribute values in each interval are basically the same

<> Estimation of connection result size based on histogram

Use histogram : T(R on S)=10×5/10+5×20/10=15

Do not use histogram : T(R on S)=245×245/100=600

<> Heuristic optimization method of logical query plan

If some equivalent transformation can reduce the cost of query plan , Then perform the transformation on the query plan

<> Optimization of connection order

Although the connection operation satisfies the switching law , However, the input relationship of join operation in query plan has different functions

*
stay R on S in ,R It's a left-wing relationship (left relation),S It's right (right relation)

*
If a single connection is used (one-pass join), Then the left relation is to build the relation (build relation), Right relation is probe relation (proberelation)

*
If nested loop connection is used (nested-loop join), Then the left relation is the external relation (outer relation), Right relation is internal relation (inner relation)

*
If using index connection (index-based join), Then the right relation is indexed (indexed relation)

<> Connection tree (JoinTrees)

The order in which the join operations on a set of relationships are performed can be represented by the join tree (join tree) To represent

Left deep connection tree (left-deep join tree): Only one relationship is left (left relation), Other relationships are right (right relation)

Right deep junction tree (right-deep join tree): Only one relationship is right , Other relationships are left-wing (left relation)

Dense tree (bushy tree): Connection trees other than left and right deep connection trees

<> Why the query plan of left deep connection tree is more efficient ?

If the following query plans all use one connection (one-pass join) algorithm , Then in the pipeline query execution model (piplining model) lower , Left deep connection tree query plan at any time
Lower memory requirements than other query plans .

If the following query plans all use nested loop join (nested-loopjoin) algorithm , Then in the pipeline query execution model (pipliningmodel) lower , Left deep connection tree query plan is not required
Build intermediate relationships many times .

<> Dynamic programming method for optimizing connection order

The price is 0, No intermediate results

<> Optimization of physical query plan

<> Determine the execution method of the selection operation

<> Index based selection

Applicable conditions

* The form of the selection condition is K=v or l≤K≤u
* relationship R Built on property K Index of
* method : Use index based selection algorithm
<> Determine how to perform the connect operation

* Index connection (IndexJoin)
Applicable conditions :

* Little left-hand relationship
* Right relation has index on connection property
* Sort merge connection (Sort-MergeJoin)
Applicable conditions :

* At least one relationship is already sorted by connection properties
It is also suitable to use sort merge connection to make multiple connections on the same connection attribute , as R(a,b) on S(a,c) on T(a,d)

* When there are very few pages available in the memory buffer , You can use nested loop connections
One trip connection (One-PassJoin)
Index connection (IndexJoin)
Sort merge connection (Sort-MergeJoin)
HASH join (HashJoin)
Nested loop connection (Nested-LoopJoin)

<> Determine execution model

The number of pages available in memory buffer pool affects the selection of execution model

* Materialized execution (materialization)
* Pipeline execution (piplining)
<> summary

Let's play , To return to trouble , Don't make fun of learning .

This article introduces query optimization ： Including logic query optimization, such as selecting push down, etc , And physical query optimization, such as index based selection . Pay attention to the principle of query optimization in learning , Keep in mind the method of query optimization .

Technology
Daily Recommendation
views 5
views 2