How to understand the greedy algorithm
Let's look at an example first.

Suppose we have a backpack that can hold 100 kg and can carry any mix of items. We have five kinds of beans, each with a different total weight and total value. To maximize the total value of what the backpack holds, which beans should we choose, and how much of each?

We can reason like this: compute the unit value of each kind of bean, then load beans in order from the highest unit value to the lowest. Load the bean with the highest unit value first; if the backpack is not yet full, continue with the bean with the next-highest unit value, and so on until the backpack is full.
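The bean-loading strategy above can be sketched in a few lines. The weights and values in the example call are made-up illustrative numbers, since the article does not list them:

```python
def fill_backpack(beans, capacity):
    """beans: list of (weight_kg, total_value) pairs.
    Returns the maximum total value the backpack can hold."""
    # Sort by unit value (value per kg), highest first.
    beans = sorted(beans, key=lambda b: b[1] / b[0], reverse=True)
    total_value = 0.0
    for weight, value in beans:
        if capacity == 0:
            break
        take = min(weight, capacity)            # take as much as fits
        total_value += value * (take / weight)  # beans can be split
        capacity -= take
    return total_value

# 100 kg capacity; unit values are 5, 3, and 2 per kg respectively.
print(fill_backpack([(20, 100), (50, 150), (60, 120)], 100))  # 310.0
```

Because beans can be divided, this greedy strategy is actually optimal for this fractional version of the knapsack problem; it is the 0-1 version, where items cannot be split, that greedy fails on.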
This solution uses the idea of the greedy algorithm. Let's first look at the steps for solving a problem with a greedy algorithm:

Step 1: Check that the problem fits the greedy model.
Given a set of data, with a limit value and an expected value defined in advance, we want to select several items from the data so that, while the limit is satisfied, the expected value is maximized. In the example above, the limit is that the beans in the backpack must not exceed 100 kg, and the expected value is the total value of the beans in the backpack. The data set is the 5 kinds of beans; we choose some beans so that the total weight does not exceed 100 kg and the total value is as large as possible.
Step 2: Try to solve the problem greedily.
Each time, among the choices that contribute the same amount to the limit value, select the one that contributes the most to the expected value. For the example above, each time we choose, from the remaining beans, the one with the highest unit value, that is, the bean that contributes the most value for the same weight.
Step 3: Check with examples whether the algorithm is correct.
In most cases, a few examples are enough to check whether the greedy strategy produces the optimal solution.

In fact, a greedy algorithm does not always give the optimal solution.
Let's take another example: in a weighted graph, find the shortest path from vertex S to vertex T (the path whose total edge weight is smallest). A greedy approach would be: from the current vertex, always follow the edge with the smallest weight, that is, the edge that contributes the least to the total path length, until vertex T is reached. Following this idea, the path we find is S->A->E->T, with length 1+4+4 = 9.

However, the path produced by this greedy algorithm is not the shortest path, because the path S->B->D->T is shorter, with total length 6. The main reason greedy fails here is that earlier choices affect later choices. If the first step goes from vertex S to vertex A, the vertices and edges we face afterward are completely different from those after going from S to B. So even if we make the locally best choice in the first step, that choice can force very bad options in the later steps.
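The counterexample can be reproduced in code. The article only gives the weights on S->A->E->T and the total 6 for S->B->D->T, so the individual weights 2, 2, 2 on the second path are an assumption made here for illustration:

```python
import heapq

# Edge weights on S->B, B->D, D->T (2 each) are assumed; the article
# only states that this path sums to 6.
GRAPH = {
    "S": {"A": 1, "B": 2},
    "A": {"E": 4},
    "B": {"D": 2},
    "D": {"T": 2},
    "E": {"T": 4},
    "T": {},
}

def greedy_walk(graph, start, goal):
    """Always follow the cheapest outgoing edge from the current vertex."""
    path, cost, node = [start], 0, start
    while node != goal:
        node, w = min(graph[node].items(), key=lambda kv: kv[1])
        path.append(node)
        cost += w
    return path, cost

def shortest_path_cost(graph, start, goal):
    """Dijkstra's algorithm, for comparison with the greedy walk."""
    dist, heap = {}, [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in dist:
            continue
        dist[node] = d
        for nxt, w in graph[node].items():
            if nxt not in dist:
                heapq.heappush(heap, (d + w, nxt))
    return dist[goal]

print(greedy_walk(GRAPH, "S", "T"))         # (['S', 'A', 'E', 'T'], 9)
print(shortest_path_cost(GRAPH, "S", "T"))  # 6
```

The edge-by-edge greedy walk returns a path of length 9, while the true shortest path has length 6.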
Application examples of the greedy algorithm
1. Dividing candy
We have m candies and n children, and we want to give the candies to the children. But there are fewer candies than children (m < n), so the candies can only be given to some of the children.

Each candy has a different size; the sizes of the m candies are s1, s2, s3, ..., sm. In addition, each child has a different demand for candy size: a child is satisfied only when the candy given is greater than or equal to that child's demanded size. Suppose the n children's demands are g1, g2, g3, ..., gn. The question is: how should we distribute the candies so that as many children as possible are satisfied?

We can abstract this problem as: from the n children, pick some children to receive candy, so that the number of satisfied children (the expected value) is maximized. The limit of this problem is the number of candies, m.
Now let's see how to solve it with a greedy algorithm. For a given child, if a small candy is enough, there is no need to use a bigger one; the bigger candies can be saved for children who demand larger candies. On the other hand, children with small demands are easier to satisfy, so we should start distributing from the children with the smallest demands, because satisfying a child with a large demand and satisfying a child with a small demand contribute the same amount to our expected value.
Each time, from the remaining children, we find the one with the smallest demand and give them the smallest remaining candy that satisfies them. The resulting distribution is the one that satisfies the largest number of children.
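This candy-distribution strategy can be sketched as follows; the demand and size lists in the example call are made up. Scanning candies smallest-first and children smallest-demand-first is equivalent to the step described above:

```python
def max_satisfied_children(demands, candies):
    """demands: each child's minimum candy size; candies: candy sizes.
    Returns the maximum number of children that can be satisfied."""
    demands, candies = sorted(demands), sorted(candies)
    satisfied = i = 0
    for size in candies:
        # Candies are tried smallest-first; each goes to the
        # smallest-demand child still waiting, if it is big enough.
        if i < len(demands) and size >= demands[i]:
            satisfied += 1
            i += 1
    return satisfied

print(max_satisfied_children([1, 2, 3, 5], [1, 2, 2]))  # 2
```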
2. Coin change
This problem is common in daily life. Suppose we have notes in denominations of 1, 2, 5, 10, 20, 50, and 100 yuan, and their counts are c1, c2, c5, c10, c20, c50, and c100 respectively. Now we have to pay K yuan with this money. What is the minimum number of notes we need?

In daily life, we naturally pay with the largest denomination first; when it no longer fits, we move on to the next smaller denomination, and so on, finally making up the remainder with 1-yuan notes.
When the contributions to the expected value (the number of notes) are the same, we prefer the note that contributes more money, which makes the total number of notes as small as possible. This is a greedy solution, and intuition tells us this approach is the best.
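A minimal sketch of this pay-with-the-largest-note-first strategy; the amount and note counts in the example call are made up:

```python
def min_notes(k, counts):
    """counts: {denomination: number of notes available}.
    Returns the number of notes used, or None if k cannot be paid exactly."""
    used = 0
    for denom in sorted(counts, reverse=True):  # largest face value first
        take = min(k // denom, counts[denom])
        used += take
        k -= take * denom
    return used if k == 0 else None

# Pay 128 yuan: 100 + 20 + 5 + 2 + 1 = five notes.
print(min_notes(128, {1: 10, 2: 10, 5: 10, 10: 10, 20: 10, 50: 10, 100: 10}))  # 5
```

For this particular denomination system the greedy answer happens to be the true minimum, but for arbitrary denominations (say 1, 3, 4 paying 6) greedy can be suboptimal, which is exactly why Step 3 of the framework, checking with examples, matters.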
3. Interval coverage
Suppose we have n intervals whose start and end points are [l1, r1], [l2, r2], [l3, r3], ..., [ln, rn]. From these n intervals we select a subset that are pairwise disjoint (intervals that merely touch at an endpoint do not count as intersecting). At most how many intervals can we select?
The idea behind this problem is not as easy to grasp, but I suggest you take the time to understand it, because the same processing idea is useful in many greedy problems, such as task scheduling and teacher timetabling.
The solution is as follows. Suppose the leftmost start point among the n intervals is lmin and the rightmost end point is rmax. The problem is then equivalent to selecting several disjoint intervals that cover [lmin, rmax] from left to right. We first sort the n intervals by their start points, from small to large.

Each time we choose, we pick an interval whose left endpoint does not overlap the intervals already covered and whose right endpoint is as small as possible, so that the remaining uncovered range stays as large as possible and more intervals can fit in. This is in fact a greedy choice.
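The interval-selection greedy can be sketched like this; the intervals in the example call are made up. Scanning by right endpoint picks, at every step, the feasible interval with the smallest right endpoint, which matches the rule described above:

```python
def max_disjoint_intervals(intervals):
    """intervals: list of (l, r). Returns how many disjoint intervals fit.
    Touching at an endpoint is allowed (it does not count as overlap)."""
    count, covered_right = 0, float("-inf")
    for l, r in sorted(intervals, key=lambda iv: iv[1]):
        if l >= covered_right:       # does not overlap the chosen ones
            count += 1
            covered_right = r        # keep the uncovered range as large
    return count                     # as possible for later intervals

print(max_disjoint_intervals([(6, 8), (2, 4), (3, 5), (1, 5), (5, 9), (8, 10)]))  # 3
```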
How to solve Huffman coding with a greedy algorithm
Suppose we have a file containing 1000 characters, each occupying 1 byte (1B = 8 bits). Storing these 1000 characters takes 8000 bits. Is there a more space-saving way to store them?

Suppose statistical analysis shows that the 1000 characters contain only six distinct characters, say a, b, c, d, e, f. Three binary bits can represent 8 distinct characters, so to save space we can encode each character with three bits: a as 000, b as 001, c as 010, d as 011, e as 100, f as 101. Then storing the 1000 characters takes only 3000 bits, which saves a lot of space compared with the original scheme. Is there an even more space-saving way?
This is where Huffman coding comes in. It is a very effective coding method, widely used in data compression, with compression ratios usually between 20% and 90%.
Huffman coding looks not only at how many distinct characters appear in the text, but also at how frequently each character appears, and it chooses codes of different lengths according to those frequencies. Huffman coding uses this variable-length encoding to improve the compression ratio further. How do we choose codes of different lengths for characters with different frequencies? Following the greedy idea, we give more frequent characters slightly shorter codes and less frequent characters slightly longer codes.

Fixed-length codes are easy to decompress: in the earlier example, each character takes 3 bits, so during decompression we read 3 bits at a time and translate them into the corresponding character. Huffman codes, however, have different lengths. Should we read 1 bit, 2 bits, or 3 bits at a time? This question is what makes decompressing Huffman codes complicated. To avoid ambiguity during decompression, Huffman coding requires that no character's code is a prefix of another character's code.
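Why the prefix-free property removes the ambiguity can be seen in a small decoder. The code table here is a made-up prefix-free code, not the article's:

```python
# A made-up prefix-free code table: no entry is a prefix of another.
CODES = {"a": "0", "b": "10", "c": "110", "d": "111"}

def decode(bits, codes):
    """Read bits one at a time; because the code is prefix-free, the
    first complete code word we match is the only possible match."""
    reverse = {v: k for k, v in codes.items()}
    out, cur = [], ""
    for bit in bits:
        cur += bit
        if cur in reverse:           # a complete code word has been read
            out.append(reverse[cur])
            cur = ""
    return "".join(out)

print(decode("010110111", CODES))  # abcd
```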
Suppose the frequencies of the 6 characters from high to low are a, b, c, d, e, f. We can encode them so that no character's code is a prefix of another character's code. During decompression, we read the longest decodable binary prefix each time, so no ambiguity arises. After encoding and compressing this way, the 1000 characters take only 2100 bits.
Although the idea of Huffman coding is not hard to understand, how exactly do we assign codes of different lengths to characters according to their frequencies? There is a small trick here.
We treat each character as a node and put it into a priority queue keyed by frequency. We take out the two nodes A and B with the lowest frequencies, create a new node C whose frequency is the sum of their frequencies, and make C the parent of A and B. Then we put C back into the priority queue. We repeat this process until only one node remains in the queue.

Now we assign a weight to each edge: every edge pointing to a left child is labeled 0, and every edge pointing to a right child is labeled 1. The path from the root to a leaf then spells out the Huffman code of the character at that leaf.
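The node-merging procedure above can be sketched with Python's heapq as the priority queue. The character frequencies in the example call are made up, since the article does not list them:

```python
import heapq
import itertools

def huffman_codes(freqs):
    """freqs: {char: frequency}. Returns {char: bit-string code}."""
    counter = itertools.count()          # tie-breaker so heap tuples compare
    heap = [(f, next(counter), ch) for ch, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)   # the two lowest-frequency nodes
        fb, _, b = heapq.heappop(heap)
        # New parent node: its frequency is the sum of its children's.
        heapq.heappush(heap, (fa + fb, next(counter), (a, b)))
    root = heap[0][2]
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], prefix + "0")  # edge to left child labeled 0
            walk(node[1], prefix + "1")  # edge to right child labeled 1
        else:                            # leaf: a character
            codes[node] = prefix or "0"  # single-character edge case
    walk(root, "")
    return codes

print(huffman_codes({"a": 450, "b": 350, "c": 90, "d": 60, "e": 30, "f": 20}))
```

With these frequencies, the more frequent a character, the shorter its code, and no code is a prefix of another, so the encoded stream decodes unambiguously.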