Compared with the binary search trees discussed in the previous posts (the balanced binary tree and the red-black tree), which all start at the root node and compare the key stored in each visited node with the search key, is there a function H that, based on the search key, can determine the location of the value directly, without comparing elements one by one? If we know in advance where a key lives, we can fetch the data directly and improve efficiency.

A hash table (also called a hash map) is a data structure that accesses a memory storage location directly based on the key. In other words, it computes a function of the key that maps the data being queried to a position in a table where the record can be accessed, which speeds up lookup. The mapping function is called the hash function, and the array that holds the records is called the hash table.

<> Common hash functions

Whether a hash function is good depends on three things:

* The domain of the hash function must include all the keys to be stored; if the hash table has m addresses, the function's range must lie between 0 and m-1
* The addresses computed by the hash function should be evenly distributed over the whole space
* The hash function should be relatively simple
Division method (most commonly used)
Function: Hash(key) = key MOD p (p <= m, where m is the table length). The computed hash(key) is the subscript at which key is stored.

For example, given the data {2, 4, 6, 8, 9} and a table length of 10 (that is, an array capacity of 10), each key is stored at subscript key MOD 10, as shown in the sketch below.
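A minimal sketch of the division method for this example; the printout and the use of -1 as an empty-slot marker are my own assumptions for illustration:

#include <iostream>
#include <vector>
using namespace std;

int main() {
    const int m = 10;              // table length (array capacity)
    vector<int> table(m, -1);      // -1 marks an empty slot (demo convention only)
    int data[] = {2, 4, 6, 8, 9};

    for (int key : data) {
        int idx = key % m;         // Hash(key) = key MOD p, with p = m here
        table[idx] = key;          // this data set produces no collisions
    }

    for (int i = 0; i < m; ++i)
        if (table[i] != -1)
            cout << "subscript " << i << " stores " << table[i] << "\n";
    return 0;
}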

Direct addressing method (commonly used)
Take a linear function of the key as the hash address (A and B are constants): Hash(Key) = A*Key + B
Advantage: simple and uniform
Disadvantage: you need to know the distribution of the keys in advance
Applicable scenarios: small, continuous ranges of keys (a sketch follows below)
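A minimal sketch of direct addressing; the key range and the constants A and B below are made-up assumptions:

#include <iostream>
using namespace std;

// Hash(Key) = A*Key + B, with A and B chosen so the known key range maps onto table subscripts
int directHash(int key, int A = 1, int B = -1000) {
    return A * key + B;
}

int main() {
    // Assume the keys are known to be the continuous values 1000..1009
    for (int key = 1000; key <= 1009; ++key)
        cout << key << " -> subscript " << directHash(key) << "\n";
    return 0;
}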

Mid-square method (less common)
If the keys share many digits and collide frequently, you can first square the key (squaring amplifies the differences between keys) and then take the middle digits of the result as the final storage address.
Examples (reproduced in the sketch below):
key = 1234: 1234^2 = 1522756, take the middle digits 227 as the hash address
key = 4321: 4321^2 = 18671041, take the middle digits 671 as the hash address
Applicable scenarios: the distribution of the keys is not known in advance and the keys are not too long
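A minimal sketch of the mid-square method that reproduces the two examples above; extracting three middle digits through a string is just one possible convention, not necessarily the one the original author had in mind:

#include <iostream>
#include <string>
using namespace std;

// Square the key and take the middle digits of the result as the hash address
int midSquareHash(long long key, int digits = 3) {
    string sq = to_string(key * key);
    size_t start = (sq.size() - digits) / 2;   // starting position of the middle digits
    return stoi(sq.substr(start, digits));
}

int main() {
    cout << midSquareHash(1234) << "\n";  // 1234^2 = 1522756  -> 227
    cout << midSquareHash(4321) << "\n";  // 4321^2 = 18671041 -> 671
    return 0;
}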

<> Hash Collisions

A hash collision happens when different keys are mapped by the same hash function to the same position, i.e. H(key1) = H(key2). For example, in the division method example above, if we insert 12, hash(12) is 2, but subscript 2 already holds an element, so a hash collision occurs.

<> Handling hash collisions

There are two main ways to resolve hash collisions: closed hashing and open hashing.

<> Closed hashing

Closed hashing is also called open addressing. When a hash collision occurs and the hash table is not full, there must still be an empty slot somewhere in the table, so the key can be stored in the "next" empty slot after the position where the collision happened.

The main closed-hashing strategies are linear probing and quadratic probing.

<> Linear probing

Idea: starting from the computed hash position, probe forward and store the data in the first empty slot found.

Insert: compute the hash address and store the data at the computed position; if that position already holds data, probe forward for the first empty slot. Note that the more elements we insert, the more often hash collisions occur.

Delete: when we want to delete an element, we cannot simply remove it physically. Suppose, for example, that 15 was stored at subscript 8 and we physically delete it, leaving subscript 8 empty. When we then look up 25, the search also starts at subscript 5. Since the element at subscript 5 is not 25, a hash collision must have occurred, and because insertion used linear probing (the element went into the first empty slot after its hash position), the search continues forward: if the data exists, it must appear before the first empty slot. But we physically removed 15, so the search stops at the now-empty subscript 8 and 25 is never found, even though it is still in the table.

So with linear probing, deletion is not a real deletion but a pseudo deletion. We define three states for each slot: EMPTY, EXIST, and DELETE. EMPTY means the slot has never stored data; EXIST means the slot currently holds data; DELETE means the slot used to hold data that has since been deleted.

Now when we delete the data at subscript 8, we do not clear the slot; we only set its state to DELETE. When we then look up 25, the search does not stop at slot 8 but keeps probing forward until it reaches a slot whose state is EMPTY. This approach does raise a problem: if the table ever fills up, the probe never finds an empty slot and keeps searching forever. Also, as the data grows, hash collisions become more and more frequent, which defeats the goal of efficient insertion and lookup. The solution is to expand the capacity.
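A minimal sketch that isolates just this three-state lookup rule (skip DELETE slots, stop only at EMPTY); the full class appears in the code section at the end of the post, and the slot layout here is simplified for illustration:

#include <vector>

enum STATE { EMPTY, EXIST, DELETE };

struct Slot {
    int key;
    STATE state = EMPTY;
};

// Probe from the hash position; DELETE slots are skipped, an EMPTY slot ends the search.
// Assumes the table always keeps at least one EMPTY slot (the load factor guarantees this).
int findSlot(const std::vector<Slot>& table, int key) {
    size_t idx = key % table.size();
    while (table[idx].state != EMPTY) {            // keep going past DELETE slots
        if (table[idx].state == EXIST && table[idx].key == key)
            return static_cast<int>(idx);          // found
        if (++idx == table.size())
            idx = 0;                               // wrap around to the start
    }
    return -1;                                     // hit an EMPTY slot: key not present
}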

Expansion: capacity expansion does not have to wait until the table is full. Since collisions become more frequent as the data grows, we set a threshold: once the amount of data reaches a certain level, the table is expanded. The threshold is determined by the load factor. Load factor = number of stored elements / array capacity, so it lies between 0 and 1; it is usually set somewhere in [0.6, 0.8]. For example, if the array size is 10 and the threshold is 0.7, then inserting the 8th element triggers expansion, because 8/10 = 0.8 > 0.7, i.e. the load factor exceeds the threshold. Note that expansion must move the original data into the new table, and a plain copy would leave the old collisions unresolved; instead, every element of the old table is re-inserted into the new table, which reduces the number of hash collisions.

Lookup: one of the strengths of hashing is lookup; it is very efficient. We compute the hash address from the key; if the key stored at the computed position equals the key we are looking for, it has been found. If we reach an EMPTY slot without finding it, the key is not in the table. Note that if the probe runs off the end of the array, the index must be reset to 0 and the search continues from the beginning.

<> Quadratic probing

Both quadratic probing and linear probing are forms of closed hashing and follow the same principle; the main difference is the probing sequence. With linear probing, if the target position is occupied, we look for a new empty slot one position at a time. With quadratic probing, the i-th probe steps forward from the original hash position by the square of the collision count, i.e. it examines position hash(key) + i^2, as sketched below.
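A minimal sketch of the quadratic probe sequence, assuming the common form Hi = (H0 + i^2) % m; this is an illustration, not code from the original post:

#include <cstddef>

// Slot examined on the i-th probe for a key whose hash position is h0
size_t quadraticProbe(size_t h0, size_t i, size_t tableSize) {
    return (h0 + i * i) % tableSize;   // probe sequence: h0, h0+1, h0+4, h0+9, ...
}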

This spreads out colliding data so it does not pile up. However, both probing methods share a serious drawback: low space utilization. To address this, another technique for resolving hash collisions is introduced: open hashing.

<> Open hashing

Open hashing is also called the chaining (linked-list) method. Each slot of the hash table stores the head node of a linked list; keys with the same hash address are stored in the same list, so every element of a given list has the same hash address.


The hash table here is an array of pointers, each element being a head pointer to a linked list. As the table shows, colliding elements do not occupy other elements' slots; the elements in each linked list are exactly the ones that collide with each other.

Insert: when inserting, we compute the hash address, which directly locates the head node of the linked list for that key. Because duplicate keys are not allowed, we first traverse the list to check whether the element already exists and insert only if it does not. We can insert at the head or the tail; head insertion is simpler here: create a new node cur for the key, point it at the current head of the list, and update the head of the list to the new node cur.

Expansion: the expansion rule for open hashing is not as strict as for closed hashing. There is still a load factor, but it can be 1, i.e. a table with 10 slots holding 10 elements. In the best case those 10 elements do not collide at all, which is maximally efficient, so there is no need to expand earlier; we expand only when the number of elements exceeds the number of slots, in order to reduce collisions. Each slot of the table is the head node of a linked list, so we create a new pointer array and traverse the old table. For every non-empty head node we define a pointer cur to walk the singly linked list, recording the next node of the current node as we go so it is not lost. For the current node cur we compute its hash address in the new array and head-insert it at that position. After a list has been traversed, the head node in the old table is set to null, because the two pointer arrays are swapped afterwards and the old array is released.

Lookup: to find an element, we first compute its hash address, go directly to the head node of the corresponding list, and then traverse the list. If the value of the current node equals the element being searched for, it is found and that node is returned; otherwise we return null.
Delete: here deletion is a real deletion. We first do a lookup to check whether the value to be deleted exists in the table; if not, we return false directly. If it exists, we compute the hash address of the list containing the node, traverse that list, and use a prev pointer to record the node before the one being deleted. When traversal reaches the node to delete, we check whether it is the head of the list: if it is, the node after it becomes the new head; if not, prev's next is set to the deleted node's next. Finally the count of valid elements is decreased by 1 and the node is freed.

<> Code

Code: closed hashing -- linear probing

#include <vector>
#include <utility>
using namespace std;

// Hash table: closed hashing, linear probing for collision resolution
enum STATE { EXIST, DELETE, EMPTY };

template<class K, class V>
struct HashNode {
    pair<K, V> _kv;        // data
    STATE _state = EMPTY;  // state of the slot
};

// Hash table stored in a sequential array
template<class K, class V>
class HashTable {
public:
    typedef HashNode<K, V> Node;

    HashTable(size_t n = 10) : _hTable(n), _size(0) {}

    bool insert(const pair<K, V>& kv) {
        // 0. Check the capacity
        checkCapacity();
        // 1. Hash position of the current element
        size_t idx = kv.first % _hTable.size();
        // 2. Check whether the key already exists
        while (_hTable[idx]._state != EMPTY) { // the slot holds data or was deleted, so it cannot be used
            // The slot holds data with the same key: cannot insert
            if (_hTable[idx]._state == EXIST && _hTable[idx]._kv.first == kv.first)
                return false;
            // Keep probing forward for an empty slot
            ++idx;
            // Reached the end: wrap around to the beginning
            if (idx == _hTable.size()) idx = 0;
        }
        _hTable[idx]._kv = kv;
        _hTable[idx]._state = EXIST;
        ++_size;
        return true;
    }

    void checkCapacity() {
        // Load factor in [0, 1]; the threshold here is 0.7
        if (_hTable.size() == 0 || _size * 10 / _hTable.size() >= 7) {
            // Create a new table and re-insert the old data into it
            size_t newC = _hTable.size() == 0 ? 10 : 2 * _hTable.size();
            HashTable<K, V> newHt(newC);
            for (size_t i = 0; i < _hTable.size(); ++i) {
                if (_hTable[i]._state == EXIST)
                    newHt.insert(_hTable[i]._kv);
            }
            // Exchange the contents of the two tables
            Swap(newHt);
        }
    }

    void Swap(HashTable<K, V>& Ht) {
        swap(_hTable, Ht._hTable);
        swap(_size, Ht._size);
    }

    Node* find(const K& key) {
        // Compute the position
        size_t idx = key % _hTable.size();
        while (_hTable[idx]._state != EMPTY) {
            // Found
            if (_hTable[idx]._state == EXIST && key == _hTable[idx]._kv.first)
                return &_hTable[idx];
            ++idx;
            if (idx == _hTable.size()) idx = 0;
        }
        // An empty slot means the key is not in the table: return null
        return nullptr;
    }

    bool erase(const K& key) {
        Node* node = find(key);
        if (node) {
            // Pseudo deletion: just mark the slot as deleted
            --_size;
            node->_state = DELETE;
            return true;
        }
        return false;
    }

private:
    vector<Node> _hTable;  // the table
    size_t _size;          // number of valid elements
};
Test:
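The test code and its output did not survive extraction from the original post; a minimal usage sketch for the HashTable above (compiled together with it; key and value choices are my own) could look like this:

#include <iostream>

int main() {
    HashTable<int, int> ht;
    int keys[] = {2, 4, 6, 8, 9, 12};
    for (int k : keys)
        ht.insert({k, k * 10});        // 12 collides with 2 and is placed by linear probing

    cout << (ht.find(12) ? "12 found" : "12 missing") << "\n";
    ht.erase(12);                      // pseudo deletion: the slot is only marked DELETE
    cout << (ht.find(12) ? "12 found" : "12 missing") << "\n";
    return 0;
}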

Code: open hashing (chaining)

#include <vector>
#include <iostream>
using namespace std;

template<class K>
struct HashNode {
    typedef HashNode<K> Node;
    K _val;
    Node* _next;
    HashNode(const K& val) : _val(val), _next(nullptr) {}
};

template<class K>
class HTable {
public:
    typedef HashNode<K> Node;

    HTable(int n = 10) : _ht(n), _size(0) {}

    bool insert(const K& val) {
        // 0. Check the capacity
        checkCapacity();
        // 1. Compute the hash position
        int idx = val % _ht.size();
        // 2. Search the list for a duplicate
        Node* cur = _ht[idx];
        while (cur) {
            if (cur->_val == val)
                return false;
            cur = cur->_next;
        }
        // 3. Insert -- head insertion
        cur = new Node(val);
        cur->_next = _ht[idx];
        _ht[idx] = cur;
        ++_size;
        return true;
    }

    void checkCapacity() {
        if (_size == _ht.size()) {
            int newC = _size == 0 ? 10 : 2 * _size;
            // Create a new pointer array
            vector<Node*> newHt(newC);
            // Traverse the old table
            for (size_t i = 0; i < _ht.size(); ++i) {
                Node* cur = _ht[i];
                // Traverse the singly linked list
                while (cur) {
                    Node* next = cur->_next;
                    // Compute the new position
                    int idx = cur->_val % newHt.size();
                    // Head insertion into the new array
                    cur->_next = newHt[idx];
                    newHt[idx] = cur;
                    cur = next;
                }
                // Null out the old table's head pointer
                _ht[i] = nullptr;
            }
            // Exchange the new and old tables
            swap(_ht, newHt);
        }
    }

    Node* find(const K& val) {
        int idx = val % _ht.size();
        Node* cur = _ht[idx];
        while (cur) {
            if (cur->_val == val)
                return cur;
            cur = cur->_next;
        }
        return nullptr;
    }

    bool erase(const K& val) {
        Node* node = find(val);
        if (node) {
            int idx = val % _ht.size();
            Node* cur = _ht[idx];
            Node* prev = nullptr;
            while (cur != node) {
                prev = cur;
                cur = cur->_next;
            }
            Node* next = cur->_next;
            if (prev)
                prev->_next = next;
            else
                _ht[idx] = next;
            --_size;
            delete node;
            return true;
        }
        return false;
    }

private:
    // Pointer array
    vector<Node*> _ht;
    int _size; // number of valid elements
};
Test:
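Again, the original test section is missing; a minimal usage sketch for the HTable above (assuming it is compiled together with that code block, and using made-up keys):

int main() {
    HTable<int> ht;
    int keys[] = {2, 12, 22, 5};
    for (int k : keys)
        ht.insert(k);                  // 2, 12 and 22 all hash to subscript 2 and share one linked list

    cout << (ht.find(22) ? "22 found" : "22 missing") << "\n";
    ht.erase(12);                      // real deletion: the node is unlinked and freed
    cout << (ht.find(12) ? "12 found" : "12 missing") << "\n";
    return 0;
}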
