- 2021-05-26 16:39
*views 3*- Pointer
- Linked list
- data structure
- New star project

Compare the binary sort tree discussed in the previous blog Binary balanced tree

Red black tree , They all search from the root node first , Fetching data or index from a node for comparison with a lookup value . that , Is there a function H, Based on this function and search keywords key, You can directly determine the location of the search value , It doesn't need to compare one by one . That's it

Know in advance key Where are you , Find the data directly , Improve efficiency

Hash table （Hash table, Also called hash table ）,

It's based on the key （Key） And directly access the data structure in the memory storage location . in other words , It calculates a function about the key value , Map the data of the required query to a location in the table to access the record , This speeds up the search . This mapping function is called a hash function , An array of records is called a hash table

<> Common hash functions

How about a hash function , It depends on three things

* The definition field of hash function must include all the keys to be stored , If the hash table allows m When there is an address , Its range must be in 0 reach m-1 between

* The address calculated by hash function can be evenly distributed in the whole space

* Hash functions should be relatively simple

Division method （ Most commonly used ）

function ：Hash（key）=key MOD p （p<=m m Is the table length ）, It's got to be hash(key) It's just for storage key Subscript of

For example, here's the data {2, 4, 6, 8, 9}

The table length is 10, That is, the array capacity is 10

Direct customization （ Commonly used ）

Take a linear function of keyword as hash address (A,B Is a constant )：Hash（Key）= A*Key + B

advantage ： simple , uniformity

shortcoming ： You need to know the distribution of keywords in advance

Applicable scenarios ： It is suitable for finding small data range and continuous data

Mean square （ less ）

If every key word has a certain number, the frequency of repetition is very high , You can first find the square value of the keyword , Expanding differences by square , And then take the middle digit as the final storage address .

Use examples

such as key=1234 1234^2=1522756 take 227 do hash address

such as key=4321 4321^2=18671041 take 671 do hash address

Applicable scenarios ： The data is unknown in advance and the data length is small

<> Hash Collisions

That is different key The same hash position is generated by the same hash function ,H（key1）=H（key2）, For example, our example in the division remainder method , If you insert a 12, his hash(12) by 2, The subscript is 2 There are already elements in the location of , A hash conflict occurs

<> Handling hash conflicts

There are two main solutions to hash conflicts ： Closed hash and Hash

<> Closed hash

Closed hash ： It's also called open addressing , When a hash conflict occurs , If the hash table is not full , Indicates that there must be an empty position in the hash table , that You can take it key Stored in conflict location “ next ” Go in the empty space

The main processing methods of closed hash are as follows Linear detection and Secondary detection

<> Linear detection

thought ： Start at the computed hash position , Find the first free place to store data later

insert

： Insert is to calculate the hash address , Store the data in the calculated hash position , If there is data in this location, the first free location will be found later . But when we have more elements , The more times we have hash conflicts ,

delete

： When we want to delete an element , It cannot be physically deleted directly , For example, we put 15 Deleted , The subscript is 8 The location of is empty , When we want to find out 25 This element , It will also be subscript as 5 This is where we start looking , When 5 This position is not 25 Time , Indicates that there is a hash conflict , And the insertion uses linear detection , That is, the first empty position is inserted . When we look back , If the data exists , The number must exist before the empty position . But at this point, we physically 15 Deleted . Find will find subscript 8 The search ends at the location of , It will not be found at this time 25 That's a lot of data .

So linear detection method is used , Deletion is not deletion in the actual sense , It's a pseudo deletion , We can define three states , namely ：EMPTY,EXIST,DELETE

.EMPTY Indicates that the location has never stored data , It's an empty space ;EXIST Indicates that there is data in the location ;DELETE Indicates that the location has stored data before , It's just deleted

Now we want to delete it 8 The data at this location is not available , Set the status of the location to DELETE, Let's look again 25 This number is very small , encounter 8 The location doesn't stop searching , Will continue to search later , Until the state of EMPTY So far . But this method can cause a problem , It's possible that the data is full , If you keep searching at this time , You won't find an empty place , We'll keep searching . And if the data is extreme and there are more and more data , There will be more and more hash conflicts . This does not meet our hash requirements of efficient insertion and search . The solution is to expand the capacity

Expansion

： Capacity expansion does not have to wait until the data is full . We know that when the data gets more and more , The more times hash conflicts occur , So we need to set a threshold , That is, when the data reaches a certain amount , It is necessary to expand the capacity . And what determines the threshold is a

Load factor . Load factor = Actual storage element / Array capacity , The scope is 0~1 between , We usually set the load factor to [0.6,

0.8] between . For example, the size of our array is 10 individual , Because the load is 0.7, If the 8 You need to expand the capacity when you add one element , because 8/10=0.8>.07, That is to say, the load factor larger than ours needs to be expanded . Pay attention to the expansion , We need to move the original data to the new table , But if it is a simple assignment , The conflict has not been resolved , At this time, we should insert the data in the old table into the new table again , So as to reduce the number of hash conflicts

lookup

： One of the advantages of hashing is searching , The efficiency of searching is very high , Just pass key To calculate the hash address , If the value of the calculated hash position is the same as the value of the key identical , It means it has been found , If it is not found after null, it means that the number to be searched does not exist . And it should be noted that , If you find to the end, you need to set the index to 0, Find from scratch

<> Secondary detection

Both quadratic detection and linear detection belong to closed hash , The principle is the same , The main difference between the two is the way of detection , Linear detection is if the position to be inserted already has an element , One by one

Find a new empty position later . The second detection is to find a new location backward by the square of the hash collision times of the location

Scatter the hash conflict data , To keep from piling up . But these two methods have great defects , It's the low utilization of space . On this basis , Introduce a new technology to solve hash conflict ---- Hash

<> Hash

The open hash method is also called chain address method , The hash table stores the head node of the linked list . Having the same hash address will be stored in the same linked list , Each element in the linked list has the same hash address .

The hash represents an array of pointers , Each element in an array is a head pointer to a linked list . We can see from the table that , The element that produces the hash conflict does not occupy the position of other elements , The elements in each linked list are hash conflicting elements

insert

： When we plug in , Calculate the hash address , You can directly locate the location key The head node of the corresponding linked list , However, it is impossible to store the same key, We need to traverse the list to see if the same element exists , Insert if it does not exist . When inserting, we can insert head or tail , It's easier to plug in here , establish key A new node of cur, Let the node point to the head node of the list , And the head node of the list is updated to the newly created new node cur

increase capacity

： The expansion of open hash is not as strict as that of closed hash , Although there is a load factor , But this load factor can be 1, That is to say, there should be 10 Elements , The load factor is 1 Time . here 10 None of the elements has a hash conflict , This is the highest efficiency , So we don't have to limit it . That is, when the element is larger than the element of the hash table, it needs to be expanded , To reduce hash conflicts . Each element in the hash table is the head node of a linked list , We can create a new pointer array , Traversing the old hash table , As long as the traversal chain header node is not empty , Just define one cur, Used to traverse the single linked list , Traversing the single linked list also needs to record the next node of the current node , Otherwise, the next node will be discarded . We are at the current node cur Calculate its hash address in , Then the new pointer array in the specified position of the header . After traversing a single linked list , You need to empty the head node of the old linked list , Because we need to exchange these two pointer arrays later , Then release the old pointer array .

lookup

： Find an element , We can first calculate the hash address of the element to be looked up , Locate directly to the head node of the specified list , Then traverse the list , If the value of the current node is the same as that of the element to be found , Then find , Returns the node found , If not, null

delete

： The deletion here is the actual deletion , We can start by looking up , Check whether the value to be deleted exists in the hash table , If not, return directly false, If it exists, the hash address of the linked list of the deleted node should be calculated first , Find and traverse the linked list , And use one prev Node records the previous node to be deleted , When traversal finds the node we deleted , To determine whether the node is the head node of the list , If it is the head node , The next node of the deleted node is set as the head node , If it is not the head node, the previous node of the record will be deleted prev Of the next node of next Set as the next node to delete . Finally, the valid elements are added -1 And delete the node to be deleted

<> code

code ： Closed hash ---- Linear detection

enum STATE { EXIST, DELETE, EMPTY }; // Hashtable ： Linear detection for hash conflict resolution template<class K, class V

> struct HashNode { pair<K, V> _kv;// data STATE _state = EMPTY;// state }; // Hash of sequence table

template<class K, class V> class HashTable { public: typedef HashNode<K, V> Node

; HashTable(size_t n = 10) :_hTable(n) ,_size(0) {} bool insert(const pair<K, V>

& kv) { //0. Check the capacity checkCapacity(); //1. The hash position of the current element int idx = kv.first % _hTable.

size(); //2. judge key Does it exist while (_hTable[idx]._state != EMPTY)

// The current location already has data or is a deleted location , They can't be stored { // There is data in the current location and key identical , Cannot be inserted if (_hTable[idx]._state ==

EXIST&& _hTable[idx]._kv.first == kv.first) { return false; } // Continue to search back for empty positions ++idx;

// Go to the end , We need to start from scratch if (idx == _hTable.size()) idx = 0; } _hTable[idx]._kv = kv;

_hTable[idx]._state = EXIST; ++_size; return true; } void checkCapacity() {

// Load factor [0, 1], Here the load factor is set as 0.7 if (_hTable.size() == 0 || _size * 10 / _hTable.size()

>= 7) { // Create a new table int newC = _hTable.size() == 0 ? 10 : 2 * _hTable.size();

HashTable<K, V> newHt(newC); for (size_t i = 0; i < _hTable.size(); ++i) {

// Insert the data from the original table into the new table , if (_hTable[i]._state == EXIST) { newHt.insert(_hTable[i]._kv

); } } // Exchange the contents of two tables Swap(newHt); } } void Swap(HashTable<K, V>& Ht) { swap(_hTable

, Ht._hTable); swap(_size, Ht._size); } Node* find(const K& key) { // Calculation position int

idx= key % _hTable.size(); while (_hTable[idx]._state != EMPTY) { // find if (

_hTable[idx]._state == EXIST && key == _hTable[idx]._kv.first) { return &_hTable

[idx]; } ++idx; if (idx == _hTable.size()) idx = 0; } // If a space is encountered, it means that it is not found , Return to null return

nullptr; } bool erase(const K& key) { Node* node = find(key); if (node) { // Pseudo deletion

--_size; node->_state = DELETE; return true; } return false; } private: vector<

Node> _hTable;// surface size_t _size;// Number of effective elements };

test ：

code ： Open hash method

#include <vector> #include <iostream> using namespace std; template<class K>

struct HashNode { typedef HashNode<K> Node; K _val; Node* _next; HashNode(const

K& val) :_val(val) , _next(nullptr) {} }; template<class K> class HTable {

public: typedef HashNode<K> Node; HTable(int n = 10) :_ht(n) , _size(0) {} bool

insert(const K& val) { //0. Check the capacity checkCapacity(); //1. calculation hash position int idx = val %

_ht.size(); //2. lookup Node* cur = _ht[idx]; while (cur) { if (cur->_val == val)

return false; cur = cur->_next; } //3. insert -- Head insertion cur = new Node(val); cur->_next =

_ht[idx]; _ht[idx] = cur; ++_size; return true; } void checkCapacity() { if (

_size== _ht.size()) { int newC = _size == 0 ? 10 : 2 * _size; // Create a new pointer array vector<

Node*> newHt(newC); // Traverse old table for (size_t i = 0; i < _ht.size(); ++i) { Node* cur =

_ht[i]; // Traversing single linked list while (cur) { Node* next = cur->_next; // Calculate the new location int idx = cur->

_val% newHt.size(); // Head insertion cur->_next = newHt[idx]; newHt[idx] = cur; cur = next;

} // Old table pointer null _ht[i] = nullptr; } // Exchange new and old tables swap(_ht, newHt); } } Node* find(const

K& val) { int idx = val % _ht.size(); Node* cur = _ht[idx]; while (cur) { if (

cur->_val == val) return cur; cur = cur->_next; } return nullptr; } bool erase(

const K& val) { Node* node = find(val); if (node) { int idx = val % _ht.size();

Node* cur = _ht[idx]; Node* prev = nullptr; while (cur != node) { prev = cur;

cur= cur->_next; } Node* next = cur->_next; if (prev) prev->_next = next; else

_ht[idx] = next; --_size; delete node; return true; } return false; } private:

// Pointer array vector<Node*> _ht; int _size; };

test ：

Technology

- Java423 articles
- Python236 articles
- Vue127 articles
- Linux119 articles
- MySQL100 articles
- javascript76 articles
- SpringBoot72 articles
- C++68 articles
- more...

Daily Recommendation

views 4

views 3

©2019-2020 Toolsou All rights reserved,

Huawei 2021 session Hardware Engineer Logical post （FPGA） Super detailed surface !!!Vue-element-admin upgrade ui edition virtual machine VMware Download and install the most detailed tutorial !C++ Move constructor and copy constructor sound of dripping water java Backstage interview pygame Realize full screen mode and adjustable window size mysql Database setting character set configuration modification my.ini file （windows）30 What's the experience of being a junior programmer at the age of 20 C++ Multithreading programming （ Summary of common functions and parameters ）python_ cherry tree