<>

<> abstract

Hash table (Hash table, Also called hash table ), Is based on the key value (Key
value) And directly access the data structure . in other words , It accesses records by mapping key values to a location in the table , To speed up discovery . This mapping function is called a hash function , The array in which records are stored is called a hash table .

HashMap yes Java The most frequently used by programmers is for key value pairs (key value) Container for data processing , stay JDK1.7(Java Developmet
Kit) Time HashMap Take an array + Store data in the form of linked list ,JDK1.8 yes HashMap The storage structure is optimized , The red black tree data structure is introduced , Greatly enhanced HashMap Access performance ! Why did red and black trees be introduced ? because HashMap There is a problem , Even if the load factor and Hash Re rationalization of algorithm design , It can't avoid the problem that the zipper on the linked list is too long , If there is serious in extreme cases Hash conflict , Will seriously affect HashMap Access performance , therefore HashMap stay jdk1.8 Time , Red black tree was introduced , It is optimized by using the characteristics of rapid addition, deletion, modification and query of red black tree HashMap Performance of !

<> brief introduction

Java Mapping for data structures ( Key value pair ) An interface is defined java.util.Map, This interface has four implementation classes that we use frequently in development , namely HashMap<K,
V>,Hashtable<K, V>,LinkedHashMap<K,V>,TreeMap<K,V>

Serial number implementation class features
1HashMap<K, V>1, According to the key hashcode Store data
2, Allow a record key by null, Allow multiple records value by null
3, Non thread safe
2Hashtable<K, V>1, Thread safety
2, Low performance , The scenes that need to be used can be used ConcurrentHashMap replace
3LinkedHashMap<K,V>1, Access is sequential , The insertion order is saved
2, It can be sorted in access order by constructor arguments
4TreeMap<K,V>1, Orderly Map, You can customize the collation of stored data through the sorting comparator , By default key Ordered arrangement of
2, When used key Need to implement Comparable Interface or pass in custom through constructor Comparator
<> storage structure

jdk1.8 in the future HashMap Adopt array + Linked list / Red black tree to store data

<> Source code analysis

<> trunk
// HashMap Backbone of , That's the green part above , It's a Node<K,V> array , each Node Contains one K-V Key value pair transient Node<K,V>[]
table;
<> Node element
// Node<K,V> yes HashMap Static inner class , Realized Map Internal in interface Entry Interface static class Node<K,V>
implements Map.Entry<K,V> { // Record current Node Yes key Yes hash value , Double counting can be avoided , Space for time final int hash;
// key final K key; // value V value; // Store to next Entry Reference to , It is a one-way linked list structure Node<K,V> next; //
... }
<> Other important fields
// Default initial capacity 16 static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; //
Maximum capacity ,1 Shift left 30 position static final int MAXIMUM_CAPACITY = 1 << 30; // Default expansion factor 0.75 static
final float DEFAULT_LOAD_FACTOR = 0.75f; // Length of linked list to red black tree static final int
TREEIFY_THRESHOLD= 8; // Threshold value of red black tree to linked list static final int UNTREEIFY_THRESHOLD = 6; //
Array length of linked list to red black tree static final int MIN_TREEIFY_CAPACITY = 64; // Actual storage K-V Number of key value pairs
transient int size; // record HashMap Number of changes , because HashMap Non thread safe ,modCount Available for FailFast mechanism
transient int modCount; // Capacity expansion threshold , default 16*0.75=12, When filled to 13 When multiple elements , After capacity expansion, it will become 32, int threshold
; // Load factor loadFactor=capacity*threshold,HashMap Expansion needs reference loadFactor Value of final float
loadFactor;
<> Constructor
// Compare all arguments of the constructor , Not given in constructor table Allocate memory space , This constructor HashMap(Map<? extends K, ? extends
V> m) Will give table Allocate memory space public HashMap(int initialCapacity, float loadFactor) { //
Judge whether the initialization capacity is legal , If <0 An exception is thrown if (initialCapacity < 0) throw new
IllegalArgumentException("Illegal initial capacity: " + initialCapacity); //
judge initialCapacity Greater than 1<<30, If greater than, take 1<<30 = 2^30 if (initialCapacity >
MAXIMUM_CAPACITY) initialCapacity = MAXIMUM_CAPACITY; //
Judge whether the load factor is legal , If less than or equal to 0 perhaps isNaN,loadFactor!=loadFactor, An exception is thrown if (loadFactor <= 0 ||
Float.isNaN(loadFactor)) throw new IllegalArgumentException("Illegal load
factor: " +loadFactor); // assignment loadFactor this.loadFactor = loadFactor; //
By bit operation threshold Set value to closest initialCapacity One of 2 Power of ( It's very important here ) this.threshold =
tableSizeFor(initialCapacity); }
<>hash Algorithm implementation

HashMap Medium hash The implementation of the algorithm is divided into three steps . The second step is to use hash High value 16 Bit participation bit operation , Is to ensure that in the array table Yes length When I was younger , Can guarantee high and low bit Are involved hash In operation , While ensuring uniform distribution, bit operation is adopted , There will not be too much performance consumption ; The third step , When n yes 2 The power of an integer is ,hash&(n-1), Equivalent to hash Value modulus , Bit operation is more efficient than modulo operation ; The specific process can be viewed through the diagram .

First step : adopt key.hashCode() obtain key Yes hashcode;

Step 2 : adopt (h = key.hashCode()) ^ (h >>> 16) Proceed high 16 Bit operation of bit ;

Step 3 : adopt (n - 1) & hash For calculation hash Value modulo operation , Get the position of the array where the node is inserted .

<>HashMap of put method

First step : Judgment key value pair array table[i] Is it empty /null, Execute if yes resize() Capacity expansion .

Step 2 : According to key key calculation hash The index that is worth inserting into the array i, If tab[i]== null Insert directly , Perform step 6 ; If tab[i] != null, Perform step 3 .

Step 3 : judge tab[i] The first element of the and the insert element key Yes hashcode&equals Are they equal , Equal coverage , Otherwise, proceed to step 4 .

Step 4 : judge tab[i] Is it a red black tree node TreeNode, If yes, insert the node in the red black tree , Otherwise, proceed to step 5 .

Step 5 : ergodic tab[i] Determine whether the linked list is greater than 8, greater than 8 It may turn into a red black tree ( Array needs to be greater than 64), If satisfied, insert a node in the red black tree ; Otherwise, insert in the linked list ; In the process of traversing the linked list, if there is key Yes hashcode&equals If equal, replace it .

Step 6 : Insert successful , judge hashmap Yes size Whether it exceeds threshold Value of , Capacity expansion if exceeding .

put(K key, V value)
public V put(K key, V value) { // hash(key) according to key Calculate a hash value , The specific implementation of the following functions return
putVal(hash(key), key, value, false, true); }
calculation key Yes hash value
static final int hash(Object key) { int h; // If key be equal to null, return 0 //
Otherwise take key Yes hashcode value h, take h unsigned right shift 16 Bit, that is, get high 16 position , Returns two values after XOR operation return (key == null) ? 0 :
(h = key.hashCode()) ^ (h >>> 16); }
putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict)
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i; // judge table Is it empty , If it is empty resize() establish if ((tab =
table) == null || (n = tab.length) == 0) n = (tab = resize()).length; // (n -
1) & hash Calculates the index of the inserted node in the array index, If it is blank, it will be inserted directly if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null); else { Node<K,V> e; K k; //
If node key existence , The current element is overwritten if (p.hash == hash && ((k = p.key) == key || (key != null &&
key.equals(k)))) e = p; // The node does not exist and the chain on the current array node is RBT Red black tree , The node is inserted as a red black tree else if (p
instanceof TreeNode) e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key,
value); // The chain is a linked list else { for (int binCount = 0; ; ++binCount) { //
The next node of the node is null, Description traverses to the last node of the linked list , The of the last node traversed by the current next Point to the newly inserted node if ((e = p.next) == null
) { p.next = newNode(hash, key, value, null); //
Linked list length greater than 8 Red black tree conversion may be required , It depends treeifyBin() Determine whether the length of the array is greater than 64 if (binCount >=
TREEIFY_THRESHOLD- 1) // -1 for 1st treeifyBin(tab, hash); break; } // Returns if the node exists
if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
break; // Does not exist and write a node is not null, The next node is assigned to the current node , Continue with the next cycle p = e; } } //
The node exists and returns directly after replacement , No record HashmMap structural changes if (e != null) { // existing mapping for key V
oldValue= e.value; if (!onlyIfAbsent || oldValue == null) e.value = value;
afterNodeAccess(e); return oldValue; } } // record HashMap Structural change of ++modCount; //
Judge whether the node needs to be updated after insertion hashMap Expand the capacity of the array if (++size > threshold) resize(); afterNodeInsertion(
evict); return null; }
Capacity expansion
final Node<K,V>[] resize() { // Assign the array before expansion to oldTab Node<K,V>[] oldTab = table;
int oldCap = (oldTab == null) ? 0 : oldTab.length; int oldThr = threshold; int
newCap, newThr = 0; if (oldCap > 0) { // Judge whether the array length is greater than or equal to the maximum allowed value , If it exceeds, the maximum value shall be taken , Direct return if (
oldCap>= MAXIMUM_CAPACITY) { threshold = Integer.MAX_VALUE; return oldTab; } //
Moves the original array size one bit to the left , That is to expand twice , Before capacity expansion, you need to judge whether the size exceeds the maximum allowed value else if ((newCap = oldCap << 1) <
MAXIMUM_CAPACITY&& oldCap >= DEFAULT_INITIAL_CAPACITY) newThr = oldThr << 1; //
double threshold } //
If the array size is less than or equal to 0 And threshold greater than 0( such as HashMap All elements in have been deleted ), Will oldThr Value assigned to newCap else if (
oldThr> 0) newCap = oldThr; // First initialization , Give default values else { newCap =
DEFAULT_INITIAL_CAPACITY; newThr = (int)(DEFAULT_LOAD_FACTOR *
DEFAULT_INITIAL_CAPACITY); } // If the new expansion threshold is still 0, Need to give newThr Calculate a value if (newThr == 0) {
float ft = (float)newCap * loadFactor; newThr = (newCap < MAXIMUM_CAPACITY && ft
< (float)MAXIMUM_CAPACITY ? (int)ft : Integer.MAX_VALUE); } // Assign a new array expansion threshold
threshold= newThr; @SuppressWarnings({"rawtypes","unchecked"}) //
Instantiate a new according to the new capacity Node array Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; table =
newTab; if (oldTab != null) { // Traverse the old array and assign values to the new array for (int j = 0; j < oldCap; ++j) {
Node<K,V> e; // If the node is not null, Then assign the node to e if ((e = oldTab[j]) != null) { //
Empty node reference , wait for GC recovery oldTab[j] = null; //
If the node's next Point to null , According to the node record hash value & Size of expansion -1, Recalculate the index position of the node in the array if (e.next == null) newTab[e
.hash & (newCap - 1)] = e; // If the node is a red black tree node , Insert as red black tree node else if (e instanceof
TreeNode) ((TreeNode<K,V>)e).split(this, newTab, j, oldCap); //
This indicates that the chain on the current node is a linked list , Traversal linked list , Transfer the linked list to the new array else { // Create two chains lo Chain at original array node position Node<K,V> loHead =
null, loTail = null; // hi Chain insert original array index +oldCap position Node<K,V> hiHead = null, hiTail =
null; Node<K,V> next; do { next = e.next; // a key //
Gets the name of the current node hash value , And oldCap Do bitwise and operations , If the result is 0, Then join lo chain if ((e.hash & oldCap) == 0) { if (
loTail== null) loHead = e; else loTail.next = e; loTail = e; } // Otherwise join hi chain else {
if (hiTail == null) hiHead = e; else hiTail.next = e; hiTail = e; } } while ((e
= next) != null); // The original position of the new array points to lo Head node of chain if (loTail != null) { loTail.next = null;
newTab[j] = loHead; } // Of the new array j+oldCap point hi Head node of chain if (hiTail != null) { hiTail.next
= null; newTab[j + oldCap] = hiHead; } } } } } return newTab; }
<> Picture and text chat jdk1.8 After medium expansion resize() Those things

stay jdk1.8 in , When occurs HashMap During capacity expansion, we can find it through the source code ,HashMap The position of the node is not determined by the new array length , But through (e.hash &
oldCap) == 0, primary hash Value and the length of the original array to determine the position of the node in the new array , When (e.hash & oldCap) ==
0 Time , The position of the node in the new array is the same as that in the old array ; When (e.hash & oldCap) != 0 Time , The position of the node in the new array is j +
oldCap, That is, the original position plus the length of the old array .

Here are two questions , Bring the problem to analysis ,

Question one : Why can this be achieved ?

Question two : before hash Nodes with the same value are at the same index position of the array , Now it will be randomly assigned to two new index positions , Will this lead to get(key) Can't get the element ?

<> Three preconditions

1,HashMap During initialization, the default size of the array is 16, And if the initial value of our set value is not 2 Integer power of , Then it will automatically calculate the value closest to the initialization value 2 The initial value of the integer power of , The specific implementation can be viewed in the source code tableSizeFor() method .
static final int tableSizeFor(int cap) { int n = cap - 1; n |= n >>> 1; n |= n
>>> 2; n |= n >>> 4; n |= n >>> 8; n |= n >>> 16; return (n < 0) ? 1 : (n >=
MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1; }
2,HashMap The expansion method is oldCap << 1, It must be expanded to the original 2 times , Ensure that it is still after capacity expansion 2 Integer power of

3,HashMap The array index is calculated by (n - 1) & hash

The initial size of the array is 16, The expansion factor is 0.75, Insert three nodes in turn , Nodal key Respectively key1,key2,key3, Suppose the number of nodes in the array reaches after the third node is inserted 12 individual , Capacity expansion is required at this time

here HashMap The element distribution assumption in is shown in the figure below ,size=12, Capacity expansion

The expansion algorithm is e.hash & oldCap, Found after calculation key2 The value of is 0, Maintain subscript position in old array ,key1 and key3 Need to transfer to j +
oldCap Location of , that is 1+16=17 Subscript position

Node distribution after capacity expansion

At this point, let's look at the basis after capacity expansion key Gets the code of the element
public V get(Object key) { Node<K,V> e; // According to the current key calculation hash value , This calculation method has been described in detail above
return (e = getNode(hash(key), key)) == null ? null : e.value; } final Node<K,V>
getNode(int hash, Object key) { Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
// When the array is not empty , also key The corresponding array subscript element does not exist if ((tab = table) != null && (n = tab.length) > 0 &&
(first = tab[(n - 1) & hash]) != null) { // Check whether the acquired node is the first node , Yes, return if (first.hash
== hash && ((k = first.key) == key || (key != null && key.equals(k)))) return
first; // If it is not the head node, judge whether the next node exists if ((e = first.next) != null) { //
If the node is a red black tree , Get the element in the red black tree if (first instanceof TreeNode) return ((TreeNode<K,V>)first)
.getTreeNode(hash, key); // If it is a linked list, traverse the nodes in the linked list , matching key Then return , When e If there is no next node, the loop ends and returns null do {
if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
return e; } while ((e = e.next) != null); } } // If it is a match, return null return null; }
From the above source code, we can see very clearly , The same is true for getting the subscript in the array where the node is located (n - 1) &
hash, It is the same as calculating the array subscript when inserting a node , Then why can we find it accurately at this time key1,key2,key3 And ? The reason is n The value of is doubled , And node hash The value will not change , The position of the calculated node in the array , With us during capacity expansion , Node move operation performed ( It can also be understood as recalculating the index in the array ) Is consistent ; We can pass get Time , calculation key1,key2,key3 Index to show the effect , as follows

<> Summary

* HashMap stay 1.8 Great optimization has been made in , Whether in calculation hash Value algorithm , Or is the underlying data structure from the array + The linked list becomes an array + Linked list / Red black tree , Both greatly improve the performance
*
HashMap Is thread unsafe , In the actual production development process, if we need to use thread safety key-value storage , Can choose ConcurrentHashMap perhaps Collections Yes synchronizedMap yes HashMap Thread safe capability , Of course, I recommend it ConcurrentHashMap
* To ensure HashMap Performance of , Reduce performance loss caused by capacity expansion , In the process of using , The size of the stored value should be estimated in advance , It is also necessary to set a reasonable initial size and expansion factor
* in use HashMap Time , We must rewrite it if necessary key Yes hashcode Methods and equals method

Technology
©2019-2020 Toolsou All rights reserved,
【Java8 New features 1】Lambda Expression summary What is a process , Concept of process ?hdfs dfs Common basic commands java When creating objects, you must _Java Basic test questions Generation of random numbers + Figure guessing game When will we enter the old generation ?HDFS Common commands ( summary ) It is never recommended to spend a lot of time studying for employment pythonmacOS Big Sur The installation could not be completed Big Sur Why can't I install it ?Python pyttsx3| Text reading ( Various languages )