A Java String is a sequence of Unicode characters, but it is often mistaken for a sequence of char values. As a result, we often traverse strings like this:

    package testchar;

    public class TestChar2 {
        public static void main(String[] args) {
            String s = "\u0041\u00DF\u6771\ud801\uDC00";
            for (int i = 0; i < s.length(); i++) {
                System.out.println(s.charAt(i));
            }
        }
    }

Then we get unexpected results:

A

ß

東

?

?


This happens because a Unicode character and a Java char are not the same thing. A char can represent only a subset of Unicode. It is 16 bits wide, so it can hold only 65,536 (2^16) distinct values, while Unicode defines code points all the way up to U+10FFFF.

Java stores the characters of char values and Strings in the UTF-16 encoding. The number Unicode assigns to a character is called its code point, and each 16-bit unit of UTF-16 is called a code unit. Characters whose code points fit in 16 bits are encoded as a single code unit, like the characters a char can represent. Supplementary characters have code points above U+FFFF and are encoded in 32 bits, that is, as two consecutive code units (a surrogate pair), such as \ud801\uDC00 above.

When we traverse a string, we really want to visit every code point in it. However, s.length() returns the number of code units in s. When the code unit at index i is only half of a 32-bit code point, s.charAt(i) returns that lone half instead of the character we want. That is why the last two lines of the output above show up as ?.
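To make the code unit / code point distinction concrete, here is a small sketch. The class and method names (SurrogateDemo, describeCodeUnits, combine) are hypothetical. It labels each char of the sample string and compares length() with codePointCount(). It also decodes the surrogate pair by hand using the standard UTF-16 rule, cp = 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00), and checks the result against the JDK's Character.toCodePoint().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SurrogateDemo {

    // Labels each UTF-16 code unit of s with its index, hex value and kind.
    static List<String> describeCodeUnits(String s) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            String kind = Character.isHighSurrogate(c) ? "high surrogate"
                    : Character.isLowSurrogate(c) ? "low surrogate"
                    : "non-surrogate";
            lines.add(String.format(Locale.ROOT, "%d: U+%04X %s", i, (int) c, kind));
        }
        return lines;
    }

    // Hypothetical helper: combines a surrogate pair by hand using the UTF-16 rule
    // cp = 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)
    static int combine(char high, char low) {
        return 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00);
    }

    public static void main(String[] args) {
        String s = "\u0041\u00DF\u6771\ud801\uDC00";
        describeCodeUnits(s).forEach(System.out::println);

        System.out.println(s.length());                      // 5 code units
        System.out.println(s.codePointCount(0, s.length())); // 4 code points

        int manual = combine(s.charAt(3), s.charAt(4));
        int library = Character.toCodePoint(s.charAt(3), s.charAt(4));
        System.out.printf("U+%X U+%X%n", manual, library);   // U+10400 U+10400
    }
}
```

The dump shows that indexes 3 and 4 are a high and a low surrogate. Each one means nothing on its own, which is exactly what charAt(i) hands back.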

Here are several ways to traverse a string correctly:

    package testchar;

    /**
     * Traverses a String correctly, code point by code point.
     *
     * @author yuncong
     */
    public class TestChar {
        public static void main(String[] args) {
            String s = "\u0041\u00DF\u6771\ud801\uDC00";

            // Method 1: count the code points, then locate each one by offset
            int cpCount = s.codePointCount(0, s.length());
            for (int i = 0; i < cpCount; i++) {
                int index = s.offsetByCodePoints(0, i);
                int cp = s.codePointAt(index);
                if (!Character.isSupplementaryCodePoint(cp)) {
                    System.out.println((char) cp);
                } else {
                    System.out.println(cp);
                }
            }
            System.out.println("-------------------");

            // Method 2: walk the code units, skipping the second half of a surrogate pair
            for (int i = 0; i < s.length(); i++) {
                int cp = s.codePointAt(i);
                if (!Character.isSupplementaryCodePoint(cp)) {
                    System.out.println((char) cp);
                } else {
                    System.out.println(cp);
                    i++; // skip the low surrogate
                }
            }
            System.out.println("-------------------");

            // Method 3: reverse traversal
            for (int i = s.length() - 1; i >= 0; i--) {
                int cp;
                if (i == 0) {
                    // Only one code unit is left, so it cannot be a supplementary character
                    cp = s.codePointAt(0);
                    System.out.println((char) cp);
                } else {
                    // At least two code units remain, so the units at i-1 and i
                    // may form a supplementary character: step back one code unit
                    i--;
                    cp = s.codePointAt(i);
                    if (Character.isSupplementaryCodePoint(cp)) {
                        System.out.println(cp);
                    } else {
                        // Not a supplementary character: return to the normal position
                        i++;
                        cp = s.codePointAt(i);
                        System.out.println((char) cp);
                    }
                }
            }
        }
    }
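Since Java 8, the same traversals can be written more compactly. The sketch below uses hypothetical class and method names. The forward pass uses String.codePoints(). The reverse pass uses String.codePointBefore() together with Character.charCount(), which avoids the step-back-then-forward logic of method 3. Printing supplementary characters in U+XXXX notation instead of as a decimal number is my own formatting choice, not the original's.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class CodePointTraversal {

    // Forward: codePoints() (Java 8+) yields whole code points,
    // so a surrogate pair is never split into two halves.
    static List<Integer> forward(String s) {
        return s.codePoints().boxed().collect(Collectors.toList());
    }

    // Backward: codePointBefore(i) returns the code point that ends just
    // before index i; charCount() says whether to step back 1 or 2 code units.
    static List<Integer> backward(String s) {
        List<Integer> result = new ArrayList<>();
        int i = s.length();
        while (i > 0) {
            int cp = s.codePointBefore(i);
            result.add(cp);
            i -= Character.charCount(cp);
        }
        return result;
    }

    // Hypothetical display helper: BMP characters as-is, supplementary ones as U+XXXX.
    static String show(int cp) {
        return Character.isSupplementaryCodePoint(cp)
                ? String.format("U+%X", cp)
                : String.valueOf((char) cp);
    }

    public static void main(String[] args) {
        String s = "\u0041\u00DF\u6771\ud801\uDC00";
        forward(s).forEach(cp -> System.out.println(show(cp)));
        System.out.println("-------------------");
        backward(s).forEach(cp -> System.out.println(show(cp)));
    }
}
```

For the sample string, the forward pass prints A, ß, 東 and U+10400, and the backward pass prints the same four in reverse order.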

(Pitfall: supplementary characters cannot be displayed on this blog, which is why the examples print them as numeric code points.)
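If you want the character itself rather than its number, you have to turn the code point back into a String. The sketch below relies on standard JDK methods: Character.toChars(), StringBuilder.appendCodePoint() and StringBuilder.reverse(). The class and method names are hypothetical. Whether the character actually renders still depends on the console's encoding and font.

```java
public class CodePointToString {

    // Turns a code point back into a String; for a supplementary code point,
    // Character.toChars() returns the two surrogate chars.
    static String toText(int cp) {
        return new String(Character.toChars(cp));
    }

    // Builds a string from code points with StringBuilder.appendCodePoint().
    static String build(int... cps) {
        StringBuilder sb = new StringBuilder();
        for (int cp : cps) {
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String deseret = toText(0x10400);
        System.out.println(deseret.length());                    // 2 (code units)
        System.out.println(deseret.codePointCount(0, 2));        // 1 (code point)

        // StringBuilder.reverse() treats a surrogate pair as a single character,
        // so the pair is not torn apart when reversing.
        String s = build(0x41, 0xDF, 0x6771, 0x10400);
        String reversed = new StringBuilder(s).reverse().toString();
        System.out.println(reversed.codePointAt(0) == 0x10400);  // true
    }
}
```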

© 2019-2020 Toolsou. All rights reserved.