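The Javadoc comment above each method in this article already explains it clearly, so the text here covers only the key points.
The String class in Java is defined as follows: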
public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence { ... }
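As you can see, String is final and implements the Serializable, Comparable<String>, and CharSequence interfaces.
String objects are immutable, and that is why they can be shared. For example, the following two strings are equivalent: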
String str = "abc";

is equivalent to:

char data[] = {'a', 'b', 'c'};
String str = new String(data);
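To make the sharing concrete, here is a minimal sketch. The class name SharedLiteralDemo and its method names are made up for illustration. Identical literals refer to the same pooled object, while new String(data) creates a distinct object with equal contents.

```java
public class SharedLiteralDemo {
    // Two identical literals resolve to the same pooled String object.
    static boolean literalsShareOneObject() {
        String a = "abc";
        String b = "abc";
        return a == b;
    }

    // A String built from a char array is a new object with equal contents.
    static boolean arrayStringIsEqualButDistinct() {
        char[] data = {'a', 'b', 'c'};
        String fromArray = new String(data);
        String literal = "abc";
        return fromArray.equals(literal) && fromArray != literal;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneObject());        // prints true
        System.out.println(arrayStringIsEqualButDistinct()); // prints true
    }
}
```

So "equivalent" here means equal in content, which is what equals() compares, not that both expressions yield the same object.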
The String class declares a final character array, value[], to store the characters:
/** The value is used for character storage. */
private final char value[];
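Note that value is final, so once it has been assigned it can never be pointed at a different array. That alone does not make a string immutable, because final does not prevent the elements of an array from being changed. Strings are immutable because value is also private and none of String's public methods modifies the array or hands it out to callers.
This final should be distinguished from the final in front of the String class declaration. The final on the class means that String cannot be subclassed, so no subclass can override its methods and undermine that immutability.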
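The difference between a final reference and immutable contents can be seen in a small sketch. FinalArrayDemo and LETTERS are hypothetical names used only for this illustration, and the class is not part of the JDK.

```java
public class FinalArrayDemo {
    // `final` fixes which array the field refers to, not the array's contents.
    private static final char[] LETTERS = {'a', 'b', 'c'};

    static String mutateFinalArray() {
        LETTERS[0] = 'x';            // allowed: the elements are still writable
        // LETTERS = new char[3];    // would not compile: the reference is final
        return new String(LETTERS);
    }

    public static void main(String[] args) {
        System.out.println(mutateFinalArray()); // prints xbc
    }
}
```

String avoids this problem by keeping its array private and never writing to it after construction.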
Let's now look at how the String class is actually implemented.
1.1 No-argument constructor
/**
 * Initializes a newly created {@code String} object so that it represents
 * an empty character sequence.  Note that use of this constructor is
 * unnecessary since Strings are immutable.
 */
public String() {
    this.value = "".value;
}
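Calling this constructor creates an empty String object. Because strings are immutable, there is little reason to call it: the literal "" already gives you an empty string.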
1.2
/**
 * Initializes a newly created {@code String} object so that it represents
 * the same sequence of characters as the argument; in other words, the
 * newly created string is a copy of the argument string. Unless an
 * explicit copy of {@code original} is needed, use of this constructor is
 * unnecessary since Strings are immutable.
 *
 * @param  original
 *         A {@code String}
 */
public String(String original) {
    this.value = original.value;
    this.hash = original.hash;
}
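This constructor takes a String object, original, and initializes the newly created String so that it represents the same character sequence as the argument; in other words, the new string is a copy of the argument string. As the code shows, the new object shares original's value array and cached hash instead of copying the characters, which is safe because the array is never modified.
Unless you need an explicit copy of original, there is no need to use this constructor.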
1.3
/**
 * Allocates a new {@code String} so that it represents the sequence of
 * characters currently contained in the character array argument. The
 * contents of the character array are copied; subsequent modification of
 * the character array does not affect the newly created string.
 *
 * @param  value
 *         The initial value of the string
 */
public String(char value[]) {
    this.value = Arrays.copyOf(value, value.length);
}
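This constructor builds a new string from the character array value so that it represents the characters currently in the array. The contents of value are copied into the String object, so later changes to the array do not affect the newly created string.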
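The effect of the copy can be checked with a short sketch. DefensiveCopyDemo is a made-up class name for this illustration.

```java
public class DefensiveCopyDemo {
    static String buildThenMutate() {
        char[] data = {'a', 'b', 'c'};
        String s = new String(data); // the constructor copies the array
        data[0] = 'x';               // changes only the caller's array
        return s;
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate()); // prints abc
    }
}
```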
1.4
/**
 * Allocates a new {@code String} that contains characters from a subarray
 * of the character array argument. The {@code offset} argument is the
 * index of the first character of the subarray and the {@code count}
 * argument specifies the length of the subarray. The contents of the
 * subarray are copied; subsequent modification of the character array does
 * not affect the newly created string.
 *
 * @param      value    Array that is the source of characters
 * @param      offset   The initial offset
 * @param      count    The length
 * @throws  IndexOutOfBoundsException
 *          If the {@code offset} and {@code count} arguments index
 *          characters outside the bounds of the {@code value} array
 */
public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count <= 0) {
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        if (offset <= value.length) {
            this.value = "".value;
            return;
        }
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}
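This constructor allocates a new string whose characters come from a subarray of value: offset is the index of the first character of the subarray, and count is the subarray's length.
When count is 0 and offset <= value.length, the result is an empty string. A negative offset or count, or a range that runs past the end of the array, throws StringIndexOutOfBoundsException.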
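A brief sketch of these cases follows. SubarrayConstructorDemo and its helper methods are hypothetical names, not JDK API.

```java
public class SubarrayConstructorDemo {
    // Builds a String from count chars of value, starting at offset.
    static String slice(char[] value, int offset, int count) {
        return new String(value, offset, count);
    }

    // Reports whether the constructor rejects the given offset/count pair.
    static boolean rejects(char[] value, int offset, int count) {
        try {
            new String(value, offset, count);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        char[] hello = {'h', 'e', 'l', 'l', 'o'};
        System.out.println(slice(hello, 1, 3));           // prints ell
        System.out.println(slice(hello, 5, 0).isEmpty()); // prints true
        System.out.println(rejects(hello, 3, 3));         // prints true
    }
}
```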
1.5
/**
 * Allocates a new {@code String} that contains characters from a subarray
 * of the <a href="Character.html#unicode">Unicode code point</a> array
 * argument.  The {@code offset} argument is the index of the first code
 * point of the subarray and the {@code count} argument specifies the
 * length of the subarray.  The contents of the subarray are converted to
 * {@code char}s; subsequent modification of the {@code int} array does not
 * affect the newly created string.
 *
 * @param  codePoints  Array that is the source of Unicode code points
 * @param  offset      The initial offset
 * @param  count       The length
 * @throws  IllegalArgumentException  If any invalid Unicode code point is found in codePoints
 * @throws  IndexOutOfBoundsException  If the offset and count arguments index characters outside the bounds of the codePoints array
 * @since  1.5
 */
public String(int[] codePoints, int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count <= 0) {
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        if (offset <= codePoints.length) {
            this.value = "".value;
            return;
        }
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > codePoints.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }

    final int end = offset + count;

    // Pass 1: Compute precise size of char[]
    int n = count;
    for (int i = offset; i < end; i++) {
        int c = codePoints[i];
        if (Character.isBmpCodePoint(c))
            continue;
        else if (Character.isValidCodePoint(c))
            n++;
        else throw new IllegalArgumentException(Integer.toString(c));
    }

    // Pass 2: Allocate and fill in char[]
    final char[] v = new char[n];

    for (int i = offset, j = 0; i < end; i++, j++) {
        int c = codePoints[i];
        if (Character.isBmpCodePoint(c))
            v[j] = (char)c;
        else
            Character.toSurrogates(c, v, j++);
    }

    this.value = v;
}
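This constructor builds a string from an array of code points:
It first checks offset and count against the array bounds, then computes the exact size of the char array (a code point outside the BMP needs two chars), and finally converts the code points into the array v, which becomes the string's value. This involves character-encoding details that will be covered in the Character source code analysis.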
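As a rough illustration of the two passes, the sketch below builds a string from one BMP code point and one supplementary code point. CodePointConstructorDemo and its methods are made-up names for this example.

```java
public class CodePointConstructorDemo {
    // Builds a String from all of the given code points.
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    // Reports whether the constructor rejects the given code point as invalid.
    static boolean rejectsInvalid(int codePoint) {
        try {
            new String(new int[] {codePoint}, 0, 1);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // 0x41 is 'A' (one char); 0x1F600 lies outside the BMP (two chars).
        String s = fromCodePoints(0x41, 0x1F600);
        System.out.println(s.length());               // prints 3
        System.out.println(rejectsInvalid(0x110000)); // prints true
    }
}
```

Two code points produce three chars because the supplementary code point is stored as a surrogate pair.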
1.6
/* Common private utility method used to bounds check the byte array
 * and requested offset & length values used by the String(byte[],..)
 * constructors.
 */
private static void checkBounds(byte[] bytes, int offset, int length) {
    if (length < 0)
        throw new StringIndexOutOfBoundsException(length);
    if (offset < 0)
        throw new StringIndexOutOfBoundsException(offset);
    if (offset > bytes.length - length)
        throw new StringIndexOutOfBoundsException(offset + length);
}
This method only performs bounds checking: length and offset must not be negative, and offset + length must not exceed the length of the byte array.
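One likely reason the last check is written as offset > bytes.length - length instead of offset + length > bytes.length is integer overflow, which the source comment "offset or count might be near -1>>>1" in the earlier constructors also hints at. The sketch below compares the two forms; BoundsCheckDemo and both method names are hypothetical, and the naive variant is shown only for contrast.

```java
public class BoundsCheckDemo {
    // Same comparison as checkBounds: once length >= 0 is known,
    // arrayLength - length cannot overflow.
    static boolean outOfBoundsSafe(int arrayLength, int offset, int length) {
        return length < 0 || offset < 0 || offset > arrayLength - length;
    }

    // Naive variant: offset + length can wrap around to a negative number.
    static boolean outOfBoundsNaive(int arrayLength, int offset, int length) {
        return length < 0 || offset < 0 || offset + length > arrayLength;
    }

    public static void main(String[] args) {
        int huge = Integer.MAX_VALUE;
        System.out.println(outOfBoundsSafe(10, huge, 1));  // prints true
        System.out.println(outOfBoundsNaive(10, huge, 1)); // prints false
    }
}
```

The naive form wrongly accepts the huge offset because Integer.MAX_VALUE + 1 wraps to a negative value.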
/**
 * Constructs a new {@code String} by decoding the specified subarray of
 * bytes using the specified charset.  The length of the new {@code String}
 * is a function of the charset, and hence may not be equal to the length
 * of the subarray.
 *
 * <p> The behavior of this constructor when the given bytes are not valid
 * in the given charset is unspecified.  The {@link
 * java.nio.charset.CharsetDecoder} class should be used when more control
 * over the decoding process is required.
 *
 * @param  bytes
 *         The bytes to be decoded into characters
 *
 * @param  offset
 *         The index of the first byte to decode
 *
 * @param  length
 *         The number of bytes to decode

 * @param  charsetName
 *         The name of a supported {@linkplain java.nio.charset.Charset
 *         charset}
 *
 * @throws  UnsupportedEncodingException
 *          If the named charset is not supported
 *
 * @throws  IndexOutOfBoundsException
 *          If the {@code offset} and {@code length} arguments index
 *          characters outside the bounds of the {@code bytes} array
 *
 * @since  JDK1.1
 */
public String(byte bytes[], int offset, int length, String charsetName)
        throws UnsupportedEncodingException {
    if (charsetName == null)
        throw new NullPointerException("charsetName");
    checkBounds(bytes, offset, length);
    this.value = StringCoding.decode(charsetName, bytes, offset, length);
}

/**
 * Constructs a new {@code String} by decoding the specified subarray of
 * bytes using the specified {@linkplain java.nio.charset.Charset charset}.
 * The length of the new {@code String} is a function of the charset, and
 * hence may not be equal to the length of the subarray.
 *
 * <p> This method always replaces malformed-input and unmappable-character
 * sequences with this charset's default replacement string.  The {@link
 * java.nio.charset.CharsetDecoder} class should be used when more control
 * over the decoding process is required.
 *
 * @param  bytes
 *         The bytes to be decoded into characters
 *
 * @param  offset
 *         The index of the first byte to decode
 *
 * @param  length
 *         The number of bytes to decode
 *
 * @param  charset
 *         The {@linkplain java.nio.charset.Charset charset} to be used to
 *         decode the {@code bytes}
 *
 * @throws  IndexOutOfBoundsException
 *          If the {@code offset} and {@code length} arguments index
 *          characters outside the bounds of the {@code bytes} array
 *
 * @since  1.6
 */
public String(byte bytes[], int offset, int length, Charset charset) {
    if (charset == null)
        throw new NullPointerException("charset");
    checkBounds(bytes, offset, length);
    this.value = StringCoding.decode(charset, bytes, offset, length);
}
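These two constructors decode a subarray of bytes using the specified charset to build a new string. The charset can be given either by name or as a Charset object.
The decode method is covered in the StringCoding source code analysis.
Note that the two differ when the bytes are not valid in the given charset: for the charsetName version the behavior is unspecified, while the Charset version always replaces malformed-input and unmappable-character sequences with the charset's default replacement string. If the named charset is not supported, the charsetName version throws UnsupportedEncodingException.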
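The following sketch shows both ways of naming the charset. DecodeDemo and its methods are illustrative names only, and "no-such-charset" is assumed to be a name that no JVM supports.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Decodes the whole array as UTF-8, passing a Charset object.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, 0, bytes.length, StandardCharsets.UTF_8);
    }

    // Reports whether the charsetName constructor rejects the given name.
    static boolean nameIsRejected(String charsetName) {
        try {
            new String(new byte[] {0x61}, 0, 1, charsetName);
            return false;
        } catch (UnsupportedEncodingException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // UTF-8 bytes of "h", U+00E9 (two bytes), "l", "l", "o".
        byte[] bytes = {0x68, (byte) 0xC3, (byte) 0xA9, 0x6C, 0x6C, 0x6F};
        System.out.println(decodeUtf8(bytes).length());        // prints 5
        System.out.println(nameIsRejected("no-such-charset")); // prints true
    }
}
```

Six bytes decode to five chars here, which is what the Javadoc means by the length being a function of the charset.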
1.7
/**
 * Constructs a new {@code String} by decoding the specified array of bytes
 * using the specified {@linkplain java.nio.charset.Charset charset}.  The
 * length of the new {@code String} is a function of the charset, and hence
 * may not be equal to the length of the byte array.
 *
 * <p> The behavior of this constructor when the given bytes are not valid
 * in the given charset is unspecified.  The {@link
 * java.nio.charset.CharsetDecoder} class should be used when more control
 * over the decoding process is required.
 *
 * @param  bytes
 *         The bytes to be decoded into characters
 *
 * @param  charsetName
 *         The name of a supported {@linkplain java.nio.charset.Charset
 *         charset}
 *
 * @throws  UnsupportedEncodingException
 *          If the named charset is not supported
 *
 * @since  JDK1.1
 */
public String(byte bytes[], String charsetName)
        throws UnsupportedEncodingException {
    this(bytes, 0, bytes.length, charsetName);
}

/**
 * Constructs a new {@code String} by decoding the specified array of
 * bytes using the specified {@linkplain java.nio.charset.Charset charset}.
 * The length of the new {@code String} is a function of the charset, and
 * hence may not be equal to the length of the byte array.
 *
 * <p> This method always replaces malformed-input and unmappable-character
 * sequences with this charset's default replacement string.  The {@link
 * java.nio.charset.CharsetDecoder} class should be used when more control
 * over the decoding process is required.
 *
 * @param  bytes
 *         The bytes to be decoded into characters
 *
 * @param  charset
 *         The {@linkplain java.nio.charset.Charset charset} to be used to
 *         decode the {@code bytes}
 *
 * @since  1.6
 */
public String(byte bytes[], Charset charset) {
    this(bytes, 0, bytes.length, charset);
}

/**
 * Constructs a new {@code String} by decoding the specified subarray of
 * bytes using the platform's default charset.  The length of the new
 * {@code String} is a function of the charset, and hence may not be equal
 * to the length of the subarray.
 *
 * <p> The behavior of this constructor when the given bytes are not valid
 * in the default charset is unspecified.  The {@link
 * java.nio.charset.CharsetDecoder} class should be used when more control
 * over the decoding process is required.
 *
 * @param  bytes
 *         The bytes to be decoded into characters
 *
 * @param  offset
 *         The index of the first byte to decode
 *
 * @param  length
 *         The number of bytes to decode
 *
 * @throws  IndexOutOfBoundsException
 *          If the {@code offset} and the {@code length} arguments index
 *          characters outside the bounds of the {@code bytes} array
 *
 * @since  JDK1.1
 */
public String(byte bytes[], int offset, int length) {
    checkBounds(bytes, offset, length);
    this.value = StringCoding.decode(bytes, offset, length);
}

/**
 * Constructs a new {@code String} by decoding the specified array of bytes
 * using the platform's default charset.  The length of the new {@code
 * String} is a function of the charset, and hence may not be equal to the
 * length of the byte array.
 *
 * <p> The behavior of this constructor when the given bytes are not valid
 * in the default charset is unspecified.  The {@link
 * java.nio.charset.CharsetDecoder} class should be used when more control
 * over the decoding process is required.
 *
 * @param  bytes
 *         The bytes to be decoded into characters
 *
 * @since  JDK1.1
 */
public String(byte bytes[]) {
    this(bytes, 0, bytes.length);
}
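These constructors are simple: the first two delegate to the constructors in 1.6 with offset 0 and length bytes.length, and the last two decode with the platform's default charset.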
1.8
/**
 * Allocates a new string that contains the sequence of characters
 * currently contained in the string buffer argument. The contents of the
 * string buffer are copied; subsequent modification of the string buffer
 * does not affect the newly created string.
 *
 * @param  buffer
 *         A {@code StringBuffer}
 */
public String(StringBuffer buffer) {
    synchronized(buffer) {
        this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
    }
}

/**
 * Allocates a new string that contains the sequence of characters
 * currently contained in the string builder argument. The contents of the
 * string builder are copied; subsequent modification of the string builder
 * does not affect the newly created string.
 *
 * <p> This constructor is provided to ease migration to {@code
 * StringBuilder}. Obtaining a string from a string builder via the {@code
 * toString} method is likely to run faster and is generally preferred.
 *
 * @param   builder
 *          A {@code StringBuilder}
 *
 * @since  1.5
 */
public String(StringBuilder builder) {
    this.value = Arrays.copyOf(builder.getValue(), builder.length());
}
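Besides strings, character arrays, code point arrays, and byte arrays, as shown above, a string can also be constructed from a StringBuffer or a StringBuilder. In both cases the contents are copied, and the StringBuffer version synchronizes on the buffer while copying.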
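A short sketch of the copy semantics, using the made-up class name BuilderCopyDemo:

```java
public class BuilderCopyDemo {
    // The String keeps the contents the builder had at construction time.
    static String snapshot() {
        StringBuilder sb = new StringBuilder("abc");
        String s = new String(sb); // copies the current contents
        sb.append("def");          // later changes do not reach s
        return s;
    }

    // The constructor and toString() produce equal strings.
    static boolean constructorMatchesToString() {
        StringBuffer buf = new StringBuffer("xyz");
        return new String(buf).equals(buf.toString());
    }

    public static void main(String[] args) {
        System.out.println(snapshot());                   // prints abc
        System.out.println(constructorMatchesToString()); // prints true
    }
}
```

As the Javadoc above notes, calling toString() on the builder is generally preferred over this constructor.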
/**
 * Returns the length of this string.
 * The length is equal to the number of <a href="Character.html#unicode">Unicode
 * code units</a> in the string.
 *
 * @return  the length of the sequence of characters represented by this
 *          object.
 */
public int length() {
    return value.length;
}
The length() method returns the length of the string, that is, the number of Unicode code units (char values) it contains.
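Because the unit is the code unit, a character outside the BMP adds 2 to the length. A minimal sketch follows; LengthDemo and its methods are hypothetical names.

```java
public class LengthDemo {
    // Number of UTF-16 code units, which is what length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the whole string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00"; // 'a' followed by U+1F600 as a surrogate pair
        System.out.println(codeUnits(s));  // prints 3
        System.out.println(codePoints(s)); // prints 2
    }
}
```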
/**
 * Returns {@code true} if, and only if, {@link #length()} is {@code 0}.
 *
 * @return {@code true} if {@link #length()} is {@code 0}, otherwise
 * {@code false}
 *
 * @since 1.6
 */
public boolean isEmpty() {
    return value.length == 0;
}
Returns whether the string is empty, that is, whether its length is 0.
/**
 * Returns the {@code char} value at the
 * specified index. An index ranges from {@code 0} to
 * {@code length() - 1}. The first {@code char} value of the sequence
 * is at index {@code 0}, the next at index {@code 1},
 * and so on, as for array indexing.
 *
 * <p>If the {@code char} value specified by the index is a
 * <a href="Character.html#unicode">surrogate</a>, the surrogate
 * value is returned.
 *
 * @param      index   the index of the {@code char} value.
 * @return     the {@code char} value at the specified index of this string.
 *             The first {@code char} value is at index {@code 0}.
 * @exception  IndexOutOfBoundsException  if the {@code index}
 *             argument is negative or not less than the length of this
 *             string.
 */
public char charAt(int index) {
    if ((index < 0) || (index >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return value[index];
}
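Returns the char at the specified index; valid indices run from 0 to length() - 1.
If the char value at the index is a surrogate, the surrogate value itself is returned.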
/**
 * Returns the character (Unicode code point) at the specified
 * index. The index refers to {@code char} values
 * (Unicode code units) and ranges from {@code 0} to
 * {@link #length()}{@code  - 1}.
 *
 * <p> If the {@code char} value specified at the given index
 * is in the high-surrogate range, the following index is less
 * than the length of this {@code String}, and the
 * {@code char} value at the following index is in the
 * low-surrogate range, then the supplementary code point
 * corresponding to this surrogate pair is returned. Otherwise,
 * the {@code char} value at the given index is returned.
 *
 * @param      index the index to the {@code char} values
 * @return     the code point value of the character at the
 *             {@code index}
 * @exception  IndexOutOfBoundsException  if the {@code index}
 *             argument is negative or not less than the length of this
 *             string.
 * @since      1.5
 */
public int codePointAt(int index) {
    if ((index < 0) || (index >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return Character.codePointAtImpl(value, index, value.length);
}
For now, it is enough to remember that this returns the code point at index.
/**
 * Returns the character (Unicode code point) before the specified
 * index. The index refers to {@code char} values
 * (Unicode code units) and ranges from {@code 1} to {@link
 * CharSequence#length() length}.
 *
 * <p> If the {@code char} value at {@code (index - 1)}
 * is in the low-surrogate range, {@code (index - 2)} is not
 * negative, and the {@code char} value at {@code (index -
 * 2)} is in the high-surrogate range, then the
 * supplementary code point value of the surrogate pair is
 * returned. If the {@code char} value at {@code index -
 * 1} is an unpaired low-surrogate or a high-surrogate, the
 * surrogate value is returned.
 *
 * @param     index the index following the code point that should be returned
 * @return    the Unicode code point value before the given index.
 * @exception IndexOutOfBoundsException if the {@code index}
 *            argument is less than 1 or greater than the length
 *            of this string.
 * @since     1.5
 */
public int codePointBefore(int index) {
    int i = index - 1;
    if ((i < 0) || (i >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return Character.codePointBeforeImpl(value, index, 0);
}
For now, it is enough to remember that this returns the code point before index; code points are covered in detail in the Character source code analysis.
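To see how charAt, codePointAt, and codePointBefore differ around a surrogate pair, consider this small sketch. CodePointAccessDemo, TEXT, and the helper methods are illustrative names only.

```java
public class CodePointAccessDemo {
    // Index 0: 'a', index 1: high surrogate, index 2: low surrogate, index 3: 'b'.
    static final String TEXT = "a\uD83D\uDE00b";

    static int at(int index) {
        return TEXT.codePointAt(index);
    }

    static int before(int index) {
        return TEXT.codePointBefore(index);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(TEXT.charAt(1))); // prints d83d
        System.out.println(Integer.toHexString(at(1)));          // prints 1f600
        System.out.println(Integer.toHexString(at(2)));          // prints de00
        System.out.println(Integer.toHexString(before(3)));      // prints 1f600
    }
}
```

Starting in the middle of a pair, as at(2) does, yields only the lone surrogate value.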
/**
 * Returns the number of Unicode code points in the specified text
 * range of this {@code String}. The text range begins at the
 * specified {@code beginIndex} and extends to the
 * {@code char} at index {@code endIndex - 1}. Thus the
 * length (in {@code char}s) of the text range is
 * {@code endIndex-beginIndex}. Unpaired surrogates within
 * the text range count as one code point each.
 *
 * @param beginIndex the index to the first {@code char} of
 * the text range.
 * @param endIndex the index after the last {@code char} of
 * the text range.
 * @return the number of Unicode code points in the specified text
 * range
 * @exception IndexOutOfBoundsException if the
 * {@code beginIndex} is negative, or {@code endIndex}
 * is larger than the length of this {@code String}, or
 * {@code beginIndex} is larger than {@code endIndex}.
 * @since  1.5
 */
public int codePointCount(int beginIndex, int endIndex) {
    if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex) {
        throw new IndexOutOfBoundsException();
    }
    return Character.codePointCountImpl(value, beginIndex, endIndex - beginIndex);
}
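Returns the number of Unicode code points in the specified text range of this string. The range starts at beginIndex and extends to the char at endIndex - 1, so its length in chars is endIndex - beginIndex. Each unpaired surrogate in the range counts as one code point.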
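The counting rules, including the unpaired-surrogate case, can be sketched as follows. CodePointCountDemo and its members are hypothetical names.

```java
public class CodePointCountDemo {
    // Index 0: 'a', index 1: high surrogate, index 2: low surrogate, index 3: 'b'.
    static final String TEXT = "a\uD83D\uDE00b";

    static int count(int beginIndex, int endIndex) {
        return TEXT.codePointCount(beginIndex, endIndex);
    }

    // Reports whether the given range is rejected as out of bounds.
    static boolean rejects(int beginIndex, int endIndex) {
        try {
            TEXT.codePointCount(beginIndex, endIndex);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(count(0, 4)); // prints 3: 'a', U+1F600, 'b'
        System.out.println(count(1, 2)); // prints 1: a lone high surrogate
        System.out.println(count(2, 4)); // prints 2: a lone low surrogate and 'b'
    }
}
```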
/**
 * Returns the index within this {@code String} that is
 * offset from the given {@code index} by
 * {@code codePointOffset} code points. Unpaired surrogates
 * within the text range given by {@code index} and
 * {@code codePointOffset} count as one code point each.
 *
 * @param index the index to be offset
 * @param codePointOffset the offset in code points
 * @return the index within this {@code String}
 * @exception IndexOutOfBoundsException if {@code index}
 *   is negative or larger then the length of this
 *   {@code String}, or if {@code codePointOffset} is positive
 *   and the substring starting with {@code index} has fewer
 *   than {@code codePointOffset} code points,
 *   or if {@code codePointOffset} is negative and the substring
 *   before {@code index} has fewer than the absolute value
 *   of {@code codePointOffset} code points.
 * @since 1.5
 */
public int offsetByCodePoints(int index, int codePointOffset) {
    if (index < 0 || index > value.length) {
        throw new IndexOutOfBoundsException();
    }
    return Character.offsetByCodePointsImpl(value, 0, value.length,
            index, codePointOffset);
}
Returns the index within this string that is offset from the given index by codePointOffset code points.
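One common use of this method is stepping through a string one code point at a time. The sketch below is only one way to do that; OffsetByCodePointsDemo and codePointsOf are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetByCodePointsDemo {
    // Collects every code point by advancing one code point per iteration.
    static List<Integer> codePointsOf(String s) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            result.add(s.codePointAt(i));
            i = s.offsetByCodePoints(i, 1); // skips 1 or 2 chars as needed
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";
        System.out.println(codePointsOf(s).size());      // prints 3
        System.out.println(s.offsetByCodePoints(0, 2));  // prints 3
        System.out.println(s.offsetByCodePoints(3, -1)); // prints 1
    }
}
```

A negative codePointOffset moves backward, so offsetByCodePoints(3, -1) steps over the whole surrogate pair and lands on index 1.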
Original article: http://www.cnblogs.com/songwenlong/p/6917117.html