String Source Code Analysis (Part 1)

Date: 2017-05-29 23:28:36


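The Javadoc comment above each method quoted in this article already explains the method clearly, so only the key points are discussed here.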

The String class in Java is declared as follows:

    public final class String
        implements java.io.Serializable, Comparable<String>, CharSequence { ...}

As you can see, String is final, and it implements the Serializable, Comparable<String>, and CharSequence interfaces.

Because String objects are immutable, they can be shared. For example, the following two ways of creating a string are equivalent:

    String str = "abc";

is equivalent to:

    char data[] = {'a', 'b', 'c'};
    String str = new String(data);

The String class declares a final character array, value, which stores the characters:

    /** The value is used for character storage. */
    private final char value[];

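Note that value is final, so the reference cannot be reassigned once it has been set. final alone does not make the array's elements unmodifiable; a string cannot be changed because value is also private, and no String method modifies the array or hands it out to callers.

This final must be distinguished from the final in front of the class declaration: the final on String means that the class cannot be subclassed, so no subclass can break these guarantees.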
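To make the difference between a final reference and an unmodifiable array concrete, here is a minimal sketch. LeakyText is a hypothetical class invented for this illustration (it is not part of the JDK); it keeps the caller's array in a final field without copying it, so the text can still be changed from outside.

```java
class FinalArrayDemo {
    // Hypothetical class for illustration; it is not part of the JDK.
    static final class LeakyText {
        private final char[] value;

        LeakyText(char[] value) {
            this.value = value; // keeps the caller's array instead of copying it
        }

        @Override
        public String toString() {
            return new String(value);
        }
    }

    // Changes the text of a LeakyText through the array the caller still holds.
    static String mutateThroughAlias() {
        char[] data = {'a', 'b', 'c'};
        LeakyText text = new LeakyText(data);
        data[0] = 'x'; // allowed: final protects the reference, not the elements
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(mutateThroughAlias()); // prints "xbc"
    }
}
```

String avoids this problem by copying incoming arrays and never exposing its own.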

Let's now look at how the String class is implemented.

1. Constructors

1.1 No-argument constructor

    /**
     * Initializes a newly created {@code String} object so that it represents
     * an empty character sequence.  Note that use of this constructor is
     * unnecessary since Strings are immutable.
     */
    public String() {
        this.value = "".value;
    }

This constructor creates an empty String object. Because strings are immutable, however, there is little point in calling it.

1.2

    /**
     * Initializes a newly created {@code String} object so that it represents
     * the same sequence of characters as the argument; in other words, the
     * newly created string is a copy of the argument string. Unless an
     * explicit copy of {@code original} is needed, use of this constructor is
     * unnecessary since Strings are immutable.
     *
     * @param  original
     *         A {@code String}
     */
    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }

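This constructor takes a String, original, and initializes the newly created String so that it represents the same character sequence as the argument; in other words, the new string is a copy of the argument. As the code shows, the copy shares the argument's value array and cached hash instead of duplicating the characters, which is safe because the array is never modified.

Unless an explicit copy of original is needed, do not use this constructor.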
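What "an explicit copy" means in practice can be seen in a small sketch. The class and method names (CopyConstructorDemo, copyOf) are made up for this example; only the String calls come from the JDK.

```java
class CopyConstructorDemo {
    // Wraps the copy constructor so the result can be inspected.
    static String copyOf(String original) {
        return new String(original);
    }

    public static void main(String[] args) {
        String original = "abc";
        String copy = copyOf(original);

        System.out.println(copy.equals(original));     // true: same characters
        System.out.println(copy == original);          // false: a distinct object
        System.out.println(copy.intern() == original); // true: the literal is the pooled instance
    }
}
```

The copy is equal to the original but is a separate object, which only matters to code that relies on object identity.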

1.3

    /**
     * Allocates a new {@code String} so that it represents the sequence of
     * characters currently contained in the character array argument. The
     * contents of the character array are copied; subsequent modification of
     * the character array does not affect the newly created string.
     *
     * @param  value
     *         The initial value of the string
     */
    public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }

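This constructor takes a character array, value, as the initial contents and builds a new string representing the character sequence the array currently holds. The contents of value are copied into the String object with Arrays.copyOf, so later modifications to the array do not affect the newly created string.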
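The effect of the copy can be checked with a short sketch (the class and method names are invented for the example):

```java
class CharArrayCopyDemo {
    // Builds a String from an array, then changes the array afterwards.
    static String buildThenMutate() {
        char[] data = {'a', 'b', 'c'};
        String s = new String(data);
        data[0] = 'x'; // modifies only the caller's array
        return s;
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate()); // prints "abc"
    }
}
```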

1.4

    /**
     * Allocates a new {@code String} that contains characters from a subarray
     * of the character array argument. The {@code offset} argument is the
     * index of the first character of the subarray and the {@code count}
     * argument specifies the length of the subarray. The contents of the
     * subarray are copied; subsequent modification of the character array does
     * not affect the newly created string.
     *
     * @param  value    Array that is the source of characters
     * @param  offset   The initial offset
     * @param  count    The length
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} and {@code count} arguments index
     *          characters outside the bounds of the {@code value} array
     */
    public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= value.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }

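This constructor allocates a new string whose contents are taken from a subarray of the character array value: offset is the index of the first character of the subarray, and count is the length of the subarray. A negative offset or count, or a range that extends past the end of value, causes a StringIndexOutOfBoundsException.

When count is 0 and offset <= value.length, the result is an empty string.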
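The following sketch exercises the normal case, the empty case, and two rejected ranges. The helper names slice and isRejected are hypothetical; the example catches IndexOutOfBoundsException, the superclass of StringIndexOutOfBoundsException.

```java
class SubarrayConstructorDemo {
    static String slice(char[] source, int offset, int count) {
        return new String(source, offset, count);
    }

    // Returns true if the constructor rejects the requested range.
    static boolean isRejected(char[] source, int offset, int count) {
        try {
            slice(source, offset, count);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        char[] data = "hello world".toCharArray(); // 11 chars

        System.out.println(slice(data, 6, 5));                     // world
        System.out.println(slice(data, data.length, 0).isEmpty()); // true
        System.out.println(isRejected(data, 8, 5));                // true: 8 + 5 > 11
        System.out.println(isRejected(data, -1, 2));               // true: negative offset
    }
}
```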

1.5

    /**
     * Allocates a new {@code String} that contains characters from a subarray
     * of the <a href="Character.html#unicode">Unicode code point</a> array
     * argument.  The {@code offset} argument is the index of the first code
     * point of the subarray and the {@code count} argument specifies the
     * length of the subarray.  The contents of the subarray are converted to
     * {@code char}s; subsequent modification of the {@code int} array does not
     * affect the newly created string.
     *
     * @param  codePoints     Array that is the source of Unicode code points
     * @param  offset     The initial offset
     * @param  count     The length
     * @throws  IllegalArgumentException      If any invalid Unicode code point is found in codePoints
     * @throws  IndexOutOfBoundsException    If the offset and count arguments index characters outside the bounds of the codePoints array
     * @since  1.5
     */
    public String(int[] codePoints, int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= codePoints.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > codePoints.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }

        final int end = offset + count;

        // Pass 1: Compute precise size of char[]
        int n = count;
        for (int i = offset; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                continue;
            else if (Character.isValidCodePoint(c))
                n++;
            else throw new IllegalArgumentException(Integer.toString(c));
        }

        // Pass 2: Allocate and fill in char[]
        final char[] v = new char[n];

        for (int i = offset, j = 0; i < end; i++, j++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                v[j] = (char)c;
            else
                Character.toSurrogates(c, v, j++);
        }

        this.value = v;
    }

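This constructor builds a string from an array of Unicode code points:

It first checks offset and count against the bounds of the array, then computes the exact size of the char array (a code point outside the BMP needs two chars), and finally converts the code points into the array v and assigns v to value. (This involves character encoding, which will be covered in detail in the Character source code analysis.)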
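A brief sketch shows the two passes at work from the caller's side: one BMP code point (U+0041) and one supplementary code point (U+1F600) produce three chars. The helper name fromCodePoints is an assumption of this example.

```java
class CodePointConstructorDemo {
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String s = fromCodePoints(0x41, 0x1F600);

        System.out.println(s.length());                       // 3: 1 char + 1 surrogate pair
        System.out.println(s.charAt(0));                      // A
        System.out.println(Integer.toHexString(s.charAt(1))); // d83d (high surrogate)
        System.out.println(Integer.toHexString(s.charAt(2))); // de00 (low surrogate)

        try {
            fromCodePoints(0x110000); // above Character.MAX_CODE_POINT
        } catch (IllegalArgumentException e) {
            System.out.println("invalid code point rejected");
        }
    }
}
```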

1.6

    /* Common private utility method used to bounds check the byte array
     * and requested offset & length values used by the String(byte[],..)
     * constructors.
     */
    private static void checkBounds(byte[] bytes, int offset, int length) {
        if (length < 0)
            throw new StringIndexOutOfBoundsException(length);
        if (offset < 0)
            throw new StringIndexOutOfBoundsException(offset);
        if (offset > bytes.length - length)
            throw new StringIndexOutOfBoundsException(offset + length);
    }

This method only performs a bounds check: length and offset must not be negative, and offset + length must not exceed the length of the byte array.
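One detail worth noticing is that the last check is written as offset > bytes.length - length rather than offset + length > bytes.length. The sketch below, with two hypothetical helper methods, shows why: once both values are known to be non-negative, the subtraction cannot overflow, while the addition can wrap around to a negative number.

```java
class BoundsCheckDemo {
    // Naive form: offset + length can overflow int and wrap to a negative value.
    static boolean naiveOutOfBounds(int arrayLength, int offset, int length) {
        return offset + length > arrayLength;
    }

    // Form used by checkBounds: arrayLength - length cannot overflow
    // when arrayLength and length are both non-negative.
    static boolean safeOutOfBounds(int arrayLength, int offset, int length) {
        return offset > arrayLength - length;
    }

    public static void main(String[] args) {
        int arrayLength = 10;
        int offset = Integer.MAX_VALUE;
        int length = 1;

        System.out.println(naiveOutOfBounds(arrayLength, offset, length)); // false: the sum wrapped
        System.out.println(safeOutOfBounds(arrayLength, offset, length));  // true
    }
}
```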

    /**
     * Constructs a new {@code String} by decoding the specified subarray of
     * bytes using the specified charset.  The length of the new {@code String}
     * is a function of the charset, and hence may not be equal to the length
     * of the subarray.
     *
     * <p> The behavior of this constructor when the given bytes are not valid
     * in the given charset is unspecified.  The {@link
     * java.nio.charset.CharsetDecoder} class should be used when more control
     * over the decoding process is required.
     *
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @param  offset
     *         The index of the first byte to decode
     *
     * @param  length
     *         The number of bytes to decode

     * @param  charsetName
     *         The name of a supported {@linkplain java.nio.charset.Charset
     *         charset}
     *
     * @throws  UnsupportedEncodingException
     *          If the named charset is not supported
     *
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} and {@code length} arguments index
     *          characters outside the bounds of the {@code bytes} array
     *
     * @since  JDK1.1
     */
    public String(byte bytes[], int offset, int length, String charsetName)
            throws UnsupportedEncodingException {
        if (charsetName == null)
            throw new NullPointerException("charsetName");
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(charsetName, bytes, offset, length);
    }

    /**
     * Constructs a new {@code String} by decoding the specified subarray of
     * bytes using the specified {@linkplain java.nio.charset.Charset charset}.
     * The length of the new {@code String} is a function of the charset, and
     * hence may not be equal to the length of the subarray.
     *
     * <p> This method always replaces malformed-input and unmappable-character
     * sequences with this charset's default replacement string.  The {@link
     * java.nio.charset.CharsetDecoder} class should be used when more control
     * over the decoding process is required.
     *
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @param  offset
     *         The index of the first byte to decode
     *
     * @param  length
     *         The number of bytes to decode
     *
     * @param  charset
     *         The {@linkplain java.nio.charset.Charset charset} to be used to
     *         decode the {@code bytes}
     *
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} and {@code length} arguments index
     *          characters outside the bounds of the {@code bytes} array
     *
     * @since  1.6
     */
    public String(byte bytes[], int offset, int length, Charset charset) {
        if (charset == null)
            throw new NullPointerException("charset");
        checkBounds(bytes, offset, length);
        this.value =  StringCoding.decode(charset, bytes, offset, length);
    }

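These two constructors decode a subarray of a byte array with the specified charset and build a new string from the result. The charset can be given either by name or by passing a Charset object directly.

The decode method is explained in the StringCoding source code analysis.

Note that if the given bytes are not valid in the named charset, the behavior of the charsetName constructor is unspecified, whereas the Charset overload always replaces malformed-input and unmappable-character sequences with the charset's default replacement string. An unsupported charset name causes an UnsupportedEncodingException.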
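As a rough illustration of both overloads, the sketch below decodes UTF-8 bytes with a Charset object and probes a charset name. The names decodeUtf8, isCharsetNameAccepted, and "no-such-charset" are assumptions made up for this example.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    static String decodeUtf8(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    // Returns false when the charsetName overload rejects the name.
    static boolean isCharsetNameAccepted(String charsetName) {
        try {
            new String(new byte[] {'a'}, 0, 1, charsetName);
            return true;
        } catch (UnsupportedEncodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length);                                // 6: U+00E9 takes two bytes
        System.out.println(decodeUtf8(bytes, 0, bytes.length).length()); // 5 chars

        // 0xFF never appears in valid UTF-8; the Charset overload substitutes U+FFFD.
        String replaced = decodeUtf8(new byte[] {(byte) 0xFF}, 0, 1);
        System.out.println(replaced.equals("\uFFFD"));                   // true

        System.out.println(isCharsetNameAccepted("UTF-8"));              // true
        System.out.println(isCharsetNameAccepted("no-such-charset"));    // false
    }
}
```

The first two lines of output show why the Javadoc says the length of the new string may differ from the length of the subarray.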

1.7

    /**
     * Constructs a new {@code String} by decoding the specified array of bytes
     * using the specified {@linkplain java.nio.charset.Charset charset}.  The
     * length of the new {@code String} is a function of the charset, and hence
     * may not be equal to the length of the byte array.
     *
     * <p> The behavior of this constructor when the given bytes are not valid
     * in the given charset is unspecified.  The {@link
     * java.nio.charset.CharsetDecoder} class should be used when more control
     * over the decoding process is required.
     *
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @param  charsetName
     *         The name of a supported {@linkplain java.nio.charset.Charset
     *         charset}
     *
     * @throws  UnsupportedEncodingException
     *          If the named charset is not supported
     *
     * @since  JDK1.1
     */
    public String(byte bytes[], String charsetName)
            throws UnsupportedEncodingException {
        this(bytes, 0, bytes.length, charsetName);
    }

    /**
     * Constructs a new {@code String} by decoding the specified array of
     * bytes using the specified {@linkplain java.nio.charset.Charset charset}.
     * The length of the new {@code String} is a function of the charset, and
     * hence may not be equal to the length of the byte array.
     *
     * <p> This method always replaces malformed-input and unmappable-character
     * sequences with this charset's default replacement string.  The {@link
     * java.nio.charset.CharsetDecoder} class should be used when more control
     * over the decoding process is required.
     *
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @param  charset
     *         The {@linkplain java.nio.charset.Charset charset} to be used to
     *         decode the {@code bytes}
     *
     * @since  1.6
     */
    public String(byte bytes[], Charset charset) {
        this(bytes, 0, bytes.length, charset);
    }

    /**
     * Constructs a new {@code String} by decoding the specified subarray of
     * bytes using the platform's default charset.  The length of the new
     * {@code String} is a function of the charset, and hence may not be equal
     * to the length of the subarray.
     *
     * <p> The behavior of this constructor when the given bytes are not valid
     * in the default charset is unspecified.  The {@link
     * java.nio.charset.CharsetDecoder} class should be used when more control
     * over the decoding process is required.
     *
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @param  offset
     *         The index of the first byte to decode
     *
     * @param  length
     *         The number of bytes to decode
     *
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} and the {@code length} arguments index
     *          characters outside the bounds of the {@code bytes} array
     *
     * @since  JDK1.1
     */
    public String(byte bytes[], int offset, int length) {
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(bytes, offset, length);
    }

    /**
     * Constructs a new {@code String} by decoding the specified array of bytes
     * using the platform's default charset.  The length of the new {@code
     * String} is a function of the charset, and hence may not be equal to the
     * length of the byte array.
     *
     * <p> The behavior of this constructor when the given bytes are not valid
     * in the default charset is unspecified.  The {@link
     * java.nio.charset.CharsetDecoder} class should be used when more control
     * over the decoding process is required.
     *
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @since  JDK1.1
     */
    public String(byte bytes[]) {
        this(bytes, 0, bytes.length);
    }

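These constructors are simple and need little explanation: the overloads without offset and length delegate to the corresponding constructor with offset 0 and length bytes.length, and the overloads without a charset decode with the platform's default charset.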

1.8

    /**
     * Allocates a new string that contains the sequence of characters
     * currently contained in the string buffer argument. The contents of the
     * string buffer are copied; subsequent modification of the string buffer
     * does not affect the newly created string.
     *
     * @param  buffer
     *         A {@code StringBuffer}
     */
    public String(StringBuffer buffer) {
        synchronized(buffer) {
            this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
        }
    }

    /**
     * Allocates a new string that contains the sequence of characters
     * currently contained in the string builder argument. The contents of the
     * string builder are copied; subsequent modification of the string builder
     * does not affect the newly created string.
     *
     * <p> This constructor is provided to ease migration to {@code
     * StringBuilder}. Obtaining a string from a string builder via the {@code
     * toString} method is likely to run faster and is generally preferred.
     *
     * @param   builder
     *          A {@code StringBuilder}
     *
     * @since  1.5
     */
    public String(StringBuilder builder) {
        this.value = Arrays.copyOf(builder.getValue(), builder.length());
    }

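Besides strings, character arrays, code point arrays, and byte arrays, as shown above, a string can also be constructed from a StringBuffer or a StringBuilder. The StringBuffer version copies the contents while holding the buffer's lock; for a StringBuilder, the Javadoc notes that calling toString is likely to be faster and is generally preferred.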
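A minimal sketch of the copy semantics follows; the overloaded helper name snapshot is an assumption of this example.

```java
class BuilderConstructorDemo {
    static String snapshot(StringBuilder builder) {
        return new String(builder);
    }

    static String snapshot(StringBuffer buffer) {
        return new String(buffer);
    }

    public static void main(String[] args) {
        StringBuilder builder = new StringBuilder("abc");
        String s = snapshot(builder);
        builder.append("def"); // changes the builder only

        System.out.println(s);                              // abc
        System.out.println(builder);                        // abcdef
        System.out.println(snapshot(new StringBuffer("x"))); // x
    }
}
```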

2. length()

    /**
     * Returns the length of this string.
     * The length is equal to the number of <a href="Character.html#unicode">Unicode
     * code units</a> in the string.
     *
     * @return  the length of the sequence of characters represented by this
     *          object.
     */
    public int length() {
        return value.length;
    }

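The length() method returns the length of the string, that is, the number of Unicode code units (char values) it contains, which can be larger than the number of code points.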
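The distinction between code units and code points is easy to see with a supplementary character. In this sketch the helper names codeUnits and codePoints are made up for illustration:

```java
class LengthDemo {
    static int codeUnits(String s) {
        return s.length();
    }

    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String face = "\uD83D\uDE00"; // U+1F600 as a surrogate pair

        System.out.println(codeUnits("abc")); // 3
        System.out.println(codeUnits(face));  // 2: two chars
        System.out.println(codePoints(face)); // 1: one code point
    }
}
```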

3. isEmpty()

    /**
     * Returns {@code true} if, and only if, {@link #length()} is {@code 0}.
     *
     * @return {@code true} if {@link #length()} is {@code 0}, otherwise
     * {@code false}
     *
     * @since 1.6
     */
    public boolean isEmpty() {
        return value.length == 0;
    }

Checks whether the string is empty, that is, whether its length is 0.

4. charAt(int index)

    /**
     * Returns the {@code char} value at the
     * specified index. An index ranges from {@code 0} to
     * {@code length() - 1}. The first {@code char} value of the sequence
     * is at index {@code 0}, the next at index {@code 1},
     * and so on, as for array indexing.
     *
     * <p>If the {@code char} value specified by the index is a
     * <a href="Character.html#unicode">surrogate</a>, the surrogate
     * value is returned.
     *
     * @param      index   the index of the {@code char} value.
     * @return     the {@code char} value at the specified index of this string.
     *             The first {@code char} value is at index {@code 0}.
     * @exception  IndexOutOfBoundsException  if the {@code index}
     *             argument is negative or not less than the length of this
     *             string.
     */
    public char charAt(int index) {
        if ((index < 0) || (index >= value.length)) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return value[index];
    }

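Returns the char at the specified index; valid indexes range from 0 to length() - 1.

If the char value at the index is a surrogate, the surrogate value itself is returned.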
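A short sketch of the surrogate case, using a string that contains U+1F600 between two ASCII letters (isSurrogateAt is a hypothetical helper):

```java
class CharAtDemo {
    // True if the char at index is one half of a surrogate pair.
    static boolean isSurrogateAt(String s, int index) {
        return Character.isSurrogate(s.charAt(index));
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 4 chars, 3 code points

        System.out.println(s.charAt(0));         // a
        System.out.println(isSurrogateAt(s, 1)); // true: high surrogate
        System.out.println(isSurrogateAt(s, 2)); // true: low surrogate
        System.out.println(s.charAt(3));         // b

        try {
            s.charAt(4); // index == length()
        } catch (IndexOutOfBoundsException e) {
            System.out.println("index out of range");
        }
    }
}
```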

5. codePointAt(int index)

    /**
     * Returns the character (Unicode code point) at the specified
     * index. The index refers to {@code char} values
     * (Unicode code units) and ranges from {@code 0} to
     * {@link #length()}{@code  - 1}.
     *
     * <p> If the {@code char} value specified at the given index
     * is in the high-surrogate range, the following index is less
     * than the length of this {@code String}, and the
     * {@code char} value at the following index is in the
     * low-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned. Otherwise,
     * the {@code char} value at the given index is returned.
     *
     * @param      index the index to the {@code char} values
     * @return     the code point value of the character at the
     *             {@code index}
     * @exception  IndexOutOfBoundsException  if the {@code index}
     *             argument is negative or not less than the length of this
     *             string.
     * @since      1.5
     */
    public int codePointAt(int index) {
        if ((index < 0) || (index >= value.length)) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return Character.codePointAtImpl(value, index, value.length);
    }

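For now, it is enough to remember that this method returns the code point at index: if the char there is a high surrogate followed by a low surrogate, the supplementary code point of the pair is returned; otherwise the char value itself is returned.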
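One common use of codePointAt is walking a string code point by code point. The sketch below is one possible way to do it, advancing by Character.charCount so that a surrogate pair is consumed in a single step; codePointsOf is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointAtDemo {
    // Collects the code points of s from left to right.
    static List<Integer> codePointsOf(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            result.add(cp);
            i += Character.charCount(cp); // 2 for a supplementary code point, else 1
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";

        System.out.println(Integer.toHexString(s.codePointAt(1))); // 1f600: the whole pair
        System.out.println(Integer.toHexString(s.codePointAt(2))); // de00: a lone low surrogate
        System.out.println(codePointsOf(s).size());                // 3
    }
}
```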

6. codePointBefore(int index)

    /**
     * Returns the character (Unicode code point) before the specified
     * index. The index refers to {@code char} values
     * (Unicode code units) and ranges from {@code 1} to {@link
     * CharSequence#length() length}.
     *
     * <p> If the {@code char} value at {@code (index - 1)}
     * is in the low-surrogate range, {@code (index - 2)} is not
     * negative, and the {@code char} value at {@code (index -
     * 2)} is in the high-surrogate range, then the
     * supplementary code point value of the surrogate pair is
     * returned. If the {@code char} value at {@code index -
     * 1} is an unpaired low-surrogate or a high-surrogate, the
     * surrogate value is returned.
     *
     * @param     index the index following the code point that should be returned
     * @return    the Unicode code point value before the given index.
     * @exception IndexOutOfBoundsException if the {@code index}
     *            argument is less than 1 or greater than the length
     *            of this string.
     * @since     1.5
     */
    public int codePointBefore(int index) {
        int i = index - 1;
        if ((i < 0) || (i >= value.length)) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return Character.codePointBeforeImpl(value, index, 0);
    }

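For now, it is enough to remember that this method returns the code point before index; code points are covered in detail in the Character source code analysis.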
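codePointBefore is the natural tool for walking a string backward. The following sketch (reverseCodePoints is a made-up helper) starts at length() and steps back by Character.charCount each time:

```java
import java.util.ArrayList;
import java.util.List;

class CodePointBeforeDemo {
    // Collects the code points of s from right to left.
    static List<Integer> reverseCodePoints(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = s.length(); i > 0; ) {
            int cp = s.codePointBefore(i);
            result.add(cp);
            i -= Character.charCount(cp);
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";

        System.out.println(Integer.toHexString(s.codePointBefore(3))); // 1f600: pair at 1..2
        System.out.println(Integer.toHexString(s.codePointBefore(2))); // d83d: high surrogate only
        System.out.println(reverseCodePoints(s).size());               // 3
    }
}
```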

7. codePointCount(int beginIndex, int endIndex)

    /**
     * Returns the number of Unicode code points in the specified text
     * range of this {@code String}. The text range begins at the
     * specified {@code beginIndex} and extends to the
     * {@code char} at index {@code endIndex - 1}. Thus the
     * length (in {@code char}s) of the text range is
     * {@code endIndex-beginIndex}. Unpaired surrogates within
     * the text range count as one code point each.
     *
     * @param beginIndex the index to the first {@code char} of
     * the text range.
     * @param endIndex the index after the last {@code char} of
     * the text range.
     * @return the number of Unicode code points in the specified text
     * range
     * @exception IndexOutOfBoundsException if the
     * {@code beginIndex} is negative, or {@code endIndex}
     * is larger than the length of this {@code String}, or
     * {@code beginIndex} is larger than {@code endIndex}.
     * @since  1.5
     */
    public int codePointCount(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException();
        }
        return Character.codePointCountImpl(value, beginIndex, endIndex - beginIndex);
    }

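Returns the number of Unicode code points in the specified text range of this string. The range starts at beginIndex and extends to the char at endIndex - 1, so its length in chars is endIndex - beginIndex. Each unpaired surrogate within the range counts as one code point.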
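The rule about unpaired surrogates also applies when a range cuts a pair in half, as this sketch suggests (codePointLength is a hypothetical helper):

```java
class CodePointCountDemo {
    // Number of code points in the whole string.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 4 chars

        System.out.println(codePointLength(s));      // 3
        System.out.println(s.codePointCount(1, 3));  // 1: exactly the surrogate pair
        System.out.println(s.codePointCount(0, 2));  // 2: 'a' plus a high surrogate cut off from its pair
        System.out.println(s.codePointCount(2, 4));  // 2: a lone low surrogate plus 'b'
    }
}
```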

8. offsetByCodePoints(int index, int codePointOffset)

    /**
     * Returns the index within this {@code String} that is
     * offset from the given {@code index} by
     * {@code codePointOffset} code points. Unpaired surrogates
     * within the text range given by {@code index} and
     * {@code codePointOffset} count as one code point each.
     *
     * @param index the index to be offset
     * @param codePointOffset the offset in code points
     * @return the index within this {@code String}
     * @exception IndexOutOfBoundsException if {@code index}
     *   is negative or larger then the length of this
     *   {@code String}, or if {@code codePointOffset} is positive
     *   and the substring starting with {@code index} has fewer
     *   than {@code codePointOffset} code points,
     *   or if {@code codePointOffset} is negative and the substring
     *   before {@code index} has fewer than the absolute value
     *   of {@code codePointOffset} code points.
     * @since 1.5
     */
    public int offsetByCodePoints(int index, int codePointOffset) {
        if (index < 0 || index > value.length) {
            throw new IndexOutOfBoundsException();
        }
        return Character.offsetByCodePointsImpl(value, 0, value.length,
                index, codePointOffset);
    }

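Returns the index within the string that is offset from the given index by codePointOffset code points; a negative codePointOffset moves backward.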
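A typical use is converting a position counted in code points into a char index. The sketch below assumes a hypothetical helper, nthCodePoint, that combines offsetByCodePoints with codePointAt:

```java
class OffsetByCodePointsDemo {
    // Returns the n-th code point of s (0-based), counting code points, not chars.
    static int nthCodePoint(String s, int n) {
        return s.codePointAt(s.offsetByCodePoints(0, n));
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 4 chars, 3 code points

        System.out.println(s.offsetByCodePoints(0, 1));  // 1
        System.out.println(s.offsetByCodePoints(0, 2));  // 3: the pair is skipped as one unit
        System.out.println(s.offsetByCodePoints(4, -1)); // 3: moving backward over 'b'
        System.out.println(s.offsetByCodePoints(3, -1)); // 1: moving backward over the pair
        System.out.println((char) nthCodePoint(s, 2));   // b

        try {
            s.offsetByCodePoints(0, 4); // only 3 code points are available
        } catch (IndexOutOfBoundsException e) {
            System.out.println("not enough code points");
        }
    }
}
```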


Original article: http://www.cnblogs.com/songwenlong/p/6917117.html
