This post looks at how to use the split method of the String class and how it works internally.
The Java docs describe split as follows:
When there is a positive-width match at the beginning of this string then an empty leading substring is included at the beginning of the resulting array. A zero-width match at the beginning however never produces such empty leading substring. The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
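The first two sentences of that description are easy to misread, so here is a small sketch contrasting a positive-width match at the start of the string with a zero-width one. The class name LeadingMatchDemo and the sample strings are made up for illustration, and the zero-width result assumes Java 8 or later.

```java
import java.util.Arrays;

class LeadingMatchDemo {
    // "-" matches one character at index 0 (positive width),
    // so an empty leading substring is kept.
    static String[] positiveWidth() {
        return "-a-b".split("-");
    }

    // The empty regex matches with zero width at index 0,
    // so no empty leading substring is produced (Java 8+).
    static String[] zeroWidth() {
        return "abc".split("");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(positiveWidth())); // [, a, b]
        System.out.println(Arrays.toString(zeroWidth()));     // [a, b, c]
    }
}
```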
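Roughly, split works in the following steps:
1. Scan for each occurrence of regex and add the text between the previous position and the match to a list. This is the core of split.
2. If no match is found, return a one-element array containing the string itself.
3. Decide whether to add the remaining content to the list.
4. Decide whether to drop the trailing empty strings from the list (only when limit is 0).
5. Convert the list to an array and return it.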
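The five steps can be sketched for the simplest case, a single literal delimiter character. This is not the JDK code, only a minimal re-implementation under that assumption; the class SimpleSplit and its split method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SimpleSplit {
    // Splits input on one literal character, following the five steps above.
    static String[] split(String input, char delimiter, int limit) {
        boolean limited = limit > 0;
        List<String> list = new ArrayList<>();
        int off = 0;
        int next;
        // Step 1: add the text between the previous position and each match.
        while ((next = input.indexOf(delimiter, off)) != -1) {
            if (limited && list.size() == limit - 1) {
                break; // everything from off onward becomes the last element
            }
            list.add(input.substring(off, next));
            off = next + 1;
        }
        // Step 2: nothing was consumed, so return the input itself.
        if (off == 0) {
            return new String[] {input};
        }
        // Step 3: add what remains after the last consumed delimiter.
        list.add(input.substring(off));
        // Step 4: with limit == 0, drop trailing empty strings.
        int size = list.size();
        if (limit == 0) {
            while (size > 0 && list.get(size - 1).isEmpty()) {
                size--;
            }
        }
        // Step 5: convert the list to an array.
        return list.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        String s = "linux---abc-linux-";
        for (int limit : new int[] {-1, 0, 3, 20}) {
            boolean same = Arrays.equals(split(s, '-', limit), s.split("-", limit));
            System.out.println("limit=" + limit + " matches String.split: " + same);
        }
    }
}
```

Comparing the result with String.split for a few limit values, as main does, is a quick way to check that the sketch follows the same rules.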
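The limit argument can fall into the following cases:
1. limit < 0, e.g. limit = -1
2. limit = 0, which is the default when no limit is passed
3. limit > 0, e.g. limit = 3
4. limit > size (greater than the number of pieces), e.g. limit = 20
The following example is used to analyze how split behaves in each case.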
    import java.text.MessageFormat;
    import java.util.Arrays;

    public void test() {
        String string = "linux---abc-linux-";
        splitStringWithLimit(string, -1);
        splitStringWithLimit(string, 0);
        splitStringWithLimit(string, 3);
        splitStringWithLimit(string, 20);
    }

    public void splitStringWithLimit(String string, int limit) {
        String[] arrays = string.split("-", limit);
        String result = MessageFormat.format("arrays={0}, length={1}", Arrays.toString(arrays), arrays.length);
        System.out.println(result);
    }

    // Output:
    // arrays=[linux, , , abc, linux, ], length=6
    // arrays=[linux, , , abc, linux], length=5
    // arrays=[linux, , -abc-linux-], length=3
    // arrays=[linux, , , abc, linux, ], length=6
The body of String.split(String regex, int limit) starts with a fast path and falls back to Pattern otherwise:

    char ch = 0;
    // Fast path: regex is a single character that is not a regex metacharacter,
    // or a backslash followed by a character that is not an ASCII digit or letter.
    if (((regex.value.length == 1 &&
         ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
         (regex.length() == 2 &&
          regex.charAt(0) == '\\' &&
          (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
          ((ch-'a')|('z'-ch)) < 0 &&
          ((ch-'A')|('Z'-ch)) < 0)) &&
        (ch < Character.MIN_HIGH_SURROGATE ||
         ch > Character.MAX_LOW_SURROGATE))
    {
        int off = 0;
        int next = 0;
        boolean limited = limit > 0;
        ArrayList<String> list = new ArrayList<>();
        while ((next = indexOf(ch, off)) != -1) {
            if (!limited || list.size() < limit - 1) {
                list.add(substring(off, next));
                off = next + 1;
            } else {    // last one
                //assert (list.size() == limit - 1);
                list.add(substring(off, value.length));
                off = value.length;
                break;
            }
        }
        // If no match was found, return this
        if (off == 0)
            return new String[]{this};

        // Add remaining segment
        if (!limited || list.size() < limit)
            list.add(substring(off, value.length));

        // Construct result
        int resultSize = list.size();
        if (limit == 0) {
            while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                resultSize--;
            }
        }
        String[] result = new String[resultSize];
        return list.subList(0, resultSize).toArray(result);
    }
    // Otherwise fall back to the regex engine
    return Pattern.compile(regex).split(this, limit);
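The long condition that guards the fast path is hard to read in one go. The sketch below restates it with public String methods only, to show which regex arguments would take the fast path. FastPathCheck, outside and isFastPath are hypothetical names, and the logic mirrors the condition as quoted in this post; other JDK versions may differ in detail.

```java
class FastPathCheck {
    // True when ch is NOT in [lo, hi]. Exactly one of the two differences is
    // negative when ch is out of range, and OR-ing them keeps the sign bit.
    static boolean outside(char ch, char lo, char hi) {
        return ((ch - lo) | (hi - ch)) < 0;
    }

    // Hypothetical helper: would this regex satisfy the fast-path condition?
    static boolean isFastPath(String regex) {
        char ch = 0;
        // Case 1: one character that is not a regex metacharacter.
        boolean single = regex.length() == 1
                && ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1;
        // Case 2: backslash followed by a non-alphanumeric ASCII character.
        boolean escaped = !single
                && regex.length() == 2
                && regex.charAt(0) == '\\'
                && outside(ch = regex.charAt(1), '0', '9')
                && outside(ch, 'a', 'z')
                && outside(ch, 'A', 'Z');
        // Surrogate characters are excluded in both cases.
        return (single || escaped)
                && (ch < Character.MIN_HIGH_SURROGATE || ch > Character.MAX_LOW_SURROGATE);
    }

    public static void main(String[] args) {
        for (String regex : new String[] {"-", ".", "\\.", "\\d", "-+"}) {
            System.out.println(regex + " -> " + isFastPath(regex));
        }
    }
}
```

Under this reading, "-" and "\\." qualify, while ".", "\\d" and "-+" go through Pattern.compile.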
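Pattern.split uses the regex Matcher to locate each occurrence of regex. It maintains an index variable, which plays the role of off above, while m.start() of each match plays the role of next. After each match, the text between index and m.start() is added to the list, and index is then updated to m.end() for the next search.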
    public String[] split(CharSequence input, int limit) {
        int index = 0;
        boolean matchLimited = limit > 0;
        ArrayList<String> matchList = new ArrayList<>();
        Matcher m = matcher(input);

        // Add segments before each match found
        while(m.find()) {
            if (!matchLimited || matchList.size() < limit - 1) {
                if (index == 0 && index == m.start() && m.start() == m.end()) {
                    // no empty leading substring included for zero-width match
                    // at the beginning of the input char sequence.
                    continue;
                }
                String match = input.subSequence(index, m.start()).toString();
                matchList.add(match);
                index = m.end();
            } else if (matchList.size() == limit - 1) { // last one
                String match = input.subSequence(index,
                                                 input.length()).toString();
                matchList.add(match);
                index = m.end();
            }
        }

        // If no match was found, return this
        if (index == 0)
            return new String[] {input.toString()};

        // Add remaining segment
        if (!matchLimited || matchList.size() < limit)
            matchList.add(input.subSequence(index, input.length()).toString());

        // Construct result
        int resultSize = matchList.size();
        if (limit == 0)
            while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
                resultSize--;
        String[] result = new String[resultSize];
        return matchList.subList(0, resultSize).toArray(result);
    }
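To see the index / m.start() / m.end() bookkeeping in isolation, here is a condensed sketch of the same loop written against the public Pattern and Matcher API. It is a simplification for illustration, not the JDK implementation; MatcherSplit is a hypothetical name, and it stops at the first match past the limit instead of continuing the loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherSplit {
    static String[] split(String input, String regex, int limit) {
        boolean limited = limit > 0;
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int index = 0; // end of the previous match, like off in the fast path
        while (m.find()) {
            if (limited && parts.size() >= limit - 1) {
                break; // the rest of the input becomes the last element
            }
            if (index == 0 && m.start() == 0 && m.end() == 0) {
                continue; // zero-width match at the start: no empty leading piece
            }
            parts.add(input.substring(index, m.start()));
            index = m.end();
        }
        // No match consumed anything: return the input itself.
        if (index == 0) {
            return new String[] {input};
        }
        // Add the remaining segment after the last consumed match.
        parts.add(input.substring(index));
        // With limit == 0, drop trailing empty strings.
        int size = parts.size();
        if (limit == 0) {
            while (size > 0 && parts.get(size - 1).isEmpty()) {
                size--;
            }
        }
        return parts.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        String s = "linux---abc-linux-";
        for (int limit : new int[] {-1, 0, 2, 20}) {
            boolean same = Arrays.equals(split(s, "-+", limit), s.split("-+", limit));
            System.out.println("limit=" + limit + " matches String.split: " + same);
        }
    }
}
```

The regex "-+" is used in main because it does not meet the fast-path condition, so String.split delegates to Pattern.split for the comparison.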
Original article: https://www.cnblogs.com/huhx/p/baseusejavastringsplit.html