Java String.split() sometimes gives blank strings

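Digging through the source code, I found the exact cause of this behaviour. It applies to the older `Pattern.split()` implementation quoted below (Java 7 and earlier); since Java 8, a zero-width match at the beginning of the input never produces a leading empty string.

The `String.split()` method internally uses `Pattern.split()`. Before returning the result array, the split method checks the index of the last match, which also tells it whether there was a match at all. If that index is `0`, either your pattern matched only an empty string at the beginning of the input or it did not match at all, and in that case the returned array is a single-element array containing the input string unchanged.

Here is the source code: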

    public String[] split(CharSequence input, int limit) {
        int index = 0;
        boolean matchLimited = limit > 0;
        ArrayList<String> matchList = new ArrayList<String>();
        Matcher m = matcher(input);

        // Add segments before each match found
        while (m.find()) {
            if (!matchLimited || matchList.size() < limit - 1) {
                String match = input.subSequence(index, m.start()).toString();
                matchList.add(match);
                // Consider this assignment. For a single empty string match
                // m.end() will be 0, and hence index will also be 0
                index = m.end();
            } else if (matchList.size() == limit - 1) { // last one
                String match = input.subSequence(index, input.length()).toString();
                matchList.add(match);
                index = m.end();
            }
        }

        // If no match was found, return this
        if (index == 0)
            return new String[] {input.toString()};

        // The rest of the method is not relevant here
    }

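If the last condition in the code above, `index == 0`, is true, a single-element array containing the input string is returned.

Now consider the cases in which `index` can be `0`: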

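1. There is no match at all (as the comment above that condition already says).
2. A match is found at the beginning and the matched string has length `0`. Then the assignment `index = m.end();` in the `if` block (inside the `while` loop) sets `index` to `0`. The only string that can match this way is the empty string (length = 0), which is exactly the case here. There must also be no further matches, otherwise `index` would be updated to a different value.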

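So, consider your cases:

- For `d%`, the pattern matches only once, at the empty string before the first `d`. The index value is therefore `0`. Since there are no further matches, the index is never updated, the `if` condition becomes `true`, and a single-element array holding the original string is returned.
- For `d20+2` there are two matches, one before `d` and one before `+`. The index value is therefore updated, so the `ArrayList` built in the code above is returned. It contains an empty string because the split happens on a delimiter that is the first character of the string, as already explained in @Stema's answer.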
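To reproduce both cases on any JDK, the logic quoted above can be sketched as a standalone method. This is a minimal sketch, not the JDK source: `legacySplit` is a hypothetical helper that handles only the `limit == 0` case, the trimming of trailing empty strings follows the documented behaviour of `split`, and `(?=[dk+-])` is assumed to be the pattern from the question. Calling `String.split()` directly on Java 8 or later would not show the leading empty string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LegacySplitDemo {

    // Hypothetical helper mirroring the older split logic for limit == 0.
    static String[] legacySplit(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> parts = new ArrayList<>();
        int index = 0;

        // Add the segment before each match, then move index past the match.
        while (m.find()) {
            parts.add(input.substring(index, m.start()));
            index = m.end();
        }

        // No match, or only a zero-width match at position 0: return the input as is.
        if (index == 0) {
            return new String[] { input };
        }

        // Add the remaining segment, then drop trailing empty strings.
        parts.add(input.substring(index));
        int size = parts.size();
        while (size > 0 && parts.get(size - 1).isEmpty()) {
            size--;
        }
        return parts.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Assumed pattern: split before any of d, k, + or -.
        String regex = "(?=[dk+-])";
        System.out.println(Arrays.toString(legacySplit(regex, "d%")));    // [d%]
        System.out.println(Arrays.toString(legacySplit(regex, "d20+2"))); // [, d20, +2]
    }
}
```

For `d%` the early `index == 0` return hides the empty segment that was already added to the list; for `d20+2` the second match moves `index` to 3, so the list, including its leading empty string, is returned.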

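So, to get the behaviour you want (splitting on the delimiter only when it is not at the beginning of the string), you can add a negative look-behind to your regex pattern:

"(?<!^)(?=[dk+-])"  // You don't need to escape + and hyphen (when at the end)

This splits on an empty string that is followed by a character from your character class but is not preceded by the beginning of the string.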
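A quick way to check the look-behind pattern is to run it against the two inputs discussed earlier. The sketch below uses only `String.split()`; the class name `LookbehindSplitDemo` and the helper `splitTokens` are made up for illustration.

```java
import java.util.Arrays;

public class LookbehindSplitDemo {

    // Splits before d, k, + or -, except at the very start of the string.
    static String[] splitTokens(String input) {
        return input.split("(?<!^)(?=[dk+-])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitTokens("d20+2"))); // [d20, +2]
        System.out.println(Arrays.toString(splitTokens("d%")));    // [d%]
    }
}
```

Because the pattern can no longer match at position 0, neither input produces a leading empty string, whichever `split` implementation is in use.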

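Now consider splitting the string `"ad%"` on the regex pattern `"a(?=[dk+-])"`. This gives you an array whose first element is an empty string. The only change from the earlier case is that the empty-string match is replaced by a match of `a`:

"ad%".split("a(?=[dk+-])");  // Result: `[, d%]`

Why? Because the length of the matched string is now `1`. The index value after the first match, `m.end()`, is therefore `1` rather than `0`, so the single-element array is not returned.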
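The difference between the two patterns can be seen directly by asking a `Matcher` where its first match ends. This is a small sketch with a hypothetical helper `firstMatchEnd`; `(?=[dk+-])` is assumed to be the zero-width pattern from the earlier cases.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchEndDemo {

    // Returns m.end() of the first match, or -1 if the pattern never matches.
    static int firstMatchEnd(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        // Zero-width look-ahead: the match is empty, so it ends where it starts.
        System.out.println(firstMatchEnd("(?=[dk+-])", "d%"));   // 0
        // The match consumes "a", so it ends at index 1.
        System.out.println(firstMatchEnd("a(?=[dk+-])", "ad%")); // 1
    }
}
```

Since the second value is not 0, the `index == 0` check does not trigger and the leading empty string stays in the result.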

Feel free to share; when reposting, please credit the source: 内存溢出

Original address: http://outofmemory.cn/zaji/5478169.html
