(Updated answer)
I can tell you that whatever is slowing your program down, the choice of tokenizer is not it. After an initial run of each method to even out initialization quirks, I can parse 1,000,000 lines of "12 34" in milliseconds. Feel free to switch to indexOf if you like, but I really think you should look at the rest of your code for the bottleneck rather than at this micro-optimization. split was a surprise to me - it is really, really slow compared to the other methods. I've added Guava's Splitter to the tests; it's faster than String.split but slower than StringTokenizer.
- Split: 371ms
- IndexOf: 48ms
- StringTokenizer: 92ms
- Guava Splitter.split(): 108ms
- CsvMapper building a CSV doc and parsing into POJOs: 237ms (175ms if you build the lines into one doc!)
Even over millions of lines, the differences here are negligible.
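Those "initial runs to even out initialization quirks" matter a lot when timing with System.currentTimeMillis(), because the JIT only compiles the hot loop after it has been executed many times. A minimal sketch of that warm-up pattern (the class and method names here are illustrative, not from the original post):

```java
// Hypothetical warm-up harness; names are illustrative, not from the original post.
public class BenchHarness {

	static final String LINE = "12 34";

	// Parse with indexOf; return x + y so the JIT cannot eliminate the work.
	static int parseIndexOf(String line) {
		int index = line.indexOf(' ');
		int x = Integer.parseInt(line.substring(0, index));
		int y = Integer.parseInt(line.substring(index + 1));
		return x + y;
	}

	static long time(int runs) {
		long start = System.nanoTime();
		int sink = 0;
		for (int i = 0; i < runs; i++) {
			sink += parseIndexOf(LINE);
		}
		long elapsedMs = (System.nanoTime() - start) / 1_000_000;
		if (sink == 0) System.out.println("unreachable"); // keep the loop live
		return elapsedMs;
	}

	public static void main(String[] args) {
		time(100_000);             // warm-up pass so the JIT compiles the hot path
		long ms = time(1_000_000); // timed pass
		System.out.println("IndexOf after warm-up: " + ms + "ms");
	}
}
```

Without the warm-up pass, whichever method runs first pays the interpretation and compilation cost and looks artificially slow.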
There's now a post about this on my blog: http://demeranville.com/battle-of-the-tokenizers-delimited-text-parser-performance/
The code I ran was:
import java.util.StringTokenizer;

import org.junit.Test;

public class TestSplitter {

	private static final String line = "12 34";
	private static final int RUNS = 1000000;//000000;

	public final void testSplit() {
		long start = System.currentTimeMillis();
		for (int i=0;i<RUNS;i++){
			String[] st = line.split(" ");
			int x = Integer.parseInt(st[0]);
			int y = Integer.parseInt(st[1]);
		}
		System.out.println("Split: "+(System.currentTimeMillis() - start)+"ms");
	}

	public final void testIndexOf() {
		long start = System.currentTimeMillis();
		for (int i=0;i<RUNS;i++){
			int index = line.indexOf(' ');
			int x = Integer.parseInt(line.substring(0,index));
			int y = Integer.parseInt(line.substring(index+1));
		}
		System.out.println("IndexOf: "+(System.currentTimeMillis() - start)+"ms");
	}

	public final void testTokenizer() {
		long start = System.currentTimeMillis();
		for (int i=0;i<RUNS;i++){
			StringTokenizer st = new StringTokenizer(line, " ");
			int x = Integer.parseInt(st.nextToken());
			int y = Integer.parseInt(st.nextToken());
		}
		System.out.println("StringTokenizer: "+(System.currentTimeMillis() - start)+"ms");
	}

	@Test
	public final void testAll() {
		this.testSplit();
		this.testIndexOf();
		this.testTokenizer();
		this.testSplit();
		this.testIndexOf();
		this.testTokenizer();
	}
}
eta: here's the Guava code:
public final void testGuavaSplit() {
	long start = System.currentTimeMillis();
	Splitter split = Splitter.on(" ");
	for (int i=0;i<RUNS;i++){
		Iterator<String> it = split.split(line).iterator();
		int x = Integer.parseInt(it.next());
		int y = Integer.parseInt(it.next());
	}
	System.out.println("GuavaSplit: "+(System.currentTimeMillis() - start)+"ms");
}
Update
I've added the CsvMapper test too:
public static class CSV{
	public int x;
	public int y;
}

public final void testJacksonSplit() throws JsonProcessingException, IOException {
	CsvMapper mapper = new CsvMapper();
	CsvSchema schema = CsvSchema.builder()
			.addColumn("x", ColumnType.NUMBER)
			.addColumn("y", ColumnType.NUMBER)
			.setColumnSeparator(' ')
			.build();

	long start = System.currentTimeMillis();
	StringBuilder builder = new StringBuilder();
	for (int i = 0; i < RUNS; i++) {
		builder.append(line);
		builder.append('\n');
	}
	String input = builder.toString();
	MappingIterator<CSV> it = mapper.reader(CSV.class).with(schema).readValues(input);
	while (it.hasNext()){
		CSV csv = it.next();
	}
	System.out.println("CsvMapperSplit: " + (System.currentTimeMillis() - start) + "ms");
}