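You can't do this efficiently (at least memory-wise) with a dict comprehension, because you would have to keep track of the current count in another dictionary, which means more memory consumption. Here is how you could do it with a dict comprehension (not recommended at all :-)):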
>>> words = list('asdsadDASDFASCSAASAS')
>>> dct = {}
>>> {w: 1 if w not in dct and not dct.update({w: 1}) else dct[w] + 1 if not dct.update({w: dct[w] + 1}) else 1 for w in words}
>>> dct
{'a': 2, 'A': 5, 's': 2, 'd': 2, 'F': 1, 'C': 1, 'S': 5, 'D': 2}
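Another way is to sort the list of words first, group them with itertools.groupby, and then count the length of each group. The dict comprehension here can be converted to a generator if you want, but yes, this requires reading all the words into memory first: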
from itertools import groupby
words.sort()
dct = {k: sum(1 for _ in g) for k, g in groupby(words)}
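The sort-and-group idea can also be written as a small self-contained function. This is only a sketch: the name `count_with_groupby` is made up for illustration, and it uses `sorted()` instead of `words.sort()` so the caller's list is left untouched (an assumption about what is wanted, not something the approach requires).

```python
from itertools import groupby


def count_with_groupby(words):
    """Count occurrences by sorting, then measuring each run of equal items."""
    # sorted() builds a new list, so the whole input ends up in memory.
    ordered = sorted(words)
    # groupby yields (key, iterator) pairs for runs of consecutive equal items;
    # sum(1 for _ in group) counts a run without building an intermediate list.
    return {key: sum(1 for _ in group) for key, group in groupby(ordered)}


counts = count_with_groupby('asdsadDASDFASCSAASAS')
print(counts['A'], counts['a'], counts['F'])  # 5 2 1
```

Sorting is what makes this work: groupby only merges adjacent equal items, so unsorted input would produce several groups for the same key.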
Note that the fastest of the lot is collections.defaultdict:
from collections import defaultdict
d = defaultdict(int)
for w in words:
    d[w] += 1
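Unlike the sorting approach, the defaultdict loop never needs the whole input at once, so it can count items from a generator. A minimal sketch, where `count_streaming` is a hypothetical helper name:

```python
from collections import defaultdict


def count_streaming(words):
    """Count items from any iterable, one item at a time."""
    counts = defaultdict(int)  # missing keys start at int() == 0
    for w in words:
        counts[w] += 1
    # Return a plain dict so later lookups of absent keys don't insert them.
    return dict(counts)


# The generator is consumed lazily; only the counts are kept in memory.
letters = (ch for ch in 'asdsadDASDFASCSAASAS')
result = count_streaming(letters)
print(result['S'], result['d'])  # 5 2
```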
Timing comparison:
>>> from string import ascii_letters, digits
>>> %timeit words = list(ascii_letters+digits)*10**4; words.sort(); {k: sum(1 for _ in g) for k, g in groupby(words)}
10 loops, best of 3: 131 ms per loop
>>> %timeit words = list(ascii_letters+digits)*10**4; Counter(words)
10 loops, best of 3: 169 ms per loop
>>> %timeit words = list(ascii_letters+digits)*10**4; dct = {}; {w: 1 if w not in dct and not dct.update({w: 1}) else dct[w] + 1 if not dct.update({w: dct[w] + 1}) else 1 for w in words}
1 loops, best of 3: 315 ms per loop
>>> %%timeit
... words = list(ascii_letters+digits)*10**4
... d = defaultdict(int)
... for w in words: d[w] += 1
...
10 loops, best of 3: 57.1 ms per loop
>>> %%timeit
... words = list(ascii_letters+digits)*10**4
... d = {}
... for w in words: d[w] = d.get(w, 0) + 1
...
10 loops, best of 3: 108 ms per loop

# Increase input size
>>> %timeit words = list(ascii_letters+digits)*10**5; words.sort(); {k: sum(1 for _ in g) for k, g in groupby(words)}
1 loops, best of 3: 1.44 s per loop
>>> %timeit words = list(ascii_letters+digits)*10**5; Counter(words)
1 loops, best of 3: 1.7 s per loop
>>> %timeit words = list(ascii_letters+digits)*10**5; dct = {}; {w: 1 if w not in dct and not dct.update({w: 1}) else dct[w] + 1 if not dct.update({w: dct[w] + 1}) else 1 for w in words}
1 loops, best of 3: 3.19 s per loop
>>> %%timeit
... words = list(ascii_letters+digits)*10**5
... d = defaultdict(int)
... for w in words: d[w] += 1
...
1 loops, best of 3: 571 ms per loop
>>> %%timeit
... words = list(ascii_letters+digits)*10**5
... d = {}
... for w in words: d[w] = d.get(w, 0) + 1
...
1 loops, best of 3: 1.1 s per loop
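To rerun the comparison outside IPython, something like the following sketch with the standard timeit module should work. The function names and the `time_all` helper are invented for this example, and the default input size is deliberately small. Unlike the measurements above, the input list is built once outside the timed call, so absolute numbers are not directly comparable, and the ranking may differ across Python versions and machines.

```python
import timeit
from collections import Counter, defaultdict
from itertools import groupby
from string import ascii_letters, digits


def with_groupby(words):
    ordered = sorted(words)  # copy, so the shared input stays unsorted
    return {k: sum(1 for _ in g) for k, g in groupby(ordered)}


def with_counter(words):
    return dict(Counter(words))


def with_defaultdict(words):
    d = defaultdict(int)
    for w in words:
        d[w] += 1
    return dict(d)


def with_dict_get(words):
    d = {}
    for w in words:
        d[w] = d.get(w, 0) + 1
    return d


def time_all(factor=10**3, number=5):
    """Return {function name: best seconds per call} for one input size."""
    words = list(ascii_letters + digits) * factor
    results = {}
    for fn in (with_groupby, with_counter, with_defaultdict, with_dict_get):
        # repeat() gives the total time of `number` calls for each of 3 runs;
        # the minimum is the least noisy estimate.
        best = min(timeit.repeat(lambda: fn(words), number=number, repeat=3))
        results[fn.__name__] = best / number
    return results


if __name__ == "__main__":
    for name, seconds in time_all().items():
        print(f"{name}: {seconds * 1000:.2f} ms per call")
```

Raising `factor` to `10**4` or `10**5` gives the same input sizes as the runs above.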