Python: Count occurrences in a list with a dict comprehension/generator


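You can't do this efficiently (at least memory-wise) with a dict comprehension, because you would have to keep track of the current counts in a second dictionary, i.e. more memory consumption. Here's how you can do it with a dict comprehension anyway (not recommended at all :-)). Note that the counts end up in the side dictionary dct; the comprehension is only used for its side effects: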

>>> words = list('asdsadDASDFASCSAASAS')
>>> dct = {}
>>> {w: 1 if w not in dct and not dct.update({w: 1})
...       else dct[w] + 1
...       if not dct.update({w: dct[w] + 1}) else 1 for w in words}
>>> dct
{'a': 2, 'A': 5, 's': 2, 'd': 2, 'F': 1, 'C': 1, 'S': 5, 'D': 2}
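For contrast, a one-liner that often comes up for this question (it is not part of the answer above) avoids the second dictionary: iterate over the distinct words and call list.count for each one. The cost moves from memory to time, because every list.count call rescans the whole list. A minimal sketch; the function name count_with_list_count is hypothetical:

```python
from typing import Dict, List


def count_with_list_count(words: List[str]) -> Dict[str, int]:
    # set(words) gives each distinct word once; no running-count dict is kept.
    # Each words.count(w) rescans the entire list, so the total cost is
    # O(n * k) for n words and k distinct words.
    return {w: words.count(w) for w in set(words)}


counts = count_with_list_count(list("asdsadDASDFASCSAASAS"))
print(counts["A"], counts["S"], counts["F"])  # 5 5 1
```

This is reasonable for short lists; with many distinct words, a single-pass loop such as the defaultdict one below scales linearly instead.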

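Another option is to sort the list of words first, group it with itertools.groupby, and then count the length of each group. The dict comprehension can be turned into a generator if you want, but yes, this still requires reading all the words into memory first: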

from itertools import groupby

words.sort()  # groupby only merges adjacent equal items, so sort first
dct = {k: sum(1 for _ in g) for k, g in groupby(words)}  # length of each group
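The generator variant mentioned above could look like the sketch below. It assumes you want (word, count) pairs produced lazily rather than a dict; grouped_counts is a hypothetical name, and sorted() is used instead of words.sort() so the caller's list is left untouched, at the cost of a sorted copy in memory:

```python
from itertools import groupby
from typing import Iterable, Iterator, Tuple


def grouped_counts(words: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (word, count) pairs in sorted order.

    sorted() still materializes every word in memory; only the
    counting step is lazy.
    """
    for key, group in groupby(sorted(words)):
        # The input is sorted, so groupby yields exactly one run per distinct key.
        yield key, sum(1 for _ in group)


pairs = list(grouped_counts("abracadabra"))
print(pairs)  # [('a', 5), ('b', 2), ('c', 1), ('d', 1), ('r', 2)]
```

Wrapping the generator in dict(...) gives the same result as the dict comprehension above.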

Note that the fastest of them all, in the timings below, is collections.defaultdict:

from collections import defaultdict

d = defaultdict(int)  # missing keys start at int() == 0
for w in words:
    d[w] += 1
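The defaultdict loop works because looking up a missing key calls the default factory (int() returns 0) and stores that value before += 1 runs. The sketch below puts it next to the dict.get variant used in the timings and checks both against collections.Counter; the function names count_defaultdict and count_get are hypothetical:

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def count_defaultdict(words: Iterable[str]) -> Dict[str, int]:
    d = defaultdict(int)  # a missing key is created with int() == 0
    for w in words:
        d[w] += 1
    return dict(d)  # plain dict, so later lookups of absent keys raise KeyError


def count_get(words: Iterable[str]) -> Dict[str, int]:
    d: Dict[str, int] = {}
    for w in words:
        d[w] = d.get(w, 0) + 1  # explicit default instead of a factory
    return d


words = list("asdsadDASDFASCSAASAS")
assert count_defaultdict(words) == count_get(words) == dict(Counter(words))
print(count_defaultdict(words)["A"])  # 5
```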

Timing comparisons:

>>> from string import ascii_letters, digits
>>> %timeit words = list(ascii_letters+digits)*10**4; words.sort(); {k: sum(1 for _ in g) for k, g in groupby(words)}
10 loops, best of 3: 131 ms per loop
>>> %timeit words = list(ascii_letters+digits)*10**4; Counter(words)
10 loops, best of 3: 169 ms per loop
>>> %timeit words = list(ascii_letters+digits)*10**4; dct = {}; {w: 1 if w not in dct and not dct.update({w: 1}) else dct[w] + 1 if not dct.update({w: dct[w] + 1}) else 1 for w in words}
1 loops, best of 3: 315 ms per loop
>>> %%timeit
... words = list(ascii_letters+digits)*10**4
... d = defaultdict(int)
... for w in words: d[w] += 1
...
10 loops, best of 3: 57.1 ms per loop
>>> %%timeit
... words = list(ascii_letters+digits)*10**4
... d = {}
... for w in words: d[w] = d.get(w, 0) + 1
...
10 loops, best of 3: 108 ms per loop

# Increase the input size
>>> %timeit words = list(ascii_letters+digits)*10**5; words.sort(); {k: sum(1 for _ in g) for k, g in groupby(words)}
1 loops, best of 3: 1.44 s per loop
>>> %timeit words = list(ascii_letters+digits)*10**5; Counter(words)
1 loops, best of 3: 1.7 s per loop
>>> %timeit words = list(ascii_letters+digits)*10**5; dct = {}; {w: 1 if w not in dct and not dct.update({w: 1}) else dct[w] + 1 if not dct.update({w: dct[w] + 1}) else 1 for w in words}
1 loops, best of 3: 3.19 s per loop
>>> %%timeit
... words = list(ascii_letters+digits)*10**5
... d = defaultdict(int)
... for w in words: d[w] += 1
...
1 loops, best of 3: 571 ms per loop
>>> %%timeit
... words = list(ascii_letters+digits)*10**5
... d = {}
... for w in words: d[w] = d.get(w, 0) + 1
...
1 loops, best of 3: 1.1 s per loop
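The timings above were taken with IPython's %timeit magic (the "best of 3" output format comes from older IPython versions). A sketch for rerunning a similar comparison with the standard-library timeit module follows. Unlike the original runs, it builds the word list once outside the timed code, and the groupby variant uses sorted() on a copy so repeated runs never sort an already-sorted list. The function names are hypothetical, and no expected numbers are given because they depend on the machine and Python version:

```python
import timeit
from collections import Counter, defaultdict
from itertools import groupby
from string import ascii_letters, digits
from typing import Callable, Dict, List

# Same input shape as the original runs: 62 distinct characters, repeated 10**4 times.
WORDS: List[str] = list(ascii_letters + digits) * 10**4


def with_groupby(words: List[str]) -> Dict[str, int]:
    return {k: sum(1 for _ in g) for k, g in groupby(sorted(words))}


def with_counter(words: List[str]) -> Dict[str, int]:
    return Counter(words)


def with_defaultdict(words: List[str]) -> Dict[str, int]:
    d = defaultdict(int)
    for w in words:
        d[w] += 1
    return d


def with_get(words: List[str]) -> Dict[str, int]:
    d: Dict[str, int] = {}
    for w in words:
        d[w] = d.get(w, 0) + 1
    return d


def best_time(func: Callable[[List[str]], Dict[str, int]],
              repeat: int = 3, number: int = 1) -> float:
    """Best-of-`repeat` time in seconds for one call of func(WORDS)."""
    runs = timeit.repeat(lambda: func(WORDS), repeat=repeat, number=number)
    return min(runs) / number


if __name__ == "__main__":
    for f in (with_groupby, with_counter, with_defaultdict, with_get):
        print(f"{f.__name__:<18}{best_time(f) * 1000:8.1f} ms per call")
```

Raise `number` for steadier results, in the same spirit as the "10 loops, best of 3" runs above.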


Source: 内存溢出 (outofmemory.cn)

Original URL: https://outofmemory.cn/zaji/5645328.html
