Python - Find duplicates in a list of dictionaries and group them


I'm not a programmer and I'm new to Python. I have a list of dicts that comes from a JSON file:

# JSON file (film.json)
[
{"year": ["1999"], "director": ["Wachowski"], "film": ["The Matrix"], "price": ["19,00"]},
{"year": ["1994"], "director": ["Tarantino"], "film": ["Pulp Fiction"], "price": ["20,…"]},
{"year": ["2003"], "director": ["Tarantino"], "film": ["Kill Bill vol.1"], "price": ["10,…"]},
{"year": ["2003"], "director": ["Wachowski"], "film": ["The Matrix Reloaded"], "price": ["9,99"]},
{"year": ["1994"], "director": ["Tarantino"], "film": ["Pulp Fyction"], "price": ["15,…"]},
{"year": ["1994"], "director": ["E. de Souza"], "film": ["Street fighter"], "price": ["2,00"]},
{"year": ["1999"], "director": ["Wachowski"], "film": ["The Matrix"], "price": ["…"]},
{"year": ["1982"], "director": ["Ridley Scott"], "film": ["Blade Runner"], "price": ["…,99"]}
]

I can import the JSON file:

import json
json_file = open('film.json')
f = json.load(json_file)

But after that, I can't manage to find the repeated entries in f and group them by film title.
This is what I'm trying to achieve:

## result grouped by 'film'
#group 1
{"year": ["1999"], …}
{"year": ["1999"], …}
#group 2
{"year": ["1994"], …}
{"year": ["1994"], …}
#group X ...

Or better:

new_dict = { 'group1': [[], [], ...], 'group2': [[], ...], 'groupX': [...] }

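At the moment I'm experimenting with nested loops, but with no luck.

Thanks.

Note: "Pulp Fyction" is a deliberate misspelling, there for fuzzy string matching that I plan to implement later. For now I only need a "duplicate grouper".

Note 2: I'm using Python 2.x.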

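Solution

Since your data is not sorted, use a collections.defaultdict() object to create a list for each new key, then key on the film title:

from collections import defaultdict

grouped = defaultdict(list)
for film in f:
    # each 'film' value is a one-element list, so take its first item as the key
    grouped[film['film'][0]].append(film)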

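The film['film'][0] value is used to group the films. If you want more sophisticated title grouping, you'll have to create a canonical version of that key.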
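As an illustration of what a "canonical version" of the key might look like, here is a minimal sketch. The helper name canonical_title and the sample records are made up for this example and are not part of the original answer. It assumes that ignoring case, punctuation and extra whitespace is good enough, and it only keeps ASCII letters and digits, so non-ASCII titles would need a different rule.

```python
from collections import defaultdict
import re


def canonical_title(title):
    """Hypothetical helper: lowercase, replace punctuation with spaces, trim."""
    cleaned = re.sub(r'[^a-z0-9]+', ' ', title.lower())
    return cleaned.strip()


# Made-up sample records in the same shape as the entries in film.json.
films = [
    {"film": ["The Matrix"], "year": ["1999"]},
    {"film": ["the  matrix "], "year": ["1999"]},
    {"film": ["Blade Runner"], "year": ["1982"]},
]

grouped = defaultdict(list)
for film in films:
    # Group on the normalized title instead of the raw string.
    grouped[canonical_title(film['film'][0])].append(film)

for title in sorted(grouped):
    print(title, len(grouped[title]))
```

Both spellings of "The Matrix" end up under the same key because the key is built from the normalized title rather than the raw one.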

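Demo:

>>> from collections import defaultdict
>>> import json
>>> with open('film.json') as film_file:
...     f = json.load(film_file)
...
>>> grouped = defaultdict(list)
>>> for film in f:
...     grouped[film['film'][0]].append(film)
...
>>> grouped
defaultdict(<type 'list'>, {
 u'Street fighter': [{u'director': [u'E. de Souza'], u'price': [u'2,00'], u'film': [u'Street fighter'], u'year': [u'1994']}],
 u'Pulp Fiction': [{u'director': [u'Tarantino'], u'price': [u'20,…'], u'film': [u'Pulp Fiction'], u'year': [u'1994']}],
 u'Pulp Fyction': [{u'director': [u'Tarantino'], u'price': [u'15,…'], u'film': [u'Pulp Fyction'], u'year': [u'1994']}],
 u'The Matrix': [{u'director': [u'Wachowski'], u'price': [u'19,00'], u'film': [u'The Matrix'], u'year': [u'1999']},
                 {u'director': [u'Wachowski'], u'price': [u'…'], u'film': [u'The Matrix'], u'year': [u'1999']}],
 u'Blade Runner': [{u'director': [u'Ridley Scott'], u'price': [u'…,99'], u'film': [u'Blade Runner'], u'year': [u'1982']}],
 u'Kill Bill vol.1': [{u'director': [u'Tarantino'], u'price': [u'10,…'], u'film': [u'Kill Bill vol.1'], u'year': [u'2003']}],
 u'The Matrix Reloaded': [{u'director': [u'Wachowski'], u'price': [u'9,99'], u'film': [u'The Matrix Reloaded'], u'year': [u'2003']}]})
>>> from pprint import pprint
>>> pprint(dict(grouped))
{u'Blade Runner': [{u'director': [u'Ridley Scott'],
                    u'film': [u'Blade Runner'],
                    u'price': [u'…,99'],
                    u'year': [u'1982']}],
 u'Kill Bill vol.1': [{u'director': [u'Tarantino'],
                       u'film': [u'Kill Bill vol.1'],
                       u'price': [u'10,…'],
                       u'year': [u'2003']}],
 u'Pulp Fiction': [{u'director': [u'Tarantino'],
                    u'film': [u'Pulp Fiction'],
                    u'price': [u'20,…'],
                    u'year': [u'1994']}],
 u'Pulp Fyction': [{u'director': [u'Tarantino'],
                    u'film': [u'Pulp Fyction'],
                    u'price': [u'15,…'],
                    u'year': [u'1994']}],
 u'Street fighter': [{u'director': [u'E. de Souza'],
                      u'film': [u'Street fighter'],
                      u'price': [u'2,00'],
                      u'year': [u'1994']}],
 u'The Matrix': [{u'director': [u'Wachowski'],
                  u'film': [u'The Matrix'],
                  u'price': [u'19,00'],
                  u'year': [u'1999']},
                 {u'director': [u'Wachowski'],
                  u'film': [u'The Matrix'],
                  u'price': [u'…'],
                  u'year': [u'1999']}],
 u'The Matrix Reloaded': [{u'director': [u'Wachowski'],
                           u'film': [u'The Matrix Reloaded'],
                           u'price': [u'9,99'],
                           u'year': [u'2003']}]}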
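The grouping above keeps every title, including those that occur only once. If only the actual duplicates are wanted, in the 'group1', 'group2' shape sketched in the question, one possible follow-up step is shown below. This is an assumption about what the asker wants, not part of the original answer; the records are abbreviated to the 'film' and 'year' fields, and the numbering by sorted title is an arbitrary choice.

```python
from collections import defaultdict

# Abbreviated sample records; the real entries also carry 'director' and 'price'.
films = [
    {"film": ["The Matrix"], "year": ["1999"]},
    {"film": ["Pulp Fiction"], "year": ["1994"]},
    {"film": ["Kill Bill vol.1"], "year": ["2003"]},
    {"film": ["The Matrix"], "year": ["1999"]},
]

grouped = defaultdict(list)
for film in films:
    grouped[film['film'][0]].append(film)

# Keep only the titles that occur more than once.
duplicates = {title: entries for title, entries in grouped.items()
              if len(entries) > 1}

# Number the remaining groups in sorted-title order (an arbitrary choice).
new_dict = {}
for index, title in enumerate(sorted(duplicates), 1):
    new_dict['group%d' % index] = duplicates[title]

print(new_dict)
```

Here new_dict holds one numbered group per repeated title, and titles that appear only once are dropped.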

Grouping the films using Soundex would look like this:

from collections import defaultdict
from itertools import groupby, islice

# consonant classes; the position in the tuple (starting at 1) is the Soundex digit
_codes = ('bfpv', 'cgjkqsxz', 'dt', 'l', 'mn', 'r')
_sounds = {c: str(i) for i, code in enumerate(_codes, 1) for c in code}
# vowels map to None: they separate runs of equal digits but are not encoded
_sounds.update(dict.fromkeys('aeiouy'))

def soundex(word, _sounds=_sounds):
    grouped = groupby(_sounds[c] for c in word.lower() if c in _sounds)
    if _sounds.get(word[0].lower()):
        next(grouped)  # remove first group.
    sdx = ''.join([k for k, g in islice((g for g in grouped if g[0]), 3)])
    return word[0].upper() + format(sdx, '<03')

grouped_by_soundex = defaultdict(list)
for film in f:
    grouped_by_soundex[soundex(film['film'][0])].append(film)

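This results in:

>>> pprint(dict(grouped_by_soundex))
{u'B436': [{u'director': [u'Ridley Scott'],
            u'film': [u'Blade Runner'],
            u'price': [u'…,99'],
            u'year': [u'1982']}],
 u'K414': [{u'director': [u'Tarantino'],
            u'film': [u'Kill Bill vol.1'],
            u'price': [u'10,…'],
            u'year': [u'2003']}],
 u'P412': [{u'director': [u'Tarantino'],
            u'film': [u'Pulp Fiction'],
            u'price': [u'20,…'],
            u'year': [u'1994']},
           {u'director': [u'Tarantino'],
            u'film': [u'Pulp Fyction'],
            u'price': [u'15,…'],
            u'year': [u'1994']}],
 u'S363': [{u'director': [u'E. de Souza'],
            u'film': [u'Street fighter'],
            u'price': [u'2,00'],
            u'year': [u'1994']}],
 u'T536': [{u'director': [u'Wachowski'],
            u'film': [u'The Matrix'],
            u'price': [u'19,00'],
            u'year': [u'1999']},
           {u'director': [u'Wachowski'],
            u'film': [u'The Matrix Reloaded'],
            u'price': [u'9,99'],
            u'year': [u'2003']},
           {u'director': [u'Wachowski'],
            u'film': [u'The Matrix'],
            u'price': [u'…'],
            u'year': [u'1999']}]}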
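Because the Soundex code only looks at the first few consonant sounds, "The Matrix" and "The Matrix Reloaded" land in the same group. A different technique, not used in the original answer, is to compare whole titles with the standard library's difflib.SequenceMatcher. The sketch below is only a rough illustration: the function name group_by_similarity, the greedy "first similar group wins" strategy and the 0.85 threshold are all assumptions chosen for this example, and the records are reduced to the 'film' field.

```python
from difflib import SequenceMatcher


def group_by_similarity(films, threshold=0.85):
    """Greedy grouping: a film joins the first group whose representative
    title is at least `threshold` similar, otherwise it starts a new group."""
    groups = []  # list of (representative_title, members) pairs
    for film in films:
        title = film['film'][0].lower()
        for representative, members in groups:
            if SequenceMatcher(None, representative, title).ratio() >= threshold:
                members.append(film)
                break
        else:
            # No existing group was similar enough.
            groups.append((title, [film]))
    return [members for _, members in groups]


# Sample records reduced to the title only.
films = [
    {"film": ["The Matrix"]},
    {"film": ["Pulp Fiction"]},
    {"film": ["The Matrix Reloaded"]},
    {"film": ["Pulp Fyction"]},
    {"film": ["The Matrix"]},
]

for members in group_by_similarity(films):
    print([member['film'][0] for member in members])
```

With this threshold the misspelled "Pulp Fyction" joins "Pulp Fiction", while "The Matrix Reloaded" stays separate from "The Matrix". The result depends on the threshold and on input order, so treat it as a starting point rather than a finished matcher.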

Source: http://outofmemory.cn/langs/1207049.html
