Three methods for counting characters, words, and Chinese characters in Python (using dictionaries)

1. Counting English words with a dictionary and a dict comprehension

sentence="""Travel policies for this year's Spring Festival, which will fall in early February, should be devised based on COVID-19 risk appraisals of different regions and groups, health officials and experts said on Saturday."Whether or not it is necessary to stay put for the Spring Festival should be based on risk evaluations and should not be uniform across the country," said Liang Wannian, a national-level disease control expert, during a news briefing.He said increased movement during the Spring Festival travel rush, coupled with diminished immunity against respiratory diseases during winter, will indeed lead to heightened risk of the spread of COVID-19 and other infectious illnesses.However, China has also gained several advantages, such as its high COVID-19 vaccination coverage, and its prompt control of local outbreaks."Most recent outbreaks are linked to imported cases," he said. "As long as we can strictly implement policies to control imported cases, and the public can practice personal protective measures, we are capable of stemming the virus' spread."He said key regions and populations should abide by strict virus control measures."High risk groups, such as the elderly, people with chronic diseases and pregnant women should cut unnecessary trips and avoid gatherings, while the remaining population should adhere to protection measures," he said."""

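dic={char:sentence.count(char) for char in set(sentence.split()) if char.isalpha()} # Build the dictionary; isalpha() drops every token that contains punctuation. To count letters instead of words, change sentence.split() to sentence. Note that sentence.count(char) counts substring occurrences, so a short word such as 'a' is also counted inside longer words.
print(sorted(dic.items(),key=lambda x:x[1],reverse=True)) # Sort by value (the count), in descending order.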

Output:
[('a', 94), ('in', 22), ('on', 18), ('he', 18), ('as', 15), ('is', 12), ('the', 11), ('and', 9), ('or', 7), ('of', 6), ('should', 6), ('said', 6), ('to', 6), ('it', 6), ('control', 4), ('risk', 4), ('Festival', 3), ('be', 3), ('disease', 3), ('during', 3), ('we', 3), ('for', 3), ('Spring', 3), ('strict', 2), ('He', 2), ('diseases', 2), ('imported', 2), ('based', 2), ('spread', 2), ('regions', 2), ('policies', 2), ('its', 2), ('outbreaks', 2), ('necessary', 2), ('can', 2), ('such', 2), ('virus', 2), ('with', 2), ('not', 2), ('will', 2), ('population', 2), ('are', 2), ('practice', 1), ('strictly', 1), ('travel', 1), ('movement', 1), ('uniform', 1), ('protection', 1), ('public', 1), ('chronic', 1), ('immunity', 1), ('against', 1), ('trips', 1), ('cut', 1), ('Travel', 1), ('adhere', 1), ('health', 1), ('recent', 1), ('respiratory', 1), ('also', 1), ('while', 1), ('this', 1), ('linked', 1), ('devised', 1), ('personal', 1), ('other', 1), ('populations', 1), ('capable', 1), ('diminished', 1), ('increased', 1), ('stay', 1), ('long', 1), ('different', 1), ('Liang', 1), ('China', 1), ('several', 1), ('heightened', 1), ('put', 1), ('news', 1), ('indeed', 1), ('vaccination', 1), ('unnecessary', 1), ('appraisals', 1), ('gained', 1), ('prompt', 1), ('people', 1), ('protective', 1), ('pregnant', 1), ('by', 1), ('officials', 1), ('stemming', 1), ('women', 1), ('evaluations', 1), ('infectious', 1), ('avoid', 1), ('early', 1), ('key', 1), ('experts', 1), ('local', 1), ('implement', 1), ('across', 1), ('coupled', 1), ('lead', 1), ('fall', 1), ('which', 1), ('remaining', 1), ('high', 1), ('has', 1), ('abide', 1)]
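
The counts in the listing above are substring counts, which is why 'a' reaches 94 and 'in' reaches 22. If whole-word counts are wanted, one possible approach is to tokenize first and count the tokens. The following is a minimal sketch, not the original author's method; the helper name `count_words`, the lowercasing, and the rule that a word is a run of ASCII letters (optionally with inner apostrophes) are all assumptions made for illustration.

```python
import re
from collections import Counter

def count_words(text):
    """Count whole words, ignoring case (hypothetical helper)."""
    # Assumption: a word is a run of ASCII letters, optionally joined by apostrophes.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text.lower())
    return Counter(words)

sample = "He said the risk is high. The risk, he said, is real."
counts = count_words(sample)

# Substring count versus whole-word count for the letter "a".
print(sample.count("a"), counts["a"])
print(counts.most_common(3))
```

Because `Counter` returns 0 for a missing key, `counts["a"]` is 0 here even though the letter occurs inside "said" and "real".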

2. Counting Chinese characters with a dictionary and a dict comprehension

sentence="""3.算法的基本要素：一是对数据对象的运算和操作；二是算法的控制结构。
4.指令系统：一个计算机系统能执行的所有指令的集合。
5.基本运算包括：算术运算、逻辑运算、关系运算、数据传输。
6.算法的控制结构：顺序结构、选择结构、循环结构。
7.算法基本设计方法：列举法、归纳法、递推、递归、减斗递推技术、回溯法。
8.算法复杂度：算法时间复杂度和算法空间复杂度。两个之间没有联系的。
9.算法时间复杂度是指执行算法所需要的计算工作量。
10.算法空间复杂度是指执行这个算法所需要的内存空间。"""
d={char:sentence.count(char) for char in set(sentence) if "\u4e00"<=char<="\u9fff"} # Keep only characters in the CJK Unified Ideographs range U+4E00 to U+9FFF.
print(sorted(d.items(),key=lambda x:x[1],reverse=True))

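Output (note that this listing also contains the punctuation marks 、 and 。, which lie outside the CJK Unified Ideographs range U+4E00 to U+9FFF that the filter is meant to select):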

[('算', 19), ('法', 15), ('、', 10), ('的', 9), ('。', 9), ('间', 6), ('运', 5), ('构', 5), ('度', 5), ('结', 5), ('杂', 5), ('复', 5), ('系', 4), ('是', 4), ('指', 4), ('要', 3), ('执', 3), ('计', 3), ('本', 3), ('基', 3), ('所', 3), ('个', 3), ('空', 3), ('行', 3), ('递', 3), ('和', 2), ('归', 2), ('统', 2), ('制', 2), ('控', 2), ('作', 2), ('推', 2), ('需', 2), ('有', 2), ('令', 2), ('术', 2), ('对', 2), ('一', 2), ('据', 2), ('数', 2), ('时', 2), ('回', 1), ('存', 1), ('传', 1), ('环', 1), ('斗', 1), ('方', 1), ('素', 1), ('量', 1), ('集', 1), ('内', 1), ('之', 1), ('这', 1), ('辑', 1), ('没', 1), ('象', 1), ('操', 1), ('技', 1), ('减', 1), ('二', 1), ('举', 1), ('设', 1), ('包', 1), ('两', 1), ('输', 1), ('顺', 1), ('联', 1), ('序', 1), ('工', 1), ('择', 1), ('列', 1), ('括', 1), ('选', 1), ('循', 1), ('合', 1), ('逻', 1), ('能', 1), ('机', 1), ('关', 1), ('纳', 1), ('溯', 1)]
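
Calling sentence.count once per distinct character rescans the whole text each time. As an alternative, the text can be walked once while tallying only the characters that pass the range test. This is a minimal sketch under the assumption that "Chinese character" means the range U+4E00 to U+9FFF only; the helper name `count_hanzi` is hypothetical.

```python
from collections import Counter

def count_hanzi(text):
    """Count characters in the CJK Unified Ideographs block (hypothetical helper)."""
    # Assumption: only U+4E00..U+9FFF counts as a Chinese character, so
    # punctuation such as '。' (U+3002) and the extension blocks are excluded.
    return Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")

sample = "算法的控制结构：顺序结构、选择结构、循环结构。"
counts = count_hanzi(sample)
print(counts.most_common(2))
```

With this filter the punctuation marks 、, 。 and ： never enter the tally, so the result contains ideographs only.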

3. Counting Chinese words with the dictionary get method and a loop

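sentence="""人民 社会 国家 科学 人民 社会 好 好 好 好"""
lst=sentence.split() # split on whitespace, so the words must be separated by spaces
dic={}
for i in lst:
    dic[i]=dic.get(i,0)+1 # dict.get returns the value stored under key i if it exists, otherwise the default 0
print(sorted(dic.items(),key=lambda x:x[1],reverse=True))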

Output:

[('好', 4), ('人民', 2), ('社会', 2), ('国家', 1), ('科学', 1)]
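
The get-with-default pattern can also be expressed with collections.defaultdict, which supplies the starting value for a missing key automatically. The following is a small sketch of that variant, not part of the original post; the space-separated sample string is an assumption reconstructed from the output shown above.

```python
from collections import defaultdict

# Assumption: the words are separated by spaces so that split() can find them.
words = "人民 社会 国家 科学 人民 社会 好 好 好 好".split()

freq = defaultdict(int)  # a missing key starts at int(), which is 0
for word in words:
    freq[word] += 1

# sorted() is stable, so words with equal counts keep their first-seen order.
ranked = sorted(freq.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

The trade-off is that merely reading a missing key from a defaultdict inserts it, whereas dict.get leaves the dictionary unchanged.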

4. Using FreqDist from NLTK

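from nltk import FreqDist
s="life is short and I like python."
freq = FreqDist(list(s)) # frequency distribution over the individual characters
for key, count in freq.most_common(): # (item, count) pairs, highest count first
    if key.isalpha(): # skip spaces and punctuation
        print(key, count)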

Output:

i 3
l 2
e 2
s 2
h 2
o 2
t 2
n 2
f 1
r 1
a 1
d 1
I 1
k 1
p 1
y 1

5. Sorting by swapping the key and value in each pair

Convert the dictionary into a list of tuples, then sort:

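lst=sentence.split()
dic={}
for i in lst:
    dic[i]=dic.get(i,0)+1 # dict.get returns the value stored under key i if it exists, otherwise the default 0

ls=sorted([(p[1],p[0]) for p in dic.items()],reverse=True) # swap to (count, word) so the sort compares counts first

print([(p[1],p[0]) for p in ls]) # swap back to (word, count)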

Output (words with equal counts are ordered by the word itself, in descending code point order):

[('好', 4), ('社会', 2), ('人民', 2), ('科学', 1), ('国家', 1)]
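
Swapping the pairs is not the only way to sort by count. A key function that returns a tuple achieves the same result and also makes the tie-breaking rule explicit. This is a minimal sketch using a hand-written dictionary with the counts from section 3; the choice to break ties in ascending word order is an assumption for illustration.

```python
freq = {"人民": 2, "社会": 2, "国家": 1, "科学": 1, "好": 4}

# Negating the count gives descending order by count;
# the word itself then breaks ties in ascending code point order.
ranked = sorted(freq.items(), key=lambda item: (-item[1], item[0]))
print(ranked)
```

This avoids building two intermediate lists and keeps each pair in (word, count) form throughout.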

Sharing is welcome; when reposting, please credit the source: 内存溢出

Original article: http://outofmemory.cn/zaji/5680513.html
