Python Word Segmentation and Part-of-Speech Tagging


Installing Python

Python official site: python.org/download/
During installation, tick the "Add Python to PATH" checkbox so PATH is configured automatically.

Verify the install:
After installation, open a Command Prompt (Win+R, type cmd, press Enter) and run python.

If that fails, the environment variable was not set correctly; configure PATH manually.

Installing a Development Tool

Install PyCharm
PyCharm official site: jetbrains.com/pycharm/download/
After installing, you can optionally add the Chinese language pack plugin.

Create a new project, then create a new Python file inside it.

English Tokenization

Install nltk

In the terminal, run:
pip install nltk

After installing the library, you still need to download NLTK's data packages.

If you can reach the download servers directly, simply run:
import nltk
nltk.download('punkt')

If you cannot, download the data manually:
NLTK data download: github.com/nltk/nltk_data
Downloading the packages directory is enough.

Once downloaded, place packages in any one of the locations listed in the error message,
and rename packages to nltk_data.

Searched in:
    - 'C:\Users\86187/nltk_data'
    - 'D:\ProgramData\Anaconda3\envs\emotional_analysis\nltk_data'
    - 'D:\ProgramData\Anaconda3\envs\emotional_analysis\share\nltk_data'
    - 'D:\ProgramData\Anaconda3\envs\emotional_analysis\lib\nltk_data'
    - 'C:\Users\86187\AppData\Roaming\nltk_data'
    - 'C:\nltk_data'
    - 'D:\nltk_data'
    - 'E:\nltk_data'
    - ''

Remember to unzip the packages you are going to use.

from nltk import word_tokenize, pos_tag

# English tokenization
english = "When someone asked me about my favorite season, my answer certainly is spring. Because all the plants turn " \
          "green and come into leaf in spring. And some kinds of flowers also become in bloom. The spring makes the " \
          "world colorful. For too many people, spring means the beginning of a new year, and the green color of " \
          "spring represents hope. As far as I'm concerned, spring has the meaning of fresh and newly born. The " \
          "newborn seems to bring me energy and enthusiasm all the time. "
english_list = word_tokenize(english)
print("English tokens: " + ",".join(english_list))

# Extract nouns/verbs
words = pos_tag(english_list)
noun = "Nouns: "
verb = "Verbs: "
for word in words:
    if word[1] in {"NN", "NNP", "NNS"}:
        noun = noun + word[0] + ","
    if word[1] in {"VBD", "VBN", "VB"}:
        verb = verb + word[0] + ","
print(noun)
print(verb)
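The filtering loop can be written more compactly with list comprehensions and a single join, which also avoids the trailing comma. A sketch using a hardcoded tagged list in place of pos_tag output, so it runs without the NLTK data (the sample tuples below are made up):

```python
# (word, tag) pairs shaped like pos_tag output; this sample list is made up.
tagged = [("spring", "NN"), ("makes", "VBZ"), ("world", "NN"),
          ("asked", "VBD"), ("bring", "VB"), ("flowers", "NNS")]

nouns = [w for w, t in tagged if t in {"NN", "NNP", "NNS"}]
verbs = [w for w, t in tagged if t in {"VBD", "VBN", "VB"}]
print("Nouns: " + ", ".join(nouns))  # spring, world, flowers
print("Verbs: " + ", ".join(verbs))  # asked, bring
```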

Tokenizing Text Read from an External File

# Read an external file
f = open("English.txt", "r")  # create a file object
str_e = f.read()  # read the whole file into the string str_e
f.close()
str_e_list = word_tokenize(str_e)
print("English tokens: " + ",".join(str_e_list))

# Extract nouns/verbs
words_str = pos_tag(str_e_list)
noun_str = "Nouns: "
verb_str = "Verbs: "
for word in words_str:
    if word[1] in {"NN", "NNP", "NNS"}:
        noun_str = noun_str + word[0] + ","
    if word[1] in {"VBD", "VBN", "VB"}:
        verb_str = verb_str + word[0] + ","
print(noun_str)
print(verb_str)

Chinese Segmentation with jieba

Install jieba

In the terminal, run:
pip install jieba
import jieba
import jieba.posseg as pseg

# Chinese segmentation
s = "如何才能更早发现疫情?在当前条件下应该采取什么样的管理措施,才能够快速找到密切接触者,让他们配合进行医学观察?会上,针对记者提问,中国疾控中心流行病学首席专家吴尊友予以回应。"
s_list = jieba.cut(s)
print("Default Mode:", ",".join(s_list))

# Extract nouns/verbs
noun = ""
verb = ""
words = pseg.cut(s)
for w in words:
    if w.flag == "n":
        noun = noun + w.word + ","
    if w.flag == "v":
        verb = verb + w.word + ","
print("Nouns: " + noun)
print("Verbs: " + verb)
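jieba.posseg yields pair objects exposing word and flag attributes, which is what the flag-based filtering relies on. The logic can be checked on a mocked pair list without loading jieba's dictionary (the Pair namedtuple below is a stand-in for illustration, not jieba's real class, and the sample pairs are made up):

```python
from collections import namedtuple

# Stand-in for jieba.posseg pairs: .word is the token, .flag the POS tag.
Pair = namedtuple("Pair", ["word", "flag"])
mock_words = [Pair("疫情", "n"), Pair("发现", "v"),
              Pair("专家", "n"), Pair("回应", "v"), Pair("的", "uj")]

nouns = [w.word for w in mock_words if w.flag == "n"]
verbs = [w.word for w in mock_words if w.flag == "v"]
print("Nouns:", ",".join(nouns))  # 疫情,专家
print("Verbs:", ",".join(verbs))  # 发现,回应
```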

Feel free to share; when reposting, please credit the source: 内存溢出 (outofmemory.cn).

Original article: http://outofmemory.cn/langs/570604.html
