Personal Notes: Getting Started with NLTK (Python)

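The existing paid APIs pass their parameters as HTTP messages, which I don't find pleasant to work with, so I'm starting this series of notes for myself. If it ends up helping anyone else, I'll be happy for you too.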

Setting up the NLTK (Natural Language Toolkit) environment with Python

# First, install nltk
pip install nltk
After installing, if you have access to the internet outside China, enter the following at the Python prompt:
import nltk
nltk.download()
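Calling nltk.download() with no arguments opens the interactive downloader for everything. If you only need the tokenizers covered below, a smaller download may be enough. Here is a minimal sketch, assuming the Punkt resources are published under the ids "punkt" (older NLTK releases) and "punkt_tab" (newer ones). The helper name fetch_tokenizer_data and its downloader parameter are my own inventions for illustration, not part of NLTK.

```python
def fetch_tokenizer_data(resources=("punkt", "punkt_tab"), download_dir=None, downloader=None):
    """Download only the named NLTK resources and report which ones succeeded.

    `downloader` is a hypothetical hook (not an NLTK feature) so that the
    function can be tried without network access; by default it is nltk.download.
    """
    if downloader is None:
        import nltk  # imported lazily; requires `pip install nltk` first
        downloader = nltk.download
    results = {}
    for name in resources:
        # nltk.download returns True on success and False on failure
        results[name] = bool(downloader(name, download_dir=download_dir, quiet=True))
    return results
```

Calling fetch_tokenizer_data() with no arguments uses NLTK's default download location; any entry that comes back False points at a network problem or a resource id the server does not know.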

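The online download tends to be slow or to fail without a reliable connection, so installing the data locally is less hassle.
"nltk_data-gh-pages_2.exe" https://www.aliyundrive.com/s/DTwAxaEv9RK extraction code: td54
I'm lazy these days and use the Anaconda3 distribution: extract the packages folder from the archive into the folder of the corresponding environment under envs, then rename it to nltk_data.
Run nltk.download() once more; if every entry shows up green as "installed", you're all set.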
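The rename matters because, as far as I know, NLTK searches a list of directories (nltk.data.path) for a folder called nltk_data, and one of the default entries sits directly under the active environment's root (sys.prefix). The sketch below checks a folder without opening the downloader GUI. It assumes the Punkt models live under tokenizers/ as either an extracted folder or a still-zipped file; the helper names has_punkt, default_env_data_dir and register_data_dir are hypothetical.

```python
import sys
from pathlib import Path


def has_punkt(data_dir):
    """Return True if data_dir looks like an nltk_data folder with Punkt models."""
    tokenizers = Path(data_dir) / "tokenizers"
    # Assumption: older releases use "punkt", newer ones "punkt_tab";
    # accept the extracted folder or the zipped package.
    names = ("punkt", "punkt.zip", "punkt_tab", "punkt_tab.zip")
    return any((tokenizers / name).exists() for name in names)


def default_env_data_dir():
    """<environment root>/nltk_data, e.g. .../anaconda3/envs/<name>/nltk_data."""
    return Path(sys.prefix) / "nltk_data"


def register_data_dir(data_dir):
    """Add a non-default data folder to NLTK's search path."""
    import nltk  # imported lazily so the checks above work without NLTK

    data_dir = str(data_dir)
    if data_dir not in nltk.data.path:
        nltk.data.path.append(data_dir)
    return nltk.data.path
```

If has_punkt(default_env_data_dir()) returns False, the folder was probably extracted one level too deep or not renamed. register_data_dir is only needed when the data lives somewhere NLTK does not search by default.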

nltk.word_tokenize(txt): the word tokenizer

def word_tokenize(text, language="english", preserve_line=False):
:param text: text to split into words
:type text: str
:param language: the model name in the Punkt corpus
:type language: str
:param preserve_line: A flag to decide whether to sentence tokenize the text or not.
:type preserve_line: bool
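To see those parameters in use, here is a small sketch. tokenize_words is a thin wrapper of my own around word_tokenize, and drop_punctuation is a hypothetical helper for discarding the punctuation tokens that the tokenizer emits as separate items. As far as I can tell, preserve_line=True skips the sentence-splitting step, so that call should not need the Punkt models at all.

```python
import string


def drop_punctuation(tokens):
    """Keep only tokens that contain at least one non-punctuation character."""
    return [t for t in tokens if not all(ch in string.punctuation for ch in t)]


def tokenize_words(text, preserve_line=False):
    """Split text into word and punctuation tokens with NLTK."""
    # Imported here so drop_punctuation stays usable without NLTK installed.
    from nltk.tokenize import word_tokenize

    return word_tokenize(text, language="english", preserve_line=preserve_line)
```

For example, tokenize_words("Hello, world!", preserve_line=True) should give ['Hello', ',', 'world', '!'], and passing that list through drop_punctuation leaves ['Hello', 'world'].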

nltk.sent_tokenize(txt): the sentence tokenizer

def sent_tokenize(text, language="english"):
:param text: text to split into sentences
:param language: the model name in the Punkt corpus
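The two tokenizers are often chained: split the text into sentences first, then split each sentence into words. A minimal sketch of that pipeline follows, assuming the Punkt data is already installed. The function tokenize_by_sentence and its two optional parameters are hypothetical; they only exist so other splitters can be swapped in.

```python
def tokenize_by_sentence(text, split_sentences=None, split_words=None):
    """Return one list of word tokens per sentence in text."""
    if split_sentences is None or split_words is None:
        # Imported lazily; the defaults need NLTK plus its Punkt data.
        from nltk.tokenize import sent_tokenize, word_tokenize

        if split_sentences is None:
            split_sentences = sent_tokenize
        if split_words is None:
            # Each piece is already a single sentence, so skip re-splitting it.
            def split_words(sentence):
                return word_tokenize(sentence, preserve_line=True)

    return [split_words(sentence) for sentence in split_sentences(text)]
```

With the defaults, tokenize_by_sentence("I like NLTK. It is handy.") should return two token lists, one per sentence.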

I'm not in the mood to write more right now; I'll fill in the rest later...

Original article: http://outofmemory.cn/langs/725132.html