A Faster Way to Remove Stop Words in Python

矛盾的作品 • 2022-12-15 • Essays • 22 views

Try caching the stop words object, as shown below. Constructing it on every function call appears to be the bottleneck.

    from nltk.corpus import stopwords

    # Build the stop word list once, when the module is loaded.
    cachedStopWords = stopwords.words("english")

    def testFuncOld():
        # Calls stopwords.words("english") again for every word in the text.
        text = 'hello bye the the hi'
        text = ' '.join([word for word in text.split() if word not in stopwords.words("english")])

    def testFuncNew():
        # Reuses the cached list instead of rebuilding it.
        text = 'hello bye the the hi'
        text = ' '.join([word for word in text.split() if word not in cachedStopWords])

    if __name__ == "__main__":
        for i in range(10000):
            testFuncOld()
            testFuncNew()

I profiled this with the command python -m cProfile -s cumulative test.py; the relevant lines are below.

    nCalls  Cumulative time  filename:lineno(function)
    10000   7.723            words.py:7(testFuncOld)
    10000   0.140            words.py:11(testFuncNew)
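The same kind of measurement can also be taken from inside a script rather than from the command line. The following is a minimal sketch using the standard library's cProfile and pstats modules; slow_lookup and profile_calls are hypothetical names chosen for illustration, and slow_lookup only imitates the uncached pattern by rebuilding a list on every call.

```python
import cProfile
import io
import pstats


def slow_lookup():
    # Hypothetical stand-in for the uncached version: the list is
    # rebuilt on every call before the membership test runs.
    words = [str(n) for n in range(1000)]
    return "500" in words


def profile_calls(func, repeat=1000):
    """Run func `repeat` times under cProfile and return the report text."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(repeat):
        func()
    profiler.disable()

    # Sort by cumulative time, as "-s cumulative" does on the command line,
    # and keep only the first few entries.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return stream.getvalue()


report = profile_calls(slow_lookup)
print(report)
```

The report lists one line per function with its call count and cumulative time, which is where figures like the ones above come from.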

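Caching the stop words instance therefore makes the function about 55 times faster (7.723 s versus 0.140 s cumulative time over 10,000 calls).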
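One way to package the caching idea is to hide it behind a small helper, so callers cannot rebuild the collection by accident. The sketch below is only an illustration under stated assumptions: get_stop_words and remove_stop_words are hypothetical names, and a tiny hard-coded sample list stands in for stopwords.words("english") so the example runs without the NLTK corpus. It also stores the words in a frozenset, which is a change from the list used above; set membership tests do not scan the whole collection.

```python
from functools import lru_cache

# Assumption: a tiny sample list standing in for stopwords.words("english"),
# so the sketch runs without downloading the NLTK corpus.
_SAMPLE_STOP_WORDS = ["the", "a", "an", "is", "of", "and"]


@lru_cache(maxsize=None)
def get_stop_words(language="english"):
    """Build the stop word set once per language, then reuse it."""
    # With NLTK available, this could instead return
    # frozenset(stopwords.words(language)).
    return frozenset(_SAMPLE_STOP_WORDS)


def remove_stop_words(text):
    """Return text with every stop word dropped."""
    stop_words = get_stop_words()  # served from the cache after the first call
    return " ".join(word for word in text.split() if word not in stop_words)


print(remove_stop_words("hello bye the the hi"))  # hello bye hi
```

Because lru_cache returns the same object on repeated calls with the same argument, the set is built only once however often remove_stop_words runs.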

Sharing is welcome; when reposting, please credit the source: 内存溢出

Original article: http://outofmemory.cn/zaji/5617006.html
