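First, installation. According to the various tutorials online, numpy, scipy, and smart_open have to be installed in that order before gensim itself, and one blogger adds that numpy needs to be the MKL build.
My own machine already had all the required libraries, though, so a plain pip install gensim was enough.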
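I ran into two problems along the way:
① The model training call no longer accepts a "size" parameter. For now I have simply removed it. (In gensim 4.x the parameter was renamed to vector_size; leaving it out falls back to the default of 100 dimensions.)
② Importing gensim raised a scipy error: cannot import name '_ccallback_c' from 'scipy._lib'. Uninstalling and reinstalling over and over did not help. What finally worked was uninstalling all three of the libraries above from the E: drive (the drive where Python is installed) while keeping the copies inside the virtual environment of the Python project. After that everything ran fine.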
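For problem ②, it can help to check which copy of each library Python would actually import before uninstalling anything. The following is a minimal sketch using only the standard library; the helper names locate and in_virtualenv are made up for illustration, and the check assumes the conflict comes from two installations (global and virtual environment) shadowing each other.

```python
import importlib.util
import sys


def locate(packages):
    """Map each package name to the file it would be imported from.

    The package itself is not imported; a missing package maps to None.
    """
    found = {}
    for name in packages:
        spec = importlib.util.find_spec(name)
        found[name] = spec.origin if spec is not None else None
    return found


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


print("virtual environment:", in_virtualenv())
for name, path in locate(["numpy", "scipy", "smart_open", "gensim"]).items():
    print(name, "->", path)
```

If a path points outside the project's virtual environment while the environment is active, that copy is the one being picked up.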
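Basic usage:
from gensim.models import word2vec

# LineSentence reads the corpus one sentence per line, with tokens separated by whitespace
sentences = word2vec.LineSentence("E:文档研二小论文医药制造业医药制造业年报gensim整合.txt")
# hs=1: hierarchical softmax; min_count=1: keep every word; window=3: context window size
model = word2vec.Word2Vec(sentences, hs=1, min_count=1, window=3)
model.save('model')  # save the model
model = word2vec.Word2Vec.load('model')  # load the model
for val in model.wv.similar_by_word("化工企业", topn=100):
    print([val])  # each result is a (word, similarity) tuple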
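Regarding problem ①: instead of dropping the dimension parameter altogether, the keyword can be chosen according to the installed gensim version. This is only a sketch. The helper names dim_kwarg and train are hypothetical, and it relies on the rename from size to vector_size that came with gensim 4.0.

```python
def dim_kwarg(gensim_version, dim):
    """Return the keyword argument that sets the embedding dimension.

    gensim 4.0 renamed `size` to `vector_size`, so the keyword depends on
    the major version number.
    """
    major = int(gensim_version.split(".")[0])
    return {"vector_size": dim} if major >= 4 else {"size": dim}


def train(sentences, dim=100):
    """Train a Word2Vec model with the same settings as the post, plus a dimension."""
    # Imported lazily so that dim_kwarg can be used without gensim installed.
    import gensim
    from gensim.models import Word2Vec

    return Word2Vec(sentences, hs=1, min_count=1, window=3,
                    **dim_kwarg(gensim.__version__, dim))


print(dim_kwarg("4.3.2", 200))
print(dim_kwarg("3.8.3", 200))
```

With the sentences object from the code above, train(sentences, dim=200) would train a 200-dimensional model on either major version.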
This finally gave me the result I wanted: the 100 words most similar to "化工企业" (chemical enterprise).
[('具体意见', 0.6623771786689758)]
[('钙胺', 0.6078303456306458)]
[('环丙沙星', 0.5947628021240234)]
[('齐飞', 0.5804009437561035)]
[('或甲氧苄', 0.5790262222290039)]
[('安乃近', 0.5770139098167419)]
[('母仔', 0.5748052597045898)]
[('恒大', 0.5634693503379822)]
[('肌松药', 0.5609520077705383)]
[('地瑞', 0.5579310059547424)]
[('有赖于', 0.5578207969665527)]
[('.%.%.%.%.%', 0.546883761882782)]
[('相互合作', 0.5424500107765198)]
[('新颖', 0.5307881832122803)]
[('西莱美片', 0.530393660068512)]
[('内多', 0.5262453556060791)]
[('工作思路', 0.5248793959617615)]
[('宝贵财富', 0.5232816338539124)]
[('芙朴', 0.5214694738388062)]
[('吴以', 0.5199228525161743)]
[('右佐匹', 0.5179560780525208)]
[('样板工程', 0.5154135227203369)]
[('内外科', 0.5133787393569946)]
[('铬', 0.5131563544273376)]
[('矢志不移', 0.5130568146705627)]
[('明白', 0.5120072364807129)]
[('活酶', 0.5114578008651733)]
[('转折点', 0.5108917355537415)]
[('创收', 0.5102124810218811)]
[('推力', 0.5097854137420654)]
[('以商', 0.5085864067077637)]
[('重报', 0.5077459812164307)]
[('引进技术', 0.5050455927848816)]
[('车间主任', 0.5039330720901489)]
[('百余年', 0.5008108019828796)]
[('肌松', 0.5000325441360474)]
[('立足点', 0.49821069836616516)]
[('装车', 0.4976477026939392)]
[('吡嗪', 0.49611279368400574)]
[('天济嘉鑫', 0.4932640492916107)]
[('证明文件', 0.4918726086616516)]
[('重要文件', 0.4908128082752228)]
[('卡马西平', 0.49078837037086487)]
[('片未', 0.4895298480987549)]
[('发粒', 0.48796600103378296)]
[('肝贝科能', 0.4866732954978943)]
[('进他', 0.4864782691001892)]
[('前三大', 0.4860941469669342)]
[('孕中', 0.48556211590766907)]
[('响水', 0.4852599501609802)]
[('胃肠炎', 0.4847312867641449)]
[('韦仑', 0.48326951265335083)]
[('长期性', 0.48318877816200256)]
[('原名', 0.4831697642803192)]
[('糖衣', 0.48189377784729004)]
[('救人', 0.4817085266113281)]
[('不以', 0.48155736923217773)]
[('招股', 0.48098814487457275)]
[('大禹', 0.48022300004959106)]
[('公楼', 0.4799407422542572)]
[('皮肤科', 0.47966402769088745)]
[('AG', 0.4795415997505188)]
[('脉冲', 0.4790913164615631)]
[('文飞', 0.4769313633441925)]
[('五官科', 0.47676241397857666)]
[('抗艾', 0.47615835070610046)]
[('奥通', 0.4759964942932129)]
[('OneStepOvulationUrineTest', 0.47592708468437195)]
[('妇', 0.474235475063324)]
[('代言人', 0.4742341935634613)]
[('止损', 0.47192856669425964)]
[('硫唑嘌呤', 0.4717518985271454)]
[('交房', 0.47154587507247925)]
[('围着', 0.4698885977268219)]
[('东指', 0.46862363815307617)]
[('版起', 0.46847641468048096)]
[('战略意义', 0.4684240520000458)]
[('壅', 0.46834033727645874)]
[('天伟', 0.4683018922805786)]
[('推介会', 0.4680977463722229)]
[('苯丙氨酸', 0.4680188000202179)]
[('比中', 0.4679011404514313)]
[('天利应', 0.4678548574447632)]
[('司太立', 0.4677242338657379)]
[('附加税', 0.4671434164047241)]
[('天舒片', 0.46707549691200256)]
[('紧紧抓住', 0.46669015288352966)]
[('或服', 0.4663790464401245)]
[('同防', 0.4661034047603607)]
[('比较突出', 0.46559906005859375)]
[('兴医', 0.4654051959514618)]
[('WondfoCocaineUrine', 0.4652007222175598)]
[('出口商', 0.46511998772621155)]
[('昔洛', 0.4649103581905365)]
[('阵列', 0.46476802229881287)]
[('恋康', 0.4641551077365875)]
[('优良传统', 0.46265724301338196)]
[('兴钱', 0.4626224637031555)]
[('尼群地平', 0.4624794125556946)]
[('已登记', 0.4621630012989044)]
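The number in each pair is the cosine similarity between the vector of the query word and the vector of the listed word. As a rough illustration of how such a score is computed, here is a small sketch with made-up three-dimensional vectors. The vectors and the function name are assumptions for the example, not values taken from the trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


query = [1.0, 2.0, 0.0]       # hypothetical vector for the query word
same_dir = [2.0, 4.0, 0.0]    # points in the same direction as the query
orthogonal = [0.0, 0.0, 3.0]  # shares no direction with the query

print(cosine_similarity(query, same_dir))    # close to 1
print(cosine_similarity(query, orthogonal))  # exactly 0
```

A score near 1 means the two words appear in very similar contexts in the corpus; scores around 0.5, as in most of the list above, indicate a much weaker relation.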