Your edit 3 is close, but it is missing the step that adds the entities to the document. This should work:
import spacy
import srsly
from spacy.gold import docs_to_json, biluo_tags_from_offsets, spans_from_biluo_tags

TRAIN_DATA = [
    ("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}),
    ("I like London and Berlin.", {"entities": [(7, 13, "LOC"), (18, 24, "LOC")]}),
]

nlp = spacy.load('en_core_web_sm')
docs = []
for text, annot in TRAIN_DATA:
    doc = nlp(text)
    # Convert the character offsets to BILUO tags, then to Span objects
    tags = biluo_tags_from_offsets(doc, annot['entities'])
    entities = spans_from_biluo_tags(doc, tags)
    # The missing step: attach the entities to the doc
    doc.ents = entities
    docs.append(doc)

# Write all docs in the JSON format expected by the train CLI
srsly.write_json("spacy_format.json", [docs_to_json(docs)])
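It would be good to add a built-in function for this conversion, since it's common to want to move from the example scripts (which are only meant as simple demos) to the train CLI.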
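If the intermediate BILUO tags are unfamiliar, the following rough sketch may help show what the offsets-to-tags step produces. It is a simplified pure-Python illustration, not spaCy's implementation: the `simple_biluo_tags` helper is hypothetical, and it uses a naive regex tokenizer instead of the tokenizer in `nlp`, so the tokens only happen to line up for these two sentences.

```python
import re

def simple_biluo_tags(text, entities):
    """Toy offsets-to-BILUO conversion (hypothetical helper, not part of spaCy)."""
    # Naive tokens: runs of word characters or single punctuation marks,
    # stored as (start, end) character offsets.
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]
    tags = ["O"] * len(tokens)  # O = outside any entity
    for ent_start, ent_end, label in entities:
        # Indices of the tokens that lie entirely inside the entity span.
        inside = [i for i, (s, e) in enumerate(tokens)
                  if s >= ent_start and e <= ent_end]
        if not inside:
            continue
        if len(inside) == 1:
            tags[inside[0]] = "U-" + label  # U = single-token (unit) entity
        else:
            tags[inside[0]] = "B-" + label  # B = first token of the entity
            for i in inside[1:-1]:
                tags[i] = "I-" + label      # I = token inside the entity
            tags[inside[-1]] = "L-" + label  # L = last token of the entity
    return tags

tags_1 = simple_biluo_tags("Who is Shaka Khan?", [(7, 17, "PERSON")])
tags_2 = simple_biluo_tags("I like London and Berlin.",
                           [(7, 13, "LOC"), (18, 24, "LOC")])
print(tags_1)  # ['O', 'O', 'B-PERSON', 'L-PERSON', 'O']
print(tags_2)  # ['O', 'O', 'U-LOC', 'O', 'U-LOC', 'O']
```

The real converter works on the tokens of the `doc` it is given, so its output depends on the loaded model's tokenization rather than on a regex like the one assumed here.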
Edit:
You can also skip the somewhat indirect route through the built-in BILUO converters and use what you had above:
doc.ents = [doc.char_span(start_idx, end_idx, label=label) for start_idx, end_idx, label in annot["entities"]]