My CSV snippet looks like this:
"geonameID","name","asciiname","alternatenames","loc","feature_class","feature_code","country_code","cc2","admin1_code","admin2_code","admin3_code","admin4_code"
3,"Zamīn Sūkhteh","Zamin Sukhteh","Zamin Sukhteh,Zamīn Sūkhteh","[48.91667,32.48333]","P","PPL","IR",,"15",,,
5,"Yekāhī","Yekahi","Yekahi,Yekāhī","[48.9,32.5]",
7,"Tarvīḩ ‘Adāī","Tarvih `Adai","Tarvih `Adai,Tarvīḩ ‘Adāī","[48.2,32.1]",
The JSON output I want, for use with mongoimport, is the following (character set problems aside):
{"geonameID":3,"name":"Zamin Sukhteh","asciiname":"Zamin Sukhteh","alternatenames":"Zamin Sukhteh,Zamin Sukhteh","loc":[48.91667,32.48333],"feature_class":"P","feature_code":"PPL","country_code":"IR","cc2":null,"admin1_code":15,"admin2_code":null,"admin3_code":null,"admin4_code":null}
{"geonameID":5,"name":"Yekahi","asciiname":"Yekahi","alternatenames":"Yekahi,Yekahi","loc":[48.9,32.5],"admin4_code":null}
{"geonameID":7,"name":"Tarvi? ‘Adai","asciiname":"Tarvih `Adai","alternatenames":"Tarvih `Adai,Tarvi? ‘Adai","loc":[48.2,32.1],"admin4_code":null}
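I have tried every online CSV-to-JSON converter I could find, but none of them can cope with the file size. The closest I got was Mr Data Converter (its output is shown above), whose result imports into MongoDB once the opening and closing brackets and the commas between documents are removed. Unfortunately, the tool does not work on a 300 MB file.
The JSON above is set to be encoded as UTF-8 but still has character set problems, most likely because of a conversion error?
I have spent the last three days learning Python, trying Python csvkit, trying all the CSV-to-JSON scripts on Stack Overflow, importing the CSV into MongoDB and changing the "loc" string to an array (which unfortunately keeps the quotes), and even copying and pasting 30,000 records at a time by hand. Lots of reverse engineering, trial and error, and so on.
Does anyone know how to produce the JSON above while keeping the encoding correct, as in the CSV above? I am completely stuck.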
Solution

The Python standard library (plus simplejson for decimal encoding support) has everything you need:

    import csv, simplejson, decimal, codecs

    data = open("in.csv")
    reader = csv.DictReader(data, delimiter=",", quotechar='"')

    with codecs.open("out.json", "w", encoding="utf-8") as out:
        for r in reader:
            for k, v in r.items():
                # make sure nulls are generated
                if not v:
                    r[k] = None
                # parse and generate decimal arrays
                elif k == "loc":
                    r[k] = [decimal.Decimal(n) for n in v.strip("[]").split(",")]
                # generate a number
                elif k == "geonameID":
                    r[k] = int(v)
            # write one JSON document per line, as mongoimport expects
            out.write(simplejson.dumps(r, ensure_ascii=False, use_decimal=True) + "\n")

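Here "in.csv" is your large CSV file. The code above was tested on Python 2.6 and 2.7 with a CSV file of about 100 MB and produces a correctly encoded UTF-8 file, with no enclosing brackets, no quoted arrays, and no comma separators between documents, as requested.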
It is also worth noting that passing both the ensure_ascii and use_decimal arguments is required for the encoding to work properly (in this case).
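To see what these two arguments are for, here is a small sketch. It is an illustration, not part of the original answer, and it assumes Python 3 and uses the stdlib json module instead of simplejson so that it runs without extra packages. The sample record and the helper name decimal_rejected are made up for the demonstration.

```python
import json
from decimal import Decimal

# Hypothetical sample record, modelled on the first CSV row.
record = {"name": "Zamīn Sūkhteh", "loc": [48.91667, 32.48333]}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(record)

# ensure_ascii=False keeps the characters as they are, so the output
# file can carry them as real UTF-8 text.
readable = json.dumps(record, ensure_ascii=False)


def decimal_rejected():
    """Return True if the stdlib encoder refuses a Decimal value."""
    try:
        json.dumps({"loc": [Decimal("48.9")]})
    except TypeError:
        return True
    return False
```

Both strings decode to the same data; only the on-disk representation differs. The stdlib encoder raising TypeError on a Decimal is the gap that simplejson's use_decimal=True fills in the answer's script.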
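Finally, since the Python stdlib json package is based on simplejson, it will sooner or later gain decimal encoding support as well, so eventually only the stdlib will be needed.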
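For readers on Python 3, a stdlib-only variant might look like the sketch below. This is not the answer's tested script: it assumes the input file is UTF-8, and it swaps decimal.Decimal for plain floats by parsing the "loc" string with json.loads, which is only acceptable if float precision is enough for your coordinates. The function names convert_row and convert are hypothetical.

```python
import csv
import json


def convert_row(row):
    """Turn one csv.DictReader row into a dict ready for json.dumps."""
    out = {}
    for key, value in row.items():
        if not value:
            out[key] = None                # empty or missing field -> JSON null
        elif key == "loc":
            out[key] = json.loads(value)   # "[48.9,32.5]" -> [48.9, 32.5] (floats)
        elif key == "geonameID":
            out[key] = int(value)          # numeric id instead of a string
        else:
            out[key] = value               # everything else stays a string
    return out


def convert(in_path, out_path):
    """Stream the CSV row by row and write one JSON document per line."""
    with open(in_path, encoding="utf-8", newline="") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            dst.write(json.dumps(convert_row(row), ensure_ascii=False) + "\n")
```

Calling convert("in.csv", "out.json") processes the file one row at a time, so memory use should not grow with the file size. As in the original script, fields such as admin1_code are left as strings.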