So all we need to do is read the text out of the docx file and save it in txt format.
The Python module required is python-docx (https://python-docx.readthedocs.io/en/latest/index.html). Note that the name to import is docx.
(It can only read .docx files, not .doc files.)
Note: PyPI also hosts a separate library named docx. It is no longer updated and is not recommended.
http://www.cnblogs.com/geek-arking/p/9300617.html
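The read-then-save step described above can be sketched roughly as follows, assuming python-docx is installed. The function names `paragraphs_to_text` and `docx_to_txt` are made up for this illustration; only `Document` and its `paragraphs` collection come from python-docx.

```python
from pathlib import Path


def paragraphs_to_text(paragraphs):
    """Join the .text of each paragraph-like object, one paragraph per line."""
    return "\n".join(p.text for p in paragraphs)


def docx_to_txt(docx_path, txt_path):
    """Read a .docx file with python-docx and save its paragraph text as UTF-8 txt."""
    # Imported inside the function so the helper above works without python-docx.
    from docx import Document  # installed as "python-docx", imported as "docx"

    document = Document(str(docx_path))
    text = paragraphs_to_text(document.paragraphs)
    Path(txt_path).write_text(text, encoding="utf-8")
```

Text that sits inside tables is not part of `document.paragraphs`, so this sketch only captures body paragraphs.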
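The method above can only read docx files; trying to read a doc file raises an error.
Renaming the file does not help either. The result is the error docx.opc.exceptions.PackageNotFoundError: Package not found, so the doc file is still not recognized.
"Changing the extension does not change how the file is encoded, so the text cannot be read. The doc file has to be saved as docx from Word, and then its content can be read with python-docx."
For doc files that need converting, the material online all uses win32, which requires installing pypiwin32.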
https://www.cnblogs.com/AlgorithmDot/p/3386918.html
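With the method above, a doc file can sometimes be converted directly to txt, but sometimes an error is raised.
Instead, we can convert the doc file to docx first and then read it into txt with the earlier method. Manually renaming a .doc file to .txt or .docx only produces garbled text when the file is opened. The SaveAs method that Word exposes, however, does the same thing as choosing "Save As" by hand, and the resulting .docx file opens correctly: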
doc.SaveAs(tmp +'.docx', 16)
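The meaning of 16 is as follows (it is the value of wdFormatDocumentDefault in the table further down):
Calling the Office API directly through the win32com interface is simple and highly compatible. Anything Office can handle, Python can handle too, and the result matches what "Save As" in Office Word produces.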
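As a rough sketch of the conversion just described, the snippet below wraps the `SaveAs` call in a function. It assumes Windows with Microsoft Word and pypiwin32 installed; the names `docx_target` and `convert_doc_to_docx` are hypothetical, and the constant simply names the value 16 used above.

```python
import os
from pathlib import Path

WD_FORMAT_DOCUMENT_DEFAULT = 16  # the value passed to SaveAs in the text


def docx_target(doc_path):
    """Return the absolute path of the .docx file to create next to doc_path."""
    return Path(os.path.abspath(doc_path)).with_suffix(".docx")


def convert_doc_to_docx(doc_path):
    """Ask Word to open a .doc file and save it as .docx (Windows + Word only)."""
    # Imported lazily because win32com is only available on Windows.
    import win32com.client

    source = os.path.abspath(doc_path)  # Word needs absolute paths
    target = docx_target(doc_path)
    word = win32com.client.Dispatch("Word.Application")
    try:
        doc = word.Documents.Open(source)
        try:
            doc.SaveAs(str(target), WD_FORMAT_DOCUMENT_DEFAULT)
        finally:
            doc.Close()
    finally:
        word.Quit()  # always release the Word process
    return target
```

Closing the document and quitting Word in `finally` blocks is a design choice of this sketch, so that a failed conversion does not leave a hidden Word process running.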
Below is the full table of file formats supported by Office 2007 and their values:
wdFormatDocument = 0
wdFormatDocument97 = 0
wdFormatDocumentDefault = 16
wdFormatDOSText = 4
wdFormatDOSTextLineBreaks = 5
wdFormatEncodedText = 7
wdFormatFilteredHTML = 10
wdFormatFlatXML = 19
wdFormatFlatXMLMacroEnabled = 20
wdFormatFlatXMLTemplate = 21
wdFormatFlatXMLTemplateMacroEnabled = 22
wdFormatHTML = 8
wdFormatPDF = 17
wdFormatRTF = 6
wdFormatTemplate = 1
wdFormatTemplate97 = 1
wdFormatText = 2
wdFormatTextLineBreaks = 3
wdFormatUnicodeText = 7
wdFormatWebArchive = 9
wdFormatXML = 11
wdFormatXMLDocument = 12
wdFormatXMLDocumentMacroEnabled = 13
wdFormatXMLTemplate = 14
wdFormatXMLTemplateMacroEnabled = 15
wdFormatXPS = 18
Going by the names, each constant should map to the corresponding file format.
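To avoid passing bare numbers such as 16 around, a few of the values from the table can be collected in an enum. This is only an illustrative sketch: the class name `WdSaveFormat` and the helper `save_format_name` are defined locally here, and only a subset of the table is included.

```python
from enum import IntEnum


class WdSaveFormat(IntEnum):
    """A subset of the save-format values listed in the table above."""
    wdFormatDocument = 0
    wdFormatText = 2
    wdFormatRTF = 6
    wdFormatUnicodeText = 7
    wdFormatHTML = 8
    wdFormatXMLDocument = 12
    wdFormatDocumentDefault = 16
    wdFormatPDF = 17


def save_format_name(code):
    """Look up the constant name for a numeric save-format code."""
    return WdSaveFormat(code).name
```

Because `IntEnum` members behave like integers, a member such as `WdSaveFormat.wdFormatDocumentDefault` can be passed wherever the number 16 is expected.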
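1. Create or open a file. This is simple: use the Document class from docx. If a path is given, the document at that path is opened; if no path is given, a new document is created.
2. Save the file. Where there is open, there is save. Use the save method of the Document class. Its argument is the file path to save to, or a file stream to write to. Specifying a path is usually enough.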
doc.save(path_or_stream)
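Steps 1 and 2 can be sketched together as below, assuming python-docx is installed. The helper names `open_or_create` and `save_to_bytes` are invented for this example; `Document(...)` and `save(...)` are the python-docx calls the text refers to.

```python
import io


def open_or_create(path=None):
    """Open the document at path (or stream); with no argument, create a new one."""
    from docx import Document  # python-docx, imported lazily in this sketch

    # Document(None) starts a new document from the default template.
    return Document(path)


def save_to_bytes(doc):
    """Save a document to an in-memory stream and return the raw bytes."""
    stream = io.BytesIO()
    doc.save(stream)  # save() accepts a file path or a file-like object
    return stream.getvalue()
```

Saving to a stream is handy when the file should be sent somewhere rather than written to disk; for ordinary use, `doc.save("some_name.docx")` with a path is enough.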
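3. Object collections. python-docx exposes the collections of objects in a Word document, for example doc.paragraphs, doc.tables and doc.sections.
4. Insert paragraphs. The paragraph is one of the most basic objects in Word.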
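Steps 3 and 4 might look like the following sketch, assuming python-docx is installed. The functions `outline` and `demo` are hypothetical helpers; `add_paragraph`, `insert_paragraph_before`, `paragraphs`, `tables` and `sections` are python-docx members.

```python
def outline(doc):
    """Summarise some object collections of a Document-like object."""
    return {
        "paragraphs": [p.text for p in doc.paragraphs],
        "tables": len(doc.tables),
        "sections": len(doc.sections),
    }


def demo():
    """Build a small document and report what its collections contain."""
    from docx import Document  # python-docx, imported lazily in this sketch

    doc = Document()
    second = doc.add_paragraph("second")    # append at the end of the document
    second.insert_paragraph_before("first")  # insert directly above a paragraph
    doc.add_paragraph("third")
    return outline(doc)
```

`add_paragraph` always appends to the end of the document, while `insert_paragraph_before` is called on an existing paragraph to place new text above it.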
5. Add a style. The help documentation does not cover this in detail, and it is only available in English. A project I had on hand needed it, so I worked out how to use it myself, as follows.
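The author's own code for this step is not reproduced here. As a stand-in, this is a minimal sketch of one way to add a paragraph style with python-docx; the function name, the style name "MyBodyText" and the chosen font settings are all assumptions made for illustration.

```python
def add_custom_paragraph_style(doc, name="MyBodyText", base="Normal",
                               size_pt=12, bold=False):
    """Add a new paragraph style to doc, derived from an existing style."""
    # Imported lazily so the sketch can be read without python-docx installed.
    from docx.enum.style import WD_STYLE_TYPE
    from docx.shared import Pt

    style = doc.styles.add_style(name, WD_STYLE_TYPE.PARAGRAPH)
    style.base_style = doc.styles[base]  # inherit everything else from base
    style.font.size = Pt(size_pt)
    style.font.bold = bold
    return style
```

Once added, the style can be applied by name, for example `doc.add_paragraph("some text", style="MyBodyText")`.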
6. Apply character styles. Characters naturally live inside paragraphs, so the approach below appends text to a paragraph and sets the character style of each piece.
# Insert an empty paragraph
p = doc.add_paragraph('')
p.add_run('123', style="Heading 1 Char")
p.add_run('456')
p.add_run('789', style="Heading 2 Char")
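# This one paragraph now uses two character styles; the middle "456" has no style applied
print(p.text)  # prints 123456789, so the text is still contiguous
7. Set the font. Of course, you do not have to go through styles to format particular characters; the formatting can also be set directly.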
p = doc.add_paragraph('')
r = p.add_run('123')
r.font.bold = True  # bold
r.font.italic = True  # italic, and so on...
8. Table operations. Tables are another frequently used object type.
name = ['a1', 'a2', 'a3']
seq = ['seq11111', 'seqs22222', 'seq33333']
# Write the two lists to a tab-separated text file, one name/seq pair per line
with open("F:/1.txt", "w") as f:
    f.write("name\tseq\n")
    for n, s in zip(name, seq):
        f.write(n + "\t" + s + "\n")
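To connect the table step with the tab-separated output above, here is a hedged sketch of filling a Word table with python-docx and flattening it back to text. The helpers `add_name_seq_table` and `table_to_tsv` are hypothetical; `add_table`, `add_row`, `rows`, `cells` and `text` are python-docx members, and `doc` is assumed to be a python-docx Document.

```python
def add_name_seq_table(doc, names, seqs):
    """Append a two-column table to doc: a header row plus one row per pair."""
    table = doc.add_table(rows=1, cols=2)
    header = table.rows[0].cells
    header[0].text = "name"
    header[1].text = "seq"
    for name, seq in zip(names, seqs):
        cells = table.add_row().cells  # add_row() appends a row at the bottom
        cells[0].text = name
        cells[1].text = seq
    return table


def table_to_tsv(table):
    """Flatten a table into tab-separated lines, one line per row."""
    return "\n".join(
        "\t".join(cell.text for cell in row.cells) for row in table.rows
    )
```

The string returned by `table_to_tsv` has the same layout as the file written above, so it can be saved to a txt file in the same way.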