Removing HTML tags in Python

探险小说 • 2023-4-3 • Essays • 49 views

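The simplest approach deletes everything between "<" and ">" with a regular expression:

import re

# "开始" means "Start"; it is followed by Microsoft Office markup
s = '开始1~3<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p>'

# Delete every "<...>" run: a "<", one or more characters other than ">", then ">"
d = re.sub(r'<[^>]+>', '', s)

print(d)  # Python 3 syntax

Output:

开始1~3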
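Regular expressions do not understand HTML syntax: a ">" inside a quoted attribute value, for example, ends the match early. A sturdier option, swapped in here and not part of the original post, is the standard library's html.parser.HTMLParser. This is a minimal sketch; TextExtractor and strip_tags are hypothetical names chosen for illustration:

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text between tags; tags, comments and
    processing instructions such as <?xml ...> are ignored."""

    def __init__(self):
        # convert_charrefs=True (the default) decodes &amp; and similar entities
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html_text):
    """Hypothetical helper: return the text content of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return ''.join(parser.parts)


s = '开始1~3<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p>'
print(strip_tags(s))  # 开始1~3

# A ">" inside a quoted attribute value trips up the regex but not the parser
tricky = '<a title="1 > 0">link</a>'
print(strip_tags(tricky))              # link
print(re.sub(r'<[^>]+>', '', tricky))  # leaves ' 0">link' behind
```

Because convert_charrefs is on, entities such as &amp; come back as plain characters, which the regex version does not do.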

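A different pattern keeps the tags but strips their attributes:

import re

test = '陈细妹'

# Group 1 captures "<" plus the tag name (no ">" or whitespace); after one
# whitespace character, the lazy [^>]+? swallows the attributes up to the
# closing ">" (group 2). Replacing with \1\2 turns <p class="x"> into <p>.
test = re.sub(r'(<[^>\s]+)\s[^>]+?(>)', r'\1\2', test)

print(test)

As written, the sample string '陈细妹' (a person's name) contains no tags, so it is printed unchanged.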
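Since the sample string above contains no markup, here is the same attribute-stripping pattern applied to a made-up snippet. The function name strip_attributes and the sample markup are hypothetical, chosen only for illustration:

```python
import re

# Same pattern as above: group 1 = "<" + tag name, group 2 = the closing ">"
ATTR_RE = re.compile(r'(<[^>\s]+)\s[^>]+?(>)')


def strip_attributes(html_text):
    """Remove attributes from opening tags but keep the tags themselves."""
    return ATTR_RE.sub(r'\1\2', html_text)


# Hypothetical sample markup
sample = '<p class="name" style="color:red">陈细妹</p><br />'
print(strip_attributes(sample))  # <p>陈细妹</p><br>
```

Closing tags such as </p> contain no whitespace, so they are left untouched; the "/" of a self-closing tag like <br /> is consumed together with the attributes.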

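Try this version, which reads an HTML file, replaces every tag with a space, and writes the result to another file:

import re

# Read the whole input file (assumes UTF-8 encoding)
with open('aa.html', encoding='utf-8') as f:
    s = f.read()

# Non-greedy "<.+?>" stops at the first ">"; "." does not match newlines,
# so a tag that spans several lines is left in place
s1 = re.sub(r'<.+?>', ' ', s)

# 'w' is required: the default mode 'r' is read-only and write() would fail
with open('bb.html', 'w', encoding='utf-8') as wf:
    wf.write(s1)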
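The file version above leaves HTML entities such as &amp; in place, produces runs of spaces, and skips tags broken across lines. A minimal cleanup sketch, assuming the HTML is already in a string; html_to_text and the sample page are hypothetical names chosen for illustration:

```python
import html
import re

# re.S lets "." match newlines too, so a tag split across lines is removed
TAG_RE = re.compile(r'<.+?>', re.S)


def html_to_text(html_text):
    """Regex-based tag removal plus light cleanup (a rough sketch, not a full parser)."""
    text = TAG_RE.sub(' ', html_text)         # replace each tag with a space
    text = html.unescape(text)                # &amp; -> &, &lt; -> <, &nbsp; -> \xa0
    return re.sub(r'\s+', ' ', text).strip()  # collapse whitespace runs (incl. \xa0)


page = '<div\n  class="x">Tom &amp; Jerry</div>\n<p>2 &lt; 3</p>'
print(html_to_text(page))  # Tom & Jerry 2 < 3
```

Entities are decoded only after the tags are gone, so an escaped "&lt;" in the text cannot be mistaken for the start of a tag.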

Feel free to share; when reposting, please credit the source: 内存溢出
