You can simply strip out all the tags:
>>> import re
>>> txt = """<bookstore>
...  <book category="COOKING">
...  <title lang="english">Everyday Italian</title>
...  <author>Giada De Laurentiis</author>
...  <year>2005</year>
...  <price>300.00</price>
...  </book>
...
...  <book category="CHILDREN">
...  <title lang="english">Harry Potter</title>
...  <author>J K. Rowling </author>
...  <year>2005</year>
...  <price>625.00</price>
...  </book>
... </bookstore>"""
>>> exp = re.compile(r'<.*?>')
>>> text_only = exp.sub('',txt).strip()
>>> text_only
'Everyday Italian\n Giada De Laurentiis\n 2005\n 300.00\n \n\n \n Harry Potter\n J K. Rowling \n 2005\n 625.00'
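The result above still contains the whitespace that sat between the tags. As a minimal sketch of one way to tidy it up, the same regex can be followed by a per-line cleanup. The helper name `text_fragments` and the shortened sample are made up for illustration. Note that a pattern like this is only a rough approximation of real XML parsing and may misfire on comments or CDATA sections.

```python
import re

TAG_RE = re.compile(r'<.*?>')  # same non-greedy tag pattern as above


def text_fragments(markup):
    """Strip tags, then return the non-empty text pieces, one per source line."""
    text_only = TAG_RE.sub('', markup)
    # Drop lines that held only tags, and trim the leftover indentation.
    return [line.strip() for line in text_only.splitlines() if line.strip()]


# Shortened, hypothetical sample in the same shape as the bookstore data.
sample = """<book category="CHILDREN">
 <title lang="english">Harry Potter</title>
 <author>J K. Rowling </author>
 <year>2005</year>
</book>"""

print(text_fragments(sample))  # ['Harry Potter', 'J K. Rowling', '2005']
```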
However, if you just want to search a file for some text on Linux, you can use grep:
burhan@sandbox:~$ grep "Harry Potter" file.xml
<title lang="english">Harry Potter</title>
If you want to search a file, use the grep command above, or open the file and search it in Python:
>>> import re
>>> exp = re.compile(r'<.*?>')
>>> with open('file.xml') as f:
...     lines = f.read()
...     text_only = exp.sub('',lines).strip()
...
>>> if 'Harry Potter' in text_only:
...     print('It exists')
... else:
...     print('It does not')
...
It exists
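If you need this check in more than one place, the session above could be wrapped in a small function. This is only a sketch: the name `file_contains_text`, the UTF-8 encoding, and the throwaway demo file are assumptions made for illustration, not part of the original answer.

```python
import os
import re
import tempfile

TAG_RE = re.compile(r'<.*?>')  # same non-greedy tag pattern as above


def file_contains_text(path, needle):
    """Return True if `needle` appears in the file's text once tags are removed."""
    with open(path, encoding='utf-8') as f:  # encoding is an assumption
        text_only = TAG_RE.sub('', f.read())
    return needle in text_only


# Demo on a temporary file so the sketch runs without a real file.xml.
with tempfile.NamedTemporaryFile('w', suffix='.xml', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write('<book><title lang="english">Harry Potter</title></book>')
    demo_path = tmp.name

found = file_contains_text(demo_path, 'Harry Potter')
missing = file_contains_text(demo_path, 'english')  # attribute value, not text
os.remove(demo_path)

print(found, missing)  # True False
```

Unlike the grep call, this only matches text between the tags, so a search term that occurs solely inside a tag or attribute is not found.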