How to extract the content between <p> tags in XML with a regular expression in Python

# Code

import re

# Sample XML with two <p> elements. Continuation lines start with a space
# so that words stay separated once the newlines are removed.
html_text = '''
<p>When ES cells differentiate, they migrate out from colonies on gelatin-coated dishes, similar to the ES cells on the
 <xref ref-type="bibr" rid="pone0000015-Rogers1">[17]</xref> and <italic>nanog</italic> ,
,<xref ref-type="bibr" rid="pone0000015-Chambers1">[19]</xref> well-known markers for undifferentiated ES cells </p>
<p>(A) R1 cells were cultured for 5 days in the presence of
 <xref ref-type="bibr" rid="pone0000015-Rogers1">[1]</xref> and <italic>nanog</italic>
 <xref ref-type="bibr" rid="pone0000015-Mitsui1">[2]</xref>, <xref ref-type="bibr" rid="pone0000015-Chambers1">[3]</xref> various doses of LIF (0–1,000 units/ml) </p>
'''

# Non-greedy group: capture everything between <p> and the nearest </p>.
pattern = r'<p>(.*?)</p>'

# "." does not match a newline by default, so flatten the text first.
html_text = re.sub('\n', '', html_text)

text = re.findall(pattern, html_text)

print(text)

# Output (shown on two lines for readability)

['When ES cells differentiate, they migrate out from colonies on gelatin-coated dishes, similar to the ES cells on the <xref ref-type="bibr" rid="pone0000015-Rogers1">[17]</xref> and <italic>nanog</italic> ,,<xref ref-type="bibr" rid="pone0000015-Chambers1">[19]</xref> well-known markers for undifferentiated ES cells ',

'(A) R1 cells were cultured for 5 days in the presence of <xref ref-type="bibr" rid="pone0000015-Rogers1">[1]</xref> and <italic>nanog</italic> <xref ref-type="bibr" rid="pone0000015-Mitsui1">[2]</xref>, <xref ref-type="bibr" rid="pone0000015-Chambers1">[3]</xref> various doses of LIF (0–1,000 units/ml) ']
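The code above flattens the text by deleting newlines before matching. A possible alternative, sketched here with a made-up two-paragraph sample (xml_text is not from the original question), is to pass re.DOTALL so that "." also matches newlines and the input can stay untouched. This is only a sketch for simple, non-nested <p> elements; for anything more complex an XML parser is usually the more robust choice.

```python
import re

# Hypothetical sample input, used only for illustration.
xml_text = """
<p>first paragraph
spans two lines</p>
<p>second <italic>paragraph</italic></p>
"""

# re.DOTALL lets "." match newline characters, so the text
# does not have to be flattened before matching.
paragraphs = re.findall(r"<p>(.*?)</p>", xml_text, flags=re.DOTALL)

for paragraph in paragraphs:
    print(repr(paragraph))
```

Each captured string keeps its inner tags and its original line breaks, so any further cleanup (for example removing <italic>) would be a separate step.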

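Python is a widely used scripting language, and Google is among the companies that use it heavily. It is well established in bioinformatics, statistics, web development, scientific computing, and many other fields. Like other scripting languages such as R and Perl, Python scripts can be run directly from the command line.

Tools/Materials

Python; the CMD command line; the Windows operating system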

Method/Steps

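1. First, download and install Python. The original answer recommended a 2.7 release rather than 3.0 or later, because Python 3 is not backward compatible with Python 2. Python 2 reached end of life in January 2020, so a current Python 3 release is the practical choice today.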

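2. Open a text editor (EditPlus or Notepad++ are recommended, since both recognize Python syntax) and save the file with a .py extension.

On Unix-like systems, the first line of the script should be #!/usr/bin/python

This line marks the file as an executable Python script. It is not needed on Windows when the script is started with the python command.

If the Python interpreter is not located in /usr/bin, replace the path with the directory that actually contains it.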

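3. After writing the script, debug it; this can be done directly in EditPlus (search online for the details). Then open the CMD command line. This assumes Python has already been added to the PATH environment variable; if it has not, search online for how to add it.

4. In the CMD command line, type "python" followed by a space, drag the finished script file onto the window so that its path is inserted at the cursor, and press Enter to run it.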
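The steps above can be tried with a very small script. The sketch below assumes it is saved as hello.py, which is a hypothetical file name, and uses the more portable shebang #!/usr/bin/env python3 instead of a fixed interpreter path; the function name build_greeting is also made up for illustration.

```python
#!/usr/bin/env python3
"""Minimal script, assumed to be saved as hello.py (a hypothetical file name)."""
import sys


def build_greeting(args):
    """Return a greeting for the first command-line argument, or a default."""
    name = args[0] if args else "world"
    return "Hello, " + name + "!"


if __name__ == "__main__":
    # sys.argv[0] is the script path; the remaining items are the arguments.
    print(build_greeting(sys.argv[1:]))
```

Typing python followed by the path to hello.py in CMD runs the script; with no extra arguments it prints Hello, world!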

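A regular expression is a text pattern-matching tool that can extract specific information from text. To extract Default9 from the text in question, use a pattern that matches the part that starts with Dialogue:, is followed by arbitrary characters, and ends with a comma. The parentheses in the pattern form a capture group, and that group is what extracts Default9.

With Python's re module, the pattern can be applied with, for example, re.search, and the captured group read from the returned match object, which yields Default9.

Note that such a regular expression does not cover every case; for more precise matching you may need to adjust the pattern.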
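The original pattern and input text did not survive on this page, so the following is only a sketch of the idea under stated assumptions: the sample line is hypothetical (it imitates a subtitle-style Dialogue: record whose fourth comma-separated field is Default9), and the names pattern and extract_field are made up for illustration.

```python
import re

# Hypothetical sample line; the original input text is not shown in the source.
line = "Dialogue: 0,0:00:01.00,0:00:03.00,Default9,,0,0,0,,Hello"

# Match "Dialogue:", skip the first three comma-separated fields,
# and capture the fourth field, which must be followed by a comma.
pattern = re.compile(r"Dialogue:\s*(?:[^,]*,){3}([^,]*),")


def extract_field(text):
    """Return the captured field, or None when the line does not match."""
    match = pattern.search(text)
    return match.group(1) if match else None


print(extract_field(line))  # Default9
```

If the real data places the wanted value in a different field, the repetition count {3} would have to change accordingly.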

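import re

# Capture the href value and the link text of a simple <a> tag.
pattern = r'<a href="(.+?)">(.*?)</a>'

with open("test.html", "r") as fp:
    for line in fp:
        ret = re.search(pattern, line)
        if ret:
            # groups() returns a tuple of all captured groups.
            for x in ret.groups():
                print(x)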

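I don't know exactly what your format looks like, so this is just a simple example.

groups() returns the text matched by each ( ) group in the regular expression pattern, as a tuple.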
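To illustrate what groups() returns, here is a small sketch that works on an in-memory string instead of a file. The sample link and its URL are made up for illustration, and the pattern assumes the tag has exactly the form <a href="...">...</a> with no other attributes.

```python
import re

# Hypothetical sample line, used only for illustration.
line = '<a href="http://example.com/page">Example page</a>'

# Two capture groups: the href value and the link text.
ret = re.search(r'<a href="(.+?)">(.*?)</a>', line)

if ret:
    url, label = ret.groups()  # one tuple item per ( ) group, in order
    print(url)
    print(label)
```

Unpacking the tuple works here because the pattern has exactly two groups; ret.group(1) and ret.group(2) would give the same values individually.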


Sharing is welcome; when reposting, please credit the source: 内存溢出

Original URL: http://outofmemory.cn/web/10145301.html
