Finding specific links with BeautifulSoup


First, set up a test document and parse it with BeautifulSoup:

>>> from BeautifulSoup import BeautifulSoup
>>> doc = '<html><body><div><a href="something">yep</a></div><div><a href="http://www.nhl.com/ice/boxscore.htm?id=3">somelink</a></div><a href="http://www.nhl.com/ice/boxscore.htm?id=7">another</a></body></html>'
>>> soup = BeautifulSoup(doc)
>>> print soup.prettify()
<html>
 <body>
  <div>
   <a href="something">
    yep
   </a>
  </div>
  <div>
   <a href="http://www.nhl.com/ice/boxscore.htm?id=3">
    somelink
   </a>
  </div>
  <a href="http://www.nhl.com/ice/boxscore.htm?id=7">
   another
  </a>
 </body>
</html>
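The session above uses the old BeautifulSoup 3 package on Python 2. As a minimal sketch of the same setup on Python 3, assuming the bs4 package (Beautiful Soup 4) is installed and that the standard library's html.parser backend is acceptable, the document can be parsed and every href listed, which shows what the search has to narrow down:

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 is installed

doc = (
    '<html><body>'
    '<div><a href="something">yep</a></div>'
    '<div><a href="http://www.nhl.com/ice/boxscore.htm?id=3">somelink</a></div>'
    '<a href="http://www.nhl.com/ice/boxscore.htm?id=7">another</a>'
    '</body></html>'
)

# Name the parser explicitly so the result does not depend on which
# optional parsers happen to be installed.
soup = BeautifulSoup(doc, "html.parser")

# Collect the href of every <a> tag that has an href attribute.
all_hrefs = [a["href"] for a in soup.find_all("a", href=True)]
print(all_hrefs)
```

Only the last two values are box score links; the first one is the kind of entry the search should filter out.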

Next, we can search for all <a> tags whose href attribute starts with http://www.nhl.com/ice/boxscore.htm?id= by using a regular expression:

>>> import re
>>> soup.findAll('a', href=re.compile(r'^http://www\.nhl\.com/ice/boxscore\.htm\?id='))
[<a href="http://www.nhl.com/ice/boxscore.htm?id=3">somelink</a>, <a href="http://www.nhl.com/ice/boxscore.htm?id=7">another</a>]
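In a pattern like the one above, ? and . are regex metacharacters, so matching the literal URL prefix requires escaping them. The following standard-library sketch illustrates the difference; the hrefs list simply repeats the three test values from the document, and building the pattern with re.escape is one optional approach, not something the original answer uses:

```python
import re

prefix = "http://www.nhl.com/ice/boxscore.htm?id="
hrefs = [
    "something",
    "http://www.nhl.com/ice/boxscore.htm?id=3",
    "http://www.nhl.com/ice/boxscore.htm?id=7",
]

# Unescaped, "htm?" means "ht" followed by an optional "m", so the
# literal "?" in the URL is never matched.
unescaped = re.compile("^" + prefix)

# re.escape backslash-escapes the metacharacters in the literal prefix.
escaped = re.compile("^" + re.escape(prefix))

unescaped_hits = [h for h in hrefs if unescaped.search(h)]
escaped_hits = [h for h in hrefs if escaped.search(h)]
print(unescaped_hits)
print(escaped_hits)
```

With the unescaped pattern none of the test values match, while the escaped pattern selects the two box score URLs.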

Source: 内存溢出 (outofmemory.cn), original URL: http://outofmemory.cn/zaji/5663017.html
