Python Beautiful Soup: the .contents attribute

It simply gives you what is inside the tag, as a list of the tag's direct children. Let me demonstrate with an example:

html_doc = """<html><head><title>The Dormouse's story</title></head>
<p><b>The Dormouse's story</b></p>
<p>Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" id="link1">Elsie</a>,
<a href="http://example.com/lacie" id="link2">Lacie</a> and
<a href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p>...</p>
"""

from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html_doc, "html.parser")
head = soup.head
print(head.contents)

The code above prints a list:

[<title>The Dormouse's story</title>]

because that is all the head tag contains. Indexing the list with [0] therefore gives you its first item, the title tag.
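As a small illustration of the point above, the sketch below treats .contents as an ordinary Python list and indexes into it. The HTML string is a trimmed stand-in for the example document, and the choice of "html.parser" is an assumption; other parsers may build a slightly different tree.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for the example document (an assumption, not the full text)
html_doc = (
    "<html><head><title>The Dormouse's story</title></head>"
    "<body><p>...</p></body></html>"
)
soup = BeautifulSoup(html_doc, "html.parser")

children = soup.head.contents      # plain Python list of head's direct children
first = children[0]                # the only child of head: the title tag
title_text = str(first.contents[0])  # the title tag's own first child is its text

print(len(children))
print(first.name)
print(title_text)
```

Because .contents is a regular list, the usual list operations (len, indexing, slicing, iteration) apply to it.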

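You get an error because soup.contents[0].contents[0].contents[0].contents[0] returns something that contains no further tags (and therefore has no attributes). With your code it returns PageTitle: the first contents[0] gives you the html tag, the second gives you the head tag, the third leads to the title tag, and the fourth gives you the actual text content. So when you ask it for name, there is no tag to provide one.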
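The chain described above can be traced one step at a time. This is only a sketch: the document string is a hypothetical reconstruction (only the title text PageTitle comes from the discussion), and it assumes bs4 with "html.parser" and no whitespace between tags. It checks the type of each node rather than relying on what a string's name attribute does, since that can differ between Beautiful Soup versions.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical document; only the title text "PageTitle" is taken from the discussion
doc = "<html><head><title>PageTitle</title></head><body><p>Hello</p></body></html>"
soup = BeautifulSoup(doc, "html.parser")

node = soup
steps = []
for _ in range(4):
    node = node.contents[0]  # descend to the first child each time
    if isinstance(node, Tag):
        steps.append(("tag", node.name))
    else:
        steps.append(("string", str(node)))  # text node: there is no tag here

for kind, label in steps:
    print(kind, label)
```

The first three steps land on tags (html, head, title); the fourth lands on a text node, which is why asking it for tag-specific information fails.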

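If you want to print the body, you can do the following:

# doc is the sequence of HTML strings from your code
soup = BeautifulSoup(''.join(doc), "html.parser")
print(soup.body)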

If you want to reach body using only contents, use the following:

soup = BeautifulSoup(''.join(doc), "html.parser")
print(soup.contents[0].contents[1].name)

You do not use [0] as the index here, because body is the second element, after head.
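A minimal sketch of that indexing follows, assuming a hypothetical doc list (its contents are made up for illustration) joined without any whitespace between tags and parsed with "html.parser".

```python
from bs4 import BeautifulSoup

# Hypothetical list of HTML fragments standing in for doc
doc = [
    "<html><head><title>PageTitle</title></head>",
    "<body><p>Hello</p></body></html>",
]
soup = BeautifulSoup("".join(doc), "html.parser")

html_tag = soup.contents[0]                        # the html tag
names = [child.name for child in html_tag.contents]  # names of its direct children
body = html_tag.contents[1]                        # index 1: body comes after head

print(names)
print(body.name)
print(body is soup.body)
```

Positional indexing like this is fragile: a newline or other whitespace between tags becomes a text node of its own and shifts the indices, so soup.body is usually the safer way to get the same element.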

Feel free to share; when reposting, please credit the source: 内存溢出

Original URL: http://outofmemory.cn/zaji/5648424.html
