Python parsing libraries

BeautifulSoup example:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # author: imcati

    html_doc = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b><b>123</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    """

    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html_doc, 'html.parser')

    # Pretty-print the parsed document
    # print(soup.prettify())

    # Get a whole tag by name (the first match)
    print(soup.a)

    # Get the tag's name
    print(soup.title.name)

    # Get the text inside the tag
    print(soup.title.string)

    # Get the parent of the title tag
    print(soup.title.parent.name)

    # Get the children of the p tag
    print(soup.p.contents)

    # Get an attribute value (two ways)
    print(soup.p["class"])
    print(soup.p.attrs["class"])

    # Use select() with CSS selectors: prefix a class name with ".", an id with "#"
    print(soup.select("a"))
    print(soup.select(".title"))

    # Get the content
    print(soup.select(".title")[0])
    print(soup.select(".title")[0].get_text())

    # Get an attribute value
    print(soup.select(".title")[0].attrs["class"])

    # Get the text of a child tag under p
    print(soup.select('p > b')[1].get_text())

    # Search with find() and find_all(): find() returns the first match,
    # find_all() returns all matches
    print(soup.find('p', attrs={"class": "title"}))
    print(soup.find_all('p', attrs={"class": "title"}))

Output (attribute order may differ depending on the Python and bs4 versions):

    <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
    title
    The Dormouse's story
    head
    [<b>The Dormouse's story</b>, <b>123</b>]
    ['title']
    ['title']
    [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
    [<p class="title"><b>The Dormouse's story</b><b>123</b></p>]
    <p class="title"><b>The Dormouse's story</b><b>123</b></p>
    The Dormouse's story123
    ['title']
    123
    <p class="title"><b>The Dormouse's story</b><b>123</b></p>
    [<p class="title"><b>The Dormouse's story</b><b>123</b></p>]
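The example notes that an id is prefixed with "#" in a CSS selector but never shows one, and it reads only the class attribute. The following is a minimal sketch of both, assuming a shortened copy of the same sample markup; the variable names (second_link, hrefs, ids) are illustrative and not part of the original script.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample document, kept inline so the sketch runs on its own.
html_doc = """
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# CSS id selector: "#link2" matches the tag whose id attribute is "link2".
second_link = soup.select("#link2")[0]
second_text = second_link.get_text()
second_href = second_link["href"]

# Collect the href of every <a class="sister"> tag with a CSS class selector.
hrefs = [a["href"] for a in soup.select("a.sister")]

# The same tags through find_all(); class_ is the keyword bs4 uses for the
# class attribute, because "class" is a reserved word in Python.
ids = [a.get("id") for a in soup.find_all("a", class_="sister")]

print(second_text, second_href)
print(hrefs)
print(ids)
```

Indexing a tag with ["href"] raises KeyError when the attribute is missing, while .get("href") returns None, so .get() is the safer choice on markup you do not control.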
Original URL: https://outofmemory.cn/langs/1195608.html
