How do I extract specific data from a URL opened in Python using urllib2?

Overview: I'm new to Python and am playing around with a very basic web crawler. For instance, I wrote a simple function that loads a page showing the high scores for an online game. I can get the source code of the HTML page, but I need to pull specific numbers out of that page. For example, the web page looks like this:

http://hiscore.runescape.com/hiscorepersonal.ws?user1=bigdrizzle13

where 'bigdrizzle13' is the unique part of the link. The numbers on that page need to be extracted and returned. Essentially, I want to build a program where all I have to do is type in 'bigdrizzle13' and it outputs those numbers.
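The URL pattern described above (a fixed address plus a `user1` query parameter) can be sketched as follows. This is only an illustration: the helper name `build_hiscore_url` is hypothetical, and it is written for Python 3's `urllib.parse`, which is an assumption, since the rest of this page uses Python 2's `urllib2`.

```python
from urllib.parse import urlencode

# Base address taken from the example link in the question.
HISCORE_BASE = "http://hiscore.runescape.com/hiscorepersonal.ws"


def build_hiscore_url(username):
    """Return the high-score page URL for one player (hypothetical helper)."""
    # urlencode escapes spaces and other characters that are unsafe in a query string.
    return HISCORE_BASE + "?" + urlencode({"user1": username})


print(build_hiscore_url("bigdrizzle13"))
```

Using `urlencode` rather than plain string concatenation keeps the URL valid when a username contains spaces or punctuation.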

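Solution: As another poster mentioned, BeautifulSoup is a wonderful tool for this job.

Here's the entire, heavily commented program. It could use a lot more error handling, but as long as you enter a valid username, it will pull all the scores from the corresponding web page.

I tried to comment as well as I could. If you're new to BeautifulSoup, I highly recommend working through my example with the BeautifulSoup documentation handy.

The whole program...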

    from urllib2 import urlopen
    from BeautifulSoup import BeautifulSoup
    import sys

    URL = "http://hiscore.runescape.com/hiscorepersonal.ws?user1=" + sys.argv[1]

    # Grab page HTML, create BeautifulSoup object
    html = urlopen(URL).read()
    soup = BeautifulSoup(html)

    # Grab the <table id="mini_player"> element
    scores = soup.find('table', {'id': 'mini_player'})

    # Get a list of all the <tr>s in the table, skip the header row
    rows = scores.findAll('tr')[1:]

    # Helper function to return concatenation of all character data in an element
    def parse_string(el):
        text = ''.join(el.findAll(text=True))
        return text.strip()

    for row in rows:
        # Get all the text from the <td>s
        data = map(parse_string, row.findAll('td'))

        # Skip the first td, which is an image
        data = data[1:]

        # Do something with the data...
        print data
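For readers on Python 3, where `urllib2` and the old `BeautifulSoup` import are unavailable, the same extraction steps (find the table by id, skip the header row, drop the image cell) can be tried offline with the standard library's `html.parser`. This is a sketch that swaps in a different parser, not the answer's method: the `TableRows` class, the `extract_scores` helper, and the sample markup are all hypothetical, and the real page layout may differ.

```python
from html.parser import HTMLParser

# Hypothetical markup that imitates the table layout the program above expects.
SAMPLE_HTML = """
<table id="mini_player">
  <tr><th></th><th>Skill</th><th>Rank</th><th>Level</th><th>XP</th></tr>
  <tr><td><img src="a.gif"></td><td><a href="#">Overall</a></td><td>87,417</td><td>1,784</td><td>78,772,017</td></tr>
  <tr><td><img src="b.gif"></td><td><a href="#">Attack</a></td><td>140,903</td><td>88</td><td>4,509,031</td></tr>
</table>
"""


class TableRows(HTMLParser):
    """Collect the <td> text of every <tr> inside the table with a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.in_cell = False
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])          # start a new row
        elif self.in_table and tag == "td" and self.rows:
            self.rows[-1].append("")      # start a new, empty cell
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Concatenate all character data found inside the current cell.
        if self.in_cell:
            self.rows[-1][-1] += data


def extract_scores(html, table_id="mini_player"):
    parser = TableRows(table_id)
    parser.feed(html)
    parser.close()
    # Skip the header row, then drop the first cell of each row (the image).
    return [[cell.strip() for cell in row[1:]] for row in parser.rows[1:]]


for row in extract_scores(SAMPLE_HTML):
    print(row)
```

The sketch ignores nested tables and malformed markup, which a dedicated library such as BeautifulSoup handles more gracefully.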

Here's a test run.

    > test.py bigdrizzle13
    [u'Overall',u'87,417',u'1,784',u'78,772,017']
    [u'Attack',u'140,903',u'88',u'4,509,031']
    [u'Defence',u'123,057',u'85',u'3,449,751']
    [u'Strength',u'325,883',u'84',057,628']
    [u'Hitpoints',u'245,982',571,420']
    [u'Ranged',u'583,645',u'71',u'856,428']
    [u'Prayer',u'227,853',u'62',u'357,847']
    [u'Magic',u'368,201',u'75',264,042']
    [u'Cooking',u'34,754',u'99',u'13,192,745']
    [u'Woodcutting',u'50,080',u'93',u'7,751,265']
    [u'Fletching',u'53,269',051,939']
    [u'Fishing',u'5,195',u'14,512,569']
    [u'Firemaking',u'46,398',677,933']
    [u'Crafting',u'328,268',u'343,143']
    [u'Smithing',u'39,898',u'77',561,493']
    [u'Mining',u'31,584',331,051']
    [u'Herblore',u'247,149',u'52',u'135,215']
    [u'Agility',u'225,869',u'60',u'276,753']
    [u'Thieving',u'292,638',u'56',u'193,037']
    [u'Slayer',u'113,245',u'73',u'998,607']
    [u'Farming',u'204,608',u'51',u'115,507']
    [u'Runecraft',u'38,369',u'880,789']
    [u'Hunter',u'384,920',u'53',u'139,030']
    [u'Construction',u'232,379',u'125,708']
    [u'Summoning',236',u'64',u'419,086']

Voila :)
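The program prints every value as a string with thousands separators, so a natural follow-up is converting them to integers. Here is a minimal sketch in Python 3. The helper names `to_int` and `parse_row` are hypothetical, and reading the three numeric fields as rank, level, and experience is an assumption, since the page does not label them.

```python
def to_int(text):
    """Convert a string such as '78,772,017' to an integer."""
    return int(text.replace(",", ""))


def parse_row(row):
    """Turn a four-field row of strings into (skill, rank, level, xp).

    The field meanings are assumed; the function raises ValueError
    if the row does not contain exactly four fields.
    """
    skill, rank, level, xp = row
    return skill, to_int(rank), to_int(level), to_int(xp)


row = ["Overall", "87,417", "1,784", "78,772,017"]
print(parse_row(row))
```

Rows with missing fields would need extra handling before being passed to `parse_row`.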

Original source: http://outofmemory.cn/langs/1195078.html