Personally, I would write:

    # Python 2.7
    import urllib

    url = 'http://www.boursorama.com/includes/cours/last_transactions.phtml?symbole=1xEURUS'
    sock = urllib.urlopen(url)
    content = sock.read()   # raw source of the page
    sock.close()
    print content

And if you speak French... welcome to stackoverflow.com!
Update 1

Actually, I now prefer the following code, because it is faster.

    # Python 2.7
    import httplib

    conn = httplib.HTTPConnection(host='www.boursorama.com', timeout=30)
    req = '/includes/cours/last_transactions.phtml?symbole=1xEURUS'
    try:
        conn.request('GET', req)
    except Exception:
        print 'echec de connexion'   # "connection failed"
        raise                        # no response to read, so stop here
    content = conn.getresponse().read()
    print content
Replacing httplib with http.client in this code, and turning the print statements into print() calls, should be enough to adapt it to Python 3.
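As a rough illustration of that adaptation, a Python 3 version might look like the sketch below. The function name fetch_page and its port parameter are assumptions introduced here for illustration; only http.client.HTTPConnection, request() and getresponse() come from the standard library. Note that in Python 3 read() returns bytes, not str.

```python
# Python 3
import http.client


def fetch_page(host, path, port=None, timeout=30):
    """Return the raw body (bytes) of a GET request to http://host[:port]path.

    fetch_page is a hypothetical helper name, not part of any library.
    """
    conn = http.client.HTTPConnection(host, port=port, timeout=timeout)
    try:
        conn.request('GET', path)
        return conn.getresponse().read()
    finally:
        conn.close()  # release the socket even if the request fails


# Example call (performs a real network request, so it is left commented out):
# content = fetch_page('www.boursorama.com',
#                      '/includes/cours/last_transactions.phtml?symbole=1xEURUS')
# print(content)
```

Closing the connection in a finally clause is a design choice of this sketch; the shorter original simply lets the connection go out of scope.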
I confirm that both pieces of code return the page source containing the data you are interested in:

    <td width="33%" align="center">11:57:44</td>
    <td width="33%" align="center">1.4486</td>
    <td width="33%" align="center">0</td>
    </tr>
    <tr>
    <td width="33%" align="center">11:57:43</td>
    <td width="33%" align="center">1.4486</td>
    <td width="33%" align="center">0</td>
    </tr>

Update 2

Appending the following snippet to the code above extracts the data I suppose you want:
    for i, line in enumerate(content.splitlines(True)):
        print str(i) + ' ' + repr(line)
    print '\n\n'

    import re

    regx = re.compile('\t\t\t\t\t\t<td width="33%" align="center">(\d\d:\d\d:\d\d)</td>\r\n'
                      '\t\t\t\t\t\t<td width="33%" align="center">([\d.]+)</td>\r\n'
                      '\t\t\t\t\t\t<td width="33%" align="center">(\d+)</td>\r\n')
    print regx.findall(content)
Result (only the end):

    ............................................................................................................................................................
    98 'window.config.graphics = {};\n'
    99 'window.config.accordions = {};\n'
    100 '\n'
    101 "window.addEvent('domready', function(){\n"
    102 '});\n'
    103 '</script>\n'
    104 '<script type="text/javascript">\n'
    105 '\t\t\t\tsas_tmstp = Math.round(Math.random()*10000000000);\n'
    106 '\t\t\t\tsas_pageid = "177/(includes/cours/last_transactions)"; // Page : boursorama.com/smartad_test\n'
    107 '\t\t\t\tvar sas_formatids = "8968";\n'
    108 '\t\t\t\tsas_target = "symb=1xEURUS#"; // TargetingArray\n'
    109 '\t\t\t\tdocument.write("<scr"+"ipt src=\\"http://ads.boursorama.com/call2/pubjall/" + sas_pageid + "/" + sas_formatids + "/" + sas_tmstp + "/" + escape(sas_target) + "?\\"></scr"+"ipt>");\t\t\t\t\n'
    110 '\t\t\t</script><div id="_smart1"><script language="javascript">sas_script(1,8968);</script></div><script type="text/javascript">\r\n'
    111 "\twindow.addEvent('domready', function(){\r\n"
    112 'sas_move(1,8968);\t});\r\n'
    113 '</script>\n'
    114 '<script type="text/javascript">\n'
    115 'var _gaq = _gaq || [];\n'
    116 "_gaq.push(['_setAccount', 'UA-1623710-1']);\n"
    117 "_gaq.push(['_setDomainName', 'www.boursorama.com']);\n"
    118 "_gaq.push(['_setCustomVar', 1, 'segment', 'WEB-VISITOR']);\n"
    119 "_gaq.push(['_setCustomVar', 4, 'version', '18']);\n"
    120 "_gaq.push(['_trackPageLoadTime']);\n"
    121 "_gaq.push(['_trackPageview']);\n"
    122 '(function() {\n'
    123 "var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;\n"
    124 "ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';\n"
    125 "var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);\n"
    126 '})();\n'
    127 '</script>\n'
    128 '</body>\n'
    129 '</html>'

    [('12:25:36', '1.4478', '0'), ('12:25:33', '1.4478', '0'), ('12:25:31', '1.4478', '0'), ('12:25:30', '1.4478', '0'), ('12:25:30', '1.4478', '0'), ('12:25:29', '1.4478', '0')]
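Every field in those tuples is still a string. If typed values are more convenient, a conversion along the following lines may help. This is only a sketch: the helper name to_record is hypothetical, and reading the three columns as time, price and volume is an assumption based on how the values look, not something the page states.

```python
# Python 3
from datetime import datetime

# First two tuples from the result shown above.
rows = [('12:25:36', '1.4478', '0'), ('12:25:33', '1.4478', '0')]


def to_record(row):
    """Convert one (time, price, volume) tuple of strings to typed values.

    The column meanings are assumed, not confirmed by the page.
    """
    time_text, price_text, volume_text = row
    return (datetime.strptime(time_text, '%H:%M:%S').time(),
            float(price_text),
            int(volume_text))


records = [to_record(row) for row in rows]
print(records)
```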
I hope you are not planning to "play" at trading on the Forex: it is one of the best ways to lose money quickly.
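Update 3

Sorry! I forgot that you are using Python 3. So I think you have to define the regex like this:

    regx = re.compile(b'\t\t\t\t\t......)

that is, with a b before the string. Otherwise you will get an error like the one in this question, because in Python 3 read() returns bytes and a str pattern cannot be matched against a bytes object.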
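To make the difference concrete, here is a small offline sketch. The sample bytes are a hypothetical stand-in for the downloaded page (six tabs before each cell, CRLF line endings, as in the pattern from Update 2), and the names ROW_PATTERN, sample and decoded_rows are made up for illustration. It uses rb'...' raw bytes literals so that the re module itself interprets \t, \r, \n and \d.

```python
# Python 3
import re

# Same pattern as in Update 2, written as a bytes pattern.
ROW_PATTERN = re.compile(
    rb'\t\t\t\t\t\t<td width="33%" align="center">(\d\d:\d\d:\d\d)</td>\r\n'
    rb'\t\t\t\t\t\t<td width="33%" align="center">([\d.]+)</td>\r\n'
    rb'\t\t\t\t\t\t<td width="33%" align="center">(\d+)</td>\r\n')

# Hypothetical sample imitating what read() returns (bytes, not str).
sample = (
    b'\t\t\t\t\t\t<td width="33%" align="center">11:57:44</td>\r\n'
    b'\t\t\t\t\t\t<td width="33%" align="center">1.4486</td>\r\n'
    b'\t\t\t\t\t\t<td width="33%" align="center">0</td>\r\n'
    b'\t\t\t\t\t</tr>\r\n\t\t\t\t\t<tr>\r\n'
    b'\t\t\t\t\t\t<td width="33%" align="center">11:57:43</td>\r\n'
    b'\t\t\t\t\t\t<td width="33%" align="center">1.4486</td>\r\n'
    b'\t\t\t\t\t\t<td width="33%" align="center">0</td>\r\n'
)

# A bytes pattern works on bytes; each captured field is itself bytes.
rows = ROW_PATTERN.findall(sample)
print(rows)

# A str pattern applied to bytes raises TypeError.
str_pattern_error = None
try:
    re.compile(r'(\d+)').findall(sample)
except TypeError as exc:
    str_pattern_error = exc
print(repr(str_pattern_error))

# Decode the fields if str values are preferred.
decoded_rows = [tuple(field.decode('ascii') for field in row) for row in rows]
print(decoded_rows)
```

An alternative would be to decode the whole page first and keep a str pattern; which is preferable depends on knowing the page's encoding, which is not stated here.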