How do you get a web page's source code from a URL in Java?


package test;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class HttpTest {

    private String u;
    private String encoding;

    public static void main(String[] args) throws Exception {
        HttpTest client = new HttpTest("http://www.baidu.com/", "UTF-8");
        client.run();
    }

    public HttpTest(String u, String encoding) {
        this.u = u;
        this.encoding = encoding;
    }

    public void run() throws Exception {
        // Build a URL object from the link (given as a string).
        URL url = new URL(u);
        // Open the connection to the URL.
        HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
        // The input stream carries the page content; try-with-resources closes the reader.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(urlConnection.getInputStream(), encoding))) {
            // Read the stream line by line and print it.
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            urlConnection.disconnect();
        }
    }
}
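The example above hardcodes the encoding as "UTF-8", which produces garbled text when a page is served in another charset. One possible refinement, shown here only as a sketch, is to read the charset from the response's Content-Type header (available through `urlConnection.getContentType()`) and fall back to a default when it is missing. The class name `CharsetSniffer` and the method `charsetFrom` are hypothetical names introduced for this illustration, not part of the original answer.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper: extracts the charset from a Content-Type header value.
final class CharsetSniffer {

    // Returns the charset named in a value such as "text/html; charset=GBK",
    // or the fallback when the header is null, has no charset, or names an unknown one.
    static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the fallback.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetFrom("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(charsetFrom("text/html", StandardCharsets.UTF_8));
    }
}
```

In `run()`, the result could be passed to `new InputStreamReader(urlConnection.getInputStream(), charset)` in place of the fixed encoding string. Pages that declare their charset only in a `<meta>` tag would still need separate handling.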


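Pass in a URL and get the page source back as a string (place the method inside any class):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

// Fetch the page at the given URL and return its content as a string;
// the string can then be saved to a file. Returns null on failure.
public static String getHTML(String url) {
    try {
        URL newUrl = new URL(url);
        URLConnection connect = newUrl.openConnection();
        // Some servers reject requests that carry no browser-like User-Agent.
        connect.setRequestProperty("User-Agent",
                "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(connect.getInputStream(), "UTF-8"))) {
            StringBuilder html = new StringBuilder();
            String readLine;
            while ((readLine = in.readLine()) != null) {
                // Keep the line breaks that readLine() strips.
                html.append(readLine).append('\n');
            }
            return html.toString();
        }
    } catch (MalformedURLException me) {
        System.out.println("MalformedURLException" + me);
    } catch (IOException ioe) {
        System.out.println("ioeException" + ioe);
    }
    return null;
}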
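The comment in the answer above mentions saving the returned string to a file but does not show how. A minimal sketch of that step follows, assuming the page text is written out as UTF-8; the class name `HtmlSaver` and the method `save` are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes a page's source text to a file.
final class HtmlSaver {

    // Writes the string to the target path as UTF-8, replacing any existing file.
    static void save(String html, Path target) throws IOException {
        Files.write(target, html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the real destination here.
        Path out = Files.createTempFile("page", ".html");
        save("<html><body>hello</body></html>", out);
        System.out.println("Saved to " + out);
    }
}
```

Because `getHTML` returns null on failure, the caller should check the result before passing it to `save`.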


Sharing is welcome; when reposting, please credit the source: 内存溢出 (outofmemory.cn)

Original URL: http://outofmemory.cn/yw/12552124.html
