You only need to set the User-Agent header to make it work:
    URLConnection connection = new URL("https://www.google.com/search?q=" + query).openConnection();
    connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
    connection.connect();

    BufferedReader r = new BufferedReader(new InputStreamReader(connection.getInputStream(), Charset.forName("UTF-8")));
    StringBuilder sb = new StringBuilder();
    String line;
    while ((line = r.readLine()) != null) {
        sb.append(line);
    }
    System.out.println(sb.toString());
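One detail the snippet above leaves open is what happens when the query contains spaces or characters such as `&`. A minimal sketch of percent-encoding the query before concatenating it, assuming Java 10 or later for the `Charset` overload of `URLEncoder.encode`; the class and method names (`SearchUrlBuilder`, `buildSearchUrl`) are hypothetical and not part of the original answer:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SearchUrlBuilder {
    // Hypothetical helper: returns the search URL with the query form-encoded,
    // so spaces become '+' and reserved characters become %XX escapes.
    static String buildSearchUrl(String query) {
        return "https://www.google.com/search?q="
                + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints https://www.google.com/search?q=java+url+connection
        System.out.println(buildSearchUrl("java url connection"));
    }
}
```

The returned string can be passed to `new URL(...)` in place of the raw concatenation.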
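As the exception stack trace shows, SSL is handled transparently for you.
Getting the result count is not quite that simple, though. After the first request you have to pretend to be a browser by picking up the cookie and parsing the redirect token link.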
    // Body of the first response, read into sb by the previous request
    String response = sb.toString();
    // Keep only the name=value part of the cookie
    String cookie = connection.getHeaderField("Set-cookie").split(";")[0];
    // Extract the redirect link from the first response
    Pattern pattern = Pattern.compile("content=\"0;url=(.*?)\"");
    Matcher m = pattern.matcher(response);
    if (m.find()) {
        String url = m.group(1);
        connection = new URL(url).openConnection();
        connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
        connection.setRequestProperty("cookie", cookie);
        connection.connect();
        r = new BufferedReader(new InputStreamReader(connection.getInputStream(), Charset.forName("UTF-8")));
        sb = new StringBuilder();
        while ((line = r.readLine()) != null) {
            sb.append(line);
        }
        response = sb.toString();
        // The inner quotes must be escaped inside the Java string literal
        pattern = Pattern.compile("<div id=\"resultStats\">about ([0-9,]+) results</div>");
        m = pattern.matcher(response);
        if (m.find()) {
            long amount = Long.parseLong(m.group(1).replaceAll(",", ""));
            return amount;
        }
    }
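The three parsing steps (cookie, redirect link, result count) can be tried without any network access. The sketch below reuses the regular expressions from the answer; the class name `ResultStatsParser`, its method names, and the sample strings in `main` are made up for illustration and are not real Google output.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ResultStatsParser {
    // Patterns taken from the answer; the markup they match is whatever
    // the page served at the time and may well have changed since.
    private static final Pattern REDIRECT =
            Pattern.compile("content=\"0;url=(.*?)\"");
    private static final Pattern STATS =
            Pattern.compile("<div id=\"resultStats\">about ([0-9,]+) results</div>");

    // Keeps only the name=value part of a Set-Cookie header value.
    static String firstCookie(String setCookieHeader) {
        return setCookieHeader.split(";")[0];
    }

    // Returns the redirect target, or null when the page has none.
    static String redirectUrl(String html) {
        Matcher m = REDIRECT.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // Returns the result count, or an empty OptionalLong when it is absent.
    static OptionalLong resultCount(String html) {
        Matcher m = STATS.matcher(html);
        if (!m.find()) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(m.group(1).replace(",", "")));
    }

    public static void main(String[] args) {
        // Made-up sample inputs, for illustration only
        System.out.println(firstCookie("NID=abc123; expires=Thu; path=/"));
        System.out.println(redirectUrl("<meta content=\"0;url=https://example.com/next\">"));
        System.out.println(resultCount("<div id=\"resultStats\">about 2,930,000,000 results</div>"));
    }
}
```

Returning `OptionalLong` instead of falling through lets the caller tell "no count found" apart from a real value. Note that `getHeaderField` returns `null` when the header is missing, so a real caller would want to check for that before calling `firstCookie`.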
Running the full code, I get 2930000000L.