https://www.researchgate.net/topic/biotechnology
This topic currently has 206770 followers. When I click the "View All" button, a popup window appears with a list that keeps growing as I scroll down.
https://www.researchgate.net/profile/Kestutis_Sasnauskas
…
The above are links to the top followers. Is there a way to get the web links of all 206770 followers?
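Solution
This can be done with rvest and RSelenium. The latter is essentially required; the former just makes your life easier. Install RSelenium from GitHub with devtools::install_github("ropensci/RSelenium"). This is off the top of my head, but the following code does what you need.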
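siteUrl <- "http://www.researchgate.net/"
# Endpoint behind the "View All" popup; the offset is appended inside the loop
GateUrl <- "http://www.researchgate.net/publictopics.KeywordFollowersPeopleList.html?view=dialog&showFollowbutton=1&followEvent=tp_followers_xflw&keywordID=4f15497280e582373c000000&offset="

library(rvest)      # read_html(), html_nodes(), html_attr(), %>%
library(RSelenium)

# Older RSelenium releases started the server with checkForServer() and
# startServer(); both are defunct in current releases. rsDriver() starts
# a Selenium server and returns an already-open client.
rD <- rsDriver(browser = "firefox")
remDrv <- rD$client

i <- 0
profileUrls <- c()
for (j in 1:3) {
  print(j)
  remDrv$navigate(paste0(GateUrl, i))
  # Parse the rendered dialog and extract the relative profile links
  l <- read_html(remDrv$getPageSource()[[1]])
  profileUrls <- c(profileUrls,
                   paste0(siteUrl, l %>% html_nodes(".display-name") %>% html_attr("href")))
  i <- length(profileUrls) + 1   # offset for the next request
}
remDrv$close()
rD$server$stop()
profileUrls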
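You can check the extraction step inside the loop without launching a browser by running it on a small HTML string. The markup below is a hypothetical stand-in for one page of the followers dialog. It only assumes what the answer's selector implies: each follower is an element with class `display-name` whose `href` is a relative profile path.

```r
library(rvest)

siteUrl <- "http://www.researchgate.net/"

# Hypothetical markup standing in for one page of the followers dialog;
# only the class name and the relative hrefs mirror the answer's selector.
sample_page <- '<div>
  <a class="display-name" href="profile/Kestutis_Sasnauskas">Kestutis Sasnauskas</a>
  <a class="display-name" href="profile/Iris_Weintal">Iris Weintal</a>
  <a class="other-link" href="profile/ignored">not a follower</a>
</div>'

page  <- read_html(sample_page)                                 # parse literal HTML
hrefs <- html_attr(html_nodes(page, ".display-name"), "href")   # relative paths only
profileUrls <- paste0(siteUrl, hrefs)                           # absolute profile URLs
print(profileUrls)
```

The link with a different class is skipped, so only the two `display-name` entries become URLs.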
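A few notes. First, you need to work out the bound of the j loop. I think each URL returns 38 profiles, so the loop should be something like for (j in 1:(followers / 38)).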
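The loop bound can be computed directly. This sketch takes the answer's estimate of 38 profiles per request. It also assumes, hypothetically, that the offset parameter counts the profiles already returned. The sample output below (56 URLs from three requests) suggests checking both assumptions against real responses. Note that a plain `1:(followers / 38)` silently truncates 5441.3 to 5441, so `ceiling()` is needed to reach the last, partly filled page.

```r
followers <- 206770  # follower count from the question
per_page  <- 38      # the answer's estimate of profiles per request (unverified)

# ':' truncates 5441.3 to 5441, which would skip the final partial page;
# ceiling() includes it.
n_pages <- ceiling(followers / per_page)

# Hypothetical offsets, assuming the offset counts profiles already seen.
offsets <- (seq_len(n_pages) - 1) * per_page

cat("pages:", n_pages, "\n")
cat("last offset:", tail(offsets, 1), "\n")
```

With these values, the loop could iterate over `offsets` and call `remDrv$navigate(paste0(GateUrl, offset))` for each one.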
Second, the code stores the links inefficiently: it appends to the vector on every iteration. A better approach is to use lapply and unlist.
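Here is a minimal sketch of the lapply/unlist version. It assumes a hypothetical `fetch_page(offset)` function that wraps the body of the original loop: navigate to `paste0(GateUrl, offset)`, read the page source, and extract the `.display-name` links. The sketch collects one vector per request and flattens once at the end, which avoids re-copying the growing vector on every iteration. The `unique()` call goes beyond the answer. It is there because the sample output below repeats several URLs (entries 19 to 23 repeat entries 14 to 18).

```r
# Collect profile URLs page by page with lapply(), then flatten once.
# fetch_page is a hypothetical function: given an offset, it returns the
# character vector of profile URLs found on that page.
collect_profiles <- function(offsets, fetch_page) {
  pages <- lapply(offsets, fetch_page)        # list of character vectors
  urls  <- unlist(pages, use.names = FALSE)   # one flat character vector
  unique(urls)                                # drop URLs repeated across pages
}

# Offline example with a fake fetcher whose two pages overlap by one entry.
fake_fetch <- function(offset) {
  paste0("http://www.researchgate.net/profile/user", offset + 0:2)
}
profileUrls <- collect_profiles(c(0, 2), fake_fetch)
print(profileUrls)
```

In the real script, `fetch_page` would use the open `remDrv` session, and `remDrv$close()` would run after `collect_profiles()` returns.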
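Finally, you need Mozilla Firefox installed on your machine, because that is the browser RSelenium uses by default, although you can configure it to use whichever browser you prefer.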
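For example, to use Chrome instead of Firefox, change the `browserName` argument of `remoteDriver()`. This sketch assumes a Selenium server is already running on the default local port and that a matching browser driver is installed. Creating the object does not contact the server; the connection is made only when `open()` is called.

```r
library(RSelenium)

# remoteDriver() defaults to browserName = "firefox"; override it to use
# another installed browser. Nothing connects until $open() is called.
remDrv <- remoteDriver(browserName = "chrome")
print(remDrv$browserName)

# Then continue as in the answer's script:
# remDrv$open(silent = FALSE)
```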
Results (56 URLs):
> profileUrls
[1] "http://www.researchgate.net/profile/Jose_Carbajo2"
[2] "http://www.researchgate.net/profile/Daniele_Riccio"
[3] "http://www.researchgate.net/profile/Fiona_Togneri2"
[4] "http://www.researchgate.net/profile/Sukanya_Patel"
[5] "http://www.researchgate.net/profile/Neri_Fattorini"
[6] "http://www.researchgate.net/profile/Pham_Thi_Thuy_Van"
[7] "http://www.researchgate.net/profile/Kestutis_Sasnauskas"
[8] "http://www.researchgate.net/profile/Iris_Weintal"
[9] "http://www.researchgate.net/profile/Godelieve_Verhaegen"
[10] "http://www.researchgate.net/profile/Janani_Venkatraman2"
[11] "http://www.researchgate.net/profile/Kai_Wang126"
[12] "http://www.researchgate.net/profile/Irine_Ronin"
[13] "http://www.researchgate.net/profile/Natasha_Ikhsan"
[14] "http://www.researchgate.net/profile/Nadya_Hajar"
[15] "http://www.researchgate.net/profile/Gayatr_Venkataraman2"
[16] "http://www.researchgate.net/profile/Amsha_Viraragavan"
[17] "http://www.researchgate.net/profile/Wei_Leiyan"
[18] "http://www.researchgate.net/profile/Yosuke_Inada"
[19] "http://www.researchgate.net/profile/Nadya_Hajar"
[20] "http://www.researchgate.net/profile/Gayatr_Venkataraman2"
[21] "http://www.researchgate.net/profile/Amsha_Viraragavan"
[22] "http://www.researchgate.net/profile/Wei_Leiyan"
[23] "http://www.researchgate.net/profile/Yosuke_Inada"
[24] "http://www.researchgate.net/profile/Yongning_You"
[25] "http://www.researchgate.net/profile/Susan_Hu6"
[26] "http://www.researchgate.net/profile/Matt_Evans11"
[27] "http://www.researchgate.net/profile/Nam_Kieu"
[28] "http://www.researchgate.net/profile/Nur_Musa3"
[29] "http://www.researchgate.net/profile/Varaporn_S"
[30] "http://www.researchgate.net/profile/Askar_Begzat3"
[31] "http://www.researchgate.net/profile/Bing_Wang63"
[32] "http://www.researchgate.net/profile/Xuebin_Yan"
[33] "http://www.researchgate.net/profile/Roberto_Sibaja_Hernandez"
[34] "http://www.researchgate.net/profile/Stephen_Heimann"
[35] "http://www.researchgate.net/profile/Hanina_Hanifa"
[36] "http://www.researchgate.net/profile/Bo_Wang143"
[37] "http://www.researchgate.net/profile/Xuebin_Yan"
[38] "http://www.researchgate.net/profile/Roberto_Sibaja_Hernandez"
[39] "http://www.researchgate.net/profile/Stephen_Heimann"
[40] "http://www.researchgate.net/profile/Hanina_Hanifa"
[41] "http://www.researchgate.net/profile/Bo_Wang143"
[42] "http://www.researchgate.net/profile/Huili_li5"
[43] "http://www.researchgate.net/profile/Giuseppe_Infusini"
[44] "http://www.researchgate.net/profile/Carmen_Wacher"
[45] "http://www.researchgate.net/profile/linyn_linyn"
[46] "http://www.researchgate.net/profile/Dan_Youel"
[47] "http://www.researchgate.net/profile/Catherine_Williams16"
[48] "http://www.researchgate.net/profile/Nichole_Macaraeg"
[49] "http://www.researchgate.net/profile/Peter_Oroszlan"
[50] "http://www.researchgate.net/profile/Eduard_Karamov"
[51] "http://www.researchgate.net/profile/Mauricio_Franco3"
[52] "http://www.researchgate.net/profile/Patricia_Zancan"
[53] "http://www.researchgate.net/profile/Rohana_Dassanayake"
[54] "http://www.researchgate.net/profile/Khadija_Khataby"
[55] "http://www.researchgate.net/profile/Imane_Moest"
[56] "http://www.researchgate.net/profile/Rory_Adey"