What is a "spider program"?

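In the early days of the Internet there were relatively few websites, and finding information was fairly easy. As the Internet grew explosively, however, locating a specific piece of material became like looking for a needle in a haystack for ordinary users, and dedicated search sites emerged to meet the public's need for information retrieval. The ancestor of the modern search engine is Archie, created in 1990 by Alan Emtage, then a student at McGill University in Montreal. The World Wide Web did not yet exist, but file transfer over the network was already common, and because large numbers of files were scattered across separate FTP hosts, finding one was very inconvenient. Emtage therefore set out to build a system that could look up files by name, and the result was Archie.

Archie worked much like today's search engines: it relied on scripts to automatically gather listings of the files available on the network, indexed that information, and let users query the index with expressions. Archie proved very popular and, inspired by it, System Computing Services at the University of Nevada developed a very similar search tool in 1993; by then such tools could retrieve web pages in addition to indexing files.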

/**
 * Generates a sitemap (sitemap.xml).
 */
class sitemap {
    private $sitemapFile = '';   // path of the sitemap file
    private $oldxml = null;      // entries loaded from the existing file
    private $newxml = null;      // entries for the file being generated
    public $error = null;        // last error message, or null

    public function __construct($sitemapFile) {
        $this->sitemapFile = $sitemapFile;
        if (is_file($this->sitemapFile)) {
            $data = file_get_contents($this->sitemapFile);
            if ($data) {
                $this->oldxml = new SimpleXMLElement($data);
            } else {
                $this->error = 'Failed to read the sitemap file';
            }
        } else {
            $this->oldxml = $this->createEmptySitemap();
        }
        $this->newxml = $this->createEmptySitemap();
    }

    // Builds an empty <urlset> document.
    public function createEmptySitemap() {
        $str = '<?xml version="1.0" encoding="UTF-8"?>';
        $str .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"> </urlset>';
        return new SimpleXMLElement($str);
    }

    // Adds one <url> entry per URL; values already present in the old sitemap are reused.
    public function addChilds($urlArr) {
        $urlArr = (array) $urlArr;
        foreach ($urlArr as $url) {
            $priority = '0.5';
            $lastmod = date('Y-m-d');
            $changefreq = 'weekly';
            // Static .html pages get a higher priority and change less often.
            if (stripos($url, '.html') !== false) {
                $priority = '1';
                $changefreq = 'monthly';
            }
            $oldXmlUrl = $this->findOldXmlUrl($url);
            if ($oldXmlUrl !== null) {
                $priority = (string) $oldXmlUrl->priority;
                $lastmod = (string) $oldXmlUrl->lastmod;
                $changefreq = (string) $oldXmlUrl->changefreq;
            }
            $rating = $this->newxml->addChild('url');
            // addChild() does not escape "&", so encode it first.
            $rating->addChild('loc', str_replace('&', '&amp;', $url));
            $rating->addChild('priority', $priority);
            $rating->addChild('lastmod', $lastmod);
            $rating->addChild('changefreq', $changefreq);
        }
    }

    // Returns the <url> entry of the old sitemap whose <loc> equals $url, or null.
    public function findOldXmlUrl($url) {
        foreach ($this->oldxml->url as $xmlUrl) {
            if ((string) $xmlUrl->loc === $url) {
                return $xmlUrl;
            }
        }
        return null;
    }

    // Writes the new sitemap to disk; returns false and sets $error on failure.
    public function save() {
        $data = $this->newxml->asXML();
        if (file_put_contents($this->sitemapFile, $data) === false) {
            $this->error = 'Failed to write the sitemap data';
            return false;
        }
        return true;
    }
}

The class above is what I use on my personal blog to generate its sitemap.

The calling code looks like this:

$sitemapFile = 'Sitemap.xml';
$sitemap = new sitemap($sitemapFile);
if ($sitemap->error) {
    die($sitemap->error);
}

// URLs to add to the sitemap.
$newUrl = [
    'http://www.kiscms.com/content/28.html',
];
$sitemap->addChilds($newUrl);

if (!$sitemap->save()) {
    die($sitemap->error);
}

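The key question is: how do you get every URL on the site?

On my own blog, I solved it by writing a spider program that crawls the site and collects them.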
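The spider itself is not shown in the post, so what follows is only a minimal sketch of how such a crawler might look, not the author's implementation. The function names `extractLinks` and `crawlSite`, the `$fetch` callback, and the `$maxPages` limit are all assumptions made for illustration. It walks the site breadth-first from a start URL, follows only links on the same host, and uses a regular expression to pull out `href` values, which is a simplification that ignores relative paths such as `../page.html`.

```php
<?php
// Hypothetical helper: return the same-site links found in one HTML page.
// $origin is "scheme://host", e.g. "http://www.kiscms.com".
function extractLinks(string $html, string $origin): array
{
    $links = [];
    // Capture the href value of each <a> tag, dropping any #fragment.
    if (!preg_match_all('/<a\s[^>]*?href\s*=\s*["\']([^"\'#]*)/i', $html, $m)) {
        return $links;
    }
    foreach ($m[1] as $href) {
        $href = html_entity_decode(trim($href));
        if ($href === '') {
            continue;
        }
        if ($href === $origin || strpos($href, $origin . '/') === 0) {
            $links[] = $href;                 // absolute link on the same host
        } elseif ($href[0] === '/' && substr($href, 0, 2) !== '//') {
            $links[] = $origin . $href;       // root-relative link
        }
        // Anything else (other hosts, mailto:, relative paths) is skipped.
    }
    return array_values(array_unique($links));
}

// Hypothetical crawler: breadth-first walk starting at $startUrl.
// $fetch takes a URL and returns the page HTML, or false on failure.
function crawlSite(string $startUrl, callable $fetch, int $maxPages = 200): array
{
    $parts = parse_url($startUrl);
    $origin = $parts['scheme'] . '://' . $parts['host'];

    $queue = [$startUrl];           // URLs waiting to be fetched
    $seen = [$startUrl => true];    // URLs already queued, to avoid loops
    $found = [];                    // URLs that were fetched successfully

    while ($queue && count($found) < $maxPages) {
        $url = array_shift($queue);
        $html = $fetch($url);
        if ($html === false) {
            continue;               // unreachable page: leave it out
        }
        $found[] = $url;
        foreach (extractLinks($html, $origin) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
    }
    return $found;
}
```

With a real fetcher such as `function ($u) { return @file_get_contents($u); }`, the array returned by `crawlSite()` could be passed straight to `$sitemap->addChilds()`. Keeping the fetcher as a parameter also makes it possible to try the crawler against a handful of in-memory pages before pointing it at a live site.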

Feel free to share; when reposting, please credit the source: 内存溢出

Original URL: http://outofmemory.cn/yw/7789249.html
