1. wget
wget -t 10 --limit-rate 50k -Q 10M -c http://www.linuxeye.com -O linuxeye.html -o download.log
-t  specify the number of retries
--limit-rate  limit the download speed
-Q  set the maximum download quota
-c  resume an interrupted download
-O  specify the output file name
-o  specify a log file
wget -r -N -l 2 http://www.linuxeye.com
-r  recursive download
-N  enable time-stamping (only fetch files newer than the local copy)
-l  maximum depth of pages to traverse
Accessing HTTP or FTP pages that require authentication:
wget --user username --password pass URL
Instead of putting the password on the command line, you can have wget prompt for it interactively by replacing --password with --ask-password.
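For example, a minimal sketch (the username and URL path are placeholders):
wget --user username --ask-password http://www.linuxeye.com/protected.html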
2. curl
curl http://www.linuxeye.com --silent -o linuxeye.html
--silent  suppress progress information; remove --silent if you want to see it
-o  write the downloaded data to a file instead of standard output
--progress-bar  show progress as a bar of # characters
-C  resume an interrupted download from a given offset (-C - lets curl work out the offset)
--referer  set the referer string
--cookie  set cookies
--user-agent  set the user agent string
--limit-rate  limit the bandwidth
--max-filesize  specify the maximum file size to download
-u  authentication (curl -u user:pass http://www.linuxeye.com)
-I  print only the response headers
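A sketch combining several of these options (the cookie, referer, and user-agent values are purely illustrative):
curl --silent --referer http://www.linuxeye.com --cookie "user=linuxeye" --user-agent "Mozilla/5.0" --limit-rate 20k -C - -O http://www.linuxeye.com/index.html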
3. Accessing Gmail from the command line
curl -u username@gmail.com:password --silent "https://mail.google.com/mail/feed/atom" | tr -d '\n' | sed 's:</entry>:\n:g' | sed 's/.*<title>\(.*\)<\/title.*<author><name>\([^<]*\)<\/name><email>\([^<]*\).*/Author: \2 [\3] \nSubject: \1\n/'
Author: Facebook [update+kjdm15577-jd@facebookmail.com]
Subject: Facebook的有趣专页
Author: offers [offers@godaddy.com]
Subject: Reminder: Get 25% OFF your order – no minimum!
Author: Google+ team [noreply-475ba29f@plus.google.com]
Subject: Top 3 posts for you on Google+ this week
Author: Facebook [update+kjdm15577-jd@facebookmail.com]
Subject: Facebook的有趣专页
curl -u username@gmail.com:password --silent "https://mail.google.com/mail/feed/atom" | perl -ne 'print "\t" if /<name>/; print "$2\n" if /<(title|name)>(.*)<\/\1>/;'
4. A bash script to scrape and download images from a web page
#!/bin/bash
#FileName: img_downloader.sh
if [ $# -ne 3 ]; then
  echo "Usage: $0 URL -d DIRECTORY"
  exit -1
fi
# parse the arguments: -d DIRECTORY and the page URL, in any order
for i in {1..4}
do
  case $1 in
    -d) shift; directory=$1; shift ;;
     *) url=${url:-$1}; shift ;;
  esac
done
mkdir -p $directory;
baseurl=$(echo $url | egrep -o "https?://[a-z.]+")
echo $baseurl
# extract the <img> tags and pull out the src value between single or double quotes
curl -s $url | egrep -o "<img[^>]*>" | awk -F"\"|'" '{print $2}' > /tmp/$$.list
# turn root-relative paths into absolute URLs
sed -i "s|^/|$baseurl/|" /tmp/$$.list
cd $directory;
while read filename;
do
  echo $filename
  curl -s -O "$filename"
done < /tmp/$$.list
Note: the script in the original book extracts the images' absolute paths with sed, which only works when the paths are quoted with double quotes; many image paths use single quotes, so I switched to awk here. You could also keep the original script and adjust its sed expression instead.
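To run it, pass the page URL and a target directory (both values below are placeholders):
./img_downloader.sh http://www.linuxeye.com -d images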
5. A bash script that uses curl to find broken links on a website
#!/bin/bash
if [ $# -ne 1 ]; then
  echo -e "Usage: $0 URL\n"
  exit -1;
fi
echo Broken links:
mkdir /tmp/$$.lynx
cd /tmp/$$.lynx
# lynx -traversal crawls the site and records the links it encounters in reject.dat
lynx -traversal $1 > /dev/null
count=0;
sort -u reject.dat > links.txt
# a link counts as alive only if a HEAD request returns an HTTP OK status
while read link;
do
  output=`curl -I $link -s | grep "HTTP/.*OK"`
  if [[ -z $output ]]; then
    echo $link;
    let count++
  fi
done < links.txt
[ $count -eq 0 ] && echo No broken links found.
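To run it (assuming lynx is installed and the script is saved as, say, find_broken.sh; the URL is a placeholder):
bash find_broken.sh http://www.linuxeye.com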