1. PC web page crawlers
2. H5 web page crawlers
3. WeChat mini program crawlers
4. Mobile app crawlers
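The goal is to scrape the class schedule of Supermonkey (超级猩猩). The platform is only available as a WeChat mini program, so the first two approaches, which target HTML pages, no longer apply.
Packet capture and analysis is the first step in working out a plan.
I'm on a Mac, where Fiddler only exists as a stripped-down version, so I went with Charles, a similar tool. Start the Charles proxy, point the phone's Wi-Fi settings at that proxy, and you can start capturing. But the bodies of the captured HTTPS packets are all gibberish, so what now?
Charles provides an SSL certificate; install it on the phone and the traffic can be decrypted. An iPhone is recommended, since you only need to install the configuration profile. An Android phone has to run a system version below 7.0. From 7.0 on, apps no longer trust user-installed certificates by default, so you would also have to decompile the app and so on, which is too much hassle.
With that in place, it was easy to locate the backend interface that the Supermonkey mini program calls to load the schedule. Opening that URL in a browser returned the JSON result directly. Supermonkey is very friendly!
Extract the relevant URLs and verify them in the browser; they return JSON as well. All that remains is to analyze the structure of the JSON and export it in whatever form you need.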
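The export step could be sketched roughly as follows in Python. The payload shape and the field names (`data`, `store`, `classes`, `course`, `start`) are assumptions made up for illustration; the real structure is not shown in this post and has to be read from the response captured in Charles.

```python
import csv
import io
import json

# Hypothetical payload shape. The real field names must be taken
# from the JSON response captured in Charles.
SAMPLE = json.loads("""
{"data": [
  {"store": "Store A", "classes": [
    {"course": "Yoga", "start": "2020-02-20 19:00"},
    {"course": "Boxing", "start": "2020-02-20 20:15"}
  ]},
  {"store": "Store B", "classes": []}
]}
""")


def flatten_schedule(payload):
    """Turn the nested store -> classes structure into flat rows."""
    rows = []
    for store in payload.get("data", []):
        for cls in store.get("classes", []):
            rows.append({
                "store": store.get("store", ""),
                "course": cls.get("course", ""),
                "start": cls.get("start", ""),
            })
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["store", "course", "start"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(flatten_schedule(SAMPLE)))
```

In real use the payload would come from the captured URL (for example via `urllib.request.urlopen`) instead of the inline sample.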
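Scraping straight from the interface is extremely efficient: the schedules of every store in the country were pulled in a few seconds, which is very satisfying. (The screen recording was not sped up.)
The last challenge is scraping data from apps that only exist on Android/iOS. See the next chapter.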
https://nuozhilin.site/2020/02/20/2020-02-20-weixin-crawler-practice/
https://www.lizenghai.com/archives/31687.html
<pre>brew install mitmproxy
mitmdump
</pre>
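Besides running bare, `mitmdump` can load a Python addon with its `-s` option, which is handy for spotting JSON interfaces in the captured traffic. The sketch below is not part of the original setup; the file name `dump_json.py` and the helper `summarize` are hypothetical, and it relies only on the `response` hook and the `flow.request.pretty_url`, `flow.response.headers` and `flow.response.get_text()` members that mitmproxy passes to addons.

```python
import json


def summarize(flow):
    """Return (url, top-level keys) for JSON responses, else None."""
    content_type = flow.response.headers.get("content-type", "")
    if "json" not in content_type.lower():
        return None
    try:
        body = json.loads(flow.response.get_text())
    except (TypeError, ValueError):
        # Empty or malformed body despite the JSON content type.
        return None
    keys = list(body)[:5] if isinstance(body, dict) else []
    return flow.request.pretty_url, keys


def response(flow):
    """Hook that mitmproxy calls once for every completed response."""
    summary = summarize(flow)
    if summary is not None:
        print(*summary)
```

Saved as `dump_json.py`, it would be started with `mitmdump -s dump_json.py`.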
<pre>Double-click ~/.mitmproxy/mitmproxy-ca-cert.pem
Set the mitmproxy certificate to "Always Trust"
</pre>
<pre>System Preferences => Network => Advanced => Proxies
Web Proxy (HTTP) => 127.0.0.1:8080
Secure Web Proxy (HTTPS) => 127.0.0.1:8080
</pre>
<pre>Copy the macOS certificate ~/.mitmproxy/mitmproxy-ca-cert.pem to the phone
</pre>
<pre>MIUI11 => Settings => Encryption & credentials => Install from SD card
</pre>
<pre>Network => Proxy => macos_ip:8080
</pre>
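The `macos_ip` placeholder stands for the Mac's address on the local network. One way to look it up, sketched here in Python as an assumption and not something the original steps include, is to ask the operating system which local address it would use for an outbound connection. The helper name `local_ip` and the probe address are illustrative choices.

```python
import socket


def local_ip(probe=("8.8.8.8", 80)):
    """Best-effort LAN address of this machine.

    Connecting a UDP socket sends no packets; it only makes the OS
    pick the local interface it would route through.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect(probe)
        return sock.getsockname()[0]
    except OSError:
        # No usable route: fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()


if __name__ == "__main__":
    # Value to type into the phone's proxy settings.
    print(f"{local_ip()}:8080")
```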
<pre>docker run --name mysql-weixin -p 3306:3306 -e MYSQL_ROOT_PASSWORD=123456 -d mysql:5.7.17
docker exec -i mysql-weixin mysql -uroot -p123456 <<<"CREATE DATABASE IF NOT EXISTS wechat DEFAULT CHARSET utf8mb4 COLLATE utf8mb4_general_ci"
</pre>
<pre>docker run --name redis-weixin -p 6379:6379 -d redis
</pre>
<pre>wget https://zbkj-service.oss-cn-beijing.aliyuncs.com/wechat/wechat_spider.zip
unzip wechat_spider.zip && rm -rf __MACOSX
cd wechat_spider
chmod +x wechat-spider-mac
./wechat-spider-mac
</pre>
<pre>docker exec -i mysql-weixin mysql -uroot -p123456 <<<"USE wechat; INSERT INTO wechat_account_task (__biz) VALUES('MzIyNzk1MTU2OQ==');"
</pre>
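The `__biz` value identifies the official account to crawl. It usually appears as a query parameter in the account's article links, so it can be pulled out with a few lines of Python. This is a sketch under that assumption; the example URL and its `mid`/`idx` values are made up, and only the `__biz` value comes from the command above.

```python
import base64
from urllib.parse import parse_qs, urlparse


def extract_biz(article_url):
    """Return the __biz query parameter of an article URL, or None."""
    query = parse_qs(urlparse(article_url).query)
    values = query.get("__biz")
    return values[0] if values else None


def decode_biz(biz):
    """__biz is Base64 text; decoding it yields the underlying id."""
    return base64.b64decode(biz).decode("ascii")


if __name__ == "__main__":
    # Hypothetical article link carrying the __biz used in this post.
    url = "https://mp.weixin.qq.com/s?__biz=MzIyNzk1MTU2OQ==&mid=1&idx=1"
    biz = extract_biz(url)
    print(biz, decode_biz(biz))
```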
<pre>MIUI11 => WeChat => Contacts => Official Accounts =>
"机械指挥官" => News (新闻资讯) => "机械指挥官" (message history)
</pre>
At this point the crawler starts scraping. The log in ./logs/wechat_spider.log looks like this:
<pre>MainThread|2020-02-20 14:48:17,877|deal_data.py|deal_article_list|line:290|INFO| 抓取到列表底部 无更多文章,公众号 MzIyNzk1MTU2OQ== 抓取完毕
MainThread|2020-02-20 15:00:40,828|deal_data.py|__parse_article_list|line:153|INFO| 采集到上次发布时间 公众号 MzIyNzk1MTU2OQ== 采集完成
</pre>
The first entry says the bottom of the article list was reached with no more articles, so scraping of account MzIyNzk1MTU2OQ== is finished. The second says collection caught up with the last publish time, so collection for that account is complete.
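Each log line is a set of `|`-separated fields, so it is easy to post-process. The following Python sketch infers the field layout from the two lines above (thread, timestamp, file, function, line number, level, message); the `LogEntry` and `parse_line` names are illustrative and not part of wechat-spider.

```python
from datetime import datetime
from typing import NamedTuple


class LogEntry(NamedTuple):
    thread: str
    time: datetime
    file: str
    function: str
    line: int
    level: str
    message: str


def parse_line(raw):
    """Parse one pipe-delimited log line; return None if it does not fit."""
    # Split at most 6 times so a '|' inside the message is preserved.
    parts = raw.rstrip("\n").split("|", 6)
    if len(parts) != 7:
        return None
    thread, stamp, file, function, line, level, message = parts
    return LogEntry(
        thread=thread,
        time=datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f"),
        file=file,
        function=function,
        line=int(line.split(":", 1)[1]),  # "line:290" -> 290
        level=level,
        message=message.strip(),
    )
```

Filtering on `entry.function == "deal_article_list"` would then, for example, pick out the entries that report an account's list being exhausted.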