Whalebot is an open-source web crawler. It is intended to be simple, fast, and memory-efficient. It was created as a targeted spider, but you can also use it as a general-purpose crawler.
Current release: 0.02
Current state. Bold - done, normal - TODO
If something is broken or you have an idea, please visit http://groups.google.com/group/whalebot
Usages
- It was used to collect papers on a target topic from http://citeseerx.ist.psu.edu for my master's degree work
- Candidates for the logo were collected using whalebot
- Eating its own dog food (links for the URL parsing benchmark)
- Simple configuration from command line and text files
- Start/Stop/Resume fetching sessions