hightman / pspider
A parallel web crawler written in pure PHP.
I want to crawl Twitter pages, but when I pass in a Twitter URL I get this error:
Unable to find the socket transport "ssl" - did you forget to enable it when you configured PHP?
Without this, the crawler is no use for fetching content from blocked sites.
The cURL extension does support it.
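A possible fix, assuming the cause is simply that PHP's OpenSSL extension is not loaded: the `ssl` socket transport is only registered when that extension is available. A minimal php.ini sketch (the `extension=openssl` form assumes PHP 7.2 or later with OpenSSL built as a shared extension; older Windows builds use `php_openssl.dll`):

```ini
; Load PHP's OpenSSL extension, which registers the "ssl" and "tls" socket transports.
; Assumes PHP 7.2+ and that OpenSSL was compiled as a shared extension.
extension=openssl
```

After restarting PHP, `php -r 'print_r(stream_get_transports());'` should list `ssl`. If PHP was built without OpenSSL at all, it has to be rebuilt with OpenSSL support instead.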
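When I set `const MAX_BURST` to a value greater than 1 and crawl several different URLs, the results are inaccurate and data from different requests gets mixed up.
For example, with MAX_BURST set to 10 and 100 sites to crawl, it fetches 10 at a time, but the returned results are wrong.
When I crawl those problematic URLs on their own, the results are correct.
I assumed the timeout was too short, but setting a longer timeout had no effect. Any help would be greatly appreciated!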
PROCESSING: http://www.hao123.com/
PROCESSING: http://www.hao123.com/u;h.target='_top';document.body.appendChild(h);var
01-23 04:03:31 - Time cost: 0 secs, URLs total: 1, Add: 1, Update: 2, Filtered: 0
OK, finished!
In my own tests, curl multi gives better parallel, non-blocking behavior than sockets. I'm curious why hightman chose sockets to implement HttpClient.