Comments (6)
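The only real risk is that your account gets banned; apart from that there is no problem. If you have many accounts, you can crawl more frequently. Also, Weibo enforces different crawl-interval limits on different modules, and the crawler does not yet have fine-grained, per-module control over this.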
from weibospider.
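What I most want to use is user home page crawling and the search module. With a single account, what is the minimum interval you would recommend?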
from weibospider.
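Search is limited very strictly. With a single account, try 20 to 30 seconds. For user home pages, roughly ten seconds. I can't say these numbers are absolute; I can only give a rough range.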
from weibospider.
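So the hours value originally set in the source file is already quite lenient? In other words, I set hours, minutes, and seconds in tasks/workers.py to the values for the interval I need, is that right?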
from weibospider.
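Please read the documentation. I get the feeling you haven't, and I would hope you read it before asking.
There are two different settings: the crawl interval and the scheduled-task interval.
The crawl interval is set in spider.yaml. It is the delay between two consecutive HTTP requests:
min_crawl_interal: 10 # min interval of http request
max_crawl_interal: 20 # max interval of http request
The scheduled-task interval is set in workers.py. For example, if you use Weibo search, then after one round of searching finishes I wait another N hours before searching again, rather than starting the next round immediately.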
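To make the difference between the two settings concrete, here is a minimal sketch. It is not the project's actual code: it assumes the crawler picks a random pause between the two spider.yaml bounds, and the function names and the 2-hour round interval are hypothetical values chosen only for illustration.

```python
import random
from datetime import datetime, timedelta

# Crawl interval: mirrors the spider.yaml keys quoted above (spelling kept as-is).
MIN_CRAWL_INTERAL = 10
MAX_CRAWL_INTERAL = 20


def next_crawl_delay(low=MIN_CRAWL_INTERAL, high=MAX_CRAWL_INTERAL):
    """Pause before the next HTTP request.

    Assumption: a random value between the two bounds is used.
    """
    return random.uniform(low, high)


# Scheduled-task interval: hypothetical value standing in for the
# hours/minutes/seconds configured in workers.py.
ROUND_INTERVAL = timedelta(hours=2)


def next_round_start(finished_at, interval=ROUND_INTERVAL):
    """Earliest start of the next search round after the current one finishes."""
    return finished_at + interval


if __name__ == "__main__":
    print("pause before next request:", round(next_crawl_delay(), 1))
    print("next round starts at:", next_round_start(datetime.now()))
```

The first function governs the gap between individual requests inside a round; the second governs how long the whole task waits before it runs again. Lowering one does not change the other.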
from weibospider.
OK, thanks for your trouble.
from weibospider.
Related Issues (20)
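- How can I restrict the time range and crawl weibo posts from one date to another?
- The user_relation table in the MySQL database is always empty. What is wrong?
- In Weibo keyword search, the create_time column has four different time formats. Can they be unified into a single 20**年**月**日 (year, month, day) format?
- A bug in script.string in user.py causes the MySQL table user_relation to stay empty
- Crawling user_relation: user.py has a bug
- Cannot start the worker (stuck at INFO/MainProcess] mingle: all alone)
- When starting the worker, execution reaches **[2020-04-02 12:36:58,850: INFO/MainProcess] mingle: all alone** and does not continue
- After running login_first.py, it shows ValueError: not enough values to unpack (expected 3, got 0)
- Running the worker raises an error even though the passwords in my Redis and crawler configuration are correct: [2020-04-25 19:53:50,902: ERROR/MainProcess] consumer: Cannot connect to redis://:**@localhost:6379/6: Client sent AUTH, but no password is set.
- The Yundama (云打码) captcha platform seems to be defunct. I followed the temp_verification steps from the earlier issue about the Chaojiying (超级鹰) platform but ran into a strange bug. Could you update the project for the new captcha platform? Thanks
- threading.Thread.isAlive has been deprecated and removed in Python 3.9 in favour of is_alive
- When logging in, I am asked to scan a QR code. Has Weibo changed its login?
- No data is crawled; the worker output shows only crawl failures and warning messages such as:
- An unlucky user's configuration: trial-and-error notes
- Reasonable thresholds for the Weibo crawler
- When crawling by keyword, is this crawler limited to a maximum of 50 pages?
- About updates/maintenance
- Please update the requests and Django version numbers in requirements.txt, thanks
- Running python config/create_all.py raises an error
- Running python3 config/create_all.py raises an error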