Comments (4)
The uid is set in weibo.yaml.
Please post your configuration so we can take a look. Remember to hide the account name and password.
from cola.
OK.
job:
  db: sina
  mode: bundle # also can be bundle
  size: 1000 # the destination (including bundle or url) size
  speed:
    max: -1 # to the cluster, -1 means no restrictions, if greater than 0, means webpages opened per minute
    single: -1 # max restrictions to a single instance
    adaptive: no
  instances: 2
  priorities: 13 # priorities queue count in mq
  copies: 1 # redundant size of objects in mq
  inc: yes
  shuffle: no # only work in bundle mode, means the urls in a bundle will shuffle before fetching
  error:
    network:
      retries: -1 # 0 means no retry, -1 means keeping on trying
      span: 20 # seconds span to retry
      ignore: no # only work under bundle mode, if True will ignore this url and move to the next after several tries, or move to the next bundle
    server: # like 404 or 500 error returned by server
      retries: 5
      span: 10
      ignore: no
  components:
    deduper:
      cls: cola.core.dedup.FileBloomFilterDeduper
  mongo:
    host: localhost
    port: 27017
  login:
    - username: ***
      password: ***
  starts:
    - uid: 3045198074
    - uid: 3969344084
  fetch:
    forward: no
    comment: no
    like: no
  clear: no
I found that it always gets the same uid, which is not one of the uids I configured.
For example:
get 1701018393 url: http://weibo.com/aj/mblog/mbloglist?count=15&pre_page=28&uid=1701018393&end_id=3907520622156990&_t=0&_k=1447243354911000&__rnd=1455415453462&pagebar=1&max_id=3618682338871099&page=28
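OK, you have probably run the job before. After you reset the uid, it resumes from where the previous run left off.
Try setting clear to yes and see whether that works. (With this option, every run starts from scratch. When you later want to resume a crawl, you need to change it back.)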
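A minimal sketch of the suggested change, assuming clear sits directly under job in weibo.yaml as in the configuration posted above. Only the last line differs from that configuration; the elided keys are left as they were.

```yaml
job:
  # ... db, mode, speed, error, mongo, login and the other settings stay unchanged ...
  starts:
    - uid: 3045198074
    - uid: 3969344084
  # ... fetch settings stay unchanged ...
  clear: yes # start from scratch on the next run; set back to no when you want to resume
```

Once the job has started over with the new uids, switching clear back to no should let later runs continue from the saved state instead of restarting.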
OK, that solved it. Thank you very much.