Comments (16)
Check for yourself whether the task was actually started, and review the workflow.
from weibospider.
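How do I start it?
> Check for yourself whether the task was actually started, and review the workflow.
Which specific command starts it?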
from weibospider.
The `fans_followers` queue.
from weibospider.
If the fans list and the following list have never been crawled, this table will simply be empty.
from weibospider.
> If the fans list and the following list have never been crawled, this table will simply be empty.
I'm getting an error.
from weibospider.
> The `fans_followers` queue.
It raises an error:
[2020-03-21 16:54:57,151: INFO/MainProcess] Received task: tasks.user.crawl_follower_fans[2768cfb2-337d-4cbf-8aea-635c971d45de]
[2020-03-21 16:54:57,153: ERROR/ForkPoolWorker-1] Task tasks.user.crawl_follower_fans[6c223b6f-8f48-46b3-b59d-c308763835fb] raised unexpected: AttributeError("'NoneType' object has no attribute 'other_crawled'",)
Traceback (most recent call last):
  File "/home/wentao/programming/weibospider/WeiboSpider/lib/python3.6/site-packages/celery/app/trace.py", line 382, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/wentao/programming/weibospider/WeiboSpider/lib/python3.6/site-packages/celery/app/trace.py", line 641, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/wentao/programming/weibospider/tasks/user.py", line 12, in crawl_follower_fans
    if seed.other_crawled == 0:
AttributeError: 'NoneType' object has no attribute 'other_crawled'
from weibospider.
Check the `seed_ids` table.
from weibospider.
> Check the `seed_ids` table.
The `seed_ids` table does have data in it.
from weibospider.
Check the arguments passed to the `crawl_follower_fans` function, and whether `seed_ids` contains a record for this uid.
from weibospider.
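> Check the arguments passed to the `crawl_follower_fans` function, and whether `seed_ids` contains a record for this uid.
But the error says that seed has no `other_crawled` attribute.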
from weibospider.
Then most likely `seed_ids` has no record for this uid.
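The `AttributeError` in the traceback is what Python raises when the seed lookup returns `None` and the task then reads `seed.other_crawled`. A minimal sketch of a guard, assuming the lookup returns `None` for an unknown uid. `Seed`, `FAKE_SEED_TABLE`, `get_seed_by_id`, and `should_crawl_relations` below are hypothetical stand-ins written for illustration, not the project's actual code:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Seed:
    """Hypothetical stand-in for one row of the seed_ids table."""
    uid: str
    other_crawled: int = 0


# Hypothetical in-memory replacement for the seed_ids table.
FAKE_SEED_TABLE: Dict[str, Seed] = {
    "1001": Seed("1001", other_crawled=0),
    "1002": Seed("1002", other_crawled=1),
}


def get_seed_by_id(uid: str) -> Optional[Seed]:
    """Return the seed row for uid, or None when no such row exists."""
    return FAKE_SEED_TABLE.get(uid)


def should_crawl_relations(uid: str) -> bool:
    """Decide whether fans/followers still need crawling for uid."""
    seed = get_seed_by_id(uid)
    if seed is None:
        # No record for this uid: skip instead of raising AttributeError.
        return False
    return seed.other_crawled == 0
```

With a guard like this, a uid missing from the table is skipped (or could be logged) rather than crashing the worker task.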
from weibospider.
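> Then most likely `seed_ids` has no record for this uid.
Does `crawl_follower_fans` read its data from `seed_ids`?
`seed_ids` now holds a large amount of data; how do I get `crawl_follower_fans` to read it?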
from weibospider.
You could try rewriting line 11, `SeedidsOper.get_seed_by_id(uid)`,
to fetch the data from the database with a function such as `SeedidsOper.get_seed_ids`.
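One way to read this suggestion is to iterate over every seed row instead of looking up a single uid. The sketch below is only that reading. `SeedidsOper.get_seed_ids` is named in this thread, but its return shape is an assumption here (an iterable of rows exposing `uid` and `other_crawled`); `StubSeedidsOper`, `SeedRow`, and `crawl_one` are hypothetical names used so the example runs on its own:

```python
from collections import namedtuple
from typing import Callable, Iterable, List

# Assumed row shape; the real table may expose more columns.
SeedRow = namedtuple("SeedRow", ["uid", "other_crawled"])


class StubSeedidsOper:
    """Hypothetical stand-in for the project's SeedidsOper."""

    _rows = [SeedRow("1001", 0), SeedRow("1002", 1), SeedRow("1003", 0)]

    @classmethod
    def get_seed_ids(cls) -> Iterable[SeedRow]:
        # The real method would query the database instead.
        return list(cls._rows)


def crawl_pending_relations(oper, crawl_one: Callable[[str], None]) -> List[str]:
    """Call crawl_one(uid) for every seed whose relations are not crawled yet."""
    crawled = []
    for seed in oper.get_seed_ids():
        if seed.other_crawled == 0:
            crawl_one(seed.uid)
            crawled.append(seed.uid)
    return crawled
```

Passing the data-access class and the per-uid crawl function as arguments keeps the loop testable without a database or network access.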
from weibospider.
> You could try rewriting line 11, `SeedidsOper.get_seed_by_id(uid)`,
> to fetch the data from the database with a function such as `SeedidsOper.get_seed_ids`.
I'll give it a try first.
from weibospider.
> You could try rewriting line 11, `SeedidsOper.get_seed_by_id(uid)`,
> to fetch the data from the database with a function such as `SeedidsOper.get_seed_ids`.
It raises an error. Is this a bug?
[2020-03-25 19:07:12,388: ERROR/ForkPoolWorker-1] Task tasks.user.crawl_follower_fans[23f3c1fd-fc6e-4c5b-b0cc-5d5c6a9ad068] raised unexpected: TypeError('expected string or bytes-like object',)
Traceback (most recent call last):
  File "/home/wentao/programming/weibospider/WeiboSpider/lib/python3.6/site-packages/celery/app/trace.py", line 382, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/wentao/programming/weibospider/WeiboSpider/lib/python3.6/site-packages/celery/app/trace.py", line 641, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/wentao/programming/weibospider/tasks/user.py", line 19, in crawl_follower_fans
    rs = get_fans_or_followers_ids(uid, 1, 1)
  File "/home/wentao/programming/weibospider/page_get/user.py", line 159, in get_fans_or_followers_ids
    urls_length = public.get_max_crawl_pages(page)
  File "/home/wentao/programming/weibospider/page_parse/user/public.py", line 223, in get_max_crawl_pages
    m = re.search(pattern, script.string)
  File "/usr/lib/python3.6/re.py", line 182, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object
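There is a bug involving `script.string` in user.py.
The type of `script.string` is unpredictable here: sometimes it is `NoneType`, sometimes `<class 'bs4.element.NavigableString'>`.
Whichever type it is, the `re` module cannot search it.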
for script in scripts:
    #print('i am in '+dir_path,'script is '+script)
    #print('script.string:',script.string)
    print('type pattern',pattern)
    print('pattern', pattern)
    print('type:',type(script.string))
    m = re.search(pattern, script.string)
    if m and 'pl.content.followTab.index' in script.string:
        all_info = m.group(1)
        cont = json.loads(all_info).get('html', '')
        soup = BeautifulSoup(cont, 'html.parser')
        pattern = 'uid=(.*?)&'
        if 'pageList' in cont:
            urls2 = soup.find(attrs={'node-type': 'pageList'}).find_all(attrs={
                'class': 'page S_txt1', 'bpfilter': 'page'})
            length += len(urls2)
return length
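For context: in BeautifulSoup, `tag.string` is `None` when a tag does not have exactly one string child (an empty `<script src=...>` tag, for example), and `re.search` raises `TypeError` when handed `None`. `NavigableString` is a subclass of `str`, so `re` should accept it. A minimal sketch of a guard for the `None` case follows. `search_script_text` is a hypothetical helper, and it takes the already-extracted strings as input so that it runs without bs4:

```python
import re
from typing import Iterable, Optional


def search_script_text(pattern: str,
                       texts: Iterable[Optional[str]],
                       marker: str) -> Optional[str]:
    """Return group(1) of the first text that matches pattern and contains marker.

    texts plays the role of [script.string for script in scripts];
    entries may be None, which are skipped instead of passed to re.search.
    """
    for text in texts:
        if text is None:
            continue
        match = re.search(pattern, text)
        if match and marker in text:
            return match.group(1)
    return None
```

The same `if script.string is None: continue` check could be placed at the top of the loop body shown above; whether that is the right fix for the project is for the maintainers to judge.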
from weibospider.
This is unrelated to the current issue; please open a new issue for it.
from weibospider.