Comments (20)

ResolveWang avatar ResolveWang commented on May 21, 2024

Have you inserted the seed user IDs? Please see item 11 of the usage FAQ.

from weibospider.

hyjpaul avatar hyjpaul commented on May 21, 2024

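Oh, it works now, thanks. I had only just created the database tables. I also want to crawl other content: which other tables do I need to populate, and is filling in just the id enough for each of them?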

ResolveWang avatar ResolveWang commented on May 21, 2024

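Here is the design of all the tables. At present, for crawling, only two tables need rows inserted in advance: seed_ids and keywords. If you want to specify particular things to crawl yourself, however, you need to read the table design link I just posted.

Also, to avoid common problems, you may want to skim through all of the related documentation.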
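
As a rough illustration of pre-populating those two tables, here is a minimal sketch. It uses an in-memory SQLite database as a stand-in for the project's real database, and the column names (`uid`, `keyword`) are assumptions made for the example; check the table design document for the actual schema before adapting it.

```python
import sqlite3


def seed_tables(conn, seed_uids, keywords):
    """Insert seed user ids and search keywords, skipping duplicates.

    The schema here is hypothetical: one text column per table.
    Returns the resulting row counts as (seed_ids, keywords).
    """
    conn.execute("CREATE TABLE IF NOT EXISTS seed_ids (uid TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE IF NOT EXISTS keywords (keyword TEXT PRIMARY KEY)")
    # INSERT OR IGNORE makes the call safe to repeat with the same values.
    conn.executemany(
        "INSERT OR IGNORE INTO seed_ids (uid) VALUES (?)",
        [(uid,) for uid in seed_uids],
    )
    conn.executemany(
        "INSERT OR IGNORE INTO keywords (keyword) VALUES (?)",
        [(kw,) for kw in keywords],
    )
    conn.commit()
    n_seeds = conn.execute("SELECT COUNT(*) FROM seed_ids").fetchone()[0]
    n_keywords = conn.execute("SELECT COUNT(*) FROM keywords").fetchone()[0]
    return n_seeds, n_keywords


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Placeholder values; replace with real user ids and keywords.
    print(seed_tables(conn, ["1000000001", "1000000002"], ["example"]))
```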

hyjpaul avatar hyjpaul commented on May 21, 2024

OK, thanks.

hyjpaul avatar hyjpaul commented on May 21, 2024

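One more question: I was crawling when the network suddenly dropped, and since then the program keeps reporting the error "no cookies in cookies pool, please find out the reason". I shut down the worker, but after restarting it the same error keeps appearing.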

ResolveWang avatar ResolveWang commented on May 21, 2024

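That is because your account has been banned; the program reports this error when it detects a banned account. Before restarting the worker, you need to clear Redis db5. Alternatively, start the worker with only -Q login queue so that it logs in first, and then specify the other task queues. Otherwise the program cannot get a cookie from Redis.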
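
The "log in first, then start the other queues" order can be sketched by building the two worker command lines. This is only an illustration: the app module `tasks.workers` and the queue names `login_queue` and `user_crawler` are assumptions, so substitute the names from the project documentation. `-A`, `-Q`, `-l` and `-c` are standard Celery worker options.

```python
import shlex


def worker_command(queues, app="tasks.workers", concurrency=1):
    """Build the argv for a Celery worker that consumes only the given queues.

    `app` and the queue names are placeholders for this sketch.
    """
    if not queues:
        raise ValueError("at least one queue name is required")
    return [
        "celery", "-A", app, "worker",
        "-l", "info",
        "-c", str(concurrency),
        "-Q", ",".join(queues),
    ]


if __name__ == "__main__":
    # Step 1: a worker that only handles login, so cookies are stored first.
    print(shlex.join(worker_command(["login_queue"])))
    # Step 2: once login has succeeded, a worker for the crawl queues.
    print(shlex.join(worker_command(["user_crawler"])))
```

The printed commands can be run in a shell, or the lists can be passed to `subprocess.run`.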

hyjpaul avatar hyjpaul commented on May 21, 2024

Solved, thanks.

hyjpaul avatar hyjpaul commented on May 21, 2024

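I am using purchased Sina accounts, and many fields are missing from the crawled data. Is there any way to get more complete data? The crawled data set shared on Baidu Pan that I downloaded is very complete.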

ResolveWang avatar ResolveWang commented on May 21, 2024

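I am not sure what you mean by incomplete; the account should not be the cause. Have you confirmed that those fields actually exist on the pages you crawled and were missed, or are you just seeing a lot of empty columns? This project never focused on the detailed information of enterprise accounts; if you need it, you can add some handling yourself. For personal accounts, essentially everything visible is crawled, and only a very small amount is missed. Another possible reason for the incompleteness is that the user has made only a little information public.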

ResolveWang avatar ResolveWang commented on May 21, 2024

It is also possible that some parts of the project are not handled well enough. If you improve them, you are welcome to submit a PR.

hyjpaul avatar hyjpaul commented on May 21, 2024

image
I ran python3 comment_first.py to crawl comments, but there is no comment data in the database.

ResolveWang avatar ResolveWang commented on May 21, 2024

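...Please read the documentation and the related issues carefully. You need to confirm whether your task has actually been executed. If you want to verify features one at a time, please use a smaller data set, or set -Q to the queue of the feature you want to verify! I think your main problem is that there are too many tasks: the worker never gets a chance to run the comment crawl because it is still busy with other tasks.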

hyjpaul avatar hyjpaul commented on May 21, 2024

image
Hello, how do I clear all the tasks? celery purge has no effect.

ResolveWang avatar ResolveWang commented on May 21, 2024

They are in Redis db5. You can connect to Redis db5 with redis cli, or directly from a Python shell, and then call flushdb() to clear it.
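
A minimal sketch of the Python route, assuming the third-party `redis` (redis-py) package, a server on `localhost:6379`, and no password; adjust these to match your own configuration. The helper names are made up for this example. Note that `flushdb()` deletes every key in the selected database, not only the pending tasks.

```python
def connect_db5(host="localhost", port=6379):
    """Return a redis-py client bound to database 5 (connection details assumed)."""
    import redis  # third-party package: pip install redis

    return redis.Redis(host=host, port=port, db=5)


def clear_task_db(client):
    """Flush the client's selected database.

    Returns the key counts before and after, so the result can be checked.
    """
    before = client.dbsize()
    client.flushdb()
    return before, client.dbsize()


if __name__ == "__main__":
    before, after = clear_task_db(connect_db5())
    print(f"keys before: {before}, keys after: {after}")
```

From a shell, `redis-cli -n 5 FLUSHDB` does the same thing.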

hyjpaul avatar hyjpaul commented on May 21, 2024

OK, thanks for the answer.

hyjpaul avatar hyjpaul commented on May 21, 2024

Running python3 user_first.py reports: parse error, the specific cause is 'NoneType' object has no attribute 'get_text'? Then it just stops.

ResolveWang avatar ResolveWang commented on May 21, 2024

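When asking a question, please follow the issue template; as written, nobody can guess what your problem is. Also, judging from your description, the error has already been caught, so normally the worker process will not crash. If by "stops" you mean that the worker is still alive, then there is no problem at all. If the worker process died, you can post the user id you were crawling and parsing and I will try it. Still, I hope you will carefully confirm whether this is really a problem with the program or a misunderstanding on your part. We developers have very limited time and cannot keep answering questions caused by users' own mistakes. Thank you for understanding.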
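
To illustrate what "the error has already been caught" means in practice, here is a minimal sketch of a guard around a parsing function. It is not the project's actual code: `parse_guard` and `get_nickname` are hypothetical names, and `node` stands for whatever HTML element the parser looked up, which is `None` when the element is missing from the page.

```python
import functools
import logging


def parse_guard(default):
    """Decorator: log any exception raised while parsing and return `default`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                # The failure is logged instead of propagating, so the
                # calling worker keeps running and moves on to the next task.
                logging.error("parse failed in %s: %s", func.__name__, exc)
                return default
        return wrapper
    return decorator


@parse_guard(default="")
def get_nickname(node):
    # Raises AttributeError when the lookup found nothing and node is None.
    return node.get_text()


if __name__ == "__main__":
    print(repr(get_nickname(None)))
```

With a guard like this, a missing element produces a log line and an empty field rather than a dead worker, which matches the behavior described above.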

hyjpaul avatar hyjpaul commented on May 21, 2024

Yes, I did not describe it clearly. Understood, thanks.

hyjpaul avatar hyjpaul commented on May 21, 2024

Is there a way to get the location coordinates (latitude and longitude) attached to a weibo when it was posted?

ResolveWang avatar ResolveWang commented on May 21, 2024

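Whatever you can see is what you can get; what you cannot see cannot be crawled. If you can see it, you can write the parsing code for it yourself. As far as I know, someone seems to have added this small feature to the parsing module.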
