
spider_world's People

Contributors

hacksman


spider_world's Issues

Using other people's Douyin account IDs keeps raising video_infos 'NoneType' object is not iterable

None
Traceback (most recent call last):
File "video_download_run.py", line 38, in
douyin_crawl.grab_user_media(sys.argv[-1], "USER_LIKE")
File "../www_douyin_com/spiders/douyin_crawl.py", line 110, in grab_user_media
hasmore, max_cursor = self.grab_video(user_id, action, content)
File "../www_douyin_com/spiders/douyin_crawl.py", line 141, in grab_video
for per_video in video_infos:
TypeError: 'NoneType' object is not iterable
Apart from the Douyin account ID provided in the demo, no other account works. Could the author share a group or some contact info?
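
For reference, a defensive sketch of the failing loop in grab_video (the field name "aweme_list" and the logger call are assumptions, not taken from the project) that reports an empty or rejected response instead of crashing:

video_infos = (content or {}).get("aweme_list")
if not video_infos:
    # The signed request was probably rejected, or the account has no public posts.
    self.logger.info("no videos in response: %s", content)
    return 0, 0  # hasmore, max_cursor
for per_video in video_infos:
    ...  # existing per-video handling continues here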

Help: JSONDecodeError

Traceback (most recent call last):
File "video_download_run.py", line 32, in <module>
douyin_crawl.grab_user_media(sys.argv[-1], "USER_POST")
File "../www_douyin_com/spiders/douyin_crawl.py", line 110, in grab_user_media
hasmore, max_cursor = self.grab_video(user_id, action, content)
File "../www_douyin_com/spiders/douyin_crawl.py", line 124, in grab_video
real_url = gen_url(self.token, url, query_params)
File "../www_douyin_com/common/utils.py", line 57, in gen_url
resp = requests.post(URL.api_sign(token), json={"url": url}).json()
File "/home/vts/anaconda3/lib/python3.6/site-packages/requests/models.py", line 892, in json
return complexjson.loads(self.text, **kwargs)
File "/home/vts/anaconda3/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/home/vts/anaconda3/lib/python3.6/json/decoder.py", line 342, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 5 (char 4)
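
A hedged way to see what the signing service actually returned (illustrative names, not the project's gen_url) is to decode the response defensively instead of calling .json() on it directly:

import requests

def signed_params(sign_api, url):
    # Post the target url to the sign endpoint and fail loudly when the body
    # is not valid JSON (for example an HTML or plain-text error page).
    resp = requests.post(sign_api, json={"url": url}, timeout=10)
    try:
        return resp.json()
    except ValueError as exc:
        raise RuntimeError(
            "sign service returned non-JSON (%s): %r" % (resp.status_code, resp.text[:200])
        ) from exc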

KeyError: 'data'

Traceback (most recent call last):
File "video_download_run.py", line 13, in
douyin_crawl = DouyinCrawl()
File "..\www_douyin_com\spiders\douyin_crawl.py", line 71, in init
self.common_params = common_params(self)
File "..\www_douyin_com\common\utils.py", line 75, in common_params
device_info = getDevice(self)
File "..\www_douyin_com\common\utils.py", line 44, in getDevice
device_info = resp['data']
KeyError: 'data'

Cannot obtain the author or video ID

Why is the ID I scanned a six-character mix of digits and letters?
And then this happens:
2018-11-29 21:31:44,886 - utils.py[line:104] INFO - Please enter a valid user id; user ids are 10, 11, 12 or 13 pure digits...
Traceback (most recent call last):
File "video_download_run.py", line 32, in
douyin_crawl.grab_user_media(sys.argv[-1], "USER_POST")
File "../www_douyin_com/common/utils.py", line 105, in wrapper
raise Exception
Exception

The user id length check is wrong

The current check

if not re.findall('^\d{11}$', user_id) or not re.findall('^\d{12}$', user_id):
    self.logger.info("Please enter a valid user id; user ids are 11 or 12 pure digits...")
    return

rejects every input, because no id can match both the 11-digit and the 12-digit pattern at once. Change the condition to

if not (re.findall('^\d{11}$', user_id) or re.findall('^\d{12}$', user_id)):
    self.logger.info("Please enter a valid user id; user ids are 11 or 12 pure digits...")
    return

A single-pattern alternative is sketched below.
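
As a hedged alternative (illustrative, not the project's code), one re.fullmatch can also cover the wider 10-13 digit range that the utils.py log message quoted earlier mentions:

import re

# Accept any purely numeric user id of 10 to 13 digits in a single check.
# The 10-13 range is an assumption taken from the log message quoted above.
if not re.fullmatch(r"\d{10,13}", user_id):
    self.logger.info("Please enter a valid user id (10-13 pure digits)...")
    return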

Douyin apparently can't be crawled anymore

PS D:\SourceCode\douyin\spider_world\www_douyin_com> python .\video_download_run.py -upost 66076741938
Traceback (most recent call last):
  File "C:\Users\quran\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connection.py", line 171, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "C:\Users\quran\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\util\connection.py", line 56, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "C:\Users\quran\AppData\Local\Programs\Python\Python37-32\lib\socket.py", line 748, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\quran\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "C:\Users\quran\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 343, in _make_request
    self._validate_conn(conn)
  File "C:\Users\quran\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 849, in _validate_conn
    conn.connect()
  File "C:\Users\quran\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connection.py", line 314, in connect
    conn = self._new_conn()
  File "C:\Users\quran\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connection.py", line 180, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x036E0D10>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\quran\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\adapters.py", line 445, in send
    timeout=timeout
  File "C:\Users\quran\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "C:\Users\quran\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\util\retry.py", line 398, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.appsign.vip', port=2688): Max retries exceeded with url: /douyin/device/new/version/2.7.0 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x036E0D10>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".\video_download_run.py", line 13, in <module>
    douyin_crawl = DouyinCrawl()
  File "..\www_douyin_com\spiders\douyin_crawl.py", line 69, in __init__
    self.common_params = common_params()
  File "..\www_douyin_com\common\utils.py", line 74, in common_params
    device_info = getDevice()
  File "..\www_douyin_com\common\utils.py", line 41, in getDevice
    resp = requests.get(API + "/douyin/device/new/version/2.7.0").json()
  File "C:\Users\quran\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\quran\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\quran\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\quran\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\quran\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\adapters.py", line 513, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='api.appsign.vip', port=2688): Max retries exceeded with url: /douyin/device/new/version/2.7.0 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x036E0D10>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
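
The root cause is the first frame: api.appsign.vip, the third-party signing service the project calls, cannot be resolved from the poster's machine (getaddrinfo failed), so every later retry fails as well. A minimal sketch, reusing the endpoint shown in the traceback, that reports this as one clear message instead of three chained tracebacks:

import requests

API = "https://api.appsign.vip:2688"  # host and port taken from the traceback above

def get_device():
    # Fetch device parameters from the sign service, failing with a readable
    # message when the host is unreachable or returns an HTTP error.
    try:
        resp = requests.get(API + "/douyin/device/new/version/2.7.0", timeout=10)
        resp.raise_for_status()
        return resp.json()
    except requests.exceptions.RequestException as exc:
        raise SystemExit("cannot reach the sign service at %s: %s" % (API, exc)) from exc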

Intermittent error at runtime: getSign sign = resp['data']

Traceback (most recent call last):
File "video_download_run.py", line 32, in
douyin_crawl.grab_user_media(sys.argv[-1], "USER_POST")
File "../www_douyin_com/spiders/douyin_crawl.py", line 127, in grab_user_media
hasmore, max_cursor = self.grab_video(user_id, action, content)
File "../www_douyin_com/spiders/douyin_crawl.py", line 166, in grab_video
self.download_user_video(aweme_id, **download_item)
File "../www_douyin_com/spiders/douyin_crawl.py", line 233, in download_user_video
video_content = self.download_video(aweme_id)
File "../www_douyin_com/spiders/douyin_crawl.py", line 277, in download_video
sign = getSign(self.__get_token(), query_params)
File "../www_douyin_com/common/utils.py", line 62, in getSign
sign = resp['data']
KeyError: 'data'
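
Because the failure is intermittent, a hedged sketch (names are illustrative, not the project's getSign) is to retry a few times when the signing service replies without a "data" field:

import time
import requests

def get_sign_with_retry(sign_api, payload, retries=3):
    # Ask the sign service again instead of raising KeyError on a missing "data".
    last = None
    for attempt in range(retries):
        last = requests.post(sign_api, json=payload, timeout=10).json()
        if "data" in last:
            return last["data"]
        time.sleep(1 + attempt)  # brief backoff before the next attempt
    raise RuntimeError("sign service returned no data after %d tries: %r" % (retries, last))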

The Douyin crawler no longer works

python3 douyin_crawl.py
Traceback (most recent call last):
  File "douyin_crawl.py", line 342, in <module>
    douyin.grab_comment_main(aweme_id, 0)
  File "douyin_crawl.py", line 157, in grab_comment_main
    has_more = self.__grab_comment(aweme_id, upvote_bound)
  File "douyin_crawl.py", line 216, in __grab_comment
    hasmore = int(comment_content.get("hasmore"))
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

I have confirmed that the token and the video ID were updated. Many similar projects seem to have stopped around May this year; is this one still maintained?
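
The TypeError means comment_content.get("hasmore") returned None, i.e. the comment API answered without that field (typically an empty or rejected response). A minimal guard, assuming the names and return convention from the traceback:

hasmore_raw = comment_content.get("hasmore") if comment_content else None
if hasmore_raw is None:
    # Empty or rejected response: stop paging instead of crashing on int(None).
    return 0
hasmore = int(hasmore_raw)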

Missing token

Traceback (most recent call last):
File "video_download_run.py", line 13, in
douyin_crawl = DouyinCrawl()
TypeError: init() missing 1 required positional argument: 'token'

getDevice error

resp = requests.get(API + "/douyin/device/new/version/2.7.0").json()
resp comes back as {'message': 'Internal Server Error'}; has the service on your side gone down?

Video quality issue

Hi, the videos this crawler downloads seem to lose quality. The originals are usually 1280x720, but many of the videos I download are lower than that...

File "video_download_run.py", line 11, in <module> from www_douyin_com.spiders.douyin_crawl import DouyinCrawl File "..\www_douyin_com\spiders\douyin_crawl.py", line 4, in <module> from backports import csv ImportError: cannot import name 'csv' from 'backports'

File "video_download_run.py", line 11, in
from www_douyin_com.spiders.douyin_crawl import DouyinCrawl
File "..\www_douyin_com\spiders\douyin_crawl.py", line 4, in
from backports import csv
ImportError: cannot import name 'csv' from 'backports'

Running the command errors with No module named 'backports'

python video_download_run.py -m -upost 58065297584
Traceback (most recent call last):
  File "video_download_run.py", line 11, in <module>
    from www_douyin_com.spiders.douyin_crawl import DouyinCrawl
  File "..\www_douyin_com\spiders\douyin_crawl.py", line 4, in <module>
    from backports import csv
ModuleNotFoundError: No module named 'backports'
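
Both of the backports errors above come from the line "from backports import csv" in douyin_crawl.py: that import needs the backports.csv package (pip install backports.csv), which only matters on Python 2, since the Python 3 standard-library csv module already handles Unicode. A hedged fallback, assuming the crawler only uses the API the two modules share:

try:
    from backports import csv  # Python 2: requires `pip install backports.csv`
except ImportError:
    import csv  # Python 3: the standard-library module is sufficient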

token question

Hi, the program keeps disconnecting while fetching the token; how can I work around this?

Missing token

__init__() missing 1 required positional argument: 'token'

?

Brother Zhang, it has stopped crawling again.
