cuanboy / scrapyproject

416.0 416.0 234.0 983 KB

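Hands-on Scrapy projects, such as storing data in a database, downloading files, crawling JD.com and Taobao, Anti-Anti-Spider techniques, and more.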

Home Page: http://www.scrapyd.cn

Python 100.00%

scrapyproject's Issues

After running it, I only get an empty file named meizi3. What is going on? Could everyone take a look?

After running the code, my IP was blocked by the site. Congrats.

Download failed

I keep getting WARNING: File (code: 403): Error downloading file from *** referred in ***
WARNING: Dropped: Item contains no images. It seems that not a single image has downloaded successfully. Environment: Windows 10 + Python 3.6 + Scrapy 1.5.

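The original site changed its anti-crawler mechanism, so the code needs to change as well (as of September 4, 2019, two changes are enough to get past it):
① In AoiSolaSpider.py, change allowed_domains = ["www.mm131.com"] to allowed_domains = ["www.mm131.com", "www.mm131.net"] (this fixes the content being filtered out).
② In middlewares.py, replace the entire body of the process_request function in the AoisolasSpiderMiddleware class with: request.headers['referer'] = "http://www.mm131.com/?zzaqkey=4087969942"
(this gets past the hotlink protection).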
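
The two changes described above might look roughly like the following. This is a minimal sketch, not the repository's actual code: the class name `RefererMiddleware` and the module-level constants are hypothetical, and it assumes the class is registered under `DOWNLOADER_MIDDLEWARES` in settings.py, since `process_request` is a downloader-middleware hook in Scrapy. The referer value is copied from the comment above and may no longer work.

```python
# Change 1 (belongs in the spider as `allowed_domains`): allow both hosts so
# that responses from the .net domain are not filtered out as off-site.
ALLOWED_DOMAINS = ["www.mm131.com", "www.mm131.net"]

# Referer value quoted in the issue comment (reported to work on 2019-09-04).
REFERER = "http://www.mm131.com/?zzaqkey=4087969942"


class RefererMiddleware:
    """Hypothetical downloader middleware that sets a fixed Referer header."""

    def process_request(self, request, spider):
        # Overwrite the Referer on every outgoing request so the image host
        # treats the request as coming from its own site.
        request.headers["referer"] = REFERER
        # Returning None tells Scrapy to keep processing the request normally.
        return None
```

The middleware only takes effect once it is enabled in the project settings; how the original project wires it up is not shown in the thread.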

Overriding def item_completed(self, results, item, info) in a subclass makes it possible to rename the downloaded files

from scrapy.pipelines.images import ImagesPipeline
from scrapy import Request
from ImageSpider.settings import IMAGES_STORE as images_store
import os


class ImagespiderPipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        # Request each image URL in turn; if item['imgurl'] is a single URL
        # rather than a collection, yield one Request without looping.
        for image_url in item['imgurl']:
            yield Request(image_url)

    # def file_path(self, request, response=None, info=None):
    #     # Rename the file. Without this override the image name is a hash,
    #     # i.e. an unreadable string of characters.
    #     image_guid = request.url.split('/')[-1]  # use the last URL segment as the image name
    #     return image_guid

    # def item_completed(self, results, item, info):
    #     # Rename the files and move them from the default location
    #     # D:\ImageSpider\full\* to D:\ImageSpider\*.jpg, using the last URL
    #     # segment of each entry in item['imgurl'] as the file name.
    #     # Functionally similar to file_path.
    #     image_path = [x["path"] for ok, x in results if ok]
    #     for i in range(len(image_path)):
    #         os.rename(images_store + '/' + image_path[i], images_store + '/' + item['imgurl'][i].split('/')[-1])
    #     return item  # item_completed must hand the item on to later pipelines
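
One weakness of the index-based rename in `item_completed` is that filtering `results` on `ok` can shift positions relative to `item['imgurl']` whenever a download fails. A minimal sketch that pairs each stored path with its own URL instead, assuming the `results` shape Scrapy passes to `item_completed` (a list of `(success, info)` tuples whose `info` dict carries `url` and `path` on success); the helper names `filename_from_url` and `plan_renames` are hypothetical:

```python
import os
import posixpath
from urllib.parse import urlparse


def filename_from_url(url):
    """Return the last path segment of a URL, ignoring any query string."""
    return posixpath.basename(urlparse(url).path)


def plan_renames(results, store):
    """Build (source, destination) path pairs for successful downloads.

    `results` is assumed to be a list of (success, info) tuples, where info
    is a dict with 'url' and 'path' keys when success is True.
    """
    pairs = []
    for ok, info in results:
        if not ok:
            continue  # skip failed downloads instead of shifting indexes
        src = os.path.join(store, info["path"])
        dst = os.path.join(store, filename_from_url(info["url"]))
        pairs.append((src, dst))
    return pairs
```

Inside `item_completed`, each pair could then be passed to `os.rename(src, dst)` before returning `item`.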

ModuleNotFoundError: No module named 'AoiSolas' error

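I copied the code exactly, but I always get
Traceback (most recent call last):
File "D:/zart/Aoisolas/Aoisolas/spiders/AoiSolaSpider.py", line 13, in <module>
from AoiSolas.items import AoisolasItem
ModuleNotFoundError: No module named 'AoiSolas'
and the spider will not run.
Python version: 3.7.
I tried the earlier projects and they all worked; only this one raises the error. My skills are limited and I cannot find the cause, so any pointers would be appreciated.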
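
Two possible causes fit this traceback, though neither is confirmed in the thread. First, the spider file may have been run directly, in which case the project root is not on `sys.path`; Scrapy projects are normally started with `scrapy crawl <spider name>` from the directory containing scrapy.cfg. Second, the package directory in the path is spelled `Aoisolas` while the import says `AoiSolas`, and Python module names are case-sensitive. The sketch below reproduces only the first cause with a throwaway package; the name `demo_project_pkg` is hypothetical.

```python
import importlib
import importlib.util
import os
import sys
import tempfile

PKG = "demo_project_pkg"  # hypothetical package name, for illustration only

# Create a throwaway project root containing one package with an items module.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, PKG))
for name in ("__init__.py", "items.py"):
    with open(os.path.join(root, PKG, name), "w") as fh:
        fh.write("")

# The project root is not on sys.path yet, so the package cannot be located.
found_before = importlib.util.find_spec(PKG) is not None

# With the project root on sys.path, `from <package>.items import ...` resolves.
sys.path.insert(0, root)
importlib.invalidate_caches()
found_after = importlib.util.find_spec(PKG) is not None

print(found_before, found_after)
```

Running the spider through the `scrapy` command from the project root has the same effect as the `sys.path.insert` line here, without editing the spider file.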

scrapyproject's Issues

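After running it, I only get an empty file named meizi3. What is going on? Could everyone take a look?

After running the code, my IP was blocked by the site. Congrats.

Something went wrong with the download

Download failed

[September 4, 2019] The original site updated its anti-crawler measures; the code changes are as follows

Overriding def item_completed(self, results, item, info) in a subclass makes it possible to rename the downloaded files

ModuleNotFoundError: No module named 'AoiSolas' error

Every link only downloads a single image

Manager: not recognized as an internal or external command

The anti-crawler mechanism changed; the referer needs to be set to the previous URL

No module named 'PIL'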
