cuanboy / scrapyproject
Hands-on Scrapy projects, such as saving to a database, downloading files, crawling JD.com and Taobao, Anti-Anti-Spider, and more.
Home Page: http://www.scrapyd.cn
How do I fix the 301 that appears with the referer?
I keep getting WARNING: File (code: 403): Error downloading file from *** referred in ***
WARNING: Dropped: Item contains no images. It seems not a single image has ever downloaded successfully. Environment: win10 + python3.6 + scrapy1.5
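The target site has changed its anti-crawler mechanism, so the code needs matching changes. As of September 4, 2019, two changes are enough to get past it:
① In AoiSolaSpider.py, change allowed_domains = ["www.mm131.com"] to allowed_domains = ["www.mm131.com", "www.mm131.net"] (this fixes the content being filtered out)
② In middlewares.py, replace the entire body of the process_request function in the AoisolasSpiderMiddleware class with: request.headers['referer'] = "http://www.mm131.com/?zzaqkey=4087969942"
(this bypasses the hotlink protection)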
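The second change can be sketched as a small standalone class. This is only an illustration, not the project's actual middlewares.py: the class name RefererMiddleware is hypothetical, and it assumes the method is invoked the way Scrapy calls a downloader middleware's process_request(request, spider). The Referer value is the one quoted above and may stop working if the site changes again.

```python
class RefererMiddleware:
    """Hypothetical middleware that sets a fixed Referer header on every request."""

    # Value quoted in the fix above (reported to work as of 2019-09-04).
    REFERER = "http://www.mm131.com/?zzaqkey=4087969942"

    def process_request(self, request, spider):
        # Overwrite the Referer so the image host sees a request coming from the site itself.
        request.headers["Referer"] = self.REFERER
        # Returning None lets the request continue through the remaining middlewares.
        return None
```

A class like this only takes effect once it is listed under DOWNLOADER_MIDDLEWARES in the project's settings.py; the exact module path depends on the project layout.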
from scrapy.pipelines.images import ImagesPipeline
from scrapy import Request
from ImageSpider.settings import IMAGES_STORE as images_store
import os


class ImagespiderPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        # Request each image URL in turn; if the value passed in is not a
        # collection, skip the loop and yield the request directly
        for image_url in item['imgurl']:
            yield Request(image_url)

    # def file_path(self, request, response=None, info=None):
    #     # Rename the file; without this override the image name is a hash,
    #     # i.e. a meaningless string of characters
    #     image_guid = request.url.split('/')[-1]  # use the last segment of the URL as the image name
    #     return image_guid

    # def item_completed(self, results, item, info):
    #     # Rename the files, changing the default path D:\ImageSpider\full\*
    #     # to D:\ImageSpider\*.jpg, using the last segment of each URL in
    #     # item['imgurl'] as the image name
    #     # Functionally similar to file_path
    #     image_path = [x["path"] for ok, x in results if ok]
    #     for i in range(len(image_path)):
    #         os.rename(images_store+'/'+image_path[i],images_store+'/'+item['imgurl'][i].split('/')[-1])
I copied the code exactly, but I always get
Traceback (most recent call last):
File "D:/zart/Aoisolas/Aoisolas/spiders/AoiSolaSpider.py", line 13, in
from AoiSolas.items import AoisolasItem
ModuleNotFoundError: No module named 'AoiSolas'
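and the spider will not run.
Python version: 3.7
I tried the earlier projects and they all worked; only this one raises the error. My experience is limited and I cannot find the cause. Any help would be appreciated.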
Error shown:
from scrapy.pipelines.images import ImagesPipeline
File "D:\Anaconda3\envs\my_env\lib\site-packages\scrapy\pipelines\images.py", line 15, in
from PIL import Image
ModuleNotFoundError: No module named 'PIL'
pip install pillow
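The PIL module is provided by the Pillow package, which is why installing pillow resolves the import error. A minimal sketch for checking this before running the spider follows; the helper name is_installed is made up for illustration and uses only the standard library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("PIL"):
    # Pillow installs the importable module named PIL.
    print("PIL is missing; run: pip install pillow")
```

Run it with the same interpreter (here, the my_env environment) that runs Scrapy, since each environment has its own site-packages.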