scrapyproject's People

Contributors

cuanboy

scrapyproject's Issues

Download fails

It keeps showing WARNING: File (code: 403): Error downloading file from *** referred in ***
WARNING: Dropped: Item contains no images. It seems that not a single image has ever downloaded successfully. Environment: win10 + python3.6 + scrapy1.5

[September 4, 2019] The original site updated its anti-scraping measures; the code changes are as follows

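The original site changed its anti-scraping mechanism, so the code needs matching changes. As of September 4, 2019, two changes are enough to get past it:
① In AoiSolaSpider.py, change allowed_domains = ["www.mm131.com"] to allowed_domains = ["www.mm131.com", "www.mm131.net"]. (This fixes the content requests being filtered out.)
② In middlewares.py, replace the entire body of the process_request function in the AoisolasSpiderMiddleware class with: request.headers['referer'] = "http://www.mm131.com/?zzaqkey=4087969942"
(This gets past the hotlink protection.)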
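The second change above can be sketched as a small middleware-style class. This is only an illustration under assumptions: the class name RefererMiddleware and the FakeRequest stand-in are hypothetical (the post itself edits AoisolasSpiderMiddleware), the demo URL is a placeholder, and in a real project the class would presumably need to be enabled under DOWNLOADER_MIDDLEWARES in settings.py for process_request to be called.

```python
# Values taken from the post above.
ALLOWED_DOMAINS = ["www.mm131.com", "www.mm131.net"]
REFERER = "http://www.mm131.com/?zzaqkey=4087969942"


class RefererMiddleware:
    """Hypothetical downloader-middleware-style class that sets a fixed Referer."""

    def process_request(self, request, spider):
        # Overwrite the Referer header so the image host sees the expected page.
        request.headers["Referer"] = REFERER
        # Returning None lets Scrapy continue handling the request normally.
        return None


class FakeRequest:
    """Stand-in for scrapy.Request so the sketch runs without Scrapy installed."""

    def __init__(self, url):
        self.url = url
        self.headers = {}


request = FakeRequest("http://example.com/placeholder.jpg")  # placeholder URL
RefererMiddleware().process_request(request, spider=None)
print(request.headers["Referer"])
```

Keeping the referer in one constant makes it easy to update if the site changes its check again.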

Overriding def item_completed(self, results, item, info) in a subclass makes it possible to rename the downloaded files

from scrapy.pipelines.images import ImagesPipeline
from scrapy import Request
from ImageSpider.settings import IMAGES_STORE as images_store
import os


class ImagespiderPipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        # Request every image URL in the item. If item['imgurl'] is a single
        # URL rather than a collection, yield one Request without looping.
        for image_url in item['imgurl']:
            yield Request(image_url)

    # def file_path(self, request, response=None, info=None):
    #     # Rename the file. Without this override the image name is a hash,
    #     # i.e. an unreadable string of characters.
    #     image_guid = request.url.split('/')[-1]  # use the last URL segment as the image name
    #     return image_guid

    # def item_completed(self, results, item, info):
    #     # Rename the files and move them from the default location
    #     # D:\ImageSpider\full\* to D:\ImageSpider\*.jpg, using the last
    #     # segment of each URL in item['imgurl'] as the image name.
    #     # Functionally similar to file_path.
    #     image_path = [x["path"] for ok, x in results if ok]
    #     for i in range(len(image_path)):
    #         os.rename(images_store + '/' + image_path[i], images_store + '/' + item['imgurl'][i].split('/')[-1])
    #     return item  # hand the item on to later pipelines
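The naming rule used in both commented-out methods (take the last segment of the image URL) can be tried on its own. The helper below is a minimal sketch, not part of the posted pipeline: the name image_name_from_url is hypothetical, and unlike the plain split('/')[-1] it also drops any query string, which is an added assumption about what a clean file name should look like.

```python
import posixpath
from urllib.parse import urlparse


def image_name_from_url(url: str) -> str:
    """Return the last path segment of a URL, ignoring any query string."""
    path = urlparse(url).path  # e.g. "/pic/2019/1.jpg"
    return posixpath.basename(path)


# Placeholder URLs, used only for illustration.
print(image_name_from_url("http://example.com/pic/2019/1.jpg"))
print(image_name_from_url("http://example.com/pic/2.jpg?size=large"))
```

A helper like this could be called from file_path or item_completed in place of request.url.split('/')[-1].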

ModuleNotFoundError: No module named 'AoiSolas' error

I copied the code exactly, but this always appears:
Traceback (most recent call last):
File "D:/zart/Aoisolas/Aoisolas/spiders/AoiSolaSpider.py", line 13, in
from AoiSolas.items import AoisolasItem
ModuleNotFoundError: No module named 'AoiSolas'
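With this error the spider won't run.
Python version: 3.7
The earlier projects all worked when I tried them; only this one raises this error. My skills are limited and I can't find the cause. Any pointers would be appreciated.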
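The thread gives no answer, so what follows is only a guess based on the traceback. The spider file appears to have been run directly, in which case Python searches the spiders folder rather than the project root, and the folder in the path is spelled Aoisolas while the import says AoiSolas (Python imports are case-sensitive). The sketch below builds a throwaway layout in a temporary directory to show that a package is found only when its parent folder is on sys.path; the helper name is_importable and the temporary layout are hypothetical.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def is_importable(package: str) -> bool:
    """Return True if the top-level package can be located on sys.path."""
    importlib.invalidate_caches()
    return importlib.util.find_spec(package) is not None


# Throwaway layout: <project_root>/AoiSolas/{__init__.py, items.py, spiders/}
project_root = Path(tempfile.mkdtemp())
package_dir = project_root / "AoiSolas"
(package_dir / "spiders").mkdir(parents=True)
(package_dir / "__init__.py").write_text("")
(package_dir / "items.py").write_text("class AoisolasItem(dict):\n    pass\n")

before = is_importable("AoiSolas")      # project root not on sys.path yet
sys.path.insert(0, str(project_root))   # what running from the project root provides
after = is_importable("AoiSolas")
print(before, after)
```

If this is indeed the cause, starting the crawl from the project root (the folder containing scrapy.cfg) and making the folder name match the import exactly would be the things to check first.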

No module named 'PIL'

The following error is displayed:
from scrapy.pipelines.images import ImagesPipeline
File "D:\Anaconda3\envs\my_env\lib\site-packages\scrapy\pipelines\images.py", line 15, in
from PIL import Image
ModuleNotFoundError: No module named 'PIL'

pip install pillow
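The fix works because the module is imported as PIL but is installed under the package name pillow. A minimal sketch of checking for that kind of mismatch before starting a crawl is shown below; the function name missing_packages and the mapping are hypothetical, not part of Scrapy.

```python
import importlib.util


def missing_packages(required: dict) -> list:
    """Return the pip names of required modules that cannot be found.

    `required` maps an import name (e.g. "PIL") to its pip name (e.g. "pillow").
    """
    return [
        pip_name
        for import_name, pip_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]


# ImagesPipeline needs PIL, which is provided by the pillow package.
for name in missing_packages({"PIL": "pillow"}):
    print(f"pip install {name}")
```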
