
agespider's Introduction

AGESpider

A crawler script built on the Scrapy framework that scrapes anime listings from agefans.tv. The resulting JSON data can be stored in a database.

Usage

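0. Run pip install -r requirements.txt to install the dependencies.
1. Optionally change the start year and end year in the main spider file. By default the spider crawls anime from 2000 to 2020.
2. Run scrapy crawl AGE -o anime.json to fetch the anime data.
3. If needed, edit the database settings in settings, then run sorting_data.py to store the data in the database.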
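
As a rough illustration of step 3, the sketch below loads the JSON array written by scrapy crawl AGE -o anime.json and stores each item in a database. It is not the project's sorting_data.py: SQLite from the Python standard library stands in for whatever database the project's settings configure, and the table name anime, the column data, and the helper names load_items and store_items are all hypothetical. Each item is stored as a raw JSON string so that no field names need to be assumed.

```python
import json
import sqlite3


def load_items(json_path):
    """Read the JSON array produced by `scrapy crawl AGE -o anime.json`."""
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)


def store_items(items, conn):
    """Store each scraped item as a JSON string and return the row count.

    The table and column names are hypothetical; the real project may
    use a different database and schema.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS anime ("
        "id INTEGER PRIMARY KEY, data TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO anime (data) VALUES (?)",
        [(json.dumps(item, ensure_ascii=False),) for item in items],
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM anime").fetchone()[0]
```

A typical call would be store_items(load_items("anime.json"), sqlite3.connect("anime.db")), where anime.db is an assumed file name.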

Log

2020.07.05
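1. The site changed how Baidu Cloud links are obtained. A Referer header must now be added.
2. The site appears to have switched to a JavaScript redirect, and the JS is obfuscated (sojson.v5). Opening the developer console cancels the redirect. The exact redirect logic is still unclear.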
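
The Referer requirement from the 2020.07.05 entry could look roughly like the sketch below. It only builds the request object and does not send it. The helper name build_request is hypothetical, and using the detail page's URL as the Referer value is an assumption, since the log does not say which value the site expects.

```python
import urllib.request


def build_request(target_url, referer):
    """Build a GET request that carries a Referer header.

    Which Referer value the site accepts is an assumption here;
    the page that linked to target_url is a common choice.
    """
    return urllib.request.Request(target_url, headers={"Referer": referer})
```

In a Scrapy spider the same idea is usually expressed by passing headers={"Referer": ...} to scrapy.Request.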
2020.09.14
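1. Fixed a bug where the crawler terminated because the site changed its 404 page.
2. Eliminated a large number of useless requests.
3. Removed the proxy, which makes the spider simpler to use.
4. Removed the feature for fetching Baidu Cloud links. If you need it, you can use selenium to follow the JS redirect from origin_url to download_url. A conversion script may be added as spare time allows.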
2020.10.12
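1. The backend address captured from the app's network traffic returns the Baidu Netdisk address directly. In the URL below, aid is the anime ID.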
https://api.agefans.app/v2/detail/{aid}
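
A minimal sketch of calling this endpoint follows. The URL template is taken from the log entry above. The helper names detail_url and fetch_detail are hypothetical, and treating the response body as JSON is an assumption, since the README does not describe the response format.

```python
import json
import urllib.request

# URL template from the 2020.10.12 log entry; aid is the anime ID.
API_TEMPLATE = "https://api.agefans.app/v2/detail/{aid}"


def detail_url(aid):
    """Return the detail endpoint URL for the given anime ID."""
    return API_TEMPLATE.format(aid=aid)


def fetch_detail(aid, timeout=10):
    """Fetch and decode the detail record for one anime.

    Assumes the endpoint returns a JSON body, which the README
    does not confirm.
    """
    with urllib.request.urlopen(detail_url(aid), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping URL construction separate from the network call makes it easy to check the generated URLs without contacting the server.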
