
jd_spider

A JD.com (京东) crawler written with the Scrapy framework. It scrapes JD product information and product comments.

1. Purpose:

  • 1. Scrape JD product information (using e-cigarettes as the example category)
  • 2. Scrape the comments on each product

2. Fields of the scraped data

Product data:
(image)

Comment data:
(image)

3. Usage

(1) Product scraping and comment scraping are implemented in two separate spiders

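The product-scraping code is in jd_home.py. To store its output, edit ITEM_PIPELINES in setting.py so that it uses MySQLPipeline.

The comment-scraping code is in jd_comment.py. To store its output, edit ITEM_PIPELINES in setting.py so that it uses CommentPipeline.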
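
As a rough sketch, switching pipelines in setting.py might look like the following. Only the class names MySQLPipeline and CommentPipeline come from this README; the module path `jd_spider.pipelines` and the priority value 300 are assumptions and may differ in the actual project.

```python
# setting.py (sketch). Enable exactly one pipeline, matching the spider being run.
# ASSUMPTION: the "jd_spider.pipelines" module path and the priority 300
# are illustrative; check the project for the real values.

# Product spider (jd_home.py): store items with MySQLPipeline.
ITEM_PIPELINES = {
    "jd_spider.pipelines.MySQLPipeline": 300,
}

# Comment spider (jd_comment.py): comment out the block above
# and enable this one instead.
# ITEM_PIPELINES = {
#     "jd_spider.pipelines.CommentPipeline": 300,
# }
```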

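(2) The setting.py file

Proxy IPs are enabled by default. Because proxy IPs are short-lived, the entries in PROXIES must be refreshed regularly. Free proxy IPs can be found at http://www.xicidaili.com/

If you do not want to use proxy IPs, comment out the DOWNLOADER_MIDDLEWARES block.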
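
To illustrate how a proxy list and a downloader middleware typically fit together in Scrapy, here is a minimal sketch. It is not the project's actual middleware: the class name `RandomProxyMiddleware`, the format of `PROXIES` (plain URL strings), and the example addresses are all assumptions. It relies only on the fact that Scrapy routes a request through the proxy named in `request.meta["proxy"]`.

```python
import random

# ASSUMPTION: PROXIES is a list of proxy URLs. The addresses below are
# placeholders; replace them with live proxies and refresh them regularly.
PROXIES = [
    "http://127.0.0.1:8080",
    "http://127.0.0.1:8081",
]


class RandomProxyMiddleware:
    """Hypothetical downloader middleware: pick a random proxy per request."""

    def __init__(self, proxies):
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # Read the proxy list from the project settings.
        return cls(crawler.settings.getlist("PROXIES"))

    def process_request(self, request, spider):
        if self.proxies:
            # Scrapy sends the request through the proxy set in request.meta.
            request.meta["proxy"] = random.choice(self.proxies)
        return None  # continue normal request processing


# In setting.py, the middleware would be enabled roughly like this
# (module path and priority are assumptions). Commenting the block out
# disables proxy use.
# DOWNLOADER_MIDDLEWARES = {
#     "jd_spider.middlewares.RandomProxyMiddleware": 543,
# }
```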

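Database configuration:

  • setting.py holds the database host, port, user name, password, and database name.
  • pipeline.py holds the SQL statements; set the name of the target table there.
  • Database table schemas:
    • jd_comment.sql: comment data
    • jd_goods.sql: product data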
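
The split between connection settings and table names can be sketched as below. The setting names (`MYSQL_HOST` and so on), the helper `build_insert`, the column names, and the table name `jd_goods` (guessed from the file name jd_goods.sql) are all assumptions for illustration; the project's own pipeline.py may be written differently.

```python
# setting.py side (sketch). ASSUMPTION: these key names are hypothetical.
MYSQL_HOST = "127.0.0.1"
MYSQL_PORT = 3306
MYSQL_USER = "root"
MYSQL_PASSWORD = "change-me"
MYSQL_DBNAME = "jd"


# pipeline.py side (sketch): the table name appears in the SQL statement.
def build_insert(table, item):
    """Return (sql, params) for a parameterized INSERT into `table`.

    Uses the %s placeholder style accepted by common MySQL drivers.
    """
    columns = list(item.keys())
    column_list = ", ".join("`%s`" % c for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    sql = "INSERT INTO `%s` (%s) VALUES (%s)" % (table, column_list, placeholders)
    return sql, [item[c] for c in columns]


# Hypothetical item with made-up column names.
sql, params = build_insert("jd_goods", {"goods_id": "123", "name": "demo"})
```

Passing the values separately from the statement lets the database driver handle quoting, which matters for free-text fields such as comment bodies.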

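  • Scraping comments requires the file goods.xls, so scrape the product information first and then export the relevant fields to goods.xls (a sample goods.xls is provided as a format reference).

    goods.xls format: column 1 is the product ID, column 2 is the product's comment count, column 3 is the product's commentVersion.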
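
A minimal sketch of reading that three-column layout follows. The names `GoodsRow`, `parse_goods_rows`, and `read_goods_xls` are hypothetical, the sheet is assumed to have no header row, and the third-party package xlrd is assumed for opening the .xls file; the project itself may load the file another way.

```python
from typing import NamedTuple


class GoodsRow(NamedTuple):
    goods_id: str          # column 1: product ID
    comment_count: int     # column 2: number of comments
    comment_version: str   # column 3: commentVersion


def _as_text(value):
    # Spreadsheet readers often return numeric cells as floats (e.g. 123.0).
    if isinstance(value, float):
        value = int(value)
    return str(value)


def parse_goods_rows(rows):
    """Convert raw rows into GoodsRow records, skipping blank rows."""
    parsed = []
    for row in rows:
        if len(row) < 3 or row[0] in ("", None):
            continue
        parsed.append(
            GoodsRow(_as_text(row[0]), int(float(row[1])), _as_text(row[2]))
        )
    return parsed


def read_goods_xls(path):
    """Read the first sheet of goods.xls (assumes xlrd is installed)."""
    import xlrd  # third-party dependency, imported lazily

    sheet = xlrd.open_workbook(path).sheet_by_index(0)
    return parse_goods_rows(sheet.row_values(i) for i in range(sheet.nrows))
```

For example, `parse_goods_rows([[1234567.0, 35.0, 56789.0]])` yields one record with the ID and commentVersion as strings and the comment count as an integer.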

    Within a single project, product scraping and comment scraping cannot run at the same time.


    For more details about the crawler, see my blog posts:

  • http://blog.csdn.net/xiaoquantouer/article/details/51840332
  • http://blog.csdn.net/xiaoquantouer/article/details/51841016

Questions are welcome; please leave a comment.
