DoubanSpider

Scrapes the Douban Movie Top 250 and Book Top 250 lists.

Movie Top 250: https://movie.douban.com/top250

Book Top 250: https://book.douban.com/top250?icn=index-book250-all
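
Both lists span several pages. The following is a minimal sketch of enumerating the list pages. It assumes 25 entries per page and a `start` offset query parameter; both are assumptions about the site's URL scheme rather than something stated in this README, and the helper name `list_page_urls` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

MOVIE_TOP250 = "https://movie.douban.com/top250"
BOOK_TOP250 = "https://book.douban.com/top250?icn=index-book250-all"


def list_page_urls(base_url, total=250, page_size=25):
    """Return one URL per list page by setting a `start` offset.

    The `start` parameter and the page size of 25 are assumptions.
    """
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))  # keep any existing parameters
    urls = []
    for start in range(0, total, page_size):
        query["start"] = str(start)
        urls.append(urlunsplit(parts._replace(query=urlencode(query))))
    return urls


if __name__ == "__main__":
    for url in list_page_urls(MOVIE_TOP250):
        print(url)
```

Keeping the existing query string intact means the book URL retains its `icn` parameter while the offset is appended.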

For movies:

Download the pages and extract data such as the movie name, photos, description, and, most importantly, reviews, then save them as follows:

./No.1--肖申克的救赎 The Shawshank Redemption (1994)

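    肖申克的救赎简介.txt  (introduction), from https://movie.douban.com/subject/1292052/
    Content:
        Director:
                Frank Darabont
        Screenwriters:
                Frank Darabont
                Stephen King
        Starring:
                Tim Robbins
                Morgan Freeman
                Bob Gunton
                ......
        Synopsis:
             In the late 1940s, Andy (Tim Robbins), a moderately successful young banker, is sent to prison on suspicion of murdering his wife and her lover.
                    .......
    肖申克的救赎获奖情况.txt  (awards), from https://movie.douban.com/subject/1292052/awards/
             67th Academy Awards (link)  Best Picture (nominated)  Niki Marvin (link)
             .......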
    豆瓣评分(9.6).txt  (Douban rating), from https://movie.douban.com/subject/1292052/
    ./五星影评  (five-star reviews), from https://movie.douban.com/subject/1292052/reviews
    Note: the naming is very regular, so it can be matched with Rule patterns.
    十年 肖申克的救赎.txt
    《肖申克的救赎》与斯德哥尔摩综合症--你我都是患者.txt
      ...
    ./一星影评  (one-star reviews)

    ./imgs   (from  https://movie.douban.com/subject/1292052/photos?type=S&start=40&sortby=vote&size=a&subtype=a)
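
The source URLs in the layout above all hang off a single numeric subject ID, which is what makes pattern-based matching practical. The sketch below derives the related page URLs from a subject URL with a plain regular expression. The helper name `related_urls` is hypothetical, the photo query parameters are left out, and this is not necessarily how the project's own rules are written.

```python
import re

# Matches a movie subject page and captures its numeric ID.
SUBJECT_RE = re.compile(r"^https://movie\.douban\.com/subject/(\d+)/?$")


def related_urls(subject_url):
    """Map a subject page URL to the pages the layout above draws from."""
    match = SUBJECT_RE.match(subject_url)
    if match is None:
        raise ValueError(f"not a movie subject URL: {subject_url}")
    base = f"https://movie.douban.com/subject/{match.group(1)}/"
    return {
        "intro": base,               # introduction and rating
        "awards": base + "awards/",  # awards list
        "reviews": base + "reviews", # reviews
        "photos": base + "photos",   # photos (query parameters omitted)
    }


if __name__ == "__main__":
    for name, url in related_urls("https://movie.douban.com/subject/1292052/").items():
        print(name, url)
```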

For books:

Download the pages and extract data such as book names, photos, descriptions, reviews, and notes, then save them as follows:

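 ./1--小王子(9.0)-[法]圣埃克苏佩里   (rank--title(rating)-author; here The Little Prince by Saint-Exupéry)

    小王子简介.txt   (book introduction)
    [法]圣埃克苏佩里简介.txt   (author introduction)
    ./reviews
        书评1.txt   (review 1)
        书评2.txt
        ...
    ./annotations
        笔记1.txt   (note 1)
        笔记2.txt
        ...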
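
The book layout above can be sketched as a pair of helpers that build the folder and write text files into it. This is a minimal illustration under stated assumptions: the names `safe_name`, `book_dir`, and `save_text` are hypothetical, and replacing characters that are invalid in file names with `_` is an assumed policy, not something this README specifies.

```python
import re
from pathlib import Path


def safe_name(name):
    """Replace characters that are invalid in file names on common filesystems."""
    return re.sub(r'[\\/:*?"<>|]', "_", name).strip()


def book_dir(root, rank, title, rating, author):
    """Create <rank>--<title>(<rating>)-<author> with its two subfolders."""
    folder = Path(root) / safe_name(f"{rank}--{title}({rating})-{author}")
    (folder / "reviews").mkdir(parents=True, exist_ok=True)
    (folder / "annotations").mkdir(exist_ok=True)
    return folder


def save_text(folder, filename, text):
    """Write text as UTF-8 so that Chinese content is stored correctly."""
    path = Path(folder) / safe_name(filename)
    path.write_text(text, encoding="utf-8")
    return path
```

For example, `book_dir(".", 1, "小王子", 9.0, "[法]圣埃克苏佩里")` creates the folder shown above, and `save_text(folder / "reviews", "书评1.txt", text)` stores a review inside it.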

Examples of saved files:

1) Top 50 movies [image]

2) The Shawshank Redemption/肖申克的救赎 [image]

3) Photos for The Shawshank Redemption/肖申克的救赎图片 (stills, posters, wallpapers) [image]

4) Douban book/豆瓣读书: 1988我想和这个世界谈谈 [image]
