
douban_movies's Introduction

Douban movie rankings

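  1. A small program written in Python. It scrapes the title, number of ratings, rank and link of each movie from Douban's movie list pages, then generates an HTML page sorted by number of ratings from largest to smallest, 3000 movies in total;
  2. The program only works in theory: while scraping movies it sends too many requests and gets blocked by Douban...
  3. Idea source: http://crossin.me/forum.php?mod=viewthread&tid=736&extra=page%3D1.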
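The README does not include the program's source. Below is a minimal sketch of the final step described in item 1. It assumes each scraped movie is a dict with the hypothetical keys `title`, `votes`, `rank` and `link`, sorts by number of ratings (largest first), keeps the top 3000 and renders an HTML table. `render_html` is an illustrative name, not the author's function.

```python
import html
from typing import Dict, List

# Hypothetical record shape: the README lists these four fields,
# but the real program's data structure is not shown.
Movie = Dict[str, object]


def render_html(movies: List[Movie], limit: int = 3000) -> str:
    """Sort movies by number of ratings (largest first) and render an HTML table."""
    ranked = sorted(movies, key=lambda m: m["votes"], reverse=True)[:limit]
    rows = []
    for m in ranked:
        rows.append(
            "<tr><td>{rank}</td><td><a href=\"{link}\">{title}</a></td>"
            "<td>{votes}</td></tr>".format(
                rank=m["rank"],
                # Escape scraped text so titles containing & or < stay valid HTML.
                link=html.escape(str(m["link"]), quote=True),
                title=html.escape(str(m["title"])),
                votes=m["votes"],
            )
        )
    return (
        "<html><head><meta charset=\"utf-8\"><title>Douban movies</title></head>"
        "<body><table>\n"
        "<tr><th>Rank</th><th>Title</th><th>Ratings</th></tr>\n"
        + "\n".join(rows)
        + "\n</table></body></html>"
    )
```

The `utf-8` charset matters because most Douban titles are Chinese; writing the returned string to a file opened with `encoding="utf-8"` keeps them readable in a browser.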

Changelog

2013-10-28

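  1. Added per-step timing to monitor progress;
  2. The program now pauses for 2 s after finishing each page before continuing, to avoid being blocked by Douban;
  3. Added de-duplication when scraping each movie;
  4. Removed movies beyond the expected count;
  5. Changed the rule for scraping the number of ratings so that the initial scrape also covers the "not yet released" case;
  6. Removed the setting that stopped scraping after 3000 movies; the program now scrapes all tags first and then drops everything beyond 3000.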
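The changes above can be sketched as one crawl loop. This is a minimal illustration, not the author's code: `crawl` and `fetch_page` are hypothetical names, `fetch_page` stands in for whatever downloads one tag's list page and parses it into movie dicts, and de-duplication is assumed to key on each movie's `link`. The loop times each page, sleeps 2 s between pages, skips duplicates, and only truncates to 3000 after every tag has been collected.

```python
import time
from typing import Callable, Dict, Iterable, List

Movie = Dict[str, object]


def crawl(
    page_urls: Iterable[str],
    fetch_page: Callable[[str], List[Movie]],  # hypothetical: download + parse one page
    pause: float = 2.0,
    limit: int = 3000,
) -> List[Movie]:
    """Fetch every page, pause between pages, drop duplicates, then truncate."""
    seen = set()  # links already collected
    movies: List[Movie] = []
    start = time.perf_counter()
    for i, url in enumerate(page_urls, 1):
        for movie in fetch_page(url):
            if movie["link"] not in seen:
                seen.add(movie["link"])
                movies.append(movie)
        # Step timing: report progress after each page.
        elapsed = time.perf_counter() - start
        print(f"page {i}: {len(movies)} movies, {elapsed:.1f}s elapsed")
        time.sleep(pause)  # back off so the site does not block the crawler
    # Only after all tags are scraped, keep the `limit` most-rated movies.
    movies.sort(key=lambda m: m["votes"], reverse=True)
    return movies[:limit]
```

Passing `fetch_page` in as a parameter keeps the loop testable offline: a dictionary lookup such as `pages.__getitem__` can stand in for real downloads, with `pause=0`.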

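Summary: after two trial runs, a 2 s pause per page is already enough to keep Douban happy, and it no longer blocks the scraping requests... But a new problem has appeared: every run, the program freezes after scraping roughly 2000 movies. My current guess is that with this many movies, the de-duplication performed after each scrape becomes too expensive. I also honestly find that every problem turns out to be more complicated to solve than I first imagined. Turning an idea into a working program often runs into details I had never considered before.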
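The README does not show how de-duplication was implemented, so the following illustrates only one plausible reading of the freeze. If each new movie is checked against the whole list collected so far, every check costs O(n) and the total work grows roughly quadratically with the number of movies; re-scanning the full list after every page grows faster still. Tracking seen links in a `set` makes each check constant time on average. Both functions are hypothetical and assume movies are dicts keyed by `link`.

```python
from typing import Dict, List

Movie = Dict[str, object]


def dedupe_by_scan(movies: List[Movie]) -> List[Movie]:
    """List-based de-duplication: each check scans the result so far (O(n) per movie)."""
    result: List[Movie] = []
    for movie in movies:
        if all(kept["link"] != movie["link"] for kept in result):
            result.append(movie)
    return result


def dedupe_by_set(movies: List[Movie]) -> List[Movie]:
    """Set-based de-duplication: average O(1) membership test per movie."""
    seen = set()
    result: List[Movie] = []
    for movie in movies:
        if movie["link"] not in seen:
            seen.add(movie["link"])
            result.append(movie)
    return result
```

Both keep the first occurrence of each link and preserve order, so they return the same movies; only the cost of the membership check differs as the list grows.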

douban_movies's People

Contributors

jxgx072037
