
scenerycrawler's Introduction

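Overview

  1. Cracks the two-layer font obfuscation. (The obfuscation dictionary is refreshed every day at midnight, so the decoding dictionary has to be fetched again each day.)
  2. Collects scenic-spot information. (This also works for shops; just change the start URL.)
  3. Collects all user reviews for each scenic spot or shop.
  4. Collects the check-in records of all users.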
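The daily refresh in item 1 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `DailyGlyphDictionary`, the `fetch` callable and the `refresh_at` parameter are hypothetical names, and the sketch only models a single character-substitution lookup rather than the two obfuscation layers themselves.

```python
from datetime import date, datetime, time, timedelta
from typing import Callable, Dict, Optional

GlyphMap = Dict[str, str]


class DailyGlyphDictionary:
    """Caches a glyph-to-character map and refetches it once per refresh period."""

    def __init__(self, fetch: Callable[[], GlyphMap], refresh_at: time = time(0, 0)) -> None:
        self._fetch = fetch            # hypothetical loader that builds today's map
        self._refresh_at = refresh_at  # assumed time of day at which the site rotates its dictionary
        self._period: Optional[date] = None
        self._mapping: GlyphMap = {}

    def _period_of(self, now: datetime) -> date:
        # Shift the clock back by the refresh time so each period starts at refresh_at.
        offset = timedelta(hours=self._refresh_at.hour, minutes=self._refresh_at.minute)
        return (now - offset).date()

    def mapping(self, now: Optional[datetime] = None) -> GlyphMap:
        period = self._period_of(now or datetime.now())
        if period != self._period:
            # A new period has started, so the cached map is stale.
            self._mapping = self._fetch()
            self._period = period
        return self._mapping

    def decode(self, text: str, now: Optional[datetime] = None) -> str:
        table = self.mapping(now)
        # Characters without an entry are passed through unchanged.
        return "".join(table.get(ch, ch) for ch in text)
```

In use, `fetch` would be whatever routine rebuilds the map from the site's current font files; `decode` then translates obfuscated text and triggers a refetch only when the refresh boundary has been crossed.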

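Findings

  • The dictionary is updated once at 21:30 (as of now).
  • Whether a CAPTCHA appears seems to depend on the IP address.
  • To recover: open an incognito window, log in to the account, switch IP, solve the CAPTCHA, and update the Cookie.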

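Problems

  • After a certain number of pages have been collected, the account is redirected to a CAPTCHA.
    • Fix: when the CAPTCHA appears, stop the crawler, reload the page in a browser and pass the verification by hand, then restart the crawler.
  • 403: specific pages cannot be accessed.
    • Fix: this is caused by an expired Cookie. Log in with a different account, replace the Cookie and User-Agent, then restart the crawler.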
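The two fixes above amount to a small decision rule, sketched below. The `Action` enum, `decide_action`, and the `CAPTCHA_MARKERS` substrings are hypothetical; in particular, detecting the CAPTCHA by looking for a marker in the final URL is an assumption, since the README does not say how the redirect is recognised.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    PAUSE_FOR_CAPTCHA = "pause_for_captcha"  # stop and verify by hand in a browser
    REFRESH_SESSION = "refresh_session"      # new login, new Cookie and User-Agent


# Assumed substrings that identify a verification page; adjust to the real redirect target.
CAPTCHA_MARKERS = ("verify", "captcha")


def decide_action(status: int, final_url: str) -> Action:
    """Map a response's status code and final URL to the recovery step described above."""
    url = final_url.lower()
    if any(marker in url for marker in CAPTCHA_MARKERS):
        return Action.PAUSE_FOR_CAPTCHA
    if status == 403:
        return Action.REFRESH_SESSION
    return Action.CONTINUE
```

Both non-`CONTINUE` outcomes end with restarting the crawler; they differ only in what has to be done by hand before the restart.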

Sceneries

Scenic spots in Beijing have been fully collected.

Reviews

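About 1.44 million reviews, corresponding to roughly a million users. Requesting the /review_all page directly triggers the CAPTCHA very easily. Visiting the scenic-spot or shop detail page first and then /review_all works somewhat better.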
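The request ordering described above could be expressed as a small helper like the one below. `review_request_plan` is a hypothetical name, the example URL is a placeholder, and sending the detail page as `Referer` on the second request is an assumption about what makes the ordering help; the README only states that visiting the detail page first works better.

```python
from typing import Dict, List, Tuple


def review_request_plan(detail_url: str) -> List[Tuple[str, Dict[str, str]]]:
    """Return (url, extra_headers) pairs in the order they should be requested."""
    detail_url = detail_url.rstrip("/")
    review_url = detail_url + "/review_all"
    return [
        (detail_url, {}),                       # 1. detail page of the scenic spot or shop
        (review_url, {"Referer": detail_url}),  # 2. review list, with the detail page as Referer (assumed)
    ]


# Placeholder URL; the real start URLs are not given in this README.
for url, headers in review_request_plan("https://example.com/shop/12345/"):
    print(url, headers)
```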

Checkin

Collected directly through the API. This starts once the Reviews crawl has finished.

Contributors

sugarsugarzz
