1024img

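A crawler I wrote in my spare time.

It mainly crawls two boards: [达盖尔的旗帜] and [新时代的我们].

360 Browser has a feature that downloads every image on the current page, and it lets you set rules such as image size. Unless you are downloading in bulk, that is enough, and using this crawler is actually more hassle. I have not looked for a Chrome extension that does the same.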

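Project progress

  • [ x ] Basic functionality complete
  • [ x ] 达盖尔的旗帜 tested
  • [ ] 新时代的我们 tested
  • [ ] Database reads and writes

You probably all know that this crawler has to go through a proxy. If you don't have one, just read the code 😝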

Data

The returned data structure is shown below; you can also look directly at Interfaces.ts.

{
    "postName": " [原创][[cl分享团出品]xxx[19P]",
    "postTime": "2018-03-23 21:04",
    "postUrl": "http://t66y.com/htm_data/xxx.html",
    "highlight": true,
    "done": true,
    "images": [
        {
            "url": "http://s6tu.com/images/2018/03/11/xxx.jpg",
            "index": 1,
            "id": "96w6q01I1",
            "downloaded": false,
            "retryTime": 0
        }
    ]
}
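
The actual contents of Interfaces.ts are not reproduced here, so the following is only a sketch of what the types might look like, inferred from the sample above. The names `Post`, `PostImage`, and `pendingImages` are assumptions made for illustration; only the field names come from the sample.

```typescript
// Hypothetical type names; the fields and their types are inferred from the sample JSON.
interface PostImage {
  url: string;
  index: number;
  id: string;
  downloaded: boolean;
  retryTime: number;
}

interface Post {
  postName: string;
  postTime: string; // formatted like "2018-03-23 21:04" in the sample
  postUrl: string;
  highlight: boolean;
  done: boolean;
  images: PostImage[];
}

// Illustrative helper (not from the project): images that still need downloading.
function pendingImages(post: Post): PostImage[] {
  return post.images.filter((image) => !image.downloaded);
}

const sample: Post = {
  postName: " [原创][[cl分享团出品]xxx[19P]",
  postTime: "2018-03-23 21:04",
  postUrl: "http://t66y.com/htm_data/xxx.html",
  highlight: true,
  done: true,
  images: [
    {
      url: "http://s6tu.com/images/2018/03/11/xxx.jpg",
      index: 1,
      id: "96w6q01I1",
      downloaded: false,
      retryTime: 0,
    },
  ],
};
```

Typing the result this way lets the compiler catch a misspelled field such as `retryTimes` before the crawler runs.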

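Other

Mainly I wanted to try writing a crawler in Node and to practice using async & await.

The database is not required, but I wrote that part anyway, using MongoDB. This was the more interesting bit: the official driver has an ES6 tutorial that uses co, which is built on generators. With native async & await (which is really still generators underneath), the code is more convenient to write.

Node 8 also ships with promisify built in, which saves yet another library. Nice.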
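
As a rough illustration of the built-in promisify, the sketch below wraps a callback-style function with `util.promisify` and consumes it with async & await. `fetchTitle` is a made-up stand-in for any Node-style callback API, not a function from this project.

```typescript
import { promisify } from "util";

// Hypothetical callback-style function standing in for any Node-style API
// whose last argument is an (err, result) callback.
function fetchTitle(
  url: string,
  callback: (err: Error | null, title: string) => void,
): void {
  setImmediate(() => {
    if (!url.startsWith("http")) {
      callback(new Error(`unsupported URL: ${url}`), "");
      return;
    }
    callback(null, `title of ${url}`);
  });
}

// util.promisify (available since Node 8) turns it into a function returning a Promise.
const fetchTitleAsync = promisify(fetchTitle);

async function printTitle(url: string): Promise<string> {
  // A rejected promise surfaces here as a thrown error.
  const title = await fetchTitleAsync(url);
  console.log(title);
  return title;
}
```

Before Node 8, this kind of wrapping typically needed a third-party helper library or a hand-written `new Promise(...)` wrapper.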

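The reasons people choose Node usually come down to:

  • They already know JavaScript
  • Event-based asynchronous execution
  • It is lightweight

But the main goal here is not really the crawling itself, so asynchronous concurrency is not an advantage, and everything ends up running as sequential operations. It feels like driving a tank while being told not to damage the road. For this kind of scenario, Python, crawling one page after another without having to think about anything, really is much more pleasant than Node.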
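
The one-after-another style described above might look roughly like this with async & await. This is a sketch, not the project's actual code: `ImageTask`, `Downloader`, `downloadSequentially`, and `maxAttempts` are assumptions, and only the `downloaded` and `retryTime` fields are borrowed from the sample data.

```typescript
// Hypothetical task shape, reusing the `downloaded` and `retryTime` fields from the sample data.
interface ImageTask {
  url: string;
  downloaded: boolean;
  retryTime: number;
}

// Hypothetical downloader signature; real code would perform the HTTP request here.
type Downloader = (url: string) => Promise<void>;

// Process tasks strictly one at a time: the next download starts
// only after the previous one has succeeded or run out of attempts.
async function downloadSequentially(
  tasks: ImageTask[],
  download: Downloader,
  maxAttempts = 3,
): Promise<void> {
  for (const task of tasks) {
    while (!task.downloaded && task.retryTime < maxAttempts) {
      try {
        await download(task.url);
        task.downloaded = true;
      } catch {
        // Count the failed attempt and try again until maxAttempts is reached.
        task.retryTime += 1;
      }
    }
  }
}
```

Awaiting inside a plain `for...of` loop is what keeps the work sequential; collecting the promises and passing them to `Promise.all` would run the downloads concurrently instead.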

Happy watching~~~
