SpiderUtilPackage

Continuously being updated...

A collection of commonly used utilities that make web-scraping work easier.

Author Zok
Email [email protected]
BLOG www.zhangkunzhi.com

Tool list


directory tree


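.
├── Proxy                               //      Proxy toolkit
│   ├── ZhiMaProxyPool.py               // ZhiMa proxy IP cleaning tool
│   ├── ZhiMaProxyUseDemo.py            // ZhiMa proxy pool client usage demo
│   ├── XDLProxyPool.py                 // Xun proxy (讯代理) IP cleaning tool
│   └── XDLProxyUseDemo.py              // Xun proxy (讯代理) pool client usage demo
├── Register                            //      Registration tools
│   └── MessageCode.py                  // Asynchronous SMS verification receiver
├── Cookies                             //      Cookie acquisition
│   └── MeiTuan                         // Asynchronous, concurrent batch retrieval of Meituan login cookies
├── DataMigration                       //      Cross-database migration
│   ├── db                              // Basic database wrapper package
│   └── migration                       // Migrator
├── Decode                              //      Extensible decoder
├── Jsencrypt                           //      Automatically generates encrypt-based encryption
└── README.md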


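Extensible decoder

Blog post

An encoding converter built for testing: conversions can be chained and reset, and the decoding rules can be flexibly extended.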
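
To illustrate what "chained, resettable, and extensible" could look like, here is a minimal sketch. The `Decoder` class, its method names, and the three built-in rules are assumptions made for illustration; they are not taken from the `Decode` package itself.

```python
import base64
import binascii
import urllib.parse


class Decoder:
    """Chainable converter with a pluggable rule registry (hypothetical design)."""

    def __init__(self, text):
        self._original = text  # kept so reset() can restore the input
        self.text = text
        # Built-in rules; each one maps str -> str.
        self._rules = {
            "url": urllib.parse.unquote,
            "base64": lambda s: base64.b64decode(s).decode("utf-8"),
            "hex": lambda s: binascii.unhexlify(s).decode("utf-8"),
        }

    def register(self, name, func):
        """Add or override a decoding rule, then return self for chaining."""
        self._rules[name] = func
        return self

    def apply(self, name):
        """Apply one named rule to the current text and return self."""
        self.text = self._rules[name](self.text)
        return self

    def reset(self):
        """Discard all conversions and go back to the original input."""
        self.text = self._original
        return self


# Chain two conversions: Base64 first, then URL-unquoting.
encoded = base64.b64encode(b"hello%20world").decode("ascii")
decoder = Decoder(encoded)
print(decoder.apply("base64").apply("url").text)
```

A new rule is added with `decoder.register("reverse", lambda s: s[::-1])`, after which `apply("reverse")` can take part in any chain.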


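Proxy pool cleaning tool

Blog post

Scrapers often rely on proxy IPs, many of which are paid, and using those IPs efficiently in Scrapy is rather troublesome. This tool builds a proxy pool monitor on top of ZhiMa proxy IPs: it first lays out the requirements, then manages proxy quality so that IP utilization stays high.

Key location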
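
One common way to manage proxy quality is to keep a score per proxy and evict proxies that keep failing. The sketch below is an assumption about how such a pool could work, not the logic in `ZhiMaProxyPool.py`; the class name, score values, and penalty are all hypothetical.

```python
class ProxyPool:
    """In-memory proxy pool that tracks a quality score per proxy (hypothetical)."""

    def __init__(self, initial_score=10, max_score=100):
        self.initial_score = initial_score
        self.max_score = max_score
        self._scores = {}  # proxy address -> current score

    def add(self, proxy):
        # setdefault keeps the score of a proxy that is already tracked.
        self._scores.setdefault(proxy, self.initial_score)

    def report_success(self, proxy):
        # A working proxy is promoted straight to the maximum score.
        if proxy in self._scores:
            self._scores[proxy] = self.max_score

    def report_failure(self, proxy, penalty=5):
        # Each failure costs `penalty` points; at zero the proxy is evicted.
        if proxy not in self._scores:
            return
        self._scores[proxy] -= penalty
        if self._scores[proxy] <= 0:
            del self._scores[proxy]

    def get(self):
        """Return the best-scoring proxy, or None when the pool is empty."""
        if not self._scores:
            return None
        return max(self._scores, key=self._scores.get)

    def __len__(self):
        return len(self._scores)
```

A Scrapy downloader middleware could call `get()` when assigning a proxy to a request and `report_success()` or `report_failure()` once the response or error arrives.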


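SMS verification code receiver

An asynchronous SMS receiver built on an SMS-receiving platform. The maximum concurrency is 20, and Python 3.5+ is required. Once started, it acquires phone numbers according to the configured concurrency and listens for incoming SMS messages for 60 seconds. Any number that has received no SMS after 60 seconds is blacklisted and released.

To use it with a specific website, you still need to develop a matching account registrar that calls this SMS receiver in order to register accounts automatically.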
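
The flow described above (bounded concurrency, a 60-second listening window, then blacklist and release) can be sketched with `asyncio` as follows. `FakePlatform` and its methods are hypothetical stand-ins, since the README does not show the platform's API, and the sketch uses `asyncio.get_running_loop`, which needs Python 3.7+.

```python
import asyncio

MAX_CONCURRENCY = 20  # upper bound stated in the README
WAIT_SECONDS = 60     # listening window stated in the README


class FakePlatform:
    """Stand-in for an SMS platform client; every method is hypothetical."""

    def __init__(self, inbox):
        self.inbox = inbox      # phone -> SMS text, for numbers that get one
        self.blacklisted = []
        self._next = 0

    async def get_phone(self):
        self._next += 1
        return "phone-{}".format(self._next)

    async def fetch_sms(self, phone):
        return self.inbox.get(phone)

    async def blacklist_and_release(self, phone):
        self.blacklisted.append(phone)


async def receive_one(platform, semaphore, wait_seconds, poll_interval):
    """Acquire one number and poll for its SMS until the deadline passes."""
    async with semaphore:  # limits how many numbers are handled at once
        phone = await platform.get_phone()
        loop = asyncio.get_running_loop()
        deadline = loop.time() + wait_seconds
        while loop.time() < deadline:
            sms = await platform.fetch_sms(phone)
            if sms is not None:
                return phone, sms
            await asyncio.sleep(poll_interval)
        # Nothing arrived in time: blacklist the number and release it.
        await platform.blacklist_and_release(phone)
        return phone, None


async def receive_many(platform, count, concurrency=MAX_CONCURRENCY,
                       wait_seconds=WAIT_SECONDS, poll_interval=1.0):
    """Run `count` receivers, never more than MAX_CONCURRENCY at a time."""
    semaphore = asyncio.Semaphore(min(concurrency, MAX_CONCURRENCY))
    tasks = [receive_one(platform, semaphore, wait_seconds, poll_interval)
             for _ in range(count)]
    return await asyncio.gather(*tasks)
```

An account registrar for a given site would trigger the site's "send code" step after `get_phone()` returns, then read the code from the returned SMS text.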


Cookies acquisition demo

Concurrently fetches site cookies using Pyppeteer

  • Meituan login cookies

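Cross-database migrator

This need comes up often at work:

moving scraped MongoDB data into MySQL, or moving Redis data into MongoDB. So I decided to wrap this up as a component that is easy to reuse later.

  • MySQL to MongoDB data migration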
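
The core of such a migrator can be sketched as a batch loop that is independent of any particular database driver. The `migrate` function and its parameters are hypothetical and do not reflect the actual `DataMigration` package; the in-memory source and sink below only stand in for real connections.

```python
from itertools import islice


def migrate(rows, write_batch, transform=None, batch_size=500):
    """Stream rows from a source into a sink in batches; return rows written."""
    iterator = iter(rows)
    total = 0
    while True:
        # Pull at most batch_size rows so large tables never sit fully in memory.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return total
        if transform is not None:
            batch = [transform(row) for row in batch]
        write_batch(batch)
        total += len(batch)


# In-memory demo: dict rows play the role of MySQL records,
# and a plain list plays the role of the MongoDB collection.
source_rows = [{"id": i, "name": "item-{}".format(i)} for i in range(5)]
sink = []
written = migrate(source_rows, sink.extend,
                  transform=lambda row: {"_id": row["id"], "name": row["name"]},
                  batch_size=2)
print(written, len(sink))
```

With real drivers, `rows` could be a PyMySQL `DictCursor` after `execute()`, and `write_batch` could be a pymongo collection's `insert_many`; that wiring is an assumption and is left out so the sketch runs without a database.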
