BPProxyPool

____  ____  ____                      ____             _
| __ )|  _\|  _ \ _ __ _____  ___   _|  _ \ ___   ___ | |
|  _ \| |_) | |_) | '__/ _ \ \/ / | | | |_) / _ \ / _ \| |
| |_) |  __/|  __/| | | (_) >  <| |_| |  __/ (_) | (_) | |
|____/|_|   |_|   |_|  \___/_/\_\\__, |_|   \___/ \___/|_|
                                 |___/

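A freeloader's IP pool: it crawls free proxy websites on the internet and validates the proxies it finds. This version is implemented with coroutines.

Getting started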

docker-compose

mkdir bp_proxy_pool && cd bp_proxy_pool
wget -N https://raw.githubusercontent.com/jungheil/bpproxypool/main/docker-compose.yml && docker-compose up

source

git clone https://github.com/jungheil/bpproxypool.git bp_proxy_pool
cd bp_proxy_pool
pip3 install -r requirements.txt

Run Redis yourself and edit config.py to match.

Run:

python3 bpproxypool.py launch

Config

Settings can be supplied either as environment variables (the parameter name in upper case) or by editing config.py.
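As a rough illustration of the environment-variable route, the sketch below looks up the upper-cased parameter name and falls back to a default when it is unset. The helper names (`setting`, `parse_list`, `parse_mapping`) and the default values are hypothetical and are not taken from config.py; the comma-separated and `key=value` formats follow the examples given for fetch_protocol and val_sites in the remarks below.

```python
import os


def parse_list(raw):
    """Split 'http,https' into ['http', 'https'], ignoring empty items."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def parse_mapping(raw):
    """Split 'httpbin=httpbin.org,bing=bing.com' into a dict."""
    result = {}
    for item in parse_list(raw):
        name, _, value = item.partition("=")
        result[name.strip()] = value.strip()
    return result


def setting(name, default, parser=str, environ=None):
    """Return the parsed upper-cased environment variable, else the default.

    Hypothetical helper: the project's own config loader may work differently.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(name.upper())
    return default if raw is None else parser(raw)


# An explicit mapping stands in for os.environ so the example is reproducible.
env = {
    "FETCH_PROTOCOL": "http,https",
    "VAL_SITES": "httpbin=httpbin.org,bing=bing.com",
}
protocols = setting("fetch_protocol", ["http"], parse_list, env)
sites = setting("val_sites", {}, parse_mapping, env)
port = setting("port", 8080, int, env)  # 8080 is a made-up default
```

Passing the environment as an argument keeps the helper easy to test; in real use the `environ` argument would simply be omitted.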

  • Database configuration
    name                  description                                     remark
    db_conn               Database address
    table_name            Database table name
  • API service configuration
    name                  description                                     remark
    host                  API listen address
    port                  API listen port
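  • Crawler settings
    name                  description                                     remark
    fetchers              Proxy sources                                   fetcher/fetcher.py
    fetch_proxy           Proxy used by the crawler                       Not used if empty
    val_timeout           Validation timeout
    recheck_failed_count  Number of failures tolerated
    min_pool_size         Minimum number of proxies in the pool
    get_ip_info           Whether to look up each proxy's region
    val_sites             Sites used for validation                       As an environment variable, e.g. httpbin=httpbin.org,bing=bing.com
    fetch_protocol        Protocols to crawl                              As an environment variable, e.g. http,https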
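  • Scheduler configuration
    name                  description                                     remark
    timezone              Time zone
    run_fetch_interval    Interval between crawler runs                   Skipped while the pool holds more than min_pool_size proxies
    max_fetch_interval    Interval at which the crawler is forced to run
    run_recheck_interval  Interval between pool re-validation runs
    fetch_semaphore       Number of concurrent crawler tasks
    recheck_semaphore     Number of concurrent validation tasks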
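The scheduler settings interact in two ways that a short sketch may make clearer: a scheduled crawl is skipped while the pool is large enough unless the forced interval has elapsed, and the semaphore settings cap how many tasks run at once. The code below is only an illustration under those assumptions. `should_fetch` and `run_bounded` are hypothetical names, not functions from this project, and the time unit is left abstract because the tables do not state one.

```python
import asyncio


def should_fetch(pool_size, min_pool_size, since_last_fetch, max_fetch_interval):
    """Decide whether a scheduled crawler tick should actually crawl.

    Hypothetical helper mirroring the remarks for run_fetch_interval,
    min_pool_size and max_fetch_interval.
    """
    if since_last_fetch >= max_fetch_interval:
        return True  # forced run, regardless of pool size
    # Otherwise crawl only when the pool is not above the minimum size.
    return pool_size <= min_pool_size


async def run_bounded(jobs, limit):
    """Run coroutine functions with at most `limit` of them active at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo():
    running = 0
    peak = 0

    async def fake_fetch():
        # Stand-in for a real crawl; records how many run concurrently.
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)
        running -= 1
        return "done"

    results = await run_bounded([fake_fetch] * 6, limit=2)
    return results, peak


results, peak = asyncio.run(demo())
```

Here `limit` plays the role of fetch_semaphore or recheck_semaphore: six jobs are submitted, but no more than two are ever in flight.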

API

api      method  description             params    example
/        GET     API overview            None
/get     GET     Get a random proxy      protocol  ?protocol=http,https
/pop     GET     Get and remove a proxy  protocol
/all     GET     Get all proxies         protocol
/count   GET     Count proxies           None
/delete  GET     Delete a proxy          addr      ?addr=192.168.16.1:6666
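A minimal client sketch for the endpoints above, using only the Python standard library. The base URL is an assumption (substitute the configured host and port), and the response body is returned as raw text because its format is not documented here. `build_url` and `call` are illustrative names, not part of the project.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the API is reachable here; use your configured host and port.
BASE_URL = "http://127.0.0.1:5000"


def build_url(path, base_url=BASE_URL, **params):
    """Build a request URL, keeping ',' and ':' readable in the query string."""
    query = urlencode(params, safe=",:")
    return f"{base_url}{path}?{query}" if query else f"{base_url}{path}"


def call(path, **params):
    """Send a GET request and return the response body as text."""
    with urlopen(build_url(path, **params), timeout=10) as response:
        return response.read().decode("utf-8")


# URLs matching the examples in the table (no request is sent here).
get_url = build_url("/get", protocol="http,https")
delete_url = build_url("/delete", addr="192.168.16.1:6666")
count_url = build_url("/count")
```

With the service running, `call("/get", protocol="http")` would fetch one proxy and `call("/count")` would report the pool size.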

Custom Fetcher and Validator

  • Fetcher

    fetcher/fetcher

  • Validator

    helper/validator.py
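The interface a custom fetcher has to implement is not described here, so the following is only a sketch of the part most fetchers probably need: pulling `ip:port` candidates out of a page of text. `extract_proxies` and `PROXY_PATTERN` are hypothetical names; check the existing fetchers in the repository for the signature the project actually expects.

```python
import re

# Matches dotted-quad addresses followed by a port, e.g. 192.168.16.1:6666.
PROXY_PATTERN = re.compile(r"\b((?:\d{1,3}\.){3}\d{1,3}):(\d{2,5})\b")


def extract_proxies(text):
    """Return unique 'ip:port' strings found in text, in order of appearance.

    Hypothetical helper, not part of the project's fetcher API.
    """
    seen = set()
    proxies = []
    for ip, port in PROXY_PATTERN.findall(text):
        octets_ok = all(int(part) <= 255 for part in ip.split("."))
        port_ok = 0 < int(port) <= 65535
        addr = f"{ip}:{port}"
        if octets_ok and port_ok and addr not in seen:
            seen.add(addr)
            proxies.append(addr)
    return proxies


sample = (
    "<td>192.168.16.1:6666</td><td>10.0.0.300:80</td>"
    "<td>192.168.16.1:6666</td><td>8.8.8.8:3128</td>"
)
found = extract_proxies(sample)
```

The sample deliberately contains an invalid octet and a duplicate, both of which are dropped before the candidates would be handed to validation.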

Acknowledgment
