ballyregan's Introduction

Ballyregan

Get working free proxies fast.

Ballyregan is a super-fast proxy fetcher.
It provides a Python package and an easy-to-use CLI to help you fetch free, tested proxies quickly while keeping your privacy.


Install

All you need to do is install the package from PyPI; it will automatically install the CLI for you.

pip install ballyregan

Usage

📦 Package

Create a fetcher instance

from ballyregan import ProxyFetcher

# Setting the debug mode to True, defaults to False
fetcher = ProxyFetcher(debug=True)

Get one proxy

proxy = fetcher.get_one()
print(proxy)

Get multiple proxies

proxies = fetcher.get(limit=4)
print(proxies)

Get proxies by filters

from ballyregan.models import Protocols, Anonymities

proxies = fetcher.get(
  limit=4,
  protocols=[Protocols.HTTPS, Protocols.SOCKS5],
  anonymities=[Anonymities.ELITE]
)
print(proxies)
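
Use a fetched proxy with an HTTP client

Once you have a proxy, you can plug it into the HTTP client of your choice. Below is a minimal sketch using requests; it assumes (as the issue snippets further down suggest) that str(proxy) renders as a protocol://ip:port URL.

import requests

from ballyregan import ProxyFetcher

fetcher = ProxyFetcher()
proxy = fetcher.get_one()

# Assumption: str(proxy) yields something like "http://1.2.3.4:8080".
response = requests.get(
    "https://api.ipify.org",
    proxies={"http": str(proxy), "https": str(proxy)},
    timeout=10,
)
print(response.text)  # Should print the proxy's IP rather than your own.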

💻 CLI

Need some help?

ballyregan get --help

Get one proxy

ballyregan get

Get all proxies

ballyregan get --all

Use debug mode

ballyregan --debug get [OPTIONS]

Format output to json

ballyregan get -o json

Get proxies by limit

ballyregan get -l 4

Get proxies by filters

ballyregan get -l 4 -p https -p socks5 -a elite

How does it work?

When you use the ProxyFetcher to fetch a proxy, it performs several steps:

  1. Gather all the available proxies from a list of built-in providers (each provider gathers its own proxies and returns them to the fetcher).
  2. Filter the gathered proxies by the given protocols and anonymities (if any).
  3. Validate the filtered proxies and return them.

Note
You can write and append your own custom providers and pass them to the ProxyFetcher class as an attribute.
Every custom proxy provider must implement the IProxyProvider base interface.
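
A rough sketch of what that might look like (StaticProvider is hypothetical, and assigning to the fetcher's _proxy_providers attribute mirrors a test snippet in the issues below, so treat it as an implementation detail rather than a stable API):

from dataclasses import dataclass
from typing import Any, List

from ballyregan import Proxy, ProxyFetcher
from ballyregan.providers import IProxyProvider


@dataclass
class StaticProvider(IProxyProvider):
    """Hypothetical provider that returns a single hard-coded proxy."""
    url: str = ''  # not used by this toy provider

    def _get_raw_proxies(self) -> List[Any]:
        # A real provider would scrape a site or call an API here.
        return [{'protocol': 'http', 'ip': '127.0.0.1', 'port': '8080', 'anonymity': 'elite'}]

    @staticmethod
    def raw_proxy_to_object(raw_proxy: Any) -> Proxy:
        return Proxy(
            protocol=raw_proxy['protocol'],
            ip=raw_proxy['ip'],
            port=raw_proxy['port'],
            anonymity=raw_proxy['anonymity']
        )


fetcher = ProxyFetcher()
fetcher._proxy_providers = [StaticProvider()]  # private attribute; may change between releases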

Behind the scenes

Fetching a proxy is an IO-bound operation that depends on the network. A common approach to this problem is to perform the network requests asynchronously.
After digging a bit and testing threads, greenlets, and async operations, we decided to go the async way.
To perform async HTTP requests, ballyregan uses aiohttp and asyncio, since "asyncio is often a perfect fit for IO-bound and high-level structured network code" (from the asyncio docs).
By using the power of async HTTP requests, ballyregan can validate thousands of proxies very quickly.
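
The validation pattern looks roughly like the following (a simplified sketch of concurrent proxy checks with aiohttp, not ballyregan's actual validator; the judge URL and timeout are arbitrary, and aiohttp's built-in proxy argument only handles HTTP proxies):

import asyncio
from typing import List

import aiohttp


async def _is_alive(session: aiohttp.ClientSession, proxy_url: str, judge: str) -> bool:
    """Return True if a request through the proxy succeeds within the timeout."""
    try:
        async with session.get(judge, proxy=proxy_url, timeout=aiohttp.ClientTimeout(total=5)) as resp:
            return resp.status == 200
    except Exception:
        return False


async def check_many(proxy_urls: List[str], judge: str = "http://httpbin.org/ip") -> List[str]:
    # One shared session, all checks scheduled concurrently.
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(*(_is_alive(session, p, judge) for p in proxy_urls))
    return [p for p, ok in zip(proxy_urls, results) if ok]


if __name__ == "__main__":
    alive = asyncio.run(check_many(["http://127.0.0.1:8080"]))
    print(alive)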

๐Ÿ“ License

Copyright © 2022 Idan Daniel.
This project is licensed under the Apache License, Version 2.0.

ballyregan's People

Contributors

idandaniel

ballyregan's Issues

Hello

Hello,
please show an example of using a proxy validator.

ProxyScrape proxy provider

You can add a ProxyScrape provider.

proxyscrape.py

from dataclasses import dataclass
from typing import List

from ballyregan import Proxy
from ballyregan.models import Protocols
from ballyregan.providers import IProxyProvider
from ballyregan.core.exceptions import ProxyGatherException

@dataclass
class ProxyscrapeProvider(IProxyProvider):
    # https://docs.proxyscrape.com/
    url: str = 'https://api.proxyscrape.com/v2/?request=getproxies&country=all&ssl=all'

    def _get_raw_proxies(self) -> List[str]:
        proxies = []

        for anonymity in ['elite', 'transparent', 'anonymous']:
            for protocol in Protocols.values():
                try:
                    proxies_response = self._session.get(f'{self.url}', params={'protocol': protocol, 'anonymity': anonymity})
                    if not proxies_response.ok:
                        raise ProxyGatherException
                except (IndexError, ValueError, ConnectionError) as e:
                    raise ProxyGatherException from e
                else:
                    for proxy in proxies_response.text.splitlines():
                        ip, port = proxy.split(':')
                        proxies.append({
                            'protocol': protocol,
                            'ip': ip,
                            'port': port,
                            'anonymity': anonymity
                        })

        return proxies

    @staticmethod
    def raw_proxy_to_object(raw_proxy: dict) -> Proxy:
        return Proxy(
            protocol=raw_proxy['protocol'],
            ip=raw_proxy['ip'],
            port=raw_proxy['port'],
            anonymity=raw_proxy['anonymity']
        )

test_proxyscrapeprovider.py

# Assuming the provider above lives in proxyscrape.py
from ballyregan import ProxyFetcher
from proxyscrape import ProxyscrapeProvider


def test_proxyscrapeprovider():
    fetcher = ProxyFetcher(debug=False)
    fetcher._proxy_providers = [ProxyscrapeProvider()]

    proxies = fetcher._get_all_proxies_from_providers()
    assert len(proxies) > 0

    valid_proxies = fetcher._proxy_validator.filter_valid_proxies(proxies)
    assert len(valid_proxies) > 0

ballyregan.validator:is_proxy_valid

Describe the bug
I get many ERROR messages.

To Reproduce
Steps to reproduce the behavior:

  1. python -m venv TEST
  2. source TEST/bin/activate
  3. pip install git+https://github.com/idandaniel/ballyregan.git
  4. ballyregan get

Expected behavior
A list with proxy servers.

Output

2023-03-22 21:17:36.932 | ERROR    | ballyregan.validator:is_proxy_valid:61 - Unknown exception has occured while validating proxy: 'protocols.http' is not a valid ProxyType
2023-03-22 21:17:36.932 | ERROR    | ballyregan.validator:is_proxy_valid:61 - Unknown exception has occured while validating proxy: 'protocols.socks5' is not a valid ProxyType
2023-03-22 21:17:36.932 | ERROR    | ballyregan.validator:is_proxy_valid:61 - Unknown exception has occured while validating proxy: 'protocols.http' is not a valid ProxyType

Desktop:

  • OS: Arch Linux
  • Python-Version: 3.11.2 (installed for the user with pyenv)

Unable to work inside async functions

Describe the bug
Can't run properly in async functions.

To Reproduce
Steps to reproduce the behavior:

  1. Create an async function which would try to fetch proxies via ballyregan.
  2. Call the async function.
  3. See error.

Expected behavior
Normal operation, as with synchronous calling.

Smartphone:

  • Device: Poco F3
  • OS: Android 12

Additional context
The library seems to work fine with simple synchronous calls, but it's always spitting out an error whenever I'm trying to use it in async functions.

Here's one of the errors I get:

Traceback (most recent call last):
  File "/data/data/com.termux/files/usr/lib/python3.11/site-packages/aiohttp/web_protocol.py", line 433, in _handle_request
    resp = await request_handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/data/com.termux/files/usr/lib/python3.11/site-packages/aiohttp/web_app.py", line 504, in _handle
    resp = await handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/storage/emulated/0/tmp/apihost-tts-proxy.py", line 37, in handle_request
    fetch_proxies()
  File "/storage/emulated/0/tmp/apihost-tts-proxy.py", line 18, in fetch_proxies
    raw_proxies = fetcher.get(
                  ^^^^^^^^^^^^
  File "/data/data/com.termux/files/usr/lib/python3.11/site-packages/ballyregan/fetcher.py", line 134, in get
    proxies = self._gather(
              ^^^^^^^^^^^^^
  File "/data/data/com.termux/files/usr/lib/python3.11/site-packages/ballyregan/fetcher.py", line 95, in _gather
    valid_proxies = self._proxy_validator.filter_valid_proxies(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/data/com.termux/files/usr/lib/python3.11/site-packages/ballyregan/validator.py", line 129, in filter_valid_proxies
    return self.loop.run_until_complete(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/data/com.termux/files/usr/lib/python3.11/asyncio/base_events.py", line 629, in run_until_complete
    self._check_running()
  File "/data/data/com.termux/files/usr/lib/python3.11/asyncio/base_events.py", line 590, in _check_running
    raise RuntimeError(
RuntimeError: Cannot run the event loop while another loop is running
/data/data/com.termux/files/usr/lib/python3.11/asyncio/base_events.py:1907: RuntimeWarning: coroutine 'ProxyValidator._async_filter_valid_proxies' was never awaited
  handle = self._ready.popleft()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback

I even tried using it as a normal synchronous, blocking function, but it still throws an error.

Here's my GPT-4 code (lol) that I was trying to get working with it:

import random
from aiohttp import web, ClientSession
from ballyregan import ProxyFetcher
from ballyregan.models import Protocols

# A list to hold our proxies and number of uses
proxies = []

MAX_USES = 10  # Maximum number of uses per proxy
REFETCH_NO = 20  # Number of proxies you want to fetch initially and when the list is exhausted

# Prepare the proxy fetcher
fetcher = ProxyFetcher(debug=True)

# Fetch and format proxies
def fetch_proxies():
    proxies.clear()
    raw_proxies = fetcher.get(
        limit=REFETCH_NO,
        protocols=[Protocols.HTTPS, Protocols.SOCKS4, Protocols.SOCKS5]
    )
    for p in raw_proxies:
        proxies.append({
            "proxy": f'{p.ip}:{p.port}',
            "type": {'https': 'HTTPS', 'socks4': 'SOCKS4', 'socks5': 'SOCKS5'}.get(p.protocol.lower()),
            "count": 0,
        })
    print(f'Fetched {len(proxies)} proxies. Proxy list: {proxies}')


async def handle_request(request):
    # Prepare headers for forward request
    headers = {k: v for k, v in request.headers.items()}
    url = 'https://hidden.due/to/privacy/reasons'
    while True:
        if not proxies:  # If no proxies left
            fetch_proxies()

        cur_proxy = random.choice(proxies)
        count = cur_proxy["count"]

        if count >= MAX_USES: # If proxy use count exceed the maximum use limit
            proxies.remove(cur_proxy) # If so, remove the proxy from the list
        else:
            try:
                cur_proxy["count"] += 1
                async with ClientSession() as session:
                    # Correctly format the proxy URL depending on the type
                    proxy_url = {'HTTP': 'http://', 'HTTPS': 'https://', 'SOCKS4': 'socks4://', 'SOCKS5': 'socks5://'}.get(cur_proxy['type'], 'http://') + cur_proxy["proxy"]
                    print(f'Proxy url is {proxy_url}')
                    async with session.request(
                        request.method,
                        url,
                        headers=headers,
                        allow_redirects=False,
                        data=await request.read(),
                        proxy=proxy_url
                    ) as resp:
                        resp_headers = {k: v for k, v in resp.headers.items()}
                        body = await resp.text()
                        break
            except Exception:
                proxies.remove(cur_proxy)  # If the proxy is not working, remove it from the list

    return web.Response(
        headers=resp_headers,
        status=resp.status,
        body=body
    )

app = web.Application()
app.router.add_route('*', '/{tail:.*}', handle_request)
fetch_proxies()  # Fetch proxies initially

try:
    web.run_app(app, port=1337)
except KeyboardInterrupt:
    print("KeyboardInterrupt received, exiting...")

As expected, it works for me at the start, but throws an error every time the script tries to fetch new proxies afterwards.
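
For what it's worth, one possible workaround when calling the fetcher from inside an already-running event loop is to push the blocking call onto a worker thread (a sketch, not an official fix; whether it helps depends on how the fetcher binds its internal loop):

import asyncio

from ballyregan import ProxyFetcher


def fetch_proxies_blocking(limit: int = 20):
    # Create the fetcher in the worker thread so any event loop it sets up
    # belongs to that thread, not to the already-running aiohttp loop.
    fetcher = ProxyFetcher()
    return fetcher.get(limit=limit)


async def fetch_proxies_async(limit: int = 20):
    # Calling fetcher.get() directly from a coroutine fails because the fetcher
    # calls run_until_complete while this thread's loop is already running.
    # asyncio.to_thread requires Python 3.9+.
    return await asyncio.to_thread(fetch_proxies_blocking, limit)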

ballyregan.core.exceptions.NoInternetConnection

Linux pop7550 6.0.6-76060006-generic #202210290932166906205022.04~d94609a SMP PREEMPT_DYNAMIC Mon N x86_64 x86_64 x86_64 GNU/Linux

Python 3.10.6

julien@pop7550:~$ ballyregan get --all
Traceback (most recent call last):
  File "/home/julien/.local/bin/ballyregan", line 5, in <module>
    from cli.app import run
  File "/home/julien/.local/lib/python3.10/site-packages/cli/app.py", line 14, in <module>
    fetcher = ProxyFetcher()
  File "<string>", line 7, in __init__
  File "/home/julien/.local/lib/python3.10/site-packages/ballyregan/fetcher.py", line 37, in __post_init__
    raise NoInternetConnection
ballyregan.core.exceptions.NoInternetConnection
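
From the traceback, the exception is raised in ProxyFetcher's __post_init__ connectivity check, so at a minimum it can be caught explicitly (a small sketch; retry behaviour is up to you):

from ballyregan import ProxyFetcher
from ballyregan.core.exceptions import NoInternetConnection

try:
    fetcher = ProxyFetcher()
except NoInternetConnection:
    # The connectivity check failed; retry later or surface a clearer message.
    fetcher = None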

Not Working, ballyregan.core.exceptions.NoProxiesFound: Could not find any proxies

Describe the bug
I can't get any proxies; I think something is wrong.

To Reproduce
Steps to reproduce the behavior:
Just doing this:

fetcher = ProxyFetcher(debug=True)
proxy = fetcher.get_one()
print(proxy)

Expected behavior
Get a proxy.


RuntimeError

RuntimeError                              Traceback (most recent call last)
in <cell line: 1>()
----> 1 proxy = fetcher.get_one()
      2 print(proxy)

4 frames
/usr/lib/python3.10/asyncio/base_events.py in _check_running(self)
    582     def _check_running(self):
    583         if self.is_running():
--> 584             raise RuntimeError('This event loop is already running')
    585         if events._get_running_loop() is not None:
    586             raise RuntimeError(

RuntimeError: This event loop is already running
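
This looks like the same event-loop clash as the async issue above, here triggered by a notebook's already-running loop. A possible workaround (not an official fix) is nest_asyncio, which patches the running loop to allow re-entrant run_until_complete calls:

import nest_asyncio
nest_asyncio.apply()  # patch the notebook's running loop to allow nested run_until_complete

from ballyregan import ProxyFetcher

fetcher = ProxyFetcher()
proxy = fetcher.get_one()
print(proxy)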

How to append own custom providers?

I would like to develop and test other proxy providers.

You mention:
You can write and append your own custom providers and pass it to the ProxyFetcher class as attribute.
Every custom proxy provider must implement the IProxyProvider base interface.

from dataclasses import dataclass
from typing import Any, List

from ballyregan import Proxy
from ballyregan.providers import IProxyProvider

@dataclass
class TestProvider(IProxyProvider):
   def _get_raw_proxies(self) -> List[Any]:
      ...  # Do some stuff

   @staticmethod
   def raw_proxy_to_object(raw_proxy: Any) -> Proxy:
      ...  # Do some stuff

I don't understand how to pass TestProvider to the ProxyFetcher class as an attribute.
fetcher = ProxyFetcher( ??? TestProvider ??? )
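
Based on the test snippet in the ProxyScrape issue above, assigning to the fetcher's private provider list appears to work; it is an implementation detail, so the exact mechanism may differ between releases (check the ProxyFetcher signature):

from ballyregan import ProxyFetcher

fetcher = ProxyFetcher()
# Private attribute used in the ProxyScrape test above; may change between releases.
fetcher._proxy_providers = [TestProvider()]
proxy = fetcher.get_one()
print(proxy)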

PATH issue with pytest: ImportError: No module named 'moduleone'

When running pytest from the project root directory, imports of local modules fail with E ModuleNotFoundError: No module named 'moduleone'.

Solution:
Recently, pytest (>=7) added a core plugin that supports sys.path modifications via the pythonpath configuration value. The fix is thus much simpler and no longer requires workarounds; add this to pyproject.toml:

[tool.pytest.ini_options]
pythonpath = [
  "."
]

Get proxy response time

It would be interesting to have the response time (in milliseconds) for each proxy.
That way, proxies could be filtered by a maximum response time.
For example:

proxies = fetcher.get(
  timeout=3000
)
print(proxies)
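
Until such a filter exists, response times can be measured client-side. A rough sketch with requests (the timeout value and test URL are arbitrary, and using str(proxy) as a proxy URL is an assumption based on the issue snippets below):

import time

import requests

from ballyregan import ProxyFetcher

MAX_RESPONSE_TIME_MS = 3000

fetcher = ProxyFetcher()
fast_proxies = []

for proxy in fetcher.get(limit=10):
    start = time.perf_counter()
    try:
        requests.get(
            "https://api.ipify.org",
            proxies={"http": str(proxy), "https": str(proxy)},
            timeout=MAX_RESPONSE_TIME_MS / 1000,
        )
    except requests.RequestException:
        continue  # unreachable or slower than the timeout
    elapsed_ms = (time.perf_counter() - start) * 1000
    fast_proxies.append((proxy, round(elapsed_ms)))

print(fast_proxies)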

Readme Typo

bellyregan instead of ballyregan under "Get proxies by limit" and "Get proxies by filters" in the readme.

Wrong detection of HTTP proxies as HTTPS proxies?

Describe the bug
With some HTTPS proxies, got this error:
Proxy: https://208.82.61.66:3128 is invalid. Error:HTTPSConnectionPool(host='api.ipify.org', port=443): Max retries exceeded with url: / (Caused by ProxyError('Your proxy appears to only use HTTP and not HTTPS, try changing your proxy URL to be HTTP. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#https-proxy-error-http-proxy', SSLError(SSLError(1, '[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1129)'))))

It seems that HTTP proxies are wrongly detected as HTTPS proxies.

To Reproduce

import requests

from ballyregan import ProxyFetcher
from ballyregan.models import Protocols

fetcher = ProxyFetcher(debug=False)
proxies = fetcher.get(protocols=[Protocols.HTTPS])

with requests.session() as reqs:
    for idx in range(len(proxies)):
        reqs.proxies.update({
            "http": str(proxies[idx]),
            "https": str(proxies[idx]),
        })
        try:
            response = reqs.get('https://api.ipify.org', timeout=15, verify=False)
            # response = reqs.get('http://httpheader.net/azenv.php', timeout=15, verify=False)
            assert response.status_code == 200
            print(f"INFO Proxy: {proxies[idx]} is valid")
        except Exception as e:
            print(f"WARNING Proxy: {proxies[idx]} is invalid. Error: {str(e)}")

Expected behavior
With the proxy https://168.119.175.224:3128, changing it to http://168.119.175.224:3128 works fine.
Ref: urllib3 - Your proxy appears to only use HTTP and not HTTPS
