
DirtyProxy


A quick and easy proxy scraper!

That does not mean it lacks features ;)

Features

  • High-Performance Asynchronous Scraping
  • Highly Configurable
  • Comes with a built-in proxy checker
  • Comes with a default list of proxy sources
  • Lightweight (~6% CPU usage on a 4-core i7)
  • Filters out duplicate proxies

Usage

Please note: proxy checking (enabled by default) will make scraping take MUCH longer!

Using Default Parameters

var scraper = new ProxyScraper(ProxyScraper.DefaultList);
var proxies = await scraper.ScrapeAsync();

await File.WriteAllLinesAsync("validProxies.txt", proxies.ValidProxies.Select(x => x.ToString()));
await File.WriteAllLinesAsync("validSources.txt", proxies.ValidSources.Select(x => x.Trim()));
await File.WriteAllLinesAsync("allProxies.txt", proxies.Proxies.Select(x => x.ToString()));

Using Custom Proxy Source List

var sources = new[]
{
    "https://source.proxy.list",
    "https://other.source.proxy.list"
};
// You can use your own list, or the list included by default!
var scraper = new ProxyScraper(sources);

...

Using Custom Proxy Checker

var scraper = new ProxyScraper(ProxyScraper.DefaultList, async proxy =>
{
    try
    {
        // Dispose the client and token source even if the request throws
        using var wc = new WebClient();
        wc.Proxy = new WebProxy(proxy.ToString());
        wc.Headers[HttpRequestHeader.UserAgent] = ProxyScraper.DefaultAgent;
        // Cancel the request if it takes longer than 10 seconds
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10));
        cts.Token.Register(wc.CancelAsync);
        await wc.OpenReadTaskAsync("https://google.com");
        return true;
    }
    catch
    {
        return false;
    }
});

...

Using Custom User Agent

// You can use any user agent you want!
var scraper = new ProxyScraper(ProxyScraper.DefaultList, "Your user agent");
var proxies = await scraper.ScrapeAsync();

...

Using Custom Request Timeouts

var scraper = new ProxyScraper(ProxyScraper.DefaultList, checkTimeout: 5, scrapeTimeout: 2);
var proxies = await scraper.ScrapeAsync();

...

Fast Scraping (without proxy validation)

// Disable proxy checking
var scraper = new ProxyScraper(ProxyScraper.DefaultList, checkProxies: false);
var proxies = await scraper.ScrapeAsync();

...

Custom Proxy Check URL

// Verify that proxies can successfully connect to a specific URL
var scraper = new ProxyScraper(ProxyScraper.DefaultList, checkUrl: "https://google.ca");
var proxies = await scraper.ScrapeAsync();

...

Misc Configuration

// Number of concurrent tasks used for proxy checking (mostly I/O-bound waiting)
ProxyScraper.CheckTasks = 300;
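
The options above can be combined. A minimal sketch, assuming the named constructor parameters shown in the earlier examples can be passed together:

```csharp
// Sketch only: combines options from the sections above; assumes the
// ProxyScraper constructor accepts these named parameters together.
ProxyScraper.CheckTasks = 300;
var scraper = new ProxyScraper(
    ProxyScraper.DefaultList,
    checkUrl: "https://google.ca",
    checkTimeout: 5,
    scrapeTimeout: 2);
var proxies = await scraper.ScrapeAsync();
await File.WriteAllLinesAsync("validProxies.txt",
    proxies.ValidProxies.Select(x => x.ToString()));
```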
