Comments (13)

ericjung commented on June 7, 2024

Thanks for the suggestion. Is this something you're planning on implementing?

from firefox-extension.

KOLANICH commented on June 7, 2024
  1. For now I'm not planning to implement it.
  2. I can share the trie lookup implementation I wrote for a .pac file; I guess it may make sense to use it.
  3. I guess that uBlock Origin has to do similar tasks, and its matching engine may already contain the needed code.
  4. Ideally we need to decouple the storage and updating of lists into a separate addon, and the matching engine that constructs a trie into yet another addon, so that the devs of each addon don't have to implement these themselves. Again, uBO is the most likely candidate to borrow the relevant parts from. See open-source-ideas/ideas#60 for more info. I'd like to add that the subscription manager addon should allow 2 storage models: one in its own storage, and another on the side of another addon, with the storage controlled through message passing. This is needed because we can precompile a trie and store it in serialized form, so we don't have to regenerate it on every startup.
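
A minimal sketch of the "precompile once, store serialized" idea from point 4, assuming a trie keyed by domain labels in reverse order. The names `buildTrie`, `matches` and `END` are hypothetical and are not taken from FoxyProxy or uBlock Origin; plain JSON is just one possible serialized form.

```javascript
// Hypothetical helpers for illustration only.
// END marks "a listed domain ends at this node"; "$" cannot be a real label.
const END = "$";

// Build a trie keyed by domain labels in reverse order (TLD first).
function buildTrie(domains) {
	const root = Object.create(null);
	for (const domain of domains) {
		let node = root;
		for (const label of domain.toLowerCase().split(".").reverse()) {
			if (!Object.hasOwn(node, label)) node[label] = Object.create(null);
			node = node[label];
		}
		node[END] = true;
	}
	return root;
}

// True if the host itself or one of its parent domains is listed.
function matches(trie, host) {
	let node = trie;
	for (const label of host.toLowerCase().split(".").reverse()) {
		if (!Object.hasOwn(node, label)) return false;
		node = node[label];
		if (node[END] === true) return true;
	}
	return false;
}

// Precompile once, keep the string in whatever storage the addon uses ...
const serialized = JSON.stringify(buildTrie(["github.io", "example.com"]));
// ... and on startup restore it without rebuilding from the raw list.
const restored = JSON.parse(serialized);
```

Because the trie is made of plain objects, it survives a JSON round trip unchanged, so the restored structure can be queried directly with `matches(restored, host)`.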

ericjung commented on June 7, 2024

I am curious to see your implementation for a PAC file if it handles regular expressions. I don’t understand how a trie is going to work with regular expressions that can match any number of domains and/or subdomains.

KOLANICH commented on June 7, 2024

It doesn't handle regular expressions, I just use a pac file:

`my.pac`
"use strict";
const neededProxy = "HTTP some.proxy.example:3128"

function Node(suffixes){
	let processedSuffixes = [];
	for(let s of suffixes){
		if(s instanceof Array){
			let [prefix, childSuffixes] = s
			s = [prefix, new Node(childSuffixes)];
		} else {
			s = [s, null];
		}
		processedSuffixes.push(s);
	}
	this.suffixes = new Map(processedSuffixes);
}

Node.prototype.get1Level = function(comp){
	if(this.suffixes.has(comp)){
		let res = this.suffixes.get(comp);
		if(res === null){
			return true;
		}
		return res;
	}
	return false;
}

Node.prototype.testSplitted = function(uriSplitted){
	let checker = this; 
	for(let comp of uriSplitted){
		checker = checker.get1Level(comp);
		if(! (checker instanceof Node)){
			return checker;
		}
	}
	return false;
}

Node.prototype.test = function(domain){
	return this.testSplitted(domain.split(".").reverse());
}

const domainz = new Node([
	["a", [
		"b",
		"c",
		"d",
		["e", [
			"f",
			"g"
		]],
		["h", ["i"]],
		
	]],
	[
		"j",
		[
			[
				"k",
				["l"]
			],
			"m",
			["n", [
				"o"
			]],
			"p",
			["q", ["r"]],
			"s",
			"t",
		]
	],
	["u", [
		"v",
		["w", [
			"x"
		]]
	]],
	["y", [
		"z"
	]]
]);

function FindProxyForURL(url, host) {
	if(url.startsWith("https:") && domainz.test(host)){
		return neededProxy;
	}
	return "DIRECT";
}

For regexps there is a tool (https://github.com/ermanh/trieregex), but it only generates regexes and it is written in Python. If you would like to port it, it may make sense to rewrite it in Haxe rather than in JS directly.

ericjung commented on June 7, 2024

How do you propose to handle regular expressions in a trie?

KOLANICH commented on June 7, 2024

As a first step: don't handle them at all, except for wildcard ones (my example handles wildcards through a special node). In the future it may be possible to parse regexps and integrate them into the automaton, but not now; that is a lot of work. For now it is desirable to achieve just the simplest case. I have a list of domains that must be accessed through a proxy because otherwise there is no access to them. Not regexps, but a pretty long list of domains, one per line.
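
For the flat-list case, here is a rough sketch of turning a line-by-line domain list into the nested-array literal that the `Node` constructor in the PAC example takes. The function name `toNestedSuffixes` is hypothetical. Since a leaf in that example also matches all of its subdomains, the sketch assumes a listed domain can be emitted as a leaf and any deeper entries under it dropped.

```javascript
// Hypothetical helper: flat list of domains -> nested [label, [children]] arrays.
function toNestedSuffixes(lines) {
	const root = { terminal: false, children: new Map() };
	for (const line of lines) {
		const domain = line.trim().toLowerCase();
		if (domain === "") continue; // skip blank lines
		let node = root;
		// Walk the labels from the TLD downwards, creating nodes as needed.
		for (const label of domain.split(".").reverse()) {
			if (!node.children.has(label)) {
				node.children.set(label, { terminal: false, children: new Map() });
			}
			node = node.children.get(label);
		}
		node.terminal = true;
	}
	// A listed domain becomes a plain string leaf; anything else keeps its subtree.
	const emit = (node) => [...node.children].map(([label, child]) =>
		child.terminal ? label : [label, emit(child)]);
	return emit(root);
}

const nested = toNestedSuffixes(["b.a", "c.a", "f.e.a", "", "z.y"]);
console.log(JSON.stringify(nested));
// prints [["a",["b","c",["e",["f"]]]],["y",["z"]]]
```

The result could be pasted into the PAC file, or passed to `new Node(...)` directly if the list is loaded at runtime.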

erosman commented on June 7, 2024

For reference only: tld service for webextensions

ericjung commented on June 7, 2024

I think you are suggesting two pattern matching paths; one for regular expressions and one for literals with no wildcards/regular expressions. Is that right?

KOLANICH commented on June 7, 2024

One for regular expressions and another for flat lists of domains without wildcards in the middle (since there is no backtracking); wildcards at the end are possible. To be honest, it is possible to unify them, but I don't think it is needed. In fact, I don't think the regexp path is needed at all. Regexps are slow, and applying them one by one is inefficient if you have a lot of them. Their only advantage is that they look familiar. What people really need is structured matching against a parsed URI, not against a URI as a text string. So we need a restricted DSL that will be transformed into a state machine.
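
To illustrate the difference between matching a parsed URI and matching its text, here is a small sketch. It assumes the standard `URL` class is available (it is in browsers and Node.js, but not necessarily inside a PAC sandbox). The names `parseForMatching`, `matchesRule` and the rule shape are hypothetical.

```javascript
// Hypothetical helpers for illustration only.
function parseForMatching(uri) {
	const u = new URL(uri);
	return {
		scheme: u.protocol.slice(0, -1),             // "https:" -> "https"
		hostLabels: u.hostname.split(".").reverse(), // TLD first
		path: u.pathname,
	};
}

// A rule names an optional scheme and the host labels, TLD first.
function matchesRule(parsed, rule) {
	if (rule.scheme !== undefined && rule.scheme !== parsed.scheme) return false;
	return rule.reversedHost.every((label, i) => parsed.hostLabels[i] === label);
}

const rule = { scheme: "https", reversedHost: ["com", "github"] };
const textRule = /github\.com/; // a naive pattern applied to the raw URI string

const lookalike = "https://github.com.evil.example/";
console.log(matchesRule(parseForMatching("https://gist.github.com/x"), rule)); // true
console.log(matchesRule(parseForMatching(lookalike), rule));                   // false
console.log(textRule.test(lookalike));                                         // true
```

The structured rule rejects the lookalike host because its top-level labels are `example.evil`, while the unanchored text pattern accepts it.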

The human-readable template DSL can look like a wildcard pattern for a domain name with the order of components inverted: github.com will look like com.github, without a scheme. A pattern is a sequence of tokens separated by dots:
abcd - matches the string abcd literally.
/rx/ - matches strings matching the regexp rx.
$ - means a non-wildcard match; allowed only at the end.
%<number> - borrows the first <number> tokens from the last full pattern; allowed only at the beginning. This enables things like

io.github.a
%2.b
%2.c
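
The borrowing rule shown above could be expanded roughly as follows. This is a sketch under two assumptions: "the last full pattern" means the most recent line written without a `%<number>` prefix, and tokens are plain literals (a `/rx/` token containing dots would need a real tokenizer). The name `expandPatterns` is hypothetical.

```javascript
// Hypothetical helper: expand %<number> prefixes into full patterns.
function expandPatterns(lines) {
	const out = [];
	let lastFull = null; // tokens of the most recent pattern written in full
	for (const raw of lines) {
		const line = raw.trim();
		if (line === "") continue;
		const tokens = line.split(".");
		const borrow = /^%(\d+)$/.exec(tokens[0]);
		if (borrow === null) {
			// A full pattern: remember it and pass it through unchanged.
			lastFull = tokens;
			out.push(line);
			continue;
		}
		const n = Number(borrow[1]);
		if (lastFull === null || n > lastFull.length) {
			throw new Error(`cannot borrow ${n} tokens for "${line}"`);
		}
		// Prepend the first n tokens of the last full pattern.
		out.push([...lastFull.slice(0, n), ...tokens.slice(1)].join("."));
	}
	return out;
}

console.log(expandPatterns(["io.github.a", "%2.b", "%2.c"]));
// prints [ 'io.github.a', 'io.github.b', 'io.github.c' ]
```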

The DSL looks somewhat similar to the tree in the PAC file example and can easily be transformed into the trie and back.

Another ingested format is just a flat list of domains. To convert it into the DSL, each domain name is parsed and the order of its components is inverted; then the leading %<number> prefixes are easily inferred.
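
A rough sketch of that conversion, assuming that a `%<number>` prefix always refers to the most recent pattern emitted in full and that the input order is kept as is (sorting the list first would presumably group shared prefixes better). The name `compressDomains` is hypothetical.

```javascript
// Hypothetical helper: flat domain list -> DSL lines with inferred %<number> prefixes.
function compressDomains(domains) {
	const out = [];
	let lastFull = []; // tokens of the most recent pattern emitted in full
	for (const raw of domains) {
		const domain = raw.trim().toLowerCase();
		if (domain === "") continue;
		const tokens = domain.split(".").reverse(); // github.com -> com.github
		// Count the leading tokens shared with the last full pattern.
		let shared = 0;
		while (shared < lastFull.length && shared < tokens.length
			&& lastFull[shared] === tokens[shared]) {
			shared++;
		}
		if (shared === 0 || shared === tokens.length) {
			// Nothing to borrow, or nothing would be left after the prefix.
			lastFull = tokens;
			out.push(tokens.join("."));
		} else {
			out.push(`%${shared}.${tokens.slice(shared).join(".")}`);
		}
	}
	return out;
}

console.log(compressDomains(["a.github.io", "b.github.io", "c.github.io", "example.com"]));
// prints [ 'io.github.a', '%2.b', '%2.c', 'com.example' ]
```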

ericjung commented on June 7, 2024

The DSL you recommend is not user-friendly. People will stop using FoxyProxy if they have to learn a DSL first before using it. As far as I can see, this entire suggestion is related to performance improvement. But there are no numbers showing performance degradation. I am sure that as pattern lists grow long, performance is impacted in some way, but it's anyone's guess how much. Your suggestion might be called "premature optimization" -- optimizing without knowing the impact of the current solution.

Of course the impact depends on many things, like the complexity of the regular expressions or wildcards used. I do agree most people don't need the power of regexps, but again it's an understood DSL that is ubiquitous.

KOLANICH commented on June 7, 2024

> But there are no numbers showing performance degradation.
> Your suggestion might be called "premature optimization" -- optimizing without knowing the impact of the current solution.

To measure performance degradation one has to implement the feature first.

> People will stop using FoxyProxy if they have to learn a DSL first before using it.

FoxyProxy's target audience is power users. Non-power users usually don't use addons at all.

> The DSL you recommend is not user-friendly.

Regexps are even less user-friendly. Anyway, the proposal is to keep the old methods for those who use them.

> The DSL you recommend is not user-friendly.
> but again it's an understood DSL that is ubiquitous.

I guess editing a serialized structure is even less user-friendly. The DSL is a compromise between repeating oneself (which is not user-friendly and also bloats the list) and editing a JSON- or YAML-serialized structure, which is error-prone.
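
One way to get rough numbers without touching the addon is a standalone micro-benchmark. The sketch below is not FoxyProxy code: the domain list is synthetic, a plain `Set` probed with each parent domain stands in for the trie, and `performance.now()` is assumed to be available as a global (it is in browsers and recent Node.js). Timings depend heavily on the engine and the patterns, so no figures are claimed here.

```javascript
// Synthetic data: hypothetical domains, not a real proxy list.
const domains = Array.from({ length: 5000 }, (_, i) => `site${i}.example`);

// Path 1: one anchored regexp per domain, tried one by one.
const regexps = domains.map(
	(d) => new RegExp(`(^|\\.)${d.replace(/\./g, "\\.")}$`));
function matchByRegexps(host) {
	return regexps.some((rx) => rx.test(host));
}

// Path 2: a set of listed domains, probed with the host and each parent domain.
const listed = new Set(domains);
function matchBySuffixLookup(host) {
	const labels = host.split(".");
	for (let i = 0; i < labels.length; i++) {
		if (listed.has(labels.slice(i).join("."))) return true;
	}
	return false;
}

// Run a matcher over all hosts several times; report elapsed ms and hit count.
function timeIt(fn, hosts, rounds = 20) {
	const start = performance.now();
	let hits = 0;
	for (let r = 0; r < rounds; r++) {
		for (const h of hosts) if (fn(h)) hits++;
	}
	return { ms: performance.now() - start, hits };
}

const hosts = ["www.site4999.example", "site0.example", "unlisted.invalid"];
console.log("regexps:      ", timeIt(matchByRegexps, hosts));
console.log("suffix lookup:", timeIt(matchBySuffixLookup, hosts));
```

The `hits` field is there as a sanity check that both paths agree before their timings are compared.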

erosman commented on June 7, 2024

FoxyProxy is being updated for v8.0 in preparation for manifest v3.
The next update will support full & partial URL matches, similar to the original Firefox FoxyProxy v3 (and the current Chrome FoxyProxy) before the host limit came into effect in Firefox FoxyProxy v4.

ericjung commented on June 7, 2024

We will be keeping wildcard and regular expressions in v8.0 and the future.
