Sometimes we have a list of domains to access through a proxy. So we generate a prefix

For now I'm not planning to implement it I can share my implementation

For reference only: <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1315558" rel

Use domain names prefix tree (trie) for templates storage about firefox-extension HOT 13 CLOSED

KOLANICH commented on June 7, 2024

Use domain names prefix tree (trie) for templates storage

from firefox-extension.

Comments (13)

ericjung commented on June 7, 2024

Thanks for the suggestion. Is this something you're planning on implementing?

from firefox-extension.

KOLANICH commented on June 7, 2024

For now I'm not planning to implement it
I can share my implementation of trie lookup I have created for a .pac file, but I guess it may make sense to use.
I guess that uBlock Origin has to do the similar tasks and its matching engine may already contain the needed code
Ideally we need to decouple storage and update of lists into a separate addon, and a matching engine constructing a trie into a yet another addon, in order for the devs of each addon not to have to implement it themselves. Again, uBO is the most likely candidate to rip the relevant parts. See open-source-ideas/ideas#60 for more info. I'd like to add that the subscribtion manager addon should allow 2 models of storing, one in own storage and another one on the side of another addon, but controlling the storage through message passing. It is needed because we can precompile a trie and store it in a serialized form, this way we don't have to regenerate it on every startup.

from firefox-extension.

ericjung commented on June 7, 2024

I am curious to see your implementation for a PAC file if it handles regular expressions. I don’t understand how a trie is going to work with regular expressions that can match any number of domains and/or subdomains.

from firefox-extension.

KOLANICH commented on June 7, 2024

It doesn't handle regular expressions, I just a pac file

`my.pac`

"use strict";
const neededProxy = "HTTP some.proxy.example:3128"

function Node(suffixes){
	let processedSuffixes = [];
	for(let s of suffixes){
		if(s instanceof Array){
			let [prefix, childSuffixes] = s
			s = [prefix, new Node(childSuffixes)];
		} else {
			s = [s, null];
		}
		processedSuffixes.push(s);
	}
	this.suffixes = new Map(processedSuffixes);
}

Node.prototype.get1Level = function(comp){
	if(this.suffixes.has(comp)){
		let res = this.suffixes.get(comp);
		if(res === null){
			return true;
		}
		return res;
	}
	return false;
}

Node.prototype.testSplitted = function(uriSplitted){
	let checker = this; 
	for(let comp of uriSplitted){
		checker = checker.get1Level(comp);
		if(! (checker instanceof Node)){
			return checker;
		}
	}
	return false;
}

Node.prototype.test = function(domain){
	return this.testSplitted(domain.split(".").reverse());
}

const domainz = new Node([
	["a", [
		"b",
		"c",
		"d",
		["e", [
			"f",
			"g"
		]],
		["h", ["i"]],
		
	]],
	[
		"j",
		[
			[
				"k",
				["l"]
			],
			"m",
			["n", [
				"o"
			]],
			"p",
			["q", ["r"]],
			"s",
			"t",
		]
	],
	["u",
		"v",
		["w", [
			"x"
		]]
	],
	["y", [
		"z"
	]]
]);

function FindProxyForURL(url, host) {
	if(url.startsWith("https:") && domainz.test(host)){
		return neededProxy;
	}
	return "DIRECT";
}

For regexps there is a tool https://github.com/ermanh/trieregex , but it only generates regex0s and it is in python. If you would like to rewrite it, it may make sense to rewrite into haxe, not in JS directly.

from firefox-extension.

ericjung commented on June 7, 2024

How do you propose to handle regular expressions in a trie?

from firefox-extension.

KOLANICH commented on June 7, 2024

For the first time - not to handle them at all except of wildcard ones (and my example handles wildcards through a special node). In future it may be possible to parse regexps and integrate them into the automata, but not now, it is a large work. For now it is desireable to achive just the simplest case. I have a list of domains that must be accessed through a proxy because otherwise there is no access to them. Not regexps, but a pretty long list of domains, line-by-line.

from firefox-extension.

erosman commented on June 7, 2024

For reference only: tld service for webextensions

from firefox-extension.

ericjung commented on June 7, 2024

I think you are suggesting two pattern matching paths; one for regular expressions and one for literals with no wildcards/regular expressions. Is that right?

from firefox-extension.

KOLANICH commented on June 7, 2024

One for regular expressions and another one for flat lists of domains without wildcards in the middle (since there is no backtracking), but wildcards in the end are possible. To be honest it is possible to unify them, but I don't think it is needed. To be honest I don't think that the regexp path is needed at all. Regexps are slow and applying then one by one is inefficient if you have a lot of them. Their only advantage is that look familiar. What really people need is structured matching against a parsed URI, not against an URI as a text string. So we need a restricted DSL that will be transformed into a state machine.

The human-readable template DSL can look like a wildcard pattern for a domain name with paths inverted, github.com will look like com.github without a scheme. It is a sequence of tokens separated by dots.
abcd matches string abcd literally
/rx/ matches strings matching the regexp
$ - means non-wildcard match, allowed only in the end.
%<number> - borrows n first tokens from the last full pattern. Allowed only in the beginning. Enables the stuff like

io.github.a
%2.b
%2.c

The DSL looks somehow similar to the tree like in the pac file example and can be easily transformed into the trie and back.

Another ingested format is just flat list of domains. To convert it into the DSL each domain name is parsed and its components order is inverted, then the leading %<digit> are easily inferred.

from firefox-extension.

ericjung commented on June 7, 2024

The DSL is recommend is not user-friendly. People will stop using FoxyProxy if they have to learn a DSL first before using it. As far as I can see, this entire suggestion is related to performance improvement. But there are no numbers showing performance degradation. I am sure as pattern lists get high, performance is impacted in some way, but it's anyone's guess how much. Your suggestion might be called "premature optimization" -- optimizing without knowing the impact of the current solution.

Of course the impact depends on many things like the complexity of regular expressions or wildcards used. I do agree most people don't need the power of reg exp, but again it's an understood DSL that is ubiquitous.

from firefox-extension.

KOLANICH commented on June 7, 2024

But there are no numbers showing performance degradation.
Your suggestion might be called "premature optimization" -- optimizing without knowing the impact of the current solution.

To measure performance degradation one has to implement the feature first.

People will stop using FoxyProxy if they have to learn a DSL first before using it.

FoxyProxy target audience are power users. Non-power users usually don't use addons at all.

The DSL is recommend is not user-friendly.

Regexps are even less user-friendly. Anyway, it is proposed to keep the old methods for the ones using them.

The DSL is recommend is not user-friendly.
but again it's an understood DSL that is ubiquitous.

I guess editing a serialized structure is even less user-friendly. The DSL is a compromise between repeating oneself (which is not user-friendly and also bloat) and editing a JSON- or YAML-serialized structure, which is error-prone.

from firefox-extension.

erosman commented on June 7, 2024

FoxyProxy is being updated for v8.0 in preparation for manifest v3.
The next update will support full & partial URL matches, similar to original Firefox FoxyProxy v3 (and current Chrome FoxyProxy) before host limit came into effect in Firefox FoxyProxy v4.

from firefox-extension.

ericjung commented on June 7, 2024

We will be keeping wildcard and regular expressions in v8.0 and the future.

from firefox-extension.

Use domain names prefix tree (trie) for templates storage about firefox-extension HOT 13 CLOSED

Comments (13)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent