ublock-origin-dev-filter's Issues

Search result removal not working for DDG

I have the DDG and Google bundle installed in Firefox. Sites on the list are blocked, in that I cannot open them, but search results linking to them still show up. Replacing the current rule with this one does remove the results, however:

duckduckgo.*##.results>div:has(a[href*="copycatsite.com"])
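If it helps, here is a minimal sketch of how rules of that shape could be generated per domain. The function name and the second domain are made up for illustration, and the `.results>div` container selector is simply copied from the rule above; it may not match DDG's current markup.

```python
def ddg_result_filter(domain: str) -> str:
    """Build a uBlock Origin cosmetic filter that hides DDG results linking to `domain`.

    The rule shape is taken from the issue report; the container selector
    (.results>div) is an assumption about DuckDuckGo's page structure.
    """
    return f'duckduckgo.*##.results>div:has(a[href*="{domain}"])'


# "example.org" is a placeholder, not a site from the list.
for domain in ("copycatsite.com", "example.org"):
    print(ddg_result_filter(domain))
```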

Request: Explain a little what "copycat" websites are

I don't actually know what copycat websites are and why they should be removed from search results.
Are these the websites that mirror github/SO content as their own and therefore catch users who would have otherwise reached github/SO directly?
I looked into the 'data' folder and yet somehow decided NOT to actually try to visit any of those sites, to see what they are ;)

Request: add Wikipedia clones

Another category of copycat sites that I find maddening is Wikipedia clones. Wikipedia is freely licensed and even provides database dumps, so there are a ton of annoying mirrors out there. These sites can show up in search results for just about any topic. Here are a few:

*://www.wikiwand.com/*
*://wiki2.org/*
*://worddisk.com/wiki/*
*://thereaderwiki.com/*
*://www.absoluteastronomy.com/*
*://encyclopedia.thefreedictionary.com/*
*://peoplepill.com/*
*://www.algebra.com/algebra/about/history/*.wikipedia

You can see that these are clones from the following links:

There are a lot more copycats out there (Wikipedia maintains a giant list, but it includes many sites that have just copied a small amount of content and not the entire site, and I'm sure there are some dead/outdated entries as well), but I figured I'd just post a few to start.

OK to include in the letsblock.it project?

Hello @quenhus,

I am the maintainer of https://letsblock.it, a uBlock Origin list generator that lets users pick and customize filter templates to filter out low-value content. The most requested feature so far is to extend the "hide websites from search results" template with presets that hide GitHub and Stack Overflow copycats. Instead of duplicating work, I'd love to reuse the data from your project and import it as user-selectable presets.

I am currently working on the implementation PR: letsblockit/letsblockit#64 and will deploy a staging version when the frontend is ready. I'd love to have your input on it, whether that's questions, ideas or concerns.

Add metadata to filter lists

Filter lists using the Adblock syntax can have metadata in special comments in the header. This helps identify the name of the list, its origin, refresh frequency, etc. For example:

! Title: uBlock-Origin-dev-filter
! Version: 12345
! Expires: 1 day
! Description: Filters to block and remove copycat-websites from DuckDuckGo and Google. Specific to dev websites like StackOverflow or GitHub.
! Homepage: https://github.com/quenhus/uBlock-Origin-dev-filter

One side effect of omitting this metadata is that the list shows up without a title after uBlock-Origin-dev-filter is imported.

Thanks!
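A minimal sketch of how a build script could prepend such a header, assuming the lists are generated by a script. The function name is hypothetical and the values are the placeholders from the example above, not real release data.

```python
def build_header(metadata: dict[str, str]) -> str:
    """Render Adblock-style metadata as '! Key: value' comment lines."""
    return "\n".join(f"! {key}: {value}" for key, value in metadata.items())


# Placeholder values copied from the example in this issue.
header = build_header({
    "Title": "uBlock-Origin-dev-filter",
    "Version": "12345",
    "Expires": "1 day",
    "Description": "Filters to block and remove copycat-websites from DuckDuckGo and Google.",
    "Homepage": "https://github.com/quenhus/uBlock-Origin-dev-filter",
})
print(header)
```

The generated filter rules would then be written after this header in the output file.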

Request: add vuejscode.com to the filter

Request: add 12 URLs to the filter

ask-ubuntu.ru - askubuntu.com translation, has origin link
Evidence: https://ask-ubuntu.ru/questions/2337/ispravit-povrezhdennyij-razdel-ntfs-bez-windows
Original: https://askubuntu.com/questions/47700/fix-corrupt-ntfs-partition-without-windows

askubuntu.ru - askubuntu.com translation, no origin link
Evidence: https://askubuntu.ru/questions/170334/vosstanovit-ntfs-bez-windows-dublikat
Original: https://askubuntu.com/questions/330733/repair-ntfs-without-windows

kompsekret.ru - superuser.com translation, origin link leads to its own page
Evidence: https://kompsekret.ru/q/best-ways-to-fix-outlook-2007-error-your-mailbox-is-over-its-size-limit-288222/
Original: https://superuser.com/questions/293412/best-ways-to-fix-outlook-2007-error-your-mailbox-is-over-its-size-limit

ohandroid.com - stackoverflow.com translation, no origin link, answers merged with the question in one chunk
Evidence: http://www.ohandroid.com/android-widget-switch-toggled-event-listener.html
Original: https://stackoverflow.com/questions/21010924/android-widget-switch-toggled-event-listener/21010941

poweruser.guru - superuser.com translation, has origin link
Evidence: https://poweruser.guru/questions/324399/проводник-windows-держит-дескриптор-открытым-на-исполняемых-файлах
Original: https://superuser.com/questions/324399/windows-explorer-keeps-handle-open-on-executable-files

ruphp.com - stackoverflow.com translation, no origin link, answers merged with the question in one chunk
Evidence: https://ruphp.com/google-x43e-2.html
Original: https://stackoverflow.com/questions/10549049/accessing-google-bookmarks-server-side-with-php

server-fault.ru - serverfault.com translation, origin link leads to its own page
Evidence: https://server-fault.ru/questions/273014/kak-vosstanovit-zapisat-metku-klonirovat-suschestvujuschuju
Original: https://serverfault.com/questions/1085397/how-can-one-recover-write-a-label-clone-existing-one

sprosi.pro - stackoverflow.com translation, has origin link
Evidence: https://sprosi.pro/questions/824506/kak-mne-obnovit-razvetvlennyiy-repozitoriy-github
Original: https://stackoverflow.com/questions/7244321/how-do-i-update-or-sync-a-forked-repository-on-github

stackru.com - stackoverflow.com translation, has origin link
Evidence: https://stackru.com/questions/2261423/eclipse-javalangclassnotfoundexception
Original: https://stackoverflow.com/questions/1052978/eclipse-java-lang-classnotfoundexception

switch-case.ru - redirect to answer-id.com/ru/

ubuntugeeks.com - askubuntu.com translation, has origin link
Evidence: https://ubuntugeeks.com/questions/1/how-to-check-internet-speed-via-terminal
Original: https://askubuntu.com/questions/104755/how-to-check-internet-speed-via-terminal

ubuntuplace.info - askubuntu.com translation, has origin link
Evidence: https://ubuntuplace.info/questions/1/comment-verifier-la-vitesse-internet-le-terminal
Original: https://askubuntu.com/questions/104755/how-to-check-internet-speed-via-terminal

wikiroot.ru - superuser.com translation, no origin link
Evidence: https://wikiroot.ru/question/moy-bios-zavisaet-na-testirovanie-pamyati-chto-mojet-vyzvaty-eto
Original: https://superuser.com/questions/185086/my-bios-hangs-at-testing-memory-what-could-cause-this

russianblogs.com - no direct evidence, but it looks very much like a copycat site

Some dead sites that may come back someday:
fliplinux.com
issue.life
javahow.ru
programmerz.ru
qaru.site
ru.craftjs.com
stackoverrun.com
unix.stackovernet.com
vpros.ru

Request: add these sites to the filter

I never thought much about these copycat sites, until this...
https://www.google.com/search?q=baldur+site:fuscin.com

Practically fell backward outta my chair!

{funny enough, this comes up blank, lol: https://www.google.com/search?q=quenhus+site%3Afuscin.com}

Github copycats. (haven't checked if they copy other sites)

Evidence: https://www.fuscin.com/btigi/iiTweak
Original: https://github.com/btigi/iiTweak

Evidence: https://iboxshare.com/quenhus/uBlock-Origin-dev-filter
Original: https://github.com/quenhus/uBlock-Origin-dev-filter

Add Brave Search support

Just posting this here as a feature request. I might eventually get the time to add this myself, but I figured it's good to post this here in case other users are looking for the support.

Effectively, we'd just need to expand the Google + DuckDuckGo support to include search.brave.com as well.

Request: add serveanswer.com and solveforums.msomimaktaba.com to the filter

Request: add softbranchdevelopers.com to the filter

Reddit:

Evidence: https://softbranchdevelopers.com/typescript-type-substitution-for-proptypes-type/
Original: https://www.reddit.com/r/reactjs/comments/oyxbfn/typescript_type_substitution_for_proptypes_type/

Github:

Evidence: https://softbranchdevelopers.com/a-simple-vue-js-app-made-for-learning-the-framework-and-how-to-work-with-api/
Original: https://github.com/matteotagliatti/movie-vue-app?ref=vuejsexamples.com

Evidence: https://softbranchdevelopers.com/starter-pack-for-creating-ui-kit-on-vue-js/
Original: https://github.com/NorvikIT/vue-uikit-starter?ref=vuejsexamples.com

Comments

Sometimes there are non-human generic comments like the one on this article:
https://softbranchdevelopers.com/learn-swift-for-c-developers/

Thanks:

  1. Thank you for the awesome filter list. I didn't know the situation was so bad until I started working more with JavaScript technologies. I thought it was just a myth that the situation was getting out of control.
  2. Do you plan to also include reddit copycats?

GitHub Raw isn't meant for serving files

GitHub Raw isn't made for serving files to tons of people, even though it can, so I would recommend setting up Pages on this repo to reduce strain on GitHub's systems.

Bug: URLs with a non-wildcard path are not filtered on DDG

For example, the DDG filter for *://www.scholarship.edu.vn/wiki/* is duckduckgo.com##[data-domain$="www.scholarship.edu.vn/wiki"]

However, the data-domain attribute, as its name implies, only contains the domain part of the URL. In this example it is www.scholarship.edu.vn, so the generated selector never matches.

We have to rewrite the DDG filter to handle this case and generate a more precise uBO filter.
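One possible approach, as a rough sketch only: split the match pattern into host and path, keep matching the host on data-domain, and add a check on the result link when the path is more specific than a bare wildcard. The function name is hypothetical, and the `:has(a[href*=...])` part assumes the element carrying data-domain also contains the result link, which has not been verified against DDG's markup.

```python
def ddg_filter(match_pattern: str) -> str:
    """Turn a pattern like '*://host/path/*' into a DDG cosmetic filter.

    Assumes the pattern contains '://' and that any wildcard in the path
    can be handled by keeping only the literal prefix before it.
    """
    # Drop the scheme part ("*://", "https://", ...).
    _, _, rest = match_pattern.partition("://")
    host, _, path = rest.partition("/")
    # Keep only the literal path prefix before the first wildcard.
    prefix = path.split("*", 1)[0]

    base = f'duckduckgo.com##[data-domain$="{host}"]'
    if not prefix:
        # Whole-site pattern: the domain check alone is enough.
        return base
    # Path-specific pattern: also require a link under that path (assumption).
    return f'{base}:has(a[href*="{host}/{prefix}"])'


print(ddg_filter("*://wiki2.org/*"))
print(ddg_filter("*://www.scholarship.edu.vn/wiki/*"))
```

With this shape, whole-site patterns keep the current data-domain filter unchanged, and only patterns with a path get the extra link check.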
