
ublockorigin-huge-ai-blocklist's People

Contributors

desk7, eltociear, ite-usagi, laylavish, maybeanerd, notainutilis, pengowray, richlv, vishalnandagopal

ublockorigin-huge-ai-blocklist's Issues

Add ZebraBI.com to list

Every time I search for something PowerBI related at work, I get this AI-generated nonsense as the first result in Bing.

Misinterpreted/missing feature?

Line 936 of the primary list mentions the ability to press tab to comment out specific URLs.

However, I'm personally unable to get this to work. I'm unsure if there's something I'm missing, or if this was merely an internal dev comment that should be ignored. If it's intended to be a user-accessible feature and I'm simply not understanding how to utilize it, could a tutorial be added to the readme? If it's meant to be a dev comment and otherwise ignored, could that be clarified?

"Nuclear" sites are in the main list

I've imported the main list.txt, but now can't browse DeviantArt (specifically user profiles). I think it's because there's a "nuclear" section contained within list.txt, not just the extra addon txt file

Suggestion: Properly extend filters to all search results

When I searched for perchance.org on Google, the links were still clickable and the captions below them still displayed.
Extending the filters to all search results and not just image results would be nice, and it would help me even more in avoiding websites which embrace AI.

Consider adding blocklist header for uBlock

Many blocklists have a header with additional information, e.g.:

! Title: Huge AI Blocklist
! Expires: 1 day
! Description: A huge blocklist of sites that contain AI generated content for uBlock Origin. 
! Homepage: https://github.com/laylavish/uBlockOrigin-HUGE-AI-Blocklist
!
! GitHub issues: https://github.com/laylavish/uBlockOrigin-HUGE-AI-Blocklist/issues
! GitHub pull requests: https://github.com/laylavish/uBlockOrigin-HUGE-AI-Blocklist/pulls

Otherwise the blocklist appears only as a URL in the filter list.

Provide a version for duckduckgo/other search engines?

I'm not a Google user. If I want to make this list work for me, I have to search and replace the term "google" on every line, which means modifying the list and losing the ability to update it automatically from the source. It would be nice not to have to do that, especially since I'm not sure how different the syntax will be for the new URLs.
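A minimal sketch of the search-and-replace described above, assuming each entry is a uBlock Origin cosmetic filter of the form `domains##selector`. The function name `retarget_filter` and the example line are hypothetical and not taken from the real list. Only the domain part before `##` is changed, so a blocked site whose own name happens to contain "google" is left alone.

```python
def retarget_filter(line, old_host, new_host):
    """Swap a search-engine hostname in the domain part of a cosmetic filter.

    Only the text before the first '##' separator is changed.
    """
    # Comments (starting with '!') and lines without '##' pass through unchanged.
    if line.startswith("!") or "##" not in line:
        return line
    domains, selector = line.split("##", 1)
    hosts = [new_host if host.strip() == old_host else host.strip()
             for host in domains.split(",")]
    return ",".join(hosts) + "##" + selector


# Hypothetical filter line, not copied from the real list.
example = 'google.com##a[href*="googleai.example"]:upward(2)'
print(retarget_filter(example, "google.com", "duckduckgo.com"))
```

Swapping the hostname only changes where the filter applies. The selector after `##` was written for Google's page structure, so it would probably still need adjusting for another search engine.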

[Request] Hosts file version

I don't wanna just filter the sites, but block them fully. Could we get an autogenerated file that's in hosts file format, using 0.0.0.0?

(There are apparently simple scripts to generate such files, but I don't know any particular ones. GitHub Community to the rescue!)
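One possible shape for such a script, as a rough sketch only. It assumes the list entries are cosmetic filters that name each site inside an `href*="..."` selector; that format, the function name `filters_to_hosts`, and the sample lines are assumptions for illustration.

```python
import re

# Assumed shape of a list entry: a cosmetic filter containing href*="<domain>".
HREF_PATTERN = re.compile(r'href\*="([^"]+)"')
# Keep only values that look like a full hostname (labels separated by dots).
HOSTNAME = re.compile(r"^(?:[a-z0-9-]+\.)+[a-z]{2,}$")


def filters_to_hosts(filter_lines):
    """Return sorted hosts-file lines for every full hostname found."""
    hosts = set()
    for line in filter_lines:
        if line.startswith("!"):  # skip comment lines
            continue
        for value in HREF_PATTERN.findall(line):
            value = value.lower()
            if HOSTNAME.match(value):
                hosts.add(value)
    return ["0.0.0.0 " + host for host in sorted(hosts)]


sample = [
    "! hypothetical entries for illustration",
    'google.com##a[href*="openart.example"]:upward(2)',
    'google.com##a[href*="pinterest."]:upward(2)',  # partial name, skipped
]
print("\n".join(filters_to_hosts(sample)))
```

A hosts file can only block whole hostnames, so entries that target a partial name or a path within a site cannot be expressed and are skipped here.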

Not fully blocking?

I'm still getting results from sites that are definitely in the blocklist like openart or seaart. Original guess was that it's due to having google.co.uk as my default instead of .com, but switching to .com the images still show up, just with the caption link to the site missing.

Add additional search engines to the blocklist

I use Startpage (a privacy search engine with Google-based results instead of Bing-based results), and the blocklist isn't blocking anything for me there. Images of Startpage and Google image results are shown for comparison. Google is working as intended, but Startpage doesn't block anything. Could you add compatibility not only for my search engine but also for search engines other than Google, Bing, and DuckDuckGo? Thank you!
image
image

Pixiv Support

Not sure how hard this would be, but Pixiv has an AI tag that this list may be able to take advantage of to hide those images, or the creators who post AI images.

`opacity: 0` versus the default `display: none`?

I've been enjoying using the list so far, now that it works on DDG via uBO :D! I manually added most but not all of the nuclear list to my custom filters, and took the opportunity to experiment with the filters themselves. Removing the :style(opacity:0.00!important;) part causes the default behaviour (display: none) to take effect, which I've found more convenient, especially for searches with a lot of rubbish results. While the change is a simple matter of search and replace for a manually updated list, I can't really do the same for the main list pulled from the repo.

If it's not too much trouble, would it be possible to have versions of the lists provided that don't include the custom style, allowing uBO to completely hide those elements instead? This way, people can choose the one best suited to their preference or search engine (in my testing, DDG looked good without the custom style, but Google looked messy).
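As a sketch of how such a variant could be generated from the main list, the snippet below removes the trailing style action from each filter. The suffix string is quoted from this issue; the function name `strip_opacity_style` and the example filter are hypothetical.

```python
STYLE_SUFFIX = ":style(opacity:0.00!important;)"


def strip_opacity_style(line):
    """Remove the trailing opacity style so uBO falls back to hiding the element."""
    if line.endswith(STYLE_SUFFIX):
        return line[: -len(STYLE_SUFFIX)]
    return line


# Hypothetical filter line; only the :style(...) suffix is quoted from the issue.
example = 'duckduckgo.com##a[href*="seaart.example"]:upward(2)' + STYLE_SUFFIX
print(strip_opacity_style(example))
```

Lines without the suffix, such as comments, are returned unchanged, so the function could be applied to every line of a downloaded copy.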

Comparison on DDG with and without :style(opacity:0.00!important;):
with:
image
without (display:none is applied by default)
image
The image results in these aren't completely the same, DDG served slightly different results every time I tried ): You'll have to trust me that a bunch of elements are hidden in the second one xP

Speaking of :style(opacity:0.00!important;), why 0.00 instead of just 0?

Unrelated, but not worthy of an issue I think: the nuclear DeviantArt, Etsy, etc. filters are very likely to break other sites, as artists often link these in their social media profiles, on their websites, etc., and the filters hide the entire containing div. Perhaps the nuclear list should be amended to target specific search engines, as was done for dreamup recently?
Also, Pinterest has a load of regional domains in addition to .com. In my custom filter, I changed the match to `pinterest.` (with the trailing dot), which captures those but is still fairly unlikely to match any non-Pinterest sites.
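To illustrate the trade-off of matching on `pinterest.`, here is a small sketch that mimics a substring match on a link's URL, which is how a CSS `[href*="..."]` attribute selector behaves. Whether the list's filters use exactly that selector is an assumption, and the URLs are made up.

```python
def href_contains(url, needle):
    """Mimic the CSS attribute selector [href*="needle"]: a plain substring test."""
    return needle in url


urls = [
    "https://www.pinterest.com/pin/1/",
    "https://www.pinterest.co.uk/pin/2/",
    "https://pinterest.example.org/",      # not Pinterest, but still matches
    "https://example.org/pinterest-tips",  # no dot after the name, no match
]
for url in urls:
    print(href_contains(url, "pinterest."), url)
```

The third URL shows the remaining risk: any hostname that starts a label with `pinterest.` would also be caught.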

Additional AI website to add

Hey there, Lexica is another that needs adding to the list. Came across it searching "anime girl -ai" to test, and it's one of the first results.

EDIT: Huh, it is part of the list, but not fully blocked? Odd. Oh well

Some rules can block legitimate sites

Problem

Some blocklist rules, especially the blanket rules in the search engine result blocklist, can block legitimate sites.

Affected sites

Search engine blocklist: every site whose search engine result contains the blocked word(s).

Details:

After reading some of the uBlock rules, I found that the search engine block section has several blanket rules at the end of the blocklist. These block any site whose title contains a word in the blocklist, such as "Ai" or "Lora model", regardless of context.

The blocklist also covers sites such as Medium and ArtStation. They may host content generated by large language models and image generation models (the source of the AI-related spam most people are concerned about), but many people still use these platforms: those working in software engineering and IT-related fields for Medium, and artists, illustrators, and designers for ArtStation. Even though these sites are in the caution list, they and others are still covered by blanket rules that aren't commented out yet.

For example, if a user of the blocklist wants to search for AI-related laws, or for resources on fine-tuning an "AI" model for other purposes (machine translation, text summarization, OCR, ...) and wants to learn about LoRA, every such result will be blocked by the blanket rules. Some of the subreddit blocks, r/machinelearning for example, also hit communities that are mostly about news and technical questions rather than actual spam, and that may cover other topics as well. Even limiting the keywords to just these sites is unlikely to work (especially with Medium).
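The effect can be sketched with a toy matcher. It assumes the blanket rules behave like a case-insensitive substring test on the result title; the real rules may match differently. The keyword comes from the examples in this issue, and the titles are made up.

```python
def blanket_match(title, keywords):
    """Return True if any keyword occurs in the title, ignoring case and context."""
    lowered = title.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


# Keyword taken from the examples in this issue; the titles are made up.
keywords = ["Lora model"]
titles = [
    "Fine-tuning a LoRA model for OCR",
    "How to paint watercolor landscapes",
]
for title in titles:
    print(blanket_match(title, keywords), title)
```

The first title is a legitimate technical resource, yet it matches because the rule looks only at the words and not at what the page is about.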

I also think we need to clarify that these blocklists only work for sites that have been FOUND to be full of spam created by LLMs and image generation models. Uses of LLMs and image generation models for SEO, or for posting on social media and forums, are likely not covered, since those sites try to pass as legitimate and avoid presenting themselves as using these models. And because the models used for AI-related spam (such as GPT-3.5 Turbo, GPT-4, Gemini, Midjourney, DALL-E, ...) are complex and their output varies so much, uBlock filters cannot block that content, and any model or algorithm for filtering these posts would need to be extremely complex.
