laylavish / ublockorigin-huge-ai-blocklist
A huge blocklist of sites that contain AI generated content for uBlock Origin & uBlacklist.
Every time I search for something Power BI-related at work, I get this AI-generated nonsense as the first result in Bing.
blogspam
Line 936 of the primary list mentions the ability to press tab to comment out specific URLs.
However, I'm personally unable to get this to work. I'm unsure if there's something I'm missing, or if this was merely an internal dev comment that should be ignored. If it's intended to be a user-accessible feature and I'm simply not understanding how to utilize it, could a tutorial be added to the readme? If it's meant to be a dev comment and otherwise ignored, could that be clarified?
I've imported the main list.txt, but now I can't browse DeviantArt (specifically user profiles). I think it's because there's a "nuclear" section inside list.txt itself, not just in the extra add-on txt file.
Many blocklists have a header with additional information, e.g.:
! Title: Huge AI Blocklist
! Expires: 1 day
! Description: A huge blocklist of sites that contain AI generated content for uBlock Origin.
! Homepage: https://github.com/laylavish/uBlockOrigin-HUGE-AI-Blocklist
!
! GitHub issues: https://github.com/laylavish/uBlockOrigin-HUGE-AI-Blocklist/issues
! GitHub pull requests: https://github.com/laylavish/uBlockOrigin-HUGE-AI-Blocklist/pulls
Otherwise, the blocklist appears as just a URL in the filter list.
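To make the header idea concrete, here is a minimal sketch of how a tool could read that metadata. It assumes the fields sit in the leading block of `!` comment lines in `! Key: value` form, as in the example header; `parse_header` is a hypothetical helper, not part of uBlock Origin.

```python
def parse_header(list_text):
    """Collect '! Key: value' metadata from the leading comment block."""
    metadata = {}
    for line in list_text.splitlines():
        line = line.strip()
        if not line.startswith("!"):
            break  # assumption: metadata lives only in the leading comments
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line[1:].partition(":")
        if sep and key.strip() and value.strip():
            metadata[key.strip()] = value.strip()
    return metadata


if __name__ == "__main__":
    sample = (
        "! Title: Huge AI Blocklist\n"
        "! Expires: 1 day\n"
        "!\n"
        "! Homepage: https://github.com/laylavish/uBlockOrigin-HUGE-AI-Blocklist\n"
        'google.com##a[href*="example.com"]\n'
    )
    for key, value in parse_header(sample).items():
        print(f"{key} = {value}")
```

A list with no `! Title:` line yields no title here, which matches the symptom described: the subscriber only has the URL to show.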
Hi, thank you for your work! Like the title says, it would be nice to have hosts-format lists for Pi-hole/AdGuard and also NAT blocking.
Greetings!
I'm not a Google user. If I want to make this list work for me, I have to search and replace the term "google" on every line, which means I have to modify the list and lose the ability to update it automatically from the source. It would be nice not to have to do that, especially since I'm not sure how different the syntax will be for the new URLs.
I don't wanna just filter the sites, but block them fully. Could we get an autogenerated file that's in hosts file format, using 0.0.0.0?
(There are apparently simple scripts to generate such files, but I don't know any particular ones. GitHub Community to the rescue!)
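As a starting point, here is a minimal sketch of such a generator. It assumes each filter embeds the blocked domain in a CSS attribute selector like `a[href*="example.com"]`; the real list may use other shapes too, and the sample lines and the `filters_to_hosts` name are made up for illustration.

```python
import re

# Assumption: entries embed the blocked domain as href*="domain".
HREF_PATTERN = re.compile(r'href\*="([^"]+)"')
# Accept only values that look like a complete hostname (no path, no partial).
HOSTNAME = re.compile(r"^(?:[a-z0-9-]+\.)+[a-z]{2,}$")


def filters_to_hosts(filter_lines):
    """Return sorted hosts-file lines ('0.0.0.0 <domain>') for the filters."""
    domains = set()
    for line in filter_lines:
        line = line.strip()
        if not line or line.startswith("!"):  # skip blanks and comments
            continue
        for value in HREF_PATTERN.findall(line):
            value = value.lower()
            if HOSTNAME.match(value):
                domains.add(value)
    return [f"0.0.0.0 {domain}" for domain in sorted(domains)]


if __name__ == "__main__":
    sample = [
        "! Title: Huge AI Blocklist",
        'google.com##a[href*="example-ai-site.com"]:upward(2)',
        'duckduckgo.com##a[href*="another-example.art"]',
        'google.com##a[href*="pinterest."]',  # partial hostname, skipped
    ]
    print("\n".join(filters_to_hosts(sample)))
```

Note that a hosts entry blocks the whole domain everywhere, not just in search results, so entries from the "nuclear" section (DeviantArt, Etsy, and so on) would make those sites unreachable.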
I'm still getting results from sites that are definitely in the blocklist like openart or seaart. Original guess was that it's due to having google.co.uk as my default instead of .com, but switching to .com the images still show up, just with the caption link to the site missing.
uBlock Origin supports these links:
[subscribe link](https://subscribe.adblockplus.org?title=Huge%20AI%20Blocklist&location=https://raw.githubusercontent.com/laylavish/uBlockOrigin-HUGE-AI-Blocklist/main/list.txt)
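For anyone generating such a link for another list, here is a small sketch that rebuilds the URL above from a title and a raw list location. The `subscribe_link` helper is hypothetical; the only thing taken from the link itself is its shape (`title` and `location` query parameters).

```python
from urllib.parse import quote


def subscribe_link(title, list_url):
    """Build a subscribe.adblockplus.org link for a filter list."""
    # quote() percent-encodes the spaces in the title; ':' and '/' are left
    # as-is in the location so the result mirrors the link quoted above.
    return (
        "https://subscribe.adblockplus.org"
        f"?title={quote(title)}"
        f"&location={quote(list_url, safe=':/')}"
    )


if __name__ == "__main__":
    print(
        subscribe_link(
            "Huge AI Blocklist",
            "https://raw.githubusercontent.com/laylavish/"
            "uBlockOrigin-HUGE-AI-Blocklist/main/list.txt",
        )
    )
```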
I use Startpage (a privacy-focused search engine with Google-based results instead of Bing-based results), and the blocklist isn't blocking anything for me there. Images of Startpage and Google image results are shown for comparison. Google is working as intended, but Startpage doesn't block anything. Could you add compatibility not only for my search engine but for search engines other than Google, Bing, and DuckDuckGo? Thank you!
Not sure how hard this would be, but Pixiv has an AI tag that the list may be able to take advantage of, in order to hide images with that tag or creators who post AI images.
I've been enjoying using the list so far, now that it works on DDG via uBO :D! I manually added most but not all of the nuclear list to my custom filters, and took the opportunity to experiment with the filters themselves. Removing the `:style(opacity:0.00!important;)` part causes the default behaviour (`display: none`) to take effect, which I've found more convenient, especially for searches with a lot of rubbish results. While the change is a simple matter of search and replace for a manually updated list, I can't really do the same for the main list pulled from the repo.
If it's not too much trouble, would it be possible to have versions of the lists provided that don't include the custom style, allowing uBO to completely hide those elements instead? This way, people can choose the one best suited to their preference or search engine (in my testing, DDG looked good without the custom style, but Google looked messy).
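Until such variants exist, the search-and-replace can be scripted. This is only a sketch: it assumes the style appears on each filter exactly as `:style(opacity:0.00!important;)`, and the sample filter line and `strip_opacity_style` name are made up for illustration.

```python
# Assumption: the custom style is written exactly like this on every filter.
STYLE_SUFFIX = ":style(opacity:0.00!important;)"


def strip_opacity_style(filter_lines):
    """Drop the opacity style so uBO falls back to hiding the element."""
    return [line.replace(STYLE_SUFFIX, "") for line in filter_lines]


if __name__ == "__main__":
    sample = [
        "! comment lines pass through untouched",
        'duckduckgo.com##a[href*="example.com"]:upward(2)'
        ":style(opacity:0.00!important;)",
    ]
    for line in strip_opacity_style(sample):
        print(line)
```

The output is a local copy, so it would have to be regenerated whenever the upstream list changes.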
Comparison on DDG with and without `:style(opacity:0.00!important;)`:
with:
without (`display:none` is applied by default):
The image results in these aren't completely the same, DDG served slightly different results every time I tried ): You'll have to trust me that a bunch of elements are hidden in the second one xP
Speaking of `:style(opacity:0.00!important;)`, why `0.00` instead of just `0`?
Unrelated, but not worthy of an issue I think: the nuclear DeviantArt, Etsy, etc. filters are very likely to break other sites, as artists often link these in their social media profiles, on their websites, etc., and the filters hide the entire containing div. Perhaps the nuclear list should be amended to target specific search engines, as was done for dreamup recently?
Also, Pinterest has a load of regional domains in addition to .com. In my custom filter, I changed it to `pinterest.`, which captures those, but is still fairly unlikely to match any non-Pinterest sites.
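A quick sketch of why the shorter needle catches the regional domains, assuming the filters match links with a CSS `[href*="..."]` selector (a plain substring test). The URLs are made-up examples and `href_contains` is a hypothetical stand-in for that selector.

```python
def href_contains(href, needle):
    """Mimic the CSS [href*="needle"] test: a plain substring match."""
    return needle in href


if __name__ == "__main__":
    links = [
        "https://www.pinterest.com/pin/1/",
        "https://www.pinterest.co.uk/pin/1/",
        "https://www.pinterest.de/pin/1/",
        "https://example.org/why-i-left-pinterest",
    ]
    for link in links:
        print(
            link,
            "| pinterest.com:", href_contains(link, "pinterest.com"),
            "| pinterest.:", href_contains(link, "pinterest."),
        )
```

The trade-off is that any URL containing the text `pinterest.` matches as well, for example a page named `pinterest.html` on an unrelated site.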
Hey there, Lexica is another that needs adding to the list. Came across it searching "anime girl -ai" to test, and it's one of the first results.
EDIT: Huh, it is part of the list, but not fully blocked? Odd. Oh well
Some blocklist rules, especially the blanket rules in the search engine result blocklist, can block legitimate sites
Search engine blocklist: every search result whose title contains a blocked word is hidden
After reading some of the uBlock rules in the search engine block section, I found several blanket rules at the end of the blocklist that block any site whose title contains a listed word, such as "Ai" or "Lora model", regardless of context.
The blocklist also covers sites such as Medium and ArtStation. They may host content generated by large language models and image generation models (the AI-related spam most people are concerned about), but many people still use these platforms: software engineers and IT professionals in the case of Medium, and artists, illustrators, and designers in the case of ArtStation. Even though these sites are in the caution list, they and others are still covered by blanket rules that aren't commented out yet.
For example, if someone using the blocklist wants to search for AI-related laws, or for resources on fine-tuning an "AI" model for other purposes (machine translation, text summarization, OCR, ...) and wants to learn about LoRA, every such result will be blocked by the blanket rules. Some of the subreddit blocks are also questionable: r/machinelearning, for example, is mostly news and technical questions rather than actual spam, and it may cover other topics as well. Even limiting the keywords to just these sites is unlikely to work (especially with Medium).
I also think we need to clarify that these blocklists only work for sites that have been FOUND to be full of spam created by LLMs and image generation models. Uses of these models for SEO or for posts on social media and forums are likely not covered, because such content tries to hide that it is model-generated and to pass as legitimate. And given how complex the models behind AI-related spam are (GPT-3.5 Turbo, GPT-4, Gemini, Midjourney, DALL-E, ...) and how varied their output is, uBlock filters cannot block that content; any model or algorithm that filters such posts would need to be extremely complex.