Comments (11)
A regex-trie may be useful here.
Here's an explanation: https://stackoverflow.com/questions/42742810/speed-up-millions-of-regex-replacements-in-python-3/42789508#42789508
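For anyone who hasn't seen the idea before, here is a minimal sketch of a regex-trie in Python. It is not the code from the linked answer, just an illustration of the technique: the keywords go into a trie, and the trie is flattened into a single regex that shares common prefixes. The function names and the keyword list are assumptions made up for this example.

```python
import re

def build_trie(words):
    """Build a nested-dict trie; the key "" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True
    return root

def trie_to_regex(node):
    """Recursively turn a trie node into a regex fragment."""
    end_here = "" in node
    alternatives = [
        re.escape(ch) + trie_to_regex(node[ch])
        for ch in sorted(k for k in node if k != "")
    ]
    if not alternatives:
        return ""  # leaf: nothing left to match
    if len(alternatives) == 1:
        body = alternatives[0]
        # A word may end here while a longer word continues.
        return "(?:" + body + ")?" if end_here else body
    body = "(?:" + "|".join(alternatives) + ")"
    return body + "?" if end_here else body

# Illustrative keywords only (hypothetical blacklist entries).
keywords = ["mayweather", "mcgregor", "mlbopeningday"]
pattern = re.compile(trie_to_regex(build_trie(keywords)), re.IGNORECASE)

print(pattern.pattern)
print(bool(pattern.search("MayweatherVsMcGregorLive")))
```

Note that an empty keyword list produces an empty pattern, which matches everything, so a real implementation would probably want to guard against that.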
from userstalker.
Note that my uneducated impression is that these are just vague guidelines to help figure out filters for the future. Catching all those Mayweather vs McGregor spam accounts won't be terribly helpful going forward. Similarly, filtering for mlbopeningdayx will probably be less useful. So my layman's impression is that throwing these names into a trie will not necessarily be the best course of action. But I'm curious to see what we can do :)
from userstalker.
Interesting, thanks for that.
from userstalker.
If anyone wants to contribute, drop the regexes into this file: https://github.com/SOBotics/UserStalker/blob/master/data/blacklistRegex.txt
from userstalker.
Are you happy with a computer-generated regex-trie, or are you after a human-readable regex?
from userstalker.
Anything is fine. As long as it does the job.
from userstalker.
Come to think of it, in cases of false positives one often wants to look at the pattern that was triggered. So if a trie is included, it would probably be prudent to additionally provide the list of keywords from which the trie was generated.
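A rough sketch of what that could look like, assuming the keyword list is kept next to the generated pattern. Everything here is hypothetical: the keywords are illustrative, and a plain alternation stands in for the generated trie regex so the example stays self-contained.

```python
import re

# Illustrative keywords only; the real blacklist entries are not shown here.
KEYWORDS = ["mayweather", "mcgregor", "mlbopeningday"]

# Stand-in for a trie-generated pattern: a plain alternation, longest first.
PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(KEYWORDS, key=len, reverse=True)),
    re.IGNORECASE,
)
KEYWORD_SET = {k.lower() for k in KEYWORDS}

def triggered_keywords(username):
    """Return the blacklist keywords found in username, in order of appearance."""
    hits = []
    for match in PATTERN.finditer(username):
        keyword = match.group(0).lower()
        # Map the matched text back to the human-readable source keyword.
        if keyword in KEYWORD_SET and keyword not in hits:
            hits.append(keyword)
    return hits

print(triggered_keywords("MayweatherVsMcGregorLive"))
```

A report could then list the keywords that fired instead of the machine-generated pattern, which is easier to review when checking a false positive.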
from userstalker.
@adeak Yeah, you raise a good point if we're trying to catch future spammers.
from userstalker.
There is a blacklisted username list by Charcoal for SmokeDetector here
from userstalker.
Yeah, we're using that, but it certainly isn't as comprehensive as Rubio's list.
from userstalker.
I'm marking this as "closed", because the action items have been completed.
We are now using all of the regex blacklists available from HeatDetector (to detect offensive words) as well as those available from Charcoal's SmokeDetector project (which will detect known trolls/spammers, as well as known problematic patterns). Plus, User Stalker has its own user-name blacklist built in, which includes Rubio's list, and is meant to be kept up-to-date by any moderator who handles the User Stalker reports.
Help with improving or expanding the regexes in the blacklists is, of course, always welcome. If you have something you want to add or improve, simply submit a pull request (PR) for one of the existing patterns (https://github.com/SOBotics/UserStalker/tree/master/patterns). If you have a more complex suggestion, please open a new Issue.
from userstalker.