
Comments (11)

rjrudman commented on July 18, 2024

A regex-trie may be useful here.

Here's an explanation:
https://stackoverflow.com/questions/42742810/speed-up-millions-of-regex-replacements-in-python-3/42789508#42789508
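For readers unfamiliar with the idea, here is a minimal sketch of how a regex-trie might be built from a keyword list. It is an illustration only, not the code from the linked answer and not anything UserStalker ships; the helper names `build_trie` and `trie_to_pattern` and the sample keywords are assumptions made for this example.

```python
import re


def build_trie(words):
    """Build a nested-dict trie; the empty-string key marks the end of a word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True  # end-of-word marker
    return trie


def trie_to_pattern(node):
    """Convert a trie node into a regex fragment that shares common prefixes."""
    is_word_end = "" in node
    branches = [
        re.escape(ch) + trie_to_pattern(child)
        for ch, child in sorted(node.items())
        if ch != ""
    ]
    if not branches:
        return ""
    if len(branches) == 1:
        body = branches[0]
        # If a word ends here, whatever follows is optional.
        return "(?:" + body + ")?" if is_word_end else body
    body = "(?:" + "|".join(branches) + ")"
    return body + "?" if is_word_end else body


# Hypothetical sample keywords, taken from names mentioned in this thread.
keywords = ["mayweather", "mcgregor", "mlbopeningday"]
pattern = trie_to_pattern(build_trie(keywords))
blacklist = re.compile(pattern, re.IGNORECASE)
```

With these three keywords the generated pattern is `m(?:ayweather|cgregor|lbopeningday)`, so the shared `m` is tested once instead of three times. The saving grows with the number of keywords that share prefixes.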

from userstalker.

adeak commented on July 18, 2024

Note that my uneducated impression is that these names are just vague guidelines to help figure out filters for the future. Catching all those Mayweather vs McGregor spam accounts won't be terribly helpful going forward. Similarly, filtering for mlbopeningdayx will probably be less useful. So my layman's impression is that throwing these names into a trie will not necessarily be the best course of action. But I'm curious about what we can do :)


Bhargav-Rao commented on July 18, 2024

Interesting, thanks for that.


Bhargav-Rao commented on July 18, 2024

If anyone wants to contribute, add the regexes to this file: https://github.com/SOBotics/UserStalker/blob/master/data/blacklistRegex.txt


rjrudman commented on July 18, 2024

Are you happy with a computer-generated regex-trie, or are you after a human-readable regex?


Bhargav-Rao commented on July 18, 2024

Anything is fine, as long as it does the job.


adeak commented on July 18, 2024

Come to think of it, in cases of false positives one often wants to look at the pattern that was triggered. So if a trie is included, it would probably be prudent to also provide the list of keywords from which the trie was generated.
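One way this could look in practice is sketched below, assuming the blacklist consists of literal keywords only. Because the matched text of a literal-only pattern is itself one of the keywords, the source list can be reported directly, however the combined regex was generated. The names `KEYWORDS`, `compile_blacklist`, and `triggered_keywords` are hypothetical and exist only for this example.

```python
import re

# Hypothetical sample list; the real keyword list would live next to the generated regex.
KEYWORDS = ["mayweather", "mcgregor", "mlbopeningday"]


def compile_blacklist(keywords):
    """Compile literal keywords into one case-insensitive pattern."""
    # Longest first, so a keyword is preferred over another keyword that is its prefix.
    alternatives = sorted({k.lower() for k in keywords}, key=len, reverse=True)
    return re.compile("|".join(re.escape(k) for k in alternatives), re.IGNORECASE)


def triggered_keywords(username, pattern):
    """Return the sorted, de-duplicated keywords that matched in the username."""
    return sorted({m.group(0).lower() for m in pattern.finditer(username)})


blacklist = compile_blacklist(KEYWORDS)
hits = triggered_keywords("McGregor_vs_Mayweather_live", blacklist)
```

Here `hits` contains `mayweather` and `mcgregor`, which is the information a reviewer would want when checking a suspected false positive. An empty list means nothing on the blacklist matched.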


rjrudman commented on July 18, 2024

@adeak Yeah, you raise a good point if we're trying to catch future spammers.


Papershine commented on July 18, 2024

There is a blacklisted username list by Charcoal for SmokeDetector here


Bhargav-Rao commented on July 18, 2024

Yeah, we're using that, but it certainly isn't as comprehensive as Rubio's list.


codygray commented on July 18, 2024

I'm marking this as "closed", because the action items have been completed.

We are now using all of the regex blacklists available from HeatDetector (to detect offensive words) as well as those available from Charcoal's SmokeDetector project (which will detect known trolls/spammers, as well as known problematic patterns). In addition, User Stalker has its own built-in username blacklist, which includes Rubio's list and is meant to be kept up to date by any moderator who handles the User Stalker reports.

Help is, of course, always welcome on improving and/or expanding the regexes in the blacklists. If you have something you want to add or improve, simply submit a pull request (PR) for one of the existing patterns (https://github.com/SOBotics/UserStalker/tree/master/patterns). If you have a more complex suggestion, please open a new Issue.

