
smokedetector's Introduction

SmokeDetector


Headless chatbot that detects spam and posts it to chatrooms. Uses ChatExchange, takes questions from the Stack Exchange realtime tab, and accesses answers via the Stack Exchange API.


Documentation

User documentation is in the wiki.

Detailed documentation for setting up and running SmokeDetector is in the wiki.

Basic setup

To set up SmokeDetector, please use

git clone https://github.com/Charcoal-SE/SmokeDetector.git
cd SmokeDetector
git checkout deploy
sudo pip3 install -r requirements.txt --upgrade
pip3 install --user -r user_requirements.txt --upgrade

Next, copy config.sample to a new file called config, and edit the values required.

To run, use python3 nocrash.py (preferably in a detachable session, such as screen). You can also use python3 ws.py, but then SmokeDetector will shut down after 6 hours; when run from nocrash.py, it is restarted instead. (This ensures that any closed websockets are reopened.)

Virtual environment setup

Running in a virtual environment is a good way to isolate dependency packages from your local system. To set up SmokeDetector in a virtual environment, you can use

git clone https://github.com/Charcoal-SE/SmokeDetector.git
cd SmokeDetector
git config user.email "[email protected]"
git config user.name "SmokeDetector"
git checkout deploy

python3 -m venv env
env/bin/pip3 install -r requirements.txt --upgrade
env/bin/pip3 install --user -r user_requirements.txt --upgrade

Next, copy the config file and edit as said above. To run SmokeDetector in this virtual environment, use env/bin/python3 nocrash.py.

[Note: On some systems (e.g. macOS and Linux), some circumstances may require removing the --user option from the last pip3 command in the instructions above. However, the --user option is known to be necessary in other circumstances. Further testing is needed to resolve the discrepancy.]

Docker setup

Running in a Docker container is an even better way to isolate dependency packages from your local system. To set up SmokeDetector in a Docker container, follow the steps below.

  1. Grab the Dockerfile and build an image of SmokeDetector:

    DATE=$(date +%F)
    mkdir temp
    cd temp
    wget https://raw.githubusercontent.com/Charcoal-SE/SmokeDetector/master/Dockerfile
    docker build -t smokey:$DATE .
  2. Create a container from the image you just built

    docker create --name=mysmokedetector smokey:$DATE
  3. Start the container. Don't worry, SmokeDetector won't run until it's ready, so you have the chance to edit the configuration file before SmokeDetector runs.

    Copy config.sample to a new file named config and edit the values required, then copy the file into the container with this command:

    docker cp config mysmokedetector:/home/smokey/SmokeDetector/config
  4. If you would like to set up additional stuff (SSH, Git etc.), you can do so with a Bash shell in the container:

    docker exec -it mysmokedetector bash

    After you're ready, put a file named ready under /home/smokey:

    touch ~smokey/ready

Automate Docker deployment with Docker Compose

I'll assume you have the basic ideas of Docker and Docker Compose.

The first thing you need is a properly filled config file. You can start with the sample.

Create a directory (name it whatever you like) and place the config file and the docker-compose.yml file in it. Run docker-compose up -d and your SmokeDetector instance is up.

If you want additional control, such as memory and CPU constraints, you can edit docker-compose.yml and add the following keys to smokey. The example values are the recommended values.

restart: always  # when your host reboots Smokey can autostart
mem_limit: 512M
cpus: 0.5  # Recommend 2.0 or more for spam waves

Requirements

SmokeDetector only supports Stack Exchange logins, and runs on Python 3.7 or higher, for now.

To allow committing blacklist and watchlist modifications back to GitHub, your system also needs Git 1.8 or higher, although we recommend Git 2.11+.

License

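Licensed under either of

  • Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
  • MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.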

Contribution Licensing

By submitting your contribution for inclusion in the work as defined in the Apache-2.0 license, you agree that it be dual licensed as above, without any additional terms or conditions.

smokedetector's Issues

Trips on posts in a foreign language

Whenever a post shows up that is posted in a different language, a message like this is posted:

[ SmokeDetector ] All-caps title: יוֹשְׁבֵי דְּבִיר

Could we make it not do that?

Add blacklist commands for keywords

Prompted by this: http://chat.stackexchange.com/transcript/message/33655049#33655049

I (and bwDraco) think that it would be great if we could implement a !!/blacklist-keyword command or similar.

Since we already have the !!/blacklist command implemented for websites, and the bad keyword regexes have already been separated into a text file, it should be fairly easy. All the infrastructure is there; we just need to copy/paste some code and change a few things.

While we're doing this, it might also be a good idea to rename the existing !!/blacklist command to !!/blacklist-website. I've noticed that a few people recently have misunderstood the use of the blacklist command, and changing its name should make that clear.

What are your thoughts?

If we reach an agreement that this would be a good idea, I could do this on the weekend (as long as no-one beats me to it).

Creating a write-only team for Charcoal projects

I'm leaving this here because it seems like more people see our GitHub issues than they do chat messages.

This was an idea I had a couple days ago. What if we created another team on the Charcoal-SE organization, with write access to all of Charcoal's projects (as opposed to the admin access that the Core team have)? That allows us to give people who commonly contribute to the projects write access, instead of forcing them to rely on pull requests - which, let's be honest, aren't much fun when you're just waiting around.

I'm thinking of people like Kyll, Ashish, and angussidney, who have sent us the majority of our recent pull requests. There are probably more people I've missed out there, but that's the general idea.

Obviously, they'd need to be people (a) who we can trust, and (b) with privileges for Smokey. I'm not proposing adding just anyone who submits a PR; rather, people who have a track record of good contributions. Those people are a big resource for Charcoal; this would reduce the friction to them helping out.

Thoughts?

Alias 'tpa' with 'tpu'

... I'm asking for this because I seem to always reply with "tpa" by habit (from using Phamhilator) instead of tpu.

Blacklist functionality is broken, due to Git checks

Git currently checks against "Master", not "Deploy" which isn't updated as frequently. This prevents HEAD checks from being done. Ideally, we'd be updating "Master" right before the check, or we'd switch all the checks to "Deploy".

We'll have to decide which of these two approaches to take going forward.

!!false command to indicate false positives

As an admin, I want to be able to respond to a spam report with the !!false command to tell Smokey that the report is a false positive, so that Smokey doesn't keep posting the same post in the chat room.

Possible steps to implement:

  • Start with in-memory storage for the currently running instance
  • Later, add persistent storage
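
The first step above (in-memory storage) could look roughly like the sketch below. The class name, method names, and the (site, post ID) key are all hypothetical; nothing here reflects how Smokey actually stores feedback.

```python
class FalsePositiveStore:
    """Hypothetical in-memory record of posts marked as false positives.

    The contents are lost when the running instance restarts, which is
    why persistent storage is listed as a later step.
    """

    def __init__(self):
        self._ignored = set()

    def mark_false_positive(self, site, post_id):
        # Store the ID as a string so 123 and "123" refer to the same post.
        self._ignored.add((site, str(post_id)))

    def should_report(self, site, post_id):
        # A post is reported unless it was previously marked as a false positive.
        return (site, str(post_id)) not in self._ignored


store = FalsePositiveStore()
store.mark_false_positive("stackoverflow.com", 12345)
print(store.should_report("stackoverflow.com", 12345))
print(store.should_report("stackoverflow.com", 67890))
```

Keying on the site as well as the post ID matters because post IDs are only unique within a single site.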

Commands not all returning Response Objects

At this point, this is an issue to remind me what I've investigated.

Problem: Smokey is returning something other than a response object in a few areas.

A bool here:

AttributeError: 'bool' object has no attribute 'message'
2016-07-18 10:37:33.673173 UTC
  File "/media/sda2/Smokey2/excepthook.py", line 46, in run_with_except_hook
    run_old(*args, **kw)

  File "/usr/lib/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)

  File "/media/sda2/Smokey2/ChatExchange/chatexchange/browser.py", line 696, in _runner
    self.on_activity(json.loads(a))

  File "/media/sda2/Smokey2/ChatExchange/chatexchange/rooms.py", line 81, in on_activity
    event_callback(event, self._client)

  File "/media/sda2/Smokey2/chatcommunicate.py", line 164, in watcher
    if result.message:

A NoneType here:

AttributeError: 'NoneType' object has no attribute 'command_status'
2016-07-18 10:15:45.964939 UTC
  File "/media/sda2/Smokey2/excepthook.py", line 46, in run_with_except_hook
    run_old(*args, **kw)

  File "/usr/lib/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)

  File "/media/sda2/Smokey2/ChatExchange/chatexchange/browser.py", line 696, in _runner
    self.on_activity(json.loads(a))

  File "/media/sda2/Smokey2/ChatExchange/chatexchange/rooms.py", line 81, in on_activity
    event_callback(event, self._client)

  File "/media/sda2/Smokey2/chatcommunicate.py", line 133, in watcher
    if result.command_status and result.message:

Initial oddness:

  • Issuing the command !!/addblu google.com returns the following:

Invalid format. Valid format: !!/addblu profileurl or !!/addblu userid sitename.

Expected to get the above plus <unrecognized command> because command_status is False and it should fall through to that check here

  • bool may be from a failed permissions check. check_permissions returns False on failure. Change this to a response object
  • Multiple responses to same command

Issuing sd abc dbf sdc responds with

1. [:31107584] <processed without return value>
<unrecognized command>
2. [:31107579] <processed without return value>
<unrecognized command>
3. [:31107416] <processed without return value>
<unrecognized command>

This needs to be adjusted to respond only with "unrecognized command". It happens because these checks are independent if statements rather than if...else branches.

Migrate config to globalvariables

There are currently only a few values in config. I propose we remove "config" completely and migrate these values to globalvariables.

This leaves developers with one location to modify instead of two (config and globalvariables) when starting SmokeDetector, and allows us to remove a ConfigParser import that is only used at startup.

Limit "notify" to users that have been active in the room in the past <X> minutes.

At this moment, in the SO Close Vote Reviewers chatroom, SD is notifying 6 different users for every single report.

Example:

[ SmokeDetector | MS ] Few unique characters in body: SolvedSOLVEDSOLVED by Furkan Ayık on stackoverflow.com
(@​PraveenKumar @​AndrasDeak @πάνταῥεῖ @​FrankerZ @​tripleee @​dorukayhan)

The number of users getting notified has been growing steadily. IMO, it's getting a little annoying. Only a portion of the users actually respond to these notifications, and they get notified even when they haven't been active for hours.

I'd like to request that these notifications be filtered by user activity.
That is, don't bother notifying a user who hasn't been active in the last hour.

Just to be clear: I have no issue with these users. Just that the list of notifications is getting close to the length of the actual report.
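
A minimal sketch of the activity filter requested above. The function name, the last_active mapping (user name to the timestamp of their last chat message), and the one-hour default are assumptions made for illustration; how Smokey would actually track room activity is not specified here.

```python
import time

# Assumed default: only notify users active within the last hour.
ACTIVITY_WINDOW_SECONDS = 60 * 60


def users_to_notify(subscribed_users, last_active, now=None,
                    window=ACTIVITY_WINDOW_SECONDS):
    """Return the subscribed users who were active within the window.

    last_active is a hypothetical dict mapping a user name to the Unix
    timestamp of their most recent message in the room. Users with no
    recorded activity are treated as inactive.
    """
    if now is None:
        now = time.time()
    return [user for user in subscribed_users
            if now - last_active.get(user, float("-inf")) <= window]


activity = {"tripleee": 9000, "FrankerZ": 1000}
print(users_to_notify(["tripleee", "FrankerZ", "dorukayhan"], activity, now=10000))
```

In this example only the first user was active within the last 3600 seconds, so only that user would be pinged.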

Relicense under dual MIT/Apache 2.0

We currently don't have a specified license for SmokeDetector - and thus, by default, it is under full copyright. This is very restrictive, and we'd like to change it to something more permissive (in this case, dual licensed under MIT/Apache 2.0). We'll need consent from all contributors to this repository to do so:

To agree to relicensing, just leave this comment below or otherwise indicate consent:

I license past and future contributions under the dual MIT/Apache-2.0 license, allowing licensees to choose either at their option.

Some more info:

This involves adding the following to the README and including the full text of both licenses in the repository:

## License

Licensed under either of

 * Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
 * MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)

at your option.

### Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any
additional terms or conditions.

MIT is fairly permissive, so it's preferred by most; however, it requires you to include the license in everything using the code. On the other hand, Apache doesn't have this issue, but is incompatible with GPLv2. A dual license lets users choose whichever license suits them.

Submitting nonexistent sites seems to crash the bot

I am afraid I succeeded in bringing the bot offline repeatedly just now by attempting to !!/report posts which had already been deleted.

The first attempt was an experiment where perhaps I should have been more careful; the second, reports which originally didn't succeed, and when I retried, some of them had been deleted in the meantime.

This could easily happen by mistake at any time, so the bot should be able to cope with this.

Chat link for context: http://codereview.stackexchange.com/a/146113/63322

Smarter tp/fp comments from metasmoke on blacklist request

See here for an example.

A user requested that a username should be blacklisted - which prompted the usual metasmoke comment. However - the result only checks for true/false positives in post bodies, rather than the username.

  • Fix comment
  • Fix link in PR

Patterns for catching known spam domains

I know that some of these are already in the system, but it was easier to include them here as some can be improved. These were compiled based on the master spam list by @FieryDragonLord

Here are a few patterns that catch most of the "Repair Toolbox" domains

filefix(er)?\.com
fix(.*)(file(s)?|tool(box)?)\.com
(recovery|repair)kit\.com
(repair|recovery|fix)tool(box)?\.com
\.repair

And some patterns for most of the "Wise Recovery" domains

fix1\.org
easyfix\.org
errorsfixer\.org
regeasypro\.com
recoverypro\.(com|net)
smart(pc)?fixer\.(com|net|org)
wiserecovery\.com
drivertuner\.(com|net)
official-drivers\.com
wisefixer\.(com|net|org)

These patterns can match most of the "Tenorshare" domains

-recovery\.(com|net)
passwordcracker\.(com|net)
-password\.net
(windows(7-)?)?password(s)?(-)?(recovery|reset)\.(com|net)
lost(windows)?password\.com
tenorshare\.com
(downloader|pdf)converter\.(com|net)

Some for the "iSpire/Wasel" domains

i-spire\.net
iwasl\.com
qobul\.com
unblockingtwitter\.com
bestcheapvpnservice\.com
openingblockedsite\.com
arabic(soft)?download(s)?\.com
(vpn|internet)?wasel(pro)?\.com
vpn(faqs|answers)\.com
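
As a rough way to try patterns like these before adding them, the sketch below combines the "Repair Toolbox" patterns listed above into one case-insensitive regex using Python's standard re module. The helper name and the sample strings are illustrative only, and Smokey's real matching code (which the issues above suggest uses the third-party regex package) may behave differently.

```python
import re

# The "Repair Toolbox" patterns from the list above, copied verbatim.
REPAIR_TOOLBOX_PATTERNS = [
    r"filefix(er)?\.com",
    r"fix(.*)(file(s)?|tool(box)?)\.com",
    r"(recovery|repair)kit\.com",
    r"(repair|recovery|fix)tool(box)?\.com",
    r"\.repair",
]

# Wrap each pattern in a non-capturing group so the alternation is safe.
COMBINED = re.compile(
    "|".join("(?:{})".format(pattern) for pattern in REPAIR_TOOLBOX_PATTERNS),
    re.IGNORECASE,
)


def matches_known_spam_domain(text):
    """Hypothetical helper: True if any pattern occurs anywhere in text."""
    return COMBINED.search(text) is not None


print(matches_known_spam_domain("http://www.filefixer.com/download"))
print(matches_known_spam_domain("https://stackoverflow.com/questions"))
```

Because the patterns are unanchored, they match anywhere in the text; a pattern as short as \.repair is worth checking against known-good posts before it is relied on.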

Add !!/block and !!/unblock commands

With our recent chat trolls with SmokeDetector, we've found problems with !!/stappit. Since Undo has added an auto-restarting feature, !!/stappit doesn't work. Thus, Smokey keeps posting...

I think an easy solution to this would be to create !!/block {time} and !!/unblock commands. !!/block {time} would stop Smokey from posting until the given time (in minutes) is up. It could just be a simple check when posting, i.e. if isBlocked: return; else: post. !!/unblock would make it eligible to post again.

Is this a good solution to the problem? Feedback welcome.
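
To make the proposal concrete, here is one possible shape for the check. The class, the post_report function, and the send callback are hypothetical names; this is a sketch of the idea, not existing Smokey code.

```python
import time


class PostingBlock:
    """Hypothetical state behind !!/block {time} and !!/unblock."""

    def __init__(self):
        self._blocked_until = 0.0

    def block(self, minutes, now=None):
        # Remember the moment at which posting becomes allowed again.
        now = time.time() if now is None else now
        self._blocked_until = now + minutes * 60

    def unblock(self):
        self._blocked_until = 0.0

    def is_blocked(self, now=None):
        now = time.time() if now is None else now
        return now < self._blocked_until


def post_report(blocker, message, send, now=None):
    """Send the message unless posting is blocked; return whether it was sent."""
    if blocker.is_blocked(now):
        return False
    send(message)
    return True


sent = []
blocker = PostingBlock()
blocker.block(5, now=1000)
print(post_report(blocker, "report A", sent.append, now=1100))
blocker.unblock()
print(post_report(blocker, "report B", sent.append, now=1100))
```

Storing an expiry time rather than a boolean means the block lifts on its own without a timer thread. In this sketch the state lives only in memory, so it would not survive the auto-restart mentioned above unless it were also persisted.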

Remove backoff messages

Is there any reason we can't remove these messages?


I can't think of any real action we take on them any more. Any actual violations of the backoff should be reported, but just receiving a backoff is a common thing.

Thoughts? Any reason we shouldn't stop posting these?

Please document how to add a new room

I was unable to find documentation for how to add a new room to the bot's configuration. This seems to be a recurring thing, so having a few sentences about how to request an addition could be a welcome addition to e.g. https://github.com/Charcoal-SE/SmokeDetector/wiki/Chat-Rooms

I can guess, but I don't think that qualifies me to actually write documentation.

A rough transcript of my guesses will be visible around http://chat.stackoverflow.com/transcript/message/32434109#32434109

Improvements to auto-blacklist PRs

I'd like to see if I can make the following improvements to the PRs created by non-code-privileged users when they use the !!/blacklist command:

  • Link to a MS search for the URL
  • Link the username to their chat profile
  • (if possible) automatically search MS and show the number of tps and fps for the site.

Different alert for apparent vandalism

I have repeatedly tripped on SmokeDetector reports which looked like spam or rude/abusive but which were self-vandalism, which is easily reverted and should not be flagged as spam.

Could the alert from the bot look different when it triggers because of an edit to a post which was previously fine?

For example, a chat alert message like

[ SmokeDetector ] Few unique characters in body, repeating characters in body, repeating characters in title, title has only one unique char: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa by Kot on stackoverflow.com (@tripleee)

... could have been more easily categorized correctly if it had a different chat message, perhaps something like

[ SmokeDetector ] Few unique characters in body, repeating characters in body, repeating characters in title, title has only one unique char: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa edited by Kot on stackoverflow.com (@tripleee)

Notice how the question's title is no longer a link, and the "edited" link instead links to the actual revision which introduced this vandalism.

Make SmokeDetector easier to run on an alternate account - Chat message prefix variables

I've been testing SmokeDetector more on an alternate account and would like to propose a modest change to make this easier for other developers.

I'd like to put this information in globalvars.py to make it easier for developers to quickly change this data.

  • Pull [SmokeDetector](https://github.com/Charcoal-SE/SmokeDetector)

By pulling this out of the multiple places where it appears in the code, we can allow users to link to their personal repository and change the name from the official "SmokeDetector". This makes a test instance easier to identify, and anyone who is curious can follow the link to the appropriate GitHub profile. It also keeps a test version from having the "authentic SmokeDetector" look.

I'd suggest pulling this into either a single variable like chatmessage_prefix that contains the entire string, or two config variables: one for bot_name and one for bot_repository.

This change would affect both ws.py and chatcommands.py

A similar variable may be needed for https://github.com/Charcoal-SE/SmokeDetector/commit/ links (in ws.py), but to me this isn't as important as the two listed above.
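
A sketch of the two-variable option, using the names bot_name and bot_repository suggested above. The exact prefix layout (a Markdown link wrapped in square brackets) is an assumption based on the chat messages quoted elsewhere in these issues, and chatmessage_prefix is written here as a function purely for illustration.

```python
# Hypothetical module-level settings, as they might appear in globalvars.py.
BOT_NAME = "SmokeDetector"
BOT_REPOSITORY = "https://github.com/Charcoal-SE/SmokeDetector"


def chatmessage_prefix(bot_name=BOT_NAME, bot_repository=BOT_REPOSITORY):
    """Build the report prefix from the two configurable values."""
    return "[ [{}]({}) ]".format(bot_name, bot_repository)


print(chatmessage_prefix())
print(chatmessage_prefix("TestSmokey", "https://example.com/my-fork"))
```

A developer running a test instance would then change two values in one place instead of editing the string in both ws.py and chatcommands.py.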

Regarding the blocking issue..

So what do you think about my suggestion? We can work out a better approach if we take the discussion further. I can code in C, C#, Python and so on, so that's no problem for me.

Clean up handle_commands discussion

Summary

The handle_commands function is a 500-line monstrosity. I'd like to discuss cleaning it up to do the following:

  • Standardize what is returned from handle_commands
  • Have individual functions for each command.

If we agree that this work is needed, I'll take on the work load over the next several weeks (translation: after my summer vacation) to do this work.

Clean up details

This section describes my plan for the above changes. I'd like to do both changes at the same time, but if opposition to that exists one change at a time is acceptable to me.

Standardize what is returned from handle_commands

Currently, the function has multiple return paths and multiple ways of returning data.

One example is in the !!/block command:

return False, "Invalid duration."

Another example is in !!/errorlogs

return "The !!/errorlogs command requires 1 argument."

These two methods lead to weird handling in the two locations that handle_commands is called from. (Location 1, Location 2)

Method of handling 1:

if type(result) != tuple:
    result = (True, result)
if result[1] is not None:
    if wrap2.host + str(message_id) in GlobalVars.listen_to_these_if_edited:
        GlobalVars.listen_to_these_if_edited.remove(wrap2.host + str(message_id))
    message_with_reply = u":{} {}".format(message_id, result[1])
    if len(message_with_reply) <= 500 or "\n" in result[1]:
        ev.message.reply(result[1], False)
if result[0] is False:
    add_to_listen_if_edited(wrap2.host, message_id)

Method of handling 2:

r = result
if type(result) == tuple:
    result = result[1]
if result is not None and result is not False:
    reply += result + os.linesep
elif result is None:
    reply += "<processed without return value>" + os.linesep
    amount_none += 1
elif result is False or r[0] is False:
    reply += "<unrecognized command>" + os.linesep
    amount_unrecognized += 1

These types of checks are unintuitive and difficult to follow. They also add two sections of unnecessary code that behave in a similar, but not identical, way.

Proposal

I propose that all return paths in handle_commands return the expected tuple of (boolean, string) and the two locations that call handle_commands do so like this, with appropriate variable names:

bool_result, string_result = handle_commands(...)

The block of code afterward can then be reduced and simplified by using descriptive variables instead of elements of a tuple.
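
The proposal above might look like the following toy version. The handle_commands body here is a stand-in that only knows two commands, the success reply texts are placeholders, and describe_result is a hypothetical caller; only the (boolean, string) return shape is the point.

```python
def handle_commands(content_lower):
    """Toy stand-in: every return path yields (command_ok, reply_text)."""
    parts = content_lower.split()
    if content_lower.startswith("!!/block"):
        if len(parts) != 2 or not parts[1].isdigit():
            return False, "Invalid duration."
        return True, "Blocked for {} minutes.".format(parts[1])  # placeholder reply
    if content_lower.startswith("!!/errorlogs"):
        if len(parts) != 2:
            return False, "The !!/errorlogs command requires 1 argument."
        return True, "(log output would go here)"  # placeholder reply
    return False, "<unrecognized command>"


def describe_result(content_lower):
    # Unpacking into named variables replaces the type(result) checks
    # and the result[0] / result[1] indexing.
    command_ok, reply_text = handle_commands(content_lower)
    status = "ok" if command_ok else "failed"
    return "{}: {}".format(status, reply_text)


print(describe_result("!!/block 5"))
print(describe_result("!!/block soon"))
```

With a single return shape, both call sites can share the same unpacking line and differ only in what they do with the two values.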

Have individual functions for each command.

As handle_commands is currently written, all command processing is occurring in this function. Much of this logic should be moved to their own individual functions and handle_commands should be used to determine which function to call. Each block that looks like the following should have their own function:

if content_lower.startswith("!!/addblu"):

Proposal

Move the logic of each command to its own function. This keeps handle_commands cleaner and allows certain areas of code to be reused.
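
One way to sketch "one function per command" is a lookup table from command name to handler, so that handle_commands only decides which function to call. The handler bodies, the COMMANDS table, and the stripping of a trailing "-" are assumptions for illustration, not the existing code.

```python
def command_add_blacklist_user(content_lower):
    # Placeholder body; the real logic would move here from handle_commands.
    return True, "addblu handler reached"


def command_block(content_lower):
    return True, "block handler reached"


# Hypothetical dispatch table: command name -> function implementing it.
COMMANDS = {
    "!!/addblu": command_add_blacklist_user,
    "!!/block": command_block,
}


def handle_commands(content_lower):
    """Pick the handler for the first word of the message and call it."""
    parts = content_lower.split()
    # Drop a trailing "-" so a quiet variant such as "!!/block-" still resolves.
    name = parts[0].rstrip("-") if parts else ""
    handler = COMMANDS.get(name)
    if handler is None:
        return False, "<unrecognized command>"
    return handler(content_lower)


print(handle_commands("!!/addblu 123 stackoverflow.com"))
print(handle_commands("!!/unknown"))
```

A table lookup also removes the ordering hazards of a long chain of startswith checks, where one command name can be a prefix of another.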

Should reports be edited on FP instead of deletion?

Currently, if a report is marked as False Positive, Smokey will try to delete the report to prevent accidental flagging.

However, on more than one occasion I have been wanting to see the post itself so I can make up my own mind/investigate/check for hidden spam etc. To do this, I have to go to metasmoke, click the post tab, find the post, go to the post page and click once more to view on the original site. Too much effort for a lazy person like me :)

Instead, I propose that Smokey should try to edit the report to the following:

[SmokeDetector | MS] (false positive - report deleted)

or something similar, so that the MetaSmoke link is still clickable.

This could be easily done by modifying line 983 of chatcommands.py. Should be low-hanging fruit.

Inconsistent environments in CI and readme.md

Issue

In .travis.yml:

pip install coverage phonenumbers regex==2015.11.22 beautifulsoup4 requests websocket-client pytest flake8 termcolor --upgrade

in circleci.yml:

pip install beautifulsoup4 requests websocket-client coverage pytest phonenumbers flake8 regex==2015.11.22 termcolor --upgrade

In readme.md:

sudo pip install pip --upgrade
sudo pip install beautifulsoup4
sudo pip install requests --upgrade
sudo pip install websocket-client --upgrade
sudo pip install phonenumbers
sudo pip install regex
sudo pip install termcolor

Diff:

  1. pip, requests, websocket-client are not upgraded in the travis and circleci builds
  2. termcolor is not upgraded in readme.md.
  3. regex version is frozen in travis build, but not in the instruction
  4. circleci and travis use the same list with different ordering. It's hard to check if the environments are actually equal.

Proposal

This can be solved by making requirements.txt and reusing it both in CI and manual installation.
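
Until a shared requirements.txt exists, the ordering problem from point 4 can be checked mechanically. The sketch below compares the two CI package lists quoted above as sets; the function names are illustrative.

```python
# Package arguments copied from the .travis.yml and circleci.yml commands above.
TRAVIS = ("coverage phonenumbers regex==2015.11.22 beautifulsoup4 requests "
          "websocket-client pytest flake8 termcolor --upgrade")
CIRCLECI = ("beautifulsoup4 requests websocket-client coverage pytest "
            "phonenumbers flake8 regex==2015.11.22 termcolor --upgrade")


def package_set(arguments):
    """Split a pip argument string into a set of packages, ignoring flags."""
    return {arg for arg in arguments.split() if not arg.startswith("-")}


def diff_environments(first, second):
    """Return (only in first, only in second), each sorted for stable output."""
    a, b = package_set(first), package_set(second)
    return sorted(a - b), sorted(b - a)


print(diff_environments(TRAVIS, CIRCLECI))
```

For the two lists above both differences are empty, so the CI environments request the same packages and differ only in ordering.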

Make SmokeDetector's code pass Flake8

Kevin Brown suggested using something like Flake8 for the source checks. In the current state, however, there are too many warnings to use it for actual tests. Hence, we should make SmokeDetector's code pass Flake8 (except warning E501, "line too long", which should not always be changed). /cc @kevin-brown

Add check_permissions decorator

Before I create a pull request for a feature I've already written, I want to make sure everyone is on board with the change.

The full change is available here (with a successful Travis CI build).


This change pulls the is_privileged check out of each of the commands and instead uses a decorator on the functions that should be restricted to privileged users.

Previously:

def command_add_blacklist_user(message_parts, content_lower, message_url, ev_room, ev_user_id, wrap2, *args, **kwargs):
    quiet_action = any([part.endswith('-') for part in message_parts])
    if is_privileged(ev_room, ev_user_id, wrap2):
        uid, val = get_user_from_list_command(content_lower)
        if uid > -1 and val != "":
            add_blacklisted_user((uid, val), message_url, "")
            return None if quiet_action else "User blacklisted (`{}` on `{}`).".format(uid, val)
        elif uid == -2:
            return "Error: {}".format(val)
        else:
            return "Invalid format. Valid format: `!!/addblu profileurl` *or* `!!/addblu userid sitename`."

New:

@check_permissions
def command_add_blacklist_user(message_parts, content_lower, message_url, ev_room, ev_user_id, wrap2, *args, **kwargs):
    quiet_action = any([part.endswith('-') for part in message_parts])
    uid, val = get_user_from_list_command(content_lower)
    if uid > -1 and val != "":
        add_blacklisted_user((uid, val), message_url, "")
        return None if quiet_action else "User blacklisted (`{}` on `{}`).".format(uid, val)
    elif uid == -2:
        return "Error: {}".format(val)
    else:
        return "Invalid format. Valid format: `!!/addblu profileurl` *or* `!!/addblu userid sitename`."

Using this decorator, we can write the functions to work without caring about whether the user has permission to access it. If we want to protect the function, add the decorator. If we do not, don't add the decorator.
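
For readers who have not opened the linked change, a decorator of this kind could be sketched as follows. This is not the code from that change: is_privileged is stubbed with a hypothetical lookup table, the denial message is a placeholder, and the sketch assumes commands are invoked with keyword arguments.

```python
import functools

# Hypothetical stand-in for the real privilege data: (room, user ID) pairs.
PRIVILEGED_USERS = {("room1", "user1")}


def is_privileged(ev_room, ev_user_id, wrap2):
    return (ev_room, ev_user_id) in PRIVILEGED_USERS


def check_permissions(function):
    """Run the wrapped command only if the calling user is privileged."""
    @functools.wraps(function)
    def run_command(*args, **kwargs):
        # Assumes ev_room, ev_user_id and wrap2 are passed by keyword.
        if not is_privileged(kwargs["ev_room"], kwargs["ev_user_id"], kwargs["wrap2"]):
            return "You are not a privileged user."  # placeholder reply
        return function(*args, **kwargs)
    return run_command


@check_permissions
def command_add_blacklist_user(content_lower, ev_room, ev_user_id, wrap2, *args, **kwargs):
    return "User blacklisted."


print(command_add_blacklist_user(content_lower="!!/addblu 1 so",
                                 ev_room="room1", ev_user_id="user1", wrap2=None))
print(command_add_blacklist_user(content_lower="!!/addblu 1 so",
                                 ev_room="room1", ev_user_id="user2", wrap2=None))
```

functools.wraps keeps the wrapped function's name and docstring intact, which helps if commands are ever looked up or listed by name.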

add a command for tpu and delete

It looks like we don't have a command that does both tpu and delete yet. If I need to mark a message as tpu and delete it, I need:

sd tpu
sd deleted

What about adding a command like tp[u]d[-] or tp[u]del[-]?

Add command to show flagged posts that have not been deleted yet (or post them automatically)

Sometimes spam reports about smaller sites get less attention than they need, either if they're followed by many other reports or if only few people are online at the time.

I would suggest that Smokey keep a list of all reported posts that have received positive feedback or no feedback at all and have not yet been removed from the site. A command like !!/pending would then show a list of all those reports that still need more flags or feedback. Example:

"Skin care tips" by "SpamUser" on webmasters.stackexchange.com [MS] (reported 12 minutes ago, 1 tp, 0 naa, 0 fp, post score -3)
"Best essay writing service" by "Writer" on graphicdesign.stackexchange.com [MS] (reported 6 minutes ago, no feedback yet, post score -1)

This would be very helpful to make sure no reports slip through and to verify if anything needs more flags after a bunch of reports appeared without having to walk through the links manually.

Additionally, it might be useful to not only post this report on demand but also automatically for posts in the list that were reported more than e.g. 10 minutes ago.
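
The selection rule described above (positive feedback or no feedback, and not yet deleted) can be sketched like this. The Report fields and the pending_reports function are hypothetical; the min_age_seconds parameter covers the automatic variant that only lists reports older than some threshold.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """Hypothetical record of one reported post."""
    title: str
    site: str
    reported_at: float  # Unix timestamp of the report
    tp: int = 0
    naa: int = 0
    fp: int = 0
    deleted: bool = False


def pending_reports(reports, now, min_age_seconds=0):
    """Return reports that are still live and have tp feedback or none at all."""
    pending = []
    for report in reports:
        no_feedback = report.tp + report.naa + report.fp == 0
        old_enough = now - report.reported_at >= min_age_seconds
        if not report.deleted and (report.tp > 0 or no_feedback) and old_enough:
            pending.append(report)
    return pending


reports = [
    Report("Skin care tips", "webmasters.stackexchange.com", reported_at=0, tp=1),
    Report("Best essay writing service", "graphicdesign.stackexchange.com", reported_at=400),
]
for report in pending_reports(reports, now=1000, min_age_seconds=10 * 60):
    print(report.title, "on", report.site)
```

With now=1000 and a ten-minute threshold, only the first report is old enough to be listed.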

Create unit tests for ws.py

We inadvertently break ws.py from time to time. With Travis, we should be able to catch most of these errors by having unit tests for ws.py.

Should we consider automatically flagging posts which hit more than one filter?

As we all know, the role of the SmokeDetector project is to detect and delete spam as fast as possible. Of course, the fastest way to delete spam is to have as many people flag it as fast as possible.

So, what if we made Smokey automatically flag posts which hit more than one filter, so that spam gets deleted faster? According to Undo, reports which hit more than one filter are pretty damn accurate:

  • At least two reasons: 22136 TPs, 424 FPs (~2% false positives)
  • At least three reasons: 13087 TPs, 24 FPs (~0.2% false positives)

When you compare that to the helpful flag percentage of a human (like me) of ~95%, Smokey is definitely accurate enough for us to consider automated flagging.
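
The percentages quoted above follow directly from the TP and FP counts; the small check below reproduces them. The function name is illustrative.

```python
def false_positive_rate(true_positives, false_positives):
    """Share of all reports in the bucket that were false positives."""
    return false_positives / (true_positives + false_positives)


# Counts taken from the figures quoted above.
for label, tps, fps in [
    ("At least two reasons", 22136, 424),
    ("At least three reasons", 13087, 24),
]:
    print("{}: {:.2%} false positives".format(label, false_positive_rate(tps, fps)))
```

This gives roughly 1.88% and 0.18%, matching the rounded ~2% and ~0.2% figures.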

Of course, we can do more to make the auto-flags more accurate:

  • Increase the number of filters required before an auto-flag is cast
  • Exclude less accurate reasons from the count of filters hit, such as Link at End of Answer, Repeating characters in body etc.
  • Revert flags if FP feedback is provided (is this possible via the API? Maybe a Meta FR?)
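
As a rough illustration of the first two points, the decision could look something like the sketch below. The function name, the threshold default, and the way reasons are normalised are all assumptions made for this example; only the two excluded reason names come from the list above.

```python
# Less accurate reasons that do not count towards the threshold.
# The two names are taken from the discussion; lower-casing them is an assumption.
WEAK_REASONS = frozenset({
    "link at end of answer",
    "repeating characters in body",
})


def should_autoflag(reasons, min_reasons=3, weak_reasons=WEAK_REASONS):
    """Return True if enough distinct, sufficiently accurate reasons matched.

    reasons      -- iterable of reason strings reported for one post
    min_reasons  -- hypothetical threshold; raising it trades coverage for accuracy
    weak_reasons -- lower-cased reasons excluded from the count
    """
    strong = {reason.strip().lower() for reason in reasons} - weak_reasons
    return len(strong) >= min_reasons
```

With the default threshold, a post that hits three reasons where two of them are on the excluded list would not be flagged, since only one reason counts.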

However, there will be some things we need to think about before we do anything:

  • NAA posts need to be separated from red-flaggable posts
  • Do we need to separate spam and offensive posts (well, yes... but will one incorrect type of red flag by Smokey make much of a difference in comparison to 5 correct humans)?
  • Would SE be happy with us doing this? (I assume the answer would be yes, since they are already considering integration with us at our current accuracy, and with a little tweaking we can make auto-flags based on the number of filters more accurate than Andy's auto-comment-flagging, which everyone seems to be happy about)

But Angus, isn't that a huge amount of effort for just one extra flag? Surely it can't make much of a difference compared to our current flagger userbase?

Well, yes, in some ways, you are right. Most of the time, it won't make much of a difference, but it certainly will make spam get deleted faster, especially on smaller sites, where every flag counts.

Also, now that SE is starting to talk to us about the possibility of integration, I think it is important that we have a tried-and-tested way of identifying which posts are almost definitely spam, which we can present to them. Our overall false positive rate of 17% isn't good enough to be put into production on one of the biggest sites on the internet, so I think we should start work on a more accurate identification system so that it is ready for when SE needs it.


So, what does everyone else think? Please share your thoughts, ideas, and criticisms.

Of course, this is only an initial basic idea. Please don't nitpick at the specifics - if (mostly) everyone agrees to this, we can iron out any details in another discussion. For now, let's just discuss whether we want to put some serious effort into this.

New filter: website resembles username

E.g. for these kinds of spam posts, which often go undetected:
https://metasmoke.erwaysoftware.com/post/52946
https://metasmoke.erwaysoftware.com/post/52841
https://metasmoke.erwaysoftware.com/post/51936

Procedure: replace spaces in username by \W? and check if there's a link in the post which contains that string.
There are some users with 3 character usernames which have a chance of accidentally triggering the filter. Maybe this should only work for usernames above a certain length.
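
The procedure might be sketched as follows. This is only an illustration: the function name, the simple URL regex, and the `min_length` default of 6 are assumptions, not part of the existing filter code.

```python
import re

# Deliberately simple URL matcher for the sketch; a real filter would
# likely reuse whatever link extraction the project already has.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def username_in_link(username, body, min_length=6):
    """Return True if any link in body contains the username, where each
    run of spaces in the username may match one optional non-word character."""
    name = username.strip()
    # Skip empty and very short usernames to avoid accidental matches.
    if not name or len(name) < min_length:
        return False
    # Escape each word literally, then join the words with \W?.
    pattern = r"\W?".join(re.escape(part) for part in name.split())
    name_re = re.compile(pattern, re.IGNORECASE)
    return any(name_re.search(url) for url in URL_RE.findall(body))
```

For a hypothetical user named "Best Essay Help", the generated pattern is `Best\W?Essay\W?Help`, which would match links such as `http://www.best-essay-help.com/` as well as `http://bestessayhelp.example.com`.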

Use named arguments for data-passing

As Normal so wisely suggests:

seeing these on separate lines suggests further improvement (for the future): naming these arguments. It's a long list with some "False, False" in the middle, and the only way to know what these do is to read the code in another module.

At some point in the future, we should use named arguments or something similar; especially as this grows and more data is flying around.
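
One possible shape for this, as a sketch only: Python 3's keyword-only arguments (everything after a bare `*`) force call sites to name the flags, and a `namedtuple` keeps the data readable once it is passed around. The function and field names below are hypothetical and are not taken from the existing code.

```python
from collections import namedtuple

# Hypothetical bundle of post data; field names are illustrative only.
PostData = namedtuple(
    "PostData",
    ["title", "body", "user_name", "site", "is_answer", "body_is_summary"])


def make_post_data(title, body, *, user_name, site,
                   is_answer=False, body_is_summary=False):
    """Build a PostData; every argument after * must be passed by name."""
    return PostData(title, body, user_name, site, is_answer, body_is_summary)


def describe(post):
    """Fields are read by name instead of by position."""
    kind = "answer" if post.is_answer else "question"
    return "{} by {} on {}".format(kind, post.user_name, post.site)
```

A call such as `make_post_data(title, body, user_name=name, site=site, is_answer=True)` documents itself, whereas passing the flags positionally raises a `TypeError` instead of silently swapping two booleans.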
