foxxmd / context-mod

an event-based, reddit moderation bot built on top of snoowrap and written in typescript

License: MIT License

TypeScript 84.09% Dockerfile 0.28% CSS 0.24% EJS 14.11% JavaScript 1.10% Ruby 0.04% HTML 0.14%

reddit reddit-bot typescript reddit-application bot

context-mod's Introduction


ContextMod

Context Mod (CM) is an event-based Reddit moderation bot built on top of snoowrap and written in TypeScript.

It is designed to help fill in the gaps for automoderator in regard to more complex behavior with a focus on user-history based moderation.

Examples of what Context Mod can do:

On a new submission, check if the user has also posted the same link in N number of other subreddits within a timeframe/# of posts

On a new submission or comment, check if the user has had any activity (sub/comment) in N set of subreddits within a timeframe/# of posts

In either instance Context Mod can then perform any action a moderator can (comment, report, remove, lock, etc.) against that user, comment, or submission.

Feature Highlights for Moderators:

Complete bot autonomy. YAML config is stored in your subreddit's wiki (like automoderator)
Simple rule-action behavior can be combined to create complex behavior detection
Support Activity filtering based on:
- Author criteria (name, css flair/text, age, karma, moderator status, Toolbox User Notes, and more!)
- Activity state (removed, locked, distinguished, etc...)
- State of the Subreddit the Activity is in (nsfw, name, profile, etc...)
Rules and Actions support named references -- write once, reference anywhere
Powerful logic control (if, then, goto)
Delay/re-process activities using arbitrary rules
Image Comparisons via fingerprinting and/or pixel differences
Repost detection with support for external services (youtube, etc...)
Event notification via Discord
Web interface for monitoring, administration, and oauth bot authentication
Placeholders (like automoderator) can be configured via a wiki page or raw text and support mustache templating
Partial Configurations -- offload parts of your configuration to shared locations to consolidate logic between multiple subreddits
Guest Access enables collaboration and easier setup by allowing temporary access
Toxic content prediction using moderatehatespeech.com machine learning model

Feature highlights for Developers and Hosting (Operators):

Server/client architecture
- Default/no configuration runs "All In One" behavior
- Additional configuration allows web interface to connect to multiple servers
- Each server instance can run multiple reddit accounts as bots
Global/subreddit-level caching of Reddit API responses and CM results
Database Persistence using SQLite, MySQL, or Postgres
- Audit trails for bot activity
- Historical statistics
Docker container and docker-compose support
Easy, UI-based OAuth authentication for adding Bots and moderator dashboard
Integration with InfluxDB for detailed time-series metrics and a pre-built Grafana dashboard

How It Works
Getting Started
Configuration And Documentation
Web UI and Screenshots

How It Works

Each subreddit using the CM bot configures the bot's behavior via its own wiki page.

When a monitored Activity (new comment/submission, new modqueue item, etc.) is detected, the bot runs through a list of Checks to determine what to do with the Activity from that Event. Each Check consists of:

Kind

Is this check for a submission or comment?

Rules

A list of Rules to run against the Activity. Triggered Rules can cause the whole Check to trigger and run its Actions

Actions

A list of Actions that describe what the bot should do with the Activity or Author of the activity (comment, remove, approve, etc.). The bot will run all Actions in this list.

The Checks for a subreddit are split up into Submission Checks and Comment Checks based on their kind. Each list of checks is run independently based on when events happen (submission or comment).

When an Event occurs all Checks of that type are run in the order they were listed in the configuration. When one check is triggered (an Action is performed) the remaining checks will not be run.
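The flow described above can be sketched roughly as follows. This is a minimal illustration and not CM's actual implementation: the `Activity`, `Check`, and `processActivity` names are hypothetical, and the sketch assumes a Check triggers only when all of its Rules trigger.

```typescript
// Hypothetical shapes for illustration; the real CM classes are richer than this.
type ActivityKind = "submission" | "comment";

interface Activity {
  kind: ActivityKind;
  author: string;
  body: string;
}

interface Check {
  name: string;
  kind: ActivityKind;
  // Each rule returns true when it is triggered by the Activity.
  rules: Array<(activity: Activity) => boolean>;
  // Each action returns a short description of what it did.
  actions: Array<(activity: Activity) => string>;
}

interface CheckOutcome {
  triggeredCheck: string | null;
  performed: string[];
}

function processActivity(activity: Activity, checks: Check[]): CheckOutcome {
  // Only Checks matching the Activity's kind are considered, in configuration order.
  for (const check of checks.filter((c) => c.kind === activity.kind)) {
    // Assumption: a Check triggers only when it has rules and every rule triggers.
    const triggered =
      check.rules.length > 0 && check.rules.every((rule) => rule(activity));
    if (triggered) {
      // Run all Actions of the triggered Check, then stop: remaining Checks are skipped.
      const performed = check.actions.map((action) => action(activity));
      return { triggeredCheck: check.name, performed };
    }
  }
  return { triggeredCheck: null, performed: [] };
}
```

Returning as soon as one Check triggers mirrors the "remaining checks will not be run" behavior; a comment Event never touches the submission Checks because of the kind filter.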

Learn more about the CM lifecycle and core concepts in the docs.

Getting Started

Operators

This guide is for users who want to run their own bot on a ContextMod instance.

See the Operator's Getting Started Guide

Moderators

This guide is for reddit moderators who want to configure an existing CM bot to run on their subreddit.

See the Moderator's Getting Started Guide

Configuration and Documentation

Context Mod's configuration can be written in YAML (like automoderator) or JSON5. Its schema conforms to JSON Schema Draft 7. Additionally, many operator settings can be passed via command line or environment variables.

For operators (running the bot instance) see the Operator Configuration guide
For moderators consult the Subreddit Configuration Docs

Check the full docs for in-depth explanations of all concepts and examples

Web UI and Screenshots

Dashboard

CM comes equipped with a dashboard designed for use by both moderators and bot operators.

Authentication via Reddit OAuth -- only accessible if you are the bot operator or a moderator of a subreddit the bot moderates
Connect to multiple ContextMod instances (specified in configuration)
Monitor API usage/rates
Monitoring and administration per subreddit:
- Start/stop/pause various bot components
- View statistics on bot usage (# of events, checks run, actions performed) and cache usage
- View various parts of your subreddit's configuration and manually update configuration
- View real-time logs of what the bot is doing on your subreddit
- Run bot on any permalink

Bot Setup/Authentication

A bot oauth helper allows operators to define oauth credentials/permissions and then generate unique, one-time invite links that allow moderators to authenticate their own bots without operator assistance. Learn more about using the oauth helper.

Operator view/invite link generation:

Moderator view/invite and authorization:

A similar helper and invitation experience is available for adding subreddits to an existing bot.

Configuration Editor

A built-in editor using monaco-editor makes editing configurations easy:

Automatic JSON or YAML syntax validation and formatting
Automatic Schema (subreddit or operator) validation
All properties are annotated via hover popups
Unauthenticated view via yourdomain.com/config
Authenticated view loads subreddit configurations by simple link found on the subreddit dashboard
Switch schemas to edit either subreddit or operator configurations

Grafana Dashboard

Overall stats (active bots/subreddits, api calls, per second/hour/minute activity ingest)
Over time graphs for events, per subreddit, and for individual rules/check/actions

License

MIT


context-mod's Issues

Implement basic actions

Need to make action classes actually do something

Add footer notice to content actions

If the bot performs an action that submits content (comment, report) it should contain a footer saying:

It's a bot
Link to the bot subreddit
Do not reply/DM directly to the bot for issues -- contact subreddit mods if sub issue, bot subreddit if technical issue

Run bot on permalink should open a popup

Either a modal or a window popup should open with the results. Currently there isn't really any user feedback for submitting a link apart from the link disappearing.

Toolbox usernote integration

Would like to integrate with toolbox usernotes to enable mod interaction tracking:

Integration requirements

Opt-in using subreddit configuration
Read (decode, uncompress) notes and store in cache
Write (encode, compress) notes and invalidate cache
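A rough sketch of the read/write requirements, under a loudly stated assumption: that the notes blob is a base64 string wrapping zlib-compressed JSON. The `decodeNotes`, `encodeNotes`, and `UsernoteStore` names are hypothetical, and fetching or saving the wiki page itself is left out.

```typescript
import { deflateSync, inflateSync } from "node:zlib";

// Assumption: the blob is base64 text containing zlib-compressed JSON.
function decodeNotes(blob: string): unknown {
  const compressed = Buffer.from(blob, "base64");
  return JSON.parse(inflateSync(compressed).toString("utf8"));
}

function encodeNotes(notes: unknown): string {
  const json = Buffer.from(JSON.stringify(notes), "utf8");
  return deflateSync(json).toString("base64");
}

// Hypothetical in-memory store: decodes lazily, caches the result,
// and invalidates the cache whenever notes are written.
class UsernoteStore {
  private blob: string;
  private cache: { value: unknown } | null = null;

  constructor(blob: string) {
    this.blob = blob;
  }

  read(): unknown {
    if (this.cache === null) {
      this.cache = { value: decodeNotes(this.blob) };
    }
    return this.cache.value;
  }

  write(notes: unknown): string {
    this.blob = encodeNotes(notes);
    this.cache = null; // force the next read to decode the new blob
    return this.blob;
  }
}
```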

Implementation

Add additional criteria to author filter/rule. Usernote criteria should be based on warning type, text, and count of notes (and/or some combination of all of the above)
Add new Action to add usernote to Author

allow saving wiki changes

Currently you can load but not save the config from the editor.

Improve error handling at each level of operation

Need to encapsulate rule/action running in try-catch so they can fail but application can continue
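One possible shape for this, as a sketch only: each rule runs inside its own try-catch and a failure is recorded on the result instead of propagating. `NamedRule`, `RuleRunResult`, and `runRulesSafely` are hypothetical names, and the rules here are synchronous for brevity.

```typescript
interface NamedRule<T> {
  name: string;
  run: (activity: T) => boolean;
}

interface RuleRunResult {
  name: string;
  triggered: boolean;
  error?: string;
}

function runRulesSafely<T>(rules: NamedRule<T>[], activity: T): RuleRunResult[] {
  return rules.map((rule): RuleRunResult => {
    try {
      return { name: rule.name, triggered: rule.run(activity) };
    } catch (e) {
      // A failing rule is recorded as not triggered so the remaining rules still run.
      const message = e instanceof Error ? e.message : String(e);
      return { name: rule.name, triggered: false, error: message };
    }
  });
}
```

Keeping the error message on the result lets the caller log it with context while the application continues with the next rule or Activity.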

Easy cloud deployment options

Implement cloud templates so new users can deploy a working bot without having to figure out infrastructure.

Heroku
EC2
Digital Ocean

"Depleted in" should have the window listed

"Depleted in ~2 hours".

Does 2 hours signify an issue?
Is 2 hours a good thing?

Add 3rd party notifications when check runs

Would be cool to get Discord webhooks or something so that users can be notified, outside of reddit, when a critical check runs

Add missing actions

Approve
Ban user

Refactor logger labelling to be less cumbersome and more descriptive of context

Right now would need to pass all prefix content down from config builder all the way to action/rule if I want to see the context for some log statement on write. Need to refactor this so a logger can be created from a pool or something with prefix content already built. Basically make it more stateless to create a logger.
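One way this could look, sketched with a hypothetical `LabelledLogger` class (not the logger CM uses): each component derives a child logger that already carries the accumulated prefix, so nothing has to be threaded down manually.

```typescript
type LogSink = (line: string) => void;

class LabelledLogger {
  private readonly sink: LogSink;
  private readonly labels: readonly string[];

  constructor(sink: LogSink, labels: readonly string[] = []) {
    this.sink = sink;
    this.labels = labels;
  }

  // Returns a new logger whose prefix already includes the extra labels.
  child(...labels: string[]): LabelledLogger {
    return new LabelledLogger(this.sink, [...this.labels, ...labels]);
  }

  info(message: string): void {
    const prefix = this.labels.map((label) => `[${label}]`).join(" ");
    this.sink(prefix.length > 0 ? `${prefix} ${message}` : message);
  }
}

// Usage: the config builder creates the subreddit logger once;
// checks and rules only add their own label.
const collected: string[] = [];
const root = new LabelledLogger((line) => collected.push(line));
const ruleLogger = root.child("r/subreddit").child("CHK spam", "RULE repeat");
ruleLogger.info("Rule triggered");
```

Because loggers are immutable, a child can be created anywhere without affecting the parent's labels.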

Can't access "Actioned Events"

When clicking "Actioned Events" I get this error.

allow nuking a user

It'd be nice to not only be able to check a permalink but a whole user. This would be great for new spam as we could add the new spam rule and then run it against that whole account removing all their posts.

Remove node-canvas from docker

Current dockerfile dependencies stem from when CM was using node-canvas for image comparison. It has been replaced by sharp, which only depends on libvips and may already be built into alpine...need to test what can be removed.

Announce bot on reddit

TODO before public announcement:

Setup informational post on /u/ContextModBot
Setup subreddit for satellite support/feedback
Create issue template on github

Places to post announcement:

r/botwatch
r/bots (may be dead)
r/bot
r/ModSupport
- precedent 1
- precedent 2
r/RequestABot
r/modhelp
- plenty of threads where people are asking for bot suggestions/help so seems appropriate
Per this suggestion
- r/thesefuckingaccounts
- u/blogspammr
Share on Reddit Mods discord server in #automation

Implement media-based count rule

Reddit returns good information about Submissions with media links (youtube, vimeo, etc.)

Taking a page out of toolbox's playbook:

aggregate an author's submission history on secure_media.oembed.author_url
use large default window (100 or 200 submissions)
allow trigger on threshold of author_url count (variant with useSubmissionAsReference)
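The aggregation described above could be sketched like this. The `secure_media.oembed.author_url` path comes from the notes above; `SubmissionLike`, `countByAuthorUrl`, and `mediaCountTriggered` are hypothetical names, and the optional `referenceUrl` argument stands in for the useSubmissionAsReference variant.

```typescript
interface SubmissionLike {
  secure_media?: { oembed?: { author_url?: string } } | null;
}

// Count how often each media author (e.g. a channel URL) appears in the window.
function countByAuthorUrl(history: SubmissionLike[], window = 100): Map<string, number> {
  const counts = new Map<string, number>();
  for (const submission of history.slice(0, window)) {
    const url = submission.secure_media?.oembed?.author_url;
    if (url === undefined) continue; // not a media submission
    counts.set(url, (counts.get(url) ?? 0) + 1);
  }
  return counts;
}

function mediaCountTriggered(
  history: SubmissionLike[],
  threshold: number,
  referenceUrl?: string,
): boolean {
  const counts = countByAuthorUrl(history);
  if (referenceUrl !== undefined) {
    // Variant: only count media from the same author as the reference submission.
    return (counts.get(referenceUrl) ?? 0) >= threshold;
  }
  return Array.from(counts.values()).some((count) => count >= threshold);
}
```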

allow filtering web log

It'd be great to allow both filtering by a string/regex and enabling the types of messages being output for example.

Stuck while trying to deploy to heroku

Used the heroku link and then provided it the needed envs.

The site shows a heroku error and when trying to connect via heroku run bash it never connects.
It's just forever showing Running bash on ⬢ context-mod... / connecting, run.4890 (Hobby).

Any ideas if this is a heroku issue or something with the deploy?

Add api documentation

Once api is stable document endpoints and provide samples for authentication/usage

Implement rule to check for temporal patterns in an author's history

Originally suggested as being able to check for a hiatus/gap here by u/SillyStranger5009

Some examples of things that could be checked (to build a rule against):

Within a date range if there has been a gap in author activity
Within a date range find the baseline for frequency of activity and...
- if there is a std deviation or percentage increase below or above the baseline
- if the baseline frequency matches a user-defined value

How to apply?

Allow specifying these sets of behaviors as criteria with AND/OR operands
Allow specifying if these events must occur 1 or N times (per each or in total)
Allow white/blacklisting subreddits to count events from
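The simplest of these checks, the hiatus/gap case, might be sketched as below. `largestGapSeconds` and `hasHiatus` are hypothetical helpers operating on epoch-second timestamps; baseline frequency and AND/OR criteria are not covered.

```typescript
// Largest gap, in seconds, between consecutive activities.
function largestGapSeconds(timestamps: number[]): number {
  const sorted = [...timestamps].sort((a, b) => a - b);
  let largest = 0;
  let previous: number | null = null;
  for (const current of sorted) {
    if (previous !== null) {
      largest = Math.max(largest, current - previous);
    }
    previous = current;
  }
  return largest;
}

// True when the author was inactive for at least minGapSeconds at some point
// between the first and last activity in the given range.
function hasHiatus(timestamps: number[], minGapSeconds: number): boolean {
  return largestGapSeconds(timestamps) >= minGapSeconds;
}
```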

Implement image fingerprint database for detecting reposts

May be worthwhile to replicate repost sleuth bot on a (much) smaller level IE subreddit/bot level.

Things that would need to be done or considered...

Support more than just redis for database?
Definable (but optional) collections EX known spam, retired memes, all subreddit posts, etc...
UI requirements
- Upload image by file or URL
- Batch upload from local directory?
- Progress indicator for processing
- Stats (number of fingerprints, search performance)
Rule refactoring to allow searching all, by collection, and/or user history
user-configurable if database should be shared across bot/instance

Show a more descriptive error when request fails due to invalid scope auth response

CACHING=redis doesn't affect subs

With redis caching enabled in monolith mode, I had assumed that without a sub/op config the app would take this env and also cache reddit posts/comments, but checking redis I'm not seeing anything but web sessions.

Figure out why JSON Schema Viewer won't show examples

Not really context bot specific but it would be super helpful if the viewer also used/displayed the examples I've annotated. Will probably need to run the viewer locally and debug.

Use redis via env?

Is there an env to use redis? I had assumed PROVIDER_STORE=redis would work.

Reduce memory consumption and increase performance for image comparison

Determine memory usage for individual image (both raw and loaded into resemble)
Determine what impact converting image to smaller size has on cumulative memory usage, comparison speed, and general cpu usage
Potentially refactor image comparison object usage (repeat/recent reference objects) based on results from above

Write MVP documentation

Create "getting started" and "starter" configurations for new users

Create some well-documented common configurations that can be used as a teaching tool as well as a valid jumping off point for new users.

Add basic rule and action examples
Add advanced concepts guide (caching, ordering, etc.)
Add at least one yaml example
Add full markdown templates to show off advanced mustache features
Add footer example
Add partials example once it's implemented
Add at least one complete subreddit example (config, templates, real world config values...)
Add a "Getting Started" document to run through all of the above plus deployment

Implement author include/excludes on rule runs

Implement run() in abstract class to handle run/skip based on include/exclude criteria

Implement image comparison

When

Using attribution, repeat, or recent activity
AND using submission as reference
AND submission is an image

Implement a way to compare submission image to images from submissions in history.

Notes:

What library?
- Resemble.JS
  - Requires node-canvas binary
- Rembrandt.JS
  - Requires node-canvas binary
- pixelmatch
  - Can't compare images with different dimensions
- matches-subimage
  - Based on pixelmatch but can handle different sizes
Need to determine if submission URL is an image
- MIME type of downloaded resource?
- By extension ending?
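Both options for deciding whether a URL is an image can be sketched cheaply. The helper names and the extension list are assumptions for illustration, and the MIME check expects the caller to supply the Content-Type header of the downloaded resource.

```typescript
// Assumed list of extensions to treat as images; adjust as needed.
const IMAGE_EXTENSIONS = ["jpg", "jpeg", "png", "gif", "webp", "bmp"];

function hasImageExtension(url: string): boolean {
  // Drop any query string or fragment before inspecting the path.
  const path = url.split(/[?#]/)[0];
  const match = /\.([a-z0-9]+)$/i.exec(path);
  return match !== null && IMAGE_EXTENSIONS.includes(match[1].toLowerCase());
}

function isImageMimeType(contentType: string | undefined): boolean {
  return (
    contentType !== undefined &&
    contentType.trim().toLowerCase().startsWith("image/")
  );
}
```

The extension check avoids a download but misses extensionless image URLs, so a reasonable design is to try the extension first and fall back to the MIME type.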

Pull Comment Action content from wiki page

Implement alternate property to allow pulling content from a wiki page.

uncaught exception Reddit returned a 404 for user history

2021-10-20T22:43:49+00:00 warn   : ~u/username~ {r/subreddit} [COM ID] [CHK low xp comment spam] Running rules failed due to uncaught exception Reddit returned a 404 for user history. Likely this user is shadowbanned.
SimpleError: Reddit returned a 404 for user history. Likely this user is shadowbanned.
    at Object.getAuthorActivities (CWD/src/Utils/SnoowrapUtils.js:92:19)
    at async cacheVal.cache.wrap.ttl (CWD/src/Subreddit/SubredditResources.js:340:24)

Create docker

So I can run it on the server

Implement detecting comment reply

If a submission or comment already has been replied to (at top level) from a moderator (or automod) then we don't want to also perform actions on it as we can assume it's already been manually actioned.

This shouldn't be an issue if the bot is using a fast poll time but want to cover all the bases.
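A sketch of the detection itself, assuming the top-level replies and the subreddit's moderator list have already been fetched. `ReplyLike` and `alreadyHandledByMod` are hypothetical names.

```typescript
interface ReplyLike {
  author: string;
}

// True when any top-level reply was written by a moderator or by automoderator.
function alreadyHandledByMod(topLevelReplies: ReplyLike[], moderators: string[]): boolean {
  const mods = new Set(moderators.map((name) => name.toLowerCase()));
  mods.add("automoderator");
  return topLevelReplies.some((reply) => mods.has(reply.author.toLowerCase()));
}
```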

Implement rule to check comments against top comments from other sources

Based on the description of karma farming from this thread.

Comments

Check the content of a comment activity against a list of "top" text comments based on (and retrieved from) the submission source:

if submission is external attempt to get comments using some scraping method:
- implement youtube api call to get top comments
  - https://www.googleapis.com/youtube/v3/commentThreads?key=${API_KEY}&textFormat=plainText&order=relevance&part=snippet&filter=snippet&videoId={VIDEO_ID}&maxResults=100
- implement twitter api to get top replies
if submission is reddit-based
- check for cross-posted submissions and get top comments
- search for other submissions with the same title and get top comments
- allow user-defined white/blacklist for subreddits to include submissions from

For detecting a match:

use fuzzy searching with user-defined threshold for sameness
allow user-defined minimum character count
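As an illustration of the matching step, here is a sketch that uses a Dice coefficient over character bigrams as the sameness score. That metric is a stand-in chosen for brevity, not something decided in this issue, and the default threshold and minimum length are arbitrary assumptions.

```typescript
function bigrams(text: string): string[] {
  const normalized = text.toLowerCase().replace(/\s+/g, " ").trim();
  const grams: string[] = [];
  for (let i = 0; i < normalized.length - 1; i++) {
    grams.push(normalized.slice(i, i + 2));
  }
  return grams;
}

// Dice coefficient: 2 * shared bigrams / total bigrams, in the range 0..1.
function diceSimilarity(a: string, b: string): number {
  const gramsA = bigrams(a);
  const gramsB = bigrams(b);
  if (gramsA.length === 0 || gramsB.length === 0) return 0;
  const remaining = new Map<string, number>();
  for (const gram of gramsA) {
    remaining.set(gram, (remaining.get(gram) ?? 0) + 1);
  }
  let shared = 0;
  for (const gram of gramsB) {
    const count = remaining.get(gram) ?? 0;
    if (count > 0) {
      shared++;
      remaining.set(gram, count - 1);
    }
  }
  return (2 * shared) / (gramsA.length + gramsB.length);
}

function isLikelyCopy(
  comment: string,
  topComments: string[],
  threshold = 0.85, // user-defined sameness threshold (assumed default)
  minChars = 20, // user-defined minimum character count (assumed default)
): boolean {
  if (comment.trim().length < minChars) return false;
  return topComments.some((top) => diceSimilarity(comment, top) >= threshold);
}
```

The minimum character count keeps short, common replies from matching by accident.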

Submissions

Do a reddit search for submission title and use fuzzy searching with user-defined threshold for sameness
- Allow user-defined white/blacklist for subreddits search

cache retrieved comments using a unique id based on submission source
Restricting subreddit search
- Define subreddits to search with this syntax: https://reddit.com/r/mealtimevideos+videos/search?q=${QUERY}&restrict_sr=1
- For blacklist just filter subreddits out of returned results
Search by submission url
- Remove query string from submission url IE remove ?someQueryParam=... since reddit seems to only search by base url
- Use url token in query to search by url IE .../search?q=url:${BASE_URL}
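The search URLs described above could be built along these lines. The URL patterns are the ones given in the notes; the function names are hypothetical, and the blacklist is applied to results after the search as suggested.

```typescript
// Remove the query string, since reddit seems to only search by base url.
function stripQueryString(url: string): string {
  return url.split("?")[0];
}

// Build a search URL, restricted to the given subreddits when any are listed.
function buildSearchUrl(query: string, subreddits: string[] = []): string {
  const encoded = encodeURIComponent(query);
  if (subreddits.length === 0) {
    return `https://reddit.com/search?q=${encoded}`;
  }
  return `https://reddit.com/r/${subreddits.join("+")}/search?q=${encoded}&restrict_sr=1`;
}

// Search by submission url using the url: token.
function buildUrlSearch(submissionUrl: string, subreddits: string[] = []): string {
  return buildSearchUrl(`url:${stripQueryString(submissionUrl)}`, subreddits);
}

// For a blacklist, filter subreddits out of the returned results.
function filterBlacklisted<T extends { subreddit: string }>(results: T[], blacklist: string[]): T[] {
  const blocked = new Set(blacklist.map((name) => name.toLowerCase()));
  return results.filter((result) => !blocked.has(result.subreddit.toLowerCase()));
}
```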

Add caching documentation

There are some good pointers in the ui but would be good to document in a readme along with hierarchy/tuning/etc.

Implement age comparison for comment/submission state

[WEB] Run/Dry run should be buttons

These should be buttons as they don't take you to another page, which is what an href (aka link) is for.

Implement "public" rule result summaries

Right now all rule result summary data is uncensored IE thresholds, exact windows/totals are revealed in the text.

Need to implement user-facing summaries that redact specific, critical values so that spammers/bad-actors can't determine how to fly under the radar on triggering rules OR make sure there is enough raw data in the rule results for mods to build it themselves.

Rolling Avg: ~null/s

Noticed this in my logs.

Checks 1 | Rules => Total: 1 Unique: 1 Cached: 0 Rolling Avg: ~null/s | Actions 0

<html> element does not have a [lang] attribute

If a page doesn't specify a lang attribute, a screen reader assumes that the page is in the default language that the user chose when setting up the screen reader. If the page isn't actually in the default language, then the screen reader might not announce the page's text correctly. Learn more.

Provide more variables to contextual data in RuleResult that can be used for templating
On each rule document what data is available (using annotations so it's available in schema)

Implement mustache partials in rules

So that users can pre-define mustache rendering fragments for a rule

Add a partials property to rule json that is parsed on startup
Allow partials property to be a string (one partial, name of rule) or object so many partials can be defined per rule
Partial can be string to render or wiki: discriminator that retrieves wiki page contents that should then render
Enforce partial name uniqueness
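A sketch of how the string-or-object property and the uniqueness check might be handled at startup. `PartialsConfig`, `normalizePartials`, and `registerPartials` are hypothetical names; resolving the wiki: discriminator and the actual mustache rendering are left out.

```typescript
// A single template string (named after the rule) or a map of name -> template.
type PartialsConfig = string | Record<string, string>;

function normalizePartials(ruleName: string, partials: PartialsConfig): Record<string, string> {
  return typeof partials === "string" ? { [ruleName]: partials } : { ...partials };
}

// Adds a rule's partials to a shared registry, rejecting duplicate names.
function registerPartials(
  registry: Map<string, string>,
  ruleName: string,
  partials: PartialsConfig,
): void {
  for (const [name, template] of Object.entries(normalizePartials(ruleName, partials))) {
    if (registry.has(name)) {
      throw new Error(`Duplicate partial name: ${name}`);
    }
    registry.set(name, template);
  }
}
```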

Improve json schema documentation

Add missing property annotations/comments
Add default values
Refactor interfaces to consolidate repeated properties (will make documentation easier going forward)

Implement configurable delay before processing

A mod may want any other bots processing activities to run before CM. This could be achieved by:

delaying activity processing after initial retrieval
refreshing activity state after delay before continuing processing

move all css to local

Currently cdnjs.cloudflare.com is used for some CSS.