kockaadmiralac / kockalogger

Parses IRC logs of activity across Fandom, then relays it into a Discord channel, searches for spam/vandalism and more.

License: GNU General Public License v3.0

JavaScript 99.39% Shell 0.61%

discord irc-bot slack wikia

kockalogger's People

ladyfurude tokina8937 east-6 universal-omega gustavoicaro shirra2006 soap-team miss-toki nynele

kockalogger's Issues

newusers: Slash command for viewing report history

Description

Misclicks happen, and sometimes we mark a profile as spam or not spam by mistake. There should be a slash command for checking recent classifications, so we don't have to guess which user we mistakenly marked as not spam (users marked as spam are technically already logged in the staging channel).

Proposed solution

A /history command which shows the last 10 classified profiles ordered by classification date and allows paging through the reports 10 at a time. It should only display profile links and their classification status.
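
A minimal sketch of the paging logic such a command could use. The record shape (`url`, `isSpam`, `classifiedAt`) and the function name `historyPage` are assumptions made for illustration, not KockaLogger's actual database schema.

```javascript
/**
 * Returns one page of classification history, newest first.
 * The record fields used here are hypothetical.
 * @param {Array<{url: string, isSpam: boolean, classifiedAt: number}>} reports
 * @param {number} page Zero-based page index
 * @param {number} pageSize Number of entries per page
 * @returns {string[]} Lines containing a profile link and its status
 */
function historyPage(reports, page = 0, pageSize = 10) {
    return [...reports]
        // Most recently classified profiles come first.
        .sort((a, b) => b.classifiedAt - a.classifiedAt)
        .slice(page * pageSize, (page + 1) * pageSize)
        .map(r => `${r.url}: ${r.isSpam ? 'spam' : 'not spam'}`);
}
```

Copying the array before sorting keeps the caller's data untouched, and a page past the end simply yields an empty list.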

Alternative solutions

A report logging channel like the one KockaBot had.

Prettier display of article/blog comments

Description

The logger module displays article/blog comments in an ugly way, as their full page names.

Proposed solution

The page names are parsed with the purpose of displaying their parts separately.

X posted a comment on Y (Talk:Y/@comment-X-t)
X posted a comment on Y's blog: "Z" (User blog comment:Y/Z/@comment-X-t)
X replied to Y's comment on Z (Talk:Z/@comment-Y-t/@comment-X-t)
X replied to Y's comment on Z's blog: "A" (User blog comment:Z/A/@comment-Y-t/@comment-X-t)
Some HTTP requests might be required in order to fetch usernames of users who posted the comments.
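
The parsing described above could look roughly like the sketch below. It assumes that every comment segment has the shape `@comment-<user>-<digits>` and that, in a reply, the parent comment's segment comes before the reply's segment. The function name `parseCommentTitle` and the returned field names are hypothetical.

```javascript
/**
 * Splits an article or blog comment page name into its parts.
 * Assumes every comment segment looks like "@comment-<user>-<digits>".
 * @param {string} fullTitle Page name including the namespace
 * @returns {object|null} Parsed parts, or null if this is not a comment page
 */
function parseCommentTitle(fullTitle) {
    const [parent, ...segments] = fullTitle.split('/@comment-');
    const colon = parent.indexOf(':');
    if (segments.length === 0 || colon === -1) {
        return null;
    }
    const comments = [];
    for (const segment of segments) {
        // Usernames may contain hyphens, so only the trailing digits
        // are treated as the timestamp.
        const match = /^(.+)-(\d+)$/u.exec(segment);
        if (!match) {
            return null;
        }
        comments.push({user: match[1], timestamp: match[2]});
    }
    const namespace = parent.slice(0, colon);
    const page = parent.slice(colon + 1);
    const result = {
        namespace,
        page,
        // The last segment is the comment that was just posted.
        author: comments[comments.length - 1].user,
        // An earlier segment, if any, is the comment being replied to.
        parentAuthor: comments.length > 1 ? comments[0].user : null
    };
    const slash = page.indexOf('/');
    if (namespace === 'User blog comment' && slash !== -1) {
        // Blog comment pages are named "<blog owner>/<blog title>".
        result.blogOwner = page.slice(0, slash);
        result.blogTitle = page.slice(slash + 1);
    }
    return result;
}
```

This only reads usernames out of the page name. If those turn out not to be reliable, the HTTP requests mentioned above would still be needed.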

Link directly to replies in thread links

Description

Instead of linking to the exact reply on a thread, we are linking only to the thread itself in the logger module.

Expected behavior

Thread links should link to Thread:ID#reply.

Notes

Unless a good way of doing this is found, making thread links link to replies means making at least one HTTP request per ns:1201/2001 edit.

Make KockaLogger into an npm module

Description

Other projects whose source code is private (like WikiaSpam) or too niche to be included in KockaLogger's repository (like the possibly upcoming Scribunto CI for Dev Wiki) may need integration with KockaLogger for easier and more reliable parsing. The current setup has a few issues with being made into an npm module, such as KockaLogger modules always being loaded from the modules folder, that should be looked into.

Proposed solution

Create an index.js file that exports required KockaLogger classes for public use. This includes all files from include/ and parser/ directories, as well as the Loader and Module classes.
Make sure all directories are configurable and that the location of KockaLogger's data files isn't taking the location of the module requesting it into account.
etc.

Alternative solutions

Just create an integration of KockaLogger with Lux and create a Lux service that uses the KockaLogger service for receiving and parsing messages. This also makes requesting interest information an asynchronous process instead of synchronous, further complicating KockaLogger client's structure.

Documentation

Description

Both usage documentation and JSDoc should probably be written for KockaLogger. Even though very few people aside from me (so far 1) use KockaLogger, knowing how it works might be of use to its users.

Proposed solution

Create a docs directory with separate Markdown files for each module that needs to be configured.

Alternative solutions

Document KockaLogger somewhere on FANDOM, or on a secret wiki.

More Discussions AbuseFilter information

Description

As of 18f92ea, KockaLogger has basic support for the Discussions AbuseFilter, as it's still a feature in development and not yet available anywhere. Its format is basic and users would have to visit the hit examination page to figure out what actually happened (unless the post snippet appeared to make it very obvious).

Proposed solution

DiscussionsMessage#fetch would have to be expanded to fetch filter information.

Description

Since 30 March 2021, entries for posting, editing, reporting and deleting comments no longer appear, for an unknown reason, and the bug is still ongoing. Can this be fixed if the issue is on the logger's side? Thank you.

Expected behavior

Entries should appear like <username> (t|c) posted a comment (<size>) on <PAGENAME>, for example https://discord.com/channels/244398471044399106/244399776315998208/826410811751792660 (WW)
In the entry "TokihikoH11 (t|c) protected User:TokihikoH11/RCLogger test edit=autoconfirmed (expires 21:22, 15 November 2021 (UTC)) to (Testing)", the whole of "User:TokihikoH11/RCLogger test edit=autoconfirmed (expires 21:22, 15 November 2021 (UTC))" appears in blue like a link, but only "User:TokihikoH11/RCLogger test" should appear in blue (see https://discord.com/channels/499291143201095700/501465557569372160/909916112207085568 - Vocaloid Wiki server)

Use TypeScript

Description

While our JSDoc tries to provide the most out of completion features, TypeScript can provide better type safety. Moving forward, KockaLogger should probably be rewritten in TypeScript at some point.

newusers: Spam prediction

Description

Now that the profile classification results are stored in a database, we can use them as a dataset for a machine learning model that can predict whether a profile is spam or not. We can use prediction results to mark profiles with high probability of being spam, and when it receives high enough accuracy (or whatever other metric we decide to look at) use it to auto-report spam profiles to SOAP.

Proposed solution

This task is for tracking the initial implementation of a machine learning model which can be trained on the existent database and achieve good enough results. The procedure is as follows:

Data Collection: Wait for the dataset to grow large enough. As of writing this, there are roughly 1000 classified spam profiles and 6000 classified non-spam profiles, and the system has been running since August 17 (25 days), which works out to 40 spam profiles and 240 non-spam profiles per day. At this rate, there should be about 10000 profiles in about a year. (I'm not sure if we really need to wait that long.)
Feature Extraction: Decide which features from the dataset to use in the model. Regardless of whether we use a neural network for the model or not, most of the profile data we have is in string form which somehow needs to be transformed before being fed into the model.
Training: Create a model and train it on the dataset. Try several different approaches and parameters and see which work best.
Integration: Load the trained model into KockaLogger and show prediction results in the reports channel, putting a mark on those predicted as likely spam (above a certain threshold of certainty).
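
As an illustration of the feature extraction step, the sketch below turns string fields into a small numeric vector using hand-picked features. The `website` and `bio` field names and the chosen features are assumptions for the sake of the example. Nothing here reflects the actual dataset or a decision on which features work.

```javascript
/**
 * Turns the string fields of a profile into a fixed-length numeric vector.
 * The `website` and `bio` field names are hypothetical.
 * @param {{website?: string, bio?: string}} profile
 * @returns {number[]} [website length, bio length, link count, digit ratio]
 */
function extractFeatures(profile) {
    const website = profile.website || '';
    const bio = profile.bio || '';
    // The separating space guarantees a non-empty string, so the
    // division below never divides by zero.
    const text = `${website} ${bio}`;
    const links = (text.match(/https?:\/\//gu) || []).length;
    const digits = (text.match(/\d/gu) || []).length;
    return [website.length, bio.length, links, digits / text.length];
}
```

A vector like this can be fed to most classifiers. Whether such simple features are good enough is exactly what the training step would have to show.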

Notes

I'm not that skilled in machine learning at the time of writing this issue.

Memory usage

Memory usage needs to be checked and logged.

Complete replacement for #cvn-wikia

Description

See the cvn module.

Possible link bug in non-English format on English wikis

Description

As the Thread: namespace is supposed to be translated in JSON files, it may lead to thread links not working when the wiki is English but the format is not.

Reproduction steps

Steps to reproduce the behavior:

Translate the Thread: namespace in one of the non-English JSON files.
Set the format language to English on a non-English wiki in the logger module configuration.
Click on a thread link.

Expected behavior

It works.

Notes

It shouldn't be translated at all but auto-generated from namespace names of the wiki.

Log some bot edits

Description

Sometimes I see a vandalizing edit, go to revert it, and find that SOAP Bot already took care of it, but avoided logging by being a bot.

Proposed solution

Add bot edits to the logger input, but filter out bots other than SOAP Bot.

Alternative solutions

Throttle bot edits from logging more than, say, five per ten minutes regardless of which bot it is.
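
The throttling alternative could be sketched as a sliding-window counter. The class name `Throttle` and its defaults (five events per ten minutes, taken from the figures above) are illustrative only.

```javascript
/**
 * Sliding-window throttle: allows at most `limit` events per `windowMs`.
 */
class Throttle {
    /**
     * @param {number} limit Maximum number of events per window
     * @param {number} windowMs Window length in milliseconds
     */
    constructor(limit = 5, windowMs = 10 * 60 * 1000) {
        this.limit = limit;
        this.windowMs = windowMs;
        this.timestamps = [];
    }
    /**
     * @param {number} now Current time in milliseconds
     * @returns {boolean} Whether the event may be logged
     */
    allow(now = Date.now()) {
        // Forget events that have left the window.
        this.timestamps = this.timestamps.filter(t => now - t < this.windowMs);
        if (this.timestamps.length >= this.limit) {
            return false;
        }
        this.timestamps.push(now);
        return true;
    }
}
```

A single shared instance would throttle all bots together, matching the "regardless of which bot it is" wording.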

Dead channel detection

Description

We need to know when no activity has been received from a certain channel for an unusually long period of time.

Proposed solution

Have messages of every type reset a timeout which, if it expires, sends us a notification (once the channel comes back up, the timeout is armed again).
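
A rough sketch of that timeout logic, assuming one timer per channel. The class name `DeadChannelDetector` and the `onDead` callback are hypothetical.

```javascript
/**
 * Calls `onDead(channel)` when a channel has been silent for `timeoutMs`.
 */
class DeadChannelDetector {
    /**
     * @param {number} timeoutMs Silence period after which a channel is dead
     * @param {function(string): void} onDead Notification callback
     */
    constructor(timeoutMs, onDead) {
        this.timeoutMs = timeoutMs;
        this.onDead = onDead;
        this.timers = new Map();
    }
    /**
     * Registers activity on a channel, restarting its timer.
     * @param {string} channel Channel name
     */
    touch(channel) {
        // clearTimeout ignores undefined, so the first call is safe.
        clearTimeout(this.timers.get(channel));
        const timer = setTimeout(() => {
            this.timers.delete(channel);
            this.onDead(channel);
        }, this.timeoutMs);
        this.timers.set(channel, timer);
    }
    /**
     * Cancels all timers, for example during shutdown.
     */
    stop() {
        for (const timer of this.timers.values()) {
            clearTimeout(timer);
        }
        this.timers.clear();
    }
}
```

Calling `touch` for every received message keeps the timer from firing. The first message after an outage arms it again, which covers the "channel goes back up" case.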

Support delete_redir

Happens when a redirect is deleted by being overwritten while moving a file to it.

Unexpected random errors in IRC client

Description

KockaLogger's IRC connection is one of the most important components of KockaLogger as without it everything else breaks. As such, all issues relating to it are to be tracked:

[11-03-2019] Closing Link: 195.18.217.34 (Ping timeout: 180 seconds) was received in an IRC error message and the bot's IRC client reconnected (presumably after stopping the relay of messages for ~180 seconds beforehand).
[13-03-2019] ENOTFOUND was received when connecting to the IRC server several times over two minutes (2-3 second intervals). It is unknown whether this was an issue on Fandom's or bot's side.
[15-03-2019] ECONNREFUSED received from the IRC server several times over a few hours. The bot also recovered from this.
[27-03-2019] ENOTFOUND twice during the day.
[03-04-2019] ECONNREFUSED around 9 AM CST.

Reproduction steps

No reproduction steps available.

Expected behavior

These issues should be minimized if possible.

Parsing failures due to trimmed messages

Description

The IRC message limit is 512 bytes. This is usually fine for messages written in English, but a lot of messages from Greek, Russian and Georgian wikis fail to parse because they are trimmed without an overflow message. There is nothing to connect the trimmed messages to, so their log actions fail to parse.

There is also no known way to detect whether a parsing failure was caused by this or by a legitimate issue that needs to be fixed.

Reproduction steps

Steps to reproduce the behavior:

Run KockaLogger for some time.
See parsing errors from Russian, Greek or Georgian wikis.

Expected behavior

KockaLogger should either not report those if they can't be fixed or fix them somehow.
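
One possible heuristic for suppressing such reports, untested against real logs: treat a parse failure as a probable trim when the raw line sits at the protocol limit. This assumes the IRC client exposes the full raw line, including the server prefix. The function name `looksTruncated` and the `slack` parameter are made up for this sketch.

```javascript
/**
 * Heuristically checks whether a raw IRC line was probably cut at the limit.
 * @param {string} rawLine Full line as received, without the trailing CRLF
 * @param {number} limit Protocol limit in bytes, including CRLF
 * @param {number} slack Tolerance in bytes for cuts inside a character
 * @returns {boolean} Whether the line looks trimmed
 */
function looksTruncated(rawLine, limit = 512, slack = 4) {
    // Add two bytes for the CRLF that terminated the line on the wire.
    const bytes = Buffer.byteLength(rawLine, 'utf8') + 2;
    // A cut in the middle of a multi-byte character usually decodes
    // to U+FFFD at the end of the string.
    return bytes >= limit - slack || rawLine.endsWith('\uFFFD');
}
```

Lines that fail to parse and also look truncated could be logged at a lower level instead of being reported as parser bugs.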

Clean up everything on SIGINT

Description

As of v1.1.5, KockaLogger handles SIGINT. However, it doesn't clean up all resources, hence the need for a forced process.exit(). All components of KockaLogger that need it should be given a chance to clean up their resources for a clean shutdown.

Proposed solution

HTTP requests that are currently running should preferably be waited on to finish.
- That most likely means they need to be tracked somewhere in IO and that IO#close is required.
A console loader should be shown with the number of remaining callbacks that need to be called/requests that need to finish.
If there is something wrong and KockaLogger isn't shutting down after a minute, report an error and force a shutdown.
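
The request tracking and the one-minute deadline could be sketched as follows. `RequestTracker` is a hypothetical helper, not the existing IO class, and it only illustrates what an `IO#close` might do internally.

```javascript
/**
 * Tracks in-flight requests so shutdown can wait for them.
 */
class RequestTracker {
    constructor() {
        this.pending = new Set();
    }
    /**
     * Starts tracking a request promise.
     * @param {Promise} promise Promise of the running request
     * @returns {Promise} The same promise, for chaining
     */
    track(promise) {
        this.pending.add(promise);
        const done = () => this.pending.delete(promise);
        // Handle both outcomes so a failed request is also untracked.
        promise.then(done, done);
        return promise;
    }
    /**
     * Waits for all tracked requests, up to a deadline.
     * @param {number} timeoutMs Maximum time to wait
     * @returns {Promise<boolean>} False if the deadline was hit
     */
    async close(timeoutMs = 60000) {
        let timer = null;
        const timeout = new Promise(resolve => {
            timer = setTimeout(() => resolve(false), timeoutMs);
        });
        const drained = Promise.allSettled([...this.pending]).then(() => true);
        const finished = await Promise.race([drained, timeout]);
        clearTimeout(timer);
        return finished;
    }
}
```

A SIGINT handler could await `close()` and force a shutdown with an error only when it resolves to `false`. The size of `pending` is the number a console loader would display.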

Discussion paragraph tags

Description

When a reply to a /d thread is shown, it includes the <p> (and sometimes the ending </p> too).

Proposed solution

Trim out the <p> (and maybe </p>) tags if they exist.
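
A small sketch of that trimming, assuming the tags only ever appear at the very start and end of the snippet. `stripParagraphTags` is a hypothetical name.

```javascript
/**
 * Removes a leading <p> and a trailing </p> from a post snippet.
 * Tags in the middle of the snippet are left alone.
 * @param {string} snippet Post snippet as received
 * @returns {string} Snippet without the wrapping paragraph tags
 */
function stripParagraphTags(snippet) {
    return snippet
        .replace(/^\s*<p>/u, '')
        .replace(/<\/p>\s*$/u, '');
}
```

Handling the two tags separately covers the case where only the opening tag is present.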

Notes and media

This only seems to happen for the original post, as the replies don't have this issue.

logger: Wiki info fetching does not report proper errors

Description

Certain wikis are throwing 404 errors and I am unable to tell which ones.

Reproduction steps

Steps to reproduce the behavior:

Add a nonexistent wiki to logger module configuration
Start KockaLogger
You cannot see which wiki caused the error

Expected behavior

Better error messages which tell you exactly what's the issue with info fetching.

Handle "Created page with" summaries

Description

"Created page with" in edit summaries becomes very repetitive after some time.

Proposed solution

Messages like this make it more readable:

KockaAdmiralac created A (edit summary)
KockaAdmiralac created A with content <summary>
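
A sketch of how the two formats could be chosen. It assumes the English form of the automatic summary, `Created page with "..."`, so other content languages would need their own patterns. `formatCreation` is a hypothetical name.

```javascript
/**
 * Formats a page creation entry, recognizing the automatic summary.
 * Only the English "Created page with" summary is handled here.
 * @param {string} user Name of the user who created the page
 * @param {string} page Title of the created page
 * @param {string} summary Edit summary of the creation
 * @returns {string} Formatted log line
 */
function formatCreation(user, page, summary) {
    // The s flag lets the quoted content span multiple lines.
    const match = /^Created page with "(.*)"$/su.exec(summary);
    if (match) {
        return `${user} created ${page} with content ${match[1]}`;
    }
    return `${user} created ${page} (${summary})`;
}
```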

Pre-filtering interested messages in logger module

Description

The logger module is telling the client it's interested in messages that are going to be filtered out by the filters afterwards, causing KockaLogger to make unneeded API/Redis requests.

Reproduction steps

Steps to reproduce the behavior:

Set up a Discussions-only logger for a wiki.
Set up an HTTP request logger.
Create a page on the wiki.
Check whether an API request was made to fetch the page title.

Expected behavior

There should be no interest shown in messages that aren't going to be shown.

Notes

Doing filtering inside interested() may create different issues due to titles not being the page's actual titles before the title information is checked. A different configuration option may need to be set up for this.

newusers: Don't report users if their report is already on the page

Description

Reporters can report spam directly to R:S without going through KockaLogger. KockaLogger disregards these reports and duplicates them on R:S. This should not happen.

Proposed solution

KockaLogger checks the page content of R:S before reporting users.

Unneeded API/Redis requests for talkpage title checking

Description

Page title fetching from the API was introduced as a way of obtaining the actual page title when an article comment is submitted on an article whose title in the IRC log differs from its actual title. The same issue occurs for threads, but for a different reason.

However, comments and threads cannot be created for talk namespaces, yet they are being fetched from the API as well. Making KockaLogger skip title checking for talk namespaces is going to save some HTTP requests.

Reproduction steps

Steps to reproduce the behavior:

Set up some HTTP request logging.
Open WikiaRC.
Create a talk page on a wiki with a logger set up.
Check if there's an HTTP request made to fetch the title of that talkpage.
Check if the title logged in WikiaRC is matching the title of the edited page.

Expected behavior

The HTTP request isn't made due to the title in WikiaRC already matching the page's title.

Notes

This needs further testing for verification.
This might require more advanced namespace detection to be implemented instead of basing namespace detection on namespace names fetched from the API.

newusers: Retry mwn login failures

Description

mwn occasionally throws login failures while moving reports.

Reproduction steps

Steps to reproduce the behavior:

Move reports for several days
mwn suddenly throws one of the following errors:

{
    "code": "mwn_failedlogin",
    "info": "Login failed",
    "response": {
        "login": {
            "result": "WrongToken"
        }
    }
}
{
    "code": "mwn_failedlogin",
    "info": "Already logged in as ***@***, logout first to re-login",
    "response": {
        "login": {
            "result": "Aborted",
            "reason": "Cannot log in when using MediaWiki\\Session\\BotPasswordSessionProvider sessions."
        }
    }
}

Expected behavior

Just retry the login after a delay.
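
A generic sketch of such a retry, assuming the login is wrapped in a function passed as `login` and that failures carry the `mwn_failedlogin` code shown above. The names `isLoginFailure` and `retryLogin`, the attempt count and the delay are all placeholders.

```javascript
/**
 * Checks whether an error is one of the login failures shown above.
 * @param {*} error Thrown value
 * @returns {boolean} Whether retrying the login makes sense
 */
function isLoginFailure(error) {
    return Boolean(error) && error.code === 'mwn_failedlogin';
}

/**
 * Retries a login function after a delay when it fails to log in.
 * @param {function(): Promise} login Function that performs the login
 * @param {number} attempts Maximum number of attempts
 * @param {number} delayMs Delay between attempts in milliseconds
 * @returns {Promise} Result of the first successful login
 */
async function retryLogin(login, attempts = 3, delayMs = 5000) {
    for (let attempt = 1; ; attempt++) {
        try {
            return await login();
        } catch (error) {
            // Give up on unrelated errors or when out of attempts.
            if (!isLoginFailure(error) || attempt >= attempts) {
                throw error;
            }
            await new Promise(resolve => setTimeout(resolve, delayMs));
        }
    }
}
```

For the "Already logged in" variant, the wrapped function may also need to log out first, as the error message itself suggests.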

Bug report | Windows Redis Issue

Description

Redis cannot be started, and the npm start command fails because Redis is not found.

Reproduction steps

Steps to reproduce the behavior (Windows only):

Install packages.
Start the program with a correct config.

Expected behavior

Redis should start up and no Redis-related errors should occur.

Notes

Windows Server 2012 R2 with Redis (Windows version) installed.

Missing users in `newusers`

Description

It appears that newusers tries to fetch user information from the API for users that no longer exist.

Reproduction steps

Steps to reproduce the behavior:

Watch KockaLogger output for several hours
See a lot of HTTPErrors leading with a message from the newusers module
Check the username within the Redis key logged in the error message
It doesn't exist

Expected behavior

KockaLogger doesn't log five kilometers (3.10686 feet) of logs and doesn't try five times to retrieve user information just to report a user no longer exists. Maybe the user renamed in the span of 30 minutes since they created their account?

Notes

KockaLogger could also record user ID instead of username to prevent this from happening, but as the feature is supposed to catch spambots, and spambots are unlikely to rename their account shortly after creation, this probably isn't needed.

Regenerating regular expressions blanks JSON files

Description

When run with --generate command-line option, KockaLogger blanks the JSON files instead of generating i18n regular expressions from them.

Reproduction steps

Run node main.js --generate
Run node main.js

Expected behavior

The JSON files are properly saved.

Notes

Generation itself works fine.

Support new log types

Description

In 3f2a988, we introduced the IGNORED_LOGS constant which prevents relaying errors related to certain logs which KockaLogger does not support. Ideally, if these logs are general-purpose enough, we should support them. Some logs I can see that seem general purpose enough are:

review
cargo
import (Special:Import)
interwiki (Special:Interwiki)
merge (Special:MergeHistory)
contentmodel (Special:ChangeContentModel)

Proposed solution

We check which messages are being used for these logs and add them to messages/map.js, parser/log.js etc.

Gamepedia links include /wiki/

Description

Gamepedia wikis have their $wgArticlePath set to /$1.

Reproduction steps

See any logs by KockaLogger on a Gamepedia wiki.
Click the link leading to a page.

Expected behavior

The links don't include /wiki/. This could probably be achieved by having a single method that creates page links that can check for the wiki domain. (util.pageUrl?)
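
A sketch of what such a method might look like. The domain check on `.gamepedia.com` and the encoding rules are assumptions, and language paths are ignored for brevity.

```javascript
/**
 * Builds a link to a page, leaving out /wiki/ on Gamepedia domains.
 * @param {string} domain Wiki domain, such as community.fandom.com
 * @param {string} title Page title, possibly with spaces
 * @returns {string} Full URL of the page
 */
function pageUrl(domain, title) {
    // Gamepedia wikis use an article path of /$1 instead of /wiki/$1.
    const prefix = domain.endsWith('.gamepedia.com') ? '' : '/wiki';
    const encoded = encodeURIComponent(title.replace(/ /gu, '_'))
        // Keep namespace colons and subpage slashes readable.
        .replace(/%3A/gu, ':')
        .replace(/%2F/gu, '/');
    return `https://${domain}${prefix}/${encoded}`;
}
```

Routing every link through one function means a later change in article paths only has to be made in one place.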

newusers: Only allow slash commands from specific channels

Description

Slash commands can currently be used from any channel on a server. As we don't want people to be using these randomly, we should probably restrict them only to channels which contain information on what these commands do.

Proposed solution

Restrict profile reporting-related slash commands to the reports and staging channels, and display a message when they are used outside of them.

Unescaping a line-break character entity in thread titles

Description


A line-break character entity needs to be unescaped in thread titles.

Reproduction steps

https://undertale.fandom.com/wiki/Thread:153379

Expected behavior

An empty space appears.

Bug report: usernames with ` cause display issues

Description

Usernames with multiple backticks are not escaped, causing some issues.

Example user: https://community.fandom.com/wiki/User:Hapie2cu%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60%60

Expected behavior

Backticks need to be escaped
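
A minimal sketch of the escaping, assuming the username is inserted into regular Discord message text rather than inside a code span, where backslash escapes have no effect. `escapeBackticks` is a hypothetical name.

```javascript
/**
 * Prefixes every backtick with a backslash so Discord shows it literally.
 * @param {string} text Text to be placed into a Discord message
 * @returns {string} Text with escaped backticks
 */
function escapeBackticks(text) {
    return text.replace(/`/gu, '\\`');
}
```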

Tests

Description

Regression tests, unit tests, integration tests, whatever. It becomes a pain having to manually test whether I screwed something up during a rewrite.

Proposed solution

Mocha

Alternative solutions

Chai

Discussions thread titles are not shown on replies

Description

While this is a Wikia issue, a caching technique similar to the one used for thread titles can be applied to get them to show up.

Proposed solution

When a thread with a title is posted, cache the thread title associated with the ID in cache. If a reply to that thread is made, take the title from cache.

If the thread title isn't in cache, use the Service API to fetch the title.
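
The cache-then-fetch flow could be sketched like this. An in-memory `Map` stands in for whatever cache KockaLogger actually uses, and `fetchTitle` is a placeholder for the Service API call.

```javascript
/**
 * Remembers thread titles and falls back to fetching missing ones.
 */
class ThreadTitleCache {
    /**
     * @param {function(string): Promise<string>} fetchTitle Fallback fetcher
     */
    constructor(fetchTitle) {
        this.fetchTitle = fetchTitle;
        this.titles = new Map();
    }
    /**
     * Stores the title of a newly posted thread.
     * @param {string} threadId Thread ID
     * @param {string} title Thread title
     */
    remember(threadId, title) {
        this.titles.set(threadId, title);
    }
    /**
     * Returns the title of a thread, fetching it on a cache miss.
     * @param {string} threadId Thread ID
     * @returns {Promise<string>} Thread title
     */
    async titleFor(threadId) {
        if (!this.titles.has(threadId)) {
            this.titles.set(threadId, await this.fetchTitle(threadId));
        }
        return this.titles.get(threadId);
    }
}
```

Replies then call `titleFor` with the thread ID and only cause a request when the original post was never seen.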

Separate log levels which get logged to Discord from those which do not

Description

I would like to log debug-level messages from newusers which contain information about user reports and report moving, so I can retrieve users from these logs on special occasions. However, I don't want them logged to Discord, as they are not errors.

Proposed solution

KockaLogger's logger should let me specify which log level is being logged to Discord.
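
One way this could be expressed is with a separate threshold per destination. The level names, the `destinations` function and its defaults are assumptions and do not describe the existing logger's configuration.

```javascript
// Log levels ordered from least to most severe (hypothetical set).
const LEVELS = ['debug', 'info', 'warn', 'error'];

/**
 * Decides where a message of a given level should be written.
 * @param {string} level Level of the message
 * @param {string} fileLevel Minimum level written to the log file
 * @param {string} discordLevel Minimum level relayed to Discord
 * @returns {string[]} Destinations the message should go to
 */
function destinations(level, fileLevel = 'debug', discordLevel = 'error') {
    const rank = LEVELS.indexOf(level);
    if (rank === -1) {
        throw new Error(`Unknown log level: ${level}`);
    }
    const result = [];
    if (rank >= LEVELS.indexOf(fileLevel)) {
        result.push('file');
    }
    if (rank >= LEVELS.indexOf(discordLevel)) {
        result.push('discord');
    }
    return result;
}
```

With these defaults, debug messages from newusers would reach the file log but not Discord.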

Alternative solutions

Separate loggers which log debug and non-debug information in newusers.

Logger reporting to Discord causes fatal errors in KockaLogger

Description

The logger's logging methods are technically asynchronous, but no component of the system calls them as such. This means Discord backend errors can cause KockaLogger to crash due to uncaught promise rejections.

Reproduction steps

Steps to reproduce the behavior:

Wait for Discord backend to get unstable
Get KockaLogger to report an error at that time
See how it crashes and restarts

Expected behavior

It doesn't crash.
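
A sketch of one way to keep logging failures from becoming uncaught rejections, assuming call sites stay fire-and-forget. `logSafely` and the `onFailure` callback are hypothetical names.

```javascript
/**
 * Runs a logging call and swallows any failure it produces.
 * @param {function(): (Promise|void)} logCall Function doing the logging
 * @param {function(Error): void} onFailure Called when logging fails
 */
function logSafely(
    logCall,
    onFailure = error => console.error('Logging failed:', error)
) {
    try {
        // Promise.resolve also accepts plain (non-promise) return values.
        Promise.resolve(logCall()).catch(onFailure);
    } catch (error) {
        // Covers logging calls that throw before returning a promise.
        onFailure(error);
    }
}
```

The alternative is to await every logging call inside a try/catch, which is more explicit but touches every caller.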

Vandalism module improvements

Description

Logged-in user vandalism isn't being caught unless they hit an edit summary filter.
Diff link changes place all the time.
Movepage and upload vandalism isn't being caught.
- This includes cross-namespace moves.
Wiki whitelist will whitelist wikis with the same domain but in different languages (so if community as in community.fandom.com is whitelisted, community.fandom.com/es will also appear whitelisted).

Proposed solution

Make logged-in users appear in the log if they do the same things as anonymous users, but X times in a Y interval.
Make a completely new format for the vandalism module.
Handle more logs.
Concatenate the domain, subdomain and language in a way that makes sense before checking in the whitelist.
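
For the whitelist point, one possible key format is sketched below. It assumes English wikis have no language path and that the whitelist is a `Set` of such keys. Both `wikiKey` and `isWhitelisted` are hypothetical.

```javascript
/**
 * Builds a whitelist key that tells language variants apart.
 * @param {string} subdomain Wiki subdomain, such as community
 * @param {string} domain Base domain, such as fandom.com
 * @param {string} language Language code of the wiki
 * @returns {string} Key like community.fandom.com/es
 */
function wikiKey(subdomain, domain, language) {
    const base = `${subdomain}.${domain}`;
    // English wikis are assumed to live at the root of the domain.
    return language && language !== 'en' ? `${base}/${language}` : base;
}

/**
 * Checks whether a specific wiki is whitelisted.
 * @param {Set<string>} whitelist Set of keys produced by wikiKey
 * @param {string} subdomain Wiki subdomain
 * @param {string} domain Base domain
 * @param {string} language Language code of the wiki
 * @returns {boolean} Whether the wiki is whitelisted
 */
function isWhitelisted(whitelist, subdomain, domain, language) {
    return whitelist.has(wikiKey(subdomain, domain, language));
}
```

With this key, whitelisting community.fandom.com no longer whitelists community.fandom.com/es.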

No need to strip Markdown

Discord fixed their Markdown escaping so we do not need to strip Markdown from edit summaries (as much).
