kockaadmiralac / kockalogger
Parses IRC logs of activity across Fandom, then relays it into a Discord channel, searches for spam/vandalism and more.
License: GNU General Public License v3.0
Misclicks happen, and sometimes we mark something as spam or not spam when we didn't mean to. There should be a slash command which allows us to check recent classifications, so we don't have to guess which user we mistakenly marked as not spam (users marked as spam are technically already logged in the staging channel).
A /history command which shows the last 10 classified profiles ordered by their classification date and allows paging through the reports 10 at a time. It should only display profile links and their classification status.
A report logging channel like the one KockaBot had.
The logger module displays article/blog comments in an ugly way, in the form of their full page names.
The page names are parsed with the purpose of displaying their parts separately.
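A minimal sketch of such parsing, covering the page name shapes in the examples that follow. The function name `parseCommentTitle` and the returned object layout are hypothetical, and the sketch assumes the trailing `t` in `@comment-X-t` is a 14-digit timestamp; it is not KockaLogger's actual parser.

```javascript
// Assumption: the trailing "t" in @comment-X-t is a 14-digit timestamp.
const COMMENT_PART = /^@comment-(.+)-(\d{14})$/;

// Hypothetical helper: splits a comment page name into its parts.
// Returns null when the title does not end in a comment segment.
function parseCommentTitle(fullTitle) {
    const colon = fullTitle.indexOf(':');
    if (colon === -1) {
        return null;
    }
    const namespace = fullTitle.slice(0, colon);
    const parts = fullTitle.slice(colon + 1).split('/');
    const comments = [];
    // Comment segments sit at the end of the title, so peel them off
    // from right to left and keep them in posting order.
    while (parts.length > 0) {
        const match = COMMENT_PART.exec(parts[parts.length - 1]);
        if (!match) {
            break;
        }
        comments.unshift({user: match[1], timestamp: match[2]});
        parts.pop();
    }
    if (comments.length === 0) {
        return null;
    }
    const isBlog = namespace === 'User blog comment';
    // For blog comments the first remaining segment is the blog author.
    const blogAuthor = isBlog ? parts.shift() : null;
    return {namespace, isBlog, blogAuthor, page: parts.join('/'), comments};
}
```

The username group is greedy, so usernames that contain hyphens stay intact as long as the timestamp assumption holds. A result with two entries in `comments` would correspond to a reply.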
X posted a comment on Y (Talk:Y/@comment-X-t)
X posted a comment on Y's blog: "Z" (User blog comment:Y/Z/@comment-X-t)
X replied to Y's comment on Z (Talk:Y/@comment-X-t/@comment-Z-t)
X replied to Y's comment on Z's blog: "A" (User blog comment:Z/A/@comment-X-t/@comment-Y-t)
Instead of linking to the exact reply on a thread, we are linking only to the thread itself in the logger module.
Thread links should link to Thread:ID#reply.
Unless a good way of doing this is found, making thread links link to replies means making at least one HTTP request per ns:1201/2001 edit.
Other projects whose source code is private (like WikiaSpam) or too niche to be included in KockaLogger's repository (like the possibly upcoming Scribunto CI for Dev Wiki) may need integration with KockaLogger for easier and more reliable parsing. The current setup has a few issues with being made into an npm module that should be looked into, such as KockaLogger modules always being loaded from the modules folder.
An index.js file that exports required KockaLogger classes for public use. This includes all files from the include/ and parser/ directories, as well as the Loader and Module classes.
Just create an integration of KockaLogger with Lux and create a Lux service that uses the KockaLogger service for receiving and parsing messages. This also makes requesting interest information an asynchronous process instead of synchronous, further complicating KockaLogger client's structure.
Both usage documentation and JSDoc should probably be made for KockaLogger. Even though very few people aside from me (so far 1) use KockaLogger, knowing how it works might be of use to its users.
Create a docs directory with separate Markdown files for each module that needs to be configured.
Document KockaLogger somewhere on FANDOM, or on a secret wiki.
As of 18f92ea, KockaLogger has basic support for the Discussions AbuseFilter, as it's still a feature in development and not yet available anywhere. Its format is basic and users would have to visit the hit examination page to figure out what actually happened (unless the post snippet appeared to make it very obvious).
DiscussionsMessage#fetch would have to be expanded to fetch filter information.
Since 30 March 2021, entries for posting, editing, reporting and deleting comments no longer appear, for an unknown reason, and the bug is still ongoing. Could this be fixed if the issue is on the logger's side? Thank you.
Entries should appear like <username> (t|c) posted a comment (<size>) on <PAGENAME>, for example https://discord.com/channels/244398471044399106/244399776315998208/826410811751792660 (WW)
TokihikoH11 (t|c) protected User:TokihikoH11/RCLogger test edit=autoconfirmed (expires 21:22, 15 November 2021 (UTC)) to (Testing)
but User:TokihikoH11/RCLogger test edit=autoconfirmed (expires 21:22, 15 November 2021 (UTC)) appears in blue like a link, while only User:TokihikoH11/RCLogger test should appear in blue (see https://discord.com/channels/499291143201095700/501465557569372160/909916112207085568 - Vocaloid Wiki server)
While our JSDoc tries to provide the most out of completion features, TypeScript can provide better type safety. Moving forward, KockaLogger should probably be rewritten in TypeScript at some point.
Now that the profile classification results are stored in a database, we can use them as a dataset for a machine learning model that can predict whether a profile is spam or not. We can use prediction results to mark profiles with a high probability of being spam, and once the model reaches high enough accuracy (or whatever other metric we decide to look at), use it to auto-report spam profiles to SOAP.
This task is for tracking the initial implementation of a machine learning model which can be trained on the existing database and achieve good enough results. The procedure is as follows:
I'm not that skilled in machine learning at the time of writing this issue.
Memory usage needs to be checked and logged.
As the Thread: namespace is supposed to be translated in JSON files, it may lead to thread links not working when the wiki is English but the format is not.
Steps to reproduce the behavior:
Thread: namespace in one of the non-English JSON files.
logger module configuration.
It works.
It shouldn't be translated at all but auto-generated from namespace names of the wiki.
Sometimes I see a vandalizing edit, go to revert it, and find that SOAP Bot already took care of it, but avoided logging by being a bot.
Add bot edits to the logger input, but filter out bots other than SOAP Bot.
Throttle bot edits from logging more than, say, five per ten minutes regardless of which bot it is.
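The throttling idea could be sketched as a sliding-window counter shared by all bots. The class name `SlidingWindowThrottle` and its defaults (five entries per ten minutes, taken from the suggestion above) are illustrative assumptions, not existing KockaLogger code.

```javascript
// Hypothetical sliding-window throttle: allows at most `limit` events
// within any `windowMs` milliseconds, regardless of which bot made them.
class SlidingWindowThrottle {
    constructor(limit = 5, windowMs = 10 * 60 * 1000) {
        this.limit = limit;
        this.windowMs = windowMs;
        this.timestamps = [];
    }
    // Returns true if the event may be logged, false if it is throttled.
    // `now` is injectable so the behavior can be checked without waiting.
    allow(now = Date.now()) {
        const cutoff = now - this.windowMs;
        // Drop events that have fallen out of the window.
        while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
            this.timestamps.shift();
        }
        if (this.timestamps.length >= this.limit) {
            return false;
        }
        this.timestamps.push(now);
        return true;
    }
}
```

A single shared instance would be consulted before relaying each bot edit; edits for which `allow()` returns false would simply be skipped.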
We need to know when no activity has been received in an unusual period of time from a certain channel.
Have messages of every type clear a timeout which if finished sends us a notification (after the channel goes back up the timeout is restored).
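A minimal sketch of that timeout, assuming one timer per channel. The class name `ActivityWatchdog` and the `onSilence` callback are hypothetical; how the notification is actually delivered is left to the caller.

```javascript
// Hypothetical watchdog: every received message re-arms a per-channel
// timer, and a timer that runs out reports the channel as silent.
class ActivityWatchdog {
    constructor(timeoutMs, onSilence) {
        this.timeoutMs = timeoutMs;
        this.onSilence = onSilence;
        this.timers = new Map();
    }
    // Call this for every message received from `channel`.
    touch(channel) {
        // clearTimeout ignores undefined, so the first call is safe.
        clearTimeout(this.timers.get(channel));
        this.timers.set(channel, setTimeout(() => {
            this.timers.delete(channel);
            this.onSilence(channel);
        }, this.timeoutMs));
    }
    // Cancels all pending timers, for example during shutdown.
    stop() {
        for (const timer of this.timers.values()) {
            clearTimeout(timer);
        }
        this.timers.clear();
    }
}
```

Once a silent channel produces a message again, the next `touch()` call restores its timer, which matches the behavior described in parentheses above.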
happens when you delete a redirect by overwriting it when moving a file to it
KockaLogger's IRC connection is one of the most important components of KockaLogger as without it everything else breaks. As such, all issues relating to it are to be tracked:
Closing Link: 195.18.217.34 (Ping timeout: 180 seconds) was received in an IRC error message and the bot's IRC client reconnected (presumably after stopping the relay of messages for ~180 seconds beforehand).
ENOTFOUND was received when connecting to the IRC server several times over two minutes (2-3 second intervals). It is unknown whether this was an issue on Fandom's or bot's side.
ECONNREFUSED received from the IRC server several times over a few hours. The bot also recovered from this.
ENOTFOUND twice during the day.
ECONNREFUSED around 9 AM CST.
No reproduction steps available.
These issues should be minimized if possible.
The IRC message limit is 512 bytes. This is usually fine for messages written in English, but many messages from Greek, Russian and Georgian (ქართული) wikis fail to parse because they are trimmed without an overflow message. There is nothing to connect the trimmed messages to, so their log actions fail to parse.
There is also no known way to detect whether a parsing failure was caused by this or by a legitimate issue that needs to be fixed.
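The sketch below illustrates why non-Latin scripts reach the limit sooner, and shows one possible heuristic for guessing that a line was cut off. The heuristic is an assumption, not a known-reliable detector: it only checks whether the raw line (including the IRC prefix) sits close to 512 bytes or ends in a broken character. The names `utf8Length`, `looksTruncated` and the `slack` parameter are hypothetical.

```javascript
const IRC_LINE_LIMIT = 512;

// Length of the text in UTF-8 bytes, which is what the IRC limit counts.
function utf8Length(text) {
    return Buffer.byteLength(text, 'utf8');
}

// Heuristic only: `rawLine` is assumed to be the full line as received
// (prefix, command and parameters) without the trailing CRLF.
function looksTruncated(rawLine, slack = 4) {
    // A multi-byte character cut in half decodes to U+FFFD at the end.
    if (rawLine.endsWith('\uFFFD')) {
        return true;
    }
    // Add 2 bytes for the CRLF that counts towards the limit.
    return utf8Length(rawLine) + 2 >= IRC_LINE_LIMIT - slack;
}
```

Georgian letters take three bytes each in UTF-8, and Greek and Cyrillic letters take two, so the same number of characters uses two to three times as much of the 512-byte budget as English text. A failure flagged by `looksTruncated` could be logged at a lower severity than other parsing failures, though false positives and negatives are both possible.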
Steps to reproduce the behavior:
KockaLogger should either not report those if they can't be fixed or fix them somehow.
As of v1.1.5, KockaLogger handles SIGINT. However, it doesn't clean up all resources, hence the need for a forced process.exit(). All components of KockaLogger that need it should be given a chance to clean up their resources for a clean shutdown.
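One way this could look is a small registry that components add their cleanup functions to. This is a sketch under assumptions: `ShutdownRegistry`, `register`, `closeAll` and `installSigintHandler` are hypothetical names, not part of KockaLogger.

```javascript
// Hypothetical registry of cleanup callbacks, closed in reverse order
// of registration so that dependents shut down before dependencies.
class ShutdownRegistry {
    constructor() {
        this.resources = [];
    }
    register(name, close) {
        this.resources.push({name, close});
    }
    // Closes everything and returns the names of resources that failed.
    async closeAll() {
        const failed = [];
        for (const {name, close} of [...this.resources].reverse()) {
            try {
                await close();
            } catch (error) {
                failed.push(name);
            }
        }
        this.resources = [];
        return failed;
    }
}

// If every resource really released its handles, the event loop drains
// and the process exits on its own, without a forced process.exit().
function installSigintHandler(registry) {
    process.once('SIGINT', async () => {
        const failed = await registry.closeAll();
        process.exitCode = failed.length === 0 ? 0 : 1;
    });
}
```

Each component would call `register` when it opens a connection, timer or file handle. If the process still hangs after `closeAll`, that points at a resource that was never registered.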
IO and that IO#close is required.
Certain wikis are throwing 404 errors and I am unable to tell which ones.
Steps to reproduce the behavior:
Better error messages which tell you exactly what's the issue with info fetching.
"Created page with" in edit summaries becomes very repetitive after some time.
Messages like this make it more readable:
KockaAdmiralac created A (edit summary)
KockaAdmiralac created A with content <summary>
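A rough sketch of how the two formats above could be produced. It assumes the English MediaWiki autosummary of the form `Created page with "…"`; wikis in other languages use translated autosummaries, which this sketch does not handle. The function name `describeCreation` is hypothetical.

```javascript
// Assumption: English autosummary only, in the form: Created page with "..."
const AUTOSUMMARY = /^Created page with "([\s\S]*)"$/;

// Hypothetical formatter for page creation messages.
function describeCreation(user, page, summary) {
    const match = AUTOSUMMARY.exec(summary);
    if (match) {
        // Automatic summary: show only the quoted content.
        return `${user} created ${page} with content ${match[1]}`;
    }
    if (summary) {
        // Summary written by the user: keep it in parentheses.
        return `${user} created ${page} (${summary})`;
    }
    return `${user} created ${page}`;
}
```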
The logger module is telling the client it's interested in messages that are going to be filtered out by the filters afterwards, causing KockaLogger to make unneeded API/Redis requests.
Steps to reproduce the behavior:
There should be no interest shown in messages that aren't going to be shown.
Doing filtering inside interested() may create different issues due to titles not being the page's actual titles before the title information is checked. A different configuration option may need to be set up for this.
Reporters can report spam directly to R:S without going through KockaLogger. KockaLogger disregards these reports and duplicates them on R:S. This should not happen.
KockaLogger checks the page content of R:S before reporting users.
Page title fetching from the API was introduced as a way of obtaining the actual page title when an article comment is submitted on an article whose title in the IRC log differs considerably from its actual title. The same issue occurs for threads, but for a different reason.
However, comments and threads cannot be created for talk namespaces, yet they are being fetched from the API as well. Making KockaLogger skip title checking for talk namespaces is going to save some HTTP requests.
Steps to reproduce the behavior:
The HTTP request isn't made due to the title in WikiaRC already matching the page's title.
mwn happens to throw login failures while moving reports.
Steps to reproduce the behavior:
{
    "code": "mwn_failedlogin",
    "info": "Login failed",
    "response": {
        "login": {
            "result": "WrongToken"
        }
    }
}

{
    "code": "mwn_failedlogin",
    "info": "Already logged in as ***@***, logout first to re-login",
    "response": {
        "login": {
            "result": "Aborted",
            "reason": "Cannot log in when using MediaWiki\\Session\\BotPasswordSessionProvider sessions."
        }
    }
}
Just retry the login after a delay.
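A minimal sketch of such a retry. The `login` argument is a caller-supplied function (for example a wrapper around the bot's login call) and `retryLogin`, `attempts` and `delayMs` are hypothetical names; only the `mwn_failedlogin` error code comes from the errors shown above.

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Retries `login` when it fails with the mwn_failedlogin code, waiting
// a little longer after each failed attempt. Other errors are rethrown.
async function retryLogin(login, {attempts = 3, delayMs = 5000} = {}) {
    let lastError;
    for (let attempt = 1; attempt <= attempts; ++attempt) {
        try {
            return await login();
        } catch (error) {
            if (error.code !== 'mwn_failedlogin') {
                throw error;
            }
            lastError = error;
            if (attempt < attempts) {
                await sleep(delayMs * attempt);
            }
        }
    }
    throw lastError;
}
```

Whether a plain retry is enough for the "Already logged in" variant is not clear from the errors alone; that case might need a logout before the next attempt.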
Redis cannot be started, and the npm start command fails because Redis is not found.
Steps to reproduce the behavior (Windows only):
Redis should start, and no Redis-related errors should occur.
Windows Server 2012 R2 with Redis (Windows version) installed.
It appears that newusers tries to fetch user information from the API
Steps to reproduce the behavior:
HTTPErrors leading with a message from the newusers module
KockaLogger doesn't log five kilometers (3.10686 miles) of logs and doesn't try five times to retrieve user information just to report a user no longer exists. Maybe the user was renamed within 30 minutes of creating their account?
KockaLogger could also record user ID instead of username to prevent this from happening, but as the feature is supposed to catch spambots, and spambots are unlikely to rename their account shortly after creation, this probably isn't needed.
When run with the --generate command-line option, KockaLogger blanks the JSON files instead of generating i18n regular expressions from them.
node main.js --generate
node main.js
The JSON files are properly saved.
Generation itself works fine.
In 3f2a988, we introduced the IGNORED_LOGS constant which prevents relaying errors related to certain logs which KockaLogger does not support. Ideally, if these logs are general-purpose enough, we should support them. Some logs I can see that seem general-purpose enough are:
review
cargo
import (Special:Import)
interwiki (Special:Interwiki)
merge (Special:MergeHistory)
contentmodel (Special:ChangeContentModel)
We check which messages are being used for these logs and add them to messages/map.js, parser/log.js etc.
Gamepedia wikis have their $wgArticlePath set to /$1.
The links don't include /wiki/. This could probably be achieved by having a single method that creates page links that can check for the wiki domain. (util.pageUrl?)
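A sketch of what such a method might look like. The signature is hypothetical (the issue only floats the name `util.pageUrl`), and the check assumes that Gamepedia wikis can be recognized by a `.gamepedia.com` domain suffix.

```javascript
// Hypothetical link builder: picks the article path from the domain.
// Assumption: Gamepedia wikis live on *.gamepedia.com and use /$1,
// while all other wikis use /wiki/$1.
function pageUrl(domain, title) {
    const articlePath = domain.endsWith('.gamepedia.com') ? '/' : '/wiki/';
    const encoded = encodeURIComponent(title.replace(/ /g, '_'))
        // Keep namespace colons and subpage slashes readable.
        .replace(/%3A/g, ':')
        .replace(/%2F/g, '/');
    return `https://${domain}${articlePath}${encoded}`;
}
```

Routing every link through one function like this would keep the article path decision in a single place, so a new domain rule would only need to be added once.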
Slash commands can currently be used from any channel on a server. As we don't want people to be using these randomly, we should probably restrict them only to channels which contain information on what these commands do.
Restrict slash commands for profile reporting-related commands to the reports and staging channels, and display a message when used outside of them.
needs to be unescaped in thread titles.
https://undertale.fandom.com/wiki/Thread:153379
An empty space appears.
Usernames with multiple backticks are not escaped, causing some issues.
Backticks need to be escaped.
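A minimal sketch of backslash-escaping every backtick in a username. It assumes the username is placed in regular message text; inside a code span backslash escapes are not interpreted, so that case would need a different approach. The function name `escapeBackticks` is hypothetical.

```javascript
// Hypothetical helper: prefixes every backtick with a backslash so that
// it is shown literally instead of opening a code span.
function escapeBackticks(text) {
    return text.replace(/`/g, '\\`');
}
```

The global flag matters here: without it only the first backtick would be escaped, which would match the reported problem with usernames that contain several of them.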
While this is a Wikia issue, a similar caching technique can be applied as to thread titles to get them to show up.
When a thread with a title is posted, cache the thread title associated with the ID in cache. If a reply to that thread is made, take the title from cache.
If the thread title isn't in cache, use the Service API to fetch the title.
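The caching steps above can be sketched as follows. The in-memory `Map` is an assumption made to keep the example self-contained, and `fetchTitle` is a hypothetical stand-in for the Service API request, supplied by the caller.

```javascript
// Hypothetical thread title cache with a fetch fallback.
class ThreadTitleCache {
    constructor(fetchTitle) {
        this.fetchTitle = fetchTitle;
        this.titles = new Map();
    }
    // Called when a thread with a title is posted.
    remember(wiki, threadId, title) {
        this.titles.set(`${wiki}:${threadId}`, title);
    }
    // Called when a reply is made: use the cache, else fetch and store.
    async get(wiki, threadId) {
        const key = `${wiki}:${threadId}`;
        if (this.titles.has(key)) {
            return this.titles.get(key);
        }
        const title = await this.fetchTitle(wiki, threadId);
        this.titles.set(key, title);
        return title;
    }
}
```

Keying by wiki and thread ID together avoids collisions between wikis that reuse the same thread IDs.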
I would like to log debug-level messages from newusers which contain information about user reports and report moving, so I can retrieve users from these logs on special occasions. However, I don't want them logged to Discord, as they are not errors.
KockaLogger's logger should let me specify which log level is being logged to Discord.
Separate loggers which log debug and non-debug information in newusers.
The logger's logging methods are technically asynchronous, but no component of the system calls them as such. This means Discord backend errors can cause KockaLogger to crash due to uncaught promise rejections.
Steps to reproduce the behavior:
It doesn't crash.
community as in community.fandom.com is whitelisted, community.fandom.com/es will also appear whitelisted).
Discord fixed their Markdown escaping so we do not need to strip Markdown from edit summaries (as much).