
rescuesocialtech / sna-ah-case-reddit



Reddit Social Network Analysis Against Influence Operation "Amber Heard"


Reddit - Social Network Analysis on Amber Heard's Case Example from Data Analysts, Researchers, and Scientists.

Reddit Analysis - It's important to look at Reddit in the context of cross-platform operations and comparing - e.g., ages of accounts during peaks, statistical differences or similarities, natural language processing, timelines, repeated same texts in quantifications, timings, and amplifications.

  • Four yearly Reddit reports (one per year for 2018-2021) and a Users Report.
  • Data from Reddit: 164,530 Contributions, 15,896 Submissions, 71,319 Accounts, Links
  • The 5,025 banned Reddit accounts are the largest group of contributors across the years of operations and receive the highest percentage of upscores (often hidden on Reddit profiles).
  • Banned accounts with the highest upvotes can show references to the disinformation operations from 2019-2021, e.g., '3rdPrizeIsYourFired,' 'the-speed-of-pain,' 'TruthbeThePrejudice.'
  • Threats exist on all platforms, including Reddit.
  • In influence operations, 'call out' postings on platforms are used to confuse workers at social media companies and to hide that they are amplified by bots. Reddit (as shown by prior precedent in multi-platform research papers) can serve as a sandbox to confuse workers and victims. For example, 'execute 99' and 'good soldiers follow orders' appear within the Dec 2020-Jan 2021 peaks of the YouTube 'Amber Heard's 2020 Takeaway: Adapt and Survive' simulation, which shows tens of thousands of the same texts with 'not a victim.' The statistics show that Reddit bans accounts when it sees automation, and that minimum scores can also go to harmful users. See the Research Papers folder.

Notes from Researchers:
These reports are part of the case study “Reddit Social Network Analysis Against Influence Operation,” which analyzes accounts posting and commenting against a victim of a Social Bot Disinformation/Influence Operation.

We have four main datasets scraped from Reddit:
1. A dataset with submissions & comments data (2020).
2. Users data (from 2006 to 2020).
3. A merged dataset (submissions & comments data, users data).
4. Daily creation data (number of accounts created per day from 2006 to 2020).
Timestamps are stored as Unix epoch time in UTC.
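As a minimal sketch of how a daily creation series could be derived from epoch timestamps, the snippet below counts accounts per UTC calendar day. The function name and the sample values are assumptions made for illustration; they are not taken from the datasets or the notebooks.

```python
from collections import Counter
from datetime import datetime, timezone


def daily_creation_counts(created_utc_values):
    """Count accounts created per UTC calendar day from epoch-second timestamps."""
    days = (
        datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        for ts in created_utc_values
    )
    return Counter(days)


# Hypothetical epoch seconds, invented for this example.
sample_created_utc = [1580774400, 1580778000, 1580860800]
creation_counts = daily_creation_counts(sample_created_utc)
```

Calling `creation_counts.most_common(5)` on such a counter lists the five busiest days, which is one simple way to surface candidate peak dates.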

Notes: Reddit was launched in June 2005, began to achieve a notable level of popularity in mid-2010, and had become “really popular” by early 2013.
The effect of 2020 is obvious; 2021 shows a smaller effect because the data was collected only through May 2021.
Reddit does not remove contributions and submissions from banned accounts.

Newly created accounts in 2018, 2019, 2020, 2021
Because the data was collected from 2018 to 2021, the most suspicious accounts are the new ones created between 2018 and 2021.

Included Types of Analysis on Reddit:

  • New, Banned, Unverified Accounts Analysis
  • Same and Repeated Texts Analysis
  • NLP, wordclouds, negative texts and accounts
  • Upscore/Downscore Ratios for banned, new, unverified account layers
  • Top contributing accounts and banned accounts
  • Accounts and same texts posting within seconds/multiple subreddits analysis - timings
  • Timelines, peaks, anomalies, timings - comparing statistics
  • Submissions, contributions, and accounts analysis

These are high-volume operations, so these are merely some examples. The Images and Highlights provide more: for example, the November 2020 peak comes mostly from accounts created in 2011, whereas the February 2020 peak comes mostly from newly created accounts. Contrast this with the other platforms and their statistics.
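The account-age comparison between peaks can be sketched as follows. This is an illustrative outline only: the field names `author` and `created_utc`, the `account_created` mapping, and all sample values are assumptions, not the structure of the actual datasets.

```python
from collections import Counter
from datetime import datetime, timezone


def utc_day(epoch_seconds):
    """Return the UTC calendar day (YYYY-MM-DD) for an epoch-second timestamp."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).date().isoformat()


def creation_years_on_day(contributions, account_created, day):
    """Count creation years of the distinct authors who contributed on `day`."""
    authors = {c["author"] for c in contributions if utc_day(c["created_utc"]) == day}
    return Counter(
        datetime.fromtimestamp(account_created[a], tz=timezone.utc).year
        for a in authors
        if a in account_created  # banned accounts may have no user record
    )


# Hypothetical records, invented for this example.
sample_contributions = [
    {"author": "old_acct", "created_utc": 1604707200},
    {"author": "old_acct", "created_utc": 1604710800},
    {"author": "new_acct", "created_utc": 1604714400},
    {"author": "other", "created_utc": 1580774400},
]
sample_account_created = {"old_acct": 1300000000, "new_acct": 1600000000}
years = creation_years_on_day(sample_contributions, sample_account_created, "2020-11-07")
```

Running the same function for two different peak days gives two year distributions that can be compared side by side.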

Examples from Overviews of Yearly Reports - See Report PDFs and Python Notebooks for full reports, graphs, and analysis.

Overview - 2019 Reddit

Top users statistics in 2019:
  • "3rdPrizeIsYourFired" (banned): this user's contributions got the highest scores. Because the account is banned, no user information is available, but the contributions can be investigated further. The user made 6 contributions: 5 submissions and 1 comment.
  • "Peregrino234" (banned): this user's contributions got the lowest scores. The user made 6 different comments in one subreddit on the same day, 2019-03-03.
  • "imsrikant" (banned): this user made 22 submissions with the same text at the same time in 6 different subreddits.

Overview - 2020

Top users statistics in 2020:
  • Armpit-lover: this user made 20 contributions with the same text, 13 of them in one subreddit.
  • rodrigohernandez4477 (banned): this user made 25 contributions with the same text in 25
      2020-12-05: 13 contributions in 20 minutes.
      2020-11-28: 12 contributions in 13 minutes.
  • cracksniffer666 (banned): this user's contributions got the highest scores; the highest score is 36.8K. Because the account is banned, no user information is available. The user made 99 contributions (98 comments and only one submission), and 89 of the 99 were made on 10 Feb 2020.
  • WouldYouKindley88 (banned): this user's contributions got the lowest scores.
  • Agree-with-you: the unverified account with the largest comment karma.
  • rMemesMods (unverified): this user's contributions got the lowest scores.
  • Lucaswebb (unverified): this user's contributions got the highest scores.

Peak Dates (report page 13)
Feb 4, 2020
Feb 8, 2020
Feb 2, 2020
Nov 7, 2020
Feb 7, 2020

Investigating the Submission Text (submissions with the most comments and replies; report page 34)
22.17% of the peak-day contributions (Feb 4, 2020) were made by banned accounts (6.77%, 652 contributions) and unverified accounts (15.4%, 1,484 contributions).
Roughly 20% of the peak-day submissions (Feb 4, 2020) were made by banned accounts (14.9%, 1,175 submissions) and unverified accounts (6.48%, 511 submissions).
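Shares like the ones above come down to counting contributions per account layer and dividing by the day's total. A minimal sketch, assuming each record carries a `status` label (a hypothetical field name) and using invented sample data:

```python
from collections import Counter


def layer_shares(contributions):
    """Return the percentage of contributions per account status, rounded to 2 decimals."""
    counts = Counter(c["status"] for c in contributions)
    total = sum(counts.values())
    return {status: round(100 * n / total, 2) for status, n in counts.items()}


# Hypothetical peak-day records: 1 banned, 2 unverified, 5 regular.
sample_day = (
    [{"status": "banned"}]
    + [{"status": "unverified"}] * 2
    + [{"status": "regular"}] * 5
)
shares = layer_shares(sample_day)
```

Adding the banned and unverified shares gives the combined figure for the two layers, in the same way that 6.77% and 15.4% add up to 22.17% for the peak-day contributions.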

Overview - 2021

Top users statistics in 2021:
  • "90police": this user made 17 submissions within only 5 minutes.
  • "Truthbetheprejudice": this user made 4 submissions with the same text, "Remove Amber Heard from Aquaman 2", within only 30 seconds.
  • "the-speed-of-pain": this user made 2 submissions with the same text, "Remove Amber Heard from Aquaman 2", within only 3 seconds.
The identical text posted within seconds points to a campaign to remove Amber Heard from the movie.
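Rapid same-text postings like these can be flagged with a sliding time window over each author's identical texts. The sketch below is one possible approach, not the method used in the notebooks; the field names (`author`, `text`, `subreddit`, `created_utc`), the thresholds, and the sample posts are all assumptions.

```python
from collections import defaultdict


def find_repeated_text_bursts(posts, window_seconds=30, min_posts=2):
    """Find authors who posted the same text at least `min_posts` times within the window."""
    groups = defaultdict(list)
    for post in posts:
        key = (post["author"], post["text"].strip().lower())
        groups[key].append(post)

    bursts = []
    for (author, text), items in groups.items():
        items.sort(key=lambda p: p["created_utc"])
        start, best = 0, None
        for end in range(len(items)):
            # Shrink the window until it spans at most `window_seconds`.
            while items[end]["created_utc"] - items[start]["created_utc"] > window_seconds:
                start += 1
            size = end - start + 1
            if size >= min_posts and (best is None or size > best[0]):
                best = (size, start, end)
        if best is not None:
            size, first, last = best
            window = items[first:last + 1]
            bursts.append({
                "author": author,
                "text": text,
                "count": size,
                "span_seconds": window[-1]["created_utc"] - window[0]["created_utc"],
                "subreddits": sorted({p["subreddit"] for p in window}),
            })
    return bursts


# Hypothetical posts, invented for this example.
sample_posts = [
    {"author": "user_a", "text": "Same text", "subreddit": "sub_one", "created_utc": 100},
    {"author": "user_a", "text": "same text ", "subreddit": "sub_two", "created_utc": 101},
    {"author": "user_a", "text": "Same text", "subreddit": "sub_three", "created_utc": 103},
    {"author": "user_b", "text": "Something else", "subreddit": "sub_one", "created_utc": 102},
]
bursts = find_repeated_text_bursts(sample_posts)
```

Each result records how many identical posts fell inside the window, how many seconds they spanned, and which subreddits they reached. Widening `window_seconds` to a few minutes would also cover cases such as many submissions within 5 minutes.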

Further Investigation of the Most Commented Users in 2021 (report page 17)
AutoModerator
CelebBattleVoteBot
charliedba
Stanley_Elkind
This is a regular contribution rate, as the maximum number of contributions on any single day is 4.
LoveAmberHeard42286
Truthbetheprejudice
gaul66
sadwook
Beatplayer

About 20% of 2021 submissions were made by banned and unverified accounts.

Rush Days (report page 9)
Apr 17, 2021
Feb 20, 2021
Feb 28, 2021

Percentage of 2021 contributions made by banned accounts: 7.23%.
Total contributions by banned accounts in 2021: 1,323.
Total comments by banned accounts in 2021: 1,095.
Percentage of 2021 submissions made by banned accounts: 11.84%.
Total submissions by banned accounts in 2021: 228.
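The 2021 banned-account figures can be cross-checked with simple arithmetic. In the sketch below, the counts and percentages are the ones quoted above, while the implied yearly totals are derived estimates (an assumption of this example), not numbers stated in the reports.

```python
# Figures quoted in the 2021 overview.
banned_comments = 1095
banned_submissions = 228
banned_contributions = 1323
banned_contribution_share = 0.0723  # 7.23%
banned_submission_share = 0.1184    # 11.84%

# Comments plus submissions should equal the total contributions.
counts_consistent = banned_comments + banned_submissions == banned_contributions

# Implied 2021 totals, derived by dividing each count by its share (estimates only).
implied_total_contributions = round(banned_contributions / banned_contribution_share)
implied_total_submissions = round(banned_submissions / banned_submission_share)
```

Because the percentages are rounded to two decimals, the implied totals are approximate and should be checked against the report PDFs before being cited.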

The above are examples from the reports and provide some guidance on where to find anomalies. These analyses serve as blueprints for finding red flags in other operations.

Contributors: adelabuhashim, christinataft
