dragosrotaru / ppeforfree

Collective sensemaking for mutual aid groups manufacturing PPE during COVID.

Home Page: https://ppeforfree.org

License: GNU General Public License v3.0

HTML 7.71% CSS 5.08% TypeScript 87.21%

ppe-initiatives

ppeforfree's People

Contributors

kurtvan marchesed ollie-codeaid epsom-software mindoodoo

ppeforfree's Issues

Facebook Group Table Search with FuseJS

Visualizing Facebook Posts (News Feed)

Pre-requisites

See #9

We need a page on the website with a high-level view of all the Facebook posts and links being shared globally. This is the community megaphone, or signal amplifier.

This could be as simple as a Reddit-style news feed, but we could also visualize where original ideas come from: which group leads the community by being the first to post fresh information, etc.

New Home Page

Our website is currently cluttered and confusing because it drops visitors into the directory right away.

Scraping Facebook Groups Posts

Note on FB Scraping, Data Privacy, Future Roadmap

See #5

Pre-requisite: Seed Data

See #6

Requirements

Don't commit data, private info, credentials, etc.
write your script in a new folder "scripts/facebook-group-posts-scraper"
use any language you want, preferably Python or NodeJS/TypeScript (please)
use conservative rate-limiting and a dynamic DOM renderer like selenium or Puppeteer.
the script should get FB_USERNAME, FB_PASSWORD and the MongoDB credentials from a .env file
the script should get group ids from #6 (see comments)
the script saves data to a local MongoDB instance (see schema below)
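Here is a minimal TypeScript sketch (TypeScript is the repo's main language) of how the script could check the .env requirement at startup. `FB_USERNAME` and `FB_PASSWORD` come from the issue. `MONGODB_URI` is a hypothetical name for the MongoDB credentials. The sketch assumes the .env file has already been loaded into `process.env`, for example with the dotenv package.

```typescript
// Configuration the posts scraper needs before it can start.
interface ScraperConfig {
  fbUsername: string;
  fbPassword: string;
  mongoUri: string;
}

// FB_USERNAME / FB_PASSWORD are from the issue; MONGODB_URI is a hypothetical name.
const REQUIRED_VARS = ["FB_USERNAME", "FB_PASSWORD", "MONGODB_URI"] as const;

function loadConfig(env: Record<string, string | undefined>): ScraperConfig {
  // Collect every missing variable so the error lists them all at once.
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    fbUsername: env.FB_USERNAME as string,
    fbPassword: env.FB_PASSWORD as string,
    mongoUri: env.MONGODB_URI as string,
  };
}
```

Calling `loadConfig(process.env)` once at startup makes a missing variable fail fast, before any browser session is opened.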

Scraping Facebook Groups Posts

We will use this data to make a news aggregator and to keep an eye out for more data for coalition-building purposes.

I started a script in scripts/facebook-group-posts-scraper using this library:
https://github.com/kevinzg/facebook-scraper

It works, but not with 100% consistency, so you will have to troubleshoot it and maybe edit the script.

How your script will store and normalize the data

Database will be MongoDB

Schema

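// UUID and TimeStamp are not defined in the issue; these aliases are assumptions.
type UUID = string;
type TimeStamp = Date;

type Post = {
  id: UUID,
  createdAt: TimeStamp,
  text: string,
  link: URL,
  likes: number,
  shares: number,
  comments: number,
  groupID: UUID,
  scrapedAt: TimeStamp,
  scrapeID: UUID,
}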

Misc

A library I found: https://github.com/ParvJain/Facebook-Group-Scraper (please look through it)

Group Locations (Array of Cities) to Latitude and Longitude

Connect to the Google Maps Geocoding API
Create a function that takes an array of locations, geocodes them, and returns the latitude and longitude of the center of the smallest jurisdiction that covers them all (are all the cities in the same state? The same county? The same country?)
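The jurisdiction step can work from the Geocoding API's `address_components`. The `types` on each component (such as `locality`, `administrative_area_level_2`, `administrative_area_level_1` and `country`) identify what that component is. The sketch below assumes the cities have already been geocoded, with one result per city, and only finds the smallest jurisdiction they all share. The function and type names are hypothetical. Treating level 2 as "county" only holds in some countries.

```typescript
// Subset of a Geocoding API result that this sketch relies on.
interface AddressComponent {
  long_name: string;
  short_name: string;
  types: string[];
}

interface GeocodeResult {
  address_components: AddressComponent[];
}

// Jurisdiction levels from smallest to largest.
const LEVELS = [
  "locality",                    // city
  "administrative_area_level_2", // county, in the US
  "administrative_area_level_1", // state or province
  "country",
] as const;

type Level = (typeof LEVELS)[number];

function componentName(result: GeocodeResult, level: Level): string | undefined {
  return result.address_components.find((c) => c.types.includes(level))?.long_name;
}

// Identifies a jurisdiction together with the larger ones containing it,
// so two different "Springfield"s in different states do not match.
function jurisdictionKey(result: GeocodeResult, levelIndex: number): string | undefined {
  if (componentName(result, LEVELS[levelIndex]) === undefined) return undefined;
  return LEVELS.slice(levelIndex)
    .map((level) => componentName(result, level))
    .filter((name): name is string => name !== undefined)
    .join(", ");
}

// Smallest jurisdiction shared by all results, or undefined if none is shared.
function smallestCommonJurisdiction(
  results: GeocodeResult[],
): { level: Level; address: string } | undefined {
  if (results.length === 0) return undefined;
  for (let i = 0; i < LEVELS.length; i++) {
    const keys = results.map((r) => jurisdictionKey(r, i));
    const first = keys[0];
    if (first !== undefined && keys.every((k) => k === first)) {
      return { level: LEVELS[i], address: first };
    }
  }
  return undefined;
}
```

To get the center point, geocode the returned `address` once more (for example with `https://maps.googleapis.com/maps/api/geocode/json?address=...&key=...`) and read `results[0].geometry.location` from the response.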

Add the YouTube script here or in a separate repo

Seed Data

OSCMS has a roster on Google Sheets. I went through it and grabbed every Facebook group (and page) ID from it (187 total). Roster:

https://docs.google.com/spreadsheets/d/1JH5uL3WW6PwvwFRe4wqXkheK0-jcGYqaPmb9J3Dr6Ac/edit?fbclid=IwAR3FX_xPe-bYbXQmjsXF5FUr7aISp27wGwHXuNIWzh92ScdQQSgVVrbixBo#gid=179139280

The data is available in data/facebook-group-ids-unclean.txt

Not all IDs are for FB groups, so we need to pre-process them. Salty_Steve wrote a Python script to do that, but it doesn't work for all of them. Maybe it just needs rate-limiting. See scripts/facebook-group-id-validator.

Fix scripts/facebook-group-id-validator and produce a clean file of group IDs called data/facebook-group-ids.txt. This is critical: please check your work manually. We need clean data.
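Before validation, the unclean file could be normalized and deduplicated so the validator only sees one entry per ID. The sketch below is in TypeScript, although the existing validator is Python. It assumes the file mixes bare IDs, /groups/ URLs and other facebook.com URLs. That mix, the type names and the `dedupeEntries` helper are all assumptions. Entries the sketch cannot classify as groups are marked `unknown` and still need to go through the validator and a manual check.

```typescript
// "group": the entry was a /groups/ URL. "unknown": a bare ID or another
// facebook.com URL (page or profile) that the validator still has to check.
type EntryKind = "group" | "unknown";

interface CleanEntry {
  kind: EntryKind;
  id: string;
}

function parseEntry(line: string): CleanEntry | undefined {
  const text = line.trim();
  if (text === "") return undefined;
  // e.g. https://www.facebook.com/groups/208780533550355/?ref=share
  const group = text.match(/facebook\.com\/groups\/([^/?#\s]+)/i);
  if (group) return { kind: "group", id: group[1] };
  // e.g. https://www.facebook.com/profile.php?id=123
  const profile = text.match(/facebook\.com\/profile\.php\?id=(\d+)/i);
  if (profile) return { kind: "unknown", id: profile[1] };
  // e.g. https://www.facebook.com/SomePage/
  const other = text.match(/facebook\.com\/([^/?#\s]+)/i);
  if (other) return { kind: "unknown", id: other[1] };
  // A bare numeric ID or vanity name.
  if (/^[\w.-]+$/.test(text)) return { kind: "unknown", id: text };
  return undefined; // unrecognized: log it for manual review
}

function dedupeEntries(lines: string[]): CleanEntry[] {
  const byId = new Map<string, CleanEntry>();
  for (const line of lines) {
    const entry = parseEntry(line);
    if (!entry) continue;
    const existing = byId.get(entry.id);
    // Keep the first sighting, but upgrade it if a later line proves it is a group.
    if (!existing || (existing.kind === "unknown" && entry.kind === "group")) {
      byId.set(entry.id, entry);
    }
  }
  return Array.from(byId.values());
}
```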

Mutual Aid UK Groups Scraper

https://mutualaid.wiki/

api: https://mutualaid.wiki/api/group/get

Help / Get Involved

We should have a "help" page that offers multiple ways to engage:

Advocate - Share us on social media
Local Supplier - Join our Mailing List for updates (need to create a Mailchimp list)
Other - Contact Us (see #31)
Join the Team - (see on-boarding process v1.1 in map)

We need to clearly explain how each of these types of website visitors can get engaged.

This should probably go in the header as an icon, in which case make the GitHub link an icon too.

Processing Facebook Groups General Information

Pre-requisite: Scraping Facebook Groups General Information

See #8

Requirements

Don't commit data, private info, credentials, etc.
write your script in a new folder "scripts/facebook-groups-info-processor"
use any language you want, preferably Python or NodeJS/TypeScript (please)
the script should get MongoDB credentials from a .env file
the script should output JSON to the data folder, in a file named data/facebook-posts-[timestamp].json

How do we process the groups? IDK, up to you. Let's get creative. The purpose is to enable a community map and directory. Work closely with DataViz.

CanadaSews Scraper

Write a scraper for all the Facebook group URLs in the HTML of this page:

https://www.canadasews.ca/regions
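A minimal sketch of the extraction step: scan the page's HTML for facebook.com/groups/ links and reduce each one to a canonical URL so duplicates collapse. The sketch assumes the links are present in the served HTML. If the page injects them with JavaScript, a headless browser such as Puppeteer would be needed instead. The function name is hypothetical.

```typescript
// Returns each distinct Facebook group found in the HTML as a canonical
// https://www.facebook.com/groups/<id-or-slug> URL.
function extractFacebookGroupUrls(html: string): string[] {
  // Matches www., m. and web. subdomains; the slug stops at quotes, tags,
  // slashes, query strings, fragments and HTML entities.
  const pattern = /https?:\/\/(?:(?:www|m|web)\.)?facebook\.com\/groups\/([^\s"'<>\/?#&]+)/gi;
  const urls = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    urls.add(`https://www.facebook.com/groups/${match[1]}`);
  }
  return Array.from(urls);
}
```

For example, on Node 18+: `extractFacebookGroupUrls(await (await fetch("https://www.canadasews.ca/regions")).text())`.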

Processing Facebook Groups Posts

Pre-requisite: Scraping Facebook Groups Posts

See #7

Requirements

Don't commit data, private info, credentials, etc.
the script should output JSON to the data folder, in a file named data/facebook-posts-[timestamp].json

How do we process the posts? IDK, up to you. Let's get creative. The purpose is to enable a megaphone or signal-amplifier feature for the community. We need to track the links being shared (normalized, of course, without query parameters) and measure the virality of the content. We will want to display it on a community board in an engaging way, contextualized with info about where it came from (maybe). Who posted it first? etc.
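Link normalization could look like the sketch below. It drops query strings and fragments as the issue asks, strips a leading `www.` and trailing slashes, and unwraps Facebook's outbound redirect links (`l.facebook.com/l.php?u=...`). Unwrapping rests on an assumption about how shared links appear in scraped posts. The name `normalizeLink` is hypothetical.

```typescript
// Normalizes a shared link so the same article posted in different groups
// produces the same string. Returns undefined for anything that isn't an absolute URL.
function normalizeLink(raw: string): string | undefined {
  let url: URL;
  try {
    url = new URL(raw.trim());
  } catch {
    return undefined;
  }
  // Assumed: Facebook wraps outbound links as l.facebook.com/l.php?u=<target>.
  if (/^lm?\.facebook\.com$/i.test(url.hostname) && url.pathname === "/l.php") {
    const target = url.searchParams.get("u");
    return target ? normalizeLink(target) : undefined;
  }
  const host = url.hostname.toLowerCase().replace(/^www\./, "");
  const port = url.port ? `:${url.port}` : "";
  // Remove trailing slashes; the query string and fragment are dropped entirely.
  const path = url.pathname.replace(/\/+$/, "");
  return `${url.protocol}//${host}${port}${path}`;
}
```

Dropping every query parameter also merges links whose identity lives in the query string (a YouTube `watch?v=` URL, for instance). In practice, a small allowlist of parameters to keep may be needed.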

This can get really interesting. Can we track the propagation of information through the community? How do we display the info? What sorting features would we want? Work closely with the DataViz contributor.
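Once links are normalized, "who posted it first" and "how far did it spread" fall out of grouping posts by link. The sketch assumes each post already carries a normalized link and a parsed `createdAt`. The field names follow the Post schema, but `SharedLinkPost` and `summarizeLinkSpread` are hypothetical. Ranking by the number of distinct groups, then by shares, is only one possible proxy for virality.

```typescript
interface SharedLinkPost {
  groupID: string;
  link: string; // already normalized
  createdAt: Date;
  shares: number;
}

interface LinkSpread {
  link: string;
  firstGroupID: string;
  firstSeen: Date;
  groupCount: number; // distinct groups that posted the link
  totalShares: number;
  // First appearance in each group, oldest first: the path the link took.
  propagation: { groupID: string; at: Date }[];
}

function summarizeLinkSpread(posts: SharedLinkPost[]): LinkSpread[] {
  // Bucket posts by their normalized link.
  const byLink = new Map<string, SharedLinkPost[]>();
  for (const post of posts) {
    const bucket = byLink.get(post.link) ?? [];
    bucket.push(post);
    byLink.set(post.link, bucket);
  }

  const summaries: LinkSpread[] = [];
  byLink.forEach((bucket, link) => {
    const sorted = bucket.slice().sort((a, b) => a.createdAt.getTime() - b.createdAt.getTime());
    // Record only the earliest post from each group.
    const firstPerGroup = new Map<string, Date>();
    for (const post of sorted) {
      if (!firstPerGroup.has(post.groupID)) firstPerGroup.set(post.groupID, post.createdAt);
    }
    summaries.push({
      link,
      firstGroupID: sorted[0].groupID,
      firstSeen: sorted[0].createdAt,
      groupCount: firstPerGroup.size,
      totalShares: sorted.reduce((sum, post) => sum + post.shares, 0),
      propagation: Array.from(firstPerGroup, ([groupID, at]) => ({ groupID, at })),
    });
  });

  // Widest spread first, ties broken by total shares.
  return summaries.sort((a, b) => b.groupCount - a.groupCount || b.totalShares - a.totalShares);
}
```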

Facebook Group Node Graph

A graph (as in graph theory) where vertices represent Groups. The radius of a vertex represents the size of the group, and edge lengths represent the number of members these groups have in common (closer means more members in common).
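The shared-member count behind each edge is a set intersection over member IDs. The sketch below computes it for every pair of groups, assuming each group has a `memberList` of IDs as in the Group schema of the general-information scraping issue. Alongside the raw count it computes the Jaccard index (shared members divided by the distinct members of both groups). That index is an assumed normalization, so that two very large groups don't look close merely because they are large. The names are hypothetical. The pairwise loop is O(n²) in the number of groups, which should be fine for a few hundred.

```typescript
interface GroupMembers {
  id: string;
  memberList: string[];
}

interface OverlapEdge {
  source: string;
  target: string;
  sharedMembers: number;
  jaccard: number; // shared / distinct members across both groups, in (0, 1]
}

function buildOverlapEdges(groups: GroupMembers[]): OverlapEdge[] {
  const sets = groups.map((g) => ({ id: g.id, members: new Set(g.memberList) }));
  const edges: OverlapEdge[] = [];
  for (let i = 0; i < sets.length; i++) {
    for (let j = i + 1; j < sets.length; j++) {
      const a = sets[i].members;
      const b = sets[j].members;
      // Iterate over the smaller set and probe the larger one.
      const [small, large] = a.size <= b.size ? [a, b] : [b, a];
      let shared = 0;
      small.forEach((member) => {
        if (large.has(member)) shared++;
      });
      if (shared === 0) continue; // no edge between groups with nobody in common
      const union = a.size + b.size - shared;
      edges.push({ source: sets[i].id, target: sets[j].id, sharedMembers: shared, jaccard: shared / union });
    }
  }
  return edges;
}
```

A force-directed layout could then use something like `1 / jaccard` as the target edge length. It could also give each vertex a radius proportional to the square root of `memberCount`, so that circle area tracks group size. Both are design choices, not requirements from the issue.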

SEO, Social Media, Branding

OSCMS Roster Recurring Scraper

Programmatically download the CSV using this link:
https://docs.google.com/spreadsheets/u/3/d/1JH5uL3WW6PwvwFRe4wqXkheK0-jcGYqaPmb9J3Dr6Ac/export?format=csv&id=1JH5uL3WW6PwvwFRe4wqXkheK0-jcGYqaPmb9J3Dr6Ac&gid=0
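The link above follows Google Sheets' `/export?format=csv` pattern, so a recurring scraper only needs the sheet ID and the tab (`gid`). Here is a minimal sketch, assuming the sheet stays publicly shared and the script runs on Node 18+, which has a global `fetch`. The function names are hypothetical.

```typescript
// Builds the CSV export URL for one tab (gid) of a publicly shared Google Sheet.
function sheetCsvExportUrl(sheetId: string, gid: number | string = 0): string {
  const params = new URLSearchParams({ format: "csv", gid: String(gid) });
  return `https://docs.google.com/spreadsheets/d/${encodeURIComponent(sheetId)}/export?${params}`;
}

// Downloads the tab as CSV text; relies on the global fetch in Node 18+.
async function downloadSheetCsv(sheetId: string, gid: number | string = 0): Promise<string> {
  const response = await fetch(sheetCsvExportUrl(sheetId, gid));
  if (!response.ok) {
    throw new Error(`Sheet download failed: HTTP ${response.status}`);
  }
  return response.text();
}
```

To make this recurring, schedule `downloadSheetCsv("1JH5uL3WW6PwvwFRe4wqXkheK0-jcGYqaPmb9J3Dr6Ac")` from cron or a similar scheduler and diff each download against the previous one.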

Contact Details

@dragosrotaru: create a contact inbox in G Suite
Display the email address somewhere on the site

Don't nest scripts in the project: scripts and the React app should sit side by side

Files like tsconfig.json and package.json inherit from their parent folder, which makes it hard to understand what's going on. It's better for projects not to be sub-nested, but the deployment needs to be updated as well if we make that change.

Facebook Group Directory Table too wide

Describe the bug
The directory table causes horizontal scrolling

To Reproduce
Steps to reproduce the behavior:

Go to 'https://ppeforfree.org/'
See horizontal scroll bar

Additional context
May not be implemented yet

Database of Localized Resources During COVID 19 Outbreak Scraper

https://docs.google.com/spreadsheets/d/1HEdNpLB5p-sieHVK-CtS8_N7SIUhlMpY6q1e8Je0ToY/export?format=csv

Prusa Scraper

The Prusa site runs on a GraphQL endpoint. We want to scrape user IDs (46k) and full group details (a few hundred). Don't publish this data. We are going to scrape the user IDs so we can send each user one message within the Prusa platform asking if they know of any local initiatives. Then we can process any links they reply with. If they are responsive, we provide them with a link to our site and thank them.

Scrape the admin list from private Facebook groups (it's different from public groups)

Show internal docs publicly

Make Internal Documentation Accessible from the website

Add Internationalization support

Our first translation of the manifesto, into French, is in progress, and we need to support it on the site.

Scraping Facebook Groups General Information

Note on FB Scraping, Data Privacy, Future Roadmap

See #5

Prerequisite: Seed Data

See #6

Requirements

Don't commit data, private info, credentials, etc.
write your script in a new folder "scripts/facebook-group-info-scraper"
use any language you want, preferably Python.
use conservative rate-limiting and a dynamic DOM renderer like selenium or Puppeteer.
the script should get FB_USERNAME, FB_PASSWORD and the MongoDB credentials from a .env file
the script should get group ids from #6 (see comments)
the script saves data to a local MongoDB instance (see schema below)

Scraping Facebook Groups General Information

We need data on all the Facebook groups in the community.

Note: I compiled this list by manually going through 2 FB group pages. Please go through a few more pages yourself to see whether some groups have more, less, or different public data available, and we will update our schema.

The data available on public FB groups (not including content like posts, pics, events, etc.) includes:

id
name
isPublic
description
foundedOn
memberCount
adminCount
moderatorCount
memberCountIncreaseWeekly
postCountIncreaseMonthly
postCountIncreaseDaily
moderatorList, adminList, memberList, pageList (pages can be in a group! These are lists of IDs)

We will not collect any information about individuals other than their Facebook ID. This data is needed because we want to see how connected groups are (how many individuals they have in common), and we want to reach out to the individuals who are in a shit ton of groups! Very useful for coalition-building.
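Counting how many scraped groups each member ID appears in is enough to find those highly connected individuals. The sketch assumes each group record carries the `memberList` from the schema below. `findConnectors` and its `minGroups` threshold are hypothetical.

```typescript
interface GroupMembership {
  id: string;
  memberList: string[];
}

interface Connector {
  memberId: string;
  groupCount: number;
}

// Members who appear in at least `minGroups` groups, most connected first.
function findConnectors(groups: GroupMembership[], minGroups = 2): Connector[] {
  const counts = new Map<string, number>();
  for (const group of groups) {
    // A Set guards against an ID appearing twice in one group's list.
    new Set(group.memberList).forEach((memberId) => {
      counts.set(memberId, (counts.get(memberId) ?? 0) + 1);
    });
  }
  const connectors: Connector[] = [];
  counts.forEach((groupCount, memberId) => {
    if (groupCount >= minGroups) connectors.push({ memberId, groupCount });
  });
  // Sort by group count, then by ID so the output is stable.
  return connectors.sort((a, b) => b.groupCount - a.groupCount || a.memberId.localeCompare(b.memberId));
}
```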

Scraping Posts

I started a script in scripts/facebook-group-posts-scraper using this library:
https://github.com/kevinzg/facebook-scraper

It works well! But we NEED to collect the timestamp on all the posts, and it doesn't work with 100% consistency, so you will have to troubleshoot. We will use this data to make a news aggregator and to keep an eye out for more data for coalition-building purposes.

How your script will store and normalize the data

Database will be MongoDB

Schema

// UUID and TimeStamp are not defined in the issue; these aliases are assumptions.
type UUID = string;
type TimeStamp = Date;

type Group = {
  id: UUID,
  name: string,
  foundedOn: TimeStamp,
  public: boolean,
  description: string,
  memberCount: number,
  adminCount: number,
  moderatorCount: number,
  memberCountIncreaseWeekly: number,
  postCountIncreaseMonthly: number,
  postCountIncreaseDaily: number,
  memberList: UUID[],
  adminList: UUID[],
  moderatorList: UUID[],
  pageList: UUID[],
  scrapedAt: TimeStamp,
  scrapeID: UUID,
}

Misc

A library I found: https://github.com/ParvJain/Facebook-Group-Scraper (please look through it)

OSINT with Reddit + PushShift

Here is something to research:

Using PushShift (see example API call below) to track down discussions and automate outreach.

https://reddit-api.readthedocs.io/en/latest/#comments-search

https://api.pushshift.io/reddit/comment/search?sort=desc&sort_type=created_utc&after=1523588521&before=1586632105&size=10000&subreddit=&q=%22PPE+3D%22&metadata=&
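Building the example call above from parameters makes it easier to vary the search terms and the time window. This sketch only assembles the URL. The parameter names are copied from the example call and the helper name is hypothetical. The sketch does not check that the Pushshift API is still reachable or that it still accepts these parameters.

```typescript
interface CommentSearch {
  q: string;          // search query, e.g. "\"PPE 3D\"" for an exact phrase
  after: number;      // Unix epoch seconds
  before: number;     // Unix epoch seconds
  size?: number;      // maximum number of results
  subreddit?: string; // omitted to search all subreddits
}

function pushshiftCommentSearchUrl(search: CommentSearch): string {
  const params = new URLSearchParams({
    q: search.q,
    sort: "desc",
    sort_type: "created_utc",
    after: String(search.after),
    before: String(search.before),
  });
  if (search.size !== undefined) params.set("size", String(search.size));
  if (search.subreddit) params.set("subreddit", search.subreddit);
  // URLSearchParams encodes spaces as "+" and quotes as %22, as in the example call.
  return `https://api.pushshift.io/reddit/comment/search?${params}`;
}
```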

Facebook Groups Scraper: description is being cut off because of "show more" button on FB

The group description is being cut off by Facebook's "show more" button. Example groups: 208780533550355, 809393332878811

Visualizing Facebook Groups General Information

Pre-requisites

See #10

We need a page on the website with a high-level view of all the Facebook Groups.

Ideas for Views:

A Global Dashboard
A Map View (location needs to be coordinated with data-processing / scraping contributors)
A Table View
A Graph View with node size showing the number of members and edge length showing the number of members in common.
A Detail view with graphs over time showing an individual group's growth, etc.

Facebook Scraping, Data Privacy, Future Roadmap

As it stands:

I will run the scrapers daily on my computer using my login credentials.
Not all data scraped will be public. Processing scripts will produce publishable data that I will upload to the repo in JSON format for the front-end to use. This is v0.1 of our API.
The original scraped data will be stored on my machine until we have cloud infra in place.
Collaborators can get access to the full dataset by asking for it.
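One way to keep individual IDs out of the published JSON is an allowlist: copy only the fields known to be safe, instead of deleting the known-private ones, so that a field added to the scraper later is not published by accident. The sketch assumes a trimmed-down version of the Group schema and treats memberList, adminList and moderatorList as private. Which fields count as publishable is an assumption, not a settled policy, and the names are hypothetical.

```typescript
interface ScrapedGroup {
  id: string;
  name: string;
  memberCount: number;
  adminCount: number;
  moderatorCount: number;
  memberList: string[];    // individual IDs: private
  adminList: string[];     // individual IDs: private
  moderatorList: string[]; // individual IDs: private
  pageList: string[];
}

type PublicGroup = Omit<ScrapedGroup, "memberList" | "adminList" | "moderatorList">;

// Copies only allowlisted fields; individual IDs stay on the scraping machine.
function toPublicGroup(group: ScrapedGroup): PublicGroup {
  return {
    id: group.id,
    name: group.name,
    memberCount: group.memberCount,
    adminCount: group.adminCount,
    moderatorCount: group.moderatorCount,
    pageList: group.pageList,
  };
}
```

The processing scripts could then write `JSON.stringify(groups.map(toPublicGroup))` to the repo as the published dataset.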

Scraping will be phased out by:

getting Public Group API access via FB Dev App program
getting Group Admins to "claim" their Facebook page on our site by authorizing our app.

Facebook Group Detail View

Make the Scraper Reliable

Implement partial scraping: recent members only, detect field change frequency, prioritize based on group size and history of growth
don't do batch jobs; let the scraper run as a cron job or background task
compare the displayed counts against list.length to detect issues
save stderr and scraper parameters
connect to existing browser or save session (no relogin)
make auto-scrolling work by waiting for network idle
create phony public and private groups for integration tests
deploy scraper on multiple personal computers (Scraping@Home V0.0.1)
use exponential backoffs
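Two items from this list lend themselves to small helpers: exponential backoff and the count-versus-list.length check. Here is a minimal sketch of each. The retry limits, the "equal jitter" strategy and the 5% tolerance are assumptions rather than tuned values, and all names are hypothetical.

```typescript
// Resolves after `ms` milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

interface BackoffOptions {
  retries?: number; // retries after the first attempt
  baseMs?: number;  // delay ceiling before the first retry
  maxMs?: number;   // upper bound for any single delay
}

// Runs `task`, retrying failures with exponential backoff: the delay ceiling
// doubles each attempt (baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs), and
// the actual wait is randomized between half and all of it ("equal jitter")
// so several scrapers don't retry in lockstep.
async function withBackoff<T>(task: () => Promise<T>, options: BackoffOptions = {}): Promise<T> {
  const { retries = 5, baseMs = 1000, maxMs = 60000 } = options;
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      if (attempt >= retries) throw error;
      const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
      await sleep(ceiling / 2 + Math.random() * (ceiling / 2));
    }
  }
}

// Compares the count Facebook displays with the number of IDs actually
// collected; a large shortfall suggests auto-scrolling stopped early.
function looksIncomplete(reportedCount: number, listLength: number, tolerance = 0.05): boolean {
  return listLength < reportedCount * (1 - tolerance);
}
```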

Find most practical / effective way to determine location of Groups

- manually
- Named Entity Recognition (NLP) from Title / Description / Posts
- image metadata (probably scrubbed by facebook?)
- members
- pages
