eben0 / snooshift
JavaScript wrapper library for Pushshift with Snoowrap support.
License: MIT License
Thanks for making a useful wrapper!
I noticed that when I try to get new posts on some subreddits, such as /r/askscience, I get zero posts:
const searchParams = {
  size: 25,
  sort_type: 'new',
  subreddit: 'askscience'
};
const res = await snoo.searchSubmissions(searchParams); // zero results
I have also tried setting sort_type to created_utc.
Any insight on why this would be the case?
Is there a way to get this package from jsDelivr or unpkg, so it can be used with a <script> tag on the page instead of installing it from npm?
The title is pretty straightforward about the issue.
My use case: I am scraping meme subreddits and using upvotes/downvotes to build a way to bet on whether a meme is dank or not (at least according to the subreddit it was posted in). I have a fully functional system for this in Python, and I wanted to move it to TS for many quality-of-life improvements. Sadly, the submissions returned by snooshift have incorrect upvote_ratio and score values: 99% of the time both are 1, which makes the computed downvotes 0. The properties on snooshift's Submission interface are nearly identical to those of the Submission class in the Python praw package, and I use the same property names, upvote_ratio and score.
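To make the arithmetic concrete, here is a minimal sketch of the downvote estimate used in the snippet in this issue. The helper name estimateDownvotes is hypothetical and not part of snooshift, and the zero-ratio guard is an added assumption. It mirrors the snippet's formula, which treats score as the upvote count, and shows why a score and upvote_ratio of 1 always yield 0 downvotes.

```typescript
// Sketch only: `estimateDownvotes` is a hypothetical helper, not a snooshift API.
function estimateDownvotes(score: number, upvoteRatio: number): number {
  // Assumed guard: a ratio of 0 (or NaN) would otherwise divide by zero.
  if (!(upvoteRatio > 0)) return 0;
  // Same formula as the snippet in this issue:
  // total votes ~= score / ratio, downvotes = total - score.
  return Math.round(score / upvoteRatio) - score;
}

console.log(estimateDownvotes(1, 1)); // 0, the value seen for nearly every submission
console.log(estimateDownvotes(900, 0.9)); // 100
```

With both fields stuck at 1, the estimate carries no information, which is why the returned values matter for this use case.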
To reproduce, simply call snooshift.searchSubmissions and inspect upvote_ratio and score across several Submissions.
I attach my code snippet below. I would REALLY LOVE to be able to use snooshift for this instead of Python. It would be much more performant and clean.
@Injectable()
export class WebScraperService {
  private readonly logger = new Logger(WebScraperService.name);
  private readonly subreddits = ["dankmemes", "memes"];
  private readonly exts = [".jpg", ".jpeg", ".png"];
  private readonly snoo = new SnooShift();
  private redditScraperMutex: boolean = false;
  private imgflipScraperMutex: boolean = false;

  constructor(
    private readonly redditMemeService: RedditMemeService,
    private readonly redditorService: RedditorService,
    private readonly imgflipTemplateService: ImgflipTemplateService
  ) {}

  // @Cron(CRON_SCHEDULES[ECronJobRegistry.RedditMemeScrapper], { name: ECronJobRegistry.RedditMemeScrapper })
  async redditorMemeScrapper() {
    if (this.redditScraperMutex) return;
    else this.redditScraperMutex = true;
    for (const subreddit of this.subreddits) {
      this.logger.log(`RUNNING REDDIT MEME SCRAPPER: r/${subreddit}`);
      try {
        await this.scrapeSubReddit({ subreddit });
      } catch (error) {
        this.logger.error(error.message, error.stack);
      }
    }
    this.logger.log("DONE REDDIT MEME SCRAPPER");
    this.redditScraperMutex = false;
  }

  private async scrapeSubReddit({ subreddit, gracePeriod = 7 }: { subreddit: string; gracePeriod?: number }) {
    const endAt = dayjs().startOf("h").subtract(gracePeriod, "d");
    const maxCreatedAt = await this.redditMemeService.repo.max("createdAt");
    let startAt = maxCreatedAt && maxCreatedAt.result ? dayjs(maxCreatedAt.result) : dayjs().startOf("d").subtract(62, "day");
    while (startAt < endAt) {
      this.logger.log(`SCRAPPING STARTING AT ${startAt}`);
      const after = startAt.unix(),
        before = endAt.unix();
      const unfilteredSubmissions = (await this.snoo.searchSubmissions({
        subreddit,
        after,
        before,
        size: 100,
        stickied: false,
      })) as Submission[];
      // console.log("unfilteredSubmissions", unfilteredSubmissions);
      // throw new Error("check");
      const submissions = unfilteredSubmissions.filter(({ url }) => this.exts.some((ext) => url.endsWith(ext)));
      const usernames = submissions.map(({ author_fullname }) => author_fullname);
      const redditors = await this.redditorService.repo.find({ where: { username: In(usernames) } });
      const usernameToOldRedditor = redditors.reduce<Record<string, RedditorEntity>>(
        (prev, redditor) => ({ [redditor.username]: redditor, ...prev }),
        {}
      );
      const urls = submissions.map(({ url }) => url);
      const redditMemes = await this.redditMemeService.repo.find({ select: ["url"], where: { url: In(urls) } });
      const urlSet = new Set(redditMemes.map(({ url }) => url));
      const dedupSubmissions = submissions.filter(({ url }) => url && !urlSet.has(url));
      const usernameToNewRedditor = dedupSubmissions
        .filter(({ author_fullname }) => !usernameToOldRedditor[author_fullname])
        .reduce<Record<string, RedditorEntity>>(
          (prev, { author_fullname }) => ({ [author_fullname]: this.redditorService.repo.create({ username: author_fullname }), ...prev }),
          {}
        );
      await this.redditorService.repo.save(Object.values(usernameToNewRedditor));
      const usernameToRedditor = { ...usernameToOldRedditor, ...usernameToNewRedditor };
      const urlToNewRedditMeme = dedupSubmissions.reduce<Record<string, RedditMemeEntity>>(
        (prev, { id, num_comments, title, score, created_utc, upvote_ratio, url, author_fullname }) => ({
          [url]: this.redditMemeService.repo.create({
            redditId: id,
            numComments: num_comments,
            upvotes: score,
            createdAt: dayjs(created_utc * 1000).toDate(),
            downvotes: Math.round(score / upvote_ratio) - score,
            title,
            url,
            upvoteRatio: upvote_ratio,
            redditorId: usernameToRedditor[author_fullname].id,
            subreddit,
          }),
          ...prev,
        }),
        {}
      );
      await this.redditMemeService.repo.save(Object.values(urlToNewRedditMeme));
      startAt = dayjs(1000 * Math.max(...submissions.map(({ created_utc }) => created_utc)));
      await new Promise((r) => setTimeout(r, 5000));
    }
  }
}
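The loop above pages through a time window by moving the start of the window to the newest created_utc seen so far. Below is a minimal, library-independent sketch of that pattern. The names collectWindow and FetchPage are hypothetical, and the fetcher stands in for a call such as searchSubmissions. It adds two explicit stop conditions as an assumption about how one might harden the loop: an empty page (where Math.max of an empty list is -Infinity) and a cursor that fails to advance.

```typescript
// Hypothetical fetcher type: returns items with after < created_utc < before.
type FetchPage<T> = (after: number, before: number) => Promise<T[]>;

// Sketch only: collects every item in (after, before) by advancing a cursor
// to the newest created_utc of each page.
async function collectWindow<T extends { created_utc: number }>(
  fetchPage: FetchPage<T>,
  after: number,
  before: number
): Promise<T[]> {
  const all: T[] = [];
  let cursor = after;
  while (cursor < before) {
    const page = await fetchPage(cursor, before);
    // Stop on an empty page; Math.max() of no arguments is -Infinity,
    // which would otherwise produce an invalid cursor.
    if (page.length === 0) break;
    all.push(...page);
    const newest = Math.max(...page.map((item) => item.created_utc));
    // Stop if the cursor cannot advance, to avoid requesting the same page forever.
    if (newest <= cursor) break;
    cursor = newest;
  }
  return all;
}
```

A fetcher built on snooshift would wrap the existing searchSubmissions call with the same after and before arguments used in the snippet; the extension filter and database writes would then run on each returned page.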