
snooshift's People

Contributors

eben0, raphael0010, wasserholz


snooshift's Issues

Cannot read sort_type `new` posts for `/r/askscience`

Thanks for making a useful wrapper!

I noticed that when I try to get new posts from some subreddits, including /r/askscience, I get zero posts:

const searchParams = {
	size: 25,
	sort_type: 'new',
	subreddit: 'askscience'
};
const res = await snoo.searchSubmissions(searchParams); // zero results

I have also tried setting sort_type to created_utc.

Any insight on why this would be the case?

script tag usage?

Is there a way to get this package from jsDelivr or unpkg so it can be used with a <script> tag on the page, instead of having to install it from npm?

Incorrect upvote_ratio and score on submissions

The title sums up the issue.

My use case: I am scraping meme subreddits and using upvotes and downvotes to build a way to bet on whether a meme is dank or not (at least according to the subreddit it was posted in). I have a fully functional system for this in Python, and I wanted to move it to TypeScript for the many quality-of-life improvements. Sadly, the submissions returned by snooshift have incorrect upvote_ratio and score values: 99% of the time both are 1, which makes the computed downvotes 0. The properties on snooshift's Submission interface are nearly identical to those of the Submission class in the Python praw package, and I use the same property names, upvote_ratio and score.

To reproduce, call snooshift.searchSubmissions and inspect the upvote_ratio and score across several submissions.

My code snippet is attached below. I would REALLY LOVE to be able to use snooshift for this instead of Python. It would be much more performant and clean.
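One way to quantify the symptom is to count how many returned submissions have both fields equal to 1. The following is a minimal sketch: `VoteFields` and `countDefaultVotes` are hypothetical names introduced here for illustration, and the only assumption about snooshift is that each submission exposes numeric `score` and `upvote_ratio` properties, as described above.

```typescript
// Hypothetical minimal shape: only the two fields discussed in this issue.
interface VoteFields {
  score: number;
  upvote_ratio: number;
}

// Counts submissions whose score and upvote_ratio are both exactly 1,
// which is the pattern reported in this issue.
function countDefaultVotes(submissions: VoteFields[]): { total: number; suspicious: number } {
  const suspicious = submissions.filter(
    ({ score, upvote_ratio }) => score === 1 && upvote_ratio === 1
  ).length;
  return { total: submissions.length, suspicious };
}
```

The array returned by `snoo.searchSubmissions(...)` could be passed to this helper directly. A `suspicious` count close to `total` would match the behavior described here.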

@Injectable()
export class WebScraperService {
  private readonly logger = new Logger(WebScraperService.name);
  private readonly subreddits = ["dankmemes", "memes"];
  private readonly exts = [".jpg", ".jpeg", ".png"];
  private readonly snoo = new SnooShift();
  private redditScraperMutex: boolean = false;
  private imgflipScraperMutex: boolean = false;

  constructor(
    private readonly redditMemeService: RedditMemeService,
    private readonly redditorService: RedditorService,
    private readonly imgflipTemplateService: ImgflipTemplateService
  ) {}

  // @Cron(CRON_SCHEDULES[ECronJobRegistry.RedditMemeScrapper], { name: ECronJobRegistry.RedditMemeScrapper })
  async redditorMemeScrapper() {
    if (this.redditScraperMutex) return;
    else this.redditScraperMutex = true;
    for (const subreddit of this.subreddits) {
      this.logger.log(`RUNNING REDDIT MEME SCRAPPER: r/${subreddit}`);
      try {
        await this.scrapeSubReddit({ subreddit });
      } catch (error) {
        this.logger.error(error.message, error.stack);
      }
    }
    this.logger.log("DONE REDDIT MEME SCRAPPER");
    this.redditScraperMutex = false;
  }

  private async scrapeSubReddit({ subreddit, gracePeriod = 7 }: { subreddit: string; gracePeriod?: number }) {
    const endAt = dayjs().startOf("h").subtract(gracePeriod, "d");
    const maxCreatedAt = await this.redditMemeService.repo.max("createdAt");
    let startAt = maxCreatedAt && maxCreatedAt.result ? dayjs(maxCreatedAt.result) : dayjs().startOf("d").subtract(62, "day");
    while (startAt < endAt) {
      this.logger.log(`SCRAPPING STARTING AT ${startAt}`);
      const after = startAt.unix(),
        before = endAt.unix();
      const unfilteredSubmissions = (await this.snoo.searchSubmissions({
        subreddit,
        after,
        before,
        size: 100,
        stickied: false,
      })) as Submission[];
      // console.log("unfilteredSubmissions", unfilteredSubmissions);
      // throw new Error("check");
      const submissions = unfilteredSubmissions.filter(({ url }) => this.exts.some((ext) => url.endsWith(ext)));
      const usernames = submissions.map(({ author_fullname }) => author_fullname);
      const redditors = await this.redditorService.repo.find({ where: { username: In(usernames) } });
      const usernameToOldRedditor = redditors.reduce<Record<string, RedditorEntity>>(
        (prev, redditor) => ({ [redditor.username]: redditor, ...prev }),
        {}
      );
      const urls = submissions.map(({ url }) => url);
      const redditMemes = await this.redditMemeService.repo.find({ select: ["url"], where: { url: In(urls) } });
      const urlSet = new Set(redditMemes.map(({ url }) => url));
      const dedupSubmissions = submissions.filter(({ url }) => url && !urlSet.has(url));
      const usernameToNewRedditor = dedupSubmissions
        .filter(({ author_fullname }) => !usernameToOldRedditor[author_fullname])
        .reduce<Record<string, RedditorEntity>>(
          (prev, { author_fullname }) => ({ [author_fullname]: this.redditorService.repo.create({ username: author_fullname }), ...prev }),
          {}
        );
      await this.redditorService.repo.save(Object.values(usernameToNewRedditor));
      const usernameToRedditor = { ...usernameToOldRedditor, ...usernameToNewRedditor };

      const urlToNewRedditMeme = dedupSubmissions.reduce<Record<string, RedditMemeEntity>>(
        (prev, { id, num_comments, title, score, created_utc, upvote_ratio, url, author_fullname }) => ({
          [url]: this.redditMemeService.repo.create({
            redditId: id,
            numComments: num_comments,
            upvotes: score,
            createdAt: dayjs(created_utc * 1000).toDate(),
            downvotes: Math.round(score / upvote_ratio) - score,
            title,
            url,
            upvoteRatio: upvote_ratio,
            redditorId: usernameToRedditor[author_fullname].id,
            subreddit,
          }),
          ...prev,
        }),
        {}
      );
      await this.redditMemeService.repo.save(Object.values(urlToNewRedditMeme));
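      // Stop when the window is empty; Math.max() of no arguments is -Infinity, which yields an invalid date.
      if (unfilteredSubmissions.length === 0) break;
      // Advance past everything fetched, including posts filtered out for not being images.
      startAt = dayjs(1000 * Math.max(...unfilteredSubmissions.map(({ created_utc }) => created_utc)));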
      await new Promise((r) => setTimeout(r, 5000));
    }
  }
}
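The downvote estimate in the snippet above comes down to a single line of arithmetic. The sketch below isolates it to show why a `score` of 1 with an `upvote_ratio` of 1 produces zero downvotes. `estimateDownvotes` is a hypothetical helper, not part of snooshift. It keeps the snippet's assumption that `score` is the upvote count, and the range guard is an addition for illustration.

```typescript
// Hypothetical helper mirroring the formula used in the snippet:
// total votes = score / upvote_ratio, downvotes = total votes - score.
function estimateDownvotes(score: number, upvoteRatio: number): number {
  // Guard added for illustration: a ratio outside (0, 1] would divide by zero
  // or give a negative count, so report 0 in that case.
  if (!(upvoteRatio > 0 && upvoteRatio <= 1)) return 0;
  return Math.round(score / upvoteRatio) - score;
}

console.log(estimateDownvotes(1, 1)); // 0, the case reported in this issue
console.log(estimateDownvotes(90, 0.9)); // 10
```

With both fields stuck at 1, every submission is stored with one upvote and zero downvotes, regardless of its real vote counts.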
