
sumo's People

Contributors

xbc5

sumo's Issues

Create a generic feed interface

So that users can provide rules for parsing custom pages

  • light scraper (#6)
  • full headless browser (#7)

One interface should cover both of the above contexts, plus RSS.

Allow users to specify a predefined content type (#8) -- e.g. LiveStream, ShortVOD (aka short), Article, LongVOD. This is for filtering and organisation: for example, a YouTube channel has one URL, but possibly multiple feeds (uploads/live streams/shorts). Tagging these automatically is hard and error-prone, so the type needs to be specified at rule creation time.
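A minimal sketch of what such an interface could look like, in Python. Every name here (`ContentType`, `FeedItem`, `FeedSource`, `StaticSource`, `collect`) is hypothetical and chosen for illustration; the issue does not prescribe a language or a shape. The content type is attached to the source when it is created, mirroring "specified at rule creation time".

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Protocol


class ContentType(Enum):
    # The predefined types named in this issue.
    LIVE_STREAM = "LiveStream"
    SHORT_VOD = "ShortVOD"
    ARTICLE = "Article"
    LONG_VOD = "LongVOD"


@dataclass
class FeedItem:
    title: str
    url: str
    content_type: ContentType
    tags: set[str] = field(default_factory=set)


class FeedSource(Protocol):
    """Assumed contract: any source (RSS, scraper, headless browser)
    exposes its content type and a one-shot fetch."""

    content_type: ContentType

    def fetch(self) -> list[FeedItem]:
        ...


@dataclass
class StaticSource:
    """Stand-in source that returns canned entries instead of doing network I/O."""

    content_type: ContentType
    entries: list[tuple[str, str]]  # (title, url) pairs

    def fetch(self) -> list[FeedItem]:
        return [FeedItem(title, url, self.content_type) for title, url in self.entries]


def collect(sources: list[FeedSource]) -> list[FeedItem]:
    """Fetch every source and merge the results into one list."""
    items: list[FeedItem] = []
    for source in sources:
        items.extend(source.fetch())
    return items
```

With this shape, one YouTube channel could be registered as several sources (uploads, shorts, live streams), each carrying its own `ContentType`, and the client can filter on `item.content_type` without guessing.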

The client can apply ordinary tagging: by source, keywords, and/or type. This allows users to create complex rules to organise their feeds.

Create a scraper

Use selectors to parse a source into a generic interface (#5) -- e.g. like the one used for RSS.
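As a rough illustration of selector-driven parsing, the sketch below uses only the Python standard library. It assumes the page is well-formed XML so that `xml.etree.ElementTree` and its limited XPath support can stand in for a real selector engine; real HTML usually is not well-formed, so an actual scraper would need a proper HTML parser. The function name `scrape`, the rule format, and the `/@attr` suffix convention are all hypothetical.

```python
import xml.etree.ElementTree as ET


def scrape(markup: str, item_path: str, field_paths: dict[str, str]) -> list[dict[str, str]]:
    """Apply user-supplied paths to markup and return one record per item.

    item_path selects the repeating item nodes; field_paths maps an output
    field name to a path relative to each item node.
    """
    root = ET.fromstring(markup)
    records = []
    for node in root.findall(item_path):
        record = {}
        for name, path in field_paths.items():
            if "/@" in path:
                # Hypothetical convention: "a/@href" reads the href attribute of <a>.
                elem_path, attr = path.rsplit("/@", 1)
                target = node.find(elem_path)
                value = target.get(attr) if target is not None else None
            else:
                # Otherwise read the element's text.
                target = node.find(path)
                value = target.text if target is not None else None
            record[name] = (value or "").strip()
        records.append(record)
    return records


# Illustrative input; not taken from any real site.
PAGE = """
<html><body><ul>
  <li class="post"><a href="/one">First post</a></li>
  <li class="post"><a href="/two">Second post</a></li>
  <li class="ad"><a href="/buy">Sponsored</a></li>
</ul></body></html>
""".strip()

RULES = {"title": "a", "url": "a/@href"}
posts = scrape(PAGE, ".//li[@class='post']", RULES)
```

The resulting records are plain dictionaries, which could then be mapped onto whatever generic feed type #5 settles on.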

Consider a YouTube search feed

Not all topics/people have a dedicated or official channel. Create a feature that subscribes to search results.

Some caveats:

  • the results change often
  • duplication of effort (items that were already fetched would be fetched again)

For both caveats we should use a caching mechanism, whether it's the database or the file system. Regardless, this will result in a two-stage process, which may contradict the generic feed interface (#5). The feed interface will expect a one-shot fetch-and-format process, but the two-stage process requires: 1) performing the search; 2) for each result, checking the cache and, on a miss, fetching the item. That fetch is an additional request per item, made using the scraper approach (#6).

In conclusion, #5 will need to abstract the search, the caching mechanism, and the scraping of each item. Only then can we create/emit the finished feed type.

In short: scrape YouTube search for URLs; filter only new items, then scrape each one.
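The two-stage flow might be sketched as follows. This is an assumption-laden illustration, not a design decision: `build_search_feed` is a hypothetical name, the `search` and `fetch_item` callables are injected placeholders for the real search and scraper (#6), and a plain dictionary stands in for the database or file-system cache.

```python
from typing import Callable


def build_search_feed(
    query: str,
    search: Callable[[str], list[str]],
    fetch_item: Callable[[str], dict],
    cache: dict[str, dict],
) -> list[dict]:
    """Return one item per search result, scraping only URLs not yet cached."""
    feed = []
    # Stage 1: perform the search to get candidate URLs.
    for url in search(query):
        # Stage 2: check the cache; only a miss triggers another request.
        if url not in cache:
            cache[url] = fetch_item(url)
        feed.append(cache[url])
    return feed
```

Because the search, cache, and per-item fetch are hidden behind one function, a caller still sees a one-shot call that returns a finished feed, which is roughly the abstraction #5 would need to provide.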

Apply custom tags

Use a pipeline. Fetch the feed somewhere near the start of it and, once it's fetched, apply custom user tags to both the feed and its articles.

How tags are applied will depend on user-set rules:

  • regex filters
  • source
  • more?
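One possible shape for the tagging stage, covering the regex and source rules listed above. `Article`, `TagRule`, and `apply_tags` are hypothetical names; tagging the feed itself is left out to keep the sketch short.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Article:
    title: str
    source: str
    tags: set[str] = field(default_factory=set)


@dataclass
class TagRule:
    tag: str
    title_pattern: Optional[str] = None  # regex filter on the title
    source: Optional[str] = None         # exact match on the source name

    def matches(self, article: Article) -> bool:
        # Every condition that is set must hold; a rule with no conditions matches everything.
        if self.source is not None and article.source != self.source:
            return False
        if self.title_pattern is not None and not re.search(
            self.title_pattern, article.title, re.IGNORECASE
        ):
            return False
        return True


def apply_tags(articles: list[Article], rules: list[TagRule]) -> list[Article]:
    """Pipeline stage: runs after fetching and adds each matching rule's tag in place."""
    for article in articles:
        for rule in rules:
            if rule.matches(article):
                article.tags.add(rule.tag)
    return articles
```

Further rule kinds (the "more?" above) could be added as extra optional fields on the rule without changing the pipeline stage.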

Support YouTube shorts and live streams

  • shorts
  • live streams

Treat these as distinct types, so that the user can tag and filter by shorts or live streams.

Each should have its own associated type (#5) and conform to the generic interface (#5). This allows users to specify their own scraping rules (#6) for custom websites (e.g. Twitch).
