Comments (6)

telotortium commented on August 21, 2024

Thanks for the pointers - I'll add a PR for a new module when I have time.

from hpi.

seanbreckenridge commented on August 21, 2024

That's correct. I have a personal module that uses newsboat, though it only tracks when I added or removed feeds, not the contents of the RSS feeds.

But it sounds like you're talking about using RSS as a source for data, not tracking your feeds -- correct?

If you want to track hackernews and reddit, could you use their respective modules instead?

https://github.com/karlicoss/HPI/blob/master/my/hackernews/dogsheep.py using https://github.com/dogsheep/hacker-news-to-sqlite

https://github.com/karlicoss/HPI/tree/master/my/reddit using https://github.com/karlicoss/rexport, see setup

That'll also likely have more info than an RSS feed

seanbreckenridge commented on August 21, 2024

I'm not saying some other functionality to use RSS as a data source couldn't be added -- but I'd assume it would vary widely depending on the site.

seanbreckenridge commented on August 21, 2024

And just to expand a bit, HPI probably wouldn't connect to a live RSS feed -- that goes against one of its core principles.

Likely, we'd create some basic separate tool/repo, like export_rss_data, which does something like:

In [25]: import requests, xmltodict; data = xmltodict.parse(requests.get("https://xkcd.com/rss.xml").text)

In [26]: data['rss']['channel']['item'][0]
Out[26]: 
OrderedDict([('title', 'Consensus Time'),
             ('link', 'https://xkcd.com/2594/'),
             ('description',
              '<img src="https://imgs.xkcd.com/comics/consensus_time.png" title="Now, you may argue that the varying hour lengths and feedback effects would cause chaos. To which I say, yeah, and I\'m also curious to see how the weekday cycle interacts with it! So, you in?" alt="Now, you may argue that the varying hour lengths and feedback effects would cause chaos. To which I say, yeah, and I\'m also curious to see how the weekday cycle interacts with it! So, you in?" />'),
             ('pubDate', 'Wed, 16 Mar 2022 04:00:00 -0000'),
             ('guid', 'https://xkcd.com/2594/')])

It would save the RSS feeds you give it to files, say in ~/data/rss_data.

You'd run some cli command like export_rss_data url1 url2 url3 --to ~/data/rss_data, which you could put in cron or run manually, to save stuff every couple hours/daily
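A minimal sketch of what the saving step of such a tool might look like. export_rss_data does not exist yet, so the function names, the JSON format, and the timestamped file naming here are all assumptions; it uses the standard library's XML parser rather than xmltodict, and fetching the feed text (e.g. with requests, as in the session above) is left out:

```python
import json
import time
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, List


def parse_items(xml_text: str) -> List[Dict[str, str]]:
    """Return one dict per <item> element in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        # Keep every direct child as tag -> text, e.g. title, link, guid, pubDate.
        items.append({child.tag: (child.text or "") for child in item})
    return items


def save_snapshot(xml_text: str, to: Path, name: str) -> Path:
    """Write the parsed items to a new timestamped JSON file under `to`.

    Hypothetical layout: <to>/<name>-<unix time>.json, one file per run.
    """
    to.mkdir(parents=True, exist_ok=True)
    target = to / f"{name}-{int(time.time())}.json"
    target.write_text(json.dumps(parse_items(xml_text), indent=2))
    return target
```

Each run writes a new file instead of overwriting the previous one, so an item that later drops off the live feed is still kept in an older snapshot.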

Then a file like my.feed.rss_data.py would be created, which can parse those files and remove any duplicates caused by overlapping backups.

If that's something you're interested in, let me know -- otherwise you can just use the hackernews and reddit modules.
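A rough sketch of the deduplication step, assuming each backup is a JSON file holding a list of item dicts with the same keys as the xkcd example above (guid, link, and so on). The function name and the file layout are hypothetical, not an existing HPI module:

```python
import json
from pathlib import Path
from typing import Dict, Iterator, Set


def merged_items(data_dir: Path) -> Iterator[Dict[str, str]]:
    """Yield each feed item once, even if it appears in several backups."""
    seen: Set[str] = set()
    # Sorted so the result is deterministic; with timestamped file names
    # of equal length this is also oldest-first.
    for path in sorted(data_dir.glob("*.json")):
        for item in json.loads(path.read_text()):
            # Prefer guid as the identity, falling back to link.
            key = item.get("guid") or item.get("link")
            if key is None:
                # Nothing to deduplicate on, so keep the item.
                yield item
                continue
            if key in seen:
                continue
            seen.add(key)
            yield item
```

Keying on guid means the first saved copy of an item wins; later edits to the same item in the feed would be ignored under this assumption.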

telotortium commented on August 21, 2024

No, I'd like to use the hackernews and reddit modules. But what do you actually put in your promnesia/config.py file to use the my.hackernews.dogsheep HPI module?

seanbreckenridge commented on August 21, 2024

I think dogsheep is a relatively new module, so it doesn't have a corresponding source in promnesia yet.

Given its structure, it shouldn't be hard to add one -- it can just use the url and permalink attributes on the object.

I think a src/promnesia/sources/dogsheep.py would look very similar to something like this -- where it calls the dogsheep.items function, returning a visit for the permalink, and then using the extract_urls helper to extract any URLs from the text.
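To illustrate the shape of that logic without depending on either library, here is a self-contained sketch. Item and Visit are stand-in dataclasses, not the real HPI or promnesia types, the created and text attribute names are assumptions, and the regex is only a crude substitute for promnesia's extract_urls helper:

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Optional

# Crude stand-in for promnesia's extract_urls helper.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


@dataclass
class Item:
    """Stand-in for an item from my.hackernews.dogsheep (field names assumed)."""
    permalink: str
    created: datetime
    url: Optional[str] = None
    text: Optional[str] = None


@dataclass
class Visit:
    """Stand-in for a promnesia visit: a URL seen at some time."""
    url: str
    dt: datetime
    context: Optional[str] = None


def index(items: Iterable[Item]) -> Iterator[Visit]:
    for item in items:
        # One visit for the discussion page itself.
        yield Visit(url=item.permalink, dt=item.created, context=item.text)
        # One for the submitted link, if the item has one.
        if item.url is not None:
            yield Visit(url=item.url, dt=item.created, context=item.text)
        # And one for every URL mentioned in the text.
        for url in URL_RE.findall(item.text or ""):
            yield Visit(url=url, dt=item.created, context=item.text)
```

A real source would import the visit type and URL helper from promnesia and take its items from the dogsheep module instead of a parameter.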

I don't use hackernews, so it's a bit hard for me to write one myself, but feel free to open an issue in promnesia or create a PR with something basic like this if you feel up to it.
