Comments (6)
Thanks for the pointers - I'll add a PR for a new module when I have time.
from hpi.
That's correct. I have a personal module that uses newsboat, though it only tracks when I added or removed feeds, not the contents of the RSS feeds.
But it sounds like you're talking about using RSS as a source of data, not tracking your feed subscriptions -- correct?
If you want to track hackernews and reddit, could you use their respective modules instead?
https://github.com/karlicoss/HPI/blob/master/my/hackernews/dogsheep.py using https://github.com/dogsheep/hacker-news-to-sqlite
https://github.com/karlicoss/HPI/tree/master/my/reddit using https://github.com/karlicoss/rexport, see setup
Those will also likely have more info than an RSS feed.
from hpi.
Not saying some other functionality to use RSS as a data source couldn't be added -- but I'd assume it would vary widely depending on the site.
from hpi.
And just to expand a bit, HPI probably wouldn't connect to a live RSS feed -- that goes against one of its core principles.
Likely, we'd create some basic separate tool/repo, like export_rss_data, which does something like:
In [25]: import requests, xmltodict; data = xmltodict.parse(requests.get("https://xkcd.com/rss.xml").text)
In [26]: data['rss']['channel']['item'][0]
Out[26]:
OrderedDict([('title', 'Consensus Time'),
('link', 'https://xkcd.com/2594/'),
('description',
'<img src="https://imgs.xkcd.com/comics/consensus_time.png" title="Now, you may argue that the varying hour lengths and feedback effects would cause chaos. To which I say, yeah, and I\'m also curious to see how the weekday cycle interacts with it! So, you in?" alt="Now, you may argue that the varying hour lengths and feedback effects would cause chaos. To which I say, yeah, and I\'m also curious to see how the weekday cycle interacts with it! So, you in?" />'),
('pubDate', 'Wed, 16 Mar 2022 04:00:00 -0000'),
('guid', 'https://xkcd.com/2594/')])
It would save the RSS feeds you give it to files, say in ~/data/rss_data.
You'd run some CLI command like export_rss_data url1 url2 url3 --to ~/data/rss_data, which you could put in cron or run manually, to save stuff every couple of hours or daily.
Then a file would be created, like my.feed.rss_data.py, which can parse those files, removing any duplicates caused by overlapping backups.
If that's something you're interested in, let me know -- otherwise you can just use the hackernews and reddit modules.
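To make the export-then-parse idea a bit more concrete, here is a minimal sketch using only the standard library. Nothing here exists in HPI: the function names save_snapshot and items, the file naming scheme, and the choice to deduplicate on guid (falling back to link) are all assumptions for illustration. The fetch argument is passed in so the saving step can be tried without a network connection.

```python
import hashlib
import json
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Dict, Iterator


def save_snapshot(url: str, to: Path, fetch: Callable[[str], str]) -> Path:
    """Export step (hypothetical): store one raw copy of a feed in a timestamped file."""
    to.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    # A short hash of the URL keeps snapshots of different feeds apart.
    name = hashlib.sha1(url.encode()).hexdigest()[:10]
    path = to / f"{name}_{stamp}.xml"
    path.write_text(fetch(url), encoding="utf-8")
    return path


def items(data_dir: Path) -> Iterator[Dict[str, str]]:
    """Parse step (hypothetical): yield each feed item once across all snapshots."""
    seen = set()
    for path in sorted(data_dir.glob("*.xml")):
        root = ET.parse(str(path)).getroot()
        for item in root.iter("item"):
            fields = {child.tag: (child.text or "") for child in item}
            # Assumption: guid identifies an item, with link as a fallback;
            # if both are missing, the whole item content is used as the key.
            key = (
                fields.get("guid")
                or fields.get("link")
                or json.dumps(fields, sort_keys=True)
            )
            if key in seen:
                continue  # same item already seen in an earlier snapshot
            seen.add(key)
            yield fields
```

A cron job would call save_snapshot for each URL, and the HPI-side module would only ever read the saved files through something like items(Path("~/data/rss_data").expanduser()).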
from hpi.
No, I'd like to use the hackernews and reddit modules. But what do you actually put in your promnesia/config.py file to use the my.hackernews.dogsheep HPI module?
from hpi.
I think dogsheep is a relatively new module, so it doesn't have a corresponding source in promnesia yet.
Given its structure, it shouldn't be hard to add one -- you can just use the url and permalink attributes on the object.
I think a src/promnesia/sources/dogsheep.py would look very similar to something like this -- it calls the dogsheep.items function, returns a visit for the permalink, and then uses the extract_urls helper to extract any URLs from the text.
I don't use hackernews, so it's a bit hard for me to write one myself, but feel free to open an issue in promnesia or create a PR with something basic like this if you feel up to it.
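For anyone picking this up, the shape of such a source can be sketched without promnesia or HPI installed. Everything below is a stand-in so the logic runs on its own: Item mimics what the dogsheep items function might yield (the created and text field names are assumptions; only url and permalink are mentioned above), Visit is a simplified placeholder for promnesia's visit type, and extract_urls is a naive regex substitute for promnesia's helper of the same name. A real source would import those from promnesia and my.hackernews.dogsheep instead.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Optional

URL_RE = re.compile(r"https?://[^\s<>\"']+")


@dataclass
class Item:
    """Hypothetical stand-in for an object yielded by the dogsheep items function."""
    created: datetime            # assumed field name
    permalink: str
    url: Optional[str] = None
    text: Optional[str] = None   # assumed field name


@dataclass
class Visit:
    """Simplified stand-in for a promnesia visit."""
    url: str
    dt: datetime
    context: Optional[str] = None


def extract_urls(text: str) -> Iterator[str]:
    """Naive stand-in for promnesia's extract_urls helper."""
    for match in URL_RE.finditer(text):
        # Drop trailing punctuation that usually belongs to the sentence.
        yield match.group(0).rstrip(".,;:)")


def index(items: Iterable[Item]) -> Iterator[Visit]:
    """Yield a visit for the permalink, the linked url, and any URLs in the text."""
    for item in items:
        yield Visit(url=item.permalink, dt=item.created, context=item.text)
        if item.url is not None:
            yield Visit(url=item.url, dt=item.created, context=item.text)
        for url in extract_urls(item.text or ""):
            yield Visit(url=url, dt=item.created, context=item.text)
```

In a real promnesia source, index would take no arguments and call the HPI module itself; passing the items in here just keeps the sketch self-contained.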
from hpi.