lemon24 / reader
A Python feed reader library.
Home Page: https://reader.readthedocs.io
License: BSD 3-Clause "New" or "Revised" License
74edf72 makes get_entries() too slow for practical use, so I disabled it.
For a DB with 1000 unread entries (1500 total):
get_entries() generates the entries from a database cursor; if I understand this correctly, this means that an unconsumed get_entries() will hold a shared lock until it is garbage collected, preventing any writes to the database.
By default, the reader API should minimize the amount of time the database cannot be written to (possibly with the risk of having some missing/duplicated entries).
This can be fixed with scrolling window queries (i.e. pagination) called behind the scenes by get_entries().
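A minimal sketch of the scrolling window idea, assuming a hypothetical entries table with an integer id column (the real schema and the real get_entries() signature are not shown here). Each chunk is fetched with its own short query and fully consumed with fetchall(), so no cursor stays open between chunks:

```python
import sqlite3


def iter_entries(db, chunk_size=2):
    """Yield (id, title) rows in id order, one short query per chunk.

    The table and column names are hypothetical; only the technique
    (keyset pagination) is the point.
    """
    last_id = None
    while True:
        rows = db.execute(
            """
            SELECT id, title FROM entries
            WHERE :last_id IS NULL OR id > :last_id
            ORDER BY id
            LIMIT :chunk_size
            """,
            {"last_id": last_id, "chunk_size": chunk_size},
        ).fetchall()  # consume the cursor right away
        if not rows:
            return
        yield from rows
        # Remember where this window ended; the next query starts after it.
        last_id = rows[-1][0]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT)")
db.executemany(
    "INSERT INTO entries (id, title) VALUES (?, ?)",
    [(i, f"entry {i}") for i in range(1, 6)],
)
entries = list(iter_entries(db, chunk_size=2))
```

The trade-off is the one mentioned above: rows inserted or deleted between two chunks may be missed or seen out of step, in exchange for never holding the lock across the whole iteration.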
I want to be able to add a feed from the web app instead of the CLI.
When removing a feed, its entries need to be removed explicitly. This wouldn't be needed if foreign keys used ON DELETE CASCADE.
Lines 57 to 70 in 824a2f7
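A small illustration of what ON DELETE CASCADE would do, using a hypothetical two-table schema (not the project's actual one). Note that SQLite only enforces foreign keys when PRAGMA foreign_keys is turned on for the connection:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Foreign key enforcement is off by default in SQLite; enable it per connection.
db.execute("PRAGMA foreign_keys = ON;")
db.executescript(
    """
    -- Hypothetical schema, for illustration only.
    CREATE TABLE feeds (url TEXT PRIMARY KEY);
    CREATE TABLE entries (
        id TEXT NOT NULL,
        feed TEXT NOT NULL REFERENCES feeds(url) ON DELETE CASCADE,
        PRIMARY KEY (id, feed)
    );
    INSERT INTO feeds VALUES ('feed-1');
    INSERT INTO entries VALUES ('entry-1', 'feed-1');
    INSERT INTO entries VALUES ('entry-2', 'feed-1');
    """
)

# Deleting the feed removes its entries too; no explicit DELETE on entries.
with db:
    db.execute("DELETE FROM feeds WHERE url = ?", ("feed-1",))

remaining = db.execute("SELECT count(*) FROM entries").fetchone()[0]
```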
Using MechanicalSoup with a WSGI app:

    import flask
    import requests
    import wsgiadapter
    import mechanicalsoup

    app = flask.Flask(__name__)

    @app.route('/')
    def root():
        return """
            <html>
            <a href='/path'>link</a>
        """

    @app.route('/path')
    def path():
        return """
            <html>
            ok
        """

    # Route requests for http://app/ to the WSGI app instead of the network.
    s = requests.Session()
    s.mount('http://app/', wsgiadapter.WSGIAdapter(app))

    b = mechanicalsoup.StatefulBrowser(s)
    b.open('http://app/')
    b.follow_link(b.links()[0])
    assert b.get_url() == 'http://app/path'
The entry list for a feed with no entries returns a 404 instead of showing the title of the feed, the navigation links and some "no entries for this feed" message.
Related to #4.
The reader API needs to have a delete_feed method.
The reader has functionality for refreshing a whole feed, regardless of caching-related headers or the age of the feed/entries.
Historically this was used only during database migrations, and there is no way to exercise this code from the reader API; it can't be tested without modifying the database directly.
It could simply be removed. On the other hand, it might be needed for future migrations, so it makes sense to expose it internally.
There is no way of getting the entries of a single feed, so stuff like this happens:
Lines 44 to 45 in 3d0adea
'cause there are none.
entry.content and entry.enclosures are lists of dicts, offering no guarantees to the user regarding what keys/values are available.
They should be lists of namedtuples residing in the types module.
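One way this could look, as a sketch: the class and field names below are assumptions modeled on the keys feedparser commonly provides (value, type, language for content; href, type, length for enclosures), not the final types module.

```python
from typing import NamedTuple, Optional


class Content(NamedTuple):
    """A piece of entry content (hypothetical field set)."""
    value: str
    type: Optional[str] = None
    language: Optional[str] = None


class Enclosure(NamedTuple):
    """A file attached to an entry (hypothetical field set)."""
    href: str
    type: Optional[str] = None
    length: Optional[int] = None


def content_from_dict(d):
    # Only 'value' is required; missing optional keys become None.
    return Content(d["value"], d.get("type"), d.get("language"))


def enclosure_from_dict(d):
    # Only 'href' is required; missing optional keys become None.
    return Enclosure(d["href"], d.get("type"), d.get("length"))


content = content_from_dict({"value": "<p>hello</p>", "type": "text/html"})
enclosure = enclosure_from_dict({"href": "http://example.com/audio.mp3"})
```

With this, users can rely on every attribute existing (possibly as None) instead of checking for dict keys.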
There is no way to get a list of feeds.
This is probably because entries aren't using heading tags for titles.
At the moment, there's no way of telling if a feed exists or not; because of this, the web app returns 404s both for nonexistent feeds and for feeds with no entries.
#15 was closed with a commit that allows deleting a feed from the feed list only.
Currently test_storage_errors_locked only tests mark_as_read.
What does typical usage look like?
Currently the reader methods are leaking whatever exceptions the underlying storage raises; they should be raising custom exceptions. Also, some methods should be raising additional exceptions.
- add_feed
- remove_feed
- update_feed: feedparser.parse seems to swallow all exceptions, including network issues; is this desirable?
- update_feeds: tries to go through all the feeds, so it should probably suppress most exceptions update_feed would raise
- mark_as_read and mark_as_unread
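A rough sketch of the exception wrapping described above. The exception names (ReaderError, FeedExistsError, FeedNotFoundError, StorageError), the standalone functions, and the feeds table are all assumptions made for illustration; they stand in for whatever the reader methods end up using.

```python
import sqlite3


class ReaderError(Exception):
    """Base class for all errors raised by the (hypothetical) reader API."""


class FeedExistsError(ReaderError):
    """The feed is already present."""


class FeedNotFoundError(ReaderError):
    """The feed does not exist."""


class StorageError(ReaderError):
    """The underlying storage failed (e.g. the database is locked)."""


def add_feed(db, url):
    try:
        with db:
            db.execute("INSERT INTO feeds (url) VALUES (?)", (url,))
    except sqlite3.IntegrityError as e:
        # Primary key violation: translate it instead of leaking it.
        raise FeedExistsError(url) from e
    except sqlite3.OperationalError as e:
        raise StorageError(str(e)) from e


def remove_feed(db, url):
    try:
        with db:
            cursor = db.execute("DELETE FROM feeds WHERE url = ?", (url,))
    except sqlite3.OperationalError as e:
        raise StorageError(str(e)) from e
    if cursor.rowcount == 0:
        raise FeedNotFoundError(url)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE feeds (url TEXT PRIMARY KEY)")
add_feed(db, "http://example.com/feed.xml")
```

Callers then only need to catch ReaderError subclasses, and the storage backend can change without changing what the API raises.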
Some of the slow tests that wait for an SQL command to time out (namely test_update_blocking and test_storage_errors_locked) can be made faster by lowering the busy_timeout (which defaults to 5 seconds).
E.g. for test_storage_errors_locked, PRAGMA busy_timeout = 0; decreased test time from 5s to less than 0.3s.
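The effect can be reproduced outside the test suite with a sketch like the one below. The helper name and the table are made up for the demonstration; one connection holds an exclusive lock while a second one, with the given busy_timeout, attempts a write:

```python
import os
import sqlite3
import tempfile
import time


def time_blocked_write(busy_timeout_ms):
    """Return (seconds waited, error message) for a write to a locked DB."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.sqlite")
        # isolation_level=None: we manage transactions explicitly.
        holder = sqlite3.connect(path, isolation_level=None)
        blocked = sqlite3.connect(path, isolation_level=None)
        try:
            holder.execute("CREATE TABLE t (x INTEGER)")
            # Overrides the timeout set by sqlite3.connect() (5 seconds by default).
            blocked.execute(
                f"PRAGMA busy_timeout = {int(busy_timeout_ms)}"
            ).fetchall()
            holder.execute("BEGIN EXCLUSIVE")

            start = time.monotonic()
            try:
                blocked.execute("INSERT INTO t VALUES (1)")
                error = None
            except sqlite3.OperationalError as e:
                error = str(e)
            elapsed = time.monotonic() - start

            holder.execute("ROLLBACK")
            return elapsed, error
        finally:
            holder.close()
            blocked.close()


elapsed, error = time_blocked_write(0)
```

With a timeout of 0 the blocked write fails right away instead of waiting out the default timeout, which is where the test speedup comes from.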
There is no way to see a list of all the feeds.
Depends on #19.
New feeds have a link to http://localhost:8080/None before being updated.
Reader is strongly coupled with feedparser.parse().
This makes it hard to mock feed retrieving/parsing; currently tests write a feed to disk, so they're slower than they could be.
Fixing this should also help with #22.
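One possible way to loosen the coupling is to pass the parse function in, so tests can substitute an in-memory fake. This is only a sketch: the constructor argument, the feeds attribute, and the shape of the fake result (a dict with "feed" and "entries" keys, mirroring how feedparser results can be indexed) are assumptions, not the current Reader API.

```python
class Reader:
    """Toy reader that receives its parser instead of importing it."""

    def __init__(self, parse):
        # In production this would be feedparser.parse; in tests, a fake.
        self._parse = parse
        self.feeds = {}

    def update_feed(self, url):
        result = self._parse(url)
        self.feeds[url] = {
            "title": result["feed"].get("title"),
            "entry_ids": [e["id"] for e in result["entries"]],
        }


def fake_parse(url):
    # No disk or network access: the "feed" is built in memory.
    return {
        "feed": {"title": "Fake feed"},
        "entries": [{"id": f"{url}#1"}, {"id": f"{url}#2"}],
    }


reader = Reader(parse=fake_parse)
reader.update_feed("http://example.com/feed.xml")
```

Tests written this way no longer need to write a feed to disk, and the fake can also raise exceptions on demand to exercise error handling.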
Part of #41.
At the moment, an entry is read if it has the read tag; the read/unread status of an entry is a main feature and should be abstracted away from the underlying implementation.
Add a read attribute to Entry objects.
Related to #1.
If there's an error, a message is flashed and the user is redirected to the source page (after #35 is closed, at least).
Sometimes this isn't appropriate, e.g. if marking an entry as read or deleting a feed fails, the message is flashed at the top of the page, but the user is redirected to the same entry (so they might not see it).
"mark all as read" isn't needed on the read entries page.
"mark all as unread" isn't needed on the unread entries page.
Part of #41.
At the moment, an entry is marked as read/unread by adding/removing the read tag; the read/unread status of an entry is a main feature and should be abstracted away from the underlying implementation.
Add methods to mark an entry as read/unread.
The methods should be idempotent (e.g. no exception should be raised when marking as read an already read entry).
Related to #2.
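A sketch of idempotent mark-as-read/unread methods, assuming the status is stored as a plain read column on a hypothetical entries table rather than as a tag. Since each call sets the value outright, repeating it is harmless:

```python
import sqlite3


def _set_read(db, feed_url, entry_id, read):
    with db:
        db.execute(
            "UPDATE entries SET read = ? WHERE feed = ? AND id = ?",
            (int(read), feed_url, entry_id),
        )


def mark_as_read(db, feed_url, entry_id):
    # Setting read = 1 on an already read entry changes nothing and raises nothing.
    _set_read(db, feed_url, entry_id, True)


def mark_as_unread(db, feed_url, entry_id):
    _set_read(db, feed_url, entry_id, False)


def is_read(db, feed_url, entry_id):
    row = db.execute(
        "SELECT read FROM entries WHERE feed = ? AND id = ?",
        (feed_url, entry_id),
    ).fetchone()
    return bool(row[0])


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE entries ("
    "feed TEXT, id TEXT, read INTEGER NOT NULL DEFAULT 0, "
    "PRIMARY KEY (feed, id))"
)
db.execute("INSERT INTO entries (feed, id) VALUES ('feed-1', 'entry-1')")

mark_as_read(db, "feed-1", "entry-1")
mark_as_read(db, "feed-1", "entry-1")  # second call is a no-op
```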
During #21, it was noted that sqlite3 exceptions don't expose the underlying result code.
In principle, one could get them with CFFI; see if it's possible.
(This is overkill of the highest kind, but it will be fun.)
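As a side note to the CFFI idea: on Python 3.11 and later, sqlite3 exceptions carry sqlite_errorcode and sqlite_errorname attributes, so no native-code tricks are needed there. The sketch below reads them defensively with getattr, so on older Python versions it simply yields None for both (the helper name is made up):

```python
import sqlite3


def describe_sqlite_error(exc):
    """Return (result code, result name), or (None, None) if unavailable."""
    code = getattr(exc, "sqlite_errorcode", None)
    name = getattr(exc, "sqlite_errorname", None)
    return code, name


db = sqlite3.connect(":memory:")
try:
    db.execute("SELECT * FROM missing_table")
except sqlite3.OperationalError as e:
    code, name = describe_sqlite_error(e)
```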
More precisely, feeds should be sorted by title.
They should flash a message and redirect to the source page instead.
The DB cannot be written to for the whole time update_feeds runs, probably because the cursor getting the feeds is open, which holds a shared lock.
This time can be too long if external requests made during update_feeds (e.g. feedparser.parse) are slow or time out.
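If the open cursor is indeed the cause, one possible fix is to read the list of feeds into memory first, and only then do the slow work. The sketch below assumes a hypothetical feeds table and a fetch callable standing in for the network request and parsing:

```python
import sqlite3


def update_feeds(db, fetch):
    # fetchall() consumes the cursor up front, so no read statement is
    # left open (and no shared lock held) during the slow fetches below.
    urls = [
        row[0]
        for row in db.execute("SELECT url FROM feeds ORDER BY url").fetchall()
    ]
    for url in urls:
        title = fetch(url)  # potentially slow; the database is not locked here
        # One short write transaction per feed.
        with db:
            db.execute(
                "UPDATE feeds SET title = ? WHERE url = ?", (title, url)
            )


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE feeds (url TEXT PRIMARY KEY, title TEXT)")
db.executemany("INSERT INTO feeds (url) VALUES (?)", [("a",), ("b",)])

update_feeds(db, fetch=lambda url: f"title of {url}")
titles = db.execute("SELECT url, title FROM feeds ORDER BY url").fetchall()
```

The cost is holding all feed URLs in memory at once, which should be acceptable for a list of feeds; for very large tables the same chunking used for entries could be applied.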
I want to be able to delete a feed from the web app.
Entries with no title don't have a clickable link (web app).
Part of #41.
Which is the root cause of #5. Additionally, it makes it hard to style stuff.
This is useful in at least 2 cases:
The elements of a page have accreted wherever it was more convenient to add them.
This is causing the UI to be inconsistent and most probably hard to use for new users. Additionally, it complicates testing since there's no model of how the user interacts with the page or where the buttons are.
Fixing this will go somewhat like this:
For each interaction, there should be two versions: one for plain HTML, and one for HTML+JS. Only the plain HTML pages should be coded for now; the JS part will be done as part of a different milestone.
All related issues will be part of the Redesign #1 milestone.
New feeds are not updated until the update command is run. This is annoying when using the web app.
After a feed is deleted from the feed page, the user is redirected back to the feed (which doesn't exist anymore, so they get a 404). Redirect somewhere else (e.g. the main page).
Reader has no docstrings. Add some.
On form errors, redirect to the source page instead of redirecting to next.
Clicking the "I really want to" checkbox label on the feeds page ticks the first checkbox regardless of the feed.
Part of #41.