The reader from lemon24

get_entries() query chunking too slow

Commit 74edf72 makes get_entries() too slow for practical use, so I disabled query chunking.

For a DB with 1000 unread entries (1500 total):

- with no chunking, the first page loads in 2 seconds, and the process RSS goes from 29M to 33M for a clean start
- with the default chunk size (256), the first page doesn't finish loading (stopped it after 5 minutes), and the process RSS goes from 29M to over 500M

Tests unrelated to feed updating are parametrized by feed type

Tests unrelated to feed updating are parametrized by feed type, which doesn't really test anything. E.g.:

Lines 134 to 135 in 1a56d29

    
    @pytest.mark.parametrize('feed_type', ['rss', 'atom'])
    def test_mark_as_read_unread(reader, feed_type):

reader/tests/test_reader.py

Lines 163 to 164 in 1a56d29

    
    @pytest.mark.parametrize('feed_type', ['rss', 'atom'])
    def test_add_remove_feed(reader, feed_type):

get_entries() can lock the database longer than needed

get_entries() generates the entries from a database cursor; if I understand this correctly, this means that an unconsumed get_entries() will hold a shared lock until it is garbage collected, preventing any writes to the database.

By default, the reader API should minimize the amount of time the database cannot be written to (possibly with the risk of having some missing/duplicated entries).

This can be fixed with scrolling window queries (i.e. pagination) called behind the scenes by get_entries().
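A minimal sketch of what the scrolling window could look like, assuming a simplified entries table keyed by a sortable id (the table layout and the iter_entries name are made up for illustration, they are not the actual reader schema or API). Each chunk is a separate short query that is fully fetched before any row is handed to the caller:

```python
import sqlite3


def iter_entries(db, chunk_size=2):
    """Yield (id, title) rows in id order, one short query per chunk."""
    last_id = None
    while True:
        if last_id is None:
            cursor = db.execute(
                "SELECT id, title FROM entries ORDER BY id LIMIT ?;",
                (chunk_size,),
            )
        else:
            # Resume right after the last row of the previous chunk.
            cursor = db.execute(
                "SELECT id, title FROM entries WHERE id > ? ORDER BY id LIMIT ?;",
                (last_id, chunk_size),
            )
        # fetchall() consumes the whole result, so the cursor is not left
        # half-read while the caller is busy with the rows.
        rows = cursor.fetchall()
        if not rows:
            return
        yield from rows
        last_id = rows[-1][0]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entries (id TEXT PRIMARY KEY, title TEXT);")
db.executemany(
    "INSERT INTO entries VALUES (?, ?);",
    [(f"entry-{i}", f"Title {i}") for i in range(5)],
)
db.commit()

entries = list(iter_entries(db))
```

Entries added or removed between two chunks may be missed or seen out of order, which is the missing/duplicated entries risk mentioned above.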

Cannot add a feed from the web app

I want to be able to add a feed from the web app instead of the CLI.

Entries need to be removed before removing a feed

When removing a feed, its entries need to be removed explicitly. This wouldn't be needed if foreign keys used ON DELETE CASCADE.

reader/reader/reader.py

Lines 57 to 70 in 824a2f7

    
    def remove_feed(self, url):
        with self.db:
            self.db.execute("""
                DELETE FROM entry_tags
                WHERE feed = :url;
            """, locals())
            self.db.execute("""
                DELETE FROM entries
                WHERE feed = :url;
            """, locals())
            self.db.execute("""
                DELETE FROM feeds
                WHERE url = :url;
            """, locals())
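For illustration, a sketch of how ON DELETE CASCADE would behave, using a cut-down schema that is an assumption and not the real one (entry_tags could reference entries the same way). Note that SQLite only enforces foreign keys when PRAGMA foreign_keys is turned on for the connection:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Foreign key enforcement (and therefore cascading) is off by default
# and has to be enabled for each connection.
db.execute("PRAGMA foreign_keys = ON;")
db.executescript("""
    CREATE TABLE feeds (url TEXT PRIMARY KEY);
    CREATE TABLE entries (
        id TEXT NOT NULL,
        feed TEXT NOT NULL REFERENCES feeds(url) ON DELETE CASCADE,
        PRIMARY KEY (id, feed)
    );
    INSERT INTO feeds VALUES ('http://example.com/feed');
    INSERT INTO entries VALUES ('1', 'http://example.com/feed');
    INSERT INTO entries VALUES ('2', 'http://example.com/feed');
""")

# Only the feed is deleted explicitly; its entries go away with it.
with db:
    db.execute("DELETE FROM feeds WHERE url = ?;", ('http://example.com/feed',))

remaining = db.execute("SELECT count(*) FROM entries;").fetchone()[0]
```

With this in place, remove_feed would need only the last of its three DELETE statements.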

Add basic web app tests

Using MechanicalSoup with a WSGI app:

import flask
import requests
import wsgiadapter
import mechanicalsoup

# A tiny two-page app to test against.
app = flask.Flask(__name__)

@app.route('/')
def root():
    return """
        <html>
        <a href='/path'>link</a>
    """

@app.route('/path')
def path():
    return """
        <html>
        ok
    """

# Route requests for http://app/ straight to the WSGI app, without a real server.
s = requests.Session()
s.mount('http://app/', wsgiadapter.WSGIAdapter(app))

# Drive the app through the session like a browser would.
b = mechanicalsoup.StatefulBrowser(s)

b.open('http://app/')
b.follow_link(b.links()[0])

assert b.get_url() == 'http://app/path'

Entry list for a feed with no entries is broken

The entry list for a feed with no entries returns a 404 instead of showing the title of the feed, the navigation links and a "no entries for this feed" message.

Related to #4.

There is no way to delete feeds

The reader API needs to have a delete_feed method.

API for marking a feed as stale

The reader has functionality for refreshing a whole feed, regardless of caching-related headers or the age of the feed/entries.

Historically this was used only during database migrations, and there is no way to exercise this code from the reader API; it can't be tested without modifying the database directly.

It could simply be removed. On the other hand, it might be needed for future migrations, so it makes sense to expose it internally.

Can't get entries for a single feed

There is no way of getting the entries of a single feed, so stuff like this happens:

reader/reader/app.py

Lines 44 to 45 in 3d0adea

    
    if feed_url:
        entries = [(f, e) for f, e in entries if f.url == feed_url]

Add basic CLI tests

'cause there are none.

entry.content and entry.enclosures are lists of dicts

entry.content and entry.enclosures are lists of dicts, offering no guarantees to the user regarding what keys/values are available.

They should be lists of namedtuples residing in the types module.
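A rough sketch of what such types could look like. The field names below are assumptions based on what feedparser usually provides, and the *_from_dict helpers are hypothetical:

```python
from collections import namedtuple

# Hypothetical field sets; the real ones would be decided in the types module.
Content = namedtuple('Content', 'value type language')
Enclosure = namedtuple('Enclosure', 'href type length')


def content_from_dict(d):
    """Build a Content from a parsed dict; missing keys become None."""
    return Content(d.get('value'), d.get('type'), d.get('language'))


def enclosure_from_dict(d):
    """Build an Enclosure from a parsed dict; missing keys become None."""
    return Enclosure(d.get('href'), d.get('type'), d.get('length'))


enclosure = enclosure_from_dict({
    'href': 'http://example.com/episode.mp3',
    'type': 'audio/mpeg',
})
```

This way every attribute always exists, and a missing value is an explicit None rather than a KeyError.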

Can't get a list of feeds

There is no way to get a list of feeds.

Clicking a feed title should lead to its entries

Entries look weird in Lynx

This is probably because entries aren't using heading tags for titles.

Skip slow tests by default

test_mark_as_read_during_get_entries with chunk_size 0 takes 5 seconds on my machine (because it's waiting for the database call to time out).

See this for how to skip it.

Can't tell if a feed exists or not

At the moment, there's no way of telling if a feed exists or not; because of this, the web app returns 404s both for nonexistent feeds and for feeds with no entries.

Cannot delete feed from the feed page

#15 was closed with a commit that allows deleting a feed from the feed list only.

test_storage_errors_locked should test all public methods

Currently test_storage_errors_locked only tests mark_as_read.

It's not clear how to deploy/use the webapp

What does typical usage look like?

Reader methods should raise custom exceptions

Currently the reader methods are leaking whatever exceptions the underlying storage raises; they should be raising custom exceptions. Also, some methods should be raising additional exceptions.

all methods
- failure in talking with the underlying storage (open/read/write)
add_feed
- feed exists already
remove_feed
- feed does not exist
update_feed
- feed does not exist
- issues getting/parsing feed; feedparser.parse seems to swallow all exceptions, including network issues; is this desirable?
update_feeds
- issues getting/parsing feeds; update_feeds tries to go through all the feeds, so it should probably suppress most exceptions update_feed would raise
mark_as_read and mark_as_unread
- feed/entry does not exist
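One possible shape for this, as a sketch only: the exception names and the wrap_storage_errors helper are made up for illustration, and the functions take a bare sqlite3 connection instead of being Reader methods:

```python
import sqlite3
from contextlib import contextmanager


class ReaderError(Exception):
    """Base class for everything the reader API raises."""


class StorageError(ReaderError):
    """Failure in talking with the underlying storage."""


class FeedError(ReaderError):
    def __init__(self, url):
        super().__init__(url)
        self.url = url


class FeedExistsError(FeedError):
    """The feed exists already."""


class FeedNotFoundError(FeedError):
    """The feed does not exist."""


@contextmanager
def wrap_storage_errors():
    # Translate anything sqlite3 raises into the public exception.
    try:
        yield
    except sqlite3.Error as e:
        raise StorageError(str(e)) from e


def add_feed(db, url):
    with wrap_storage_errors():
        try:
            with db:
                db.execute("INSERT INTO feeds (url) VALUES (?);", (url,))
        except sqlite3.IntegrityError as e:
            # The primary key constraint failed, so the feed is already there.
            raise FeedExistsError(url) from e


def remove_feed(db, url):
    with wrap_storage_errors():
        with db:
            cursor = db.execute("DELETE FROM feeds WHERE url = ?;", (url,))
        if cursor.rowcount == 0:
            raise FeedNotFoundError(url)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE feeds (url TEXT PRIMARY KEY);")
add_feed(db, 'http://example.com/feed')
try:
    add_feed(db, 'http://example.com/feed')
except FeedExistsError as e:
    duplicate_url = e.url
```

Callers can then catch ReaderError without knowing which storage is underneath.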

Make slow tests less slow

Some of the slow tests that wait for an SQL command to time out (namely test_update_blocking and test_storage_errors_locked) can be made faster by lowering the busy_timeout (which defaults to 5 seconds).

E.g. for test_storage_errors_locked, setting PRAGMA busy_timeout = 0; decreased the test time from 5s to less than 0.3s.
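A standalone sketch of the effect (not the actual test; the table and file names are invented): one connection holds an exclusive lock, and a second one with busy_timeout set to 0 fails right away instead of waiting out the default timeout.

```python
import os
import sqlite3
import tempfile
import time

# A file-backed database, so that two connections can contend for it.
path = os.path.join(tempfile.mkdtemp(), 'db.sqlite')

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE feeds (url TEXT PRIMARY KEY);")
writer.commit()

# Hold the lock from the first connection.
writer.execute("BEGIN EXCLUSIVE;")

# The second connection gives up immediately instead of retrying.
other = sqlite3.connect(path)
other.execute("PRAGMA busy_timeout = 0;")

start = time.monotonic()
try:
    other.execute("INSERT INTO feeds VALUES ('http://example.com/feed');")
    locked = False
except sqlite3.OperationalError:
    # "database is locked"
    locked = True
elapsed = time.monotonic() - start

writer.rollback()
other.close()
writer.close()
```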

There is no public API for getting only read/unread entries

There is no public API for getting only read/unread entries.

The provisional interface isn't necessarily pretty and has no tests (which caused #16).

reader/reader/reader.py

Line 268 in 1df87b1

def get_entries(self, _unread_only=False, _read_only=False):

Cannot see a list of all the feeds

There is no way to see a list of all the feeds.

Depends on #19.

Feed link broken for new feeds

New feeds have a link to http://localhost:8080/None before being updated.

Reader is strongly coupled with feedparser.parse()

Reader is strongly coupled with feedparser.parse().

This makes it hard to mock feed retrieving/parsing; currently tests write a feed to disk, so they're slower than they could be.

Fixing this should also help with #22.
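One way to loosen the coupling, sketched with a stand-in Reader class (not the real one): the parse function is passed in, so tests can supply a fake that returns the same kind of structure feedparser.parse does, without touching the disk or the network.

```python
class Reader:
    """Stand-in for the real Reader; only the parsing seam is shown."""

    def __init__(self, parse):
        # parse is any callable taking a URL and returning a parsed feed;
        # production code would pass feedparser.parse here.
        self._parse = parse
        self.titles = {}

    def update_feed(self, url):
        result = self._parse(url)
        self.titles[url] = result['feed'].get('title')


def fake_parse(url):
    # Mimics the relevant part of a feedparser result with plain dicts.
    return {'feed': {'title': 'Example feed'}, 'entries': []}


reader = Reader(parse=fake_parse)
reader.update_feed('http://example.com/feed')
```

A fake like this can also raise on purpose, which makes the error paths of update_feed testable.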

Define an exhaustive list of user interactions

Part of #41.

I need to get the tags of an entry to see if it's read

At the moment, an entry is read if it has the read tag; the read/unread status of an entry is a main feature and should be abstracted away from the underlying implementation.

Add a read attribute to Entry objects.

Related to #1.

Errors pertaining to an item shouldn't flash at the top of the page

If there's an error, a message is flashed and the user is redirected to the source page (after #35 is closed, at least).

Sometimes this isn't appropriate, e.g. if marking an entry as read or deleting a feed fails, the message is flashed at the top of the page, but the user is redirected to the same entry (so they might not see it).

Unneeded buttons on read/unread entries page

"mark all as read" isn't needed on the read entries page.

"mark all as unread" isn't needed on the unread entries page.

Map user interaction groups to pages

Part of #41.

I need to add a tag to mark an entry as read

At the moment, an entry is marked as read/unread by adding/removing the read tag; the read/unread status of an entry is a main feature and should be abstracted away from the underlying implementation.

Add methods to mark an entry as read/unread.

The methods should be idempotent (e.g. no exception should be raised when marking as read an already read entry).

Related to #2.
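A sketch of what idempotent methods could look like, assuming the status ends up in a hypothetical read column rather than a tag (the schema here is invented for illustration). An UPDATE that sets the same value again is harmless, so calling the method twice raises nothing:

```python
import sqlite3


def mark_as_read(db, feed_url, entry_id):
    # Setting read = 1 on an already read entry changes nothing and raises nothing.
    with db:
        db.execute(
            "UPDATE entries SET read = 1 WHERE feed = ? AND id = ?;",
            (feed_url, entry_id),
        )


def mark_as_unread(db, feed_url, entry_id):
    with db:
        db.execute(
            "UPDATE entries SET read = 0 WHERE feed = ? AND id = ?;",
            (feed_url, entry_id),
        )


db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE entries (
        feed TEXT NOT NULL,
        id TEXT NOT NULL,
        read INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (feed, id)
    );
""")
db.execute("INSERT INTO entries (feed, id) VALUES ('http://example.com/feed', '1');")
db.commit()

mark_as_read(db, 'http://example.com/feed', '1')
mark_as_read(db, 'http://example.com/feed', '1')  # second call is a no-op
is_read = db.execute("SELECT read FROM entries;").fetchone()[0]
```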

See if the SQLite result code can be obtained from Python

During #21, it was noted that sqlite3 exceptions don't expose the underlying result code.

In principle, one could get them with CFFI; see if it's possible.

(This is overkill of the highest kind, but it will be fun.)
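As an alternative to CFFI: starting with Python 3.11, sqlite3 exceptions carry sqlite_errorcode and sqlite_errorname attributes. A small sketch that uses them when available and falls back to None on older versions (the table is made up for the example):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE feeds (url TEXT PRIMARY KEY);")
db.execute("INSERT INTO feeds VALUES ('http://example.com/feed');")

try:
    # Violates the primary key constraint.
    db.execute("INSERT INTO feeds VALUES ('http://example.com/feed');")
except sqlite3.IntegrityError as e:
    # Both attributes exist only on Python 3.11 and newer.
    code = getattr(e, 'sqlite_errorcode', None)
    name = getattr(e, 'sqlite_errorname', None)
```

On older Python versions the attributes are missing, so the CFFI route would still be the only option there.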

get_feeds() feed order should not be arbitrary

More precisely, feeds should be sorted by title.

mark-as-read/unread lead to an error page on error

They should flash a message and redirect to the source page instead.

Entry tags not needed

Entry tags are currently used to store the read/unread status of an entry. I can't think of any other use case for them, so maybe they're not needed; remove them.

Depends on #1 and #2.

DB cannot be written to for the whole time update_feeds runs

The DB cannot be written to for the whole time update_feeds runs, probably because the cursor getting the feeds is open, which holds a shared lock.

This time can be too long if external requests made during update_feeds (e.g. feedparser.parse) are slow or time out.
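A sketch of one way around this, under the assumption that the open cursor is indeed the cause: read all the feed URLs first, then do the slow parsing with no cursor open, writing each result in its own short transaction. The schema and the parse argument are stand-ins, not the real code:

```python
import sqlite3


def update_feeds(db, parse):
    # fetchall() finishes the SELECT before any slow work starts.
    urls = [row[0] for row in db.execute("SELECT url FROM feeds;").fetchall()]
    for url in urls:
        # Potentially slow network call; no cursor is open while it runs.
        result = parse(url)
        # One short write transaction per feed.
        with db:
            db.execute(
                "UPDATE feeds SET title = ? WHERE url = ?;",
                (result['feed'].get('title'), url),
            )


def fake_parse(url):
    # Stands in for feedparser.parse in this example.
    return {'feed': {'title': 'Title of ' + url}}


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE feeds (url TEXT PRIMARY KEY, title TEXT);")
db.executemany(
    "INSERT INTO feeds (url) VALUES (?);",
    [('http://a.example/',), ('http://b.example/',)],
)
db.commit()

update_feeds(db, fake_parse)
titles = dict(db.execute("SELECT url, title FROM feeds;").fetchall())
```

The trade-off is that feeds added or removed while update_feeds runs are not picked up until the next run.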

Podcast MP3 files (enclosures) have inconsistent tags

Cannot delete a feed from the web app

I want to be able to delete a feed from the web app.

Entries with no title don't have a clickable link

Entries with no title don't have a clickable link (web app).

Define controls for user interaction groups

Part of #41.

Pages aren't using semantic markup

This is the root cause of #5. Additionally, it makes it hard to style stuff.

Cannot have a custom title for a feed

This is useful in at least 2 cases:

- two feeds with the same title
- a feed with a non-descriptive title, like "Writing"

Pages have no design

The elements of a page have accreted wherever it was most convenient to add them.

This is causing the UI to be inconsistent and most probably hard to use for new users. Additionally, it complicates testing since there's no model of how the user interacts with the page or where the buttons are.

Fixing this will go somewhat like this:

- define an exhaustive list of user interactions
- put the interactions into logical groups
- assign each group to a page
- define controls for each group/interaction
- code the pages

For each interaction, there should be two versions: one for plain HTML, and one for HTML+JS. Only the plain HTML pages should be coded for now; the JS part will be done as part of a different milestone.

All related issues will be part of the Redesign #1 milestone.
