
reader's People

Contributors

dependabot-preview[bot], lemon24, mirekdlugosz

reader's Issues

get_entries() query chunking too slow

74edf72 makes get_entries() too slow for practical use, so I disabled it.

For a DB with 1000 unread entries (1500 total):

  • with no chunking, the first page loads in 2 seconds, and the process RSS goes from 29M to 33M (starting from a fresh process)
  • with the default chunk size (256), the first page doesn't finish loading (I stopped it after 5 minutes), and the process RSS goes from 29M to over 500M

get_entries() can lock the database longer than needed

get_entries() generates the entries from a database cursor; if I understand this correctly, this means that an unconsumed get_entries() will hold a shared lock until it is garbage collected, preventing any writes to the database.

By default, the reader API should minimize the amount of time the database cannot be written to (possibly with the risk of having some missing/duplicated entries).

This can be fixed with scrolling window queries (i.e. pagination) called behind the scenes by get_entries().
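A minimal sketch of what such scrolling window queries could look like, using keyset pagination with Python's sqlite3 module. The `entries` schema and the `iter_entries()` name are assumptions for illustration, and ordering by `id` stands in for whatever order get_entries() actually uses. Each chunk is fully fetched with `fetchall()` before anything is yielded, so the statement completes and should not keep a read lock while the caller is still consuming entries.

```python
import sqlite3
from typing import Iterator, Tuple

# Hypothetical schema for this sketch; reader's real entries table differs.
SCHEMA = """
CREATE TABLE entries (
    id INTEGER PRIMARY KEY,
    feed TEXT NOT NULL,
    title TEXT
);
"""

def iter_entries(
    db: sqlite3.Connection, chunk_size: int = 256
) -> Iterator[Tuple[int, str, str]]:
    """Yield (id, feed, title) rows ordered by id, one short query per chunk."""
    last_id = None
    while True:
        if last_id is None:
            rows = db.execute(
                "SELECT id, feed, title FROM entries ORDER BY id LIMIT ?;",
                (chunk_size,),
            ).fetchall()
        else:
            # Resume strictly after the last row of the previous chunk.
            rows = db.execute(
                "SELECT id, feed, title FROM entries "
                "WHERE id > ? ORDER BY id LIMIT ?;",
                (last_id, chunk_size),
            ).fetchall()
        if not rows:
            return
        # The whole chunk is already in memory; no cursor stays open here.
        yield from rows
        last_id = rows[-1][0]
```

Because the key is unique and increasing, a row is never returned twice; writes committed between chunks are visible to the following chunks.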

Entries need to be removed before removing a feed

When removing a feed, its entries (and their tags) need to be removed explicitly. This wouldn't be needed if the foreign keys used ON DELETE CASCADE.

reader/reader/reader.py, lines 57 to 70 in 824a2f7:

def remove_feed(self, url):
    # Remove the feed's entry tags and entries before the feed itself;
    # "with self.db" runs all three DELETEs in one transaction.
    with self.db:
        self.db.execute("""
            DELETE FROM entry_tags
            WHERE feed = :url;
        """, locals())
        self.db.execute("""
            DELETE FROM entries
            WHERE feed = :url;
        """, locals())
        self.db.execute("""
            DELETE FROM feeds
            WHERE url = :url;
        """, locals())
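A sketch of the ON DELETE CASCADE alternative with sqlite3. The column lists are assumptions modeled on the tables remove_feed() touches. Note that SQLite ignores foreign key constraints, including cascades, unless `PRAGMA foreign_keys = ON` is executed on each connection.

```python
import sqlite3

# Hypothetical schema; only the table names come from remove_feed().
SCHEMA = """
CREATE TABLE feeds (
    url TEXT PRIMARY KEY NOT NULL
);
CREATE TABLE entries (
    id TEXT NOT NULL,
    feed TEXT NOT NULL,
    PRIMARY KEY (id, feed),
    FOREIGN KEY (feed) REFERENCES feeds(url) ON DELETE CASCADE
);
CREATE TABLE entry_tags (
    entry TEXT NOT NULL,
    feed TEXT NOT NULL,
    tag TEXT NOT NULL,
    PRIMARY KEY (entry, feed, tag),
    FOREIGN KEY (entry, feed) REFERENCES entries(id, feed) ON DELETE CASCADE
);
"""

def open_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    # Foreign keys are off by default and must be enabled per connection.
    db.execute("PRAGMA foreign_keys = ON;")
    db.executescript(SCHEMA)
    return db

def remove_feed(db: sqlite3.Connection, url: str) -> None:
    # One DELETE; the cascades remove the feed's entries and their tags.
    with db:
        db.execute("DELETE FROM feeds WHERE url = :url;", {"url": url})
```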

Add basic web app tests

Using MechanicalSoup with a WSGI app:

import flask
import requests
import wsgiadapter
import mechanicalsoup

app = flask.Flask(__name__)

@app.route('/')
def root():
    return """
        <html>
        <a href='/path'>link</a>
    """

@app.route('/path')
def path():
    return """
        <html>
        ok
    """

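s = requests.Session()
# Route requests for http://app/ to the Flask app in-process, without a real server.
s.mount('http://app/', wsgiadapter.WSGIAdapter(app))

b = mechanicalsoup.StatefulBrowser(s)

# Open the index page and follow its first (and only) link.
b.open('http://app/')
b.follow_link(b.links()[0])

assert b.get_url() == 'http://app/path'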

API for marking a feed as stale

The reader has functionality for refreshing a whole feed, regardless of caching-related headers or the age of the feed/entries.

Historically this was used only during database migrations, and there is no way to exercise this code from the reader API; it can't be tested without modifying the database directly.

It could simply be removed. On the other hand, it might be needed for future migrations, so it makes sense to expose it internally.
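One way this could be exposed, sketched with sqlite3: a `stale` flag on the feed that the next update checks before sending conditional request headers or comparing entry timestamps, and clears afterwards. The `stale` column, all helper names, and the assumption that `updated` timestamps are ISO 8601 strings (so they compare correctly as strings) are hypothetical.

```python
import sqlite3
from typing import Dict, Optional

# Hypothetical schema with a stale flag; not reader's actual tables.
SCHEMA = """
CREATE TABLE feeds (
    url TEXT PRIMARY KEY NOT NULL,
    etag TEXT,
    last_modified TEXT,
    stale INTEGER NOT NULL DEFAULT 0
);
"""

def mark_as_stale(db: sqlite3.Connection, url: str) -> None:
    """Make the next update ignore caching headers and entry timestamps."""
    with db:
        cursor = db.execute("UPDATE feeds SET stale = 1 WHERE url = ?;", (url,))
    if cursor.rowcount == 0:
        raise LookupError("no such feed: %s" % url)

def conditional_headers(db: sqlite3.Connection, url: str) -> Dict[str, str]:
    """HTTP headers for the next fetch; none at all if the feed is stale."""
    etag, last_modified, stale = db.execute(
        "SELECT etag, last_modified, stale FROM feeds WHERE url = ?;", (url,)
    ).fetchone()
    headers: Dict[str, str] = {}
    if stale:
        return headers
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers

def should_update_entry(
    stale: bool, old_updated: Optional[str], new_updated: Optional[str]
) -> bool:
    """Stale feeds overwrite every entry; otherwise only newer entries win."""
    if stale or old_updated is None:
        return True
    return new_updated is not None and new_updated > old_updated

def finish_update(
    db: sqlite3.Connection, url: str, etag: Optional[str], last_modified: Optional[str]
) -> None:
    """Store the new caching headers and clear the stale flag."""
    with db:
        db.execute(
            "UPDATE feeds SET etag = ?, last_modified = ?, stale = 0 WHERE url = ?;",
            (etag, last_modified, url),
        )
```

With this, a migration would only call `mark_as_stale()` for the affected feeds, and the behavior becomes testable through the API instead of by editing the database.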

Skip slow tests by default

test_mark_as_read_during_get_entries with chunk_size 0 takes 5 seconds on my machine (because it's waiting for the database call to time out).

See this for how to skip it.
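One common way to do this, following the pytest documentation's recipe for controlling skipping through a command-line option: tests marked `slow` are skipped unless `--runslow` is passed. The option and marker names come from that recipe, not from this project.

```python
# conftest.py
import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--runslow", action="store_true", default=False, help="run slow tests"
    )

def pytest_configure(config):
    # Register the marker so `pytest --strict-markers` accepts it.
    config.addinivalue_line("markers", "slow: mark test as slow to run")

def pytest_collection_modifyitems(config, items):
    if config.getoption("--runslow"):
        return  # --runslow given: run everything
    skip_slow = pytest.mark.skip(reason="need --runslow option to run")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```

Assuming test_mark_as_read_during_get_entries is parametrized over `chunk_size`, only the slow case can be marked, e.g. with `pytest.param(0, marks=pytest.mark.slow)`; `pytest --runslow` then runs it.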

Can't tell if a feed exists or not

At the moment, there's no way of telling if a feed exists or not; because of this, the web app returns 404s both for nonexistent feeds and for feeds with no entries.
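A sketch of how an explicit feed lookup would let the web app tell the two cases apart, assuming a `feeds(url, title)` and `entries(id, feed)` schema; `get_feed()`, `Feed`, and `feed_page()` are hypothetical names.

```python
import sqlite3
from typing import List, NamedTuple, Optional, Tuple

class Feed(NamedTuple):
    url: str
    title: Optional[str]

def get_feed(db: sqlite3.Connection, url: str) -> Optional[Feed]:
    """Return the feed with this URL, or None if there is no such feed."""
    row = db.execute("SELECT url, title FROM feeds WHERE url = ?;", (url,)).fetchone()
    return Feed(*row) if row is not None else None

def feed_page(db: sqlite3.Connection, url: str) -> Tuple[int, List[str]]:
    """Return (HTTP status, entry ids) for a feed's page.

    Only a missing feed is a 404; an existing feed with no entries
    is a normal (200) page with an empty list.
    """
    if get_feed(db, url) is None:
        return 404, []
    rows = db.execute("SELECT id FROM entries WHERE feed = ? ORDER BY id;", (url,))
    return 200, [entry_id for (entry_id,) in rows]
```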

Reader methods should raise custom exceptions

Currently, the reader methods leak whatever exceptions the underlying storage raises; they should raise custom exceptions instead. Also, some methods should raise additional exceptions:

  • all methods
    • failure in talking with the underlying storage (open/read/write)
  • add_feed
    • feed exists already
  • remove_feed
    • feed does not exist
  • update_feed
    • feed does not exist
    • issues getting/parsing feed; feedparser.parse seems to swallow all exceptions, including network issues; is this desirable?
  • update_feeds
    • issues getting/parsing feeds; update_feeds tries to go through all the feeds, so it should probably suppress most exceptions update_feed would raise
  • mark_as_read and mark_as_unread
    • feed/entry does not exist
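A sketch of an exception hierarchy covering the cases above, with sqlite3 as the storage. All class and function names are assumptions. Storage errors are wrapped with `raise ... from`, so the original sqlite3 exception stays available as `__cause__`, and update_feeds() collects per-feed errors while still letting StorageError propagate.

```python
import sqlite3
from contextlib import contextmanager

class ReaderError(Exception):
    """Base class for all exceptions raised by the reader API."""

class StorageError(ReaderError):
    """Failure in talking with the underlying storage (open/read/write)."""

class FeedError(ReaderError):
    """Base class for errors about a specific feed."""

    def __init__(self, url):
        super().__init__(url)
        self.url = url

class FeedExistsError(FeedError):
    """add_feed: the feed exists already."""

class FeedNotFoundError(FeedError):
    """remove_feed, update_feed: the feed does not exist."""

class ParseError(FeedError):
    """update_feed: getting or parsing the feed failed."""

class EntryNotFoundError(ReaderError):
    """mark_as_read, mark_as_unread: the feed/entry does not exist."""

@contextmanager
def wrap_storage_errors():
    """Re-raise any sqlite3 error as StorageError."""
    try:
        yield
    except sqlite3.Error as e:
        raise StorageError(str(e)) from e

def add_feed(db, url):
    with wrap_storage_errors():
        try:
            with db:
                db.execute("INSERT INTO feeds (url) VALUES (?);", (url,))
        except sqlite3.IntegrityError:
            # Assumes the primary key on url is the only constraint that can fail.
            raise FeedExistsError(url) from None

def remove_feed(db, url):
    with wrap_storage_errors():
        with db:
            cursor = db.execute("DELETE FROM feeds WHERE url = ?;", (url,))
    if cursor.rowcount == 0:
        raise FeedNotFoundError(url)

def update_feeds(urls, update_feed):
    """Update all feeds; per-feed errors are collected instead of raised.

    StorageError is not a FeedError, so it still propagates.
    """
    errors = {}
    for url in urls:
        try:
            update_feed(url)
        except FeedError as e:
            errors[url] = e
    return errors
```

feedparser.parse() does not raise for malformed feeds or network problems; it sets `bozo` on the result and stores the exception in `bozo_exception`, so a ParseError would have to be raised from those. `bozo` is also set for recoverable problems (e.g. a character encoding override), so treating every bozo result as an error may be too strict.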

Reader is strongly coupled with feedparser.parse()

Reader calls feedparser.parse() directly, which makes it hard to mock feed retrieval and parsing; currently, tests write a feed to disk, so they're slower than they could be.

Fixing this should also help with #22.
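A sketch of decoupling through dependency injection: Reader takes a `parse` callable that defaults to a thin feedparser wrapper, and tests pass a fake one. `ParsedFeed`, `ParsedEntry`, `feedparser_parse()`, and this stripped-down Reader are hypothetical; the wrapper only relies on feedparser's result exposing `feed` and `entries`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class ParsedEntry:
    id: Optional[str]
    title: Optional[str] = None

@dataclass
class ParsedFeed:
    title: Optional[str] = None
    entries: List[ParsedEntry] = field(default_factory=list)

def feedparser_parse(url: str) -> ParsedFeed:
    """Default parser: call feedparser and convert its result to ParsedFeed."""
    import feedparser  # imported lazily, so code using a fake parser doesn't need it

    result = feedparser.parse(url)
    return ParsedFeed(
        title=result.feed.get("title"),
        entries=[
            ParsedEntry(id=e.get("id"), title=e.get("title")) for e in result.entries
        ],
    )

class Reader:
    def __init__(self, parse: Callable[[str], ParsedFeed] = feedparser_parse):
        self._parse = parse
        # In-memory storage stands in for the real database in this sketch.
        self.entries: Dict[str, List[ParsedEntry]] = {}

    def update_feed(self, url: str) -> None:
        parsed = self._parse(url)
        self.entries[url] = parsed.entries
```

A narrow ParsedFeed type also keeps feedparser's result structure out of the rest of the code, so a test only has to build a ParsedFeed, not a fake feedparser result.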

I need to get the tags of an entry to see if it's read

At the moment, an entry is read if it has the read tag; the read/unread status of an entry is a main feature and should be abstracted away from the underlying implementation.

Add a read attribute to Entry objects.

Related to #1.
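A sketch of computing `read` in the query, so callers never see the tag. The storage still uses a 'read' tag here; the `entries` and `entry_tags` column names are assumptions.

```python
import sqlite3
from typing import List, NamedTuple

# Hypothetical schema for this sketch.
SCHEMA = """
CREATE TABLE entries (
    id TEXT NOT NULL, feed TEXT NOT NULL, title TEXT,
    PRIMARY KEY (id, feed)
);
CREATE TABLE entry_tags (entry TEXT NOT NULL, feed TEXT NOT NULL, tag TEXT NOT NULL);
"""

class Entry(NamedTuple):
    id: str
    feed: str
    title: str
    read: bool  # hides how the read status is stored

def get_entries(db: sqlite3.Connection) -> List[Entry]:
    rows = db.execute("""
        SELECT entries.id, entries.feed, entries.title,
            EXISTS (
                SELECT 1 FROM entry_tags
                WHERE entry_tags.entry = entries.id
                    AND entry_tags.feed = entries.feed
                    AND entry_tags.tag = 'read'
            )
        FROM entries
        ORDER BY entries.feed, entries.id;
    """).fetchall()
    # SQLite returns EXISTS as 0/1; expose it as a bool.
    return [
        Entry(entry_id, feed, title, bool(read))
        for entry_id, feed, title, read in rows
    ]
```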

Errors pertaining to an item shouldn't flash at the top of the page

If there's an error, a message is flashed and the user is redirected to the source page (after #35 is closed, at least).

Sometimes this isn't appropriate: for example, if marking an entry as read or deleting a feed fails, the message is flashed at the top of the page, but the user is redirected to the entry itself, so they might not see it.

I need to add a tag to mark an entry as read

At the moment, an entry is marked as read/unread by adding/removing the read tag; the read/unread status of an entry is a main feature and should be abstracted away from the underlying implementation.

Add methods to mark an entry as read/unread.

The methods should be idempotent (e.g. no exception should be raised when marking an already read entry as read).

Related to #2.
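A sketch of idempotent mark_as_read()/mark_as_unread() on top of the current tag storage. `INSERT OR IGNORE` against a primary key makes marking as read a no-op when the tag already exists, and deleting a missing tag simply affects zero rows. The schema and the `EntryNotFoundError` name are assumptions.

```python
import sqlite3

# Hypothetical schema; the primary key on entry_tags is what makes
# INSERT OR IGNORE idempotent.
SCHEMA = """
CREATE TABLE entries (id TEXT NOT NULL, feed TEXT NOT NULL, PRIMARY KEY (id, feed));
CREATE TABLE entry_tags (
    entry TEXT NOT NULL, feed TEXT NOT NULL, tag TEXT NOT NULL,
    PRIMARY KEY (entry, feed, tag)
);
"""

class EntryNotFoundError(Exception):
    pass

def _check_entry(db: sqlite3.Connection, feed: str, entry: str) -> None:
    row = db.execute(
        "SELECT 1 FROM entries WHERE feed = ? AND id = ?;", (feed, entry)
    ).fetchone()
    if row is None:
        raise EntryNotFoundError((feed, entry))

def mark_as_read(db: sqlite3.Connection, feed: str, entry: str) -> None:
    with db:
        _check_entry(db, feed, entry)
        # No-op if the entry already has the 'read' tag.
        db.execute(
            "INSERT OR IGNORE INTO entry_tags (entry, feed, tag) VALUES (?, ?, 'read');",
            (entry, feed),
        )

def mark_as_unread(db: sqlite3.Connection, feed: str, entry: str) -> None:
    with db:
        _check_entry(db, feed, entry)
        # Deleting a tag that isn't there affects zero rows and raises nothing.
        db.execute(
            "DELETE FROM entry_tags WHERE entry = ? AND feed = ? AND tag = 'read';",
            (entry, feed),
        )
```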

Entry tags not needed

Entry tags are currently used to store the read/unread status of an entry. I can't think of any other use case for them, so maybe they're not needed; remove them.

Depends on #1 and #2.
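If the tags go away, existing read status has to be carried over. A sketch of such a migration, moving 'read' tags into a boolean `entries.read` column and dropping `entry_tags`, all in one explicit transaction; the table and column names are assumptions, and it expects no transaction to be open on the connection.

```python
import sqlite3

def migrate_read_tags_to_column(db: sqlite3.Connection) -> None:
    """Replace 'read' entry tags with a boolean entries.read column.

    Hypothetical migration; call it with no transaction open on db.
    """
    db.execute("BEGIN;")  # run all three statements in one transaction
    try:
        db.execute("ALTER TABLE entries ADD COLUMN read INTEGER NOT NULL DEFAULT 0;")
        db.execute("""
            UPDATE entries SET read = 1
            WHERE EXISTS (
                SELECT 1 FROM entry_tags
                WHERE entry_tags.entry = entries.id
                    AND entry_tags.feed = entries.feed
                    AND entry_tags.tag = 'read'
            );
        """)
        db.execute("DROP TABLE entry_tags;")
    except BaseException:
        db.rollback()  # leave the schema untouched on any failure
        raise
    db.commit()
```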

Pages have no design

The elements of a page have accreted wherever it was more convenient to add them.

This makes the UI inconsistent and probably hard to use for new users. Additionally, it complicates testing, since there's no model of how the user interacts with the page or where the buttons are.

Fixing this will go somewhat like this:

  • define an exhaustive list of user interactions
  • put the interactions into logical groups
  • assign each group to a page
  • define controls for each group/interaction
  • code the pages

For each interaction, there should be two versions: one for plain HTML, and one for HTML+JS. Only the plain HTML pages should be coded for now; the JS part will be done as part of a different milestone.

All related issues will be part of the Redesign #1 milestone.

Don't show a 404 after a feed is deleted

After a feed is deleted from the feed page, the user is redirected back to the feed (which doesn't exist anymore, so they get a 404). Redirect somewhere else (e.g. the main page).
