
coldsweat's Introduction

Coldsweat

Coldsweat is a self-hosted Python 3 web RSS aggregator and reader compatible with the Fever API. This means you can connect Coldsweat to a variety of clients, like Reeder for iOS or the ReadKit app for macOS, and use it to keep them in sync.

Screenshot

Features

  • Web interface to read and add feeds
  • Compatible with existing Fever desktop and mobile clients
  • Multi-user support
  • Basic support for grouping of similar items

Installation and quick setup

Let's see how you can take a peek at what Coldsweat offers by running it on your machine.

Note: you can install Coldsweat in the main Python environment of your machine or in a virtual environment, which is the recommended approach, since its dependencies may clash with packages you have already installed. Learn more about virtual environments here.

Install

Coldsweat is a Flask application distributed as a Python wheel, so you can install it from PyPI using the hopefully familiar pip utility:

$ pip install coldsweat

The install procedure will also create a coldsweat command, available in your terminal.

Create a user

Once installed, create a new user specifying an email and password with the setup command:

$ coldsweat setup [email protected] -p somepassword

If you prefer you can enter the password interactively:

$ coldsweat setup [email protected]  
Enter password for user [email protected]: ************
Enter password (again): ************
Setup completed for [email protected]

Email and password will be needed to access the web UI and to sync via the Fever API with your favourite RSS client.

Import your feeds

Like other RSS software, Coldsweat uses the OPML format to import multiple feeds in a single operation:

$ coldsweat import /path/to/subscriptions.opml [email protected] -f

The -f option tells Coldsweat to fetch the feeds right after the import step.

Fetch feeds

To update all the feeds run the fetch command:

$ coldsweat fetch 

You should use cron or similar utilities to schedule feed fetches periodically.
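For instance, a crontab entry could run the fetch every 30 minutes. This is only a sketch: the virtual environment path and log file location are assumptions to adapt to your own install:

```
*/30 * * * * /home/user/.venvs/coldsweat/bin/coldsweat fetch >> /var/log/coldsweat-fetch.log 2>&1
```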

Run the web UI

Then you can run the Flask development web server and access the web UI:

$ coldsweat run 
* Serving Flask app 'coldsweat'
* Debug mode: off
* Running on http://127.0.0.1:5000
...

See Setup and Deploy pages for additional information.

Upgrading from a previous version

Upgrade to the latest Coldsweat version with:

$ pip install -U coldsweat

Note: there's no upgrade path from previous 0.9.x releases. Your best bet is to export your OPML subscriptions and import them into the new 0.10 release.

Contributing

See Contributing page.

0.10 technical underpinnings

  • Runs on Python 3.9 and up
  • Completely rebuilt using Flask web framework
  • Supports SQLite, PostgreSQL, and MySQL databases
  • HTTP-friendly fetcher
  • Tested with latest versions of Chrome, Safari, and Firefox

Motivation

I'm fed up with online services that are here today and gone tomorrow. Years ago, after the Google Reader shutdown, it became clear to me that the less we rely on external services, the better the data we care about is preserved. With this in mind I'm writing Coldsweat. It is my personal take on consuming feeds today.

Coldsweat started in July 2013 as a fork of Bottle Fever by Rui Carmo. After a hiatus of several years, I've resumed development using Python 3 and the latest crop of web technologies.

coldsweat's People

Contributors

dontneedgithubaccount, passiomatic, ssheldon, tewe


coldsweat's Issues

Import groups from OPML files

Currently the OPML import procedure ignores groups and just imports feeds. It should recognize groups, create them, and assign feeds accordingly.
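A group-aware import could start from a sketch like the following, using the stdlib ElementTree. The tuple output and the walk helper are illustrative; a real implementation would map the results onto Coldsweat's actual group and feed models.

```python
# Sketch of group-aware OPML parsing with the stdlib ElementTree.
from xml.etree import ElementTree


def parse_opml(xml_text):
    """Return a list of (group_title, feed_title, feed_url) tuples.

    Feeds outside any group get a None group title."""
    results = []

    def walk(outline, group):
        for child in outline.findall('outline'):
            url = child.get('xmlUrl')
            if url:
                # A leaf with an xmlUrl attribute is a feed
                results.append((group, child.get('title') or child.get('text'), url))
            else:
                # An outline without xmlUrl acts as a group/folder
                walk(child, child.get('title') or child.get('text'))

    root = ElementTree.fromstring(xml_text)
    walk(root.find('body'), None)
    return results
```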

Implement RSS/Atom feed autodiscovery

While adding a feed via the web UI, let the user specify the site homepage and figure out the RSS feed via autodiscovery. This is a usability boost, since it's sometimes difficult to figure out if (and where) a site exposes RSS feeds to syndicate its content.

The autodiscovery UI is not straightforward to implement, since it has to handle several corner cases. The three scenarios are:

A. Web page with one feed link

  1. User pastes a web page address into the location field
  2. Coldsweat issues a GET and sniffs the page contents. The sniff routine determines that it is an actual web page and scans it looking for relevant RSS links. If the content is a feed, go to case C, step 2.
  3. Coldsweat finds one feed link, adds it to the feed collection and fetches it.

B. Web page with more than one feed link

  1. User pastes a web page address into the location field
  2. Coldsweat issues a GET and sniffs the page contents. The sniff routine determines that it is an actual web page and scans it looking for relevant RSS links. If the content is a feed, go to case C, step 2.
  3. Coldsweat finds more than one feed link; it shows the feeds found to the user and allows selecting one link (or more?), then adds the selection to the feed collection and fetches it.

C. Valid feed link (current implementation)

  1. User pastes a feed address into the location field
  2. Coldsweat adds the link to the feed collection and fetches it.

In all three scenarios above, Coldsweat needs to halt the procedure if a broken (or gone) resource is encountered.
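The sniffing step of scenarios A and B can be sketched with the stdlib HTML parser: scan the page for link elements that advertise an RSS or Atom feed. Relative href values would still need to be resolved against the page URL, and the class and function names here are made up:

```python
# Sketch: collect <link rel="alternate"> feed links from an HTML page.
from html.parser import HTMLParser

FEED_TYPES = ('application/rss+xml', 'application/atom+xml')


class FeedLinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.feed_links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'link':
            return
        attrs = dict(attrs)
        if attrs.get('rel') == 'alternate' and attrs.get('type') in FEED_TYPES:
            self.feed_links.append(attrs.get('href'))


def find_feed_links(html):
    """Return the href values of all advertised feeds, in page order."""
    parser = FeedLinkParser()
    parser.feed(html)
    return parser.feed_links
```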

References

OperationalError: database is locked

Unable to authenticate while the fetch script is updating the database.

Traceback (most recent call last):
  File "/home/passiomatic/lab.passiomatic.com/coldsweat/coldsweat/app.py", line 219, in __call__
    appiter = self.app(environ, start_response)

  File "/home/passiomatic/lab.passiomatic.com/coldsweat/coldsweat/app.py", line 106, in dispatch_request
    status, headers, body = handler(request, filler, *args)

  File "/home/passiomatic/lab.passiomatic.com/coldsweat/coldsweat/fever.py", line 179, in endpoint
    user = User.get((User.api_key == api_key) & (User.is_enabled == True))

  File "/home/passiomatic/local/lib/python2.6/site-packages/peewee.py", line 2299, in get
    return sq.get()

  File "/home/passiomatic/local/lib/python2.6/site-packages/peewee.py", line 1602, in get
    return clone.execute().next()

  File "/home/passiomatic/local/lib/python2.6/site-packages/peewee.py", line 1637, in execute
    self._qr = ResultWrapper(self.model_class, self._execute(), query_meta)

  File "/home/passiomatic/local/lib/python2.6/site-packages/peewee.py", line 1395, in _execute
    return self.database.execute_sql(sql, params, self.require_commit)

  File "/home/passiomatic/local/lib/python2.6/site-packages/peewee.py", line 1805, in execute_sql
    res = cursor.execute(sql, params or ())

OperationalError: database is locked

Update feed URL after a "301 Moved permanently"

Coldsweat currently tries to handle 301 status codes but fails, given that the Requests package follows redirects automatically and the response's status_code does not reflect the original status returned.

To fix that, Coldsweat should check the response's history attribute:
http://docs.python-requests.org/en/latest/user/quickstart/#redirection-and-history

Case in point: http://feeds.feedburner.com/maxvoltar is now http://maxvoltar.com/feed:

HTTP/1.1 301 Moved Permanently
Location: http://maxvoltar.com/feed/
Content-Type: text/html; charset=UTF-8
Date: Sun, 28 Jul 2013 09:57:33 GMT
Expires: Sun, 28 Jul 2013 09:57:33 GMT
Cache-Control: private, max-age=0
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Server: GSE
Transfer-Encoding: chunked

<HTML>
<HEAD>
<TITLE>Moved Permanently</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<H1>Moved Permanently</H1>
The document has moved <A HREF="http://maxvoltar.com/feed/">here</A>.
</BODY>
</HTML>

Note: Perhaps feed.alternate_link should also be updated after a 301 message.
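The history check can be sketched as follows; response.history and response.url are documented Requests attributes, while the feed update in the trailing comment is hypothetical:

```python
# Sketch: detect a permanent redirect after Requests has already
# followed it. Works on anything with .history/.status_code/.url.
def permanent_redirect_target(response):
    """Return the final URL if the redirect chain started with a 301,
    None otherwise."""
    if response.history and response.history[0].status_code == 301:
        return response.url
    return None

# In the fetcher, something like (hypothetical Feed fields):
#   new_url = permanent_redirect_target(response)
#   if new_url:
#       feed.self_link = new_url
#       feed.save()
```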

Add dialog to show feed information/unsubscribe

Add a (modal) dialog to show feed settings and actions. The dialog should show:

  • √ Feed URL
  • √ Feed alternate URL
  • √ Timestamp of the last check
  • √ HTTP status code on the last check
  • √ Status: disabled or enabled and error count
  • √ A list of groups where the feed appears

Buttons

  • √ A button to unsubscribe the feed for the current user
  • √ A button to enable the feed again and reset its error count

'ImportError: cannot import name run' on install

Apologies if this is a user issue as I'm not very familiar with Python.

On running the Coldsweat setup with:

python sweat.py setup

Traceback (most recent call last):
File "sweat.py", line 6, in
from coldsweat.commands import run
ImportError: cannot import name run

ReadKit crashes when used with coldsweat

This seems to be the only app for OS X supporting Fever at the moment. It crashes after you add coldsweat as a Fever source, though.
I also tested my coldsweat installation with Reeder on iOS. It works.

I know that this is most likely related to ReadKit, and I suspect the devs might not support third-party Fever API implementations. I'll submit a bug report to them anyway.

Avoid multiple database Integrity errors while marking a feed/group/all as read

IntegrityError exceptions show up in the log file while marking a feed/group/all as read, due to a dumb query:

q = Entry.select().join(Feed).join(Subscription).where(
  (Subscription.user == user) &
  (Entry.last_updated_on < before)
).naive()

Coldsweat should filter the result set to exclude entries already marked as read.
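At the SQL level the fix amounts to a set difference, sketched here with the stdlib sqlite3 module; the table and column names are illustrative, not Coldsweat's actual schema:

```python
# Sketch: select only the entries the user has not already marked read.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE read (user_id INTEGER, entry_id INTEGER,
                       UNIQUE (user_id, entry_id));
    INSERT INTO entry VALUES (1, 'a'), (2, 'b');
    INSERT INTO read VALUES (42, 1);      -- entry 1 already read
''')


def unread_entries(conn, user_id):
    """Entry ids with no matching row in the read table for this user."""
    return [row[0] for row in conn.execute('''
        SELECT e.id FROM entry e
        WHERE NOT EXISTS (SELECT 1 FROM read r
                          WHERE r.user_id = ? AND r.entry_id = e.id)''',
        (user_id,))]
```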

Capture Requests' exceptions.DecodeError

The exception below should be caught and feed.error_count increased:

requests.packages.urllib3.exceptions.DecodeError: ('Received response with content-encoding: gzip, 
but failed to decode it.', error('Error -3 while decompressing: incorrect header check',))

For an example see the techdirt RSS feed: http://www.techdirt.com/techdirt_rss.xml (Appears to be fixed now).

Publish a feed or custom entries for Coldsweat fetcher status

Every Coldsweat installation could generate a feed for its own status information. Such a feed would include:

  • Feeds disabled by the system due to too many errors
  • Fetch results

This is not as straightforward as one may think. It implies one of these approaches:

  • Keep a history of events, a sort of log, in a separate database table. The table is queried every time the status feed is requested, exposing, say, the last 10 events or the last week.
  • Save an Atom feed on disk after each status update. The feed needs to be read from disk, parsed, probably trimmed, and updated on request.

Implement full-text search

Sometimes there's the need to quickly find a feed or entries by looking up their title or content.

Possible implementation

Peewee has an extension to activate full-text search with SQLite but AFAIK none for MySQL. This poses the problem of implementing the search mechanism across at least SQLite and MySQL. Using Whoosh could be a viable solution.

Each time an entry is added to the database, the Whoosh index is updated too. An existing Coldsweat installation could be indexed as a one-off operation via a sweat subcommand.
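For the SQLite side, this is roughly what full-text search looks like at the SQL level, sketched with the stdlib sqlite3 module and SQLite's FTS5 extension; the schema is illustrative:

```python
# Sketch: a full-text index over entry titles and content using FTS5.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE VIRTUAL TABLE entry_index USING fts5(title, content)')
conn.execute("INSERT INTO entry_index VALUES ('Hello feeds', 'All about RSS')")
conn.execute("INSERT INTO entry_index VALUES ('Unrelated', 'Nothing here')")


def search(conn, query):
    """Titles of entries matching the query, best match first."""
    return [row[0] for row in conn.execute(
        'SELECT title FROM entry_index WHERE entry_index MATCH ? '
        'ORDER BY rank', (query,))]
```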

Previous work

Better multiprocessing and MySQL integration to avoid "Commands out of sync" errors

While fetching feeds using MySQL with multiprocessing, the database sometimes returns errors like this:

Exception _mysql_exceptions.ProgrammingError: (2014, "Commands out of sync; you can't run this command now") in <bound method Cursor.__del__ of <MySQLdb.cursors.Cursor object at 0x101826410>> ignored

This seems to be caused by fetcher processes sharing the same connection and thus its open cursors. The typical solution is to let each spawned process open and close its own Peewee database connection. Incidentally, that's what Bottle Fever does.

It seems to me that it's the closing phase that matters: after all, the connection is already opened by the parent process just before spawning the subprocesses. Explicitly closing the connection in each subprocess helps to close open cursors.

This Peewee issue has more details: coleifer/peewee#67
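The per-process connection discipline can be sketched like this with stdlib sqlite3; with Peewee the shape is the same, calling database.connect() on entry and database.close() in the finally block. The feed table is illustrative:

```python
# Sketch: each worker opens and closes its own database connection.
import sqlite3


def fetch_worker(args):
    db_path, feed_id = args
    conn = sqlite3.connect(db_path)   # one private connection per worker
    try:
        conn.execute('UPDATE feed SET checked = 1 WHERE id = ?', (feed_id,))
        conn.commit()
    finally:
        conn.close()                  # closing also disposes of open cursors

# In the parent, after closing its own connection:
#   with multiprocessing.Pool(4) as pool:
#       pool.map(fetch_worker, [(db_path, i) for i in feed_ids])
```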

Add PostgreSQL support

Peewee has excellent support for PostgreSQL so it makes sense to add it to Coldsweat too.

Periodically refresh a feed favicon

Figure out a way to refresh a feed favicon.

It probably makes sense to add a new column last_updated_on to keep track of the freshness of the icon. This way Coldsweat can periodically ask the Google favicon service to fetch it again.

Moreover, at this point it makes sense to get rid of the Icon model altogether, since every feed has an icon (a 1:1 relation). Two new fields, icon (Base64) and icon_last_updated_on (datetime), would be added to the Feed model.

Then the Fever icons command would cycle through the feeds and return the Feed.id and the Feed.icon data.
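The freshness check for the proposed icon_last_updated_on field could look like this; the field name comes from the proposal above, and the 30-day refresh interval is an arbitrary assumption:

```python
# Sketch: decide whether a feed favicon should be fetched again.
from datetime import datetime, timedelta

ICON_MAX_AGE = timedelta(days=30)   # assumed refresh interval


def icon_is_stale(icon_last_updated_on, now=None):
    """True if the favicon was never fetched or is older than ICON_MAX_AGE."""
    now = now or datetime.utcnow()
    return (icon_last_updated_on is None
            or now - icon_last_updated_on > ICON_MAX_AGE)
```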

sqlite issue

Syncing does not work automatically. Updating via refresh_feeds.py does work tho.

Have not tried mysql yet...

localhost - - [18/Jul/2013:11:29:16 +0000] 16543 ERROR Traceback (most recent call last):
  File "/var/coldsweat/coldsweat/app.py", line 189, in __call__
    app_iter = self.app(environ, start_response)

  File "/var/coldsweat/coldsweat/app.py", line 249, in __call__
    return self._initial(environ, start_response)

  File "/var/coldsweat/coldsweat/app.py", line 240, in _initial
    return self.app(environ, session_response)

  File "/var/coldsweat/coldsweat/app.py", line 71, in __call__
    r = view(ctx, *args)

  File "/var/coldsweat/coldsweat/fever.py", line 250, in endpoint
    handler(ctx.request, user, result)

  File "/var/coldsweat/coldsweat/fever.py", line 31, in feeds_command
    result.feeds = get_feeds_for_user(user)

  File "/var/coldsweat/coldsweat/fever.py", line 304, in get_feeds_for_user
    'favicon_id'          : feed.icon.id,

  File "/usr/local/lib/python2.7/dist-packages/peewee.py", line 610, in __get__
    return self.get_object_or_id(instance)

  File "/usr/local/lib/python2.7/dist-packages/peewee.py", line 601, in get_object_or_id
    obj = self.rel_model.get(self.rel_model._meta.primary_key==rel_id)

  File "/usr/local/lib/python2.7/dist-packages/peewee.py", line 2295, in get
    return sq.get()

  File "/usr/local/lib/python2.7/dist-packages/peewee.py", line 1609, in get
    return clone.execute().next()

  File "/usr/local/lib/python2.7/dist-packages/peewee.py", line 1131, in next
    obj = self.iterate()

  File "/usr/local/lib/python2.7/dist-packages/peewee.py", line 1115, in iterate
    row = self.cursor.fetchone()

ProgrammingError: Cannot operate on a closed database.

Use new peewee.IntegrityError

Peewee 2.1.7 added a common set of exceptions to wrap DB-API 2 driver-specific exception classes, e.g. peewee.IntegrityError.

Aggressively process entry content for Readability-like formatting

Currently Coldsweat does very little to format feed entries. It optionally parses entries looking for images and links with blacklisted domains and removes them, and nothing more.

This yields entries which are mostly rendered as-is, with generic CSS styles applied, which mostly works. However, some entries are written like this:

Aenean lacinia bibendum nulla sed consectetur. Sed posuere consectetur est at lobortis. Duis mollis, est non commodo luctus, nisi erat porttitor ligula, eget lacinia odio sem nec elit. Cras justo odio, dapibus ac facilisis in, egestas eget quam.
<br>
<br>
Etiam porta sem malesuada magna mollis euismod. Integer posuere erat a ante venenatis dapibus posuere velit aliquet. Cras mattis consectetur purus sit amet fermentum. Vestibulum id ligula porta felis euismod semper. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.

Or worse:

Etiam porta sem malesuada magna mollis euismod. Integer posuere erat a ante venenatis dapibus posuere velit aliquet. Cras mattis consectetur purus sit amet fermentum. Vestibulum id ligula porta felis euismod semper. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.
<br>
<br>
<br>
<br>
<br>
<br>
[eof]

This causes huge padding at the end of the entry, which makes me think that there's room for improvement.

One idea is to pass each entry's content through a processor which strips most of the HTML tags, keeping only the necessary formatting hints. Think of something like HTML -> Markdown and then Markdown -> HTML.

Empty elements

Empty elements like <p></p> or <td></td> will be stripped. Multiple consecutive occurrences of <br> will be removed too.
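A regex-based sketch of these two rules follows; a real implementation would likely use an HTML parser, and whether runs of <br> are collapsed to a single break or dropped entirely is a policy choice:

```python
# Sketch: drop empty elements and collapse runs of <br>.
import re

EMPTY_ELEMENT = re.compile(r'<(p|td|li|div|span)>\s*</\1>', re.I)
BR_RUN = re.compile(r'(?:\s*<br\s*/?>\s*){2,}', re.I)


def strip_empty_markup(html):
    html = EMPTY_ELEMENT.sub('', html)
    html = BR_RUN.sub('<br>', html)   # keep a single break per run
    return html
```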

Allowed tags

Non-empty tags left as-is while parsing will be: p, table and all its child elements, ul, ol, dl, li, dt, dd, b, blockquote, strong, i, em, code, var, kbd, img, figure and figcaption, etc.

Script blocks

Script blocks are already removed by Feedparser.

Allowed attributes

Most formatting attributes like "style", "align", etc. will be stripped. This will help us to reformat content, especially replaced-inline elements like embedded images.

References

Do not fetch a feed if no one is subscribed

A future version of Coldsweat could allow a user to unsubscribe from a feed (which is different from disabling a feed), see issue #43. Hence a query should be performed to check whether at least one user is subscribed to a given feed before fetching it.
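The subscriber check could be a simple EXISTS query, sketched here with stdlib sqlite3 over an illustrative schema:

```python
# Sketch: skip fetching a feed that no user is subscribed to.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE subscription (user_id INTEGER, feed_id INTEGER);
    INSERT INTO subscription VALUES (1, 10);
''')


def has_subscribers(conn, feed_id):
    row = conn.execute(
        'SELECT EXISTS (SELECT 1 FROM subscription WHERE feed_id = ?)',
        (feed_id,)).fetchone()
    return bool(row[0])
```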

Fix behavior on "mark all as read" command

Reeder issues a mark=group with id=0 to mark everything as read. Currently Coldsweat logs an INFO message:

 localhost - - [30/Jun/2013:15:01:38 +0000] 5904 INFO could not find requested group 0, ignored

See also issue #3.

Fetch and show entry comments

It would be nice to track comments to a given entry and display those in the detail view. This feature should be off by default, since it creates a significant overhead to the fetch process.

The CommentAPI and Slash namespaces

The CommentAPI namespace (WFW) allows specifying a related comments feed. Recent versions of WordPress support this feature, so potentially every WP installation has this information already available:

<rss version="2.0"
    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
    xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
    >
       ...

So for a given feed entry we have the corresponding comments feed URL:

<wfw:commentRss>http://redpunk.com/articoli/su-vice-com-un-commento-che-merita/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>

Also note that slash:comments specifies the number of comments. This information could be used to avoid repeatedly fetching empty comments feeds.

Feedparser recognizes both the WFW and Slash namespaces, so extracting the values is trivial.
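For illustration, here is how the two values can be pulled out of a raw RSS item with the stdlib ElementTree (feedparser exposes the same data as entry fields):

```python
# Sketch: extract CommentAPI/Slash values from an RSS <item>.
from xml.etree import ElementTree

NS = {
    'wfw': 'http://wellformedweb.org/CommentAPI/',
    'slash': 'http://purl.org/rss/1.0/modules/slash/',
}


def comment_info(item_xml):
    """Return (comments_feed_url, comment_count) for an RSS item."""
    item = ElementTree.fromstring(item_xml)
    url = item.findtext('wfw:commentRss', default=None, namespaces=NS)
    count = item.findtext('slash:comments', default='0', namespaces=NS)
    return url, int(count)
```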

Where to show the comments

Comments should not mess up the unread, saved, etc. lists in the aggregator app or in the web reader. An idea is to append the comments as an HTML list after the entry content, before serving the entry via Fever or the web reader.

Changing the database schema

Should comments be stored in a dedicated "comments" table, with a reference to the parent entry along with author and timestamp information? Aren't comments entries too?

What to do with the comment feeds?

Should Coldsweat add the comment feeds to its database? This creates a number of feeds which soon become inactive, since people are unlikely to comment on older entries. Discussion on an entry naturally dies after a few days.

More References

Add configuration for nginx with FastCGI

To make coldsweat work with nginx and FastCGI it needs to bind to a socket, so I changed dispatch.fcgi accordingly.

#!/usr/bin/env python
"""
Bootstrap file for FastCGI environments
"""

try:
    from flup.server.fcgi_fork import WSGIServer
except ImportError:
    print('Error: unable to import Flup package.\nColdsweat needs Flup to run as a FastCGI process.\nDownload it from PyPI: http://pypi.python.org/pypi/flup')
    raise

from coldsweat.app import setup_app

if __name__ == '__main__':
    WSGIServer(setup_app(), bindAddress='/tmp/coldsweat-fcgi.sock').run()

A nginx site config that at least makes the API work looks like this:

server {
    server_name coldsweat.example.com;
    location / {
        include fastcgi_params;
        fastcgi_param PATH_INFO $fastcgi_script_name;
        fastcgi_param SCRIPT_NAME "";
        fastcgi_pass unix:/tmp/coldsweat-fcgi.sock;
    }
}

I still get the error that the web interface can't find the static folder for CSS. Setting the document root also didn't help. Maybe somebody else can help me with that.

Implement entry content "scrubbing"

Sometimes there's some garbage in a feed entry: link ads, "share this" junk, 1x1 img elements added for tracking purposes.

It would be cool if, right after feed fetching, there was a "scrubber" which removes all unwanted markup, filtering out links and images using a blacklist of hostnames, e.g. feeds.feedburner.com, feedsportal.com, etc.

Such a hostname blacklist should be configurable, possibly using an external "scrubfile" or the regular etc/conf file.
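A hostname-based scrubber could be sketched like this; the regex approach is only illustrative (a real scrubber would use an HTML parser) and the blacklist entries come from the examples above:

```python
# Sketch: drop img/a tags whose target hostname is blacklisted.
import re
from urllib.parse import urlparse

BLACKLIST = {'feeds.feedburner.com', 'feedsportal.com'}

TAG = re.compile(r'<(img|a)\b[^>]*\b(?:src|href)="([^"]+)"[^>]*>(?:</\1>)?',
                 re.I)


def scrub(html, blacklist=BLACKLIST):
    def replace(match):
        host = urlparse(match.group(2)).hostname or ''
        return '' if host in blacklist else match.group(0)
    return TAG.sub(replace, html)
```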

Faster feed refresh?

I have 60 feeds in my reading list (I know, my life is cluttered...), and the refresh takes quite a bit of time on a Raspberry Pi (edit: was RasPi).

real 0m4.135s
user 0m3.720s
sys 0m0.260s

I'm wondering if the code could be made somewhat faster?

Optimize database queries

Several queries spawn quite a lot of subqueries due to the way they are constructed. The obvious ones in fever.py are:

  • get_feeds, see feed.icon.id
  • get_feed_groups, see s.feed.id

The underlying problem is that xxx.icon.id and xxx.feed.id fields are not formally retrieved by Peewee, even if they are there in the record, so an extra query per row is performed.

The Peewee .switch(model) method could do the trick: http://peewee.readthedocs.org/en/latest/peewee/api.html#Query.switch

Dashboard question

Hello,

my dashboard looks completely different and I cannot add a user. I only see the top right box called "Configure Your Feed Reader".

I attached a screenshot to make the problem more clear:
coldsweat

Caught exceptions are still logged

A client may mark items read multiple times due to bad connectivity.

fever.py line 164 deals with that, but peewee still generates a scary log entry.

I noticed that the code uses exceptions-as-control-flow all over the place. Is this a good idea?
