
coldsweat's Introduction

Coldsweat

Coldsweat is a self-hosted Python 3 web RSS aggregator and reader compatible with the Fever API. This means you can connect Coldsweat to a variety of clients, like Reeder for iOS or the ReadKit app for macOS, and use it to keep them in sync.

Screenshot

Features

  • Web interface to read and add feeds
  • Compatible with existing Fever desktop and mobile clients
  • Multi-user support
  • Basic support for grouping of similar items

Installation and quick setup

Let's see how you can take a peek at what Coldsweat offers by running it on your machine.

Note: you can install Coldsweat in the main Python environment of your machine or in a virtual environment, which is the recommended approach, since its dependencies may clash with packages you have already installed. Learn more about virtual environments here.

Install

Coldsweat is a Flask application distributed as a Python wheel, so you can install it from PyPI using the hopefully familiar pip utility:

$ pip install coldsweat

The install procedure will also create a coldsweat command, available in your terminal.

Create a user

Once installed, create a new user specifying an email and password with the setup command:

$ coldsweat setup [email protected] -p somepassword

If you prefer you can enter the password interactively:

$ coldsweat setup [email protected]  
Enter password for user [email protected]: ************
Enter password (again): ************
Setup completed for [email protected]

Email and password will be needed to access the web UI and to sync via the Fever API with your favourite RSS client.

Import your feeds

Like other RSS software, Coldsweat uses the OPML format to import multiple feeds in a single operation:

$ coldsweat import /path/to/subscriptions.opml [email protected] -f

The -f option tells Coldsweat to fetch the feeds right after the import step.

Fetch feeds

To update all the feeds run the fetch command:

$ coldsweat fetch 

You should use cron or similar utilities to schedule feed fetches periodically.
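For instance, a crontab entry could run the fetch every 30 minutes. This is only a sketch: the virtual environment path and log file location are assumptions to adapt to your own install:

```
*/30 * * * * /home/user/.venvs/coldsweat/bin/coldsweat fetch >> /var/log/coldsweat-fetch.log 2>&1
```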

Run the web UI

Then you can run the Flask development web server and access the web UI:

$ coldsweat run 
* Serving Flask app 'coldsweat'
* Debug mode: off
* Running on http://127.0.0.1:5000
...

See Setup and Deploy pages for additional information.

Upgrading from a previous version

Upgrade to the latest Coldsweat version with:

$ pip install -U coldsweat

Note: there's no upgrade path from previous 0.9.x releases. Your best bet is to export your OPML subscriptions and import them into the new 0.10 release.

Contributing

See Contributing page.

0.10 technical underpinnings

  • Runs on Python 3.9 and up
  • Completely rebuilt using Flask web framework
  • Supports SQLite, PostgreSQL, and MySQL databases
  • HTTP-friendly fetcher
  • Tested with latest versions of Chrome, Safari, and Firefox

Motivation

I'm fed up with online services that are here today and gone tomorrow. Years ago, after the Google Reader shutdown, it became clear to me that the less we rely on external services, the better the data we care about is preserved. With this in mind I'm writing Coldsweat. It is my personal take on consuming feeds today.

Coldsweat started in July 2013 as a fork of Bottle Fever by Rui Carmo. After a hiatus of several years, I've resumed development using Python 3 and the latest crop of web technologies.

coldsweat's People

Contributors

dontneedgithubaccount, passiomatic, ssheldon, tewe


coldsweat's Issues

Import groups from OPML files

Currently the OPML import procedure ignores groups and just imports feeds. It should recognize groups, create them, and assign feeds accordingly.
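A group-aware import could start from a sketch like the following, using the stdlib ElementTree. The tuple output and the walk helper are illustrative; a real implementation would map the results onto Coldsweat's actual group and feed models.

```python
# Sketch of group-aware OPML parsing with the stdlib ElementTree.
from xml.etree import ElementTree


def parse_opml(xml_text):
    """Return a list of (group_title, feed_title, feed_url) tuples.

    Feeds outside any group get a None group title."""
    results = []

    def walk(outline, group):
        for child in outline.findall('outline'):
            url = child.get('xmlUrl')
            if url:
                # A leaf with an xmlUrl attribute is a feed
                results.append((group, child.get('title') or child.get('text'), url))
            else:
                # An outline without xmlUrl acts as a group/folder
                walk(child, child.get('title') or child.get('text'))

    root = ElementTree.fromstring(xml_text)
    walk(root.find('body'), None)
    return results
```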

Implement RSS/Atom feed autodiscovery

While adding a feed via the web UI, let the user specify the site homepage and figure out the RSS feed via autodiscovery. This is a usability boost, since it's sometimes difficult to figure out if (and where) a site exposes RSS feeds to syndicate its content.

The autodiscovery UI is not straightforward to implement, since it has to handle several corner cases. The three scenarios are:

A. Web page with one feed link

  1. User pastes a web page address into the location field
  2. Coldsweat issues a GET and sniffs the page contents. The sniff routine determines that it is an actual web page and scans it looking for relevant RSS links. If the content is a feed, go to case C, step 2.
  3. Coldsweat finds one feed link, adds it to the feed collection and fetches it.

B. Web page with more than one feed link

  1. User pastes a web page address into the location field
  2. Coldsweat issues a GET and sniffs the page contents. The sniff routine determines that it is an actual web page and scans it looking for relevant RSS links. If the content is a feed, go to case C, step 2.
  3. Coldsweat finds more than one feed link; it shows the feeds found to the user and allows selecting one link (or more?), then adds the selection to the feed collection and fetches it.

C. Valid feed link (current implementation)

  1. User pastes a feed address into the location field
  2. Coldsweat adds the link to the feed collection and fetches it.

In all three scenarios above, Coldsweat needs to halt the procedure if a broken (or gone) resource is encountered.
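The sniffing step of scenarios A and B can be sketched with the stdlib HTML parser: scan the page for link elements that advertise an RSS or Atom feed. Relative href values would still need to be resolved against the page URL, and the class and function names here are made up:

```python
# Sketch: collect <link rel="alternate"> feed links from an HTML page.
from html.parser import HTMLParser

FEED_TYPES = ('application/rss+xml', 'application/atom+xml')


class FeedLinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.feed_links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'link':
            return
        attrs = dict(attrs)
        if attrs.get('rel') == 'alternate' and attrs.get('type') in FEED_TYPES:
            self.feed_links.append(attrs.get('href'))


def find_feed_links(html):
    """Return the href values of all advertised feeds, in page order."""
    parser = FeedLinkParser()
    parser.feed(html)
    return parser.feed_links
```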

References

OperationalError: database is locked

Unable to authenticate while the fetch script is updating the database.

Traceback (most recent call last):
  File "/home/passiomatic/lab.passiomatic.com/coldsweat/coldsweat/app.py", line 219, in __call__
    appiter = self.app(environ, start_response)

  File "/home/passiomatic/lab.passiomatic.com/coldsweat/coldsweat/app.py", line 106, in dispatch_request
    status, headers, body = handler(request, filler, *args)

  File "/home/passiomatic/lab.passiomatic.com/coldsweat/coldsweat/fever.py", line 179, in endpoint
    user = User.get((User.api_key == api_key) & (User.is_enabled == True))

  File "/home/passiomatic/local/lib/python2.6/site-packages/peewee.py", line 2299, in get
    return sq.get()

  File "/home/passiomatic/local/lib/python2.6/site-packages/peewee.py", line 1602, in get
    return clone.execute().next()

  File "/home/passiomatic/local/lib/python2.6/site-packages/peewee.py", line 1637, in execute
    self._qr = ResultWrapper(self.model_class, self._execute(), query_meta)

  File "/home/passiomatic/local/lib/python2.6/site-packages/peewee.py", line 1395, in _execute
    return self.database.execute_sql(sql, params, self.require_commit)

  File "/home/passiomatic/local/lib/python2.6/site-packages/peewee.py", line 1805, in execute_sql
    res = cursor.execute(sql, params or ())

OperationalError: database is locked

Update feed URL after a "301 Moved permanently"

Coldsweat currently tries to handle 301 status codes but fails, given that the Requests package follows redirects automatically and the response's status_code does not reflect the original status returned.

To fix that, Coldsweat should check the response's history attribute:
http://docs.python-requests.org/en/latest/user/quickstart/#redirection-and-history

Case in point: http://feeds.feedburner.com/maxvoltar is now http://maxvoltar.com/feed:

HTTP/1.1 301 Moved Permanently
Location: http://maxvoltar.com/feed/
Content-Type: text/html; charset=UTF-8
Date: Sun, 28 Jul 2013 09:57:33 GMT
Expires: Sun, 28 Jul 2013 09:57:33 GMT
Cache-Control: private, max-age=0
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Server: GSE
Transfer-Encoding: chunked

<HTML>
<HEAD>
<TITLE>Moved Permanently</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<H1>Moved Permanently</H1>
The document has moved <A HREF="http://maxvoltar.com/feed/">here</A>.
</BODY>
</HTML>

Note: Perhaps feed.alternate_link should also be updated after a 301 message.
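The history check can be sketched as follows; response.history and response.url are documented Requests attributes, while the feed update in the trailing comment is hypothetical:

```python
# Sketch: detect a permanent redirect after Requests has already
# followed it. Works on anything with .history/.status_code/.url.
def permanent_redirect_target(response):
    """Return the final URL if the redirect chain started with a 301,
    None otherwise."""
    if response.history and response.history[0].status_code == 301:
        return response.url
    return None

# In the fetcher, something like (hypothetical Feed fields):
#   new_url = permanent_redirect_target(response)
#   if new_url:
#       feed.self_link = new_url
#       feed.save()
```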

Add dialog to show feed information/unsubscribe

Add a (modal) dialog to show feed settings and actions. The dialog should show:

  • √ Feed URL
  • √ Feed alternate URL
  • √ Timestamp of the last check
  • √ HTTP status code on the last check
  • √ Status: disabled or enabled and error count
  • √ A list of groups where the feed appears

Buttons

  • √ A button to unsubscribe the feed for the current user
  • √ A button to enable the feed again and reset its error count

'ImportError: cannot import name run' on install

Apologies if this is a user issue as I'm not very familiar with Python.

On running the Coldsweat setup with:

python sweat.py setup

Traceback (most recent call last):
File "sweat.py", line 6, in
from coldsweat.commands import run
ImportError: cannot import name run

ReadKit crashes when used with coldsweat

This seems to be the only app for OS X supporting Fever at the moment. It crashes after you add coldsweat as a Fever source, though.
I also tested my coldsweat installation with Reeder on iOS. It works.

I know that this is most likely related to ReadKit, and I suspect the devs might not support third-party Fever API implementations. I'll submit a bug report to them anyway.

Avoid multiple database Integrity errors while marking a feed/group/all as read

IntegrityError exceptions show up in the log file while marking a feed/group/all as read, due to a dumb query:

q = Entry.select().join(Feed).join(Subscription).where(
  (Subscription.user == user) &
  (Entry.last_updated_on < before)
).naive()

Coldsweat should filter the result set to exclude entries already marked as read.
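At the SQL level the fix amounts to a set difference, sketched here with the stdlib sqlite3 module; the table and column names are illustrative, not Coldsweat's actual schema:

```python
# Sketch: select only the entries the user has not already marked read.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE read (user_id INTEGER, entry_id INTEGER,
                       UNIQUE (user_id, entry_id));
    INSERT INTO entry VALUES (1, 'a'), (2, 'b');
    INSERT INTO read VALUES (42, 1);      -- entry 1 already read
''')


def unread_entries(conn, user_id):
    """Entry ids with no matching row in the read table for this user."""
    return [row[0] for row in conn.execute('''
        SELECT e.id FROM entry e
        WHERE NOT EXISTS (SELECT 1 FROM read r
                          WHERE r.user_id = ? AND r.entry_id = e.id)''',
        (user_id,))]
```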

Capture Requests' exceptions.DecodeError

The exception below should be caught and feed.error_count increased:

requests.packages.urllib3.exceptions.DecodeError: ('Received response with content-encoding: gzip, 
but failed to decode it.', error('Error -3 while decompressing: incorrect header check',))

For an example see the techdirt RSS feed: http://www.techdirt.com/techdirt_rss.xml (Appears to be fixed now).

Publish a feed or custom entries for Coldsweat fetcher status

Every Coldsweat installation could generate a feed for its own status information. Such a feed would include:

  • Feeds disabled by the system due to too many errors
  • Fetch results

This is not as straightforward as one may think. It implies one of these approaches:

  • Keep a history of events, a sort of log, in a separate database table. The table is queried every time the status feed is requested, exposing, say, the last 10 events or the last week.
  • Save an Atom feed on disk after each status update. The feed needs to be read from disk, parsed, probably trimmed, and updated on request.

Implement full-text search

Sometimes there's the need to quickly find a feed or entries by looking up their title or content.

Possible implementation

Peewee has an extension to activate full-text search with SQLite but AFAIK none for MySQL. This poses the problem of implementing the search mechanism across at least SQLite and MySQL. Using Whoosh could be a viable solution.

Each time an entry is added to the database, the Whoosh index is updated too. An existing Coldsweat installation could be indexed as a one-off operation via a sweat subcommand.
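For the SQLite side, this is roughly what full-text search looks like at the SQL level, sketched with the stdlib sqlite3 module and SQLite's FTS5 extension; the schema is illustrative:

```python
# Sketch: a full-text index over entry titles and content using FTS5.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE VIRTUAL TABLE entry_index USING fts5(title, content)')
conn.execute("INSERT INTO entry_index VALUES ('Hello feeds', 'All about RSS')")
conn.execute("INSERT INTO entry_index VALUES ('Unrelated', 'Nothing here')")


def search(conn, query):
    """Titles of entries matching the query, best match first."""
    return [row[0] for row in conn.execute(
        'SELECT title FROM entry_index WHERE entry_index MATCH ? '
        'ORDER BY rank', (query,))]
```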

Previous work

Better multiprocessing and MySQL integration to avoid "Commands out of sync" errors

While fetching feeds using MySQL with multiprocessing, the database sometimes returns errors like this:

Exception _mysql_exceptions.ProgrammingError: (2014, "Commands out of sync; you can't run this command now") in <bound method Cursor.__del__ of <MySQLdb.cursors.Cursor object at 0x101826410>> ignored

This seems to be caused by fetcher processes sharing the same connection and thus its open cursors. The typical solution is to let each spawned process open and close its own Peewee database connection. Incidentally, that's what Bottle Fever does.

It seems to me that it's the closing phase that matters: after all, the connection is already opened by the parent process just before spawning the subprocesses. Explicitly closing the connection in each subprocess helps to close open cursors.

This Peewee issue has more details: coleifer/peewee#67
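The per-process connection discipline can be sketched like this with stdlib sqlite3; with Peewee the shape is the same, calling database.connect() on entry and database.close() in the finally block. The feed table is illustrative:

```python
# Sketch: each worker opens and closes its own database connection.
import sqlite3


def fetch_worker(args):
    db_path, feed_id = args
    conn = sqlite3.connect(db_path)   # one private connection per worker
    try:
        conn.execute('UPDATE feed SET checked = 1 WHERE id = ?', (feed_id,))
        conn.commit()
    finally:
        conn.close()                  # closing also disposes of open cursors

# In the parent, after closing its own connection:
#   with multiprocessing.Pool(4) as pool:
#       pool.map(fetch_worker, [(db_path, i) for i in feed_ids])
```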

Add PostgreSQL support

Peewee has excellent support for PostgreSQL so it makes sense to add it to Coldsweat too.

Periodically refresh a feed favicon

Figure out a way to refresh a feed favicon.

It probably makes sense to add a new column last_updated_on to keep track of the freshness of the icon. This way Coldsweat can periodically ask the Google favicon service to fetch it again.

Moreover, at this point it makes sense to get rid of the Icon model altogether, since every feed has an icon (a 1:1 relation). Two new fields, icon (Base64) and icon_last_updated_on (datetime), would be added to the Feed model.

Then the Fever icons command would cycle through the feeds and return the Feed.id and the Feed.icon data.
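The freshness check for the proposed icon_last_updated_on field could look like this; the field name comes from the proposal above, and the 30-day refresh interval is an arbitrary assumption:

```python
# Sketch: decide whether a feed favicon should be fetched again.
from datetime import datetime, timedelta

ICON_MAX_AGE = timedelta(days=30)   # assumed refresh interval


def icon_is_stale(icon_last_updated_on, now=None):
    """True if the favicon was never fetched or is older than ICON_MAX_AGE."""
    now = now or datetime.utcnow()
    return (icon_last_updated_on is None
            or now - icon_last_updated_on > ICON_MAX_AGE)
```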

sqlite issue

Syncing does not work automatically. Updating via refresh_feeds.py does work tho.

Have not tried mysql yet...

localhost - - [18/Jul/2013:11:29:16 +0000] 16543 ERROR Traceback (most recent call last):
  File "/var/coldsweat/coldsweat/app.py", line 189, in __call__
    app_iter = self.app(environ, start_response)

  File "/var/coldsweat/coldsweat/app.py", line 249, in __call__
    return self._initial(environ, start_response)

  File "/var/coldsweat/coldsweat/app.py", line 240, in _initial
    return self.app(environ, session_response)

  File "/var/coldsweat/coldsweat/app.py", line 71, in __call__
    r = view(ctx, *args)

  File "/var/coldsweat/coldsweat/fever.py", line 250, in endpoint
    handler(ctx.request, user, result)

  File "/var/coldsweat/coldsweat/fever.py", line 31, in feeds_command
    result.feeds = get_feeds_for_user(user)

  File "/var/coldsweat/coldsweat/fever.py", line 304, in get_feeds_for_user
    'favicon_id'          : feed.icon.id,

  File "/usr/local/lib/python2.7/dist-packages/peewee.py", line 610, in __get__
    return self.get_object_or_id(instance)

  File "/usr/local/lib/python2.7/dist-packages/peewee.py", line 601, in get_object_or_id
    obj = self.rel_model.get(self.rel_model._meta.primary_key==rel_id)

  File "/usr/local/lib/python2.7/dist-packages/peewee.py", line 2295, in get
    return sq.get()

  File "/usr/local/lib/python2.7/dist-packages/peewee.py", line 1609, in get
    return clone.execute().next()

  File "/usr/local/lib/python2.7/dist-packages/peewee.py", line 1131, in next
    obj = self.iterate()

  File "/usr/local/lib/python2.7/dist-packages/peewee.py", line 1115, in iterate
    row = self.cursor.fetchone()

ProgrammingError: Cannot operate on a closed database.

Use new peewee.IntegrityError

Peewee 2.1.7 added a common set of exceptions to wrap DB-API 2 driver-specific exception classes, e.g. peewee.IntegrityError.

Aggressively process entry content for Readability-like formatting

Currently Coldsweat does very little to format feed entries. It optionally parses entries looking for images and links with blacklisted domains and removes them, and nothing more.

This yields entries which are mostly rendered as-is, with generic CSS styles applied, which mostly works. However, some entries are written like this:

Aenean lacinia bibendum nulla sed consectetur. Sed posuere consectetur est at lobortis. Duis mollis, est non commodo luctus, nisi erat porttitor ligula, eget lacinia odio sem nec elit. Cras justo odio, dapibus ac facilisis in, egestas eget quam.
<br>
<br>
Etiam porta sem malesuada magna mollis euismod. Integer posuere erat a ante venenatis dapibus posuere velit aliquet. Cras mattis consectetur purus sit amet fermentum. Vestibulum id ligula porta felis euismod semper. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.

Or worse:

Etiam porta sem malesuada magna mollis euismod. Integer posuere erat a ante venenatis dapibus posuere velit aliquet. Cras mattis consectetur purus sit amet fermentum. Vestibulum id ligula porta felis euismod semper. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.
<br>
<br>
<br>
<br>
<br>
<br>
[eof]

This causes huge padding at the end of the entry, which makes me think that there's room for improvement.

One idea is to pass each entry's content through a processor which strips most of the HTML tags, keeping only the necessary formatting hints. Think of something like HTML -> Markdown and then Markdown -> HTML.

Empty elements

Empty elements like <p></p> or <td></td> will be stripped. Multiple consecutive occurrences of <br> will be removed too.
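A regex-based sketch of these two rules follows; a real implementation would likely use an HTML parser, and whether runs of <br> are collapsed to a single break or dropped entirely is a policy choice:

```python
# Sketch: drop empty elements and collapse runs of <br>.
import re

EMPTY_ELEMENT = re.compile(r'<(p|td|li|div|span)>\s*</\1>', re.I)
BR_RUN = re.compile(r'(?:\s*<br\s*/?>\s*){2,}', re.I)


def strip_empty_markup(html):
    html = EMPTY_ELEMENT.sub('', html)
    html = BR_RUN.sub('<br>', html)   # keep a single break per run
    return html
```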

Allowed tags

Non-empty tags left as-is while parsing will be: p, table and all its child elements, ul, ol, dl, li, dt, dd, b, blockquote, strong, i, em, code, var, kbd, img, figure and figcaption, etc.

Script blocks

Script blocks are already removed by Feedparser.

Allowed attributes

Most formatting attributes like "style", "align", etc. will be stripped. This will help us to reformat content, especially replaced-inline elements like embedded images.

References

Do not fetch a feed if no one is subscribed

A future version of Coldsweat could allow a user to unsubscribe from a feed (which is different from disabling a feed), see issue #43. Hence a query should be performed to check whether at least one user is subscribed to a given feed before fetching it.
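The subscriber check could be a simple EXISTS query, sketched here with stdlib sqlite3 over an illustrative schema:

```python
# Sketch: skip fetching a feed that no user is subscribed to.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE subscription (user_id INTEGER, feed_id INTEGER);
    INSERT INTO subscription VALUES (1, 10);
''')


def has_subscribers(conn, feed_id):
    row = conn.execute(
        'SELECT EXISTS (SELECT 1 FROM subscription WHERE feed_id = ?)',
        (feed_id,)).fetchone()
    return bool(row[0])
```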

Fix behavior on "mark all as read" command

Reeder issues a mark=group with id=0 to mark everything as read. Currently Coldsweat logs an INFO message:

 localhost - - [30/Jun/2013:15:01:38 +0000] 5904 INFO could not find requested group 0, ignored

See also issue #3.

Fetch and show entry comments

It would be nice to track comments to a given entry and display those in the detail view. This feature should be off by default, since it creates a significant overhead to the fetch process.

The CommentAPI and Slash namespaces

The CommentAPI namespace (WFW) allows specifying a related comments feed. Recent versions of WordPress support this feature, so potentially every WP installation has this information already available:

<rss version="2.0"
    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
    xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
    >
       ...

So for a given feed entry we have the corresponding comments feed URL:

<wfw:commentRss>http://redpunk.com/articoli/su-vice-com-un-commento-che-merita/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>

Also note that slash:comments specifies the number of comments. This information could be used to avoid repeatedly fetching empty comments feeds.

Feedparser recognizes both the WFW and Slash namespaces, so extracting the values is trivial.
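For illustration, here is how the two values can be pulled out of a raw RSS item with the stdlib ElementTree (feedparser exposes the same data as entry fields):

```python
# Sketch: extract CommentAPI/Slash values from an RSS <item>.
from xml.etree import ElementTree

NS = {
    'wfw': 'http://wellformedweb.org/CommentAPI/',
    'slash': 'http://purl.org/rss/1.0/modules/slash/',
}


def comment_info(item_xml):
    """Return (comments_feed_url, comment_count) for an RSS item."""
    item = ElementTree.fromstring(item_xml)
    url = item.findtext('wfw:commentRss', default=None, namespaces=NS)
    count = item.findtext('slash:comments', default='0', namespaces=NS)
    return url, int(count)
```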

Where to show the comments

Comments should not mess up the unread, saved, etc. lists in the aggregator app or in the web reader. An idea is to append the comments as an HTML list after the entry content, before serving the entry via Fever or the web reader.

Changing the database schema

Should comments be stored in a dedicated "comments" table, with a reference to the parent entry along with author and timestamp information? Aren't comments entries too?

What to do with the comment feeds?

Should Coldsweat add the comment feeds to its database? This creates a number of feeds which soon become inactive, since people are unlikely to comment on older entries. Discussion on an entry naturally dies after a few days.

More References

Add configuration for nginx with FastCGI

To make coldsweat work with nginx and FastCGI it needs to bind to a socket, so I changed dispatch.fcgi accordingly.

#!/usr/bin/env python
"""
Bootstrap file for FastCGI environments
"""

try:
    from flup.server.fcgi_fork import WSGIServer
except ImportError:
    print('Error: unable to import Flup package.\nColdsweat needs Flup to run as a FastCGI process.\nDownload it from PyPI: http://pypi.python.org/pypi/flup')
    raise

from coldsweat.app import setup_app

if __name__ == '__main__':
    WSGIServer(setup_app(), bindAddress='/tmp/coldsweat-fcgi.sock').run()

A nginx site config that at least makes the API work looks like this:

server {
    server_name coldsweat.example.com;
    location / {
        include fastcgi_params;
        fastcgi_param PATH_INFO $fastcgi_script_name;
        fastcgi_param SCRIPT_NAME "";
        fastcgi_pass unix:/tmp/coldsweat-fcgi.sock;
    }
}

I still get the error that the web interface can't find the static folder for CSS. Setting the document root also didn't help. Maybe somebody else can help me with that.

Implement entry content "scrubbing"

Sometimes there's some garbage in a feed entry: link ads, "share this" junk, 1x1 img elements added for tracking purposes.

It would be cool if, right after feed fetching, there was a "scrubber" which removes all unwanted markup, filtering out links and images using a blacklist of hostnames, e.g. feeds.feedburner.com, feedsportal.com, etc.

Such a hostname blacklist should be configurable, possibly using an external "scrubfile" or the regular etc/conf file.
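A hostname-based scrubber could be sketched like this; the regex approach is only illustrative (a real scrubber would use an HTML parser) and the blacklist entries come from the examples above:

```python
# Sketch: drop img/a tags whose target hostname is blacklisted.
import re
from urllib.parse import urlparse

BLACKLIST = {'feeds.feedburner.com', 'feedsportal.com'}

TAG = re.compile(r'<(img|a)\b[^>]*\b(?:src|href)="([^"]+)"[^>]*>(?:</\1>)?',
                 re.I)


def scrub(html, blacklist=BLACKLIST):
    def replace(match):
        host = urlparse(match.group(2)).hostname or ''
        return '' if host in blacklist else match.group(0)
    return TAG.sub(replace, html)
```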

Faster feed refresh?

I have 60 feeds in my reading list (I know, my life is cluttered...), and the refresh takes quite a bit of time on a Raspberry Pi (edit: was RasPi).

real 0m4.135s
user 0m3.720s
sys 0m0.260s

I'm wondering if the code could be made somewhat faster?

Optimize database queries

Several queries spawn quite a lot of subqueries due to the way they are constructed. The obvious ones in fever.py are:

  • get_feeds, see feed.icon.id
  • get_feed_groups, see s.feed.id

The underlying problem is that xxx.icon.id and xxx.feed.id fields are not formally retrieved by Peewee, even if they are there in the record, so an extra query per row is performed.

The Peewee .switch(model) method could do the trick: http://peewee.readthedocs.org/en/latest/peewee/api.html#Query.switch

Dashboard question

Hello,

my dashboard looks completely different and I cannot add a user. I only see the top right box called "Configure Your Feed Reader".

I attached a screenshot to make the problem more clear:
coldsweat

Caught exceptions are still logged

A client may mark items read multiple times due to bad connectivity.

fever.py line 164 deals with that, but peewee still generates a scary log entry.

I noticed that the code uses exceptions-as-control-flow all over the place. Is this a good idea?
