Feed Seeker

It slant rhymes with "heat seeker"

A library for finding Atom, RSS, RDF, and XML feeds from web pages. Produced at the mediacloud project. An incremental improvement over feedfinder2, which was itself based on feedfinder, written by Mark Pilgrim and maintained by Aaron Swartz until his untimely death.

Quickstart

By default, the library uses requests to fetch a page's HTML and inspect it for the most likely feed URL:

>>> from feed_seeker import find_feed_url
>>> find_feed_url('https://github.com/ColCarroll/feed_seeker')
'https://github.com/ColCarroll/feed_seeker/commits/master.atom'

For a more thorough search, use generate_feed_urls, which yields candidate feed URLs with the most likely ones first.

>>> from feed_seeker import generate_feed_urls
>>> for url in generate_feed_urls('https://xkcd.com'):
...     print(url)
...
https://xkcd.com/atom.xml
https://xkcd.com/rss.xml
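
Because candidates arrive most likely first, a caller can stop after the first few. The sketch below assumes only that generate_feed_urls behaves like an ordinary Python iterator; `take` and `fake_candidates` are hypothetical names introduced for illustration, and the stand-in generator replaces the real network call so the snippet runs offline.

```python
from itertools import islice


def take(candidates, limit):
    """Return at most `limit` items from an iterable without consuming the rest."""
    return list(islice(candidates, limit))


def fake_candidates():
    # Hypothetical stand-in for generate_feed_urls(...); performs no network access.
    yield "https://example.com/atom.xml"
    yield "https://example.com/rss.xml"
    yield "https://example.com/comments/feed"


top_two = take(fake_candidates(), 2)
print(top_two)
```

With the library installed, the same helper could presumably be applied as take(generate_feed_urls('https://xkcd.com'), 2), so that later candidates are never fetched.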

For the most thorough search, pass a spider argument to spider URLs on the same hostname depth-first. Note that the call below takes nearly four minutes, compared with about 0.5 seconds for find_feed_url.

>>> for url in generate_feed_urls('https://github.com/ColCarroll/feed_seeker', spider=1):
...     print(url)
...
https://github.com/ColCarroll/feed_seeker/commits/master.atom
https://github.com/ColCarroll/feed_seeker/commits/a8f7b86eac2cedd9209ac5d2ddcceb293d2404c9.atom
https://github.com/ColCarroll/feed_seeker/commits/3b5245b46a10fb3647a1f08b8e584b471683fbbd.atom
https://github.com/ColCarroll/feed_seeker/commits/659311b8853c4c4a67e3b4bc67a78461d825a064.atom
https://github.com/ColCarroll/feed_seeker/commits/3e93490cb91f7652325c2fe41ef29a5be4558d6a.atom
https://github.com/index.atom
https://github.com/articles.atom
https://github.com/dfm/feedfinder2/commits/master.atom
https://github.com/ColCarroll.atom
https://github.com/blog.atom
https://github.com/blog/all.atom
https://github.com/blog/broadcasts.atom
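
Since a spidering search can run for minutes, one option is to cap the time spent consuming results. This is a minimal sketch, not part of feed_seeker: `within_budget` is a hypothetical wrapper that works on any iterator, and the slow generator below is a stand-in for a spidering call.

```python
import time


def within_budget(candidates, seconds, clock=time.monotonic):
    """Yield items from `candidates` until roughly `seconds` have elapsed.

    The deadline is checked only between items, so one slow fetch can
    overshoot the budget.
    """
    deadline = clock() + seconds
    for item in candidates:
        yield item
        if clock() >= deadline:
            break


def slow_candidates():
    # Hypothetical stand-in for generate_feed_urls(url, spider=1).
    for i in range(1000):
        time.sleep(0.01)  # pretend each candidate costs a network round trip
        yield f"https://example.com/feed/{i}.atom"


found = list(within_budget(slow_candidates(), 0.05))
print(len(found), "candidates found before the budget ran out")
```

The `clock` parameter exists only so the wrapper can be exercised with a fake clock; callers would normally leave it at its default.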

Installation

The library is not yet available on PyPI, so for now it can only be installed from GitHub:

pip install git+https://github.com/ColCarroll/feed_seeker

Differences from feedfinder2

The biggest difference is that all functions are implemented as generators and are evaluated lazily. Each candidate feed link is actually fetched and inspected to determine whether it is a feed, which can be quite time-consuming. We expose one function to find the most likely feed link, and another to lazily generate links in rough order from most prominent to least.
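
The effect of lazy evaluation can be illustrated with a small generator. This is a sketch of the general idea under stated assumptions, not feed_seeker's actual implementation: `looks_like_feed`, `lazy_feeds`, `fetch`, and the in-memory `PAGES` dictionary are all hypothetical, and the feed check is deliberately crude.

```python
def looks_like_feed(text):
    """Crude check: does the document open like an RSS, Atom, or RDF feed?"""
    head = text.lstrip()[:500].lower()
    return any(tag in head for tag in ("<rss", "<feed", "<rdf:rdf"))


def lazy_feeds(candidate_urls, fetch):
    """Yield only the candidates whose fetched content looks like a feed.

    Each URL is fetched when the consumer asks for the next result, not before.
    """
    for url in candidate_urls:
        if looks_like_feed(fetch(url)):
            yield url


# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "https://example.com/about": "<html><body>About</body></html>",
    "https://example.com/atom.xml": '<?xml version="1.0"?><feed xmlns="http://www.w3.org/2005/Atom"></feed>',
    "https://example.com/rss.xml": '<rss version="2.0"></rss>',
}
fetched = []


def fetch(url):
    fetched.append(url)  # record which pages were actually requested
    return PAGES[url]


feeds = lazy_feeds(PAGES, fetch)
first = next(feeds)
print(first)
print(fetched)
```

After the first result is taken, only the first two pages have been requested; the third is fetched only if the caller keeps iterating.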

There are also a few more heuristics based on our experience at mediacloud.
