
dato.rss's Introduction

DATO.RSS

A seamless RSS Search Engine experience with a hint of Machine Learning.

SEED

An SQL dump of the database, with over 3 million entries collected over more than a year, can be downloaded at https://davidesantangelo.gumroad.com/l/nkyymb

BETA

Dato.RSS is in beta, and will likely see many changes in the near future.

If you have comments or suggestions, please send them to us using the Issues tab.

Thanks for trying the beta!


Search Engine: Quickly search through the millions of available RSS feeds.

RESTful API: Turns feed data into an awesome API. The API simplifies how you handle RSS, Atom, or JSON feeds. You can add and keep track of your favourite feed data with a simple, fast and clean REST API. All entries are enriched by Machine Learning and Semantic engines.

Example

curl 'https://<yourhost>/api/searches?q=news' | json_pp

{
  "data": [
    {
      "id": "86b0f829-e300-4eef-82e1-82f34d03aff6",
      "type": "entry",
      "attributes": {
        "title": "\"Pandemic, Infodemic\": 2 Cartoon Characters Battling Fake News In Assam",
        "url": "https://www.ndtv.com/india-news/coronavirus-pandemic-infodemic-2-cartoon-characters-battling-fake-news-in-assam-2222333",
        "published_at": 1588448805,
        "body": "An English daily in Assam's Guwahati has been publishing a cartoon strip to tackle the fake news related to the coronavirus pandemic. The two central characters- \"Pandemic and Infodemic\"- are being...<img src=\"http://feeds.feedburner.com/~r/NDTV-LatestNews/~4/lEmH201Q8jI\" height=\"1\" width=\"1\" alt=\"\"/>",
        "text": "An English daily in Assam's Guwahati has been publishing a cartoon strip to tackle the fake news related to the coronavirus pandemic. The two central characters- \"Pandemic and Infodemic\"- are being...",
        "categories": [
          "all india"
        ],
        "sentiment": null,
        "parent": {
          "id": "c97bdae6-b5d1-4966-b9f3-615e29d4d47d",
          "title": "NDTV News  -  Special",
          "url": "feed:http://feeds.feedburner.com/NDTV-LatestNews",
          "rank": 99
        },
        "tags": []
      },
      "relationships": {
        "feed": {
          "data": {
            "id": "c97bdae6-b5d1-4966-b9f3-615e29d4d47d",
            "type": "feed"
          }
        }
      }
    }
  ]
}
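A response like the one above can be consumed from Ruby with the standard library alone. The sketch below is illustrative only: `summarize_entries` and `SAMPLE_RESPONSE` are hypothetical names, and the payload is a trimmed copy of the example with a placeholder title and URL. It treats `published_at` as a Unix timestamp in seconds, which is what the example value suggests.

```ruby
require "json"
require "time"

# Hypothetical helper (not part of dato.rss): flattens a JSON:API search
# response into plain hashes that are easier to work with.
def summarize_entries(json_text)
  payload = JSON.parse(json_text)
  payload.fetch("data", []).map do |item|
    attrs = item.fetch("attributes", {})
    {
      id: item["id"],
      title: attrs["title"],
      url: attrs["url"],
      # Assumption: published_at is a Unix timestamp in seconds.
      published_at: attrs["published_at"] && Time.at(attrs["published_at"]).utc,
      feed_title: attrs.dig("parent", "title"),
      feed_rank: attrs.dig("parent", "rank")
    }
  end
end

# Trimmed sample payload shaped like the response shown above.
SAMPLE_RESPONSE = <<~JSON
  {
    "data": [
      {
        "id": "86b0f829-e300-4eef-82e1-82f34d03aff6",
        "type": "entry",
        "attributes": {
          "title": "Example entry",
          "url": "https://example.com/post",
          "published_at": 1588448805,
          "parent": { "title": "NDTV News  -  Special", "rank": 99 }
        }
      }
    ]
  }
JSON

summarize_entries(SAMPLE_RESPONSE).each do |entry|
  puts "#{entry[:published_at].iso8601} #{entry[:title]} (feed rank #{entry[:feed_rank]})"
end
```

To use it against a live instance, the body returned by the `curl` call above could be passed to `summarize_entries` in place of the sample.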

Search

Search is implemented with PostgreSQL's built-in full-text search.

I used the pg_search Gem, which can be used in two ways:

Multi Search: Search across multiple models and return a single array of results. Imagine having three models: Product, Brand, and Review. Using Multi Search we could search across all of them at the same time, seeing a single set of search results. This would be perfect for adding federated search functionality to your app.

Search Scope: Search within a single model, but with greater flexibility.

    execute <<-SQL
      ALTER TABLE entries
      ADD COLUMN searchable tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('simple', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('simple', coalesce(body,'')), 'B') ||
        setweight(to_tsvector('simple', coalesce(url,'')), 'C')
      ) STORED;
    SQL
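The generated `searchable` column can also be queried directly with PostgreSQL's `@@` match operator and `ts_rank`. The following is a minimal sketch, not the project's own code (dato.rss goes through pg_search): the `build_search_query` helper, the selected columns, and the `limit` parameter are assumptions made for illustration. It only builds a parameterized SQL string and its bind values.

```ruby
# Hypothetical helper (not from dato.rss): builds a parameterized query
# against the generated `searchable` tsvector column on `entries`.
def build_search_query(term, limit: 20)
  raise ArgumentError, "term must not be blank" if term.to_s.strip.empty?

  # 'simple' matches the text search configuration used in the migration.
  sql = <<~SQL
    SELECT id, title, url,
           ts_rank(searchable, plainto_tsquery('simple', $1)) AS score
    FROM entries
    WHERE searchable @@ plainto_tsquery('simple', $1)
    ORDER BY score DESC
    LIMIT $2
  SQL

  [sql, [term, Integer(limit)]]
end

sql, binds = build_search_query("news", limit: 10)
puts sql
puts binds.inspect
```

With the pg gem, a pair like this could be passed to `PG::Connection#exec_params`. Because the column is built with `setweight`, `ts_rank` scores title matches (weight A) above body (B) and URL (C) matches.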

Feed Rank

Feed ranking is provided by openrank, a free root domain authority metric based on the Common Search PageRank dataset. The value is normalized by:

((Math.log10(domain_rank) / Math.log10(100)) * 100).round
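To make the formula above concrete, here is a small sketch that wraps it in a method and applies it to a few raw values. The method name `normalize_rank` and the guard for non-positive input are assumptions added for this example; the expression itself is the one shown above.

```ruby
# Applies the README's normalization formula to a raw domain rank.
# The guard for non-positive input is an assumption added for this sketch,
# since log10 is undefined (or -Infinity) there.
def normalize_rank(domain_rank)
  return 0 unless domain_rank.to_f.positive?

  ((Math.log10(domain_rank) / Math.log10(100)) * 100).round
end

[1, 10, 50, 100].each do |raw|
  puts "#{raw} -> #{normalize_rank(raw)}"
end
```

Because the scale is logarithmic, a raw value of 10 already maps to 50 and a raw value of 100 maps to 100, so differences between low raw values are stretched and differences between high ones are compressed.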

Machine Learning

Machine Learning is provided by the dandelion API, a Semantic Text Analytics service that turns text into actionable data. It extracts meaning from unstructured text and puts it in context through a simple API.

Add Feed

You can add as many feeds as you want for the automatic crawler to handle.

https://<yourhost>/feeds/new

Wiki

All API documentation is in the Wiki section. Feel free to make it better, of course.

https://github.com/davidesantangelo/dato.rss/wiki

Some features, such as adding a new feed, require a token with write permission. Currently only I can enable it, so contact me if you need one.

Built With

  • Ruby on Rails — Our back end API is a Rails app. It responds to requests RESTfully in JSON.
  • PostgreSQL — Our main data store is in Postgres.
  • Redis — We use Redis as a cache and for transient data.
  • Feedjira — Feedjira is a Ruby library designed to parse feeds.
  • Dandelion — Semantic Text Analytics as a service.
  • Sidekiq — Simple, efficient background processing for Ruby.
  • JSON:API Serialization — A fast JSON:API serializer for Ruby Objects.
  • PgSearch — PgSearch builds named scopes that take advantage of PostgreSQL's full text search.
  • TailwindCSS — A utility-first CSS framework for rapidly building custom user interfaces.

Plus lots of Ruby Gems, a complete list of which is at /main/Gemfile.

Sponsor me

If you want to help me cover the server costs that keep dato.rss free and online, please consider sponsoring. Thanks!

GitHub sponsor

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/davidesantangelo/dato.rss. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the MIT License.

dato.rss's People

Contributors

davidesantangelo, dependabot[bot]


dato.rss's Issues

Feed for my personal site, VA3ZZA.com

I'm assuming it's OK to submit my own stuff. If not, close this issue.

My personal tech experimentation blog has the following feed:

https://va3zza.com/rss.xml

Content focuses on obscure tech and general write-ups of personal projects.

Feedback on form for adding new sources

Hi there,

Thank you very much for creating dato.rss, it is quite an interesting project.

I do have one comment on the placeholder for adding new feeds, located at https://datorss.com/feeds/new: the placeholder text "add feed:https://example.com/rss" is a bit confusing, because when you try adding something in that format

add feed:https://example.com/rss

then form returns:

bad URI(is not URI?): "http://add/ https://example.com/rss"

Maybe the placeholder wording can be a bit different like

feed:https://example.com/rss

Or something similar? It should not be a problem to open a PR for this; however, I wanted to open this issue first because I might be missing something about how dato.rss works.

Feeds for 'Rest of World'

Request to add the RSS feed for Rest of World (https://restofworld.org/)

Main RSS feed:
https://restofworld.org/feed/latest/

From their about page:

Rest of World is an international nonprofit journalism organization. We document what happens when technology, culture and the human experience collide, in places that are typically overlooked and underestimated. We believe the story about technology is as big as the world that’s using it, and that everyone — from those building technology to those using it — can benefit from a broader global perspective.

Section specific ones can be found here:
https://restofworld.org/platforms/

mention the wiki

I am increasingly interested in this project.
I returned to learn more, but found an empty README.
You may want to mention that there is more documentation in the wiki, give people a pointer where to look. Or at least have a paragraph or two of introduction.

When I last looked at it, there was a huge software stack. Kind of scared me off.

Next question: I am practicing software archaeology, trying to figure out how this thing is built. A paragraph or two on architecture, with some links to where the code is, would be most helpful. I do not have Rails experience, so I am not quite sure where to look. Eventually I will figure it out, though.

Just Brilliant

Hats off to whoever created this. You get my highest compliments. It changed my thinking.

There is a huge problem with the MSM blackout of progressive news. It is not like Tiananmen Square in China, where there are literally no mentions of the massacre. The web pages are there; the MSM algorithms just do not show us the pages. How do we circumvent the media blackout? Well, with RSS.

I have quite a few RSS feeds and a custom-built feed reader. I curate and republish the news, but this is really the missing piece.

Okay, so where do I find your list of RSS feeds?

I run https://GreenMaps.US. I have quite a large stack of RSS feeds.

I am not quite sure how I will interact with this project, but I am certainly going to be thinking about it.
