Comments (5)
For more clarity on the duplicates issue: we require a "write"-enabled developer key (which must be approved) to submit new feeds to the index. Quite a few networks and platforms are auto-submitting their shows to us now, so the index is very clean. We aren't going to make this a free-for-all where someone can go rogue and just dump 5000 clone feeds in there. Nobody wants that. I've just finished up a roll-back feature where every addition gets attributed to a key and can be rolled back in batches if necessary. We want things very clean and we're moving slow to make sure that happens.
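To make the roll-back idea concrete, here is a minimal sketch of attributing each addition to a key so that everything one key added can be undone in a single batch. The `FeedIndex` class, its method names, and the in-memory storage are all hypothetical illustrations; nothing here describes how the actual index stores or rolls back feeds.

```python
from collections import defaultdict


class FeedIndex:
    """Hypothetical in-memory index that remembers which key added each feed."""

    def __init__(self):
        self.feeds = {}                   # feed URL -> API key that added it
        self.by_key = defaultdict(list)   # API key -> feed URLs it added

    def add_feed(self, api_key, feed_url):
        """Add a feed and attribute it to api_key. Returns False if already present."""
        if feed_url in self.feeds:
            return False
        self.feeds[feed_url] = api_key
        self.by_key[api_key].append(feed_url)
        return True

    def roll_back(self, api_key):
        """Remove every feed added by api_key in one batch; return what was removed."""
        removed = self.by_key.pop(api_key, [])
        for feed_url in removed:
            del self.feeds[feed_url]
        return removed


index = FeedIndex()
index.add_feed("trusted-key", "https://example.com/show.xml")
index.add_feed("rogue-key", "https://example.com/clone-1.xml")
index.add_feed("rogue-key", "https://example.com/clone-2.xml")
removed = index.roll_back("rogue-key")
```

After the roll-back, only the feed submitted under `trusted-key` remains in `index.feeds`.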
from docs-api.
I'll leave the technical explanations to Dave, but we have 10 years of RSS aggregation experience and have written many exceptions, etc., to combat this.
I don't see duplicates being a real big issue. They just have to be managed. There are several ways, and of course you're never going to get them all.
Regarding Broken Feeds: I'm not sure if Dave shared this with everyone, but I believe broken feeds will be removed after a specific number of tries over a period of time.
Regarding Duplicates: The goal of the whole project isn't to become a gatekeeper of the podcasting world; in fact, it's just the opposite. Duplicate feeds can arise from many factors. For example, when iTunes cracked down on keyword stuffing, many podcasts got removed and then were re-added. Some directories didn't remove the deleted feeds, so that created a large number of duplicates. Then you're going to get intentional duplicates, where the entire feed is pretty much the same but the feed URL is different. These are pretty easy to identify based on the episode titles, release dates, and episode lengths. The easiest way to detect duplicates would be to set the feed URL as a unique key. That's not a perfect solution, and of course people can add query strings to circumvent the system. Then there are what I call partial duplicates. I learned this weekend that there are just about 96,000 duplicated episodes, all related to Dungeons and Dragons. I suspect these come from fans who create custom feeds based on favorite game action. They don't duplicate the whole feed, just select episodes. Should they be removed? I think not. But to answer your question: yes, duplicate and dead feeds will be addressed.
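As a rough illustration of the two checks mentioned above (a unique key on the feed URL, and comparing episode titles, release dates, and lengths), here is a small Python sketch. The function names and the episode dictionary keys (`title`, `published`, `duration`) are assumptions made for the example, not fields of the real index. Dropping the query string is also an assumption: it catches the circumvention trick, but it would wrongly merge feeds that legitimately use query parameters.

```python
from urllib.parse import urlsplit


def normalize_feed_url(url):
    """Build a comparison key: ignore scheme, host case, query string,
    fragment, and trailing slash. (Assumed normalization rules.)"""
    parts = urlsplit(url.strip())
    return parts.netloc.lower() + parts.path.rstrip("/")


def episode_fingerprint(episode):
    """Identify an episode by title, release date, and length."""
    return (
        episode["title"].strip().casefold(),
        episode["published"],
        episode["duration"],
    )


def episode_overlap(feed_a, feed_b):
    """Jaccard similarity of two feeds' episode fingerprints (0.0 to 1.0)."""
    a = {episode_fingerprint(e) for e in feed_a}
    b = {episode_fingerprint(e) for e in feed_b}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


original = [
    {"title": f"Episode {n}", "published": f"2020-01-0{n}", "duration": 1800 + n}
    for n in range(1, 5)
]
full_clone = [dict(e) for e in original]   # same episodes under another URL
fan_feed = [dict(original[0])]             # a single hand-picked episode

same_key = normalize_feed_url("HTTPS://Example.com/feed/?utm=1") == \
    normalize_feed_url("http://example.com/feed")
clone_score = episode_overlap(original, full_clone)
partial_score = episode_overlap(original, fan_feed)
```

A full clone scores 1.0, while a fan feed that borrows one episode out of four scores 0.25, so a threshold somewhere in between could separate intentional duplicates from the partial duplicates that probably should stay.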
Thanks for addressing this Mike. Correct on all fronts.
Broken feeds: Each feed has an "errors" counter. Each time the feed is pulled (downloaded) and an error occurs, this counter is incremented based on the severity of the error. There is also a "parse_errors" counter for the same purpose on the parser side of things. The worse the error, the faster the counter goes up. When the puller error count tops 100, the feed gets marked as "dead", the aggregators stop pulling it regularly, and it gets relegated to a single "error" aggregator that just gives best effort. Most errors just increment by 1. ENOTFOUND and ECONNREFUSED increment by 10. 4xx HTTP statuses increment by 4. 5xx HTTP statuses increment by 5. If the best-effort aggregator ever brings a feed back from the dead, all the counters are reset to zero.
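The counter rules can be sketched in a few lines of Python. This is only an illustration of the increments described above, not the aggregator's real code: the `Feed` class, its method names, and the reading of "tops 100" as strictly greater than 100 are assumptions.

```python
from dataclasses import dataclass

DEAD_THRESHOLD = 100  # assumption: "tops 100" means strictly greater than 100
NETWORK_ERRORS = {"ENOTFOUND", "ECONNREFUSED"}


def error_increment(error_code=None, http_status=None):
    """Return how much one failed pull adds to the "errors" counter."""
    if error_code in NETWORK_ERRORS:
        return 10
    if http_status is not None and 400 <= http_status < 500:
        return 4
    if http_status is not None and 500 <= http_status < 600:
        return 5
    return 1  # most errors just increment by 1


@dataclass
class Feed:
    url: str
    errors: int = 0
    parse_errors: int = 0
    dead: bool = False

    def record_pull_error(self, error_code=None, http_status=None):
        """Bump the counter and mark the feed dead once it passes the threshold."""
        self.errors += error_increment(error_code, http_status)
        if self.errors > DEAD_THRESHOLD:
            self.dead = True

    def record_revival(self):
        """The best-effort aggregator pulled the feed successfully: reset everything."""
        self.errors = 0
        self.parse_errors = 0
        self.dead = False


feed = Feed("https://example.com/feed.xml")
for _ in range(10):
    feed.record_pull_error(error_code="ENOTFOUND")
dead_after_ten = feed.dead          # counter is exactly 100 at this point
feed.record_pull_error(http_status=404)
dead_after_eleven = feed.dead       # counter is now 104
```

Under these assumptions, a feed whose hostname no longer resolves is marked dead on its eleventh failed pull, while one that only hits occasional minor errors takes far longer to get there.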
Duplicates: I haven't worried too much about this so far. As Mike says, they should be fairly easy to spot by just doing comparisons. We'll have a script at some point that will sweep across and check for the obvious ones. I'm about to create a new API endpoint listing recent feeds added to the index. That'll be a good firehose for checking shenanigans too. I'm open to any and all bright ideas on this front.
Thank you very much for all the replies.
"We want things very clean and we're moving slow to make sure that happens."
This is what I hoped to hear. Duplicates like unofficial mirrors or old feeds without redirect (still returning a valid feed) can make the search function pretty much unusable for average users - at least from my experience with the gpodder.net search feature.