The problem is that false results could lead to subsequent errors in parsing and handling of the entire feed.
Maybe one option is to inject your own DOMDocument,
or to use a decorator that cleans up / recovers the feeds before the reader parses them.
Originally posted by @froschdesign at zendframework/zend-feed#73 (comment)
from laminas-feed.
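A minimal sketch of the decorator idea, assuming a small hypothetical interface. FeedSource, StringFeedSource and CleaningFeedSource are illustrative names only; they are not part of zend-feed or laminas-feed. The decorator wraps any source and applies a cleanup callback before the XML string reaches the reader.

```php
<?php
// Hypothetical interface: anything that can deliver raw feed XML.
interface FeedSource
{
    public function fetch(): string;
}

// Hypothetical source backed by an in-memory string (e.g. for tests).
final class StringFeedSource implements FeedSource
{
    private $xml;

    public function __construct(string $xml)
    {
        $this->xml = $xml;
    }

    public function fetch(): string
    {
        return $this->xml;
    }
}

// Hypothetical decorator: wraps another source and cleans up its output
// before the XML is handed to the feed reader.
final class CleaningFeedSource implements FeedSource
{
    private $inner;
    private $cleaner;

    public function __construct(FeedSource $inner, callable $cleaner)
    {
        $this->inner   = $inner;
        $this->cleaner = $cleaner;
    }

    public function fetch(): string
    {
        return ($this->cleaner)($this->inner->fetch());
    }
}
```

The cleaned string could then be passed to Reader::importString(); which cleanup steps belong in the callback is exactly the open question in this thread.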
You can use the recovery mode yourself:
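```php
<?php
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

use Zend\Feed\Reader\Feed\Atom;
use Zend\Feed\Reader\Feed\Rss;
use Zend\Feed\Reader\Reader;

// Import by URI using the reader's HTTP client
$httpClient = Reader::getHttpClient();
$response   = $httpClient->get(
    'https://github.com/zendframework/zend-feed/releases.atom'
);
$xmlString = $response->getBody();

// Create DOMDocument in recovery mode, so libxml tries to
// repair malformed markup instead of failing
$dom = new DOMDocument();
$dom->recover = true;
$dom->loadXML(trim($xmlString));

// Detect type (e.g. "rss-20", "atom-10")
$type = Reader::detectType($dom);

// Create reader; fail loudly instead of leaving $reader undefined
if (0 === strpos($type, 'rss')) {
    $reader = new Rss($dom, $type);
} elseif (0 === strpos($type, 'atom')) {
    $reader = new Atom($dom, $type);
} else {
    throw new RuntimeException("Unsupported feed type: {$type}");
}

var_dump($reader->getTitle()); // "Release notes from zend-feed"
```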
Originally posted by @froschdesign at zendframework/zend-feed#73 (comment)
Thanks for the help! This is indeed what I ended up doing:
https://gitlab.com/DeepRSS/Reader/blob/3667b1b10c11b9c067de1e3242f15eaf2a1de261/src/Core/Service/ZendReader/FeedParser.php#L35
Originally posted by @Isinlor at zendframework/zend-feed#73 (comment)
@Isinlor
Thanks for the fast response!
Can you provide a link to a feed which is malformed and needs the recovery mode?
Originally posted by @froschdesign at zendframework/zend-feed#73 (comment)
Here is one example: http://itbrokeand.ifixit.com/atom.xml
Code I used for testing:
```php
<?php
// Collect libxml errors instead of emitting PHP warnings
$libxmlErrflag = libxml_use_internal_errors(true);
// libxml_disable_entity_loader() is deprecated as of PHP 8.0,
// where external entity loading is disabled by default
if (\PHP_VERSION_ID < 80000) {
    $oldValue = libxml_disable_entity_loader(true);
}

$dom = new \DOMDocument;
//$dom->recover = true; // Allows parsing slightly malformed feeds
$status = $dom->loadXML(file_get_contents("http://itbrokeand.ifixit.com/atom.xml"));

if (!$status) {
    // Build error message
    $error = libxml_get_last_error();
    if ($error instanceof \LibXMLError && $error->message != '') {
        $errormsg = 'DOMDocument cannot parse XML: ' . trim($error->message);
    } else {
        $errormsg = "DOMDocument cannot parse XML: Please check the XML document's validity";
    }
    throw new Exception($errormsg);
}
```
Originally posted by @Isinlor at zendframework/zend-feed#73 (comment)
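The test code above reports only the last libxml error. When collecting problem feeds for test scenarios, it can help to gather every error with its line number. A minimal sketch; collectXmlErrors() is a hypothetical helper, not a zend-feed API:

```php
<?php
/**
 * Parse $xml strictly and return every libxml error as "line N: message".
 * An empty array means the document is well-formed.
 * Expects a non-empty string (loadXML() rejects empty input).
 */
function collectXmlErrors(string $xml): array
{
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument();
    $dom->loadXML($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore the caller's libxml state
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}
```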
@Isinlor
Perfect, this helps a lot. I am collecting various problems to create some test scenarios.
Originally posted by @froschdesign at zendframework/zend-feed#73 (comment)
I think your initial reaction was correct:
"The problem is that false results could lead to subsequent errors in parsing and handling of the entire feed."
I missed this when I was working on it myself. Even though $dom->recover = true;
seems to work, Zend Feed is not able to handle the result correctly.
I'm really curious how Firefox handles it, because I have no issues when I open:
- https://blog.noredink.com/rss
- http://itbrokeand.ifixit.com/atom.xml
- http://aasnova.org/feed/
- https://blog.floydhub.com/rss/
Originally posted by @Isinlor at zendframework/zend-feed#73 (comment)
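Because recover = true makes loadXML() succeed even when libxml had to repair the markup, the caller cannot tell afterwards whether the document was altered. One way to keep that visible is to try a strict parse first and fall back to recovery only on failure, reporting which path was taken. A minimal sketch; loadFeedXml() is a hypothetical helper:

```php
<?php
/**
 * Try a strict parse first; only if that fails, retry with recover = true.
 * Returns [DOMDocument|null, bool $recovered] so the caller can decide
 * whether a repaired document is acceptable.
 */
function loadFeedXml(string $xml): array
{
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    if ($dom->loadXML($xml)) {
        $result = [$dom, false];
    } else {
        // Strict parse failed: retry in recovery mode
        libxml_clear_errors();
        $dom = new DOMDocument();
        $dom->recover = true;
        $result = [$dom->loadXML($xml) ? $dom : null, true];
    }

    // Restore the caller's libxml state
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $result;
}
```

A caller could, for example, log or reject feeds where $recovered is true instead of silently passing a repaired document to the reader.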
@Isinlor
I will check all links this evening and give feedback.
Originally posted by @froschdesign at zendframework/zend-feed#73 (comment)
https://blog.noredink.com/rss
There were some problems, but now I cannot find any.
http://itbrokeand.ifixit.com/atom.xml
The problem is <title>Web Operations D&D</title>: the unescaped & makes the document not well-formed. This should be reported to ifixit.com; anything else means ugly replacements.
(Also fails in a browser.)
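For illustration, this is roughly what such an "ugly replacement" could look like: a regular expression that escapes every & that does not start a character or entity reference. A sketch only; escapeBareAmpersands() is hypothetical, and it would also rewrite & inside CDATA sections and comments, and cannot fix undefined entities such as &nbsp;.

```php
<?php
/**
 * Replace "&" characters that do not begin a character reference
 * (&#38; / &#x26;) or a syntactically valid entity reference (&amp;)
 * with "&amp;". Caveat: also alters "&" inside CDATA and comments.
 */
function escapeBareAmpersands(string $xml): string
{
    return preg_replace(
        '/&(?![A-Za-z_][A-Za-z0-9._-]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)/',
        '&amp;',
        $xml
    );
}
```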
http://aasnova.org/feed/
Two problems: a 403 response and a wrong header.
(Also fails in a browser. [Download])
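Assuming "wrong header" refers to the Content-Type response header, a client could check the status code and media type before handing the body to the parser. A sketch under that assumption; isLikelyFeedResponse() is hypothetical and the list of accepted media types is an assumption, not something zend-feed enforces:

```php
<?php
/**
 * Hypothetical pre-check before parsing: reject non-2xx responses and
 * media types that are unlikely to contain a feed.
 */
function isLikelyFeedResponse(int $status, string $contentType): bool
{
    if ($status < 200 || $status >= 300) {
        return false; // e.g. the 403 returned by aasnova.org
    }

    // Drop parameters such as "; charset=UTF-8" and normalise case
    $mediaType = strtolower(trim(explode(';', $contentType)[0]));

    return in_array($mediaType, [
        'application/rss+xml',
        'application/atom+xml',
        'application/xml',
        'text/xml',
    ], true);
}
```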
https://blog.floydhub.com/rss/
Many feeds contain characters outside the legal XML character range.
Try the following preg_replace:
```php
$string = preg_replace(
    '/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u',
    ' ',
    $string
);
```
This should eliminate problems like "CData section not finished".
(Also fails in a browser.)
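The replacement above can be wrapped in a small helper. The character class mirrors the XML 1.0 Char production; with the /u modifier, preg_replace() returns null for invalid UTF-8 input, so this sketch (stripInvalidXmlChars() is a hypothetical name) falls back to the unchanged string in that case, and parsing will then still fail.

```php
<?php
/**
 * Replace each run of characters outside the XML 1.0 "Char" production
 * with a single space. Expects UTF-8 input; on invalid UTF-8,
 * preg_replace() returns null and the input is returned unchanged.
 */
function stripInvalidXmlChars(string $string): string
{
    $clean = preg_replace(
        '/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u',
        ' ',
        $string
    );

    return $clean ?? $string;
}
```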
Thanks for the examples. At the moment I do not know whether we should do something in zend-feed, because it opens the door to many pitfalls and ugly workarounds. I see the benefit for users, but also the maintenance burden.
I remain open to suggestions and improvements.
Originally posted by @froschdesign at zendframework/zend-feed#73 (comment)