davorg-cpan / xml-feed

The CPAN module XML::Feed

xml-feed's Introduction

This is XML::Feed, an abstraction above the RSS and Atom syndication
feed formats. It supports both parsing and autodiscovery of feeds.
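
As a rough illustration of the parsing side, here is a minimal sketch. The inline RSS document is made up for the example; the calls used (parse, errstr, title, entries, link) are the ones that appear in the reports further down this page or in the module's documentation.

```perl
use strict;
use warnings;
use XML::Feed;

# A tiny inline RSS 2.0 document, invented for this example.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://example.com/</link>
    <description>An example</description>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
    </item>
  </channel>
</rss>
XML

# parse() is given a reference to the XML string here;
# a filename or a URI object can be passed instead.
my $feed = XML::Feed->parse(\$xml)
    or die XML::Feed->errstr;

print $feed->title, "\n";
for my $entry ($feed->entries) {
    print $entry->title, ' -> ', $entry->link, "\n";
}
```

The same accessors are meant to work whether the underlying document is RSS or Atom, which is the point of the abstraction.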

PREREQUISITES

    * Class::ErrorHandler
    * XML::RSS
    * XML::Atom
    * DateTime
    * DateTime::Format::Mail
    * DateTime::Format::W3CDTF
    * List::Util
    * Feed::Find
    * URI::Fetch

INSTALLATION

XML::Feed installation is straightforward. If your CPAN shell
is set up, you should just be able to do

    % perl -MCPAN -e 'install XML::Feed'

Alternatively, you can download it, unpack it, and then build it like this
(using Module::Build):

    % perl Build.PL
    % ./Build installdeps
    % ./Build
    % ./Build test

Then install it:

    % ./Build install

Six Apart / [email protected]

xml-feed's Issues

error reading html content inside rss feed

I'm trying to parse and read the content of an RSS feed and am getting an error.

XML feed for testing:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:Test="http://www.Test.com">
<channel>
    <title>Deportes - Test.com</title>
       <link>http://www.Test.com</link>
           <description>Últimas noticias de deportes</description>
           <item>
                    <title><![CDATA[El 'Chacho' Coudet es el nuevo entrenador de Rosario Central]]></title>
                    <link>http://442.Test.com/2014-12-15-326653-coudet-fue-presentado-como-nuevo-dt-de-central/</link>
                    <description><![CDATA[El&nbsp;Chacho, Tengo mucha alegr&iacute;a y ganas de empezar a trabajar.&nbsp;No esperaba que sea ac&aacute;&rdquo;, reconoci&oacute;.]]></description>
                    <category><![CDATA[Deportes]]></category>
                    <pubDate>15 12 2014 06:15:0 +0000</pubDate>
                    <enclosure url="http://www.Test.com/__export/1418678333348/sites/diarioTest/img/2014/12/15/deportes/1215_coudet_g_fb.jpg" type="image/jpeg"><![CDATA[El Chacho Coudet]]></enclosure>
                    <author><![CDATA[]]></author>
                    <content><![CDATA[<p>Eduardo Coudet&nbsp;fue presentado como nuevo entrenador         deRosario Central&nbsp.</p>
]]></content>
        </item>
    </channel>
</rss>

This is my test.pl script file.

#!/usr/bin/perl
use strict;
use warnings;
use XML::Feed;

my $feed = XML::Feed->parse("test.xml");

for my $entry ($feed->entries) {

   print $entry->content;

}

When I run this code, I get this error:

Can't use string ("<p>Eduardo Coudet&nbsp;fue prese"...) as a HASH ref while "strict refs" in use at C:/Strawberry/perl/site/lib/XML/Feed/Entry/Format/RSS.pm line 91.

I think this is a bug in XML::Feed.

Add version number to every module

Would it be ok to add our $VERSION = '0.51'; to every module?
That will eliminate the confusion in the CPAN client that tells me things like this:

XML::Feed::Content undef 0 DAVECROSS/XML-Feed-0.51.tar.gz
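
For illustration, a minimal sketch of what the request amounts to. The package name My::Feed::Content is a hypothetical stand-in, not one of the distribution's modules; the point is only that a package-level $VERSION is what the VERSION method (and so a CPAN client) reports.

```perl
use strict;
use warnings;

# Hypothetical package standing in for something like XML::Feed::Content.
package My::Feed::Content {
    our $VERSION = '0.51';
    sub new { return bless {}, shift }
}

# UNIVERSAL::VERSION reads the package's $VERSION variable.
print My::Feed::Content->VERSION, "\n";
```

Without the `our $VERSION` line the same call returns undef, which matches the "undef" column in the client output above.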

URI::Fetch dependency has critical unresolved bug

Currently the module cannot be installed on a clean, up-to-date Perl because URI::Fetch fails during the Configure phase: https://rt.cpan.org/Public/Bug/Display.html?id=133491

It looks like URI::Fetch is only used as a simple wrapper for checking the 410 status:
https://github.com/davorg/xml-feed/blob/572b8de969454d9a8b95e75eb28a282ab8f4c56e/lib/XML/Feed.pm#L99

Is there any specific reason to have both LWP::UserAgent and URI::Fetch as dependencies?
With all respect to the URI::Fetch author: the last release was 5 years ago, the critical bug has been unresolved for a week, and the module has no modern CI setup that would warn about such situations. This suggests that the module is stalled.
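
To make the suggestion concrete, here is a rough sketch of a 410 check done directly on an HTTP response. It uses the core HTTP::Tiny module purely so the example has no extra dependencies; that choice, the function name fetch_feed and the returned hash shape are assumptions for illustration, not what XML::Feed does or plans to do.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Fetch a feed and report whether the server says it is permanently gone.
# $ua is any object with an HTTP::Tiny-compatible get() method (assumption).
sub fetch_feed {
    my ($url, $ua) = @_;
    $ua ||= HTTP::Tiny->new(timeout => 10);
    my $res = $ua->get($url);

    return { gone => 1 } if $res->{status} == 410;    # HTTP 410 Gone
    die "Fetch failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return { gone => 0, content => $res->{content} };
}
```

The same status check could presumably be written against an LWP::UserAgent response, since that module is already a dependency.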

Looser date/time parsing should be optional

There are a few places where we're rather lax in our parsing of date/time strings. I know Postel's law, but it might be nice to make this optional.

See lib/XML/Feed/Util.pm:

* Use of DateTime::Format::Flexible
* Use of the loose option on the DateTime::Format::Mail constructor
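
One way the opt-in could look, sketched with the core Time::Piece module rather than the DateTime formatters named above (a substitution made only to keep the example dependency-free). The parse_date name, the loose option and the format lists are all hypothetical.

```perl
use strict;
use warnings;
use Time::Piece;

# Formats tried in every case (RFC 822 style with a numeric offset).
my @STRICT = ('%a, %d %b %Y %H:%M:%S %z');
# Extra formats tried only when the caller opts in.
my @LOOSE  = ('%B %d, %Y', '%Y-%m-%d');

# Returns a Time::Piece object, or undef if nothing matched.
sub parse_date {
    my ($string, %opt) = @_;
    my @formats = (@STRICT, $opt{loose} ? @LOOSE : ());
    for my $format (@formats) {
        # strptime croaks when the string does not match the format.
        my $t = eval { Time::Piece->strptime($string, $format) };
        return $t if $t;
    }
    return;
}
```

With this shape, parse_date('May 17, 2010') yields undef by default, and only parse_date('May 17, 2010', loose => 1) attempts the more forgiving formats.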

unused $Id$ tags?

Are those $Id$ in the code still in use or can they be removed?

Wrong datetime format for datetime with non-UTC time zone

The XML::Feed::Util::format_w3cdtf function returns the wrong value for a DateTime object with a non-UTC time zone.

use feature 'say';
use DateTime;
use XML::Feed::Util qw(format_w3cdtf);

say format_w3cdtf(DateTime->now(time_zone => 'Asia/Tokyo'));
# Actual:   2021-05-20T18:02:33+09:00Z
# Expected: 2021-05-20T18:02:33+09:00
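
The expected behaviour can be sketched as a small helper that appends "Z" only when the offset is zero and a signed hh:mm offset otherwise. This is an illustration of the rule, not the code in XML::Feed::Util; the w3cdtf_zone name is made up.

```perl
use strict;
use warnings;

# Build the W3CDTF zone designator from a UTC offset in seconds.
sub w3cdtf_zone {
    my ($offset) = @_;
    return 'Z' if $offset == 0;
    my $sign = $offset < 0 ? '-' : '+';
    my $abs  = abs $offset;
    return sprintf '%s%02d:%02d', $sign, int($abs / 3600), int(($abs % 3600) / 60);
}

# Asia/Tokyo is UTC+9, so the designator is +09:00 with no trailing Z.
print '2021-05-20T18:02:33', w3cdtf_zone(9 * 3600), "\n";
```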

Date format invalidates atom feeds

See https://rt.cpan.org/Ticket/Display.html?id=48337 for details

Relative Link attributes not being processed properly [rt.cpan.org #53661]

Migrated from rt.cpan.org#53661 (status was 'open')

Requestors:

[email protected]

From [email protected] on 2010-01-13 18:00:01:

The link attribute does not get processed correctly from:  
http://earthquake.usgs.gov/earthquakes/catalogs/7day-M5.xml

Notice the links are relative and should be resolved against the base 
attribute.  To test, here is a subset of the XML file.  Just copy it into a 
file named 7day-M5.xml

<?xml version="1.0"?>
<feed xml:base="http://earthquake.usgs.gov/" 
xmlns="http://www.w3.org/2005/Atom" 
xmlns:georss="http://www.georss.org/georss">
  <updated>2010-01-13T17:24:37Z</updated>
  <title>USGS M5+ Earthquakes</title>
  <subtitle>Real-time, worldwide earthquake list for the past 7 
days</subtitle>
  <link rel="self" href="/earthquakes/catalogs/7day-M5.xml"/>
  <link href="http://earthquake.usgs.gov/earthquakes/"/>
  <author><name>U.S. Geological Survey</name></author>
  <id>http://earthquake.usgs.gov/</id>
  <icon>/favicon.ico</icon>
  <entry><id>urn:earthquake-usgs-gov:us:2010rkb8</id><title>M 5.3, 
Tonga</title><updated>2010-01-13T16:21:24Z</updated><link 
rel="alternate" type="text/html" 
href="/earthquakes/recenteqsww/Quakes/us2010rkb8.php"/><summary 
type="html"><![CDATA[<img 
src="http://earthquake.usgs.gov/images/globes/-15_-175.jpg" 
alt="15.741&#176;S 174.695&#176;W" align="left" hspace="20" 
/><p>Wednesday, January 13, 2010 16:21:24 UTC<br>Thursday, January 14, 
2010 06:21:24 AM at epicenter</p><p><strong>Depth</strong>: 10.00 km 
(6.21 mi)</p>]]></summary><georss:point>-15.7409 
-174.6951</georss:point><georss:elev>-10000</georss:elev><category 
label="Age" term="Past day"/></entry>
</feed>


And run this command
 perl -MXML::Feed -e 'my $feed = XML::Feed->parse("7day-M5.xml"); 
foreach my $e ($feed->entries) { print  $e->link , "\n"; } '


I could not figure out if this bug lies in XML::Feed or XML::Atom.  I 
ran the code through the debugger but for the life of me could not tell 
how it works.

Thanks for your time.


Jason

From [email protected] on 2016-02-13 09:59:48:

On Wed Jan 13 13:00:01 2010, [email protected] wrote:
> The link attribute does not get processed correctly from:  
> http://earthquake.usgs.gov/earthquakes/catalogs/7day-M5.xml
> 
> Notice the links are relative and should be appended by the base 
> attribute.  To test here is a subset of the XML file.  Just copy into a 
> file name 7day-M5.xml
> 
> <?xml version="1.0"?>
> <feed xml:base="http://earthquake.usgs.gov/" 
> xmlns="http://www.w3.org/2005/Atom" 
> xmlns:georss="http://www.georss.org/georss">
>   <updated>2010-01-13T17:24:37Z</updated>
>   <title>USGS M5+ Earthquakes</title>
>   <subtitle>Real-time, worldwide earthquake list for the past 7 
> days</subtitle>
>   <link rel="self" href="/earthquakes/catalogs/7day-M5.xml"/>
>   <link href="http://earthquake.usgs.gov/earthquakes/"/>
>   <author><name>U.S. Geological Survey</name></author>
>   <id>http://earthquake.usgs.gov/</id>
>   <icon>/favicon.ico</icon>
>   <entry><id>urn:earthquake-usgs-gov:us:2010rkb8</id><title>M 5.3, 
> Tonga</title><updated>2010-01-13T16:21:24Z</updated><link 
> rel="alternate" type="text/html" 
> href="/earthquakes/recenteqsww/Quakes/us2010rkb8.php"/><summary 
> type="html"><![CDATA[<img 
> src="http://earthquake.usgs.gov/images/globes/-15_-175.jpg" 
> alt="15.741&#176;S 174.695&#176;W" align="left" hspace="20" 
> /><p>Wednesday, January 13, 2010 16:21:24 UTC<br>Thursday, January 14, 
> 2010 06:21:24 AM at epicenter</p><p><strong>Depth</strong>: 10.00 km 
> (6.21 mi)</p>]]></summary><georss:point>-15.7409 
> -174.6951</georss:point><georss:elev>-10000</georss:elev><category 
> label="Age" term="Past day"/></entry>
> </feed>
> 
> 
> And run this command
>  perl -MXML::Feed -e 'my $feed = XML::Feed->parse("7day-M5.xml"); 
> foreach my $e ($feed->entries) { print  $e->link , "\n"; } '
> 
> 
> I could not figure out if this bug lies in XML::Feed or XML::Atom.  I 
> ran the code through the debugger but for the life of me could not tell 
> how it works.

It's not clear to me that this is a bug. Is there some standard which says that we should be returning absolute links if the feed contains relative links?

But whether or not the current method does the right thing, we are simply passing on the value that we get from XML::Atom. Adapting your example, we get:

$ perl -MXML::Atom::Feed -e 'my $feed = XML::Atom::Feed->new("7day-M5.xml"); foreach my $e ($feed->entries) { print $e->link->href , "\n"; } '
/earthquakes/recenteqsww/Quakes/us2010rkb8.php

So if there is a bug, it is a bug in XML::Atom and should be reported there.

Dave...

From [email protected] on 2016-02-13 10:05:28:

On Wed Jan 13 13:00:01 2010, [email protected] wrote:
> The link attribute does not get processed correctly from:  
> http://earthquake.usgs.gov/earthquakes/catalogs/7day-M5.xml
> 
> Notice the links are relative and should be appended by the base 
> attribute.  To test here is a subset of the XML file.  Just copy into a 
> file name 7day-M5.xml
> 
> <?xml version="1.0"?>
> <feed xml:base="http://earthquake.usgs.gov/" 
> xmlns="http://www.w3.org/2005/Atom" 
> xmlns:georss="http://www.georss.org/georss">
>   <updated>2010-01-13T17:24:37Z</updated>
>   <title>USGS M5+ Earthquakes</title>
>   <subtitle>Real-time, worldwide earthquake list for the past 7 
> days</subtitle>
>   <link rel="self" href="/earthquakes/catalogs/7day-M5.xml"/>
>   <link href="http://earthquake.usgs.gov/earthquakes/"/>
>   <author><name>U.S. Geological Survey</name></author>
>   <id>http://earthquake.usgs.gov/</id>
>   <icon>/favicon.ico</icon>
>   <entry><id>urn:earthquake-usgs-gov:us:2010rkb8</id><title>M 5.3, 
> Tonga</title><updated>2010-01-13T16:21:24Z</updated><link 
> rel="alternate" type="text/html" 
> href="/earthquakes/recenteqsww/Quakes/us2010rkb8.php"/><summary 
> type="html"><![CDATA[<img 
> src="http://earthquake.usgs.gov/images/globes/-15_-175.jpg" 
> alt="15.741&#176;S 174.695&#176;W" align="left" hspace="20" 
> /><p>Wednesday, January 13, 2010 16:21:24 UTC<br>Thursday, January 14, 
> 2010 06:21:24 AM at epicenter</p><p><strong>Depth</strong>: 10.00 km 
> (6.21 mi)</p>]]></summary><georss:point>-15.7409 
> -174.6951</georss:point><georss:elev>-10000</georss:elev><category 
> label="Age" term="Past day"/></entry>
> </feed>
> 
> 
> And run this command
>  perl -MXML::Feed -e 'my $feed = XML::Feed->parse("7day-M5.xml"); 
> foreach my $e ($feed->entries) { print  $e->link , "\n"; } '
> 
> 
> I could not figure out if this bug lies in XML::Feed or XML::Atom.  I 
> ran the code through the debugger but for the life of me could not tell 
> how it works.

I have just checked and, given a feed containing relative links, XML::RSS has exactly the same behaviour (the relative links are not converted to absolute links). I'm therefore becoming more convinced that our current behaviour is correct.

Dave...
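
For callers who do want absolute links regardless of which module is "right", one option is to resolve them on the caller's side. The sketch below uses URI->new_abs; the absolute_link helper is hypothetical, and it assumes the caller already has the base value (for example the feed's xml:base) at hand.

```perl
use strict;
use warnings;
use URI;

# Resolve an href against a base URI; leave it untouched if no base is known.
sub absolute_link {
    my ($href, $base) = @_;
    return $href unless defined $base && length $base;
    return URI->new_abs($href, $base)->as_string;
}

print absolute_link('/earthquakes/recenteqsww/Quakes/us2010rkb8.php',
                    'http://earthquake.usgs.gov/'), "\n";
```

An href that is already absolute passes through unchanged, so the helper is safe to apply to every link.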

XML External Entities Vulnerability

This module is vulnerable to an XML External Entity (XXE) exploit, as described here:

http://mikeknoop.com/lxml-xxe-exploit/

Try parsing the following feed on a Linux system and you'll see the contents of your /etc/passwd included in the output:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE title [ <!ELEMENT title ANY >
<!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
    <title>The Blog</title>
    <link>http://example.com/</link>
    <description>A blog about things</description>
    <lastBuildDate>Mon, 03 Feb 2014 00:00:00 -0000</lastBuildDate>
    <item>
        <title>&xxe;</title>
        <link>http://example.com</link>
        <description>a post</description>
        <author>[email protected]</author>
        <pubDate>Mon, 03 Feb 2014 00:00:00 -0000</pubDate>
    </item>
</channel>
</rss>
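
Until the parser itself is configured not to expand external entities, a caller could apply a crude pre-parse guard like the sketch below. It is only an illustration: the declares_dtd name is made up, the check can give false positives (for example a DOCTYPE string inside CDATA), and it is not a substitute for fixing the underlying parser settings.

```perl
use strict;
use warnings;

# Crude pre-parse guard: true if the document declares a DOCTYPE or an ENTITY.
sub declares_dtd {
    my ($xml) = @_;
    return $xml =~ /<!(?:DOCTYPE|ENTITY)\b/ ? 1 : 0;
}

my $suspicious = '<!DOCTYPE title [ <!ENTITY xxe SYSTEM "file:///etc/passwd" >]><rss/>';
print declares_dtd($suspicious) ? "refuse to parse\n" : "ok to parse\n";
```

Feeds rarely need an inline DTD, so rejecting such documents outright is a plausible stopgap for untrusted input.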

[email protected]

Attachments:

signature.asc

From [email protected] on 2009-02-03 19:32:07:

XML::Atom has a bizarre API where by default, text is returned as a string of
UTF-8 bytes without the Unicode flag set. XML::RSS::Feed doesn't do this.

To make the output of XML::Feed the same in both cases, XML::Feed should
probably use "{ local $XML::Atom::ForceUnicode = 1; ... }" around each read
access to the XML::Atom object's accessor functions, resulting in a
switch to Unicode output that matches XML::RSS::Feed.

This bug breaks IkiWiki <http://ikiwiki.info/> when aggregating Atom
feeds; it ends up "double-escaping" the entries as they're written into the
cache. For instance, U+2019 (decimal 8217), the closing single quote, goes into the cache file as
the 6-byte sequence "\xC3\xA2\xC2\x80\xC2\x99", rather than the correct 3-byte
sequence "\xE2\x80\x99"; the effect is as if the string was encoded as
UTF-8, decoded as Latin-1, then encoded as UTF-8 again.

    Simon
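
The byte sequences described above can be reproduced with the core Encode module. This is a standalone demonstration of the double-encoding effect, not code taken from XML::Feed, XML::Atom or IkiWiki.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $quote = "\x{2019}";                  # RIGHT SINGLE QUOTATION MARK
my $bytes = encode('UTF-8', $quote);     # what a byte-oriented API hands back

# Treating those bytes as Latin-1 characters and encoding again doubles them up.
my $double = encode('UTF-8', decode('ISO-8859-1', $bytes));

# Decoding once first gives a character string that encodes correctly.
my $clean = encode('UTF-8', decode('UTF-8', $bytes));

printf "%d bytes, or %d bytes after double encoding\n",
    length($bytes), length($double);
```

Here $bytes is the 3-byte sequence and $double is the 6-byte one quoted in the report, while $clean matches $bytes again.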

From [email protected] on 2009-07-07 12:52:13:

I too have the same problem. And setting $XML::Atom::ForceUnicode = 1;
fixes this for me. But I'm afraid that it's a global variable and I
can't set it in my module AnyEvent::Feed which uses XML::Feed.

Greetings,
   Robin

From [email protected] on 2009-11-17 02:02:41:

Hmm, I'm not entirely sure what the best way to handle this is - setting
ForceUnicode is kind of a nuclear option which could screw up other
modules in, say, a mod_perl environment.

I'm talking to Tatsuhiko Miyagawa about it and I'll get back to you.

From [email protected] on 2010-05-20 19:18:50:

On Mon Nov 16 21:02:41 2009, SIMONW wrote:
> Hmm, I'm not entirely sure what the best way to handle this is - setting
> ForceUnicode is kind of a nuclear option which could screw up other
> modules in, say, a mod_perl environment.
> 
> I'm talking to Tatsuhiko Miyagawa about it and I'll get back to you.

I discovered this solution myself. I'd love to see XML::Atom have an object attribute to force 
decoding to utf8. Frankly, it should be enabled by default.

Best,

David

From [email protected] on 2011-11-24 11:28:26:

Hi all,

I've been bitten by this bug myself now when trying to combine my
blogs.perl.org's blog feed, which is only provided in Atom (why??), into
the rest of the feeds.  The ForceUnicode setting workaround that is
described in this thread works nicely, but there should be a more
permanent solution.

Regards,

-- Shlomi Fish

From [email protected] on 2011-11-24 11:37:42:

On Tue Feb 03 14:32:07 2009, [email protected] wrote:
> XML::Atom has a bizarre API where by default, text is returned as a
> string of UTF-8 bytes without the Unicode flag set. XML::RSS::Feed 
> doesn't do this.
> 
> To make the output of XML::Feed the same in both cases, XML::Feed
> should probably use "{ local $XML::Atom::ForceUnicode = 1; ... }" 
> around each read access to the XML::Atom object's accessor functions, 
> resulting in a switch to Unicode output that matches XML::RSS::Feed.
> 
> This bug breaks IkiWiki <http://ikiwiki.info/> when aggregating Atom
> feeds; it ends up "double-escaping" the entries as they're written
> into the cache. For instance, U+2019 (decimal 8217), the closing single quote, goes into 
> the cache file as the 6-byte sequence "\xC3\xA2\xC2\x80\xC2\x99", 
> rather than the correct 3-byte sequence "\xE2\x80\x99"; the effect is 
> as if the string was encoded as UTF-8, decoded as Latin-1, then 
> encoded as UTF-8 again.
> 
>    Simon

Does it make sense to discuss this here? Isn't it a bug in XML::Atom?

Or am I misunderstanding?

Dave...

From [email protected] on 2011-11-24 12:01:22:

On Thu, 24 Nov 2011 at 06:37:43 -0500, Dave Cross via RT wrote:
> Does it make sense to discuss this here? Isn't it a bug in XML::Atom?
> 
> Or am I misunderstanding?

I agree that this needs discussion with the author of XML::Atom. I don't
know how you Cc people "correctly" in RT, it's not a bug tracker I'm
particularly familiar with.

As far as I'm concerned, the bug in X::F is that it doesn't produce the
same data type for RSS and Atom feeds (breaking encapsulation), and the
underlying bugs in X::A that make it hard for X::F to do the right
thing are:

1) produces a byte-string of UTF-8, rather than a Unicode string, by default
   (might not be considered to be a bug, since it's documented in
   XML::Atom::Feed; or might be considered to be a bug but unfixable, since
   that would be an API break)

2) can only be directed to produce Unicode by setting a global variable
   (this is an API design problem, rather than not behaving as documented)

Three possible solutions:

* If (1) is considered to be a bug, make XML::Atom::ForceUnicode the default,
  and XML::Feed doesn't need any changes; requires changes to X::A only.

* If (1) is as designed or is unfixable, fix (2) instead (e.g. add
  $feed->unicode(1) setter) and then change XML::Feed to use it; requires
  changes to both X::A and X::F. I'd be inclined to say this one is the
  most correct.

* If (1) is as designed, postprocess the XML::Atom output through
  Encode::decode('utf-8', $bytes) in XML::Feed; requires changes to X::F only,
  but will break if (1) is changed in a later version of X::A.

Which one is correct is up to you and the author of XML::Atom.

For now, IkiWiki sets "local $XML::Atom::ForceUnicode = 1" around each
invocation of XML::Feed, because we know that it's single-threaded, so the
usual problems with global variables are less of a concern. I realise this
would be unacceptable in a library, though.

    S
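
The "local" workaround mentioned above relies on Perl's dynamic scoping of package globals. A minimal sketch of the pattern follows; My::Atom is a hypothetical stand-in for a library configured through a global flag, and with_unicode is an invented helper name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a library that is configured through a package global.
package My::Atom {
    our $ForceUnicode = 0;
    sub mode { return $ForceUnicode ? 'unicode' : 'bytes' }
}

# Run a callback with the flag switched on, restoring the old value afterwards.
sub with_unicode {
    my ($code) = @_;
    local $My::Atom::ForceUnicode = 1;
    return $code->();
}

print with_unicode(sub { My::Atom::mode() }), "\n";   # flag is on inside the call
print My::Atom::mode(), "\n";                          # and restored afterwards
```

The flag is still process-wide while the callback runs, which is why this is tolerable in a single-threaded program but awkward inside a library.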

Reduce code complexity

Some of the files have ridiculously high scores for code complexity.

See https://kritika.io/users/davorg/repos/2672765739262019/

Would be good to reduce some of these. Below 10 is good!

Re: XML::Feed Date Parsing [rt.cpan.org #57730]

Migrated from rt.cpan.org#57730 (status was 'new')

Requestors:

[email protected]

Attachments:

xml-feed-date-parsing.patch

From [email protected] on 2010-05-21 19:07:14:

Here's a patch. I extracted some shitty dates from some feeds I'm parsing, plus threw in a bunch of others. The test ensures that they all work in pubDate, dc:date, dcterms:date, dcterms:modified, and atom:updated. It adds dependencies on DateTime::Format::ISO8601, DateTime::Format::Flexible, and DateTime::Format::Natural.

I didn't add a parameter, as there don't seem to be any real attributes to use. Maybe I've missed something?

I've Cc'd RT so that it doesn't get lost in the shuffle.

What do you think?

Best,

David

On May 20, 2010, at 4:05 PM, David E. Wheeler wrote:

> Hi Simon,
> 
> I'm using XML::Feed for a project. It's so nice not to have to worry about all the variations in feeds. Many thanks to you and SixApart for the great module.
> 
> One place where I do have to worry, though, is with dates. There are a lot of feeds out there with invalid date formats. Take http://bestwebgallery.com/feed/ for example. It has this:
> 
> 		<pubDate>May 17, 2010</pubDate>
> 
> Irritating. I fully expect to find a lot more shitty dates. Alas, with a date like this, issued() returns undef. I'd really like to make a best effort to get at dates in all formats, as I could really use it for proper(ish) sorting.
> 
> I noticed this test in t/01-parse.t:
> 
>    $feed = XML::Feed->parse('t/samples/rss10-invalid-date.xml')
>        or die XML::Feed->errstr;
>    $entry = ($feed->entries)[0];
>    ok(!$entry->issued);   ## Should return undef, but not die.
>    ok(!$entry->modified); ## Same.
> 
> So I guess that you want to be strict by default. So what I'm thinking is adding an attribute to XML::Feed to be looser when parsing dates. If it's set to true (false by default), then it would also try DateTime::Format::Natural or perhaps DateTime::Format::Flexible. Would you be interested in such a patch?
> 
> If so, looking at Format::RSS, I see that it first tries {dc}{date} and then {PubDate}. Should I continue with that approach? Or maybe try both strict first, and then try them both again more loosely?
> 
> Thanks,
> 
> David
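
The field order discussed in the quoted message (try {dc}{date} first, then the pubDate element) can be sketched as a small lookup over the item hash. The raw_date helper is hypothetical, and the key names are taken from the item dump that appears elsewhere on this page rather than from the module's source.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Return the first non-empty date string from an RSS item hash,
# checking {dc}{date} before {pubDate}.
sub raw_date {
    my ($item) = @_;
    my @candidates = (
        (ref($item->{dc}) eq 'HASH' ? $item->{dc}{date} : undef),
        $item->{pubDate},
    );
    return first { defined($_) && length($_) } @candidates;
}
```

Whatever string this returns would then go through the strict parser first and, only if requested, through a looser one.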

Can't call method "items" on an undefined value at t/26-content-encoded.t line 11.
t/26-content-encoded.t .......... 
Dubious, test returned 2 (wstat 512, 0x200)
No subtests run 
...
Can't call method "items" on an undefined value at t/28-rss-guid.t line 9.
t/28-rss-guid.t ................. 
Dubious, test returned 2 (wstat 512, 0x200)
No subtests run

[email protected]

From [email protected] on 2012-04-21 07:02:08:

http://feeds.news.aol.com/synfeeds/artsynop/2604/rss.xml - 

Can't use string ("<a name="836437"></a><div class=") as a HASH ref while "strict refs" in 
use at /usr/lib/perl5/site_perl/5.8.8/XML/Feed/Entry/Format/RSS.pm line 60.


Looking at the code, $item->{content} should be a hash, but it's not:
$VAR1 = {
          'isPermaLink' => '',
          'link' => 'http://www.sphere.com/nation/article/attack-calls-into-question-the-
practice-of-using-afghans-to-guard-us-bases/19299603',
          'dc' => {
                    'date' => '2010-01-01T17:33:34Z'
                  },
          'content' => '<a name="836437"></a><div class="hentry a836437" reltag="National 
News">    <div class="synpTtlArt"><img 
src="http://www.aolcdn.com/aolnews/sphereeyebrow" alt="Sphere" title="Sphere" /></div>
    <h3 class="entry-title TtlArt"><a rel="bookmark" 
href="http://www.sphere.com/nation/article/attack-calls-into-question-the-practice-of-
using-afghans-to-guard-us-bases/19299603">Should Afghans Guard US Bases?</a></h3>    
<h4 class="byline">        <span class="posted"><abbr class="synpAbbr" title="2010-01-
01T12:33:34Z">posted:<span class="bylinDt"> 840 DAYS 13 HOURS AGO</span></abbr>
</span></h4>    <h4 class="byline">        <span class="filedUnder">filed under: <a 
href="http://news.aol.com/nation">National News</a>, <a 
href="http://news.aol.com/world">World News</a></span></h4>    <div class="entry-
summary"><!-- Enhancement List size = 0 -->
<div class="synpTxt">In the wake of a suicide attack that left seven CIA employees dead in 
Khost province, questions surround the use of Afghan forces to guard U.S. bases in the 
volatile country.</div>    <div class="entry-permalink"> <a rel="bookmark" 
href="http://www.sphere.com/nation/article/attack-calls-into-question-the-practice-of-
using-afghans-to-guard-us-bases/19299603">Full Coverage</a></div>    </div><div 
class="synpShrHide"></div></div>',
          'item' => '
      
      
      
      
      
      
      
      
      
    ',
          'description' => 'In the wake of a suicide attack that left seven CIA employees dead in 
Khost province, questions surround the use of Afghan forces to guard U.S. bases in the 
volatile country.',
          'http://purl.org/dc/elements/1.1/' => {
                                                'date' => '2010-01-01T17:33:34Z'
                                              },
          'title' => 'Should Afghans Guard US Bases?',
          'category' => [
                        'National News',
                        'World News'
                      ],
          'guid' => '836437',
          'pubDate' => 'Fri, 01 Jan 2010 17:33:34 GMT'
        };
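
A defensive accessor for this situation might look like the sketch below: it checks what kind of value {content} holds before dereferencing it. The item_body name is made up, and the `encoded` key (for a namespaced content:encoded element) is an assumption about how the parsed item is laid out, not something confirmed by the dump above.

```perl
use strict;
use warnings;

# Return the entry body whether {content} holds a hash or a plain string.
sub item_body {
    my ($item) = @_;
    my $content = $item->{content};
    return $content->{encoded} if ref($content) eq 'HASH';   # namespaced form (assumed key)
    return $content            if defined $content;          # bare <content> element
    return $item->{description};                              # fall back to the summary
}
```

With a guard like this, a feed that uses a plain <content> element yields its string instead of triggering the "as a HASH ref" error.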

From [email protected] on 2012-04-21 15:54:13:

On Sat Apr 21 00:02:08 2012, HALKEYE wrote:
> Looking at the code, $item->{content} should be a hash, but its not:

Thanks for the report!

In order to aid diagnosis, can you provide a short, self-contained code
snippet which demonstrates this issue?

From [email protected] on 2012-04-21 16:09:01:

It looks like that RSS feed isn't valid:

http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Ffeeds.news.aol.com%2Fsynfeeds%2Fartsynop%2F2604%2Frss.xml

But I don't think those errors should lead to the errors that you're seeing.

Investigating further.

Cheers,

Dave...

Relative Link attributes not being processed properly

See https://rt.cpan.org/Ticket/Display.html?id=53661 for details

XML::Feed::Entry::Format::Atom loses tz information when parsing dates [rt.cpan.org #92763]

Migrated from rt.cpan.org#92763 (status was 'new')

Requestors:

[email protected]

From [email protected] on 2014-02-05 20:32:34:

Hi there -

I was using XML::Feed, and I noticed that the DateTime object from Atom 
feeds is always in UTC. It's the right time, it just loses the timezone 
offset, which is useful to know - you can figure out what timezone the 
original author was in when he wrote a piece, for example.

From what I can tell, XML::Feed::Entry::Format::Atom is using the iso2dt 
function from XML::Atom::Util - looking at the code, you can see it 
discards the original timezone, and instead sets the dt object to be 
UTC. 
http://cpansearch.perl.org/src/MIYAGAWA/XML-Atom-0.27/lib/XML/Atom/Util.pm

Just thought I'd let you know. Thanks!

-John
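
To show what information is being dropped, here is a small sketch that pulls the UTC offset out of an Atom-style timestamp so it could be kept alongside the UTC time. The iso_offset_seconds helper is hypothetical and is not how XML::Atom::Util or XML::Feed handle dates.

```perl
use strict;
use warnings;

# Extract the UTC offset, in seconds, from an RFC 3339 / ISO 8601 timestamp.
# Returns 0 for "Z" and undef if no zone designator is present.
sub iso_offset_seconds {
    my ($stamp) = @_;
    return 0 if $stamp =~ /Z\z/i;
    my ($sign, $h, $m) = $stamp =~ /T[\d:.]+([+-])(\d{2}):?(\d{2})\z/
        or return;
    my $seconds = $h * 3600 + $m * 60;
    return $sign eq '-' ? -$seconds : $seconds;
}

# A timestamp written five hours behind UTC.
print iso_offset_seconds('2014-02-05T15:32:34-05:00'), "\n";
```

An offset alone does not identify a named time zone, but it is enough to reconstruct the author's local wall-clock time.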

use URI;
use XML::Feed;

my $feed = XML::Feed->parse(URI->new('https://www.businessinsider.de/feed/gs-wds-nl-stream'));
my $entry = [$feed->entries]->[0];
$entry->issued for 1..1000000000;

It leaks memory at a rate of a few MB per second on my machine.

XML::Feed version: 0.61
Perl version: 5.32.1

XML::Feed: Atom feeds come out as bytes, but RSS as Unicode

See https://rt.cpan.org/Ticket/Display.html?id=43004 for details

Anomaly in visibility of id in RSS?

When parsing RSS the id method sometimes returns the link.
This seems to depend on whether isPermaLink is true or false. See the demonstration below.

Is this expected behaviour? If so, why?

Modified t/samples/rss20.xml:

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:xhtml="http://www.w3.org/1999/xhtml">
<channel>
<title>First Weblog</title>
<link>http://localhost/weblog/</link>
<description>This is a test weblog.</description>
<language>en-us</language>
<copyright>Copyright 2004</copyright>
<lastBuildDate>Sat, 29 May 2004 23:39:25 -0800</lastBuildDate>
<pubDate>Sat, 29 May 2004 23:39:57 -0800</pubDate>
<generator>http://www.movabletype.org/?v=3.0D</generator>
<docs>http://blogs.law.harvard.edu/tech/rss</docs> 
<webMaster>Melody</webMaster>

<item>
<title>Entry Two - link and id differ, isPermaLink="true"</title>
<description>Hello!...</description>
<xhtml:body><![CDATA[<p>Hello!</p>]]></xhtml:body>
<link>http://localhost/weblog/2004/05/entry_two.html</link>
<author>Melody</author>
<guid isPermaLink="true">http://localhost/weblog/2004/05/alternative_url.html</guid>
<category>Travel</category>
<pubDate>Sat, 29 May 2004 23:39:25 -0800</pubDate>
</item>

<item>
<title>Entry Two - link and id differ, isPermaLink="false"</title>
<description>Hello!...</description>
<xhtml:body><![CDATA[<p>Hello!</p>]]></xhtml:body>
<link>http://localhost/weblog/2004/05/entry_two.html</link>
<author>Melody</author>
<guid isPermaLink="false">http://localhost/weblog/2004/05/alternative_url.html</guid>
<category>Travel</category>
<pubDate>Sat, 29 May 2004 23:39:25 -0800</pubDate>
</item>

</channel>
</rss>

Processed with eg/check_feed.pl:

Title:     First Weblog
Tagline:   This is a test weblog.
Format:    RSS 2.0
Author:    Melody
Link:      http://localhost/weblog/
Base:      
Language:  en-us
Copyright: Copyright 2004
Modified:  2004-05-29T23:39:57
Generator: http://www.movabletype.org/?v=3.0D

    Link:      http://localhost/weblog/2004/05/entry_two.html
    Author:    Melody
    Title:     Entry Two - link and id differ, isPermaLink="true"
    Caregory:  Travel
    Id:        http://localhost/weblog/2004/05/entry_two.html
    Issued:    2004-05-29T23:39:25
    Modified:  
    Lat:       
    Long:      
    Format:    RSS 2.0
    Tags:      Travel
    Enclosure: 
    Summary:   Hello!...
    Content:   <p>Hello!</p>

    Link:      http://localhost/weblog/2004/05/entry_two.html
    Author:    Melody
    Title:     Entry Two - link and id differ, isPermaLink="false"
    Caregory:  Travel
    Id:        http://localhost/weblog/2004/05/alternative_url.html
    Issued:    2004-05-29T23:39:25
    Modified:  
    Lat:       
    Long:      
    Format:    RSS 2.0
    Tags:      Travel
    Enclosure: 
    Summary:   Hello!...
    Content:   <p>Hello!</p>
