davorg-cpan / xml-feed
The CPAN module XML::Feed
This is XML::Feed, an abstraction above the RSS and Atom syndication feed formats. It supports both parsing and autodiscovery of feeds.

PREREQUISITES

* Class::ErrorHandler
* XML::RSS
* XML::Atom
* DateTime
* DateTime::Format::Mail
* DateTime::Format::W3CDTF
* List::Util
* Feed::Find
* URI::Fetch

INSTALLATION

XML::Feed installation is straightforward. If your CPAN shell is set up, you should just be able to do

    % perl -MCPAN -e 'install XML::Feed'

Alternatively, you can download it, unpack it, and then build it like this (using Module::Build):

    % perl Build.PL
    % ./Build installdeps
    % ./Build
    % ./Build test

Then install it:

    % ./Build install

Six Apart / [email protected]
I'm trying to parse and read the content of an RSS feed and am getting an error.
XML feed for testing:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:Test="http://www.Test.com">
<channel>
<title>Deportes - Test.com</title>
<link>http://www.Test.com</link>
<description>Últimas noticias de deportes</description>
<item>
<title><![CDATA[El 'Chacho' Coudet es el nuevo entrenador de Rosario Central]]></title>
<link>http://442.Test.com/2014-12-15-326653-coudet-fue-presentado-como-nuevo-dt-de-central/</link>
<description><![CDATA[El Chacho, Tengo mucha alegría y ganas de empezar a trabajar. No esperaba que sea acá”, reconoció.]]></description>
<category><![CDATA[Deportes]]></category>
<pubDate>15 12 2014 06:15:0 +0000</pubDate>
<enclosure url="http://www.Test.com/__export/1418678333348/sites/diarioTest/img/2014/12/15/deportes/1215_coudet_g_fb.jpg" type="image/jpeg"><![CDATA[El Chacho Coudet]]></enclosure>
<author><![CDATA[]]></author>
<content><![CDATA[<p>Eduardo Coudet fue presentado como nuevo entrenador deRosario Central .</p>
]]></content>
</item>
</channel>
</rss>
This is my test.pl script file.
#!/usr/bin/perl
use strict;
use warnings;
use XML::Feed;
my $feed = XML::Feed->parse("test.xml");
for my $entry ($feed->entries) {
print $entry->content;
}
When I run this code I get this error:
Can't use string ("<p>Eduardo Coudet fue prese"...) as a HASH ref while "strict refs" in use at C:/Strawberry/perl/site/lib/XML/Feed/Entry/Format/RSS.pm line 91.
I think this is a bug inside XML::Feed.
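The error message suggests that the RSS entry code dereferences the item's content slot as a hash, while the parser stored the plain <content> element as a string. A minimal sketch of a defensive accessor follows. It assumes, without confirmation from the XML::Feed source, that the item is a plain hash whose content value may be either a hash with an encoded key or a bare string. The helper name item_content_body is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Feed): return the body text of
# an RSS item whose {content} slot may be a hash ({ encoded => ... })
# or a plain string taken from a non-namespaced <content> element.
sub item_content_body {
    my ($item) = @_;
    my $content = $item->{content};
    return undef unless defined $content;
    return $content->{encoded} if ref($content) eq 'HASH';
    return $content;    # bare string: return it as-is instead of dying
}

print item_content_body({ content => '<p>Eduardo Coudet</p>' }), "\n";
print item_content_body({ content => { encoded => '<p>namespaced</p>' } }), "\n";
```

Checking ref() before dereferencing keeps "strict refs" satisfied for both shapes of data.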
Would it be ok to add our $VERSION = '0.51'; to every module?
That will eliminate the confusion in the CPAN client that tells me things like this:
XML::Feed::Content undef 0 DAVECROSS/XML-Feed-0.51.tar.gz
Currently module cannot be installed on latest clean Perl because URI::Fetch
fails to pass Configure phase: https://rt.cpan.org/Public/Bug/Display.html?id=133491
Looks like this is only used as simple wrapper for checking 410 status:
https://github.com/davorg/xml-feed/blob/572b8de969454d9a8b95e75eb28a282ab8f4c56e/lib/XML/Feed.pm#L99
Is there any specific reason to have both LWP::UserAgent and URI::Fetch in dependencies?
With all respect to the URI::Fetch author: the last release was 5 years ago, a critical bug has gone unresolved for a week, and the module has no modern CI setup that would warn about such situations. This suggests that the module is stalled.
There are a few places where we're rather lax in our parsing of date/time strings. I know Postel's law, but it might be nice to make this optional.
See lib/XML/Feed/Util.pm: the loose option on the DateTime::Format::Mail constructor.
Are those
The XML::Feed::Util::format_w3cdtf function returns the wrong value for a DateTime object with a non-UTC time zone.
use feature 'say';
use DateTime;
use XML::Feed::Util qw(format_w3cdtf);
say format_w3cdtf(DateTime->now(time_zone => 'Asia/Tokyo'));
# Actual: 2021-05-20T18:02:33+09:00Z
# Expected: 2021-05-20T18:02:33+09:00
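The expected output follows the W3CDTF rule that the zone designator is either "Z" (for UTC) or a numeric offset, never both. A minimal sketch of that rule in isolation is below. The helper w3cdtf_zone is hypothetical and takes the UTC offset in seconds; it is not the code in XML::Feed::Util.

```perl
use strict;
use warnings;

# Hypothetical helper: build the W3CDTF zone designator from a UTC
# offset in seconds. "Z" is used only for a zero offset; otherwise the
# numeric offset stands alone, with no trailing "Z".
sub w3cdtf_zone {
    my ($offset_seconds) = @_;
    return 'Z' if $offset_seconds == 0;
    my $sign = $offset_seconds < 0 ? '-' : '+';
    my $abs  = abs($offset_seconds);
    return sprintf '%s%02d:%02d', $sign, int($abs / 3600), int(($abs % 3600) / 60);
}

print '2021-05-20T18:02:33' . w3cdtf_zone(9 * 3600), "\n";   # Asia/Tokyo is UTC+9
print '2021-05-20T09:02:33' . w3cdtf_zone(0), "\n";          # UTC
```

The first line prints 2021-05-20T18:02:33+09:00, matching the expected value in the report.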
See https://rt.cpan.org/Ticket/Display.html?id=48337 for details
Migrated from rt.cpan.org#53661 (status was 'open')
Requestors:
From [email protected] on 2010-01-13 18:00:01
:
The link attribute does not get processed correctly from:
http://earthquake.usgs.gov/earthquakes/catalogs/7day-M5.xml
Notice the links are relative and should be resolved against the xml:base
attribute. To test, here is a subset of the XML file. Just copy it into a
file named 7day-M5.xml
<?xml version="1.0"?>
<feed xml:base="http://earthquake.usgs.gov/"
xmlns="http://www.w3.org/2005/Atom"
xmlns:georss="http://www.georss.org/georss">
<updated>2010-01-13T17:24:37Z</updated>
<title>USGS M5+ Earthquakes</title>
<subtitle>Real-time, worldwide earthquake list for the past 7
days</subtitle>
<link rel="self" href="/earthquakes/catalogs/7day-M5.xml"/>
<link href="http://earthquake.usgs.gov/earthquakes/"/>
<author><name>U.S. Geological Survey</name></author>
<id>http://earthquake.usgs.gov/</id>
<icon>/favicon.ico</icon>
<entry><id>urn:earthquake-usgs-gov:us:2010rkb8</id><title>M 5.3,
Tonga</title><updated>2010-01-13T16:21:24Z</updated><link
rel="alternate" type="text/html"
href="/earthquakes/recenteqsww/Quakes/us2010rkb8.php"/><summary
type="html"><![CDATA[<img
src="http://earthquake.usgs.gov/images/globes/-15_-175.jpg"
alt="15.741°S 174.695°W" align="left" hspace="20"
/><p>Wednesday, January 13, 2010 16:21:24 UTC<br>Thursday, January 14,
2010 06:21:24 AM at epicenter</p><p><strong>Depth</strong>: 10.00 km
(6.21 mi)</p>]]></summary><georss:point>-15.7409
-174.6951</georss:point><georss:elev>-10000</georss:elev><category
label="Age" term="Past day"/></entry>
</feed>
And run this command
perl -MXML::Feed -e 'my $feed = XML::Feed->parse("7day-M5.xml");
foreach my $e ($feed->entries) { print $e->link , "\n"; } '
I could not figure out if this bug lies in XML::Feed or XML::Atom. I
ran the code through the debugger but for the life of me could not tell
how it works.
Thanks for your time.
Jason
From [email protected] on 2016-02-13 09:59:48
:
On Wed Jan 13 13:00:01 2010, [email protected] wrote:
> The link attribute does not get processed correctly from:
> http://earthquake.usgs.gov/earthquakes/catalogs/7day-M5.xml
>
> Notice the links are relative and should be appended by the base
> attribute. To test here is a subset of the XML file. Just copy into a
> file name 7day-M5.xml
>
> <?xml version="1.0"?>
> <feed xml:base="http://earthquake.usgs.gov/"
> xmlns="http://www.w3.org/2005/Atom"
> xmlns:georss="http://www.georss.org/georss">
> <updated>2010-01-13T17:24:37Z</updated>
> <title>USGS M5+ Earthquakes</title>
> <subtitle>Real-time, worldwide earthquake list for the past 7
> days</subtitle>
> <link rel="self" href="/earthquakes/catalogs/7day-M5.xml"/>
> <link href="http://earthquake.usgs.gov/earthquakes/"/>
> <author><name>U.S. Geological Survey</name></author>
> <id>http://earthquake.usgs.gov/</id>
> <icon>/favicon.ico</icon>
> <entry><id>urn:earthquake-usgs-gov:us:2010rkb8</id><title>M 5.3,
> Tonga</title><updated>2010-01-13T16:21:24Z</updated><link
> rel="alternate" type="text/html"
> href="/earthquakes/recenteqsww/Quakes/us2010rkb8.php"/><summary
> type="html"><![CDATA[<img
> src="http://earthquake.usgs.gov/images/globes/-15_-175.jpg"
> alt="15.741°S 174.695°W" align="left" hspace="20"
> /><p>Wednesday, January 13, 2010 16:21:24 UTC<br>Thursday, January 14,
> 2010 06:21:24 AM at epicenter</p><p><strong>Depth</strong>: 10.00 km
> (6.21 mi)</p>]]></summary><georss:point>-15.7409
> -174.6951</georss:point><georss:elev>-10000</georss:elev><category
> label="Age" term="Past day"/></entry>
> </feed>
>
>
> And run this command
> perl -MXML::Feed -e 'my $feed = XML::Feed->parse("7day-M5.xml");
> foreach my $e ($feed->entries) { print $e->link , "\n"; } '
>
>
> I could not figure out if this bug lies in XML::Feed or XML::Atom. I
> ran the code through the debugger but for the life of me could not tell
> how it works.
It's not clear to me that this is a bug. Is there some standard which says that we should be returning absolute links if the feed contains relative links?
But whether or not the current method does the right thing, we are simply passing on the value that we get from XML::Atom. Adapting your example, we get:
$ perl -MXML::Atom::Feed -e 'my $feed = XML::Atom::Feed->new("7day-M5.xml"); foreach my $e ($feed->entries) { print $e->link->href , "\n"; } '
/earthquakes/recenteqsww/Quakes/us2010rkb8.php
So if there is a bug, it is a bug in XML::Atom and should be reported there.
Dave...
From [email protected] on 2016-02-13 10:05:28
:
On Wed Jan 13 13:00:01 2010, [email protected] wrote:
> The link attribute does not get processed correctly from:
> http://earthquake.usgs.gov/earthquakes/catalogs/7day-M5.xml
>
> Notice the links are relative and should be appended by the base
> attribute. To test here is a subset of the XML file. Just copy into a
> file name 7day-M5.xml
>
> <?xml version="1.0"?>
> <feed xml:base="http://earthquake.usgs.gov/"
> xmlns="http://www.w3.org/2005/Atom"
> xmlns:georss="http://www.georss.org/georss">
> <updated>2010-01-13T17:24:37Z</updated>
> <title>USGS M5+ Earthquakes</title>
> <subtitle>Real-time, worldwide earthquake list for the past 7
> days</subtitle>
> <link rel="self" href="/earthquakes/catalogs/7day-M5.xml"/>
> <link href="http://earthquake.usgs.gov/earthquakes/"/>
> <author><name>U.S. Geological Survey</name></author>
> <id>http://earthquake.usgs.gov/</id>
> <icon>/favicon.ico</icon>
> <entry><id>urn:earthquake-usgs-gov:us:2010rkb8</id><title>M 5.3,
> Tonga</title><updated>2010-01-13T16:21:24Z</updated><link
> rel="alternate" type="text/html"
> href="/earthquakes/recenteqsww/Quakes/us2010rkb8.php"/><summary
> type="html"><![CDATA[<img
> src="http://earthquake.usgs.gov/images/globes/-15_-175.jpg"
> alt="15.741°S 174.695°W" align="left" hspace="20"
> /><p>Wednesday, January 13, 2010 16:21:24 UTC<br>Thursday, January 14,
> 2010 06:21:24 AM at epicenter</p><p><strong>Depth</strong>: 10.00 km
> (6.21 mi)</p>]]></summary><georss:point>-15.7409
> -174.6951</georss:point><georss:elev>-10000</georss:elev><category
> label="Age" term="Past day"/></entry>
> </feed>
>
>
> And run this command
> perl -MXML::Feed -e 'my $feed = XML::Feed->parse("7day-M5.xml");
> foreach my $e ($feed->entries) { print $e->link , "\n"; } '
>
>
> I could not figure out if this bug lies in XML::Feed or XML::Atom. I
> ran the code through the debugger but for the life of me could not tell
> how it works.
I have just checked and, given a feed containing relative links, XML::RSS has exactly the same behaviour (the relative links are not converted to absolute links). I'm therefore becoming more convinced that our current behaviour is correct.
Dave...
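Whichever module ends up responsible, a caller who wants absolute links can resolve them against the base itself. A minimal sketch using the URI module's new_abs constructor follows. The base is hard-coded from the xml:base attribute of the sample feed, since this thread does not show how to read it from the parsed feed, and absolute_link is a hypothetical helper name.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: resolve a possibly relative href against a base
# URL, following the usual relative-reference resolution rules.
sub absolute_link {
    my ($href, $base) = @_;
    return URI->new_abs($href, $base)->as_string;
}

# Values copied from the sample feed in this ticket.
my $base = 'http://earthquake.usgs.gov/';
my $href = '/earthquakes/recenteqsww/Quakes/us2010rkb8.php';

print absolute_link($href, $base), "\n";
```

An href that is already absolute is returned unchanged, so the helper can be applied to every link without checking first.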
This module is vulnerable to an XML External Entity (XXE) exploit, as described here:
http://mikeknoop.com/lxml-xxe-exploit/
Try parsing the following feed on a Linux system and you'll see the contents of your /etc/passwd included in the output:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE title [ <!ELEMENT title ANY >
<!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>The Blog</title>
<link>http://example.com/</link>
<description>A blog about things</description>
<lastBuildDate>Mon, 03 Feb 2014 00:00:00 -0000</lastBuildDate>
<item>
<title>&xxe;</title>
<link>http://example.com</link>
<description>a post</description>
<author>[email protected]</author>
<pubDate>Mon, 03 Feb 2014 00:00:00 -0000</pubDate>
</item>
</channel>
</rss>
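Until entity expansion is disabled in the underlying parsers, a caller handling untrusted feeds could refuse documents that declare entities at all. The sketch below is a blunt, hypothetical pre-filter (declares_entities is not part of XML::Feed) and is not a substitute for configuring the XML parser itself. It rejects any input containing an entity declaration, including harmless internal ones.

```perl
use strict;
use warnings;

# Hypothetical guard: report whether a document declares any entity.
# XML keywords are case-sensitive, so only the upper-case form counts.
sub declares_entities {
    my ($xml) = @_;
    return $xml =~ /<!ENTITY\b/ ? 1 : 0;
}

my $hostile = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE title [ <!ELEMENT title ANY >
<!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<rss version="2.0"><channel><title>&xxe;</title></channel></rss>
XML

print declares_entities($hostile) ? "refusing feed\n" : "feed accepted\n";
```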
See https://rt.cpan.org/Ticket/Display.html?id=53884 for details
See https://rt.cpan.org/Ticket/Display.html?id=69543 for details
Migrated from rt.cpan.org#43004 (status was 'open')
Requestors:
Attachments:
From [email protected] on 2009-02-03 19:32:07
:
XML::Atom has a bizarre API where by default, text is returned as a string of
UTF-8 bytes without the Unicode flag set. XML::RSS::Feed doesn't do this.
To make the output of XML::Feed the same in both cases, XML::Feed should
probably use "{ local $XML::Atom::ForceUnicode = 1; ... }" around each read
access to the XML::Atom object's accessor functions, resulting in a
switch to Unicode output that matches XML::RSS::Feed.
This bug breaks IkiWiki <http://ikiwiki.info/> when aggregating Atom
feeds; it ends up "double-escaping" the entries as they're written into the
cache. For instance, U+2019 (decimal 8217) closing single quote goes into the cache file as
the 6-byte sequence "\xC3\xA2\xC2\x80\xC2\x99", rather than the correct 3-byte
sequence "\xE2\x80\x99"; the effect is as if the string was encoded as
UTF-8, decoded as Latin-1, then encoded as UTF-8 again.
Simon
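The double-encoding described above can be reproduced with the core Encode module alone. The sketch below takes the closing single quote (code point U+2019, decimal 8217), encodes it as UTF-8, misreads those bytes as Latin-1, and encodes again. The hex_bytes helper is only for display.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Display helper: render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

my $quote   = "\x{2019}";                      # right single quotation mark
my $once    = encode('UTF-8', $quote);         # correct 3-byte encoding
my $misread = decode('ISO-8859-1', $once);     # bytes mistaken for Latin-1 text
my $twice   = encode('UTF-8', $misread);       # the 6-byte double encoding

print hex_bytes($once),  "\n";   # E2 80 99
print hex_bytes($twice), "\n";   # C3 A2 C2 80 C2 99
```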
From [email protected] on 2009-07-07 12:52:13
:
I too have the same problem. And setting $XML::Atom::ForceUnicode = 1;
fixes this for me. But I'm afraid that it's a global variable and I
can't set it in my module AnyEvent::Feed which uses XML::Feed.
Greetings,
Robin
From [email protected] on 2009-11-17 02:02:41
:
Hmm, I'm not entirely sure what the best way to handle this is - setting
ForceUnicode is kind of a nuclear option which could screw up other
modules in, say, a mod_perl environment.
I'm talking to Tatsuhiko Miyagawa about it and I'll get back to you.
From [email protected] on 2010-05-20 19:18:50
:
On Mon Nov 16 21:02:41 2009, SIMONW wrote:
> Hmm, I'm not entirely sure what the best way to handle this is - setting
> ForceUnicode is kind of a nuclear option which could screw up other
> modules in, say, a mod_perl environment.
>
> I'm talking to Tatsuhiko Miyagawa about it and I'll get back to you.
I discovered this solution myself. I'd love to see XML::Atom have an object attribute to force
decoding to utf8. Frankly, it should be enabled by default.
Best,
David
From [email protected] on 2011-11-24 11:28:26
:
Hi all,
I've been bitten by this bug myself now when trying to combine my
blogs.perl.org's blog feed, which is only provided in Atom (why??), into
the rest of the feeds. The ForceUnicode setting workaround that is
described in this thread works nicely, but there should be a more
permanent solution.
Regards,
-- Shlomi Fish
From [email protected] on 2011-11-24 11:37:42
:
On Tue Feb 03 14:32:07 2009, [email protected] wrote:
> XML::Atom has a bizarre API where by default, text is returned as a
> string of UTF-8 bytes without the Unicode flag set. XML::RSS::Feed
> doesn't do this.
>
> To make the output of XML::Feed the same in both cases, XML::Feed
> should probably use "{ local $XML::Atom::ForceUnicode = 1; ... }"
> around each read access to the XML::Atom object's accessor functions,
> resulting in a switch to Unicode output that matches XML::RSS::Feed.
>
> This bug breaks IkiWiki <http://ikiwiki.info/> when aggregating Atom
> feeds; it ends up "double-escaping" the entries as they're written
> into the cache. For instance, U+2019 closing single quote goes into
> the cache file as the 6-byte sequence "\xC3\xA2\xC2\x80\xC2\x99",
> rather than the correct 3-byte sequence "\xE2\x80\x99"; the effect is
> as if the string was encoded as UTF-8, decoded as Latin-1, then
> encoded as UTF-8 again.
>
> Simon
Does it make sense to discuss this here? Isn't it a bug in XML::Atom?
Or am I misunderstanding?
Dave...
From [email protected] on 2011-11-24 12:01:22
:
On Thu, 24 Nov 2011 at 06:37:43 -0500, Dave Cross via RT wrote:
> Does it make sense to discuss this here? Isn't it a bug in XML::Atom?
>
> Or am I misunderstanding?
I agree that this needs discussion with the author of XML::Atom. I don't
know how you Cc people "correctly" in RT, it's not a bug tracker I'm
particularly familiar with.
As far as I'm concerned, the bug in X::F is that it doesn't produce the
same data type for RSS and Atom feeds (breaking encapsulation), and the
underlying bugs in X::A that make it hard for X::F to do the right
thing are:
1) produces a byte-string of UTF-8, rather than a Unicode string, by default
(might not be considered to be a bug, since it's documented in
XML::Atom::Feed; or might be considered to be a bug but unfixable, since
that would be an API break)
2) can only be directed to produce Unicode by setting a global variable
(this is an API design problem, rather than not behaving as documented)
Three possible solutions:
* If (1) is considered to be a bug, make XML::Atom::ForceUnicode the default,
and XML::Feed doesn't need any changes; requires changes to X::A only.
* If (1) is as designed or is unfixable, fix (2) instead (e.g. add
$feed->unicode(1) setter) and then change XML::Feed to use it; requires
changes to both X::A and X::F. I'd be inclined to say this one is the
most correct.
* If (1) is as designed, postprocess the XML::Atom output through
Encode::decode('utf-8', $bytes) in XML::Feed; requires changes to X::F only,
but will break if (1) is changed in a later version of X::A.
Which one is correct is up to you and the author of XML::Atom.
For now, IkiWiki sets "local $XML::Atom::ForceUnicode = 1" around each
invocation of XML::Feed, because we know that it's single-threaded, so the
usual problems with global variables are less of a concern. I realise this
would be unacceptable in a library, though.
S
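The third option above (postprocessing in XML::Feed) can be sketched as follows. The helper to_characters is hypothetical. It decodes only strings that do not already carry Perl's internal UTF-8 flag, which is one possible way to soften the "will break if (1) is changed" concern. The flag check is a heuristic rather than a guarantee: a character string made only of Latin-1 characters may not carry the flag.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn a UTF-8 byte string into a character
# string, leaving undef and already-decoded strings alone.
sub to_characters {
    my ($value) = @_;
    return $value if !defined $value || utf8::is_utf8($value);
    return decode('UTF-8', $value);
}

my $bytes = "\xE2\x80\x99";                    # UTF-8 bytes for U+2019
my $chars = to_characters($bytes);

printf "%d byte(s) in, %d character(s) out\n", length $bytes, length $chars;
```

Calling to_characters on its own output returns the string unchanged, so applying it twice does not corrupt the data.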
Some of the files have ridiculously high scores for code complexity.
See https://kritika.io/users/davorg/repos/2672765739262019/
Would be good to reduce some of these. Below 10 is good!
Migrated from rt.cpan.org#57730 (status was 'new')
Requestors:
Attachments:
From [email protected] on 2010-05-21 19:07:14
:
Here's a patch. I extracted some shitty dates from some feeds I'm parsing, plus threw in a bunch of others. The test ensures that they all work in pubDate, dc:date, dcterms:date, dcterms:modified, and atom:updated. It adds dependencies on DateTime::Format::ISO8601, DateTime::Format::Flexible, and DateTime::Format::Natural.
I didn't add a parameter, as there don't seem to be any real attributes to use. Maybe I've missed something?
I've Cc'd RT so that it doesn't get lost in the shuffle.
What do you think?
Best,
David
On May 20, 2010, at 4:05 PM, David E. Wheeler wrote:
> Hi Simon,
>
> I'm using XML::Feed for a project. It's so nice not to have to worry about all the variations in feeds. Many thanks to you and SixApart for the great module.
>
> One place where I do have to worry, though, is with dates. There are a lot of feeds out there with invalid date formats. Take http://bestwebgallery.com/feed/ for example. It has this:
>
> <pubDate>May 17, 2010</pubDate>
>
> Irritating. I fully expect to find a lot more shitty dates. Alas, with a date like this, issued() returns undef. I'd really like to make a best effort to get at dates in all formats, as I could really use it for proper(ish) sorting.
>
> I noticed this test in t/01-parse.t:
>
> $feed = XML::Feed->parse('t/samples/rss10-invalid-date.xml')
> or die XML::Feed->errstr;
> $entry = ($feed->entries)[0];
> ok(!$entry->issued); ## Should return undef, but not die.
> ok(!$entry->modified); ## Same.
>
> So I guess that you want to be strict by default. So what I'm thinking of is adding an attribute to XML::Feed to be looser when parsing dates. If it's set to true (false by default), then it would also try DateTime::Format::Natural or perhaps DateTime::Format::Flexible. Would you be interested in such a patch?
>
> If so, looking at Format::RSS, I see that it first tries {dc}{date} and then {PubDate}. Should I continue with that approach? Or maybe try both strict first, and then try them both again more loosely?
>
> Thanks,
>
> David
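The "strict first, then loose" idea discussed in this message can be sketched as a chain of parsers tried in order, with the loose ones enabled only on request. Everything below is illustrative. parse_strict and parse_loose are hypothetical stand-ins for the real DateTime::Format modules, they return plain YYYY-MM-DD strings instead of DateTime objects, and the loose option name is an assumption.

```perl
use strict;
use warnings;

my %MONTH;
@MONTH{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = 1 .. 12;

# Stand-in for the existing strict parsers: accepts ISO-style dates only.
sub parse_strict {
    my ($text) = @_;
    return $text =~ /^(\d{4})-(\d{2})-(\d{2})/ ? "$1-$2-$3" : undef;
}

# Stand-in for a looser parser: accepts dates such as "May 17, 2010".
sub parse_loose {
    my ($text) = @_;
    return undef unless $text =~ /^([A-Z][a-z]{2})[a-z]*\.? (\d{1,2}), (\d{4})$/;
    my ($name, $day, $year) = ($1, $2, $3);
    my $month = $MONTH{$name} or return undef;
    return sprintf '%04d-%02d-%02d', $year, $month, $day;
}

# Try the strict parser first; fall back to the loose one only when the
# caller opted in. Returns undef (without dying) if nothing matches.
sub parse_date {
    my ($text, %opt) = @_;
    my @parsers = (\&parse_strict, $opt{loose} ? \&parse_loose : ());
    for my $parser (@parsers) {
        my $date = $parser->($text);
        return $date if defined $date;
    }
    return undef;
}

print parse_date('May 17, 2010', loose => 1), "\n";
print defined parse_date('May 17, 2010') ? "parsed\n" : "strict mode: undef\n";
```

Keeping the default strict preserves the behaviour that t/01-parse.t checks for, while callers who need best-effort sorting can opt in.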
See https://rt.cpan.org/Ticket/Display.html?id=57730 for details
See https://rt.cpan.org/Ticket/Display.html?id=124346 for details
t/26-content-encoded.t and t/28-rss-guid.t fail on all of my smoker systems:
Can't call method "items" on an undefined value at t/26-content-encoded.t line 11.
t/26-content-encoded.t ..........
Dubious, test returned 2 (wstat 512, 0x200)
No subtests run
...
Can't call method "items" on an undefined value at t/28-rss-guid.t line 9.
t/28-rss-guid.t .................
Dubious, test returned 2 (wstat 512, 0x200)
No subtests run
See https://rt.cpan.org/Ticket/Display.html?id=76738 for details
See https://rt.cpan.org/Ticket/Display.html?id=103405 for details
Migrated from rt.cpan.org#76738 (status was 'open')
Requestors:
From [email protected] on 2012-04-21 07:02:08
:
http://feeds.news.aol.com/synfeeds/artsynop/2604/rss.xml -
Can't use string ("<a name="836437"></a><div class=") as a HASH ref while "strict refs" in
use at /usr/lib/perl5/site_perl/5.8.8/XML/Feed/Entry/Format/RSS.pm line 60.
Looking at the code, $item->{content} should be a hash, but it's not:
$VAR1 = {
'isPermaLink' => '',
'link' => 'http://www.sphere.com/nation/article/attack-calls-into-question-the-
practice-of-using-afghans-to-guard-us-bases/19299603',
'dc' => {
'date' => '2010-01-01T17:33:34Z'
},
'content' => '<a name="836437"></a><div class="hentry a836437" reltag="National
News"> <div class="synpTtlArt"><img
src="http://www.aolcdn.com/aolnews/sphereeyebrow" alt="Sphere" title="Sphere" /></div>
<h3 class="entry-title TtlArt"><a rel="bookmark"
href="http://www.sphere.com/nation/article/attack-calls-into-question-the-practice-of-
using-afghans-to-guard-us-bases/19299603">Should Afghans Guard US Bases?</a></h3>
<h4 class="byline"> <span class="posted"><abbr class="synpAbbr" title="2010-01-
01T12:33:34Z">posted:<span class="bylinDt"> 840 DAYS 13 HOURS AGO</span></abbr>
</span></h4> <h4 class="byline"> <span class="filedUnder">filed under: <a
href="http://news.aol.com/nation">National News</a>, <a
href="http://news.aol.com/world">World News</a></span></h4> <div class="entry-
summary"><!-- Enhancement List size = 0 -->
<div class="synpTxt">In the wake of a suicide attack that left seven CIA employees dead in
Khost province, questions surround the use of Afghan forces to guard U.S. bases in the
volatile country.</div> <div class="entry-permalink"> <a rel="bookmark"
href="http://www.sphere.com/nation/article/attack-calls-into-question-the-practice-of-
using-afghans-to-guard-us-bases/19299603">Full Coverage</a></div> </div><div
class="synpShrHide"></div></div>',
'item' => '
',
'description' => 'In the wake of a suicide attack that left seven CIA employees dead in
Khost province, questions surround the use of Afghan forces to guard U.S. bases in the
volatile country.',
'http://purl.org/dc/elements/1.1/' => {
'date' => '2010-01-01T17:33:34Z'
},
'title' => 'Should Afghans Guard US Bases?',
'category' => [
'National News',
'World News'
],
'guid' => '836437',
'pubDate' => 'Fri, 01 Jan 2010 17:33:34 GMT'
};
From [email protected] on 2012-04-21 15:54:13
:
On Sat Apr 21 00:02:08 2012, HALKEYE wrote:
> Looking at the code, $item->{content} should be a hash, but its not:
Thanks for the report!
In order to aid diagnosis, can you provide a short, self-contained code
snippet which demonstrates this issue?
From [email protected] on 2012-04-21 16:09:01
:
It looks like that RSS feed isn't valid:
http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Ffeeds.news.aol.com%2Fsynfeeds%2Fartsynop%2F2604%2Frss.xml
But I don't think those errors should lead to the errors that you're seeing.
Investigating further.
Cheers,
Dave...
See https://rt.cpan.org/Ticket/Display.html?id=53661 for details
Migrated from rt.cpan.org#92763 (status was 'new')
Requestors:
From [email protected] on 2014-02-05 20:32:34
:
Hi there -
I was using XML::Feed, and I noticed that the DateTime object from Atom
feeds is always in UTC. It's the right time, it just loses the timezone
offset, which is useful to know - you can figure out what timezone the
original author was in when he wrote a piece, for example.
From what I can tell, XML::Feed::Entry::Format::Atom is using the iso2dt
function from XML::Atom::Util - looking at the code, you can see it
discards the original timezone, and instead sets the dt object to be
UTC.
http://cpansearch.perl.org/src/MIYAGAWA/XML-Atom-0.27/lib/XML/Atom/Util.pm
Just thought I'd let you know. Thanks!
-John
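Until the offset is preserved upstream, a caller with access to the raw timestamp string could extract the offset separately and keep it alongside the UTC DateTime. A minimal sketch follows; iso_offset_seconds is a hypothetical helper, and it only handles timestamps with a time part followed by "Z" or a "+hh:mm" / "-hh:mm" offset.

```perl
use strict;
use warnings;

# Hypothetical helper: return the UTC offset, in seconds, carried by an
# ISO 8601 / RFC 3339 timestamp, or undef if no zone designator is found.
sub iso_offset_seconds {
    my ($iso) = @_;
    return undef
        unless $iso =~ /T\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(Z|([+-])(\d{2}):(\d{2}))$/;
    my ($zone, $sign, $hours, $minutes) = ($1, $2, $3, $4);
    return 0 if $zone eq 'Z';
    my $seconds = $hours * 3600 + $minutes * 60;
    return $sign eq '-' ? -$seconds : $seconds;
}

print iso_offset_seconds('2014-02-05T12:32:34-08:00'), "\n";
print iso_offset_seconds('2010-01-13T16:21:24Z'), "\n";
```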
See https://rt.cpan.org/Ticket/Display.html?id=92763 for details
Are the Six Apart contact details really still the best ones to list?
Steps to reproduce:
use URI;
use XML::Feed;
my $feed = XML::Feed->parse(URI->new('https://www.businessinsider.de/feed/gs-wds-nl-stream'));
my $entry = [$feed->entries]->[0];
$entry->issued for 1..1000000000;
It leaks memory at a rate of a few MB per second on my machine.
XML::Feed version: 0.61
Perl version: 5.32.1
See https://rt.cpan.org/Ticket/Display.html?id=43004 for details
When parsing RSS, the id method sometimes returns the link. This seems to depend on whether isPermaLink is true or false. See the demonstration below.
Is this expected behaviour? If so, why?
Modified t/samples/rss20.xml:
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:xhtml="http://www.w3.org/1999/xhtml">
<channel>
<title>First Weblog</title>
<link>http://localhost/weblog/</link>
<description>This is a test weblog.</description>
<language>en-us</language>
<copyright>Copyright 2004</copyright>
<lastBuildDate>Sat, 29 May 2004 23:39:25 -0800</lastBuildDate>
<pubDate>Sat, 29 May 2004 23:39:57 -0800</pubDate>
<generator>http://www.movabletype.org/?v=3.0D</generator>
<docs>http://blogs.law.harvard.edu/tech/rss</docs>
<webMaster>Melody</webMaster>
<item>
<title>Entry Two - link and id differ, isPermaLink="true"</title>
<description>Hello!...</description>
<xhtml:body><![CDATA[<p>Hello!</p>]]></xhtml:body>
<link>http://localhost/weblog/2004/05/entry_two.html</link>
<author>Melody</author>
<guid isPermaLink="true">http://localhost/weblog/2004/05/alternative_url.html</guid>
<category>Travel</category>
<pubDate>Sat, 29 May 2004 23:39:25 -0800</pubDate>
</item>
<item>
<title>Entry Two - link and id differ, isPermaLink="false"</title>
<description>Hello!...</description>
<xhtml:body><![CDATA[<p>Hello!</p>]]></xhtml:body>
<link>http://localhost/weblog/2004/05/entry_two.html</link>
<author>Melody</author>
<guid isPermaLink="false">http://localhost/weblog/2004/05/alternative_url.html</guid>
<category>Travel</category>
<pubDate>Sat, 29 May 2004 23:39:25 -0800</pubDate>
</item>
</channel>
</rss>
Processed with eg/check_feed.pl:
Title: First Weblog
Tagline: This is a test weblog.
Format: RSS 2.0
Author: Melody
Link: http://localhost/weblog/
Base:
Language: en-us
Copyright: Copyright 2004
Modified: 2004-05-29T23:39:57
Generator: http://www.movabletype.org/?v=3.0D
Link: http://localhost/weblog/2004/05/entry_two.html
Author: Melody
Title: Entry Two - link and id differ, isPermaLink="true"
Caregory: Travel
Id: http://localhost/weblog/2004/05/entry_two.html
Issued: 2004-05-29T23:39:25
Modified:
Lat:
Long:
Format: RSS 2.0
Tags: Travel
Enclosure:
Summary: Hello!...
Content: <p>Hello!</p>
Link: http://localhost/weblog/2004/05/entry_two.html
Author: Melody
Title: Entry Two - link and id differ, isPermaLink="false"
Caregory: Travel
Id: http://localhost/weblog/2004/05/alternative_url.html
Issued: 2004-05-29T23:39:25
Modified:
Lat:
Long:
Format: RSS 2.0
Tags: Travel
Enclosure:
Summary: Hello!...
Content: <p>Hello!</p>