uri-fast's Introduction

NAME

URI::Fast - A fast(er) URI parser

SYNOPSIS

use URI::Fast qw(uri);

my $uri = uri 'http://www.example.com/some/path?fnord=slack&foo=bar';

if ($uri->scheme =~ /http(s)?/) {
  my @path  = $uri->path;
  my $fnord = $uri->param('fnord');
  my $foo   = $uri->param('foo');
}

if ($uri->path =~ /\/login/ && $uri->scheme ne 'https') {
  $uri->scheme('https');
  $uri->param('upgraded', 1);
}

DESCRIPTION

URI::Fast is a faster alternative to URI. It is written in C and provides basic parsing and modification of a URI.

URI is an excellent module; it is battle-tested, robust, and handles many edge cases. As a result, it is rather slower than it would otherwise be for more trivial cases, such as inspecting the path or updating a single query parameter.

EXPORTED SUBROUTINES

Subroutines are exported on demand.

uri

Accepts a URI string, minimally parses it, and returns a URI::Fast object.

Note: passing a URI::Fast instance to this routine will cause the object to be interpolated into a string (via "to_string"), effectively creating a clone of the original URI::Fast object.

iri

Similar to "uri", but returns a URI::Fast::IRI object. A URI::Fast::IRI differs from a URI::Fast in that UTF-8 characters are permitted and will not be percent-encoded when modified.

abs_uri

Builds a new URI::Fast from a relative URI string and makes it "absolute" in relation to $base.

my $uri = abs_uri 'some/path', 'http://www.example.com/fnord';
$uri->to_string; # "http://www.example.com/fnord/some/path"

html_url

Parses a URI string, removing whitespace characters ignored in URLs found in HTML documents, replacing backslashes with forward slashes, and normalizing the URL (see "normalize").

If a base URL is specified, the URI::Fast object returned will be made "absolute" relative to that base URL.

# Resulting URL is "https://www.slashdot.org/recent"
my $url = html_url '//www.slashdot.org\recent', "https://www.slashdot.org";

uri_split

Behaves (hopefully) identically to URI::Split, but roughly twice as fast.

encode/decode/uri_encode/uri_decode

See "ENCODING".

CONSTRUCTORS

new

If desired, both URI::Fast and URI::Fast::IRI may be instantiated using the default OO-flavored constructor, new.

my $uri = URI::Fast->new('http://www.example.com');

new_abs

OO equivalent to "abs_uri".

new_html_url

OO equivalent to "html_url".

ATTRIBUTES

All attributes serve as full accessors, allowing the URI segment to be both retrieved and modified.

RAW ACCESSORS

Each attribute defines a raw_* method, which returns the raw, encoded string value for that attribute. If a new value is passed, it will set the field to the raw, unchanged value without checking it or changing it in any way.

CLEARERS

Each attribute further has a matching clearer method (clear_*) which unsets its value.

ACCESSORS

In general, accessors accept an unencoded string and set their slot value to the encoded value. They return the decoded value. See "ENCODING" for an in-depth description of their behavior as well as an explanation of the more complex behavior of compound fields.

scheme

Gets or sets the scheme portion of the URI (e.g. http), excluding ://.

auth

The authorization section is composed of the username, password, host name, and port number:

hostname.com
someone@hostname.com
someone:secret@hostname.com:1234

Setting this field may be done with a string (see the note below about "ENCODING") or a hash reference of individual field names (usr, pwd, host, and port). In both cases, the existing values are completely replaced by the new values and any values missing from the caller-supplied input are deleted.

usr

The username segment of the authorization string. Updating this value alters "auth".

pwd

The password segment of the authorization string. Updating this value alters "auth".

host

The host name segment of the authorization string. May be a domain string or an IP address. If the host is an IPv6 address, it must be surrounded by square brackets (per spec), which are included in the host string. Updating this value alters "auth".

port

The port number segment of the authorization string. Updating this value alters "auth".

path

In scalar context, returns the entire path string. In list context, returns a list of path segments, split by /.

my $uri = uri '/foo/bar';
my $path = $uri->path;  # "/foo/bar"
my @path = $uri->path;  # ("foo", "bar")

The path may also be updated using either a string or an array ref of segments:

$uri->path('/foo/bar');
$uri->path(['foo', 'bar']);

This differs from the behavior of "path_segments" in URI, which considers the leading slash separating the path from the authority section to be an individual segment. If this behavior is desired, the lower-level split_path_compat is available. split_path_compat (and its partner, split_path) always returns an array reference.

my $uri = uri '/foo/bar';
$uri->split_path;         # ['foo', 'bar'];
$uri->split_path_compat;  # ['', 'foo', 'bar'];

query

In scalar context, returns the complete query string, excluding the leading ?. The query string may be set in several ways.

$uri->query("foo=bar&baz=bat"); # note: no percent-encoding performed
$uri->query({foo => 'bar', baz => 'bat'}); # foo=bar&baz=bat
$uri->query({foo => 'bar', baz => 'bat'}, ';'); # foo=bar;baz=bat

In list context, returns a hash ref mapping query keys to array refs of their values (see "query_hash").

Both '&' and ';' are treated as separators for key/value parameters.

frag

The fragment section of the URI, excluding the leading #.

fragment

An alias of "frag".

METHODS

query_keys

Does a fast scan of the query string and returns a list of unique parameter names that appear in the query string.

Both '&' and ';' are treated as separators for key/value parameters.

query_hash

Scans the query string and returns a hash ref of key/value pairs. Values are returned as an array ref, as keys may appear multiple times. Both '&' and ';' are treated as separators for key/value parameters.

May optionally be called with a new hash of parameters to replace the query string with, in which case keys may map to scalar values or arrays of scalar values. As with all query setter methods, a third parameter may be used to explicitly specify the separator to use when generating the new query string.
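The shape of the returned structure can be illustrated with a minimal pure-Perl sketch. This is not URI::Fast's implementation (which is written in C); sketch_query_hash is a hypothetical helper, and its handling of bare keys (mapped to an empty string) and of '+' is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of URI::Fast: parse a raw query string into
# a hash ref of array refs, splitting pairs on both '&' and ';'.
sub sketch_query_hash {
  my ($query) = @_;
  my %params;

  for my $pair (split /[&;]/, $query) {
    next unless length $pair;               # skip empty pairs such as "a=1&&b=2"
    my ($key, $value) = split /=/, $pair, 2;
    $value = '' unless defined $value;      # assumption: a bare key gets ''

    for ($key, $value) {
      tr/+/ /;                              # form-style space
      s/%([0-9A-Fa-f]{2})/chr hex $1/ge;    # percent-decode to raw bytes
    }

    push @{ $params{$key} }, $value;        # keys may repeat, so collect a list
  }

  return \%params;
}

my $params = sketch_query_hash('foo=bar;foo=baz&fnord=slack');
print join(',', @{ $params->{foo} }), "\n";  # bar,baz
print $params->{fnord}[0], "\n";             # slack
```

Decoded values are left as raw bytes here; a fuller version would also decode them from UTF-8.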

param

Gets or sets a parameter value. Setting a parameter value will replace existing values completely; the "query" string will also be updated. Setting a parameter to undef deletes the parameter from the URI.

$uri->param('foo', ['bar', 'baz']);
$uri->param('fnord', 'slack');

my $value_scalar = $uri->param('fnord'); # fnord appears once
my @value_list   = $uri->param('foo');   # foo appears twice
my $value_scalar = $uri->param('foo');   # croaks; expected single value but foo has multiple

# Delete parameter
$uri->param('foo', undef); # deletes foo

# Ambiguous cases
$uri->param('foo', '');  # foo=
$uri->param('foo', '0'); # foo=0
$uri->param('foo', ' '); # foo=%20

Both '&' and ';' are treated as separators for key/value parameters when parsing the query string. An optional third parameter explicitly selects the character used to separate key/value pairs.

$uri->param('foo', 'bar', ';'); # foo=bar
$uri->param('baz', 'bat', ';'); # foo=bar;baz=bat

When unspecified, '&' is chosen as the default. In either case, all separators in the query string will be normalized to the chosen separator.

$uri->param('foo', 'bar', ';'); # foo=bar
$uri->param('baz', 'bat', ';'); # foo=bar;baz=bat
$uri->param('fnord', 'slack');  # foo=bar&baz=bat&fnord=slack

add_param

Updates the query string by adding a new value for the specified key. If the key already exists in the query string, the new value is appended without altering the original value.

$uri->add_param('foo', 'bar'); # foo=bar
$uri->add_param('foo', 'baz'); # foo=bar&foo=baz

This method is simply sugar for calling:

$uri->param('key', [$uri->param('key'), 'new value']);

As with "param", the separator character may be specified as the final parameter. The same caveats apply with regard to normalization of the query string separator.

$uri->add_param('foo', 'bar', ';'); # foo=bar
$uri->add_param('foo', 'baz', ';'); # foo=bar;foo=baz

query_keyset

Allows modification of the query string in the manner of a set, using keys without =value, e.g. foo&bar&baz. Accepts a hash ref of keys to update. A truthy value adds the key, a falsey value removes it. Any keys not mentioned in the update hash are left unchanged.

my $uri = uri '&baz&bat';
$uri->query_keyset({foo => 1, bar => 1}); # baz&bat&foo&bar
$uri->query_keyset({baz => 0, bat => 0}); # foo&bar

If there are key-value pairs in the query string as well, the behavior of this method becomes a little more complex. When a key is specified in the update hash ref, a truthy value will leave an existing key/value pair untouched. A falsey value will remove the key and value.

my $uri = uri '&foo=bar&baz&bat';
$uri->query_keyset({foo => 1, baz => 0}); # foo=bar&bat

An optional second parameter may be specified to control the separator character used when updating the query string. The same caveats apply with regard to normalization of the query string separator.

append

Serially appends path segments, query strings, and fragments to the end of the URI. Each argument is added in order. If the segment begins with ?, it is assumed to be a query string and it is appended using "add_param". If the segment begins with #, it is treated as a fragment, replacing any existing fragment. Otherwise, the segment is treated as a path segment and appended to the path.

my $uri = uri 'http://www.example.com/foo?k=v';
$uri->append('bar', 'baz/bat', '?k=v1&k=v2', '#fnord', 'slack');
# 'http://www.example.com/foo/bar/baz/bat/slack?k=v&k=v1&k=v2#fnord'

to_string

as_string

"$uri"

Stringifies the URI, encoding output as necessary. String interpolation is overloaded.

compare

$uri eq $other

Compares the URI to another, returning true if the URIs are equivalent. Overloads the eq operator.

clone

Sugar for:

my $uri = uri '...';
my $clone = uri $uri;

absolute

Builds an absolute URI from a relative URI and a base URI string. Adheres as strictly as possible to the rules for resolving a target URI in RFC3986 section 5.2. Returns a new URI::Fast object representing the absolute, merged URI.

my $uri = uri('some/path')->absolute('http://www.example.com/fnord');
$uri->to_string; # "http://www.example.com/fnord/some/path"

abs

Alias of "absolute".

relative

Builds a relative URI using a second URI (either a URI::Fast object or a string) as a base. Unlike "rel" in URI, ignores differences in domain and scheme, assuming the caller wishes to adopt the base URL's instead. Aside from that difference, its behavior should mimic that of "rel" in URI.

my $uri = uri('http://example.com/foo/bar')->relative('http://example.com/foo');
$uri->to_string; # "foo/bar"

my $uri = uri('http://example.com/foo/bar/')->relative('http://example.com/foo');
$uri->to_string; # "foo/bar/"

rel

Alias of "relative".

normalize

Similar to "canonical" in URI, performs a minimal normalization on the URI. Only the generic normalization described in the RFC is performed; no scheme-specific normalization is done. Specifically, the scheme and host members are converted to lower case, dot segments are collapsed in the path, and any percent-encoded characters in the URI are converted to upper case.
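The three generic steps can be sketched in pure Perl. This is an illustration under stated assumptions, not the module's C implementation: sketch_normalize and sketch_remove_dot_segments are hypothetical names, the URI is split with the generic regular expression from RFC 3986 appendix B, and dot segments are removed following RFC 3986 section 5.2.4.

```perl
use strict;
use warnings;

# RFC 3986 section 5.2.4: collapse "." and ".." segments in a path.
sub sketch_remove_dot_segments {
  my ($in) = @_;
  my $out = '';
  while (length $in) {
    if    ($in =~ s{\A\.\.?/}{})         { }                        # leading "../" or "./"
    elsif ($in =~ s{\A/\.(?:/|\z)}{/})   { }                        # "/./" or trailing "/."
    elsif ($in =~ s{\A/\.\.(?:/|\z)}{/}) { $out =~ s{/?[^/]*\z}{} } # drop last output segment
    elsif ($in eq '.' or $in eq '..')    { $in = '' }
    else  { $in =~ s{\A(/?[^/]*)}{}; $out .= $1 }                   # move one segment across
  }
  return $out;
}

# Hypothetical helper: lower-case scheme and host, collapse dot segments,
# and upper-case the hex digits of percent-encoded triplets.
sub sketch_normalize {
  my ($uri) = @_;
  my ($scheme, $auth, $path, $query, $frag) = $uri =~
    m{\A(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?\z}s;

  # Lower-case only the part after the last '@' (host and port).
  $auth =~ s{([^@]*)\z}{lc $1}e if defined $auth;

  my $out = '';
  $out .= lc($scheme) . ':' if defined $scheme;
  $out .= '//' . $auth      if defined $auth;
  $out .= sketch_remove_dot_segments($path);
  $out .= '?' . $query      if defined $query;
  $out .= '#' . $frag       if defined $frag;

  $out =~ s/(%[0-9A-Fa-f]{2})/uc $1/ge;
  return $out;
}

print sketch_normalize('HTTP://WWW.Example.COM/a/./b/../c?q=%7e'), "\n";
# http://www.example.com/a/c?q=%7E
```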

canonical

Alias of "normalize".

ENCODING

URI::Fast tries to do the right thing in most cases with regard to reserved and non-ASCII characters. URI::Fast will fully encode reserved and non-ASCII characters when setting individual values and return their fully decoded values. However, the "right thing" is somewhat ambiguous when it comes to setting compound fields like "auth", "path", and "query".

When setting compound fields with a string value, reserved characters are expected to be present, and are therefore accepted as-is. Any non-ASCII characters will be percent-encoded (since they are unambiguous and there is no risk of double-encoding them). Thus,

$uri->auth('someone:secret@Ῥόδος.com:1234');
print $uri->auth; # "someone:secret@%E1%BF%AC%CF%8C%CE%B4%CE%BF%CF%82.com:1234"

On the other hand, when setting these fields with a reference value (assumed to be a hash ref for "auth" and "query" or an array ref for "path"; see individual methods' docs for details), each field is fully percent-encoded, just as if each individual simple slot's setter had been called:

$uri->auth({usr => 'some one', host => 'somewhere.com'});
print $uri->auth; # "some%20one@somewhere.com"
print $uri->usr;  # "some one"

The same goes for return values. For compound fields returning a string, non-ASCII characters are decoded but reserved characters are not. When returning a list or reference of the deconstructed field, individual values are decoded of both reserved and non-ASCII characters.

'+' vs '%20'

Although no longer part of the standard, + is commonly used as the encoded space character (rather than %20); it is still official to the application/x-www-form-urlencoded type, and is treated as a space by "decode".
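The ordering matters when '+' is treated as a space: the literal '+' must be translated before percent-decoding, otherwise an encoded plus sign (%2B) would also become a space. A minimal pure-Perl sketch of that rule follows; sketch_form_decode is a hypothetical helper and not the module's own "decode".

```perl
use strict;
use warnings;

# Hypothetical helper: form-style decoding in which '+' means space.
sub sketch_form_decode {
  my ($str) = @_;
  $str =~ tr/+/ /;                            # literal '+' becomes a space first
  $str =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;  # then %XX triplets are decoded
  return $str;
}

print sketch_form_decode('fnord+slack%2Bmore'), "\n";  # fnord slack+more
```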

encode

Percent-encodes a string for use in a URI. By default, both reserved characters (! * ' ( ) ; : @ & = + $ , / ? # [ ] %) and UTF-8 characters are encoded.

A second (optional) parameter provides a string containing any characters the caller does not wish to be encoded. An empty string will result in the default behavior described above.

For example, to encode all characters in a query-like string except for those used by the query:

my $encoded = URI::Fast::encode($some_string, '?&=');
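The effect of the second parameter can be illustrated with a pure-Perl sketch. sketch_encode is a hypothetical helper, not the module's implementation, and it assumes that every byte outside the RFC 3986 unreserved set (letters, digits, '-', '.', '_', '~') is encoded unless it appears in the allowed string.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: percent-encode everything outside the unreserved set,
# except for characters listed in $allowed.
sub sketch_encode {
  my ($str, $allowed) = @_;
  $allowed = '' unless defined $allowed;
  my %keep = map { $_ => 1 } split //, $allowed;

  my $bytes = encode_utf8($str);   # encode per byte, so non-ASCII becomes %XX%XX
  $bytes =~ s/([^A-Za-z0-9\-._~])/$keep{$1} ? $1 : sprintf('%%%02X', ord $1)/ge;
  return $bytes;
}

print sketch_encode('a b&c=d'), "\n";        # a%20b%26c%3Dd
print sketch_encode('a b&c=d', '&='), "\n";  # a%20b&c=d
```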

decode

Decodes a percent-encoded string.

my $decoded = URI::Fast::decode($some_string);

uri_encode

uri_decode

These are aliases of "encode" and "decode", respectively. They were added to make BLUEFEET happy after he made fun of me for naming "encode" and "decode" too generically.

In fact, these were originally aliased as url_encode and url_decode, but due to some pedantic whining on the part of BGRIMM, they have been renamed to uri_encode and uri_decode.

escape_tree

unescape_tree

Traverses a data structure, escaping or unescaping defined scalar values in place. Accepts a reference to be traversed. Any further parameters are passed unchanged to "encode" or "decode". Croaks if the input to escape/unescape is a non-reference value.

my $obj = {
  foo => ['bar baz', 'bat%fnord'],
  bar => {baz => 'bat%bat'},
  baz => undef,
  bat => '',
};

URI::Fast::escape_tree($obj);

# $obj is now:
{
  foo => ['bar%20baz', 'bat%25fnord'],
  bar => {baz => 'bat%25bat'},
  baz => undef,
  bat => '',
}

URI::Fast::unescape_tree($obj); # $obj returned to original form

URI::Fast::escape_tree($obj, '%'); # escape but allow "%"

# $obj is now:
{
  foo => ['bar%20baz', 'bat%fnord'],
  bar => {baz => 'bat%bat'},
  baz => undef,
  bat => '',
}

CAVEATS

This module is designed to parse URIs according to RFC 3986. Browsers parse URLs using a different (but similar) algorithm and some strings that are valid URLs to browsers are not valid URIs to this module. The "html_url" function attempts to parse URLs more in line with how browsers do, but no guarantees are made as HTML standards and browser implementations are an ever shifting landscape.

SPEED

See URI::Fast::Benchmarks.

SEE ALSO

URI

The de facto standard.

RFC 3986

The official standard.

ACKNOWLEDGEMENTS

Thanks to ZipRecruiter for encouraging their employees to contribute back to the open source ecosystem. Without their dedication to quality software development this distribution would not exist.

CONTRIBUTORS

The following people have contributed to this module with patches, bug reports, API advice, identifying areas where the documentation is unclear, or by making fun of me for naming certain methods too generically.

Andy Ruder
Aran Deltac (BLUEFEET)
Ben Grimm (BGRIMM)
Dave Hubbard (DAVEH)
James Messrie
Martin Locklear
Randal Schwartz (MERLYN)
Sara Siegal (SSIEGAL)
Tim Vroom (VROOM)
Des Daignault (NAWGLAN)
Josh Rosenbaum

AUTHOR

Jeff Ober <[email protected]>

COPYRIGHT AND LICENSE

This software is copyright (c) 2018 by Jeff Ober. This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

uri-fast's People

Contributors

aeruder, manwar, sysread

uri-fast's Issues

html_url with empty relative url causes crash

I get a consistent crash when I run

perl -MURI::Fast=html_url -e 'html_url("", "http://xyz.com")'

Apparently, having a blank string as first parameter causes a crash. I think it should return the base in such case.

html_url, empty path not normalized

The following two produce unchanged (hence different) results but mean the same. I think the former is the right choice:

perl -MURI::Fast=html_url -we 'print html_url("http://example.com/",  "http://a")'
perl -MURI::Fast=html_url -we 'print html_url("http://example.com",  "http://a")'

I don't care which one gets preference, as long as they normalize to the same.

Bareword "U" not allowed while "strict subs" in use

Some of my smoker systems show the following failure:

Bareword "U" not allowed while "strict subs" in use at t/basics.t line 32.
Bareword "U" not allowed while "strict subs" in use at t/basics.t line 32.
Bareword "U" not allowed while "strict subs" in use at t/basics.t line 32.
Bareword "U" not allowed while "strict subs" in use at t/basics.t line 32.
Bareword "U" not allowed while "strict subs" in use at t/basics.t line 32.
Bareword "U" not allowed while "strict subs" in use at t/basics.t line 32.
Bareword "U" not allowed while "strict subs" in use at t/basics.t line 32.
Bareword "U" not allowed while "strict subs" in use at t/basics.t line 32.
Bareword "U" not allowed while "strict subs" in use at t/basics.t line 32.
Bareword "U" not allowed while "strict subs" in use at t/basics.t line 32.
Type of arg 1 to Test2::Tools::Exception::dies must be block or sub {} (not reference constructor) at t/basics.t line 178, near "},"
t/basics.t has too many errors.
One or more DATA sections were not processed by Inline.

The failure goes away if a recent Test2-Suite is installed.

Minimum URI::Encode::XS version

It seems that the minimum URI::Encode::XS version should be 0.07, not 0.06. With the older version, my smoker systems show the following test failure:

# Failed test 'path & query'
# at t/basics.t line 61.
# Caught exception in subtest: uri_decode() requires a scalar argument to decode! at /tmpfs/.cpan-build-cpansand/2018021921/URI-Fast-0.01-6/blib/lib/URI/Fast.pm line 128.

# Failed test 'complete'
# at t/basics.t line 80.
# Caught exception in subtest: uri_decode() requires a scalar argument to decode! at /tmpfs/.cpan-build-cpansand/2018021921/URI-Fast-0.01-6/blib/lib/URI/Fast.pm line 128.

# Failed test 'update param'
# at t/basics.t line 126.
# Caught exception in subtest: uri_decode() requires a scalar argument to decode! at /tmpfs/.cpan-build-cpansand/2018021921/URI-Fast-0.01-6/blib/lib/URI/Fast.pm line 128.
# Seeded srand with seed '20180219' from local date.
t/basics.t ............. 
Dubious, test returned 3 (wstat 768, 0x300)
Failed 3/7 subtests 

Statistical analysis (theta=1 means "good"):

****************************************************************
Regression 'mod:URI::Encode::XS'
****************************************************************
Name           	       Theta	      StdErr	 T-stat
[0='const']    	      0.0000	      0.0000	   2.71
[1='eq_0.06']  	     -0.0000	      0.0000	  -2.58
[2='eq_0.07']  	      1.0000	      0.0000	8623556281692261.00
[3='eq_0.09']  	      1.0000	      0.0000	11886752020323996.00
[4='eq_0.11']  	      1.0000	      0.0000	12020065557414462.00

R^2= 1.000, N= 65, K= 5
****************************************************************

URI::Fast::IRI lost its version in the CPAN index

With a recent CPAN.pm and an older URI::Fast installed a new problem appears when trying to upgrade:

  JEFFOBER/URI-Fast-0.47.tar.gz
  /usr/bin/make -- OK
allow_installing_module_downgrades: JEFFOBER/URI-Fast-0.47.tar.gz (called for J/JE/JEFFOBER/URI-Fast-0.47.tar.gz) contains downgrading module(s) (e.g. 'URI/Fast/IRI.pm' would downgrade installed '0.37' to 'undef'). Do you want to allow installing it? [yes] no
Testing/Installation stopped: allow_installing_module_downgrades: JEFFOBER/URI-Fast-0.47.tar.gz (called for J/JE/JEFFOBER/URI-Fast-0.47.tar.gz) contains downgrading module(s) (e.g. 'URI/Fast/IRI.pm' would downgrade installed '0.37' to 'undef')

Probably this line is problematic: https://metacpan.org/release/URI-Fast/source/lib/URI/Fast/IRI.pm#L7
I think that setting a version from another module's version does not work in the CPAN world. The PAUSE daemon determines a module's version only by "looking" at the $VERSION line in the module and executes (almost) no code; in particular, it would not run the require here.
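A minimal sketch of the alternative the report implies: each module file declares its own literal version on a single line, so that a tool reading only that line can determine it. The package name My::Dist::Helper is hypothetical, and the version number is simply the release mentioned above.

```perl
package My::Dist::Helper;   # hypothetical module name

use strict;
use warnings;

# A literal version, self-contained on one line: nothing has to be loaded
# or executed from another module in order to determine it.
our $VERSION = '0.47';

1;
```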

(@andk: maybe this is also interesting for you)

Use strstr()

Fast.xs provides str_index() which seems to reimplement the standard C function strstr() which is usually mapped on a single CPU instruction. There is no reason documented why strstr is not used.

Build not possible with BSD make

With the standard make on FreeBSD it's not possible to build URI-Fast-0.35_5:

Output from '/usr/bin/make':

make: "/home/cpansand/.cpan/build/2018070208/URI-Fast-0.35_5-bFRFNS/Makefile" line 1114: Missing dependency operator
make: "/home/cpansand/.cpan/build/2018070208/URI-Fast-0.35_5-bFRFNS/Makefile" line 1116: Need an operator
make: "/home/cpansand/.cpan/build/2018070208/URI-Fast-0.35_5-bFRFNS/Makefile" line 1118: Missing dependency operator
make: "/home/cpansand/.cpan/build/2018070208/URI-Fast-0.35_5-bFRFNS/Makefile" line 1120: Need an operator
make: "/home/cpansand/.cpan/build/2018070208/URI-Fast-0.35_5-bFRFNS/Makefile" line 1122: Need an operator
make: Fatal errors encountered -- cannot continue
make: stopped in /home/cpansand/.cpan/build/2018070208/URI-Fast-0.35_5-bFRFNS

Same problem exists with NetBSD: http://www.cpantesters.org/cpan/report/151ac890-7e19-11e8-9c78-3a8d13bf8fb6

And probably with other BSD systems (OpenBSD, Dragonfly).

Permit arbitrary binary data in query string

In the current implementation, it is assumed that any decoded string is utf8 and that any string to be encoded is utf8. This should be modified such that arbitrary binary data may be stored in a query parameter.

Treatment of double slashes in path

The documentation in Fast.xs says:

  // Relative URIs may begin with // to indicate an authority section without a
  // scheme, which is illegal in standard URI syntax (authority may only come
  // after a scheme, which is required, separated by //). This workaround helps
  // the parser along by identifying the authority section as such.

But when you try this out in a browser, you will find that browsers treat any double slash in the path as a single slash. So, http://abc.nl//xyz// is interpreted as http://abc.nl/xyz/. This should probably be resolved by the url parser. I found major websites with href="//js/", which proves that all (important) browsers interpret this as /js. (Actually, the URI module is doing this wrong as well.)

Crash when assigning to path

Running 0.53, with this code:

        if($url->path =~ /[^\x20-\x7f]/)
        {   my $path = $url->path =~ s!([^\x20-\xf0])!$b = $1; utf8::encode($b);
                 join '', map sprintf("%%%02X", ord), split //, $b!gre;
            $url->path($path);
        }

I attempt to convert utf8 within the path into hex encoding. When running this on many urls, I get a crash. When I remove the storage line (the last one), it does not crash. The crash trips various internal malloc consistency checks. It may be a memory issue in html_url itself, because this also crashes for me:

use warnings;
use strict;
use utf8;

use URI::Fast qw(html_url);

my $url = html_url("/mp3poisk/Relax, take it easy..ιlιlι.. ßѴҎ☺..", "http://mp3prima.com/");
warn "Parsed\n";
warn "Stringified: $url\n";

my $path = $url->path =~ s!([^\x20-\xf0])!$b = $1; utf8::encode($b);
                 join '', map sprintf("%%%02X", ord), split //, $b!gre;
warn "To hex=$path\n";

$url->path($path);
warn "Becomes $url\n";

I hope the utf8 copy-pastes correctly.

Are you using the right function to determine the length of the input strings?

Optimized web-page link to normalized absolute link

Not only my project but many others convert links found somewhere (usually in HTML, maybe in config files) into absolute, normalized urls. My application has to do this at least 150k times per second, forever. Library support to speed this up is very welcome to save resources (the planet).

Current flow with XML::LibXML:

   foreach my $a ($doc->findnodes("//a[@href]")) {
      my $href = $a->getAttribute('href') =~ s/^\s+//r =~ s/\s+$//r;
      my $uri = URI::Fast->new($href)->absolute($base)->canonical;
      ...
   }

The best performing would be

   URI::Fast->defaultBase($base);
   foreach my $a ($doc->findnodes("//a[@href]")) {
       my $uri = URI::Fast->newFromHtml($a->getAttribute('href'));
       ...
   }

Distinct methods for handling dictionary vs. set query types

Both k1=v1&k2=v2 and k1&k2 are valid query string uses, but mixing semantics to allow both of them to be modified is beginning to impact clarity and speed in param's argument handling as well as the complexity of the function itself. These should be split off into separate methods.

Normalize hex encodings

Very often, more characters are hex-encoded than necessary. For instance, you often see http://example.com/%7Euser, which is a needless encoding of http://example.com/~user. Fast.xs marks this as TODO.

The correcting of this superfluous encoding is a bit difficult: you probably need to keep multi-byte encodings of UTF8 characters in HEX form.
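One conservative way to approach this, sketched in pure Perl, is to decode a triplet only when it represents an unreserved ASCII character and to leave every other triplet encoded, so the bytes of multi-byte UTF-8 sequences stay in hex form. The helper names are hypothetical and this is not code from Fast.xs.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one %XX triplet only if it stands for an
# RFC 3986 unreserved character; otherwise keep it, with upper-case hex.
sub _sketch_maybe_decode {
  my ($hex) = @_;
  my $chr = chr hex $hex;
  return $chr =~ /\A[A-Za-z0-9\-._~]\z/ ? $chr : '%' . uc $hex;
}

# Hypothetical helper: remove superfluous percent-encoding from a string.
sub sketch_decode_unreserved {
  my ($str) = @_;
  $str =~ s/%([0-9A-Fa-f]{2})/_sketch_maybe_decode($1)/ge;
  return $str;
}

print sketch_decode_unreserved('http://example.com/%7euser/%e2%98%ba'), "\n";
# http://example.com/~user/%E2%98%BA
```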

Utf8 issue with produced string

When I run this:

#!/usr/bin/env perl
  
use warnings;
use strict;
use utf8;

use URI;
use URI::Fast  qw(html_url uri);
use Encode     qw(is_utf8 encode);

my $link = "http://год2020.рф";
my $url  = html_url $link;

my $host = $url->host;
warn "1 $host is UTF8? ", is_utf8($host) ? 'YES' : 'NO', "\n";
warn "2 $host\n";
warn "3 $host abc\n";
warn "4 $host рф\n";
warn "5 ", encode(utf8 => $host), "\n";
warn "6 ", encode(utf8 => $link), "\n";

I get this output:

1 год2020.рф is UTF8? YES
2 год2020.рф
3 год2020.рф abc
Wide character in warn at /tmp/a.pl line 18.
4 год2020.ÑÑ
                рф
5 год2020.ÑÑ

6 http://год2020.рф

Although the $host says it is utf8, it still gets mutilated by encode()...

uc_hex inefficient

The hex encoding normalization is far too verbose. This

/*
 * Uppercases a 3-digit hex sequence, if present, in the first 3 indices of
 * *buf. It is the caller's responsibility to ensure that *buf is at least 3
 * chars in length.
 */
static inline
bool uc_hex_3ch(pTHX_ char *buf) {
  if (buf[0] != '%') return 0;
  buf[1] = toUPPER(buf[1]);
  buf[2] = toUPPER(buf[2]);
  return 1;
}

/*
 * Uppercases 3-character hex codes over an entire uri_str_t.
 */
static inline
void uc_hex(pTHX_ uri_str_t *str) {
  size_t i = 0;
  while (i < str->length) {
    if (i + 2 < str->length && uc_hex_3ch(aTHX_ &str->string[i]) == 1) {
      i += 3;
    } else {
      ++i;
    }
  }
}

can be simplified into

static inline
void uc_hex(pTHX_ uri_str_t *str) {
  /* i + 2 < length avoids unsigned underflow when length is less than 2 */
  for (size_t i = 0; i + 2 < str->length; i++) {
    if (str->string[i] == '%') {
      str->string[i+1] = toupper(str->string[i+1]);
      str->string[i+2] = toupper(str->string[i+2]);
      i += 2;
    }
  }
}

html_url with large relative link and base without scheme causes crash

In the illegal situation that the base does not specify a scheme AND the relative url is long, you trigger a segmentation fault.

perl -MURI::Fast=html_url -e 'html_url("//img.promportal.su/foto/good_fotos/4746/47467088/truba-stalnaya-273h3-8-mm-08-gost-10705-80-elektrosvarnaya-pryamoshovnaya_foto_largest.jpg","//snabservis.ru")'

Length may have nothing to do with it; it may just trigger the crash more consistently.

Normalizing unencoded utf8

For instance, on some Chinese page (https://www.meadjohnson.com.cn/) I find horrible html like

<ul class="search-hot"><li><a data-id="dhl_yqhl" href="/tag/孕期护理">孕期护理</a></li>
<li><a data-id="dhl_yqzz" href="/tag/孕期症状">孕期症状</a></li>
<li><a data-id="dhl_yqyy" href="/tag/孕期营养">孕期营养</a></li>
<li><a data-id="dhl_dnfy" href="/tag/大脑发育">大脑发育</a></li>
<li><a data-id="dhl_cj" href="/tag/产检">产检</a></li>
<li><a data-id="dhl_fstj" href="/tag/辅食添加">辅食添加</a></li>
</ul>

Module URI normalizes them into hex encodings. Should they? (open question)

Add constructor new_abs()

The URI module, with which URI::Fast has some compatibility, offers the new_abs() constructor:

my $uri = URI->new_abs($url, $base);

When the $url is relative, it is made absolute using $base. In URI::Fast, I need to do

my $uri = URI::Fast->new($url)->absolute($base);

Combining new() with absolute() in one call is very probably much more efficient.

Trailing \u0000 after html_url with fragments

When there is a fragment in the relative URI, the result of html_url() gets a trailing '\0'

perl -MURI::Fast=html_url -we 'print html_url("https://example.com/#more",  "http://a")' | od -c

Hex encoding not everywhere

The URI specification, RFC 3986, describes how hex encoding takes place, but that does not mean that all components can contain hex encoding: for some, the character set is simply too restricted.

During normalization, we do not need to uc_hex() the scheme, host or port.

Percent encode/decode fails on ARM platforms

Sample CPAN testers failure:


"/home/njh/perl5/perlbrew/perls/perl-5.26.1/bin/perl5.26.1" -MExtUtils::Command::MM -e 'cp_nonempty' -- Fast.bs blib/arch/auto/URI/Fast/Fast.bs 644
PERL_DL_NONLAZY=1 "/home/njh/perl5/perlbrew/perls/perl-5.26.1/bin/perl5.26.1" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*
t/encoding.t .. ok
    # Failed test 'get param'
    # at t/iri.t line 26.
    # Unicode::GCString is not installed, table may not display all unicode characters properly
    # +---------+-------+
    # | GOT     | CHECK |
    # +---------+-------+
    # | <UNDEF> | ���¥�®   |
    # +---------+-------+

# Failed test 'getters'
# at t/iri.t line 27.
# Seeded srand with seed '20180620' from local date.
t/iri.t ....... 
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/4 subtests 
t/parsing.t ... ok
t/basics.t .... ok
t/tied.t ...... ok

Test Summary Report
-------------------
t/iri.t     (Wstat: 256 Tests: 4 Failed: 1)
  Failed test:  3
  Non-zero exit status: 1
Files=5, Tests=42,  1 wallclock secs ( 0.22 usr  0.04 sys +  2.71 cusr  0.30 csys =  3.27 CPU)
Result: FAIL
Failed 1/5 test programs. 1/42 subtests failed.
Makefile:1066: recipe for target 'test_dynamic' failed
make: *** [test_dynamic] Error 255

------------------------------
PREREQUISITES
------------------------------

Prerequisite modules loaded:

requires:

    Module              Need     Have    
    ------------------- -------- --------
    Carp                0        1.42    
    Exporter            0        5.72    
    parent              0        0.236   
    perl                5.010    5.026001

build_requires:

    Module              Need     Have    
    ------------------- -------- --------
    ExtUtils::MakeMaker 6.63_03  7.30    
    ExtUtils::testlib   0        7.30    
    Test2               1.302125 1.302136
    Test2::Suite        0.000100 0.000106
    Test2::V0           0        0.000106
    Test::LeakTrace     0.16     0.16    
    Test::Pod           1.41     1.52    
    URI::Encode::XS     0.11     0.11    
    URI::Split          0        1.72    

configure_requires:

    Module              Need     Have    
    ------------------- -------- --------
    ExtUtils::MakeMaker 0        7.30    


------------------------------
ENVIRONMENT AND OTHER CONTEXT
------------------------------

Environment variables:

    AUTOMATED_TESTING = 1
    HARNESS_OPTIONS = j4
    LANG = en_US.UTF-8
    LANGUAGE = en_US.UTF-8
    LC_ALL = en_US.UTF-8
    PATH = /home/njh/perl5/perlbrew/bin:/home/njh/perl5/perlbrew/perls/perl-5.26.1/bin:/home/njh/src/njh/smoker/bin:/usr/bin:/bin
    PERL5LIB = /home/njh/.cpan/build/URI-Encode-XS-0.11-0/blib/arch:/home/njh/.cpan/build/URI-Encode-XS-0.11-0/blib/lib
    PERL5OPT = 
    PERL5_CPANPLUS_IS_RUNNING = 9770
    PERL5_CPAN_IS_RUNNING = 9770
    PERLBREW_HOME = /home/njh/.perlbrew
    PERLBREW_MANPATH = /home/njh/perl5/perlbrew/perls/perl-5.26.1/man
    PERLBREW_PATH = /home/njh/perl5/perlbrew/bin:/home/njh/perl5/perlbrew/perls/perl-5.26.1/bin
    PERLBREW_PERL = perl-5.26.1
    PERLBREW_ROOT = /home/njh/perl5/perlbrew
    PERLBREW_SHELLRC_VERSION = 0.83
    PERLBREW_VERSION = 0.83
    PERL_USE_UNSAFE_INC = 1
    SHELL = /bin/sh
    TMPDIR = /tmp/testwrapper.9758

Perl special variables (and OS-specific diagnostics, for MSWin32):

    $^X = /home/njh/perl5/perlbrew/perls/perl-5.26.1/bin/perl5.26.1
    $UID/$EUID = 1000 / 1000
    $GID = 1000 1000
    $EGID = 1000 1000

Perl module toolchain versions installed:

    Module              Have    
    ------------------- --------
    CPAN                2.18    
    CPAN::Meta          2.150010
    Cwd                 3.67    
    ExtUtils::CBuilder  0.280230
    ExtUtils::Command   7.30    
    ExtUtils::Install   2.14    
    ExtUtils::MakeMaker 7.30    
    ExtUtils::Manifest  1.70    
    ExtUtils::ParseXS   3.35    
    File::Spec          3.67    
    JSON                2.97001 
    JSON::PP            2.97000 
    Module::Build       0.4224  
    Module::Signature   0.81    
    Parse::CPAN::Meta   2.150010
    Test::Harness       3.39    
    Test::More          1.302136
    YAML                1.24    
    YAML::Syck          1.30    
    version             0.9918  


--

Summary of my perl5 (revision 5 version 26 subversion 1) configuration:
   
  Platform:
    osname=linux
    osvers=4.9.23-std-1
    archname=aarch64-linux
    uname='linux scaleway 4.9.23-std-1 #1 smp mon apr 24 13:18:14 utc 2017 aarch64 gnulinux '
    config_args='-de -Dprefix=/home/njh/perl5/perlbrew/perls/perl-5.26.1 -Dusedevel -Accflags=-O2 -W -Wformat=2 -Wswitch -Wshadow -Wwrite-strings -Wuninitialized -Wall -pipe -fomit-frame-pointer -D_FORTIFY_SOURCE=2 -Wpointer-arith -Wstrict-prototypes -fstack-protector -Wstack-protector -Wextra -Wbad-function-cast -Wcast-align -Wcast-qual -Wdisabled-optimization -Wendif-labels -Wfloat-equal -Wformat-nonliteral -Winline -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wpointer-arith -Wundef -Wformat-security -march=armv8-a+crc -Aeval:scriptdir=/home/njh/perl5/perlbrew/perls/perl-5.26.1/bin'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='cc'
    ccflags ='-O2 -W -Wformat=2 -Wswitch -Wshadow -Wwrite-strings -Wuninitialized -Wall -pipe -fomit-frame-pointer -D_FORTIFY_SOURCE=2 -Wpointer-arith -Wstrict-prototypes -fstack-protector -Wstack-protector -Wextra -Wbad-function-cast -Wcast-align -Wcast-qual -Wdisabled-optimization -Wendif-labels -Wfloat-equal -Wformat-nonliteral -Winline -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wpointer-arith -Wundef -Wformat-security -march=armv8-a+crc -fwrapv -fno-strict-aliasing -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    optimize='-O2'
    cppflags='-O2 -W -Wformat=2 -Wswitch -Wshadow -Wwrite-strings -Wuninitialized -Wall -pipe -fomit-frame-pointer -D_FORTIFY_SOURCE=2 -Wpointer-arith -Wstrict-prototypes -fstack-protector -Wstack-protector -Wextra -Wbad-function-cast -Wcast-align -Wcast-qual -Wdisabled-optimization -Wendif-labels -Wfloat-equal -Wformat-nonliteral -Winline -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wpointer-arith -Wundef -Wformat-security -march=armv8-a+crc -fwrapv -fno-strict-aliasing -fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='6.3.0 20170516'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=1
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/aarch64-linux-gnu/6/include-fixed /usr/include/aarch64-linux-gnu /usr/lib /lib/aarch64-linux-gnu /lib/../lib /usr/lib/aarch64-linux-gnu /usr/lib/../lib /lib
    libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.24.so
    so=so
    useshrplib=false
    libperl=libperl.a
    gnulibc_version='2.24'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E'
    cccdlflags='-fPIC'
    lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'


Characteristics of this binary (from libperl): 
  Compile-time options:
    HAS_TIMES
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    PERL_USE_DEVEL
    USE_64_BIT_ALL
    USE_64_BIT_INT
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
  Locally applied patches:
    Devel::PatchPerl 1.48
  Built under linux
  Compiled at Sep 27 2017 11:53:29
  %ENV:
    PERL5LIB="/home/njh/.cpan/build/URI-Encode-XS-0.11-0/blib/arch:/home/njh/.cpan/build/URI-Encode-XS-0.11-0/blib/lib"
    PERL5OPT=""
    PERL5_CPANPLUS_IS_RUNNING="9770"
    PERL5_CPAN_IS_RUNNING="9770"
    PERLBREW_HOME="/home/njh/.perlbrew"
    PERLBREW_MANPATH="/home/njh/perl5/perlbrew/perls/perl-5.26.1/man"
    PERLBREW_PATH="/home/njh/perl5/perlbrew/bin:/home/njh/perl5/perlbrew/perls/perl-5.26.1/bin"
    PERLBREW_PERL="perl-5.26.1"
    PERLBREW_ROOT="/home/njh/perl5/perlbrew"
    PERLBREW_SHELLRC_VERSION="0.83"
    PERLBREW_VERSION="0.83"
    PERL_USE_UNSAFE_INC="1"
  @INC:
    /home/njh/.cpan/build/URI-Encode-XS-0.11-0/blib/arch
    /home/njh/.cpan/build/URI-Encode-XS-0.11-0/blib/lib
    /home/njh/perl5/perlbrew/perls/perl-5.26.1/lib/site_perl/5.26.1/aarch64-linux
    /home/njh/perl5/perlbrew/perls/perl-5.26.1/lib/site_perl/5.26.1
    /home/njh/perl5/perlbrew/perls/perl-5.26.1/lib/5.26.1/aarch64-linux
    /home/njh/perl5/perlbrew/perls/perl-5.26.1/lib/5.26.1
    .```

Normalization of blanks

Links on web pages sometimes contain whitespace. Browsers translate it into %20 (as does the URI module).
For normalization, the + (deprecated, as far as I know) should be translated into %20 as well.
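
A minimal sketch of the requested normalization, written in plain Perl rather than through URI::Fast itself. The helper name `normalize_blanks` is hypothetical. The sketch assumes that `+` should be treated as a blank only in the query part, where form encoding gives it that meaning; in the path a `+` is an ordinary character and is left alone.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of URI::Fast: rewrite literal blanks
# everywhere, and "+" in the query part only, to "%20".
sub normalize_blanks {
    my ($url) = @_;

    # Split off the fragment first, then the query, so that "+" is only
    # translated inside the query.
    my ($rest, $frag)  = split /#/,  $url,  2;
    my ($head, $query) = split /\?/, $rest, 2;

    # A literal blank is never valid in a URL, so encode it in every part.
    for my $part ($head, $query, $frag) {
        $part =~ s/ /%20/g if defined $part;
    }

    # Assumption: "+" stands for a blank only in form-encoded queries.
    $query =~ s/\+/%20/g if defined $query;

    my $out = $head;
    $out .= "?$query" if defined $query;
    $out .= "#$frag"  if defined $frag;
    return $out;
}

print normalize_blanks('http://example.com/a b?q=c+d e#x y'), "\n";
# http://example.com/a%20b?q=c%20d%20e#x%20y
```

Whether a `+` in a given query really encodes a blank depends on how the query was produced, so a normalizer like this may change the meaning of URLs that use `+` literally.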

frag() error in iri.t on NetBSD/amd64

From a CPAN Testers report:

PERL_DL_NONLAZY=1 "/home/cpan/pit/bare/perl-5.18.0/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*
t/basics.t ........ ok
t/compare.t ....... ok
t/encoding.t ...... ok
t/ipv.t ........... ok
    # Failed test 'frag'
    # at t/iri.t line 22.
    # Unicode::GCString is not installed, table may not display all unicode characters properly
    # +----------------+----+-------+
    # | GOT            | OP | CHECK |
    # +----------------+----+-------+
    # | ��\N{U+92}���®���¥�� | eq | ���®�¥�©  |
    # +----------------+----+-------+

# Failed test 'getters'
# at t/iri.t line 27.
# Seeded srand with seed '20180629' from local date.
t/iri.t ........... 
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/4 subtests 
t/memory.t ........ ok
t/param.t ......... ok
t/parsing.t ....... ok
t/query_keyset.t .. ok
t/test.t .......... ok
t/tied.t .......... ok

Test Summary Report
-------------------
t/iri.t         (Wstat: 256 Tests: 4 Failed: 1)
  Failed test:  3
  Non-zero exit status: 1
Files=11, Tests=60,  2 wallclock secs ( 0.09 usr  0.03 sys +  1.68 cusr  0.55 csys =  2.35 CPU)
Result: FAIL
Failed 1/11 test programs. 1/60 subtests failed.
*** Error code 255

Stop.
make: stopped in /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/aEM_moZtOc/URI-Fast-0.35


PREREQUISITES:

Here is a list of prerequisites you specified and versions we
managed to load:

	  Module Name                        Have     Want
	  Carp                               1.29        0
	  Exporter                           5.68        0
	  ExtUtils::MakeMaker                7.34  6.63_03
	  ExtUtils::testlib                  7.34        0
	  Test2                          1.302136 1.302125
	  Test2::Suite                   0.000114 0.000100
	  Test2::V0                      0.000114        0
	  Test::LeakTrace                    0.16     0.16
	  Test::Pod                          1.52     1.41
	  URI::Encode::XS                    0.11     0.11
	  URI::Split                         1.74        0
	  parent                            0.225        0

Perl module toolchain versions installed:
	Module Name                        Have
	CPANPLUS                         0.9174
	CPANPLUS::Dist::Build              0.88
	Cwd                                3.74
	ExtUtils::CBuilder             0.280230
	ExtUtils::Command                  7.34
	ExtUtils::Install                  2.14
	ExtUtils::MakeMaker                7.34
	ExtUtils::Manifest                 1.71
	ExtUtils::ParseXS                  3.35
	File::Spec                         3.74
	Module::Build                    0.4224
	Pod::Parser                        1.60
	Pod::Simple                        3.28
	Test2                          1.302136
	Test::Harness                      3.42
	Test::More                     1.302136
	version                          0.9924

******************************** NOTE ********************************
The comments above are created mechanically, possibly without manual
checking by the sender.  As there are many people performing automatic
tests on each upload to CPAN, it is likely that you will receive
identical messages about the same problem.

If you believe that the message is mistaken, please reply to the first
one with correction and/or additional informations, and do not take
it personally.  We appreciate your patience. :)
**********************************************************************

Additional comments:


This report was machine-generated by CPANPLUS::Dist::YACSmoke 1.02.
Powered by minismokebox version 0.68

------------------------------
ENVIRONMENT AND OTHER CONTEXT
------------------------------

Environment variables:

    AUTOMATED_TESTING = 1
    NONINTERACTIVE_TESTING = 1
    PATH = /home/cpan/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/X11R7/bin:/usr/X11R6/bin:/usr/pkg/bin:/usr/pkg/sbin:/usr/games:/usr/local/bin:/usr/local/sbin
    PERL5LIB = :/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/aUVR3uqHM3/Importer-0.025/blib/lib:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/aUVR3uqHM3/Importer-0.025/blib/arch:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/z8T7vtmLr3/Scope-Guard-0.21/blib/lib:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/z8T7vtmLr3/Scope-Guard-0.21/blib/arch:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/IvdCuuczxa/Sub-Info-0.002/blib/lib:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/IvdCuuczxa/Sub-Info-0.002/blib/arch:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/aeMieONlaW/Term-Table-0.012/blib/lib:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/aeMieONlaW/Term-Table-0.012/blib/arch:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/oKcMbBBr_o/Test2-Suite-0.000114/blib/lib:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/oKcMbBBr_o/Test2-Suite-0.000114/blib/arch:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/JsVUQLwnlV/Test-LeakTrace-0.16/blib/lib:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/JsVUQLwnlV/Test-LeakTrace-0.16/blib/arch:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/UPexbq5kGM/Test-Pod-1.52/blib/lib:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/UPexbq5kGM/Test-Pod-1.52/blib/arch:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/YtPLHS0AYC/URI-Encode-XS-0.11/blib/lib:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/YtPLHS0AYC/URI-Encode-XS-0.11/blib/arch:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/X_WHyUoOeo/Test-Needs-0.002005/blib/lib:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/X_WHyUoOeo/Test-Needs-0.002005/blib/arch:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/h6nCoXWXd1/URI-1.74/blib/lib:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/h6nCoXWXd1/URI-1.74/blib/arch:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/aEM_moZtOc/URI-Fast-0.35/blib/lib:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/aEM_moZtOc/URI-Fast-0.35/blib/arch
    PERL5_CPANPLUS_IS_RUNNING = 24578
    PERL5_CPANPLUS_IS_VERSION = 0.9174
    PERL5_MINISMOKEBOX = 0.68
    PERL5_YACSMOKE_BASE = /home/cpan/pit/bare/conf/perl-5.18.0
    PERL_EXTUTILS_AUTOINSTALL = --defaultdeps
    PERL_MM_USE_DEFAULT = 1
    SHELL = /usr/pkg/bin/bash
    TERM = screen

Perl special variables (and OS-specific diagnostics, for MSWin32):

    Perl: $^X = /home/cpan/pit/bare/perl-5.18.0/bin/perl
    UID:  $<  = 1002
    EUID: $>  = 1002
    GID:  $(  = 100 100
    EGID: $)  = 100 100


-------------------------------


--

Summary of my perl5 (revision 5 version 18 subversion 0) configuration:
   
  Platform:
    osname=netbsd, osvers=6.1.4, archname=amd64-netbsd
    uname='netbsd naboo.bingosnet.co.uk 6.1.4 netbsd 6.1.4 (generic) amd64 '
    config_args='-des -Dprefix=/home/cpan/pit/bare/perl-5.18.0'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/pkg/include',
    optimize='-O',
    cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/pkg/include'
    ccversion='', gccversion='4.5.3', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -Wl,-rpath,/usr/pkg/lib -Wl,-rpath,/usr/local/lib -fstack-protector -L/usr/pkg/lib'
    libpth=/usr/pkg/lib /lib /usr/lib
    libs=-lgdbm -lm -lcrypt -lutil -lc -lposix
    perllibs=-lm -lcrypt -lutil -lc -lposix
    libc=/lib/libc.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E '
    cccdlflags='-DPIC -fPIC ', lddlflags='--whole-archive -shared  -L/usr/pkg/lib -fstack-protector'


Characteristics of this binary (from libperl): 
  Compile-time options: HAS_TIMES PERLIO_LAYERS PERL_DONT_CREATE_GVSV
                        PERL_HASH_FUNC_ONE_AT_A_TIME_HARD PERL_MALLOC_WRAP
                        PERL_PRESERVE_IVUV PERL_SAWAMPERSAND USE_64_BIT_ALL
                        USE_64_BIT_INT USE_LARGE_FILES USE_LOCALE
                        USE_LOCALE_COLLATE USE_LOCALE_CTYPE
                        USE_LOCALE_NUMERIC USE_PERLIO USE_PERL_ATOF
  Built under netbsd
  Compiled at May  9 2014 16:48:11
  %ENV:
    PERL5LIB=":/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/aUVR3uqHM3/Importer-0.025/blib/lib:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/aUVR3uqHM3/Importer-0.025/blib/arch:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/z8T7vtmLr3/Scope-Guard-0.21/blib/lib:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/z8T7vtmLr3/Scope-Guard-0.21/blib/arch:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/IvdCuuczxa/Sub-Info-0.002/blib/lib:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/IvdCuuczxa/Sub-Info-0.002/blib/arch:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/aeMieONlaW/Term-Table-0.012/blib/lib:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/aeMieONlaW/Term-Table-0.012/blib/arch:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/oKcMbBBr_o/Test2-Suite-0.000114/blib/lib:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/oKcMbBBr_o/Test2-Suite-0.000114/blib/arch:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/JsVUQLwnlV/Test-LeakTrace-0.16/blib/lib:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/JsVUQLwnlV/Test-LeakTrace-0.16/blib/arch:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/UPexbq5kGM/Test-Pod-1.52/blib/lib:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/UPexbq5kGM/Test-Pod-1.52/blib/arch:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/YtPLHS0AYC/URI-Encode-XS-0.11/blib/lib:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/YtPLHS0AYC/URI-Encode-XS-0.11/blib/arch:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/X_WHyUoOeo/Test-Needs-0.002005/blib/lib:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/X_WHyUoOeo/Test-Needs-0.002005/blib/arch:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/h6nCoXWXd1/URI-1.74/blib/lib:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/h6nCoXWXd1/URI-1.74/blib/arch:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/aEM_moZtOc/URI-Fast-0.35/blib/lib:/home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/aEM_moZtOc/URI-Fast-0.35/blib/arch"
    PERL5_CPANPLUS_IS_RUNNING="24578"
    PERL5_CPANPLUS_IS_VERSION="0.9174"
    PERL5_MINISMOKEBOX="0.68"
    PERL5_YACSMOKE_BASE="/home/cpan/pit/bare/conf/perl-5.18.0"
    PERL_EXTUTILS_AUTOINSTALL="--defaultdeps"
    PERL_MM_USE_DEFAULT="1"
  @INC:
    /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/aUVR3uqHM3/Importer-0.025/blib/lib
    /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/aUVR3uqHM3/Importer-0.025/blib/arch
    /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/z8T7vtmLr3/Scope-Guard-0.21/blib/lib
    /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/z8T7vtmLr3/Scope-Guard-0.21/blib/arch
    /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/IvdCuuczxa/Sub-Info-0.002/blib/lib
    /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/IvdCuuczxa/Sub-Info-0.002/blib/arch
    /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/aeMieONlaW/Term-Table-0.012/blib/lib
    /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/aeMieONlaW/Term-Table-0.012/blib/arch
    /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/oKcMbBBr_o/Test2-Suite-0.000114/blib/lib
    /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/oKcMbBBr_o/Test2-Suite-0.000114/blib/arch
    /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/JsVUQLwnlV/Test-LeakTrace-0.16/blib/lib
    /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/JsVUQLwnlV/Test-LeakTrace-0.16/blib/arch
    /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/UPexbq5kGM/Test-Pod-1.52/blib/lib
    /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/UPexbq5kGM/Test-Pod-1.52/blib/arch
    /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/YtPLHS0AYC/URI-Encode-XS-0.11/blib/lib
    /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/YtPLHS0AYC/URI-Encode-XS-0.11/blib/arch
    /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/X_WHyUoOeo/Test-Needs-0.002005/blib/lib
    /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/X_WHyUoOeo/Test-Needs-0.002005/blib/arch
    /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/h6nCoXWXd1/URI-1.74/blib/lib
    /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/h6nCoXWXd1/URI-1.74/blib/arch
    /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/aEM_moZtOc/URI-Fast-0.35/blib/lib
    /home/cpan/pit/bare/conf/perl-5.18.0/.cpanplus/5.18.0/build/aEM_moZtOc/URI-Fast-0.35/blib/arch
    /home/cpan/pit/bare/perl-5.18.0/lib/site_perl/5.18.0/amd64-netbsd
    /home/cpan/pit/bare/perl-5.18.0/lib/site_perl/5.18.0
    /home/cpan/pit/bare/perl-5.18.0/lib/5.18.0/amd64-netbsd
    /home/cpan/pit/bare/perl-5.18.0/lib/5.18.0
    .```

html_url and data:

In HTML, a data: URL is a pseudo-link. It is probably best if html_url() returns undef when it encounters one.
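
Until something like this exists in the module, a caller could guard the call. The following is only a sketch: `is_data_url` and `html_url_or_undef` are hypothetical names, and the parser is passed in as a code reference (for example `\&URI::Fast::html_url`) so that the example runs without the module installed.

```perl
use strict;
use warnings;

# Hypothetical guard, not part of URI::Fast: true if the string uses the
# "data:" scheme. Schemes are case-insensitive, and leading whitespace is
# skipped because HTML attribute values are often padded.
sub is_data_url {
    my ($str) = @_;
    return 0 unless defined $str;
    return $str =~ /\A\s*data:/i ? 1 : 0;
}

# Hypothetical wrapper: returns undef for data: URLs, otherwise hands the
# string (and any further arguments, such as a base URL) to $parser.
sub html_url_or_undef {
    my ($parser, $str, @rest) = @_;
    return undef if is_data_url($str);
    return $parser->($str, @rest);
}

# Stand-in parser for the demonstration; real code would pass
# \&URI::Fast::html_url instead.
my $parser = sub { "parsed:$_[0]" };

my $link   = html_url_or_undef($parser, '//www.example.com/recent');
my $pseudo = html_url_or_undef($parser, 'data:text/plain;base64,SGk=');

print defined $link   ? "link: $link\n" : "link: undef\n";
print defined $pseudo ? "pseudo: $pseudo\n" : "pseudo: undef\n";
```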

Path is double hex-encoded on assignment

Because html_url() does not handle UTF-8 input, I fix the path with something like this:

$url->path(do_something_smart($url->path));

My smart function translates UTF-8 into hex (percent) encoding. However, when I later look at $url->path, all the "%" signs have become "%25".

Is it possible that the setter for path() expects unencoded strings, while the path getter produces encoded strings? It is a very inconvenient design when $url->path($url->path) is not idempotent. And how can I set the path without it being mangled?
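
The suspected mechanism can be illustrated in plain Perl. This is a sketch under assumptions, not URI::Fast's actual code: `encode_path` and `decode_path` are hypothetical stand-ins for what an encoding setter and a decoding step might do. If the setter always encodes, then feeding it an already encoded string encodes the "%" signs a second time, and decoding first restores the round trip.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a setter that always percent-encodes its input.
# Every byte except unreserved characters and "/" is encoded. "%" is not in
# the safe set, so an existing escape such as "%C3" becomes "%25C3".
sub encode_path {
    my ($bytes) = @_;
    $bytes =~ s{([^A-Za-z0-9\-._~/])}{sprintf '%%%02X', ord $1}ge;
    return $bytes;
}

# Hypothetical counterpart: turn "%XX" escapes back into raw bytes.
sub decode_path {
    my ($str) = @_;
    $str =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $str;
}

my $raw       = "/caf\xC3\xA9";                     # UTF-8 bytes of "/café"
my $once      = encode_path($raw);                  # /caf%C3%A9
my $twice     = encode_path($once);                 # /caf%25C3%25A9
my $roundtrip = encode_path(decode_path($once));    # /caf%C3%A9

print "$once\n$twice\n$roundtrip\n";
```

If the setter does behave this way, passing it the decoded value rather than the encoded one would avoid the "%25" sequences.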

Adding TO_JSON

When you run

print JSON::XS->new->convert_blessed(1)->encode(URI::Fast->new("http://xz.nl"));

you get the error message

encountered object 'http://xz.nl', but neither allow_blessed, convert_blessed nor allow_tags
    settings are enabled (or TO_JSON/FREEZE method missing)

This is a very confusing message, but here it means that URI::Fast has no TO_JSON method. The behaviour is the same for JSON::XS and JSON (the pure-Perl version). Forcing unrelated classes to add a method is a very ugly concept. On the other hand, JSON is really important, URI supports it, and the adaptation is easy: add an alias for stringification.
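
A minimal sketch of the proposed adaptation, using JSON::PP from the Perl core and a hypothetical stand-in class `My::Url` instead of URI::Fast, so that it runs without the module installed. The only assumption about the real class is that it has a stringification method to delegate to.

```perl
use strict;
use warnings;
use JSON::PP;

{
    # Hypothetical stand-in for a URL class with string overloading.
    package My::Url;
    use overload '""' => sub { $_[0]->to_string }, fallback => 1;

    sub new       { my ($class, $str) = @_; return bless { str => $str }, $class }
    sub to_string { return $_[0]{str} }

    # The requested addition: with convert_blessed enabled, the encoder
    # calls TO_JSON on blessed objects and serializes the return value.
    sub TO_JSON   { return $_[0]->to_string }
}

my $json = JSON::PP->new->convert_blessed(1);
print $json->encode({ url => My::Url->new('http://xz.nl') }), "\n";
# {"url":"http://xz.nl"}
```

Delegating to the existing stringification keeps the JSON form identical to what string interpolation of the object produces.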
