pauldix / domainatrix
A cruel mistress that uses the public suffix domain list to dominate URLs by canonicalizing, finding the public suffix, and breaking them into their domain parts.
p Domainatrix.parse("/test?foo=bar")
# => NoMethodError: undefined method `split' for nil:NilClass
p Domainatrix.parse("example.com/test?foo=bar")
# => NoMethodError: undefined method `split' for nil:NilClass
p Domainatrix.parse("www.example.com/test?foo=bar")
# => NoMethodError: undefined method `split' for nil:NilClass
p Domainatrix.parse("http://www.example.com/test?foo=bar")
#=> #<Domainatrix::Url:0x007fa4064d7810 @scheme="http", @host="www.example.com", @url="http://www.example.com/test?foo=bar", @public_suffix="com", @domain="example", @subdomain="www", @path="/test?foo=bar">
Produces a triple '/' when referencing an image from an asset pipeline
Domainatrix.parse('/assets/fallback/default_user_avatar.png').url
# 'http:///assets/fallback/default_user_avatar.png'
It would be nice to be able to pass in IP addresses, as a website will often run under a bare IP during testing, e.g. url = Domainatrix.parse(request.url)
where request.url may be 192.168.0.1 when testing on the local network.
At the moment it throws 'You have a nil object when you didn't expect it!'
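Until IP hosts are supported, one possible workaround is to detect them before calling Domainatrix.parse. The sketch below uses only Ruby's standard library; `ip_host?` is a hypothetical helper name, not part of the gem, and it assumes the URL includes a scheme (for example "http://192.168.0.1/").

```ruby
require 'ipaddr'
require 'uri'

# Hypothetical helper (not part of Domainatrix): returns true when the
# URL's host is a literal IPv4 or IPv6 address.
def ip_host?(url)
  host = URI.parse(url).hostname # hostname strips the brackets around IPv6 literals
  return false if host.nil?

  IPAddr.new(host) # raises a subclass of ArgumentError for non-IP strings
  true
rescue ArgumentError, URI::InvalidURIError
  false
end
```

A caller could then skip the parse (or handle the address separately) whenever `ip_host?(request.url)` is true.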
Please check the below examples
irb(main):120:0> parsed_url = Domainatrix.parse('http://www.priceking.in')
=> @domain="pricekg"
irb(main):122:0> parsed_url = Domainatrix.parse('http://www.indiaplaza.in')
=> @domain="diaplaza"
irb(main):124:0> parsed_url = Domainatrix.parse('http://www.comscore.com')
=> @domain="score"
@domain should have been comscore instead of score.
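All three results above are what you would get if the suffix string were removed everywhere it occurs in the name ("in" from "priceking", "com" from "comscore") rather than only as the trailing label. That is an assumption about the cause, not something confirmed from the gem's source. A minimal sketch of label-based splitting, with `domain_label` as a hypothetical helper name:

```ruby
# Hypothetical helper (not the gem's implementation): picks the label that
# sits immediately to the left of the public suffix, comparing whole
# dot-separated labels instead of substrings.
def domain_label(host, public_suffix)
  labels = host.split('.')
  suffix_labels = public_suffix.split('.')

  # The host must actually end with the suffix labels.
  return nil unless labels.last(suffix_labels.size) == suffix_labels

  # Out-of-range negative indexes return nil, e.g. when host == suffix.
  labels[-(suffix_labels.size + 1)]
end
```

Because whole labels are compared, a suffix that also appears inside the name is left alone.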
On the gem's home page, you write:
url = Domainatrix.parse("http://www.pauldix.net")
url.canonical # => "net.pauldix"
However, in IRB, I get the following behavior:
irb> url = Domainatrix.parse('http://www.pauldix.net')
=> #<Domainatrix::Url:0x007fd0409d5310 @scheme="http", @host="www.pauldix.net", @url="http://www.pauldix.net", @public_suffix="net", @domain="pauldix", @subdomain="www", @path="">
> url.canonical
=> "net.pauldix.www"
Is the www supposed to be a part of the canonical name?
$ ruby -v
ruby 2.0.0p247 (2013-06-27 revision 41674) [x86_64-darwin14.0.0]
$ gem list domainatrix -d
*** LOCAL GEMS ***
domainatrix (0.0.11)
Authors: Paul Dix, Brian John
Homepage: http://github.com/pauldix/domainatrix
Installed at: /Users/craibuc/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0
A cruel mistress that uses the public suffix domain list to dominate
URLs by canonicalizing, finding the public suffix, and breaking them
into their domain parts.
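To make the two behaviors in this report concrete, here is a small sketch that builds a reversed name from the three parts shown in the inspect output. `canonical_name` and its `include_subdomain` option are hypothetical and only illustrate the difference; they are not the gem's API.

```ruby
# Hypothetical illustration: builds a reversed, dot-separated name from the
# public suffix, domain, and (optionally) subdomain.
def canonical_name(public_suffix, domain, subdomain = '', include_subdomain: true)
  parts = public_suffix.split('.').reverse
  parts << domain
  parts.concat(subdomain.split('.').reverse) if include_subdomain
  parts.join('.')
end

puts canonical_name('net', 'pauldix', 'www')                           # net.pauldix.www
puts canonical_name('net', 'pauldix', 'www', include_subdomain: false) # net.pauldix
```

The first call reproduces what IRB returns above; the second reproduces what the home page documents.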
Suggestion:
At the moment, in order to get the full domain (minus subdomain) I have to:
url.domain + '.' + url.public_suffix
It would be nice to have one method that combines these :)
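A sketch of what such a method could look like. `UrlParts` is a stand-in Struct used here so the example runs without the gem, and `domain_with_public_suffix` is a hypothetical method name, not an existing Domainatrix method.

```ruby
# Stand-in for a parsed URL object (hypothetical; not Domainatrix::Url).
UrlParts = Struct.new(:subdomain, :domain, :public_suffix) do
  # Joins the domain and public suffix, skipping any missing part.
  def domain_with_public_suffix
    [domain, public_suffix].reject { |part| part.nil? || part.empty? }.join('.')
  end
end

url = UrlParts.new('www', 'pauldix', 'net')
puts url.domain_with_public_suffix # pauldix.net
```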
Domainatrix.parse("http://blog.andrina.web.id").public_suffix
=> "web.id"
web.id is NOT a public suffix according to the public suffix list (http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/src/effective_tld_names.dat?raw=1)
When using Domainatrix with Ruby 1.9.1 (p378 on OSX 10.6 i386) the following error occurs when calling Domainatrix.parse:
ArgumentError: invalid byte sequence in US-ASCII
from /opt/lib/ruby/gems/1.9.1/gems/domainatrix-0.0.7/lib/domainatrix/domain_parser.rb:14:in `strip'
from /opt/lib/ruby/gems/1.9.1/gems/domainatrix-0.0.7/lib/domainatrix/domain_parser.rb:14:in `block in read_dat_file'
from /opt/lib/ruby/gems/1.9.1/gems/domainatrix-0.0.7/lib/domainatrix/domain_parser.rb:13:in `each'
from /opt/lib/ruby/gems/1.9.1/gems/domainatrix-0.0.7/lib/domainatrix/domain_parser.rb:13:in `read_dat_file'
from /opt/lib/ruby/gems/1.9.1/gems/domainatrix-0.0.7/lib/domainatrix/domain_parser.rb:9:in `initialize'
from /opt/lib/ruby/gems/1.9.1/gems/domainatrix-0.0.7/lib/domainatrix.rb:11:in `new'
from /opt/lib/ruby/gems/1.9.1/gems/domainatrix-0.0.7/lib/domainatrix.rb:11:in `parse'
from (irb):3
from /opt/bin/irb:12:in
FIX:
change domainatrix/domain_parser.rb:14 from:
line = line.strip
to:
line = line.force_encoding('utf-8').strip
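An alternative to calling force_encoding on every line is to declare the encoding when the file is opened. The sketch below is not the gem's read_dat_file; `read_suffix_lines` is a hypothetical name, and skipping blank lines and `//` comments is an assumption based on the format of the public suffix list.

```ruby
# Hypothetical reader: opens the file as UTF-8 so that strip never sees
# bytes tagged as US-ASCII, then drops blank lines and // comments.
def read_suffix_lines(path)
  File.foreach(path, encoding: 'UTF-8')
      .map(&:strip)
      .reject { |line| line.empty? || line.start_with?('//') }
end
```

Either approach tags the strings as UTF-8; this one keeps the encoding decision in a single place.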
ruby-1.9.2-p0 > Domainatrix.parse('http://74.205.88.194/article/news/microsoft_ballmer_envious_ipads_success_insists_windows_tablets_are_priority')
NoMethodError: undefined method `has_key?' for nil:NilClass
from /Users/igrigorik/.rvm/gems/ruby-1.9.2-p0/gems/domainatrix-0.0.7/lib/domainatrix/domain_parser.rb:52:in `block in parse_domains_from_host'
from /Users/igrigorik/.rvm/gems/ruby-1.9.2-p0/gems/domainatrix-0.0.7/lib/domainatrix/domain_parser.rb:47:in `each_index'
from /Users/igrigorik/.rvm/gems/ruby-1.9.2-p0/gems/domainatrix-0.0.7/lib/domainatrix/domain_parser.rb:47:in `parse_domains_from_host'
from /Users/igrigorik/.rvm/gems/ruby-1.9.2-p0/gems/domainatrix-0.0.7/lib/domainatrix/domain_parser.rb:33:in `parse'
from /Users/igrigorik/.rvm/gems/ruby-1.9.2-p0/gems/domainatrix-0.0.7/lib/domainatrix.rb:12:in `parse'
from (irb):2
from /Users/igrigorik/.rvm/rubies/ruby-1.9.2-p0/bin/irb:17:in `'
ruby-1.9.2-p0 >
Obviously .dev isn't a legitimate TLD, but it's in use thanks to the Pow rack server by 37signals. It might make sense to add support for it, or at least not throw an error when it is used with parse:
>> Domainatrix.parse('http://google.com/')
=> #<Domainatrix::Url:0x00000104a196b8 @scheme="http", @host="google.com", @url="http://google.com/", @public_suffix="com", @domain="google", @subdomain="", @path="/">
>> Domainatrix.parse('http://google.dev/')
NoMethodError: undefined method `has_key?' for nil:NilClass
from [...]/whiny_nil.rb:48:in `method_missing'
from [...]/domainatrix-0.0.10/lib/domainatrix/domain_parser.rb:59:in `block in parse_domains_from_host'
from [...]/domain_parser.rb:54:in `each_index'
from [...]/domain_parser.rb:54:in `parse_domains_from_host'
from [...]/domain_parser.rb:40:in `parse'
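The trace above shows has_key? being called on nil when the top-level label is not in the suffix table. As a rough illustration of a lookup that returns nil instead of raising, here is a toy version. `TOY_SUFFIXES` and `split_host` are hypothetical, the table holds only two entries, and the wildcard and exception rules of the real public suffix list are ignored.

```ruby
# Toy suffix table keyed from the rightmost label inward (hypothetical data).
TOY_SUFFIXES = { 'com' => {}, 'uk' => { 'co' => {} } }.freeze

# Hypothetical sketch: walks labels right to left through the table and
# returns nil when no public suffix matches, rather than raising.
def split_host(host, suffixes = TOY_SUFFIXES)
  labels = host.downcase.split('.')
  node = suffixes
  suffix = []

  while !labels.empty? && node.key?(labels.last)
    node = node[labels.last]
    suffix.unshift(labels.pop)
  end

  # No known suffix, or nothing left over to serve as the domain.
  return nil if suffix.empty? || labels.empty?

  {
    subdomain: labels[0...-1].join('.'),
    domain: labels.last,
    public_suffix: suffix.join('.')
  }
end
```

With this shape, a host such as google.dev yields nil, and the caller decides whether that is an error.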
Hi,
It seems that the Domainatrix.parse() method fails when the domain has no suffix, e.g. 'http://www.foo/':
$ irb
require 'domainatrix'
Domainatrix.parse('http://www.foo/')
NoMethodError: undefined method `has_key?' for nil:NilClass
from /Users/ami/.rvm/gems/ruby-1.9.2-head@rails3beta/gems/domainatrix-0.0.10/lib/domainatrix/domain_parser.rb:59:in `block in parse_domains_from_host'
Thanks,
Ami
Would be nice to include a validation method for example ...
url = Domainatrix.parse('http://www.test.com')
url.valid? # << returns true
url = Domainatrix.parse('http://www.test.madeupanddoesntexist')
url.valid? # << returns false
This could be done with just a regular expression; however, also checking against the list of valid TLDs would be great.
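A sketch of the combined check suggested above: a regular expression for the host's shape plus a lookup in a TLD set. `KNOWN_TLDS`, `HOST_PATTERN`, and `plausible_url?` are hypothetical names, and the four-entry set is a placeholder for a real list.

```ruby
require 'set'
require 'uri'

# Placeholder TLD list (hypothetical); a real check would load the full list.
KNOWN_TLDS = Set.new(%w[com net org in]).freeze

# Two or more dot-separated labels of letters, digits, and inner hyphens.
HOST_PATTERN = /\A[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)+\z/

# Hypothetical validator: the host must look like a hostname and end in a
# TLD from the supplied set.
def plausible_url?(url, tlds = KNOWN_TLDS)
  host = URI.parse(url).host
  return false if host.nil?

  host = host.downcase
  return false unless host =~ HOST_PATTERN

  tlds.include?(host.split('.').last)
rescue URI::InvalidURIError
  false
end
```

With the placeholder set, 'http://www.test.com' passes and 'http://www.test.madeupanddoesntexist' does not.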
Blows up when the URL doesn't contain http://. It would be nice to make the http:// optional.
Code was tested under Sinatra.
Error:-
Concatenating "http://" is a workaround, but it would be nice to have this within the gem itself.
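The workaround mentioned above can be wrapped in a small helper so the scheme is only added when it is missing. `with_default_scheme` is a hypothetical name and is not part of the gem.

```ruby
# Hypothetical helper: prepends a default scheme only when the string does
# not already start with one (e.g. "http://", "https://", "ftp://").
def with_default_scheme(url, scheme = 'http')
  url =~ %r{\A[a-z][a-z0-9+.\-]*://}i ? url : "#{scheme}://#{url}"
end

puts with_default_scheme('www.example.com/test?foo=bar')
puts with_default_scheme('https://www.example.com/')
```

The result can then be passed to Domainatrix.parse. Note that this does not help with host-less paths such as '/test?foo=bar', which would still come out with an empty host.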