kevinelliott / agent_orange
Parse and process User Agents like a secret one
Hey, I'm looking for a user agent parser for my Rails app. I'm considering your project and this one: https://github.com/jilion/useragent
What would you say agent_orange does better compared to https://github.com/jilion/useragent?
Create a parser chain that uses smart matchers to detect platforms, devices, etc.
The base class is AgentOrange::Matcher, which contains all the rules necessary to match. Matchers are stored in an ordered array, and detection processes them in order.
This architecture and its implementation still need some thought.
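To make the proposal concrete, here is a minimal sketch of an ordered matcher chain. Everything in it is an assumption for illustration: the Sketch module, the Chain class, the type: argument, and the regexes are hypothetical and not taken from the agent_orange source.

```ruby
# Hypothetical sketch: none of these names come from the agent_orange source.
module Sketch
  # A matcher pairs a label with a pattern tested against the raw UA string.
  class Matcher
    attr_reader :name, :type, :regex

    def initialize(name:, type:, regex:)
      @name = name
      @type = type
      @regex = regex
    end

    def match?(user_agent)
      !!(user_agent =~ regex)
    end
  end

  # Matchers live in an ordered array; the first match for a given type wins.
  class Chain
    def initialize(matchers = [])
      @matchers = matchers
    end

    def detect(user_agent, type)
      @matchers.find { |m| m.type == type && m.match?(user_agent) }
    end
  end
end

chain = Sketch::Chain.new([
  Sketch::Matcher.new(name: 'iPad', type: :device, regex: /iPad/),
  Sketch::Matcher.new(name: 'iPhone', type: :device, regex: /iPhone/),
  Sketch::Matcher.new(name: 'iOS', type: :platform, regex: /like Mac OS X/)
])

ua = 'Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0_1 like Mac OS X; en-us) AppleWebKit/532.9'
device = chain.detect(ua, :device)
platform = chain.detect(ua, :platform)
puts device.name   # prints "iPhone"
puts platform.name # prints "iOS"
```

Because the array is ordered, more specific matchers would need to sit ahead of more general ones.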
The types returned from each method/attribute are really confusing. For example, here:
> platform = ua.device.platform => "iPhone"
ua.device.platform actually returns an AgentOrange::Platform object. Am I missing something? Documenting with YARD would be awesome, too...
Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8A306 Safari/6531.22.7
It's actually an iPhone 3GS with an updated iOS version (I guess it's no longer possible to tell that it's a 3GS).
platform.version returns 'iPhone'
Loads of bots are being missed because the bot check only looks at content[:comment].
Some that are getting through for us without being caught include:
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
compatible; Googlebot/2.1; +http://www.google.com/bot.html
compatible; YandexBot/3.0; +http://yandex.com/bots
compatible; PaperLiBot/2.1; http://support.paper.li/entries/20023257-what-is-paper-li
compatible; AhrefsBot/4.0; +http://ahrefs.com/robot/
Twitterbot/1.0
LinkedInBot/1.0 (compatible; Mozilla/5.0; Jakarta Commons-HttpClient/3.1 +http://www.linkedin.com)
http://showyou.com/crawler
compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm
compatible; TweetmemeBot/3.0; +http://tweetmeme.com/
+http://search.msn.com/msnbot.htm
Is there a good reason why only the comment section is checked?
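One possible direction, sketched under assumptions: test the whole user agent string instead of only the comment section. The helper name and the pattern below are hypothetical and far from a complete bot list. They only cover the examples reported above.

```ruby
# Hypothetical helper; the pattern is illustrative, not the gem's bot list.
BOT_PATTERN = /bot\b|crawler|spider|facebookexternalhit/i

# Checks the full UA string rather than only the parenthesised comment.
def bot_user_agent?(user_agent)
  !!(user_agent.to_s =~ BOT_PATTERN)
end

puts bot_user_agent?('facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)')
puts bot_user_agent?('Twitterbot/1.0')
puts bot_user_agent?('http://showyou.com/crawler')
```

The \b after "bot" is there so that the token has to end a word, which keeps the pattern from firing on longer words that merely contain those letters.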
User agents starting with "compatible;" don't seem to be parsed at all.
ua.device is returning 'Mobile' but ua.is_mobile? is returning false
Does this gem have tests?
Thanks for the great work on agent_orange!
We're using it to detect mobile UAs on http://www.classmonkeys.com to serve users a mobile experience. The only issue we've run into is that 10" tablets, which fall in the gray area between mobile phone and laptop (and which we want to serve a desktop web experience to), are reported as mobile.
Is there some way we haven't thought of to use agent_orange to tell whether something is a tablet?
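As a stopgap outside the gem, a heuristic on the raw string might be enough. This is a sketch under assumptions, not an agent_orange API: the method name is hypothetical, and the rule relies on the common convention that Android tablets omit the "Mobile" token that Android phones send, which will not hold for every device. The iPad and Xoom strings are sample user agents used only for illustration.

```ruby
# Hypothetical heuristic, independent of agent_orange.
def tablet_user_agent?(user_agent)
  return true if user_agent =~ /iPad/

  # Android tablets commonly omit the "Mobile" token that Android phones send.
  !!(user_agent =~ /Android/ && user_agent !~ /Mobile/)
end

ipad = 'Mozilla/5.0 (iPad; CPU OS 5_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B176 Safari/7534.48.3'
xoom = 'Mozilla/5.0 (Linux; U; Android 3.0; en-us; Xoom Build/HRI39) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13'
nexus_one = 'Mozilla/5.0 (Linux; U; Android 2.3.7; en-us; Nexus One Build/GRK39F) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1'

puts tablet_user_agent?(ipad)
puts tablet_user_agent?(xoom)
puts tablet_user_agent?(nexus_one)
```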
When using Rails and Bundler with git as a source, Bundler flags the gemspec as invalid and gem installation fails.
For example, this line in my Gemfile was failing with a Bundler error "invalid gemspec":
gem 'agent_orange', '0.1.0', :git => 'git://github.com/eatenbyagrue/agent_orange.git', :branch => 'hiringthing'
I fixed it in a branch of my fork with the following change to the gemspec:
s.files = "agent_orange-0.1.0.gem"
Allow developers to inject their own regex matchers for specific needs. These injected matchers should run before all other parsing so that their additions take priority.
Something like this might be nice:
AgentOrange.custom_matchers << AgentOrange::Matcher.new(name: 'Special Bot', regex: /.+SpecialBot.+/)
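A rough sketch of how such prioritisation could behave, assuming a registry that checks custom matchers before the built-in ones. MatcherRegistry, its Matcher struct, and the BUILT_IN list are hypothetical stand-ins, not part of AgentOrange.

```ruby
# Hypothetical registry; AgentOrange does not necessarily expose any of this.
module MatcherRegistry
  Matcher = Struct.new(:name, :regex)

  # Stand-in for whatever rules the library ships with.
  BUILT_IN = [
    Matcher.new('Generic Bot', /bot/i)
  ].freeze

  def self.custom_matchers
    @custom_matchers ||= []
  end

  # Custom matchers are checked before the built-in ones.
  def self.detect(user_agent)
    (custom_matchers + BUILT_IN).find { |m| user_agent =~ m.regex }
  end
end

MatcherRegistry.custom_matchers << MatcherRegistry::Matcher.new('Special Bot', /SpecialBot/)
puts MatcherRegistry.detect('Mozilla/5.0 (compatible; SpecialBot/1.0)').name # prints "Special Bot"
```

Both patterns match the sample string. The custom matcher wins only because it is consulted first.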
I added the Code Climate badge to the README, and it links to the code analysis by Code Climate. The project is graded pretty poorly, since there is some redundant code and other smells. Refactor to remove the smells and get a better score.
Do you think the Facebook crawler should be classified as a bot? Its user agent is "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)".
There appears to be an issue if the version number has not been extracted and is nil.
NoMethodError private method `gsub' called for nil:NilClass
agent_orange (0.0.9) lib/agent_orange/version.rb:30:in `sanitize_version_string'
I traced it down to the following user agents (in my test data)
Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9
Mozilla/4.0 (compatible; MSIE 5.0; Windows 2000) Opera 6.0 [en]
Mozilla/4.0 (compatible; MSIE 5.0; Windows 95) Opera 6.01 [en]
Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 5.0) Opera 7.02 Bork-edition [en]
Mozilla/0.91 Beta (Windows)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.00
Mozilla/5.0 (Windows NT 5.1; U; en) Opera 8.00
Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0 ; .NET CLR 2.0.50215; SL Commerce Client v1.0; Tablet PC 2.0
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; ru) Opera 8.50
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Media Center PC
Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 4.0) Opera 7.0 [en]
Mozilla/0.6 Beta (Windows)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 9.0
Mozilla/5.0 (Windows NT 5.1; U; en) Opera 8.01
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.54 [en]
Mozilla/5.0 (Linux; U; Android 2.3.3; en-ie; HTC Wildfire S A510e Build/GRI40) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1
Mozilla/4.0 (compatible; MSIE 6.0; Windows ME) Opera 7.11 [en]
Mozilla/5.0 (compatible; PaperLiBot/2.1; http://support.paper.li/entries/20023257-what-is-paper-li)
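A nil guard would avoid the NoMethodError described above. The body below is a hypothetical stand-in, since the real sanitize_version_string in lib/agent_orange/version.rb is not shown here. The digits-and-dots cleanup is an assumption made for illustration.

```ruby
# Hypothetical stand-in for the failing method; the real body may differ.
def sanitize_version_string(version)
  # Return early instead of calling gsub on nil.
  return nil if version.nil?

  # Normalise underscores (e.g. "4_0_1") to dots, then keep digits and dots only.
  version.to_s.tr('_', '.').gsub(/[^0-9.]/, '')
end

puts sanitize_version_string('4_0_1')       # prints "4.0.1"
puts sanitize_version_string(nil).inspect   # prints "nil"
```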
Hey Kevin,
I ran agent_orange over a sample of user agents. It failed on only 46 of around 12,000. Not bad!
Here's a list of the failed ones.
1PasswordThumbs/1 CFNetwork/454.11.12 Darwin/10.7.0 (i386) (MacBookPro7%2C1)
Aboundex/0.2 (http://www.aboundex.com/crawler/)
Anemone/0.6.0
Baiduspider+(+http://www.baidu.com/search/spider.htm)
cmsworldmap.com
Covario-IDS/1.0 (Covario; http://www.covario.com/ids; support at covario dot com)
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 1 subscribers; feed-id=7611448681530808886)
Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 3 subscribers; feed-id=8646581579435208956)
Googlebot-Image/1.0
Java/1.4.1_04
Java/1.6.0_04
Java/1.6.0_16
Java/1.6.0_20
Java/1.6.0_27
libwww-perl/6.02
linkdex.com/v2.0
MetaURI API/2.0 +metauri.com
Microsoft URL Control - 6.00.8169
Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
Mozilla/5.0 (compatible; Embedly/0.2; +http://support.embed.ly/)
Mozilla/5.0 (compatible; ScoutJet; +http://www.scoutjet.com/)
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
percbotspider
Plesk
PostRank/2.0 (postrank.com; 1 subscribers)
R6_CommentReader(www.radian6.com/crawler)
radian6_default_(www.radian6.com/crawler)
Reeder/1.5.1 CFNetwork/485.12.7 Darwin/10.4.0
Reeder/1.5.1 CFNetwork/485.13.9 Darwin/11.0.0
Reeder/1.5.1 CFNetwork/544 Darwin/11.0.0
Reeder/1.5.1 CFNetwork/547 Darwin/11.0.0
Reeder/1010.19.00 CFNetwork/520.0.13 Darwin/11.1.0 (i386) (MacBook4%2C1)
Reeder/1010.29.00 CFNetwork/520.0.13 Darwin/11.0.0 (x86_64) (MacBookPro5%2C3)
Reeder/1010.29.00 CFNetwork/520.0.13 Darwin/11.0.0 (x86_64) (MacBookPro6%2C2)
Reeder/1010.29.00 CFNetwork/520.0.13 Darwin/11.1.0 (i386) (MacBookPro6%2C2)
Reeder/1010.29.00 CFNetwork/520.0.13 Darwin/11.1.0 (x86_64) (iMac11%2C1)
Reeder/1010.29.00 CFNetwork/520.0.13 Darwin/11.1.0 (x86_64) (MacBook5%2C1)
Reeder/1010.29.00 CFNetwork/520.0.13 Darwin/11.1.0 (x86_64) (MacBookAir3%2C1)
Reeder/1010.29.00 CFNetwork/520.0.13 Darwin/11.1.0 (x86_64) (MacBookPro5%2C3)
Reeder/2.5.1 CFNetwork/485.13.9 Darwin/11.0.0
Reeder/2.5.1 CFNetwork/547 Darwin/11.0.0
Ruby
SBIder/Nutch-1.0-dev (http://www.sitesell.com/sbider.html)
ThumbnailService/39001 CFNetwork/520.0.13 Darwin/11.1.0 (x86_64) (MacBookPro5%2C2)
xpymep.exe
For this user agent:
5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.165 Safari/535.19
It is identified as Safari when doing:
device.engine.browser.to_s
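Chrome's user agent string also contains a "Safari/" token, so a parser that tests for Safari first will misreport it. A minimal sketch of order-sensitive matching follows. BROWSER_RULES and detect_browser are hypothetical names, not the gem's internals.

```ruby
# Order matters: Chrome's UA string also contains "Safari/", so Chrome is tested first.
BROWSER_RULES = [
  ['Chrome', %r{Chrome/([\d.]+)}],
  ['Safari', %r{Safari/([\d.]+)}]
].freeze

# Returns the first rule that matches, or nil when none do.
def detect_browser(user_agent)
  BROWSER_RULES.each do |name, regex|
    match = regex.match(user_agent)
    return { name: name, version: match[1] } if match
  end
  nil
end

chrome_ua = '5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.165 Safari/535.19'
puts detect_browser(chrome_ua)[:name] # prints "Chrome"
```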
Over the last couple of months, a few people have said they do not like the name of the project (and thus the gem). It was originally intended as a sociopolitical statement against the use of chemical weapons during wartime (and for experimentation on unknowing people). For whatever reason, that intent is not necessarily apparent to people, and the name may be offending them. @zeke even went so far as to fork the project and replace every mention of the name with something else.
Should we change the project and gem name? If so, any suggestions?
We currently have basic testing in place that primarily checks what we expect a User Agent to be detected as. Deeper unit and functional tests should be created to exercise our object-heavy design and ensure core functionality works as expected.
Mozilla/5.0 (Linux; U; Android 2.3.7; en-us; Nexus One Build/GRK39F) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1
It's identified as platform: PC, browser: Mobile Safari
on IE 11:
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko
it detects engine_version, browser and browser_version as empty strings:
{"platform":"windows","device":"","engine":"gecko","engine_version":"","browser":"","browser_version":""}
FYI: http://msdn.microsoft.com/en-us/library/ie/bg182625%28v=vs.85%29
Thanks!
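IE 11 drops the "MSIE" token and identifies itself through "Trident/7.0" and "rv:11.0", so a rule keyed on those two tokens is one way to catch it. The sketch below is an assumption-laden illustration. detect_trident_ie and the keys of the returned hash are hypothetical and do not reflect how agent_orange structures its results.

```ruby
# Hypothetical rule for Trident-based IE that no longer sends "MSIE".
def detect_trident_ie(user_agent)
  trident = user_agent[%r{Trident/([\d.]+)}, 1]
  return nil unless trident

  # IE 11 reports its browser version in the rv: token.
  version = user_agent[/rv:([\d.]+)/, 1]
  {
    browser: 'Internet Explorer',
    browser_version: version,
    engine: 'Trident',
    engine_version: trident
  }
end

ie11 = 'Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko'
info = detect_trident_ie(ie11)
puts info[:browser_version] # prints "11.0"
puts info[:engine_version]  # prints "7.0"
```

Such a rule would have to run before any generic Gecko check, because the string ends in "like Gecko".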
Is there a way to detect the "top level" version of Safari?
For instance, Safari 6.0.2 has a "version" of 536.26.17. I want to show 6.0.2, though, and I can't figure out a way to get that.
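Safari puts its user-facing version in a separate "Version/" token, while the number after "Safari/" is the WebKit build. Until the gem exposes this, it can be read straight from the raw string. The helper name below is hypothetical, and the sample string is the iPhone user agent quoted in an earlier report.

```ruby
# Hypothetical helper: reads the user-facing version from the "Version/" token.
def safari_marketing_version(user_agent)
  user_agent[%r{Version/([\d.]+)}, 1]
end

ua = 'Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8A306 Safari/6531.22.7'
puts safari_marketing_version(ua) # prints "4.0.5"
```

The method returns nil when the token is absent, so callers should handle that case.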
Thanks for making this gem! I was wondering if you had considered adding search engine detection? I see that it currently checks whether something is a bot, but it would be awesome if users could do something like:
ua.is_search_engine? # => true
ua.referrer.search_engine.name # => "Google"
I have a simple spec
ua = AgentOrange::UserAgent.new("Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)" )
ua.is_bot?.should eql true
It returns false.
I added
id_new_190610_333333 Pingdom.com_bot_version_1.4_(http://www.pingdom.com/) Pingdom bot R http://www.pingdom.com
to the end of the XML, but it's still not correctly marking this as a bot.
What is the deal with the XML? How do I add a new bot to the list?
cannot be 'os' as :
attr_accessor :operating_system
ua.device.operating_system
Apple iOS 5.1
The project currently has no tests. Before Milestone 0.1.0 is hit, we should have tests with full coverage.