Comments (8)

abgit commented on May 21, 2024

There's already a 'browsers' module that distinguishes crawlers from other clients, and an 'os' module that distinguishes operating systems.
The problem here is distinguishing 'desktop' from 'mobile' devices.

Maybe this can only be done by prefixing the operating system module's entries with a type, or by creating a new module specifically for this purpose. Examples:
'Windows NT 4.0' would change from 'Windows' to 'Desktop - Windows';
'iPhone' would change from 'Macintosh' to 'Mobile - Macintosh';
'iTunes' would change from 'Macintosh' to 'Desktop - Macintosh'.

What's the best approach for this?
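
For illustration only, here is a minimal sketch of the prefixing idea in Python. The table `DEVICE_TYPE_BY_TOKEN` and the function `label_os` are hypothetical names, not part of goaccess, and the entries simply mirror the three examples above.

```python
# Hypothetical mapping from a UA token to (device type, OS family).
# The entries mirror the examples in this comment; this is not goaccess's table.
DEVICE_TYPE_BY_TOKEN = [
    ("iPhone", ("Mobile", "Macintosh")),
    ("iTunes", ("Desktop", "Macintosh")),
    ("Windows NT 4.0", ("Desktop", "Windows")),
]


def label_os(user_agent):
    """Return 'Type - OS' for the first token found in the UA, else 'Unknown'."""
    for token, (device_type, os_family) in DEVICE_TYPE_BY_TOKEN:
        if token in user_agent:
            return f"{device_type} - {os_family}"
    return "Unknown"
```

A separate module would keep the same lookup but report the type on its own instead of folding it into the OS label.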

from goaccess.

allinurl commented on May 21, 2024

Not sure about this. So far it's possible to know which ones are mobile (i.e., Android, iPhone, Blackberry, etc.) and which ones are desktop. Perhaps we could add a new sub node under the actual OS; however, it feels like it wouldn't serve much purpose?

NinnOgTonic commented on May 21, 2024

@allinurl I think perhaps it would be a good idea to consider reopening this issue?

We would love to have device type information in the JSON output, or alternatively the raw UA strings grouped somehow. For example, it would be nice if devices reporting Windows NT 6.3; ARM (or perhaps those with touch identifiers) were classified as Surface devices and so forth, rather than just as Windows, imo.

aphorise commented on May 21, 2024

Just a thought for the longer term: can we strive to provide comprehensive, near-complete device coverage? Ideally we'd have a concise lexical parallel of Device:to:Browser, even with manufacturer-specific / vendor-specific pollutants 😄 (ms, vs). I'd estimate that a database of ~10k-18k known devices would give 97%+ coverage of everything mobile that may still be in circulation, from NetFront, HTML4, Java, MMS, SymbianOS, WAP, WEB, WHL, Windows, WindowsCE, etc.

Using a few known listings & articles (0, 1, 2, 3, 4) as a guide, the following lexicon may be a good start:

| UA keyword | Device | Full UA sample string of a specific device |
| --- | --- | --- |
| UP. / up.b / up/ | Openwave Mobile Browser / telephone | AUDIOVOX-CDM-8915 UP.Browser/6.2.2.6.h.1.102 (GUI) MMP/2.0 |
| BlackBerry, BlackBerry, UP., up.b, up/ vs | BlackBerry device / telephone | BlackBerry6510/4.0.0 UP.Browser/5.0.3.3 |
| HP, hp-tablet or msvs | HP device / telephone | Mozilla/4.0 (compatible; MSIE 4.01; Windows CE; PPC; 240x320) |
| HTC, HTC_, HTC- or msvs | HTC device / telephone | HTC_Touch_HD_T8282 Mozilla/4.0 (compatible; MSIE 6.0; Windows CE; IEMobile 7.11) |
| HTC, HTC_, HTC- or msvs | HTC device / telephone | HTC_S310-Mozilla/4.0 (compatible; MSIE 4.01; Windows CE; Smartphone; 176x220) |
| LGE, LG-, lg-, LG/ | LGE tablet / telephone | LG-T300/V100 Obigo/Q7.3 MMS/LG-MMS-V1.1/1.2 MediaPlayer/LGPlayer/1.0 Java/ASVM/1.1 Profile/MIDP-2.1 Configuration/C |
| LGE, LG-, lg-, LG/ | LGE tablet / telephone | Mozilla/5.0 (Linux; Android 4.1.2; LG-P760 Build/JZO54K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.99 Mobile Safari/537.36 |
| SAMSUNG, samsung, sam-, sam or msvs | Samsung tablet / telephone | SAMSUNG-SGH-A737/UCGI3 SHP/VPP/R5 NetFront/3.4 SMM-MMS/1.2.0 profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 |
| SAMSUNG, samsung, sam-, sam or msvs | Samsung tablet / telephone | samr810 Netfront/3.4 Mozilla/5.0 like Gecko/20060426 |
| sie-, SIE- | Siemens telephone | SIE-SK6R/46 UP.Browser/7.0.2.2.d.1.100(GUI) MMP/2.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 |
| SonyEricsson, sonyericsson, Sony Tablet, or msvs | SonyEricsson tablet / telephone | SonyEricssonK608i/R2BA Browser/SEMC-Browser/4.2 Profile/MIDP-2.0 Configuration/CLDC-1.1 |
| SonyEricsson, sonyericsson, Sony Tablet, or msvs | SonyEricsson tablet / telephone | SonyEricssonK608i/R301 Profile/MIDP-1.0 Configuration/CLDC-1.0 |
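
As a rough sketch of how such a lexicon might be consulted, the Python below does a case-insensitive, ordered substring match. It assumes a small subset of the keywords above; `LEXICON` and `vendor_of` are hypothetical names, and the ordering rule (vendor keywords before the generic `UP.` marker) is my own assumption, not something the listings prescribe.

```python
from typing import Optional

# Illustrative subset of the lexicon above as (lower-cased keyword, vendor).
# Order matters: vendor keywords come before the generic Openwave "UP."
# marker, because a BlackBerry or Siemens UA can also contain "UP.Browser".
LEXICON = [
    ("blackberry", "BlackBerry"),
    ("htc", "HTC"),
    ("lg-", "LGE"),
    ("samsung", "Samsung"),
    ("sie-", "Siemens"),
    ("sonyericsson", "SonyEricsson"),
    ("up.", "Openwave Mobile Browser"),
]


def vendor_of(user_agent: str) -> Optional[str]:
    """Return the vendor of the first keyword found in the UA, or None."""
    lowered = user_agent.lower()
    for keyword, vendor in LEXICON:
        if keyword in lowered:
            return vendor
    return None
```

Short keywords such as `sam` are left out here on purpose, since a bare substring match on them would be too broad; a real table would probably need prefix or token-boundary rules for those.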

A complete device-focused list of manufacturers and relevant providers worldwide could comprise approximately 142 entries, with more UAs per distributor / OEM:

A-Z of makes & vendors
Acer, Airness, Alcatel, Allview, Amazon, Amoi, Apple, Asus, AT&T, Audiovox
BlackBerry / RIM, Benefon, Benq, Benq-siemens, Bird, BLU, Bosch
Casio, Cat, Celkon, Chea, Coolpad, Cricket, Dell, DoCoMo (NTT Mobile)
EE, Elson, Emporia, Emblaze, Energizer, Ericsson, Eten, Ezio, Ezze
Fly, Foxconn, Fujitsu Siemens, G3, Garmin-asus, Gigabyte, Gionee
Haier, HP, HTC, Huawei, Jolla
I-mate, I-mobile, Icemobile, Innostream, iNQ, Intex
Konka, Kyocera, Karbonn, Lava, Lenovo, LGE
Maxon, Maxwest, Meizu, Micromax, Microsoft, Mitac, Mitsubishi, Mobilexp, Mobistel, Modelabs, Modu, MWg, Motorola, My Way
Nec, Neonode, Nintendo, NIU, Nokia, nook, Nvidia, O2, Obigo, Onda, Orange, OnePlus, Oppo
Palm, Panasonic, Pantech, Parla, Philips, Phoneone, Psion, Plum, Posh, Prestigio, QMobile, Qtek
Sagem, Samart, Samsung, Sanyo, Sec, Sendo, Sewon, Sharp, Siemens, Skyspring, Sonim, Sony, Sony Ericsson, Spice, Sprint, SPV
T-mobile, Tel.Me., Techfaith, Thuraya, Toshiba
UCWEB, Uniscope, Uriver, Utec, Utstarcom, Vertu, verykool, Virgin, Vitelcom, Vivo, Vk Mobile, Vodafone, Voxtel
Wellcom, Wiko, WND, XCute, Xiaomi, XOLO, Yezz, Yota, YU, ZTE

This list excludes all other potential UAs, such as those of applications, crawlers, proxies, services, etc., from the 1.5+ million (and growing) already classified in some directories, with others yet to emerge for vehicle / automotive (your car) and television systems (not sure how unique these are).

I can help in the compilation of the proposed DB.

allinurl commented on May 21, 2024

@overnine assuming we still want to use the categories posted above (i.e., desktop, mobile, others), then I think we could add this as a new panel. Categorizing this could be tricky. Are those the only categories we could use? Should we have tablets as well?

@aphorise having a comprehensive list of browsers and OSs would be awesome, but we may need to refactor browsers.c & opesys.c first, since those are currently bottlenecks (run time grows in proportion to the size of the list, which makes them slow). We could keep a large list in a hash table, but that would require a full-match search. Or bsearch perhaps?

For the record, this may be related to #152.
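
To make the trade-off concrete, here is a small Python sketch (not goaccess code) comparing a linear substring scan with hash and binary-search lookups. The keyword list and the tokenizer (splitting on whitespace and `;()/`) are assumptions made for the example.

```python
import bisect
import re

# Hypothetical keyword table; goaccess's real lists live in browsers.c and
# opesys.c. Sorted so that binary search is valid.
KEYWORDS = sorted(["Android", "BlackBerry", "Windows", "iPhone"])
KEYWORD_SET = frozenset(KEYWORDS)


def tokens(ua):
    """Split a UA into candidate tokens on whitespace and ; ( ) / characters."""
    return re.findall(r"[^\s;()/]+", ua)


def scan(ua):
    """Linear substring scan: finds partial matches, cost grows with the list."""
    for keyword in KEYWORDS:
        if keyword in ua:
            return keyword
    return None


def hash_lookup(ua):
    """Hash lookup: constant time per token on average, full-token matches only."""
    for token in tokens(ua):
        if token in KEYWORD_SET:
            return token
    return None


def bsearch_lookup(ua):
    """Binary search: O(log n) per token, and also full-token matches only."""
    for token in tokens(ua):
        i = bisect.bisect_left(KEYWORDS, token)
        if i < len(KEYWORDS) and KEYWORDS[i] == token:
            return token
    return None
```

With this tokenizer, `BlackBerry6510/4.0.0 UP.Browser/5.0.3.3` is found by `scan` but missed by both exact-match lookups, because the token is `BlackBerry6510` rather than `BlackBerry`. That is the full-match limitation in practice.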

NinnOgTonic commented on May 21, 2024

@allinurl I only use goaccess for the JSON output, so I don't really have any considerations in terms of usability of the curses interface. Also, the most comprehensive resource I know of that handles this is https://github.com/serbanghita/Mobile-Detect

aphorise commented on May 21, 2024

@allinurl - I actually think we can do this adaptively and build on a global browser & device context. So instead of a fixed-size table with bsearch or a hash, have a tree that grows adaptively as new or differing UAs are read, so that each unknown or new UA gets a naive comparison against a master UA-DB only once.

So basically the UA-DB (lexicon) can have two focuses:

  • by device (as per earlier comment & proposal) - that'd be a potential lookup of 1 in ~18k (estimate).
  • by service or anything else (as per the known & remaining lot in excess of a million) - that'd be a potential lookup of 1 in ~1.4M (estimate).

The ideal UA lexicon (adaptive-UA) would be built adaptively from the data set (logs) being parsed. Processing n entries could then be O(n) overall if it's stored in a hash, or up to O(n^2) if it's in some other form such as a linked list (a btree would fall in between at O(n log n)). Either way, the size of any such set should in most cases be smaller than complete-UA, the total list of everything that's known and can be targeted.

What would be even smarter for real-time stats, and for scenarios where adaptive-UA may grow indefinitely, would be to cap the table at a reasonable size of 2^16 - 1 = 65535 entries, thereby dropping single, one-time occurrences. All that would be needed is a FIFO (first-in-first-out) or FILO (first-in-last-out) precedence on the lower portion (say a reserve of 255 entries), so that new, one-time agents always register and remain visible in the most recent time frames. Similar governing rules would decide what can move up the adaptive-UA.

I hope I'm clearly conveying the intent.
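
One possible reading of the capped table, sketched in Python under my own assumptions: the class `AdaptiveUATable`, the promote-on-second-hit rule, and the choice of FIFO (rather than FILO) for the reserve are illustrative and not a worked-out design.

```python
from collections import OrderedDict


class AdaptiveUATable:
    """Bounded UA counter: a FIFO reserve for newcomers plus a main region."""

    def __init__(self, capacity=65535, reserve=255):
        if not 0 < reserve < capacity:
            raise ValueError("reserve must be between 1 and capacity - 1")
        self.main_limit = capacity - reserve
        self.reserve_limit = reserve
        self.main = {}                # agents seen more than once -> hit count
        self.reserve = OrderedDict()  # recent newcomers -> hit count, oldest first

    def record(self, ua):
        if ua in self.main:
            self.main[ua] += 1
        elif ua in self.reserve:
            count = self.reserve[ua] + 1
            if len(self.main) < self.main_limit:
                # Seen again: promote out of the reserve (assumed rule).
                del self.reserve[ua]
                self.main[ua] = count
            else:
                self.reserve[ua] = count
        else:
            if len(self.reserve) >= self.reserve_limit:
                # Reserve is full: drop the oldest newcomer (FIFO).
                self.reserve.popitem(last=False)
            self.reserve[ua] = 1

    def __len__(self):
        return len(self.main) + len(self.reserve)
```

The table never holds more than `capacity` entries, and a new agent always gets a slot in the reserve, at the cost of forgetting the oldest one-time agent.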

aphorise commented on May 21, 2024

For reference the RAM / memory space requirements to hold all potential UA can be expressed as:

For 65535 records:
MAX / unlikely case
(2^16)-1 = 65,535 bytes per record max
65,535^2 = 4,294,836,225 bytes (~4 Gbytes)
AVG q3 / possible upper quartile
(2^11)-1 = 2,047 bytes per record avg-q3
2,047 * 65,535 = 134,150,145 bytes (~134 Mbytes)
AVG q1 / possible lower quartile
(2^9)-1 = 511 bytes per record avg-q1
511 * 65,535 = 33,488,385 bytes (~33 Mbytes)
AVG / common sub-or-at-average
(2^8)-1 = 255 bytes per record real average
255 * 65,535 = 16,711,425 bytes (~16 Mbytes)

This assumes that no two UAs are alike (all 2^16-1 are unique) and that none ever exceeds 64 Kbytes, which is unreasonably high and frankly worse than stupid if neared or exceeded. The most obvious offenders that tend to exceed the common 512-1024 bytes are, for example, Internet Explorer (IE) or other debug- or build-specific text embedded in the UA identifier; even these tend to be < 8 Kbytes at worst.
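
The estimates can be recomputed with a few lines of Python. The scenario names follow the labels used in this comment, and `total_bytes` is just a helper name chosen for the example.

```python
RECORDS = 2**16 - 1  # 65,535 distinct user agents (the proposed table limit)

# Bytes per record for each scenario in the estimate.
SCENARIOS = {
    "max": 2**16 - 1,
    "avg-q3": 2**11 - 1,
    "avg-q1": 2**9 - 1,
    "avg": 2**8 - 1,
}


def total_bytes(bytes_per_record, records=RECORDS):
    """Worst-case storage if every record is unique and of the given size."""
    return bytes_per_record * records


for name, size in SCENARIOS.items():
    total = total_bytes(size)
    print(f"{name:>6}: {size:>6,} B/record -> {total:>13,} bytes")
```

These totals cover the UA strings only; any per-entry overhead of the container (hash buckets, tree nodes, counters) would come on top.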

A few other rules can also be applied as an objective determinant for anything that doesn't match, with a final criteria scope of device and/or browser:

  • DEVICE: 1 of 143 manufacturer-UA
  • BROWSER: 1 of < ~64 application-UA: Chrome/Blink et al., Firefox et al., Gecko, Trident / IE / Mosaic, Netscape, Opera, Safari, WebKit, etc.

The same approach can also be applied initially to new / unknown UAs, so as to further scope the lookup by device-UA first, then other-UA, otherwise falling back to the first / best initial detection (if applicable).
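
A sketch of that staged lookup in Python, assuming heavily truncated keyword lists. `DEVICE_KEYWORDS`, `BROWSER_KEYWORDS`, `classify` and the `initial` fallback argument are all hypothetical names for illustration.

```python
# Hypothetical, truncated keyword lists; the estimates here are ~143
# manufacturer entries and fewer than ~64 browser entries.
DEVICE_KEYWORDS = ["BlackBerry", "HTC", "SAMSUNG", "SonyEricsson"]
BROWSER_KEYWORDS = ["Chrome", "Firefox", "Opera", "Safari", "MSIE"]


def first_match(ua, keywords):
    """Return the first keyword contained in the UA, or None."""
    return next((kw for kw in keywords if kw in ua), None)


def classify(ua, initial=None):
    """Try the device list first, then the browser list, then fall back."""
    device = first_match(ua, DEVICE_KEYWORDS)
    if device is not None:
        return ("device", device)
    browser = first_match(ua, BROWSER_KEYWORDS)
    if browser is not None:
        return ("browser", browser)
    # Nothing matched: keep whatever the initial detection produced, if any.
    return ("unknown", initial)
```

Checking the small device list first keeps the common mobile case cheap, and the larger service / application lexicon is only needed for what falls through.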
