Comments (6)

juanfrilla commented on June 12, 2024

Amazing, @melroy89! I'll wait for you to merge that pull request.

from fake-useragent.

juanfrilla commented on June 12, 2024

@melroy89, I did this with your tool:

from fake_useragent import UserAgent
from ua_parser import user_agent_parser
# Imagine you are scraping a page that sends these headers and you want to rotate them
headers = {
    "Connection": "keep-alive",
    "sec-ch-ua": '"Chromium";v="106", "Google Chrome";v="106", "Not;A=Brand";v="99"',
    "Accept": "*/*",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "X-Requested-With": "XMLHttpRequest",
    "sec-ch-ua-mobile": "?0",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
    "sec-ch-ua-platform": '"macOS"',
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Dest": "empty",
    "Accept-Language": "es-VE,es-US;q=0.9,es-419;q=0.8,es;q=0.7,en;q=0.6,pt;q=0.5",
}
chrome_data = UserAgent().getChrome
parsed_string = user_agent_parser.Parse(chrome_data["useragent"])
platform = parsed_string["os"]["family"]
# sec-ch-ua carries only the major version ("114"), while "version" is a float such as 114.0
version = int(float(chrome_data["version"]))
if "mac" in platform.lower():
    platform = "macOS"
os_browser = {
    "User-Agent": chrome_data["useragent"],
    # sec-ch-ua-platform is a quoted string, like '"macOS"' in the headers above
    "sec-ch-ua-platform": f'"{platform}"',
    "sec-ch-ua": f'" Not A;Brand";v="99", "Chromium";v="{version}", "Google Chrome";v="{version}"',
}
headers.update(os_browser)

This almost works for Chrome, but I also need it for Edge, for Mozilla Firefox, and for mobile platforms, with headers like "sec-ch-ua-platform": "iPhone" or "sec-ch-ua-platform": "Android".
Edge sends its own sec-ch-ua value, Firefox does not send sec-ch-ua at all, and sec-ch-ua-mobile has to correspond to those headers.

In summary, I would like fake-useragent to take all of this into account and return the correct headers for the user-agent it provides in the first place.
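To make the request concrete, here is a minimal sketch of the per-browser logic described above. It is not something fake-useragent provides; `client_hints_for`, `BRAND_NAMES` and `GREASE_BRAND` are hypothetical names. It assumes Chrome and Edge announce themselves as "Google Chrome" and "Microsoft Edge" next to "Chromium", that Firefox and Safari send no client hints, and it reuses the GREASE ("fake") brand from the Chrome 119 example later in this thread, which real browsers change between versions.

```python
# Placeholder GREASE brand, copied from the Chrome 119 example in this thread;
# real browsers rotate this value between versions.
GREASE_BRAND = '"Not?A_Brand";v="24"'

# Assumed brand names for Chromium-based browsers. Firefox and Safari are
# deliberately absent because they do not send User-Agent client hints.
BRAND_NAMES = {"chrome": "Google Chrome", "edge": "Microsoft Edge"}


def client_hints_for(browser: str, major_version: int, platform: str, mobile: bool) -> dict:
    """Hypothetical helper: low-entropy Sec-CH-UA headers, or {} if the browser sends none."""
    brand = BRAND_NAMES.get(browser.lower())
    if brand is None:
        return {}
    return {
        "Sec-Ch-Ua": f'"{brand}";v="{major_version}", "Chromium";v="{major_version}", {GREASE_BRAND}',
        # "?1" marks a mobile device, "?0" a desktop one
        "Sec-Ch-Ua-Mobile": "?1" if mobile else "?0",
        # The platform is sent as a double-quoted string, e.g. "Android"
        "Sec-Ch-Ua-Platform": f'"{platform}"',
    }


print(client_hints_for("edge", 119, "Windows", mobile=False))
print(client_hints_for("firefox", 120, "Linux", mobile=False))  # {}
```

Returning an empty dict for Firefox lets the caller `headers.update(...)` unconditionally, but any sec-ch-ua headers copied from a Chrome capture would still have to be removed separately.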

waleed-salama commented on June 12, 2024

Great job, @melroy89, keep up the good work!

The thing is, we need to be able to fetch the client hint headers from the database together with the user-agent. Nowadays the user-agent alone is not enough, because websites expect to receive the client hints as well, and they expect them to match the user-agent (if that browser typically supports them).

Since Chrome 110, for privacy reasons, the user-agent string has been reduced: it shows fewer details and more static data, such as (Android 10; K) for all Android agents regardless of OS version or device model. The removed details are instead available through the client hint headers @juanfrilla mentioned. Here is an example:

"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
"Sec-Ch-Ua": "\"Google Chrome\";v=\"119\", \"Chromium\";v=\"119\", \"Not?A_Brand\";v=\"24\"", 
"Sec-Ch-Ua-Mobile": "?0", 
"Sec-Ch-Ua-Platform": "\"macOS\"", 

*** My client is not running Mac OS X 10.15.7 (I don't think anyone is these days 😅), but that is the version they decided to keep for all Macs, at least for all Intel Macs; I'm not sure about ARM-based ones. It's a static version. See here

These details cannot be derived from the user-agent string of Chrome 110 or newer, which is what most browsers currently run. However, I don't see this data on willshouse.com either: it only provides the fields you added the new methods for, and the real headers are not necessarily derived from those fields, because each actual agent could generate them differently. If we generate the headers the way @juanfrilla did in his example, we get valid headers, but if they don't match the historical data at the target site, the visit could get flagged as a bot. See here
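As a rough illustration of what "matching" means, here is a minimal consistency check, assuming a site compares the major version in the User-Agent string with the "Chromium" entry in Sec-CH-UA, and expects non-Chromium agents such as Firefox to send no Sec-CH-UA at all. `parse_brands` and `hints_match_user_agent` are hypothetical names, and a real bot detector very likely checks much more (platform, mobile flag, historical data).

```python
import re

# One brand entry of a Sec-CH-UA value, e.g. "Chromium";v="119"
BRAND_RE = re.compile(r'"([^"]+)";v="(\d+)"')
CHROME_RE = re.compile(r"Chrome/(\d+)")


def parse_brands(sec_ch_ua: str) -> dict:
    """Turn '"Chromium";v="119", ...' into {"Chromium": 119, ...}."""
    return {name: int(ver) for name, ver in BRAND_RE.findall(sec_ch_ua)}


def hints_match_user_agent(headers: dict) -> bool:
    """Rough check that Sec-CH-UA agrees with the User-Agent string."""
    # HTTP header names are case-insensitive, so normalise the keys first
    h = {k.lower(): v for k, v in headers.items()}
    ua_match = CHROME_RE.search(h.get("user-agent", ""))
    hints = h.get("sec-ch-ua")
    if ua_match is None:
        # Not a Chromium-based UA (e.g. Firefox): it should not send Sec-CH-UA
        return hints is None
    if hints is None:
        # A Chromium-based UA without client hints is itself a mismatch
        return False
    return parse_brands(hints).get("Chromium") == int(ua_match.group(1))


# The example headers from this comment
example = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
    "Sec-Ch-Ua": '"Google Chrome";v="119", "Chromium";v="119", "Not?A_Brand";v="24"',
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": '"macOS"',
}
print(hints_match_user_agent(example))  # True
```

Passing this check only means the headers are internally consistent; as noted above, it says nothing about whether they match what the target site has historically seen.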

Since client hints are relatively new, I couldn't find another data source that provides them, but others might find a better solution. Maybe there is some way to get this data from willshouse.com?

melroy89 commented on June 12, 2024

So I can't generate all the correct headers you are asking for. What I can do is add additional getxx properties that return the whole object instead of only the UA string.

See PR: #216

melroy89 commented on June 12, 2024

It's merged! And a new version is released: v1.3.0. Update via: pip install --upgrade fake-useragent

You can now try the new methods, for example:

from fake_useragent import UserAgent
ua = UserAgent()
ua.getRandom  # returns the whole dict for a random browser, not only the UA string

Small disclaimer: these dictionaries/objects could change in the future, so the key/value pairs you get back could change too, because I just return the object as-is.

Here is an example of such an output:

{'percent': 0.7, 'useragent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36', 'system': 'Chrome 114.0 Win10', 'browser': 'chrome', 'version': 114.0, 'os': 'win10'}
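Because the object is returned as-is and its keys may change, a caller might want to fail loudly rather than silently build wrong headers. A minimal sketch, assuming the caller only relies on the `useragent`, `browser`, `version` and `os` keys seen in the output above; `read_entry` and `REQUIRED_KEYS` are hypothetical names, not part of fake-useragent.

```python
# Sample object as returned by ua.getRandom (copied from the output above)
entry = {'percent': 0.7, 'useragent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36', 'system': 'Chrome 114.0 Win10', 'browser': 'chrome', 'version': 114.0, 'os': 'win10'}

# Keys this hypothetical caller relies on; adjust if a future release changes them
REQUIRED_KEYS = ("useragent", "browser", "version", "os")


def read_entry(entry: dict) -> tuple:
    """Return (ua, browser, major_version, os), raising KeyError if a key disappears."""
    missing = [key for key in REQUIRED_KEYS if key not in entry]
    if missing:
        raise KeyError(f"fake-useragent entry is missing keys: {missing}")
    # "version" is a float such as 114.0; client hints only need the major part
    major = int(float(entry["version"]))
    return entry["useragent"], entry["browser"], major, entry["os"]


ua_string, browser, major, os_name = read_entry(entry)
print(browser, major, os_name)  # chrome 114 win10
```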

melroy89 commented on June 12, 2024

Again, I still don't understand why you also need a parser!? I just added new methods for you.

If you want Chrome, use the new ua.getChrome method.

This gives you a Python object that already includes the platform/OS.

Example:

from fake_useragent import UserAgent
ua = UserAgent()
some_browser = ua.getChrome

some_browser['useragent']  # the user-agent string
some_browser['system']     # e.g. 'Chrome 114.0 Win10'
some_browser['os']         # e.g. 'win10'
some_browser['browser']    # e.g. 'chrome'
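For the platform header in particular, the `os` field can stand in for the ua_parser step. A minimal sketch, assuming a simple prefix mapping: only `'win10'` appears in this thread, so the `mac`, `android` and `linux` prefixes are guesses, and `platform_hint` is a hypothetical helper. "Unknown" is used as the fallback because it is one of the documented Sec-CH-UA-Platform values.

```python
# Hypothetical prefix mapping from the "os" field to Sec-CH-UA-Platform tokens.
# Only "win10" appears in this thread; the other prefixes are assumptions.
OS_PREFIX_TO_PLATFORM = (
    ("win", "Windows"),
    ("mac", "macOS"),
    ("android", "Android"),
    ("linux", "Linux"),
)


def platform_hint(os_field: str) -> str:
    """Return a quoted Sec-CH-UA-Platform value such as '"Windows"'."""
    os_lower = os_field.lower()
    for prefix, platform in OS_PREFIX_TO_PLATFORM:
        if os_lower.startswith(prefix):
            return f'"{platform}"'
    # Fall back to the documented "Unknown" value
    return '"Unknown"'


# Stand-in for ua.getChrome, using the fields from the sample output in this thread
some_browser = {"useragent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36", "browser": "chrome", "version": 114.0, "os": "win10"}
headers = {
    "User-Agent": some_browser["useragent"],
    "Sec-Ch-Ua-Platform": platform_hint(some_browser["os"]),
}
print(headers["Sec-Ch-Ua-Platform"])  # "Windows"
```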
