selenium-wire's Introduction

Selenium Wire is no longer being maintained. Thank you for your support and all your contributions.

Selenium Wire

Selenium Wire extends Selenium's Python bindings to give you access to the underlying requests made by the browser. You author your code in the same way as you do with Selenium, but you get extra APIs for inspecting requests and responses and making changes to them on the fly.

Simple Example

from seleniumwire import webdriver  # Import from seleniumwire

# Create a new instance of the Chrome driver
driver = webdriver.Chrome()

# Go to the Google home page
driver.get('https://www.google.com')

# Access requests via the `requests` attribute
for request in driver.requests:
    if request.response:
        print(
            request.url,
            request.response.status_code,
            request.response.headers['Content-Type']
        )

Prints:

https://www.google.com/ 200 text/html; charset=UTF-8
https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_120x44dp.png 200 image/png
https://consent.google.com/status?continue=https://www.google.com&pc=s&timestamp=1531511954&gl=GB 204 text/html; charset=utf-8
https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png 200 image/png
https://ssl.gstatic.com/gb/images/i2_2ec824b0.png 200 image/png
https://www.google.com/gen_204?s=webaft&t=aft&atyp=csi&ei=kgRJW7DBONKTlwTK77wQ&rt=wsrt.366,aft.58,prt.58 204 text/html; charset=UTF-8
...

Features

  • Pure Python, user-friendly API
  • HTTP and HTTPS requests captured
  • Intercept requests and responses
  • Modify headers, parameters, body content on the fly
  • Capture websocket messages
  • HAR format supported
  • Proxy server support

Compatibility

  • Python 3.7+
  • Selenium 4.0.0+
  • Chrome, Firefox, Edge and Remote Webdriver supported

Installation

Install using pip:

pip install selenium-wire

If you get an error about not being able to build cryptography you may be running an old version of pip. Try upgrading pip with python -m pip install --upgrade pip and then re-run the above command.

Browser Setup

No specific configuration should be necessary except to ensure that you have downloaded the relevant webdriver executable for your browser and placed it somewhere on your system PATH.

OpenSSL

Selenium Wire requires OpenSSL for decrypting HTTPS requests. This is probably already installed on your system (you can check by running openssl version on the command line). If it's not installed you can install it with:

Linux

# For apt based Linux systems
sudo apt install openssl

# For RPM based Linux systems
sudo yum install openssl

# For Linux alpine
sudo apk add openssl

MacOS

brew install openssl

Windows

No installation is required.

Creating the Webdriver

Ensure that you import webdriver from the seleniumwire package:

from seleniumwire import webdriver

Then just instantiate the webdriver as you would if you were using Selenium directly. You can pass in any desired capabilities or browser-specific options, such as the executable path or headless mode. Selenium Wire also has its own options, which can be passed in the seleniumwire_options attribute.

# Create the driver with no options (use defaults)
driver = webdriver.Chrome()

# Or create using browser specific options and/or seleniumwire_options options
driver = webdriver.Chrome(
    options=webdriver.ChromeOptions(...),
    seleniumwire_options={...}
)

Note that for sub-packages of webdriver, you should continue to import these directly from selenium. For example, to import WebDriverWait:

# Sub-packages of webdriver must still be imported from `selenium` itself
from selenium.webdriver.support.ui import WebDriverWait

Remote Webdriver

Selenium Wire has limited support for using the remote webdriver client. When you create an instance of the remote webdriver, you need to specify the hostname or IP address of the machine (or container) running Selenium Wire. This allows the remote instance to communicate back to Selenium Wire with its requests and responses.

options = {
    'addr': 'hostname_or_ip'  # Address of the machine running Selenium Wire. Explicitly use 127.0.0.1 rather than localhost if remote session is running locally.
}
driver = webdriver.Remote(
    command_executor='http://www.example.com',
    seleniumwire_options=options
)

If the machine running the browser needs to use a different address to talk to the machine running Selenium Wire you need to configure the browser manually. This issue goes into more detail.

Accessing Requests

Selenium Wire captures all HTTP/HTTPS traffic made by the browser [1]. The following attributes provide access to requests and responses.

driver.requests

The list of captured requests in chronological order.

driver.last_request

Convenience attribute for retrieving the most recently captured request. This is more efficient than using driver.requests[-1].

driver.wait_for_request(pat, timeout=10)

This method will wait until it sees a request matching a pattern. The pat argument will be matched within the request URL and can be a simple substring or a regular expression. Note that driver.wait_for_request() doesn't make a request; it just waits for a request made by some other action, and it returns the first matching request it finds. Because pat can be a regular expression, you must escape special characters such as question marks with a backslash. A TimeoutException is raised if no match is found within the timeout period.
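When the URL fragment you are waiting for contains regex metacharacters, re.escape() from the standard library can do the escaping for you. The sketch below uses re.search() as a stand-in for the matching step (an assumption based on the description above, not a statement about Selenium Wire's internals), and the URL is made up for illustration.

```python
import re

# Literal URL fragment to wait for; '?' is a regex metacharacter.
fragment = '/api/products?id=12345'

# re.escape() backslash-escapes the metacharacters in the fragment.
pat = re.escape(fragment)

url = 'https://server/api/products?id=12345'

# Unescaped, 's?' means "an optional s", so the literal '?' in the URL is never matched.
unescaped_matches = re.search(fragment, url) is not None
escaped_matches = re.search(pat, url) is not None

# With a live driver this would be: driver.wait_for_request(pat)
```

Here escaped_matches is True while unescaped_matches is False, which is the failure mode the note about escaping is warning against.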

For example, to wait for an AJAX request to return after a button is clicked:

# Click a button that triggers a background request to https://server/api/products/12345/
button_element.click()

# Wait for the request/response to complete
request = driver.wait_for_request('/api/products/12345/')

driver.har

A JSON formatted HAR archive of HTTP transactions that have taken place. HAR capture is turned off by default and you must set the enable_har option to True before using driver.har.
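Since driver.har is a JSON string, it can be parsed with the standard json module. The following is a minimal sketch that assumes the usual HAR 1.2 layout (log, then entries, each with request and response); summarize_har is a hypothetical helper and the sample document is hand-written rather than real captured output.

```python
import json

def summarize_har(har_json):
    """Return (method, url, status) tuples from a HAR document given as a JSON string."""
    har = json.loads(har_json)
    return [
        (entry['request']['method'], entry['request']['url'], entry['response']['status'])
        for entry in har['log']['entries']
    ]

# Tiny hand-written HAR document standing in for the value of driver.har.
sample_har = json.dumps({
    'log': {
        'version': '1.2',
        'entries': [
            {'request': {'method': 'GET', 'url': 'https://www.example.com/'},
             'response': {'status': 200}},
        ],
    }
})

summary = summarize_har(sample_har)
# With a live driver (and enable_har set to True): summarize_har(driver.har)
```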

driver.iter_requests()

Returns an iterator over captured requests. Useful when dealing with a large number of requests.
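As an illustration, an iterator of requests can be consumed in a single pass without building a list first. The helper below, count_status_codes, is hypothetical, and the SimpleNamespace objects are stand-ins that only mimic the response and status_code attributes described in this document.

```python
from collections import Counter
from types import SimpleNamespace

def count_status_codes(requests):
    """Tally response status codes, skipping requests that have no response."""
    return Counter(r.response.status_code for r in requests if r.response)

# Stand-in objects with the same attributes as captured requests.
fake_requests = [
    SimpleNamespace(url='https://a/', response=SimpleNamespace(status_code=200)),
    SimpleNamespace(url='https://b/', response=SimpleNamespace(status_code=404)),
    SimpleNamespace(url='https://c/', response=None),
    SimpleNamespace(url='https://d/', response=SimpleNamespace(status_code=200)),
]

counts = count_status_codes(iter(fake_requests))
# With a live driver: count_status_codes(driver.iter_requests())
```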

driver.request_interceptor

Used to set a request interceptor. See Intercepting Requests and Responses.

driver.response_interceptor

Used to set a response interceptor.

Clearing Requests

To clear previously captured requests and HAR entries, use del:

del driver.requests

Request Objects

Request objects have the following attributes.

body

The request body as bytes. If the request has no body the value of body will be empty, i.e. b''.

cert

Information about the server SSL certificate in dictionary format. Empty for non-HTTPS requests.

date

The datetime the request was made.

headers

A dictionary-like object of request headers. Headers are case-insensitive and duplicates are permitted. Asking for request.headers['user-agent'] will return the value of the User-Agent header. If you wish to replace a header, make sure you delete the existing header first with del request.headers['header-name'], otherwise you'll create a duplicate.
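The delete-then-set rule is easier to see with a runnable stand-in. The sketch below uses http.client.HTTPMessage from the standard library, which also allows duplicate, case-insensitive headers; that Selenium Wire's headers object behaves the same way is an assumption based on the description above.

```python
from http.client import HTTPMessage

headers = HTTPMessage()  # stand-in, not Selenium Wire's own headers class

headers['Referer'] = 'first'
headers['Referer'] = 'second'        # assignment appends, creating a duplicate
duplicated = headers.get_all('referer')

del headers['referer']               # removes every Referer header, case-insensitively
headers['Referer'] = 'replacement'
replaced = headers.get_all('Referer')
```

After the first two assignments both values are present; deleting before setting leaves only the replacement.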

host

The request host, e.g. www.example.com

method

The HTTP method, e.g. GET or POST etc.

params

A dictionary of request parameters. If a parameter with the same name appears more than once in the request, its value in the dictionary will be a list.
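To make the shape of that dictionary concrete, here is a small sketch that builds an equivalent structure from a query string using urllib.parse. The to_params helper is hypothetical and only illustrates the described shape; it is not presented as Selenium Wire's own parsing code.

```python
from urllib.parse import parse_qs

def to_params(querystring):
    """Collapse single-valued parameters to a string; keep repeated ones as a list."""
    parsed = parse_qs(querystring, keep_blank_values=True)
    return {
        name: values[0] if len(values) == 1 else values
        for name, values in parsed.items()
    }

params = to_params('foo=bar&spam=eggs&spam=ham')
```

In this example foo maps to a plain string, while the repeated spam parameter maps to a list of both values.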

path

The request path, e.g. /some/path/index.html

querystring

The query string, e.g. foo=bar&spam=eggs

response

The response object associated with the request. This will be None if the request has no response.

url

The request URL, e.g. https://www.example.com/some/path/index.html?foo=bar&spam=eggs

ws_messages

Where the request is a websocket handshake request (normally with a URL starting wss://) then ws_messages will contain a list of any websocket messages sent and received. See WebSocketMessage Objects.

Request objects have the following methods.

abort(error_code=403)

Trigger immediate termination of the request with the supplied error code. For use within request interceptors. See Example: Block a request.

create_response(status_code, headers=(), body=b'')

Create a response and return it without sending any data to the remote server. For use within request interceptors. See Example: Mock a response.

WebSocketMessage Objects

These objects represent websocket messages sent between the browser and the server, in either direction. They are held in a list by request.ws_messages on websocket handshake requests. They have the following attributes.

content

The message content which may be either str or bytes.

date

The datetime of the message.

from_client

True when the message was sent by the client and False when sent by the server.
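As a sketch of how these attributes might be used together, the hypothetical helper below separates messages by direction. The SimpleNamespace objects are stand-ins that carry only the content and from_client attributes listed above.

```python
from types import SimpleNamespace

def split_ws_messages(messages):
    """Return (sent, received) lists of message content, split on from_client."""
    sent = [m.content for m in messages if m.from_client]
    received = [m.content for m in messages if not m.from_client]
    return sent, received

# Stand-ins for the items of request.ws_messages; content may be str or bytes.
fake_messages = [
    SimpleNamespace(content='ping', from_client=True),
    SimpleNamespace(content='pong', from_client=False),
    SimpleNamespace(content=b'\x00\x01', from_client=False),
]

sent, received = split_ws_messages(fake_messages)
# With a live driver: split_ws_messages(request.ws_messages)
```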

Response Objects

Response objects have the following attributes.

body

The response body as bytes. If the response has no body the value of body will be empty, i.e. b''. Sometimes the body may have been compressed by the server. You can prevent this with the disable_encoding option. To manually decode an encoded response body you can do:

from seleniumwire.utils import decode

body = decode(response.body, response.headers.get('Content-Encoding', 'identity'))

date

The datetime the response was received.

headers

A dictionary-like object of response headers. Headers are case-insensitive and duplicates are permitted. Asking for response.headers['content-length'] will return the value of the Content-Length header. If you wish to replace a header, make sure you delete the existing header first with del response.headers['header-name'], otherwise you'll create a duplicate.

reason

The reason phrase, e.g. OK or Not Found etc.

status_code

The status code of the response, e.g. 200 or 404 etc.

Intercepting Requests and Responses

As well as capturing requests and responses, Selenium Wire allows you to modify them on the fly using interceptors. An interceptor is a function that gets invoked with requests and responses as they pass through Selenium Wire. Within an interceptor you can modify the request and response as you see fit.

You set your interceptor functions using the driver.request_interceptor and driver.response_interceptor attributes before you start using the driver. A request interceptor should accept a single argument for the request. A response interceptor should accept two arguments, one for the originating request and one for the response.

Example: Add a request header

def interceptor(request):
    request.headers['New-Header'] = 'Some Value'

driver.request_interceptor = interceptor
driver.get(...)

# All requests will now contain New-Header

How can I check that a header has been set correctly? You can print the headers from captured requests after the page has loaded using driver.requests, or alternatively point the webdriver at https://httpbin.org/headers which will echo the request headers back to the browser so you can view them.

Example: Replace an existing request header

Duplicate header names are permitted in an HTTP request, so before setting the replacement header you must first delete the existing header using del like in the following example, otherwise two headers with the same name will exist (request.headers is a special dictionary-like object that allows duplicates).

def interceptor(request):
    del request.headers['Referer']  # Remember to delete the header first
    request.headers['Referer'] = 'some_referer'  # Spoof the referer

driver.request_interceptor = interceptor
driver.get(...)

# All requests will now use 'some_referer' for the referer

Example: Add a response header

def interceptor(request, response):  # A response interceptor takes two args
    if request.url == 'https://server.com/some/path':
        response.headers['New-Header'] = 'Some Value'

driver.response_interceptor = interceptor
driver.get(...)

# Responses from https://server.com/some/path will now contain New-Header

Example: Add a request parameter

Request parameters work differently to headers in that they are calculated when they are set on the request. That means that you first have to read them, then update them, and then write them back - like in the following example. Parameters are held in a regular dictionary, so parameters with the same name will be overwritten.

def interceptor(request):
    params = request.params
    params['foo'] = 'bar'
    request.params = params

driver.request_interceptor = interceptor
driver.get(...)

# foo=bar will be added to all requests

Example: Update JSON in a POST request body

import json

def interceptor(request):
    if request.method == 'POST' and request.headers['Content-Type'] == 'application/json':
        # The body is in bytes so convert to a string
        body = request.body.decode('utf-8')
        # Load the JSON
        data = json.loads(body)
        # Add a new property
        data['foo'] = 'bar'
        # Set the JSON back on the request
        request.body = json.dumps(data).encode('utf-8')
        # Update the content length
        del request.headers['Content-Length']
        request.headers['Content-Length'] = str(len(request.body))

driver.request_interceptor = interceptor
driver.get(...)
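One way to keep an interceptor like this easy to test is to move the body transformation into a pure function that needs no browser. The add_json_field helper below is a hypothetical refactoring of the steps above, not part of Selenium Wire.

```python
import json

def add_json_field(body, key, value):
    """Decode a JSON request body (bytes), set one property, and re-encode it."""
    data = json.loads(body.decode('utf-8'))
    data[key] = value
    return json.dumps(data).encode('utf-8')

new_body = add_json_field(b'{"a": 1}', 'foo', 'bar')

# The Content-Length header must reflect the size of the new body in bytes.
content_length = str(len(new_body))
```

Inside an interceptor, the result would be assigned to request.body and content_length written to the Content-Length header, as in the example above.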

Example: Basic authentication

If a site requires a username/password, you can use a request interceptor to add authentication credentials to each request. This will stop the browser from displaying a username/password pop-up.

import base64

# b64encode, unlike encodebytes, never inserts line breaks into the output
auth = base64.b64encode('my_username:my_password'.encode()).decode()

def interceptor(request):
    if request.host == 'host_that_needs_auth':
        request.headers['Authorization'] = f'Basic {auth}'

driver.request_interceptor = interceptor
driver.get(...)

# Credentials will be transmitted with every request to "host_that_needs_auth"
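A note on building the Authorization value: base64.encodebytes() inserts a newline after every 76 characters of output, which would corrupt the header for long credentials, whereas base64.b64encode() always returns a single line. The sketch below shows the difference; basic_auth_value is a hypothetical helper and the credentials are the example pair from RFC 7617.

```python
import base64

def basic_auth_value(username, password):
    """Build the value of a Basic Authorization header on a single line."""
    token = base64.b64encode(f'{username}:{password}'.encode('utf-8')).decode('ascii')
    return f'Basic {token}'

header_value = basic_auth_value('Aladdin', 'open sesame')

# Long credentials show the difference between the two encoders.
long_credentials = ('user:' + 'p' * 80).encode()
wrapped = base64.encodebytes(long_credentials).decode().strip()
single_line = base64.b64encode(long_credentials).decode()

wrapped_has_newline = '\n' in wrapped          # strip() only removes the trailing newline
single_line_has_newline = '\n' in single_line
```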

Example: Block a request

You can use request.abort() to block a request and send an immediate response back to the browser. An optional error code can be supplied. The default is 403 (forbidden).

def interceptor(request):
    # Block PNG, JPEG and GIF images
    if request.path.endswith(('.png', '.jpg', '.gif')):
        request.abort()

driver.request_interceptor = interceptor
driver.get(...)

# Requests for PNG, JPEG and GIF images will result in a 403 Forbidden

Example: Mock a response

You can use request.create_response() to send a custom reply back to the browser. No data will be sent to the remote server.

def interceptor(request):
    if request.url == 'https://server.com/some/path':
        request.create_response(
            status_code=200,
            headers={'Content-Type': 'text/html'},  # Optional headers dictionary
            body='<html>Hello World!</html>'  # Optional body
        )

driver.request_interceptor = interceptor
driver.get(...)

# Requests to https://server.com/some/path will have their responses mocked

Have any other examples you think could be useful? Feel free to submit a PR.

Unset an interceptor

To unset an interceptor, use del:

del driver.request_interceptor
del driver.response_interceptor

Limiting Request Capture

Selenium Wire works by redirecting browser traffic through an internal proxy server it spins up in the background. As requests flow through the proxy they are intercepted and captured. Capturing requests can slow things down a little but there are a few things you can do to restrict what gets captured.

driver.scopes

This accepts a list of regular expressions that will match the URLs to be captured. It should be set on the driver before making any requests. When empty (the default) all URLs are captured.

driver.scopes = [
    '.*stackoverflow.*',
    '.*github.*'
]

driver.get(...)  # Start making requests

# Only request URLs containing "stackoverflow" or "github" will now be captured

Note that even if a request is out of scope and not captured, it will still travel through Selenium Wire.
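To check a list of scope patterns before running a browser session, something like the following can help. It is a sketch only: in_scope is a hypothetical helper, and it assumes each pattern is matched from the start of the URL (which is what the leading .* in the example above suggests), with an empty list meaning everything is in scope.

```python
import re

def in_scope(scopes, url):
    """Return True if the URL matches any scope pattern, or if no scopes are set."""
    if not scopes:
        return True
    # Assumption: patterns are matched from the start of the URL.
    return any(re.match(scope, url) for scope in scopes)

scopes = ['.*stackoverflow.*', '.*github.*']

github_in_scope = in_scope(scopes, 'https://github.com/some/repo')
other_in_scope = in_scope(scopes, 'https://www.example.com/')
everything_in_scope = in_scope([], 'https://www.example.com/')
```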

seleniumwire_options.disable_capture

Use this option to switch off request capture. Requests will still pass through Selenium Wire and through any upstream proxy you have configured but they won't be intercepted or stored. Request interceptors will not execute.

options = {
    'disable_capture': True  # Don't intercept/store any requests
}
driver = webdriver.Chrome(seleniumwire_options=options)

seleniumwire_options.exclude_hosts

Use this option to bypass Selenium Wire entirely. Any requests made to addresses listed here will go direct from the browser to the server without involving Selenium Wire. Note that if you've configured an upstream proxy then these requests will also bypass that proxy.

options = {
    'exclude_hosts': ['host1.com', 'host2.com']  # Bypass Selenium Wire for these hosts
}
driver = webdriver.Chrome(seleniumwire_options=options)

request.abort()

You can abort a request early by using request.abort() from within a request interceptor. This will send an immediate response back to the client without the request travelling any further. You can use this mechanism to block certain types of requests (e.g. images) to improve page load performance.

def interceptor(request):
    # Block PNG, JPEG and GIF images
    if request.path.endswith(('.png', '.jpg', '.gif')):
        request.abort()

driver.request_interceptor = interceptor

driver.get(...)  # Start making requests

Request Storage

Captured requests and responses are stored in the system temp folder by default (that's /tmp on Linux and usually C:\Users\<username>\AppData\Local\Temp on Windows) in a sub-folder called .seleniumwire. To change where the .seleniumwire folder gets created you can use the request_storage_base_dir option:

options = {
    'request_storage_base_dir': '/my/storage/folder'  # .seleniumwire will get created here
}
driver = webdriver.Chrome(seleniumwire_options=options)

In-Memory Storage

Selenium Wire also supports storing requests and responses in memory only, which may be useful in certain situations - e.g. if you're running short lived Docker containers and don't want the overhead of disk persistence. You can enable in-memory storage by setting the request_storage option to memory:

options = {
    'request_storage': 'memory'  # Store requests and responses in memory only
}
driver = webdriver.Chrome(seleniumwire_options=options)

If you're concerned about the amount of memory that may be consumed, you can restrict the number of requests that are stored with the request_storage_max_size option:

options = {
    'request_storage': 'memory',
    'request_storage_max_size': 100  # Store no more than 100 requests in memory
}
driver = webdriver.Chrome(seleniumwire_options=options)

When the max size is reached, older requests are discarded as newer requests arrive. Keep in mind that if you restrict the number of requests being stored, requests may have disappeared from storage by the time you come to retrieve them with driver.requests or driver.wait_for_request() etc.
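The eviction behaviour described here can be pictured with a bounded deque from the standard library. This is only an analogy for illustration, not Selenium Wire's storage implementation, and the URLs are made up.

```python
from collections import deque

max_size = 3
store = deque(maxlen=max_size)  # stand-in for a size-limited in-memory store

# Five requests arrive, but only the three most recent are kept.
for n in range(1, 6):
    store.append(f'https://www.example.com/page{n}')

remaining = list(store)
```

By the time the fifth request has arrived, page1 and page2 are gone, which is why a later lookup for an early request can come back empty.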

Proxies

If the site you are accessing sits behind a proxy server you can tell Selenium Wire about that proxy server in the options you pass to the webdriver.

The configuration takes the following format:

options = {
    'proxy': {
        'http': 'http://192.168.10.100:8888',
        'https': 'https://192.168.10.100:8888',
        'no_proxy': 'localhost,127.0.0.1'
    }
}
driver = webdriver.Chrome(seleniumwire_options=options)

To use HTTP Basic Auth with your proxy, specify the username and password in the URL:

options = {
    'proxy': {
        'https': 'https://user:pass@192.168.10.100:8888',
    }
}

For authentication other than Basic, you can supply the full value for the Proxy-Authorization header using the custom_authorization option. For example, if your proxy used the Bearer scheme:

options = {
    'proxy': {
        'https': 'https://192.168.10.100:8888',  # No username or password used
        'custom_authorization': 'Bearer mytoken123'  # Custom Proxy-Authorization header value
    }
}

More info on the Proxy-Authorization header can be found here.

The proxy configuration can also be loaded through environment variables called HTTP_PROXY, HTTPS_PROXY and NO_PROXY:

$ export HTTP_PROXY="http://192.168.10.100:8888"
$ export HTTPS_PROXY="https://192.168.10.100:8888"
$ export NO_PROXY="localhost,127.0.0.1"
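If you would rather read those variables yourself, for example to log or override them before creating the driver, the mapping onto the proxy option can be sketched as below. proxy_from_env is a hypothetical helper, not part of Selenium Wire, and it simply skips any variable that is unset or empty.

```python
import os

def proxy_from_env(env=None):
    """Build a 'proxy' options dictionary from HTTP_PROXY, HTTPS_PROXY and NO_PROXY."""
    env = os.environ if env is None else env
    mapping = {'http': 'HTTP_PROXY', 'https': 'HTTPS_PROXY', 'no_proxy': 'NO_PROXY'}
    return {key: env[var] for key, var in mapping.items() if env.get(var)}

# A plain dictionary stands in for os.environ here.
proxy = proxy_from_env({
    'HTTP_PROXY': 'http://192.168.10.100:8888',
    'NO_PROXY': 'localhost,127.0.0.1',
})

options = {'proxy': proxy}
# With a live driver: webdriver.Chrome(seleniumwire_options=options)
```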

SOCKS

Using a SOCKS proxy is the same as using an HTTP based one but you set the scheme to socks5:

options = {
    'proxy': {
        'http': 'socks5://user:pass@192.168.10.100:8888',
        'https': 'socks5://user:pass@192.168.10.100:8888',
        'no_proxy': 'localhost,127.0.0.1'
    }
}
driver = webdriver.Chrome(seleniumwire_options=options)

You can leave out the user and pass if your proxy doesn't require authentication.

As well as socks5, the schemes socks4 and socks5h are supported. Use socks5h when you want DNS resolution to happen on the proxy server rather than on the client.

Using Selenium Wire with Tor

See this example if you want to run Selenium Wire with Tor.

Switching Dynamically

If you want to change the proxy settings for an existing driver instance, use the driver.proxy attribute:

driver.get(...)  # Using some initial proxy

# Change the proxy
driver.proxy = {
    'https': 'https://user:pass@192.168.10.100:8888',
}

driver.get(...)  # These requests will use the new proxy

To clear a proxy, set driver.proxy to an empty dict {}.

This mechanism also supports the no_proxy and custom_authorization options.

Bot Detection

Selenium Wire will integrate with undetected-chromedriver if it finds it in your environment. This library will transparently modify ChromeDriver to prevent it from triggering anti-bot measures on websites.

If you wish to take advantage of this make sure you have undetected_chromedriver installed:

pip install undetected-chromedriver

Then in your code, import the seleniumwire.undetected_chromedriver package:

import seleniumwire.undetected_chromedriver as uc

chrome_options = uc.ChromeOptions()

driver = uc.Chrome(
    options=chrome_options,
    seleniumwire_options={}
)

Certificates

Selenium Wire uses its own root certificate to decrypt HTTPS traffic. It is not normally necessary for the browser to trust this certificate because Selenium Wire tells the browser to add it as an exception. This will allow the browser to function normally, but it will display a "Not Secure" message (and/or unlocked padlock) in the address bar. If you wish to get rid of this message you can install the root certificate manually.

You can download the root certificate here. Once downloaded, navigate to "Certificates" in your browser settings and import the certificate in the "Authorities" section.

Using Your Own Certificate

If you would like to use your own root certificate you can supply the path to the certificate and the private key using the ca_cert and ca_key options.

If you do specify your own certificate, be sure to manually delete Selenium Wire's temporary storage folder. This will clear out any existing certificates that may have been cached from previous runs.
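Because the certificate and key must be supplied together, it can be convenient to build both options in one place. The own_ca_options helper below is hypothetical, and the paths are placeholders.

```python
def own_ca_options(ca_cert, ca_key):
    """Return seleniumwire_options entries for a custom root certificate and its key."""
    if not ca_cert or not ca_key:
        raise ValueError('ca_cert and ca_key must be supplied together')
    return {'ca_cert': ca_cert, 'ca_key': ca_key}

options = own_ca_options('/path/to/ca.crt', '/path/to/ca.key')
# With a live driver: webdriver.Chrome(seleniumwire_options=options)
```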

All Options

A summary of all options that can be passed to Selenium Wire via the seleniumwire_options webdriver attribute.

addr

The IP address or hostname of the machine running Selenium Wire. This defaults to 127.0.0.1. You may want to change this to the public IP of the machine (or container) if you're using the remote webdriver.

options = {
    'addr': '192.168.0.10'  # Use the public IP of the machine
}
driver = webdriver.Chrome(seleniumwire_options=options)

auto_config

Whether Selenium Wire should auto-configure the browser for request capture. True by default.

ca_cert

The path to a root (CA) certificate if you prefer to use your own certificate rather than use the default.

options = {
    'ca_cert': '/path/to/ca.crt'  # Use own root certificate
}
driver = webdriver.Chrome(seleniumwire_options=options)

ca_key

The path to the private key if you're using your own root certificate. The key must always be supplied when using your own certificate.

options = {
    'ca_key': '/path/to/ca.key'  # Path to private key
}
driver = webdriver.Chrome(seleniumwire_options=options)

disable_capture

Disable request capture. When True nothing gets intercepted or stored. False by default.

options = {
    'disable_capture': True  # Don't intercept/store any requests.
}
driver = webdriver.Chrome(seleniumwire_options=options)

disable_encoding

Ask the server to send back uncompressed data. False by default. When True this sets the Accept-Encoding header to identity for all outbound requests. Note that it won't always work - sometimes the server may ignore it.

options = {
    'disable_encoding': True  # Ask the server not to compress the response
}
driver = webdriver.Chrome(seleniumwire_options=options)

enable_har

When True a HAR archive of HTTP transactions will be kept which can be retrieved with driver.har. False by default.

options = {
    'enable_har': True  # Capture HAR data, retrieve with driver.har
}
driver = webdriver.Chrome(seleniumwire_options=options)

exclude_hosts

A list of addresses for which Selenium Wire should be bypassed entirely. Note that if you have configured an upstream proxy then requests to excluded hosts will also bypass that proxy.

options = {
    'exclude_hosts': ['google-analytics.com']  # Bypass these hosts
}
driver = webdriver.Chrome(seleniumwire_options=options)

ignore_http_methods

A list of HTTP methods (specified as uppercase strings) that should be ignored by Selenium Wire and not captured. The default is ['OPTIONS'] which ignores all OPTIONS requests. To capture all request methods, set ignore_http_methods to an empty list:

options = {
    'ignore_http_methods': []  # Capture all requests, including OPTIONS requests
}
driver = webdriver.Chrome(seleniumwire_options=options)

port

The port number that Selenium Wire's backend listens on. You don't normally need to specify a port as a random port number is chosen automatically.

options = {
    'port': 9999  # Tell the backend to listen on port 9999 (not normally necessary to set this)
}
driver = webdriver.Chrome(seleniumwire_options=options)

proxy

The upstream proxy server configuration if you're using a proxy.

options = {
    'proxy': {
        'http': 'http://user:pass@192.168.10.100:8888',
        'https': 'https://user:pass@192.168.10.100:8889',
        'no_proxy': 'localhost,127.0.0.1'
    }
}
driver = webdriver.Chrome(seleniumwire_options=options)

request_storage

The type of storage to use. Selenium Wire defaults to disk based storage, but you can switch to in-memory storage by setting this option to memory:

options = {
    'request_storage': 'memory'  # Store requests and responses in memory only
}
driver = webdriver.Chrome(seleniumwire_options=options)

request_storage_base_dir

The base location where Selenium Wire stores captured requests and responses when using its default disk based storage. This defaults to the system temp folder (that's /tmp on Linux and usually C:\Users\<username>\AppData\Local\Temp on Windows). A sub-folder called .seleniumwire will get created here to store the captured data.

options = {
    'request_storage_base_dir': '/my/storage/folder'  # .seleniumwire will get created here
}
driver = webdriver.Chrome(seleniumwire_options=options)

request_storage_max_size

The maximum number of requests to store when using in-memory storage. Unlimited by default. This option currently has no effect when using the default disk based storage.

options = {
    'request_storage': 'memory',
    'request_storage_max_size': 100  # Store no more than 100 requests in memory
}
driver = webdriver.Chrome(seleniumwire_options=options)

suppress_connection_errors

Whether to suppress connection related tracebacks. True by default, meaning that harmless errors that sometimes occur at browser shutdown do not alarm users. When suppressed, the connection error message is logged at DEBUG level without a traceback. Set to False to allow exception propagation and see full tracebacks.

options = {
    'suppress_connection_errors': False  # Show full tracebacks for any connection errors
}
driver = webdriver.Chrome(seleniumwire_options=options)

verify_ssl

Whether SSL certificates should be verified. False by default, which prevents errors with self-signed certificates.

options = {
    'verify_ssl': True  # Verify SSL certificates but beware of errors with self-signed certificates
}
driver = webdriver.Chrome(seleniumwire_options=options)

License

MIT


  1. Selenium Wire ignores OPTIONS requests by default, as these are typically uninteresting and just add overhead. If you want to capture OPTIONS requests, you need to set the ignore_http_methods option to [].

selenium-wire's People

Contributors

acidbotmaker, anujith-singh, axgdev, bobdu, idxn, joelluijmes, kzt-ysmr, mattwmaster58, mynameisfiber, nck, pheels, pr11t, rgalexander216, royopa, silverfruity, spinda, sterliakov, wahost070, wkeeling

selenium-wire's Issues

502 Bad Gateway (non-SSL)

Hi,

I'm seeing what look like random 502 errors with both the Chrome and Firefox webdrivers. Here's a debug log. You'll see many successful requests and then a socket timeout. The same test steps work when run manually outside of Selenium, so the website under test doesn't appear to be timing out (unless the socket timeout is very low!).

I can reproduce this (randomly) on two Windows machines, Windows 10 and Windows Server 2016. This is with the latest versions of seleniumwire and selenium.

Is there anything else I can do to try to isolate the issue?

Thanks,

Chrome window opens but nothing happens

Using Chrome 57 on Linux, with ChromeDriver 2.41.578700

The browser window opens with a message saying that it is being remotely controlled, but nothing happens.

ERR_PROXY_CONNECTION_FAILED after clicking any link in the webpage.

I am trying to get the json response from a webpage which needs login.

It works for the first page, and I can get the HTTP response. But once the webdriver has clicked on a link and gone to another page, the page loads but none of the links in the newly loaded page work.

If you click on any of the link, the browser will say: ERR_PROXY_CONNECTION_FAILED.

I have tried to switch to selenium. No such problem exists. Of course, I cannot get the json files as desired.

502 Bad Gateway

My problem may be covered by a different issue (#10), but the fix mentioned there doesn't appear to be available to me.

I'm running headless Chrome from a RHEL 7.6 container. At this point, just trying to get to the login page of a device. Device has an invalid certificate.

Here's the logging output:

/opt/app-root/lib/python3.6/site-packages/urllib3/connectionpool.py:851: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings

InsecureRequestWarning)

starting browser with headless-true

DEBUG:selenium.webdriver.remote.remote_connection:POST http://127.0.0.1:53308/session/b5e0dba596d67a133b740d2fc5541579/url {"url": "http://10.10.10.1", "sessionId": "b5e0dba596d67a133b740d2fc5541579"}

INFO:seleniumwire.proxy.handler:Capturing request: http://10.10.10.1/

INFO:seleniumwire.proxy.handler:Capturing response: http://10.10.10.1/ 302 Found

DEBUG:seleniumwire.proxy.handler:http://10.10.10.1/ 302

DEBUG:seleniumwire.proxy.handler:10.10.10.1:443 200

INFO:seleniumwire.proxy.handler:Capturing request: https://10.10.10.1/

DEBUG:seleniumwire.proxy.handler:code 502, message Bad Gateway

Traceback (most recent call last):
  File "/opt/app-root/lib/python3.6/site-packages/seleniumwire/proxy/proxy2.py", line 108, in do_GET
    conn.request(self.command, path, req_body, dict(req.headers))
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/http/client.py", line 964, in send
    self.connect()
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/http/client.py", line 1400, in connect
    server_hostname=server_hostname)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/ssl.py", line 407, in wrap_socket
    _context=self, _session=session)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/ssl.py", line 814, in __init__
    self.do_handshake()
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/ssl.py", line 1068, in do_handshake
    self._sslobj.do_handshake()
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/ssl.py", line 689, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)

DEBUG:seleniumwire.proxy.handler:https://10.10.10.1/ 502

DEBUG:urllib3.connectionpool:http://127.0.0.1:53308 "POST /session/b5e0dba596d67a133b740d2fc5541579/url HTTP/1.1" 200 72

DEBUG:selenium.webdriver.remote.remote_connection:Finished Request

Terminating browser session.

INFO:seleniumwire.proxy.client:Destroying proxy

DEBUG:seleniumwire.proxy.storage:Cleaning up /opt/app-root/src/.seleniumwire/storage-55d28002-30cf-48be-a596-ed94e241aa00

DEBUG:selenium.webdriver.remote.remote_connection:DELETE http://127.0.0.1:53308/session/b5e0dba596d67a133b740d2fc5541579 {"sessionId": "b5e0dba596d67a133b740d2fc5541579"}

DEBUG:urllib3.connectionpool:http://127.0.0.1:53308 "DELETE /session/b5e0dba596d67a133b740d2fc5541579 HTTP/1.1" 200 72

DEBUG:selenium.webdriver.remote.remote_connection:Finished Request

Works fine when importing vanilla Selenium.

Certificates not valid on Chrome

Related to #2

This behavior has also been observed when using Chrome in a regular test. The certificate is not considered valid.

Windows Server 2016.
Chrome version - 59.0.307 & latest

Test setup:

from seleniumwire import webdriver
from seleniumwire.webdriver import ChromeOptions
import time

options = {
    'https': 'proxy detail',
    'disable_encoding': True
}

browser = webdriver.Chrome(seleniumwire_options = options)
browser.header_overrides = { 'Cookie': 'cookie string' }

browser.get('url')

HTTP Response 501 when PATCH requests made under selenium-wire

When making PATCH requests using selenium-wire, I get a HTTP Error 501 as a response. On further inspection, some response headers are missing. The following screenshot captures the state of the headers under selenium-wire:

fail

The next screenshot captures the state of the headers when not using selenium-wire:

pass

I would like to know why the "Allow" header is not present when using selenium-wire.

Certificates not valid on Chrome when running standalone proxy

This does not affect selenium tests and is only an issue when running Selenium Wire in standalone mode with:

python -m seleniumwire standaloneproxy port=12345

When using Chrome with the standalone proxy it reports that the server's certificate is not valid. This is because the certificate is created without a Subject Alternative Name. This is not a problem on older versions of Chrome, but is on newer versions as Chrome has tightened up its security.

The solution is to ensure that OpenSSL sets the Subject Alternative Name when generating certificates. There is a command line option for this in newer versions of OpenSSL but not in older versions.

FileNotFoundError

----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 58822)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/socketserver.py", line 650, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/local/lib/python3.7/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/local/lib/python3.7/site-packages/seleniumwire/proxy/proxy2.py", line 65, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/local/lib/python3.7/socketserver.py", line 720, in __init__
    self.handle()
  File "/usr/local/lib/python3.7/http/server.py", line 428, in handle
    self.handle_one_request()
  File "/usr/local/lib/python3.7/http/server.py", line 414, in handle_one_request
    method()
  File "/usr/local/lib/python3.7/site-packages/seleniumwire/proxy/proxy2.py", line 207, in do_GET
    res_body_modified = self.response_handler(req, req_body, res, res_body_plain)
  File "/usr/local/lib/python3.7/site-packages/seleniumwire/proxy/handler.py", line 166, in response_handler
    self.server.storage.save_response(req.id, res, res_body)
  File "/usr/local/lib/python3.7/site-packages/seleniumwire/proxy/storage.py", line 118, in save_response
    self._save(response_data, request_dir, 'response')
  File "/usr/local/lib/python3.7/site-packages/seleniumwire/proxy/storage.py", line 90, in _save
    with open(os.path.join(request_dir, filename), 'wb') as out:
FileNotFoundError: [Errno 2] No such file or directory: '/root/.seleniumwire/storage-d63ae536-e5fe-4932-afba-99b27c1f99e7/request-05ad1193-431f-48e5-83a3-c56e25745f58/response'
----------------------------------------

I often get errors like this. Sometimes they stop the run; sometimes they appear as a result of an actual Selenium-side exception. I'm running in a Docker image with ChromeDriver installed.

Can we enable forced caching ?[#feature_request]

As selenium wire is a proxy, it is the best place to enable caching. And from what I understand selenium wire is handling SSL Termination. So might as well add an option to force cache successful requests for a specified period of time.

http proxy in proxy_config not works

I have some questions

if username and password:
    auth = '%s:%s' % (username, password)
    headers['Proxy-Authorization'] = 'Basic ' + base64.b64encode(auth.encode('latin-1')).decode(
        'latin-1')
if proxy_type == 'https':
    conn = http.client.HTTPSConnection(hostport, timeout=self.timeout)
    conn.set_tunnel(netloc, headers=headers)
else:
    conn = http.client.HTTPConnection(hostport, timeout=self.timeout)

Is the Proxy-Authorization header not passed to anything when proxy_type is not https?

I then tried adding the line conn.set_tunnel(netloc, headers=headers) to the else branch, but it still does not work, because of this:

https://github.com/python/cpython/blob/8c349565e8a442e17f1a954d1a9996847749d778/Lib/http/client.py#L894-L897

It looks like the tunnel method in http.client only sends CONNECT to negotiate with the proxy server. Maybe we also need to support proxying without CONNECT, so that plain HTTP requests can be sent through the proxy server? (I'm not sure about that; it needs further discussion.)
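For plain HTTP targets, a forward proxy is normally addressed without CONNECT: the client puts the absolute URL in the request line and sends Proxy-Authorization as an ordinary header on that request. Below is a minimal standard-library sketch of that idea. The function names, proxy host and port are hypothetical, and this is not Selenium Wire's own code.

```python
import base64
import http.client


def basic_proxy_auth(username, password):
    """Build the value of a Proxy-Authorization header (Basic scheme)."""
    auth = '%s:%s' % (username, password)
    return 'Basic ' + base64.b64encode(auth.encode('latin-1')).decode('latin-1')


def get_via_http_proxy(proxy_host, proxy_port, url, username=None, password=None, timeout=5):
    """Fetch a plain-HTTP URL through a forward proxy without using CONNECT.

    The absolute URL goes in the request line so the proxy knows where to
    forward the request; credentials travel as a normal request header.
    """
    headers = {}
    if username and password:
        headers['Proxy-Authorization'] = basic_proxy_auth(username, password)
    conn = http.client.HTTPConnection(proxy_host, proxy_port, timeout=timeout)
    try:
        conn.request('GET', url, headers=headers)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

A call would look like get_via_http_proxy('localproxy', 3128, 'http://example.com/', 'user', 'pass'), where the host, port and credentials are placeholders.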

SyntaxError: invalid syntax

I got this error:
Traceback (most recent call last):
  File "D:/PROJECTS/DEMO/Resources/Extension/test.py", line 1, in <module>
    from seleniumwire import webdriver
  File "C:\Python27\lib\site-packages\seleniumwire\webdriver\__init__.py", line 3, in <module>
    from .browser import Chrome, Edge, Firefox, Safari # noqa
  File "C:\Python27\lib\site-packages\seleniumwire\webdriver\browser.py", line 56
    def __init__(self, *args, seleniumwire_options=None, **kwargs):
    ^
SyntaxError: invalid syntax

This happens when I try to run this sample script:
from seleniumwire import webdriver

driver = webdriver.Firefox()

driver.get('https://mondial-assistance-teleconsultation-fr-dev.intranet.allianz-assistance.com/#/')

for request in driver.requests:
    if request.response:
        print(
            request.path,
            request.response.status_code,
            request.response.headers['Content-Type']
        )

Could you please advise on this?
Nipon

Error Response : Error code : 502

Error Response :

Error Code : 502

Message: Bad Gateway

This error message is displayed for a few URLs when they are opened via selenium-wire (Safari). However, if I open the URL manually in Chrome once and then run the script, the home page of the URL is displayed. I need your help with this.

Selenium-wire didn't use socks configuration

Thanks a lot for your amazing job.

I've got one issue with selenium-wire :

I have a Firefox profile configured with SOCKS parameters. It works well with the Selenium webdriver, but when I use the selenium-wire webdriver, it doesn't use those parameters.

Thanks

profile = FirefoxProfile()
profile.set_preference('permissions.default.image', 2)
profile.set_preference('dom.ipc.plugins.enabled.libflashplayer.so', 'false')
profile.set_preference('network.proxy.type', 1)
profile.set_preference('network.proxy.socks', 'localhost')
profile.set_preference('network.proxy.socks_port', 9050)
profile.set_preference('network.proxy.socks_remote_dns', True)
profile.update_preferences()

Gzipped content slow to load

In some cases, pages that use a Content-Encoding of gzip seem to be slower to load when using Selenium Wire. This doesn't seem to affect all gzipped pages but has been noticed on pages being served by nginx using its gzip module.

Getting Error response while running using seleniumwire

Hi, I am getting "Error response" while running the code snippet below.

Code:
from seleniumwire import webdriver  # Import from seleniumwire
driver = webdriver.Chrome()
driver.get('https://www.google.com')
for request in driver.requests:
    if request.response:
        print(
            request.path,
            request.response.status_code,
            request.response.headers['Content-Type']
        )

Exact error faced :

Error response
Error code: 502

Message: Bad Gateway.

Error code explanation: 502 - Invalid responses from another server/proxy.

Any help will be appreciated !!

Error with /crypto/rand/randfile.c in Ubuntu Server

On Ubuntu Server I get the following error with both the Chrome and Firefox drivers.

The connection with the driver in selenium works correctly.

Is this a bug or do I need to configure something else?

In [5]: driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")

In [6]: Can't load /home/admin_srvgeo/.rnd into RNG
140207141626304:error:2406F079:random number generator:RAND_load_file:Cannot open file:../crypto/rand/randfile.c:88:Filename=/home/admin_srvgeo/.rnd
Can't load /home/admin_srvgeo/.rnd into RNG
139962297303488:error:2406F079:random number generator:RAND_load_file:Cannot open file:../crypto/rand/randfile.c:88:Filename=/home/admin_srvgeo/.rnd

Issue running selenium wire on Safari browser

HI Team,

I am a beginner with selenium-wire. I wanted to override headers on Safari and did all the configuration described in the documentation, but after setting the proxy to localhost and a random port, my scripts run but have no internet access.

Need your assistance on this.

Regards,
Sukrant

Exclude requests by content type to decrease loading time

Thanks for your work on selenium-wire! However, I'm facing some performance issues. Opening a single web page sometimes takes more than 60 seconds and thus results in a timeout. One reason seems to be that selenium-wire processes every response, including static resources such as images and fonts.

My log file contains a lot of "Capturing response" messages. A possible speedup would be to exclude all those requests, i.e., to have a filter for unwanted content types. Is this currently possible? If not, what would be a starting point for contributing such a feature?
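To make the request concrete, a content-type filter of the kind described could look roughly like the sketch below. It is purely illustrative: `should_capture` and `EXCLUDED_CONTENT_TYPES` are hypothetical names, not existing Selenium Wire options, and the patterns are only examples.

```python
from fnmatch import fnmatch

# Hypothetical patterns for content types that should not be captured.
EXCLUDED_CONTENT_TYPES = ('image/*', 'font/*', 'application/font-*')


def should_capture(content_type, excluded=EXCLUDED_CONTENT_TYPES):
    """Return False when the Content-Type matches an excluded pattern."""
    if not content_type:
        # No Content-Type header: capture by default.
        return True
    # Ignore parameters such as '; charset=UTF-8' and compare in lower case.
    media_type = content_type.split(';', 1)[0].strip().lower()
    return not any(fnmatch(media_type, pattern) for pattern in excluded)
```

A proxy handler could call such a check before storing a response, so that excluded resources are passed through without being written to disk.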

Proxy timeout leads to 502 bad gateway

My server under test has some delay at login. It now takes over 5 seconds, and I noticed that the automation always fails after 5 seconds. The failure is:
INFO:Capturing request: http://SERVER.UNDER.TEST/login
DEBUG:code 502, message Bad Gateway
Traceback (most recent call last):
  File "/home/USER/miniconda3/lib/python3.7/site-packages/seleniumwire/proxy/proxy2.py", line 179, in do_GET
    res = conn.getresponse()
  File "/home/USER/miniconda3/lib/python3.7/http/client.py", line 1321, in getresponse
    response.begin()
  File "/home/USER/miniconda3/lib/python3.7/http/client.py", line 296, in begin
    version, status, reason = self._read_status()
  File "/home/USER/miniconda3/lib/python3.7/http/client.py", line 257, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/home/USER/miniconda3/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out
DEBUG:http://SERVER.UNDER.TEST/login 502

I am using seleniumwire==1.0.4.

After changing the timeout value in https://github.com/wkeeling/selenium-wire/blob/6bb6c61/seleniumwire/proxy/proxy2.py#L38 locally, the test passes.

Can that timeout value be given as a parameter for the proxy? I don't know how to do it.

options = {
    'proxy': {
        'http': 'http://localproxy:xxxx',
        'https': 'http://localproxy:xxxx',
        'no_proxy': 'localhost, 127.0.0.1'
    }
}

AttributeError: 'WebDriver' object has no attribute 'requests'

Unable to get header information after click?

from seleniumwire import webdriver

driver.get("http://url")
driver.find_element_by_xpath("//div[@id='toolbar']/div[2]/span").click()
for request in driver.requests:
    if request.response:
        print(request.response.headers['Referer'])


for request in driver.requests:
AttributeError: 'WebDriver' object has no attribute 'requests'

Question - webdriver.remote availability

Is the remote version of webdriver available? I can't see it in the documentation.

I normally use selenium like:

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

driver = webdriver.Remote("http://selenium:4444/wd/hub", DesiredCapabilities.CHROME)

But I need to manipulate request headers and it isn't available in plain selenium.webdriver

[Feature request] Modify request responses

It may be out of the use case for this project, but I think it'd be very useful to write mitmproxy-style response handlers to inject various things into the webpage before the page load. Here's a simple mitmproxy-style response handler that could be useful. It parses text/html requests that respond with the status code 200 and injects some simple js code.

    # Assumes: from lxml import html; from lxml.html import builder
    def response(self, flow):
        # only process 200 responses of html content
        if 'text/html' not in flow.response.headers.get('Content-Type', ''):
            return
        if flow.response.status_code != 200:
            return

        # inject the script tag
        parsed_html = html.fromstring(flow.response.text)
        head, body = parsed_html.head, parsed_html.body
        container = head if head is not None else body
        if container is not None:
            script = builder.SCRIPT('alert("injected")', type='text/javascript')
            container.insert(0, script)
            # serialise the tree back to text before assigning it
            flow.response.text = html.tostring(parsed_html, encoding='unicode')

Logging Configuration

First of all, thanks for fixing the signal issue. Allowed me to move forward with a project i was working on.

With that out of the way, and also because i don't want you sitting on your laurels for too long 😉...i got a favor to ask:

For the love of all that is holy, please allow for some sort of syslog output configuration without having to modify your source. I mean i can get in there and rip it out manually, but i'd really rather not mess with your beautiful work. As it stands, by default, logging output from selenium-wire is...if i had to ballpark...like 95% of my syslog output and it's making me want to stick chopsticks in my eyeballs when i'm trying to troubleshoot stuff.

love you long time. ❤️

edit: spitballing - maybe like a singleton logging class? something you can init with a log level and filter?

from seleniumwire import Logging
log = Logging(level = [None, 'INFO', 'DEBUG'], filter = 'GET')

it's also entirely possible i'm missing something, but i guess my main point is i'd like to be able to configure the logging level of selenium-wire independently of whatever program it may be running under.
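For what it's worth, the log lines quoted in other issues carry logger names such as seleniumwire.proxy.handler, which suggests the standard logging hierarchy may already allow this. A minimal sketch, assuming selenium-wire logs through loggers under the 'seleniumwire' name and does not set levels on them itself ('myapp' is a hypothetical application logger):

```python
import logging

# The application keeps its own verbose logging.
logging.basicConfig(format='%(levelname)s:%(name)s:%(message)s', level=logging.DEBUG)

# Raise the threshold for the 'seleniumwire' logger hierarchy only.
# Child loggers such as 'seleniumwire.proxy.handler' inherit this level
# unless they set one of their own.
logging.getLogger('seleniumwire').setLevel(logging.WARNING)

app_log = logging.getLogger('myapp')  # hypothetical application logger
wire_log = logging.getLogger('seleniumwire.proxy.handler')

app_log.debug('application message')      # handled according to the root level
wire_log.info('Capturing request: ...')   # dropped: below WARNING
```

This keeps the level of the host program and the level of selenium-wire independent, without touching the library source.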

driver.header_overrides doesn't set all headers

Test script:

import logging
from pprint import pprint
from seleniumwire import webdriver 

logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.DEBUG)

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-gpu")

driver = webdriver.Chrome(
    chrome_options=chrome_options,
    executable_path="/usr/lib/chromium-browser/chromedriver",
)

driver._client.set_header_overrides(
    headers={
        "User-Agent": "Mozilla/5.0 (test)",
        "Accept": "text/html, foo/bar",
        "Accept-Language": "en,de;q=0.7,fr;q=0.3",
        'New-Header1': 'Some Value',
    })

driver.get("https://www.google.com")

for request in driver.requests:
    if request.response:
        print(request.path, request.response.status_code)
        pprint(dict(request.headers))

output from first request:

https://www.google.com/ 200
{'Accept': 'text/html, foo/bar',
 'Accept-Encoding': 'gzip, deflate',
 'Connection': 'keep-alive',
 'Host': 'www.google.com',
 'Upgrade-Insecure-Requests': '1',
 'User-Agent': 'Mozilla/5.0 (test)'}

So: "Accept" and "User-Agent" are set, but not "Accept-Language" and "New-Header1" ...

Maybe related to chrome usage ?!?

application/octet-stream response is not handled

DEBUG http://seleniumwire/requests 200
DEBUG:seleniumwire.proxy.handler:http://seleniumwire/requests 200
DEBUG http://seleniumwire/response_body?request_id=6f7ba003-9c3c-4b58-a837-73c853488590 200
DEBUG:seleniumwire.proxy.handler:http://seleniumwire/response_body?request_id=6f7ba003-9c3c-4b58-a837-73c853488590 200
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 36218)
Traceback (most recent call last):
  File "/usr/lib/python3.5/socketserver.py", line 625, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python3.5/socketserver.py", line 354, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/home/user/.virtualenvs/project/lib/python3.5/site-packages/seleniumwire/proxy/proxy2.py", line 65, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/lib/python3.5/socketserver.py", line 681, in __init__
    self.handle()
  File "/usr/lib/python3.5/http/server.py", line 422, in handle
    self.handle_one_request()
  File "/usr/lib/python3.5/http/server.py", line 410, in handle_one_request
    method()
  File "/home/user/.virtualenvs/project/lib/python3.5/site-packages/seleniumwire/proxy/handler.py", line 125, in do_GET
    super().do_GET()
  File "/home/user/.virtualenvs/project/lib/python3.5/site-packages/seleniumwire/proxy/proxy2.py", line 147, in do_GET
    self.admin_handler()
  File "/home/user/.virtualenvs/project/lib/python3.5/site-packages/seleniumwire/proxy/handler.py", line 36, in admin_handler
    self._get_response_body(**params)
  File "/home/user/.virtualenvs/project/lib/python3.5/site-packages/seleniumwire/proxy/handler.py", line 72, in _get_response_body
    self._send_body(body)
  File "/home/user/.virtualenvs/project/lib/python3.5/site-packages/seleniumwire/proxy/handler.py", line 77, in _send_body
    self._send_response(body, 'application/octet-stream')
  File "/home/user/.virtualenvs/project/lib/python3.5/site-packages/seleniumwire/proxy/handler.py", line 115, in _send_response
    self.wfile.write(body)
  File "/usr/lib/python3.5/socket.py", line 593, in write
    return self._sock.send(b)
TypeError: a bytes-like object is required, not 'str'
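
The final frames show a str reaching self.wfile.write(), which only accepts bytes-like objects. A minimal sketch of a guard that encodes text before writing follows; `to_bytes` is a hypothetical helper and io.BytesIO merely stands in for the handler's binary stream, so this is not the library's actual fix.

```python
import io


def to_bytes(body, encoding='utf-8'):
    """Return body as bytes, encoding it first if it is a str."""
    if body is None:
        return b''
    if isinstance(body, str):
        return body.encode(encoding)
    return bytes(body)


# io.BytesIO stands in for the handler's binary wfile stream.
wfile = io.BytesIO()
wfile.write(to_bytes('{"ok": true}'))  # str is encoded before writing
wfile.write(to_bytes(b'\x00\x01'))     # bytes pass through unchanged
```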

request body is empty

I tried to get an AJAX request and in principle it works. I see the request with header and the server's response, but without the data (status code is 200). I'm 100% sure that this Ajax query returns data, but I can't find it.

There are also a lot of errors when loading the website, all of which don't occur with normal selenium.

I'm a bit overwhelmed myself at this point.

My code is relatively simple:

from seleniumwire import webdriver 


path = ""  # path to geckodriver
driver = webdriver.Firefox(executable_path=path)
driver.get('https://www.airbnb.de/rooms/28104490?guests=1&adults=1&check_in=2019-03-13&check_out=2019-03-18')

ajax_requests = [request for request in driver.requests if request.path.startswith("https://www.airbnb.de/api/v2/")]

Signal only works in main thread

Hello again :)

I use selenium-wire in a Django app.
If I use the selenium webdriver, I can start my Django server normally with "python3 manage.py runserver".
With selenium-wire, I must start the server with the command "python3 manage.py runserver --nothreading --noreload".
The issue seems to be located in proxy.py

/usr/local/lib/python3.7/site-packages/seleniumwire/proxy/storage.py in __init__ signal.signal(signal.SIGTERM, lambda *_: self.cleanup())

Thanks a lot.
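
Python only allows signal.signal() to be called from the main thread and raises ValueError elsewhere, which is consistent with the --nothreading --noreload workaround. A minimal sketch of a guard that skips registration outside the main thread is shown below; `install_sigterm_handler` is a hypothetical name and this is not necessarily how selenium-wire addresses it.

```python
import signal
import threading


def install_sigterm_handler(handler):
    """Register handler for SIGTERM only when running in the main thread.

    signal.signal() raises ValueError when called from any other thread.
    Returns True if the handler was installed, False if it was skipped.
    """
    if threading.current_thread() is not threading.main_thread():
        return False
    signal.signal(signal.SIGTERM, handler)
    return True


results = {}


def worker():
    # Called from a non-main thread: the guard skips registration.
    results['worker'] = install_sigterm_handler(lambda *_: None)


t = threading.Thread(target=worker)
t.start()
t.join()
```

When registration is skipped, cleanup would need another trigger, for example an explicit call when the driver quits.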

"Loading 'screen' into random state - done" is printed multiple without changing website.

"Loading 'screen' into random state - done" is printed multiple without changing website.
Also, the website is displayed as "insecure", which suggests the SSL certificate is not being created properly.

Here is the Code:
options = {
    'proxy': {
        'http': 'http://',
        'https': 'https://',
        'no_proxy': 'localhost,127.0.0.1,dev_server:8080'
    }
}

driver = webdriver.Chrome(seleniumwire_options=options)
driver.get("https://whatismyipaddress.com/")

I understand this is probably an OpenSSL failure, but since the selenium-wire installation notes state that OpenSSL is included in the package, how can I try to solve it?

Multiple Instances of OpenSSL open up

We have observed that multiple Instances of OpenSSL open up while the site is loading and subsequently we see "OSError: Tunnel Connection failed: 403 Forbidden" error in the logs.

Is this expected behavior of Selenium wire?

Broken Pipe Errors Popup every now and then


Exception happened during processing of request from ('127.0.0.1', 53178)
Traceback (most recent call last):
File "/usr/lib/python3.5/socketserver.py", line 625, in process_request_thread
self.finish_request(request, client_address)
File "/usr/lib/python3.5/socketserver.py", line 354, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/home/user/.virtualenvs/project/lib/python3.5/site-packages/seleniumwire/proxy/proxy2.py", line 65, in __init__
super().__init__(*args, **kwargs)
File "/usr/lib/python3.5/socketserver.py", line 681, in __init__
self.handle()
File "/usr/lib/python3.5/http/server.py", line 424, in handle
self.handle_one_request()
File "/usr/lib/python3.5/http/server.py", line 410, in handle_one_request
method()
File "/home/user/.virtualenvs/project/lib/python3.5/site-packages/seleniumwire/proxy/proxy2.py", line 224, in do_GET
self.wfile.write(res_body)
File "/usr/lib/python3.5/socket.py", line 593, in write
return self._sock.send(b)
File "/usr/lib/python3.5/ssl.py", line 861, in send
return self._sslobj.write(data)
File "/usr/lib/python3.5/ssl.py", line 586, in write
return self._sslobj.write(data)
BrokenPipeError: [Errno 32] Broken pipe
Successfully started content_tester for url
Fatal Python error: could not acquire lock for <_io.BufferedWriter name=''> at interpreter shutdown, possibly due to daemon threads

Thread 0x00007fa214bc8700 (most recent call first):
File "/usr/lib/python3.5/ssl.py", line 575 in read
File "/usr/lib/python3.5/ssl.py", line 791 in read
File "/usr/lib/python3.5/ssl.py", line 929 in recv_into
File "/usr/lib/python3.5/socket.py", line 575 in readinto
File "/usr/lib/python3.5/http/client.py", line 258 in _read_status
File "/usr/lib/python3.5/http/client.py", line 297 in begin
File "/usr/lib/python3.5/http/client.py", line 1197 in getresponse
File "/home/user/.virtualenvs/project/lib/python3.5/site-packages/seleniumwire/proxy/proxy2.py", line 179 in do_GET
File "/usr/lib/python3.5/http/server.py", line 410 in handle_one_request
File "/usr/lib/python3.5/http/server.py", line 424 in handle
File "/usr/lib/python3.5/socketserver.py", line 681 in __init__
File "/home/user/.virtualenvs/project/lib/python3.5/site-packages/seleniumwire/proxy/proxy2.py", line 65 in __init__
File "/usr/lib/python3.5/socketserver.py", line 354 in finish_request
File "/usr/lib/python3.5/socketserver.py", line 625 in process_request_thread
File "/usr/lib/python3.5/threading.py", line 862 in run
File "/usr/lib/python3.5/threading.py", line 914 in _bootstrap_inner
File "/usr/lib/python3.5/threading.py", line 882 in _bootstrap

Thread 0x00007fa216bcc700 (most recent call first):
File "/usr/lib/python3.5/socketserver.py", line 375 in handle_error
File "/home/user/.virtualenvs/project/lib/python3.5/site-packages/seleniumwire/proxy/proxy2.py", line 49 in handle_error
File "/usr/lib/python3.5/socketserver.py", line 628 in process_request_thread
File "/usr/lib/python3.5/threading.py", line 862 in run
File "/usr/lib/python3.5/threading.py", line 914 in _bootstrap_inner
File "/usr/lib/python3.5/threading.py", line 882 in _bootstrap

Current thread 0x00007fa22b150700 (most recent call first):
Aborted (core dumped)

Headless chrome is slower to start and also throws error

Exception happened during processing of request from ('127.0.0.1', 51648)
Traceback (most recent call last):
File "/usr/lib/python3.5/socketserver.py", line 625, in process_request_thread
self.finish_request(request, client_address)
File "/usr/lib/python3.5/socketserver.py", line 354, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/home/user/.virtualenvs/project/lib/python3.5/site-packages/seleniumwire/proxy/proxy2.py", line 65, in __init__
super().__init__(*args, **kwargs)
File "/usr/lib/python3.5/socketserver.py", line 681, in __init__
self.handle()
File "/usr/lib/python3.5/http/server.py", line 422, in handle
self.handle_one_request()
File "/usr/lib/python3.5/http/server.py", line 410, in handle_one_request
method()
File "/home/user/.virtualenvs/project/lib/python3.5/site-packages/seleniumwire/proxy/proxy2.py", line 76, in do_CONNECT
self.connect_intercept()
File "/home/user/.virtualenvs/project/lib/python3.5/site-packages/seleniumwire/proxy/proxy2.py", line 106, in connect_intercept
with ssl.wrap_socket(self.connection, keyfile=self.certkey, certfile=certpath, server_side=True) as conn:
File "/usr/lib/python3.5/ssl.py", line 1069, in wrap_socket
ciphers=ciphers)
File "/usr/lib/python3.5/ssl.py", line 752, in __init__
self.do_handshake()
File "/usr/lib/python3.5/ssl.py", line 988, in do_handshake
self._sslobj.do_handshake()
File "/usr/lib/python3.5/ssl.py", line 633, in do_handshake
self._sslobj.do_handshake()
ConnectionResetError: [Errno 104] Connection reset by peer

This happens in headless browser only

Socket of selenium-wire or proxy2 instance can't be killed

Hi,

I'm testing selenium-wire, and it seems that on OSX the websocket / proxy instance can't be killed when the webdriver is closing.
seleniumwire_options works fine for the first instance; after that, the seleniumwire_options configuration appears to be destroyed or overridden by something.
As a result, the proxy configuration works only once.

configuration :

OSX 10.14.5
Python 3.7

Thanks a lot

Proxy config should support None for http and https options

Selenium Wire should support the use of None for the http and https options - e.g.

seleniumwire_options = {
    'proxy': {
        'http': None,  # This should not cause an error
        'https': 'http://x.x.x.x:port',
        'no_proxy': 'localhost,127.0.0.1'
    }
}

Currently setting None for either http or https causes an error.
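
One way to tolerate None is to drop unset entries before the proxy configuration is used. The sketch below is illustrative only: `normalise_proxy_config` is a hypothetical helper, not the fix that selenium-wire ships.

```python
def normalise_proxy_config(proxy):
    """Drop entries whose value is None so later code never sees them."""
    return {key: value for key, value in (proxy or {}).items() if value is not None}


seleniumwire_options = {
    'proxy': {
        'http': None,  # unset: should simply be ignored
        'https': 'http://x.x.x.x:port',
        'no_proxy': 'localhost,127.0.0.1',
    }
}

cleaned = normalise_proxy_config(seleniumwire_options['proxy'])
```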

time to load site

I am getting some really crazy high load times when using selenium wire.

is there anything that can be done to speed things up?

I am getting several timeouts on pages because they are taking 60+ seconds to load versus the usual 3-5 seconds.

Running behind a squid proxy gives Bad Request

I am currently running a Squid proxy to do the caching, so I need to do SSL termination, and I am therefore using the ssl-bump feature. For some reason curl requests go through but Selenium Wire requests do not. I specified the proxy according to the docs.

socket.timeout: timed out

Using selenium-wire, I'm continuously receiving "502 Bad Gateway" errors caused by socket.timeout.

I'm using selenium-wire 1.0.5. I do not receive these errors when using the regular selenium module.

My options are as follows:

logging.basicConfig(level=logging.DEBUG)
options = {
    'proxy': {
        'http': 'http://localproxy:xxxx',
        'https': 'http://localproxy:xxxx',
        'no_proxy': 'localhost, 127.0.0.1'
    }
}
driver = webdriver.Chrome(seleniumwire_options=options)
driver.get('http://url..../')

Any suggestions would be greatly appreciated.

Thanks.

driver.header_overrides doesn't work with driver.get()

driver.wait_for_request() doesn't work here, see: #6

So I just use driver.get(), but driver.header_overrides doesn't work with that.

The logging output also lacks the following entry:

DEBUG:http://seleniumwire/header_overrides 200

EDIT: workaround, call: driver._client.set_header_overrides(headers={...})

Possible bug in selenium-wire response.body

The problem is that response.body is sometimes a list or dict when the expected type is either bytes or None. This usually occurs when the response body is JSON-formatted text. I don’t think this is an intended feature, and bytes would be easier to handle in this situation.

Here is a minimal example program that reproduces the issue and outputs lines containing type(request.response.body), request.path and request.response.body. I chose radiopaedia.org as the example, but this seems to be an issue anywhere JSON objects are used, for example https://www.nytimes.com.

It seems that responses formatted as a JSON object are converted to dict and those formatted as a JSON array are converted to list.

A few examples from the output:

<class 'bytes'>   https://radiopaedia.org/cases/colles-fracture-10   b'<!DOCTYPE html>\n\n<html lang="en-GB">\n<head>\n … 
<class 'list'>   https://radiopaedia.org/studies/44157/stacks?lang=gb   [{'modality': 'X-ray', 'images': [{'id’: …
<class 'dict'>   https://radiopaedia.org/api/v1/countries/current   {'id': 72, 'code': 'FI', 'name': 'Finland', 'newsletter_auto_opt_in': False}

Note that this printed output comes from the list and dict objects, not from the original response bodies. The actual responses are JSON-formatted strings with double quotes instead of single quotes.
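
Until the body type is consistent, one possible caller-side workaround is to normalise whatever arrives back to bytes. The sketch below uses only the standard library. `body_as_bytes` is a hypothetical helper, not part of Selenium Wire, and re-serialised JSON is equivalent to what the server sent but not necessarily byte-identical.

```python
import json


def body_as_bytes(body, encoding='utf-8'):
    """Return a response body as bytes regardless of how it was decoded.

    dict and list bodies are serialised back to compact JSON; whitespace
    and character escaping may differ from the original payload.
    """
    if body is None:
        return b''
    if isinstance(body, (dict, list)):
        return json.dumps(body, separators=(',', ':')).encode(encoding)
    if isinstance(body, str):
        return body.encode(encoding)
    return bytes(body)


parsed = {'id': 72, 'code': 'FI', 'name': 'Finland', 'newsletter_auto_opt_in': False}
raw = body_as_bytes(parsed)
```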

Here is an actual response of https://radiopaedia.org/api/v1/countries/current that I got from Google Chrome DevTools Network:

"{"id":72,"code":"FI","name":"Finland","newsletter_auto_opt_in":false}”

Selenium wire not working with mitm proxy

selenium-wire==0.10.0
mitm==4.0.4
Selenium Wire gives a 400 (invalid-http-request-form-expected-authority-or-absolute-got-relative) when mitmproxy runs in upstream mode. To run mitmproxy in upstream proxy mode, use:
mitmproxy -p 3129 --mode upstream:"http://upstream:port"

Error in 0.9 since solving OPTION request issue


Exception happened during processing of request from ('127.0.0.1', 57568)

Traceback (most recent call last):
  File "/usr/lib/python3.5/socketserver.py", line 625, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python3.5/socketserver.py", line 354, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/home/user/.virtualenvs/project/lib/python3.5/site-packages/seleniumwire/proxy/proxy2.py", line 65, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/lib/python3.5/socketserver.py", line 681, in __init__
    self.handle()
  File "/usr/lib/python3.5/http/server.py", line 424, in handle
    self.handle_one_request()
  File "/usr/lib/python3.5/http/server.py", line 410, in handle_one_request
    method()
  File "/home/user/.virtualenvs/project/lib/python3.5/site-packages/seleniumwire/proxy/proxy2.py", line 207, in do_GET
    res_body_modified = self.response_handler(req, req_body, res, res_body_plain)
  File "/home/user/.virtualenvs/project/lib/python3.5/site-packages/seleniumwire/proxy/handler.py", line 160, in response_handler
    self.server.storage.save_response(req.id, res, res_body)
AttributeError: 'CaptureRequestHandler' object has no attribute 'id'

Proxy does not work

I tried to set up a connection using these instructions:

options = {
    'proxy': {
        'http': 'http://username:password@host:port',
        'https': 'https://username:password@host:port',
        'no_proxy': 'localhost,127.0.0.1,dev_server:8080'
    }
}
driver = webdriver.Firefox(seleniumwire_options=options)

My proxy does not need authentication.
In the end I wrote:

driver = webdriver.Firefox(seleniumwire_options=set_seleniumwire_options, firefox_options=options, firefox_profile=profile)

The end result is that every page the crawler requests through the proxy settings fails with 500, Request Time Out.

Not sure if I am missing something.
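For a proxy without authentication, the options dictionary from the instructions above can be written without the username:password@ part. A minimal sketch, assuming the same key layout as the quoted instructions: build_proxy_options is a hypothetical helper, and the host and port are placeholders rather than values from this report.

```python
from typing import Optional


def build_proxy_options(host: str, port: int, no_proxy: Optional[str] = None) -> dict:
    """Build a seleniumwire_options dict for a proxy that needs no credentials.

    Hypothetical helper; the key layout mirrors the instructions quoted above.
    """
    address = f"{host}:{port}"
    proxy = {
        "http": f"http://{address}",
        "https": f"https://{address}",
    }
    if no_proxy:
        # Comma-separated hosts that should bypass the upstream proxy.
        proxy["no_proxy"] = no_proxy
    return {"proxy": proxy}


# Placeholder address from the documentation range 192.0.2.0/24.
options = build_proxy_options("192.0.2.10", 3128, no_proxy="localhost,127.0.0.1")
print(options)

# Assumed usage, following the snippet above:
# driver = webdriver.Firefox(seleniumwire_options=options)
```

Printing the dictionary before starting the browser makes it easy to confirm that the values Selenium Wire receives are the ones intended.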

driver.wait_for_request() doesn't work with webdriver.Chrome()

It seems that driver.wait_for_request() doesn't work with webdriver.Chrome()

For example:

import logging

from seleniumwire import webdriver  # Import from seleniumwire

logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.DEBUG)

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-gpu")

driver = webdriver.Chrome(
    chrome_options=chrome_options,
    executable_path="/usr/lib/chromium-browser/chromedriver",
)

print("Request google.com...", flush=True)
driver.wait_for_request("https://www.google.com", timeout=3)
print("OK")

The output looks like this:

...
DEBUG:http://seleniumwire/find?path=https%3A%2F%2Fwww.google.com 200
DEBUG:http://seleniumwire/find?path=https%3A%2F%2Fwww.google.com 200
DEBUG:http://seleniumwire/find?path=https%3A%2F%2Fwww.google.com 200
DEBUG:http://seleniumwire/find?path=https%3A%2F%2Fwww.google.com 200
...
selenium.common.exceptions.TimeoutException: Message: Timed out after 3s waiting for request https://www.google.com
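The repeated /find lines in the debug output suggest that the driver polls the captured requests until the timeout expires. The snippet above also never calls driver.get(), so the browser may simply not have issued a matching request. The following is a rough, self-contained model of such a polling loop, not Selenium Wire's actual implementation: wait_for_captured, its substring matching and its parameters are all assumptions made for illustration.

```python
import time


def wait_for_captured(captured_urls, path, timeout=3.0, poll_interval=0.2):
    """Poll a list of captured URLs until one contains `path` or time runs out.

    Illustrative model only; substring matching is an assumption.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        for url in captured_urls:
            if path in url:
                return url
        time.sleep(poll_interval)
    raise TimeoutError(f"Timed out after {timeout}s waiting for request {path}")


# With a captured request (as after a page load), the wait returns at once.
print(wait_for_captured(["https://www.google.com/"], "https://www.google.com"))
```

With an empty capture list, which models waiting without loading a page first, the same call raises TimeoutError once the timeout has elapsed.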

other chrome options

I am using Selenium with Chrome for a couple of projects and was wondering what the correct way is to carry some existing options over from standard Selenium to Selenium Wire.

Currently, I have the following, for example:

options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument(
    "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36")
options.add_argument("--disable-default-apps")
options.add_argument("--lang=en,en-US")

PROXY = "111.222.33.44:1234"
options.add_argument('--proxy-server=%s' % PROXY)

chrome_driver = "C:\\chromedriver_win32\\chromedriver.exe"
# Raw string so the backslashes in the Windows path are not read as escapes.
options.add_argument(r"user-data-dir=C:\chromeprofiles\Profile 2")

I then start the webdriver using:

driver = webdriver.Chrome(chrome_options=options, executable_path=chrome_driver)

I changed chrome_options to seleniumwire_options and reformatted the proxy settings to follow your example.

I am wondering whether you support the other arguments above and what format they should take, as I am not entirely sure from the docs.

And a huge thanks for solving my longest-running problem: getting access to the network requests :-)
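One possible way to carry the arguments above over is to keep the ordinary Chrome arguments on ChromeOptions and move only the proxy into seleniumwire_options. This is a sketch under assumptions: split_proxy_argument is a hypothetical helper, the proxy dictionary follows the format quoted elsewhere in these reports, and whether the remaining arguments behave identically under Selenium Wire is not confirmed here.

```python
def split_proxy_argument(arguments):
    """Separate a --proxy-server flag from the other Chrome arguments.

    Hypothetical helper: returns (chrome_args, seleniumwire_options).
    """
    prefix = "--proxy-server="
    chrome_args = []
    seleniumwire_options = {}
    for arg in arguments:
        if arg.startswith(prefix):
            address = arg[len(prefix):]
            # Assumed proxy layout, copied from the format quoted in these reports.
            seleniumwire_options["proxy"] = {
                "http": f"http://{address}",
                "https": f"https://{address}",
            }
        else:
            chrome_args.append(arg)
    return chrome_args, seleniumwire_options


chrome_args, sw_options = split_proxy_argument([
    "--no-sandbox",
    "--disable-default-apps",
    "--lang=en,en-US",
    "--proxy-server=111.222.33.44:1234",
])
print(chrome_args)
print(sw_options)

# Assumed usage, mirroring the call in the question:
# options = webdriver.ChromeOptions()
# for arg in chrome_args:
#     options.add_argument(arg)
# driver = webdriver.Chrome(chrome_options=options, seleniumwire_options=sw_options)
```

Keeping the two kinds of settings apart avoids passing browser arguments in a dictionary that is meant for Selenium Wire's own configuration.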

cache is not cleaned

Hi everybody,
I have an issue with the stored cache: the software doesn't clean it.
My logs contain a lot of ConnectionResetError exceptions. They don't crash the application completely, but they could explain why the cache is not being cleaned.

This is an example of the exception:

Traceback (most recent call last):
  File "/usr/lib/python3.5/socketserver.py", line 625, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python3.5/socketserver.py", line 354, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/home/hachreak/projects/scraper/lib/python3.5/site-packages/seleniumwire/proxy/proxy2.py", line 65, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/lib/python3.5/socketserver.py", line 681, in __init__
    self.handle()
  File "/usr/lib/python3.5/http/server.py", line 424, in handle
    self.handle_one_request()
  File "/usr/lib/python3.5/http/server.py", line 390, in handle_one_request
    self.raw_requestline = self.rfile.readline(65537)
  File "/usr/lib/python3.5/socket.py", line 576, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.5/ssl.py", line 937, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.5/ssl.py", line 799, in read
    return self._sslobj.read(len, buffer)
  File "/usr/lib/python3.5/ssl.py", line 583, in read
    v = self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

I suspect that the exception crashes the proxy thread, which then skips the cleaning.
Do you have any suggestions for how to fix this behavior?
Thanks a lot! 😄
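For context, the traceback above starts in socketserver's process_request_thread, so the error is raised inside a single per-request thread. Python's socketserver reports such errors through the server's handle_error() method. The sketch below shows one way a socketserver-based server could silence connection resets there. It is an illustration, not Selenium Wire's code: the class names QuietResetMixin and QuietThreadingServer are hypothetical, and it only affects the log noise, not the cache cleanup itself.

```python
import socketserver
import sys


class QuietResetMixin:
    """Skip error reporting when the client reset the connection."""

    def handle_error(self, request, client_address):
        # handle_error() is called from inside an except block, so
        # sys.exc_info() describes the exception being handled.
        exc_type = sys.exc_info()[0]
        if exc_type is not None and issubclass(exc_type, ConnectionResetError):
            return  # The peer went away; nothing useful to report.
        super().handle_error(request, client_address)


class QuietThreadingServer(QuietResetMixin, socketserver.ThreadingTCPServer):
    """Hypothetical threaded server that ignores connection resets."""
```

Any other exception type still reaches the default handler and is printed as before.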
