I am working on a white hat side project: I want to scrape my own data from behind a login screen so that I can plot it :)
The code runs fine on my local env (macOS, details below): it logs in and reaches the desired page behind the login.
However, when promoted to a remote Linux server (Ubuntu, details below), it fails to log in and is redirected back to the login page.
At first I thought the IP/DNS was registering as blacklisted, but then I ran both behind NordVPN (server: us5793) and still got the same result (works on local, not on remote).
iplocation.net reports the same result for the local env and the remote env:
IP Location: Chicago, Illinois (US), NordVPN
64.44.80.68, 198.143.57.3
Mac OS X
Chrome 83.0.4103.97
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36
1920px X 1080px
Enabled
Enabled
The expected result is that the function below reports success on finding the "My Trips" text in the HTML, which indicates that the login succeeded.
My speculation is one of two things:
1. The chromedriver binary responds differently to the cdc replacement done in your code, so the patch behaves differently in my Linux env.
2. The way JavaScript is reinjected into the page isn't working correctly on Linux.
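One way to test the first speculation is to inspect the chromedriver binary on both machines and see whether the `cdc_` marker is still present after patching. This is only a minimal sketch: it assumes the patch works by replacing the `cdc_` identifier, as guessed above, and `count_marker` and the path passed to it are hypothetical names, not part of undetected_chromedriver.

```python
from pathlib import Path


def count_marker(binary_path, marker=b"cdc_"):
    """Return how many times `marker` occurs in the file at `binary_path`.

    Assumption: an unpatched chromedriver contains the bytes b"cdc_" and a
    patched one does not. The marker is a parameter so it can be adjusted.
    """
    data = Path(binary_path).read_bytes()
    return data.count(marker)
```

Running `count_marker("/path/to/chromedriver")` on the Mac and on the EC2 box and comparing the two numbers would show whether the binary was patched the same way in both places.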
Other found resources:
How to inject JS and beat detection
Many Tests for bot indication
General Chrome headless test (my code passes this in both environments)
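To compare what the page's JavaScript can see in each environment, a few `navigator` values can be read through Selenium's `execute_script` and diffed. This is a rough sketch, not a full detection test: the probe list is my own pick, and `collect_fingerprint` and `diff_fingerprints` are hypothetical helper names. `navigator.platform` is included because a Mac user agent on a Linux host could disagree with it, which is a guess and not a confirmed cause.

```python
# JavaScript snippets evaluated in the page; the selection is an assumption.
PROBES = {
    "webdriver": "return navigator.webdriver",
    "user_agent": "return navigator.userAgent",
    "platform": "return navigator.platform",
    "plugin_count": "return navigator.plugins.length",
    "languages": "return navigator.languages",
}


def collect_fingerprint(driver):
    """Run every probe via driver.execute_script and return name -> value."""
    return {name: driver.execute_script(script) for name, script in PROBES.items()}


def diff_fingerprints(local, remote):
    """Return {key: (local_value, remote_value)} for keys whose values differ."""
    keys = sorted(set(local) | set(remote))
    return {
        key: (local.get(key), remote.get(key))
        for key in keys
        if local.get(key) != remote.get(key)
    }
```

Dumping `collect_fingerprint(self.driver)` to JSON on each machine and passing both dicts to `diff_fingerprints` narrows down which values differ between macOS and Linux.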
I'm going to keep hacking away at this and would love to help develop a solution for this and other issues going forward :) Ideally I would like an equivalent of the Network tab in Inspect (DevTools) to debug these things.
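Something close to the Network tab can be approximated with Chrome's performance log, which chromedriver exposes when the `goog:loggingPrefs` capability is set. The sketch below assumes a Selenium version whose `ChromeOptions` has `set_capability` and whose driver has `get_log("performance")`; `enable_network_log` and `summarize_responses` are hypothetical helper names, and I have not verified this together with the patched classes from undetected_chromedriver.

```python
import json


def enable_network_log(options):
    """Ask chromedriver to record DevTools events in the 'performance' log."""
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    return options


def summarize_responses(log_entries):
    """Extract (status, url) pairs from driver.get_log('performance') entries.

    Each entry carries a JSON string under 'message'. Plain responses arrive
    as Network.responseReceived; redirect hops are attached to the following
    Network.requestWillBeSent event as 'redirectResponse'.
    """
    rows = []
    for entry in log_entries:
        message = json.loads(entry["message"])["message"]
        method = message.get("method")
        params = message.get("params", {})
        if method == "Network.responseReceived":
            response = params["response"]
        elif method == "Network.requestWillBeSent" and "redirectResponse" in params:
            response = params["redirectResponse"]
        else:
            continue
        rows.append((response["status"], response["url"]))
    return rows
```

With this in place, calling `enable_network_log(chrome_options)` before creating the driver and `summarize_responses(self.driver.get_log("performance"))` after the submit click should list the status codes and URLs seen during login, which can then be compared between the two machines.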
Local MacOS (success) -- Login Success
sys.platform: darwin
sysname: Darwin
version: Darwin Kernel Version 19.3.0: Thu Jan 9 20:58:23 PST 2020; root:xnu-6153.81.5~1/RELEASE_X86_64
release: 19.3.0
machine: x86_64
selenium : 3.141.0
Tried this in python3.6 & 3.8. No luck on either.
Remote Linux (fail) -- Login Fail -- Shouldn't matter with the VPN, but this lives on AWS EC2
sys.platform: linux
sysname: Linux
version: #21~18.04.1-Ubuntu SMP Mon May 11 12:33:03 UTC 2020
release: 5.3.0-1019-aws
machine: x86_64
selenium : 3.141.0
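The details above list the selenium version but not the Chrome and chromedriver versions, which may also differ between the two machines. As a small sketch, they can be read from `driver.capabilities` after the driver starts. `browser_versions` is a hypothetical helper, and the key names assume what recent chromedriver builds report (`browserVersion`, or `version` on older ones, plus `chrome.chromedriverVersion`).

```python
def browser_versions(capabilities):
    """Pull the Chrome and chromedriver versions out of driver.capabilities."""
    chrome = capabilities.get("browserVersion") or capabilities.get("version")
    # chromedriverVersion looks like "<version> (<build hash>...)";
    # keep only the leading version number.
    raw = capabilities.get("chrome", {}).get("chromedriverVersion", "")
    return {"chrome": chrome, "chromedriver": raw.split(" ")[0]}
```

Printing `browser_versions(self.driver.capabilities)` next to the `sys.platform` block on both machines would rule a version mismatch in or out.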
I ran it behind NordVPN on the remote server with this shell script:
#!/bin/bash
echo "Executing Nord VPN"
nordvpn connect us5793
echo "Executing Python"
python3.8 /home/ubuntu/test.py
echo "Disconnecting VPN"
nordvpn disconnect
**Created a fake account for you to test on as well**
import os
import sys
print(f""" \n
sys.platform: {sys.platform}
sysname: {os.uname().sysname}
version: {os.uname().version}
release: {os.uname().release}
machine: {os.uname().machine}
\n
""")
import undetected_chromedriver as uc
uc.install() #important this is first
from selenium.webdriver import Chrome, ChromeOptions
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
from time import sleep
class BotDriver:
    def __init__(self, username, pw, start_url, url_behind_login, headless_input=True):
        self.username = username
        self.pw = pw
        chrome_options = ChromeOptions()
        chrome_options.headless = headless_input
        chrome_options.add_argument("--incognito")
        chrome_options.add_argument("--disable-extensions")
        chrome_options.add_argument("--start-maximized")
        # `options=` replaces the deprecated `chrome_options=` keyword
        self.driver = Chrome(options=chrome_options)
        self.start_url = start_url
        self.url_behind_login = url_behind_login
        # Record the public IP / browser details as seen from outside
        self.driver.get('https://www.iplocation.net/')
        self.driver.get_screenshot_as_file("iplocation.png")
        self.driver.get(start_url)
        self.waitdriver = WebDriverWait(self.driver, 10)

    def get_element(self, findby, argument_to_click):
        # Wait (up to 10 s) until the element is clickable, then return it
        element = self.waitdriver.until(EC.element_to_be_clickable((findby, argument_to_click)))
        return element

    def slow_keys(self, input_keys, element, speed=.2):
        # Type one character at a time to look less like a bot
        for character in input_keys:
            sleep(speed)
            element.send_keys(character)
        sleep(1)

    def main(self):
        element0 = self.get_element(By.LINK_TEXT, "Sign In or Join")
        element0.click()
        element1 = self.get_element(By.XPATH, '//*[@id="user-id"]')
        element1.click()
        self.slow_keys(self.username, element1)
        element2 = self.get_element(By.XPATH, '//*[@id="password"]')
        element2.click()
        self.slow_keys(self.pw, element2)
        self.driver.get_screenshot_as_file("before_submit.png")
        element3 = self.get_element(By.XPATH, "//button[@name='submitButton']")
        element3.click()
        self.driver.get_screenshot_as_file("after_submit.png")
        sleep(3)
        # test string to find
        soup = BeautifulSoup(self.driver.page_source, 'lxml')
        test = soup.body.findAll(text='My Trips')
        if len(test) > 1:
            print(f'\n\n\n Login Success ({test} len {len(test)})\n\n\n')
        else:
            print(f'\n\n\n Login failed ({test} len {len(test)})\n\n\n')
        self.driver.get(self.url_behind_login)
        self.driver.get_screenshot_as_file("last.png")

if __name__ == "__main__":
    username = input('Enter your login email: ')
    pw = input('Enter PW: ')
    start_url = 'https://www.marriott.com/default.mi'
    url_behind_login = 'https://www.marriott.com/loyalty/findReservationList.mi'
    pbd = BotDriver(username, pw, start_url, url_behind_login, headless_input=True)
    pbd.main()