
udemy_bot's People

Contributors

dimakiss, espadaser, myururdurmaz, smoke-pgf

udemy_bot's Issues

Cloudflare prevents complete execution

When I run the bot, it cannot get past Cloudflare protection. I increased the sleep timeout so that I could solve the captchas manually, but under Selenium the Udemy site keeps serving an endless loop of new captchas via Cloudflare.

I tried using undetected_chromedriver as a replacement for chromedriver, but it fails with an error at line 207:

https://github.com/dimakiss/Udemy_bot/blob/main/Udemy_bot.py#L207
elif is_account_exist(sys.argv[1], sys.argv[2]):

selenium.common.exceptions.SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 88 Current browser version is 87.0.4280.141 with binary path /Applications/Google Chrome.app/Contents/MacOS/Google Chrome

 

Another thing I've tried is manually specifying the ChromeDriver version and binary for undetected_chromedriver:

import undetected_chromedriver as uc
uc.TARGET_VERSION = 87
uc.install(
    executable_path='/usr/local/bin/chromedriver',
)

but it still gives the error above. No Chrome 88 update is available yet when I check for updates.
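The mismatch reported above can be read straight out of the exception text. The following is only a minimal sketch: `parse_version_mismatch` is a hypothetical helper, not part of the bot or of Selenium, and it assumes the message keeps the wording shown in this issue.

```python
import re


def parse_version_mismatch(message):
    """Return (supported_major, current_major) from the error text, or None.

    Assumes the wording of the SessionNotCreatedException quoted in this issue.
    """
    supported = re.search(r"only supports Chrome version (\d+)", message)
    current = re.search(r"Current browser version is (\d+)", message)
    if not (supported and current):
        return None
    return int(supported.group(1)), int(current.group(1))


# The message text is copied from the report above.
message = (
    "session not created: This version of ChromeDriver only supports "
    "Chrome version 88 Current browser version is 87.0.4280.141 with binary "
    "path /Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
)

versions = parse_version_mismatch(message)
if versions is not None:
    supported_major, current_major = versions
    if supported_major != current_major:
        print(f"ChromeDriver targets Chrome {supported_major}, "
              f"but Chrome {current_major} is installed")
```

If the two major versions differ, the driver binary that was actually launched does not match the installed browser, whatever version was requested in the script.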

Error when validating email (id="email--1") and Selenium error (DevToolsActivePort file doesn't exist)

As mentioned, I tried to run the bot in a Docker container, which is equivalent to running it in a Linux environment.
I'll spare you the details of what it took to get this bot running. It cost me long hours and a lot of frustration, believe me.
In the end I managed to start it, but I got this:

Checking if the email and password are correct
Traceback (most recent call last):
  File "/etc/udemybot/Udemy_bot.py", line 203, in <module>
    elif is_account_exist(sys.argv[1], sys.argv[2]):
  File "/etc/udemybot/Udemy_bot.py", line 182, in is_account_exist
    browser = webdriver.Chrome(options=options)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/chrome/webdriver.py", line 76, in __init__
    RemoteWebDriver.__init__(
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)

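The only workaround I found is to add these options in the code:

print("Checking if the email and password are correct")
options = Options()
# Chrome's sandbox usually cannot start when running as root in a container
options.add_argument("--no-sandbox")
# Avoid the small /dev/shm that containers get by default
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--incognito")
# No display is available inside the container
options.add_argument("--headless")

Do you have a better solution?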

Then, after fixing this with a clunky workaround, I get this:

Checking if the email and password are correct
Traceback (most recent call last):
  File "/etc/udemybot/Udemy_bot.py", line 203, in <module>
    elif is_account_exist(sys.argv[1], sys.argv[2]):
  File "/etc/udemybot/Udemy_bot.py", line 186, in is_account_exist
    browser.find_element_by_id("email--1").send_keys(email)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 360, in find_element_by_id
    return self.find_element(by=By.ID, value=id_)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 976, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"[id="email--1"]"}
(Session info: headless chrome=88.0.4324.150)

What could it be? I think the HTML being parsed is missing something, but debugging inside the container is hard.
Do you have any idea what's going on?

EDIT: I printed the HTML with
print(browser.page_source)

And this is what I get:

<div aria-hidden="true" style="background-color: rgb(255, 255, 255); border: 1px solid rgb(208, 208, 208); box-shadow: rgba(0, 0, 0, 0.1) 0px 0px 4px; border-radius: 4px; left: -10000px; top: -10000px; z-index: -2147483648; position: absolute; transition: opacity 0.15s ease-out 0s; opacity: 0; visibility: hidden;"><div style="position: relative;"><iframe src="https://assets.hcaptcha.com/captcha/v1/40446ab/static/hcaptcha-challenge.html#id=0lcxfl97hd5&amp;host=www.udemy.com&amp;sentry=true&amp;reportapi=https%3A%2F%2Faccounts.hcaptcha.com&amp;recaptchacompat=off&amp;sitekey=33f96e6a-38cd-421b-bb68-7806e1764460" title="Main content of the hCaptcha challenge" frameborder="0" scrolling="no" style="border: 0px; z-index: 2000000000; position: relative;"></iframe></div><div style="width: 100%; height: 100%; position: fixed; pointer-events: none; top: 0px; left: 0px; z-index: 0; background-color: rgb(255, 255, 255); opacity: 0.05;"></div><div style="border-width: 11px; position: absolute; pointer-events: none; margin-top: -11px; z-index: 1; right: 100%;"><div style="border-width: 10px; border-style: solid; border-color: transparent rgb(255, 255, 255) transparent transparent; position: relative; top: 10px; z-index: 1;"></div><div style="border-width: 11px; border-style: solid; border-color: transparent rgb(208, 208, 208) transparent transparent; position: relative; top: -11px; z-index: 0;"></div></div></div></body></html>

This looks like some kind of bot-protection measure from https://www.botstop.com/?utm_source=hcaptcha1 or something similar.

Any idea?
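One way to make this failure mode visible is to check the page source for the hCaptcha frame before looking up the login fields. This is only a heuristic sketch: `looks_like_hcaptcha` is a hypothetical helper, and it assumes the challenge is always served from an hcaptcha.com host, as in the dump above.

```python
def looks_like_hcaptcha(page_source):
    """Heuristic: True if the HTML references an hcaptcha.com resource."""
    return "hcaptcha.com" in page_source.lower()


# A shortened stand-in for browser.page_source, modelled on the dump above.
sample_source = (
    '<div aria-hidden="true"><iframe '
    'src="https://assets.hcaptcha.com/captcha/v1/40446ab/static/hcaptcha-challenge.html" '
    'title="Main content of the hCaptcha challenge"></iframe></div>'
)

if looks_like_hcaptcha(sample_source):
    print("Captcha challenge served instead of the login form")
```

Reporting this case explicitly separates "the page was blocked" from "the element id changed", both of which currently surface as the same NoSuchElementException.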

Adding feature

Add an option to read a text file of previously collected URLs and make sure that no URL is added twice.
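The requested behaviour could look roughly like the sketch below. Everything here is an assumption for illustration: the function names, the one-URL-per-line file format, and the file name `previous_urls.txt` do not come from the bot itself.

```python
from pathlib import Path


def load_seen_urls(path):
    """Read previously stored URLs (one per line); a missing file yields an empty set."""
    file = Path(path)
    if not file.exists():
        return set()
    lines = file.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def filter_new_urls(candidates, seen):
    """Return candidates not in `seen`, in order, and record them in `seen`."""
    new_urls = []
    for url in candidates:
        normalized = url.strip()
        if normalized and normalized not in seen:
            seen.add(normalized)
            new_urls.append(normalized)
    return new_urls


def append_urls(path, urls):
    """Append URLs to the text file, one per line."""
    with open(path, "a", encoding="utf-8") as handle:
        for url in urls:
            handle.write(url + "\n")


if __name__ == "__main__":
    history_file = "previous_urls.txt"  # hypothetical file name
    seen = load_seen_urls(history_file)
    scraped = ["https://www.udemy.com/course/total-beginner-guitar-lessons/"]
    fresh = filter_new_urls(scraped, seen)
    append_urls(history_file, fresh)
    print(f"{len(fresh)} new URL(s) stored")
```

Because `filter_new_urls` updates the set while it iterates, duplicates inside a single scrape are dropped as well as URLs already present in the file.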

All categories enabled by default, README.md states otherwise

I believe courses from all categories are being added by default, e.g.

https://www.udemy.com/course/total-beginner-guitar-lessons/ is added

README.md contains
The current default categories are IT and Software and Development

The config area of udemy_bot.py contains

### CONFIG ###

categories_list = [
    'business',
    'design',
    'development',
    'finance-and-accounting',
    'health-and-fitness',
    'it-and-software',
    'lifestyle',
    'marketing',
    'music',
    'office-productivity',
    'personal-development',
    'photography',
    'photography-and-video',
    'teaching-and-academics'
]
#Personal preference for example
#categories_list=[
#    'development',
#    'it-and-software'
#]
rating_stars = 4.2
rating_people = 200

#### END OF CONFIG ###

Cheers - Scott.
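For comparison, a default that matches the README would restrict the list to the two categories it names. The sketch below is hypothetical: `is_category_enabled` is not a function from the bot, and it assumes each course can be mapped to one of the category slugs used in the config.

```python
# Default documented in README.md: IT and Software, and Development.
DEFAULT_CATEGORIES = ("development", "it-and-software")


def is_category_enabled(category_slug, categories_list=DEFAULT_CATEGORIES):
    """Return True if the course's category slug is in the configured list."""
    return category_slug.strip().lower() in categories_list


print(is_category_enabled("it-and-software"))
print(is_category_enabled("music"))
```

With this default, a course in the music category such as the guitar lessons linked above would be skipped unless the user adds 'music' to the list.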

Login fails: credentials are reported as incorrect although they are valid

#10
In reference to the issue above: I resolved the bot error, but now I'm facing a login problem.
I receive the error "There was a problem logging in. Check your email and password or create an account."

The email and password are correct. I copied and pasted them into Udemy and logged in successfully.

Debugging a bit, I see that in the function "is_account_exist" in udemy_bot.py, the line
is_exist = temp_url == browser.current_url

reports that browser.current_url is not defined,

which is probably the culprit of the whole thing.
Any idea why it doesn't work?

EDIT: Further investigation (printing the elements found) shows this:

Checking if the email and password are correct
browser_email: <input name="email" required="" maxlength="64" minlength="7" placeholder="Email" data-purpose="email" type="email" id="email--1" class="form-control" value="">
browser_password: <input type="password" name="password" required="" placeholder="Password" class="textinput textInput form-control" maxlength="64" data-purpose="password" id="id_password" minlength="6">
current_url: https://www.udemy.com/join/login-popup/
browser_submit <input type="submit" name="submit" value="Log In" class="btn btn-primary " id="submit-id-submit" data-purpose="do-login">
is_exist True
There was a problem logging in. Check your email and password or create an account.

So everything looks fine, but temp_url and browser.current_url still remain equal, and the login does not work.

EDIT:
Out of desperation, I tried again to fix this, but the bot is riddled with bugs and won't work no matter what I do.
We've already seen that the bot now finds the HTML elements correctly.
So I imported these libraries:

from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys

Then I tried to trigger the submit in two ways.

The original way in which the bot triggers the submit:
browser.find_element_by_id("submit-id-submit").click()

and this one:
browser.find_element_by_id("submit-id-submit").send_keys(Keys.ENTER)

Then, when checking whether browser.current_url has changed, I tried Selenium's WebDriverWait, which polls for a condition until a timeout expires, instead of
sleep(2)
which is a workaround.

So I did:

from selenium.common.exceptions import NoSuchElementException, TimeoutException

try:
    submit = browser.find_element_by_id("submit-id-submit")
    print('browser_submit' + submit.get_attribute('outerHTML'))
    submit.send_keys(Keys.ENTER)
except NoSuchElementException:
    print("No element found submit")
print('Waiting max 30 seconds for url change')
wait = WebDriverWait(browser, 30)
try:
    # WebDriverWait passes the driver to the callable on every poll
    wait.until(lambda driver: driver.current_url != temp_url)
except TimeoutException:
    print('url did not change in 30 seconds')

But even after 30 seconds nothing happened, and I ended up in the TimeoutException branch.

Now, why doesn't the URL change? Is that what's causing the problem?

I tried skipping the is_account_exist check, and the bot printed messages about scraping X potential courses and adding them to my account, but in reality nothing was added.

Please help!
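The wait step described above amounts to polling a condition until a deadline passes. The sketch below shows that idea in plain Python so it runs without a browser. `wait_until` and `FakeBrowser` are hypothetical stand-ins, not Selenium APIs. Selenium also ships a ready-made `url_changes` condition in `selenium.webdriver.support.expected_conditions` that could replace the lambda.

```python
import time


def wait_until(condition, timeout=30.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


class FakeBrowser:
    """Stand-in for a driver: current_url changes after a few reads."""

    def __init__(self, start_url, final_url, reads_before_change):
        self._start_url = start_url
        self._final_url = final_url
        self._reads_left = reads_before_change

    @property
    def current_url(self):
        if self._reads_left > 0:
            self._reads_left -= 1
            return self._start_url
        return self._final_url


temp_url = "https://www.udemy.com/join/login-popup/"
browser = FakeBrowser(temp_url, "https://www.udemy.com/", reads_before_change=3)
wait_until(lambda: browser.current_url != temp_url, timeout=5, poll_interval=0.01)
print("URL changed to", browser.current_url)
```

If the condition never becomes true, as in the report above, the likely cause is on the page side (for example a captcha or a rejected form) rather than in the waiting logic itself.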

Selenium package gives error installing with pip3

RUN pip3 install -r requirements.txt
---> Running in 88829269d1f6
Collecting bs4==0.0.1
Downloading bs4-0.0.1.tar.gz (1.1 kB)
Collecting requests==2.23.0
Downloading requests-2.23.0-py2.py3-none-any.whl (58 kB)
Collecting lxml>=4.6.2
Downloading lxml-4.6.2-cp39-cp39-manylinux1_x86_64.whl (5.4 MB)
ERROR: Could not find a version that satisfies the requirement selenium==1.25.9
ERROR: No matching distribution found for selenium==1.25.9
