jansenicus / www-coursera-downloader

This Jupyter Notebook helps you download Coursera videos, subtitles and quizzes (it does not answer the quizzes). It automatically downloads vtt subtitle files and converts them into srt. All downloaded resources are numbered according to their sequence.

video coursera-downloader jupyter-notebook python3

www-coursera-downloader's Introduction

FEATURES

download quizzes and practice quizzes;
download all video lectures;
download vtt subtitles;
automatic conversion of vtt to srt format;
read html readings and save to html file;
creation of m3u playlist.
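
The vtt to srt conversion listed above can be sketched as follows. This is a minimal illustration, not the notebook's own code: the function name `vtt_to_srt` and the helper `_srt_time` are hypothetical, and the sketch assumes simple WebVTT input (it drops the `WEBVTT` header, cue identifiers and cue settings, and does not handle styling tags).

```python
import re

# Matches a WebVTT cue timing line such as "00:00:01.000 --> 00:00:04.000".
# The hour field is optional in WebVTT, so it is optional here too.
_TIMING = re.compile(
    r"^\s*((?:\d{2,}:)?\d{2}:\d{2})\.(\d{3})\s+-->\s+"
    r"((?:\d{2,}:)?\d{2}:\d{2})\.(\d{3})"
)


def _srt_time(clock, millis):
    """Build an SRT timestamp (HH:MM:SS,mmm), adding the hour field if missing."""
    if clock.count(":") == 1:
        clock = "00:" + clock
    return f"{clock},{millis}"


def vtt_to_srt(vtt_text):
    """Convert simple WebVTT text to SRT text (hypothetical helper)."""
    normalized = vtt_text.replace("\r\n", "\n").strip()
    blocks = re.split(r"\n\s*\n", normalized)
    cues = []
    for block in blocks:
        lines = block.split("\n")
        # Blocks without a timing line (header, NOTE, STYLE) are skipped.
        for i, line in enumerate(lines):
            match = _TIMING.match(line)
            if match:
                start = _srt_time(match.group(1), match.group(2))
                end = _srt_time(match.group(3), match.group(4))
                cues.append((start, end, "\n".join(lines[i + 1:])))
                break
    # SRT numbers cues from 1 and uses a comma before the milliseconds.
    return "\n".join(
        f"{n}\n{start} --> {end}\n{text}\n"
        for n, (start, end, text) in enumerate(cues, 1)
    )
```

A typical call would read a downloaded `.vtt` file, pass its text to `vtt_to_srt`, and write the result next to it with an `.srt` extension.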

REQUIREMENTS:

python3
jupyter notebook

Coursera Downloader

To answer several similar questions on Quora:

    Is there a way to mass download the materials from a Coursera course?
    
    How can I download all the video lectures of a coursera course in one go?
    
    Are there any ways to batch download the complete course videos on coursera new platform?
    
    How do I write a Python script that automatically downloads all the videos of the course from Coursera?
    
    Ashish Kedia: How can I write a Script in Python to mass download all course videos from Coursera new platform and name them by lecture title?

Download all videos in all weeks of all lessons in one specified course.

Downloading from the old 'http://class.coursera.org' is easy since:
- it is simple HTML and can be parsed with an HTML parser;
- all links to the course material are provided in one page URL;
- you can use popular software like 'DownThemAll' to download all the materials you want;
- there are many solutions already provided on github.com for this purpose.

Downloading from the new 'https://www.coursera.org', however, is harder since:
- it is JavaScript-rendered and must be parsed using a browser engine, meaning the HTML elements you want to parse may not be visible until you view the page in a browser;
- links to the course materials are spread across many page URLs;
- you will get tired of downloading after 144 URLs.

This compiled Python script provides a solution to download all videos, subtitles and transcripts from:

    https://www.coursera.org/

usage:

    jupyter notebook

www-coursera-downloader's Issues

There are 0 courses available

The program shows that I don't have any courses on coursera even though I have many.

splinter.exceptions email

After running "python www-coursera-downloader.pyc" I get this error:

Getting courses list...
Traceback (most recent call last):
  File "www-coursera-downloader.py", line 744, in <module>
    arrLessonURL, arrLessonTitle = readCSV(strNamaFile)
  File "www-coursera-downloader.py", line 733, in main
    except:
  File "www-coursera-downloader.py", line 192, in getCourses
    print "Using Chrome Web Driver...\n"
  File "/usr/local/lib/python2.7/dist-packages/splinter/driver/webdriver/__init__.py", line 413, in fill
    field = self.find_by_name(name).first
  File "/usr/local/lib/python2.7/dist-packages/splinter/element_list.py", line 53, in first
    return self[0]
  File "/usr/local/lib/python2.7/dist-packages/splinter/element_list.py", line 44, in __getitem__
    self.find_by, self.query))
splinter.exceptions.ElementDoesNotExist: no elements could be found with name "email"

Any ideas how to solve the problem? The dependencies should be fine ...
Maybe someone has had the same issue and can give me a hint.
Thanks in advance

Specialization Course Videos Are Not Listed

Would you be able to add the ability to list Specialization courses to the list of courses?

The Coursera page that PhantomJS scrapes lists "My Specializations" before "My Courses", but essentially I think they're the same thing? However, the downloader does not list any of the courses under "My Specializations" :(

Great idea here though - I hope it goes well.

Exception AttributeError: "'Service' object has no attribute 'process'" in <bound method Service.__del__ of <selenium.webdriver.firefox.service.Service object at 0x105985510>> ignored

So after installing splinter and pycrypto, the script finally ran but only till the point of creating the coursera.pass file. After that, instead of giving the list of courses, the aforementioned errors pop up.

User and password has been saved to coursera.pass file.
Please delete the file if you want to change your credentials.

Getting courses list...

You have not properly installed or configured PhantomJS!
You will see an automated browser popping up and crawling,
which you will not see if you have properly installed or configured PhantomJS.
Do not close that automated browser...

Press any key to continue...

Traceback (most recent call last):
  File "www-coursera-downloader.py", line 744, in <module>
    arrLessonURL, arrLessonTitle = readCSV(strNamaFile)
  File "www-coursera-downloader.py", line 733, in main
    except:
  File "www-coursera-downloader.py", line 185, in getCourses
    try:
  File "/Users/mohanasingh/Library/Python/2.7/lib/python/site-packages/splinter/browser.py", line 63, in Browser
    return driver(*args, **kwargs)
  File "/Users/mohanasingh/Library/Python/2.7/lib/python/site-packages/splinter/driver/webdriver/firefox.py", line 48, in __init__
    timeout=timeout)
  File "/Users/mohanasingh/Library/Python/2.7/lib/python/site-packages/selenium/webdriver/firefox/webdriver.py", line 140, in __init__
    self.service.start()
  File "/Users/mohanasingh/Library/Python/2.7/lib/python/site-packages/selenium/webdriver/common/service.py", line 81, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'geckodriver' executable needs to be in PATH.

Exception AttributeError: "'Service' object has no attribute 'process'" in <bound method Service.__del__ of <selenium.webdriver.phantomjs.service.Service object at 0x105908250>> ignored
Exception AttributeError: "'Service' object has no attribute 'process'" in <bound method Service.__del__ of <selenium.webdriver.chrome.service.Service object at 0x105985210>> ignored
Exception AttributeError: "'Service' object has no attribute 'process'" in <bound method Service.__del__ of <selenium.webdriver.firefox.service.Service object at 0x105985510>> ignored

Not able to download videos

In Download Only Videos
Ln[53]

Lecture 1 2.1 Critical Sections
/learn/concurrent-programming-in-java/lecture/YrqDJ/2-1-critical-sections

Reading2.1 Lecture 2  Summary
/learn/concurrent-programming-in-java/supplement/gaQ9x/2-1-lecture-summary

Lecture 3 2.2 Object Based Isolation (Monitors)
/learn/concurrent-programming-in-java/lecture/djUwe/2-2-object-based-isolation-monitors

Reading2.2 Lecture 4  Summary
/learn/concurrent-programming-in-java/supplement/PEpS3/2-2-lecture-summary

Lecture 5 2.3 Concurrent Spanning Tree Algorithm
/learn/concurrent-programming-in-java/lecture/ZUsiv/2-3-concurrent-spanning-tree-algorithm

Reading2.3 Lecture 6  Summary
/learn/concurrent-programming-in-java/supplement/4VxYN/2-3-lecture-summary

Lecture 7 2.4 Atomic Variables
/learn/concurrent-programming-in-java/lecture/zDzxX/2-4-atomic-variables

Reading2.4 Lecture 8  Summary
/learn/concurrent-programming-in-java/supplement/k5eW4/2-4-lecture-summary

Lecture 9 2.5 Read, Write Isolation
/learn/concurrent-programming-in-java/lecture/GOfdF/2-5-read-write-isolation

Reading2.5 Lecture 10  Summary
/learn/concurrent-programming-in-java/supplement/3fmKA/2-5-lecture-summary

Lecture 2 Demonstration: Global and Object-Based Isolation
/learn/concurrent-programming-in-java/lecture/vMHcW/demonstration-global-and-object-based-isolation

currently downloading: Week-1-Lecture_1_2.1_Critical_Sections.mp4

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/Users/mr-woot/anaconda/lib/python3.5/site-packages/splinter/element_list.py in __getitem__(self, index)
     39         try:
---> 40             return super(ElementList, self).__getitem__(index)
     41         except IndexError:

IndexError: list index out of range

During handling of the above exception, another exception occurred:

ElementDoesNotExist                       Traceback (most recent call last)
/Users/mr-woot/anaconda/lib/python3.5/site-packages/splinter/element_list.py in __getattr__(self, name)
     71         try:
---> 72             return getattr(self.first, name)
     73         except (ElementDoesNotExist, AttributeError):

/Users/mr-woot/anaconda/lib/python3.5/site-packages/splinter/element_list.py in first(self)
     52         """
---> 53         return self[0]
     54 

/Users/mr-woot/anaconda/lib/python3.5/site-packages/splinter/element_list.py in __getitem__(self, index)
     43                 u'no elements could be found with {0} "{1}"'.format(
---> 44                     self.find_by, self.query))
     45 

ElementDoesNotExist: no elements could be found with tag_name "video"

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-53-c51a7e7b1a4a> in <module>()
----> 1 get_lectures()

<ipython-input-39-8c7fa7d3c9bf> in get_lectures()
     43         lessons = list(zip(lessons_urls, lessons_titles))
     44 
---> 45         download_week_lessons(lessons, i, browser)
     46 
     47         print()

<ipython-input-40-739964be6c5d> in download_week_lessons(lessons, i, browser)
     10         time.sleep(loading_time)
     11         screenshot()
---> 12         mp4 = browser.find_by_tag('video').find_by_tag('source')['src']
     13         mp4 = mp4.replace('360p/',resolution[chosen_res]+'p/')
     14         print('currently downloading: '+ filename)

/Users/mr-woot/anaconda/lib/python3.5/site-packages/splinter/element_list.py in __getattr__(self, name)
     73         except (ElementDoesNotExist, AttributeError):
     74             raise AttributeError(u"'{0}' object has no attribute '{1}'".format(
---> 75                 self.__class__.__name__, name))

AttributeError: 'ElementList' object has no attribute 'find_by_tag'
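
The traceback above shows that `find_by_tag('video')` returned an empty list, and that the chained `.find_by_tag('source')` call was then forwarded to a first element that does not exist, which is why the final message is an `AttributeError` rather than "no video found". Why the page had no `<video>` tag is not visible from the traceback (the page may not have finished loading, or the layout may have changed). A minimal sketch of a guard that checks for emptiness before chaining, assuming a splinter-style browser object; the function name `find_video_source` is hypothetical and not part of the notebook:

```python
def find_video_source(browser):
    """Return the 'src' of the first <source> inside a <video>, or None.

    `browser` is any object with a splinter-style find_by_tag() method
    that returns a list-like result with a `.first` property.
    """
    videos = browser.find_by_tag('video')
    if len(videos) == 0:
        # No <video> element on the page: report it instead of chaining.
        return None
    sources = videos.first.find_by_tag('source')
    if len(sources) == 0:
        return None
    return sources.first['src']
```

The caller can then skip the lesson or wait and retry when the result is `None`, instead of failing with an unrelated-looking `AttributeError`.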

Coursera uses localization and changes the language

Hi!

Thank you for creating this.

I am located in Spain, and Coursera seems to localize by the user's IP address, so for instance it shows "Iniciar Sesión" instead of "Log In" on the login form, or "Mis Cursos" instead of "My Courses".

I have changed those strings so the script works for me. I wonder if there is any way to configure PhantomJS to force Coursera to use English.

RuntimeError: Bad magic number in .pyc file

I am new to Python.
I have installed Anaconda and the phantomjs package. Then I navigated to the repository using the Anaconda prompt.
After running python www-coursera-downloader.pyc I got the following error:
RuntimeError: Bad magic number in .pyc file

What should I do?
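
For context, a "bad magic number" error generally means the `.pyc` file was compiled by a different Python version than the one running it. Other tracebacks on this page show Python 2.7 paths, so the shipped `.pyc` was probably built for Python 2, although that is an inference rather than something the repository states. A small sketch that checks whether a `.pyc` matches the running interpreter; the function name `pyc_matches_interpreter` is hypothetical:

```python
import importlib.util


def pyc_matches_interpreter(path):
    """Return True if the .pyc at `path` was compiled for this Python version."""
    with open(path, 'rb') as f:
        header = f.read(4)  # the magic number is the first 4 bytes of a .pyc
    return header == importlib.util.MAGIC_NUMBER
```

If the check returns `False`, running the `.py` source or the notebook with a matching Python version avoids the error.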

Plaintext password

Hey,
the "enter your coursera password" dialog takes plaintext passwords.
Can you mask the input?
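
One way to mask the password is Python's standard `getpass` module, which reads input without echoing it. A minimal sketch, not the script's actual prompt code: the function name `prompt_credentials` and the prompt strings are hypothetical, and the two injectable parameters exist only so the function can be exercised without a terminal.

```python
import getpass


def prompt_credentials(input_fn=input, password_fn=getpass.getpass):
    """Ask for email and password; the password is not echoed to the screen."""
    email = input_fn('Coursera email: ').strip()
    password = password_fn('Coursera password: ')
    return email, password
```

Calling `prompt_credentials()` with no arguments prompts on the terminal. Prompting on every run also avoids storing the password on disk at all.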

Cannot download course list over HTTP proxy.

When I run the script and enter my credentials, it fails to fetch my course list. I suspect it can't fetch data over an HTTP proxy, although I am not sure.

Unable to download course contents--Gets a ConnectionReset Error

When I try to download the courses, I get a connection reset error.

for e in chosen_courses:
    
    course_title = courses_t[int(e)-1]
    print(e + ' '+course_title)
    lecture_homepage = homepage + courses_u[int(e)-1]
    os.chdir(initial_dirname)
    create_download_dir(courses_t[ int(e)- 1])
        
    a,b,c = enumerate_lessons(lecture_homepage)
    lessons = zip(a,b,c)
    
    download_videos(lessons)
    #create_m3u_playlist(course_title)
    
    download_quiz(lessons)
    download_html(lessons)
    
    os.chdir(initial_dirname)
    print('download finished')

Executing the above cell gives me this stacktrace and a strange ConnectionReset error.
Is it implemented by Coursera?

StackTrace:

2 Machine Learning: Regression
/Users/gparasha/Downloads/www-coursera-downloader-master/Machine Learning Regression/Machine Learning Regression

---------------------------------------------------------------------------
ConnectionResetError                      Traceback (most recent call last)
<ipython-input-85-ce301f5ab8ea> in <module>()
      7     create_download_dir(courses_t[ int(e)- 1])
      8 
----> 9     a,b,c = enumerate_lessons(lecture_homepage)
     10     lessons = zip(a,b,c)
     11 

<ipython-input-73-90d09d2102ab> in enumerate_lessons(lecture_homepage)
      1 def enumerate_lessons(lecture_homepage):
      2 
----> 3     weeks = enumerate_weeks(lecture_homepage)
      4     w_digit = len(str(len(weeks)))
      5 

<ipython-input-72-94389983aa33> in enumerate_weeks(lecture_homepage)
      5     weeks = []
      6 
----> 7     if browser.find_by_text('Preview Week 1'):
      8 
      9         week = lecture_homepage.replace('welcome','week/')

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/splinter/driver/webdriver/__init__.py in find_by_text(self, text)
    406     def find_by_text(self, text):
    407         return self.find_by_xpath('//*[text()="%s"]' % text,
--> 408                                   original_find='text', original_query=text)
    409 
    410     def find_by_id(self, id):

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/splinter/driver/webdriver/__init__.py in find_by_xpath(self, xpath, original_find, original_query)
    393         return self.find_by(
    394             self.driver.find_elements_by_xpath, xpath, original_find=original_find,
--> 395             original_query=original_query)
    396 
    397     def find_by_name(self, name):

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/splinter/driver/webdriver/__init__.py in find_by(self, finder, selector, original_find, original_query)
    371         while time.time() < end_time:
    372             try:
--> 373                 elements = finder(selector)
    374                 if not isinstance(elements, list):
    375                     elements = [elements]

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py in find_elements_by_xpath(self, xpath)
    407             elements = driver.find_elements_by_xpath("//div[contains(@class, 'foo')]")
    408         """
--> 409         return self.find_elements(by=By.XPATH, value=xpath)
    410 
    411     def find_element_by_link_text(self, link_text):

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py in find_elements(self, by, value)
    993         return self.execute(Command.FIND_ELEMENTS, {
    994             'using': by,
--> 995             'value': value})['value'] or []
    996 
    997     @property

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py in execute(self, driver_command, params)
    316 
    317         params = self._wrap_value(params)
--> 318         response = self.command_executor.execute(driver_command, params)
    319         if response:
    320             self.error_handler.check_response(response)

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium/webdriver/remote/remote_connection.py in execute(self, command, params)
    470         data = utils.dump_json(params)
    471         url = '%s%s' % (self._url, path)
--> 472         return self._request(command_info[0], url, body=data)
    473 
    474     def _request(self, method, url, body=None):

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium/webdriver/remote/remote_connection.py in _request(self, method, url, body)
    494             try:
    495                 self._conn.request(method, parsed_url.path, body, headers)
--> 496                 resp = self._conn.getresponse()
    497             except (httplib.HTTPException, socket.error):
    498                 self._conn.close()

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py in getresponse(self)
   1329         try:
   1330             try:
-> 1331                 response.begin()
   1332             except ConnectionError:
   1333                 self.close()

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py in begin(self)
    295         # read until we get a non-100 response
    296         while True:
--> 297             version, status, reason = self._read_status()
    298             if status != CONTINUE:
    299                 break

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py in _read_status(self)
    256 
    257     def _read_status(self):
--> 258         line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
    259         if len(line) > _MAXLINE:
    260             raise LineTooLong("status line")

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/socket.py in readinto(self, b)
    584         while True:
    585             try:
--> 586                 return self._sock.recv_into(b)
    587             except timeout:
    588                 self._timeout_occurred = True

ConnectionResetError: [Errno 54] Connection reset by peer
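
Note that the last frames of this traceback are in Selenium's `remote_connection.py`, which is the local HTTP connection between the Python process and the browser driver, so the reset does not necessarily come from Coursera's servers. If the reset is transient, wrapping the call in a small retry helper may be enough; if the driver or browser process has exited, the browser object would have to be recreated instead. A hedged sketch of such a helper, where `with_retries` and its parameters are hypothetical names and not part of the notebook:

```python
import time


def with_retries(action, attempts=3, delay=2.0, sleep=time.sleep):
    """Call action(); on ConnectionError wait and retry, up to `attempts` times.

    ConnectionResetError is a subclass of ConnectionError, so it is covered.
    The wait grows linearly: delay, 2 * delay, 3 * delay, ...
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except ConnectionError:
            if attempt == attempts:
                raise  # give up and surface the original error
            sleep(delay * attempt)
```

In the cell above this could be used as `a, b, c = with_retries(lambda: enumerate_lessons(lecture_homepage))`.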

consider adding a requirements.txt

It will make it very convenient for users to install the dependencies.

how the hell am I supposed to use this shit

0 lesson available

Hi, first of all thanks for your work. I'm writing to ask whether you know why I can't see the lessons: it says "0 lesson available", and I don't know how to fix this. I attach a screenshot. Thank you again for your time.

Encrypting the user credentials the way you do is completely useless!

Hi,

First of all, thank you for sharing your program. I'm creating this issue to let you (and all the other people who still stumble on your repo and try to run the code even though it has not been maintained) know that the way the user credentials are "securely encrypted" is completely wrong!

Encrypting anything is useless as long as you let anyone know which key you used, which is exactly what you did by hard-coding the keys in the Python source file. No matter if you use 2 keys or 10000. The moment you store an encryption key alongside the encrypted file, there is no secret anymore.

So, I suggest that either:
a) You don't encrypt the credentials at all, thus encouraging the users to delete the configuration file once they are done using your program a couple of times. That is, you forget about encryption altogether instead of letting the users think you do it right. This is not safer than what you do now, but at least the users know what to expect.
b) You never store anything and instead prompt the users for their credentials each time. Entering a password at runtime is the only safe way that I know of.

I hope this will help you not make that mistake ever again.

Getting an Error Message

I'm getting this error msg anytime I run the file.

Alternative script as this one is abandoned

https://github.com/coursera-dl/coursera-dl

hanging on loading courses page...

great work but there is a problem !

Welcome to Coursera!
loading courses page...947 seconds

RuntimeError: Bad magic number in .pyc file

When I try to run I get this error

Apparently I cannot login

Using WWW-COURSERA-DOWNLOADER.ipynb, I apparently cannot login.
In the cell after button_click('Log in') I get

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
~/.local/lib/python3.8/site-packages/splinter/element_list.py in __getitem__(self, index)
     41         try:
---> 42             return self._container[index]
     43         except IndexError:

IndexError: list index out of range

so I inserted

browser.visit('https://www.coursera.org/courses')
screenshot()

and the image shown

(cropped the bottom, irrelevant) is as if I were not logged in. I checked the username and password variables and they are ok.

What are possible causes and solutions?

Coursera login issue

I followed all the steps and everything went well; however, verification fails at login because it does not pass the reCAPTCHA.

Is it working in new version?

The command line gave me only "bad magic number" when I tried to run the .pyc file

ImportError: No module named splinter

That's the error message I got, even though I have installed splinter.

I know it is a directory issue; can anyone guide me?

Specialization Course Videos are not listed

Unable to launch headless browser

While executing this cell
browser = Browser('firefox', headless = True)

I am getting this error.
SessionNotCreatedException: Message: Unable to find a matching set of capabilities

I have firefox quantum 61.0.1 (64-bit) installed, and selenium version is 3.11.0

gparasha-macOS:Downloads gparasha$ selenium-server --version
Selenium server version: 3.11.0, revision: e59cfb3

And geckodriver version is geckodriver 0.21.0

Can you tell me why I am getting this error and how to fix it?

lessons_i is empty

It did not download any files. It gets as far as the course page but does not download anything after that.

Can't download videos of good quality

It looks like this script can only download videos in low quality. It would be very helpful if you could find a solution for this. On the original website, setting the video to high quality first and then right-clicking the video and choosing "save as" gives a high-quality download. See if you could do the same.

Doesn't work in Python 3

Thanks for wasting my time. There is not a single mention anywhere about it supporting only Python2. The whole world has moved on, and so should you.

Selenium Issue

I am getting following error while downloading courses:
Getting courses list...
Traceback (most recent call last):
  File "www-coursera-downloader.py", line 751, in <module>
    main()
  File "www-coursera-downloader.py", line 740, in main
    fullCourseName = getCourses(email, password)
  File "www-coursera-downloader.py", line 202, in getCourses
    browser.fill('email', email)
  File "/home/metal-machine/www-coursera-downloader/mypy/local/lib/python2.7/site-packages/splinter/driver/webdriver/__init__.py", line 415, in fill
    field = self.find_by_name(name).first
  File "/home/metal-machine/www-coursera-downloader/mypy/local/lib/python2.7/site-packages/splinter/element_list.py", line 53, in first
    return self[0]
  File "/home/metal-machine/www-coursera-downloader/mypy/local/lib/python2.7/site-packages/splinter/element_list.py", line 44, in __getitem__
    self.find_by, self.query))
splinter.exceptions.ElementDoesNotExist: no elements could be found with name "email"

Error Every Single time I run the File

User and password have been saved to coursera.pass file.
Please delete the file if you want to change your credentials.

Traceback (most recent call last):
  File "www-coursera-downloader.py", line 748, in <module>
    main()
  File "www-coursera-downloader.py", line 734, in main
    email, password = getUserPass("coursera.pass")
  File "www-coursera-downloader.py", line 154, in getUserPass
    decryptedText = decrypt(strText, key1, key2)
  File "www-coursera-downloader.py", line 116, in decrypt
    decryptedText = obj2.decrypt(strInput)
  File "C:\Users\Jash\Miniconda2\lib\site-packages\Crypto\Cipher\blockalgo.py", line 295, in decrypt
    return self._cipher.decrypt(ciphertext)
ValueError: Input strings must be a multiple of 16 in length

Input strings must be a multiple of 16 in length

It produces the following traceback:
Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('C:/Users/Yatish H R/Desktop/www-coursera-downloader-master/www-coursera-downloader.py', wdir='C:/Users/Yatish H R/Desktop/www-coursera-downloader-master')

  File "C:\anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)

  File "C:\anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)

  File "C:/Users/Yatish H R/Desktop/www-coursera-downloader-master/www-coursera-downloader.py", line 751, in <module>
    main()

  File "C:/Users/Yatish H R/Desktop/www-coursera-downloader-master/www-coursera-downloader.py", line 737, in main
    email, password = getUserPass("coursera.pass")

  File "C:/Users/Yatish H R/Desktop/www-coursera-downloader-master/www-coursera-downloader.py", line 157, in getUserPass
    decryptedText = decrypt(strText, key1, key2)

  File "C:/Users/Yatish H R/Desktop/www-coursera-downloader-master/www-coursera-downloader.py", line 119, in decrypt
    decryptedText = obj2.decrypt(strInput)

  File "C:\anaconda\lib\site-packages\Crypto\Cipher\blockalgo.py", line 295, in decrypt
    return self._cipher.decrypt(ciphertext)

ValueError: Input strings must be a multiple of 16 in length
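
This error is raised because a block cipher with a 16-byte block size only accepts input whose length is a multiple of 16, so the data read back from coursera.pass no longer has the length it had when it was encrypted. What changes the length in this script is not shown in the traceback; deleting coursera.pass so that it is recreated is the simplest thing to try. To illustrate the length requirement itself, here is a sketch of PKCS#7 padding, a common scheme for making plaintext fit whole blocks. The function names are hypothetical, and whether the script pads its input this way is an assumption.

```python
BLOCK_SIZE = 16  # bytes, the block size named in the error message


def pkcs7_pad(data, block_size=BLOCK_SIZE):
    """Append n bytes of value n so len(result) is a multiple of block_size."""
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n


def pkcs7_unpad(data, block_size=BLOCK_SIZE):
    """Remove PKCS#7 padding, rejecting input that is not whole blocks."""
    if not data or len(data) % block_size:
        raise ValueError('padded data must be a non-empty multiple of the block size')
    n = data[-1]
    if not 1 <= n <= block_size or data[-n:] != bytes([n]) * n:
        raise ValueError('invalid padding')
    return data[:-n]
```

Padding always adds at least one byte, so input that is already 16 bytes long grows to 32 bytes.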