
www-coursera-downloader's Introduction

FEATURES

  • download quizzes and practice quizzes;
  • download all video lectures;
  • download vtt subtitles;
  • automatic conversion of vtt to srt format;
  • read HTML readings and save them to HTML files;
  • creation of an m3u playlist.
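Converting vtt to srt mostly means rewriting the cue timestamps and numbering the cues. WebVTT puts `.` before the milliseconds and may omit the hours. SRT puts `,` before the milliseconds, always includes the hours, and numbers every cue. The sketch below is not the notebook's own code. `vtt_to_srt` is a hypothetical helper that handles plain cues only and drops WebVTT styling and cue settings.

```python
import re

# A WebVTT cue timing line, e.g. "00:00:01.000 --> 00:00:04.500" or
# "00:05.000 --> 00:06.000 align:start" (hours optional, settings ignored).
TIMING = re.compile(
    r"^((?:\d{2}:)?\d{2}:\d{2})\.(\d{3}) --> ((?:\d{2}:)?\d{2}:\d{2})\.(\d{3})"
)


def _with_hours(ts):
    """SRT always needs hours; WebVTT may omit them ("mm:ss")."""
    return ts if ts.count(":") == 2 else "00:" + ts


def vtt_to_srt(vtt_text):
    """Convert WebVTT text to SRT: renumber cues and use ',' before milliseconds.

    Blocks without a timing line (the WEBVTT header, NOTE blocks) are dropped.
    """
    cues = []
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            m = TIMING.match(line.strip())
            if m:
                start = f"{_with_hours(m.group(1))},{m.group(2)}"
                end = f"{_with_hours(m.group(3))},{m.group(4)}"
                # Everything after the timing line is the cue text.
                cues.append(f"{start} --> {end}\n" + "\n".join(lines[i + 1:]))
                break
    return "\n\n".join(f"{n}\n{cue}" for n, cue in enumerate(cues, 1)) + "\n"
```

Any cue identifier that appears before the timing line is discarded, because SRT numbers the cues itself.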

REQUIREMENTS:

python3
jupyter notebook

Coursera Downloader

    Is there a way to mass download the materials from a Coursera course?
    
    How can I download all the video lectures of a Coursera course in one go?
    
    Are there any ways to batch download the complete course videos on Coursera's new platform?
    
    How do I write a Python script that automatically downloads all the videos of the course from Coursera?
    
    Ashish Kedia: How can I write a script in Python to mass download all course videos from Coursera's new platform and name them by lecture title?
    

Download all videos in all weeks of all lessons in one specified course.

  • downloading from the old 'http://class.coursera.org' is easy since:

    • its pages are plain HTML and can be parsed with an HTML parser;
    • all links to the course materials are provided on a single page;
    • you can use popular tools such as 'DownThemAll' to download all the materials you want;
    • many solutions for this purpose are already available on github.com;
  • downloading from the new 'https://www.coursera.org', however, is harder since:

    • its pages are rendered with JavaScript and must be parsed with a browser engine: the HTML elements you want to parse may not exist until the page has been rendered in a browser;
    • links to the course materials are spread across many pages;
    • you will get tired of downloading by hand after 144 URLs;
  • this compiled Python tool downloads all videos, subtitles, and transcripts from:

    https://www.coursera.org/
  • usage:
    jupyter notebook
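The links are spread across many pages, so the downloader has to visit each week page in turn. One of the issue tracebacks below shows the notebook deriving the week pages with `lecture_homepage.replace('welcome','week/')`. The sketch below builds the full list of week URLs the same way. `week_urls` is a hypothetical helper. It assumes the course home URL ends in `welcome` and that weeks are numbered from 1.

```python
def week_urls(lecture_homepage, n_weeks):
    """Return one URL per week page (hypothetical helper).

    Assumes the course home URL ends with 'welcome' and that week pages
    are reached by replacing it with 'week/<n>', n = 1..n_weeks.
    """
    if not lecture_homepage.endswith("welcome"):
        raise ValueError("expected a course home URL ending in 'welcome'")
    base = lecture_homepage[: -len("welcome")] + "week/"
    return [f"{base}{n}" for n in range(1, n_weeks + 1)]
```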

www-coursera-downloader's People

Contributors

anaraquelcosta, jansenicus, the13620


www-coursera-downloader's Issues

splinter.exceptions email

After running "python www-coursera-downloader.pyc" I get this error:

Getting courses list...
Traceback (most recent call last):
  File "www-coursera-downloader.py", line 744, in <module>
    arrLessonURL, arrLessonTitle = readCSV(strNamaFile)
  File "www-coursera-downloader.py", line 733, in main
    except:
  File "www-coursera-downloader.py", line 192, in getCourses
    print "Using Chrome Web Driver...\n"
  File "/usr/local/lib/python2.7/dist-packages/splinter/driver/webdriver/__init__.py", line 413, in fill
    field = self.find_by_name(name).first
  File "/usr/local/lib/python2.7/dist-packages/splinter/element_list.py", line 53, in first
    return self[0]
  File "/usr/local/lib/python2.7/dist-packages/splinter/element_list.py", line 44, in __getitem__
    self.find_by, self.query))
splinter.exceptions.ElementDoesNotExist: no elements could be found with name "email"

Any ideas how to solve the problem? The dependencies should be fine...
Maybe someone has had the same issue and can give me a hint.
Thanks in advance.
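`ElementDoesNotExist ... name "email"` means that when `fill('email', ...)` ran, the page had no element with `name="email"`. The likely causes are that the JavaScript-rendered login form had not appeared yet, or that Coursera had renamed the field. The sketch below is a defensive variant built on splinter's `is_element_present_by_name(name, wait_time=...)` and `fill(name, value)`. It waits for the field and tries several candidate names in order. `fill_first_present` is a hypothetical helper. Any candidate names other than `email` would have to be read off the current login page.

```python
def fill_first_present(browser, candidate_names, value, wait_time=10):
    """Fill the first form field (by name attribute) that appears on the page.

    browser is a splinter Browser; each candidate is waited for up to
    wait_time seconds so the JavaScript-rendered form has time to load.
    Returns the name that was filled.
    """
    for name in candidate_names:
        if browser.is_element_present_by_name(name, wait_time=wait_time):
            browser.fill(name, value)
            return name
    raise LookupError(
        f"none of the fields {list(candidate_names)} appeared on {browser.url}"
    )
```

For example, `fill_first_present(browser, ["email"], email)` fails with a message that names the page URL, which is easier to diagnose than a bare `ElementDoesNotExist`.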

Specialization Course Videos Are Not Listed

Hi

Could you add Specialization courses to the list of courses?

The Coursera page that PhantomJS scrapes lists "My Specializations" before "My Courses", but I think they are essentially the same thing. However, the downloader does not list any of the courses under "My Specializations" :(

Great idea here though - I hope it goes well.

Exception AttributeError: "'Service' object has no attribute 'process'" in <bound method Service.__del__ of <selenium.webdriver.firefox.service.Service object at 0x105985510>> ignored

After I installed splinter and pycrypto, the script finally ran, but only up to the point where it created the coursera.pass file. After that, instead of showing the list of courses, it printed the errors below.

User and password has been saved to coursera.pass file.
Please delete the file if you want to change your credentials.

Getting courses list...

You have not properly installed or configured PhantomJS!
You will see an automated browser popping up and crawling,
which you will not see if you have properly installed or configured PhantomJS.
Do not close that automated browser...

Press any key to continue...

Traceback (most recent call last):
  File "www-coursera-downloader.py", line 744, in <module>
    arrLessonURL, arrLessonTitle = readCSV(strNamaFile)
  File "www-coursera-downloader.py", line 733, in main
    except:
  File "www-coursera-downloader.py", line 185, in getCourses
    try:
  File "/Users/mohanasingh/Library/Python/2.7/lib/python/site-packages/splinter/browser.py", line 63, in Browser
    return driver(*args, **kwargs)
  File "/Users/mohanasingh/Library/Python/2.7/lib/python/site-packages/splinter/driver/webdriver/firefox.py", line 48, in __init__
    timeout=timeout)
  File "/Users/mohanasingh/Library/Python/2.7/lib/python/site-packages/selenium/webdriver/firefox/webdriver.py", line 140, in __init__
    self.service.start()
  File "/Users/mohanasingh/Library/Python/2.7/lib/python/site-packages/selenium/webdriver/common/service.py", line 81, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'geckodriver' executable needs to be in PATH.

Exception AttributeError: "'Service' object has no attribute 'process'" in <bound method Service.__del__ of <selenium.webdriver.phantomjs.service.Service object at 0x105908250>> ignored
Exception AttributeError: "'Service' object has no attribute 'process'" in <bound method Service.__del__ of <selenium.webdriver.chrome.service.Service object at 0x105985210>> ignored
Exception AttributeError: "'Service' object has no attribute 'process'" in <bound method Service.__del__ of <selenium.webdriver.firefox.service.Service object at 0x105985510>> ignored
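The message `'geckodriver' executable needs to be in PATH` means Selenium could not find the Firefox driver binary. The three `Service.__del__` warnings show that the PhantomJS, Chrome, and Firefox services all failed to start, because the `process` attribute is only set once a service has launched. A quick way to see which driver executables are visible is `shutil.which`. `find_drivers` is a hypothetical helper, and the executable names are the usual ones for these three drivers.

```python
import shutil

# Usual executable names for the PhantomJS, Chrome, and Firefox drivers.
DRIVERS = ("phantomjs", "chromedriver", "geckodriver")


def find_drivers(names=DRIVERS):
    """Map each driver executable name to its full path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_drivers().items():
        print(f"{name:12} {path or 'NOT FOUND on PATH'}")
```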

Not able to download videos

In the "Download Only Videos" section, cell In [53] prints:

Lecture 1 2.1 Critical Sections
/learn/concurrent-programming-in-java/lecture/YrqDJ/2-1-critical-sections

Reading2.1 Lecture 2  Summary
/learn/concurrent-programming-in-java/supplement/gaQ9x/2-1-lecture-summary

Lecture 3 2.2 Object Based Isolation (Monitors)
/learn/concurrent-programming-in-java/lecture/djUwe/2-2-object-based-isolation-monitors

Reading2.2 Lecture 4  Summary
/learn/concurrent-programming-in-java/supplement/PEpS3/2-2-lecture-summary

Lecture 5 2.3 Concurrent Spanning Tree Algorithm
/learn/concurrent-programming-in-java/lecture/ZUsiv/2-3-concurrent-spanning-tree-algorithm

Reading2.3 Lecture 6  Summary
/learn/concurrent-programming-in-java/supplement/4VxYN/2-3-lecture-summary

Lecture 7 2.4 Atomic Variables
/learn/concurrent-programming-in-java/lecture/zDzxX/2-4-atomic-variables

Reading2.4 Lecture 8  Summary
/learn/concurrent-programming-in-java/supplement/k5eW4/2-4-lecture-summary

Lecture 9 2.5 Read, Write Isolation
/learn/concurrent-programming-in-java/lecture/GOfdF/2-5-read-write-isolation

Reading2.5 Lecture 10  Summary
/learn/concurrent-programming-in-java/supplement/3fmKA/2-5-lecture-summary

Lecture 2 Demonstration: Global and Object-Based Isolation
/learn/concurrent-programming-in-java/lecture/vMHcW/demonstration-global-and-object-based-isolation

currently downloading: Week-1-Lecture_1_2.1_Critical_Sections.mp4

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/Users/mr-woot/anaconda/lib/python3.5/site-packages/splinter/element_list.py in __getitem__(self, index)
     39         try:
---> 40             return super(ElementList, self).__getitem__(index)
     41         except IndexError:

IndexError: list index out of range

During handling of the above exception, another exception occurred:

ElementDoesNotExist                       Traceback (most recent call last)
/Users/mr-woot/anaconda/lib/python3.5/site-packages/splinter/element_list.py in __getattr__(self, name)
     71         try:
---> 72             return getattr(self.first, name)
     73         except (ElementDoesNotExist, AttributeError):

/Users/mr-woot/anaconda/lib/python3.5/site-packages/splinter/element_list.py in first(self)
     52         """
---> 53         return self[0]
     54 

/Users/mr-woot/anaconda/lib/python3.5/site-packages/splinter/element_list.py in __getitem__(self, index)
     43                 u'no elements could be found with {0} "{1}"'.format(
---> 44                     self.find_by, self.query))
     45 

ElementDoesNotExist: no elements could be found with tag_name "video"

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-53-c51a7e7b1a4a> in <module>()
----> 1 get_lectures()

<ipython-input-39-8c7fa7d3c9bf> in get_lectures()
     43         lessons = list(zip(lessons_urls, lessons_titles))
     44 
---> 45         download_week_lessons(lessons, i, browser)
     46 
     47         print()

<ipython-input-40-739964be6c5d> in download_week_lessons(lessons, i, browser)
     10         time.sleep(loading_time)
     11         screenshot()
---> 12         mp4 = browser.find_by_tag('video').find_by_tag('source')['src']
     13         mp4 = mp4.replace('360p/',resolution[chosen_res]+'p/')
     14         print('currently downloading: '+ filename)

/Users/mr-woot/anaconda/lib/python3.5/site-packages/splinter/element_list.py in __getattr__(self, name)
     73         except (ElementDoesNotExist, AttributeError):
     74             raise AttributeError(u"'{0}' object has no attribute '{1}'".format(
---> 75                 self.__class__.__name__, name))

AttributeError: 'ElementList' object has no attribute 'find_by_tag'
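In this traceback the `video` element was still missing after the fixed `time.sleep(loading_time)`. As a result, `find_by_tag('video')` returned an empty list, and the chained `.find_by_tag('source')` failed with the confusing `AttributeError`. A common alternative to a fixed sleep is to poll until the element appears, up to a timeout. The generic `wait_for` below is a hypothetical helper. For element checks, splinter's own `is_element_present_by_tag(tag, wait_time=...)` already waits in a similar way.

```python
import time


def wait_for(predicate, timeout=15.0, interval=0.5):
    """Call predicate() until it returns a truthy value or timeout seconds pass.

    Returns the truthy value, or None if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

With splinter, a call could look like `wait_for(lambda: browser.is_element_present_by_tag("video"), timeout=30)`. If the call returns None, the page can be skipped or reported instead of crashing the whole run.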

Coursera uses localization and changes the language

Hi!

Thank you for creating this.

I am located in Spain, and Coursera seems to localize the site based on the user's IP address. For instance, the login page shows "Iniciar Sesión" instead of "Log In", and "Mis Cursos" appears instead of "My Courses".

I have changed those strings in the script so that it works for me. I wonder whether PhantomJS can be configured to force Coursera to use English.
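Instead of editing the hard-coded English strings for each locale, the scraper could look its labels up in a small table of translations. The sketch below is a hypothetical helper and is not part of the project. Its table contains only the English and Spanish strings quoted in this issue. A real table would need an entry for every UI label the scraper relies on.

```python
# Hypothetical table of UI labels the scraper looks for, keyed by purpose.
# Only the English and Spanish strings quoted in this issue are included.
LABELS = {
    "log_in": ("Log In", "Iniciar Sesión"),
    "my_courses": ("My Courses", "Mis Cursos"),
}


def find_label(page_text, key, labels=LABELS):
    """Return whichever localized variant of labels[key] occurs in page_text, else None."""
    for variant in labels[key]:
        if variant in page_text:
            return variant
    return None
```

With splinter, the result could feed `browser.find_by_text(find_label(browser.html, "my_courses"))`.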

RuntimeError: Bad magic number in .pyc file

I am new to Python.
I have installed Anaconda and the phantomjs package. Then I navigated to the repository using the Anaconda Prompt.
After running python www-coursera-downloader.pyc I got the following error:
RuntimeError: Bad magic number in .pyc file

What should I do?
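A .pyc file starts with a 4-byte magic number that identifies the Python version that compiled it. Running the file under a different version fails with exactly this error. The tracebacks in other issues here show Python 2.7 paths, so the shipped .pyc was probably built for Python 2.7. The sketch below compares a file's magic number with `importlib.util.MAGIC_NUMBER` for the running interpreter. `pyc_matches_interpreter` is a hypothetical helper and needs Python 3.4 or later.

```python
import importlib.util


def pyc_matches_interpreter(path):
    """Return True if the .pyc at path was compiled by this Python version.

    The first four bytes of a .pyc are a version-specific magic number;
    a mismatch is what "Bad magic number in .pyc file" reports.
    """
    with open(path, "rb") as f:
        return f.read(4) == importlib.util.MAGIC_NUMBER
```

For example, `pyc_matches_interpreter("www-coursera-downloader.pyc")` returning False means the file needs the Python version it was compiled with. Running the .py source or the notebook avoids the problem entirely.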

Plaintext password

Hey,
the "enter your Coursera password" prompt echoes the password in plain text as it is typed.
Can you mask the input?
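Python's standard `getpass` module reads a password without echoing it to the terminal. Jupyter's IPython kernel also supports `getpass` and shows a masked input box. The sketch below assumes the script reads the e-mail address with `input()`. `prompt_credentials` is a hypothetical name. The two prompt functions are parameters only so that the helper can be tested without a terminal.

```python
import getpass


def prompt_credentials(input_fn=input, password_fn=getpass.getpass):
    """Ask for the Coursera e-mail and password; the password is not echoed."""
    email = input_fn("Enter your Coursera e-mail: ").strip()
    password = password_fn("Enter your Coursera password: ")
    return email, password
```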

Unable to download course contents: gets a ConnectionResetError

When I try to download the courses, I get a connection reset error.


for e in chosen_courses:
    
    course_title = courses_t[int(e)-1]
    print(e + ' '+course_title)
    lecture_homepage = homepage + courses_u[int(e)-1]
    os.chdir(initial_dirname)
    create_download_dir(courses_t[ int(e)- 1])
        
    a,b,c = enumerate_lessons(lecture_homepage)
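    lessons = list(zip(a,b,c))  # zip() is a one-shot iterator in Python 3; a list can be reused by all three downloads below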
    
    download_videos(lessons)
    #create_m3u_playlist(course_title)
    
    download_quiz(lessons)
    download_html(lessons)
    
    os.chdir(initial_dirname)
    print('download finished')

Executing the above cell gives me the stack trace below, which ends in a strange ConnectionResetError.
Is this caused by Coursera?

Stack trace:

2 Machine Learning: Regression
/Users/gparasha/Downloads/www-coursera-downloader-master/Machine Learning Regression/Machine Learning Regression

---------------------------------------------------------------------------
ConnectionResetError                      Traceback (most recent call last)
<ipython-input-85-ce301f5ab8ea> in <module>()
      7     create_download_dir(courses_t[ int(e)- 1])
      8 
----> 9     a,b,c = enumerate_lessons(lecture_homepage)
     10     lessons = zip(a,b,c)
     11 

<ipython-input-73-90d09d2102ab> in enumerate_lessons(lecture_homepage)
      1 def enumerate_lessons(lecture_homepage):
      2 
----> 3     weeks = enumerate_weeks(lecture_homepage)
      4     w_digit = len(str(len(weeks)))
      5 

<ipython-input-72-94389983aa33> in enumerate_weeks(lecture_homepage)
      5     weeks = []
      6 
----> 7     if browser.find_by_text('Preview Week 1'):
      8 
      9         week = lecture_homepage.replace('welcome','week/')

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/splinter/driver/webdriver/__init__.py in find_by_text(self, text)
    406     def find_by_text(self, text):
    407         return self.find_by_xpath('//*[text()="%s"]' % text,
--> 408                                   original_find='text', original_query=text)
    409 
    410     def find_by_id(self, id):

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/splinter/driver/webdriver/__init__.py in find_by_xpath(self, xpath, original_find, original_query)
    393         return self.find_by(
    394             self.driver.find_elements_by_xpath, xpath, original_find=original_find,
--> 395             original_query=original_query)
    396 
    397     def find_by_name(self, name):

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/splinter/driver/webdriver/__init__.py in find_by(self, finder, selector, original_find, original_query)
    371         while time.time() < end_time:
    372             try:
--> 373                 elements = finder(selector)
    374                 if not isinstance(elements, list):
    375                     elements = [elements]

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py in find_elements_by_xpath(self, xpath)
    407             elements = driver.find_elements_by_xpath("//div[contains(@class, 'foo')]")
    408         """
--> 409         return self.find_elements(by=By.XPATH, value=xpath)
    410 
    411     def find_element_by_link_text(self, link_text):

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py in find_elements(self, by, value)
    993         return self.execute(Command.FIND_ELEMENTS, {
    994             'using': by,
--> 995             'value': value})['value'] or []
    996 
    997     @property

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py in execute(self, driver_command, params)
    316 
    317         params = self._wrap_value(params)
--> 318         response = self.command_executor.execute(driver_command, params)
    319         if response:
    320             self.error_handler.check_response(response)

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium/webdriver/remote/remote_connection.py in execute(self, command, params)
    470         data = utils.dump_json(params)
    471         url = '%s%s' % (self._url, path)
--> 472         return self._request(command_info[0], url, body=data)
    473 
    474     def _request(self, method, url, body=None):

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium/webdriver/remote/remote_connection.py in _request(self, method, url, body)
    494             try:
    495                 self._conn.request(method, parsed_url.path, body, headers)
--> 496                 resp = self._conn.getresponse()
    497             except (httplib.HTTPException, socket.error):
    498                 self._conn.close()

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py in getresponse(self)
   1329         try:
   1330             try:
-> 1331                 response.begin()
   1332             except ConnectionError:
   1333                 self.close()

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py in begin(self)
    295         # read until we get a non-100 response
    296         while True:
--> 297             version, status, reason = self._read_status()
    298             if status != CONTINUE:
    299                 break

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py in _read_status(self)
    256 
    257     def _read_status(self):
--> 258         line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
    259         if len(line) > _MAXLINE:
    260             raise LineTooLong("status line")

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/socket.py in readinto(self, b)
    584         while True:
    585             try:
--> 586                 return self._sock.recv_into(b)
    587             except timeout:
    588                 self._timeout_occurred = True

ConnectionResetError: [Errno 54] Connection reset by peer
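The traceback passes through Selenium's `remote_connection`, which is the HTTP connection between Python and the WebDriver process. The reset may therefore have come from that local driver process, for example after a crash, rather than from Coursera. In that case only a fresh browser helps. If the resets turn out to be transient, a retry wrapper is a common mitigation. `call_with_retries` below is a hypothetical helper with linear back-off.

```python
import time


def call_with_retries(fn, attempts=3, delay=2.0, retry_on=(ConnectionResetError,)):
    """Call fn(); on one of the retry_on exceptions, wait and try again.

    Re-raises the last exception after `attempts` failed calls.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)  # linear back-off: delay, 2*delay, ...
```

In the cell above, the call would become `a, b, c = call_with_retries(lambda: enumerate_lessons(lecture_homepage))`.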

0 lesson available

Hi, first of all thanks for your work. I'd like to ask whether you know why I can't see the lessons: it says "0 lesson available", and I don't know how to fix this. I have attached a screenshot. Thank you again for your time.

Encrypting the user credentials the way you do is completely useless!

Hi,

First of all, thank you for sharing your program. I'm creating this issue to let you (and all the other people who still stumble on your repo and try to run the code even though it has not been maintained) know that the way the user credentials are "securely encrypted" is completely wrong!

Encrypting anything is useless as long as you let anyone know which key you used, which is exactly what you did by hard-coding the keys in the Python source file. No matter if you use 2 keys or 10000. The moment you store an encryption key alongside the encrypted file, there is no secret anymore.

So, I suggest that either:
a) You don't encrypt the credentials at all, which encourages users to delete the configuration file once they have used your program a couple of times. In other words, you drop encryption altogether instead of letting users think you are doing it right. This is no safer than what you do now, but at least users know what to expect.
b) You never store anything and instead prompt the users for their credentials each time. Entering a password at runtime is the only safe way that I know of.

I hope this will help you not make that mistake ever again.
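The argument above fits in a few lines of code. If the key ships inside the program, anyone who has the program and the saved file can decrypt the file. In the toy sketch below, XOR with a repeating key stands in for the script's AES. XOR is not secure and is used only for illustration, and the point does not depend on which cipher is used. All names are hypothetical.

```python
import base64
from itertools import cycle

# Hypothetical key hard-coded in the program, like the keys the issue describes.
HARDCODED_KEY = b"key-shipped-with-the-program"


def _xor(data: bytes, key: bytes) -> bytes:
    # Toy cipher: XOR with a repeating key. NOT secure; illustration only.
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def save_credentials(password: bytes) -> bytes:
    """What the program would write to its credentials file."""
    return base64.b64encode(_xor(password, HARDCODED_KEY))


def attacker_recovers(saved: bytes, program_key: bytes) -> bytes:
    """Anyone who reads the key out of the program can undo the 'encryption'."""
    return _xor(base64.b64decode(saved), program_key)


stolen = save_credentials(b"my-coursera-password")
print(attacker_recovers(stolen, HARDCODED_KEY))  # b'my-coursera-password'
```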

Apparently I cannot login

Using WWW-COURSERA-DOWNLOADER.ipynb, I apparently cannot login.
In the cell after button_click('Log in') I get:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
~/.local/lib/python3.8/site-packages/splinter/element_list.py in __getitem__(self, index)
     41         try:
---> 42             return self._container[index]
     43         except IndexError:

IndexError: list index out of range

so I inserted

browser.visit('https://www.coursera.org/courses')
screenshot()

and the screenshot shown (bottom cropped, irrelevant) looks as if I were not logged in. I checked the username and password variables and they are correct.

What are possible causes and solutions?

Coursera login issue

I followed all the steps and everything went fine. However, once it logged in, the verification step failed because it did not pass the reCAPTCHA.

Unable to launch headless browser

While executing this cell
browser = Browser('firefox', headless = True)

I am getting this error.
SessionNotCreatedException: Message: Unable to find a matching set of capabilities

I have Firefox Quantum 61.0.1 (64-bit) installed, and the Selenium version is 3.11.0:

gparasha-macOS:Downloads gparasha$ selenium-server --version
Selenium server version: 3.11.0, revision: e59cfb3

And geckodriver version is geckodriver 0.21.0

Can you tell me why I am getting this error and how to fix it?

lessons_i is empty

It did not download any files. It gets as far as the course page but downloads nothing after that.

Can't download good-quality videos

It looks like this script can only download videos in low quality. It would be very helpful if you could find a solution. On the original website, switching the video to high quality first, then right-clicking it and choosing "Save as", gives a high-quality download. Could you look into this?
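The notebook already rewrites the video URL for the chosen resolution. One traceback above shows `mp4.replace('360p/', resolution[chosen_res]+'p/')`. However, `str.replace` silently returns the URL unchanged when it does not contain exactly `360p/`, which leaves the download quality unchanged. The sketch below fails loudly instead. It assumes the resolution appears as a path segment such as `360p/`. `with_resolution` is a hypothetical helper, and the URLs in the test are made up.

```python
import re

# A resolution path segment such as "360p/" or "1080p/" (preceded by "/").
RES_SEGMENT = re.compile(r"(?<=/)\d{3,4}p/")


def with_resolution(video_url, height):
    """Return video_url with its first resolution segment set to '<height>p/'."""
    new_url, n = RES_SEGMENT.subn(f"{height}p/", video_url, count=1)
    if n == 0:
        raise ValueError(f"no resolution segment like '360p/' in {video_url!r}")
    return new_url
```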

Doesn't work in Python 3

Thanks for wasting my time. There is not a single mention anywhere about it supporting only Python2. The whole world has moved on, and so should you.

Selenium Issue

I am getting the following error while downloading courses:

Getting courses list...
Traceback (most recent call last):
  File "www-coursera-downloader.py", line 751, in <module>
    main()
  File "www-coursera-downloader.py", line 740, in main
    fullCourseName = getCourses(email, password)
  File "www-coursera-downloader.py", line 202, in getCourses
    browser.fill('email', email)
  File "/home/metal-machine/www-coursera-downloader/mypy/local/lib/python2.7/site-packages/splinter/driver/webdriver/__init__.py", line 415, in fill
    field = self.find_by_name(name).first
  File "/home/metal-machine/www-coursera-downloader/mypy/local/lib/python2.7/site-packages/splinter/element_list.py", line 53, in first
    return self[0]
  File "/home/metal-machine/www-coursera-downloader/mypy/local/lib/python2.7/site-packages/splinter/element_list.py", line 44, in __getitem__
    self.find_by, self.query))
splinter.exceptions.ElementDoesNotExist: no elements could be found with name "email"

Error Every Single time I run the File

User and password have been saved to coursera.pass file.
Please delete the file if you want to change your credentials.

Traceback (most recent call last):
  File "www-coursera-downloader.py", line 748, in <module>
    main()
  File "www-coursera-downloader.py", line 734, in main
    email, password = getUserPass("coursera.pass")
  File "www-coursera-downloader.py", line 154, in getUserPass
    decryptedText = decrypt(strText, key1, key2)
  File "www-coursera-downloader.py", line 116, in decrypt
    decryptedText = obj2.decrypt(strInput)
  File "C:\Users\Jash\Miniconda2\lib\site-packages\Crypto\Cipher\blockalgo.py", line 295, in decrypt
    return self._cipher.decrypt(ciphertext)
ValueError: Input strings must be a multiple of 16 in length

Input strings must be a multiple of 16 in length

It produces the following traceback:

Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('C:/Users/Yatish H R/Desktop/www-coursera-downloader-master/www-coursera-downloader.py', wdir='C:/Users/Yatish H R/Desktop/www-coursera-downloader-master')
  File "C:\anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)
  File "C:\anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)
  File "C:/Users/Yatish H R/Desktop/www-coursera-downloader-master/www-coursera-downloader.py", line 751, in <module>
    main()
  File "C:/Users/Yatish H R/Desktop/www-coursera-downloader-master/www-coursera-downloader.py", line 737, in main
    email, password = getUserPass("coursera.pass")
  File "C:/Users/Yatish H R/Desktop/www-coursera-downloader-master/www-coursera-downloader.py", line 157, in getUserPass
    decryptedText = decrypt(strText, key1, key2)
  File "C:/Users/Yatish H R/Desktop/www-coursera-downloader-master/www-coursera-downloader.py", line 119, in decrypt
    decryptedText = obj2.decrypt(strInput)
  File "C:\anaconda\lib\site-packages\Crypto\Cipher\blockalgo.py", line 295, in decrypt
    return self._cipher.decrypt(ciphertext)
ValueError: Input strings must be a multiple of 16 in length
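The message comes from PyCrypto's AES in a block mode, which only accepts input made up of whole 16-byte blocks. The failing call is `decrypt`, so the bytes read back from `coursera.pass` were not a multiple of 16. The script's own message says to delete that file in order to change credentials, and deleting it should make the script ask for them again. On the writing side, the usual way to make plaintext fit the block size is PKCS#7 padding, sketched below in plain Python. It is not known whether the script pads this way, and the helper names are hypothetical.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append n bytes of value n so the length becomes a multiple of block_size."""
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n


def pkcs7_unpad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Strip PKCS#7 padding, rejecting input that is not validly padded."""
    if not data or len(data) % block_size:
        raise ValueError("padded data must be a non-zero multiple of the block size")
    n = data[-1]
    if not 1 <= n <= block_size or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid PKCS#7 padding")
    return data[:-n]
```

The ciphertext also has to be stored and read back byte for byte, for example in binary mode or as base64 text, so that its length stays a multiple of 16.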
