
coursera-dl's People

Contributors

abhirama, bijujosephjacob, camilojd, capatillo, danmbox, dgorissen, forever-young, ilfats, indraastra, itelichko, jackmaney, kapyshin, olivierkes, rnys, sebastianlopienski, shtratos

coursera-dl's Issues

Specify week(s) to download

Would it be possible to specify the week(s) to download material for? I'm currently taking a course that is six weeks long, with content released on Monday morning each week. The download script detects that it has previously downloaded Week 1's material, but it takes a while for it to figure that out. By the time I get to the last week, it will have to walk through the first five weeks before starting to download anything.
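A minimal sketch of how such a filter could work, assuming the downloader holds the course as an ordered list of (title, lectures) sections. The `select_weeks` helper and the data layout are hypothetical and only illustrate the request; they are not part of coursera-dl.

```python
def select_weeks(sections, weeks):
    """Return only the sections whose 1-based position is in `weeks`.

    `sections` is assumed to be an ordered list of (title, lectures) tuples;
    this structure is an illustration, not coursera-dl's actual data model.
    """
    wanted = set(weeks)
    return [s for i, s in enumerate(sections, start=1) if i in wanted]


sections = [("Week 1", ["1.1", "1.2"]), ("Week 2", ["2.1"]), ("Week 3", ["3.1"])]
latest_only = select_weeks(sections, [3])
```

With a filter like this applied before any file checks, earlier weeks would not need to be walked at all.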

when downloading 2 courses, course-2 is saved inside dir for course-1

I started the script this way:

[~/education/Coursera]$ coursera-dl -u [email protected] -p mypass -d ./ sciwrite-2012-001 biostats-2012-001

And biostats-2012-001 was saved inside sciwrite-2012-001 dir.

The output was like this:

* Authenticating as [email protected]...
* Collecting downloadable content from http://class.coursera.org/sciwrite-2012-001/lecture/index
* Got all downloadable content for sciwrite-2012-001
* sciwrite-2012-001 will be downloaded to /Users/myusername/education/Coursera/sciwrite-2012-001
 - Downloading lecture/syllabus pages
 - Unit 1

...
... all is going fine here
...

 - Unit 8
  - Downloading resources for 8.1 How to do a peer review (2847)
  - Downloading resources for 8.2 Communicating with journalists and the lay public (1601)
  - Downloading resources for 8.3 Panel Interview (2250)
  - Downloading resources for 8.4 In-class Editing Exercise (2248)
  - Downloading resources for 8.5 In-class Editing Exercise (2048)
    - failed:  https://d19vezwu8eufl6.cloudfront.net/sciwrite/%2FUnit_PDFs%2FModule8.5.pdf HTTP Error 403: Forbidden
  - Downloading resources for 8.6 Concluding Remarks (258)
* Authenticating as [email protected]...
* Already logged in
* Collecting downloadable content from http://class.coursera.org/biostats-2012-001/lecture/index
* Got all downloadable content for biostats-2012-001
* biostats-2012-001 will be downloaded to /Users/myusername/education/Coursera/sciwrite-2012-001/biostats-2012-001

...
... downloading continues normally
...
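The output above shows the second course's directory nested under the first. The cause is not confirmed in this report. One way to rule out this class of problem is to derive every course directory from the same absolute base, as in this sketch. `course_dirs` is a hypothetical helper, not a function from coursera-dl.

```python
import os


def course_dirs(base_dir, course_names):
    """Map each course name to its own directory directly under base_dir.

    The base is resolved to an absolute path once, so a later change of the
    working directory cannot shift where subsequent courses are placed.
    """
    base = os.path.abspath(base_dir)
    return {name: os.path.join(base, name) for name in course_names}


dirs = course_dirs("./", ["sciwrite-2012-001", "biostats-2012-001"])
```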

only downloads html files.

I'm trying to get the content for the database course. The script installs fine, but when I run:

coursera-dl -u email -p password -d ./ db

It will only download the html files. The user name and password are correct.

No "load_page" object

Hello. I have the following error:

C:\Portable Python 2.7.3.1\App\Scripts>coursera-dl -u [email protected] -p mypass -d d:\coursera\ comnetworks-2012-001
Coursera-dl v1.4.5 (lxml)

Course 1 of 1

  • Collecting downloadable content from https://class.coursera.org/comnetworks-2012-001/lecture/index
    Traceback (most recent call last):

    File "C:\Portable Python 2.7.3.1\App\Scripts\coursera-dl-script.py", line 8, in load_entry_point('coursera-dl==1.4.5', 'console_scripts', 'coursera-dl')()

    File "build\bdist.win32\egg\courseradownloader\courseradownloader.py", line 503, in main parser = argparse.ArgumentParser(description='Download Coursera.org course videos/docs for offline use.')

    File "build\bdist.win32\egg\courseradownloader\courseradownloader.py", line 257, in download_course

    File "build\bdist.win32\egg\courseradownloader\courseradownloader.py", line 176, in get_downloadable_content

AttributeError: 'CourseraDownloader' object has no attribute 'load_page'

And the download doesn't start. Please tell me what is wrong. Thank you.

Coursera-dl not working with update

I performed an upgrade and installed the additional dependencies. Now I'm receiving this error:
C:\Python27\Scripts>coursera-dl --quiz -u xxxx -p xxxx -d c:_
Downloads\classes\test
Traceback (most recent call last):
File "C:\Python27\Scripts\coursera-dl-script.py", line 5, in
from pkg_resources import load_entry_point
File "C:\Python27\lib\site-packages\setuptools-0.6c12dev_r88846-py2.7.egg\pkg_
resources.py", line 2603, in
File "C:\Python27\lib\site-packages\setuptools-0.6c12dev_r88846-py2.7.egg\pkg_
resources.py", line 666, in require
File "C:\Python27\lib\site-packages\setuptools-0.6c12dev_r88846-py2.7.egg\pkg_
resources.py", line 565, in resolve
pkg_resources.DistributionNotFound: cssselect

I thought it was setuptools, so I updated ez_setup, but the error still occurs. I've no idea what's going on. Any help?

Thanks :>

It doesn't work recently!

Dear Dirk,
Hi
First, many thanks for making such a nice course downloader. Second, it has stopped working for me recently.
I used it a few days ago, but after updating my OS (Fedora 18) it no longer works. For example, when I tried to download the scientificcomp-002 course I got the error below:

  • Collecting downloadable content from http://class.coursera.org/scientificcomp-002/lecture/index
    Traceback (most recent call last):
    File "/usr/bin/coursera-dl", line 9, in
    load_entry_point('coursera-dl==1.2.1', 'console_scripts', 'coursera-dl')()
    File "/usr/lib/python2.7/site-packages/coursera_dl-1.2.1-py2.7.egg/courseradownloader/courseradownloader.py", line 472, in main
    d.download_course(cn,dest_dir=args.dest_dir)
    File "/usr/lib/python2.7/site-packages/coursera_dl-1.2.1-py2.7.egg/courseradownloader/courseradownloader.py", line 194, in download_course
    (weeklyTopics, allClasses) = self.get_downloadable_content(course_url)
    File "/usr/lib/python2.7/site-packages/coursera_dl-1.2.1-py2.7.egg/courseradownloader/courseradownloader.py", line 112, in get_downloadable_content
    hrefs = classResources.findAll('a')
    AttributeError: 'NoneType' object has no attribute 'findAll'

As another example, downloading the neuralnets-2012-001 course produces:

  • Collecting downloadable content from http://class.coursera.org/neuralnets-2012-001/lecture/index
    Traceback (most recent call last):
    File "/usr/bin/coursera-dl", line 9, in
    load_entry_point('coursera-dl==1.2.1', 'console_scripts', 'coursera-dl')()
    File "/usr/lib/python2.7/site-packages/coursera_dl-1.2.1-py2.7.egg/courseradownloader/courseradownloader.py", line 472, in main
    d.download_course(cn,dest_dir=args.dest_dir)
    File "/usr/lib/python2.7/site-packages/coursera_dl-1.2.1-py2.7.egg/courseradownloader/courseradownloader.py", line 194, in download_course
    (weeklyTopics, allClasses) = self.get_downloadable_content(course_url)
    File "/usr/lib/python2.7/site-packages/coursera_dl-1.2.1-py2.7.egg/courseradownloader/courseradownloader.py", line 116, in get_downloadable_content
    resourceLinks = [ (h['href'],None) for h in hrefs]
    File "/usr/lib/python2.7/site-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 879, in getitem
    return self.attrs[key]

Could you help me with this?
Best

Downloading of videos does not work

Hi,

Downloading video lectures no longer works. The program exits after downloading the lecture index page, which prompts for login and enrollment.

Can browser.retrieve be sped up?

Is it possible to speed up browser.retrieve?
It downloads very slowly for me,
and sometimes (especially when downloading lectures) it fails with a message like 'no response from server'.

spynner.browser.SpynnerTimeout: SPYNNER waitload: Timeout reached: 5 retries for 2s delay

Hi,

I've tried the new version. I had to install libxml2-dev, libxslt1-dev and pyqt4-dev-tools to make coursera-dl work. However, when I run it now I get:

Traceback (most recent call last):
File "/usr/local/bin/coursera-dl", line 8, in
load_entry_point('coursera-dl==1.3', 'console_scripts', 'coursera-dl')()
File "/usr/local/lib/python2.6/dist-packages/courseradownloader/courseradownloader.py", line 481, in main
d.login()
File "/usr/local/lib/python2.6/dist-packages/courseradownloader/courseradownloader.py", line 78, in login
page = wait_for_content(lambda s: s.find(id="signin-password"))
File "/usr/local/lib/python2.6/dist-packages/courseradownloader/courseradownloader.py", line 72, in wait_for_content
webkit.wait_for_content(can_continue, 5, "Timout reached, please check your network and username/password", delay=2)
File "/usr/local/lib/python2.6/dist-packages/spynner/browser.py", line 868, in wait_for_content
raise SpynnerTimeout(msg)
spynner.browser.SpynnerTimeout: SPYNNER waitload: Timeout reached: 5 retries for 2s delay.
Timout reached, please check your network and username/password

Please help

Failed to download "[Errno 22] invalid mode ('wb') or filename"

Hi.
On windows, downloading the "An Introduction to Interactive Programming in Python" course, I'm getting the following error:

Failed to download url https://class.coursera.org/interactivepython-2012-001/wiki/view?page=week2 to c:\CoursEra\interactivepython-2012-001\04 - Week 2b - Buttons and input fields\02 - Input fields (934)\view?page=week2.html: [Errno 22] invalid mode ('wb') or filename: u'c:\CoursEra\interactivepython-2012-001\04 - Week 2b - Buttons and input fields\02 - Input fields (934)\view?page=week2.html'

Can you help me with that?
Thanks in advance.
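The failing path ends in `view?page=week2.html`, and `?` is one of the characters Windows does not allow in file names. A minimal sketch of a sanitizing step follows. `sanitize_filename` is a hypothetical helper, shown only to illustrate the idea.

```python
import re

# Characters that Windows does not allow in file or directory names.
_WINDOWS_RESERVED = re.compile(r'[<>:"/\\|?*]')


def sanitize_filename(name, replacement="_"):
    """Replace characters that are invalid in Windows file names."""
    return _WINDOWS_RESERVED.sub(replacement, name)


safe = sanitize_filename("view?page=week2.html")
```

The helper is meant for a single file or directory name, not a full path, since it also replaces `:` and path separators.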

Correct representation of video length

While downloading the video, I see messages like:

"- Downloading resources for 1.4 XXXX (1511)"

After a while I realized that "(1511)" really represents the duration of the video. A more appropriate representation would be "(15:11)", that is, with a colon added.
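A small sketch of that conversion, assuming the last two digits are always seconds and whatever precedes them is minutes (an inference from the example above, not a documented format). `format_duration` is a hypothetical name.

```python
def format_duration(digits):
    """Turn a run of digits such as '1511' into minutes:seconds ('15:11').

    Assumes the last two digits are seconds and anything before is minutes.
    """
    minutes = digits[:-2] or "0"
    seconds = digits[-2:].zfill(2)
    return "{}:{}".format(int(minutes), seconds)


label = format_duration("1511")
```

Note that a colon is itself invalid in Windows file names, so this form would suit console messages better than directory names.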

Thanks for this useful script!

Can we reduce cosmetic changes to file/folder names

I have three sets of files for the course scientificthinking: one without timestamps, one with timestamps, and (after upgrading courseradownload.py today) one with timestamps that have a '-' character in between. These are the same files under different names. Downloading them over and over wastes resources, both mine and Coursera's.
Dgorissen, your project now has a life of its own, and anything that is not really essential, like timestamps on file and folder names, could be deferred or moved to another project. Alternatively, you could strengthen the check for an existing file before downloading. It would save a lot of resources.
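One way to make the existence check tolerant of such naming changes is to compare names with the duration suffix stripped. The sketch below assumes the suffix looks like "(1511)", "(15-11)" or "(15:11)" at the end of the name. `canonical_name` and `already_downloaded` are hypothetical helpers.

```python
import os
import re

# A trailing duration such as "(1511)", "(15-11)" or "[0448]".
_DURATION_SUFFIX = re.compile(r"\s*[\(\[]\d{1,2}[-:]?\d{2}[\)\]]\s*$")


def canonical_name(name):
    """Strip a trailing duration suffix so renamed copies compare equal."""
    stem, ext = os.path.splitext(name)
    return _DURATION_SUFFIX.sub("", stem) + ext


def already_downloaded(name, existing_names):
    """True if a file with the same canonical name is already present."""
    return canonical_name(name) in {canonical_name(n) for n in existing_names}


have = already_downloaded("1.4 Dividends (04-00).mp4", ["1.4 Dividends (400).mp4"])
```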
Thanks Rod.

Crash during download & skipped videos

Just got this error:

* Collecting downloadable content from https://class.coursera.org/posa-001/lecture/index
Traceback (most recent call last):
  File "/Users/allan/Dropbox/Crap/inst/coursera-dl/courseradownloader/courseradownloader.py", line 545, in <module>
    main()
  File "/Users/allan/Dropbox/Crap/inst/coursera-dl/courseradownloader/courseradownloader.py", line 542, in main
    d.download_course(cn,dest_dir=args.dest_dir,reverse_sections=args.reverse)
  File "/Users/allan/Dropbox/Crap/inst/coursera-dl/courseradownloader/courseradownloader.py", line 295, in download_course
    (weeklyTopics, allClasses) = self.get_downloadable_content(course_url)
  File "/Users/allan/Dropbox/Crap/inst/coursera-dl/courseradownloader/courseradownloader.py", line 207, in get_downloadable_content
    bb = BeautifulSoup(p,self.parser)
NameError: global name 'p' is not defined

The error seems specific to this course (Pattern-Oriented Software Architectures for Concurrent and Networked Software), as two others did not crash.

I think all course material is available, judging from the course dashboard:
image

Also, for some reason, not all the available material was downloaded from the course datasci-001 (Introduction to Data Science). The script just went on to the next one, guitar-001 (Introduction to Guitar), where one video was skipped.

Although I have no real knowledge of how this script works, I suspect the problem lies in fetching the downloadable content, since several videos were downloaded without problems.

Some weeks are not downloading

I've been using coursera-dl on Ubuntu successfully for several months, but recently I've encountered an issue where only the first weeks of a completed course will download. For example, "A Beginner's Guide to Irrational Behaviour" (behavioralecon-001) has 6 weeks of videos but only the first and some of the second week download. No error message is produced as far as I can make out. I've also encountered the same issue with linearopt-001.

Doesn't work

The script initially worked, but today I tried to download new materials from Coursera and got nothing.

command line: c:\Python27\Scripts>coursera-dl-script.py -u -p -d d:\Downloads\Coursera\ proglang-2012-001

No logs, no reports.

Course gets 404 from script, works in browser

I get this problem in both the current version and the one before it.
For the course bluebrain-001, I get the following error spat back at me. I am able to view and download the material through a browser. All other courses (a handful) that I've tried so far have worked.

Running from linux.

coursera-dl -u xxxxxxxxxxxx@xxxxxxxx -p xxxxxxxxxxxxx bluebrain-001
Warning: lxml not available, falling back to built-in 'html.parser' (see -q option), this may cause problems on Python < 2.7.3
Coursera-dl v1.4.8 (html.parser)
Logging in as 'xxxxxxx@xxxxxxxxx'...

Course 1 of 1
. Collecting downloadable content from https://class.coursera.org/bluebrain-001/lecture/index
Traceback (most recent call last):
File "/usr/local/bin/coursera-dl", line 9, in
load_entry_point('coursera-dl==1.4.8', 'console_scripts', 'coursera-dl')()
File "/usr/local/lib/python2.6/dist-packages/courseradownloader/courseradownloader.py", line 598, in main
d.download_course(cn,dest_dir=args.dest_dir,reverse_sections=args.reverse)
File "/usr/local/lib/python2.6/dist-packages/courseradownloader/courseradownloader.py", line 301, in download_course
(weeklyTopics, allClasses) = self.get_downloadable_content(course_url)
File "/usr/local/lib/python2.6/dist-packages/courseradownloader/courseradownloader.py", line 212, in get_downloadable_content
pg = self.browser.open(lurl,timeout=self.TIMEOUT)
File "/usr/local/lib/python2.6/dist-packages/mechanize/_mechanize.py", line 203, in open
return self._mech_open(url, data, timeout=timeout)
File "/usr/local/lib/python2.6/dist-packages/mechanize/_mechanize.py", line 255, in _mech_open
raise response
mechanize._response.httperror_seek_wrapper: HTTP Error 404: Not Found

Download Preview Lectures

Hello Dirk,

It's great that you worked hard so that others can download the Coursera videos. Thanks :)

Courses you have registered for can be downloaded easily with coursera-dl. Would it be possible to tweak it so that we can also download preview videos from Coursera? For example, the Algorithms class hasn't started yet, but its videos are available at https://class.coursera.org/algo-2012-002/lecture/20.

Is it possible to download them in some way? If so, please let me know.

Thanks 👍

can not download videos, only two files: index.html and lectures.html

coursera-dl-error

I tried the command in the picture, but only got two files: 'index.html' and 'lectures.html',

the course "Think again" is in Week 7.

Other finished courses like "Model Thinking" also had this problem.

I can't watch YouTube videos directly and usually use a proxy, so maybe this is the reason. But I don't know how to set a proxy for coursera-dl, so I can't confirm it.

upgrade error

Thank you for putting this together and maintaining it!

I am trying to upgrade from 1.1.11 to 1.2.1 using pip install --upgrade coursera-dl. This is on Ubuntu 12.04 LTS. I am getting the following error:

Downloading/unpacking coursera-dl
Running setup.py egg_info for package coursera-dl
running egg_info
writing requirements to pip-egg-info/coursera_dl.egg-info/requires.txt
writing pip-egg-info/coursera_dl.egg-info/PKG-INFO
writing top-level names to pip-egg-info/coursera_dl.egg-info/top_level.txt
writing dependency_links to pip-egg-info/coursera_dl.egg-info/dependency_links.txt
writing entry points to pip-egg-info/coursera_dl.egg-info/entry_points.txt
warning: manifest_maker: standard file '-c' not found

reading manifest file 'pip-egg-info/coursera_dl.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'pip-egg-info/coursera_dl.egg-info/SOURCES.txt'

Downloading/unpacking mechanize (from coursera-dl)
Getting page http://pypi.python.org/simple/mechanize
URLs to search for versions for mechanize (from coursera-dl):

Thank you, Michael

Cannot download course due to AttributeError

Coursera-dl v1.4.8 (lxml)
Logging in as '[email protected]'...

Course 1 of 1
* Collecting downloadable content from https://class.coursera.org/datasci-001/lecture/index
Traceback (most recent call last):
  File "/usr/local/bin/coursera-dl", line 9, in 
    load_entry_point('coursera-dl==1.4.8', 'console_scripts', 'coursera-dl')()
  File "/usr/local/lib/python2.7/dist-packages/courseradownloader/courseradownloader.py", line 598, in main
    d.download_course(cn,dest_dir=args.dest_dir,reverse_sections=args.reverse)
  File "/usr/local/lib/python2.7/dist-packages/courseradownloader/courseradownloader.py", line 301, in download_course
    (weeklyTopics, allClasses) = self.get_downloadable_content(course_url)
  File "/usr/local/lib/python2.7/dist-packages/courseradownloader/courseradownloader.py", line 186, in get_downloadable_content
    hrefs = classResources.findAll('a')
AttributeError: 'NoneType' object has no attribute 'findAll'

Problem downloading

Hi,

I've had no problems using the downloader until today when I got the following error message:

Warning: lxml not available, falling back to built-in 'html.parser' (see -q option), this may cause problems on Python < 2.7.3
HTML parser set to html.parser

  • Collecting downloadable content from https://class.coursera.org/humanphysio-001/lecture/index
    Traceback (most recent call last):
    File "/usr/local/bin/coursera-dl", line 8, in
    load_entry_point('coursera-dl==1.4.1', 'console_scripts', 'coursera-dl')()
    File "/Library/Python/2.7/site-packages/courseradownloader/courseradownloader.py", line 495, in main
    d.download_course(cn,dest_dir=args.dest_dir,reverse_sections=args.reverse)
    File "/Library/Python/2.7/site-packages/courseradownloader/courseradownloader.py", line 256, in download_course
    (weeklyTopics, allClasses) = self.get_downloadable_content(course_url)
    File "/Library/Python/2.7/site-packages/courseradownloader/courseradownloader.py", line 175, in get_downloadable_content
    bb = self.load_page(lurl)
    AttributeError: 'CourseraDownloader' object has no attribute 'load_page'

debugger

Hi,
I would like to know what IDE you used to create the script.
I can understand the code to some extent, but not the library functions it uses.
So it would be very helpful if you could tell me which IDE to use to understand and debug the code.

Thanks!

only downloads syllabus

Edit: has been fixed!

I used the script on a number of courses in mid-December, so I don't think it's a problem with my set-up. When I run the script it:

  • authenticates
  • collects downloadable content from ...
  • got all downloadable content for

-downloading lecture/syllabus pages

Then ends the script.

Final result is a new folder created with only the syllabus link in it.

exception when trying to create long directories

I am trying the new script with the modifications for long filenames. I get the following error with the course "writing 2":

File "courseradownloader.py", line 650, in
main()
File "courseradownloader_filename.py", line 647, in main
d.download_course(cn,dest_dir=args.dest_dir,reverse_sections=args.reverse)
File "courseradownloader_filename.py", line 374, in download_course
os.makedirs(clsdir)
File "E:\Python27\lib\os.py", line 157, in makedirs
mkdir(name, mode)
WindowsError: [Error 206] The filename or extension is too long: 'c:\Cou\writing
-001\07 - Unit 3 Analyzing Rhetorically\07 - Core Analyzing Text [0448] Avail
ble as streaming content at httpgo.osu.eduanalyzingtext. Please point your web
rowser to that URL or download a text transcript using the icon link at right -
'

unable to download subtitles from crypto-006

Hi, first, thanks for this magnificent resource!

I am trying to download the crypto-006 course. While everything else downloads smoothly, an error shows up when trying to download subtitles. Am I missing something? I am using the lxml parser.

Downloading resources for History of cryptography (19 min)
- failed: https://class.coursera.org/crypto-006/lecture/subtitles?q=3_en&format=txt HTTP Error 500: Internal Server Error
- failed: https://class.coursera.org/crypto-006/lecture/subtitles?q=3_en&format=srt HTTP Error 500: Internal Server Error

  • Downloading resources for Discrete probability (Crash course) (18 min)

KeyError:'href' for course einstein-001

Hi, I'm seeing the following error:

Course 1 of 5
* Collecting downloadable content from https://class.coursera.org/einstein-001/lecture/index
Traceback (most recent call last):
  File "C:\Python27\Scripts\coursera-dl-script.py", line 8, in <module>
    load_entry_point('coursera-dl==1.4.8', 'console_scripts', 'coursera-dl')()
  File "build\bdist.win32\egg\courseradownloader\courseradownloader.py", line 598, in main
  File "build\bdist.win32\egg\courseradownloader\courseradownloader.py", line 301, in download_course
  File "build\bdist.win32\egg\courseradownloader\courseradownloader.py", line 194, in get_downloadable_content
  File "C:\Python27\lib\site-packages\bs4\element.py", line 879, in __getitem__
    return self.attrs[key]
KeyError: 'href'
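The traceback ends in `self.attrs[key]`, which is what indexing an anchor that has no `href` attribute raises. The sketch below shows the general idea of skipping such anchors. It uses Python 3's standard-library `html.parser` instead of the BeautifulSoup API from the traceback, and `LinkCollector` is a hypothetical class.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, skipping anchors that have none."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")  # None when the attribute is absent
        if href:
            self.links.append(href)


collector = LinkCollector()
collector.feed('<div><a name="top">top</a><a href="/lecture/1.mp4">video</a></div>')
```

Using a lookup that returns `None` for a missing attribute, rather than indexing, avoids the KeyError for anchors that only carry a `name`.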

Failed to find csrf cookie

This is the very first time I ran coursera-dl:

$ coursera-dl -u myUsername -p myPassword 'Think Again'
Coursera-dl v1.4.5 (lxml)
Traceback (most recent call last):
  File "/usr/local/bin/coursera-dl", line 9, in <module>
    load_entry_point('coursera-dl==1.4.5', 'console_scripts', 'coursera-dl')()
  File "/usr/local/lib/python2.7/dist-packages/courseradownloader/courseradownloader.py", line 497, in main
    d.login(args.course_names[0])
  File "/usr/local/lib/python2.7/dist-packages/courseradownloader/courseradownloader.py", line 71, in login
    if not csrfcookie: raise Exception("Failed to find csrf cookie")
Exception: Failed to find csrf cookie

Python 2.7 on Linux 3.2.

Unable to download archived course

Hi! Thanks for the program! I tried to download an archived course, specifically: Computer Architecture by David Wentzlaff. Code name: comparch-2012-001.

image

That's the error it gave me (Course 12 of 12). Sorry, I don't know how to copy the text log!

Download Assignment Content

Download assignment content such as pictures, files, etc.
An appropriate flag should be added and shown in the list when "-h" is invoked.

NameError: global name 'p' is not defined

I have the following error..
Don't know much python.. but here is the message...
I have gotten the message for 3 courses...

Coursera-dl v1.4.7 (lxml)

Course 1 of 1

  • Collecting downloadable content from https://class.coursera.org/hwswinterface-001/lecture/index
    Traceback (most recent call last):
    File "/usr/local/bin/coursera-dl", line 9, in
    load_entry_point('coursera-dl==1.4.7', 'console_scripts', 'coursera-dl')()
    File "/usr/local/lib/python2.7/dist-packages/courseradownloader/courseradownloader.py", line 546, in main
    d.download_course(cn,dest_dir=args.dest_dir,reverse_sections=args.reverse)
    File "/usr/local/lib/python2.7/dist-packages/courseradownloader/courseradownloader.py", line 296, in download_course
    (weeklyTopics, allClasses) = self.get_downloadable_content(course_url)
    File "/usr/local/lib/python2.7/dist-packages/courseradownloader/courseradownloader.py", line 208, in get_downloadable_content
    bb = BeautifulSoup(p,self.parser)
    NameError: global name 'p' is not defined

Fails to download videos from course "compfinance-2012-001"

The good news is that it correctly downloads resources and so on, but it fails to download the videos themselves from "compfinance-2012-001". Log (with command line args):

BigMac:~ darin$ cd coursera/
BigMac:coursera darin$ coursera-dl -u <...> -p <...> -d . compfinance-2012-001
Authenticating as <...>...
Collecting downloadable content from http://class.coursera.org/compfinance-2012-001/lecture/index
Warning: Failed to find video for Welcome to Introduction to Computational Finance and Financial Econometrics (1314)
Warning: Failed to find video for 1.0 Week 1 Introduction (058)
Warning: Failed to find video for 1.1 Future Value, Present Value and Compounding (1702)
Warning: Failed to find video for 1.2 Asset Returns (1653)
Warning: Failed to find video for 1.3 Portfolio Returns (912)
Warning: Failed to find video for 1.4 Dividends (400)
Warning: Failed to find video for 1.5 Inflation (457)
Warning: Failed to find video for 1.6 Annualizing Returns (532)
Warning: Failed to find video for 1.7 Continuously Compounded Returns (1555)
Warning: Failed to find video for 1.8 CC Portfolio Returns and Inflation (550)
Warning: Failed to find video for 1.9 Simple Returns (401)
Warning: Failed to find video for 1.10 Getting Financial Data from Yahoo (1026)
Warning: Failed to find video for 1.11 Return Calculations (621)
Warning: Failed to find video for 1.12 Growth of 1 (658)
Warning: Failed to find video for 2.0 Week 2 Introduction (106)
Warning: Failed to find video for 2.1 Univariate Random Variables (2011)
Warning: Failed to find video for 2.2 Cumulative Distribution Function (842)
Warning: Failed to find video for 2.3 Quantiles (750)
Warning: Failed to find video for 2.4 Standard Normal Distribution (1602)
Warning: Failed to find video for 2.5 Expected Value and Standard Deviation (1958)
Warning: Failed to find video for 2.6 General Normal Distribution (623)
Warning: Failed to find video for 2.7 Standard Deviation as a Measure of Risk (434)
Warning: Failed to find video for 2.8 Normal Distribution Appropriate for simple returns (1422)
Warning: Failed to find video for 2.9 Skewness and Kurtosis (1539)
Warning: Failed to find video for 2.10 Students-t Distribution (552)
Warning: Failed to find video for 2.11 Linear Functions of Random Variables (1113)
Warning: Failed to find video for 2.12 Value at Risk (1948)
Warning: Failed to find video for 3.0 Week 3 Introduction (104)
Warning: Failed to find video for 3.1 Location-scale Model (1215)
Warning: Failed to find video for 3.2 Bivariate Discrete Distributions (1418)
Warning: Failed to find video for 3.3 Bivariate Continuous Distributions (1415)
Warning: Failed to find video for 3.4 Covariance (1916)
Warning: Failed to find video for 3.5 Correlation and the Bivariate Normal Distribution (1159)
Warning: Failed to find video for 3.6 Linear Combination of 2 Random Variables (1109)
Warning: Failed to find video for 3.7 Portfolio Example (1920)
Warning: Failed to find video for 3.8 Matrix Algebra Review Part 1 (1702)
Warning: Failed to find video for 3.9 Matrix Algebra Review Part 2 (2010)

(And so on: many more "Failed to find video" warnings follow. After that it starts downloading resources, and all of those download fine. Only the videos fail.)

Problem with proxy authentication

I am trying to use the script at work, behind a corporate proxy.
Everything works from the network perspective (I pip-installed coursera-dl without problems) but when I try to use the script, I get this error:

File "/home/xyz/.virtualenvs/coursera/lib/python2.7/site-packages/mechanize/_urllib2_fork.py", line 1118, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error Tunnel connection failed: 407 Proxy Authentication Required>

Shouldn't urllib pick up the default proxy configuration from the system?
Also, maybe a --proxy parameter would come in handy...
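A sketch of what a `--proxy` option could look like. It uses Python 3's `urllib.request`, not the mechanize/urllib2 stack from the traceback, and both the `--proxy` flag and the `build_proxy_handler` helper are hypothetical. The proxy URL is a placeholder.

```python
import argparse
import urllib.request


def build_proxy_handler(proxy_url=None):
    """Return a ProxyHandler for proxy_url, or one using system settings.

    With no argument, urllib.request.ProxyHandler reads the proxies from the
    environment (for example the http_proxy / https_proxy variables).
    """
    if proxy_url is None:
        return urllib.request.ProxyHandler()
    return urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})


parser = argparse.ArgumentParser()
# --proxy is a hypothetical flag for this sketch, not an existing option.
parser.add_argument("--proxy", default=None)
args = parser.parse_args(["--proxy", "http://user:secret@proxy.example:8080"])
handler = build_proxy_handler(args.proxy)
```

The handler would then be passed to `urllib.request.build_opener`. Embedding credentials in the proxy URL is one common way to supply them; whether it resolves the 407 in this particular setup is not something the report can confirm.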

Unnecessary '\n' character

A completed quiz in a video lecture adds a "\nQuiz Attempted" label. The label contains an unnecessary '\n' character, which causes the mkdir operation to fail on Windows.
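A minimal sketch of one possible fix: collapse every run of whitespace, newlines included, into a single space before the title is used as a directory name. `normalize_title` is a hypothetical helper and the sample title is made up.

```python
def normalize_title(title):
    """Collapse all whitespace runs (including newlines) into single spaces."""
    return " ".join(title.split())


clean = normalize_title("Lecture title\nQuiz Attempted")
```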

Too long path in Windows

Some courses contain materials with very long path names.

Example: inforiskman-2012-001\08 - Week 7\08 - Business Continuity and Disaster Recovery Michael Ness, Part 1 - Leadership Selling Your Ideas (1542)\8 - 8 - Business Continuity and Disaster Recovery Michael Ness, Part 1 - Leadership Selling Your Ideas (1542).srt
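In the example above the lecture title appears twice, once as the directory and once as the file name, which quickly approaches the classic Windows path limit of 260 characters. One possible mitigation is to cap each path component, as sketched below. `shorten_component` and its 80-character default are assumptions made for illustration.

```python
import os


def shorten_component(name, max_len=80):
    """Truncate one path component to max_len characters, keeping a short extension."""
    if len(name) <= max_len:
        return name
    stem, ext = os.path.splitext(name)
    if len(ext) > 5:
        # Not a real extension (e.g. the ".4 Dividends" in "1.4 Dividends").
        stem, ext = name, ""
    return stem[: max_len - len(ext)].rstrip() + ext


long_name = "8 - 8 - " + "Business Continuity and Disaster Recovery " * 4 + "(1542).srt"
short_name = shorten_component(long_name)
```

Truncation can make two different titles collide, so a real implementation would also need a uniqueness check.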

Error while downloading only the below course. Works fine for other courses.

C:\Coursera>coursera-dl -u [email protected] -p xxxxxxxx hwswinterface-001
Warning: lxml not available, falling back to built-in 'html.parser' (see -q option), this may cause problems on Python < 2.7.3
Coursera-dl v1.4.7 (html.parser)

Course 1 of 1

  • Collecting downloadable content from https://class.coursera.org/hwswinterface-001/lecture/index
    Traceback (most recent call last):
    File "C:\Python27\Scripts\coursera-dl-script.py", line 8, in
    load_entry_point('coursera-dl==1.4.7', 'console_scripts', 'coursera-dl')()
    File "C:\Python27\lib\site-packages\courseradownloader\courseradownloader.py", line 546, in main
    d.download_course(cn,dest_dir=args.dest_dir,reverse_sections=args.reverse)
    File "C:\Python27\lib\site-packages\courseradownloader\courseradownloader.py", line 296, in download_course
    (weeklyTopics, allClasses) = self.get_downloadable_content(course_url)
    File "C:\Python27\lib\site-packages\courseradownloader\courseradownloader.py", line 208, in get_downloadable_content
    bb = BeautifulSoup(p,self.parser)
    NameError: global name 'p' is not defined

No module named html.entities

Dear Dirk,

Thanks for writing this module. I have the following error arise, even though pip install says my beautifulsoup4 is up to date:

"Traceback (most recent call last):
File "D:\Documents\Desktop\Coursera\coursera-dl.py", line 9, in
from bs4 import BeautifulSoup
File "C:\Python27\lib\site-packages\bs4\__init__.py", line 29, in
from .builder import builder_registry
File "C:\Python27\lib\site-packages\bs4\builder\__init__.py", line 4, in
from bs4.element import (
File "C:\Python27\lib\site-packages\bs4\element.py", line 5, in
from bs4.dammit import EntitySubstitution
File "C:\Python27\lib\site-packages\bs4\dammit.py", line 11, in
from html.entities import codepoint2name
ImportError: No module named html.entities"

On Windows Vista.

What do you recommend?
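The traceback shows `html.entities`, a Python 3 module path, being imported under a Python 2.7 installation. Why the installed bs4 copy does this is not stated in the report. A version-tolerant import of the same table generally looks like the sketch below.

```python
try:
    # Python 3 location of the entity table.
    from html.entities import codepoint2name
except ImportError:
    # Python 2 location of the same table.
    from htmlentitydefs import codepoint2name

ampersand = codepoint2name[38]
```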

Installation fails on Windows

Windows 7 32-bit, Python 2.7.2

D:\> pip install coursera-dl
Downloading/unpacking coursera-dl
  Running setup.py egg_info for package coursera-dl
    Traceback (most recent call last):
      File "<string>", line 14, in <module>
      File "D:\build\coursera-dl\setup.py", line 5, in <module>
        import version
    ImportError: No module named version
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 14, in <module>
      File "D:\build\coursera-dl\setup.py", line 5, in <module>
        import version
    ImportError: No module named version

retrieval of files is stuck when connection is lost

Hi

I tried to download several courses at once, and when my network connection dropped, coursera-dl got stuck for hours.

I would've expected it to exit on timeout.

I did Ctrl+C and here's the trace:

Course 3 of 4
* Collecting downloadable content from https://class.coursera.org/bluebrain-001/lecture/index
* Got all downloadable content for bluebrain-001
* bluebrain-001 will be downloaded to /Users/irulan/education/Coursera/bluebrain-001
 - Downloading lecture/syllabus pages
 - Lesson 1 - Brain excitements for the 21st century
  - Downloading resources for Welcome words and great thinkers (09-31)
  - Downloading resources for The blossoming of the brain in the world (07-53)
  - Downloading resources for The connectomics (10-31)
  - Downloading resources for Brainbow (08-06)
  - Downloading resources for Brain Machine Interface (BMI) (16-48)
  - Downloading resources for Optogenetics (07-24)
  - Downloading resources for Simulation of the brain - Blue Brain Project (08-57)
 - Lesson 2 - The materialistic mind  your brains ingredients
  - Downloading resources for The Neuron (09-19)
  - Downloading resources for The Neuron Doctrine (09-11)
  - Downloading resources for The Neuron as IO Device part 1 (08-04)
  - Downloading resources for The Axon (14-10)
  - Downloading resources for The Dendrite (08-34)
  - Downloading resources for Neuron Types (11-08)
  - Downloading resources for The Synapse (14-07)
  - Downloading resources for The Neuron as IO Device part 2 (07-27)
 - Lesson 3 - Electrifying brains passive electrical signals
  - Downloading resources for Sources for Lecture 3 - No Video
    - failed:  http://en.wikipedia.org/wiki/Ohm%27s_law HTTP Error 403: Forbidden
    - failed:  http://en.wikipedia.org/wiki/Kirchhoff%27s_circuit_laws HTTP Error 403: Forbidden
    - failed:  http://en.wikipedia.org/wiki/RC_circuit HTTP Error 403: Forbidden
    - failed:  https://class.coursera.org/bluebrain-001/lecture/download.mp4?lecture_id=39 HTTP Error 500: Internal Server Error
  - Downloading resources for The Cell as RC Circuit (11-01)
  - Downloading resources for The Voltage Equation for the Passive Cell (09-20)
  - Downloading resources for The Membrane Time Constant (14-03)
  - Downloading resources for Temporal Summation (09-31)
  - Downloading resources for The Resting Potential (08-54)
  - Downloading resources for The Synaptic Potential Part 1 (09-38)
  - Downloading resources for The Synaptic Conductance (06-19)
^CTraceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/coursera-dl", line 8, in 
    load_entry_point('coursera-dl==1.4.7', 'console_scripts', 'coursera-dl')()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/courseradownloader/courseradownloader.py", line 546, in main
    d.download_course(cn,dest_dir=args.dest_dir,reverse_sections=args.reverse)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/courseradownloader/courseradownloader.py", line 372, in download_course
    self.download(classResource,target_dir=clsdir,target_fname=tfname)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/courseradownloader/courseradownloader.py", line 282, in download
    self.browser.retrieve(url,filepath)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/mechanize/_opener.py", line 277, in retrieve
    block = fp.read(bs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/mechanize/_response.py", line 195, in read
    data = self.wrapped.read(to_read)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 561, in read
    s = self.fp.read(amt)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
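
The trace above ends in a blocking `recv` call, which is consistent with a socket read that has no timeout. A minimal sketch of a chunked download that gives up on a stalled connection follows. It uses the Python 3 standard library rather than mechanize, and the names `retrieve`, `try_retrieve`, and the 30-second default are illustrative assumptions, not part of coursera-dl.

```python
import socket
from urllib.request import urlopen


def retrieve(url, filepath, timeout=30.0, chunk_size=64 * 1024):
    """Copy url to filepath in chunks; a stalled read raises instead of hanging."""
    # The timeout covers the connection attempt and each blocking socket read.
    response = urlopen(url, timeout=timeout)
    try:
        with open(filepath, "wb") as out:
            while True:
                block = response.read(chunk_size)
                if not block:  # empty read means end of stream
                    break
                out.write(block)
    finally:
        response.close()
    return filepath


def try_retrieve(url, filepath, timeout=30.0):
    """Return True on success; report the failure and return False otherwise."""
    try:
        retrieve(url, filepath, timeout=timeout)
        return True
    except (socket.timeout, OSError) as exc:
        # URLError and HTTPError are OSError subclasses, so they land here too.
        print(" - failed: %s %s" % (url, exc))
        return False
```

With a wrapper like this, a lost connection would surface as one more "failed" line after the timeout expires, and the run could move on to the next resource.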

Problem downloading data

When I try to download:

berri@nbberri:~/coursera$ sudo coursera-dl -d /home/berri/coursera/ bluebrain-001
Coursera-dl v1.4.8 (lxml)
Credentials found in .netrc file
Logging in as '[email protected]'...

Course 1 of 1

  • Collecting downloadable content from https://class.coursera.org/bluebrain-001/lecture/index
    Traceback (most recent call last):
    File "/usr/local/bin/coursera-dl", line 9, in
    load_entry_point('coursera-dl==1.4.8', 'console_scripts', 'coursera-dl')()
    File "/usr/local/lib/python2.7/dist-packages/courseradownloader/courseradownloader.py", line 598, in main
    d.download_course(cn,dest_dir=args.dest_dir,reverse_sections=args.reverse)
    File "/usr/local/lib/python2.7/dist-packages/courseradownloader/courseradownloader.py", line 301, in download_course
    (weeklyTopics, allClasses) = self.get_downloadable_content(course_url)
    File "/usr/local/lib/python2.7/dist-packages/courseradownloader/courseradownloader.py", line 212, in get_downloadable_content
    pg = self.browser.open(lurl,timeout=self.TIMEOUT)
    File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
    File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 255, in _mech_open
    raise response
    mechanize._response.httperror_seek_wrapper: HTTP Error 404: Not Found

issue(s) downloading some classes

I get the following error message, but only when downloading some courses.

Courses where I noticed the problem:

econ1scientists-2012-001 introstats-001

Courses tried in the same session that downloaded successfully:

GTG-2013-001 macroeconomics-2012-001

Error message:

Course 1 of 1
* Collecting downloadable content from https://class.coursera.org/introstats-001/lecture/index
Traceback (most recent call last):
File "c:\Python27\scripts\coursera-dl-script.py", line 9, in
load_entry_point('coursera-dl==1.4.8', 'console_scripts', 'coursera-dl')()
File "c:\Python27\lib\site-packages\courseradownloader\courseradownloader.py",
line 598, in main
d.download_course(cn,dest_dir=args.dest_dir,reverse_sections=args.reverse)
File "c:\Python27\lib\site-packages\courseradownloader\courseradownloader.py",
line 301, in download_course
(weeklyTopics, allClasses) = self.get_downloadable_content(course_url)
File "c:\Python27\lib\site-packages\courseradownloader\courseradownloader.py",
line 186, in get_downloadable_content
hrefs = classResources.findAll('a')
AttributeError: 'NoneType' object has no attribute 'findAll'

Doesn't recognize the directory in Mac for downloading!

I created a directory called 'test' and then ran the following on the command line:
'coursera-dl -u myusername -p mypassword -d /test/courses/ algo-2012-001'

The script seems to have difficulty finding the 'test' folder I created for the course contents. Where should I create the directory? At the root? Should I also create a 'courses' directory underneath 'test'? I am a total newbie and not too familiar with Mac. I am on Mac OS 10.8.2. Please help!!

Here is the error message I got:

Warning: lxml not available, falling back to built-in 'html.parser' (see -q option), this may cause problems on Python < 2.7.3
HTML parser set to html.parser

  • Authenticating as [email protected]...
  • Collecting downloadable content from http://class.coursera.org/algo-2012-001/lecture/index
  • Got all downloadable content for algo-2012-001
    Traceback (most recent call last):
    File "/usr/local/bin/coursera-dl", line 8, in
    load_entry_point('coursera-dl==1.2.1', 'console_scripts', 'coursera-dl')()
    File "/Library/Python/2.7/site-packages/courseradownloader/courseradownloader.py", line 472, in main
    d.download_course(cn,dest_dir=args.dest_dir)
    File "/Library/Python/2.7/site-packages/courseradownloader/courseradownloader.py", line 201, in download_course
    os.mkdir(course_dir)
    OSError: [Errno 2] No such file or directory: '/test/courses/algo-2012-001'
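
The traceback shows `os.mkdir` failing on '/test/courses/algo-2012-001'. `os.mkdir` creates only the last path component, so it raises this error whenever a parent such as '/test/courses' does not exist yet. A minimal sketch of the alternative, assuming Python 3.2 or later; the helper name `ensure_course_dir` is hypothetical and not part of coursera-dl:

```python
import os


def ensure_course_dir(dest_dir, course_name):
    """Create dest_dir/course_name, including any missing parent directories."""
    # expanduser lets a path such as "~/test/courses" point into the home folder.
    course_dir = os.path.join(os.path.expanduser(dest_dir), course_name)
    # os.makedirs creates every missing level; exist_ok avoids an error on reruns.
    os.makedirs(course_dir, exist_ok=True)
    return course_dir
```

Note that a path starting with '/' refers to the filesystem root, not the home folder, so a 'test' directory created in the home folder would be reached as '~/test/courses/' instead.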

Log file of failed downloads

Currently it is difficult to see which materials were not downloaded because of an error. It would be better to write all of the failed downloads to a file.
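
As a rough illustration of this request, the sketch below appends one line per failed download to a text file. The class name `FailureLog`, the tab-separated layout, and the file name in the usage note are all assumptions made for the example; nothing here reflects how coursera-dl is actually structured.

```python
import time


class FailureLog(object):
    """Appends one tab-separated line per failed download to a text file."""

    def __init__(self, path):
        self.path = path
        self.count = 0  # number of failures recorded in this run

    def record(self, url, error):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        # Append mode keeps entries written by earlier runs.
        with open(self.path, "a") as fh:
            fh.write("%s\t%s\t%s\n" % (stamp, url, error))
        self.count += 1
```

A downloader could call `log.record(url, exc)` wherever it currently prints a "failed:" line, for example with `log = FailureLog("failed-downloads.log")`, and report `log.count` at the end of the run.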

Script no longer works

Hello Dirk.
First of all, many thanks for your effort, which enabled me to follow various courses.
Sadly, after the latest changes Coursera made to their site, it is no longer possible to download anything. In particular, I'm taking "Heterogeneous Parallel Programming" and "Think Again, how to reason and argue", and neither can be downloaded anymore.
Could you fix your precious script somehow?
TIA - Alberto

error with some courses

Thank you for this awesome tool!

I am getting an error with just some courses. This is on Ubuntu 12.04 LTS. Below is the error message:

...

  • Authenticating as site_user...
  • Already logged in
  • Collecting downloadable content from http://class.coursera.org/dataanalysis-001/lecture/index
  • Got all downloadable content for dataanalysis-001
  • dataanalysis-001 will be downloaded to /home/local_user/my/coursera/courses/dataanalysis-001
    • Downloading lecture/syllabus pages
  • Authenticating as site_user...
    Traceback (most recent call last):
    File "/usr/local/bin/coursera-dl", line 9, in
    load_entry_point('coursera-dl==1.1.11', 'console_scripts', 'coursera-dl')()
    File "/usr/local/lib/python2.7/dist-packages/courseradownloader/courseradownloader.py", line 452, in main
    d.download_course(cn,dest_dir=args.dest_dir)
    File "/usr/local/lib/python2.7/dist-packages/courseradownloader/courseradownloader.py", line 183, in download_course
    self.login(cname)
    File "/usr/local/lib/python2.7/dist-packages/courseradownloader/courseradownloader.py", line 42, in login
    page = self.browser.open(self.LOGIN_URL % course_name)
    File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
    File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 230, in _mech_open
    response = UserAgentBase.open(self, request, data)
    File "/usr/local/lib/python2.7/dist-packages/mechanize/_opener.py", line 204, in open
    response = meth(req, response)
    File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 457, in http_response
    'http', request, response, code, msg, hdrs)
    File "/usr/local/lib/python2.7/dist-packages/mechanize/_opener.py", line 221, in error
    result = apply(self._call_chain, args)
    File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 332, in _call_chain
    result = func(*args)
    File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 571, in http_error_302
    return self.parent.open(new)
    File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
    File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 230, in _mech_open
    response = UserAgentBase.open(self, request, data)
    File "/usr/local/lib/python2.7/dist-packages/mechanize/_opener.py", line 193, in open
    response = urlopen(self, req, data)
    File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 344, in _open
    '_open', req)
    File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 332, in _call_chain
    result = func(*args)
    File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 1170, in https_open
    return self.do_open(conn_factory, req)
    File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 1118, in do_open
    raise URLError(err)
    urllib2.URLError: <urlopen error [Errno -2] Name or service not known>
