Code Monkey home page Code Monkey logo

ilc-scraper's Introduction

Impartus Scraper

This is a scraper for the Impartus Lecture Capture system deployed in BITS Pilani Hyderabad Campus. No more crying when Impartus servers crash during exam time because of overload. You can even access lectures from previous years in case your current prof sucks :P

Features

  • Awesome GUI! GUI
  • Download lectures without subscribing to them.
  • Download BOTH the camera views. (Saved as video tracks in output file)
  • Specify ranges to limit the lectures to be downloaded.
  • Fuzzy searching of course names.
  • Skips already downloaded lectures.
  • Multiprocessing support.

Installation

Recommended:

Go to the Releases page and download the latest version according to your OS. Just double click and run!

Manual method:

  1. Ensure you have Python 3.7 or higher installed, and available in your system PATH. To check, run python --version from command line/terminal.
  2. Install ffmpeg from here. Ensure it is in your PATH variable. To check, run ffmpeg -version from command line/terminal.
  3. Install poetry using pip install --user poetry. Restart your PC.
  4. Clone this repo if you know git, or click Download as ZIP and extract it to some location.
  5. Open terminal and cd to download/clone directory.
  6. Run poetry install --no-dev -E gui for the default installation. (If you don't want the GUI, which takes up a lot of extra space, omit the -E gui from the command.)

Configuration (Optional)

In case you plan to use the script multiple times, it'll be useful to not have to type the Impartus creds and download location everytime. You can store these for future use.

  1. Duplicate the sample_config.json file and rename it to imp_config.json.
  2. Enter your Impartus credentials (Username, Password) in the fields. Required to be able to access Impartus.
  3. Specify the Download Location for the lectures in the save_fold field. If you leave this blank, it will default to Impartus Lectures folder inside the download location of this script.

IMP: For Windows, you need to use double \\ in paths. Eg: C:\\Users\\Student\\Downloads\\Lectures or D:\\Impartus.

Running (manual method)

  • GUI: Run poetry run python ilc_scrape.py and enter the details as asked (See Help section). For the first time, you will need to specify the full URL of the course. For future uses, you can select the course names from the dropdown.
  • Command line: Pass argument flags to the script. Eg: poetry run python ilc_scrape.py -n "operating systems" -r 14:16,25 -f will force download lecture numbers 14, 15 and 25 of OS lectures. See python ilc_scrape.py -h to see all available options. If you have the GUI installed, but still wanna use the command line, add --ignore-gooey to the command at the end. This will disable the GUI.

How it works

  • The script simply exploits the fact that the Impartus API for retrieving the lecture info for a course does not check if the user is subscribed to it. Then its a simple task to loop over the lectures and download their video streams.

  • All the videos are available as HLS Streams, which are downloaded using ffmpeg.

  • The fuzzy search for matching the course names works using difflib module. For the first time when the course URL is specified, it is added to a local database, along with its full name. These full names are then compared against the given query and the most similar full name is chosen.

  • The GUI is made using Gooey, which makes it dead simple to convert CLI Python apps to full featured GUIs.

Help

Fixing URL unrecognized errors

  • For the course URL, you should specify a url of the course: http://172.16.3.20/ilc/#/course/12345/678. It will give an error if you write the url of a specific lecture.

  • If you are outside campus premises, you should replace 172.16.3.20 with either a.impartus.com or bitshyd.impartus.com, with everything else being the same.

  • The url will be saved, and you can use dropdown in the future.

Specifying Ranges

The range pattern follows the format of Python Slices.

12 (Only 12 will be downloaded),
1:4 (1 included, 4 excluded),
:10 (Download lecture numbers 1 to 9),
3: (Download all lectures from number 3 onwards). 

You can also specify multiple ranges using commas. 12, 4:6, 15:, :2 will download lectures 1, 4, 5, 12, 15, 16, 17, ... until the latest one.

Contributing

See CONTRIBUTING.md.

ilc-scraper's People

Contributors

iamkroot avatar krishvk avatar mvoggu avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.