lecture-hoarder's People

Contributors

csnewman, dependabot[bot], ed-cooper

Forkers

csnewman

lecture-hoarder's Issues

Check file access permissions

Currently we assume we can read from folders, create new files, etc.

If this is not the case, an exception occurs with a traceback displayed to the user:

Traceback (most recent call last):
  File "run.py", line 199, in <module>
    os.makedirs(course_dir, exist_ok=True)
  File "/usr/lib/python3.7/os.py", line 211, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.7/os.py", line 211, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.7/os.py", line 221, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/media/edward/Mass Storage'

Although the error clearly identifies the problem, we should aim to make the message more user-friendly by performing proper checks.
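A minimal sketch of such a check, assuming the call site is the os.makedirs(course_dir, exist_ok=True) line from the traceback. The helper name ensure_writable_dir and the message wording are hypothetical. It returns a readable message instead of letting the PermissionError propagate, and also checks that files can be created inside a folder that already exists.

```python
import os
import tempfile  # only used by the demo below
from typing import Optional


def ensure_writable_dir(path: str) -> Optional[str]:
    """Create `path` if needed. Return a user-friendly error message, or None on success."""
    try:
        os.makedirs(path, exist_ok=True)
    except PermissionError:
        # Must come before OSError, since PermissionError is a subclass of it
        return (f"Cannot create folder '{path}': permission denied. "
                "Check that you can write there, or choose a different download location.")
    except OSError as exc:
        # e.g. FileExistsError when a plain file already occupies the path
        return f"Cannot create folder '{path}': {exc.strerror}"
    # The folder exists, but we also need permission to create files inside it
    if not os.access(path, os.W_OK | os.X_OK):
        return f"Cannot write to folder '{path}': permission denied."
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(ensure_writable_dir(os.path.join(tmp, "COMP10120", "Lectures")))
```

run.py could then print the message and exit cleanly (for example with sys.exit(message)) instead of showing a traceback.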

Check for duplicate but out of order podcasts

Occasionally, lecturers add podcasts that belong before ones already available, which changes the order of the podcasts.

Additionally, podcasts may be deleted.

Currently, duplicate detection requires an exact name match, so in either situation we download every subsequent podcast again, and multiple podcasts with the same number appear.
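One possible approach, sketched under the assumption that each podcast's podcast_link (one of the keys in the current download dictionary) stays the same when the list is reordered: record the links already downloaded and skip those, regardless of name or position. The function name new_podcasts and the idea of a stored set of links are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def new_podcasts(available: Iterable[Dict[str, str]],
                 downloaded_links: Set[str]) -> List[Dict[str, str]]:
    """Return podcasts whose link has not been downloaded before, keeping page order.

    Matching on the link rather than the numbered name means that inserting or
    deleting an earlier podcast does not make later ones look new.
    """
    return [p for p in available if p["podcast_link"] not in downloaded_links]
```

The set of downloaded links would need to be persisted between runs (for example in a small file next to the downloads). Numbering of a newly inserted podcast is a separate question this does not answer.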

Add use at own risk warning

This project uses an unstable interface to the podcast service's servers, which hasn't been formally approved.

I therefore feel there should be a disclaimer in the README, which is also displayed each time the program is run.

Filter podcast names

Currently, we filter the names of courses to remove illegal characters, e.g. COMP10120 - First Year Team Project 2018/19 becomes COMP10120 - First Year Team Project 201819.

The same also needs to happen for podcast names (which come from podcast_li.a.string).
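A sketch of a shared filter that could be applied to both course and podcast names. The exact character set used by the existing course filter isn't shown here, so this one is an assumption: the characters Windows forbids in file names (which include "/", the only one Linux forbids) plus control characters.

```python
import re

# Assumed set: characters invalid in Windows file names, plus ASCII control characters
_ILLEGAL_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def filter_name(name: str) -> str:
    """Remove characters that cannot appear in a file or folder name."""
    return _ILLEGAL_CHARS.sub("", name).strip()
```

For example, filter_name(podcast_li.a.string) would give the podcast name the same treatment as the course name.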

Validate every usage of BeautifulSoup in UomPodcastProvider

Every time the .find method (or an equivalent) is used, we should validate that the HTML element was actually found, and raise a PodcastProviderError otherwise.

Currently these failures are mostly unhandled and surface as confusing, unrelated exceptions (for example, an AttributeError from accessing an attribute of None).
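BeautifulSoup's find returns None when nothing matches, so one option is a small wrapper that turns that None into a descriptive error. This is a sketch: find_or_raise is a hypothetical helper, and the PodcastProviderError class below is only a stand-in for the project's existing one.

```python
from typing import Any


class PodcastProviderError(Exception):
    """Stand-in for the project's existing PodcastProviderError."""


def find_or_raise(parent: Any, *args: Any, **kwargs: Any) -> Any:
    """Call parent.find(*args, **kwargs) and raise PodcastProviderError if nothing matched."""
    result = parent.find(*args, **kwargs)
    if result is None:
        raise PodcastProviderError(
            f"Expected HTML element not found: find(*{args!r}, **{kwargs!r})")
    return result
```

A call such as soup.find("form") would become find_or_raise(soup, "form"), so a changed page layout produces a clear PodcastProviderError rather than an AttributeError further down.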

Add course filtering

Allow users to choose which modules/courses they want to download.

This should probably be a config setting, possibly with a CLI argument to override it.
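A sketch of the selection logic, assuming courses are chosen by course-code prefix (e.g. "COMP10120") and that an unset or empty selection means "download everything". The function and parameter names are hypothetical; how the config and CLI values are read is left out.

```python
from typing import List, Optional, Sequence


def select_courses(available: Sequence[str],
                   configured: Optional[Sequence[str]] = None,
                   cli_override: Optional[Sequence[str]] = None) -> List[str]:
    """Pick which courses to download.

    A CLI override takes precedence over the config setting. If neither is set
    (or the chosen one is empty), every available course is kept.
    """
    wanted = cli_override if cli_override is not None else configured
    if not wanted:
        return list(available)
    # Match on prefix so users only need to give the course code
    return [course for course in available
            if any(course.startswith(code) for code in wanted)]
```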

Recommend setup by venv

Packages have become outdated over time.

Update the README to recommend setting up with a venv, so that the older packages are isolated from the main Python installation.

Abstract web requests

All web request logic is currently handled by __main__.py.

This clutters the file, making it hard to understand and maintain.

A new interface should be created for handling web requests.

In addition, it should be generic, so that dependency injection can be used to help with #3.
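A minimal sketch of what such an interface might look like. WebClient, UrllibWebClient, FakeWebClient and fetch_pages are hypothetical names; the real implementation would probably wrap the existing HTTP session (which also has to handle login cookies) rather than plain urllib. The point is that consumers depend only on the interface, so a fake can be injected in tests.

```python
import urllib.request
from typing import Dict, List, Protocol


class WebClient(Protocol):
    """Everything that needs the network goes through this interface."""

    def get_text(self, url: str) -> str: ...


class UrllibWebClient:
    """Real implementation using only the standard library."""

    def get_text(self, url: str) -> str:
        with urllib.request.urlopen(url, timeout=30) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset)


class FakeWebClient:
    """Test double that serves canned pages and records what was requested."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self.pages = pages
        self.requested: List[str] = []

    def get_text(self, url: str) -> str:
        self.requested.append(url)
        return self.pages[url]


def fetch_pages(client: WebClient, urls: List[str]) -> Dict[str, str]:
    """Example consumer: works with any WebClient, real or fake."""
    return {url: client.get_text(url) for url in urls}
```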

Change get_podcast_downloader return type

This return type is the only thing preventing the PodcastProvider interface from being entirely independent of the web.

The return type will need to support asynchronous downloading and contain the total download size (as given by int(http_download_response.headers['Content-Length'])).

This will probably require the creation of a new class for the return type.
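One possible shape for that class, sketched with asyncio: it carries the total size and a factory for an asynchronous stream of byte chunks, so callers can write data and report progress without knowing where the bytes come from. PodcastDownload, open_stream and save are hypothetical names, and using an async iterator is one choice among several (a thread-based design would also work).

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Callable, Optional


@dataclass
class PodcastDownload:
    """Hypothetical return type for get_podcast_downloader."""

    total_size: int  # bytes, e.g. taken from the Content-Length header
    open_stream: Callable[[], AsyncIterator[bytes]]  # yields the file in chunks

    async def save(self, write: Callable[[bytes], None],
                   on_progress: Optional[Callable[[int, int], None]] = None) -> int:
        """Consume the stream, passing each chunk to `write`; return the bytes received."""
        received = 0
        async for chunk in self.open_stream():
            write(chunk)
            received += len(chunk)
            if on_progress is not None:
                on_progress(received, self.total_size)
        return received
```

Nothing here refers to HTTP, so a provider backed by local files or a test fixture could return the same type.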

Download automatic subtitles

Podcasts now have subtitle files with automatic captions available for download.

Supporting this would be useful.

Only download podcasts from the current year

Currently we download all available podcasts, but typically users only want podcasts from the current year.

We now extract the course series and use it for categorisation (see #19), which can be used to bootstrap this feature.

A setting should still allow all podcasts to be downloaded.
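A sketch under two loud assumptions: that the extracted series looks like "2018/19" (as in the course name example above), and that the academic year starts in September. Both current_series and the "series" key are hypothetical; the download_all flag stands in for the proposed setting.

```python
import datetime
from typing import Dict, List


def current_series(today: datetime.date) -> str:
    """Academic-year label such as "2018/19", assuming the year starts in September."""
    start = today.year if today.month >= 9 else today.year - 1
    return f"{start}/{(start + 1) % 100:02d}"


def filter_current_year(podcasts: List[Dict[str, str]], today: datetime.date,
                        download_all: bool = False) -> List[Dict[str, str]]:
    """Keep only podcasts from the current series, unless the user wants everything."""
    if download_all:
        return list(podcasts)
    series = current_series(today)
    return [p for p in podcasts if p.get("series") == series]
```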

Abstract into model

Currently we have an undocumented dictionary format for podcast downloads, containing the following properties:

  • name
  • podcast_link
  • download_path
  • status
  • error
  • progress
  • total_size
  • completion_time

For future development, we should create a dedicated class for podcasts and make status an enum type.

We should also think about breaking up functionality: ideally, run.py should only care about the general data flow through the program, rather than implementation details such as output formatting, extraction of page data, etc.
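A sketch of the dedicated class, using the property names from the list above. Only the "error" status appears in the current code; the other enum members, the field types and the fail helper are assumptions for illustration.

```python
import enum
from dataclasses import dataclass
from typing import Optional


class DownloadStatus(enum.Enum):
    # Hypothetical states; only "error" is confirmed by the current code
    PENDING = "pending"
    DOWNLOADING = "downloading"
    DONE = "done"
    ERROR = "error"


@dataclass
class Podcast:
    name: str
    podcast_link: str
    download_path: str
    status: DownloadStatus = DownloadStatus.PENDING
    error: Optional[str] = None
    progress: int = 0                        # assumed: bytes downloaded so far
    total_size: Optional[int] = None         # bytes, once known
    completion_time: Optional[float] = None  # time.time() when finished

    def fail(self, message: str, when: float) -> None:
        """Record a failure in one place instead of setting three keys by hand."""
        self.status = DownloadStatus.ERROR
        self.error = message
        self.completion_time = when
```

Typos in status values (such as "eror") then fail loudly instead of silently creating a new state.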

Runtime login

Storing raw passwords on disk is a significant risk. Instead, the program should ask for the username and password on each run.
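A minimal sketch using the standard library's getpass, which reads the password without echoing it to the terminal. The function name is hypothetical; the input functions are parameters so the prompt can be tested without a terminal.

```python
import getpass
from typing import Callable, Tuple


def prompt_credentials(read_line: Callable[[str], str] = input,
                       read_secret: Callable[[str], str] = getpass.getpass) -> Tuple[str, str]:
    """Ask for the username (echoed) and password (not echoed) at startup."""
    username = read_line("Username: ").strip()
    password = read_secret("Password: ")
    return username, password
```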

Deprecate login_service_url and video_service_base_url settings

Now that web handling has been abstracted, putting properties specific to UomPodcastProvider into the Profile no longer seems reasonable.

Additionally, the original reason for keeping them in the settings file (see #3) no longer applies.

Instead, they should be moved to attributes in the UomPodcastProvider class, where they can still be changed by an overriding class, if necessary.
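A sketch of the class-attribute approach. The class below is a simplified stand-in for UomPodcastProvider, the URLs are placeholders (the real values currently live in the settings file), and podcast_page_url and StagingPodcastProvider are hypothetical. A subclass overrides just the attribute it needs.

```python
class UomPodcastProvider:
    # Placeholder values, not the real service URLs
    login_service_url = "https://login.example.invalid/"
    video_service_base_url = "https://video.example.invalid/"

    def podcast_page_url(self, path: str) -> str:
        """Build a page URL from the (possibly overridden) base URL."""
        return self.video_service_base_url.rstrip("/") + "/" + path.lstrip("/")


class StagingPodcastProvider(UomPodcastProvider):
    """Hypothetical subclass pointing at a different video server."""

    video_service_base_url = "https://staging.example.invalid/"
```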

YAML Config

Python is not an appropriate config format; YAML (or a similar format) should be used instead.

The config file should also be generated automatically. It is also advisable to place the config inside the user's home directory, with read permission restricted to that user only.
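A sketch of generating the file with owner-only permissions. Creating it with os.open and mode 0o600 means it is never readable by others, even briefly; O_EXCL stops an existing config from being overwritten. The config location and the keys in DEFAULT_CONFIG are hypothetical, and on Windows the mode argument has little effect.

```python
import os
import tempfile  # only used by the demo below

# Hypothetical default contents
DEFAULT_CONFIG = """\
# lecture-hoarder settings
download_dir: ~/Lectures
courses: []
"""


def default_config_path() -> str:
    # Hypothetical location inside the user's home directory
    return os.path.join(os.path.expanduser("~"), ".config", "lecture-hoarder", "config.yaml")


def create_default_config(path: str) -> bool:
    """Write DEFAULT_CONFIG readable only by the owner. Return False if the file already exists."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    try:
        # O_EXCL: fail rather than overwrite; 0o600: owner read/write only
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(DEFAULT_CONFIG)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "config.yaml")
        print(create_default_config(path), oct(os.stat(path).st_mode & 0o777))
```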

Categorise lectures into years

The year for each lecture is given by the first digit in the course name (e.g. the 1 in COMP10120).

A migration handler, to move existing downloads into the new structure, would also be useful.
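The rule above, sketched as a small function (course_year is a hypothetical name). It returns None when a course name contains no digit, so the caller can decide where such courses go.

```python
import re
from typing import Optional


def course_year(course_name: str) -> Optional[int]:
    """Year of study, taken from the first digit in the course name (None if there is none)."""
    match = re.search(r"\d", course_name)
    return int(match.group()) if match else None
```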

Clipping for long podcast names

Long podcast names cause a single download to spread over multiple lines.

This corrupts the output when the program tries to overwrite the download status.

We should use the known terminal width to clip podcast names to a suitable length, and add an ellipsis to show that some text is hidden.
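A sketch of the clipping, using shutil.get_terminal_size (which falls back to 80 columns when output is not a terminal). The "name - status" layout of status_line is an assumption about the output format; one column is left spare so the cursor never wraps onto the next line.

```python
import shutil


def clip(text: str, width: int, ellipsis: str = "...") -> str:
    """Shorten text to at most `width` characters, marking the cut with an ellipsis."""
    if width <= 0:
        return ""
    if len(text) <= width:
        return text
    if width <= len(ellipsis):
        return ellipsis[:width]
    return text[: width - len(ellipsis)] + ellipsis


def status_line(name: str, status: str) -> str:
    """A status line that fits on one terminal row, so it can be overwritten in place."""
    columns = shutil.get_terminal_size().columns
    separator = " - "
    # Room left for the name after the status, the separator and one spare column
    available = columns - len(status) - len(separator) - 1
    return f"{clip(name, available)}{separator}{status}"
```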

Abstract file storage

Similar concept to the abstraction of web requests (#21)

This would allow the program to be tested without side effects, and could be useful for a dry run option.
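A sketch mirroring the web request abstraction: consumers receive a storage object, and a dry run swaps in one that only records writes. FileStorage, LocalStorage, DryRunStorage and save_podcast are hypothetical names, and the interface is deliberately minimal.

```python
import os
from typing import Dict, Protocol


class FileStorage(Protocol):
    def exists(self, path: str) -> bool: ...
    def write_bytes(self, path: str, data: bytes) -> None: ...


class LocalStorage:
    """Writes to the real file system."""

    def exists(self, path: str) -> bool:
        return os.path.exists(path)

    def write_bytes(self, path: str, data: bytes) -> None:
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)


class DryRunStorage:
    """Records what would be written without touching the disk."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def exists(self, path: str) -> bool:
        return path in self.files

    def write_bytes(self, path: str, data: bytes) -> None:
        self.files[path] = data


def save_podcast(storage: FileStorage, path: str, data: bytes) -> bool:
    """Example consumer: skip files that already exist."""
    if storage.exists(path):
        return False
    storage.write_bytes(path, data)
    return True
```

For a user-facing dry run, exists() would probably need to check the real disk, so that already-downloaded podcasts are still reported as skipped.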

Packaging

Releases should produce a .deb file that installs the program into /bin or /usr/bin.

Add proper command line option support

Initially, we should aim for the settings file to be specified with -s or --settings-file.

Future options could include displaying the license, a dry run, and a manual override for the settings file.
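A sketch with the standard library's argparse. The -s/--settings-file option matches the proposal above; --dry-run is included only to show how a future option from the list might be added, and the program name and help text are assumptions.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="lecture-hoarder",
                                     description="Download lecture podcasts.")
    parser.add_argument("-s", "--settings-file", metavar="PATH",
                        help="settings file to use instead of the default")
    # Hypothetical future option
    parser.add_argument("--dry-run", action="store_true",
                        help="show what would be downloaded without writing files")
    return parser.parse_args(argv)
```

argparse turns --settings-file into the attribute settings_file, which is None when the option is not given.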

Make settings file optional

The codebase now contains sensible default values for all settings.

The program should be able to run without any settings file.
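A sketch of loading settings as "defaults, overlaid by the file if it exists". The keys in DEFAULT_SETTINGS are placeholders for the real defaults in the codebase, and parse stands for whatever reads the file format (yaml.safe_load would fit if YAML is adopted; it returns None for an empty file, hence the "or {}").

```python
import copy
import os
from typing import Any, Callable, Dict, Optional

# Placeholder defaults; the real ones live in the codebase
DEFAULT_SETTINGS: Dict[str, Any] = {
    "download_dir": os.path.join(os.path.expanduser("~"), "Lectures"),
    "courses": [],
}


def load_settings(path: Optional[str],
                  parse: Callable[[str], Optional[Dict[str, Any]]]) -> Dict[str, Any]:
    """Start from the defaults and overlay the settings file, if there is one."""
    # Deep copy so callers can't accidentally mutate the shared defaults
    settings = copy.deepcopy(DEFAULT_SETTINGS)
    if path is not None and os.path.isfile(path):
        with open(path, encoding="utf-8") as f:
            settings.update(parse(f.read()) or {})
    return settings
```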

Errors sometimes not being reported correctly

When testing error reporting, I found that simulating an error often led to unexpected results.

Example code: (line 104)

    # Check status code valid
    if True:  # get_video_service_podcast_page.status_code != 200:
        podcast["completion_time"] = time.time()
        podcast["error"] = "Could not get podcast webpage for " + podcast["name"] + \
                           " - Service responded with status code" + get_video_service_podcast_page.status_code
        podcast["status"] = "error"
        return

All the real errors I have experienced so far have resulted in exceptions occurring, so I'm not too concerned about fixing this immediately.

In addition, any errors that occur can almost always be remedied by running the program again.
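One likely contributor is visible in the excerpt itself. Assuming get_video_service_podcast_page is a requests Response, its status_code is an int, so concatenating it to a str raises a TypeError inside the error handler, before status is set to "error". The message is also missing a space before the code. A sketch of a fixed version, pulled into a hypothetical helper, record_page_error:

```python
import time
from typing import Any, Dict


def record_page_error(podcast: Dict[str, Any], status_code: int) -> None:
    """Mark a podcast as failed because its webpage could not be fetched."""
    podcast["completion_time"] = time.time()
    # The f-string converts the int; str + int would raise TypeError here
    podcast["error"] = (f"Could not get podcast webpage for {podcast['name']}"
                        f" - Service responded with status code {status_code}")
    podcast["status"] = "error"
```

The original branch would then call record_page_error(podcast, get_video_service_podcast_page.status_code) and return.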
