hhousen / lecture2notes

Convert lecture videos to notes using AI & machine learning. Code for the research titled "Lecture2Notes: Summarizing Lecture Videos by Classifying Slides and Analyzing Text using Machine Learning."

Home Page: https://lecture2notes.com/

License: GNU Affero General Public License v3.0

lecture2notes's Introduction

Lecture2Notes

Convert lecture videos to notes using AI & machine learning.

Website / Research Paper / Documentation / Project Page

Lecture2Notes is a project that summarizes lecture videos. At a high level, it parses both the visual and auditory components of the video, extracts text from each, combines them, and then summarizes the combined text using automatic summarization algorithms. These pages document the code for the entirety of "Lecture2Notes: Summarizing Lecture Videos by Classifying Slides and Analyzing Text using Machine Learning."

Check out the documentation for usage details. To get started summarizing text, visit the tutorial.

Visit lecture2notes.com to convert your own lectures to notes!

Abstract

Note-taking is a universal activity among students because of its benefits to the learning process. This research focuses on end-to-end generation of formatted summaries of lecture videos. Our automated multimodal approach will decrease the time required to create notes, increase quiz scores and content knowledge, and enable faster learning through enhanced previewing. The project is broken into three main components: the slide classifier, summarization models, and end-to-end process. The system begins by extracting important keyframes using the slide classifier, a deep CNN. Then, unique slides are determined using a combination of clustering and keypoint matching. The structure of these unique slides is analyzed and converted to a formatted transcript that includes figures present on the slides. The audio is transcribed using one of several methods. We approach the process of combining and summarizing these transcripts in several ways, including as keyword-based sentence extraction and temporal audio-slide-transcript association problems. For the summarization stage, we created TransformerSum, a summarization training and inference library that advances the state-of-the-art in long and resource-limited summarization, but other state-of-the-art models, such as BART or PEGASUS, can be used as well. Extractive and abstractive approaches are used in conjunction to summarize the long-form content extracted from the lectures. While the end-to-end process and each individual component yield promising results, key areas of weakness include the speech-to-text algorithm failing to identify certain words and some summarization methods producing sub-par summaries. These areas provide opportunities for further research.

Details

The project is broken into four main components: the slide classifier (including the dataset), the summarization models (neural, non-neural, extractive, and abstractive), the end-to-end process (one command to convert a video to notes), and the website that enables users to process their own videos.

Process:

  1. Extract frames from video file

  2. Classify extracted frames to find frames containing slides

  3. Perspective crop images containing the presenter and slide to contain only the slide by matching temporal features

  4. Cluster slides to group transitions and remove duplicates

  5. Run a Slide Structure Analysis (SSA) using OCR on the slide frames to obtain a formatted transcript of the text on the slides

  6. Detect and extract figures from the set of unique slide frames

  7. Transcribe the lecture using a speech-to-text algorithm

  8. Summarize the visual and auditory transcripts

    1. Combine
    2. Run some modifications (such as only using complete sentences)
    3. Extractive summarization
    4. Abstractive summarization
  9. Convert intermediate outputs to a final notes file (HTML, TXT, markdown, etc.)

The summarization steps can be toggled off and on (see Combination and Summarization).
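As a rough illustration of how the sub-steps of step 8 could be toggled, here is a minimal sketch using only the Python standard library. It is not the project's implementation: `SummarizationConfig`, `summarize`, and the word-frequency scorer are hypothetical names and a stand-in technique chosen for brevity, and the abstractive stage is simply a callable supplied by the caller (for example, a wrapper around a BART or PEGASUS model).

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SummarizationConfig:
    # Hypothetical toggles for illustration; the real project defines its own options.
    only_complete_sentences: bool = True
    extractive: bool = True
    abstractive: bool = False
    num_sentences: int = 2


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    return re.findall(r"[a-z']+", sentence.lower())


def extractive_summary(sentences: List[str], k: int) -> List[str]:
    """Keep the k sentences with the highest summed word frequency, in original order."""
    freq = Counter(word for s in sentences for word in tokenize(s))
    scores = [sum(freq[word] for word in tokenize(s)) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


def summarize(
    slide_text: str,
    audio_text: str,
    config: SummarizationConfig,
    abstractive_fn: Optional[Callable[[str], str]] = None,
) -> str:
    # 8.1 Combine the visual and auditory transcripts.
    sentences = split_sentences(slide_text + " " + audio_text)
    # 8.2 Modifications, e.g. keep only sentences with terminal punctuation.
    if config.only_complete_sentences:
        sentences = [s for s in sentences if s[-1] in ".!?"]
    # 8.3 Extractive summarization (simple frequency scoring as a stand-in).
    if config.extractive:
        sentences = extractive_summary(sentences, config.num_sentences)
    text = " ".join(sentences)
    # 8.4 Abstractive summarization, delegated to a caller-supplied function.
    if config.abstractive and abstractive_fn is not None:
        text = abstractive_fn(text)
    return text
```

Each stage is guarded by its own flag, so turning a flag off passes the text through unchanged. Note that the frequency scorer favors long sentences; it is only meant to show where an extractive model would plug in.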

Meta

Hayden Housen – haydenhousen.com

Distributed under the GNU Affero General Public License v3.0 (AGPL). See the LICENSE for more information.

https://github.com/HHousen

Contributing

  1. Fork it (https://github.com/HHousen/lecture2notes/fork)
  2. Create your feature branch (git checkout -b feature/fooBar)
  3. Commit your changes (git commit -am 'Add some fooBar')
  4. Push to the branch (git push origin feature/fooBar)
  5. Create a new Pull Request

lecture2notes's Issues

How to solve this problem

File "main.py", line 6, in <module>
    from ..models.slide_classifier.custom_nnmodules import *  # noqa: F403,F401
ImportError: attempted relative import with no known parent package
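
Python raises this error when a file that contains relative imports is executed directly as a script, because the interpreter then has no parent package to resolve `..` against. Running the same file as a module with `python -m` from the directory above the top-level package gives it that context. The sketch below reproduces both behaviors in a throwaway package; `demo_pkg`, `app`, `models`, and `helpers` are hypothetical names, and whether the same fix applies to this repository depends on its actual layout and on the command that was run, neither of which the issue states.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def run_both_ways() -> Tuple[str, str]:
    """Return (stderr of the direct script run, stdout of the `-m` run)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical layout: demo_pkg/models/helpers.py and demo_pkg/app/main.py
        for package in ("demo_pkg", "demo_pkg/models", "demo_pkg/app"):
            (root / package).mkdir()
            (root / package / "__init__.py").write_text("")
        (root / "demo_pkg/models/helpers.py").write_text("VALUE = 'imported'\n")
        (root / "demo_pkg/app/main.py").write_text(
            "from ..models.helpers import VALUE\nprint(VALUE)\n"
        )

        # Direct execution: main.py has no parent package, so the import fails.
        as_script = subprocess.run(
            [sys.executable, "main.py"],
            cwd=root / "demo_pkg" / "app",
            capture_output=True,
            text=True,
        )
        # Module execution from the directory above the package: the import resolves.
        as_module = subprocess.run(
            [sys.executable, "-m", "demo_pkg.app.main"],
            cwd=root,
            capture_output=True,
            text=True,
        )
        return as_script.stderr, as_module.stdout


if __name__ == "__main__":
    script_err, module_out = run_both_ways()
    print(script_err.strip().splitlines()[-1])
    print(module_out.strip())
```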
