waldo-vision / models

10.0 5.0 4.0 681 KB

Repository for model development and training

Home Page: https://waldo.vision

License: Mozilla Public License 2.0

Shell 1.79% Python 98.21%

artificial-intelligence machine-learning masked-autoencoder vision-transformer

models's People

joe-thebro bensikrac tailen swastikom

models's Issues

Clip Segmentation and Trimming

Description:

Develop code that takes a gameplay video as input, segments it into smaller clips of no longer than 30 seconds, and trims irrelevant sections such as menus, intros, and outros. The code should be designed in a modular fashion to accommodate game-specific features, allowing it to work with various games.

Requirements:

Ability to input a gameplay video file in common formats (e.g., MP4, AVI, MOV, etc.).
Segment input video into smaller clips with a maximum length of 30 seconds each.
Detect and trim irrelevant sections of the video such as menus, intros, and outros.
Design code in a modular fashion to accommodate game-specific features.
Include a configuration file or similar mechanism to easily adapt the code for different games.
The output should be a set of video files, each containing a segmented and trimmed clip from the original video.
Provide clear documentation on how to use and adapt the code for various games.

Acceptance Criteria:

Successfully input a gameplay video file in at least one of the common video formats (e.g., MP4, AVI, MOV, etc.).
Automatically segment input video into smaller clips with a maximum length of 30 seconds.
Detect and trim irrelevant sections of the video such as menus, intros, and outros, with at least 80% accuracy.
Modular code design that allows for the easy addition or modification of game-specific features.
Configuration file or similar mechanism included, allowing users to adapt the code for different games without modifying the core code.
Output a set of video files, each containing a segmented and trimmed clip from the original video.
Clear documentation provided on how to use and adapt the code for various games.

Notes:

Since game-specific features may vary, it is suggested to create a basic solution first and then incrementally add support for different games as needed.
Consider using machine learning techniques, such as computer vision or deep learning, to detect and trim irrelevant sections with higher accuracy.
For better compatibility, consider using open-source libraries and tools for video processing, such as OpenCV, FFmpeg, or similar.
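
As a starting point for the segmentation step, the clip boundaries can be planned separately from the actual video cutting. The sketch below assumes a game-specific detector (not shown, and not part of this issue) has already produced a list of time spans to skip, such as menus or intros; `plan_clips` and its parameters are hypothetical names. It returns the spans to keep, each at most 30 seconds long, which could then be handed to a tool such as FFmpeg or OpenCV for cutting.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start_seconds, end_seconds)


def plan_clips(duration: float, skip: List[Span], max_len: float = 30.0) -> List[Span]:
    """Return spans covering the video minus `skip`, each at most `max_len` long."""
    # Walk the sorted skip spans and collect the gaps between them.
    kept: List[Span] = []
    cursor = 0.0
    for start, end in sorted(skip):
        # Clamp each skip span to the bounds of the video.
        start = min(max(start, 0.0), duration)
        end = min(max(end, 0.0), duration)
        if start > cursor:
            kept.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        kept.append((cursor, duration))

    # Cut every kept range into pieces no longer than max_len.
    clips: List[Span] = []
    for start, end in kept:
        while end - start > max_len:
            clips.append((start, start + max_len))
            start += max_len
        if end > start:
            clips.append((start, end))
    return clips
```

Keeping the planning step free of any video library makes it easy to unit-test and lets each game supply its own skip-span detector through configuration.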

GitHub Action for pylint is broken

The current GitHub Action for running pylint is broken, mainly because setting up conda on a runner is proving difficult.
We don't need the entire conda environment right now, so we should remove the conda setup steps and go back to installing dependencies with pip.

Script for Getting Links from Database

Description:

We need code that downloads sets of gameplay video URLs submitted by users of waldo.vision and stored in our SQL database. The code should ensure the URLs are valid, have been reviewed by users 25 or more times, and have a 90% or higher positive rating. The code must also prevent downloading duplicate links. The analysis team will need to coordinate with the infrastructure team to identify the best way to download these URLs.

Requirements:

Coordinate with the infrastructure team to establish the best method for accessing the SQL database.
Query the SQL database to retrieve URLs of gameplay videos that meet the following criteria:

Have been reviewed by users 25 or more times
Have a 90% or higher positive rating

Validate the retrieved URLs to ensure they are valid gameplay video URLs.
Check for and avoid downloading duplicate links.
Handle download errors, such as network issues or invalid URLs, gracefully.
Provide clear documentation on how to use the code and specify the download directory and format.

Acceptance Criteria:

Successful coordination with the infrastructure team to establish the best method for accessing the SQL database.
Successfully query the SQL database to retrieve URLs of gameplay videos that meet the specified criteria (reviewed 25+ times and 90%+ positive rating).
Validate the retrieved URLs to ensure they are valid gameplay video URLs with at least 95% accuracy.
Check for and avoid downloading duplicate links.
Save downloaded links to a specified file.
Handle download errors, such as network issues or invalid URLs, gracefully, without crashing the program.
Clear documentation provided on how to use the code and specify the download file and format.

Notes:

Consider using an ORM (Object-Relational Mapper) library, such as SQLAlchemy, to interact with the SQL database in a more pythonic and maintainable way.
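
The selection criteria above can be sketched as a single query. The example below uses Python's built-in `sqlite3` module with an in-memory database instead of SQLAlchemy, purely so it runs on its own. The table name `submissions` and the columns `url`, `review_count`, and `positive_count` are assumptions, since the real schema has to be agreed with the infrastructure team.

```python
import sqlite3

# Hypothetical schema: submissions(url, review_count, positive_count).
# positive_count * 10 >= review_count * 9 is the 90% threshold in integer math.
QUERY = """
SELECT DISTINCT url
FROM submissions
WHERE review_count >= 25
  AND positive_count * 10 >= review_count * 9
ORDER BY url
"""


def fetch_eligible_urls(conn: sqlite3.Connection) -> list:
    """Return unique URLs reviewed 25+ times with a 90%+ positive rating."""
    return [row[0] for row in conn.execute(QUERY)]


def demo() -> list:
    """Build a throwaway in-memory database and run the query against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE submissions (url TEXT, review_count INTEGER, positive_count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO submissions VALUES (?, ?, ?)",
        [
            ("https://example.com/a", 30, 28),  # eligible
            ("https://example.com/b", 30, 20),  # rating too low
            ("https://example.com/c", 10, 10),  # too few reviews
            ("https://example.com/a", 30, 28),  # duplicate of the first row
            ("https://example.com/d", 25, 23),  # eligible (92%)
        ],
    )
    try:
        return fetch_eligible_urls(conn)
    finally:
        conn.close()
```

`SELECT DISTINCT` removes duplicates at the database level; a second check against the file of already saved links would still be needed to avoid repeating work across runs.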

Add better progress tracking on link_retrieval.py

Currently, link retrieval gives output like this:

Requesting page 18
Requesting page 19
Requesting page 20
Requesting page 21
Requesting page 22
Requesting page 23
Requesting page 24
Requesting page 25
Requesting page 26
Requesting page 27
Requesting page 28
Requesting page 29
Requesting page 30

Since we know the total number of pages, a better solution would be to show a progress bar using tqdm or a similar library.

This would let the developer see roughly how long the download will take and would make the experience better.
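
A minimal sketch of what this could look like is shown below. It does not reflect the actual structure of link_retrieval.py: `fetch_all_pages` and the `fetch_page` callback are hypothetical stand-ins for whatever the script uses to request one page. The fallback keeps the sketch runnable when tqdm is not installed.

```python
try:
    from tqdm import tqdm
except ImportError:
    # Fallback so the sketch still runs without tqdm: no bar, same iteration.
    def tqdm(iterable, **kwargs):
        return iterable


def fetch_all_pages(fetch_page, total_pages):
    """Call fetch_page(n) for n in 1..total_pages and collect the results."""
    results = []
    # Wrapping the range lets tqdm show progress, rate, and estimated time left.
    for page in tqdm(range(1, total_pages + 1), desc="Requesting pages", unit="page"):
        results.extend(fetch_page(page))
    return results
```

Because the total is known up front, tqdm can derive the percentage and remaining time on its own, replacing the per-page "Requesting page N" lines.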

Set Up Automatic Code Linting

Set up a GitHub Action to automatically perform Python code linting with pylint.
Ensure existing codebase is compliant with pylint.
Update documentation to inform developers about code standards.

Code to convert a Clip into Cropped Frames

Description:

We need a script that takes a short video as input, processes it, and outputs a series of cropped images that are frames of the input video. The script should be easy to use and well-documented, so other team members can understand and extend it if necessary.

Requirements:

The script should accept a video file as input (in common formats like .mp4, .avi, .mov, etc.).
The user should be able to specify the cropping dimensions (width and height) and optionally the position (x and y coordinates) of the cropped area.
The script should convert the video into a series of cropped images that are frames of the input video, preserving the original frame rate.
The script should save the cropped images to a specified output directory.
The script should be able to handle videos of varying lengths and resolutions.
The script should be implemented using a popular and well-supported programming language (e.g., Python) and libraries (e.g., OpenCV).

Acceptance Criteria:

The script should successfully process a video file and output a series of cropped images as specified by the user.
The script should be well-documented and easy for other team members to understand.
The script should be tested with various video formats, resolutions, and lengths to ensure compatibility and robustness.

Additional Context:

This script will be used as part of a larger pipeline for video processing and analysis. It is crucial that the script is efficient and reliable, as it may be used on large datasets with multiple videos.
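
One possible shape for such a script is sketched below, assuming OpenCV (`cv2`) is installed. The function names, the PNG output format, the file naming scheme, and the choice to center the crop when no position is given are all assumptions rather than decisions made in this issue. Writing one image per decoded frame is how the sketch interprets "preserving the original frame rate".

```python
from pathlib import Path
from typing import Optional, Tuple


def crop_box(frame_w: int, frame_h: int, crop_w: int, crop_h: int,
             x: Optional[int] = None, y: Optional[int] = None) -> Tuple[int, int, int, int]:
    """Return (x0, y0, x1, y1); the crop is centered when x or y is omitted."""
    if crop_w <= 0 or crop_h <= 0 or crop_w > frame_w or crop_h > frame_h:
        raise ValueError("crop size must be positive and fit inside the frame")
    x0 = (frame_w - crop_w) // 2 if x is None else x
    y0 = (frame_h - crop_h) // 2 if y is None else y
    if x0 < 0 or y0 < 0 or x0 + crop_w > frame_w or y0 + crop_h > frame_h:
        raise ValueError("crop region falls outside the frame")
    return x0, y0, x0 + crop_w, y0 + crop_h


def video_to_cropped_frames(video_path, out_dir, crop_w, crop_h, x=None, y=None) -> int:
    """Write every frame of the video as a cropped PNG; return the frame count."""
    import cv2  # imported lazily so crop_box can be tested without OpenCV

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    if not cap.isOpened():
        raise IOError(f"could not open {video_path}")
    count = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of stream or decode failure
                break
            height, width = frame.shape[:2]
            x0, y0, x1, y1 = crop_box(width, height, crop_w, crop_h, x, y)
            cv2.imwrite(str(out / f"frame_{count:06d}.png"), frame[y0:y1, x0:x1])
            count += 1
    finally:
        cap.release()
    return count
```

Frames are processed one at a time, so memory use stays constant regardless of video length.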

Script for Downloading Submissions from URLs

Description:

We need to develop new code, or adapt existing code, that takes an input list of YouTube video URLs and downloads the videos to a specified directory. This will be run on a Linux system.

Requirements:

Ability to input a list of YouTube video URLs (e.g., via a text file or command line arguments).
Validate input URLs to ensure they are valid YouTube video URLs.
Download each video in a specified format (e.g., MP4, WebM, etc.).
Save downloaded videos to a specified directory.
Provide progress updates during the download process (e.g., percentage completed, estimated time remaining, etc.).
Handle download errors, such as network issues or invalid URLs, gracefully.
Provide clear documentation on how to use the code and specify the download directory and format.

Acceptance Criteria:

Successfully input a list of YouTube video URLs (e.g., via a text file or command line arguments).
Validate input URLs to ensure they are valid YouTube video URLs with at least 95% accuracy.
Download each video in the specified format (e.g., MP4, WebM, etc.).
Save downloaded videos to the specified directory.
Provide progress updates during the download process, including percentage completed and estimated time remaining.
Handle download errors, such as network issues or invalid URLs, gracefully, without crashing the program.
Clear documentation provided on how to use the code and specify the download directory and format.

Notes:

For better compatibility and to comply with YouTube's terms of service, consider using open-source libraries and tools specifically designed for this purpose, such as youtube-dl, pytube, or similar.
This issue is partially blocked by #1 because we don't know yet the format in which links will be stored locally.
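
The URL validation requirement can be prototyped independently of the download library. The sketch below uses only the standard library and accepts two common URL shapes (`youtube.com/watch?v=...` and `youtu.be/...`). The accepted host list and the 11-character ID pattern are assumptions based on how YouTube links commonly look, not a documented guarantee, and the function names are hypothetical.

```python
import re
from typing import Iterable, List, Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from [A-Za-z0-9_-].
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")
_WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID if the URL looks like a YouTube video link, else None."""
    parts = urlparse(url.strip())
    if parts.scheme not in ("http", "https"):
        return None
    host = (parts.hostname or "").lower()
    if host == "youtu.be":
        candidate = parts.path.lstrip("/")
    elif host in _WATCH_HOSTS and parts.path == "/watch":
        candidate = parse_qs(parts.query).get("v", [""])[0]
    else:
        return None
    return candidate if _VIDEO_ID.fullmatch(candidate) else None


def unique_video_ids(urls: Iterable[str]) -> List[str]:
    """Drop invalid URLs and duplicates, keeping first-seen order."""
    seen = set()
    ordered = []
    for url in urls:
        video_id = extract_video_id(url)
        if video_id is not None and video_id not in seen:
            seen.add(video_id)
            ordered.append(video_id)
    return ordered
```

Deduplicating on the extracted ID rather than the raw string catches the same video submitted under different URL forms. A syntactic check like this cannot tell whether the video still exists, so download errors still need to be handled separately.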

Pre-Train General FPS Model

Description:

Train the implemented video masked autoencoder model on a dataset of general gameplay clips from first-person shooter (FPS) games.

Requirements:

Split the dataset into training, validation, and testing sets.
Train the video masked autoencoder model on the training set and validate its performance on the validation set.
Evaluate the final model's performance using appropriate metrics (e.g., reconstruction error, SSIM, etc.) on the testing set.

Acceptance Criteria:

Prepare a dataset of general FPS gameplay clips from the Waldo Vision database.
Preprocess the dataset appropriately, including resizing, normalization, and data augmentation if necessary.
Split the dataset into training, validation, and testing sets.
Train the video masked autoencoder model on the training set, achieving satisfactory performance as indicated by appropriate metrics.
Monitor training progress and adjust hyperparameters as needed to optimize model performance.
Evaluate the final model's performance using appropriate metrics (e.g., reconstruction error, SSIM, etc.) on the testing set.
Provide clear instructions on how to reproduce the training process and any customization options.

Notes:

Monitor training progress and be prepared to adjust hyperparameters, such as learning rate, batch size, or other factors, to optimize model performance.

Blocked by #5
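
The split step can be sketched as a deterministic shuffle over clip identifiers. The 80/10/10 proportions, the fixed seed, and the name `split_dataset` below are assumptions chosen for illustration; the issue does not prescribe them.

```python
import random
from typing import Iterable, List, Tuple


def split_dataset(clip_ids: Iterable[str], val_frac: float = 0.1,
                  test_frac: float = 0.1, seed: int = 0
                  ) -> Tuple[List[str], List[str], List[str]]:
    """Return (train, val, test) lists that are disjoint and reproducible."""
    # Sort first so the result does not depend on the input order.
    ids = sorted(set(clip_ids))
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_frac)
    n_val = round(len(ids) * val_frac)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test
```

If several clips are cut from the same source video, it may be safer to pass source video identifiers instead of clip identifiers, so that near-identical footage does not end up in both the training and testing sets.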

Implement VideoMAE2

Description:

Implement the VideoMAE2 model on a dataset of videogame gameplay clips.
See the following papers:

Code for VideoMAE2 was recently released

Requirements:

Test the video masked autoencoder model on a small test dataset.
Evaluate the model's performance using appropriate metrics (e.g., reconstruction error, SSIM, etc.).
Provide proper attribution.
Provide clear documentation on how to use the model, including training, evaluation, and any customization options.

Acceptance Criteria:

Successfully implement VideoMAE2 into our codebase.
Test the video masked autoencoder model on the prepared dataset, achieving satisfactory performance as indicated by appropriate metrics.
Ensure compliance with the license and provide proper attribution.
Clear documentation provided on how to use the model, including training, evaluation, and any customization options.

Notes:

Be sure to preprocess the dataset appropriately, including resizing, normalization, and data augmentation if necessary.
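
For the evaluation step, reconstruction error is the simplest of the metrics mentioned. The sketch below computes mean squared error over flattened pixel values in plain Python, plus PSNR, which is not named in this issue and is included only as a closely related example. In practice these would be computed on tensors with the training framework, and SSIM would come from an existing library rather than being hand-written.

```python
import math
from typing import Sequence


def mse(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean squared error between two equally sized, flattened pixel sequences."""
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def psnr(original: Sequence[float], reconstructed: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; max_value is the largest possible pixel value."""
    err = mse(original, reconstructed)
    if err == 0:
        return math.inf  # identical inputs
    return 10 * math.log10(max_value ** 2 / err)
```

Lower MSE and higher PSNR both indicate a closer reconstruction; `max_value` should be 1.0 for normalized pixels or 255 for 8-bit values.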
