
Mind-Animator

The human brain's comprehension of dynamic visual scenes

When receiving dynamic visual information, the human brain first comprehends low-level structural details such as position, shape, and color in the primary visual cortex, then discerns motion information, and ultimately constructs high-level semantic information in the higher visual cortex, such as an overall description of the scene.

The overall architecture of Mind-Animator

The overall architecture of Mind-Animator, a two-stage video reconstruction model based on fMRI. In the fMRI-to-feature stage, three decoders are trained to disentangle semantic, structural, and motion features from fMRI, respectively. In the feature-to-video stage, the decoded information is fed into an inflated Text-to-Image (T2I) model for video reconstruction.
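
The feature-decoding code has not been released yet (see the TODO list below), but the fMRI-to-feature stage can be pictured as three independent regressors that map visual-cortex voxels to the three feature spaces. The sketch below is purely illustrative: the module names, feature dimensions, and the choice of plain linear layers are assumptions, not the released implementation.

import torch
import torch.nn as nn

class FMRIFeatureDecoders(nn.Module):
    """Illustrative sketch of the fMRI-to-feature stage (not the released code)."""
    def __init__(self, n_voxels, sem_dim=512, struct_dim=4096, motion_dim=1024):
        super().__init__()
        # Three independent decoders, one per disentangled feature space.
        self.semantic_decoder = nn.Linear(n_voxels, sem_dim)      # -> CLIP-like semantic embedding
        self.structure_decoder = nn.Linear(n_voxels, struct_dim)  # -> VQ-VAE-like structural latent
        self.motion_decoder = nn.Linear(n_voxels, motion_dim)     # -> motion representation

    def forward(self, fmri):  # fmri: (batch, n_voxels)
        return (self.semantic_decoder(fmri),
                self.structure_decoder(fmri),
                self.motion_decoder(fmri))

# In the feature-to-video stage, the three decoded features would condition an
# inflated T2I diffusion model to render the final video (not shown here).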

Experiments

TODO List

  • Reorganize all the code.
  • Test the data preprocessing code for CC2017 dataset.
  • Upload the data preprocessing code for CC2017 dataset.
  • Upload the code of evaluation metrics.
  • Test the project code for Mind-Animator.
  • Upload the project code for Mind-Animator.
  • Release the checkpoints of Mind-Animator.

Preliminaries

This code was developed and tested with:

  • Python version 3.9.16
  • PyTorch version 1.12.1
  • NVIDIA A100 GPU (80 GB)
  • The conda environment defined in environment.yml

First, set up the conda environment as follows:

conda env create -f environment.yml  # create conda env
conda activate Mind-Animator          # activate conda env
cd path/to/project/Mind-Animator

Dataset downloading and preparation

Dataset downloading

CC2017 dataset https://purr.purdue.edu/publications/2809/1
HCP dataset https://www.humanconnectome.org/
Algonauts2021 dataset http://algonauts.csail.mit.edu/2021/index.html

Data preparation

CC2017 dataset

  • After downloading the CC2017 dataset, you will obtain a folder organized as shown in the figure. In this project, we utilize the fMRI data located within the cifti folder.
  • Segment each video into 2-second clips and downsample the frame rate of each clip to 4 Hz (a hedged sketch of this step is provided after this list). First create the following four directories: one for the training-set video clips (Train_video_path), one for the training-set video frames (Train_frames_path), one for the test-set video clips (Test_video_path), and one for the test-set video frames (Test_frames_path).
    cd path/to/project/Mind-Animator/Data preparation
    python cut_video.py --source_file_root "path/to/your/root/stimuli" --target_file_root_Train "path/to/Train_video_path" --target_file_root_Test "path/to/Test_video_path"
    python get_video_frames.py --Train_videopath_root "path/to/Train_video_path" --Train_target_root "path/to/Train_frames_path" --Test_videopath_root "path/to/Test_video_path" --Test_target_root "path/to/Test_frames_path"
conda activate BLIP2
python video_captioning.py --Train_video_path_root "path/to/Train_frames_path" --Train_captions_save_path "path/to/Train_frames_path" --Test_video_path_root "path/to/Test_frames_path" --Test_captions_save_path "path/to/Test_frames_path"
conda deactivate
  • Extract the voxels within the fMRI data that correspond to activation in the visual cortex. Create the following directories in advance for each subject to store the pre-processed data: fMRI data for individual trials of the training set (Train_fMRI_singletrail), fMRI data averaged across multiple trials of the training set (Train_fMRI_multitrail_average), fMRI data averaged across multiple trials of the test set (Test_fMRI_multitrail_average), and mask files (mask_save_root).
python fMRI_preparation_FSLR.py --fMRI_volumes_root "path/to/CC2017_Purdue" --raw_train_data_root "path/to/Train_fMRI_singletrail" --averaged_train_data_root "path/to/Train_fMRI_multitrail_average" --averaged_test_data_root "path/to/Test_fMRI_multitrail_average" --mask_save_root "path/to/mask_save_root"
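
A minimal sketch of the clip segmentation and 4 Hz frame extraction referenced above, assuming OpenCV is available; the released cut_video.py and get_video_frames.py scripts may organize files differently.

import os
import cv2

def segment_to_frames(video_path, out_root, clip_len_s=2, target_fps=4):
    # Read the source video, keep roughly every (src_fps / 4)-th frame, and write
    # the kept frames in groups of 8 (= 2 s x 4 Hz) to one directory per clip.
    cap = cv2.VideoCapture(video_path)
    src_fps = cap.get(cv2.CAP_PROP_FPS)
    step = max(int(round(src_fps / target_fps)), 1)
    frames_per_clip = clip_len_s * target_fps
    clip_idx, kept, frame_idx = 0, [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % step == 0:
            kept.append(frame)
        frame_idx += 1
        if len(kept) == frames_per_clip:
            clip_dir = os.path.join(out_root, f"clip_{clip_idx:04d}")
            os.makedirs(clip_dir, exist_ok=True)
            for i, f in enumerate(kept):
                cv2.imwrite(os.path.join(clip_dir, f"frame_{i}.jpg"), f)
            clip_idx, kept = clip_idx + 1, []
    cap.release()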

Feature Extraction

Semantic Feature Extraction

  • Before extracting the text conditions, create the following directories in advance: one for the training-set text conditions (Train_condition) and one for the test-set text conditions (Test_condition).
cd path/to/project/Mind-Animator/Feature extraction
python Text_condition.py --Train_captions_save_path "path/to/Train_frames_path" --Train_text_condition_save_path "path/to/Train_condition" --Test_captions_save_path "path/to/Test_frames_path" --Test_text_condition_save_path "path/to/Test_condition"
  • Before extracting the CLIP features, first create the following directories to store the extracted features: image features of the training set (Train_img_CLIP_512), text features of the training set (Train_text_CLIP_512), image features of the test set (Test_img_CLIP_512), and text features of the test set (Test_text_CLIP_512). A hedged feature-extraction sketch follows the commands below.
python CLIP_feature.py --Train_captions_save_path "path/to/Train_frames_path" --Train_text_CLIPfeature_save_path "path/to/Train_text_CLIP_512" --Train_frames_save_path "path/to/Train_frames_path" --Train_img_CLIPfeature_save_path "path/to/Train_img_CLIP_512" --Test_captions_save_path "path/to/Test_frames_path" --Test_text_CLIPfeature_save_path "path/to/Test_text_CLIP_512" --Test_frames_save_path "path/to/Test_frames_path" --Test_img_CLIPfeature_save_path "path/to/Test_img_CLIP_512" 
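
The _CLIP_512 directory names above suggest 512-dimensional CLIP features, which matches a ViT-B/32 CLIP backbone; this is an assumption. A minimal sketch of extracting such image and text features with the openai-clip package:

import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # 512-d embeddings

@torch.no_grad()
def clip_image_feature(frame_path):
    image = preprocess(Image.open(frame_path)).unsqueeze(0).to(device)
    return model.encode_image(image).squeeze(0).cpu()      # shape: (512,)

@torch.no_grad()
def clip_text_feature(caption):
    tokens = clip.tokenize([caption]).to(device)
    return model.encode_text(tokens).squeeze(0).cpu()      # shape: (512,)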

Structure and Motion Feature Extraction

Run the following code to extract the structural information and consistent motion information from the videos using a pre-trained VQ-VAE.

python Structure_feature.py --target_file_root "path/to/{}_video_path"
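
A rough, conceptual sketch of this step: each frame is passed through a pre-trained VQ-VAE encoder, the first frame's discrete codes serve as the structural feature, and the full per-frame code sequence captures the consistent motion. load_pretrained_vqvae below is a hypothetical placeholder for whichever checkpoint Structure_feature.py actually loads.

import torch

vqvae = load_pretrained_vqvae()   # hypothetical loader; stands in for the real checkpoint
vqvae.eval()

@torch.no_grad()
def structure_and_motion(frames):         # frames: (T, 3, H, W) tensor in [0, 1]
    codes = [vqvae.encode(f.unsqueeze(0)) for f in frames]  # discrete latent codes per frame
    structure = codes[0]                  # first-frame codes ~ structural information
    motion = torch.stack(codes)           # full code sequence ~ consistent motion information
    return structure, motion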

Feature Decoding

Video Reconstruction

Run the following command to generate reconstruction results for each subject's test set. Adjust the subj_ID argument accordingly (valid options: 1, 2, 3). To ensure that each subject's reconstruction results are stored in the correct location, also modify the output_dir1 and output_dir2 variables in reconstruction.py.

cd path/to/project/Mind-Animator/Video Reconstruction
python reconstruction.py --subj_ID 1 --fMRI_data_path "path/to/Test_fMRI_multitrail_average" --Semantic_model_dir "path/to/Semantic_decoder" --Structure_model_dir "path/to/Structure_decoder" --CMG_model_dir "path/to/CMG_decoder" --results_save_root "path/to/results_save_root"

Calculate the Evaluation Metrics

To calculate the following evaluation metrics, you will first need to download the following models: ViT-Base Patch16 224 (https://huggingface.co/google/vit-base-patch16-224), VideoMAE (https://huggingface.co/MCG-NJU/videomae-base), and ViFi-CLIP (https://github.com/muzairkhattak/ViFi-CLIP).

Semantic-level Metrics

Run the following code to calculate the 2-way-I and 2-way-V metrics.

cd path/to/project/Mind-Animator/Evaluation Metrics
python 2-way-I.py --recons_results_root "path/to/results_save_root"
python 2-way-V.py --recons_results_root "path/to/results_save_root"
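
For reference, 2-way identification is typically computed as follows: a reconstruction counts as correct if its embedding (from ViT for 2-way-I, or VideoMAE for 2-way-V) is closer to the embedding of its own ground-truth clip than to that of a randomly drawn distractor, averaged over many random draws. The sketch below illustrates this idea and may differ in details from 2-way-I.py and 2-way-V.py.

import numpy as np

def two_way_accuracy(recon_emb, gt_emb, n_repeats=100, seed=0):
    # recon_emb, gt_emb: (n_clips, dim) arrays of embeddings from the same model.
    rng = np.random.default_rng(seed)
    recon = recon_emb / np.linalg.norm(recon_emb, axis=1, keepdims=True)
    gt = gt_emb / np.linalg.norm(gt_emb, axis=1, keepdims=True)
    sims = recon @ gt.T                       # cosine similarity matrix
    n, correct = len(recon), 0
    for _ in range(n_repeats):
        for i in range(n):
            j = rng.choice([k for k in range(n) if k != i])  # random distractor index
            correct += int(sims[i, i] > sims[i, j])
    return correct / (n_repeats * n)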

Before calculating the VIFI_score, first convert the reconstruction results from .gif to .avi format. Please create folders in advance to store the .avi ground truth (gt_avi), the .avi reconstruction results (recons_avi), and the VIFI_score results (VIFI_score_path).

cd path/to/project/Mind-Animator/Evaluation Metrics/ViFi_CLIP
python gif_to_avi.py --recons_results_root "path/to/results_save_root" --avi_gt_results_root "path/to/gt_avi" --avi_recons_results_root "path/to/recons_avi"
python VIFI_score.py --avi_gt_results_root "path/to/gt_avi" --avi_recons_results_root "path/to/recons_avi" --VIFICLIP_score_save_root "path/to/VIFI_score_path"
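
A minimal sketch of the .gif-to-.avi conversion, assuming imageio and OpenCV are installed; gif_to_avi.py may handle file naming and frame rates differently.

import cv2
import imageio.v2 as imageio

def gif_to_avi(gif_path, avi_path, fps=4):
    frames = imageio.mimread(gif_path)                        # list of RGB(A) frames
    h, w = frames[0].shape[:2]
    writer = cv2.VideoWriter(avi_path, cv2.VideoWriter_fourcc(*"XVID"), fps, (w, h))
    for frame in frames:
        rgb = frame[:, :, :3]                                 # drop alpha channel if present
        writer.write(cv2.cvtColor(rgb, cv2.COLOR_RGB2BGR))    # OpenCV expects BGR
    writer.release()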

Pixel-level Metrics

Run the following code to calculate Pixel-level Metrics.

cd path/to/project/Mind-Animator/Evaluation Metrics
python Pixel_level_metrics.py --recons_results_root "path/to/results_save_root"
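
The README does not spell out which pixel-level metrics are computed; per-frame SSIM is a common choice in the fMRI-to-video literature. The sketch below assumes SSIM via scikit-image and may not match Pixel_level_metrics.py exactly.

import numpy as np
from skimage.metrics import structural_similarity as ssim

def video_ssim(gt_frames, recon_frames):
    # gt_frames, recon_frames: equal-length lists of HxWx3 uint8 arrays.
    scores = [ssim(g, r, channel_axis=-1) for g, r in zip(gt_frames, recon_frames)]
    return float(np.mean(scores))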

Spatiotemporal-level Metric

Run the following code to calculate the Spatiotemporal-level Metric.

cd path/to/project/Mind-Animator/Evaluation Metrics
python ST_level_metric.py --recons_results_root "path/to/results_save_root" --VIFICLIP_score_save_root "path/to/VIFI_score_path"
