

Unsupervised Video Summarization via Multi-source Features

This is the official GitHub page for the paper:

Hussain Kanafani, Junaid Ahmed Ghauri, Sherzod Hakimov, Ralph Ewerth. 2021. Unsupervised Video Summarization via Multi-source Features. In Proceedings of the 2021 International Conference on Multimedia Retrieval (ICMR '21), August 21–24, 2021, Taipei, Taiwan. ACM, New York, NY, USA. https://doi.org/10.1145/3460426.3463597

The paper is available at: https://doi.org/10.1145/3460426.3463597

Model architecture: Multi-Source Chunk and Stride Fusion (MCSF)

[Figure: MCSF model architecture]

Get started (Requirements and Setup)

Python 3.6

cd MCSF
conda create -n mcsf python=3.6
conda activate mcsf  
pip install -r requirements.txt

Project Structure

Directory: 
- /data
        - /plc_365 (Places365 features for SumMe and TVSum)
        - /splits (original and non-overlapping splits)
        - /SumMe (processed dataset, h5)
        - /TVSum (processed dataset, h5)
- /csnet (implementation of the CSNet method)
- /mcsf-places365-early-fusion
- /mcsf-places365-late-fusion
- /mcsf-places365-intermediate-fusion
- /src/evaluation (evaluation using F1-score)
- /src/visualization
- /sum-ind (implementation of the SUM-Ind method)


Datasets

Structured h5 files with the video features and annotations of the SumMe and TVSum datasets are available within the "data" folder. The GoogLeNet features of the video frames were extracted by Ke Zhang and Wei-Lun Chao, and the h5 files were obtained from Kaiyang Zhou.

Download

wget https://zenodo.org/record/4884870/files/datasets.tar
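
After downloading, unpack the archive with the standard tar invocation (the directory layout above assumes its contents end up under /data):

tar -xvf datasets.tar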

Files Structure

The implemented models use the provided h5 files which have the following structure:

/key
    /features                 2D-array with shape (n_steps, feature-dimension)
    /gtscore                  1D-array with shape (n_steps), stores ground truth importance score (used for training, e.g. regression loss)
    /user_summary             2D-array with shape (num_users, n_frames), each row is a binary vector (used for test)
    /change_points            2D-array with shape (num_segments, 2), each row stores indices of a segment
    /n_frame_per_seg          1D-array with shape (num_segments), indicates number of frames in each segment
    /n_frames                 number of frames in original video
    /picks                    positions of subsampled frames in original video
    /n_steps                  number of subsampled frames
    /gtsummary                1D-array with shape (n_steps), ground truth summary provided by user (used for training, e.g. maximum likelihood)
    /video_name (optional)    original video name, only available for SumMe dataset
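
For a quick sanity check, the files can be inspected with h5py. A minimal sketch (the path is an assumption, taken from the Download section above and the SUM-Ind examples below; adjust to where you extracted the archive):

import h5py

# One group per video; each group follows the structure described above.
with h5py.File('data/SumMe/eccv16_dataset_summe_google_pool5.h5', 'r') as f:
    for key in f.keys():
        video = f[key]
        features = video['features'][...]   # (n_steps, feature-dimension)
        gtscore = video['gtscore'][...]     # (n_steps,) importance scores
        n_frames = int(video['n_frames'][()])
        print(key, features.shape, gtscore.shape, n_frames)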

Original videos and annotations for each dataset are also available on the authors' project webpages:

TVSum dataset: https://github.com/yalesong/tvsum

SumMe dataset: https://gyglim.github.io/me/vsum/index.html#benchmark


MCSF Variations and CSNet

We used the SUM-GAN method as a starting point for the implementation.


How to train

Run the main.py file with the configurations specified in configs.py to train the model. In configs.py you will find the argument parameters for training:

Parameter    Type     Default  Description
mode         string   train    possible values: train, test
verbose      boolean  true
video_type   string   summe    summe or tvsum
input_size   int      1024
hidden_size  int      500
split_index  int      0
n_epochs     int      20
m            int      4        number of divisions used for the chunk and stride network
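
For reference, this is roughly how the parameters above map onto an argparse definition (an illustrative sketch only, not the repo's exact configs.py):

import argparse

def get_config():
    # Parameter names, types, and defaults taken from the table above.
    parser = argparse.ArgumentParser(description='MCSF training configuration (sketch)')
    parser.add_argument('--mode', type=str, default='train', choices=['train', 'test'])
    parser.add_argument('--verbose', type=lambda v: v.lower() == 'true', default=True)
    parser.add_argument('--video_type', type=str, default='summe', choices=['summe', 'tvsum'])
    parser.add_argument('--input_size', type=int, default=1024)
    parser.add_argument('--hidden_size', type=int, default=500)
    parser.add_argument('--split_index', type=int, default=0)
    parser.add_argument('--n_epochs', type=int, default=20)
    parser.add_argument('--m', type=int, default=4,
                        help='number of divisions for the chunk and stride network')
    return parser.parse_args()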


For training the model using a single split, run:

python main.py --split_index N (with N being the index of the split)
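
To run all splits in turn, a simple shell loop suffices (assuming the standard five splits with indices 0-4):

for i in 0 1 2 3 4; do python main.py --split_index $i; done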

How to evaluate

Using multiple human-generated summaries per video: CSNet and all MCSF models are evaluated by comparing, after each training epoch, the generated summary for each test video against the set of reference human summaries available for that video (see the '/user_summary' entry in the h5 file structure in the Datasets section above). To do so, run the 'src/evaluation/evaluate.py' script after specifying which config file to use: 'config_summe.yaml' or 'config_tvsum.yaml'.
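
The F1 computation itself follows the standard protocol for these benchmarks. A minimal sketch (assuming binary per-frame vectors; by convention SumMe takes the maximum F1 over users, TVSum the average):

import numpy as np

def f1_score(pred, gt):
    # F1 between two binary per-frame vectors.
    overlap = float((pred * gt).sum())
    if overlap == 0:
        return 0.0
    precision = overlap / pred.sum()
    recall = overlap / gt.sum()
    return 2 * precision * recall / (precision + recall)

def evaluate_summary(machine_summary, user_summary, metric='max'):
    # machine_summary: (n_frames,) binary; user_summary: (num_users, n_frames) binary.
    scores = [f1_score(machine_summary, u) for u in user_summary]
    return max(scores) if metric == 'max' else float(np.mean(scores))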


SUM-Ind

The train and test code is in main.py. To see the detailed arguments, run python main.py -h.


How to train

python main.py -d datasets/eccv16_dataset_summe_google_pool5.h5 -s datasets/summe_splits.json -m summe --gpu 0 --save-dir log/summe-split0 --split-id 0 --verbose

How to test

python main.py -d datasets/eccv16_dataset_summe_google_pool5.h5 -s datasets/summe_splits.json -m summe --gpu 0 --save-dir log/summe-split0 --split-id 0 --evaluate --resume path_to_your_model.pth.tar --verbose --save-results

Citation

@inproceedings{kanafani2021MCSF,
   title={Unsupervised Video Summarization via Multi-source Features},
   author={Kanafani, Hussain and Ghauri, Junaid Ahmed and Hakimov, Sherzod and Ewerth, Ralph},
   booktitle={Proceedings of the 2021 International Conference on Multimedia Retrieval (ICMR '21)},
   year={2021},
   doi={10.1145/3460426.3463597}
}


Contributors

hussainkanafani, sherzod-hakimov


Issues

Help me with "motion_features" in config or pickle files (MCSF)

Hi, your MCSF model is impressive, but when I tried to run it I encountered some errors.

When I tried to run train.py in mcsf-early-fusion, I noticed there is no 'motion_features' in config.py or in any of the pickle files (the same holds for the intermediate- and late-fusion variants). If I remove the motion_features parameters, the changed number of parameters raises another error; replacing them with temporary values raises yet another error.

Can you give me a hand? How can I run the MCSF model, or generate the pickle files for motion_features?

Do you use predicted clip scores and ground-truth change points (segments) to generate the video summary?

I am wondering whether you use the ground-truth change points (segments) to generate the video summary, so that your model only predicts scores for each extracted frame without doing its own segment splitting. This seems a little strange... Could you please explain? Correct me if I am wrong.

            cps = dataset[key]['change_points'][...]
            num_frames = dataset[key]['n_frames'][()]
            nfps = dataset[key]['n_frame_per_seg'][...].tolist()
            positions = dataset[key]['picks'][...]
            user_summary = dataset[key]['user_summary'][...]

            machine_summary = vsum_tools.generate_summary(probs, cps, num_frames, nfps, positions)

Problem running TVSum for the late-fusion version

Hello and thank you for making your code open-source,

I am trying to run your code, specifically the late-fusion version. For SumMe everything runs without any problem. However, for the TVSum dataset I get an error because get_difference_attention doesn't return a proper value for the motion_attention attribute. After some digging, I found that this folder contains only 3 pickle files instead of 4 (like the others); the missing one is tvsum_places365_attention.pickle.

Can you upload this file? Or can I use one from the other folders?

Thanks in advance

How do I generate a video summary

Great job,

I finished setting up the project, trained my model, and ran the test.

My question now is: how can I generate a video summary (which file should I execute, and which parameters should I pass)?
