
explainablevqa's Introduction

Towards Explainable Video Quality Assessment

New! Use DOVER++ with the merged DIVIDE-MaxWell dataset!

Official Repository for the ACM MM 2023 Paper: "Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach." Paper link: arXiv.

Dataset Link: Hugging Face.

Welcome to visit the sibling repositories from our team:

FAST-VQA, DOVER, and Zero-shot BVQI.

The database (Maxwell, training part) has been released.

The code, demo and pre-trained weights of MaxVQA are released in this repo.

Installation

Install and modify OpenCLIP:

git clone https://github.com/mlfoundations/open_clip.git
cd open_clip
sed -i '92s/return x\[0\]/return x/' src/open_clip/modified_resnet.py 
pip install -e .
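
The sed command above rewrites line 92 of modified_resnet.py from "return x[0]" to "return x", so the patched module returns the full tensor rather than only the first (pooled) token. A quick way to confirm the patch took effect (a minimal check, run from inside the open_clip checkout):

sed -n '92p' src/open_clip/modified_resnet.py  # should now print "return x"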

Install DOVER for Pre-processing and FAST-VQA weights:

git clone https://github.com/vqassessment/DOVER.git
cd DOVER
pip install -e .
mkdir pretrained_weights 
cd pretrained_weights 
wget https://github.com/VQAssessment/DOVER/releases/download/v0.1.0/DOVER.pth 
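
To confirm the editable installs and the downloaded checkpoint (a minimal sketch, assuming the commands above finished inside DOVER/pretrained_weights):

python -c "import open_clip; print(open_clip.__version__)"  # OpenCLIP is importable
ls -lh DOVER.pth                                            # DOVER weights are in place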

MaxVQA

Gradio Demo

demo_maxvqa.py

You can maintain a custom service for multi-dimensional VQA.

Inference from Videos

infer_from_videos.py

Inference from Pre-extracted Features

infer_from_feats.py

For the first run, the script will extract features from videos.
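
All three entry points above are plain Python scripts and can be launched directly from the repository root (the exact command-line options, if any, are defined inside each script):

python demo_maxvqa.py        # local Gradio demo for multi-dimensional VQA
python infer_from_videos.py  # end-to-end inference on videos
python infer_from_feats.py   # inference from pre-extracted features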

Training on Mixed Existing VQA Databases

The default setting trains on LIVE-VQC, KoNViD-1k, and YouTube-UGC:

python train_multi_existing.py -o LKY.yml

You can also modify the yaml file to include more datasets for training.
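
For example, a copy of the default option file can be extended with additional database entries and passed to the same script (LKY_plus.yml is only a placeholder name used here for illustration):

cp LKY.yml LKY_plus.yml
# edit LKY_plus.yml: add the new dataset entries alongside the existing LIVE-VQC / KoNViD-1k / YouTube-UGC ones
python train_multi_existing.py -o LKY_plus.yml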

Citation

Please feel free to cite our paper if you use this method or the MaxWell database (with explanation-level scores):

%explainable
@inproceedings{wu2023explainable,
      title={Towards Explainable Video Quality Assessment: A Database and a Language-Prompted Approach}, 
      author={Wu, Haoning and Zhang, Erli and Liao, Liang and Chen, Chaofeng and Hou, Jingwen and Wang, Annan and Sun, Wenxiu and Yan, Qiong and Lin, Weisi},
      year={2023},
      booktitle={ACM MM},
}

This dataset is built upon the original DIVIDE-3K dataset (with perspective scores) proposed in our ICCV 2023 paper:

%dover and divide
@inproceedings{wu2023dover,
      title={Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives}, 
      author={Wu, Haoning and Zhang, Erli and Liao, Liang and Chen, Chaofeng and Hou, Jingwen and Wang, Annan and Sun, Wenxiu and Yan, Qiong and Lin, Weisi},
      year={2023},
      booktitle={ICCV},
}


explainablevqa's Issues

Question about the Maxwell_train and val dataset

Hi, I have a question about the two csv files (Maxwell train and val): does the row order in these files correspond to the order of the downloaded video sequences? Thank you in advance :)

Maxwell database

Thank you very much for sharing the interesting work and I really enjoyed reading your paper!
When will the data of the Maxwell dataset be released?

Open_clip import error

Hi,
The idea of separating the different types of features is very delicate. I just pulled this project, and while running the demo 'python evaluate_one_video.py -v ./demo/17734.mp4 -f' there was an import error:

  File "/group/dphi_dmz/tianhaos/VQA/DOVER/dover/models/conv_backbone.py", line 7, in <module>
    from open_clip import CLIP3D
ImportError: cannot import name 'CLIP3D' from 'open_clip' (/group/dphi_dmz/tianhaos/VQA/open_clip/src/open_clip/__init__.py)

Could you check whether there is a name typo or an open_clip version problem? Thanks!

A question about the plcc_loss function

In the code below, I don't quite understand what loss1 does. Could you please explain it?

def plcc_loss(y_pred, y):
    # z-score normalize predictions and labels (population std, small eps for stability)
    sigma_hat, m_hat = torch.std_mean(y_pred, unbiased=False)
    y_pred = (y_pred - m_hat) / (sigma_hat + 1e-8)
    sigma, m = torch.std_mean(y, unbiased=False)
    y = (y - m) / (sigma + 1e-8)
    # loss0: MSE between the normalized predictions and labels
    loss0 = torch.nn.functional.mse_loss(y_pred, y) / 4
    # rho: mean product of the normalized tensors, i.e. the linear (Pearson) correlation
    rho = torch.mean(y_pred * y)
    # loss1: MSE between the rho-rescaled predictions and the labels
    loss1 = torch.nn.functional.mse_loss(rho * y_pred, y) / 4
    return ((loss0 + loss1) / 2).float()

ExplainableVQA/demo_maxvqa.py

Hi, contributors,
I recently read the article "Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach". I tried the demo on a video of my own, and the scores seemed to lie roughly between 0 and 100. The paper, however, uses a different scale, per Figure 4: "Qualitative studies on different specific factors, with a good video (>0.6) and a bad video (<-0.6) in each dimension of Maxwell; [A-5] Trajectory, [T-5] Flicker, and [T-8] Fluency are focusing on temporal variations and example videos for them are appended in supplementary package. Zoom in for details." For my example, which values count as good and which count as bad? Looking forward to your reply.


shape error

Hi, when I run demo_maxvqa.py for a test, something goes wrong with the shape:

Traceback (most recent call last):
  File "E:/MaxVQA-master/demo_maxvqa.py", line 167, in <module>
    a = inference(video)
  File "E:/MaxVQA-master/demo_maxvqa.py", line 160, in inference
    vis_feats = visual_encoder(data["aesthetic"].to(device), data["technical"].to(device))
  File "D:\tools\Anaconda\set\envs\python37tf\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\MaxVQA-master\model\visual.py", line 19, in forward
    clip_feats = clip_feats[1:].reshape(7,7,-1,1024).permute(3,2,0,1)
RuntimeError: shape '[7, 7, -1, 1024]' is invalid for input of size 64512

The inputs at the failing call are:

vis_feats = visual_encoder(data["aesthetic"].to(device), data["technical"].to(device))
data["aesthetic"] --- [3, 64, 224, 224]
data["technical"] --- [3, 128, 224, 224]

The specific problem is in the following two lines of code:
clip_feats = self.clip_visual(x_aes)
clip_feats = clip_feats[1:].reshape(7,7,-1,1024).permute(3,2,0,1)
However, at that point the shape of clip_feats is [64, 1024].

Figure 7 in the paper - local quality maps

Hi,

Thank you very much for sharing the interesting work and I really enjoyed reading your paper!

Could you please elaborate on how you produced the local quality maps from the final features in Figure 7? In particular how is the map for each dimension (e.g. sharpness) generated from $f^{Final}_{V_t}$?

Thank you in advance for your help.
