The muse-wild_2020 from gphuang

Multi-modal Continuous Dimensional Emotion Recognition Using Recurrent Neural Network and Self-Attention Mechanism

In this repo, we present our solutions to the MuSe-Wild sub-challenge in MuSe 2020-The Multimodal Sentiment in Real-life Media Challenge.

Requirements

Python 3.7
PyTorch 1.4.0
Pandas
Matplotlib
NumPy
Pickle

Model

We utilize the Long Short Term Memory (LSTM) recurrent neural network as well as the self-attention mechanism (denoted by the dotted lines) for continuous dimensional emotion recognition.

Fusion

We adopt both early fusion and late fusion for multi-modal emotion recognition in this challenge. For early fusion, we simply concatenate multiple uni-modal features and feed them into the model. For late fusion, we employ a second-level LSTM model to fuse the predictions from several uni-modal features.

Usage

Change the dataset path PATH_TO_MUSE_2020 in config.py to yours.
For training the uni-modal and early fusion model, run command like this

python main.py --emo_dim_set [arousal or valence] --feature_set [directory of your feature set] ...

e.g.,

python main.py --emo_dim_set valence --feature_set egemaps fasttext --d_rnn 64 --rnn_n_layers 1 --rnn_bi --attn --n_layers 1 --n_heads 8 --epochs 100 --batch_size 1024 --lr 0.005 --seed 43 --n_seeds 1 --min_lr 1e-5 --rnn_dr 0.0 --attn_dr 0.0 --out_dr 0.0 --win_len 200 --hop_len 100 --add_seg_id --log --gpu 6

The above options can be found in main.py. By default, the feature_set is the directory (can be more than one for multi-modal setting) under [path to your dataset]/c1_muse_wild/feature_segments/label_aligned.

For training the late fusion model, run command like this

python main_fusion.py --emo_dim_set [arousal or valence] --base_dir [your fusion folder] ...

e.g.,

python main_fusion.py --emo_dim_set arousal --base_dir ./fusion/test--d_model 32 --rnn_bi --n_layers 1 --epochs 15 --batch_size 64 --lr 0.001 --seed 42 --n_seeds 3 --loss ccc --min_lr 1e-5 --gpu 3 --log

The above options can be found in main_fusion.py. Note that it's needed to put the multiple uni-modal predictions (generated by uni-modal or early fusion model) into the source sub-folder in base_dir.

Results

In the sub-challenge, Concordance Correlation Coefficient (CCC) is chosen as the evaluation metric. The best submission results on validation set and test set are as follows.

Emotion	Partition	Baseline [1]	Ours
Arousal	Val	0.3978	0.5616
Valence	Val	0.1506	0.4876
Arousal	Test	0.2834	0.4726
Valence	Test	0.2431	0.5996

References

[1] Lukas Stappen, Alice Baird, Georgios Rizos, Panagiotis Tzirakis, Xinchen Du, Felix Hafner, Lea Schumann, Adria Mallol-Ragolta, Bj ̈orn W. Schuller, Iulia Lefter, Erik Cambria, Ioannis Kompatsiaris: “The 2020 Multimodal Sentiment Analysis in Real-life Media Workshop and Challenge: Emotional Car Reviews in-the-wild”, Proceedings of ACM-MM 2020, Seattle, United States, 2020.

gphuang / muse-wild_2020 Goto Github PK

muse-wild_2020's Introduction

Multi-modal Continuous Dimensional Emotion Recognition Using Recurrent Neural Network and Self-Attention Mechanism

Requirements

Model

Fusion

Usage

Results

References

muse-wild_2020's People

Contributors

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent