
modality-transferable-mer's Introduction

Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition

CC BY 4.0

Paper accepted at AACL-IJCNLP 2020:

Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition, by Wenliang Dai, Zihan Liu, Tiezheng Yu, Pascale Fung.

[ACL Anthology][ArXiv][Semantic Scholar]

If your work is inspired by our paper, or you use any code snippets from this repo, please cite this paper. The BibTeX entry is shown below:

@inproceedings{dai-etal-2020-modality,
    title = "Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition",
    author = "Dai, Wenliang  and
      Liu, Zihan  and
      Yu, Tiezheng  and
      Fung, Pascale",
    booktitle = "Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing",
    month = dec,
    year = "2020",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.aacl-main.30",
    pages = "269--280",
    abstract = "Despite the recent achievements made in the multi-modal emotion recognition task, two problems still exist and have not been well investigated: 1) the relationship between different emotion categories are not utilized, which leads to sub-optimal performance; and 2) current models fail to cope well with low-resource emotions, especially for unseen emotions. In this paper, we propose a modality-transferable model with emotion embeddings to tackle the aforementioned issues. We use pre-trained word embeddings to represent emotion categories for textual data. Then, two mapping functions are learned to transfer these embeddings into visual and acoustic spaces. For each modality, the model calculates the representation distance between the input sequence and target emotions and makes predictions based on the distances. By doing so, our model can directly adapt to the unseen emotions in any modality since we have their pre-trained embeddings and modality mapping functions. Experiments show that our model achieves state-of-the-art performance on most of the emotion categories. Besides, our model also outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.",
}

Abstract

Despite the recent achievements made in the multi-modal emotion recognition task, two problems still exist and have not been well investigated: 1) the relationship between different emotion categories are not utilized, which leads to sub-optimal performance; and 2) current models fail to cope well with low-resource emotions, especially for unseen emotions. In this paper, we propose a modality-transferable model with emotion embeddings to tackle the aforementioned issues. We use pre-trained word embeddings to represent emotion categories for textual data. Then, two mapping functions are learned to transfer these embeddings into visual and acoustic spaces. For each modality, the model calculates the representation distance between the input sequence and target emotions and makes predictions based on the distances. By doing so, our model can directly adapt to the unseen emotions in any modality since we have their pre-trained embeddings and modality mapping functions. Experiments show that our model achieves state-of-the-art performance on most of the emotion categories. In addition, our model also outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.
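
As a rough illustration of the distance-based prediction described above, the sketch below scores one utterance representation against a set of emotion embeddings. It is not the repository's implementation: the dot-product similarity, the sigmoid, the 0.5 threshold, the toy 3-dimensional vectors, the identity mapping, and the names map_embedding and predict_emotions are all assumptions made for illustration.

```python
import math


def map_embedding(embedding, weight):
    """Apply a linear map (a list of rows) to an embedding.

    Hypothetical stand-in for a learned mapping from the textual
    embedding space into the visual or acoustic space.
    """
    return [sum(w * x for w, x in zip(row, embedding)) for row in weight]


def predict_emotions(utterance, emotion_embeddings, threshold=0.5):
    """Score an utterance vector against every emotion embedding.

    Similarity is a dot product squashed by a sigmoid (an assumption);
    every emotion whose score exceeds the threshold is predicted.
    """
    scores = {}
    for name, emb in emotion_embeddings.items():
        dot = sum(u * e for u, e in zip(utterance, emb))
        scores[name] = 1.0 / (1.0 + math.exp(-dot))
    predicted = [name for name, s in scores.items() if s > threshold]
    return scores, predicted


# Toy textual emotion embeddings (made-up values, not GloVe vectors).
text_emotions = {
    "happy": [1.0, 0.0, 0.0],
    "sad": [-1.0, 0.0, 0.0],
    "fear": [0.0, 1.0, 0.0],
}

# Identity matrix used as a placeholder for a learned mapping function.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
visual_emotions = {k: map_embedding(v, identity) for k, v in text_emotions.items()}

scores, predicted = predict_emotions([2.0, -1.0, 0.5], visual_emotions)
print(predicted)
```

In this sketch, scoring an additional emotion only requires adding its embedding to the dictionary, which is the intuition behind adapting to unseen emotions.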

Dataset

We use the pre-processed features from the CMU-Multimodal SDK.

Alternatively, you can download the data directly from here.

Preparation for running

  1. Create a new folder named data at the root of this project

  2. Download the emotion embeddings from here, and put them in the data folder.

  3. Download data

    • For a quick run
      • Download our saved torch.utils.data.dataset.Dataset datasets from here, and unzip them at the root of this project.
    • For a normal run
      • Download the data from here
      • Check the data_folder_structure.txt file, which shows how to organize the data files
      • Place the data files accordingly
  4. Good to go!
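
Step 1 above can also be scripted. The following is a minimal sketch, assuming only that the folder must be named data and sit at the project root. The helper name prepare_data_folder is hypothetical, and the demo runs in a temporary directory so it does not touch a real checkout.

```python
import tempfile
from pathlib import Path


def prepare_data_folder(project_root):
    """Create <project_root>/data if it is missing and list what it contains."""
    data_dir = Path(project_root) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    entries = sorted(p.name for p in data_dir.iterdir())
    return data_dir, entries


# Demo in a throwaway directory; pass the real project root in practice.
with tempfile.TemporaryDirectory() as root:
    data_dir, entries = prepare_data_folder(root)
    print(data_dir.name, entries)
```

An empty list means the emotion embeddings and dataset files from steps 2 and 3 still need to be downloaded.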

Command line arguments and examples

usage: main.py [-h] -bs BATCH_SIZE -lr LEARNING_RATE [-wd WEIGHT_DECAY] -ep
               EPOCHS [-es EARLY_STOP] [-cu CUDA] [-mo MODEL] [-fu FUSION]
               [-cl CLIP] [-sc] [-se SEED] [-pa PATIENCE] [-ez] [--loss LOSS]
               [--optim OPTIM] [--threshold THRESHOLD] [--verbose]
               [-mod MODALITIES] [--valid] [--test] [--dataset DATASET]
               [--aligned] [--data-seq-len DATA_SEQ_LEN]
               [--data-folder DATA_FOLDER] [--glove-emo-path GLOVE_EMO_PATH]
               [--cap] [--iemocap4] [--iemocap9] [--zsl ZSL]
               [--zsl-test ZSL_TEST] [--fsl FSL] [--ckpt CKPT] [-dr DROPOUT]
               [-nl NUM_LAYERS] [-hs HIDDEN_SIZE]
               [-hss HIDDEN_SIZES [HIDDEN_SIZES ...]] [-bi] [--gru]
               [--hidden-dim HIDDEN_DIM]

Multimodal Emotion Recognition

optional arguments:
  -h, --help            show this help message and exit
  -bs BATCH_SIZE, --batch-size BATCH_SIZE
                        Batch size
  -lr LEARNING_RATE, --learning-rate LEARNING_RATE
                        Learning rate
  -wd WEIGHT_DECAY, --weight-decay WEIGHT_DECAY
                        Weight decay
  -ep EPOCHS, --epochs EPOCHS
                        Number of epochs
  -es EARLY_STOP, --early-stop EARLY_STOP
                        Early stop
  -cu CUDA, --cuda CUDA
                        Cude device number
  -mo MODEL, --model MODEL
                        Model type: mult/rnn/transformer/eea
  -fu FUSION, --fusion FUSION
                        Modality fusion type: ef/lf
  -cl CLIP, --clip CLIP
                        Use clip to gradients
  -sc, --scheduler      Use scheduler to optimizer
  -se SEED, --seed SEED
                        Random seed
  -pa PATIENCE, --patience PATIENCE
                        Patience of the scheduler
  -ez, --exclude-zero   Exclude zero in evaluation
  --loss LOSS           loss function: l1/mse/ce/bce
  --optim OPTIM         optimizer function: adam/sgd
  --threshold THRESHOLD
                        Threshold of for multi-label emotion recognition
  --verbose             Verbose mode to print more logs
  -mod MODALITIES, --modalities MODALITIES
                        What modalities to use
  --valid               Valid mode
  --test                Test mode
  --dataset DATASET     Dataset to use
  --aligned             Aligned experiment or not
  --data-seq-len DATA_SEQ_LEN
                        Data sequence length
  --data-folder DATA_FOLDER
                        path for storing the dataset
  --glove-emo-path GLOVE_EMO_PATH
  --cap                 Capitalize the first letter of emotion words
  --iemocap4            Only use 4 emtions in IEMOCAP
  --iemocap9            Only use 9 emtions in IEMOCAP
  --zsl ZSL             Do zero shot learning on which emotion (index)
  --zsl-test ZSL_TEST   Notify which emotion was zsl before
  --fsl FSL             Do few shot learning on which emotion (index)
  --ckpt CKPT
  -dr DROPOUT, --dropout DROPOUT
                        dropout
  -nl NUM_LAYERS, --num-layers NUM_LAYERS
                        num of layers of LSTM
  -hs HIDDEN_SIZE, --hidden-size HIDDEN_SIZE
                        hidden vector size of LSTM
  -hss HIDDEN_SIZES [HIDDEN_SIZES ...], --hidden-sizes HIDDEN_SIZES [HIDDEN_SIZES ...]
                        hidden vector size of LSTM
  -bi, --bidirectional  Use Bi-LSTM
  --gru                 Use GRU rather than LSTM
  --hidden-dim HIDDEN_DIM
                        Transformers hidden unit size
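
To show how a few of these options are parsed, here is a small argparse sketch that mirrors a subset of the usage text above. It is an approximation rather than the parser in main.py: the argument types are assumptions inferred from the help strings, and all other options are omitted.

```python
import argparse


def build_parser():
    """Build a parser covering a handful of the options listed above."""
    parser = argparse.ArgumentParser(description="Multimodal Emotion Recognition")
    # Shown without brackets in the usage line, so treated as required here.
    parser.add_argument("-bs", "--batch-size", type=int, required=True, help="Batch size")
    parser.add_argument("-lr", "--learning-rate", type=float, required=True, help="Learning rate")
    parser.add_argument("-ep", "--epochs", type=int, required=True, help="Number of epochs")
    parser.add_argument("-mo", "--model", type=str, help="Model type: mult/rnn/transformer/eea")
    # "HIDDEN_SIZES [HIDDEN_SIZES ...]" in the usage text corresponds to nargs="+".
    parser.add_argument("-hss", "--hidden-sizes", type=int, nargs="+", help="hidden vector size of LSTM")
    # Flags without a metavar are boolean switches.
    parser.add_argument("-bi", "--bidirectional", action="store_true", help="Use Bi-LSTM")
    return parser


argv = "-bs=64 -lr=1e-3 -ep=100 --model=eea -bi --hidden-sizes 300 200 100".split()
args = build_parser().parse_args(argv)
print(args.batch_size, args.learning_rate, args.hidden_sizes, args.bidirectional)
```

Note that --hidden-sizes takes its values as space-separated tokens, not with an equals sign, because it accepts one or more values.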

Run the code

main.py is the entry point of the whole project; use the corresponding command-line arguments for different purposes.

Training

Training the model on the CMU-MOSEI dataset

python main.py --cuda=0 -bs=64 -lr=1e-3 -ep=100 --model=eea -bi --hidden-sizes 300 200 100 --num-layers=2 --dropout=0.15 --data-folder=./data/cmu-mosei/ --data-seq-len=20 --dataset=mosei_emo --aligned --loss=bce --clip=1.0 --early-stop=8 -mod=tav --patience=5   

Training the model on the IEMOCAP dataset

python main.py --cuda=0 -bs=64 -lr=1e-3 -ep=100 --model=eea --data-folder=./data/iemocap/ --data-seq-len=50 --dataset=iemocap --loss=bce --clip=1.0 --early-stop=8 --hidden-sizes 300 200 100 -mod=tav --patience=5 --aligned -bi --num-layers=2 --dropout=0.15

Training an early-fusion LSTM baseline

python main.py --cuda=0 -bs=64 -lr=1e-3 -ep=100 --model=rnn --fusion=ef --data-folder=./data/iemocap/ --data-seq-len=50 --dataset=iemocap --loss=bce --clip=1.0 --early-stop=8 --hidden-sizes 300 200 100 -mod=tav --patience=5 --aligned -bi --num-layers=2 --dropout=0.15

Validating and testing

If you only want to validate or test a trained model, add the --valid or --test flag to the original command, and include --ckpt=[PathToSavedCheckpoint] to indicate the path of the trained model.
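
The rule above can be sketched as a small helper that turns a training command into an evaluation command. The function name to_eval_command and the checkpoint path path/to/checkpoint.pt are placeholders, not names used by the repository.

```python
import shlex


def to_eval_command(train_command, ckpt_path, mode="test"):
    """Append --valid or --test plus --ckpt to an existing training command."""
    if mode not in ("valid", "test"):
        raise ValueError("mode must be 'valid' or 'test'")
    args = shlex.split(train_command)
    args += ["--" + mode, "--ckpt=" + ckpt_path]
    return args


# Shortened training command, for illustration only.
train = "python main.py --cuda=0 -bs=64 -lr=1e-3 -ep=100 --model=eea --dataset=iemocap --loss=bce"
cmd = to_eval_command(train, "path/to/checkpoint.pt")  # placeholder path
print(" ".join(shlex.quote(a) for a in cmd))
```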

Zero-shot learning (ZSL)

Add --zsl=[EmotionIndex] to the original training command, where EmotionIndex is the index of the emotion category on which you want to do zero-shot learning. As mentioned in the paper, CMU-MOSEI and IEMOCAP use different strategies, so --zsl=[EmotionIndex] has a slightly different meaning for each dataset. The valid values are listed here:

For CMU-MOSEI (ZSL emotion data will be removed from the training data),

  • --zsl=0, do ZSL on anger
  • --zsl=1, do ZSL on disgust
  • --zsl=2, do ZSL on fear
  • --zsl=3, do ZSL on happy
  • --zsl=4, do ZSL on sad
  • --zsl=5, do ZSL on surprise

For IEMOCAP (the training data remains unchanged, as ZSL emotion is from extra low-resource data),

  • --zsl=1, do ZSL on excited
  • --zsl=4, do ZSL on surprised
  • --zsl=5, do ZSL on frustrated
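
The index lists above can be captured in a lookup table so the flag is built from an emotion name instead of a memorized number. This is only a convenience sketch: the dictionary keys reuse the --dataset values from the training commands, and the helper zsl_flag is hypothetical.

```python
# Emotion-to-index mapping copied from the lists above.
ZSL_INDEX = {
    "mosei_emo": {"anger": 0, "disgust": 1, "fear": 2, "happy": 3, "sad": 4, "surprise": 5},
    "iemocap": {"excited": 1, "surprised": 4, "frustrated": 5},
}


def zsl_flag(dataset, emotion):
    """Return the --zsl argument for a dataset/emotion pair."""
    try:
        index = ZSL_INDEX[dataset][emotion]
    except KeyError:
        raise ValueError("no ZSL index listed for %r on %r" % (emotion, dataset)) from None
    return "--zsl=%d" % index


print(zsl_flag("mosei_emo", "fear"))
print(zsl_flag("iemocap", "frustrated"))
```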

Few-shot learning (FSL)

For few-shot learning, the logic is similar to ZSL: use --fsl=[EmotionIndex] instead.

Requirements

  1. Python 3.6+
  2. PyTorch 1.4+
  3. NVIDIA GTX 1080 Ti GPU (or more advanced)
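
A quick way to check the first two requirements is sketched below. It only tests the Python version and whether a torch package can be found; it does not verify the PyTorch version or the GPU. The helper name check_requirements is an assumption.

```python
import importlib.util
import sys


def check_requirements(min_python=(3, 6)):
    """Report whether the interpreter is recent enough and torch is importable."""
    python_ok = sys.version_info[:2] >= min_python
    # find_spec returns None when the package is not installed.
    torch_installed = importlib.util.find_spec("torch") is not None
    return {"python_ok": python_ok, "torch_installed": torch_installed}


print(check_requirements())
```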

modality-transferable-mer's People

Contributors

wenliangdai


modality-transferable-mer's Issues

Dataset

Hello, is the dataset used in this paper still directly available? The previous link no longer works. Thank you!

Use of Custom Dataset

Hi Sir, can I use this model to predict emotions on a custom dataset? If yes, what changes do I need to make? Is there any preprocessing I have to do on that dataset? @wenliangdai

Used FACET features already contain information on the model's target emotions

Hi @wenliangdai, thanks for open-sourcing this paper's code.

So, the MER model uses 35 features on the video input stream. You write in the paper that the dataset is coming from the CMU SDK. In the case of the visual input stream, they offer a variety of options from FACET 4.1 and FACET 4.2 to OpenSmile.

Now, in the paper you mention that you're using the facet features, and since the feature shape is 35, I am assuming you're using FACET 4.2 from the SDK (FACET 4.1 has a feature set with a different dimensionality).

Unfortunately, part of the feature set of this FACET version is information on the predicted emotions themselves, namely: Anger, Contempt, Disgust, Joy, Fear, Sadness, Surprise, Confusion, and Frustration.

This would mean that the model had access to information on the predicted classes in the training set. Am I missing something here? Perhaps you are using other features after all?

RuntimeError: multi-target not supported for CrossEntropyLoss

After checking your code, I think this error may occur because the command written in the README is wrong.
Since CMU-MOSEI is a multi-label classification task, we need to use binary cross entropy loss, so the command for CMU-MOSEI should use bce loss instead of ce. The command for IEMOCAP has the same problem.

BTW, why do you also use binary cross entropy for IEMOCAP instead of normal cross entropy?
=====Below is old message=====
Hello,
I'm trying to run your model on the CMU-MOSEI dataset, and I got the multi-target error.
It seems the emotion labels of the processed dataset are not one-hot but multi-target.
Do you have any idea how to fix this problem?
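
As a side note on the loss question raised in this issue, the difference between the two losses can be sketched in plain Python. This is a toy illustration, not the repository's training code: the probabilities, the label vector, and the function names are made up. Binary cross entropy scores each emotion independently against a multi-hot target, whereas softmax cross entropy expects a single class index.

```python
import math


def binary_cross_entropy(probs, targets):
    """Mean BCE over per-emotion probabilities and a multi-hot target vector."""
    eps = 1e-12  # guards against log(0)
    losses = [
        -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
        for p, t in zip(probs, targets)
    ]
    return sum(losses) / len(losses)


def cross_entropy(logits, target_index):
    """Softmax cross entropy: accepts exactly one correct class index."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


# Toy multi-label sample: two of six emotions are active at once.
probs = [0.1, 0.05, 0.1, 0.9, 0.2, 0.8]
multi_hot = [0, 0, 0, 1, 0, 1]
print(binary_cross_entropy(probs, multi_hot))

# cross_entropy cannot represent this sample, since it takes one index only.
print(cross_entropy([0.0, 0.0, 0.0, 2.0, 0.0, 0.0], 3))
```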
