
meld's Introduction

MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversation

Note

🔥 If you are interested in IQ testing LLMs, check out our new work: AlgoPuzzleVQA

🔥 We have released the visual features extracted using ResNet - https://github.com/declare-lab/MM-Align

🔥 🔥 🔥 For updated baselines please visit this link: conv-emotion

🔥 🔥 🔥 To download the data, use wget: wget http://web.eecs.umich.edu/~mihalcea/downloads/MELD.Raw.tar.gz

Leaderboard

Updates

10/10/2020: New paper and SOTA in Emotion Recognition in Conversations on the MELD dataset. Refer to the directory COSMIC for the code. Read the paper -- COSMIC: COmmonSense knowledge for eMotion Identification in Conversations.

22/05/2019: MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversation has been accepted as a full paper at ACL 2019. The updated paper can be found here - https://arxiv.org/pdf/1810.02508.pdf

22/05/2019: Dyadic MELD has been released. It can be used to test dyadic conversational models.

15/11/2018: The problem in the train.tar.gz has been fixed.

Research Works using MELD

Zhang, Yazhou, Qiuchi Li, Dawei Song, Peng Zhang, and Panpan Wang. "Quantum-Inspired Interactive Networks for Conversational Sentiment Analysis." IJCAI 2019.

Zhang, Dong, Liangqing Wu, Changlong Sun, Shoushan Li, Qiaoming Zhu, and Guodong Zhou. "Modeling both Context-and Speaker-Sensitive Dependence for Emotion Detection in Multi-speaker Conversations." IJCAI 2019.

Ghosal, Deepanway, Navonil Majumder, Soujanya Poria, Niyati Chhaya, and Alexander Gelbukh. "DialogueGCN: A Graph Convolutional Neural Network for Emotion Recognition in Conversation." EMNLP 2019.


Introduction

The Multimodal EmotionLines Dataset (MELD) was created by enhancing and extending the EmotionLines dataset. MELD contains the same dialogue instances as EmotionLines, but it also provides the audio and visual modalities along with text. MELD has more than 1400 dialogues and 13000 utterances from the Friends TV series, with multiple speakers participating in the dialogues. Each utterance in a dialogue is labeled with one of seven emotions: Anger, Disgust, Sadness, Joy, Neutral, Surprise and Fear. MELD also provides a sentiment annotation (positive, negative or neutral) for each utterance.

Example Dialogue

Dataset Statistics

Statistics Train Dev Test
# of modality {a,v,t} {a,v,t} {a,v,t}
# of unique words 10,643 2,384 4,361
Avg. utterance length 8.03 7.99 8.28
Max. utterance length 69 37 45
Avg. # of emotions per dialogue 3.30 3.35 3.24
# of dialogues 1039 114 280
# of utterances 9989 1109 2610
# of speakers 260 47 100
# of emotion shift 4003 427 1003
Avg. duration of an utterance 3.59s 3.59s 3.58s

Please visit https://affective-meld.github.io for more details.

Dataset Distribution

Train Dev Test
Anger 1109 153 345
Disgust 271 22 68
Fear 268 40 50
Joy 1743 163 402
Neutral 4710 470 1256
Sadness 683 111 208
Surprise 1205 150 281

Purpose

Multimodal data analysis exploits information from multiple parallel data channels for decision making. With the rapid growth of AI, multimodal emotion recognition has gained major research interest, primarily because of its potential applications in challenging tasks such as dialogue generation and multimodal interaction. A conversational emotion recognition system can be used to generate appropriate responses by analysing user emotions. Although numerous works have been carried out on multimodal emotion recognition, only a few focus on understanding emotions in conversations, and those are limited to dyadic conversations and therefore do not scale to emotion recognition in multi-party conversations with more than two participants. EmotionLines can be used as a resource for emotion recognition in text only, as it does not include data from other modalities such as audio and video. At the same time, no multimodal multi-party conversational dataset was previously available for emotion recognition research. In this work, we have extended, improved, and further developed the EmotionLines dataset for the multimodal scenario.

Emotion recognition in sequential turns poses several challenges, and context understanding is one of them. The emotion changes and the emotion flow across the turns of a dialogue make accurate context modelling a difficult task. Since this dataset provides multimodal sources for each dialogue, we hypothesise that they will improve context modelling and thus benefit overall emotion recognition performance. The dataset can also be used to develop a multimodal affective dialogue system. IEMOCAP and SEMAINE are multimodal conversational datasets that contain an emotion label for each utterance; however, they are dyadic in nature, which underlines the importance of our Multimodal EmotionLines dataset. Other publicly available multimodal emotion and sentiment recognition datasets are MOSEI, MOSI and MOUD, but none of them is conversational.

Dataset Creation

The first step is to find the timestamp of every utterance in each of the dialogues present in the EmotionLines dataset. To accomplish this, we crawled through the subtitle files of all the episodes, which contain the beginning and end timestamps of the utterances. This process gave us the season ID, episode ID, and timestamp of each utterance in the episode. We imposed two constraints while obtaining the timestamps: (a) the timestamps of the utterances in a dialogue must be in increasing order, and (b) all the utterances in a dialogue must belong to the same episode and scene. Applying these two conditions revealed that a few dialogues in EmotionLines actually consist of multiple natural dialogues; we filtered out those cases from the dataset. Because of this error-correction step, MELD has a different number of dialogues than EmotionLines. After obtaining the timestamp of each utterance, we extracted the corresponding audio-visual clip from the source episode and separately extracted the audio content from each video clip. The final dataset thus contains the visual, audio, and textual modality for each dialogue.
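
As a rough sketch of the two constraints above (not the original crawling code; the timestamp format and the tuple layout below are assumptions), the check could look like this:

def to_seconds(ts):
    # Subtitle timestamps look like 'hh:mm:ss,ms', e.g. '00:03:12,480'.
    hh, mm, rest = ts.split(":")
    ss, ms = rest.split(",")
    return int(hh) * 3600 + int(mm) * 60 + int(ss) + int(ms) / 1000.0

def dialogue_is_consistent(utterances):
    # utterances: list of (start_time, end_time, episode_id) tuples for one dialogue,
    # in the order in which they appear in EmotionLines.
    same_episode = len({ep for _, _, ep in utterances}) == 1         # constraint (b)
    starts = [to_seconds(start) for start, _, _ in utterances]
    increasing = all(a < b for a, b in zip(starts, starts[1:]))      # constraint (a)
    return same_episode and increasing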

Paper

The paper describing this dataset can be found at https://arxiv.org/pdf/1810.02508.pdf

Download the data

Please visit http://web.eecs.umich.edu/~mihalcea/downloads/MELD.Raw.tar.gz to download the raw data. The data are stored in .mp4 format inside the XXX.tar.gz files. Annotations can be found at https://github.com/declare-lab/MELD/tree/master/data/MELD.

Description of the .csv files

Column Specification

Column Name Description
Sr No. Serial number of the utterance, mainly for referencing utterances across different versions or multiple copies with different subsets.
Utterance Individual utterances from EmotionLines as a string.
Speaker Name of the speaker associated with the utterance.
Emotion The emotion (neutral, joy, sadness, anger, surprise, fear, disgust) expressed by the speaker in the utterance.
Sentiment The sentiment (positive, neutral, negative) expressed by the speaker in the utterance.
Dialogue_ID The index of the dialogue starting from 0.
Utterance_ID The index of the particular utterance in the dialogue starting from 0.
Season The season no. of Friends TV Show to which a particular utterance belongs.
Episode The episode no. of Friends TV Show in a particular season to which the utterance belongs.
StartTime The starting time of the utterance in the given episode in the format 'hh:mm:ss,ms'.
EndTime The ending time of the utterance in the given episode in the format 'hh:mm:ss,ms'.

The files

  • /data/MELD/train_sent_emo.csv - contains the utterances in the training set along with Sentiment and Emotion labels.
  • /data/MELD/dev_sent_emo.csv - contains the utterances in the dev set along with Sentiment and Emotion labels.
  • /data/MELD/test_sent_emo.csv - contains the utterances in the test set along with Sentiment and Emotion labels.
  • /data/MELD_Dyadic/train_sent_emo_dya.csv - contains the utterances in the training set of the dyadic variant of MELD along with Sentiment and Emotion labels. For getting the video clip corresponding to a particular utterance refer to the columns 'Old_Dialogue_ID' and 'Old_Utterance_ID'.
  • /data/MELD_Dyadic/dev_sent_emo_dya.csv - contains the utterances in the dev set of the dyadic variant along with Sentiment and Emotion labels. For getting the video clip corresponding to a particular utterance refer to the columns 'Old_Dialogue_ID' and 'Old_Utterance_ID'.
  • /data/MELD_Dyadic/test_sent_emo_dya.csv - contains the utterances in the test set of the dyadic variant along with Sentiment and Emotion labels. For getting the video clip corresponding to a particular utterance refer to the columns 'Old_Dialogue_ID' and 'Old_Utterance_ID'.
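
These annotation files are plain CSV and can be inspected directly, for example with pandas (a minimal sketch; the path is assumed to be relative to the repository root):

import pandas as pd

# Load the training annotations; the dev and test files share the same schema.
train = pd.read_csv("data/MELD/train_sent_emo.csv")

# Each row is one utterance together with its Speaker, Emotion, Sentiment,
# Dialogue_ID, Utterance_ID, Season, Episode, StartTime and EndTime.
print(train[["Speaker", "Utterance", "Emotion", "Sentiment", "Dialogue_ID", "Utterance_ID"]].head())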

Description of Pickle Files

There are 13 pickle files comprising the data and features used for training the baseline models. A brief description of each pickle file follows.

Data pickle files:

  • data_emotion.p, data_sentiment.p - These are the primary data files; each stores the following elements as a list.
    • data: It consists of a dictionary with the following key/value pairs.
      • text: original sentence.
      • split: train/val/test - denotes which split the tuple belongs to.
      • y: label of the sentence.
      • dialog: ID of the dialog the utterance belongs to.
      • utterance: utterance number of the dialog ID.
      • num_words: number of words in the utterance.
    • W: the GloVe embedding matrix.
    • vocab: the vocabulary of the dataset.
    • word_idx_map: mapping of each word in vocab to its index in W.
    • max_sentence_length: maximum number of tokens in an utterance in the dataset.
    • label_index: mapping of each label (emotion or sentiment) to its assigned index, e.g. label_index['neutral'] = 0.
import pickle
data, W, vocab, word_idx_map, max_sentence_length, label_index = pickle.load(open(filepath, 'rb'))
  • text_glove_average_emotion.pkl, text_glove_average_sentiment.pkl - These contain the 300-dimensional textual feature vector of each utterance, initialized as the average of the GloVe embeddings of all tokens in the utterance. Each file is a list of 3 dictionaries (train, val and test), each indexed in the format dia_utt, where dia is the dialogue id and utt is the utterance id. E.g. train_text_avg_emb['0_0'].shape = (300, )
import pickle
train_text_avg_emb, val_text_avg_emb, test_text_avg_emb = pickle.load(open(filepath, 'rb'))
  • audio_embeddings_feature_selection_emotion.pkl, audio_embeddings_feature_selection_sentiment.pkl - These contain the 1611/1422-dimensional audio feature vector of each utterance used for emotion/sentiment classification. The features are extracted with openSMILE and then reduced by L2-based feature selection using an SVM (a sketch of such a selection step is shown after the loading snippet below). Each file is a list of 3 dictionaries (train, val and test), each indexed in the format dia_utt, where dia is the dialogue id and utt is the utterance id. E.g. train_audio_emb['0_0'].shape = (1611, ) or (1422, )
import pickle
train_audio_emb, val_audio_emb, test_audio_emb = pickle.load(open(filepath, 'rb'))
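
As a hedged illustration of SVM-based feature selection with scikit-learn (the hyperparameters and random data below are placeholders, not the authors' settings or script), one could do something like:

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.feature_selection import SelectFromModel

# Placeholder data standing in for openSMILE utterance-level features and emotion labels.
X = np.random.rand(200, 6373)
y = np.random.randint(0, 7, size=200)

# Fit a linear SVM (L2 penalty by default) and keep the features whose weights
# pass SelectFromModel's default importance threshold.
svm = LinearSVC(C=0.05, max_iter=5000).fit(X, y)
selector = SelectFromModel(svm, prefit=True)
X_reduced = selector.transform(X)
print(X_reduced.shape)  # (200, n_selected_features)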

Model output pickle files:

  • text_glove_CNN_emotion.pkl, text_glove_CNN_sentiment.pkl - These contain the 100-dimensional textual features obtained after training a CNN-based network for emotion/sentiment classification. Each file is a list of 3 dictionaries (train, val and test), each indexed in the format dia_utt, where dia is the dialogue id and utt is the utterance id. E.g. train_text_CNN_emb['0_0'].shape = (100, )
import pickle
train_text_CNN_emb, val_text_CNN_emb, test_text_CNN_emb = pickle.load(open(filepath, 'rb'))
  • text_emotion.pkl, text_sentiment.pkl - These files contain the contextual feature representations produced by the uni-modal bcLSTM model: a 600-dimensional textual feature vector for each utterance for emotion/sentiment classification, stored as a dictionary indexed by dialogue id. Each file is a list of 3 dictionaries (train, val and test). E.g. train_text_emb['0'].shape = (33, 600), where 33 is the maximum number of utterances in a dialogue; dialogues with fewer utterances are padded with zero-vectors.
import pickle
train_text_emb, val_text_emb, test_text_emb = pickle.load(open(filepath, 'rb'))
  • audio_emotion.pkl, audio_sentiment.pkl - These files contain the contextual feature representations produced by the uni-modal bcLSTM model: a 300/600-dimensional audio feature vector for each utterance for emotion/sentiment classification, stored as a dictionary indexed by dialogue id. Each file is a list of 3 dictionaries (train, val and test). E.g. train_audio_emb['0'].shape = (33, 300) or (33, 600), where 33 is the maximum number of utterances in a dialogue; dialogues with fewer utterances are padded with zero-vectors.
import pickle
train_audio_emb, val_audio_emb, test_audio_emb = pickle.load(open(filepath, 'rb'))
  • bimodal_sentiment.pkl - This file contains the contextual feature representations produced by the bimodal bcLSTM model: a 600-dimensional bimodal (text, audio) feature vector for each utterance for sentiment classification, stored as a dictionary indexed by dialogue id. It is a list of 3 dictionaries (train, val and test). E.g. train_bimodal_emb['0'].shape = (33, 600), where 33 is the maximum number of utterances in a dialogue; dialogues with fewer utterances are padded with zero-vectors. A sketch of a simple text-audio fusion is shown after the loading snippet below.
import pickle
train_bimodal_emb, val_bimodal_emb, test_bimodal_emb = pickle.load(open(filepath, 'rb'))
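
The bimodal features combine the text and audio modalities; a minimal sketch of one way to build such a feature-level fusion from the per-utterance pickles above (illustrative only, not the exact baseline code) is:

import pickle
import numpy as np

# Utterance-level features indexed by 'dia_utt' keys, as described above.
train_text, val_text, test_text = pickle.load(open("data/pickles/text_glove_average_emotion.pkl", "rb"))
train_audio, val_audio, test_audio = pickle.load(open("data/pickles/audio_embeddings_feature_selection_emotion.pkl", "rb"))

# Concatenate the text and audio vectors per utterance to form a bimodal representation.
train_bimodal = {key: np.concatenate([train_text[key], train_audio[key]])
                 for key in train_text if key in train_audio}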

Description of Raw Data

  • There are 3 folders (.tar.gz files): train, dev and test, each containing the video clips of the utterances in the corresponding .csv file.
  • In any folder, each video clip in the raw data corresponds to one utterance in the corresponding .csv file. The video clips are named in the format: diaX1_uttX2.mp4, where X1 is the Dialogue_ID and X2 is the Utterance_ID as provided in the corresponding .csv file, denoting the particular utterance.
  • For example, consider the video clip dia6_utt1.mp4 in train.tar.gz. The corresponding utterance for this video clip is the row in train_sent_emo.csv with Dialogue_ID=6 and Utterance_ID=1, which is 'You liked it? You really liked it?'
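
A minimal sketch of resolving such a clip path from a CSV row (the split directory name here is an assumption; the provided ./utils/read_meld.py is the reference for this lookup):

import pandas as pd

def clip_path(row, split_dir="train_splits"):
    # Clips are named diaX1_uttX2.mp4, where X1 is Dialogue_ID and X2 is Utterance_ID.
    return f"{split_dir}/dia{row['Dialogue_ID']}_utt{row['Utterance_ID']}.mp4"

train = pd.read_csv("data/MELD/train_sent_emo.csv")
row = train[(train["Dialogue_ID"] == 6) & (train["Utterance_ID"] == 1)].iloc[0]
print(row["Utterance"], "->", clip_path(row))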

Reading the Data

There are 2 python scripts provided in './utils/':

  • read_meld.py - displays the path of the video file corresponding to an utterance in the .csv file from MELD.
  • read_emorynlp.py - displays the path of the video file corresponding to an utterance in the .csv file from the Multimodal EmoryNLP Emotion Detection dataset.

Labelling

For experimentation, all the labels are represented as one-hot encodings, the indices for which are as follows:

  • Emotion - {'neutral': 0, 'surprise': 1, 'fear': 2, 'sadness': 3, 'joy': 4, 'disgust': 5, 'anger': 6}. Therefore, the label corresponding to the emotion 'joy' would be [0., 0., 0., 0., 1., 0., 0.]
  • Sentiment - {'neutral': 0, 'positive': 1, 'negative': 2}. Therefore, the label corresponding to the sentiment 'positive' would be [0., 1., 0.]
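
For instance, these one-hot vectors can be generated directly from the index maps above (a minimal sketch):

import numpy as np

emotion_idx = {'neutral': 0, 'surprise': 1, 'fear': 2, 'sadness': 3,
               'joy': 4, 'disgust': 5, 'anger': 6}

def one_hot(label, index_map):
    # Build a one-hot vector whose length equals the number of classes.
    vec = np.zeros(len(index_map))
    vec[index_map[label]] = 1.0
    return vec

print(one_hot('joy', emotion_idx))  # [0. 0. 0. 0. 1. 0. 0.]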

Class Weights

For the baseline on emotion classification, the following class weights were used. The indexing is the same as mentioned above. Class Weights: [4.0, 15.0, 15.0, 3.0, 1.0, 6.0, 3.0].
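
Note that Keras rejects class_weight for the padded 3-dimensional targets used by the baseline (see the related issue below), so one hedged workaround (an assumption, not the released training code) is to fold the class weights into the per-utterance sample weights:

import numpy as np

class_weights = np.array([4.0, 15.0, 15.0, 3.0, 1.0, 6.0, 3.0])

def weighted_sample_mask(train_y, train_mask):
    # train_y: one-hot targets of shape (num_dialogues, max_utterances, num_classes);
    # train_mask: 0/1 padding mask of shape (num_dialogues, max_utterances).
    labels = train_y.argmax(axis=-1)
    # Scale each real (unpadded) utterance by the weight of its true class.
    return train_mask * class_weights[labels]

# The result can be passed as sample_weight to model.fit() in place of class_weight.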

Run the baseline

Please follow these steps to run the baseline -

  1. Download the features from here.
  2. Copy these features into ./data/pickles/
  3. To train/test the baseline model, run the file: baseline/baseline.py as follows:
    • python baseline.py -classify [Sentiment|Emotion] -modality [text|audio|bimodal] [-train|-test]
    • example command to train text unimodal for sentiment classification: python baseline.py -classify Sentiment -modality text -train
    • use python baseline.py -h to get help text for the parameters.
  4. For pre-trained models, download the model weights from here and place the pickle files inside ./data/models/.

Citation

Please cite the following papers if you find this dataset useful in your research

S. Poria, D. Hazarika, N. Majumder, G. Naik, E. Cambria, R. Mihalcea. MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversation. ACL 2019.

Chen, S.Y., Hsu, C.C., Kuo, C.C. and Ku, L.W. EmotionLines: An Emotion Corpus of Multi-Party Conversations. arXiv preprint arXiv:1802.08379 (2018).

Multimodal EmoryNLP Emotion Recognition Dataset


Description

The Multimodal EmoryNLP Emotion Detection Dataset was created by enhancing and extending the EmoryNLP Emotion Detection dataset. It contains the same dialogue instances as the EmoryNLP Emotion Detection dataset, but it also encompasses the audio and visual modalities along with text. The multimodal EmoryNLP dataset contains more than 800 dialogues and 9000 utterances from the Friends TV series, with multiple speakers participating in the dialogues. Each utterance in a dialogue is labeled with one of these seven emotions: Neutral, Joyful, Peaceful, Powerful, Scared, Mad and Sad. The annotations are borrowed from the original dataset.

Dataset Statistics

Statistics Train Dev Test
# of modality {a,v,t} {a,v,t} {a,v,t}
# of unique words 9,744 2,123 2,345
Avg. utterance length 7.86 6.97 7.79
Max. utterance length 78 60 61
Avg. # of emotions per scene 4.10 4.00 4.40
# of dialogues 659 89 79
# of utterances 7551 954 984
# of speakers 250 46 48
# of emotion shift 4596 575 653
Avg. duration of an utterance 5.55s 5.46s 5.27s

Dataset Distribution

Train Dev Test
Joyful 1677 205 217
Mad 785 97 86
Neutral 2485 322 288
Peaceful 638 82 111
Powerful 551 70 96
Sad 474 51 70
Scared 941 127 116

Data

Video clips of this dataset can be downloaded from this link. The annotation files can be found at https://github.com/SenticNet/MELD/tree/master/data/emorynlp. There are 3 .csv files. Each entry in the first column of these .csv files contains an utterance whose corresponding video clip can be found here. Each utterance and its video clip are indexed by the season no., episode no., scene id and utterance id. For example, sea1_ep2_sc6_utt3.mp4 means the clip corresponds to the utterance with season no. 1, episode no. 2, scene_id 6 and utterance_id 3. A scene is simply a dialogue. This indexing is consistent with the original dataset. The .csv files and the video files are divided into train, validation and test sets in accordance with the original dataset. Annotations have been directly borrowed from the original EmoryNLP dataset (Zahiri et al. (2018)).
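
As a quick illustration of this naming scheme, a helper like the one below (hypothetical, not part of the released code) rebuilds a clip name from its indices:

def emorynlp_clip_name(season, episode, scene_id, utterance_id):
    # e.g. emorynlp_clip_name(1, 2, 6, 3) returns 'sea1_ep2_sc6_utt3.mp4'
    return f"sea{season}_ep{episode}_sc{scene_id}_utt{utterance_id}.mp4"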

Description of the .csv files

Column Specification

Column Name Description
Utterance Individual utterances from EmoryNLP as a string.
Speaker Name of the speaker associated with the utterance.
Emotion The emotion (Neutral, Joyful, Peaceful, Powerful, Scared, Mad and Sad) expressed by the speaker in the utterance.
Scene_ID The index of the dialogue starting from 0.
Utterance_ID The index of the particular utterance in the dialogue starting from 0.
Season The season no. of Friends TV Show to which a particular utterance belongs.
Episode The episode no. of Friends TV Show in a particular season to which the utterance belongs.
StartTime The starting time of the utterance in the given episode in the format 'hh:mm:ss,ms'.
EndTime The ending time of the utterance in the given episode in the format 'hh:mm:ss,ms'.

Note: There are a few utterances for which we were not able to find the start and end time due to some inconsistencies in the subtitles. Such utterances have been omitted from the dataset. However, we encourage the users to find the corresponding utterances from the original dataset and generate video clips for the same.

Citation

Please cite the following papers if you find this dataset useful in your research

S. Zahiri and J. D. Choi. Emotion Detection on TV Show Transcripts with Sequence-based Convolutional Neural Networks. In The AAAI Workshop on Affective Content Analysis, AFFCON'18, 2018.

S. Poria, D. Hazarika, N. Majumder, G. Naik, E. Cambria, R. Mihalcea. MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversation. ACL 2019.


meld's Issues

Use of the dataset for training models in relation to the license

Do you have any expectations for how the GPL v3 license should apply to weights in a model trained using the data as part of a training set, with no use of the software or the pretrained models? I was looking to include this as training data either in Stanford's stanza package, which has an apache v2 license, or CoreNLP software, which has a separate license for commercial applications.

class weight {0: 4.0, 1: 15.0, 2: 15.0, 3: 3.0, 4: 1.0, 5: 6.0, 6: 3.0}, it still is zero

Dear professor,

I am really sorry to disturb you. I added class_weight={0: 4.0, 1: 15.0, 2: 15.0, 3: 3.0, 4: 1.0, 5: 6.0, 6: 3.0}.
The precision is still zero:

          precision    recall  f1-score   support

       0     0.4919    0.9889    0.6570      1256
       1     0.0000    0.0000    0.0000       281
       2     0.0000    0.0000    0.0000        50
       3     0.0000    0.0000    0.0000       208
       4     0.0000    0.0000    0.0000       402
       5     0.0000    0.0000    0.0000        68
       6     0.4353    0.1072    0.1721       345

micro avg 0.4900 0.4900 0.4900 2610
macro avg 0.1325 0.1566 0.1184 2610
weighted avg 0.2942 0.4900 0.3389 2610

Weighted FScore:
(0.2942449206381084, 0.4900383141762452, 0.3388985544501019, None)

How to fix it?
thanks, best wishes

How can I know which face on the video sample is speaking to extract my own visual features ??

First of all, I would like to congratulate all you for the big effort you did when creating this MELD dataset. However, I would also like to ask you if it is possible to obtain the facial landmarks (or any other kind of information) that will allow me to extract the face of the person actively speaking as you did for extracting the features you provide.

The reason is because I would like to explore my own visual features.

Thanks in advance. Best regards from Valencia,

David

Failed to load pretrained models

Hi,
I tried to reload the pretrained models, but failed. I assume it is a Keras version problem. Could you describe the running environment? Sorry, I cannot find it in the Readme.
Thanks.

Hello

I am learning emotion recognition, so I downloaded your project, but I cannot access the website http://bit.ly/MELD-features. Could you send the files to me? I cannot access the other downloads either. Sincerely, thank you. Email: [email protected]

Is there any face region information in videos?

Hi, thanks for sharing this good dataset.

I want to crop the face region of the person speaking in each video frame, but many frames contain two or more people.

Is there any face-region information for the videos? If any such information (x/y coordinates, etc.) exists, please share it.

Thank you

Using pre-trained models with my own audio/video files.

Hello, I have downloaded the pre-trained models from the link provided in the repository, but baseline.py doesn't provide any way to use my own audio/video (.mp3/.mp4) files directly. The authors load pickle files instead.

Can somebody give me a script so I can use the model with my own audio/video files?

Any help would be really appreciated.

Mismatch in data_emotion.p

In MELD/data_emotion.p, why is word_idx_map[','] = 6459, whereas W.shape = (6336, 300)?
Is this a bug?

Missing video in dev set and utility of additional videos

Dear Prof. Poria,
I've downloaded the raw dataset from the offical website because I want to extract multimodal features by myself. However, I find that the video 'dia110_utt7.mp4' (Sr No. 1153 in the 'dev_sent_emo.csv') does not exist in the 'dev_splits_complete' folder. Could you verify this problem and update the dataset?

Besides, I notice that there are additional videos in video folders which have no corresponding annotations in csv files. For example, 'dia66_utt9', 'dia49_utt5', 'dia49_utt4', 'dia66_utt10' in the dev set and 'dia108_utt2', 'final_videos_testdia101_utt0' in the test set. Could you tell me what are videos used for?

Thanks very much!

About visual_features.tar.gz

Hello! I noticed that there is a "visual_features.tar.gz" file in the features file url you provided. I have downloaded it and decompressed it, and the file structure is as follows:

  • matlab_resnet_faces
    • train
      • frame_1_1.mat
      • frame_1_2.mat
      • frame_x_x.mat
        ...
    • dev
    • test

Because there is no README file, I cannot tell the meaning or source of these files. Can you explain them to me? I hope to hear from you. Thank you!

Really poor results using the given features

I am trying to create a multimodal model using the MELD dataset. After a really large number of tries, using either the provided features or features extracted by myself (openSMILE and wav2vec for audio and simple textual approaches), I always get poor results, really far from the ones described in the paper. Today, I decided to load the audio and text features produced by the bcLSTM model and concatenate them, just as shown in the baseline file, and use them as input for a new bcLSTM. Basically, I copy-pasted what the authors have done, and the results are still bad. Has anyone faced this problem, or are the provided features actually that bad? I also tried to apply the same methods and models to the features processed by me, and the results are still bad. I am doing my master's thesis in multimodal emotion recognition and this dataset is not helping at all.

BC-LSTM text unimodal checkpoint is broken

I could run audio unimodal and bimodal BC-LSTM pretrained model, but got the following error when running text unimodal BC-LSTM.

Model initiated for Sentiment classification
Loading data
Labels used for this classification:  {'neutral': 0, 'positive': 1, 'negative': 2}
Traceback (most recent call last):
  File "baseline.py", line 312, in <module>
    model.test_model()
  File "baseline.py", line 234, in test_model
    model = load_model(self.PATH)
  File "/Users/xiaoyu/.pyenv/versions/3.8.7/lib/python3.8/site-packages/keras/saving/save.py", line 200, in load_model
    return hdf5_format.load_model_from_hdf5(filepath, custom_objects,
  File "/Users/xiaoyu/.pyenv/versions/3.8.7/lib/python3.8/site-packages/keras/saving/hdf5_format.py", line 180, in load_model_from_hdf5
    model = model_config_lib.model_from_config(model_config,
  File "/Users/xiaoyu/.pyenv/versions/3.8.7/lib/python3.8/site-packages/keras/saving/model_config.py", line 52, in model_from_config
    return deserialize(config, custom_objects=custom_objects)
  File "/Users/xiaoyu/.pyenv/versions/3.8.7/lib/python3.8/site-packages/keras/layers/serialization.py", line 208, in deserialize
    return generic_utils.deserialize_keras_object(
  File "/Users/xiaoyu/.pyenv/versions/3.8.7/lib/python3.8/site-packages/keras/utils/generic_utils.py", line 674, in deserialize_keras_object
    deserialized_obj = cls.from_config(
  File "/Users/xiaoyu/.pyenv/versions/3.8.7/lib/python3.8/site-packages/keras/engine/training.py", line 2397, in from_config
    functional.reconstruct_from_config(config, custom_objects))
  File "/Users/xiaoyu/.pyenv/versions/3.8.7/lib/python3.8/site-packages/keras/engine/functional.py", line 1273, in reconstruct_from_config
    process_layer(layer_data)
  File "/Users/xiaoyu/.pyenv/versions/3.8.7/lib/python3.8/site-packages/keras/engine/functional.py", line 1255, in process_layer
    layer = deserialize_layer(layer_data, custom_objects=custom_objects)
  File "/Users/xiaoyu/.pyenv/versions/3.8.7/lib/python3.8/site-packages/keras/layers/serialization.py", line 208, in deserialize
    return generic_utils.deserialize_keras_object(
  File "/Users/xiaoyu/.pyenv/versions/3.8.7/lib/python3.8/site-packages/keras/utils/generic_utils.py", line 674, in deserialize_keras_object
    deserialized_obj = cls.from_config(
  File "/Users/xiaoyu/.pyenv/versions/3.8.7/lib/python3.8/site-packages/keras/layers/core.py", line 1005, in from_config
    function = cls._parse_function_from_config(
  File "/Users/xiaoyu/.pyenv/versions/3.8.7/lib/python3.8/site-packages/keras/layers/core.py", line 1057, in _parse_function_from_config
    function = generic_utils.func_load(
  File "/Users/xiaoyu/.pyenv/versions/3.8.7/lib/python3.8/site-packages/keras/utils/generic_utils.py", line 789, in func_load
    code = marshal.loads(raw_code)
ValueError: bad marshal data (unknown type code)

The command I'm running is:
python baseline.py -classify sentiment -modality text -test.

ValueError: "input_length" is 33, but received input has shape (None, 50)

I am trying to run the baseline.py file to test the model for Emotion classification and text modality by using the already trained models (the source of which were provided in the README file).

I am using the following command:
python baseline.py -classify Emotion -modality text -test

The error that I am getting after running this command is as follows:

Using TensorFlow backend.
Model initiated for Emotion classification
Loading data
Labels used for this classification:  {'neutral': 0, 'surprise': 1, 'fear': 2, 'sadness': 3, 'joy': 4, 'disgust': 5, 'anger': 6}
Traceback (most recent call last):
  File "baseline.py", line 286, in <module>
    model.test_model()
  File "baseline.py", line 234, in test_model
    model = load_model(self.PATH)
  File "C:\Program Files\Python37\lib\site-packages\keras\engine\saving.py", line 492, in load_wrapper
    return load_function(*args, **kwargs)
  File "C:\Program Files\Python37\lib\site-packages\keras\engine\saving.py", line 584, in load_model
    model = _deserialize_model(h5dict, custom_objects, compile)
  File "C:\Program Files\Python37\lib\site-packages\keras\engine\saving.py", line 274, in _deserialize_model
    model = model_from_config(model_config, custom_objects=custom_objects)
  File "C:\Program Files\Python37\lib\site-packages\keras\engine\saving.py", line 627, in model_from_config
    return deserialize(config, custom_objects=custom_objects)
  File "C:\Program Files\Python37\lib\site-packages\keras\layers\__init__.py", line 168, in deserialize
    printable_module_name='layer')
  File "C:\Program Files\Python37\lib\site-packages\keras\utils\generic_utils.py", line 147, in deserialize_keras_object
    list(custom_objects.items())))
  File "C:\Program Files\Python37\lib\site-packages\keras\engine\network.py", line 1075, in from_config
    process_node(layer, node_data)
  File "C:\Program Files\Python37\lib\site-packages\keras\engine\network.py", line 1025, in process_node
    layer(unpack_singleton(input_tensors), **kwargs)
  File "C:\Program Files\Python37\lib\site-packages\keras\engine\base_layer.py", line 506, in __call__
    output_shape = self.compute_output_shape(input_shape)
  File "C:\Program Files\Python37\lib\site-packages\keras\layers\embeddings.py", line 136, in compute_output_shape
    (str(self.input_length), str(input_shape)))
ValueError: "input_length" is 33, but received input has shape (None, 50)

Can someone please help me resolve this issue?

guidance for developing a LSTM model

Hi,
I am struggling to find any support for developing a model for a multimodal dataset.
Can you please guide me or give me some reference for using this dataset and developing an LSTM or CNN model?
I am new to this field; for now I am able to develop models for images and text separately, but I am having trouble using a merged input (image+text or image+audio).

Please provide some direction or explain with respect to the baseline model you provided.
Thanks

missing training data

Hi there,
I've downloaded MELD.Raw.tar.gz; the dev and test data are OK, but the training data is missing.
When I untar train.tar.gz, it always shows the following issue:
gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
and after that, I got a 'split' directory which contains 4648 files, so I believe the training data is definitely missing.
I tried many times, always with the same result.
I wish I can get some help with that.

Absence of Audio in the Test File

Hello Mr/Ms

I have found an issue where there seems to be no audio in the test dataset which is confusing. I was wondering if maybe there was a problem or maybe there is a separate audio file for this. I hope to hear from you soon.

Videos are not well-aligned with the texts.

Obviously this issue was already brought up at #9

The alignment is pretty bad. It's hard for me to go multimodal at the moment, because of this issue.

I have two questions:

  1. Has this been fixed? Or are you planning on using a better alignment tool?
  2. Can I have access to the original friends videos? I wonder if I can cut the videos into utterances myself using ASR.

Wrong video files

I used this link to download the audio files for this data set.

However, there are a few problems with at least a few of the video files and/or their transcriptions:

  • dia309_utt0.mp4: transcription contains description of scene which needs to be removed ("She doesn't hear him and keeps running, Chandler starts chasing her as the theme to")

  • test_splits_wav/dia220_utt0.mp4: file is wrongly cut (video is 4min long - transcription is way off as it's Ross and Julie meeting Rachel at the airport, not Phoebe talking to Joey )

  • test_splits_wav/dia38_utt4.mp4: file is wrongly cut (video is 5min long)

  • train_splits_wav/dia309_utt0.mp4: file is wrongly cut

In addition, I was able to verify that some of the old problems reported here still persist (e.g. dia793_utt0.mp4).

Have they been solved? Have I perhaps downloaded an old version of the data set?

error: bad character range \|-t at position 12

When I run python baseline.py -classify [Sentiment|Emotion] -modality [text|audio|bimodal] [-train|-test],

it errors. How do I fix it? Thanks.

error Traceback (most recent call last)
in
----> 1 get_ipython().run_line_magic('run', 'baseline.py -classify [Sentiment|Emotion] -modality [text|audio|bimodal] [-train|-test]')

F:\Anaconda\lib\site-packages\IPython\core\interactiveshell.py in run_line_magic(self, magic_name, line, _stack_depth)
2325 kwargs['local_ns'] = self.get_local_scope(stack_depth)
2326 with self.builtin_trap:
-> 2327 result = fn(*args, **kwargs)
2328 return result
2329

in run(self, parameter_s, runner, file_finder)

F:\Anaconda\lib\site-packages\IPython\core\magic.py in (f, *a, **k)
185 # but it's overkill for just that one bit of state.
186 def magic_deco(arg):
--> 187 call = lambda f, *a, **k: f(*a, **k)
188
189 if callable(arg):

F:\Anaconda\lib\site-packages\IPython\core\magics\execution.py in run(self, parameter_s, runner, file_finder)
736 else:
737 # tilde and glob expansion
--> 738 args = shellglob(map(os.path.expanduser, arg_lst[1:]))
739
740 sys.argv = [filename] + args # put in the proper filename

F:\Anaconda\lib\site-packages\IPython\utils\path.py in shellglob(args)
324 unescape = unescape_glob if sys.platform != 'win32' else lambda x: x
325 for a in args:
--> 326 expanded.extend(glob.glob(a) or [unescape(a)])
327 return expanded
328

F:\Anaconda\lib\glob.py in glob(pathname, recursive)
19 zero or more directories and subdirectories.
20 """
---> 21 return list(iglob(pathname, recursive=recursive))
22
23 def iglob(pathname, *, recursive=False):

F:\Anaconda\lib\glob.py in _iglob(pathname, recursive, dironly)
55 yield from _glob2(dirname, basename, dironly)
56 else:
---> 57 yield from _glob1(dirname, basename, dironly)
58 return
59 # os.path.split() returns the argument itself as a dirname if it is a

F:\Anaconda\lib\glob.py in _glob1(dirname, pattern, dironly)
83 if not _ishidden(pattern):
84 names = (x for x in names if not _ishidden(x))
---> 85 return fnmatch.filter(names, pattern)
86
87 def _glob0(dirname, basename, dironly):

F:\Anaconda\lib\fnmatch.py in filter(names, pat)
50 result = []
51 pat = os.path.normcase(pat)
---> 52 match = _compile_pattern(pat)
53 if os.path is posixpath:
54 # normcase on posix is NOP. Optimize it away from the loop.

F:\Anaconda\lib\fnmatch.py in _compile_pattern(pat)
44 else:
45 res = translate(pat)
---> 46 return re.compile(res).match
47
48 def filter(names, pat):

F:\Anaconda\lib\re.py in compile(pattern, flags)
250 def compile(pattern, flags=0):
251 "Compile a regular expression pattern, returning a Pattern object."
--> 252 return _compile(pattern, flags)
253
254 def purge():

F:\Anaconda\lib\re.py in _compile(pattern, flags)
302 if not sre_compile.isstring(pattern):
303 raise TypeError("first argument must be string or compiled pattern")
--> 304 p = sre_compile.compile(pattern, flags)
305 if not (flags & DEBUG):
306 if len(_cache) >= _MAXCACHE:

F:\Anaconda\lib\sre_compile.py in compile(p, flags)
762 if isstring(p):
763 pattern = p
--> 764 p = sre_parse.parse(p, flags)
765 else:
766 pattern = None

F:\Anaconda\lib\sre_parse.py in parse(str, flags, state)
946
947 try:
--> 948 p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
949 except Verbose:
950 # the VERBOSE flag was switched on inside the pattern. to be

F:\Anaconda\lib\sre_parse.py in _parse_sub(source, state, verbose, nested)
441 start = source.tell()
442 while True:
--> 443 itemsappend(_parse(source, state, verbose, nested + 1,
444 not nested and not items))
445 if not sourcematch("|"):

F:\Anaconda\lib\sre_parse.py in _parse(source, state, verbose, nested, first)
832 sub_verbose = ((verbose or (add_flags & SRE_FLAG_VERBOSE)) and
833 not (del_flags & SRE_FLAG_VERBOSE))
--> 834 p = _parse_sub(source, state, sub_verbose, nested + 1)
835 if not source.match(")"):
836 raise source.error("missing ), unterminated subpattern",

F:\Anaconda\lib\sre_parse.py in _parse_sub(source, state, verbose, nested)
441 start = source.tell()
442 while True:
--> 443 itemsappend(_parse(source, state, verbose, nested + 1,
444 not nested and not items))
445 if not sourcematch("|"):

F:\Anaconda\lib\sre_parse.py in _parse(source, state, verbose, nested, first)
596 if hi < lo:
597 msg = "bad character range %s-%s" % (this, that)
--> 598 raise source.error(msg, len(this) + 1 + len(that))
599 setappend((RANGE, (lo, hi)))
600 else:

error: bad character range |-t at position 12

No audio modal in the test set.

Hello, prof.
I've downloaded the raw data, and it seems that the samples in the test set, such as ./output_repeated_splits_test/dia263_utt7.mp4, have no sound.
Is the RAW audio modal data for the test set available?
Thank you very much!

Error in train.tar.gz

Got this error while untar

gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

Using pre-trained models with our own audio files.

Hi, thanks for making this open source project.

I'm trying to use the pre-trained models you provide on my own audio files in order to extract the emotion and sentiment labels, but the baseline.py does not seem to provide a way to use my own files.

Moreover, the baseline.py file loads .pkl instead of .wav or .mp4 files. How would I go about using my own files and generating similar .pkl files in order to use them with the pre-trained models?

Thanks.

Cannot download the raw data.

First of all, sorry if this is not the correct place to post this question.

I tried the following commands several times but could not download the raw data.

$ wget http://web.eecs.umich.edu/~mihalcea/downloads/MELD.Raw.tar.gz
--2022-10-17 14:00:13--  http://web.eecs.umich.edu/~mihalcea/downloads/MELD.Raw.tar.gz
Resolving web.eecs.umich.edu (web.eecs.umich.edu)... 141.212.113.214
Connecting to web.eecs.umich.edu (web.eecs.umich.edu)|141.212.113.214|:80... connected.
HTTP request sent, awaiting response... Read error (Connection reset by peer) in headers.
Retrying.

--2022-10-17 14:02:13--  (try: 2)  http://web.eecs.umich.edu/~mihalcea/downloads/MELD.Raw.tar.gz
Connecting to web.eecs.umich.edu (web.eecs.umich.edu)|141.212.113.214|:80... connected.
HTTP request sent, awaiting response... Read error (Connection reset by peer) in headers.
Retrying.

--2022-10-17 14:04:15--  (try: 3)  http://web.eecs.umich.edu/~mihalcea/downloads/MELD.Raw.tar.gz
Connecting to web.eecs.umich.edu (web.eecs.umich.edu)|141.212.113.214|:80... connected.
HTTP request sent, awaiting response... Read error (Connection reset by peer) in headers.
Retrying.

--2022-10-17 14:06:20--  (try: 4)  http://web.eecs.umich.edu/~mihalcea/downloads/MELD.Raw.tar.gz
Connecting to web.eecs.umich.edu (web.eecs.umich.edu)|141.212.113.214|:80... connected.
HTTP request sent, awaiting response... Read error (Connection reset by peer) in headers.
Retrying.

--2022-10-17 14:07:29--  (try: 5)  http://web.eecs.umich.edu/~mihalcea/downloads/MELD.Raw.tar.gz
Connecting to web.eecs.umich.edu (web.eecs.umich.edu)|141.212.113.214|:80... connected.
HTTP request sent, awaiting response...^C

And I confirmed that I also cannot access Prof. Mihalcea's website, where the data is located.

Could you please check if you can access this download link?
Thank you.

Data download link

The download link to the audio data doesn't appear to be working. Am I missing something?

Multiple issues in the dataset.

  1. Audio

There is a disturbance in audio which would have affected the audio features.

Few Examples:
dia793_utt0.mp4
dia164_utt5.mp4
dia682_utt1.mp4
dia529_utt2.mp4
dia1029_utt1.mp4
dia1008_utt1.mp4

Mostly all videos with size > 2.5 MB (around 200 videos in train_set)

  2. Video and text are not matching.

For example

a) dialogue 241: in utterance 1 the sync breaks between the text and the video;
utterance 2 in the text is "I asked him." while the video dia241_utt2.mp4 has just the word "now", and the sync issues go on.

b) dialogue 757 utterance 7 is also not synced with the text.

c) dialogue 485: utterance 0 in the text is "Hey, this- Heyy..." but the video is a long clip.

There are many more video-text sync issues.

Is this dataset usable?
Please help me with this.

Baseline results

Hi, I have tried the bc_LSTM baseline with bimodal in emotion classification, but the F1-score and accuracy of 'fear' and 'disgust' are always zero, so I can't reproduce the result in paper.

The command I use:

python baseline.py -classify emotion -modality bimodal -train

The results:

          precision    recall  f1-score   support

       0     0.7322    0.7795    0.7551      1256
       1     0.4799    0.4662    0.4729       281
       2     0.0000    0.0000    0.0000        50
       3     0.2781    0.2019    0.2340       208
       4     0.4813    0.5448    0.5111       402
       5     0.0000    0.0000    0.0000        68
       6     0.3832    0.4377    0.4087       345

The emotion labels:

Emotion - {'neutral': 0, 'surprise': 1, 'fear': 2, 'sadness': 3, 'joy': 4, 'disgust': 5, 'anger': 6}.

Is there something wrong with my understanding?

How do I convert a video to the data format required for this model?

I've got the bimodal_weights_emotion.hdf5 model from baseline, and the model can recognize the emotion from MELD dataset well. But I don't know how to recognize emotion from my own video. How do I convert a video to the data format required for this model?

I'm a beginner in multimodal emotion recognition. I'd really appreciate it if you could give me some tips

data preprocessing code of MELD

Thank you for releasing the baseline code and the feature files.
Can you open the feature extraction code for speech and text of the MELD dataset?
Thank you.

baseline.py not working

from  baseline.data_helpers import Dataloader

ModuleNotFoundError: No module named 'baseline.data_helpers'; 'baseline' is not a package

ValueError: `class_weight` not supported for 3+ dimensional targets. with class_weight

Hi,

I got the error ValueError: "class_weight" not supported for 3+ dimensional targets when I ran baseline.py (text-only emotion classification) with the class_weight provided in the README, as shown below.
I didn't make any changes to the bc-LSTM model; should I write a new loss function that takes class_weight into account, or something similar?
Could you give me advice on how to use class_weight without problems?
Thank you in advance.

using command:
python baseline.py -classify emotion -modality text -train

fit parameter:

history = model.fit(self.train_x, self.train_y,
                            epochs=self.epochs,
                            batch_size=self.batch_size,
                            sample_weight=self.train_mask,
                            shuffle=True,
                            callbacks=[early_stopping, checkpoint],
                            validation_data=(
                                self.val_x, self.val_y, self.val_mask),
                            class_weight={0: 4.0, 1: 15.0, 2: 15.0,
                                          3: 3.0, 4: 1.0, 5: 6.0, 6: 3.0}
                            )

The meanings about the features.

Can you explain what features each file represents? I'm a little confused about their file names. After reading the code in baseline.py and data_helpers.py, you use a CNN to extract the textual features. Does that mean the video features? In your fusion model, I can't find the video branch. And what does text_glove_average_emotion.pkl mean? And what's the difference between audio_embeddings_feature_selection_emotion.pkl and audio_emotion.pkl?

The Dataloader class is not getting loaded

The following error is popping up

Traceback (most recent call last):
File "C:/Users/Sanil Andhare/.PyCharm2019.1/FFP/baseline/baseline.py", line 10, in
from baseline.data_helpers import Dataloader
File "C:\Users\Sanil Andhare.PyCharm2019.1\FFP\baseline\baseline.py", line 10, in
from baseline.data_helpers import Dataloader
ModuleNotFoundError: No module named 'baseline.data_helpers'; 'baseline' is not a package

Process finished with exit code 1

About the sequence length and sentence length.

Hi, MELD crew.
I tried running the baseline code with text-only sentiment classification, and it worked. However, I have one question about baseline.py (Line 124): it's about the input_length in the Embedding layer. I think it should be the sentence_length instead of the sequence_length, since the 2nd dimension of the concatenated_tensor is a negative number (-48) in my case.

Baseline results=0

Hi, I have tried the bc_LSTM baseline with bimodal in emotion classification, but the F1-score and accuracy of 'fear' and 'disgust' are always zero, so I can't reproduce the result in paper.

The command I use:

python baseline.py -classify emotion -modality bimodal -train

The results:

      precision    recall  f1-score   support

   0     0.7322    0.7795    0.7551      1256
   1     0.4799    0.4662    0.4729       281
   2     0.0000    0.0000    0.0000        50
   3     0.2781    0.2019    0.2340       208
   4     0.4813    0.5448    0.5111       402
   5     0.0000    0.0000    0.0000        68
   6     0.3832    0.4377    0.4087       345

The emotion labels:

Emotion - {'neutral': 0, 'surprise': 1, 'fear': 2, 'sadness': 3, 'joy': 4, 'disgust': 5, 'anger': 6}.

I know the main strategy is to adjust the class weights. To be honest, I'm new to TensorFlow and I don't know what code needs to be added to achieve this. Could you please give me some suggestions?

Best wishes

audio features

You mention that features were extracted using openSMILE with an initial feature set of 6373 dimensions, and that feature selection was then performed.

What is the config file used for feature extraction? Is it Compare_2016? How exactly did you do the feature selection? Is it possible to provide the indices or names of the selected features? Also, audio_emotion.pkl has 122 features (out of the 300 selected) that are all zeros, so they do not provide any information.

Error on the function method test_model

Whenever the code below is called,

def test_model(self):
    model = load_model(self.PATH)
    intermediate_layer_model = Model(
        input=model.input, output=model.get_layer("utter").output)

tensorflow throws an error

Traceback (most recent call last):
  File "baseline/baseline.py", line 288, in <module>
    model.train_model()
  File "baseline/baseline.py", line 228, in train_model
    self.test_model()
  File "baseline/baseline.py", line 235, in test_model
    intermediate_layer_model = Model(input=model.input, output=model.get_layer("utter").output)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 262, in __init__
    'name', 'autocast'})
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/generic_utils.py", line 778, in validate_kwargs
    raise TypeError(error_message, kwarg)
TypeError: ('Keyword argument not understood:', 'input')

Perhaps utter is not the right name of the layer?

Running on Python 3.6 with the Python packages below:

tensorboard==2.3.0
tensorboard-plugin-wit==1.7.0
tensorboardcolab==0.0.22
tensorflow==2.3.0
tensorflow-addons==0.8.3
tensorflow-datasets==2.1.0
tensorflow-estimator==2.3.0
tensorflow-gcs-config==2.3.0
tensorflow-hub==0.9.0
tensorflow-metadata==0.24.0
tensorflow-privacy==0.2.2
tensorflow-probability==0.11.0
Keras==2.4.3
Keras-Preprocessing==1.1.2
keras-vis==0.4.1
