ddlbojack / emotion2vec

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python 96.10% Shell 2.35% Jupyter Notebook 1.56%

iemocap pytorch-implementation speech-emotion-recognition speech-representation

emotion2vec's Issues

The WeChat group QR code has expired again

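I have a specific requirement: for long audio, I need to slice the signal and compute emotion classification probabilities per slice, for example one result every 5 s. The current pipeline API is wrapped too rigidly to support this; it only returns a single result computed by global averaging. It would be great if the pipeline interface accepted an additional slice-length argument, so that the returned probability vector gained a time dimension.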
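Until such an option exists, the slicing can be done outside the pipeline. The following is a minimal sketch, not the pipeline's own API: `slice_waveform`, `scores_over_time`, and `dummy_score_fn` are hypothetical names, and `dummy_score_fn` stands in for whatever call returns class probabilities for one chunk of audio.

```python
import numpy as np

def slice_waveform(wav, sample_rate, window_s=5.0):
    """Split a 1-D waveform into consecutive windows of window_s seconds.

    The last window is kept even if it is shorter than window_s.
    """
    window = int(round(window_s * sample_rate))
    if window <= 0:
        raise ValueError("window_s must be positive")
    return [wav[start:start + window] for start in range(0, len(wav), window)]

def scores_over_time(wav, sample_rate, score_fn, window_s=5.0):
    """Apply score_fn to every window; result has shape (num_windows, num_classes)."""
    windows = slice_waveform(wav, sample_rate, window_s)
    return np.stack([np.asarray(score_fn(w)) for w in windows])

# Hypothetical stand-in for a real classifier: returns a uniform distribution.
def dummy_score_fn(window, num_classes=4):
    return np.full(num_classes, 1.0 / num_classes)

wav = np.zeros(16000 * 12, dtype=np.float32)  # 12 s of silence at 16 kHz
probs = scores_over_time(wav, 16000, dummy_score_fn)
print(probs.shape)  # (3, 4)
```

The first axis of `probs` is the time dimension requested above; replacing `dummy_score_fn` with a real per-chunk inference call is left to the reader.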

The WeChat group QR code has expired.

Please update the QR code.

Feature extraction won't work with the new models

Feature extraction only works with the base model. Is there any plan to fix this?

Inference

Thank you for providing the code!
I am a novice in the field of SER. I have trained the downstream model using the provided train.npy, train.lengths, and train.emo files, but I'm unsure how to use the obtained model for category inference on the features within train.npy.
I noticed that the shape of the train.npy you provided is (1253877, 768). In my understanding, it represents 1253877 samples with 768-dimensional features each. I would like to classify these 1253877 samples using the pre-trained model. How can I achieve this?
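One possible reading, stated here as an assumption rather than something confirmed in this thread, is that the rows of train.npy are frames rather than samples, and that train.lengths holds the number of frames per utterance. Under that assumption, the flat matrix can be split back into per-utterance arrays as sketched below; `split_by_lengths` is a hypothetical helper and the toy arrays stand in for the real files.

```python
import numpy as np

def split_by_lengths(features, lengths):
    """Split a (total_frames, dim) matrix into one (n_frames, dim) array per utterance."""
    lengths = np.asarray(lengths, dtype=np.int64)
    if lengths.sum() != features.shape[0]:
        raise ValueError("lengths do not add up to the number of feature rows")
    # Cumulative offsets mark where each utterance ends (the last one is implicit).
    offsets = np.cumsum(lengths)[:-1]
    return np.split(features, offsets, axis=0)

# Toy data standing in for train.npy (assumed frame-level) and train.lengths
features = np.arange(6 * 3, dtype=np.float32).reshape(6, 3)
lengths = [2, 1, 3]
utterances = split_by_lengths(features, lengths)
print([u.shape for u in utterances])  # [(2, 3), (1, 3), (3, 3)]
```

Each resulting array could then be passed to the trained downstream model one utterance at a time. If the lengths do not sum to the row count, the assumption above does not hold for the files in question.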

About reproducing data2vec2 results

When loading the data2vec2 model with fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path]), I get the error KeyError: "_name". Could you please tell me how to solve this model-loading problem?

Info about checkpoint file

Hi @ddlBoJack,

Please share some information about the checkpoint file shared in the readme. Is it the best performing model so far?

Also, does the train.py file given for IEMOCAP contain frame-level or utterance-level features?

Thanks,

Emotion2Vec Pretraining code

Thank you for your contribution; your work is truly amazing. However, I would like to train emotion2vec for a pretraining task. Could you provide the source code or offer any suggestions?

The WeChat group QR code has expired

Sorry for missing the last update.

What is Emo-262?

What is the dataset Emo-262? Does your group collect it and will it be available for the public? How can I get it?

Note: in the Table 2 caption, LSSED is misspelled as LSED. You may want to check the paper.

Finetuning

Could you please share the script to train the network for the upstream task? I want to fine-tune the model.

Thanks!

A question

Hey Author, thanks for open-sourcing this.

I wanted to ask whether emotion2vec is better than https://github.com/audeering/w2v2-how-to?

Thanks in advance

The group chat QR code has expired

Hello, the group chat QR code has expired.

Wechat Group application

Hello! One of my recent projects uses Emotion2Vec. Could I join this group chat to communicate with you? My WeChat can be found through my profile picture (a QR code). If you are not busy, you can add me on WeChat by scanning it. Thank you very much.

About platform

I want to know whether emotion2vec can run on an ARM server.

Two key models in finetune without annotated data

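Many thanks to the authors for open-sourcing such a good pre-trained emotion model.

I saw the following description on ModelScope:
First, emotion2vec is fine-tuned on academic speech emotion recognition datasets; then 150,000 hours of Chinese and English data are annotated, and the data whose text emotion matches the speech emotion with high confidence are selected.
Could you open-source the text emotion model and the speech emotion model trained on the academic datasets? I would like to train a 3-class model based on this method.

Thanks!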

Request for test and dev files

Dear Authors,

You have only shared the train.npy, train.lengths, train.emo in the iemocap_downstream folder.
Would you mind also sharing the test and dev versions of the files? This would make testing your models more convenient.

Thank you in advance.

Best regards,
Aaron

WeChat group

Hello, could you update the WeChat group QR code?

The QR code has expired

_MISSING_TYPE

omegaconf.errors.ValidationError: Object of unsupported type: '_MISSING_TYPE'
full_key:
reference_type=None
object_type=None
Is this due to a software package conflict? I can't solve this problem.

The group QR code has expired; could you update it?

As in the title.

The WeChat group QR code has expired

OOM while processing IEMOCAP dataset

I was trying to create IEMOCAP embeddings on my own, but my GPU with 8 GB of memory ran into a CUDA OOM error. How much GPU memory do I need to process this dataset?

Fine-tuning the pre-trained model

Hi, thank you very much for your work.

I want to continue to do some interesting work based on your work.
I have not found anything about fine-tuning the model on ModelScope or GitHub.
Can you please guide me on how to use your model for fine-tuning and retraining?

Many thanks

About feature layer

Thank you for sharing your nice work!

In the script emotion2vec_extract_features.sh, I noticed that features are extracted from the last layer.
Have you tried extracting features from other layers as well?
I'm just curious if this approach is based on empirical insight.

Utterance embedding

How are utterance embeddings obtained? Are they obtained from frame-level features through convolution or pooling?
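For readers unfamiliar with the pooling option mentioned in the question, the sketch below shows plain mean pooling over the time axis. It illustrates one common way to turn frame-level features into a single vector and is not confirmed here as the method emotion2vec uses; `mean_pool` is a hypothetical helper.

```python
import numpy as np

def mean_pool(frames):
    """Average a (n_frames, dim) array over time to get one (dim,) vector."""
    frames = np.asarray(frames)
    if frames.ndim != 2 or frames.shape[0] == 0:
        raise ValueError("expected a non-empty (n_frames, dim) array")
    return frames.mean(axis=0)

# Three frames of 2-dimensional features
frames = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
utterance_embedding = mean_pool(frames)
print(utterance_embedding)  # [3. 4.]
```

Mean pooling has no trainable parameters, whereas a convolutional or attention-based aggregator would need to be trained together with the downstream classifier.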

Optimal segment length

Hello!

Thank you for such a nice work!

I am performing speaker diarization with pyannote and want to run emotion detection on the audio segments I receive from the diarization model. The segments vary in length, and I'm sure I'll have to split the very long ones (around 200 s) to avoid CUDA OOM errors. What is the optimal segment length for the emotion2vec_plus_large model: 3 seconds, 15 seconds, or something else?

Thank you!
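Whatever length turns out to work best, the splitting step itself can be sketched as follows. This is a minimal example under stated assumptions: `split_segment` is a hypothetical helper, segments are given as (start, end) times in seconds, and the 15 s cap is an arbitrary placeholder, not a recommended value for emotion2vec_plus_large.

```python
import math

def split_segment(start, end, max_len_s=15.0):
    """Split [start, end) into equal chunks no longer than max_len_s seconds."""
    duration = end - start
    if duration <= 0:
        return []
    # Use the smallest number of chunks that keeps each one within the cap,
    # then divide evenly so no chunk ends up as a tiny leftover.
    n_chunks = math.ceil(duration / max_len_s)
    step = duration / n_chunks
    return [(start + i * step, start + (i + 1) * step) for i in range(n_chunks)]

chunks = split_segment(0.0, 200.0, max_len_s=15.0)
print(len(chunks))  # 14
```

Dividing evenly avoids a very short final chunk, which a fixed-stride split would produce for a 200 s segment. Per-chunk predictions can then be averaged or kept as a time series, depending on the application.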
