
multimodal-sentiment-analysis's Introduction

Hi there 👋

  • 🔭 I'm currently working at ECNU & ByteDance (now resigned)
  • 🌱 I'm currently learning Blockchain
  • 👯 I'm looking to collaborate on unique-picture methods and some blockchain projects
  • 🤔 I'm looking for help with how to deal with unique pictures
  • 💬 Ask me about Machine Learning, Deep Learning, Blockchain
  • 📫 How to reach me: QQ: 1102100299 / Mail: [email protected] (please star the repository you are asking about, thanks)


multimodal-sentiment-analysis's Issues

How was the architecture diagram drawn?

[screenshot]

Hi, sorry to bother you again~
I'd like to draw an architecture diagram for my model. The one you posted looks quite good, so I wanted to ask how you drew it. I'd also like to draw mine in a bit more detail.

Can the model be trained on Chinese data?

Hello, I'd like to run training on a Chinese dataset. Apart from the BERT pretrained model, is there anything else that needs to change? I'd really appreciate your advice and guidance, thank you!
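
For reference, a minimal sketch of the usual change for Chinese data, assuming the Config class exposes the pretrained checkpoint name as bert_name (as the code in the issues below does); 'bert-base-chinese' is only an example checkpoint:

```python
from transformers import AutoModel, AutoTokenizer

from Config import config

cfg = config()
cfg.bert_name = 'bert-base-chinese'  # example Chinese checkpoint; any Chinese BERT/RoBERTa works

# The tokenizer and the text encoder must come from the same Chinese checkpoint;
# the image branch, fusion modules, and label vocabulary are language-agnostic.
tokenizer = AutoTokenizer.from_pretrained(cfg.bert_name)
text_encoder = AutoModel.from_pretrained(cfg.bert_name)
```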

Question about the code in the trainer file

In `pred, loss = self.model(texts, texts_mask, imgs, labels=labels)` and the similar calls on the model instance below it, `forward()` never seems to be called explicitly, and none of the `Model` classes declare a `__call__` method.
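
For reference, this is standard PyTorch behavior rather than something missing from the repo: nn.Module defines __call__, which runs hooks and then dispatches to the subclass's forward(), so calling the instance is equivalent to calling forward(). A minimal sketch:

```python
import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)

model = Toy()
x = torch.randn(3, 4)
# model(x) goes through nn.Module.__call__, which invokes forward() (plus any
# registered hooks), so the two calls below produce the same result.
assert torch.equal(model(x), model.forward(x))
```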

Hi, I want to use the trained model to classify the sentiment of user-supplied text-image input

Background
I'm new to AI, and my graduation project is multimodal sentiment analysis. I happened to find your project, so I downloaded it, read it, and trained it. What I'd like to do now is build a front end: the user enters text and an image, and the trained model classifies its sentiment.

Problem
However, I don't quite understand your data processing. Concretely, I can't process the user's text-image input: your data seems to be extracted from a JSON format (I may be misreading this), and I just can't get external data into that shape properly.

Code

  • 1. I created a web directory in the project root and a run_model.py file under it (the other files under web can be ignored for now), as follows:

[screenshot]

  • 2. I have already trained the model:

[screenshot]

  • 3. This is the content of run_model.py:

```python
import sys
import os

# Absolute path of the project, so its modules can be imported
custom_streamlit_path = 'E:/aGraduation_Program_File/Multimodal-Sentiment-Analysis-main/'

# Add the path to sys.path if it is not there yet
if custom_streamlit_path not in sys.path:
    sys.path.insert(0, custom_streamlit_path)

# Now import streamlit
import streamlit as st

from PIL import Image
import torch
from torchvision import transforms
# from transformers import AutoTokenizer
import io
from Models.OTEModel import Model            # adjust to the model actually used
from utils.DataProcess import LabelVocab     # LabelVocab is defined in DataProcess.py
from utils.APIs.APIDecode import api_decode  # decodes the model output
from Config import config                    # configuration

# Configuration
config = config()

# Load the model
def load_model(model_path):
    model = Model(config)  # the model is assumed to take the config at init time
    model.load_state_dict(torch.load(model_path, map_location=torch.device('cpu')))
    model.eval()
    return model

model_path = "E:/aGraduation_Program_File/Multimodal-Sentiment-Analysis-main/output/OTE/pytorch_model.bin"
model = load_model(model_path)

# Initialize the tokenizer and LabelVocab
# tokenizer = AutoTokenizer.from_pretrained(config.bert_name)
label_vocab = LabelVocab()

# Image preprocessing
def process_image(image):
    def get_resize(image_size):
        for i in range(20):
            if 2 ** i >= image_size:
                return 2 ** i
        return image_size

    img_transform = transforms.Compose([
        transforms.Resize(get_resize(config.image_size)),
        transforms.CenterCrop(config.image_size),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

    image = Image.open(io.BytesIO(image.read()))
    return img_transform(image).unsqueeze(0)

# Text preprocessing
'''def process_text(text):
    text = text.replace('#', '')
    tokens = tokenizer('[CLS]' + text + '[SEP]', return_tensors="pt", padding=True, truncation=True, max_length=512)
    return tokens['input_ids'], tokens['attention_mask']
'''
from transformers import RobertaTokenizer

# Load the 'roberta-base' tokenizer from a local path
tokenizer = RobertaTokenizer.from_pretrained('C:/Users/Lenovo/.cache/huggingface/hub/models--roberta-base')

def preprocess_text(text, max_length=512):
    """
    Preprocess the given text into input_ids and an attention_mask.

    Args:
        text (str): input text.
        max_length (int): maximum text length.

    Returns:
        input_ids (torch.Tensor): the text's input ids.
        attention_mask (torch.Tensor): the text's attention mask.
    """
    # Encode the text with encode_plus
    encoded_dict = tokenizer.encode_plus(
        text,                        # input text
        add_special_tokens=True,     # add the special tokens
        max_length=max_length,       # maximum text length
        padding='max_length',        # pad to max_length
        truncation=True,             # truncate anything beyond max_length
        return_attention_mask=True,  # return the attention mask
        return_tensors='pt',         # return PyTorch tensors
    )

    input_ids = encoded_dict['input_ids']
    attention_mask = encoded_dict['attention_mask']
    print("run_model")
    print(input_ids.shape, attention_mask.shape)
    return input_ids, attention_mask

# App title
st.title("Multimodal Sentiment Analysis")

# File uploader and text input
uploaded_image = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
uploaded_text = st.text_area("Enter text", "")

if uploaded_image is not None and uploaded_text != "":
    processed_image = process_image(uploaded_image)
    input_ids, attention_mask = preprocess_text(uploaded_text)

    # Call the model with the same positional order the trainer uses,
    # (texts, texts_mask, imgs); adjust if your Model's forward() differs.
    with torch.no_grad():
        prediction = model(input_ids, attention_mask, processed_image)
        sentiment = api_decode(prediction, label_vocab)  # assumes api_decode maps the prediction to a sentiment label

        st.write(f"Predicted sentiment: {sentiment}")
```
run_model.md

  • 4. The run result is as follows:

[screenshot]

A few small questions about the dataset and fusion methods

Hello, I'd like to ask whether the dataset was crawled by yourself or is a public dataset, and whether the several fusion methods come from papers or are your own design. If there is a public dataset or reference papers, could you share them? Thank you very much!

Question about attention

    attention_out = self.attention(torch.cat(
        [text_feature.unsqueeze(0), img_feature.unsqueeze(0)],
        dim=2)).squeeze()

May I ask: is the code above performing attention across different samples within the same batch?
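
For reference, a minimal shape sketch, assuming text_feature and img_feature are per-sample vectors of shape (batch, hidden) (an assumption about the surrounding code): after unsqueeze(0) and the dim=2 concatenation the tensor is (1, batch, 2*hidden), so whether samples attend to each other depends on which dimension self.attention treats as the sequence dimension.

```python
import torch

batch, hidden = 4, 8
text_feature = torch.randn(batch, hidden)  # assumed per-sample text vectors
img_feature = torch.randn(batch, hidden)   # assumed per-sample image vectors

fused = torch.cat([text_feature.unsqueeze(0), img_feature.unsqueeze(0)], dim=2)
print(fused.shape)  # torch.Size([1, 4, 16]) -> (1, batch, 2*hidden)

# If self.attention were, say, nn.MultiheadAttention with batch_first=False, it
# would read this as (seq_len=1, batch, embed_dim), i.e. a length-1 sequence per
# sample, so samples would NOT attend to each other; with batch_first=True it
# would read (batch=1, seq_len=batch, embed_dim) and the samples WOULD mix.
# Check how self.attention is defined in the Model class to settle the question.
```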

The labels output after calling the model look wrong

1. Hi, sorry to bother you again. My last round of changes still left quite a few problems (T~T), but thank you anyway! Thanks a lot for taking the time to look at this beginner's issue.
2. The problem I'm running into now is as follows:
Model input:

    # model input
    model_input = {
        'texts': text_input_ids,            # input_ids renamed to texts
        'texts_mask': text_attention_mask,  # attention_mask renamed to texts_mask
        'imgs': image_tensor,
        # 'guids' and 'labels'
        # 'guids': torch.tensor([guid]),
        # 'labels': torch.tensor([tokenizer.label_vocab.label_to_id(label)]),
    }

Output obtained:

[screenshot]

3. The full code is as follows:

```python
import torch
from PIL import Image
from torchvision import transforms
from transformers import RobertaTokenizer
from Models.OTEModel import Model  # adjust to the model actually used
from utils.DataProcess import Processor
from utils.DataProcess import LabelVocab

# Import the configuration; the Config class is assumed to be defined and to provide what is needed
from Config import config

# Initialize the configuration
config = config()

# Model path and configuration
model_path = "E:/aGraduation_Program_File/Multimodal-Sentiment-Analysis-main/output/OTE/pytorch_model.bin"
model = Model(config)

# Load the model weights
model.load_state_dict(torch.load(model_path, map_location=torch.device('cpu')))
model.eval()

# User-supplied data
guid = 0
label = 'null'
text = 'so fast'
image_path = 'E:/aGraduation_Program_File/Multimodal-Sentiment-Analysis-main/web/Cavendish.jpg'

# Text preprocessing
tokenizer = RobertaTokenizer.from_pretrained('C:/Users/Lenovo/.cache/huggingface/hub/models--roberta-base')  # initialize the tokenizer
text_tokens = tokenizer.encode(text, add_special_tokens=True)
text_input_ids = torch.tensor([text_tokens])  # convert to a tensor and add the batch dimension
text_attention_mask = torch.ones_like(text_input_ids, dtype=torch.long)

# Image preprocessing
image = Image.open(image_path)
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # resize the image to 224x224
    transforms.ToTensor(),          # convert the PIL image to a tensor
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # normalize
])
image_tensor = preprocess(image).unsqueeze(0)  # add the batch dimension

# Model input
model_input = {
    'texts': text_input_ids,            # input_ids renamed to texts
    'texts_mask': text_attention_mask,  # attention_mask renamed to texts_mask
    'imgs': image_tensor,
    # 'guids' and 'labels'
    # 'guids': torch.tensor([guid]),
    # 'labels': torch.tensor([tokenizer.label_vocab.label_to_id(label)]),
}
processor = Processor(config)
label_vocab = LabelVocab()

# Run the prediction
with torch.no_grad():
    outputs = model(**model_input)

print('pred_label:', outputs)
print('Output type:', type(outputs))
```


run_model_noweb.md
4. I suspect pred_label needs to be decoded, but even after decoding it still doesn't work. So where is the mistake? Any help would be greatly appreciated!
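
For reference, a minimal decoding sketch that continues the script above. It assumes the model returns raw scores over the label classes when labels is omitted, and that api_decode / LabelVocab (both imported elsewhere in this thread) map class indices back to label strings; check the model's forward() and utils/APIs/APIDecode.py to confirm.

```python
import torch

with torch.no_grad():
    outputs = model(**model_input)

# If forward() returns a tuple such as (pred, loss), keep only the prediction part.
logits = outputs[0] if isinstance(outputs, (tuple, list)) else outputs

# Turn the scores into a class index per sample.
pred_ids = torch.argmax(logits, dim=-1)
print('predicted class ids:', pred_ids.tolist())

# api_decode(prediction, label_vocab) is used for exactly this step elsewhere in
# this thread, so it is probably the intended way to recover the label text:
# sentiment = api_decode(outputs, LabelVocab())
```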

Hello, I'd like to ask you about an error

Hello, when I run your code, NavieCombine runs fine, but all the other models such as CMAC and OTE report this same error. I believe the dataset itself is fine; I'd appreciate your guidance, thank you!
