
cluener2020's Issues

Problem with requirements.txt

@ file:///tmp/build/80754af9/pillow_1603822238230/work
What are these entries in requirements.txt?
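
These "name @ file:///..." entries are what pip freeze produces inside a conda environment: pip records the local conda build path instead of a PyPI version, so the file cannot be installed as-is on another machine. One workaround is to strip the local paths and let pip resolve the packages from PyPI. A minimal cleanup sketch (file names are illustrative, not part of the repository):

from pathlib import Path

# Rewrite "Pillow @ file:///tmp/build/.../work" style entries to just the
# package name so pip can resolve them from PyPI.
lines = Path("requirements.txt").read_text().splitlines()
cleaned = [line.split(" @ file://")[0].strip() if " @ file://" in line else line
           for line in lines]
Path("requirements_clean.txt").write_text("\n".join(cleaned) + "\n")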

How can I train on a GPU?

Online guides say that GPU training requires setting the device in several places: the dataset, the loss function, and the model, but in the code I only see the model's device setup. Can this code run in a GPU environment?
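
In a typical PyTorch training loop the device has to be set in three places: the model, the tensors of each batch, and anything the loss is compared against. A minimal sketch, assuming a standard loop; the variable names and forward signature below are illustrative rather than the repository's exact code:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)                       # move model parameters to the GPU

for batch_data, batch_masks, batch_labels in train_loader:
    batch_data = batch_data.to(device)         # move each batch tensor as well
    batch_masks = batch_masks.to(device)
    batch_labels = batch_labels.to(device)
    loss = model(batch_data, attention_mask=batch_masks, labels=batch_labels)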

ValueError: mask of the first timestep must all be on

Hi, the only change I made is to the labels in config.py, and I replaced the spaces in the data with ','. BERT-LSTM-CRF runs normally, but with the same data BERT-CRF raises this error at loss = self.crf(logits, labels, loss_mask) * (-1) in model.py. What could be the reason? Thank you!
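
For reference, the torchcrf CRF layer validates that mask[:, 0] is all on, i.e. every sequence in the batch must be unmasked at its first position. The error therefore usually means that, after the relabeling, some sample ends up with an empty or shifted label/mask sequence so that loss_mask starts with 0. A quick diagnostic, assuming the tensor names from the issue:

# Print the batch rows whose first timestep is masked out; these are the
# samples that trigger the ValueError inside self.crf(...).
bad_rows = (loss_mask[:, 0] == 0).nonzero(as_tuple=True)[0]
print("sequences masked at the first timestep:", bad_rows.tolist())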

Runtime error involving BertConfig

I configured the Python and torch versions exactly as the README specifies, but the run fails while loading weights file pretrained_bert_models/chinese_roberta_wwm_large_ext/pytorch_model.bin:
AttributeError: 'BertConfig' object has no attribute 'lstm_embedding_size'
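
lstm_embedding_size is a custom field that the model class expects on the config, but the downloaded model's config.json does not contain it. One possible workaround (an assumption, not necessarily the intended fix) is to load the config first, set the extra attributes, and pass it to from_pretrained; the concrete values below are assumptions:

from transformers import BertConfig

config = BertConfig.from_pretrained("pretrained_bert_models/chinese_roberta_wwm_large_ext/")
config.lstm_embedding_size = 1024   # assumption: the encoder's hidden size
config.lstm_dropout_prob = 0.5      # assumption: dropout used by the BiLSTM
model = BertNER.from_pretrained("pretrained_bert_models/chinese_roberta_wwm_large_ext/",
                                config=config)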

ValueError: cannot copy sequence with size 37 to array axis with dimension 36

Hello, after switching to BIEOS labels, the test data has no labels, so I added a temporary O label to every character. When I then ran the model, the following error occurred. Please advise!

File "/NER/CLUENER2020/BERT-LSTM-CRF/train.py", line 83, in evaluate
    for idx, batch_samples in enumerate(dev_loader):
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 560, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "NER/CLUENER2020/BERT-LSTM-CRF/data_loader.py", line 97, in collate_fn
    batch_labels[j][:cur_tags_len] = labels[j]
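
The padded label matrix in collate_fn is sized from each sentence's token length, so the copy fails whenever a sentence's label sequence is longer than its token sequence (37 labels vs 36 tokens here), which typically happens when the temporary labels were added per character but the tokens were built differently. A quick consistency check before training, with illustrative variable names:

# Every sentence must have exactly one label per token; print the mismatches.
for i, (tokens, tags) in enumerate(zip(word_lists, label_lists)):
    if len(tokens) != len(tags):
        print(f"sample {i}: {len(tokens)} tokens vs {len(tags)} labels")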

No error message is shown, but the model does not load and run properly

The system log is as follows:
2024-05-23 21:16:07,477:INFO: device: cuda:0
2024-05-23 21:16:07,478:INFO: --------Process Done!--------
2024-05-23 21:16:26,466:INFO: --------Dataset Build!--------
2024-05-23 21:16:26,467:INFO: --------Get Dataloader!--------
After that the run simply ends. Since CUDA 12.1 is incompatible with older PyTorch builds, I am using PyTorch 2.3. All the other packages installed normally, but the project still won't run. Any help would be appreciated, thanks!
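
One generic first check in this situation is to confirm that the installed PyTorch wheel was actually built with CUDA support and can see the GPU, since a CPU-only 2.3 wheel installs without error but then runs on the CPU; a small sketch:

import torch

print(torch.__version__)            # e.g. 2.3.0+cu121 for a CUDA 12.1 build
print(torch.version.cuda)           # None for a CPU-only wheel
print(torch.cuda.is_available())    # must be True for device cuda:0
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))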

Which package does nn in model.py come from?

In model.py:

from transformers.models.bert.modeling_bert import *
from torch.nn.utils.rnn import pad_sequence
from torchcrf import CRF


class BertNER(BertPreTrainedModel):
    def __init__(self, config):
        super(BertNER, self).__init__(config)
        self.num_labels = config.num_labels

        self.bert = BertModel(config)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.bilstm = nn.LSTM(
            input_size=config.lstm_embedding_size,  # 1024
            hidden_size=config.hidden_size // 2,  # 1024
            batch_first=True,
            num_layers=2,
            dropout=config.lstm_dropout_prob,  # 0.5
            bidirectional=True
        )
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)
        self.crf = CRF(config.num_labels, batch_first=True)

        self.init_weights()

For example, in self.dropout = nn.Dropout(config.hidden_dropout_prob) there is no import torch.nn as nn anywhere in the file, so which package does this nn come from?
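
The most likely explanation (an assumption about the installed transformers version) is that modeling_bert.py itself does from torch import nn at module level and defines no __all__, so the wildcard import re-exports nn into model.py. The mechanism in miniature, with hypothetical file names:

# helper.py
from torch import nn

# main.py
from helper import *
layer = nn.Linear(4, 2)    # works: nn arrived via the star import of helper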

Problem with BIO tagging

Hello, my dataset uses BIO tagging. After I modified the relevant code, the badcase output contains I-X tags that appear on their own (with no preceding B-X). What could be going wrong, and which part of the code should I change?
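
Background that may help narrow it down: a plain token-level classifier, and even a CRF whose transitions are learned rather than constrained, can legally emit an I-X with no preceding B-X, so such outputs usually point to the decoding side rather than the data. One common post-processing fix, sketched here rather than taken from the repository, is to promote orphan I-X tags to B-X:

def repair_bio(tags):
    # Turn any I-X that does not follow B-X or I-X of the same type into B-X.
    fixed = []
    for i, tag in enumerate(tags):
        if tag.startswith("I-"):
            prev = fixed[i - 1] if i > 0 else "O"
            if prev not in ("B-" + tag[2:], "I-" + tag[2:]):
                tag = "B-" + tag[2:]
        fixed.append(tag)
    return fixed

repair_bio(["O", "I-name", "I-name", "O"])   # -> ["O", "B-name", "I-name", "O"]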

Running data_process.py to generate the .npz files

Hi, after I changed the data in test.json and re-ran data_process.py, no corresponding npz file was generated. Why is that? Do I need to specify the file name somewhere?

Didn't find file /pretrained_bert_models/bert-base-chinese/added_tokens.json. We won't load it. (and similar warnings for special_tokens_map.json and tokenizer_config.json)

Didn't find file /pretrained_bert_models/bert-base-chinese/added_tokens.json. We won't load it.
Didn't find file /pretrained_bert_models/bert-base-chinese/special_tokens_map.json. We won't load it.
Didn't find file /pretrained_bert_models/bert-base-chinese/tokenizer_config.json. We won't load it.
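
These are warnings rather than errors: when BertTokenizer loads from a local directory, only vocab.txt is strictly required, and the missing optional files are simply skipped. Loading should still succeed, which the following sketch (path as in the warning) can confirm:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("pretrained_bert_models/bert-base-chinese/")
print(tokenizer.tokenize("浙商银行企业信贷部"))   # tokenization works with vocab.txt alone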

The data_process module fails at runtime

Calling np.savez_compressed raises ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (10748,) + inhomogeneous part.
Because NumPy does not allow array elements to be variable-length lists, I have tried several times and cannot convert the json files to .npz; it always fails with this error.
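
NumPy 1.24 and later refuse to build a ragged array implicitly; the variable-length word and label lists have to be stored as an explicit object array and read back with allow_pickle=True. A sketch of the workaround with illustrative data:

import numpy as np

word_lists = [["浙", "商", "银", "行"], ["叶", "老", "桂"]]      # ragged lists
label_lists = [["B-company", "I-company", "I-company", "I-company"],
               ["B-name", "I-name", "I-name"]]

words = np.empty(len(word_lists), dtype=object)    # 1-D object array, one list per cell
labels = np.empty(len(label_lists), dtype=object)
words[:] = word_lists
labels[:] = label_lists
np.savez_compressed("train.npz", words=words, labels=labels)

data = np.load("train.npz", allow_pickle=True)     # object arrays need allow_pickle
print(data["words"][0])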

IndexError: index out of range in self

For some input data, the following problem occurs.
Traceback (most recent call last):
File "D:/Bert+Once/CLUENER2020/BERT-CRF/CheckBug.py", line 14, in
embed = embedding(input_to_embed)
File "D:\Bert+Once\venv\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "D:\Bert+Once\venv\lib\site-packages\torch\nn\modules\sparse.py", line 145, in forward
return F.embedding(
File "D:\Bert+Once\venv\lib\site-packages\torch\nn\functional.py", line 1913, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
What could be the cause?
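
In general, torch.nn.Embedding raises this IndexError when some input id is greater than or equal to num_embeddings. With BERT the two usual triggers are token ids outside the vocabulary (for example, ids produced by a different tokenizer) and sequences longer than max_position_embeddings (512), which overflow the position embedding. A quick check using the names from the traceback:

print(input_to_embed.max().item(), embedding.num_embeddings)   # max id must be < num_embeddings
print(input_to_embed.shape)                                    # BERT inputs must be <= 512 tokens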

Switching to an English dataset

When tokenizing English sentences with bert-base-uncased, the number of tokens naturally no longer matches the number of labels. How should this be handled?
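
One common way to handle this, sketched here rather than taken from the repository, is to tokenize with is_split_into_words=True and use word_ids() from a fast tokenizer to align the word-level labels to the subword tokens, keeping the label only on the first subword (in training, the placeholder would typically be the loss ignore index -100):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

words = ["John", "lives", "in", "Washington"]
word_labels = ["B-PER", "O", "O", "B-LOC"]

encoding = tokenizer(words, is_split_into_words=True)
aligned, previous_word = [], None
for word_id in encoding.word_ids():
    if word_id is None:                  # special tokens such as [CLS] / [SEP]
        aligned.append("IGNORE")
    elif word_id != previous_word:       # first subword keeps the word's label
        aligned.append(word_labels[word_id])
    else:                                # remaining subwords are ignored
        aligned.append("IGNORE")
    previous_word = word_id
print(encoding.tokens())
print(aligned)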
