Has anyone tested the HAN model? When I train it, I get RuntimeError: CUDA error: device-side assert triggered. What could be causing this?
Any help would be much appreciated!
Error log:
0it [00:00, ?it/s]Building prefix dict from the default dictionary ...
hierattnet
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model from cache /tmp/jieba.cache
Loading model cost 2.178 seconds.
Loading model cost 2.178 seconds.
Prefix dict has been built successfully.
Prefix dict has been built successfully.
50000it [06:22, 130.57it/s]
5000it [00:37, 132.38it/s]
10000it [01:20, 123.86it/s]
HierAttNet(
  (word_att_net): WordAttNet(
    (dropout): Dropout(p=0.5, inplace=False)
    (embedding): Embedding(144241, 300)
    (rnn): GRU(300, 64, num_layers=2, batch_first=True, dropout=0.5, bidirectional=True)
  )
  (sent_att_net): SentAttNet(
    (rnn): GRU(128, 64, num_layers=2, batch_first=True, dropout=0.5, bidirectional=True)
    (fc): Linear(in_features=128, out_features=10, bias=True)
  )
)
Trainable parameters: 398602
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [8,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block:
……
……
……
[14,0,0], thread: [78,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [79,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [80,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [81,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [82,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [83,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [84,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [85,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [86,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [87,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [88,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [89,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [90,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "train.py", line 134, in <module>
    run('configs/multi_classification/han_config.json')
  File "train.py", line 105, in run
    main(config, use_transformers=False)
  File "train.py", line 80, in main
    trainer.train()
  File "/home/work/zzk/text_classification/base/base_trainer.py", line 67, in train
    result = self._train_epoch(epoch)
  File "/home/work/zzk/text_classification/trainer/trainer.py", line 52, in _train_epoch
    output = self.model(input_token_ids,bert_masks, seq_lens).squeeze(1)
  File "/home/work/anaconda3/envs/zzk_torch_py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/work/zzk/text_classification/model/model.py", line 404, in forward
    word_output, hidden = self.word_att_net(input_token_ids,seq_lens)
  File "/home/work/anaconda3/envs/zzk_torch_py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/work/zzk/text_classification/model/model.py", line 473, in forward
    packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, sorted_seq_lengths, batch_first=self.batch_first)
  File "/home/work/anaconda3/envs/zzk_torch_py36/lib/python3.6/site-packages/torch/nn/utils/rnn.py", line 234, in pack_padded_sequence
    lengths = torch.as_tensor(lengths, dtype=torch.int64)
RuntimeError: CUDA error: device-side assert triggered
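
A note on narrowing this down: the `srcIndex < srcSelectDimSize` assertion in `indexSelectLargeIndex` usually means an index lookup went past the end of a table, which in this model most plausibly points at the Embedding(144241, 300) layer receiving a token id outside 0..144240. Because CUDA errors are reported asynchronously, the traceback line (pack_padded_sequence) may not be where the problem actually occurs; running on CPU or with the environment variable CUDA_LAUNCH_BLOCKING=1 generally gives a more accurate location. The following is only a minimal sketch for checking ids before they reach the model. The helper names `flatten` and `find_out_of_range_ids` and the sample batch are hypothetical, not part of this repository, and the batch is assumed to be plain nested Python lists (a tensor would first need `.tolist()`).

```python
def flatten(ids):
    """Yield every integer from an arbitrarily nested list of token ids."""
    for item in ids:
        if isinstance(item, int):
            yield item
        else:
            yield from flatten(item)


def find_out_of_range_ids(batch, num_embeddings):
    """Return the sorted distinct ids that fall outside [0, num_embeddings)."""
    return sorted({i for i in flatten(batch) if i < 0 or i >= num_embeddings})


# 144241 is the embedding size printed in the model summary above.
NUM_EMBEDDINGS = 144241

# Hypothetical batch laid out as [documents][sentences][tokens].
batch = [
    [[1, 5, 144240], [0, 0, 0]],
    [[7, 144241, 2], [3, -1, 0]],
]

bad = find_out_of_range_ids(batch, NUM_EMBEDDINGS)
print(bad)  # [-1, 144241]
```

If a check like this reports any ids, the vocabulary used to build the embedding and the one used to encode the data are probably out of sync (for example an unknown-word or padding id that is not counted in the vocabulary size), though that is a guess without seeing the preprocessing code.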