
InternEvo

👋 join us on Discord and WeChat

Latest News 🔥

  • 2024/01/17: To delve deeper into the InternLM series of models, please check InternLM in our organization.

Introduction

InternEvo is an open-source, lightweight training framework that aims to support model pre-training without extensive dependencies. With a single codebase, it supports pre-training on large-scale clusters with thousands of GPUs and fine-tuning on a single GPU, while achieving remarkable performance optimizations. InternEvo achieves nearly 90% acceleration efficiency when training on 1024 GPUs.

Based on the InternEvo training framework, we are continually releasing a variety of large language models, including the InternLM-7B series and InternLM-20B series, which significantly outperform numerous renowned open-source LLMs such as LLaMA and other leading models in the field.

Quick Start

Please refer to Usage Tutorial to start InternEvo installation, data processing, pre-training and fine-tuning.

For more details, please check internevo.readthedocs.io

System Architecture

Please refer to the System Architecture document for architecture details.

Performance

InternEvo deeply integrates Flash-Attention, Apex, and other high-performance operators to improve training efficiency. Its Hybrid Zero technique achieves efficient overlap of computation and communication, significantly reducing cross-node communication traffic during training. InternEvo supports scaling 7B-model training from 8 GPUs to 1024 GPUs, with an acceleration efficiency of up to 90% at the thousand-GPU scale, a training throughput of over 180 TFLOPS per GPU, and an average of over 3600 tokens per GPU per second. The following table shows InternEvo's scalability test data at different configurations:

GPU Number   8     16    32    64    128   256   512   1024
TGS          4078  3939  3919  3944  3928  3920  3835  3625
TFLOPS       193   191   188   188   187   185   186   184

TGS represents the average number of tokens processed per GPU per second. For more performance test data, please refer to the Training Performance document.
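For reference, TGS can be reproduced from raw throughput numbers; the sketch below is illustrative (the function name and example figures are ours, not InternEvo's logging code):

import torch  # not required for the arithmetic; shown for context only

def tokens_per_gpu_per_second(global_batch_tokens: int, step_time_s: float, num_gpus: int) -> float:
    # TGS: average number of tokens processed per GPU per second.
    return global_batch_tokens / (step_time_s * num_gpus)

# Example: a 4M-token global batch finishing in ~1.08 s on 1024 GPUs
print(tokens_per_gpu_per_second(4 * 1024 * 1024, 1.08, 1024))  # ~3793 tokens/GPU/s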

Contribution

We appreciate all the contributors for their efforts to improve and enhance InternEvo. Community users are highly encouraged to participate in the project. Please refer to the contribution guidelines for instructions on how to contribute to the project.

Acknowledgements

The InternEvo codebase is an open-source project contributed by Shanghai AI Laboratory and researchers from different universities and companies. We would like to thank all the contributors for their support in adding new features to the project, and the users for providing valuable feedback. We hope that this toolkit and benchmark can provide the community with flexible and efficient code tools for fine-tuning InternEvo and developing their own models, thus continuously contributing to the open-source community. Special thanks to the two open-source projects, flash-attention and ColossalAI.

Citation

@misc{2023internlm,
    title={InternLM: A Multilingual Language Model with Progressively Enhanced Capabilities},
    author={InternLM Team},
    howpublished = {\url{https://github.com/InternLM/InternLM}},
    year={2023}
}

Contributors

00index, blankde, del-zhenwu, gaoyang07, harold-lkk, hellock, huangting4201, jiaopl, kimmishi, kkscilife, leeeizhang, li126com, lvhan028, mwiacx, pryest, sallyjunjun, solenoidwgt, sunpengsdu, vansin, x54-729, yhcc, yingtongxiong, ywmditto, zachtzy, zaglc, zehuichen123, zhangxc11, zhjunqin, zigzagcai, zwwwayne


Issues

[Feature] Only overlap sync_grad in pp0 with pipeline parallelism

Describe the feature

Only overlap sync_grad in pp0 with pipeline parallelism.

If the network is poor, sync_grad may be the main performance bottleneck, and the pipeline bubble can be huge if we overlap sync_grad with computation, since in the current implementation pp0 must wait for the communication of the other stages.

Will you implement it?

  • I would like to implement this feature and create a PR!

[QA] Parallel training

Describe the question.

Which parallel training strategies are currently supported, and what are the differences and connections between this training framework and DeepSpeed, Megatron, and FSDP? Thanks!

[QA] Is InternLM2 supported?

Describe the question.

The InternLM2 tutorials currently only cover the XTuner version. Has the InternEvo version not been released yet, and is there a planned release date? Also, how does InternEvo differ from XTuner?

[Feature] Use a consistent way to get the device

Describe the feature

Currently we use several ways to get the device:

  • internlm_accelerator.device()
  • internlm_accelerator.current_device()
  • from internlm.utils.common import get_current_device

We should settle on one consistent way to perform the get-device operation.

In addition, is the interface internlm_accelerator.device() even necessary?
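A minimal sketch of what a single entry point could look like (the branching below is an assumption for illustration, not the actual InternEvo accelerator API):

import torch

def get_current_device() -> torch.device:
    # Return a torch.device object rather than a bare index, so the result
    # works uniformly across GPU, NPU, and CPU backends.
    if torch.cuda.is_available():
        return torch.device("cuda", torch.cuda.current_device())
    return torch.device("cpu")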

Will you implement it?

  • I would like to implement this feature and create a PR!

[Typo] `schedulder` -> `scheduler`

Describe the question.

A typo is found when loading and saving scheduler states:

scheduler_states = llm_load(os.path.join(ckpt_path, "schedulder.pt"))

Maybe a better option is to aggregate all such constants into a single module, so a global change only requires modifying that one file? Compare how transformers does it:

https://github.com/huggingface/transformers/blob/efdd436663436e78d8ad3213d11325d86578db95/src/transformers/trainer.py#L246-L253
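A minimal sketch of such a constants module (the module path and the non-scheduler names are hypothetical):

# internlm/utils/constants.py (hypothetical)
SCHEDULER_NAME = "scheduler.pt"           # fixes the "schedulder.pt" typo in one place
OPTIMIZER_NAME = "optimizer.pt"           # illustrative companion constant
TRAINER_STATE_NAME = "trainer_states.pt"  # illustrative companion constant

# call site:
# scheduler_states = llm_load(os.path.join(ckpt_path, SCHEDULER_NAME))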

[Bug] Fine-tuning with the MoE config raises an error

Describe the bug

Thank you very much for your work!
I ran into a problem when using the code for SFT. Training runs fine without the MoE config, but errors out once the MoE config file is used.
Command:

torchrun --nnodes=1 --nproc_per_node=8 train.py --config ./configs/7B_MoE4_sft.py --launcher "torch"

Error message:

Traceback (most recent call last):
  File "train.py", line 324, in <module>
    main(args)
  File "train.py", line 105, in main
    model = initialize_model()
  File "/root/wbq/internlm_moe/InternEvo/internlm/utils/timeout.py", line 102, in wrapper
    result = func(*args, **kwargs)
  File "/root/wbq/internlm_moe/InternEvo/internlm/train/pipeline.py", line 167, in initialize_model
    model = MODEL_INITIALIZER.get_module(module_name=gpc.config.model_type)(**(gpc.config.model))
  File "/root/wbq/internlm_moe/InternEvo/internlm/model/modeling_moe.py", line 584, in build_model_with_moe_cfg
    return _build_generic_model_1d(num_layers=num_layers, num_chunks=num_chunks, **cfg)
  File "/root/wbq/internlm_moe/InternEvo/internlm/model/modeling_moe.py", line 482, in _build_generic_model_1d
    chunk = PackedFlashInternLm1D(**filter_kwargs(PackedFlashInternLm1D.__init__, kwargs)).to(device)
  File "/root/wbq/internlm_moe/InternEvo/internlm/model/modeling_moe.py", line 356, in __init__
    [
  File "/root/wbq/internlm_moe/InternEvo/internlm/model/modeling_moe.py", line 357, in <listcomp>
    PackedFlashBaseLayer1D(
  File "/root/wbq/internlm_moe/InternEvo/internlm/model/modeling_moe.py", line 94, in __init__
    self.mixer = MHA(
  File "/root/wbq/internlm_moe/InternEvo/internlm/model/modules/multi_head_attention.py", line 364, in __init__
    self.rotary_emb = RotaryEmbedding(
  File "/root/wbq/internlm_moe/InternEvo/internlm/model/modules/embedding.py", line 287, in __init__
    self.inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, device=device, dtype=torch.float32) / dim))
TypeError: arange() received an invalid combination of arguments - got (int, int, int, dtype=torch.dtype, device=device), but expected one of:
 * (Number end, *, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
 * (Number start, Number end, *, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
 * (Number start, Number end, Number step, *, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
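For what it's worth, the failing expression is a standard torch.arange call; the stand-alone check below (with illustrative dim/base values) runs fine on a stock PyTorch 2.1 install, which suggests the environment rather than the call signature is at fault:

import torch

dim, base, device = 4096, 10000, torch.device("cpu")  # illustrative values
inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, device=device, dtype=torch.float32) / dim))
print(inv_freq.shape)  # torch.Size([2048])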

Environment

torch==2.1.0+cu118
transformers<4.30.0
sentencepiece
numpy
tqdm
psutil
packaging
pre-commit
ninja
gputil
pytest
packaging
boto3
botocore
torch-scatter
pyecharts
py-libnuma
pynvml
tensorboard

Other information

1. I only changed the training-set and test-set paths in ./configs/7B_MoE4_sft.py.

[Bug] Fine-tuning InternLM on Ascend 910 raises an error

Describe the bug

Traceback (most recent call last):
File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/pool.py", line 131, in worker
put((job, i, result))
File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/queues.py", line 368, in put
self._writer.send_bytes(obj)
File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/connection.py", line 411, in _send_bytes
self._send(header + buf)
File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/pool.py", line 131, in worker
put((job, i, result))
File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/queues.py", line 368, in put
self._writer.send_bytes(obj)
File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
BrokenPipeError: [Errno 32] Broken pipe
File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/connection.py", line 411, in _send_bytes
self._send(header + buf)

During handling of the above exception, another exception occurred:

File "/root/miniconda3/envs/internLM/lib/python3.8/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
Traceback (most recent call last):
BrokenPipeError: [Errno 32] Broken pipe

Environment

python==3.8
torch==2.0.1

Other information

No response

[Feature] torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors.

Describe the feature

***/evo_runner/_work/InternEvo/InternEvo/internlm/solver/optimizer/utils.py:389: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:83.)
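The recommended replacement is mechanical; a minimal before/after sketch (the buffer size is illustrative, not the actual code at utils.py:389):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Deprecated pattern that triggers the warning on CUDA builds:
#   flat = torch.cuda.FloatTensor(512)

# Recommended replacements with explicit dtype and device:
flat = torch.empty(512, dtype=torch.float32, device=device)       # uninitialized buffer
t = torch.tensor([1.0, 2.0], dtype=torch.float32, device=device)  # from existing data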

Will you implement it?

  • I would like to implement this feature and create a PR!

[Feature] define a new config named "use_packed_dataset"

Describe the feature

Currently, we rely solely on "use_flash_attention" to indicate that the dataset is packed. In the future, we need to extend this ability to multiple chips, so we should define a new configuration option named use_packed_dataset to control this logic in the training system instead of always reusing "use_flash_attention". The default value would be true.
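A sketch of how the new flag might look in a training config (the surrounding fields follow the existing config style; only use_packed_dataset is the proposed addition):

# configs/7B_sft.py (sketch)
data = dict(
    seq_len=2048,
    micro_num=4,
    use_packed_dataset=True,  # proposed: decouple dataset packing from the attention kernel
)
model = dict(
    use_flash_attn=True,  # would then control only the attention implementation
)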

Will you implement it?

  • I would like to implement this feature and create a PR!

An unresolvable error keeps appearing when building the Docker environment

Describe the bug

Running make -f docker.Makefile BASE_OS=ubuntu20.04 always fails with an error I cannot resolve. It happens at the [intrenlm-dev 3/3] RUN git submodule update --init --recursive step.

Environment

ERROR: failed to solve: process "/bin/sh -c git submodule update --init --recursive && /opt/conda/bin/pip --no-cache-dir install -r requirements/torch.txt && /opt/conda/bin/pip --no-cache-dir install -r requirements/runtime.txt && cd /InternLM/third_party/flash-attention && /opt/conda/bin/python setup.py install && cd ./csrc && cd fused_dense_lib && /opt/conda/bin/pip install -v . && cd ../xentropy && /opt/conda/bin/pip install -v . && cd ../rotary && /opt/conda/bin/pip install -v . && cd ../layer_norm && /opt/conda/bin/pip install -v . && cd ../../../../ && cd ./third_party/apex && /opt/conda/bin/pip --no-cache-dir install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./ && /opt/conda/bin/pip cache purge && rm -rf ~/.cache/pip" did not complete successfully: exit code: 1
make: *** [docker.Makefile:103: devel-image] Error 1

Other information

Is there any way to solve this problem?
Is the provided Docker build file correct?
Where in the Dockerfile is the RUN git submodule update --init --recursive step located? I would like to comment it out first and install the submodules afterwards.

[Feature] Should we remove the other dependencies of flash-attention?

Describe the feature

Should we remove the other dependencies of flash-attention and keep only the core attention-related ops?

If possible, we could then install flash-attention with pip alone, avoiding a lot of compilation.

To see whether this is feasible, we need to check whether it would significantly reduce training performance.

Will you implement it?

  • I would like to implement this feature and create a PR!

[Feature] Support customized model size for training

Describe the feature

Hi there,

could you give some suggestions for training smaller model sizes, such as 1B or 3B, and the related configurations?

Thanks a ton!

Will you implement it?

  • I would like to implement this feature and contribute the code to InternLM!

[Bug] StopIteration occurs when there is not enough data

Describe the bug

When my data is not enough to cover the full total_step, a StopIteration error is raised. Although a try-except wraps the call, next(train_state.batch_sampler_iter) at line 512 still goes out of bounds: train_state.batch_sampler is advanced along with the iterator, so even after train_state.batch_sampler_iter is re-assigned, it is still exhausted.

Environment

(screenshot not preserved)

Other information

After making the following changes, training runs through. (The patch was attached as screenshots, which are not preserved here.)
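Since the screenshots are lost, here is a hedged sketch of the kind of fix described (train_state follows the issue text; initial_batch_sampler is a hypothetical pristine copy saved before training starts; this is not the actual patch):

import copy

try:
    batch = next(train_state.batch_sampler_iter)
except StopIteration:
    # Re-creating the iterator alone is not enough: batch_sampler itself has
    # been advanced, so iter() over it would still be exhausted. Reset the
    # sampler from a pristine copy first.
    train_state.batch_sampler = copy.deepcopy(initial_batch_sampler)
    train_state.batch_sampler_iter = iter(train_state.batch_sampler)
    batch = next(train_state.batch_sampler_iter)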

[Feature] Relax dependency version constraints

Describe the feature

The repository currently pins the pip dependencies to strict versions, which is inconvenient for broader use. These constraints should be relaxed step by step.

Will you implement it?

  • I would like to implement this feature and contribute the code to InternLM!

[Bug] Do not use torch.cuda.current_device() as a device, since it only returns an int

Describe the bug

We have many cases like the following:

data = torch.empty(partition_size, dtype=tensor.dtype, device=torch.cuda.current_device(), requires_grad=False)

where we directly pass device=torch.cuda.current_device(). This is not recommended: torch.cuda.current_device() only returns a device id (a bare int). Such code happens to work on GPUs, but it may cause problems when running on an NPU.
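A device-agnostic rewrite could look like the following sketch (partition_size and the source tensor are illustrative):

import torch

partition_size = 1024  # illustrative
src = torch.zeros(partition_size, dtype=torch.float16)

# Build a torch.device object instead of passing the bare int from
# torch.cuda.current_device(); this stays portable across backends.
device = (torch.device("cuda", torch.cuda.current_device())
          if torch.cuda.is_available() else torch.device("cpu"))
data = torch.empty(partition_size, dtype=src.dtype, device=device, requires_grad=False)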

Environment

python3.8 + torch2.1

Other information

No response

[Feature] Let the random dataset define the seq_length for generation

Describe the feature

In many cases we need full-seq_length samples for performance tests with the random_dataset. Currently we have to use pack_sample_into_one to achieve this, but that requires flash_attention because it actually generates a packed dataset. The random_dataset therefore needs the ability to generate full-seq_length samples on its own.
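A hedged sketch of a random dataset that always emits unpacked, full-length samples (the class name and fields are illustrative, not the existing random_dataset API):

import torch
from torch.utils.data import Dataset

class FullLengthRandomDataset(Dataset):
    # Every sample is exactly seq_len tokens, with no packing required.
    def __init__(self, num_samples: int, seq_len: int, vocab_size: int = 50000):
        self.num_samples, self.seq_len, self.vocab_size = num_samples, seq_len, vocab_size

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        tokens = torch.randint(0, self.vocab_size, (self.seq_len,))
        return {"input_ids": tokens, "labels": tokens.clone()}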

Will you implement it?

  • I would like to implement this feature and create a PR!

[Bug] Support profiling on NPU

Describe the bug

We need to switch the profiler activities automatically; currently we hard-code activities=[torch.profiler.ProfilerActivity.CPU, torch.profiler.ProfilerActivity.CUDA].
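A sketch of backend-aware activity selection (the NPU branch is an assumption; the exact activity enum depends on the torch_npu version in use):

import torch

activities = [torch.profiler.ProfilerActivity.CPU]
if torch.cuda.is_available():
    # Only request CUDA activity when a CUDA device is actually present.
    activities.append(torch.profiler.ProfilerActivity.CUDA)
# On Ascend NPU, the torch_npu plugin would contribute its own activity here.

prof = torch.profiler.profile(activities=activities)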

Environment

PyTorch2.1

Other information

No response

[Feature] CPU synchronization Problem

Describe the feature

Some CPU synchronizations block the GPU stream, creating bubbles between GPU kernels; this should be optimized in the future. Two known cases are listed below, with a sketch of the first after the list.

  1. item() in rotary embedding.
  2. moe_loss construction.
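A sketch of case 1, showing how a device-to-host .item() sync can be replaced with on-device indexing (the tensors are illustrative, not the rotary-embedding code):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
offsets = torch.tensor([0, 128, 256], device=device)
cos_cached = torch.randn(4096, 64, device=device)

# Synchronizing: .item() copies to host and stalls the stream.
start = offsets.max().item()
slice_sync = cos_cached[start:start + 64]

# Async-friendly: keep the offset on device and index with tensors.
idx = offsets.max() + torch.arange(64, device=device)
slice_async = cos_cached.index_select(0, idx)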

Will you implement it?

  • I would like to implement this feature and create a PR!

[Feature] Partially frozen model support

Describe the feature

The current implementation does not support partial training, i.e., freezing part of the model parameters.
If I understand it correctly, the assertion around line 584 in hybrid_zero_optim.py requires that all parameters be involved in training.

I'm very much looking forward to this feature being implemented, as there are many scenarios where users only want to fine-tune parts of the model.
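For context, the standard PyTorch idiom that the optimizer would need to tolerate looks like this (a toy model; HybridZeroOptimizer currently asserts against receiving a partially frozen parameter set):

import torch

model = torch.nn.Sequential(
    torch.nn.Embedding(1000, 64),
    torch.nn.Linear(64, 64),
)

# Freeze the embedding; fine-tune only the linear layer.
for p in model[0].parameters():
    p.requires_grad = False

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)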

Is there any plan on this feature?

Will you implement it?

  • I would like to implement this feature and create a PR!

[Bug] internlm docker image issue

Describe the bug

The container build of internlm does not work. After pulling the image and entering the container, we found that it cannot be used directly for training and inference, and the environment inside the container differs significantly from the actual runtime environment needed.

Environment

No response

Other information

No response

[Bug] InternEvo installed via pip exposes no __version__ attribute

Describe the bug

Running the following code raises an error:

>>> import InternEvo
>>> print(InternEvo.__version__)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'InternEvo' has no attribute '__version__'
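A common fix is to define the attribute in the package's top-level __init__.py by reading the installed metadata (a sketch; it assumes the pip distribution is named "InternEvo"):

# internevo/__init__.py (sketch)
from importlib.metadata import PackageNotFoundError, version

try:
    __version__ = version("InternEvo")
except PackageNotFoundError:
    # Not pip-installed (e.g., running from a source checkout).
    __version__ = "0.0.0.dev0"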

Environment

python3.10

Other information

No response

Upgrade the CUDA version to support flash-attention on Windows

Describe the feature

The Windows build of flash-attention that can currently be compiled requires cu121 + py310 + torch2.1,
while InternEvo depends only on cu118, so the two libraries conflict and training on Windows is impossible.
Are there plans to upgrade to cu121? Thanks!

Will you implement it?

  • I would like to implement this feature and contribute the code to InternLM!

[Doc] Typo in https://arxiv.org/pdf/2401.09149.pdf

📚 The doc issue

Conversely, if a = 1, InternEvo utilizes 34bSH bytes for activation storage. -> Conversely, if a = 0, InternEvo utilizes 34bSH bytes for activation storage.

Suggest a potential alternative/fix

No response

[Bug] Training fails with indexSelectLargeIndex: block: [604,0,0], thread: [47,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

Describe the bug

2024-04-19 06:06:37,071 INFO writer.py:60 in init_tb_writer -- Login tensorboard logs to: RUN/7b_internlm2_train/04-19-06.06.02/tensorboards
2024-04-19 06:06:37,761 ERROR train.py:307 in <module> -- Raise exception from c394df8c9997 with rank id: 0
Traceback (most recent call last):
  File "/data/InternEvo/train.py", line 305, in <module>
    main(args)
  File "/data/InternEvo/train.py", line 215, in main
    _, _, loss = trainer.execute_schedule(
  File "/data/InternEvo/internlm/core/trainer.py", line 213, in execute_schedule
    return self._schedule.forward_backward_step(self._engine, data_iter, **kwargs)
  File "/data/InternEvo/internlm/utils/timeout.py", line 102, in wrapper
    result = func(*args, **kwargs)
  File "/data/InternEvo/internlm/core/scheduler/no_pipeline_scheduler.py", line 220, in forward_backward_step
    _output, _loss, _moe_loss = self._train_one_batch(
  File "/data/InternEvo/internlm/core/scheduler/no_pipeline_scheduler.py", line 125, in _train_one_batch
    output = self._call_engine(engine, data)
  File "/data/InternEvo/internlm/core/scheduler/base_scheduler.py", line 86, in _call_engine
    return engine(**inputs)
  File "/data/InternEvo/internlm/core/engine.py", line 164, in __call__
    return self.model(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/InternEvo/internlm/core/naive_amp.py", line 155, in forward
    out = self.model(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/InternEvo/internlm/model/modeling_internlm2.py", line 934, in forward
    hidden_states = self.tok_embeddings(input_ids)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/InternEvo/internlm/model/modules/embedding.py", line 66, in forward
    output = F.embedding(input_, self.weight, self.padding_idx, *self.embed_args, **self.embed_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/functional.py", line 2233, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

terminate called after throwing an instance of 'std::runtime_error' what(): [Rank 2] NCCL watchdog thread terminated with exception: CUDA error: device-side assert triggered. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f97371d5617 in /opt/conda/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f973719098d in /opt/conda/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f9737286518 in /opt/conda/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x80 (0x7f973868a150 in /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7f973868df78 in /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x24b (0x7f97386a47bb in /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x78 (0x7f97386a4ac8 in /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xd6bf0 (0x7f97a7af3bf0 in /usr/local/gcc-10.2.0/lib64/libstdc++.so.6)
frame #8: + 0x8609 (0x7f97d2bff609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0)
frame #9: clone + 0x43 (0x7f97d29ca133 in /usr/lib/x86_64-linux-gnu/libc.so.6)

../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [604,0,0], thread: [32,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [604,0,0], thread: [33,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [604,0,0], thread: [34,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [604,0,0], thread: [35,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [604,0,0], thread: [36,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [604,0,0], thread: [37,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [604,0,0], thread: [38,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [604,0,0], thread: [39,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [604,0,0], thread: [40,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [604,0,0], thread: [41,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [604,0,0], thread: [42,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [604,0,0], thread: [43,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [604,0,0], thread: [44,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [604,0,0], thread: [45,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [604,0,0], thread: [46,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [604,0,0], thread: [47,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [604,0,0], thread: [48,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [604,0,0], thread: [49,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [604,0,0], thread: [50,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [604,0,0], thread: [51,0,0] Assertion `srcIndex < srcSelect

Environment

Software environment: official Ubuntu image
Hardware environment: A800

Other information

No response
