
bminf's People

Contributors

a710128, clpl, jayzzhou-thu, jctime, prnake, thucsthanxu13, zibuyu, zt-wang19


bminf's Issues

RuntimeError: CUBLAS error: CUBLAS_STATUS_NOT_INITIALIZED [BUG]

Running the example file fill_blank.py raises the following error:

Loading model
Start
Input:  北京环球度假区相关负责人介绍北京环球影城指定单日门票将采用____制度即推出淡季日平季日旺季日和特定日门票____价格为418元____价格为528元____价格为638元____价格为____元北京环球度假区将提供90天滚动价格日历以方便游客提前规划行程
Traceback (most recent call last):
  File "abc.py", line 28, in <module>
    main()
  File "abc.py", line 25, in main
    fill_blank(cpm2, input_text)
  File "abc.py", line 9, in fill_blank
    for result in cpm2.fill_blank(text,
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/bminf/models/cpm2.py", line 245, in fill_blank
    for token in res:
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/bminf/models/cpm2.py", line 129, in _gen_iter
    self._model.embedding(
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/bminf/arch/t5/model.py", line 165, in embedding
    self.input_embedding.embedding_forward(ctx, tensor_ids, x_out)
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/bminf/layers/embedding.py", line 27, in embedding_forward
    ck.embedding_forward(
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/kernels/embedding.py", line 25, in embedding_forward
    embedding_kernel.cu_embedding_forward(
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/kernels/base.py", line 48, in __call__
    func = self._prepare_func()
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/kernels/base.py", line 40, in _prepare_func
    self._module.get_module(), self._func_name
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/kernels/base.py", line 23, in get_module
    Device(curr_device).use()   # force initialize context
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/device/__init__.py", line 152, in use
    self._device.use()
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/device/__init__.py", line 120, in use
    self.cublasLtHandle = cublaslt.cublasLtCreate()
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/library/base.py", line 94, in wrapper
    return f(*args, **kwargs)
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/library/cublaslt.py", line 105, in cublasLtCreate
    checkCublasStatus(cublasLt.cublasLtCreate(ctypes.byref(handle)))
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/library/cublaslt.py", line 98, in checkCublasStatus
    raise RuntimeError("CUBLAS error: {}".format(
RuntimeError: CUBLAS error: CUBLAS_STATUS_NOT_INITIALIZED

Environment:
Python 3.8.10
cudatoolkit 11.3.1
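
To help isolate this, below is a minimal sketch that triggers the same cuBLASLt handle creation shown at the bottom of the traceback, outside of bminf. It uses only names that appear in the traceback; whether it reproduces the failure will depend on the environment.

# Hedged isolation sketch: call the cuBLASLt handle creation from the
# traceback above directly, without going through bminf.
from cpm_kernels.library import cublaslt

handle = cublaslt.cublasLtCreate()   # the call that raised CUBLAS_STATUS_NOT_INITIALIZED
print("cuBLASLt handle created:", handle)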

[BUG] RuntimeError: Unexpected model output: 26239

Describe the bug

Input:
import bminf
cpm2 = bminf.models.CPM2()
result = cpm2.fill_blank("有一个服装品牌叫做<span>专门设计彩绘T恤",
    top_p=0.5,
    top_n=5,
    temperature=0.5,
    frequency_penalty=0,
    presence_penalty=0
)

Error message:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 6, in
    presence_penalty=0
  File "/usr/local/lib/python3.6/dist-packages/bminf/models/cpm2.py", line 252, in fill_blank
    raise RuntimeError("Unexpected model output: %d" % token)
RuntimeError: Unexpected model output: 26239

Could you help me figure out what is causing this?

Environment:
Python 3.6
torch 1.8.1

Error during installation, `No matching distribution found for cupy-cuda90<10,>=9` [BUG]

Error log:

Collecting cupy-cuda90<10,>=9 (from bminf)
  ERROR: Could not find a version that satisfies the requirement cupy-cuda90<10,>=9 (from bminf) (from versions: 4.0.0, 4.1.0, 4.2.0, 4.3.0, 4.4.0, 4.4.1, 4.5.0, 5.0.0, 5.1.0, 5.2.0, 5.3.0, 5.4.0, 6.0.0, 6.1.0, 6.2.0, 6.3.0, 6.4.0, 6.5.0, 6.6.0, 6.7.0, 7.0.0, 7.1.0, 7.1.1, 7.2.0, 7.3.0, 7.4.0, 7.5.0, 7.6.0, 7.7.0, 7.8.0, 8.0.0, 8.1.0, 8.2.0, 8.3.0, 8.4.0, 8.5.0, 8.6.0, 9.0.0a1, 9.0.0a2)
ERROR: No matching distribution found for cupy-cuda90<10,>=9 (from bminf)

/usr/local/cuda/version.txt:

CUDA Version 9.0.176                                                      
CUDA Patch Version 9.0.176.1                                              
CUDA Patch Version 9.0.176.2                                              
CUDA Patch Version 9.0.176.3
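
Since the log above shows the cupy-cuda90 index topping out at the 9.0.0a2 pre-release, a small diagnostic sketch like the following can confirm which cupy wheel pip is being asked for on this machine. The path to version.txt is taken from the report above; everything else is illustrative.

# Hedged diagnostic sketch: read the local CUDA version and print the cupy
# wheel name that the bminf requirement translates to.
import re

def cuda_major_minor(path="/usr/local/cuda/version.txt"):
    with open(path) as f:
        m = re.search(r"CUDA Version (\d+)\.(\d+)", f.read())
    return m.group(1), m.group(2)

major, minor = cuda_major_minor()
print(f"Detected CUDA {major}.{minor}; the requirement resolves to cupy-cuda{major}{minor}>=9,<10")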

[BUG] RuntimeError: cublas error: CUBLAS_STATUS_NOT_SUPPORTED

Describe the bug
Running the three demos in the Docker environment, each fails in the backend with the following error:
File "/usr/local/lib/python3.6/dist-packages/bminf/arch/t5/model.py", line 238, in encode
True
File "/usr/local/lib/python3.6/dist-packages/bminf/layers/transformer_block.py", line 42, in forward
x = self.self_attention.forward(allocator, x, attention_mask, self_attn_position_bias)
File "/usr/local/lib/python3.6/dist-packages/bminf/layers/attention.py", line 63, in forward
qkv_i32
File "/usr/local/lib/python3.6/dist-packages/bminf/functions/gemm.py", line 86, in igemm
_igemm(allocator, a, aT, b, bT, c, device, stream)
File "/usr/local/lib/python3.6/dist-packages/bminf/functions/gemm.py", line 265, in _igemm
stream.ptr
File "/usr/local/lib/python3.6/dist-packages/bminf/backend/cublaslt.py", line 101, in checkCublasStatus
raise RuntimeError("cublas error: %s" % cublas_errors[cublas_status])
RuntimeError: cublas error: CUBLAS_STATUS_NOT_SUPPORTED

What could be the cause? Is one of the versions to blame?

Environment:
cuda: 10.1
Model: EVA-int8
GPU memory: 12 GB
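
When reporting this, it may help to record the GPU's compute capability. A minimal CuPy-based sketch follows (CuPy is already pulled in by these BMInf releases, as the install log earlier on this page shows); it is information gathering only, not a fix:

# Hedged diagnostic sketch: print compute capability and memory with CuPy.
import cupy

dev = cupy.cuda.Device(0)
print("compute capability:", dev.compute_capability)      # e.g. "61" for sm_61
print("total GPU memory (GiB):", dev.mem_info[1] / 2**30)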

[BUG] GPU memory leak, and errors when requests arrive too quickly

Describe the bug

How should the CPM1 model be loaded from a local checkpoint? Currently I do it the following way:
1. Build the model:
model = GPT2Model(num_layers=args.num_layers,
                  vocab_size=args.vocab_size,
                  hidden_size=args.hidden_size,
                  num_attention_heads=args.num_attention_heads,
                  embedding_dropout_prob=args.hidden_dropout,
                  attention_dropout_prob=args.attention_dropout,
                  output_dropout_prob=args.hidden_dropout,
                  max_sequence_length=args.max_position_embeddings,
                  checkpoint_activations=args.checkpoint_activations,
                  checkpoint_num_layers=args.checkpoint_num_layers,
                  parallel_output=args.parallel_output)

(the code is from here)
2. load_state_dict:
Load the state_dict from the local checkpoint.

3. Wrap the model with bminf:
model = bminf.wrapper(model)

Expected behavior

Screenshots

GPU memory usage before the request:
(screenshot)
GPU memory usage after the request:
(screenshot)

Errors are also raised when requests come in too quickly.
(screenshot)

Other:
How do I wrap a model loaded from transformers? I could not follow how the example does it (a sketch follows after the environment list below).
Environment:

apex 0.1
bminf 2.0.0
deepspeed 0.3.15
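
For the question about wrapping a transformers model, here is a minimal sketch built around the same bminf.wrapper(model) call as step 3 above. The checkpoint name, the generation call, and the exact point at which the model is moved to the GPU are illustrative assumptions rather than something confirmed in this thread.

# Hedged sketch: wrap a Hugging Face GPT-2 with bminf.wrapper (illustrative only).
import torch
import bminf
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")   # illustrative checkpoint
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

model = bminf.wrapper(model)                        # same call as step 3 above
model = model.cuda()

ids = tokenizer("Hello, world", return_tensors="pt").input_ids.cuda()
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=16)
print(tokenizer.decode(out[0]))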

Questions about CuPy/CUDA

  1. Hi, as shown in the screenshot, I would like to see the concrete definitions and usage of the CuPy functions that drive CUDA, but since CuPy wraps C/C++ code I cannot find them. Where should I look? Could you also explain the execution order of the four functions inside the third argument, routine, in the screenshot? (My rough understanding is that they create a struct and compute the scale for symmetric quantization.)

(screenshot)

Jumping to the definition only shows a docstring like this:
(screenshot)


  2. Why is the code inside the red box in the screenshot below written that way?

(screenshot)


3. Why did you choose to drive CUDA directly with CuPy, for example for the allocator, igemm, and fgemm? Does this bring bigger benefits than implementing quantization on top of a framework such as PyTorch? The CuPy + CUDA approach seems quite demanding.

Thanks a lot!

@a710128
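
Since question 1 mentions computing the scale for symmetric quantization, here is a minimal NumPy sketch of that step. It illustrates the general int8 symmetric-quantization idea only; it is not the CuPy/CUDA kernel that bminf actually runs.

# Hedged sketch of symmetric int8 quantization (NumPy only, illustrative).
import numpy as np

def quantize_sym(x):
    # the scale maps the largest absolute value onto the int8 range [-127, 127]
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_sym(q, scale):
    return q.astype(np.float32) * scale

x = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_sym(x)
print("max abs error:", np.abs(dequantize_sym(q, scale) - x).max())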

How can BMInf be used to speed up GLM inference?

On a V100, GLM inference takes 10-20 s.
After wrapping GLM with BMInf, inference takes more than 1 min.
What is the cause, and how can GLM inference be accelerated?

[FEATURE] How to finetune CPM2.1?

I am not familiar with int8, but I suppose it cannot be trained like ordinary fp32 models. Any suggestions on how to fine-tune it?

Also, does CPM-2.1 have a report or paper? I could not find one anywhere.

Thank you!

Error when using BMInf with CPM-Ant+:

File "/home/wenxuan/lihaijie_files/cpm-live/examples/tune_cpm_ant.py", line 56, in
delta_model.freeze_module(exclude=["deltas"], set_state_dict=True)
File "/home/wenxuan/miniconda3/envs/lhj/lib/python3.9/site-packages/opendelta/basemodel.py", line 274, in freeze_module
self._freeze_module_recursive(module, exclude, "") # modify the active state dict that still need grad
File "/home/wenxuan/miniconda3/envs/lhj/lib/python3.9/site-packages/opendelta/basemodel.py", line 316, in _freeze_module_recursive
self._freeze_module_recursive(c, exclude=exclude, prefix=next_prefix)
File "/home/wenxuan/miniconda3/envs/lhj/lib/python3.9/site-packages/opendelta/basemodel.py", line 316, in _freeze_module_recursive
self._freeze_module_recursive(c, exclude=exclude, prefix=next_prefix)
File "/home/wenxuan/miniconda3/envs/lhj/lib/python3.9/site-packages/opendelta/basemodel.py", line 304, in _freeze_module_recursive
p.requires_grad = False
RuntimeError: you can only change requires_grad flags of leaf variables. If you want to use a computed variable in a subgraph that doesn't require differentiation use var_no_grad = var.detach().
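
For context, a minimal plain-PyTorch sketch (independent of bminf and OpenDelta internals) of what this error means and of the detach() workaround that the message itself suggests:

# requires_grad can only be toggled on leaf tensors; computed tensors must be detached.
import torch

w = torch.nn.Parameter(torch.randn(4))   # leaf tensor
y = w * 2                                # non-leaf tensor (result of an op)

w.requires_grad = False                  # fine: w is a leaf
try:
    y.requires_grad = False              # raises the RuntimeError shown above
except RuntimeError as e:
    print(e)

y_no_grad = y.detach()                   # the workaround suggested by the error message
print(y_no_grad.requires_grad)           # False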

AttributeError: type object 'cublasLt' has no attribute 'cublasLtHandle_t'

Cuda compilation tools, release 10.0, V10.0.130

torch 1.6.0

python 3.6

and I get the following error:

Traceback (most recent call last):
File "/home/wac/PycharmProjects/CPM-1-Generate/test.py", line 7, in
  cpm2.generate(text)
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/models/cpm2.py", line 219, in generate
  frequency_penalty, presence_penalty, 189
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/models/cpm2.py", line 103, in pre_processing
  ctx = self.encode(np.array([idx], dtype=np.int64), [input_length])
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/arch/t5/model.py", line 238, in encode
  True
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/layers/transformer_block.py", line 42, in forward
  x = self.self_attention.forward(allocator, x, attention_mask, self_attn_position_bias)
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/layers/attention.py", line 63, in forward
  qkv_i32
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/functions/gemm.py", line 86, in igemm
  _igemm(allocator, a, aT, b, bT, c, device, stream)
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/functions/gemm.py", line 102, in _igemm
  lthandle = get_handle(device)
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/functions/gemm.py", line 65, in get_handle
  v = cublasLt.cublasLtHandle_t()
AttributeError: type object 'cublasLt' has no attribute 'cublasLtHandle_t'

fill_blank error: with a different fill-in-the-blank text it reports "Unexpected model output: 26239"

Changing the input to
input_text = "近日,北京智源人工智能研究院和清华大学研究团队发布了以中文为核心的大规模预训练语言模型 CPM-LM,参数规模达 26 亿,预训练中文数据规模 100 GB。"
raises the error
"Unexpected model output: 26239"
What requirements does fill_blank place on the input text, or on the words to be filled in?
I am using
cpm2 = bminf.models.CPM2()

installed with pip, bminf-1.0.0
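
For comparison, the fill_blank snippet earlier in this thread marks the blank explicitly with a <span> token. A minimal sketch of that usage is below; whether a missing <span> marker explains this particular error is an assumption, not something confirmed here, and the sampling parameters are simply copied from the earlier snippet.

# Hedged sketch: fill_blank with an explicit <span> blank marker.
import bminf

cpm2 = bminf.models.CPM2()
result = cpm2.fill_blank(
    "有一个服装品牌叫做<span>,专门设计彩绘T恤。",   # <span> marks the blank to fill
    top_p=0.5,
    top_n=5,
    temperature=0.5,
    frequency_penalty=0,
    presence_penalty=0,
)
print(result)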

[BUG] Error was raised when importing model in v1.0.x

Describe the bug
A CUDA error is raised when loading the models. This only happens with BMInf 1.0.x; BMInf 0.0.5 runs successfully. Any help would be appreciated. Thanks.

Minimal steps to reproduce
Tried the following on both WSL2 Ubuntu 20.04 with GTX 3080 16G and native Ubuntu 18.04 with GTX 1070 8G

conda create --name bminfnew python=3.8
conda activate bminfnew
conda install cudatoolkit=11.3
pip install bminf==1.0.1

Then run

import bminf
cpm2 = bminf.models.CPM2()

Expected behavior
Start downloading the model.

Screenshots

Python 3.8.12 (default, Oct 12 2021, 13:49:34) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bminf
>>> cpm2 = bminf.models.CPM2()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mira/miniconda3/envs/bminfnew/lib/python3.8/site-packages/bminf/models/cpm2.py", line 55, in __init__
    SizeLimitedAllocator( self._cudaAlloc.allocate( dynamic_memory ))
  File "/home/mira/miniconda3/envs/bminfnew/lib/python3.8/site-packages/bminf/core/allocators/cuda.py", line 20, in allocate
    ptr = cudart.cudaMalloc(nbytes).value
  File "/home/mira/miniconda3/envs/bminfnew/lib/python3.8/site-packages/cpm_kernels/library/base.py", line 94, in wrapper
    return f(*args, **kwargs)
  File "/home/mira/miniconda3/envs/bminfnew/lib/python3.8/site-packages/cpm_kernels/library/cudart.py", line 375, in cudaMalloc
    checkCUDAStatus(cuda.cudaMalloc(ctypes.byref(ptr), size))
  File "/home/mira/miniconda3/envs/bminfnew/lib/python3.8/site-packages/cpm_kernels/library/cudart.py", line 327, in checkCUDAStatus
    raise RuntimeError("CUDA Runtime Error: %s" % cudaGetErrorString(error))
RuntimeError: CUDA Runtime Error: out of memory

Environment:
Tried with various CUDA versions, including 10.2, 11.0, and 11.3.
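
A small diagnostic sketch that can show whether the GPU really has no free memory at the moment CPM2() allocates its dynamic buffer. It calls the CUDA runtime directly through ctypes, assumes libcudart.so is on the loader path, and is not part of bminf.

# Hedged diagnostic sketch: query free/total GPU memory via the CUDA runtime.
import ctypes

cudart = ctypes.CDLL("libcudart.so")
free, total = ctypes.c_size_t(), ctypes.c_size_t()
err = cudart.cudaMemGetInfo(ctypes.byref(free), ctypes.byref(total))
assert err == 0, f"cudaMemGetInfo failed with error code {err}"
print(f"free: {free.value / 2**30:.2f} GiB / total: {total.value / 2**30:.2f} GiB")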

[MODEL] Debug Self-Trained GPT-Model

Introduction
When I load a self-trained GPT-2 model into BMInf and run inference, it produces NaN during forward propagation. Although I can get the DEBUG info, I still do not know what is going wrong. Here is the log; how can I fix it?

2021-10-08 03:12:08,611 - model - INFO - MAX_LENGTH: 1024
2021-10-08 03:12:08,622 - model - INFO - Start loading parameters from disk to cpu
2021-10-08 03:12:08,622 - bminf.layers.base - DEBUG - Parameter Loader [CodeGPT]: size 75027456
2021-10-08 03:12:08,623 - bminf.layers.base - DEBUG - Parameter Loader [CodeGPT]: parameters 0, sub_layers 5
2021-10-08 03:12:08,623 - bminf.layers.base - DEBUG - In input_embedding: ==
2021-10-08 03:12:08,623 - bminf.layers.base - DEBUG - Parameter Loader [Embedding]: size 30781440
2021-10-08 03:12:08,623 - bminf.layers.base - DEBUG - Parameter Loader [Embedding]: parameters 1, sub_layers 0
2021-10-08 03:12:08,645 - bminf.layers.base - DEBUG - Out input_embedding: ==
2021-10-08 03:12:08,645 - bminf.layers.base - DEBUG - In position_embedding: ==
2021-10-08 03:12:08,645 - bminf.layers.base - DEBUG - Parameter Loader [Embedding]: size 1572864
2021-10-08 03:12:08,645 - bminf.layers.base - DEBUG - Parameter Loader [Embedding]: parameters 1, sub_layers 0
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Out position_embedding: ==
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - In input_mask: ==
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [InputMask]: size 0
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [InputMask]: parameters 0, sub_layers 0
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Out input_mask: ==
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - In layers: ==
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [LayerList]: size 42670080
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [LayerList]: parameters 0, sub_layers 6
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - In 0: ==
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: size 7111680
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: parameters 0, sub_layers 4
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - In layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Out layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,647 - bminf.layers.base - DEBUG - In self_attention: ==
2021-10-08 03:12:08,647 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: size 2371584
2021-10-08 03:12:08,647 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: parameters 6, sub_layers 0
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Out self_attention: ==
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - In layer_nrom_before_ff: ==
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Out layer_nrom_before_ff: ==
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - In dense_gelu_dense: ==
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: size 4733952
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: parameters 0, sub_layers 2
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - In wi: ==
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2371584
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,651 - bminf.layers.base - DEBUG - Out wi: ==
2021-10-08 03:12:08,651 - bminf.layers.base - DEBUG - In wo: ==
2021-10-08 03:12:08,651 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2362368
2021-10-08 03:12:08,651 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Out wo: ==
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Out dense_gelu_dense: ==
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Out 0: ==
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - In 1: ==
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: size 7111680
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: parameters 0, sub_layers 4
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - In layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Out layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - In self_attention: ==
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: size 2371584
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: parameters 6, sub_layers 0
2021-10-08 03:12:08,655 - bminf.layers.base - DEBUG - Out self_attention: ==
2021-10-08 03:12:08,655 - bminf.layers.base - DEBUG - In layer_nrom_before_ff: ==
2021-10-08 03:12:08,655 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,655 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,656 - bminf.layers.base - DEBUG - Out layer_nrom_before_ff: ==
2021-10-08 03:12:08,656 - bminf.layers.base - DEBUG - In dense_gelu_dense: ==
2021-10-08 03:12:08,656 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: size 4733952
2021-10-08 03:12:08,656 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: parameters 0, sub_layers 2
2021-10-08 03:12:08,656 - bminf.layers.base - DEBUG - In wi: ==
2021-10-08 03:12:08,656 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2371584
2021-10-08 03:12:08,656 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,658 - bminf.layers.base - DEBUG - Out wi: ==
2021-10-08 03:12:08,658 - bminf.layers.base - DEBUG - In wo: ==
2021-10-08 03:12:08,658 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2362368
2021-10-08 03:12:08,658 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Out wo: ==
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Out dense_gelu_dense: ==
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Out 1: ==
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - In 2: ==
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: size 7111680
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: parameters 0, sub_layers 4
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - In layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Out layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - In self_attention: ==
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: size 2371584
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: parameters 6, sub_layers 0
2021-10-08 03:12:08,662 - bminf.layers.base - DEBUG - Out self_attention: ==
2021-10-08 03:12:08,662 - bminf.layers.base - DEBUG - In layer_nrom_before_ff: ==
2021-10-08 03:12:08,662 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,662 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,662 - bminf.layers.base - DEBUG - Out layer_nrom_before_ff: ==
2021-10-08 03:12:08,662 - bminf.layers.base - DEBUG - In dense_gelu_dense: ==
2021-10-08 03:12:08,663 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: size 4733952
2021-10-08 03:12:08,663 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: parameters 0, sub_layers 2
2021-10-08 03:12:08,663 - bminf.layers.base - DEBUG - In wi: ==
2021-10-08 03:12:08,663 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2371584
2021-10-08 03:12:08,663 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,664 - bminf.layers.base - DEBUG - Out wi: ==
2021-10-08 03:12:08,665 - bminf.layers.base - DEBUG - In wo: ==
2021-10-08 03:12:08,665 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2362368
2021-10-08 03:12:08,665 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,666 - bminf.layers.base - DEBUG - Out wo: ==
2021-10-08 03:12:08,666 - bminf.layers.base - DEBUG - Out dense_gelu_dense: ==
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Out 2: ==
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - In 3: ==
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: size 7111680
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: parameters 0, sub_layers 4
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - In layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Out layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - In self_attention: ==
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: size 2371584
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: parameters 6, sub_layers 0
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Out self_attention: ==
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - In layer_nrom_before_ff: ==
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Out layer_nrom_before_ff: ==
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - In dense_gelu_dense: ==
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: size 4733952
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: parameters 0, sub_layers 2
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - In wi: ==
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2371584
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,671 - bminf.layers.base - DEBUG - Out wi: ==
2021-10-08 03:12:08,671 - bminf.layers.base - DEBUG - In wo: ==
2021-10-08 03:12:08,671 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2362368
2021-10-08 03:12:08,671 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,673 - bminf.layers.base - DEBUG - Out wo: ==
2021-10-08 03:12:08,673 - bminf.layers.base - DEBUG - Out dense_gelu_dense: ==
2021-10-08 03:12:08,673 - bminf.layers.base - DEBUG - Out 3: ==
2021-10-08 03:12:08,673 - bminf.layers.base - DEBUG - In 4: ==
2021-10-08 03:12:08,673 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: size 7111680
2021-10-08 03:12:08,673 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: parameters 0, sub_layers 4
2021-10-08 03:12:08,673 - bminf.layers.base - DEBUG - In layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,674 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,674 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,674 - bminf.layers.base - DEBUG - Out layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,674 - bminf.layers.base - DEBUG - In self_attention: ==
2021-10-08 03:12:08,674 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: size 2371584
2021-10-08 03:12:08,674 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: parameters 6, sub_layers 0
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Out self_attention: ==
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - In layer_nrom_before_ff: ==
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Out layer_nrom_before_ff: ==
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - In dense_gelu_dense: ==
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: size 4733952
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: parameters 0, sub_layers 2
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - In wi: ==
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2371584
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,678 - bminf.layers.base - DEBUG - Out wi: ==
2021-10-08 03:12:08,678 - bminf.layers.base - DEBUG - In wo: ==
2021-10-08 03:12:08,678 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2362368
2021-10-08 03:12:08,678 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Out wo: ==
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Out dense_gelu_dense: ==
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Out 4: ==
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - In 5: ==
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: size 7111680
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: parameters 0, sub_layers 4
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - In layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Out layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,681 - bminf.layers.base - DEBUG - In self_attention: ==
2021-10-08 03:12:08,681 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: size 2371584
2021-10-08 03:12:08,681 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: parameters 6, sub_layers 0
2021-10-08 03:12:08,682 - bminf.layers.base - DEBUG - Out self_attention: ==
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - In layer_nrom_before_ff: ==
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - Out layer_nrom_before_ff: ==
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - In dense_gelu_dense: ==
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: size 4733952
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: parameters 0, sub_layers 2
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - In wi: ==
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2371584
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,685 - bminf.layers.base - DEBUG - Out wi: ==
2021-10-08 03:12:08,685 - bminf.layers.base - DEBUG - In wo: ==
2021-10-08 03:12:08,685 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2362368
2021-10-08 03:12:08,685 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Out wo: ==
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Out dense_gelu_dense: ==
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Out 5: ==
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Out layers: ==
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - In encoder_final_layer_nrom: ==
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Out encoder_final_layer_nrom: ==
2021-10-08 03:12:08,687 - model - INFO - Start loading parameters from cpu to gpu
2021-10-08 03:12:08,687 - model - INFO - Using static loader: total: 75027456, dynamic_memory 536870912, memory_limit 11453988864
2021-10-08 03:12:08,688 - bminf.allocator.base - INFO - Allocate 30781440
2021-10-08 03:12:08,695 - bminf.allocator.base - INFO - Allocate 1572864
2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 1769472
2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 589824
2021-10-08 03:12:08,697 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,697 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,697 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,697 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,697 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 1769472
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 589824
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,700 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,700 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,700 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,700 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,701 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,701 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,701 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,701 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,701 - bminf.allocator.base - INFO - Allocate 1769472
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 589824
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,703 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,703 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,703 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,703 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,703 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,704 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,704 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,704 - bminf.allocator.base - INFO - Allocate 1769472
2021-10-08 03:12:08,704 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,704 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,704 - bminf.allocator.base - INFO - Allocate 589824
2021-10-08 03:12:08,704 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,705 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,705 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,705 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,705 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,705 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,705 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,705 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,706 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,706 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,706 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,706 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,706 - bminf.allocator.base - INFO - Allocate 1769472
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 589824
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,708 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,708 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,708 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 1769472
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 589824
2021-10-08 03:12:08,710 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,710 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,710 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,710 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,710 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,710 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,711 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,711 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,711 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,711 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,711 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,711 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,712 - bminf.allocator.base - INFO - Allocate 536870912
2021-10-08 03:12:08,713 - model - INFO - Cleaning useless parameters on cpu
2021-10-08 03:12:08,715 - model - INFO - End of model initialization
2021-10-08 03:12:08,715 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,859 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,860 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,861 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,862 - bminf.allocator.base - INFO - Allocate 18874368
2021-10-08 03:12:08,862 - model - INFO - Calc encoder layer 0
2021-10-08 03:12:08,862 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm self-attn
2021-10-08 03:12:08,862 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,863 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,863 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,871 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,872 - bminf.layers.transformer_block - INFO - Encoder transformer block -- self attention
2021-10-08 03:12:08,872 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,872 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,874 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,874 - bminf.allocator.base - INFO - Allocate 294912
2021-10-08 03:12:08,923 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) Missing
2021-10-08 03:12:08,923 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) Missing
2021-10-08 03:12:08,923 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) Missing
2021-10-08 03:12:08,923 - bminf.utils.cache - DEBUG - Get (10, False) Missing
2021-10-08 03:12:08,923 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) Missing
2021-10-08 03:12:08,926 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,926 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,926 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,926 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,926 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,927 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,927 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,927 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,927 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,927 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,928 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,928 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) Missing
2021-10-08 03:12:08,929 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,929 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,929 - bminf.utils.cache - DEBUG - Get (0, 68, False, True) Missing
2021-10-08 03:12:08,931 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,937 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,937 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,937 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,937 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,937 - bminf.utils.cache - DEBUG - Get (0, 68, False, False) Missing
2021-10-08 03:12:08,937 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,937 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,938 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,938 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,938 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,938 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,938 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,938 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,938 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,939 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm ff
2021-10-08 03:12:08,939 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,940 - bminf.layers.transformer_block - INFO - Encoder transformer block -- ff
2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 786432
2021-10-08 03:12:08,940 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,941 - bminf.utils.cache - DEBUG - Get (3, 768, 3072, 768, 0, 1, 0) Missing
2021-10-08 03:12:08,941 - bminf.utils.cache - DEBUG - Get (10, 64, 3072, 64, 0, 1, 196608) Missing
2021-10-08 03:12:08,941 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,941 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,941 - bminf.allocator.base - INFO - Allocate 393216
2021-10-08 03:12:08,942 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,942 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,942 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,942 - bminf.utils.cache - DEBUG - Get (3, 64, 3072, 64, 0, 1, 0) Missing
2021-10-08 03:12:08,943 - bminf.utils.cache - DEBUG - Get (3, 3072, 768, 3072, 0, 1, 0) Missing
2021-10-08 03:12:08,943 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,943 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,943 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,943 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,943 - model - INFO - Calc encoder layer 1
2021-10-08 03:12:08,943 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm self-attn
2021-10-08 03:12:08,943 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,943 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,943 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,944 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,944 - bminf.layers.transformer_block - INFO - Encoder transformer block -- self attention
2021-10-08 03:12:08,944 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,944 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,944 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,944 - bminf.allocator.base - INFO - Allocate 294912
2021-10-08 03:12:08,944 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,944 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,944 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,944 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,944 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,946 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,946 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,946 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,946 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,946 - bminf.utils.cache - DEBUG - Get (0, 68, False, True) HIT
2021-10-08 03:12:08,946 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,946 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,946 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,946 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,946 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,947 - bminf.utils.cache - DEBUG - Get (0, 68, False, False) HIT
2021-10-08 03:12:08,947 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,947 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,947 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,947 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,947 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,947 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,947 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,947 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,947 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,947 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm ff
2021-10-08 03:12:08,948 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,948 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,948 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,948 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,948 - bminf.layers.transformer_block - INFO - Encoder transformer block -- ff
2021-10-08 03:12:08,948 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,948 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,948 - bminf.allocator.base - INFO - Allocate 786432
2021-10-08 03:12:08,948 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (3, 768, 3072, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (10, 64, 3072, 64, 0, 1, 196608) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,949 - bminf.allocator.base - INFO - Allocate 393216
2021-10-08 03:12:08,949 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,949 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,949 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (3, 64, 3072, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (3, 3072, 768, 3072, 0, 1, 0) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,950 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,950 - model - INFO - Calc encoder layer 2
2021-10-08 03:12:08,950 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm self-attn
2021-10-08 03:12:08,950 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,950 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,950 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,950 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,950 - bminf.layers.transformer_block - INFO - Encoder transformer block -- self attention
2021-10-08 03:12:08,951 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,951 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,951 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,951 - bminf.allocator.base - INFO - Allocate 294912
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,952 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (0, 68, False, True) HIT
2021-10-08 03:12:08,953 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,953 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,953 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,953 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,953 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,953 - bminf.utils.cache - DEBUG - Get (0, 68, False, False) HIT
2021-10-08 03:12:08,953 - bminf.allocator.base - INFO - Allocate 49152
[... repeated bminf.utils.cache DEBUG "HIT" and bminf.allocator.base INFO "Allocate" lines for encoder layers 3-5 omitted ...]
2021-10-08 03:12:08,979 - bminf.allocator.base - INFO - Allocate 40080
2021-10-08 03:12:08,979 - bminf.utils.cache - DEBUG - Get (2, 768, 20040, 768, 0, 1, 0) Missing
2021-10-08 03:12:08,979 - bminf.utils.cache - DEBUG - Get (2, 768, 1, 768, 0, 1, 0) Missing
2021-10-08 03:12:08,979 - bminf.utils.cache - DEBUG - Get (2, 20040, 1, 20040, 0, 1, 20040) Missing
2021-10-08 03:12:08,979 - bminf.utils.cache - DEBUG - Get (0, 68, True, False) Missing
Loading model
Start
[[nan nan nan ... nan nan nan]]

[BUG] Running generate_cpm2.py raises a ValueError

Running generate_cpm2.py raises a ValueError.

(EVAAA) [root@localhost examples]# python generate_cpm2.py
Loading model
Input: 天空是蔚蓝色,窗外有
Output: 天空是蔚蓝色,窗外有Traceback (most recent call last):
  File "generate_cpm2.py", line 32, in <module>
    main()
  File "generate_cpm2.py", line 29, in main
    generate(cpm2_1, input_text)
  File "generate_cpm2.py", line 11, in generate
    value, stoped = model.generate(
ValueError: too many values to unpack (expected 2)
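A quick way to diagnose this (a sketch, not an official fix): the example unpacks the return value into value, stoped, but the installed bminf version evidently returns a different number of values, so capturing and printing the raw return value first shows what the current API actually gives back. The keyword arguments below simply mirror the generate_cpm2.py example; nothing else is assumed.

import bminf

cpm2 = bminf.models.CPM2()
# Capture the raw return value instead of unpacking it; the return format of
# generate() has changed between bminf versions, hence the unpack error above.
ret = cpm2.generate(
    input_sentence="天空是蔚蓝色,窗外有",
    max_tokens=32,
    top_n=5,
    top_p=None,
    temperature=0.85,
    frequency_penalty=0,
    presence_penalty=0,
)
print(type(ret), ret)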

[BUG] GPU RTX 4090 reports errors

ERROR in app: Exception on /api/fillblank [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2070, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1515, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1513, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1499, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "main.py", line 66, in fillBlank
    result = fillblank.fillBlank(model)
  File "/app/controller/fill_blank_controller.py", line 18, in fillBlank
    presence_penalty = presence_penalty1)
  File "/usr/local/lib/python3.6/dist-packages/bminf/models/cpm2.py", line 151, in fill_blank
    frequency_penalty, presence_penalty, 0)
  File "/usr/local/lib/python3.6/dist-packages/bminf/models/cpm2.py", line 103, in pre_processing
    ctx = self.encode(np.array([idx], dtype=np.int64), [input_length])
  File "/usr/local/lib/python3.6/dist-packages/bminf/arch/t5/model.py", line 238, in encode
    True
  File "/usr/local/lib/python3.6/dist-packages/bminf/layers/transformer_block.py", line 42, in forward
    x = self.self_attention.forward(allocator, x, attention_mask, self_attn_position_bias)
  File "/usr/local/lib/python3.6/dist-packages/bminf/layers/attention.py", line 63, in forward
    qkv_i32
  File "/usr/local/lib/python3.6/dist-packages/bminf/functions/gemm.py", line 86, in igemm
    _igemm(allocator, a, aT, b, bT, c, device, stream)
  File "/usr/local/lib/python3.6/dist-packages/bminf/functions/gemm.py", line 180, in _igemm
    cublasLt.checkCublasStatus( cublasLt.cublasLtMatrixTransform(lthandle, transform_desc_b, ctypes.byref(v1), b.data.ptr, layout_b, ctypes.byref(v0), 0, 0, trans_b.ptr, layout_trans_b, stream.ptr) )
  File "/usr/local/lib/python3.6/dist-packages/bminf/backend/cublaslt.py", line 101, in checkCublasStatus
    raise RuntimeError("cublas error: %s" % cublas_errors[cublas_status])
RuntimeError: cublas error: CUBLAS_STATUS_NOT_SUPPORTED

[BUG] eva2 = bminf.models.EVA2()

EVA raises an error.

In [11]: eva2 = bminf.models.EVA2()

KeyError Traceback (most recent call last)
in ()
----> 1 eva2 = bminf.models.EVA2()

~/anaconda3/envs/yhs/lib/python3.6/site-packages/bminf/models/eva2.py in init(self, device, memory_limit, config)
56 raise ValueError("Memory is not enough")
57
---> 58 super().init(config)
59
60 def dialogue(self,

~/anaconda3/envs/yhs/lib/python3.6/site-packages/bminf/arch/t5/model.py in init(self, config)
73 vocab_path = data.ensure_file(config.MODEL_NAME, "vocab.txt")
74
---> 75 self.tokenizer = T5Tokenizer(vocab_path)
76
77 self.device = config.DEVICE

~/anaconda3/envs/yhs/lib/python3.6/site-packages/bminf/arch/t5/tokenizer.py in init(self, vocab_path, max_len, max_sentinels)
81 self.translator_dec = str.maketrans("\u2582\u2583", " \n")
82
---> 83 self.sentinel_list = [self.encoder['<s_{}>'.format(i)] for i in range(max_sentinels)]
84
85 @property

~/anaconda3/envs/yhs/lib/python3.6/site-packages/bminf/arch/t5/tokenizer.py in (.0)
81 self.translator_dec = str.maketrans("\u2582\u2583", " \n")
82
---> 83 self.sentinel_list = [self.encoder['<s_{}>'.format(i)] for i in range(max_sentinels)]
84
85 @property

KeyError: '<s_0>'

Error in the generate.py example program


def generate(model : bminf.models.CPM1, sentence):
    with tqdm() as progress_bar:
        progress_bar.write(sentence)
        while True:
            result = model.generate(
                sentence,
                max_tokens=8,
                top_n=5,
                top_p=None,
                temperature=0.85,
                frequency_penalty=0,
                presence_penalty=0
            )
            sentence += result
            progress_bar.write(sentence)
            progress_bar.update(1)
            if result.find("<eod>") != -1:
                break

In the function above, the loop stops only once the generated result contains <eod>.

However, in the decoding loop below,

ret = []
for _ in range(max_tokens):
    dec_inputs = sampler.sample(x[0])
    if dec_inputs == self.tokenizer.eod_id:
        break
    ret.append(dec_inputs)
    x = self.decode_step(ctx, [dec_inputs])
return self.id_to_text(ret)

the loop breaks as soon as <eod> is sampled; the token is never appended to ret, so it is never decoded, and result can therefore never contain <eod>.

As a result, generate.py actually generates text in an infinite loop.
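One possible fix (a sketch against the decoding loop quoted above, not the project's official patch) is to append the <eod> token before breaking, so that the decoded result actually contains "<eod>" and the stopping check in generate.py can fire:

ret = []
for _ in range(max_tokens):
    dec_inputs = sampler.sample(x[0])
    if dec_inputs == self.tokenizer.eod_id:
        ret.append(dec_inputs)  # keep <eod> so it shows up in the decoded result
        break
    ret.append(dec_inputs)
    x = self.decode_step(ctx, [dec_inputs])
return self.id_to_text(ret)

Alternatively, the example itself could bound the number of loop iterations instead of relying on "<eod>" appearing in the generated text.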

[BUG] RuntimeError: cublas error: CUBLAS_STATUS_NOT_SUPPORTED

Running on the Google Colab runtime with 12 GB RAM and a Tesla K80 GPU.
NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2

The error is as follows:

RuntimeError Traceback (most recent call last)
in ()
25 print("Loading model")
26 cpm2_1 = bminf.models.CPM2()
---> 27 generate(cpm2_1, input_text)

in generate(model, text)
16 temperature=0.85,
17 frequency_penalty=0,
---> 18 presence_penalty=0,
19 )
20 text += value

/content/BMInf/bminf/models/cpm2.py in generate(self, input_sentence, max_tokens, top_n, top_p, temperature, frequency_penalty, presence_penalty, stop_tokens)
217 [len(input_sentence)],
218 max_tokens, top_n, top_p, temperature,
--> 219 frequency_penalty, presence_penalty, 189
220 )
221

/content/BMInf/bminf/models/cpm2.py in pre_processing(self, input_sentence, spans_position, max_tokens, top_n, top_p, temperature, frequency_penalty, presence_penalty, start_span_idx)
101 input_length = len(idx)
102
--> 103 ctx = self.encode(np.array([idx], dtype=np.int64), [input_length])
104 self.init_decoder_context(ctx)
105

/content/BMInf/bminf/arch/t5/model.py in encode(self, input_idx, input_length)
236 encoder_attn_mask,
237 x_pos,
--> 238 True
239 )
240 with calc_stream:

/content/BMInf/bminf/layers/transformer_block.py in forward(self, allocator, hidden_state, attention_mask, self_attn_position_bias, inplace)
40
41 logger.info("Encoder transformer block -- self attention")
---> 42 x = self.self_attention.forward(allocator, x, attention_mask, self_attn_position_bias)
43 assert x.dtype == cupy.float16
44 assert x.shape == (batch_size, dim_model, seq_len)

/content/BMInf/bminf/layers/attention.py in forward(self, allocator, hidden_state, attention_mask, self_attn_position_bias)
61 self.w_project_qkv.value[i:i+1],
62 False,
---> 63 qkv_i32
64 )
65 elementwise_copy_scale(

/content/BMInf/bminf/functions/gemm.py in igemm(allocator, a, aT, b, bT, c)
84 device = a.device
85 stream = cupy.cuda.get_current_stream()
---> 86 _igemm(allocator, a, aT, b, bT, c, device, stream)
87 return c
88

/content/BMInf/bminf/functions/gemm.py in _igemm(allocator, a, aT, b, bT, c, device, stream)
263 0,
264 0,
--> 265 stream.ptr
266 ))
267 if c.shape[2] != trans_ldc:

/content/BMInf/bminf/backend/cublaslt.py in checkCublasStatus(cublas_status)
99 return
100 if cublas_status in cublas_errors:
--> 101 raise RuntimeError("cublas error: %s" % cublas_errors[cublas_status])
102 else:
103 raise RuntimeError("cublas error code: %d" % cublas_status)

RuntimeError: cublas error: CUBLAS_STATUS_NOT_SUPPORTED

The full code of the notebook is as follows:

!git clone https://github.com/OpenBMB/BMInf.git
%cd BMInf
!python setup.py install

import bminf
import sys

def generate(model : bminf.models.CPM2, text):
    print("Input: ", text)
    sys.stdout.write("Output: %s" % text)
    stoped = False
    while not stoped:
        value, stoped = model.generate(
            input_sentence = text[-32:],
            max_tokens=32,
            top_n=5,
            top_p=None,
            temperature=0.85,
            frequency_penalty=0,
            presence_penalty=0,
        )
        text += value
        sys.stdout.write(value)
        sys.stdout.flush()
    sys.stdout.write("\n")

input_text = input("请输入提示内容:")
print("Loading model")
cpm2_1 = bminf.models.CPM2()
generate(cpm2_1, input_text)

[BUG]RuntimeError: Library cublasLt is not initialized

Describe the bug

When running the demo from https://github.com/OpenBMB/BMInf, the error RuntimeError: Library cublasLt is not initialized occurs.

Minimal steps to reproduce

import bminf                # imports successfully
cpm2 = bminf.models.CPM2()  # constructs successfully
cpm2.fill_blank('好')        # raises RuntimeError: Library cublasLt is not initialized
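A small check worth running first (a sketch; it only tests whether the cuBLASLt shared library can be found by the dynamic loader, which is one common cause of this error, not the only possible one):

import ctypes.util

# If this prints None, libcublasLt is not visible to the loader (e.g. the CUDA
# toolkit libraries are missing from LD_LIBRARY_PATH), so initialization fails.
print(ctypes.util.find_library("cublasLt"))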
Expected behavior

Screenshots


Environment:

NVIDIA-SMI 465.19.01
Driver Version: 465.19.01
NVIDIA A40
CUDA Version: 11.3
Memory:45634MiB

[FEATURE] Compare to FasterTransformer

Is there any comparison between BMInf and Nvidia's FasterTransformer?

I would like to use some tools to improve our model's inference performance. BMInf is great, and it seems to use a CUDA implementation to boost inference performance, just like FasterTransformer. So, is there any comparison of inference time between BMInf and FasterTransformer?

[BUG] Does BMInf support transformers models? Wrapping my model with BMInf fails during inference

Model code:

self.model = MyBert.from_pretrained(pretrained_model_name_or_path=model_path,)
self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
self.model.to(self.device)
self.model = bminf.wrapper(self.model)

Error message:

input_embed = self.model.bert(**input_tokenized)["last_hidden_state"]
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 1022, in forward
    encoder_outputs = self.encoder(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 611, in forward
    layer_outputs = layer_module(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 497, in forward
    self_attention_outputs = self.attention(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 427, in forward
    self_outputs = self.self(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 293, in forward
    mixed_query_layer = self.query(hidden_states)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/bminf/quantization/__init__.py", line 81, in forward
    out = OpLinear.apply(x, self.weight_quant, self.weight_scale)
  File "/usr/local/lib/python3.8/dist-packages/bminf/quantization/__init__.py", line 31, in forward
    gemm_int8(
  File "/usr/local/lib/python3.8/dist-packages/cpm_kernels/kernels/gemm.py", line 139, in gemm_int8
    assert m % 4 == 0 and n % 4 == 0 and k % 4 == 0
AssertionError
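A hedged workaround sketch: the assertion in cpm_kernels' gemm_int8 requires every matrix dimension to be a multiple of 4, so an input whose token count is not a multiple of 4 can trip it. Padding the tokenized input to a multiple of 4 (pad_to_multiple_of is a standard transformers tokenizer argument) may avoid the assertion; whether this fully resolves the crash is an assumption, not something verified here. The names tokenizer, text, device and model loosely mirror the reporter's code.

# Hypothetical workaround: pad the sequence length to a multiple of 4 before
# feeding it to the BMInf-wrapped model.
input_tokenized = tokenizer(
    text,
    return_tensors="pt",
    padding=True,
    pad_to_multiple_of=4,
).to(device)
input_embed = model.bert(**input_tokenized)["last_hidden_state"]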

Problems when using the CPM2.1 model for text generation

While trying to use CPM2.1 for text generation, I modified the following line so that the program does not stop as soon as a punctuation token is generated, in order to produce longer results:

if decoder_ipts in [7,24,17,47,16,12,18,13,19,9,42,53,51,27,2154,2891,2154,6027]:

I call it as follows (screenshot omitted).

After this change, I found that the generated result contains newline characters (converted from the token with id 3 in the vocabulary), and after a newline the context is no longer coherent: it reads as if a new paragraph has started, and sometimes the topic even changes, as in the examples below.

In this example, generation produces a newline right at the start (screenshot omitted).

In this example, the topic changes substantially after the newline (screenshot omitted).

  1. Does this happen because paragraphs were separated in this way during training?
  2. Could the generate function accept user-defined "stop tokens" to control generation behavior (see the sketch below)?
  3. Could you provide an example of using CPM2.1 to generate long-form text?
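On question 2: the generate(...) signature visible in one of the tracebacks above already lists a stop_tokens argument, so something along these lines may already be possible (a sketch; the return format and the choice of token ids, e.g. the newline token id 3 mentioned above, are assumptions rather than documented behavior, and cpm2 / text stand for the reporter's model and prompt):

result = cpm2.generate(
    input_sentence=text,
    max_tokens=128,
    top_n=5,
    top_p=None,
    temperature=0.85,
    frequency_penalty=0,
    presence_penalty=0,
    stop_tokens=[3],  # hypothetical: stop when the newline token (vocab id 3) is sampled
)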

Does it support models from the cpm2-finetune repo?

Does this tool directly support the CPM-2 model that accompanies cpm2-finetune (which has to be requested from the BAAI page)?
I downloaded the Chinese-English model (10 billion parameters, vocab size 51967). It originally came as 4 separate files, which I merged into a single-file model with the official script; testing showed no problems.
After changing some parameters, I used example/generate_cpm2.py from bminf to load my merged single-file model for testing; it cannot be loaded, and the error is as follows (error screenshot omitted):

Update 1

  1. I found that the model apparently has to be one processed with compression and quantization first; there is a migrate_cpm2.py under tools, which I used to quantize the model, obtaining an 11 GB model. I suggest documenting this in more detail.

  2. After quantizing with migrate_cpm2.py, how should re-finetuning be done? Is it similar to quantization-aware training? (screenshot omitted)

  3. I loaded the quantized 11 GB model above with generate_cpm2.py and set inference to generate at most 100 characters; GPU memory usage was over 13 GB (on an A100). How do I get the memory scheduling described in your docs, so that inference can run on a 2080 Ti (which only has 11 GB of memory)?

  4. How can the model's modules be split across different GPUs? That would solve the problem in step 3 of 11 GB not being enough, for example by placing the encoder and decoder on different GPUs. The model is not built with torch or a similar framework, and moving data into GPU memory seems to rely mainly on with device and the allocator, so I do not quite understand how to assign different modules to different GPUs.

@a710128 Looking forward to a reply, thanks.

CUDA Error: no kernel image is available for execution on the device

Fails to run on a P100 GPU that works fine with other PyTorch CUDA code. GPU info:
GPU Device 0: "Pascal" with compute capability 6.0
Compute 6.0 CUDA device: [Tesla P100-PCIE-16GB]

error trace:

/opt/conda/lib/python3.8/site-packages/cpm_kernels/library/cuda.py in checkCUStatus(error)
    214 def checkCUStatus(error : int) -> None:
    215     if error != CUDA_SUCCESS:
--> 216         raise RuntimeError("CUDA Error: %s" % cuGetErrorString(error))
    217 
    218 @cuda.bind("cuDriverGetVersion", [ctypes.POINTER(ctypes.c_int)], CUresult)

RuntimeError: CUDA Error: no kernel image is available for execution on the device

What is the minimal compute capability that cpm-kernels needs?
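Not an answer to the minimum-capability question itself, but a quick way to confirm what capability the GPU reports (a sketch using PyTorch; whether cpm-kernels ships kernel images for a given capability is for the maintainers to confirm):

import torch

# Print the compute capability of each visible GPU.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} - compute capability {major}.{minor}")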

[FEATURE] Full examples with some known models from the HF Hub in a Colab notebook

Is your feature request related to a problem? Please describe.
For example, I cannot get HF BERT working, and I don't know when I can use your project.

import bminf
import torch
from transformers import BertModel, BertTokenizer  # imports implied by the snippet below

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # tokenizer used below
text = "Hello, BMInf!"                                          # example input

encoded_input_cpu = tokenizer(text, return_tensors='pt').to('cpu')
model = BertModel.from_pretrained("bert-base-uncased").to('cpu')
# apply wrapper
with torch.cuda.device(0):
    model = bminf.wrapper(model.to('cpu'))
    with print_time_delta('generate'):  # print_time_delta: the reporter's own timing helper, not defined here
        output = model(**encoded_input_cpu)

Describe the solution you'd like
Can you provide full examples with some known models from the HF Hub in a Colab notebook?

Describe alternatives you've considered

[FEATURE] Allow loading local models

Is your feature request related to a problem? Please describe.

Sometimes the server that runs the model is physically disconnected from the network, so the model has to be downloaded manually, uploaded, and then loaded.
Previously (version 0.0.4) local loading could be achieved by setting MODEL_NAME in the config, but since the code was updated to 1.0.0 this is no longer possible (unless BMInf's source code is modified).

Describe the solution you'd like

Provide an interface at model initialization time for loading from a specified local path (possibly by extending the existing version field).

Describe alternatives you've considered

None.

Other: was the CPM2.1 model itself updated between 0.0.4 and 1.0.0? Loading a model downloaded in the 0.0.4 era with the 1.0.0 code raises an error.
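A sketch of what the requested interface might look like (this is the feature being asked for, not an existing BMInf API; the path argument and its meaning are hypothetical):

import bminf

# Hypothetical: point the constructor at a manually downloaded checkpoint
# directory instead of letting BMInf fetch the model from the network.
cpm2 = bminf.models.CPM2(version="/data/models/cpm2.1")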
