
bminf's People

Contributors

a710128, clpl, jayzzhou-thu, jctime, prnake, thucsthanxu13, zibuyu, zt-wang19


bminf's Issues

RuntimeError: CUBLAS error: CUBLAS_STATUS_NOT_INITIALIZED [BUG]

Running the example file fill_blank.py raises the following error:

Loading model
Start
Input:  北京环球度假区相关负责人介绍北京环球影城指定单日门票将采用____制度即推出淡季日平季日旺季日和特定日门票____价格为418元____价格为528元____价格为638元____价格为____元北京环球度假区将提供90天滚动价格日历以方便游客提前规划行程
Traceback (most recent call last):
  File "abc.py", line 28, in <module>
    main()
  File "abc.py", line 25, in main
    fill_blank(cpm2, input_text)
  File "abc.py", line 9, in fill_blank
    for result in cpm2.fill_blank(text,
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/bminf/models/cpm2.py", line 245, in fill_blank
    for token in res:
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/bminf/models/cpm2.py", line 129, in _gen_iter
    self._model.embedding(
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/bminf/arch/t5/model.py", line 165, in embedding
    self.input_embedding.embedding_forward(ctx, tensor_ids, x_out)
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/bminf/layers/embedding.py", line 27, in embedding_forward
    ck.embedding_forward(
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/kernels/embedding.py", line 25, in embedding_forward
    embedding_kernel.cu_embedding_forward(
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/kernels/base.py", line 48, in __call__
    func = self._prepare_func()
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/kernels/base.py", line 40, in _prepare_func
    self._module.get_module(), self._func_name
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/kernels/base.py", line 23, in get_module
    Device(curr_device).use()   # force initialize context
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/device/__init__.py", line 152, in use
    self._device.use()
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/device/__init__.py", line 120, in use
    self.cublasLtHandle = cublaslt.cublasLtCreate()
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/library/base.py", line 94, in wrapper
    return f(*args, **kwargs)
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/library/cublaslt.py", line 105, in cublasLtCreate
    checkCublasStatus(cublasLt.cublasLtCreate(ctypes.byref(handle)))
  File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/library/cublaslt.py", line 98, in checkCublasStatus
    raise RuntimeError("CUBLAS error: {}".format(
RuntimeError: CUBLAS error: CUBLAS_STATUS_NOT_INITIALIZED

Environment:
Python 3.8.10
cudatoolkit 11.3.1
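
To help isolate this, below is a minimal sketch that triggers the same cuBLASLt handle creation shown at the bottom of the traceback, outside of bminf. It uses only names that appear in the traceback; whether it reproduces the failure will depend on the environment.

# Hedged isolation sketch: call the cuBLASLt handle creation from the
# traceback above directly, without going through bminf.
from cpm_kernels.library import cublaslt

handle = cublaslt.cublasLtCreate()   # the call that raised CUBLAS_STATUS_NOT_INITIALIZED
print("cuBLASLt handle created:", handle)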

[BUG] RuntimeError: Unexpected model output: 26239

Describe the bug

Input:
import bminf
cpm2 = bminf.models.CPM2()
result = cpm2.fill_blank("有一个服装品牌叫做<span>专门设计彩绘T恤",
    top_p=0.5,
    top_n=5,
    temperature=0.5,
    frequency_penalty=0,
    presence_penalty=0
)

Error message:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 6, in
    presence_penalty=0
  File "/usr/local/lib/python3.6/dist-packages/bminf/models/cpm2.py", line 252, in fill_blank
    raise RuntimeError("Unexpected model output: %d" % token)
RuntimeError: Unexpected model output: 26239

Could you help me figure out what is causing this?

Environment:
Python 3.6
torch 1.8.1

Error during installation, `No matching distribution found for cupy-cuda90<10,>=9` [BUG]

Error log:

Collecting cupy-cuda90<10,>=9 (from bminf)
  ERROR: Could not find a version that satisfies the requirement cupy-cuda90<10,>=9 (from bminf) (from versions: 4.0.0, 4.1.0, 4.2.0, 4.3.0, 4.4.0, 4.4.1, 4.5.0, 5.0.0, 5.1.0, 5.2.0, 5.3.0, 5.4.0, 6.0.0, 6.1.0, 6.2.0, 6.3.0, 6.4.0, 6.5.0, 6.6.0, 6.7.0, 7.0.0, 7.1.0, 7.1.1, 7.2.0, 7.3.0, 7.4.0, 7.5.0, 7.6.0, 7.7.0, 7.8.0, 8.0.0, 8.1.0, 8.2.0, 8.3.0, 8.4.0, 8.5.0, 8.6.0, 9.0.0a1, 9.0.0a2)
ERROR: No matching distribution found for cupy-cuda90<10,>=9 (from bminf)

/usr/local/cuda/version.txt:

CUDA Version 9.0.176                                                      
CUDA Patch Version 9.0.176.1                                              
CUDA Patch Version 9.0.176.2                                              
CUDA Patch Version 9.0.176.3
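
Since the log above shows the cupy-cuda90 index topping out at the 9.0.0a2 pre-release, a small diagnostic sketch like the following can confirm which cupy wheel pip is being asked for on this machine. The path to version.txt is taken from the report above; everything else is illustrative.

# Hedged diagnostic sketch: read the local CUDA version and print the cupy
# wheel name that the bminf requirement translates to.
import re

def cuda_major_minor(path="/usr/local/cuda/version.txt"):
    with open(path) as f:
        m = re.search(r"CUDA Version (\d+)\.(\d+)", f.read())
    return m.group(1), m.group(2)

major, minor = cuda_major_minor()
print(f"Detected CUDA {major}.{minor}; the requirement resolves to cupy-cuda{major}{minor}>=9,<10")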

[BUG] RuntimeError: cublas error: CUBLAS_STATUS_NOT_SUPPORTED

Describe the bug
Running the three demos in the Docker environment, each fails in the backend with the following error:
File "/usr/local/lib/python3.6/dist-packages/bminf/arch/t5/model.py", line 238, in encode
True
File "/usr/local/lib/python3.6/dist-packages/bminf/layers/transformer_block.py", line 42, in forward
x = self.self_attention.forward(allocator, x, attention_mask, self_attn_position_bias)
File "/usr/local/lib/python3.6/dist-packages/bminf/layers/attention.py", line 63, in forward
qkv_i32
File "/usr/local/lib/python3.6/dist-packages/bminf/functions/gemm.py", line 86, in igemm
_igemm(allocator, a, aT, b, bT, c, device, stream)
File "/usr/local/lib/python3.6/dist-packages/bminf/functions/gemm.py", line 265, in _igemm
stream.ptr
File "/usr/local/lib/python3.6/dist-packages/bminf/backend/cublaslt.py", line 101, in checkCublasStatus
raise RuntimeError("cublas error: %s" % cublas_errors[cublas_status])
RuntimeError: cublas error: CUBLAS_STATUS_NOT_SUPPORTED

What could be the cause? Is one of the versions to blame?

Environment:
cuda: 10.1
Model: EVA-int8
GPU memory: 12 GB
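
When reporting this, it may help to record the GPU's compute capability. A minimal CuPy-based sketch follows (CuPy is already pulled in by these BMInf releases, as the install log earlier on this page shows); it is information gathering only, not a fix:

# Hedged diagnostic sketch: print compute capability and memory with CuPy.
import cupy

dev = cupy.cuda.Device(0)
print("compute capability:", dev.compute_capability)      # e.g. "61" for sm_61
print("total GPU memory (GiB):", dev.mem_info[1] / 2**30)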

[BUG] GPU memory leak, and errors when requests arrive too quickly

Describe the bug

How should the CPM1 model be loaded from a local checkpoint? Currently I do it the following way:
1. Build the model:
model = GPT2Model(num_layers=args.num_layers,
                  vocab_size=args.vocab_size,
                  hidden_size=args.hidden_size,
                  num_attention_heads=args.num_attention_heads,
                  embedding_dropout_prob=args.hidden_dropout,
                  attention_dropout_prob=args.attention_dropout,
                  output_dropout_prob=args.hidden_dropout,
                  max_sequence_length=args.max_position_embeddings,
                  checkpoint_activations=args.checkpoint_activations,
                  checkpoint_num_layers=args.checkpoint_num_layers,
                  parallel_output=args.parallel_output)

(the code is from here)
2. load_state_dict:
Load the state_dict from the local checkpoint.

3. Wrap the model with bminf:
model = bminf.wrapper(model)

Expected behavior

Screenshots

GPU memory usage before the request:
(screenshot)
GPU memory usage after the request:
(screenshot)

Errors are also raised when requests come in too quickly.
(screenshot)

Other:
How do I wrap a model loaded from transformers? I could not follow how the example does it (a sketch follows after the environment list below).
Environment:

apex 0.1
bminf 2.0.0
deepspeed 0.3.15
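
For the question about wrapping a transformers model, here is a minimal sketch built around the same bminf.wrapper(model) call as step 3 above. The checkpoint name, the generation call, and the exact point at which the model is moved to the GPU are illustrative assumptions rather than something confirmed in this thread.

# Hedged sketch: wrap a Hugging Face GPT-2 with bminf.wrapper (illustrative only).
import torch
import bminf
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")   # illustrative checkpoint
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

model = bminf.wrapper(model)                        # same call as step 3 above
model = model.cuda()

ids = tokenizer("Hello, world", return_tensors="pt").input_ids.cuda()
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=16)
print(tokenizer.decode(out[0]))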

Questions about CuPy/CUDA

  1. Hi, as shown in the screenshot, I would like to see the concrete definitions and usage of the CuPy functions that drive CUDA, but since CuPy wraps C/C++ code I cannot find them. Where should I look? Could you also explain the execution order of the four functions inside the third argument, routine, in the screenshot? (My rough understanding is that they create a struct and compute the scale for symmetric quantization.)

(screenshot)

Jumping to the definition only shows a docstring like this:
(screenshot)


  2. Why is the code inside the red box in the screenshot below written that way?

(screenshot)


3. Why did you choose to drive CUDA directly with CuPy, for example for the allocator, igemm, and fgemm? Does this bring bigger benefits than implementing quantization on top of a framework such as PyTorch? The CuPy + CUDA approach seems quite demanding.

Thanks a lot!

@a710128
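
Since question 1 mentions computing the scale for symmetric quantization, here is a minimal NumPy sketch of that step. It illustrates the general int8 symmetric-quantization idea only; it is not the CuPy/CUDA kernel that bminf actually runs.

# Hedged sketch of symmetric int8 quantization (NumPy only, illustrative).
import numpy as np

def quantize_sym(x):
    # the scale maps the largest absolute value onto the int8 range [-127, 127]
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_sym(q, scale):
    return q.astype(np.float32) * scale

x = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_sym(x)
print("max abs error:", np.abs(dequantize_sym(q, scale) - x).max())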

How can BMInf be used to speed up GLM inference?

On a V100, GLM inference takes 10-20 s.
After wrapping GLM with BMInf, inference takes more than 1 min.
What is the cause, and how can GLM inference be accelerated?

[FEATURE] How to finetune CPM2.1?

I am not familiar with int8, but I suppose it cannot be trained like ordinary fp32 models. Any suggestions on how to fine-tune it?

Also, does CPM-2.1 have a report or paper? I could not find one anywhere.

Thank you!

Error when using BMInf with CPM-Ant+:

File "/home/wenxuan/lihaijie_files/cpm-live/examples/tune_cpm_ant.py", line 56, in
delta_model.freeze_module(exclude=["deltas"], set_state_dict=True)
File "/home/wenxuan/miniconda3/envs/lhj/lib/python3.9/site-packages/opendelta/basemodel.py", line 274, in freeze_module
self._freeze_module_recursive(module, exclude, "") # modify the active state dict that still need grad
File "/home/wenxuan/miniconda3/envs/lhj/lib/python3.9/site-packages/opendelta/basemodel.py", line 316, in _freeze_module_recursive
self._freeze_module_recursive(c, exclude=exclude, prefix=next_prefix)
File "/home/wenxuan/miniconda3/envs/lhj/lib/python3.9/site-packages/opendelta/basemodel.py", line 316, in _freeze_module_recursive
self._freeze_module_recursive(c, exclude=exclude, prefix=next_prefix)
File "/home/wenxuan/miniconda3/envs/lhj/lib/python3.9/site-packages/opendelta/basemodel.py", line 304, in _freeze_module_recursive
p.requires_grad = False
RuntimeError: you can only change requires_grad flags of leaf variables. If you want to use a computed variable in a subgraph that doesn't require differentiation use var_no_grad = var.detach().
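
For context, a minimal plain-PyTorch sketch (independent of bminf and OpenDelta internals) of what this error means and of the detach() workaround that the message itself suggests:

# requires_grad can only be toggled on leaf tensors; computed tensors must be detached.
import torch

w = torch.nn.Parameter(torch.randn(4))   # leaf tensor
y = w * 2                                # non-leaf tensor (result of an op)

w.requires_grad = False                  # fine: w is a leaf
try:
    y.requires_grad = False              # raises the RuntimeError shown above
except RuntimeError as e:
    print(e)

y_no_grad = y.detach()                   # the workaround suggested by the error message
print(y_no_grad.requires_grad)           # False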

AttributeError: type object 'cublasLt' has no attribute 'cublasLtHandle_t'

Cuda compilation tools, release 10.0, V10.0.130

torch 1.6.0

python 3.6

and I get the following error:

Traceback (most recent call last):
File "/home/wac/PycharmProjects/CPM-1-Generate/test.py", line 7, in
  cpm2.generate(text)
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/models/cpm2.py", line 219, in generate
  frequency_penalty, presence_penalty, 189
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/models/cpm2.py", line 103, in pre_processing
  ctx = self.encode(np.array([idx], dtype=np.int64), [input_length])
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/arch/t5/model.py", line 238, in encode
  True
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/layers/transformer_block.py", line 42, in forward
  x = self.self_attention.forward(allocator, x, attention_mask, self_attn_position_bias)
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/layers/attention.py", line 63, in forward
  qkv_i32
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/functions/gemm.py", line 86, in igemm
  _igemm(allocator, a, aT, b, bT, c, device, stream)
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/functions/gemm.py", line 102, in _igemm
  lthandle = get_handle(device)
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/functions/gemm.py", line 65, in get_handle
  v = cublasLt.cublasLtHandle_t()
AttributeError: type object 'cublasLt' has no attribute 'cublasLtHandle_t'

fill_blank error: with a different fill-in-the-blank text it reports "Unexpected model output: 26239"

Changing the input to
input_text = "近日,北京智源人工智能研究院和清华大学研究团队发布了以中文为核心的大规模预训练语言模型 CPM-LM,参数规模达 26 亿,预训练中文数据规模 100 GB。"
raises the error
"Unexpected model output: 26239"
What requirements does fill_blank place on the input text, or on the words to be filled in?
I am using
cpm2 = bminf.models.CPM2()

installed with pip, bminf-1.0.0
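
For comparison, the fill_blank snippet earlier in this thread marks the blank explicitly with a <span> token. A minimal sketch of that usage is below; whether a missing <span> marker explains this particular error is an assumption, not something confirmed here, and the sampling parameters are simply copied from the earlier snippet.

# Hedged sketch: fill_blank with an explicit <span> blank marker.
import bminf

cpm2 = bminf.models.CPM2()
result = cpm2.fill_blank(
    "有一个服装品牌叫做<span>,专门设计彩绘T恤。",   # <span> marks the blank to fill
    top_p=0.5,
    top_n=5,
    temperature=0.5,
    frequency_penalty=0,
    presence_penalty=0,
)
print(result)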

[BUG] Error was raised when importing model in v1.0.x

Describe the bug
A CUDA error is raised when loading the models. This only happens with BMInf 1.0.x; BMInf 0.0.5 runs successfully. Any help would be appreciated. Thanks.

Minimal steps to reproduce
Tried the following on both WSL2 Ubuntu 20.04 with GTX 3080 16G and native Ubuntu 18.04 with GTX 1070 8G

conda create --name bminfnew python=3.8
conda activate bminfnew
conda install cudatoolkit=11.3
pip install bminf==1.0.1

Then run

import bminf
cpm2 = bminf.models.CPM2()

Expected behavior
Start downloading the model.

Screenshots

Python 3.8.12 (default, Oct 12 2021, 13:49:34) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bminf
>>> cpm2 = bminf.models.CPM2()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mira/miniconda3/envs/bminfnew/lib/python3.8/site-packages/bminf/models/cpm2.py", line 55, in __init__
    SizeLimitedAllocator( self._cudaAlloc.allocate( dynamic_memory ))
  File "/home/mira/miniconda3/envs/bminfnew/lib/python3.8/site-packages/bminf/core/allocators/cuda.py", line 20, in allocate
    ptr = cudart.cudaMalloc(nbytes).value
  File "/home/mira/miniconda3/envs/bminfnew/lib/python3.8/site-packages/cpm_kernels/library/base.py", line 94, in wrapper
    return f(*args, **kwargs)
  File "/home/mira/miniconda3/envs/bminfnew/lib/python3.8/site-packages/cpm_kernels/library/cudart.py", line 375, in cudaMalloc
    checkCUDAStatus(cuda.cudaMalloc(ctypes.byref(ptr), size))
  File "/home/mira/miniconda3/envs/bminfnew/lib/python3.8/site-packages/cpm_kernels/library/cudart.py", line 327, in checkCUDAStatus
    raise RuntimeError("CUDA Runtime Error: %s" % cudaGetErrorString(error))
RuntimeError: CUDA Runtime Error: out of memory

Environment:
Tried with various CUDA versions, including 10.2, 11.0, and 11.3.
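
A small diagnostic sketch that can show whether the GPU really has no free memory at the moment CPM2() allocates its dynamic buffer. It calls the CUDA runtime directly through ctypes, assumes libcudart.so is on the loader path, and is not part of bminf.

# Hedged diagnostic sketch: query free/total GPU memory via the CUDA runtime.
import ctypes

cudart = ctypes.CDLL("libcudart.so")
free, total = ctypes.c_size_t(), ctypes.c_size_t()
err = cudart.cudaMemGetInfo(ctypes.byref(free), ctypes.byref(total))
assert err == 0, f"cudaMemGetInfo failed with error code {err}"
print(f"free: {free.value / 2**30:.2f} GiB / total: {total.value / 2**30:.2f} GiB")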

[MODEL] Debug Self-Trained GPT-Model

Introduction
When I load a self-trained GPT-2 model into BMInf and run inference, it produces NaN during forward propagation. Although I can get the DEBUG info, I still do not know what is going wrong. Here is the log; how can I fix it?

2021-10-08 03:12:08,611 - model - INFO - MAX_LENGTH: 1024
2021-10-08 03:12:08,622 - model - INFO - Start loading parameters from disk to cpu
2021-10-08 03:12:08,622 - bminf.layers.base - DEBUG - Parameter Loader [CodeGPT]: size 75027456
2021-10-08 03:12:08,623 - bminf.layers.base - DEBUG - Parameter Loader [CodeGPT]: parameters 0, sub_layers 5
2021-10-08 03:12:08,623 - bminf.layers.base - DEBUG - In input_embedding: ==
2021-10-08 03:12:08,623 - bminf.layers.base - DEBUG - Parameter Loader [Embedding]: size 30781440
2021-10-08 03:12:08,623 - bminf.layers.base - DEBUG - Parameter Loader [Embedding]: parameters 1, sub_layers 0
2021-10-08 03:12:08,645 - bminf.layers.base - DEBUG - Out input_embedding: ==
2021-10-08 03:12:08,645 - bminf.layers.base - DEBUG - In position_embedding: ==
2021-10-08 03:12:08,645 - bminf.layers.base - DEBUG - Parameter Loader [Embedding]: size 1572864
2021-10-08 03:12:08,645 - bminf.layers.base - DEBUG - Parameter Loader [Embedding]: parameters 1, sub_layers 0
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Out position_embedding: ==
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - In input_mask: ==
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [InputMask]: size 0
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [InputMask]: parameters 0, sub_layers 0
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Out input_mask: ==
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - In layers: ==
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [LayerList]: size 42670080
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [LayerList]: parameters 0, sub_layers 6
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - In 0: ==
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: size 7111680
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: parameters 0, sub_layers 4
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - In layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Out layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,647 - bminf.layers.base - DEBUG - In self_attention: ==
2021-10-08 03:12:08,647 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: size 2371584
2021-10-08 03:12:08,647 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: parameters 6, sub_layers 0
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Out self_attention: ==
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - In layer_nrom_before_ff: ==
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Out layer_nrom_before_ff: ==
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - In dense_gelu_dense: ==
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: size 4733952
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: parameters 0, sub_layers 2
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - In wi: ==
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2371584
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,651 - bminf.layers.base - DEBUG - Out wi: ==
2021-10-08 03:12:08,651 - bminf.layers.base - DEBUG - In wo: ==
2021-10-08 03:12:08,651 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2362368
2021-10-08 03:12:08,651 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Out wo: ==
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Out dense_gelu_dense: ==
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Out 0: ==
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - In 1: ==
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: size 7111680
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: parameters 0, sub_layers 4
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - In layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Out layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - In self_attention: ==
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: size 2371584
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: parameters 6, sub_layers 0
2021-10-08 03:12:08,655 - bminf.layers.base - DEBUG - Out self_attention: ==
2021-10-08 03:12:08,655 - bminf.layers.base - DEBUG - In layer_nrom_before_ff: ==
2021-10-08 03:12:08,655 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,655 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,656 - bminf.layers.base - DEBUG - Out layer_nrom_before_ff: ==
2021-10-08 03:12:08,656 - bminf.layers.base - DEBUG - In dense_gelu_dense: ==
2021-10-08 03:12:08,656 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: size 4733952
2021-10-08 03:12:08,656 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: parameters 0, sub_layers 2
2021-10-08 03:12:08,656 - bminf.layers.base - DEBUG - In wi: ==
2021-10-08 03:12:08,656 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2371584
2021-10-08 03:12:08,656 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,658 - bminf.layers.base - DEBUG - Out wi: ==
2021-10-08 03:12:08,658 - bminf.layers.base - DEBUG - In wo: ==
2021-10-08 03:12:08,658 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2362368
2021-10-08 03:12:08,658 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Out wo: ==
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Out dense_gelu_dense: ==
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Out 1: ==
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - In 2: ==
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: size 7111680
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: parameters 0, sub_layers 4
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - In layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Out layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - In self_attention: ==
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: size 2371584
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: parameters 6, sub_layers 0
2021-10-08 03:12:08,662 - bminf.layers.base - DEBUG - Out self_attention: ==
2021-10-08 03:12:08,662 - bminf.layers.base - DEBUG - In layer_nrom_before_ff: ==
2021-10-08 03:12:08,662 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,662 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,662 - bminf.layers.base - DEBUG - Out layer_nrom_before_ff: ==
2021-10-08 03:12:08,662 - bminf.layers.base - DEBUG - In dense_gelu_dense: ==
2021-10-08 03:12:08,663 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: size 4733952
2021-10-08 03:12:08,663 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: parameters 0, sub_layers 2
2021-10-08 03:12:08,663 - bminf.layers.base - DEBUG - In wi: ==
2021-10-08 03:12:08,663 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2371584
2021-10-08 03:12:08,663 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,664 - bminf.layers.base - DEBUG - Out wi: ==
2021-10-08 03:12:08,665 - bminf.layers.base - DEBUG - In wo: ==
2021-10-08 03:12:08,665 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2362368
2021-10-08 03:12:08,665 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,666 - bminf.layers.base - DEBUG - Out wo: ==
2021-10-08 03:12:08,666 - bminf.layers.base - DEBUG - Out dense_gelu_dense: ==
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Out 2: ==
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - In 3: ==
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: size 7111680
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: parameters 0, sub_layers 4
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - In layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Out layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - In self_attention: ==
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: size 2371584
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: parameters 6, sub_layers 0
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Out self_attention: ==
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - In layer_nrom_before_ff: ==
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Out layer_nrom_before_ff: ==
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - In dense_gelu_dense: ==
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: size 4733952
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: parameters 0, sub_layers 2
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - In wi: ==
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2371584
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,671 - bminf.layers.base - DEBUG - Out wi: ==
2021-10-08 03:12:08,671 - bminf.layers.base - DEBUG - In wo: ==
2021-10-08 03:12:08,671 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2362368
2021-10-08 03:12:08,671 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,673 - bminf.layers.base - DEBUG - Out wo: ==
2021-10-08 03:12:08,673 - bminf.layers.base - DEBUG - Out dense_gelu_dense: ==
2021-10-08 03:12:08,673 - bminf.layers.base - DEBUG - Out 3: ==
2021-10-08 03:12:08,673 - bminf.layers.base - DEBUG - In 4: ==
2021-10-08 03:12:08,673 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: size 7111680
2021-10-08 03:12:08,673 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: parameters 0, sub_layers 4
2021-10-08 03:12:08,673 - bminf.layers.base - DEBUG - In layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,674 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,674 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,674 - bminf.layers.base - DEBUG - Out layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,674 - bminf.layers.base - DEBUG - In self_attention: ==
2021-10-08 03:12:08,674 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: size 2371584
2021-10-08 03:12:08,674 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: parameters 6, sub_layers 0
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Out self_attention: ==
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - In layer_nrom_before_ff: ==
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Out layer_nrom_before_ff: ==
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - In dense_gelu_dense: ==
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: size 4733952
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: parameters 0, sub_layers 2
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - In wi: ==
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2371584
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,678 - bminf.layers.base - DEBUG - Out wi: ==
2021-10-08 03:12:08,678 - bminf.layers.base - DEBUG - In wo: ==
2021-10-08 03:12:08,678 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2362368
2021-10-08 03:12:08,678 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Out wo: ==
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Out dense_gelu_dense: ==
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Out 4: ==
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - In 5: ==
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: size 7111680
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: parameters 0, sub_layers 4
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - In layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Out layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,681 - bminf.layers.base - DEBUG - In self_attention: ==
2021-10-08 03:12:08,681 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: size 2371584
2021-10-08 03:12:08,681 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: parameters 6, sub_layers 0
2021-10-08 03:12:08,682 - bminf.layers.base - DEBUG - Out self_attention: ==
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - In layer_nrom_before_ff: ==
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - Out layer_nrom_before_ff: ==
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - In dense_gelu_dense: ==
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: size 4733952
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: parameters 0, sub_layers 2
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - In wi: ==
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2371584
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,685 - bminf.layers.base - DEBUG - Out wi: ==
2021-10-08 03:12:08,685 - bminf.layers.base - DEBUG - In wo: ==
2021-10-08 03:12:08,685 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2362368
2021-10-08 03:12:08,685 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Out wo: ==
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Out dense_gelu_dense: ==
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Out 5: ==
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Out layers: ==
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - In encoder_final_layer_nrom: ==
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Out encoder_final_layer_nrom: ==
2021-10-08 03:12:08,687 - model - INFO - Start loading parameters from cpu to gpu
2021-10-08 03:12:08,687 - model - INFO - Using static loader: total: 75027456, dynamic_memory 536870912, memory_limit 11453988864
2021-10-08 03:12:08,688 - bminf.allocator.base - INFO - Allocate 30781440
2021-10-08 03:12:08,695 - bminf.allocator.base - INFO - Allocate 1572864
2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 1769472
2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 589824
2021-10-08 03:12:08,697 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,697 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,697 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,697 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,697 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 1769472
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 589824
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,700 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,700 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,700 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,700 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,701 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,701 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,701 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,701 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,701 - bminf.allocator.base - INFO - Allocate 1769472
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 589824
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,703 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,703 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,703 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,703 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,703 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,704 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,704 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,704 - bminf.allocator.base - INFO - Allocate 1769472
2021-10-08 03:12:08,704 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,704 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,704 - bminf.allocator.base - INFO - Allocate 589824
2021-10-08 03:12:08,704 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,705 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,705 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,705 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,705 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,705 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,705 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,705 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,706 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,706 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,706 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,706 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,706 - bminf.allocator.base - INFO - Allocate 1769472
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 589824
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,708 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,708 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,708 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 1769472
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 589824
2021-10-08 03:12:08,710 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,710 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,710 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,710 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,710 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,710 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,711 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,711 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,711 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,711 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,711 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,711 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,712 - bminf.allocator.base - INFO - Allocate 536870912
2021-10-08 03:12:08,713 - model - INFO - Cleaning useless parameters on cpu
2021-10-08 03:12:08,715 - model - INFO - End of model initialization
2021-10-08 03:12:08,715 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,859 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,860 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,861 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,862 - bminf.allocator.base - INFO - Allocate 18874368
2021-10-08 03:12:08,862 - model - INFO - Calc encoder layer 0
2021-10-08 03:12:08,862 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm self-attn
2021-10-08 03:12:08,862 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,863 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,863 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,871 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,872 - bminf.layers.transformer_block - INFO - Encoder transformer block -- self attention
2021-10-08 03:12:08,872 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,872 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,874 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,874 - bminf.allocator.base - INFO - Allocate 294912
2021-10-08 03:12:08,923 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) Missing
2021-10-08 03:12:08,923 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) Missing
2021-10-08 03:12:08,923 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) Missing
2021-10-08 03:12:08,923 - bminf.utils.cache - DEBUG - Get (10, False) Missing
2021-10-08 03:12:08,923 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) Missing
2021-10-08 03:12:08,926 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,926 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,926 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,926 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,926 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,927 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,927 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,927 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,927 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,927 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,928 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,928 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) Missing
2021-10-08 03:12:08,929 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,929 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,929 - bminf.utils.cache - DEBUG - Get (0, 68, False, True) Missing
2021-10-08 03:12:08,931 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,937 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,937 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,937 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,937 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,937 - bminf.utils.cache - DEBUG - Get (0, 68, False, False) Missing
2021-10-08 03:12:08,937 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,937 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,938 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,938 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,938 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,938 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,938 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,938 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,938 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,939 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm ff
2021-10-08 03:12:08,939 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,940 - bminf.layers.transformer_block - INFO - Encoder transformer block -- ff
2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 786432
2021-10-08 03:12:08,940 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,941 - bminf.utils.cache - DEBUG - Get (3, 768, 3072, 768, 0, 1, 0) Missing
2021-10-08 03:12:08,941 - bminf.utils.cache - DEBUG - Get (10, 64, 3072, 64, 0, 1, 196608) Missing
2021-10-08 03:12:08,941 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,941 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,941 - bminf.allocator.base - INFO - Allocate 393216
2021-10-08 03:12:08,942 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,942 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,942 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,942 - bminf.utils.cache - DEBUG - Get (3, 64, 3072, 64, 0, 1, 0) Missing
2021-10-08 03:12:08,943 - bminf.utils.cache - DEBUG - Get (3, 3072, 768, 3072, 0, 1, 0) Missing
2021-10-08 03:12:08,943 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,943 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,943 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,943 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,943 - model - INFO - Calc encoder layer 1
2021-10-08 03:12:08,943 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm self-attn
2021-10-08 03:12:08,943 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,943 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,943 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,944 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,944 - bminf.layers.transformer_block - INFO - Encoder transformer block -- self attention
2021-10-08 03:12:08,944 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,944 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,944 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,944 - bminf.allocator.base - INFO - Allocate 294912
2021-10-08 03:12:08,944 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,944 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,944 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,944 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,944 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,946 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,946 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,946 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,946 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,946 - bminf.utils.cache - DEBUG - Get (0, 68, False, True) HIT
2021-10-08 03:12:08,946 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,946 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,946 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,946 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,946 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,947 - bminf.utils.cache - DEBUG - Get (0, 68, False, False) HIT
2021-10-08 03:12:08,947 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,947 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,947 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,947 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,947 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,947 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,947 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,947 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,947 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,947 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm ff
2021-10-08 03:12:08,948 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,948 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,948 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,948 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,948 - bminf.layers.transformer_block - INFO - Encoder transformer block -- ff
2021-10-08 03:12:08,948 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,948 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,948 - bminf.allocator.base - INFO - Allocate 786432
2021-10-08 03:12:08,948 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (3, 768, 3072, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (10, 64, 3072, 64, 0, 1, 196608) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,949 - bminf.allocator.base - INFO - Allocate 393216
2021-10-08 03:12:08,949 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,949 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,949 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (3, 64, 3072, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (3, 3072, 768, 3072, 0, 1, 0) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,950 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,950 - model - INFO - Calc encoder layer 2
2021-10-08 03:12:08,950 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm self-attn
2021-10-08 03:12:08,950 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,950 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,950 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,950 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,950 - bminf.layers.transformer_block - INFO - Encoder transformer block -- self attention
2021-10-08 03:12:08,951 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,951 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,951 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,951 - bminf.allocator.base - INFO - Allocate 294912
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,952 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (0, 68, False, True) HIT
2021-10-08 03:12:08,953 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,953 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,953 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,953 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,953 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,953 - bminf.utils.cache - DEBUG - Get (0, 68, False, False) HIT
2021-10-08 03:12:08,953 - bminf.allocator.base - INFO - Allocate 49152
[... repeated bminf.utils.cache DEBUG "HIT" and bminf.allocator.base INFO "Allocate" lines for encoder layers 3-5 omitted ...]
2021-10-08 03:12:08,979 - bminf.allocator.base - INFO - Allocate 40080
2021-10-08 03:12:08,979 - bminf.utils.cache - DEBUG - Get (2, 768, 20040, 768, 0, 1, 0) Missing
2021-10-08 03:12:08,979 - bminf.utils.cache - DEBUG - Get (2, 768, 1, 768, 0, 1, 0) Missing
2021-10-08 03:12:08,979 - bminf.utils.cache - DEBUG - Get (2, 20040, 1, 20040, 0, 1, 20040) Missing
2021-10-08 03:12:08,979 - bminf.utils.cache - DEBUG - Get (0, 68, True, False) Missing
Loading model
Start
[[nan nan nan ... nan nan nan]]

[BUG] Running generate_cpm2.py raises a ValueError

Running generate_cpm2.py raises a ValueError.

(EVAAA) [root@localhost examples]# python generate_cpm2.py
Loading model
Input: 天空是蔚蓝色,窗外有
Output: 天空是蔚蓝色,窗外有Traceback (most recent call last):
  File "generate_cpm2.py", line 32, in <module>
    main()
  File "generate_cpm2.py", line 29, in main
    generate(cpm2_1, input_text)
  File "generate_cpm2.py", line 11, in generate
    value, stoped = model.generate(
ValueError: too many values to unpack (expected 2)
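A quick way to diagnose this (a sketch, not an official fix): the example unpacks the return value into value, stoped, but the installed bminf version evidently returns a different number of values, so capturing and printing the raw return value first shows what the current API actually gives back. The keyword arguments below simply mirror the generate_cpm2.py example; nothing else is assumed.

import bminf

cpm2 = bminf.models.CPM2()
# Capture the raw return value instead of unpacking it; the return format of
# generate() has changed between bminf versions, hence the unpack error above.
ret = cpm2.generate(
    input_sentence="天空是蔚蓝色,窗外有",
    max_tokens=32,
    top_n=5,
    top_p=None,
    temperature=0.85,
    frequency_penalty=0,
    presence_penalty=0,
)
print(type(ret), ret)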

[BUG] GPU RTX 4090 reports errors

ERROR in app: Exception on /api/fillblank [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2070, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1515, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1513, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1499, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "main.py", line 66, in fillBlank
    result = fillblank.fillBlank(model)
  File "/app/controller/fill_blank_controller.py", line 18, in fillBlank
    presence_penalty = presence_penalty1)
  File "/usr/local/lib/python3.6/dist-packages/bminf/models/cpm2.py", line 151, in fill_blank
    frequency_penalty, presence_penalty, 0)
  File "/usr/local/lib/python3.6/dist-packages/bminf/models/cpm2.py", line 103, in pre_processing
    ctx = self.encode(np.array([idx], dtype=np.int64), [input_length])
  File "/usr/local/lib/python3.6/dist-packages/bminf/arch/t5/model.py", line 238, in encode
    True
  File "/usr/local/lib/python3.6/dist-packages/bminf/layers/transformer_block.py", line 42, in forward
    x = self.self_attention.forward(allocator, x, attention_mask, self_attn_position_bias)
  File "/usr/local/lib/python3.6/dist-packages/bminf/layers/attention.py", line 63, in forward
    qkv_i32
  File "/usr/local/lib/python3.6/dist-packages/bminf/functions/gemm.py", line 86, in igemm
    _igemm(allocator, a, aT, b, bT, c, device, stream)
  File "/usr/local/lib/python3.6/dist-packages/bminf/functions/gemm.py", line 180, in _igemm
    cublasLt.checkCublasStatus( cublasLt.cublasLtMatrixTransform(lthandle, transform_desc_b, ctypes.byref(v1), b.data.ptr, layout_b, ctypes.byref(v0), 0, 0, trans_b.ptr, layout_trans_b, stream.ptr) )
  File "/usr/local/lib/python3.6/dist-packages/bminf/backend/cublaslt.py", line 101, in checkCublasStatus
    raise RuntimeError("cublas error: %s" % cublas_errors[cublas_status])
RuntimeError: cublas error: CUBLAS_STATUS_NOT_SUPPORTED

[BUG] eva2 = bminf.models.EVA2()

EVA raises an error.

In [11]: eva2 = bminf.models.EVA2()

KeyError Traceback (most recent call last)
in ()
----> 1 eva2 = bminf.models.EVA2()

~/anaconda3/envs/yhs/lib/python3.6/site-packages/bminf/models/eva2.py in init(self, device, memory_limit, config)
56 raise ValueError("Memory is not enough")
57
---> 58 super().init(config)
59
60 def dialogue(self,

~/anaconda3/envs/yhs/lib/python3.6/site-packages/bminf/arch/t5/model.py in init(self, config)
73 vocab_path = data.ensure_file(config.MODEL_NAME, "vocab.txt")
74
---> 75 self.tokenizer = T5Tokenizer(vocab_path)
76
77 self.device = config.DEVICE

~/anaconda3/envs/yhs/lib/python3.6/site-packages/bminf/arch/t5/tokenizer.py in init(self, vocab_path, max_len, max_sentinels)
81 self.translator_dec = str.maketrans("\u2582\u2583", " \n")
82
---> 83 self.sentinel_list = [self.encoder['<s_{}>'.format(i)] for i in range(max_sentinels)]
84
85 @property

~/anaconda3/envs/yhs/lib/python3.6/site-packages/bminf/arch/t5/tokenizer.py in (.0)
81 self.translator_dec = str.maketrans("\u2582\u2583", " \n")
82
---> 83 self.sentinel_list = [self.encoder['<s_{}>'.format(i)] for i in range(max_sentinels)]
84
85 @property

KeyError: '<s_0>'

Error in the generate.py example program


def generate(model : bminf.models.CPM1, sentence):
    with tqdm() as progress_bar:
        progress_bar.write(sentence)
        while True:
            result = model.generate(
                sentence,
                max_tokens=8,
                top_n=5,
                top_p=None,
                temperature=0.85,
                frequency_penalty=0,
                presence_penalty=0
            )
            sentence += result
            progress_bar.write(sentence)
            progress_bar.update(1)
            if result.find("<eod>") != -1:
                break

In the function above, the loop stops only once the generated result contains <eod>.

However, in the decoding loop below,

ret = []
for _ in range(max_tokens):
    dec_inputs = sampler.sample(x[0])
    if dec_inputs == self.tokenizer.eod_id:
        break
    ret.append(dec_inputs)
    x = self.decode_step(ctx, [dec_inputs])
return self.id_to_text(ret)

the loop breaks as soon as <eod> is sampled; the token is never appended to ret, so it is never decoded, and result can therefore never contain <eod>.

As a result, generate.py actually generates text in an infinite loop.
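One possible fix (a sketch against the decoding loop quoted above, not the project's official patch) is to append the <eod> token before breaking, so that the decoded result actually contains "<eod>" and the stopping check in generate.py can fire:

ret = []
for _ in range(max_tokens):
    dec_inputs = sampler.sample(x[0])
    if dec_inputs == self.tokenizer.eod_id:
        ret.append(dec_inputs)  # keep <eod> so it shows up in the decoded result
        break
    ret.append(dec_inputs)
    x = self.decode_step(ctx, [dec_inputs])
return self.id_to_text(ret)

Alternatively, the example itself could bound the number of loop iterations instead of relying on "<eod>" appearing in the generated text.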

[BUG] RuntimeError: cublas error: CUBLAS_STATUS_NOT_SUPPORTED

Running on the Google Colab runtime with 12 GB RAM and a Tesla K80 GPU.
NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2

The error is as follows:

RuntimeError Traceback (most recent call last)
in ()
25 print("Loading model")
26 cpm2_1 = bminf.models.CPM2()
---> 27 generate(cpm2_1, input_text)

in generate(model, text)
16 temperature=0.85,
17 frequency_penalty=0,
---> 18 presence_penalty=0,
19 )
20 text += value

/content/BMInf/bminf/models/cpm2.py in generate(self, input_sentence, max_tokens, top_n, top_p, temperature, frequency_penalty, presence_penalty, stop_tokens)
217 [len(input_sentence)],
218 max_tokens, top_n, top_p, temperature,
--> 219 frequency_penalty, presence_penalty, 189
220 )
221

/content/BMInf/bminf/models/cpm2.py in pre_processing(self, input_sentence, spans_position, max_tokens, top_n, top_p, temperature, frequency_penalty, presence_penalty, start_span_idx)
101 input_length = len(idx)
102
--> 103 ctx = self.encode(np.array([idx], dtype=np.int64), [input_length])
104 self.init_decoder_context(ctx)
105

/content/BMInf/bminf/arch/t5/model.py in encode(self, input_idx, input_length)
236 encoder_attn_mask,
237 x_pos,
--> 238 True
239 )
240 with calc_stream:

/content/BMInf/bminf/layers/transformer_block.py in forward(self, allocator, hidden_state, attention_mask, self_attn_position_bias, inplace)
40
41 logger.info("Encoder transformer block -- self attention")
---> 42 x = self.self_attention.forward(allocator, x, attention_mask, self_attn_position_bias)
43 assert x.dtype == cupy.float16
44 assert x.shape == (batch_size, dim_model, seq_len)

/content/BMInf/bminf/layers/attention.py in forward(self, allocator, hidden_state, attention_mask, self_attn_position_bias)
61 self.w_project_qkv.value[i:i+1],
62 False,
---> 63 qkv_i32
64 )
65 elementwise_copy_scale(

/content/BMInf/bminf/functions/gemm.py in igemm(allocator, a, aT, b, bT, c)
84 device = a.device
85 stream = cupy.cuda.get_current_stream()
---> 86 _igemm(allocator, a, aT, b, bT, c, device, stream)
87 return c
88

/content/BMInf/bminf/functions/gemm.py in _igemm(allocator, a, aT, b, bT, c, device, stream)
263 0,
264 0,
--> 265 stream.ptr
266 ))
267 if c.shape[2] != trans_ldc:

/content/BMInf/bminf/backend/cublaslt.py in checkCublasStatus(cublas_status)
99 return
100 if cublas_status in cublas_errors:
--> 101 raise RuntimeError("cublas error: %s" % cublas_errors[cublas_status])
102 else:
103 raise RuntimeError("cublas error code: %d" % cublas_status)

RuntimeError: cublas error: CUBLAS_STATUS_NOT_SUPPORTED

The full code of the notebook is as follows:

!git clone https://github.com/OpenBMB/BMInf.git
%cd BMInf
!python setup.py install

import bminf
import sys

def generate(model : bminf.models.CPM2, text):
    print("Input: ", text)
    sys.stdout.write("Output: %s" % text)
    stoped = False
    while not stoped:
        value, stoped = model.generate(
            input_sentence = text[-32:],
            max_tokens=32,
            top_n=5,
            top_p=None,
            temperature=0.85,
            frequency_penalty=0,
            presence_penalty=0,
        )
        text += value
        sys.stdout.write(value)
        sys.stdout.flush()
    sys.stdout.write("\n")

input_text = input("请输入提示内容:")
print("Loading model")
cpm2_1 = bminf.models.CPM2()
generate(cpm2_1, input_text)

[BUG]RuntimeError: Library cublasLt is not initialized

Describe the bug

When running the demo from https://github.com/OpenBMB/BMInf, the error RuntimeError: Library cublasLt is not initialized occurs.

Minimal steps to reproduce

import bminf                # imports successfully
cpm2 = bminf.models.CPM2()  # constructs successfully
cpm2.fill_blank('好')        # raises RuntimeError: Library cublasLt is not initialized
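A small check worth running first (a sketch; it only tests whether the cuBLASLt shared library can be found by the dynamic loader, which is one common cause of this error, not the only possible one):

import ctypes.util

# If this prints None, libcublasLt is not visible to the loader (e.g. the CUDA
# toolkit libraries are missing from LD_LIBRARY_PATH), so initialization fails.
print(ctypes.util.find_library("cublasLt"))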
Expected behavior

Screenshots


Environment:

NVIDIA-SMI 465.19.01
Driver Version: 465.19.01
NVIDIA A40
CUDA Version: 11.3
Memory:45634MiB

[FEATURE] Compare to FasterTransformer

Is there any comparison between BMInf and Nvidia's FasterTransformer?

I would like to use some tools to improve our model's inference performance. BMInf is great, and it seems to use a CUDA implementation to boost inference performance, just like FasterTransformer. So, is there any comparison of inference time between BMInf and FasterTransformer?

[BUG] Does BMInf support transformers models? Wrapping my model with BMInf fails during inference

Model code:

self.model = MyBert.from_pretrained(pretrained_model_name_or_path=model_path,)
self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
self.model.to(self.device)
self.model = bminf.wrapper(self.model)

Error message:

input_embed = self.model.bert(**input_tokenized)["last_hidden_state"]
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 1022, in forward
    encoder_outputs = self.encoder(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 611, in forward
    layer_outputs = layer_module(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 497, in forward
    self_attention_outputs = self.attention(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 427, in forward
    self_outputs = self.self(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 293, in forward
    mixed_query_layer = self.query(hidden_states)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/bminf/quantization/__init__.py", line 81, in forward
    out = OpLinear.apply(x, self.weight_quant, self.weight_scale)
  File "/usr/local/lib/python3.8/dist-packages/bminf/quantization/__init__.py", line 31, in forward
    gemm_int8(
  File "/usr/local/lib/python3.8/dist-packages/cpm_kernels/kernels/gemm.py", line 139, in gemm_int8
    assert m % 4 == 0 and n % 4 == 0 and k % 4 == 0
AssertionError
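A hedged workaround sketch: the assertion in cpm_kernels' gemm_int8 requires every matrix dimension to be a multiple of 4, so an input whose token count is not a multiple of 4 can trip it. Padding the tokenized input to a multiple of 4 (pad_to_multiple_of is a standard transformers tokenizer argument) may avoid the assertion; whether this fully resolves the crash is an assumption, not something verified here. The names tokenizer, text, device and model loosely mirror the reporter's code.

# Hypothetical workaround: pad the sequence length to a multiple of 4 before
# feeding it to the BMInf-wrapped model.
input_tokenized = tokenizer(
    text,
    return_tensors="pt",
    padding=True,
    pad_to_multiple_of=4,
).to(device)
input_embed = model.bert(**input_tokenized)["last_hidden_state"]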

Problems when using the CPM2.1 model for text generation

While trying to use CPM2.1 for text generation, I modified the following line so that the program does not stop as soon as a punctuation token is generated, in order to produce longer results:

if decoder_ipts in [7,24,17,47,16,12,18,13,19,9,42,53,51,27,2154,2891,2154,6027]:

I call it as follows (screenshot omitted).

After this change, I found that the generated result contains newline characters (converted from the token with id 3 in the vocabulary), and after a newline the context is no longer coherent: it reads as if a new paragraph has started, and sometimes the topic even changes, as in the examples below.

In this example, generation produces a newline right at the start (screenshot omitted).

In this example, the topic changes substantially after the newline (screenshot omitted).

  1. Does this happen because paragraphs were separated in this way during training?
  2. Could the generate function accept user-defined "stop tokens" to control generation behavior (see the sketch below)?
  3. Could you provide an example of using CPM2.1 to generate long-form text?
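On question 2: the generate(...) signature visible in one of the tracebacks above already lists a stop_tokens argument, so something along these lines may already be possible (a sketch; the return format and the choice of token ids, e.g. the newline token id 3 mentioned above, are assumptions rather than documented behavior, and cpm2 / text stand for the reporter's model and prompt):

result = cpm2.generate(
    input_sentence=text,
    max_tokens=128,
    top_n=5,
    top_p=None,
    temperature=0.85,
    frequency_penalty=0,
    presence_penalty=0,
    stop_tokens=[3],  # hypothetical: stop when the newline token (vocab id 3) is sampled
)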

Does it support models from the cpm2-finetune repo?

Does this tool directly support the CPM-2 model that accompanies cpm2-finetune (which has to be requested from the BAAI page)?
I downloaded the Chinese-English model (10 billion parameters, vocab size 51967). It originally came as 4 separate files, which I merged into a single-file model with the official script; testing showed no problems.
After changing some parameters, I used example/generate_cpm2.py from bminf to load my merged single-file model for testing; it cannot be loaded, and the error is as follows (error screenshot omitted):

Update 1

  1. I found that the model apparently has to be one processed with compression and quantization first; there is a migrate_cpm2.py under tools, which I used to quantize the model, obtaining an 11 GB model. I suggest documenting this in more detail.

  2. After quantizing with migrate_cpm2.py, how should re-finetuning be done? Is it similar to quantization-aware training? (screenshot omitted)

  3. I loaded the quantized 11 GB model above with generate_cpm2.py and set inference to generate at most 100 characters; GPU memory usage was over 13 GB (on an A100). How do I get the memory scheduling described in your docs, so that inference can run on a 2080 Ti (which only has 11 GB of memory)?

  4. How can the model's modules be split across different GPUs? That would solve the problem in step 3 of 11 GB not being enough, for example by placing the encoder and decoder on different GPUs. The model is not built with torch or a similar framework, and moving data into GPU memory seems to rely mainly on with device and the allocator, so I do not quite understand how to assign different modules to different GPUs.

@a710128 Looking forward to a reply, thanks.

CUDA Error: no kernel image is available for execution on the device

Fails to run on a P100 GPU that works fine with other PyTorch CUDA code. GPU info:
GPU Device 0: "Pascal" with compute capability 6.0
Compute 6.0 CUDA device: [Tesla P100-PCIE-16GB]

error trace:

/opt/conda/lib/python3.8/site-packages/cpm_kernels/library/cuda.py in checkCUStatus(error)
    214 def checkCUStatus(error : int) -> None:
    215     if error != CUDA_SUCCESS:
--> 216         raise RuntimeError("CUDA Error: %s" % cuGetErrorString(error))
    217 
    218 @cuda.bind("cuDriverGetVersion", [ctypes.POINTER(ctypes.c_int)], CUresult)

RuntimeError: CUDA Error: no kernel image is available for execution on the device

What is the minimal compute capability that cpm-kernels needs?
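Not an answer to the minimum-capability question itself, but a quick way to confirm what capability the GPU reports (a sketch using PyTorch; whether cpm-kernels ships kernel images for a given capability is for the maintainers to confirm):

import torch

# Print the compute capability of each visible GPU.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} - compute capability {major}.{minor}")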

[FEATURE] Full examples with some known models from the HF Hub in a Colab notebook

Is your feature request related to a problem? Please describe.
For example, I cannot get HF BERT working, and I don't know when I can use your project.

import bminf
import torch
from transformers import BertModel, BertTokenizer  # imports implied by the snippet below

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # tokenizer used below
text = "Hello, BMInf!"                                          # example input

encoded_input_cpu = tokenizer(text, return_tensors='pt').to('cpu')
model = BertModel.from_pretrained("bert-base-uncased").to('cpu')
# apply wrapper
with torch.cuda.device(0):
    model = bminf.wrapper(model.to('cpu'))
    with print_time_delta('generate'):  # print_time_delta: the reporter's own timing helper, not defined here
        output = model(**encoded_input_cpu)

Describe the solution you'd like
Can you provide full examples with some known models from the HF Hub in a Colab notebook?

Describe alternatives you've considered

[FEATURE] Allow loading local models

Is your feature request related to a problem? Please describe.

Sometimes the server that runs the model is physically disconnected from the network, so the model has to be downloaded manually, uploaded, and then loaded.
Previously (version 0.0.4) local loading could be achieved by setting MODEL_NAME in the config, but since the code was updated to 1.0.0 this is no longer possible (unless BMInf's source code is modified).

Describe the solution you'd like

Provide an interface at model initialization time for loading from a specified local path (possibly by extending the existing version field).

Describe alternatives you've considered

None.

Other: was the CPM2.1 model itself updated between 0.0.4 and 1.0.0? Loading a model downloaded in the 0.0.4 era with the 1.0.0 code raises an error.
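A sketch of what the requested interface might look like (this is the feature being asked for, not an existing BMInf API; the path argument and its meaning are hypothetical):

import bminf

# Hypothetical: point the constructor at a manually downloaded checkpoint
# directory instead of letting BMInf fetch the model from the network.
cpm2 = bminf.models.CPM2(version="/data/models/cpm2.1")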
