openbmb / bminf
Efficient Inference for Big Models
License: Apache License 2.0
Running the example file fill_blank.py raises the following error:
Loading model
Start
Input: 北京环球度假区相关负责人介绍,北京环球影城指定单日门票将采用____制度,即推出淡季日、平季日、旺季日和特定日门票。____价格为418元,____价格为528元,____价格为638元,____价格为____元。北京环球度假区将提供90天滚动价格日历,以方便游客提前规划行程。
Traceback (most recent call last):
File "abc.py", line 28, in <module>
main()
File "abc.py", line 25, in main
fill_blank(cpm2, input_text)
File "abc.py", line 9, in fill_blank
for result in cpm2.fill_blank(text,
File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/bminf/models/cpm2.py", line 245, in fill_blank
for token in res:
File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/bminf/models/cpm2.py", line 129, in _gen_iter
self._model.embedding(
File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/bminf/arch/t5/model.py", line 165, in embedding
self.input_embedding.embedding_forward(ctx, tensor_ids, x_out)
File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/bminf/layers/embedding.py", line 27, in embedding_forward
ck.embedding_forward(
File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/kernels/embedding.py", line 25, in embedding_forward
embedding_kernel.cu_embedding_forward(
File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/kernels/base.py", line 48, in __call__
func = self._prepare_func()
File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/kernels/base.py", line 40, in _prepare_func
self._module.get_module(), self._func_name
File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/kernels/base.py", line 23, in get_module
Device(curr_device).use() # force initialize context
File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/device/__init__.py", line 152, in use
self._device.use()
File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/device/__init__.py", line 120, in use
self.cublasLtHandle = cublaslt.cublasLtCreate()
File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/library/base.py", line 94, in wrapper
return f(*args, **kwargs)
File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/library/cublaslt.py", line 105, in cublasLtCreate
checkCublasStatus(cublasLt.cublasLtCreate(ctypes.byref(handle)))
File "/home/hmqf/miniconda3/envs/script_bert/lib/python3.8/site-packages/cpm_kernels/library/cublaslt.py", line 98, in checkCublasStatus
raise RuntimeError("CUBLAS error: {}".format(
RuntimeError: CUBLAS error: CUBLAS_STATUS_NOT_INITIALIZED
Environment:
Python 3.8.10
cudatoolkit 11.3.1
From the documentation and the technical paper, it seems the experiments were run on a single node. Does BMInf support multi-node inference deployment for large models such as GLM-130?
Describe the bug
Input:
import bminf
cpm2 = bminf.models.CPM2()
result = cpm2.fill_blank("有一个服装品牌叫做<span>专门设计彩绘T恤",
top_p=0.5,
top_n=5,
temperature=0.5,
frequency_penalty=0,
presence_penalty=0
)
Error message:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 3331, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 6, in
presence_penalty=0
File "/usr/local/lib/python3.6/dist-packages/bminf/models/cpm2.py", line 252, in fill_blank
raise RuntimeError("Unexpected model output: %d" % token)
RuntimeError: Unexpected model output: 26239
Could you please help figure out the cause?
Environment:
Python 3.6
torch 1.8.1
Error log:
Collecting cupy-cuda90<10,>=9 (from bminf)
ERROR: Could not find a version that satisfies the requirement cupy-cuda90<10,>=9 (from bminf) (from versions: 4.0.0, 4.1.0, 4.2.0, 4.3.0, 4.4.0, 4.4.1, 4.5.0, 5.0.0, 5.1.0, 5.2.0, 5.3.0, 5.4.0, 6.0.0, 6.1.0, 6.2.0, 6.3.0, 6.4.0, 6.5.0, 6.6.0, 6.7.0, 7.0.0, 7.1.0, 7.1.1, 7.2.0, 7.3.0, 7.4.0, 7.5.0, 7.6.0, 7.7.0, 7.8.0, 8.0.0, 8.1.0, 8.2.0, 8.3.0, 8.4.0, 8.5.0, 8.6.0, 9.0.0a1, 9.0.0a2)
ERROR: No matching distribution found for cupy-cuda90<10,>=9 (from bminf)
/usr/local/cuda/version.txt:
CUDA Version 9.0.176
CUDA Patch Version 9.0.176.1
CUDA Patch Version 9.0.176.2
CUDA Patch Version 9.0.176.3
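The log above can be read as a plain version-range mismatch: the requirement asks for cupy-cuda90 at version 9 or later and below 10, while the newest builds listed for CUDA 9.0 are 9.0.0 alpha releases. The following is a minimal sketch of that check, not pip's actual resolver. The helper names `sort_key` and `satisfies` are hypothetical, and the ordering rule (a pre-release sorts before its final release) is a simplified reading of PEP 440.

```python
import re

# Versions pip reported as available for cupy-cuda90 (copied from the log above).
AVAILABLE = (
    "4.0.0, 4.1.0, 4.2.0, 4.3.0, 4.4.0, 4.4.1, 4.5.0, 5.0.0, 5.1.0, 5.2.0, "
    "5.3.0, 5.4.0, 6.0.0, 6.1.0, 6.2.0, 6.3.0, 6.4.0, 6.5.0, 6.6.0, 6.7.0, "
    "7.0.0, 7.1.0, 7.1.1, 7.2.0, 7.3.0, 7.4.0, 7.5.0, 7.6.0, 7.7.0, 7.8.0, "
    "8.0.0, 8.1.0, 8.2.0, 8.3.0, 8.4.0, 8.5.0, 8.6.0, 9.0.0a1, 9.0.0a2"
).split(", ")


def sort_key(version):
    """Hypothetical helper: map 'X.Y.Z' or 'X.Y.ZaN' to a comparable key."""
    match = re.fullmatch(r"(\d+(?:\.\d+)*)((?:a|b|rc)\d+)?", version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release = [int(part) for part in match.group(1).split(".")]
    release += [0] * (3 - len(release))  # pad "9" to 9.0.0
    # Simplified PEP 440 ordering: a pre-release sorts before its final release.
    is_final = 0 if match.group(2) else 1
    return tuple(release), is_final


def satisfies(version):
    """Check the range from the log: >=9 and <10."""
    lower = sort_key("9")      # 9.0.0 final
    upper = ((10, 0, 0), 0)    # excludes 10.0.0 and its pre-releases
    return lower <= sort_key(version) < upper


matching = [v for v in AVAILABLE if satisfies(v)]
print(matching)
```

Under these assumptions the list of matching versions is empty, which is consistent with the "No matching distribution found" message for a CUDA 9.0 installation.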
I would like to learn about loading and trying out the Live model.
I have already downloaded the complete model from https://wudaoai.cn and would like to run inference with it. Can BMInf's API (such as EVA.dialogue()) be used for this?
Describe the bug
Using the Docker environment, all three demos report errors in the backend when run:
File "/usr/local/lib/python3.6/dist-packages/bminf/arch/t5/model.py", line 238, in encode
True
File "/usr/local/lib/python3.6/dist-packages/bminf/layers/transformer_block.py", line 42, in forward
x = self.self_attention.forward(allocator, x, attention_mask, self_attn_position_bias)
File "/usr/local/lib/python3.6/dist-packages/bminf/layers/attention.py", line 63, in forward
qkv_i32
File "/usr/local/lib/python3.6/dist-packages/bminf/functions/gemm.py", line 86, in igemm
_igemm(allocator, a, aT, b, bT, c, device, stream)
File "/usr/local/lib/python3.6/dist-packages/bminf/functions/gemm.py", line 265, in _igemm
stream.ptr
File "/usr/local/lib/python3.6/dist-packages/bminf/backend/cublaslt.py", line 101, in checkCublasStatus
raise RuntimeError("cublas error: %s" % cublas_errors[cublas_status])
RuntimeError: cublas error: CUBLAS_STATUS_NOT_SUPPORTED
What is the cause? Is there a problem with one of the versions?
Environment:
CUDA: 10.1
Model: EVA-int8
GPU memory: 12 GB
Describe the bug
How do I load a CPM1 model from a local checkpoint? Currently I use the following approach:
1. Build my model
model = GPT2Model(num_layers=args.num_layers,
vocab_size=args.vocab_size,
hidden_size=args.hidden_size,
num_attention_heads=args.num_attention_heads,
embedding_dropout_prob=args.hidden_dropout,
attention_dropout_prob=args.attention_dropout,
output_dropout_prob=args.hidden_dropout,
max_sequence_length=args.max_position_embeddings,
checkpoint_activations=args.checkpoint_activations,
checkpoint_num_layers=args.checkpoint_num_layers,
parallel_output=args.parallel_output)
the code from here
2. load_state_dict
Load the state_dict from the local model
3. Use the wrapper to run it with bminf
model = bminf.wrapper(model)
Expected behavior
Screenshots
Other:
How do I wrap a model loaded from transformers? I could not follow the implementation in the example.
Environment:
apex 0.1
bminf 2.0.0
deepspeed 0.3.15
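3. Why did you choose to drive CUDA directly through cupy, for example for the allocator, igemm, and fgemm? Does this offer a bigger benefit than implementing quantization with a framework such as PyTorch? The cupy + CUDA approach seems quite demanding to implement.
Thank you very much.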
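For readers unfamiliar with what an integer GEMM does in a quantized model, here is a minimal pure-Python sketch of symmetric per-row int8 quantization followed by an integer matrix multiply and dequantization. It is an illustration of the general technique only and is not BMInf's implementation. The function names `quantize_rows` and `int8_matmul` are hypothetical, and real kernels run this on the GPU rather than over Python lists.

```python
def quantize_rows(matrix):
    """Symmetric per-row quantization to the int8 range [-127, 127].

    Returns the quantized rows and one floating-point scale per row.
    """
    q_rows, scales = [], []
    for row in matrix:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([int(round(v / scale)) for v in row])
        scales.append(scale)
    return q_rows, scales


def int8_matmul(a, b_t):
    """Approximate a @ b_t^T, where a is MxK and b_t is NxK (B stored transposed)."""
    qa, scale_a = quantize_rows(a)
    qb, scale_b = quantize_rows(b_t)
    out = []
    for i, row_a in enumerate(qa):
        out_row = []
        for j, row_b in enumerate(qb):
            # Integer accumulation, the part an igemm kernel would perform.
            acc = sum(x * y for x, y in zip(row_a, row_b))
            # Dequantize with the product of the two row scales.
            out_row.append(acc * scale_a[i] * scale_b[j])
        out.append(out_row)
    return out


a = [[1.0, -2.0], [0.5, 0.25]]
b_t = [[3.0, 1.0], [-1.0, 2.0]]
approx = int8_matmul(a, b_t)
print(approx)
```

The result is close to, but not exactly, the floating-point product; the gap is the quantization error that int8 inference trades for lower memory use.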
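On a V100, GLM inference takes between 10 and 20 s.
With BMInf used to accelerate GLM, inference takes over 1 minute.
What is the cause, and how can GLM inference be sped up?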
I am not familiar with int8, but I suppose it cannot be trained like other fp32 models. Any suggestions on how to fine-tune it?
Also, is there a report or paper for CPM2.1? I could not find one anywhere.
Thank you!
File "/home/wenxuan/lihaijie_files/cpm-live/examples/tune_cpm_ant.py", line 56, in
delta_model.freeze_module(exclude=["deltas"], set_state_dict=True)
File "/home/wenxuan/miniconda3/envs/lhj/lib/python3.9/site-packages/opendelta/basemodel.py", line 274, in freeze_module
self._freeze_module_recursive(module, exclude, "") # modify the active state dict that still need grad
File "/home/wenxuan/miniconda3/envs/lhj/lib/python3.9/site-packages/opendelta/basemodel.py", line 316, in _freeze_module_recursive
self._freeze_module_recursive(c, exclude=exclude, prefix=next_prefix)
File "/home/wenxuan/miniconda3/envs/lhj/lib/python3.9/site-packages/opendelta/basemodel.py", line 316, in _freeze_module_recursive
self._freeze_module_recursive(c, exclude=exclude, prefix=next_prefix)
File "/home/wenxuan/miniconda3/envs/lhj/lib/python3.9/site-packages/opendelta/basemodel.py", line 304, in _freeze_module_recursive
p.requires_grad = False
RuntimeError: you can only change requires_grad flags of leaf variables. If you want to use a computed variable in a subgraph that doesn't require differentiation use var_no_grad = var.detach().
Cuda compilation tools, release 10.0, V10.0.130
torch 1.6.0
python 3.6
Traceback (most recent call last):
File "/home/wac/PycharmProjects/CPM-1-Generate/test.py", line 7, in
cpm2.generate(text)
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/models/cpm2.py", line 219, in generate
frequency_penalty, presence_penalty, 189
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/models/cpm2.py", line 103, in pre_processing
ctx = self.encode(np.array([idx], dtype=np.int64), [input_length])
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/arch/t5/model.py", line 238, in encode
True
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/layers/transformer_block.py", line 42, in forward
x = self.self_attention.forward(allocator, x, attention_mask, self_attn_position_bias)
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/layers/attention.py", line 63, in forward
qkv_i32
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/functions/gemm.py", line 86, in igemm
_igemm(allocator, a, aT, b, bT, c, device, stream)
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/functions/gemm.py", line 102, in _igemm
lthandle = get_handle(device)
File "/home/wac/PycharmProjects/CPM-1-Generate/env/lib/python3.6/site-packages/bminf/functions/gemm.py", line 65, in get_handle
v = cublasLt.cublasLtHandle_t()
AttributeError: type object 'cublasLt' has no attribute 'cublasLtHandle_t'
Where can I download the vocab.txt corresponding to the CPM2.1 model?
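Changing the input to
input_text = "近日,北京智源人工智能研究院和清华大学研究团队发布了以中文为核心的大规模预训练语言模型 CPM-LM,参数规模达 26 亿,预训练中文数据规模 100 GB。"
raises the error
"Unexpected model output: 26239"
What are the requirements for the text passed to fill_blank? Are there requirements for the words to be filled in?
I am using
cpm2 = bminf.models.CPM2()
installed with pip, bminf-1.0.0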
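One thing worth checking before calling fill_blank, offered as an assumption rather than a confirmed diagnosis, is whether the input contains any blank marker at all. The marker `<span>` below is taken from the earlier fill_blank snippet in this thread; the helper `count_blanks` is hypothetical and is not part of BMInf.

```python
def count_blanks(text, marker="<span>"):
    """Return how many times the (assumed) blank marker occurs in the text."""
    return text.count(marker)


# Input from the earlier report, which contains one marker.
with_blank = "有一个服装品牌叫做<span>专门设计彩绘T恤"
# Input from this report, which contains no marker.
without_blank = (
    "近日,北京智源人工智能研究院和清华大学研究团队发布了以中文为核心的"
    "大规模预训练语言模型 CPM-LM,参数规模达 26 亿,预训练中文数据规模 100 GB。"
)

for text in (with_blank, without_blank):
    blanks = count_blanks(text)
    if blanks == 0:
        print("no blank marker found; there is nothing for the model to fill in")
    else:
        print(f"{blanks} blank marker(s) found")
```

If the marker used by the installed version differs, pass it through the `marker` argument; the check itself only counts occurrences and does not call the model.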
Describe the bug
A CUDA error is raised when importing models. This issue only happens with BMInf 1.0.x; I could run BMInf 0.0.5 successfully. Any help would be appreciated. Thanks.
Minimal steps to reproduce
Tried the following on both WSL2 Ubuntu 20.04 with GTX 3080 16G
and native Ubuntu 18.04 with GTX 1070 8G
conda create --name bminfnew python=3.8
conda activate bminfnew
conda install cudatoolkit=11.3
pip install bminf==1.0.1
Then run
import bminf
cpm2 = bminf.models.CPM2()
Expected behavior
Start downloading the model.
Screenshots
Python 3.8.12 (default, Oct 12 2021, 13:49:34)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bminf
>>> cpm2 = bminf.models.CPM2()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/mira/miniconda3/envs/bminfnew/lib/python3.8/site-packages/bminf/models/cpm2.py", line 55, in __init__
SizeLimitedAllocator( self._cudaAlloc.allocate( dynamic_memory ))
File "/home/mira/miniconda3/envs/bminfnew/lib/python3.8/site-packages/bminf/core/allocators/cuda.py", line 20, in allocate
ptr = cudart.cudaMalloc(nbytes).value
File "/home/mira/miniconda3/envs/bminfnew/lib/python3.8/site-packages/cpm_kernels/library/base.py", line 94, in wrapper
return f(*args, **kwargs)
File "/home/mira/miniconda3/envs/bminfnew/lib/python3.8/site-packages/cpm_kernels/library/cudart.py", line 375, in cudaMalloc
checkCUDAStatus(cuda.cudaMalloc(ctypes.byref(ptr), size))
File "/home/mira/miniconda3/envs/bminfnew/lib/python3.8/site-packages/cpm_kernels/library/cudart.py", line 327, in checkCUDAStatus
raise RuntimeError("CUDA Runtime Error: %s" % cudaGetErrorString(error))
RuntimeError: CUDA Runtime Error: out of memory
Environment:
Tried with various CUDA versions, including 10.2, 11.0, and 11.3
I really like BMInf.
Does it support on-demand GPU memory allocation?
Thank you very much.
Introduction
When I load a trained GPT-2 model into BMInf and run inference, it produces NaN during forward propagation. Although I can get the DEBUG output, I still cannot tell what is going wrong. Here is the log; how can I fix it?
2021-10-08 03:12:08,611 - model - INFO - MAX_LENGTH: 1024
2021-10-08 03:12:08,622 - model - INFO - Start loading parameters from disk to cpu
2021-10-08 03:12:08,622 - bminf.layers.base - DEBUG - Parameter Loader [CodeGPT]: size 75027456
2021-10-08 03:12:08,623 - bminf.layers.base - DEBUG - Parameter Loader [CodeGPT]: parameters 0, sub_layers 5
2021-10-08 03:12:08,623 - bminf.layers.base - DEBUG - In input_embedding: ==
2021-10-08 03:12:08,623 - bminf.layers.base - DEBUG - Parameter Loader [Embedding]: size 30781440
2021-10-08 03:12:08,623 - bminf.layers.base - DEBUG - Parameter Loader [Embedding]: parameters 1, sub_layers 0
2021-10-08 03:12:08,645 - bminf.layers.base - DEBUG - Out input_embedding: ==
2021-10-08 03:12:08,645 - bminf.layers.base - DEBUG - In position_embedding: ==
2021-10-08 03:12:08,645 - bminf.layers.base - DEBUG - Parameter Loader [Embedding]: size 1572864
2021-10-08 03:12:08,645 - bminf.layers.base - DEBUG - Parameter Loader [Embedding]: parameters 1, sub_layers 0
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Out position_embedding: ==
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - In input_mask: ==
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [InputMask]: size 0
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [InputMask]: parameters 0, sub_layers 0
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Out input_mask: ==
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - In layers: ==
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [LayerList]: size 42670080
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [LayerList]: parameters 0, sub_layers 6
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - In 0: ==
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: size 7111680
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: parameters 0, sub_layers 4
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - In layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Out layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,647 - bminf.layers.base - DEBUG - In self_attention: ==
2021-10-08 03:12:08,647 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: size 2371584
2021-10-08 03:12:08,647 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: parameters 6, sub_layers 0
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Out self_attention: ==
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - In layer_nrom_before_ff: ==
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Out layer_nrom_before_ff: ==
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - In dense_gelu_dense: ==
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: size 4733952
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: parameters 0, sub_layers 2
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - In wi: ==
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2371584
2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,651 - bminf.layers.base - DEBUG - Out wi: ==
2021-10-08 03:12:08,651 - bminf.layers.base - DEBUG - In wo: ==
2021-10-08 03:12:08,651 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2362368
2021-10-08 03:12:08,651 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Out wo: ==
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Out dense_gelu_dense: ==
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Out 0: ==
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - In 1: ==
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: size 7111680
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: parameters 0, sub_layers 4
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - In layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Out layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - In self_attention: ==
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: size 2371584
2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: parameters 6, sub_layers 0
2021-10-08 03:12:08,655 - bminf.layers.base - DEBUG - Out self_attention: ==
2021-10-08 03:12:08,655 - bminf.layers.base - DEBUG - In layer_nrom_before_ff: ==
2021-10-08 03:12:08,655 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,655 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,656 - bminf.layers.base - DEBUG - Out layer_nrom_before_ff: ==
2021-10-08 03:12:08,656 - bminf.layers.base - DEBUG - In dense_gelu_dense: ==
2021-10-08 03:12:08,656 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: size 4733952
2021-10-08 03:12:08,656 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: parameters 0, sub_layers 2
2021-10-08 03:12:08,656 - bminf.layers.base - DEBUG - In wi: ==
2021-10-08 03:12:08,656 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2371584
2021-10-08 03:12:08,656 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,658 - bminf.layers.base - DEBUG - Out wi: ==
2021-10-08 03:12:08,658 - bminf.layers.base - DEBUG - In wo: ==
2021-10-08 03:12:08,658 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2362368
2021-10-08 03:12:08,658 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Out wo: ==
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Out dense_gelu_dense: ==
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Out 1: ==
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - In 2: ==
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: size 7111680
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: parameters 0, sub_layers 4
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - In layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Out layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - In self_attention: ==
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: size 2371584
2021-10-08 03:12:08,660 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: parameters 6, sub_layers 0
2021-10-08 03:12:08,662 - bminf.layers.base - DEBUG - Out self_attention: ==
2021-10-08 03:12:08,662 - bminf.layers.base - DEBUG - In layer_nrom_before_ff: ==
2021-10-08 03:12:08,662 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,662 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,662 - bminf.layers.base - DEBUG - Out layer_nrom_before_ff: ==
2021-10-08 03:12:08,662 - bminf.layers.base - DEBUG - In dense_gelu_dense: ==
2021-10-08 03:12:08,663 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: size 4733952
2021-10-08 03:12:08,663 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: parameters 0, sub_layers 2
2021-10-08 03:12:08,663 - bminf.layers.base - DEBUG - In wi: ==
2021-10-08 03:12:08,663 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2371584
2021-10-08 03:12:08,663 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,664 - bminf.layers.base - DEBUG - Out wi: ==
2021-10-08 03:12:08,665 - bminf.layers.base - DEBUG - In wo: ==
2021-10-08 03:12:08,665 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2362368
2021-10-08 03:12:08,665 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,666 - bminf.layers.base - DEBUG - Out wo: ==
2021-10-08 03:12:08,666 - bminf.layers.base - DEBUG - Out dense_gelu_dense: ==
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Out 2: ==
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - In 3: ==
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: size 7111680
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: parameters 0, sub_layers 4
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - In layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Out layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - In self_attention: ==
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: size 2371584
2021-10-08 03:12:08,667 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: parameters 6, sub_layers 0
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Out self_attention: ==
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - In layer_nrom_before_ff: ==
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Out layer_nrom_before_ff: ==
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - In dense_gelu_dense: ==
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: size 4733952
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: parameters 0, sub_layers 2
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - In wi: ==
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2371584
2021-10-08 03:12:08,669 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,671 - bminf.layers.base - DEBUG - Out wi: ==
2021-10-08 03:12:08,671 - bminf.layers.base - DEBUG - In wo: ==
2021-10-08 03:12:08,671 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2362368
2021-10-08 03:12:08,671 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,673 - bminf.layers.base - DEBUG - Out wo: ==
2021-10-08 03:12:08,673 - bminf.layers.base - DEBUG - Out dense_gelu_dense: ==
2021-10-08 03:12:08,673 - bminf.layers.base - DEBUG - Out 3: ==
2021-10-08 03:12:08,673 - bminf.layers.base - DEBUG - In 4: ==
2021-10-08 03:12:08,673 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: size 7111680
2021-10-08 03:12:08,673 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: parameters 0, sub_layers 4
2021-10-08 03:12:08,673 - bminf.layers.base - DEBUG - In layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,674 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,674 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,674 - bminf.layers.base - DEBUG - Out layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,674 - bminf.layers.base - DEBUG - In self_attention: ==
2021-10-08 03:12:08,674 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: size 2371584
2021-10-08 03:12:08,674 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: parameters 6, sub_layers 0
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Out self_attention: ==
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - In layer_nrom_before_ff: ==
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Out layer_nrom_before_ff: ==
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - In dense_gelu_dense: ==
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: size 4733952
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: parameters 0, sub_layers 2
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - In wi: ==
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2371584
2021-10-08 03:12:08,676 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,678 - bminf.layers.base - DEBUG - Out wi: ==
2021-10-08 03:12:08,678 - bminf.layers.base - DEBUG - In wo: ==
2021-10-08 03:12:08,678 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2362368
2021-10-08 03:12:08,678 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Out wo: ==
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Out dense_gelu_dense: ==
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Out 4: ==
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - In 5: ==
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: size 7111680
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: parameters 0, sub_layers 4
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - In layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,680 - bminf.layers.base - DEBUG - Out layer_nrom_before_self_attn: ==
2021-10-08 03:12:08,681 - bminf.layers.base - DEBUG - In self_attention: ==
2021-10-08 03:12:08,681 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: size 2371584
2021-10-08 03:12:08,681 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: parameters 6, sub_layers 0
2021-10-08 03:12:08,682 - bminf.layers.base - DEBUG - Out self_attention: ==
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - In layer_nrom_before_ff: ==
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - Out layer_nrom_before_ff: ==
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - In dense_gelu_dense: ==
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: size 4733952
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: parameters 0, sub_layers 2
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - In wi: ==
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2371584
2021-10-08 03:12:08,683 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,685 - bminf.layers.base - DEBUG - Out wi: ==
2021-10-08 03:12:08,685 - bminf.layers.base - DEBUG - In wo: ==
2021-10-08 03:12:08,685 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2362368
2021-10-08 03:12:08,685 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Out wo: ==
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Out dense_gelu_dense: ==
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Out 5: ==
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Out layers: ==
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - In encoder_final_layer_nrom: ==
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Out encoder_final_layer_nrom: ==
2021-10-08 03:12:08,687 - model - INFO - Start loading parameters from cpu to gpu
2021-10-08 03:12:08,687 - model - INFO - Using static loader: total: 75027456, dynamic_memory 536870912, memory_limit 11453988864
2021-10-08 03:12:08,688 - bminf.allocator.base - INFO - Allocate 30781440
2021-10-08 03:12:08,695 - bminf.allocator.base - INFO - Allocate 1572864
2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 1769472
2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 589824
2021-10-08 03:12:08,697 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,697 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,697 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,697 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,697 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 1769472
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 589824
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,699 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,700 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,700 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,700 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,700 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,701 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,701 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,701 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,701 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,701 - bminf.allocator.base - INFO - Allocate 1769472
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 589824
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,702 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,703 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,703 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,703 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,703 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,703 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,704 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,704 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,704 - bminf.allocator.base - INFO - Allocate 1769472
2021-10-08 03:12:08,704 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,704 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,704 - bminf.allocator.base - INFO - Allocate 589824
2021-10-08 03:12:08,704 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,705 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,705 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,705 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,705 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,705 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,705 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,705 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,706 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,706 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,706 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,706 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,706 - bminf.allocator.base - INFO - Allocate 1769472
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 589824
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,707 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,708 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,708 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,708 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 1769472
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 4608
2021-10-08 03:12:08,709 - bminf.allocator.base - INFO - Allocate 589824
2021-10-08 03:12:08,710 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,710 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,710 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,710 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,710 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,710 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,711 - bminf.allocator.base - INFO - Allocate 6144
2021-10-08 03:12:08,711 - bminf.allocator.base - INFO - Allocate 2359296
2021-10-08 03:12:08,711 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,711 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,711 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,711 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,712 - bminf.allocator.base - INFO - Allocate 536870912
2021-10-08 03:12:08,713 - model - INFO - Cleaning useless parameters on cpu
2021-10-08 03:12:08,715 - model - INFO - End of model initialization
2021-10-08 03:12:08,715 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,859 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,860 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,861 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,862 - bminf.allocator.base - INFO - Allocate 18874368
2021-10-08 03:12:08,862 - model - INFO - Calc encoder layer 0
2021-10-08 03:12:08,862 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm self-attn
2021-10-08 03:12:08,862 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,863 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,863 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,871 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,872 - bminf.layers.transformer_block - INFO - Encoder transformer block -- self attention
2021-10-08 03:12:08,872 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,872 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,874 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,874 - bminf.allocator.base - INFO - Allocate 294912
2021-10-08 03:12:08,923 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) Missing
2021-10-08 03:12:08,923 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) Missing
2021-10-08 03:12:08,923 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) Missing
2021-10-08 03:12:08,923 - bminf.utils.cache - DEBUG - Get (10, False) Missing
2021-10-08 03:12:08,923 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) Missing
2021-10-08 03:12:08,926 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,926 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,926 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,926 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,926 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,927 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,927 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,927 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,927 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,927 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,928 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,928 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) Missing
2021-10-08 03:12:08,929 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,929 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,929 - bminf.utils.cache - DEBUG - Get (0, 68, False, True) Missing
2021-10-08 03:12:08,931 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,937 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,937 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,937 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,937 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,937 - bminf.utils.cache - DEBUG - Get (0, 68, False, False) Missing
2021-10-08 03:12:08,937 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,937 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,938 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,938 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,938 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,938 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,938 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,938 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,938 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,939 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm ff
2021-10-08 03:12:08,939 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,940 - bminf.layers.transformer_block - INFO - Encoder transformer block -- ff
2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 786432
2021-10-08 03:12:08,940 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,941 - bminf.utils.cache - DEBUG - Get (3, 768, 3072, 768, 0, 1, 0) Missing
2021-10-08 03:12:08,941 - bminf.utils.cache - DEBUG - Get (10, 64, 3072, 64, 0, 1, 196608) Missing
2021-10-08 03:12:08,941 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,941 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,941 - bminf.allocator.base - INFO - Allocate 393216
2021-10-08 03:12:08,942 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,942 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,942 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,942 - bminf.utils.cache - DEBUG - Get (3, 64, 3072, 64, 0, 1, 0) Missing
2021-10-08 03:12:08,943 - bminf.utils.cache - DEBUG - Get (3, 3072, 768, 3072, 0, 1, 0) Missing
2021-10-08 03:12:08,943 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,943 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,943 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,943 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,943 - model - INFO - Calc encoder layer 1
2021-10-08 03:12:08,943 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm self-attn
2021-10-08 03:12:08,943 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,943 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,943 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,944 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,944 - bminf.layers.transformer_block - INFO - Encoder transformer block -- self attention
2021-10-08 03:12:08,944 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,944 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,944 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,944 - bminf.allocator.base - INFO - Allocate 294912
2021-10-08 03:12:08,944 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,944 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,944 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,944 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,944 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,945 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,946 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,946 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,946 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,946 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,946 - bminf.utils.cache - DEBUG - Get (0, 68, False, True) HIT
2021-10-08 03:12:08,946 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,946 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,946 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,946 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,946 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,947 - bminf.utils.cache - DEBUG - Get (0, 68, False, False) HIT
2021-10-08 03:12:08,947 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,947 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,947 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,947 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,947 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,947 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,947 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,947 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,947 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,947 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm ff
2021-10-08 03:12:08,948 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,948 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,948 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,948 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,948 - bminf.layers.transformer_block - INFO - Encoder transformer block -- ff
2021-10-08 03:12:08,948 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,948 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,948 - bminf.allocator.base - INFO - Allocate 786432
2021-10-08 03:12:08,948 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (3, 768, 3072, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (10, 64, 3072, 64, 0, 1, 196608) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,949 - bminf.allocator.base - INFO - Allocate 393216
2021-10-08 03:12:08,949 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,949 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,949 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (3, 64, 3072, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (3, 3072, 768, 3072, 0, 1, 0) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,949 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,950 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,950 - model - INFO - Calc encoder layer 2
2021-10-08 03:12:08,950 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm self-attn
2021-10-08 03:12:08,950 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,950 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,950 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,950 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,950 - bminf.layers.transformer_block - INFO - Encoder transformer block -- self attention
2021-10-08 03:12:08,951 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,951 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,951 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,951 - bminf.allocator.base - INFO - Allocate 294912
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,951 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,952 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,952 - bminf.utils.cache - DEBUG - Get (0, 68, False, True) HIT
2021-10-08 03:12:08,953 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,953 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,953 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,953 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,953 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,953 - bminf.utils.cache - DEBUG - Get (0, 68, False, False) HIT
2021-10-08 03:12:08,953 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,953 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,954 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,954 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,954 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,954 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,954 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,954 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,954 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,954 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm ff
2021-10-08 03:12:08,954 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,954 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,954 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,955 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,955 - bminf.layers.transformer_block - INFO - Encoder transformer block -- ff
2021-10-08 03:12:08,955 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,955 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,955 - bminf.allocator.base - INFO - Allocate 786432
2021-10-08 03:12:08,955 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,955 - bminf.utils.cache - DEBUG - Get (3, 768, 3072, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,955 - bminf.utils.cache - DEBUG - Get (10, 64, 3072, 64, 0, 1, 196608) HIT
2021-10-08 03:12:08,955 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,955 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,955 - bminf.allocator.base - INFO - Allocate 393216
2021-10-08 03:12:08,956 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,956 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,956 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,956 - bminf.utils.cache - DEBUG - Get (3, 64, 3072, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,956 - bminf.utils.cache - DEBUG - Get (3, 3072, 768, 3072, 0, 1, 0) HIT
2021-10-08 03:12:08,956 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,956 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,956 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,956 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,956 - model - INFO - Calc encoder layer 3
2021-10-08 03:12:08,957 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm self-attn
2021-10-08 03:12:08,957 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,957 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,957 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,957 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,957 - bminf.layers.transformer_block - INFO - Encoder transformer block -- self attention
2021-10-08 03:12:08,957 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,957 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,957 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,958 - bminf.allocator.base - INFO - Allocate 294912
2021-10-08 03:12:08,958 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,958 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,958 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,958 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,958 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,958 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,958 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,958 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,958 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,958 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,958 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,959 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,959 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,959 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,959 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,959 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,959 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,959 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,959 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,959 - bminf.utils.cache - DEBUG - Get (0, 68, False, True) HIT
2021-10-08 03:12:08,959 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,960 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,960 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,960 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,960 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,960 - bminf.utils.cache - DEBUG - Get (0, 68, False, False) HIT
2021-10-08 03:12:08,960 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,960 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,960 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,960 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,960 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,961 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,961 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,961 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,961 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,961 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm ff
2021-10-08 03:12:08,961 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,961 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,961 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,961 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,962 - bminf.layers.transformer_block - INFO - Encoder transformer block -- ff
2021-10-08 03:12:08,962 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,962 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,962 - bminf.allocator.base - INFO - Allocate 786432
2021-10-08 03:12:08,962 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,962 - bminf.utils.cache - DEBUG - Get (3, 768, 3072, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,962 - bminf.utils.cache - DEBUG - Get (10, 64, 3072, 64, 0, 1, 196608) HIT
2021-10-08 03:12:08,962 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,962 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,962 - bminf.allocator.base - INFO - Allocate 393216
2021-10-08 03:12:08,962 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,963 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,963 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,963 - bminf.utils.cache - DEBUG - Get (3, 64, 3072, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,963 - bminf.utils.cache - DEBUG - Get (3, 3072, 768, 3072, 0, 1, 0) HIT
2021-10-08 03:12:08,963 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,963 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,963 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,963 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,963 - model - INFO - Calc encoder layer 4
2021-10-08 03:12:08,963 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm self-attn
2021-10-08 03:12:08,963 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,964 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,964 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,964 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,964 - bminf.layers.transformer_block - INFO - Encoder transformer block -- self attention
2021-10-08 03:12:08,964 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,964 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,964 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,964 - bminf.allocator.base - INFO - Allocate 294912
2021-10-08 03:12:08,964 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,965 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,965 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,965 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,965 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,965 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,965 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,965 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,965 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,965 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,965 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,965 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,965 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,965 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,966 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,966 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,966 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,966 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,966 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,966 - bminf.utils.cache - DEBUG - Get (0, 68, False, True) HIT
2021-10-08 03:12:08,966 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,966 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,967 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,967 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,967 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,967 - bminf.utils.cache - DEBUG - Get (0, 68, False, False) HIT
2021-10-08 03:12:08,967 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,967 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,967 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,967 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,967 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,967 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,967 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,967 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,968 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,968 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm ff
2021-10-08 03:12:08,968 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,968 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,968 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,968 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,968 - bminf.layers.transformer_block - INFO - Encoder transformer block -- ff
2021-10-08 03:12:08,968 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,968 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,969 - bminf.allocator.base - INFO - Allocate 786432
2021-10-08 03:12:08,969 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,969 - bminf.utils.cache - DEBUG - Get (3, 768, 3072, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,969 - bminf.utils.cache - DEBUG - Get (10, 64, 3072, 64, 0, 1, 196608) HIT
2021-10-08 03:12:08,969 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,969 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,969 - bminf.allocator.base - INFO - Allocate 393216
2021-10-08 03:12:08,969 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,969 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,969 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,970 - bminf.utils.cache - DEBUG - Get (3, 64, 3072, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,970 - bminf.utils.cache - DEBUG - Get (3, 3072, 768, 3072, 0, 1, 0) HIT
2021-10-08 03:12:08,970 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,970 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,970 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,970 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,970 - model - INFO - Calc encoder layer 5
2021-10-08 03:12:08,970 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm self-attn
2021-10-08 03:12:08,970 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,970 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,970 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,971 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,971 - bminf.layers.transformer_block - INFO - Encoder transformer block -- self attention
2021-10-08 03:12:08,971 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,971 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,971 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,971 - bminf.allocator.base - INFO - Allocate 294912
2021-10-08 03:12:08,971 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,971 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,971 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,971 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,971 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,972 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,972 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,972 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,972 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,972 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,972 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,972 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,972 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,972 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,972 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,972 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,973 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,973 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,973 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,973 - bminf.utils.cache - DEBUG - Get (0, 68, False, True) HIT
2021-10-08 03:12:08,973 - bminf.allocator.base - INFO - Allocate 1536
2021-10-08 03:12:08,973 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,973 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,973 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,973 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
2021-10-08 03:12:08,973 - bminf.utils.cache - DEBUG - Get (0, 68, False, False) HIT
2021-10-08 03:12:08,974 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,974 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,974 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,974 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,974 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,974 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,974 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,974 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,974 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,974 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm ff
2021-10-08 03:12:08,974 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,975 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,975 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,975 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,975 - bminf.layers.transformer_block - INFO - Encoder transformer block -- ff
2021-10-08 03:12:08,975 - bminf.allocator.base - INFO - Allocate 49152
2021-10-08 03:12:08,975 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,975 - bminf.allocator.base - INFO - Allocate 786432
2021-10-08 03:12:08,975 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,975 - bminf.utils.cache - DEBUG - Get (3, 768, 3072, 768, 0, 1, 0) HIT
2021-10-08 03:12:08,976 - bminf.utils.cache - DEBUG - Get (10, 64, 3072, 64, 0, 1, 196608) HIT
2021-10-08 03:12:08,976 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,976 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,976 - bminf.allocator.base - INFO - Allocate 393216
2021-10-08 03:12:08,976 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,976 - bminf.allocator.base - INFO - Allocate 128
2021-10-08 03:12:08,976 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,976 - bminf.utils.cache - DEBUG - Get (3, 64, 3072, 64, 0, 1, 0) HIT
2021-10-08 03:12:08,976 - bminf.utils.cache - DEBUG - Get (3, 3072, 768, 3072, 0, 1, 0) HIT
2021-10-08 03:12:08,976 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
2021-10-08 03:12:08,976 - bminf.utils.cache - DEBUG - Get (10, False) HIT
2021-10-08 03:12:08,976 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
2021-10-08 03:12:08,977 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,977 - bminf.allocator.base - INFO - Allocate 196608
2021-10-08 03:12:08,977 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,977 - bminf.allocator.base - INFO - Allocate 256
2021-10-08 03:12:08,977 - bminf.allocator.base - INFO - Allocate 98304
2021-10-08 03:12:08,979 - bminf.allocator.base - INFO - Allocate 40080
2021-10-08 03:12:08,979 - bminf.utils.cache - DEBUG - Get (2, 768, 20040, 768, 0, 1, 0) Missing
2021-10-08 03:12:08,979 - bminf.utils.cache - DEBUG - Get (2, 768, 1, 768, 0, 1, 0) Missing
2021-10-08 03:12:08,979 - bminf.utils.cache - DEBUG - Get (2, 20040, 1, 20040, 0, 1, 20040) Missing
2021-10-08 03:12:08,979 - bminf.utils.cache - DEBUG - Get (0, 68, True, False) Missing
Loading model
Start
[[nan nan nan ... nan nan nan]]
Running generate_cpm2.py raises a ValueError
(EVAAA) [root@localhost examples]# python generate_cpm2.py
Loading model
Input: 天空是蔚蓝色,窗外有
Output: 天空是蔚蓝色,窗外有Traceback (most recent call last):
File "generate_cpm2.py", line 32, in <module>
main()
File "generate_cpm2.py", line 29, in main
generate(cpm2_1, input_text)
File "generate_cpm2.py", line 11, in generate
value, stoped = model.generate(
ValueError: too many values to unpack (expected 2)
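One way to make the calling loop tolerant of this mismatch is to normalize whatever `generate` returns before unpacking it. The sketch below is only an illustration: `normalize_generate_result` is a hypothetical helper, and the idea that some installed versions return a plain string while others return a `(text, stopped)` pair is an assumption inferred from the error message, not something this report confirms.

```python
def normalize_generate_result(result):
    """Return a (text, stopped) pair for either assumed return shape."""
    if isinstance(result, str):
        # Assumed older behaviour: a bare string, treated as a finished generation.
        return result, True
    # Assumed newer behaviour: a sequence whose first two items are (text, stopped).
    text, stopped = result[0], result[1]
    return text, bool(stopped)


# Stand-in values instead of a real model call:
print(normalize_generate_result("some generated text"))
print(normalize_generate_result(("partial text", False)))
```

In the example script, the loop would then unpack `normalize_generate_result(model.generate(...))` instead of the raw return value.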
ERROR in app: Exception on /api/fillblank [POST]
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2070, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1515, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1513, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1499, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
File "main.py", line 66, in fillBlank
result = fillblank.fillBlank(model)
File "/app/controller/fill_blank_controller.py", line 18, in fillBlank
presence_penalty = presence_penalty1)
File "/usr/local/lib/python3.6/dist-packages/bminf/models/cpm2.py", line 151, in fill_blank
frequency_penalty, presence_penalty, 0)
File "/usr/local/lib/python3.6/dist-packages/bminf/models/cpm2.py", line 103, in pre_processing
ctx = self.encode(np.array([idx], dtype=np.int64), [input_length])
File "/usr/local/lib/python3.6/dist-packages/bminf/arch/t5/model.py", line 238, in encode
True
File "/usr/local/lib/python3.6/dist-packages/bminf/layers/transformer_block.py", line 42, in forward
x = self.self_attention.forward(allocator, x, attention_mask, self_attn_position_bias)
File "/usr/local/lib/python3.6/dist-packages/bminf/layers/attention.py", line 63, in forward
qkv_i32
File "/usr/local/lib/python3.6/dist-packages/bminf/functions/gemm.py", line 86, in igemm
_igemm(allocator, a, aT, b, bT, c, device, stream)
File "/usr/local/lib/python3.6/dist-packages/bminf/functions/gemm.py", line 180, in _igemm
cublasLt.checkCublasStatus( cublasLt.cublasLtMatrixTransform(lthandle, transform_desc_b, ctypes.byref(v1), b.data.ptr, layout_b, ctypes.byref(v0), 0, 0, trans_b.ptr, layout_trans_b, stream.ptr) )
File "/usr/local/lib/python3.6/dist-packages/bminf/backend/cublaslt.py", line 101, in checkCublasStatus
raise RuntimeError("cublas error: %s" % cublas_errors[cublas_status])
RuntimeError: cublas error: CUBLAS_STATUS_NOT_SUPPORTED
EVA raises an error
KeyError Traceback (most recent call last)
in <module>()
----> 1 eva2 = bminf.models.EVA2()
~/anaconda3/envs/yhs/lib/python3.6/site-packages/bminf/models/eva2.py in __init__(self, device, memory_limit, config)
56 raise ValueError("Memory is not enough")
57
---> 58 super().__init__(config)
59
60 def dialogue(self,
~/anaconda3/envs/yhs/lib/python3.6/site-packages/bminf/arch/t5/model.py in __init__(self, config)
73 vocab_path = data.ensure_file(config.MODEL_NAME, "vocab.txt")
74
---> 75 self.tokenizer = T5Tokenizer(vocab_path)
76
77 self.device = config.DEVICE
~/anaconda3/envs/yhs/lib/python3.6/site-packages/bminf/arch/t5/tokenizer.py in __init__(self, vocab_path, max_len, max_sentinels)
81 self.translator_dec = str.maketrans("\u2582\u2583", " \n")
82
---> 83 self.sentinel_list = [self.encoder['<s_{}>'.format(i)] for i in range(max_sentinels)]
84
85 @property
~/anaconda3/envs/yhs/lib/python3.6/site-packages/bminf/arch/t5/tokenizer.py in <listcomp>(.0)
81 self.translator_dec = str.maketrans("\u2582\u2583", " \n")
82
---> 83 self.sentinel_list = [self.encoder['<s_{}>'.format(i)] for i in range(max_sentinels)]
84
85 @property
KeyError: '<s_0>'
Run on a Google Colab runtime with 12 GB of RAM and a Tesla K80 GPU.
NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2
RuntimeError Traceback (most recent call last)
in <module>()
25 print("Loading model")
26 cpm2_1 = bminf.models.CPM2()
---> 27 generate(cpm2_1, input_text)
in generate(model, text)
16 temperature=0.85,
17 frequency_penalty=0,
---> 18 presence_penalty=0,
19 )
20 text += value
/content/BMInf/bminf/models/cpm2.py in generate(self, input_sentence, max_tokens, top_n, top_p, temperature, frequency_penalty, presence_penalty, stop_tokens)
217 [len(input_sentence)],
218 max_tokens, top_n, top_p, temperature,
--> 219 frequency_penalty, presence_penalty, 189
220 )
221
/content/BMInf/bminf/models/cpm2.py in pre_processing(self, input_sentence, spans_position, max_tokens, top_n, top_p, temperature, frequency_penalty, presence_penalty, start_span_idx)
101 input_length = len(idx)
102
--> 103 ctx = self.encode(np.array([idx], dtype=np.int64), [input_length])
104 self.init_decoder_context(ctx)
105
/content/BMInf/bminf/arch/t5/model.py in encode(self, input_idx, input_length)
236 encoder_attn_mask,
237 x_pos,
--> 238 True
239 )
240 with calc_stream:
/content/BMInf/bminf/layers/transformer_block.py in forward(self, allocator, hidden_state, attention_mask, self_attn_position_bias, inplace)
40
41 logger.info("Encoder transformer block -- self attention")
---> 42 x = self.self_attention.forward(allocator, x, attention_mask, self_attn_position_bias)
43 assert x.dtype == cupy.float16
44 assert x.shape == (batch_size, dim_model, seq_len)
/content/BMInf/bminf/layers/attention.py in forward(self, allocator, hidden_state, attention_mask, self_attn_position_bias)
61 self.w_project_qkv.value[i:i+1],
62 False,
---> 63 qkv_i32
64 )
65 elementwise_copy_scale(
/content/BMInf/bminf/functions/gemm.py in igemm(allocator, a, aT, b, bT, c)
84 device = a.device
85 stream = cupy.cuda.get_current_stream()
---> 86 _igemm(allocator, a, aT, b, bT, c, device, stream)
87 return c
88
/content/BMInf/bminf/functions/gemm.py in _igemm(allocator, a, aT, b, bT, c, device, stream)
263 0,
264 0,
--> 265 stream.ptr
266 ))
267 if c.shape[2] != trans_ldc:
/content/BMInf/bminf/backend/cublaslt.py in checkCublasStatus(cublas_status)
99 return
100 if cublas_status in cublas_errors:
--> 101 raise RuntimeError("cublas error: %s" % cublas_errors[cublas_status])
102 else:
103 raise RuntimeError("cublas error code: %d" % cublas_status)
RuntimeError: cublas error: CUBLAS_STATUS_NOT_SUPPORTED
!git clone https://github.com/OpenBMB/BMInf.git
%cd BMInf
!python setup.py install
import bminf
import sys
def generate(model : bminf.models.CPM2, text):
    print("Input: ", text)
    sys.stdout.write("Output: %s" % text)
    stoped = False
    # Keep generating until the model reports that it has stopped.
    while not stoped:
        value, stoped = model.generate(
            input_sentence = text[-32:],
            max_tokens=32,
            top_n=5,
            top_p=None,
            temperature=0.85,
            frequency_penalty=0,
            presence_penalty=0,
        )
        text += value
        sys.stdout.write(value)
        sys.stdout.flush()
    sys.stdout.write("\n")
input_text = input("Enter a prompt: ")
print("Loading model")
cpm2_1 = bminf.models.CPM2()
generate(cpm2_1, input_text)
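Hello,
We have collected medical question-answering data and would like to train on top of EVA, but the approaches we have tried so far have not worked. Have you managed to do this on your side? Thank you.
Is your feature request related to a problem? Please describe.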
Describe the solution you'd like
Describe alternatives you've considered
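When I set the memory limit to 6<<30, the final GPU memory usage is 8 GB; when I set it to 1<<30, it is 3 GB. Usage is always 2 GB above the limit, and I don't know why.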
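For reference, the arithmetic behind the two settings can be checked directly. The sketch below only converts the shifted values to GiB and computes the gap to the usage figures reported above; it does not explain where the extra memory goes, and `overhead_gib` is a hypothetical helper name.

```python
GIB = 1 << 30  # number of bytes in one GiB


def overhead_gib(memory_limit_bytes, observed_gib):
    """Observed usage minus the configured limit, both expressed in GiB."""
    return observed_gib - memory_limit_bytes / GIB


# (configured limit in bytes, observed usage in GiB) as reported in this issue
for limit, observed in [(6 << 30, 8), (1 << 30, 3)]:
    print(limit, "bytes is", limit / GIB, "GiB; gap to observed usage:",
          overhead_gib(limit, observed), "GiB")
```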
Describe the bug
When running the demo from https://github.com/OpenBMB/BMInf, I get the error RuntimeError: Library cublasLt is not initialized.
Minimal steps to reproduce
import bminf  # imports successfully
cpm2 = bminf.models.CPM2()  # constructs successfully
cpm2.fill_blank('好')  # raises RuntimeError: Library cublasLt is not initialized
Expected behavior
Screenshots
Environment:
NVIDIA-SMI 465.19.01
Driver Version: 465.19.01
NVIDIA A40
CUDA Version: 11.3
Memory:45634MiB
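A first diagnostic for this kind of report is to check whether the system loader can see a cublasLt shared library at all. The sketch below uses only the standard library's `ctypes.util.find_library`; `find_cuda_library` is a hypothetical helper, and how BMInf itself locates the library is not shown in this report and may differ from the loader's default search.

```python
import ctypes.util


def find_cuda_library(name="cublasLt"):
    """Return what the system loader resolves for `name`, or None if nothing is found."""
    return ctypes.util.find_library(name)


resolved = find_cuda_library()
if resolved is None:
    print("cublasLt was not found on the default library search path")
else:
    print("cublasLt resolves to", resolved)
```

If nothing is found, the CUDA library directory may be missing from the loader's search path; this is only one possible cause of the error.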
Is there any comparison between BMInf and Nvidia's FasterTransformer?
I would like to use some tools to improve our model's inference performance. BMInf is great, and it seems to use CUDA implementations to boost inference performance, just like FasterTransformer. Is there any comparison of inference time between BMInf and FasterTransformer?
Is your feature request related to a problem? Please describe.
There are other speedup methods for transformers like FasterTransformer.
Describe the solution you'd like
Can you describe how your method compares to the FT method, whether the two can be combined, and potentially show an example?
Where exactly is it optimized?
Model code:
self.model = MyBert.from_pretrained(pretrained_model_name_or_path=model_path,)
self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
self.model.to(self.device)
self.model = bminf.wrapper(self.model)
Error message:
input_embed = self.model.bert(**input_tokenized)["last_hidden_state"]
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 1022, in forward
encoder_outputs = self.encoder(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 611, in forward
layer_outputs = layer_module(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 497, in forward
self_attention_outputs = self.attention(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 427, in forward
self_outputs = self.self(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 293, in forward
mixed_query_layer = self.query(hidden_states)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/bminf/quantization/__init__.py", line 81, in forward
out = OpLinear.apply(x, self.weight_quant, self.weight_scale)
File "/usr/local/lib/python3.8/dist-packages/bminf/quantization/__init__.py", line 31, in forward
gemm_int8(
File "/usr/local/lib/python3.8/dist-packages/cpm_kernels/kernels/gemm.py", line 139, in gemm_int8
assert m % 4 == 0 and n % 4 == 0 and k % 4 == 0
AssertionError
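The failing assertion requires all three GEMM dimensions to be multiples of 4. The report does not say which of `m`, `n`, or `k` is at fault; one plausible candidate is the token dimension, which changes with the input. The sketch below pads a token-id list to a multiple of 4 under that assumption. `pad_to_multiple` is a hypothetical helper, and `pad_id=0` is an assumed padding id that should be replaced with the tokenizer's real one.

```python
def pad_to_multiple(token_ids, multiple=4, pad_id=0):
    """Pad token ids so their count is a multiple of `multiple`.

    Returns (padded_ids, attention_mask); the mask is 1 for real tokens, 0 for padding.
    """
    remainder = len(token_ids) % multiple
    pad_len = 0 if remainder == 0 else multiple - remainder
    padded = list(token_ids) + [pad_id] * pad_len
    mask = [1] * len(token_ids) + [0] * pad_len
    return padded, mask


# Five made-up token ids are padded up to the next multiple of 4.
ids, mask = pad_to_multiple([101, 7592, 2088, 999, 102])
print(len(ids), mask)
```

Passing the mask to the model keeps the padded positions from affecting attention. If the hidden size is the dimension that is not a multiple of 4, padding the input will not help.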
ctx = self.encode(np.array([idx]),[input_length])
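I take the hidden_states from ctx and average them to use as the sentence embedding, but the results are not very good. How should I use CPM2.1 to obtain sentence feature representations correctly?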
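One detail worth checking with this kind of averaging is whether padded positions are excluded and whether the average runs over the token axis. A traceback elsewhere on this page shows the encoder block asserting a `(batch_size, dim_model, seq_len)` layout, so the token axis may be the last one. The sketch below shows masked mean pooling in plain Python. `masked_mean` is a hypothetical helper that assumes one vector per token, and it is not a confirmed recipe for CPM2.1.

```python
def masked_mean(hidden_states, mask):
    """Average per-token vectors, skipping positions where the mask is 0.

    hidden_states: list of per-token vectors, indexed as [seq_len][dim]
    mask: list of 0/1 flags of length seq_len (1 means a real token)
    """
    kept = [vec for vec, keep in zip(hidden_states, mask) if keep]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Made-up values: the third position is padding and is left out of the average.
states = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
print(masked_mean(states, [1, 1, 0]))
```

If the real tensor is laid out as `[dim][seq_len]`, it would need to be transposed before being passed to a helper like this.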
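While trying to generate text with CPM2.1, I wanted longer outputs, so I modified the line below so that the program no longer stops as soon as it generates a punctuation mark.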
Line 216 in 59e8903
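After modifying the code, I found that the generated result contains newline characters (converted from the token with id 3 in the vocabulary). After a newline the context is no longer coherent, as if a new paragraph had started, and sometimes even the topic changes, as shown in the screenshot below.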
Running examples/generate.py raises an error; the calling frame in the stack is line 249 of functions/gemm.py.
Line 249 in d40c6f5
Environment:
CUDA 10.1 (cuBLAS version 10.1.0.63)
BMInf 0.0.4, installed via clone + python setup.py install
torch 1.7.1
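Does this tool directly support the CPM2 model that accompanies cpm2-finetune (which has to be requested from the BAAI page)?
I downloaded the Chinese-English model with 10 billion parameters and a vocabulary size of 51967. It originally came as 4 separate files; I merged them into a single-file model with the official script, and testing showed no problems.
After changing some parameters, I used example/generate_cpm2.py from bminf to load my merged single-file model for testing. It could not be loaded; the error is as follows:
I found that it apparently needs a model that has already been processed with compression, quantization, and similar techniques. There is a migrate_cpm2.py under tool; I used it to do the quantization and got an 11 GB model. I suggest making the documentation more detailed.
I loaded the quantized 11 GB model above with generate_cpm2.py and set inference to generate at most 100 characters. GPU memory usage is over 13 GB (A100). How do I get the GPU memory scheduling described in your docs, so that inference can run on a 2080 Ti (which has only 11 GB of GPU memory)?
How can the model's modules be split across different GPUs, for example by placing the encoder and decoder on different GPUs? That would solve the problem in step 3, where 11 GB of GPU memory is not enough. As far as I can tell, the model is not built with a framework such as torch, and moving data to GPU memory seems to rely mainly on with device and the allocator, so I don't quite understand how to assign different modules to different GPUs.
@a710128 Looking forward to your reply, thank you.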
BMInf fails to run on a P100 GPU, which works fine with other PyTorch CUDA code. GPU info:
GPU Device 0: "Pascal" with compute capability 6.0
Compute 6.0 CUDA device: [Tesla P100-PCIE-16GB]
error trace:
/opt/conda/lib/python3.8/site-packages/cpm_kernels/library/cuda.py in checkCUStatus(error)
214 def checkCUStatus(error : int) -> None:
215 if error != CUDA_SUCCESS:
--> 216 raise RuntimeError("CUDA Error: %s" % cuGetErrorString(error))
217
218 @cuda.bind("cuDriverGetVersion", [ctypes.POINTER(ctypes.c_int)], CUresult)
RuntimeError: CUDA Error: no kernel image is available for execution on the device
What is the minimum compute capability that cpm-kernels needs?
When running examples/fill_blank.py, I get this error:
AttributeError: type object 'cublasLt' has no attribute 'cublasLtHandle_t'
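CUDA version is 10.0.
I have successfully installed bminf 0.4.0.
Any idea how to solve this problem?
When I try to get the parameters, named_parameters() on cpm1 or cpm2 returns an empty list []. Why?
I see that inference only offers "generation" and "fill in the blank". How can I run inference for automatic question answering with BMInf?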
Is your feature request related to a problem? Please describe.
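For example, I cannot get HF BERT working. I don't know when I can use your project.
import bminf
import torch
from transformers import BertModel, BertTokenizer

# `text` and `print_time_delta` are the reporter's own definitions (not shown in the report).
# The tokenizer construction below is assumed; the report does not show it.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoded_input_cpu = tokenizer(text, return_tensors='pt').to('cpu')
model = BertModel.from_pretrained("bert-base-uncased").to('cpu')
# apply wrapper
with torch.cuda.device(0):
    model = bminf.wrapper(model.to('cpu'))
with print_time_delta('generate'):
    output = model(**encoded_input_cpu)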
Describe the solution you'd like
Can you provide full examples with some well-known HF models in a Colab notebook?
Is your feature request related to a problem? Please describe.
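Sometimes the server that runs the model is physically disconnected from the network, so the model has to be downloaded manually, uploaded, and then loaded.
Previously (version 0.0.4), local loading was possible by setting MODEL_NAME in the config, but since the code was updated to 1.0.0 this no longer works (unless BMInf's source code is modified).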
Describe the solution you'd like
Provide an interface at model initialization for specifying a local path to load from (possibly by changing the existing version field).
Describe alternatives you've considered
None.
Other: was the CPM2.1 model updated between 0.0.4 and 1.0.0? Loading a model downloaded in the 0.0.4 era with the 1.0.0 code raises an error.