zhuiyitechnology / roformer-sim
An upgraded SimBERT (SimBERTv2)!
License: Apache License 2.0
Hello!
I downloaded the weights for the Chinese_roformer-sim-char-ft_L12_H-768_A-12 model, but when loading it, vocab.txt keeps failing to load with UTF-8 encoding. I compared with SimBERT v1's vocab file, which doesn't seem to have this problem.
When I substitute SimBERT v1's vocab file for generation, the output is almost entirely unreadable sentences made of rare characters (it looks like mojibake). It may be that my training was insufficient, or that the vocabularies differ.
Could you re-provide a UTF-8-encoded vocab.txt for Chinese_roformer-sim-char-ft_L12_H-768_A-12? I'm not sure whether SimBERT v1's vocabulary is identical to SimBERT v2's.
Many thanks!
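As a workaround while waiting for a re-upload, one can often recover such a file by trying a few likely encodings and re-saving as UTF-8. This is a minimal sketch; the function name and the candidate-encoding list are assumptions, not part of the repo:

```python
# Hypothetical sketch: guess the encoding of a vocab.txt and re-save it as UTF-8.
# The candidate encodings are assumptions; extend the list for your case.
def resave_as_utf8(path, out_path, candidates=("utf-8", "utf-8-sig", "gb18030")):
    for enc in candidates:
        try:
            with open(path, encoding=enc) as f:
                text = f.read()
            break  # this encoding decoded the whole file
        except UnicodeDecodeError:
            continue
    else:
        raise ValueError("none of the candidate encodings could decode %s" % path)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(text)
    return enc  # the encoding that worked
```

Note that a successful decode doesn't guarantee the right encoding; eyeball the converted file before using it, and compare its line count against the model config's vocab size.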
Is there any essential difference between the 768-dim vector (the [cls] sentence vector) that roformer-sim outputs for input [t1] versus input [t1, t2]? For input [t1, t2], are t1 and t2 concatenated and processed as a single sentence?
Running on a 16G V100, it uses nearly all 16 GB of memory. Is the model itself that large, or is something else going on?
Is there a TF2 version?
As the title says, please advise! I really couldn't make sense of the code (crying face).
As the title says. I'd been wondering whether a roformer + simbert version would appear, and here it is :)
How can I obtain the data files?
I was using the supervised.py code (https://raw.githubusercontent.com/ZhuiyiTechnology...
When fine-tuning chinese_roformer-sim-char-ft_L-12_H-768_A-12, the results vary wildly from run to run: sometimes accuracy reaches 0.9, sometimes it doesn't even hit 0.1. Do you know what might cause this? (The dataset is Ant Financial's text-matching data: https://www.kaggle.com/zephyrwang666/afqmc-public)
Please advise 🙏
The problem has been solved.
In the compute_loss_of_similarity method of the loss, should `return loss` be `return K.mean(loss)` instead?
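For reference, a tiny numpy sketch (toy numbers, not from the repo) of the reduction in question: K.mean collapses a per-sample loss vector into one scalar.

```python
import numpy as np

# Toy per-sample similarity losses for a batch of 4 (values made up).
per_sample_loss = np.array([0.2, 1.5, 0.3, 0.8])

# `return loss` hands back the whole vector; `return K.mean(loss)`
# would hand back its scalar batch average instead:
scalar_loss = per_sample_loss.mean()
print(scalar_loss)  # 0.7
```

Whether the two behave identically in training depends on how Keras reduces the un-averaged tensor downstream, so this only illustrates the reduction itself.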
The machine runs Windows.
tensorflow-gpu is 1.14 with CUDA 10.0; a GPU test confirms TF can use the card.
The GPU has 22G of memory.
```python
from bert4keras.models import build_transformer_model

config_path = 'D:/pycharm/字段引用/chinese_simbert_L-4_H-312_A-12/bert_config.json'
checkpoint_path = 'D:/pycharm/字段引用/chinese_simbert_L-4_H-312_A-12/bert_model.ckpt'

roformer = build_transformer_model(
    config_path,
    checkpoint_path,
    model='roformer',
    application='unilm',
    with_pool='linear'
)
```
This is already the tiny model, yet build_transformer_model still errors:
2023-05-23 16:31:14.506402: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 19.69G (21146212608 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2023-05-23 16:31:14.705061: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 17.72G (19031590912 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 2080 Ti WDDM | 00000000:01:00.0 On | N/A |
| 77% 77C P8 41W / 260W| 496MiB / 22528MiB | 2% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
Why does this happen? The CPU version works fine.
Looking at the code, Stage 2 doesn't seem to use the model trained in Stage 1. Do I need to modify the Stage 2 code myself to load the Stage 1 model? If so, how should the Stage 2 training data differ from the Stage 1 training data?
I also read the two linked articles, and neither seems to train in two stages either (maybe I just missed it).
An NLP beginner tearfully asking the experts for help.
Thanks to your company for open-sourcing this!
When I run generate.py on a GPU, I get the following error.
2022-01-18 20:50:55.799119: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2022-01-18 20:52:03.633057: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
2022-01-18 20:52:03.633118: E tensorflow/stream_executor/cuda/cuda_blas.cc:2301] Internal: failed BLAS call, see log for details
File "/usr/local/miniconda3/lib/python3.6/site-packages/bert4keras/snippets.py", line 627, in random_sample
inputs, output_ids, states, temperature, 'probas'
File "/usr/local/miniconda3/lib/python3.6/site-packages/bert4keras/snippets.py", line 525, in new_predict
prediction = predict(self, inputs, output_ids, states)
File "example_generate.py", line 52, in predict
return self.last_token(seq2seq).predict([token_ids, segment_ids])
File "/usr/local/miniconda3/lib/python3.6/site-packages/keras/engine/training.py", line 1462, in predict
callbacks=callbacks)
File "/usr/local/miniconda3/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 324, in predict_loop
batch_outs = f(ins_batch)
File "/usr/local/miniconda3/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3292, in call
run_metadata=self.run_metadata)
File "/usr/local/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1458, in call
run_metadata_ptr)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas xGEMMBatched launch failed : a.shape=[12,11,64], b.shape=[12,64,11], m=11, n=11, k=64, batch_size=12
[[{{node Transformer-0-MultiHeadSelfAttention/einsum/MatMul}}]]
[[lambda_1/strided_slice/_1229]]
(1) Internal: Blas xGEMMBatched launch failed : a.shape=[12,11,64], b.shape=[12,64,11], m=11, n=11, k=64, batch_size=12
[[{{node Transformer-0-MultiHeadSelfAttention/einsum/MatMul}}]]
0 successful operations.
0 derived errors ignored.
Googling suggested adding the following code:
# Option 1
cfg = tf.ConfigProto()
cfg.gpu_options.allow_growth = True
cfg.gpu_options.per_process_gpu_memory_fraction = 0.90
sess = tf.Session(config=cfg)
# Option 2
import keras
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # let TensorFlow allocate GPU memory on demand
config.gpu_options.per_process_gpu_memory_fraction = 0.5  # cap the fraction of GPU memory used
keras.backend.tensorflow_backend.set_session(tf.Session(config=config))
Neither option fixes the error shown in the screenshot. My GPU is an A100-SXM4-40GB, and the installed packages are:
Package Version
absl-py 0.15.0
asn1crypto 0.24.0
astor 0.8.1
astunparse 1.6.3
bert4keras 0.10.6
cached-property 1.5.2
cachetools 4.2.4
certifi 2018.4.16
cffi 1.11.5
chardet 3.0.4
charset-normalizer 2.0.10
clang 5.0
conda 4.5.4
cryptography 2.2.2
flatbuffers 1.12
gast 0.4.0
google-auth 1.35.0
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
grpcio 1.43.0
h5py 3.1.0
idna 2.6
importlib-metadata 1.6.0
Keras 2.3.1
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2
Markdown 3.2.2
numpy 1.19.5
oauthlib 3.1.1
opt-einsum 3.3.0
pip 20.1
protobuf 3.12.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycosat 0.6.3
pycparser 2.18
pyOpenSSL 18.0.0
PySocks 1.6.8
PyYAML 6.0
requests 2.27.1
requests-oauthlib 1.3.0
rsa 4.8
ruamel-yaml 0.15.37
scipy 1.5.4
setuptools 46.4.0
six 1.15.0
tensorboard 1.14.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow-estimator 1.14.0
tensorflow-gpu 1.14.0
termcolor 1.1.0
typing-extensions 3.7.4.3
urllib3 1.22
Werkzeug 1.0.1
wheel 0.37.1
wrapt 1.12.1
zipp 3.1.0
I tested my GPU with the code below, and it is usable.
# test whether the GPU is available
import tensorflow as tf
tf.test.is_gpu_available()
Please advise how to run generate.py on the GPU.
Many thanks, and happy new year in advance~
Sharing a write-up on converting the model to ONNX; it runs successfully: https://blog.csdn.net/hailongzhang26/article/details/118937909
Could you briefly explain the line outputs = TotalLoss([2, 3])(roformer.inputs + roformer.outputs)? I don't quite follow it.
Hello!
Running train/stage1.py reports a missing file: /root/data_pretrain/synonyms_shuf.json
Asking for help, thank you!
According to the blog post: "Specifically, Sentence-BERT concatenates u, v, and |u−v| (where |u−v| is the vector formed by taking the absolute value of each element of u−v) as the feature, followed by a fully-connected layer for 2-way classification (3-way for NLI datasets)."
Then why does train/supervised.py use output = keras.layers.Dense(5, use_bias=False)(output)?
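The Sentence-BERT feature described above can be sketched in plain numpy. This is an illustration with made-up vectors, not the repo's code; the 3-way output here matches the NLI setup from the quote, while the class count in supervised.py (5) is dataset-dependent:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=768)  # sentence vector for text 1 (toy values)
v = rng.normal(size=768)  # sentence vector for text 2 (toy values)

# Sentence-BERT feature: concatenate u, v, and the element-wise |u - v|.
features = np.concatenate([u, v, np.abs(u - v)])
print(features.shape)  # (2304,) i.e. 3 * 768

# A bias-free linear classifier head then maps the feature to class logits.
W = rng.normal(size=(2304, 3))  # 3-way for NLI; supervised.py uses 5
logits = features @ W
print(logits.shape)  # (3,)
```

The |u−v| term injects an explicit element-wise distance signal that a linear layer could not derive from the raw concatenation [u; v] alone.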
data_path = './glue/data/'
datasets_1 = []
for task_name in ['ATEC', 'BQ', 'LCQMC', 'PAWSX', 'STS-B', 'SOHU21-SSB']:
    for f in ['train', 'valid']:
        threshold = 2.5 if task_name == 'STS-B' else 0.5
        filename = '%s%s/%s.%s.data' % (data_path, task_name, task_name, f)
        datasets_1 += load_data_1(filename, threshold)
I don't quite see how to read in my own dataset here. Also, must the data format be (text1, text2, label)?
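Assuming each line of a .data file is a tab-separated "text1, text2, score" triple (an assumption; check the actual files), a minimal loader in the spirit of the snippet above might look like this. The function name is hypothetical:

```python
# Hypothetical sketch: load a tab-separated file of (text1, text2, score)
# lines and binarize the score with a threshold, as the per-task thresholds
# in the snippet above suggest. The exact file format is an assumption.
def load_pairs(filename, threshold):
    data = []
    with open(filename, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip('\n').split('\t')
            if len(parts) != 3:
                continue  # skip malformed lines
            text1, text2, score = parts[0], parts[1], float(parts[2])
            label = int(score > threshold)  # e.g. STS-B scores vs 2.5
            data.append((text1, text2, label))
    return data
```

Writing your own corpus in the same three-column layout would then let it flow through the existing loop unchanged.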
After training on my own corpus with stage1.py, how do I load the saved weights file? Thank you!
When I test roformer-sim on a GPU with 16g of memory, I always get the following error:
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[192,17,64], b.shape=[192,17,64], m=17, n=17, k=64, batch_size=192
[[node model_3/Transformer-0-MultiHeadSelfAttention/einsum/Einsum (defined at /anaconda3/envs/nlp_tf/lib/python3.7/site-packages/bert4keras/layers.py:445) ]] [Op:__inference_predict_function_22308]
Errors may have originated from an input operation.
Input Source operations connected to node model_3/Transformer-0-MultiHeadSelfAttention/einsum/Einsum:
model_3/Transformer-0-MultiHeadSelfAttention/add_1 (defined at /anaconda3/envs/nlp_tf/lib/python3.7/site-packages/bert4keras/layers.py:443)
model_3/Transformer-0-MultiHeadSelfAttention/add (defined at /anaconda3/envs/nlp_tf/lib/python3.7/site-packages/bert4keras/layers.py:440)
Function call stack:
predict_function
What could be the cause?
As the title says: is the data used in the distillation stage the train/valid splits of the public datasets BQ, ATEC, etc.?
The metrics I reproduce differ noticeably from those reported in the article.
How should the code be modified?
As the title says.
The file bert_model.ckpt.data-00000-of-00001 is corrupted.