biorex's Issues
Request support on data input sample and output sample just for prediction
I used AIONER output to extract relations, but it didn't work. I went through the issues and found that the example is in the BioRED repo. I would like to know how to create such data, and to see a sample output showing what predict.pubtator will look like.
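As a rough illustration of the input side, the sketch below writes one document in the common PubTator layout: a PMID|t|title line, a PMID|a|abstract line, then tab-separated entity lines (PMID, start, end, mention, type, identifier). The PMID, text, and annotations are invented for the example, and whether BioREx accepts exactly this content is an assumption; the sample files in the BioRED repo remain the reference. Relation lines are deliberately left out because their exact layout should be taken from those samples.

```python
# Hypothetical example: build a single PubTator-style document.
# The PMID, text, entity types and identifiers below are invented for illustration.

def find_span(text, mention):
    """Return (start, end) character offsets of the first occurrence of mention."""
    start = text.index(mention)
    return start, start + len(mention)

def to_pubtator(pmid, title, abstract, entities):
    """entities: list of (mention, entity_type, identifier) tuples."""
    # Offsets are counted over "title + one space + abstract".
    full_text = title + " " + abstract
    lines = [f"{pmid}|t|{title}", f"{pmid}|a|{abstract}"]
    for mention, entity_type, identifier in entities:
        start, end = find_span(full_text, mention)
        lines.append("\t".join([pmid, str(start), str(end), mention, entity_type, identifier]))
    return "\n".join(lines) + "\n\n"  # a blank line terminates the document

doc = to_pubtator(
    pmid="12345678",
    title="Aspirin and headache.",
    abstract="Aspirin reduced headache severity in adults.",
    entities=[
        ("Aspirin", "ChemicalEntity", "D001241"),
        ("headache", "DiseaseOrPhenotypicFeature", "D006261"),
    ],
)
print(doc)
```

Computing the offsets from the text, rather than typing them by hand, avoids the off-by-one mistakes that are easy to make when the abstract offsets continue after the title.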
Questions about run_test_pred.sh output
I use run_test_pred.sh to predict on my input data, which has been converted into BioRED format. The run_test_pred.sh script has two outputs: out_tsv_file and out_pubtator_files. When I run run_test_pred.sh, the predictions are successfully added to out_pubtator_files, but why is the out_tsv_file result still "None"?
You can see the result in the attached zip file. The out_pubtator_files file has prediction results, but out_tsv_file has "None". Is this normal?
My input data is also in the attached file.
10_sentences_sample.zip
Note:
I have also tried the input example from the BioRED file that you gave me, and the out_tsv_file result is still "None".
ImportError: cannot import name 'TFTrainer' from 'transformers' and cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory
Hi, I am getting this error. Please check on your side whether some files are missing for prediction and training, and help me solve this issue. Thank you.
I used the environment settings that were given to me for BioRED.
(Ubuntu 22.04.2 LTS)
GPU: RTX 3040
- Setting up
conda create -n py39 python=3.9
conda activate py39
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
python -m pip install --upgrade pip
python -m pip install "tensorflow==2.10"
Then you can run the below Python script to check whether you can access GPU.
import tensorflow as tf
print(tf.__version__)
print(len(tf.config.list_physical_devices('GPU')))
print(tf.test.is_built_with_cuda())
print(tf.test.is_gpu_available())
build_info = tf.sysconfig.get_build_info()
cuda_version = build_info["cuda_version"]
cudnn_version = build_info["cudnn_version"]
print("CUDA version TensorFlow was built with:", cuda_version)
print("cuDNN version TensorFlow was built with:", cudnn_version)
Install requirements
pip install -r requirements.txt
Here is my requirements.txt
transformers == 4.18.0
accelerate == 0.9.0
pandas == 1.1.5
numpy == 1.20.0
datasets == 2.3.2
sentencepiece != 0.1.92
protobuf == 3.19.4
scispacy == 0.2.4
tensorflow == 2.9.3
https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_md-0.5.1.tar.gz
error
bash scripts/run_test_pred.sh 0
bash: /home/khyati/anaconda3/envs/biorex/lib/libtinfo.so.6: no version information available (required by bash)
Converting the dataset into BioREx input format
2024-05-27 12:33:24.619028: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-27 12:33:24.800147: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-27 12:33:24.865322: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-05-27 12:33:25.520537: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/khyati/anaconda3/envs/biorex/lib/:/lib
2024-05-27 12:33:25.520634: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/khyati/anaconda3/envs/biorex/lib/:/lib
2024-05-27 12:33:25.520643: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
number_unique_YES_instances 0
Generating RE predictions
2024-05-27 12:33:28.647477: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-27 12:33:28.822023: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-27 12:33:28.870588: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-05-27 12:33:29.678845: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/khyati/anaconda3/envs/biorex/lib/:/lib
2024-05-27 12:33:29.678988: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/khyati/anaconda3/envs/biorex/lib/:/lib
2024-05-27 12:33:29.679003: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
[INFO|training_args.py:804] 2024-05-27 12:33:32,063 >> using logging_steps to initialize eval_steps to 10
[INFO|training_args.py:1023] 2024-05-27 12:33:32,063 >> PyTorch: setting up devices
[INFO|training_args.py:885] 2024-05-27 12:33:32,092 >> The default value for the training argument --report_to will change in v5 (from all installed integrations to none). In v5, you will need to use --report_to all to get the same behavior as now. You should start updating your code and make this info disappear :-).
[INFO|training_args_tf.py:189] 2024-05-27 12:33:32,093 >> Tensorflow: setting up strategy
2024-05-27 12:33:32.101113: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-27 12:33:32.743930: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 21219 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:31:00.0, compute capability: 8.9
05/27/2024 12:33:32 - INFO - __main__ - n_replicas: 1, distributed training: False, 16-bits training: False
05/27/2024 12:33:32 - INFO - __main__ - Training/evaluation parameters TFTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=10,
evaluation_strategy=IntervalStrategy.STEPS,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gcp_project=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=biorex_model/runs/May27_12-33-32_microcrispr7,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=10.0,
optim=OptimizerNames.ADAMW_HF,
output_dir=biorex_model,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=32,
per_device_train_batch_size=16,
poly_power=1.0,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=biorex_model,
save_on_each_node=False,
save_steps=10,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tf32=None,
tpu_metrics_debug=False,
tpu_name=None,
tpu_num_cores=None,
tpu_zone=None,
use_legacy_prediction_loop=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xla=False,
xpu_backend=None,
)
Traceback (most recent call last):
File "/Khyati/BioREx-main/src/run_ncbi_rel_exp.py", line 884, in <module>
main()
File "/Khyati/BioREx-main/src/run_ncbi_rel_exp.py", line 606, in main
tokenizer = AutoTokenizer.from_pretrained(
File "/home/khyati/anaconda3/envs/py39/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 471, in from_pretrained
tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
File "/home/khyati/anaconda3/envs/py39/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 332, in get_tokenizer_config
resolved_config_file = get_file_from_repo(
File "/home/khyati/anaconda3/envs/py39/lib/python3.9/site-packages/transformers/utils/hub.py", line 678, in get_file_from_repo
resolved_file = cached_path(
File "/home/khyati/anaconda3/envs/py39/lib/python3.9/site-packages/transformers/utils/hub.py", line 282, in cached_path
output_path = get_from_cache(
File "/home/khyati/anaconda3/envs/py39/lib/python3.9/site-packages/transformers/utils/hub.py", line 545, in get_from_cache
raise ValueError(
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory
2024-05-27 12:33:35.580800: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-27 12:33:35.691247: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-27 12:33:35.720913: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-05-27 12:33:36.168418: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/khyati/anaconda3/envs/biorex/lib/:/lib
2024-05-27 12:33:36.168508: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/khyati/anaconda3/envs/biorex/lib/:/lib
2024-05-27 12:33:36.168517: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
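The ValueError in the traceback above is raised while the tokenizer is being fetched, so the later cp: cannot stat message may simply be a consequence of the prediction step never writing test_results.tsv. Below is a minimal sketch for checking that a local model directory is complete before running the script. The helper name missing_model_files is hypothetical, and treating these three files as the minimum is an assumption based on the files that a successful load reports in another log on this page (config.json, vocab.txt, tf_model.h5).

```python
import os
import tempfile

# Assumption: these are the minimum files a local TensorFlow BERT checkpoint needs.
REQUIRED_FILES = ("config.json", "vocab.txt", "tf_model.h5")

def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent or empty in model_dir."""
    missing = []
    for name in required:
        path = os.path.join(model_dir, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing

# Demonstration on a throwaway directory that only contains config.json.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "config.json"), "w") as handle:
        handle.write("{}")
    demo_missing = missing_model_files(demo_dir)
    print(demo_missing)
```

If the list is non-empty for the directory the script points at, the model files were probably not downloaded or unpacked where the script expects them.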
Loaded 0 parameters in the TF 2.0 model. Some weights of the model were not used when initializing the TF 2.0 model TFBertForSequenceClassification
I tried to run 'scripts/run_biorex_exp.sh' to reproduce your results, but ran into the following trouble:
When I test directly with the Microsoft pre-trained model, this warning appears:
[INFO|modeling_tf_utils.py:2830] 2024-02-29 20:59:44,616 >> loading weights file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/pytorch_model.bin
[INFO|modeling_tf_pytorch_utils.py:184] 2024-02-29 20:59:45,290 >> Loading PyTorch weights from /root/autodl-tmp/BioREx/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/pytorch_model.bin
[INFO|modeling_tf_pytorch_utils.py:187] 2024-02-29 20:59:45,855 >> PyTorch checkpoint contains 133,577,846 parameters
[INFO|modeling_tf_pytorch_utils.py:344] 2024-02-29 20:59:46,677 >> Loaded 0 parameters in the TF 2.0 model.
When I run 'scripts/run_biorex_exp.sh' and predict with the model from my own training, the warning appears again. No parameters are loaded and the results are very poor:
Loaded 0 parameters in the TF 2.0 model.
But when I use your pre-trained weights to predict directly it is normal.
loading weights file pretrained_model_biolinkbert/tf_model.h5
[WARNING|modeling_tf_utils.py:2966] 2024-02-29 20:55:57,942 >> All model checkpoint layers were used when initializing TFBertForSequenceClassification.
[WARNING|modeling_tf_utils.py:2975] 2024-02-29 20:55:57,942 >> All the layers of TFBertForSequenceClassification were initialized from the model checkpoint at pretrained_model_biolinkbert.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForSequenceClassification for predictions without further training.
Some rows contain "None" labels.
When I run bash scripts/build_biorex_datasets.sh, an error occurs.
BioREx/src/run_ncbi_rel_exp.py
Line 112 in 0031c52
Here, the code emits an error, because it cannot find a label when the label is ''.
I think the
BioREx/src/run_ncbi_rel_exp.py
Line 148 in 0031c52
The None label is changed into '' by this code.
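One possible workaround, sketched under the assumption that an empty label should be treated as the explicit "None" class, is to normalize labels before looking them up. This is not the repository's code: normalize_label is a hypothetical helper, and the mapping below is only a small excerpt of the label2id dictionary printed in the logs on this page.

```python
# Small excerpt of the label2id mapping printed in the logs on this page.
label2id = {"None": 0, "Association": 1, "Bind": 2}

def normalize_label(raw_label):
    """Map missing or empty labels to the explicit 'None' class (hypothetical helper)."""
    if raw_label is None:
        return "None"
    label = str(raw_label).strip()
    return label if label else "None"

rows = ["Association", "", None, " Bind "]
label_ids = [label2id[normalize_label(label)] for label in rows]
print(label_ids)
```

Normalizing at lookup time keeps the dataset files unchanged, at the cost of hiding rows whose label was lost for some other reason.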
run_test_pred.sh can't run on CPU
Hello, I am trying to run your code for the "predicting new data" part. I followed all of your commands, from creating a new environment and installing the requirements from your txt file to running the bash script. However, when I run the bash script, I get this issue:
(biorex) [michaela95@BLOOM BioREx]$ bash scripts/run_biorex_new.sh
Converting the dataset into BioREx input format
2024-07-04 14:00:33.031276: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:479] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-04 14:00:33.054684: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:10575] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-04 14:00:33.054743: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1442] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-04 14:00:33.071248: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-04 14:00:34.221688: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
number_unique_YES_instances 0
Generating RE predictions
2024-07-04 14:00:38.500149: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:479] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-04 14:00:38.524087: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:10575] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-04 14:00:38.524137: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1442] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-04 14:00:38.540547: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-04 14:00:39.539068: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[INFO|training_args.py:804] 2024-07-04 14:00:42,112 >> using logging_steps to initialize eval_steps to 10
[INFO|training_args.py:1023] 2024-07-04 14:00:42,112 >> PyTorch: setting up devices
[INFO|training_args.py:885] 2024-07-04 14:00:42,113 >> The default value for the training argument --report_to will change in v5 (from all installed integrations to none). In v5, you will need to use --report_to all to get the same behavior as now. You should start updating your code and make this info disappear :-).
[INFO|training_args_tf.py:189] 2024-07-04 14:00:42,114 >> Tensorflow: setting up strategy
07/04/2024 14:00:42 - INFO - __main__ - n_replicas: 1, distributed training: False, 16-bits training: False
07/04/2024 14:00:42 - INFO - __main__ - Training/evaluation parameters TFTrainingArguments(
_n_gpu=0,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=10,
evaluation_strategy=IntervalStrategy.STEPS,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gcp_project=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=biorex_model/runs/Jul04_14-00-42_BLOOM,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=10.0,
optim=OptimizerNames.ADAMW_HF,
output_dir=biorex_model,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=32,
per_device_train_batch_size=16,
poly_power=1.0,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=biorex_model,
save_on_each_node=False,
save_steps=10,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tf32=None,
tpu_metrics_debug=False,
tpu_name=None,
tpu_num_cores=None,
tpu_zone=None,
use_legacy_prediction_loop=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xla=False,
xpu_backend=None,
)
[INFO|tokenization_utils_base.py:1776] 2024-07-04 14:00:42,124 >> loading file pretrained_model/vocab.txt
[INFO|tokenization_utils_base.py:1776] 2024-07-04 14:00:42,124 >> loading file pretrained_model/tokenizer.json
[INFO|tokenization_utils_base.py:1776] 2024-07-04 14:00:42,124 >> loading file pretrained_model/added_tokens.json
[INFO|tokenization_utils_base.py:1776] 2024-07-04 14:00:42,124 >> loading file pretrained_model/special_tokens_map.json
[INFO|tokenization_utils_base.py:1776] 2024-07-04 14:00:42,124 >> loading file pretrained_model/tokenizer_config.json
=======================>label2id {'None': 0, 'Association': 1, 'Bind': 2, 'Comparison': 3, 'Conversion': 4, 'Cotreatment': 5, 'Drug_Interaction': 6, 'Negative_Correlation': 7, 'Positive_Correlation': 8, 'None-CID': 9, 'CID': 10, 'None-PPIm': 11, 'PPIm': 12, 'None-AIMED': 13, 'None-DDI': 14, 'None-BC7': 15, 'None-phargkb': 16, 'None-GDA': 17, 'None-DISGENET': 18, 'None-EMU_BC': 19, 'None-EMU_PC': 20, 'None-HPRD50': 21, 'None-PHARMGKB': 22, 'ACTIVATOR': 23, 'AGONIST': 24, 'AGONIST-ACTIVATOR': 25, 'AGONIST-INHIBITOR': 26, 'ANTAGONIST': 27, 'DIRECT-REGULATOR': 28, 'INDIRECT-DOWNREGULATOR': 29, 'INDIRECT-UPREGULATOR': 30, 'INHIBITOR': 31, 'PART-OF': 32, 'PRODUCT-OF': 33, 'SUBSTRATE': 34, 'SUBSTRATE_PRODUCT-OF': 35, 'mechanism': 36, 'int': 37, 'effect': 38, 'advise': 39, 'AIMED-Association': 40, 'HPRD-Association': 41, 'EUADR-Association': 42, 'None-EUADR': 43, 'Indirect_conversion': 44, 'Non_conversion': 45}
=======================>positive_label
=======================>use_balanced_neg False
=======================>max_neg_scale 2
07/04/2024 14:00:42 - INFO - __main__ - pos_label_ids
07/04/2024 14:00:42 - INFO - __main__ - [1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 16, 17, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45]
[INFO|configuration_utils.py:652] 2024-07-04 14:00:42,153 >> loading configuration file pretrained_model/config.json
[INFO|configuration_utils.py:690] 2024-07-04 14:00:42,154 >> Model config BertConfig {
"_name_or_path": "pretrained_model",
"architectures": [
"BertForSequenceClassification"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"finetuning_task": "text-classification",
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"id2label": {
"0": "None",
"1": "Association",
"2": "Bind",
"3": "Comparison",
"4": "Conversion",
"5": "Cotreatment",
"6": "Drug_Interaction",
"7": "Negative_Correlation",
"8": "Positive_Correlation",
"9": "None-CID",
"10": "CID",
"11": "None-PPIm",
"12": "PPIm",
"13": "None-AIMED",
"14": "None-DDI",
"15": "None-BC7",
"16": "None-phargkb",
"17": "None-GDA",
"18": "None-DISGENET",
"19": "None-EMU_BC",
"20": "None-EMU_PC",
"21": "None-HPRD50",
"22": "None-PHARMGKB",
"23": "ACTIVATOR",
"24": "AGONIST",
"25": "AGONIST-ACTIVATOR",
"26": "AGONIST-INHIBITOR",
"27": "ANTAGONIST",
"28": "DIRECT-REGULATOR",
"29": "INDIRECT-DOWNREGULATOR",
"30": "INDIRECT-UPREGULATOR",
"31": "INHIBITOR",
"32": "PART-OF",
"33": "PRODUCT-OF",
"34": "SUBSTRATE",
"35": "SUBSTRATE_PRODUCT-OF",
"36": "mechanism",
"37": "int",
"38": "effect",
"39": "advise",
"40": "AIMED-Association",
"41": "HPRD-Association",
"42": "EUADR-Association",
"43": "None-EUADR",
"44": "Indirect_conversion",
"45": "Non_conversion"
},
"initializer_range": 0.02,
"intermediate_size": 3072,
"label2id": {
"ACTIVATOR": 23,
"AGONIST": 24,
"AGONIST-ACTIVATOR": 25,
"AGONIST-INHIBITOR": 26,
"AIMED-Association": 40,
"ANTAGONIST": 27,
"Association": 1,
"Bind": 2,
"CID": 10,
"Comparison": 3,
"Conversion": 4,
"Cotreatment": 5,
"DIRECT-REGULATOR": 28,
"Drug_Interaction": 6,
"EUADR-Association": 42,
"HPRD-Association": 41,
"INDIRECT-DOWNREGULATOR": 29,
"INDIRECT-UPREGULATOR": 30,
"INHIBITOR": 31,
"Indirect_conversion": 44,
"Negative_Correlation": 7,
"Non_conversion": 45,
"None": 0,
"None-AIMED": 13,
"None-BC7": 15,
"None-CID": 9,
"None-DDI": 14,
"None-DISGENET": 18,
"None-EMU_BC": 19,
"None-EMU_PC": 20,
"None-EUADR": 43,
"None-GDA": 17,
"None-HPRD50": 21,
"None-PHARMGKB": 22,
"None-PPIm": 11,
"None-phargkb": 16,
"PART-OF": 32,
"PPIm": 12,
"PRODUCT-OF": 33,
"Positive_Correlation": 8,
"SUBSTRATE": 34,
"SUBSTRATE_PRODUCT-OF": 35,
"advise": 39,
"effect": 38,
"int": 37,
"mechanism": 36
},
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.18.0",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 28933
}
[INFO|modeling_tf_utils.py:1776] 2024-07-04 14:00:42,177 >> loading weights file pretrained_model/tf_model.h5
/media/data/biorex/lib/python3.10/site-packages/keras/src/layers/layer.py:1331: UserWarning: Layer 'tf_bert_for_sequence_classification' looks like it has unbuilt state, but Keras is not able to trace the layer call() in order to build it automatically. Possible causes:
- The call() method of your layer may be crashing. Try to __call__() the layer eagerly on some test input first to see if it works. E.g. x = np.random.random((3, 4)); y = layer(x)
- If the call() method is correct, then you may need to implement the def build(self, input_shape) method on your layer. It should create all variables used by the layer (e.g. by calling layer.build() on all its children layers).
Exception encountered: ''Exception encountered when calling TFBertMainLayer.call().
'NoneType' object has no attribute 'shape'
Arguments received by TFBertMainLayer.call():
• input_ids=tf.Tensor(shape=(3, 5), dtype=int32)
• attention_mask=None
• token_type_ids=None
• position_ids=None
• head_mask=None
• inputs_embeds=None
• encoder_hidden_states=None
• encoder_attention_mask=None
• past_key_values=None
• use_cache=None
• output_attentions=False
• output_hidden_states=False
• return_dict=True
• training=False''
warnings.warn(
/media/data/biorex/lib/python3.10/site-packages/keras/src/layers/layer.py:372: UserWarning: build() was called on layer 'tf_bert_for_sequence_classification', however the layer does not have a build() method implemented and it looks like it has unbuilt state. This will cause the layer to be marked as built, despite not being actually built, which may cause failures down the line. Make sure to implement a proper build() method.
warnings.warn(
Traceback (most recent call last):
File "/home/michaela95/NLP_immune_checkpoint_gene/BERT_testing/BioREx/src/run_ncbi_rel_exp.py", line 884, in <module>
main()
File "/home/michaela95/NLP_immune_checkpoint_gene/BERT_testing/BioREx/src/run_ncbi_rel_exp.py", line 687, in main
model = TFAutoModelForSequenceClassification.from_pretrained(
File "/media/data/biorex/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 446, in from_pretrained
return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
File "/media/data/biorex/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 1803, in from_pretrained
model(model.dummy_inputs) # build the network with dummy inputs
File "/media/data/biorex/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/media/data/biorex/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 383, in run_call_with_unpacked_inputs
return func(self, **unpacked_inputs)
File "/media/data/biorex/lib/python3.10/site-packages/transformers/models/bert/modeling_tf_bert.py", line 1633, in call
outputs = self.bert(
File "/media/data/biorex/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 383, in run_call_with_unpacked_inputs
return func(self, **unpacked_inputs)
File "/media/data/biorex/lib/python3.10/site-packages/transformers/models/bert/modeling_tf_bert.py", line 850, in call
encoder_outputs = self.encoder(
File "/media/data/biorex/lib/python3.10/site-packages/optree/ops.py", line 594, in tree_map
return treespec.unflatten(map(func, *flat_args))
AttributeError: Exception encountered when calling TFBertMainLayer.call().
'NoneType' object has no attribute 'shape'
Arguments received by TFBertMainLayer.call():
• input_ids=tf.Tensor(shape=(3, 5), dtype=int32)
• attention_mask=None
• token_type_ids=None
• position_ids=None
• head_mask=None
• inputs_embeds=None
• encoder_hidden_states=None
• encoder_attention_mask=None
• past_key_values=None
• use_cache=None
• output_attentions=False
• output_hidden_states=False
• return_dict=True
• training=False
cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory
2024-07-04 14:00:46.035576: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:479] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-04 14:00:46.059361: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:10575] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-04 14:00:46.059424: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1442] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-04 14:00:46.075918: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-04 14:00:47.150909: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
File "/home/michaela95/NLP_immune_checkpoint_gene/BERT_testing/BioREx/src/utils/run_pubtator_eval.py", line 1557, in <module>
dump_pred_2_pubtator_file(in_pubtator_file = in_test_pubtator_file,
File "/home/michaela95/NLP_immune_checkpoint_gene/BERT_testing/BioREx/src/utils/run_pubtator_eval.py", line 206, in dump_pred_2_pubtator_file
add_relation_pairs_dict(
File "/home/michaela95/NLP_immune_checkpoint_gene/BERT_testing/BioREx/src/utils/run_pubtator_eval.py", line 83, in add_relation_pairs_dict
testdf = pd.read_csv(in_gold_tsv_file, sep="\t", index_col=0)
File "/media/data/biorex/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
return _read(filepath_or_buffer, kwds)
File "/media/data/biorex/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 620, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/media/data/biorex/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1620, in __init__
self._engine = self._make_engine(f, self.engine)
File "/media/data/biorex/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1898, in _make_engine
return mapping[engine](f, **self.options)
File "/media/data/biorex/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 93, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "parsers.pyx", line 581, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
Could you tell me what I should do in this situation?
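The paths in this log (python3.10, keras/src/layers/layer.py) suggest a newer TensorFlow/Keras stack than the one pinned in the requirements.txt quoted earlier on this page, although that reading is an inference from the log rather than a confirmed diagnosis. A minimal sketch for comparing installed package versions against those pins, using only the standard library; the helper names are hypothetical and the pin list is a subset copied from that requirements.txt.

```python
from importlib import metadata

# Pins copied from the requirements.txt quoted earlier on this page (subset).
PINNED = {"transformers": "4.18.0", "tensorflow": "2.9.3", "numpy": "1.20.0"}

def installed_versions(packages):
    """Return {package: installed version, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found

def find_mismatches(pinned, installed):
    """Return {package: (pinned, installed)} for every package whose version differs."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pinned.items()
        if installed.get(name) != wanted
    }

if __name__ == "__main__":
    report = find_mismatches(PINNED, installed_versions(PINNED))
    for name, (wanted, got) in report.items():
        print(f"{name}: pinned {wanted}, installed {got}")
```

Running this inside the activated environment prints one line per package that differs from its pin, which is a quick way to see whether pip resolved to newer releases than the ones the scripts were written against.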
Can you provide a script or an explanation to reproduce the scores in your paper?
In the paper (https://arxiv.org/abs/2306.11189), you wrote the scores below. Can you kindly provide a way to reproduce this?
For example, with the models you provided in the repo, BioREx PubMedBERT model (Original) and BioREx BioLinkBERT model (Preferred), what scores can I get, and how?
When I run the BioREx PubMedBERT model (Original) with the command you suggest, bash scripts/run_test_pred.sh, I get
Overall 966 652 263 314 0.7125683060109289 0.6749482401656315 0.6932482721956407
in the file given by the "out_result_file" parameter.
I think the last three values are precision, recall, and F1 score, but then I am not sure how I can get 79.6 in this case (BioRED + 8 datasets in your paper).
If I misunderstood something, please let me know.
Also, if you could provide the specific parameters to reproduce the scores in the paper, including the baseline approaches such as TL (transfer learning) and MTL (multi-task learning), it would be a great help to me.
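The numbers on the Overall line are at least internally consistent with a precision/recall/F1 reading. A short check, assuming the four integers are the gold count, true positives, false positives, and false negatives in that order (the column meaning is inferred from the arithmetic, not documented here):

```python
# Counts taken from the "Overall" line above; the column meaning is an assumption.
gold, true_pos, false_pos, false_neg = 966, 652, 263, 314

precision = true_pos / (true_pos + false_pos)       # 652 / 915
recall = true_pos / (true_pos + false_neg)          # 652 / 966
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
# prints: precision=0.7126 recall=0.6749 f1=0.6932
```

Under this reading the run scores an F1 of about 69.3, so the gap to 79.6 would come from the evaluation setup or data rather than from a misread column.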
run_biorex_exp.sh doesn't seem to use GPU
I tried to run the script, but I don't think it is using a GPU. I stopped the code before this line (https://github.com/ncbi/BioREx/blob/0031c52d2dd0d7fb8e3b84e256d8fbf73c4bd463/src/tf_wrapper.py#L199C28-L200C1) and checked whether it was using the GPU, and got this:
In [16]: self.model.layers[0].variables[0].device
Out[16]: '/job:localhost/replica:0/task:0/device:CPU:0'
Could you kindly check the script or run file so that training uses the GPU?
I also think it would be better to change the static GPU variable in the script into a dynamic one, as you did in the other script. I have put that in a pull request (#4).
BioREx/scripts/run_biorex_exp.sh
Line 3 in 0031c52
cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory
cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory
How can I solve this problem? I get this error when I run the model. I need help! Thanks!
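In the two full logs on this page, the cp message appears after an earlier Python traceback, so it may be a symptom rather than the cause. Below is a sketch, with a hypothetical helper name, that pulls the first Python exception out of a saved log (for example one captured by redirecting the script's output to a file). The sample text is an abbreviated excerpt adapted from a log above.

```python
import re

# Matches lines such as "ValueError: ..." or "pandas.errors.EmptyDataError: ...".
ERROR_LINE = re.compile(r"^[\w.]+(Error|Exception): ")

def first_python_error(log_text):
    """Return the first exception line that follows a 'Traceback' header, or None."""
    in_traceback = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Traceback (most recent call last)"):
            in_traceback = True
        elif in_traceback and ERROR_LINE.match(stripped):
            return stripped
    return None

# Abbreviated excerpt adapted from a log earlier on this page.
sample_log = """Generating RE predictions
Traceback (most recent call last):
  File "src/run_ncbi_rel_exp.py", line 884, in <module>
    main()
ValueError: Connection error, and we cannot find the requested files in the cached path.
cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory
"""
print(first_python_error(sample_log))
```

Posting that first exception line together with the command used usually gives maintainers more to work with than the cp message alone.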