ncbi / biorex Goto Github PK

Shell 3.55% Python 96.45%

biorex's People

qpc-github melsiddieg menggf bioin-401-project-8 xuanxie123 cristina-gabriela dongheechoi

biorex's Issues

Request support on data input sample and output sample just for prediction

I ran relation extraction on AIONER output, but it did not work. Going through the issues, I found that the example input is in the BioRED repo. How do I create input data in that format, and could you share a sample output showing what predict.pubtator will look like?
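For orientation, here is a minimal sketch of how a BioRED-style PubTator document can be read. It assumes the commonly used PubTator layout: title and abstract lines marked with `|t|` and `|a|`, six tab-separated fields per entity annotation, and relation lines that start with the PMID and the relation type, optionally followed by extra columns. The sample document and the function name `parse_pubtator` are made up for illustration and are not taken from the BioREx repository, so the exact content of predict.pubtator should still be checked against the repo's own example.

```python
# Made-up document in the assumed PubTator layout (not real BioREx output).
SAMPLE = "\n".join([
    "123|t|Aspirin and headache.",
    "123|a|Aspirin relieves headache in adults.",
    "123\t0\t7\tAspirin\tChemicalEntity\tD001241",
    "123\t12\t20\theadache\tDiseaseOrPhenotypicFeature\tD006261",
    "123\tNegative_Correlation\tD001241\tD006261",
])


def parse_pubtator(doc: str) -> dict:
    """Hypothetical helper: split one PubTator document into text, entities, relations."""
    parsed = {"pmid": None, "title": "", "abstract": "", "entities": [], "relations": []}
    for line in doc.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) == 1:
            # Text line: PMID|t|title or PMID|a|abstract
            pmid, kind, text = line.split("|", 2)
            parsed["pmid"] = pmid
            parsed["title" if kind == "t" else "abstract"] = text
        elif len(fields) == 6 and fields[1].isdigit() and fields[2].isdigit():
            # Entity line: PMID, start, end, mention, type, identifier
            _, start, end, mention, etype, ident = fields
            parsed["entities"].append(
                {"start": int(start), "end": int(end), "mention": mention,
                 "type": etype, "id": ident}
            )
        else:
            # Relation line: PMID, relation type, two entity identifiers, optional extras
            _, rel_type, id1, id2, *extra = fields
            parsed["relations"].append(
                {"type": rel_type, "id1": id1, "id2": id2, "extra": extra}
            )
    return parsed


result = parse_pubtator(SAMPLE)
print(result["relations"])
```

Under this assumption, a prediction file would differ from its input only in the relation lines appended after the entity annotations.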

questions about run_test_pred.sh output

I used run_test_pred.sh to run prediction on my input data, which I had converted to BioRED format. The run_test_pred.sh script has two outputs: out_tsv_file and out_pubtator_files. When I run it, the predictions are written to out_pubtator_files, but why does out_tsv_file still contain only "None"?
You can see the result in the attached zip file: out_pubtator_files has prediction results, but out_tsv_file has "None". Is this normal?
My input data is also included in the attached file.
10_sentences_sample.zip

Note:
I also tried the input example from the BioRED file that you gave me, and the out_tsv_file result is still "None".

ImportError: cannot import name 'TFTrainer' from 'transformers' and cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory

Hi, I am getting the error below. Could you check on your side whether some files are missing for prediction and training? Please help me solve this issue. Thank you.

I used the environment settings that were given to me for BioRED:
(Ubuntu 22.04.2 LTS)
GPU: RTX 3040

Setting up
conda create -n py39 python=3.9
conda activate py39
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
python -m pip install --upgrade pip
python -m pip install "tensorflow==2.10"
Then run the Python script below to check whether the GPU is accessible.

import tensorflow as tf
print(tf.__version__)
print(len(tf.config.list_physical_devices('GPU')))
print(tf.test.is_built_with_cuda())
print(tf.test.is_gpu_available())

build_info = tf.sysconfig.get_build_info()
cuda_version = build_info["cuda_version"]
cudnn_version = build_info["cudnn_version"]
print("CUDA version TensorFlow was built with:", cuda_version)
print("cuDNN version TensorFlow was built with:", cudnn_version)
Install requirements
pip install -r requirements.txt
Here is my requirements.txt

transformers == 4.18.0
accelerate == 0.9.0
pandas == 1.1.5
numpy == 1.20.0
datasets == 2.3.2
sentencepiece != 0.1.92
protobuf == 3.19.4
scispacy == 0.2.4
tensorflow == 2.9.3
https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_md-0.5.1.tar.gz

error
bash scripts/run_test_pred.sh 0
bash: /home/khyati/anaconda3/envs/biorex/lib/libtinfo.so.6: no version information available (required by bash)
Converting the dataset into BioREx input format
2024-05-27 12:33:24.619028: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-27 12:33:24.800147: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-27 12:33:24.865322: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-05-27 12:33:25.520537: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/khyati/anaconda3/envs/biorex/lib/:/lib
2024-05-27 12:33:25.520634: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/khyati/anaconda3/envs/biorex/lib/:/lib
2024-05-27 12:33:25.520643: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
number_unique_YES_instances 0
Generating RE predictions
2024-05-27 12:33:28.647477: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-27 12:33:28.822023: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-27 12:33:28.870588: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-05-27 12:33:29.678845: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/khyati/anaconda3/envs/biorex/lib/:/lib
2024-05-27 12:33:29.678988: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/khyati/anaconda3/envs/biorex/lib/:/lib
2024-05-27 12:33:29.679003: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
[INFO|training_args.py:804] 2024-05-27 12:33:32,063 >> using logging_steps to initialize eval_steps to 10
[INFO|training_args.py:1023] 2024-05-27 12:33:32,063 >> PyTorch: setting up devices
[INFO|training_args.py:885] 2024-05-27 12:33:32,092 >> The default value for the training argument --report_to will change in v5 (from all installed integrations to none). In v5, you will need to use --report_to all to get the same behavior as now. You should start updating your code and make this info disappear :-).
[INFO|training_args_tf.py:189] 2024-05-27 12:33:32,093 >> Tensorflow: setting up strategy
2024-05-27 12:33:32.101113: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-27 12:33:32.743930: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 21219 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:31:00.0, compute capability: 8.9
05/27/2024 12:33:32 - INFO - __main__ - n_replicas: 1, distributed training: False, 16-bits training: False
05/27/2024 12:33:32 - INFO - __main__ - Training/evaluation parameters TFTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=10,
evaluation_strategy=IntervalStrategy.STEPS,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gcp_project=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=biorex_model/runs/May27_12-33-32_microcrispr7,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=10.0,
optim=OptimizerNames.ADAMW_HF,
output_dir=biorex_model,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=32,
per_device_train_batch_size=16,
poly_power=1.0,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=biorex_model,
save_on_each_node=False,
save_steps=10,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tf32=None,
tpu_metrics_debug=False,
tpu_name=None,
tpu_num_cores=None,
tpu_zone=None,
use_legacy_prediction_loop=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xla=False,
xpu_backend=None,
)
Traceback (most recent call last):
File "/Khyati/BioREx-main/src/run_ncbi_rel_exp.py", line 884, in <module>
main()
File "/Khyati/BioREx-main/src/run_ncbi_rel_exp.py", line 606, in main
tokenizer = AutoTokenizer.from_pretrained(
File "/home/khyati/anaconda3/envs/py39/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 471, in from_pretrained
tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
File "/home/khyati/anaconda3/envs/py39/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 332, in get_tokenizer_config
resolved_config_file = get_file_from_repo(
File "/home/khyati/anaconda3/envs/py39/lib/python3.9/site-packages/transformers/utils/hub.py", line 678, in get_file_from_repo
resolved_file = cached_path(
File "/home/khyati/anaconda3/envs/py39/lib/python3.9/site-packages/transformers/utils/hub.py", line 282, in cached_path
output_path = get_from_cache(
File "/home/khyati/anaconda3/envs/py39/lib/python3.9/site-packages/transformers/utils/hub.py", line 545, in get_from_cache
raise ValueError(
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory
2024-05-27 12:33:35.580800: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-27 12:33:35.691247: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-27 12:33:35.720913: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-05-27 12:33:36.168418: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/khyati/anaconda3/envs/biorex/lib/:/lib
2024-05-27 12:33:36.168508: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/khyati/anaconda3/envs/biorex/lib/:/lib
2024-05-27 12:33:36.168517: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
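In the log above, the tokenizer load fails with a connection error, and the later `cp: cannot stat 'biorex_model/test_results.tsv'` appears to be a consequence: the prediction step stopped before writing any results. A minimal preflight sketch follows, assuming the model is meant to be loaded from a local directory. The list of file names is an assumption based on the "loading file ..." lines that appear in another log on this page, and `missing_model_files` is a hypothetical helper, not part of BioREx or transformers.

```python
from pathlib import Path

# Assumed file set, based on the "loading file ..." log lines from a successful load.
REQUIRED_FILES = ("config.json", "vocab.txt", "tf_model.h5")


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


# "pretrained_model" is the directory name seen in the logs; adjust as needed.
missing = missing_model_files("pretrained_model")
if missing:
    print("pretrained_model is missing:", ", ".join(missing))
else:
    print("pretrained_model: all expected files present")
```

If files are reported missing, the model name passed to the script is probably being treated as a Hugging Face Hub identifier, which would explain the attempted download.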

Loaded 0 parameters in the TF 2.0 model. Some weights of the model were not used when initializing the TF 2.0 model TFBertForSequenceClassification

I tried to run 'scripts/run_biorex_exp.sh' to reproduce your results, but ran into the following problem.
When I test directly with the Microsoft pre-trained model, the log shows:

[INFO|modeling_tf_utils.py:2830] 2024-02-29 20:59:44,616 >> loading weights file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/pytorch_model.bin
[INFO|modeling_tf_pytorch_utils.py:184] 2024-02-29 20:59:45,290 >> Loading PyTorch weights from /root/autodl-tmp/BioREx/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/pytorch_model.bin
[INFO|modeling_tf_pytorch_utils.py:187] 2024-02-29 20:59:45,855 >> PyTorch checkpoint contains 133,577,846 parameters
[INFO|modeling_tf_pytorch_utils.py:344] 2024-02-29 20:59:46,677 >> Loaded 0 parameters in the TF 2.0 model.

When I run 'scripts/run_biorex_exp.sh' and predict with the model from my own training, the same message appears again. No parameters are loaded and the results are very poor:

Loaded 0 parameters in the TF 2.0 model.

But when I predict directly with your pre-trained weights, everything works normally.

loading weights file pretrained_model_biolinkbert/tf_model.h5
[WARNING|modeling_tf_utils.py:2966] 2024-02-29 20:55:57,942 >> All model checkpoint layers were used when initializing TFBertForSequenceClassification.

[WARNING|modeling_tf_utils.py:2975] 2024-02-29 20:55:57,942 >> All the layers of TFBertForSequenceClassification were initialized from the model checkpoint at pretrained_model_biolinkbert.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForSequenceClassification for predictions without further training.

Some rows contain "None" labels.

When I run bash scripts/build_biorex_datasets.sh, an error occurs.

BioREx/src/run_ncbi_rel_exp.py

Line 112 in 0031c52

label = label2id[ex[self.label_name]]

Here, the code raises an error because it cannot find a label when the label is ''.

I think this line:

BioREx/src/run_ncbi_rel_exp.py

Line 148 in 0031c52

data_df = pd.read_csv(data_file, sep='\t', dtype=str).fillna(np.str_(''))

should be changed.
This code turns the "None" label into ''.
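One possible cause, offered as an assumption rather than a confirmed diagnosis: `pandas.read_csv` converts a default set of strings to NaN, and in newer pandas releases (2.0 and later, to my knowledge) that set includes "None". The `fillna` call then replaces the NaN with ''. A minimal sketch of a workaround is to pass `keep_default_na=False`, so that no string is treated as missing. The two-row TSV below is made up and stands in for the real dataset file.

```python
import io

import pandas as pd

# Made-up TSV standing in for the real dataset file.
tsv = "id\tlabel\n1\tNone\n2\tAssociation\n"

# keep_default_na=False disables the default NaN strings, so "None" stays a string.
data_df = pd.read_csv(io.StringIO(tsv), sep="\t", dtype=str, keep_default_na=False)

labels = data_df["label"].tolist()
print(labels)
```

With this option, empty cells are read as empty strings, so the behavior of the existing `fillna` for genuinely empty fields should be preserved.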

run_test_pred.sh can't run on CPU

Hello, I am trying to run your code for the "predicting new data" part. I followed all your instructions: creating a new environment, installing the requirements from your txt file, and running the bash script. However, when I ran the bash script, I got this error:

(biorex) [michaela95@BLOOM BioREx]$ bash scripts/run_biorex_new.sh
Converting the dataset into BioREx input format
2024-07-04 14:00:33.031276: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:479] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-04 14:00:33.054684: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:10575] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-04 14:00:33.054743: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1442] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-04 14:00:33.071248: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-04 14:00:34.221688: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
number_unique_YES_instances 0
Generating RE predictions
2024-07-04 14:00:38.500149: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:479] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-04 14:00:38.524087: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:10575] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-04 14:00:38.524137: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1442] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-04 14:00:38.540547: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-04 14:00:39.539068: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[INFO|training_args.py:804] 2024-07-04 14:00:42,112 >> using logging_steps to initialize eval_steps to 10
[INFO|training_args.py:1023] 2024-07-04 14:00:42,112 >> PyTorch: setting up devices
[INFO|training_args.py:885] 2024-07-04 14:00:42,113 >> The default value for the training argument --report_to will change in v5 (from all installed integrations to none). In v5, you will need to use --report_to all to get the same behavior as now. You should start updating your code and make this info disappear :-).
[INFO|training_args_tf.py:189] 2024-07-04 14:00:42,114 >> Tensorflow: setting up strategy
07/04/2024 14:00:42 - INFO - __main__ - n_replicas: 1, distributed training: False, 16-bits training: False
07/04/2024 14:00:42 - INFO - __main__ - Training/evaluation parameters TFTrainingArguments(
_n_gpu=0,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=10,
evaluation_strategy=IntervalStrategy.STEPS,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gcp_project=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=biorex_model/runs/Jul04_14-00-42_BLOOM,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=10.0,
optim=OptimizerNames.ADAMW_HF,
output_dir=biorex_model,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=32,
per_device_train_batch_size=16,
poly_power=1.0,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=biorex_model,
save_on_each_node=False,
save_steps=10,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tf32=None,
tpu_metrics_debug=False,
tpu_name=None,
tpu_num_cores=None,
tpu_zone=None,
use_legacy_prediction_loop=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xla=False,
xpu_backend=None,
)
[INFO|tokenization_utils_base.py:1776] 2024-07-04 14:00:42,124 >> loading file pretrained_model/vocab.txt
[INFO|tokenization_utils_base.py:1776] 2024-07-04 14:00:42,124 >> loading file pretrained_model/tokenizer.json
[INFO|tokenization_utils_base.py:1776] 2024-07-04 14:00:42,124 >> loading file pretrained_model/added_tokens.json
[INFO|tokenization_utils_base.py:1776] 2024-07-04 14:00:42,124 >> loading file pretrained_model/special_tokens_map.json
[INFO|tokenization_utils_base.py:1776] 2024-07-04 14:00:42,124 >> loading file pretrained_model/tokenizer_config.json
=======================>label2id {'None': 0, 'Association': 1, 'Bind': 2, 'Comparison': 3, 'Conversion': 4, 'Cotreatment': 5, 'Drug_Interaction': 6, 'Negative_Correlation': 7, 'Positive_Correlation': 8, 'None-CID': 9, 'CID': 10, 'None-PPIm': 11, 'PPIm': 12, 'None-AIMED': 13, 'None-DDI': 14, 'None-BC7': 15, 'None-phargkb': 16, 'None-GDA': 17, 'None-DISGENET': 18, 'None-EMU_BC': 19, 'None-EMU_PC': 20, 'None-HPRD50': 21, 'None-PHARMGKB': 22, 'ACTIVATOR': 23, 'AGONIST': 24, 'AGONIST-ACTIVATOR': 25, 'AGONIST-INHIBITOR': 26, 'ANTAGONIST': 27, 'DIRECT-REGULATOR': 28, 'INDIRECT-DOWNREGULATOR': 29, 'INDIRECT-UPREGULATOR': 30, 'INHIBITOR': 31, 'PART-OF': 32, 'PRODUCT-OF': 33, 'SUBSTRATE': 34, 'SUBSTRATE_PRODUCT-OF': 35, 'mechanism': 36, 'int': 37, 'effect': 38, 'advise': 39, 'AIMED-Association': 40, 'HPRD-Association': 41, 'EUADR-Association': 42, 'None-EUADR': 43, 'Indirect_conversion': 44, 'Non_conversion': 45}
=======================>positive_label
=======================>use_balanced_neg False
=======================>max_neg_scale 2
07/04/2024 14:00:42 - INFO - __main__ - pos_label_ids
07/04/2024 14:00:42 - INFO - __main__ - [1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 16, 17, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45]
[INFO|configuration_utils.py:652] 2024-07-04 14:00:42,153 >> loading configuration file pretrained_model/config.json
[INFO|configuration_utils.py:690] 2024-07-04 14:00:42,154 >> Model config BertConfig {
"_name_or_path": "pretrained_model",
"architectures": [
"BertForSequenceClassification"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"finetuning_task": "text-classification",
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"id2label": {
"0": "None",
"1": "Association",
"2": "Bind",
"3": "Comparison",
"4": "Conversion",
"5": "Cotreatment",
"6": "Drug_Interaction",
"7": "Negative_Correlation",
"8": "Positive_Correlation",
"9": "None-CID",
"10": "CID",
"11": "None-PPIm",
"12": "PPIm",
"13": "None-AIMED",
"14": "None-DDI",
"15": "None-BC7",
"16": "None-phargkb",
"17": "None-GDA",
"18": "None-DISGENET",
"19": "None-EMU_BC",
"20": "None-EMU_PC",
"21": "None-HPRD50",
"22": "None-PHARMGKB",
"23": "ACTIVATOR",
"24": "AGONIST",
"25": "AGONIST-ACTIVATOR",
"26": "AGONIST-INHIBITOR",
"27": "ANTAGONIST",
"28": "DIRECT-REGULATOR",
"29": "INDIRECT-DOWNREGULATOR",
"30": "INDIRECT-UPREGULATOR",
"31": "INHIBITOR",
"32": "PART-OF",
"33": "PRODUCT-OF",
"34": "SUBSTRATE",
"35": "SUBSTRATE_PRODUCT-OF",
"36": "mechanism",
"37": "int",
"38": "effect",
"39": "advise",
"40": "AIMED-Association",
"41": "HPRD-Association",
"42": "EUADR-Association",
"43": "None-EUADR",
"44": "Indirect_conversion",
"45": "Non_conversion"
},
"initializer_range": 0.02,
"intermediate_size": 3072,
"label2id": {
"ACTIVATOR": 23,
"AGONIST": 24,
"AGONIST-ACTIVATOR": 25,
"AGONIST-INHIBITOR": 26,
"AIMED-Association": 40,
"ANTAGONIST": 27,
"Association": 1,
"Bind": 2,
"CID": 10,
"Comparison": 3,
"Conversion": 4,
"Cotreatment": 5,
"DIRECT-REGULATOR": 28,
"Drug_Interaction": 6,
"EUADR-Association": 42,
"HPRD-Association": 41,
"INDIRECT-DOWNREGULATOR": 29,
"INDIRECT-UPREGULATOR": 30,
"INHIBITOR": 31,
"Indirect_conversion": 44,
"Negative_Correlation": 7,
"Non_conversion": 45,
"None": 0,
"None-AIMED": 13,
"None-BC7": 15,
"None-CID": 9,
"None-DDI": 14,
"None-DISGENET": 18,
"None-EMU_BC": 19,
"None-EMU_PC": 20,
"None-EUADR": 43,
"None-GDA": 17,
"None-HPRD50": 21,
"None-PHARMGKB": 22,
"None-PPIm": 11,
"None-phargkb": 16,
"PART-OF": 32,
"PPIm": 12,
"PRODUCT-OF": 33,
"Positive_Correlation": 8,
"SUBSTRATE": 34,
"SUBSTRATE_PRODUCT-OF": 35,
"advise": 39,
"effect": 38,
"int": 37,
"mechanism": 36
},
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.18.0",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 28933
}

[INFO|modeling_tf_utils.py:1776] 2024-07-04 14:00:42,177 >> loading weights file pretrained_model/tf_model.h5
/media/data/biorex/lib/python3.10/site-packages/keras/src/layers/layer.py:1331: UserWarning: Layer 'tf_bert_for_sequence_classification' looks like it has unbuilt state, but Keras is not able to trace the layer call() in order to build it automatically. Possible causes:

The call() method of your layer may be crashing. Try to __call__() the layer eagerly on some test input first to see if it works. E.g. x = np.random.random((3, 4)); y = layer(x)
If the call() method is correct, then you may need to implement the def build(self, input_shape) method on your layer. It should create all variables used by the layer (e.g. by calling layer.build() on all its children layers).
Exception encountered: ''Exception encountered when calling TFBertMainLayer.call().

'NoneType' object has no attribute 'shape'

Arguments received by TFBertMainLayer.call():
• input_ids=tf.Tensor(shape=(3, 5), dtype=int32)
• attention_mask=None
• token_type_ids=None
• position_ids=None
• head_mask=None
• inputs_embeds=None
• encoder_hidden_states=None
• encoder_attention_mask=None
• past_key_values=None
• use_cache=None
• output_attentions=False
• output_hidden_states=False
• return_dict=True
• training=False''
warnings.warn(
/media/data/biorex/lib/python3.10/site-packages/keras/src/layers/layer.py:372: UserWarning: build() was called on layer 'tf_bert_for_sequence_classification', however the layer does not have a build() method implemented and it looks like it has unbuilt state. This will cause the layer to be marked as built, despite not being actually built, which may cause failures down the line. Make sure to implement a proper build() method.
warnings.warn(
Traceback (most recent call last):
File "/home/michaela95/NLP_immune_checkpoint_gene/BERT_testing/BioREx/src/run_ncbi_rel_exp.py", line 884, in <module>
main()
File "/home/michaela95/NLP_immune_checkpoint_gene/BERT_testing/BioREx/src/run_ncbi_rel_exp.py", line 687, in main
model = TFAutoModelForSequenceClassification.from_pretrained(
File "/media/data/biorex/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 446, in from_pretrained
return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
File "/media/data/biorex/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 1803, in from_pretrained
model(model.dummy_inputs) # build the network with dummy inputs
File "/media/data/biorex/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/media/data/biorex/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 383, in run_call_with_unpacked_inputs
return func(self, **unpacked_inputs)
File "/media/data/biorex/lib/python3.10/site-packages/transformers/models/bert/modeling_tf_bert.py", line 1633, in call
outputs = self.bert(
File "/media/data/biorex/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 383, in run_call_with_unpacked_inputs
return func(self, **unpacked_inputs)
File "/media/data/biorex/lib/python3.10/site-packages/transformers/models/bert/modeling_tf_bert.py", line 850, in call
encoder_outputs = self.encoder(
File "/media/data/biorex/lib/python3.10/site-packages/optree/ops.py", line 594, in tree_map
return treespec.unflatten(map(func, *flat_args))
AttributeError: Exception encountered when calling TFBertMainLayer.call().

'NoneType' object has no attribute 'shape'

Arguments received by TFBertMainLayer.call():
• input_ids=tf.Tensor(shape=(3, 5), dtype=int32)
• attention_mask=None
• token_type_ids=None
• position_ids=None
• head_mask=None
• inputs_embeds=None
• encoder_hidden_states=None
• encoder_attention_mask=None
• past_key_values=None
• use_cache=None
• output_attentions=False
• output_hidden_states=False
• return_dict=True
• training=False
cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory
2024-07-04 14:00:46.035576: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:479] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-04 14:00:46.059361: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:10575] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-04 14:00:46.059424: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1442] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-04 14:00:46.075918: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-04 14:00:47.150909: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
File "/home/michaela95/NLP_immune_checkpoint_gene/BERT_testing/BioREx/src/utils/run_pubtator_eval.py", line 1557, in <module>
dump_pred_2_pubtator_file(in_pubtator_file = in_test_pubtator_file,
File "/home/michaela95/NLP_immune_checkpoint_gene/BERT_testing/BioREx/src/utils/run_pubtator_eval.py", line 206, in dump_pred_2_pubtator_file
add_relation_pairs_dict(
File "/home/michaela95/NLP_immune_checkpoint_gene/BERT_testing/BioREx/src/utils/run_pubtator_eval.py", line 83, in add_relation_pairs_dict
testdf = pd.read_csv(in_gold_tsv_file, sep="\t", index_col=0)
File "/media/data/biorex/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
return _read(filepath_or_buffer, kwds)
File "/media/data/biorex/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 620, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/media/data/biorex/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1620, in __init__
self._engine = self._make_engine(f, self.engine)
File "/media/data/biorex/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1898, in _make_engine
return mapping[engine](f, **self.options)
File "/media/data/biorex/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 93, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "parsers.pyx", line 581, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

Could you tell me what I should do in this situation?
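The file paths in the traceback above (keras/src/layers/layer.py, optree, python3.10) suggest a newer Keras and TensorFlow than the versions pinned in a requirements.txt quoted in another issue on this page. That is an inference from the log, not a confirmed cause. A minimal sketch for comparing the active environment against those pins, using only the standard library; `find_mismatches` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Subset of the pins from the requirements.txt quoted in another issue on this page.
PINS = {
    "transformers": "4.18.0",
    "tensorflow": "2.9.3",
    "pandas": "1.1.5",
    "numpy": "1.20.0",
}


def find_mismatches(pins):
    """Return {package: (expected, installed or None)} for every pin that is not met."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


for name, (expected, installed) in find_mismatches(PINS).items():
    print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

If this reports a different TensorFlow or transformers version, recreating the environment from the pinned requirements would be the first thing to try before debugging the model-loading error itself.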

Can you provide a script or an explanation to reproduce the scores in your paper?

In the paper (https://arxiv.org/abs/2306.11189), you report the scores below. Could you kindly provide a way to reproduce them?

For example, what scores should I get with the models provided in the repo, BioREx PubMedBERT model (Original) and BioREx BioLinkBERT model (Preferred)? And how can I compute them?

When I run the BioREx PubMedBERT model (Original) with the command you suggest, bash scripts/run_test_pred.sh, I get

Overall 966 652 263 314 0.7125683060109289 0.6749482401656315 0.6932482721956407
in the file given by the "out_result_file" parameter.
I think the last three values are precision, recall, and F1 score, but then I am not sure how to get 79.6 in this case (BioRED + 8 datasets in your paper).
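A quick check supports that reading. Assuming the four integers in the "Overall" row are the gold relation count, true positives, false positives, and false negatives (the column meanings are an assumption, not confirmed by the repository), recomputing the metrics reproduces the three decimals in the row:

```python
# Integers from the "Overall" row; their meaning is an assumption.
gold, tp, fp, fn = 966, 652, 263, 314

precision = tp / (tp + fp)  # 652 / 915
recall = tp / (tp + fn)     # 652 / 966, and tp + fn equals the gold count
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, f1)
```

So the row corresponds to an F1 of about 69.3 under this interpretation, which is what the question compares against the 79.6 reported in the paper.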

If I have misunderstood something, please let me know.
Also, if you could provide the specific parameters to reproduce the scores in the paper (including baseline approaches such as TL (Transfer Learning) or MTL (Multi-Task Learning)), it would be a great help to me.

run_biorex_exp.sh doesn't seem to use GPU

I tried to run the script, but I don't think it is using a GPU. I stopped the code before this line (https://github.com/ncbi/BioREx/blob/0031c52d2dd0d7fb8e3b84e256d8fbf73c4bd463/src/tf_wrapper.py#L199C28-L200C1) and checked whether it was using the GPU, and got this:

In [16]: self.model.layers[0].variables[0].device
Out[16]: '/job:localhost/replica:0/task:0/device:CPU:0'

Could you kindly check that the script or run file uses the GPU for training?

Also, I think it would be better to replace the static GPU variable in the script with a dynamic one, as in the other script. I included that change in pull request #4.

BioREx/scripts/run_biorex_exp.sh

Line 3 in 0031c52

cuda_visible_devices=0
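The "dynamic" variant can be sketched as follows. Another log on this page shows the prediction script being called as `bash scripts/run_test_pred.sh 0`, which suggests the device index is passed as the first argument there. The snippet shows the same idea in Python: it reads the index from the command line and falls back to a default. `select_gpu` is a hypothetical helper, not code from the repository or from the pull request, and it would have to run before TensorFlow is imported for the setting to take effect.

```python
import os


def select_gpu(argv, default="0"):
    """Set CUDA_VISIBLE_DEVICES from the first CLI argument, else use the default."""
    device = argv[1] if len(argv) > 1 else default
    os.environ["CUDA_VISIBLE_DEVICES"] = device
    return device


# Example: simulate "python train.py 1" (train.py is a placeholder script name).
chosen = select_gpu(["train.py", "1"])
print("CUDA_VISIBLE_DEVICES =", chosen)
```

Selecting the device does not by itself explain why variables end up on CPU:0; that still depends on TensorFlow being able to load its CUDA libraries in the environment.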

cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory

How can I solve this problem? I get this error when I run the model. I need help! Thanks!
