
Comments (6)

t-li commented on June 24, 2024

Hi,

Thanks for your interest.

To start with, you can follow the preprocessing steps in the readme, then jump to the "2nd round of finetuning" section and specify LOAD_FILE=tli8hf/robertabase-structured-tuning-srl-conll2012 in the bash script. This tells the training script to start from our model and continue training.

Also note that the --label_dict option must be set for your dataset (the file is dumped during the preprocessing step).

Let me know how that goes. I'd be more than happy to guide you through the process.


felgaet commented on June 24, 2024

Dear @t-li,
Thank you for your answer.

I would like to ask you two questions:

1)

If I jump to the "2nd round of fine-tuning" section and specify LOAD_FILE=tli8hf/robertabase-structured-tuning-srl-conll2012 in the bash script, like this:

GPUID=0
DROP=0.5
LR=0.00001
EPOCH=5
PERC=1
LOSS=crf,unique_role,frame_role,overlap_role
LAMBD=1,1,1,0.1
SEED=1
LOAD_FILE=tli8hf/robertabase-structured-tuning-srl-conll2012
MODEL=model-output
python3 -u train.py --gpuid $GPUID --dir ./data/srl/ --train_data conll2012.train.hdf5 --val_data conll2012.val.hdf5 \
	--train_res conll2012.train.orig_tok_grouped.txt,conll2012.train.frame.hdf5,conll2012.frame_pool.hdf5 \
	--val_res conll2012.val.orig_tok_grouped.txt,conll2012.val.frame.hdf5,conll2012.frame_pool.hdf5 \
	--label_dict conll2012.label.dict \
	--bert_type roberta-base --loss $LOSS --epochs $EPOCH --learning_rate $LR --dropout $DROP --lambd $LAMBD \
	--percent $PERC --seed $SEED \
	--load $LOAD_FILE --conll_output ${MODEL} --save_file $MODEL | tee ${MODEL}.txt

I get this error:

FileNotFoundError: [Errno 2] Unable to open file (unable to open file: name = 'tli8hf/robertabase-structured-tuning-srl-conll2012.hdf5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

The script appears to be looking for a local file with the ".hdf5" extension. Is it possible to use the Hugging Face model directly? Or to convert the HF model to an hdf5 file?
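
For what it's worth, a rough sketch of what such a conversion might involve is below. It assumes the hub checkpoint can be loaded with transformers' AutoModel and that one hdf5 dataset per state_dict entry is acceptable; the actual layout the repo's loader expects may differ, so treat this as an illustration rather than a working converter.

# Rough illustration only: dump a Hugging Face checkpoint's tensors into an .hdf5 file.
# The dataset layout (one dataset per state_dict key) is an assumption and may not
# match what train.py's --load path expects.
import h5py
from transformers import AutoModel

model = AutoModel.from_pretrained("tli8hf/robertabase-structured-tuning-srl-conll2012")

with h5py.File("robertabase-structured-tuning-srl-conll2012.hdf5", "w") as f:
    for name, tensor in model.state_dict().items():
        f.create_dataset(name, data=tensor.cpu().numpy())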

2)

Is there a way to predict roleset_ids too? I want to verify that the model identifies the correct sense in my dataset.

For example, in the sentence "I drink a glass of water", I would like to verify that the model identifies the correct sense, that is:
[ARG-0: I] [drink.01 drink] [ARG-1: a glass of water]
and not:
[ARG-0: I] [drink.02 drink] [ARG-1: a glass of water]

Is there a way to also output the roleset_id predictions?
That way, I could compare them to my gold standard.

Currently, the evaluation output is a table of precision, recall, and F1 score for arguments only. Is there also a way to print the roleset_id predictions to a file?

Thank you for your help.


t-li commented on June 24, 2024

Hi,

  1. Converting hdf5 and HF models back and forth can be tricky. Use this hdf5 model I just uploaded here.

  2. The roleset_ids are already part of the prediction. I referred to them as "frameset" in the readme (probably a misleading name).

  • If it is possible on your end, I recommend starting from the preprocessing step to get the frameset dumped. You will get a frameset.txt which lists all suffixes for each predicate, and this will participate in model training and prediction.
  • In modules/linear_classifier.py, the frame_layer outputs roleset_ids.
  • I am not sure if the perl eval script takes roleset_ids into account directly (actually I think it doesn't, since it's an argument-level F1). If it doesn't, you can use the frame_layer outputs to do some quick evaluations; see the sketch below.
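
For such a quick evaluation, something like the sketch below could work, assuming the frame_layer predictions and the gold roleset ids are dumped to two aligned text files with one id per predicate per line (the file names and format are hypothetical, not something the repo produces as-is):

# Minimal roleset-accuracy check over two aligned text files
# (hypothetical format: one roleset id per predicate per line, e.g. "drink.01").
def load_ids(path):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

pred = load_ids("pred_rolesets.txt")
gold = load_ids("gold_rolesets.txt")

assert len(pred) == len(gold), "prediction and gold files are not aligned"
correct = sum(p == g for p, g in zip(pred, gold))
print(f"roleset accuracy: {correct / len(gold):.4f} ({correct}/{len(gold)})")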


felgaet commented on June 24, 2024

Hi @t-li,
thanks for your reply.
Unfortunately, the hdf5 model you sent me seems to include the dictionary, frames, and labels as well. Since I am using an extended/customized PropBank resource, your model seems to be incompatible (tensor size issues). I got around the problem by re-training on the extended resource (PropBank + my annotations).

Another question: I need to use another version of RoBERTa (Hugging Face allenai/biomed_roberta_base). To do so, is it enough to specify it in the training script? (--bert_type allenai/biomed_roberta_base)


t-li commented on June 24, 2024

Sorry about the late reply. As long as it's in the Hugging Face hub, you can use it. Also track the mentions of bert_type in the code, as there are a few places that are hard-coded for roberta.
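
As a quick sanity check, you can first confirm the identifier resolves from the hub before wiring it into the training script; this only verifies that the checkpoint loads and says nothing about the hard-coded roberta spots:

# Confirm the hub id resolves and the weights load; this does not exercise any of
# the repo's roberta-specific code paths.
from transformers import AutoTokenizer, AutoModel

name = "allenai/biomed_roberta_base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)
print(type(model).__name__, model.config.hidden_size)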

BTW, for timely interaction, please email me directly if you have urgent issues that need fixing.


felgaet commented on June 24, 2024

Thanks!

