Comments (9)

wangxinyu0922 avatar wangxinyu0922 commented on September 23, 2024

I have just uploaded the PTB dataset to OneDrive.

For inference, you can create a file like this (with dummy tags in the 7th, 8th and 9th columns) and follow the instructions:

1\tBut\t_\t_\t_\t_\t_\t0\troot\t0:root
2\tI\t_\t_\t_\t_\t_\t0\troot\t0:root
3\tfound\t_\t_\t_\t_\t_\t0\troot\t0:root
4\tthe\t_\t_\t_\t_\t_\t0\troot\t0:root
5\tlocation\t_\t_\t_\t_\t_\t0\troot\t0:root
6\twonderful\t_\t_\t_\t_\t_\t0\troot\t0:root
7\tand\t_\t_\t_\t_\t_\t0\troot\t0:root
7.1\tfound\t_\t_\t_\t_\t_\t0\troot\t0:root
8\tthe\t_\t_\t_\t_\t_\t0\troot\t0:root
9\tneighbors\t_\t_\t_\t_\t_\t0\troot\t0:root
10\tvery\t_\t_\t_\t_\t_\t0\troot\t0:root
11\tkind\t_\t_\t_\t_\t_\t0\troot\t0:root
12\t.\t_\t_\t_\t_\t_\t0\troot\t0:root

from ace.

woshiyyya avatar woshiyyya commented on September 23, 2024

Hi Xinyu,

Thanks for uploading the data!

I created a folder named data and put a train.tsv file in it containing the demo case you provided.

Then I ran:
CUDA_VISIBLE_DEVICES=0 python train.py --config config/ptb_parsing_model.yaml --parse --target_dir data --keep_order

But I still got an error:

2022-09-07 02:59:16,391 Reading data from /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified
2022-09-07 02:59:16,391 Train: /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified/train_modified.conllu
2022-09-07 02:59:16,391 Test: /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified/test.conllu
2022-09-07 02:59:16,391 Dev: /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified/dev.conllu
Traceback (most recent call last):
  File "train.py", line 85, in <module>
    config = ConfigParser(config,all=args.all,zero_shot=args.zeroshot,other_shot=args.other,predict=args.predict)
  File "/projects/clio1/probing/ACE/flair/config_parser.py", line 63, in __init__
    self.corpus: ListCorpus=self.get_corpus
  File "/projects/clio1/probing/ACE/flair/config_parser.py", line 329, in get_corpus
    current_dataset=getattr(datasets,corpus)(tag_to_bioes=self.target)
  File "/projects/clio1/probing/ACE/flair/datasets.py", line 360, in __init__
    train = UniversalDependenciesDataset(data_folder/'train_modified.conllu', in_memory=in_memory, add_root=True)
  File "/projects/clio1/probing/ACE/flair/datasets.py", line 1006, in __init__
    assert path_to_conll_file.exists()
AssertionError

Do you know how to fix that?

wangxinyu0922 avatar wangxinyu0922 commented on September 23, 2024

Have you checked whether the dataset is in the correct place?

lizhou21 avatar lizhou21 commented on September 23, 2024

Hi Xinyu,
Is there something wrong with the data format provided?
I just found that the line token = Token(fields[1], head_id=int(fields[6])) raises ValueError: invalid literal for int() with base 10: '_'.

So I guess the columns (counting from 0) are:
column 0 is the token id,
column 1 is the token,
columns 2, 3, 4 and 5 are "_",
column 6 is 0 (dummy tag),
column 7 is "_",
column 8 is "root" (dummy tag),
column 9 is "0:root" (dummy tag).

Is that right?
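
One way to see why the posted sample triggers that ValueError is to check which value lands at index 6 once a line is split on tabs. The sketch below is only an illustration: the helper name find_bad_head_fields is made up for this example, and it assumes (from the quoted line) that the loader reads the head from fields[6]. It does not confirm which column layout ACE actually expects.

```python
from typing import List, Tuple

HEAD_INDEX = 6  # assumption: the field the loader passes to int(), per the quoted line


def find_bad_head_fields(conll_text: str, head_index: int = HEAD_INDEX) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for every token line whose head field is not an integer."""
    problems = []
    for line_no, line in enumerate(conll_text.splitlines(), start=1):
        # Blank lines separate sentences and '#' lines are comments; neither has a head field.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) <= head_index:
            problems.append((line_no, f"only {len(fields)} tab-separated fields"))
            continue
        try:
            int(fields[head_index])
        except ValueError:
            problems.append((line_no, f"field {head_index} is {fields[head_index]!r}, not an integer"))
    return problems


# First line of the sample as posted: five "_" fields follow the token.
posted = "\t".join(["1", "But", "_", "_", "_", "_", "_", "0", "root", "0:root"])
# The same token laid out as guessed in this comment: "0" sits at index 6.
guessed = "\t".join(["1", "But", "_", "_", "_", "_", "0", "_", "root", "0:root"])

print(find_bad_head_fields(posted))   # the posted layout is flagged at index 6
print(find_bad_head_fields(guessed))  # the guessed layout passes this check
```

Passing this check only means int(fields[6]) will not fail; whether the remaining columns are read the way this comment guesses is still an open question.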

lizhou21 avatar lizhou21 commented on September 23, 2024

After I changed the data format, I ran into the same problem.
Have you resolved it?

wangxinyu0922 avatar wangxinyu0922 commented on September 23, 2024

Have you made sure that the path /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified/train_modified.conllu exists? If not, you can download the data above and put it at this path.
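
A quick way to run that check before launching train.py is sketched below. The three file names are copied from the log output in this thread; the helper name missing_corpus_files is made up for this example, and the folder is assumed to live under the current user's home directory, as the logged path suggests.

```python
from pathlib import Path
from typing import Iterable, List

# File names taken from the "Train:", "Test:" and "Dev:" lines of the log.
EXPECTED_FILES = ("train_modified.conllu", "dev.conllu", "test.conllu")


def missing_corpus_files(data_folder: Path, names: Iterable[str] = EXPECTED_FILES) -> List[Path]:
    """Return the expected corpus files that do not exist under data_folder."""
    return [data_folder / name for name in names if not (data_folder / name).is_file()]


if __name__ == "__main__":
    # Folder as printed in the log ("Reading data from ..."); adjust for your own setup.
    folder = Path.home() / ".flair" / "datasets" / "ptb_3.3.0_modified"
    for path in missing_corpus_files(folder):
        print(f"missing: {path}")
```

If any path is printed, the assertion on path_to_conll_file.exists() in datasets.py would be expected to fail for that file.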

lizhou21 avatar lizhou21 commented on September 23, 2024

Yes, I have done that and solved this problem. It also needs dev/test datasets in the target_dir.
Now I can parse the dataset on CPU (very slow), but it fails when I run it on GPU.

It shows me:

Traceback (most recent call last):
  File "train.py", line 378, in <module>
    train_eval_result, train_loss = student.evaluate(loader,out_path=Path('outputs/train.'+'.'+tar_file_name+'.conllu'),embeddings_storage_mode="none",prediction_mode=True)
  File "/DM_parser/ACE/flair/models/dependency_model.py", line 1174, in evaluate
    arc_scores, rel_scores = self.forward(batch, prediction_mode=prediction_mode)
  File "/DM_parser/ACE/flair/models/dependency_model.py", line 597, in forward
    self.embeddings.embed(sentences,embedding_mask=self.selection)
  File "/DM_parser/ACE/flair/embeddings.py", line 185, in embed
    embedding.embed(sentences)
  File "/DM_parser/ACE/flair/embeddings.py", line 97, in embed
    self._add_embeddings_internal(sentences)
  File "/DM_parser/ACE/flair/embeddings.py", line 2960, in _add_embeddings_internal
    self._add_embeddings_to_sentences(sentences)
  File "/DM_parser/ACE/flair/embeddings.py", line 3155, in _add_embeddings_to_sentences
    sequence_output, pooled_output, hidden_states = self.model(input_ids, attention_mask=mask, inputs_embeds = inputs_embeds)
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/transformers/modeling_bert.py", line 753, in forward
    input_ids=input_ids, position_ids=position_ids, token_type_ids=token_type_ids, inputs_embeds=inputs_embeds
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/transformers/modeling_roberta.py", line 68, in forward
    input_ids, token_type_ids=token_type_ids, position_ids=position_ids, inputs_embeds=inputs_embeds
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/transformers/modeling_bert.py", line 178, in forward
    inputs_embeds = self.word_embeddings(input_ids)
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 114, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/torch/nn/functional.py", line 1484, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_index_select

I tried changing

sequence_output, pooled_output, hidden_states = self.model(input_ids, attention_mask=mask, inputs_embeds = inputs_embeds)

to

sequence_output, pooled_output, hidden_states = self.model(input_ids.cuda(), attention_mask=mask.cuda(), inputs_embeds = inputs_embeds)

but it still shows the same error.

T T,

wangxinyu0922 avatar wangxinyu0922 commented on September 23, 2024

You may try to uncomment these lines

ACE/train.py

Lines 226 to 238 in 7033e91

# if student.selection[idx] == 1:
#     embedding.to(flair.device)
#     if 'elmo' in embedding.name:
#         # embedding.reset_elmo()
#         # continue
#         # pdb.set_trace()
#         embedding.ee.elmo_bilm.cuda(device=embedding.ee.cuda_device)
#         states=[x.to(flair.device) for x in embedding.ee.elmo_bilm._elmo_lstm._states]
#         embedding.ee.elmo_bilm._elmo_lstm._states = states
#         for idx in range(len(embedding.ee.elmo_bilm._elmo_lstm._states)):
#             embedding.ee.elmo_bilm._elmo_lstm._states[idx]=embedding.ee.elmo_bilm._elmo_lstm._states[idx].to(flair.device)
# else:
    embedding.to('cpu')

lizhou21 avatar lizhou21 commented on September 23, 2024

Hi Xinyu, I have resolved the problem and successfully applied ACE to parse my data. Thanks for your help.
