Comments (9)
I have just uploaded the PTB dataset to OneDrive.
For inference, you can create a file like this (with dummy tags in the 7th, 8th, and 9th columns) and follow the instructions:
1\tBut\t_\t_\t_\t_\t_\t0\troot\t0:root
2\tI\t_\t_\t_\t_\t_\t0\troot\t0:root
3\tfound\t_\t_\t_\t_\t_\t0\troot\t0:root
4\tthe\t_\t_\t_\t_\t_\t0\troot\t0:root
5\tlocation\t_\t_\t_\t_\t_\t0\troot\t0:root
6\twonderful\t_\t_\t_\t_\t_\t0\troot\t0:root
7\tand\t_\t_\t_\t_\t_\t0\troot\t0:root
7.1\tfound\t_\t_\t_\t_\t_\t0\troot\t0:root
8\tthe\t_\t_\t_\t_\t_\t0\troot\t0:root
9\tneighbors\t_\t_\t_\t_\t_\t0\troot\t0:root
10\tvery\t_\t_\t_\t_\t_\t0\troot\t0:root
11\tkind\t_\t_\t_\t_\t_\t0\troot\t0:root
12\t.\t_\t_\t_\t_\t_\t0\troot\t0:root
from ace.
Hi Xinyu,
Thanks for uploading the data!
I created a folder named data and put a train.tsv file in it containing the demo case you provided.
Then I ran:
CUDA_VISIBLE_DEVICES=0 python train.py --config config/ptb_parsing_model.yaml --parse --target_dir data --keep_order
But I still got an error:
2022-09-07 02:59:16,391 Reading data from /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified
2022-09-07 02:59:16,391 Train: /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified/train_modified.conllu
2022-09-07 02:59:16,391 Test: /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified/test.conllu
2022-09-07 02:59:16,391 Dev: /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified/dev.conllu
Traceback (most recent call last):
  File "train.py", line 85, in <module>
    config = ConfigParser(config,all=args.all,zero_shot=args.zeroshot,other_shot=args.other,predict=args.predict)
  File "/projects/clio1/probing/ACE/flair/config_parser.py", line 63, in __init__
    self.corpus: ListCorpus=self.get_corpus
  File "/projects/clio1/probing/ACE/flair/config_parser.py", line 329, in get_corpus
    current_dataset=getattr(datasets,corpus)(tag_to_bioes=self.target)
  File "/projects/clio1/probing/ACE/flair/datasets.py", line 360, in __init__
    train = UniversalDependenciesDataset(data_folder/'train_modified.conllu', in_memory=in_memory, add_root=True)
  File "/projects/clio1/probing/ACE/flair/datasets.py", line 1006, in __init__
    assert path_to_conll_file.exists()
AssertionError
Do you know how to fix that?
from ace.
Have you checked whether the dataset is in the correct place?
from ace.
Hi Xinyu,
Is there something wrong with the data format provided?
I just found that the code token = Token(fields[1], head_id=int(fields[6])) raises ValueError: invalid literal for int() with base 10: '_'.
So I guess that, counting from 0:
the 0th column is the token ID,
the 1st column is the token,
the 2nd, 3rd, 4th, and 5th columns are "_",
the 6th column is 0 (dummy tag),
the 7th column is "_",
the 8th column is "root" (dummy tag),
the 9th column is "0:root" (dummy tag).
Is that right?
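A minimal sketch of generating such a file in Python, assuming the column layout guessed above is what the loader expects (the thread does not confirm it). The helper name make_dummy_conllu is hypothetical and not part of ACE; the only thing taken from the code is that fields[6] must parse as an integer:

```python
from typing import List


def make_dummy_conllu(tokens: List[str]) -> str:
    """Build one tab-separated sentence block with dummy head/relation tags.

    Assumed layout (0-indexed): id, token, four "_" placeholders,
    dummy head "0", "_", dummy relation "root", dummy "0:root".
    """
    rows = []
    for i, tok in enumerate(tokens, start=1):
        fields = [str(i), tok, "_", "_", "_", "_", "0", "_", "root", "0:root"]
        rows.append("\t".join(fields))
    # A blank line terminates the sentence block.
    return "\n".join(rows) + "\n\n"


block = make_dummy_conllu(["But", "I", "found", "the", "location", "wonderful", "."])
first_row = block.splitlines()[0].split("\t")
# fields[6] now holds "0", so int(fields[6]) no longer raises ValueError.
print(len(first_row), int(first_row[6]))
```

Writing the returned string to a file in the target directory should give one sentence per block; multiple sentences can be concatenated.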
from ace.
After changing the data format, I also hit the same problem.
Have you resolved it?
from ace.
Have you made sure that the path /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified/train_modified.conllu
exists? If not, you can download the data mentioned above and put it at this path.
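A quick way to check this before launching train.py could look like the sketch below. The file names are taken from the log output above; the helper missing_files is hypothetical and not part of ACE or flair:

```python
from pathlib import Path
from typing import List


def missing_files(folder: Path, names: List[str]) -> List[Path]:
    """Return the expected files that do not exist under folder."""
    return [folder / name for name in names if not (folder / name).exists()]


# Folder and file names as they appear in the log; adjust to your setup.
data_folder = Path.home() / ".flair" / "datasets" / "ptb_3.3.0_modified"
expected = ["train_modified.conllu", "dev.conllu", "test.conllu"]

for path in missing_files(data_folder, expected):
    print("missing:", path)
```

If nothing is printed, all three files are in place and the assert in the dataset loader should pass.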
from ace.
Yes, I have done that, and I solved this problem: the target_dir also needs to contain dev/test datasets.
Now I can parse the dataset on the CPU (very slow), but it fails when I run it on the GPU.
It shows:
Traceback (most recent call last):
  File "train.py", line 378, in <module>
    train_eval_result, train_loss = student.evaluate(loader,out_path=Path('outputs/train.'+'.'+tar_file_name+'.conllu'),embeddings_storage_mode="none",prediction_mode=True)
  File "/DM_parser/ACE/flair/models/dependency_model.py", line 1174, in evaluate
    arc_scores, rel_scores = self.forward(batch, prediction_mode=prediction_mode)
  File "/DM_parser/ACE/flair/models/dependency_model.py", line 597, in forward
    self.embeddings.embed(sentences,embedding_mask=self.selection)
  File "/DM_parser/ACE/flair/embeddings.py", line 185, in embed
    embedding.embed(sentences)
  File "/DM_parser/ACE/flair/embeddings.py", line 97, in embed
    self._add_embeddings_internal(sentences)
  File "/DM_parser/ACE/flair/embeddings.py", line 2960, in _add_embeddings_internal
    self._add_embeddings_to_sentences(sentences)
  File "/DM_parser/ACE/flair/embeddings.py", line 3155, in _add_embeddings_to_sentences
    sequence_output, pooled_output, hidden_states = self.model(input_ids, attention_mask=mask, inputs_embeds = inputs_embeds)
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/transformers/modeling_bert.py", line 753, in forward
    input_ids=input_ids, position_ids=position_ids, token_type_ids=token_type_ids, inputs_embeds=inputs_embeds
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/transformers/modeling_roberta.py", line 68, in forward
    input_ids, token_type_ids=token_type_ids, position_ids=position_ids, inputs_embeds=inputs_embeds
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/transformers/modeling_bert.py", line 178, in forward
    inputs_embeds = self.word_embeddings(input_ids)
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 114, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/torch/nn/functional.py", line 1484, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_index_select
I tried changing
sequence_output, pooled_output, hidden_states = self.model(input_ids, attention_mask=mask, inputs_embeds = inputs_embeds)
to
sequence_output, pooled_output, hidden_states = self.model(input_ids.cuda(), attention_mask=mask.cuda(), inputs_embeds = inputs_embeds)
but it still shows the same error.
T_T
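One possible reading of the message: it names argument #1 'self', which in torch.embedding(weight, input, ...) appears to be the embedding weight, so the weights may still be on the CPU while the input ids are already on the GPU. That would explain why calling .cuda() on the inputs changes nothing. The sketch below is a standalone PyTorch illustration of that situation, not ACE code; the tiny nn.Embedding stands in for the transformer's word embeddings:

```python
import torch
import torch.nn as nn

# Fall back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A freshly constructed module keeps its weights on the CPU.
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)
input_ids = torch.tensor([[1, 2, 3]], device=device)

# Moving only the inputs is not enough; the module's weights
# have to live on the same device as the indices.
embedding = embedding.to(device)

weight_device = next(embedding.parameters()).device
output = embedding(input_ids)
print(weight_device.type, input_ids.device.type, tuple(output.shape))
```

Printing next(model.parameters()).device for the embedding model right before the failing call would show whether this is what happens here.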
from ace.
You may try to uncomment these lines
Lines 226 to 238 in 7033e91
from ace.
Hi Xinyu, I have resolved the problem and successfully applied ACE to parse my data. Thanks for your help.
from ace.