The word_ordering from allenschmaltz

create_dataset results in an empty sequence and fails to continue

I am using Python 2.7.18 and NLTK 3.4.5 (I also tried 3.2.4, the oldest copy I could find on the web).
The error occurs before the Java step runs, so I am skipping that step for now.

I am following dataset_creation.txt from the dataset/preprocessing folder, and everything went well until the following command: bash create_dataset.sh

Currently processing 0 in train
Currently processing 1000 in train
Currently processing 2000 in train
Currently processing 3000 in train
Currently processing 4000 in train
Currently processing 5000 in train
Currently processing 6000 in train
Currently processing 7000 in train
Currently processing 8000 in train
Currently processing 9000 in train
Currently processing 10000 in train
Currently processing 11000 in train
Currently processing 12000 in train
Currently processing 13000 in train
Currently processing 14000 in train
Currently processing 15000 in train
Currently processing 16000 in train
Currently processing 17000 in train
Currently processing 18000 in train
Currently processing 19000 in train
Currently processing 20000 in train
Currently processing 21000 in train
Currently processing 22000 in train
Currently processing 23000 in train
Currently processing 24000 in train
Currently processing 25000 in train
Currently processing 26000 in train
Currently processing 27000 in train
Currently processing 28000 in train
Currently processing 29000 in train
Currently processing 30000 in train
Currently processing 31000 in train
Currently processing 32000 in train
Currently processing 33000 in train
Currently processing 34000 in train
Currently processing 35000 in train
Currently processing 36000 in train
Currently processing 37000 in train
Currently processing 38000 in train
Currently processing 39000 in train
Currently processing 0 in valid
Currently processing 1000 in valid
Currently processing 0 in test
Currently processing 1000 in test
Currently processing 2000 in test
Train
Total bag size: 672894, Average size of item in bag 1.411720
Base NP count: 228399, Average Base NP length: 2.212983, Token count: 949938, Sentence count: 39832
Valid
Base NP count: 9536, Average Base NP length: 2.273700, Token count: 40104, Sentence count: 1700
Test
Base NP count: 13457, Average Base NP length: 2.192465, Token count: 56674, Sentence count: 2416
split_fileid: wsj_02_dep.txt'
[]
['Hani', 'Zayadi', 'was', 'appointed', 'president', 'and', 'chief', 'executive', 'officer', 'of', 'this', 'financially', 'troubled', 'department', 'store', 'chain', ',', 'effective', 'Nov.', '15', ',', 'succeeding', 'Frank', 'Robertson', ',', 'who', 'is', 'retiring', 'early', '.']
/media/colman/testmount/projects/trans/word_ordering/data/preprocessing/collapse_dependency_trees_based_on_bnps.py(470)filter_tree()
-> renumbered_filtered_collapsed_tree = collapse_npsyms(list(renumbered_filtered_collapsed_tree), list(wsj_filtered_tokens_one_sent_npsyms), ctr)
(Pdb)

I looked around, and the problem seems to be that renumbered_filtered_collapsed_tree is an empty list. Tracing back further, filtered_collapsed_tree is also empty, and the filtered_token at line 375 comes back as None.

If I type continue at the pdb prompt and let it run, it ends in an assertion error, as shown below:
(Pdb) continue
Traceback (most recent call last):
  File "ptb_to_word_ordering_dataset.py", line 42, in <module>
    sys.exit(main(sys.argv[1:]))
  File "ptb_to_word_ordering_dataset.py", line 35, in main
    save_dependency_trees(train_words, valid_words, test_words, train_bnps, valid_bnps, test_bnps, dependency_dir, filtered_dependency_dir, True)
  File "/media/colman/testmount/projects/trans/word_ordering/data/preprocessing/collapse_dependency_trees_based_on_bnps.py", line 528, in save_dependency_trees
    full_trees, filtered_trees = get_filtered_dependency_trees(dependency_dir, split_label, words, bnps)
  File "/media/colman/testmount/projects/trans/word_ordering/data/preprocessing/collapse_dependency_trees_based_on_bnps.py", line 505, in get_filtered_dependency_trees
    filter_tree_result, lines_with_issues = filter_tree(full_tree, wsj_filtered_tokens[ctr], ctr, wsj_filtered_tokens_npsyms[ctr])
  File "/media/colman/testmount/projects/trans/word_ordering/data/preprocessing/collapse_dependency_trees_based_on_bnps.py", line 470, in filter_tree
    renumbered_filtered_collapsed_tree = collapse_npsyms(list(renumbered_filtered_collapsed_tree), list(wsj_filtered_tokens_one_sent_npsyms), ctr)
  File "/media/colman/testmount/projects/trans/word_ordering/data/preprocessing/collapse_dependency_trees_based_on_bnps.py", line 234, in collapse_npsyms
    assert number_of_roots == 1, renumbered_filtered_collapsed_tree
AssertionError: []
As a result, no file is saved for further preprocessing.
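
To narrow down which sentences hit the single-root assertion, a small diagnostic along these lines might help. This is only a sketch under assumptions: the repository's real tree layout is not shown in this report, so the (index, token, head) tuples, the use of head 0 for the root, and the names count_roots and find_bad_trees are all hypothetical and would need adapting to the actual data structures.

```python
def count_roots(tree, root_head=0):
    """Count entries whose head equals root_head.

    ASSUMPTION: each entry is an (index, token, head) tuple and the
    root is marked with head == 0. The real layout may differ.
    """
    return sum(1 for (_idx, _tok, head) in tree if head == root_head)


def find_bad_trees(trees):
    """Return (sentence_index, reason) for every tree that is empty
    or does not have exactly one root."""
    problems = []
    for i, tree in enumerate(trees):
        if not tree:
            problems.append((i, "empty"))
        else:
            roots = count_roots(tree)
            if roots != 1:
                problems.append((i, "roots=%d" % roots))
    return problems


if __name__ == "__main__":
    # Hypothetical toy input: one good tree, one empty, one with two roots.
    sample = [
        [(1, "Hani", 2), (2, "was", 0)],
        [],
        [(1, "a", 0), (2, "b", 0)],
    ]
    for sent_idx, reason in find_bad_trees(sample):
        print("sentence %d: %s" % (sent_idx, reason))
```

Collecting the failing sentence indices this way, instead of stopping at the first assertion, would show whether every sentence in the split comes out empty or only some of them.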

Encountering an error in dataset creation

On running the 4th step of dataset creation, I'm encountering the following error:
No such file or directory: '../../datasets/zgen_data/test_ref.txt'
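
Because the path in the message is relative, a quick check of how it resolves from the current working directory can show whether the file is truly missing or the script is just being run from a different folder. The following is a minimal sketch: the helper name missing_paths is hypothetical, and the only path used is the one quoted in the error.

```python
import os


def missing_paths(paths, base_dir="."):
    """Return the normalized paths (resolved against base_dir)
    that do not exist as regular files."""
    missing = []
    for p in paths:
        full = os.path.normpath(os.path.join(base_dir, p))
        if not os.path.isfile(full):
            missing.append(full)
    return missing


if __name__ == "__main__":
    # Path taken from the error message above.
    expected = ["../../datasets/zgen_data/test_ref.txt"]
    print("working directory: %s" % os.getcwd())
    for path in missing_paths(expected):
        print("missing: %s" % os.path.abspath(path))
```

If the file is reported missing even when run from the folder the instructions expect, an earlier step that should have produced it most likely did not complete.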
