allenschmaltz / word_ordering
This repository includes code for replicating the results in the paper "Word Ordering Without Syntax" (2016).
I am using Python 2.7.18 and nltk 3.4.5 (I also tried 3.2.4, the oldest copy I could find on the web).
The error occurs before the Java step runs, so I am skipping that step for now.
I am following dataset_creation.txt from the dataset/preprocessing folder, and everything went well until the following command: bash create_dataset.sh
Currently processing 0 in train
Currently processing 1000 in train
Currently processing 2000 in train
Currently processing 3000 in train
Currently processing 4000 in train
Currently processing 5000 in train
Currently processing 6000 in train
Currently processing 7000 in train
Currently processing 8000 in train
Currently processing 9000 in train
Currently processing 10000 in train
Currently processing 11000 in train
Currently processing 12000 in train
Currently processing 13000 in train
Currently processing 14000 in train
Currently processing 15000 in train
Currently processing 16000 in train
Currently processing 17000 in train
Currently processing 18000 in train
Currently processing 19000 in train
Currently processing 20000 in train
Currently processing 21000 in train
Currently processing 22000 in train
Currently processing 23000 in train
Currently processing 24000 in train
Currently processing 25000 in train
Currently processing 26000 in train
Currently processing 27000 in train
Currently processing 28000 in train
Currently processing 29000 in train
Currently processing 30000 in train
Currently processing 31000 in train
Currently processing 32000 in train
Currently processing 33000 in train
Currently processing 34000 in train
Currently processing 35000 in train
Currently processing 36000 in train
Currently processing 37000 in train
Currently processing 38000 in train
Currently processing 39000 in train
Currently processing 0 in valid
Currently processing 1000 in valid
Currently processing 0 in test
Currently processing 1000 in test
Currently processing 2000 in test
Train
Total bag size: 672894, Average size of item in bag 1.411720
Base NP count: 228399, Average Base NP length: 2.212983, Token count: 949938, Sentence count: 39832
Valid
Base NP count: 9536, Average Base NP length: 2.273700, Token count: 40104, Sentence count: 1700
Test
Base NP count: 13457, Average Base NP length: 2.192465, Token count: 56674, Sentence count: 2416
split_fileid: wsj_02_dep.txt'
[]
['Hani', 'Zayadi', 'was', 'appointed', 'president', 'and', 'chief', 'executive', 'officer', 'of', 'this', 'financially', 'troubled', 'department', 'store', 'chain', ',', 'effective', 'Nov.', '15', ',', 'succeeding', 'Frank', 'Robertson', ',', 'who', 'is', 'retiring', 'early', '.']
/media/colman/testmount/projects/trans/word_ordering/data/preprocessing/collapse_dependency_trees_based_on_bnps.py(470)filter_tree()
-> renumbered_filtered_collapsed_tree = collapse_npsyms(list(renumbered_filtered_collapsed_tree), list(wsj_filtered_tokens_one_sent_npsyms), ctr)
(Pdb)
I looked around, and the problem seems to be that renumbered_filtered_collapsed_tree is an empty list. Tracing back further, filtered_collapsed_tree is also empty, and filtered_token at line 375 is None.
If I type continue at the pdb prompt and let it run, it ends in an assertion error, as shown below:
(Pdb) continue
Traceback (most recent call last):
  File "ptb_to_word_ordering_dataset.py", line 42, in <module>
    sys.exit(main(sys.argv[1:]))
  File "ptb_to_word_ordering_dataset.py", line 35, in main
    save_dependency_trees(train_words, valid_words, test_words, train_bnps, valid_bnps, test_bnps, dependency_dir, filtered_dependency_dir, True)
  File "/media/colman/testmount/projects/trans/word_ordering/data/preprocessing/collapse_dependency_trees_based_on_bnps.py", line 528, in save_dependency_trees
    full_trees, filtered_trees = get_filtered_dependency_trees(dependency_dir, split_label, words, bnps)
  File "/media/colman/testmount/projects/trans/word_ordering/data/preprocessing/collapse_dependency_trees_based_on_bnps.py", line 505, in get_filtered_dependency_trees
    filter_tree_result, lines_with_issues = filter_tree(full_tree, wsj_filtered_tokens[ctr], ctr, wsj_filtered_tokens_npsyms[ctr])
  File "/media/colman/testmount/projects/trans/word_ordering/data/preprocessing/collapse_dependency_trees_based_on_bnps.py", line 470, in filter_tree
    renumbered_filtered_collapsed_tree = collapse_npsyms(list(renumbered_filtered_collapsed_tree), list(wsj_filtered_tokens_one_sent_npsyms), ctr)
  File "/media/colman/testmount/projects/trans/word_ordering/data/preprocessing/collapse_dependency_trees_based_on_bnps.py", line 234, in collapse_npsyms
    assert number_of_roots == 1, renumbered_filtered_collapsed_tree
AssertionError: []
As a result, no file is saved for further preprocessing.
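To narrow down which sentences trigger the failed root-count assertion, a small diagnostic along the following lines might help. This is only a sketch and not code from the repository: the names count_roots and check_tree are hypothetical, and it assumes each tree is a list of rows whose head field sits at index 1 and equals "0" for the root. The actual row layout in collapse_dependency_trees_based_on_bnps.py may differ, so the column index and root marker would need adjusting.

```python
# Hypothetical diagnostic helpers; not part of the word_ordering repository.
# Assumption: a tree is a list of rows, the head field is at index 1,
# and a head value of "0" marks the root.

def count_roots(tree, head_column=1, root_head="0"):
    """Count rows whose head field marks them as the root."""
    return sum(1 for row in tree if str(row[head_column]) == root_head)


def check_tree(tree, sentence_index):
    """Return None if the tree has exactly one root, else a short message."""
    if not tree:
        return "sentence %d: tree is empty" % sentence_index
    roots = count_roots(tree)
    if roots != 1:
        return "sentence %d: expected 1 root, found %d" % (sentence_index, roots)
    return None
```

Calling check_tree on each tree before it is collapsed, and printing any message that is not None, would list every problematic sentence index in one pass instead of stopping at the first one.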
On running the 4th step of dataset creation, I'm encountering the following error:
No such file or directory: '../../datasets/zgen_data/test_ref.txt'
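Before rerunning the step, it may be worth confirming which input files are actually missing, since an earlier step that failed would leave them unwritten. The sketch below is an assumption-laden helper, not part of the repository: find_missing is a hypothetical name, and the only path known from the error above is ../../datasets/zgen_data/test_ref.txt, so any other expected files would have to be added by hand.

```python
import os

# Hypothetical helper; not part of the word_ordering repository.
def find_missing(paths, base_dir="."):
    """Return the subset of paths (resolved against base_dir) that are not files."""
    missing = []
    for relative_path in paths:
        full_path = os.path.join(base_dir, relative_path)
        if not os.path.isfile(full_path):
            missing.append(full_path)
    return missing


if __name__ == "__main__":
    # Only this path is known from the error message; extend the list as needed.
    expected = ["../../datasets/zgen_data/test_ref.txt"]
    for path in find_missing(expected):
        print("missing: %s" % path)
```

Run from the directory where the failing step is launched, so that the relative path resolves the same way it does in the script.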