
word_ordering's People

Contributors

allenschmaltz


word_ordering's Issues

create_dataset results in an empty sequence and fails to continue

I am using Python 2.7.18 and NLTK 3.4.5 (I also tried 3.2.4, the oldest version I could find online).
The error occurs before Java is run, so I am skipping the Java details for now.

I am following dataset_creation.txt in the dataset/preprocessing folder, and everything went well until I ran the following command: bash create_dataset.sh

Currently processing 0 in train
Currently processing 1000 in train
Currently processing 2000 in train
Currently processing 3000 in train
Currently processing 4000 in train
Currently processing 5000 in train
Currently processing 6000 in train
Currently processing 7000 in train
Currently processing 8000 in train
Currently processing 9000 in train
Currently processing 10000 in train
Currently processing 11000 in train
Currently processing 12000 in train
Currently processing 13000 in train
Currently processing 14000 in train
Currently processing 15000 in train
Currently processing 16000 in train
Currently processing 17000 in train
Currently processing 18000 in train
Currently processing 19000 in train
Currently processing 20000 in train
Currently processing 21000 in train
Currently processing 22000 in train
Currently processing 23000 in train
Currently processing 24000 in train
Currently processing 25000 in train
Currently processing 26000 in train
Currently processing 27000 in train
Currently processing 28000 in train
Currently processing 29000 in train
Currently processing 30000 in train
Currently processing 31000 in train
Currently processing 32000 in train
Currently processing 33000 in train
Currently processing 34000 in train
Currently processing 35000 in train
Currently processing 36000 in train
Currently processing 37000 in train
Currently processing 38000 in train
Currently processing 39000 in train
Currently processing 0 in valid
Currently processing 1000 in valid
Currently processing 0 in test
Currently processing 1000 in test
Currently processing 2000 in test
Train
Total bag size: 672894, Average size of item in bag 1.411720
Base NP count: 228399, Average Base NP length: 2.212983, Token count: 949938, Sentence count: 39832
Valid
Base NP count: 9536, Average Base NP length: 2.273700, Token count: 40104, Sentence count: 1700
Test
Base NP count: 13457, Average Base NP length: 2.192465, Token count: 56674, Sentence count: 2416
split_fileid: wsj_02_dep.txt'
[]
['Hani', 'Zayadi', 'was', 'appointed', 'president', 'and', 'chief', 'executive', 'officer', 'of', 'this', 'financially', 'troubled', 'department', 'store', 'chain', ',', 'effective', 'Nov.', '15', ',', 'succeeding', 'Frank', 'Robertson', ',', 'who', 'is', 'retiring', 'early', '.']
/media/colman/testmount/projects/trans/word_ordering/data/preprocessing/collapse_dependency_trees_based_on_bnps.py(470)filter_tree()
-> renumbered_filtered_collapsed_tree = collapse_npsyms(list(renumbered_filtered_collapsed_tree), list(wsj_filtered_tokens_one_sent_npsyms), ctr)
(Pdb)

I looked around, and the problem seems to be that renumbered_filtered_collapsed_tree is an empty list. Tracing back further, filtered_collapsed_tree is also empty, and filtered_token at line 375 returns None.
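One way to narrow this down is to check, sentence by sentence, whether the dependency input for a sentence is empty even though its tokens are present (the log above prints `[]` right before a full token list). The sketch below is hypothetical: read_dependency_sentences and find_empty_trees are illustrative helpers, not functions from the repository, and it assumes the dependency files (such as wsj_02_dep.txt) are CoNLL-style, with one tab-separated token per line and one blank line after each sentence. The real format may differ.

```python
from __future__ import print_function


def read_dependency_sentences(lines):
    """Split CoNLL-style lines into per-sentence lists of column lists.

    Assumed (hypothetical) format: one token per line, tab-separated
    columns, and exactly one blank line after each sentence. Two blank
    lines in a row therefore yield an empty sentence, which is the
    symptom we want to surface.
    """
    sentences = []
    current = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, even if it has no rows.
            sentences.append(current)
            current = []
        else:
            current.append(line.split("\t"))
    if current:  # the file may not end with a blank line
        sentences.append(current)
    return sentences


def find_empty_trees(dep_sentences, token_sentences):
    """Return indices of sentences that have tokens but no dependency rows."""
    empty = []
    for i, tokens in enumerate(token_sentences):
        rows = dep_sentences[i] if i < len(dep_sentences) else []
        if tokens and not rows:
            empty.append(i)
    return empty


if __name__ == "__main__":
    sample = [
        "1\tHani\t_\tNNP\t2\n",
        "2\tZayadi\t_\tNNP\t0\n",
        "\n",
        "\n",  # second blank line: sentence 1 has no rows
        "1\tFoo\t_\tNN\t0\n",
    ]
    dep = read_dependency_sentences(sample)
    tokens = [["Hani", "Zayadi"], ["Nov.", "15"], ["Foo"]]
    print(find_empty_trees(dep, tokens))  # [1]
```

Reading the lines through a plain iterable keeps the check independent of file paths; in practice you would pass an open file object for the dependency file of the split being processed.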

If I type continue at the pdb prompt and let it run, it ends in an AssertionError, as shown below:
(Pdb) continue
Traceback (most recent call last):
  File "ptb_to_word_ordering_dataset.py", line 42, in <module>
    sys.exit(main(sys.argv[1:]))
  File "ptb_to_word_ordering_dataset.py", line 35, in main
    save_dependency_trees(train_words, valid_words, test_words, train_bnps, valid_bnps, test_bnps, dependency_dir, filtered_dependency_dir, True)
  File "/media/colman/testmount/projects/trans/word_ordering/data/preprocessing/collapse_dependency_trees_based_on_bnps.py", line 528, in save_dependency_trees
    full_trees, filtered_trees = get_filtered_dependency_trees(dependency_dir, split_label, words, bnps)
  File "/media/colman/testmount/projects/trans/word_ordering/data/preprocessing/collapse_dependency_trees_based_on_bnps.py", line 505, in get_filtered_dependency_trees
    filter_tree_result, lines_with_issues = filter_tree(full_tree, wsj_filtered_tokens[ctr], ctr, wsj_filtered_tokens_npsyms[ctr])
  File "/media/colman/testmount/projects/trans/word_ordering/data/preprocessing/collapse_dependency_trees_based_on_bnps.py", line 470, in filter_tree
    renumbered_filtered_collapsed_tree = collapse_npsyms(list(renumbered_filtered_collapsed_tree), list(wsj_filtered_tokens_one_sent_npsyms), ctr)
  File "/media/colman/testmount/projects/trans/word_ordering/data/preprocessing/collapse_dependency_trees_based_on_bnps.py", line 234, in collapse_npsyms
    assert number_of_roots == 1, renumbered_filtered_collapsed_tree
AssertionError: []
As a result, no files are saved for the later preprocessing steps.
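The assertion in collapse_npsyms requires exactly one root, and an empty tree has zero roots, so any sentence whose collapsed tree comes back empty fails with the bare `AssertionError: []`. The sketch below illustrates that logic with a more descriptive check. It is hypothetical: count_roots and check_single_root are not part of the repository, and it assumes each tree row is an (index, token, head_index) tuple where head_index 0 marks the root, as in CoNLL-style dependency output.

```python
from __future__ import print_function


def count_roots(tree):
    """Count rows whose head index is 0 (the usual root convention).

    Hypothetical row format: (index, token, head_index).
    """
    return sum(1 for _, _, head in tree if head == 0)


def check_single_root(tree, sentence_id):
    """Raise a descriptive error instead of a bare AssertionError.

    An empty tree is reported separately, since it has zero roots and
    points at missing input rather than a malformed tree.
    """
    if not tree:
        raise ValueError("sentence %d: dependency tree is empty; check that "
                         "its dependency input was generated" % sentence_id)
    n = count_roots(tree)
    if n != 1:
        raise ValueError("sentence %d: expected 1 root, found %d"
                         % (sentence_id, n))


if __name__ == "__main__":
    tree = [(1, "Hani", 2), (2, "Zayadi", 3), (3, "was", 0)]
    check_single_root(tree, 0)  # passes silently: exactly one root
    print(count_roots([]))  # 0, which is why the assertion fails on []
```

Separating the empty case from the wrong-root-count case makes it clearer whether the input for a sentence is missing entirely or the tree was built incorrectly.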
