Supplementary materials for the paper "DPWord2Vec: Better Representation of Design Patterns in Semantics"
File Instructions
- crowdsourced corpus
- question_posts.txt, answer_post_ids.txt: format per line:
"Stack_Overflow_post_id<involved_design_pattern_1<involved_design_pattern_2< ..."
- description corpus
- descriptions.txt: description of each design pattern, separated by "##%%&&"
- patterns.txt: corresponding design patterns, one design pattern per line
- design pattern - word pair dataset
- selected_patterns_words.txt: format per line:
"selected_design_pattern corresponding_word_1 corresponding_word_2 ... corresponding_word_40"
- pattern_word_labels.txt: labels for each design pattern - word pair, 3 for "related", 2 for "somewhat related", and 1 for "unrelated", one design pattern and its 40 corresponding words per line
- design pattern and word vectors
- pats_vocab.txt: vocabulary of design patterns and words; tokens with the suffix "designpattern" denote design patterns
- vecs.txt: the design pattern and word vectors, in the same order as pats_vocab.txt; the dimension is 100
- design pattern selection
- dp_selection_gof.txt, dp_selection_security.txt, dp_selection_douglass.txt: format per line: "design problem##%%&&the corresponding correct design pattern"
- pattern_des_gof.txt, pattern_des_security.txt, pattern_des_douglass.txt: descriptions of design patterns in each collection for design pattern selection, separated by "##%%&&"
- pattern_pats_gof.txt, pattern_pats_security.txt, pattern_pats_douglass.txt: corresponding design patterns to the descriptions above, one design pattern per line
- design pattern tag recommendation
- stackoverflow_posts.txt, softwareengineering_posts.txt: format per line:
"Stack_Overflow_or_Software_Engineering_post_id<design_pattern_tag_1<design_pattern_tag_2< ..."
- stackoverflow_tags.txt, softwareengineering_tags.txt: format per line:
"design_pattern_tag<original_tag_1<original_tag_2< ..."
- code
- data_preparation_pairs.py: Python script for calculating the co-occurrence counts
- weight_cal.py: Python script for calculating the weights of the dp-word pairs
- dpword2vec_glove: C code of the GloVe model for building the word and design pattern vectors
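The file formats listed above can be read with a few lines of Python. The following is a minimal sketch, not part of the released code: the helper names (parse_angle_line, split_descriptions, load_vectors, is_design_pattern, cosine) are hypothetical, the sample values are made up for illustration, and it assumes that vecs.txt stores one vector per line as whitespace-separated numbers, which the descriptions above do not state explicitly.

```python
import math

DESC_SEP = "##%%&&"  # separator string quoted in the file descriptions above


def parse_angle_line(line):
    """Split one '<'-delimited line into (first field, list of remaining fields)."""
    fields = [f.strip() for f in line.strip().split("<")]
    fields = [f for f in fields if f]  # drop empty fields, e.g. after a trailing '<'
    return fields[0], fields[1:]


def split_descriptions(text):
    """Split the full text of a description file on the '##%%&&' separator."""
    return [part.strip() for part in text.split(DESC_SEP) if part.strip()]


def is_design_pattern(token):
    """Tokens with the suffix 'designpattern' denote design patterns."""
    return token.endswith("designpattern")


def load_vectors(vocab_lines, vec_lines):
    """Pair the i-th vocabulary token with the i-th vector line.

    Assumption: one token per vocabulary line and one vector per line,
    written as whitespace-separated numbers.
    """
    vectors = {}
    for token, vec in zip(vocab_lines, vec_lines):
        token = token.strip()
        if token:
            vectors[token] = [float(x) for x in vec.split()]
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Made-up sample data; the real vectors have 100 dimensions.
post_id, patterns = parse_angle_line("12345<singleton<factory method")
descriptions = split_descriptions("first description##%%&&second description")
vectors = load_vectors(
    ["observerdesignpattern", "notify"],
    ["1.0 0.0", "1.0 0.0"],
)
similarity = cosine(vectors["observerdesignpattern"], vectors["notify"])
```

With the real files, the same helpers would be fed from open("pats_vocab.txt") and open("vecs.txt") (reading them as UTF-8 is a further assumption), and the i-th entry of split_descriptions applied to descriptions.txt would correspond to the i-th line of patterns.txt.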
Appendix