
geca's Issues

library requirements

Hi there!
I was wondering whether it would be possible to add a requirements.txt file in order to pin the correct versions of the libraries.
It would definitely ease installation.
Thanks
:)
David
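
In the meantime, a requirements file can be generated from any environment where the code already runs. A minimal sketch using only the Python standard library (equivalent in spirit to `pip freeze`):

```python
# List installed packages in requirements.txt format,
# using only the standard library (Python 3.8+).
from importlib.metadata import distributions

for dist in distributions():
    name = dist.metadata["Name"]
    print(f"{name}=={dist.version}")
```

Redirecting this output to a file (or simply running `pip freeze > requirements.txt`) pins whichever versions happen to work in that environment.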

questions upon reproducing GECA on SCAN-dataset

Hello Mr. Andreas,
First, I'd like to thank you for this work and for open-sourcing GECA. I ran into a problem while trying to reproduce GECA on the SCAN dataset.
I've read the shell script (for example, exp/scan_jump/retrieval/run.sh); let me just copy it here:

#!/bin/sh

home="../../.."

for i in `seq 0 9`
do

python -u $home/compose.py \
  --dataset scan \
  --scan_data_dir $home/data/jda/SCAN \
  --dedup \
  --wug_size 1 \
  --seed $i \
  --model_type retrieval \
  --compute_adjacency \
  --n_sample 1000 \
  --write "composed.$i.json" \
  --nouse_trie \
  --max_comp_len 40 \
  --max_adjacencies 1000 \
  --TEST \
  > compose.$i.out 2> compose.$i.err

python -u $home/eval.py \
  --dataset scan \
  --seed $i \
  --scan_data_dir $home/data/jda/SCAN \
  --augment composed.$i.json \
  --dedup \
  --aug_ratio 0.3 \
  --n_epochs 150 \
  --n_enc 512 \
  --sched_factor 0.5 \
  --dropout 0.5 \
  --lr 0.001 \
  --notest_curve \
  --TEST \
  > eval.$i.out 2> eval.$i.err

done

I have two questions:
Question 1:
The script is clearly divided into two parts: the first composes augmented data, and the second trains and evaluates your seq2seq model on the augmented training set. It seems the first part is meant to sample 1000 augmented examples and write them to the composed file, but the composed file actually contains only around 400 augmented examples (in the scan_jump case, for example). Could you please tell me why there is a mismatch? :)
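
For reference, a quick way to check the actual count, assuming (a guess from the file names, not verified against the repo) that composed.$i.json holds a JSON list of examples:

```python
import json

def count_examples(path):
    """Count entries in a composed.*.json file, assuming it is a JSON list."""
    with open(path) as f:
        return len(json.load(f))
```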

Question 2:
I used the augmented data already in the composed.$i.json files (since it would take a long time to rerun the composition step) and tried to reproduce the results reported in "Good-Enough Compositional Data Augmentation".
I focused on the SCAN jump split.
I ran 6 groups of experiments in total (each containing 10 individual runs) and got an average result of 83.23%, which is slightly lower than the 87% reported in the paper.
I am wondering whether this is due to improper hyperparameters or some other reason?

Thanks for reading this, and looking forward to your reply!

Inquiry about code

Hi Jacob,

I believe you are reusing Robin Jia's code for the GeoQuery logic experiments (e.g. code=../../../3p/jia/src in https://github.com/jacobandreas/geca/blob/master/exp/semparse_geo_logic/query_retrieval_jia/run.sh). I tried to use their code here: https://worksheets.codalab.org/worksheets/0x50757a37779b485f89012e4ba03b6f4f/. It seems that you modified some of their code, though, right? For instance, there is no "--jda_augment" argument in their main.py script. Would you mind also sharing your modified code for this part?

Thanks,
Jingfeng

Dataset in "Good-Enough Compositional Data Augmentation"

Hi Jacob!

I'm really interested in your work "good-enough compositional data augmentation", and would love to try it out! :)

Is there a processed version of the dataset or command to reproduce the SCAN/NACS/LM dataset mentioned in the paper?

Thanks!!

Documentation for applying to new dataset

Hi, I'm interested in applying GECA to a new dataset - could you provide some brief documentation or examples on how I might augment an arbitrary list of utterances using your implementation? Thanks!
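
For what it's worth, the core idea is small enough to sketch. The following toy version (my own simplification, not the repo's implementation) treats single tokens that ever appear in an identical context as interchangeable, and swaps them everywhere to generate new utterances:

```python
# Toy sketch of the GECA idea, NOT the repo's implementation:
# fragments observed in a common environment are treated as
# interchangeable wherever either one appears.
from collections import defaultdict
from itertools import combinations

def augment(sentences):
    """Generate new utterances by swapping single-token fragments
    that share an identical (left-context, right-context) environment."""
    env_to_frags = defaultdict(set)
    for sent in sentences:
        toks = sent.split()
        for i, tok in enumerate(toks):
            env = (tuple(toks[:i]), tuple(toks[i + 1:]))
            env_to_frags[env].add(tok)
    # Tokens seen in a common environment become interchangeable everywhere.
    interchangeable = defaultdict(set)
    for frags in env_to_frags.values():
        for a, b in combinations(sorted(frags), 2):
            interchangeable[a].add(b)
            interchangeable[b].add(a)
    new = set()
    for sent in sentences:
        toks = sent.split()
        for i, tok in enumerate(toks):
            for alt in interchangeable.get(tok, ()):
                cand = " ".join(toks[:i] + [alt] + toks[i + 1:])
                if cand not in sentences:
                    new.add(cand)
    return new
```

For example, given {"jump twice", "walk twice", "walk around"}, "jump" and "walk" share the environment (_, "twice"), so "walk around" licenses the new utterance "jump around". The real implementation is more general: it uses multi-token fragments and looser environment matching, so this sketch is only a starting point for understanding compose.py.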

Seeking suggestions on reproducing the experimental results

Hi, I am having some difficulty reproducing the reported results in the paper. For example, when I directly run geca/exp/semparse_geo_sql/query_eval_retrieval/run.sh, the best exact-match accuracy is 0.36 (the reported result for SQL-query + GECA is 0.49). When I run geca/exp/semparse_geo_sql/question_eval_retrieval/run.sh, the best exact-match accuracy is 0.62 (the reported result for SQL-question + GECA is 0.68). I found that much of the augmented data in composed.$i.json is not very good. Do you have any suggestions for improving the performance? Do I need to tune the hyperparameters in run.sh?

By the way, when I run the baseline models using geca/exp/semparse_geo_sql/query_eval_baseline/run.sh and Jia's scripts, the best exact-match result is 0.67 after running with 10 random seeds, which is similar to the result in your paper (SQL-question seq2seq: 0.68). Are the results reported in your paper the best performance or the average performance over 10 random seeds?
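
When comparing against the paper, it may help to report both statistics explicitly for each configuration; a trivial helper:

```python
# Summarize exact-match accuracy over several random seeds,
# reporting both the mean and the best run.
from statistics import mean

def summarize(accuracies):
    return {"mean": mean(accuracies), "best": max(accuracies)}
```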

Thanks for your help.
