geca's Issues
library requirements
Hi there!
I was wondering if it would be possible to add a requirements.txt file so that the correct versions of the libraries can be installed.
It would definitely ease the installation of the libraries.
Thanks
:)
David
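For what it's worth, a pinned requirements.txt can be generated from any environment where the code already runs. The package names below (torch, numpy) are my assumptions about GECA's dependencies, not a confirmed list; they should be adjusted to match the repo's actual imports:

```python
from importlib.metadata import version, PackageNotFoundError

# Assumed (not confirmed) GECA dependencies; adjust to the imports in the repo.
packages = ["torch", "numpy"]

lines = []
for pkg in packages:
    try:
        # Pin the exact version installed in the current environment.
        lines.append(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        lines.append(f"# {pkg} not installed in this environment")

# Write a pip-compatible pin file.
with open("requirements.txt", "w") as f:
    f.write("\n".join(lines) + "\n")
print("\n".join(lines))
```

The resulting file can then be installed elsewhere with `pip install -r requirements.txt`.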
Questions on reproducing GECA on the SCAN dataset
Hello Mr. Andreas,
First, I'd like to thank you for this work and for open-sourcing GECA. I ran into a problem while trying to reproduce GECA on the SCAN dataset.
I've read the shell file (for example: exp/scan_jump/retrieval/run.sh); let me copy it here:
#!/bin/sh
home="../../.."
for i in $(seq 0 9)
do
  python -u $home/compose.py \
    --dataset scan \
    --scan_data_dir $home/data/jda/SCAN \
    --dedup \
    --wug_size 1 \
    --seed $i \
    --model_type retrieval \
    --compute_adjacency \
    --n_sample 1000 \
    --write "composed.$i.json" \
    --nouse_trie \
    --max_comp_len 40 \
    --max_adjacencies 1000 \
    --TEST \
    > compose.$i.out 2> compose.$i.err
  python -u $home/eval.py \
    --dataset scan \
    --seed $i \
    --scan_data_dir $home/data/jda/SCAN \
    --augment composed.$i.json \
    --dedup \
    --aug_ratio 0.3 \
    --n_epochs 150 \
    --n_enc 512 \
    --sched_factor 0.5 \
    --dropout 0.5 \
    --lr 0.001 \
    --notest_curve \
    --TEST \
    > eval.$i.out 2> eval.$i.err
done
I have 2 questions:
question#1:
It's clearly divided into two parts: one composes the augmented data, and the other trains and evaluates your seq2seq model on the augmented training set. It seems you intended to sample 1000 augmented examples and write them to the composed file in the first part, but the composed file actually contains only around 400 augmented examples (for example, in the scan_jump case). Could you please tell me why there is a mismatch? :)
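As a quick sanity check on that mismatch, the written examples can be counted directly. This sketch assumes composed.$i.json holds a JSON array of examples, which may not match the real layout; one hedged possibility is that --dedup drops duplicates among the 1000 sampled candidates:

```python
import json

def count_augmented(path):
    """Count the augmented examples compose.py wrote to `path`.
    Assumption: the file is a JSON array; check compose.py's --write logic."""
    with open(path) as f:
        return len(json.load(f))

# e.g. count_augmented("composed.0.json") reportedly gives ~400 for scan_jump,
# versus the 1000 requested via --n_sample.
```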
question#2:
I just use the augmented data already in the composed.$i.json files (because it would take a long time to rerun and recompose the data) and try to reproduce the results reported in "Good-Enough Compositional Data Augmentation".
I focus on the SCAN jump split.
I ran 6 groups of experiments in total (each containing 10 individual experiments) and got an average result of 83.23%, which is slightly lower than the 87% reported in the paper.
I am wondering whether this is due to improper hyperparameters or some other reason?
Thanks for reading this, and looking forward to your reply!
Inquiry about code
Hi Jacob,
I believe you are reusing Robin Jia's code for the GeoQuery logic experiments (e.g., code=../../../3p/jia/src
in https://github.com/jacobandreas/geca/blob/master/exp/semparse_geo_logic/query_retrieval_jia/run.sh). I tried to use their code here: https://worksheets.codalab.org/worksheets/0x50757a37779b485f89012e4ba03b6f4f/. But it seems that you modified some of their code, right? For instance, there is no "--jda_augment" argument in their main.py script. Would you mind also sharing your modified code for this part?
Thanks,
Jingfeng
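In case it helps while waiting, an extra flag like "--jda_augment" is typically wired into an argparse-based main.py along these lines. This is a hypothetical sketch: the actual modification to Jia's code is not public, and the flag's real semantics may differ:

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical flag: path to a file of GECA-augmented examples to mix
# into the training data.
parser.add_argument("--jda_augment", default=None,
                    help="file of augmented examples to add to training")

# Parse an example command line instead of sys.argv, for illustration.
args = parser.parse_args(["--jda_augment", "composed.0.json"])
print(args.jda_augment)  # composed.0.json
```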
Dataset in "Good-Enough Compositional Data Augmentation"
Hi Jacob!
I'm really interested in your work "good-enough compositional data augmentation", and would love to try it out! :)
Is there a processed version of the dataset or command to reproduce the SCAN/NACS/LM dataset mentioned in the paper?
Thanks!!
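Not an official answer, but the SCAN splits come from Lake and Baroni's public repository, and the run.sh scripts expect them under data/jda/SCAN. A sketch of fetching them (the clone call is left commented so the snippet stays side-effect free):

```python
import subprocess

# The run.sh scripts pass --scan_data_dir $home/data/jda/SCAN, so the raw
# SCAN splits need to live there. They are published at
# https://github.com/brendenlake/SCAN
cmd = ["git", "clone", "https://github.com/brendenlake/SCAN.git", "data/jda/SCAN"]
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually download
```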
Documentation for applying to new dataset
Hi, I'm interested in applying GECA to a new dataset - could you provide some brief documentation or examples on how I might augment an arbitrary list of utterances using your implementation? Thanks!
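Until such documentation exists, one starting point is to mirror the flags the bundled run.sh scripts pass to compose.py. Whether --dataset accepts a new corpus name, and what input format the loader expects, are assumptions that need checking against the source; "my_corpus" below is a placeholder:

```python
import shlex

# Hypothetical invocation modeled on exp/scan_jump/retrieval/run.sh;
# "my_corpus" is a placeholder dataset name, not one GECA necessarily supports.
cmd = (
    "python compose.py --dataset my_corpus --dedup --wug_size 1 "
    "--model_type retrieval --n_sample 1000 --write composed.json"
)
print(shlex.split(cmd))
```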
Seeking suggestions on reproducing the experimental results
Hi, I am having difficulty reproducing the results reported in the paper. For example, when I directly run geca/exp/semparse_geo_sql/query_eval_retrieval/run.sh, the best exact-match accuracy is 0.36 (the reported result for sql-query+GECA is 0.49). When I run geca/exp/semparse_geo_sql/question_eval_retrieval/run.sh, the best exact-match accuracy is 0.62 (the reported result for sql-question+GECA is 0.68). I found that much of the augmented data in composed.$i.json is not very good. Do you have any suggestions for improving the performance? Do I need to tune the hyperparameters in run.sh?
By the way, when I run the baseline models using geca/exp/semparse_geo_sql/query_eval_baseline/run.sh and Jia's scripts, the best exact-match result is 0.67 after running with 10 random seeds, which is similar to the result in your paper (sql-question-seq2seq: 0.68). Are the results reported in your paper the best performance or the average performance over 10 random seeds?
Thanks for your help.
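The gap between the two reporting conventions is easy to see on placeholder numbers (the per-seed accuracies below are invented for illustration, not real results):

```python
# Ten placeholder exact-match accuracies, one per random seed.
accs = [0.61, 0.55, 0.58, 0.67, 0.52, 0.60, 0.63, 0.57, 0.59, 0.56]

best = max(accs)            # best-over-seeds convention
mean = sum(accs) / len(accs)  # average-over-seeds convention
print(f"best: {best:.2f}, mean: {mean:.3f}")  # best: 0.67, mean: 0.588
```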