
geca's Issues

library requirements

Hi there!
I was wondering whether it would be possible to add a requirements.txt file in order to pin the correct versions of the libraries.
It would definitely ease installation.
Thanks
:)
David
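
In the meantime, a requirements file can be generated from any environment where the code already runs. A minimal sketch using only the Python standard library (equivalent in spirit to `pip freeze`):

```python
# List installed packages in requirements.txt format,
# using only the standard library (Python 3.8+).
from importlib.metadata import distributions

for dist in distributions():
    name = dist.metadata["Name"]
    print(f"{name}=={dist.version}")
```

Redirecting this output to a file (or simply running `pip freeze > requirements.txt`) pins whichever versions happen to work in that environment.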

questions upon reproducing GECA on SCAN-dataset

Hello Mr. Andreas,
First, I'd like to thank you for this work and for open-sourcing GECA. I ran into a problem while trying to reproduce GECA on the SCAN dataset.
I've read the shell script (for example, exp/scan_jump/retrieval/run.sh); let me just copy it here:

#!/bin/sh

home="../../.."

for i in `seq 0 9`
do

python -u $home/compose.py \
  --dataset scan \
  --scan_data_dir $home/data/jda/SCAN \
  --dedup \
  --wug_size 1 \
  --seed $i \
  --model_type retrieval \
  --compute_adjacency \
  --n_sample 1000 \
  --write "composed.$i.json" \
  --nouse_trie \
  --max_comp_len 40 \
  --max_adjacencies 1000 \
  --TEST \
  > compose.$i.out 2> compose.$i.err

python -u $home/eval.py \
  --dataset scan \
  --seed $i \
  --scan_data_dir $home/data/jda/SCAN \
  --augment composed.$i.json \
  --dedup \
  --aug_ratio 0.3 \
  --n_epochs 150 \
  --n_enc 512 \
  --sched_factor 0.5 \
  --dropout 0.5 \
  --lr 0.001 \
  --notest_curve \
  --TEST \
  > eval.$i.out 2> eval.$i.err

done

I have two questions:
Question 1:
The script is clearly divided into two parts: the first composes augmented data, and the second trains and evaluates your seq2seq model on the augmented training set. It seems the first part is meant to sample 1000 augmented examples and write them to the composed file, but the composed file actually contains only around 400 augmented examples (in the scan_jump case, for example). Could you please tell me why there is a mismatch? :)
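
For reference, a quick way to check the actual count, assuming (a guess from the file names, not verified against the repo) that composed.$i.json holds a JSON list of examples:

```python
import json

def count_examples(path):
    """Count entries in a composed.*.json file, assuming it is a JSON list."""
    with open(path) as f:
        return len(json.load(f))
```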

Question 2:
I used the augmented data already in the composed.$i.json files (since it would take a long time to rerun the composition step) and tried to reproduce the results reported in "Good-Enough Compositional Data Augmentation".
I focused on the SCAN jump split.
I ran 6 groups of experiments in total (each containing 10 individual runs) and got an average result of 83.23%, which is slightly lower than the 87% reported in the paper.
I am wondering whether this is due to improper hyperparameters or some other reason?

Thanks for reading this, and looking forward to your reply!

Inquiry about code

Hi Jacob,

I believe you are reusing Robin Jia's code for the GeoQuery logic experiments (e.g. code=../../../3p/jia/src in https://github.com/jacobandreas/geca/blob/master/exp/semparse_geo_logic/query_retrieval_jia/run.sh). I tried to use their code here: https://worksheets.codalab.org/worksheets/0x50757a37779b485f89012e4ba03b6f4f/. It seems that you modified some of their code, though, right? For instance, there is no "--jda_augment" argument in their main.py script. Would you mind also sharing your modified code for this part?

Thanks,
Jingfeng

Dataset in "Good-Enough Compositional Data Augmentation"

Hi Jacob!

I'm really interested in your work "good-enough compositional data augmentation", and would love to try it out! :)

Is there a processed version of the dataset or command to reproduce the SCAN/NACS/LM dataset mentioned in the paper?

Thanks!!

Documentation for applying to new dataset

Hi, I'm interested in applying GECA to a new dataset - could you provide some brief documentation or examples on how I might augment an arbitrary list of utterances using your implementation? Thanks!
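
For what it's worth, the core idea is small enough to sketch. The following toy version (my own simplification, not the repo's implementation) treats single tokens that ever appear in an identical context as interchangeable, and swaps them everywhere to generate new utterances:

```python
# Toy sketch of the GECA idea, NOT the repo's implementation:
# fragments observed in a common environment are treated as
# interchangeable wherever either one appears.
from collections import defaultdict
from itertools import combinations

def augment(sentences):
    """Generate new utterances by swapping single-token fragments
    that share an identical (left-context, right-context) environment."""
    env_to_frags = defaultdict(set)
    for sent in sentences:
        toks = sent.split()
        for i, tok in enumerate(toks):
            env = (tuple(toks[:i]), tuple(toks[i + 1:]))
            env_to_frags[env].add(tok)
    # Tokens seen in a common environment become interchangeable everywhere.
    interchangeable = defaultdict(set)
    for frags in env_to_frags.values():
        for a, b in combinations(sorted(frags), 2):
            interchangeable[a].add(b)
            interchangeable[b].add(a)
    new = set()
    for sent in sentences:
        toks = sent.split()
        for i, tok in enumerate(toks):
            for alt in interchangeable.get(tok, ()):
                cand = " ".join(toks[:i] + [alt] + toks[i + 1:])
                if cand not in sentences:
                    new.add(cand)
    return new
```

For example, given {"jump twice", "walk twice", "walk around"}, "jump" and "walk" share the environment (_, "twice"), so "walk around" licenses the new utterance "jump around". The real implementation is more general: it uses multi-token fragments and looser environment matching, so this sketch is only a starting point for understanding compose.py.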

Seeking suggestions on reproducing the experimental results

Hi, I am having some difficulty reproducing the reported results in the paper. For example, when I directly run geca/exp/semparse_geo_sql/query_eval_retrieval/run.sh, the best exact-match accuracy is 0.36 (the reported result for SQL-query + GECA is 0.49). When I run geca/exp/semparse_geo_sql/question_eval_retrieval/run.sh, the best exact-match accuracy is 0.62 (the reported result for SQL-question + GECA is 0.68). I found that much of the augmented data in composed.$i.json is not very good. Do you have any suggestions for improving the performance? Do I need to tune the hyperparameters in run.sh?

By the way, when I run the baseline models using geca/exp/semparse_geo_sql/query_eval_baseline/run.sh and Jia's scripts, the best exact-match result is 0.67 after running with 10 random seeds, which is similar to the result in your paper (SQL-question seq2seq: 0.68). Are the results reported in your paper the best performance or the average performance over 10 random seeds?
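
When comparing against the paper, it may help to report both statistics explicitly for each configuration; a trivial helper:

```python
# Summarize exact-match accuracy over several random seeds,
# reporting both the mean and the best run.
from statistics import mean

def summarize(accuracies):
    return {"mean": mean(accuracies), "best": max(accuracies)}
```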

Thanks for your help.
