LR_GC_OOD

Code & Data for the AAAI 2020 Paper "Likelihood Ratios and Generative Classifiers For Unsupervised OOD Detection In Task-Based Dialog"

Data:
The ROSTD dataset of OOD points can be found under data/fbrelease.
This tsv file contains ~4500 OOD examples. The 3rd field of each line contains the sentence.
This is the only field of interest; the other fields are vestigial and can be ignored.
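
A minimal sketch of pulling the sentences out of the tsv, assuming tab-separated fields with no header row and no quoting. The function name `read_ood_sentences` and the sample rows are illustrative and not part of the repo:

```python
import csv
import io


def read_ood_sentences(handle):
    """Return the 3rd tab-separated field of every row that has one."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row[2] for row in reader if len(row) >= 3]


# Made-up rows standing in for the real file; only the 3rd field matters.
sample = io.StringIO("x\ty\twhat is the capital of peru\nx\ty\ttell me a joke\n")
sentences = read_ood_sentences(sample)
print(len(sentences), "sentences; first:", sentences[0])
```

For the real file, pass a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` points at the tsv under data/fbrelease on your system.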

Note that this OOD dataset is a companion to the ID dataset released as part of the paper "Cross-lingual transfer learning for multilingual task oriented dialog" by Schuster et al. at NAACL 2019.
This ID dataset can be found in its original form here.

Alternatively, you can directly use the splits we made (with ID train, and ID-OOD mixed validation and test) as described under the "Dataset Splits" section below.

Reference:
If you find our code or data useful, please consider citing our paper:

@article{gangal2019likelihood,
  title={Likelihood Ratios and Generative Classifiers for Unsupervised Out-of-Domain Detection In Task Oriented Dialog},
  author={Gangal, Varun and Arora, Abhinav and Einolghozati, Arash and Gupta, Sonal},
  journal={arXiv preprint arXiv:1912.12800},
  year={2019}
}

Contact:
For any questions or issues, either raise an issue here or drop an email at [email protected]

Code: [Under Progress]

Refer to requirements.txt for the Python package requirements. For other specifications, refer to other_specifications.txt.

Code Structure and TLDR:

code/util.py: Contains most of the argument specifications. Ignore arguments or argument groups that have an "IGNORE" comment above them.

code/train.py: Contains the training and inference mechanism

code/model.py: Specifies the architecture for most of the models, e.g. Discriminative Classifier, Generative Classifier etc.

code/oodmetrics.py: Code for computing the OOD-related metrics such as AUROC
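
For readers unfamiliar with AUROC in the OOD setting, here is a small self-contained sketch. It is not the implementation in code/oodmetrics.py; it uses the plain pairwise definition (quadratic in the number of examples) and assumes that a higher score means "more likely OOD". The function name `auroc` and the toy scores are illustrative:

```python
def auroc(scores, is_ood):
    """Probability that a random OOD example outscores a random ID one.

    Ties between an OOD and an ID score count as half a win.
    """
    ood = [s for s, y in zip(scores, is_ood) if y]
    ind = [s for s, y in zip(scores, is_ood) if not y]
    if not ood or not ind:
        raise ValueError("need at least one OOD and one ID example")
    wins = 0.0
    for o in ood:
        for i in ind:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(ood) * len(ind))


# Toy OOD scores: the two OOD examples (label 1) both outscore the ID ones.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 1, 0, 0]
print(auroc(scores, labels))
```

On larger evaluation sets a rank-based routine such as scikit-learn's `roc_auc_score` is the more practical choice.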

Please ignore code/model_gan.py and code/wasserstein.py. They are not used much in the paper experiments; we have retained them only to avoid breaking the imports.

Dataset Splits:

  • For fbrelease and fbreleasecoarse: you can directly find the ready-to-use dataset splits under code/data/{dataset_name}/unsup/, where dataset_name = fbrelease / fbreleasecoarse.
    This already contains the plain ID train split and the ID-OOD mixed dev and test splits.
    Note that only the OOD part of the fbrelease dev and test splits constitutes our own released data. The rest is formed from existing datasets.
  • For atis and snips: you will need to run some scripts to do random splitting, where a fraction of classes are held out as OOD.
    For this, run code/data/{dataset_name}/preprocess_{dataset_name}.sh, where dataset_name = atis / snips.
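
The idea of holding out a fraction of classes as OOD can be sketched as follows. This is only an illustration of the splitting scheme, not what the preprocess scripts actually do; the function name `hold_out_classes`, the `ood_fraction` and `seed` parameters, and the toy examples are all assumptions:

```python
import random


def hold_out_classes(examples, ood_fraction, seed=0):
    """Split (sentence, label) pairs into ID and OOD by holding out whole classes."""
    labels = sorted({label for _, label in examples})
    # Hold out at least one class so the OOD part is never empty.
    n_ood = max(1, int(len(labels) * ood_fraction))
    ood_labels = set(random.Random(seed).sample(labels, n_ood))
    id_part = [ex for ex in examples if ex[1] not in ood_labels]
    ood_part = [ex for ex in examples if ex[1] in ood_labels]
    return id_part, ood_part, ood_labels


# Toy data with four intent classes; one class (25%) is held out as OOD.
examples = [
    ("book a flight", "flight"),
    ("play jazz", "music"),
    ("what's the weather", "weather"),
    ("set an alarm", "alarm"),
]
id_part, ood_part, ood_labels = hold_out_classes(examples, ood_fraction=0.25)
print(len(id_part), "ID examples,", len(ood_part), "OOD examples")
```

Fixing the seed keeps the held-out classes the same across runs, so results stay comparable.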

Shell Scripts:

train_for_fbrelease.sh - Commands for fbrelease, i.e. ROSTD with its corresponding ID training and validation sets

train_for_fbreleasecoarse.sh - Commands for fbreleasecoarse, i.e. ROSTD with its corresponding ID training and validation sets, but with labels coarsened

train_for_atis.sh - Commands for atis

train_for_snips.sh - Commands for snips

Notes:

  • In all of these scripts, you will need to set super_root to point to where the repo resides on your system. We need this because we use torchtext to preprocess, create the vocabulary, load and minibatch our datasets, and we could only get it to work with absolute path specifications.
