clinc / oos-eval
Repository that accompanies "An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction" (EMNLP 2019)
License: Other
Hi,
Do the numbers in Table 2 for oos-train correspond to the test set? I am trying to replicate the BERT results (using the transformers and datasets libraries) with the hyperparameters provided in this repo, but there's a nearly 10-point difference in test set performance; however, my validation set performance is quite close to the numbers reported in the paper. It'd be great if you could clarify this.
Thanks,
Gaurav.
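For what it's worth, here is a minimal sketch of the kind of replication setup being described, using the community clinc_oos dataset on the Hugging Face Hub. The checkpoint and training values are placeholders rather than the repo's hyperparams.csv settings; the point is only to illustrate evaluating on both the validation and test splits to see where the gap appears.

```python
# Sketch: fine-tune a BERT checkpoint on CLINC150 ("plus" config) and score
# both validation and test splits. Hyperparameters here are placeholders.
import numpy as np
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

ds = load_dataset("clinc_oos", "plus")  # splits: train / validation / test
tok = AutoTokenizer.from_pretrained("bert-base-uncased")

def encode(batch):
    return tok(batch["text"], truncation=True, padding="max_length", max_length=64)

ds = ds.map(encode, batched=True).rename_column("intent", "labels")

def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": (np.argmax(logits, axis=-1) == labels).mean()}

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=151)  # 150 in-scope intents + out-of-scope

args = TrainingArguments(output_dir="out", num_train_epochs=5,
                         per_device_train_batch_size=32, learning_rate=4e-5)
trainer = Trainer(model=model, args=args, compute_metrics=accuracy,
                  train_dataset=ds["train"], eval_dataset=ds["validation"])
trainer.train()

# Compare the two splits directly to localize the discrepancy.
print("val :", trainer.evaluate(ds["validation"]))
print("test:", trainer.evaluate(ds["test"]))
```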
Hi,
In the paper "An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction", you compare three approaches: oos-train, oos-threshold, and oos-binary.
But the paper does not clearly state which threshold was used (0.5, 0.6, ...?).
Could you please provide that information?
Regards,
In our evaluation, the out-of-scope threshold was chosen to be the value which yielded the highest validation score across all intents, treating out-of-scope as its own intent.
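Speculatively, a sweep like the following matches that description (this is an illustrative sketch, not the repo's code; val_probs, val_labels, and OOS_LABEL are assumed names): predict the top in-scope intent, label a query out-of-scope when its top softmax probability falls below the threshold, and keep the threshold that maximizes validation accuracy with oos counted as its own label.

```python
# Illustrative threshold sweep. val_probs is an (N, 150) array of in-scope
# softmax probabilities on the validation set; val_labels holds gold labels,
# with OOS_LABEL marking out-of-scope queries. All names are hypothetical.
import numpy as np

OOS_LABEL = 150  # assumed index for out-of-scope treated as its own intent

def accuracy_at(threshold, val_probs, val_labels):
    top = val_probs.argmax(axis=1)          # most likely in-scope intent
    conf = val_probs.max(axis=1)            # its softmax confidence
    preds = np.where(conf >= threshold, top, OOS_LABEL)
    return (preds == val_labels).mean()     # accuracy across all 151 labels

def best_threshold(val_probs, val_labels):
    grid = np.linspace(0.0, 1.0, 101)
    scores = [accuracy_at(t, val_probs, val_labels) for t in grid]
    return grid[int(np.argmax(scores))]
```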
I am a little confused by this sentence. Does it mean that we select, as the threshold, the highest score that oos queries receive on the known intents in the validation set? If so, wouldn't oos recall equal 1 on the validation set in every epoch? How would we then early-stop and select hyperparameters?
Hi! I want to re-partition the dataset to create 5 different train/valid/test splits for my analyses. In the paper, you mention that all queries from a given crowd worker were placed in a single split. Is it possible to share information about which queries were generated by the same worker? I'd like to minimize any in-scope biases in my splits as well.
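For context, if per-query worker IDs were available, a grouped split like this sketch would keep each worker's queries inside a single split; worker_ids is a hypothetical array aligned with the queries, since the repo does not ship this information.

```python
# Sketch of worker-grouped re-partitioning, assuming a hypothetical
# worker_ids array. GroupShuffleSplit guarantees that all queries sharing a
# group (worker) land on the same side of each split.
from sklearn.model_selection import GroupShuffleSplit

def five_splits(texts, labels, worker_ids, seed=0):
    splitter = GroupShuffleSplit(n_splits=5, test_size=0.2, random_state=seed)
    for train_idx, held_idx in splitter.split(texts, labels, groups=worker_ids):
        yield train_idx, held_idx  # held_idx can be split again into valid/test
```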
Hi,
I am using Hugging Face to fine-tune BERT-large on the CLINC dataset. I follow the hyperparameters mentioned in hyperparams.csv, but there's a ~3-point difference in in-scope accuracy for the oos-train setting (93.49 vs. 96.9 for the Full version of the dataset; similarly for the OOS-Plus setting). I am wondering if this is due to some HF defaults; for example, HF defaults to 1.0 for gradient clipping, and I am not sure what you used. Would it be possible to clarify your fine-tuning process a bit more? It'd be very helpful.
Thanks,
Gaurav.
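For reference, the HF default in question can be made explicit (or switched off for comparison) in TrainingArguments; the other values below are placeholders, not the repo's hyperparams.csv settings.

```python
# Making Hugging Face's gradient-clipping default explicit. max_grad_norm=1.0
# is the Trainer default being asked about; other values are illustrative.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    learning_rate=4e-5,              # placeholder
    num_train_epochs=5,              # placeholder
    per_device_train_batch_size=32,  # placeholder
    max_grad_norm=1.0,               # HF default; set to 0 to disable clipping
)
```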
I'm confused about Table 3 in the paper. What is the experimental process?
My guess: the binary classifier (oos detector) is first trained on "binary_undersample.json" (or "binary_wiki_aug.json" in the wiki-aug experiment) to detect whether an utterance is "in" or "oos"; a downstream multi-class classifier (e.g., 150 classes for in-scope data) then handles the "in" samples passed along by the upstream oos detector.
In-scope accuracy was evaluated on "test" in "data_oos_plus.json", and out-of-scope recall was evaluated on "oos_test" in "data_oos_plus.json". Is that correct?
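If that reading is right, the two-stage inference would look roughly like this sketch; binary_model and intent_model are hypothetical stand-ins for the trained detector and in-scope classifier, not objects from this repo.

```python
# Sketch of the guessed two-stage pipeline: a binary in/oos detector gates a
# 150-way in-scope classifier. Both model objects are hypothetical.
def predict(utterance, binary_model, intent_model):
    if binary_model.predict(utterance) == "oos":
        return "oos"
    return intent_model.predict(utterance)  # one of the 150 in-scope intents

def oos_recall(oos_test, binary_model, intent_model):
    preds = [predict(u, binary_model, intent_model) for u in oos_test]
    return sum(p == "oos" for p in preds) / len(oos_test)
```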
Could not find the answer to this question in the documentation.
What are the different domains covered in the dataset?
Is there a domain-intent mapping for extracting the data of interest?
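Even without an official mapping, a subset of intents can be pulled out directly from the repo's JSON, assuming its layout of per-split lists of [text, intent] pairs; the intents in KEEP are examples, and any domain grouping would still need to be built by hand.

```python
# Sketch: filter the repo's data_full.json down to a chosen set of intents,
# assuming each split is a list of [text, intent] pairs. KEEP is illustrative.
import json

KEEP = {"transfer", "balance", "bill_due"}  # e.g., banking-flavored intents

with open("data/data_full.json") as f:
    data = json.load(f)

subset = {split: [(text, intent) for text, intent in rows if intent in KEEP]
          for split, rows in data.items()}
print({split: len(rows) for split, rows in subset.items()})
```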