clinc / oos-eval
Repository that accompanies "An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction" (EMNLP 2019)
License: Other
Hi,
Do the numbers in Table 2 for oos-train correspond to the test set? I am trying to replicate the BERT results (using the transformers and datasets libraries) with the hyperparameters provided in this repo, but there's a nearly 10-point difference in test set performance; however, my validation set performance is quite close to the numbers reported in the paper. It'd be great if you could clarify this.
Thanks,
Gaurav.
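For what it's worth, here is a minimal sketch of the kind of replication setup being described, using the community clinc_oos dataset on the Hugging Face Hub. The checkpoint and training values are placeholders rather than the repo's hyperparams.csv settings; the point is only to illustrate evaluating on both the validation and test splits to see where the gap appears.

```python
# Sketch: fine-tune a BERT checkpoint on CLINC150 ("plus" config) and score
# both validation and test splits. Hyperparameters here are placeholders.
import numpy as np
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

ds = load_dataset("clinc_oos", "plus")  # splits: train / validation / test
tok = AutoTokenizer.from_pretrained("bert-base-uncased")

def encode(batch):
    return tok(batch["text"], truncation=True, padding="max_length", max_length=64)

ds = ds.map(encode, batched=True).rename_column("intent", "labels")

def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": (np.argmax(logits, axis=-1) == labels).mean()}

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=151)  # 150 in-scope intents + out-of-scope

args = TrainingArguments(output_dir="out", num_train_epochs=5,
                         per_device_train_batch_size=32, learning_rate=4e-5)
trainer = Trainer(model=model, args=args, compute_metrics=accuracy,
                  train_dataset=ds["train"], eval_dataset=ds["validation"])
trainer.train()

# Compare the two splits directly to localize the discrepancy.
print("val :", trainer.evaluate(ds["validation"]))
print("test:", trainer.evaluate(ds["test"]))
```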
Hi,
In the paper "An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction", you compare three approaches: oos-train, oos-threshold, and oos-binary.
But the paper does not clearly state which threshold was used (0.5, 0.6, ...?).
Could you please provide that information?
Regards,
In our evaluation, the out-of-scope threshold was chosen to be the value which yielded the highest validation score across all intents, treating out-of-scope as its own intent.
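Speculatively, a sweep like the following matches that description (this is an illustrative sketch, not the repo's code; val_probs, val_labels, and OOS_LABEL are assumed names): predict the top in-scope intent, label a query out-of-scope when its top softmax probability falls below the threshold, and keep the threshold that maximizes validation accuracy with oos counted as its own label.

```python
# Illustrative threshold sweep. val_probs is an (N, 150) array of in-scope
# softmax probabilities on the validation set; val_labels holds gold labels,
# with OOS_LABEL marking out-of-scope queries. All names are hypothetical.
import numpy as np

OOS_LABEL = 150  # assumed index for out-of-scope treated as its own intent

def accuracy_at(threshold, val_probs, val_labels):
    top = val_probs.argmax(axis=1)          # most likely in-scope intent
    conf = val_probs.max(axis=1)            # its softmax confidence
    preds = np.where(conf >= threshold, top, OOS_LABEL)
    return (preds == val_labels).mean()     # accuracy across all 151 labels

def best_threshold(val_probs, val_labels):
    grid = np.linspace(0.0, 1.0, 101)
    scores = [accuracy_at(t, val_probs, val_labels) for t in grid]
    return grid[int(np.argmax(scores))]
```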
I am a little confused by this sentence. Does it mean that we select, as the threshold, the highest score that oos queries receive on the known intents in the validation set? If so, wouldn't oos recall equal 1 on the validation set in every epoch? How would we then early-stop and select hyperparameters?
Hi! I want to re-partition the dataset to create 5 different train/valid/test splits for my analyses. In the paper, you mention that all queries from a given crowd worker were placed in a single split. Is it possible to share information about which queries were generated by the same worker? I'd like to minimize any in-scope biases in my splits as well.
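For context, if per-query worker IDs were available, a grouped split like this sketch would keep each worker's queries inside a single split; worker_ids is a hypothetical array aligned with the queries, since the repo does not ship this information.

```python
# Sketch of worker-grouped re-partitioning, assuming a hypothetical
# worker_ids array. GroupShuffleSplit guarantees that all queries sharing a
# group (worker) land on the same side of each split.
from sklearn.model_selection import GroupShuffleSplit

def five_splits(texts, labels, worker_ids, seed=0):
    splitter = GroupShuffleSplit(n_splits=5, test_size=0.2, random_state=seed)
    for train_idx, held_idx in splitter.split(texts, labels, groups=worker_ids):
        yield train_idx, held_idx  # held_idx can be split again into valid/test
```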
Hi,
I am using Hugging Face to fine-tune BERT-large on the CLINC dataset. I follow the hyperparameters mentioned in hyperparams.csv, but there's a ~3-point difference in in-scope accuracy for the oos-train setting (93.49 vs. 96.9 for the Full version of the dataset; similarly for the OOS-Plus setting). I am wondering if this is due to some HF defaults; for example, HF defaults to 1.0 for gradient clipping, and I am not sure what you used. Would it be possible to clarify your fine-tuning process a bit more? It'd be very helpful.
Thanks,
Gaurav.
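For reference, the HF default in question can be made explicit (or switched off for comparison) in TrainingArguments; the other values below are placeholders, not the repo's hyperparams.csv settings.

```python
# Making Hugging Face's gradient-clipping default explicit. max_grad_norm=1.0
# is the Trainer default being asked about; other values are illustrative.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    learning_rate=4e-5,              # placeholder
    num_train_epochs=5,              # placeholder
    per_device_train_batch_size=32,  # placeholder
    max_grad_norm=1.0,               # HF default; set to 0 to disable clipping
)
```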
I'm confused about Table 3 in the paper. What is the experimental process?
My guess: the binary classifier (oos detector) is first trained on "binary_undersample.json" (or "binary_wiki_aug.json" in the wiki-aug experiment) to detect whether an utterance is "in" or "oos"; a downstream multi-class classifier (e.g., 150 classes for in-scope data) then handles the "in" samples passed along by the upstream oos detector.
In-scope accuracy was evaluated on "test" in "data_oos_plus.json", and out-of-scope recall was evaluated on "oos_test" in "data_oos_plus.json". Is that correct?
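If that reading is right, the two-stage inference would look roughly like this sketch; binary_model and intent_model are hypothetical stand-ins for the trained detector and in-scope classifier, not objects from this repo.

```python
# Sketch of the guessed two-stage pipeline: a binary in/oos detector gates a
# 150-way in-scope classifier. Both model objects are hypothetical.
def predict(utterance, binary_model, intent_model):
    if binary_model.predict(utterance) == "oos":
        return "oos"
    return intent_model.predict(utterance)  # one of the 150 in-scope intents

def oos_recall(oos_test, binary_model, intent_model):
    preds = [predict(u, binary_model, intent_model) for u in oos_test]
    return sum(p == "oos" for p in preds) / len(oos_test)
```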
Could not find the answer to this question in the documentation.
What are the different domains covered in the dataset?
Is there a domain-intent mapping for extracting the data of interest?
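Even without an official mapping, a subset of intents can be pulled out directly from the repo's JSON, assuming its layout of per-split lists of [text, intent] pairs; the intents in KEEP are examples, and any domain grouping would still need to be built by hand.

```python
# Sketch: filter the repo's data_full.json down to a chosen set of intents,
# assuming each split is a list of [text, intent] pairs. KEEP is illustrative.
import json

KEEP = {"transfer", "balance", "bill_due"}  # e.g., banking-flavored intents

with open("data/data_full.json") as f:
    data = json.load(f)

subset = {split: [(text, intent) for text, intent in rows if intent in KEEP]
          for split, rows in data.items()}
print({split: len(rows) for split, rows in subset.items()})
```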