
The PreTENS shared task hosted at SemEval 2022 aims at focusing on semantic competence with specific attention on the evaluation of language models with respect to the recognition of appropriate taxonomic relations between two nominal arguments (i.e. cases where one is a supercategory of the other, or in extensional terms, one denotes a superset of the other).

Home Page: https://sites.google.com/view/semeval2022-pretens

License: MIT License


SemEval 2022 Task 3

Presupposed Taxonomies: Evaluating Neural Network Semantics (PreTENS)

⚠ New Notice: The evaluation phase has now ended; please check the data folder for the test data (with labels and sentence constructions). Updated dates are available on the task website. [Old News] CodaLab links are given below.

Paper Submission Rules and Deadlines

Will be updated soon

Submission Rules

  • Maximum submissions: 3 result submissions per subtask
  • Ranking: Two rankings per subtask - a per-language ranking and a global ranking
  • Results displayed/used in the leaderboard: All the measures given in the baseline script (Precision, Recall, F1, and macro-F1 for subtask 1; rho for subtask 2) will be shown, but the final ranking will be based on macro-F1 and rho.
  • Naming convention for the submission file: The result/submission file must be tab-separated (with headers: ID \t Labels/Score), named answer.tsv, and then compressed to a zip file following the naming convention <teamName_subtaskX_submissionNo.zip>, where X={1,2} and No={1,2,3}
  • Results selected for the leaderboard: Each team has 3 chances per subtask and can choose which of its results to show on the leaderboard. However, each team must submit at least one result to the board (the selected entry can be changed at any time during the competition). This is mainly so that participants attempting only selected languages are not penalized by the global-ranking score mechanism.
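The submission packaging above can be sketched with the standard library; the team name and the predictions below are placeholders, not real results:

```python
import csv
import zipfile

# Hypothetical (ID, predicted label) pairs standing in for real predictions.
predictions = [("en_0", 1), ("en_1", 0), ("en_2", 1)]

# Write the tab-separated submission file with the required header.
with open("answer.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["ID", "Labels"])
    writer.writerows(predictions)

# Compress it following the <teamName_subtaskX_submissionNo.zip> convention,
# here for a hypothetical team "myTeam", subtask 1, submission 1.
with zipfile.ZipFile("myTeam_subtask1_1.zip", "w") as zf:
    zf.write("answer.tsv")
```

For subtask 2 the same shape applies with a `Score` column of floats instead of binary labels.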

Tasks

PreTENS includes the two following sub-tasks:

  • a binary classification task: predicting the acceptability of sentences (A (1) vs. UA (0))
  • a regression task: predicting the degree of acceptability on a seven-point Likert scale

Data

The data comprise sentences in three languages: English, Italian, and French.

For each sub-task and each language:

  • The dataset will be split into a training set and a test set
  • Additionally, trial data (a small subset of the training set) is released to give participants a clear idea of the data and the expected formats.

For the binary-classification sub-task, the training and test sets will consist of ~5,000 and ~23,000 samples, respectively. For the regression sub-task, ~500 sentences will be provided for the training set and a larger set for the test set.

Sample/trial data for the evaluation campaign: data/trail

Data Format:

ID Sentence LABELS/SCORE

where LABEL is used for the binary classification task and SCORE for the regression task. SCORE represents the average of the scores (1-7) assigned by the annotators. Details of the scales and agreement will be elaborated/updated later. The LABEL (1/0) is assigned based on the regression score.
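A minimal sketch of parsing this tab-separated format; the second row below is an illustrative made-up sentence, not real task data:

```python
import csv
import io

# Illustrative stand-in for a training file in the ID / Sentence / Label format.
raw = (
    "ID\tSentence\tLabels\n"
    "en_0\tI would rather have Chianti than water .\t1\n"
    "en_1\tI would rather have beverages than water .\t0\n"
)

# DictReader picks up the header row, so columns can be accessed by name.
reader = csv.DictReader(io.StringIO(raw), delimiter="\t")
rows = list(reader)
labels = [int(r["Labels"]) for r in rows]
print(labels)  # -> [1, 0]
```

For the regression files, the last column would be read with `float(...)` instead.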

TEST DATA with scores/labels are now available.

The folder <data/test/official_test_set_with_labels> now includes a test file for each subtask in all three languages, with the labels/scores as well as the construction each sentence belongs to.

File Format: Subtask1

e.g.

ID Construction Sentence Labels

en_0 drather I would rather have Chianti than water . 1

Here constructions are: 'andtoo', 'butnot', 'comparatives', 'drather', 'except', 'generally', 'particular', 'prefer', 'type', 'unlike'

File Format: Subtask2

e.g.

ID Construction Sentence Scores

en_0 comparatives I like governors more than farmers. 5.83

Here constructions are:

'andtoo', 'butnot', 'comparatives', 'ingeneral', 'particular', 'type', 'unlike'

Evaluation Measures

The official evaluation metrics for the classification task are Precision, Recall, F1, and macro-F1 (see the subtask 1 starter code for more details).

For the regression task, we opt for MSE, RMSE, and Spearman correlation (rho) (see the subtask 2 starter code for more details).
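The two ranking metrics can be sketched in plain Python (the starter notebooks use scikit-learn/scipy equivalents); the inputs below are toy values, not task results:

```python
def f1_macro(y_true, y_pred, classes=(0, 1)):
    """Macro-averaged F1: mean of the per-class F1 scores."""
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values
    (this sketch does not handle ties)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


print(f1_macro([1, 0, 1, 1], [1, 0, 0, 1]))          # -> 0.7333...
print(spearman_rho([5.8, 2.1, 4.4], [5.5, 1.9, 4.0]))  # -> 1.0
```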

⚠ NOTICE: A separate baseline is defined for each sub-task: i) for the binary classification sub-task, a Linear Support Vector classifier using n-grams (up to trigrams) as input features; ii) for the regression sub-task, a Linear Support Vector regressor with the same n-gram features. Participants can run the evaluation system and obtain results using different cross-validation configurations on the training set. Due to the presence in the official test set of additional constructions with the same presuppositional constraints, we have found that applying the baseline methods to the official test set yields results that are 10% to 20% lower than on the training set. This highlights the importance of achieving a great deal of syntactic generality on this task. For this reason, we encourage participants to test different cross-validation configurations on the training set.
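The described classification baseline can be sketched with scikit-learn as follows; the sentences and labels below are illustrative toy data, not the released training set:

```python
# Sketch of the subtask 1 baseline: a linear SVM over word n-grams
# (unigrams to trigrams). Toy data only; see the official notebooks
# for the actual processing.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_sentences = [
    "I would rather have Chianti than water .",
    "I would rather have beverages than water .",
    "I like dogs , and animals too .",
    "I like animals , and dogs too .",
]
train_labels = [1, 0, 1, 0]  # illustrative acceptability labels

baseline = make_pipeline(
    CountVectorizer(ngram_range=(1, 3)),  # n-grams up to trigrams
    LinearSVC(),
)
baseline.fit(train_sentences, train_labels)
preds = baseline.predict(train_sentences)
print(list(preds))
```

The regression baseline swaps `LinearSVC` for `LinearSVR` over the same features, and different cross-validation configurations can be tried with `sklearn.model_selection.cross_val_score` on the training set.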

To get participants started with the task, we provide baseline scripts showing how the data is processed, split, and finally evaluated for each sub-task.

Below are the baseline and starter code:

Subtask1: https://colab.research.google.com/drive/1wDFQnEfMkoJY99Bmv-CfsTsdwleCDg2f?usp=sharing

Subtask2: https://colab.research.google.com/drive/18KwrdyTsp3wOPcaB7pyFnqOSc3Te7p-X?usp=sharing

You can also find the corresponding notebooks in this git repository (SemEval_Task3_Baseline_subtask1.ipynb and SemEval_Task3_Baseline_subtask2.ipynb).

License

MIT

Useful links

Task Website

Participants Registration Form

Evaluation Platforms:

[Subtask1](https://codalab.lisn.upsaclay.fr/competitions/1292)

[Subtask2](https://codalab.lisn.upsaclay.fr/competitions/1290)

Mailing list: [email protected]

Organizers

Shammur Absar Chowdhury - Qatar Computing Research Institute, HBKU, Qatar

Dominique Brunato - Institute for Computational Linguistics "A. Zampolli" (CNR), Pisa, Italy

Cristiano Chesi - University School for Advanced Studies (IUSS), Pavia, Italy

Felice Dell'Orletta - Institute for Computational Linguistics "A. Zampolli" (CNR), Pisa, Italy

Simonetta Montemagni - Institute for Computational Linguistics "A. Zampolli" (CNR), Pisa, Italy

Giulia Venturi - Institute for Computational Linguistics "A. Zampolli" (CNR), Pisa, Italy

Roberto Zamparelli - Department of Psychology and Cognitive Science - University of Trento, Italy

For any queries: Contact: [email protected]

