

myXNLI - Myanmar Natural Language Inference Corpus

Natural Language Inference (NLI) is an NLP task that asks whether one natural language statement (the premise) entails or contradicts another (the hypothesis), or neither.

The Cross-lingual Natural Language Inference Corpus (XNLI) already provides NLI benchmark data in 15 other languages. The myXNLI corpus extends XNLI to the Myanmar (Burmese) language.

For myXNLI, we human-translated all 7,500 sentence pairs from the XNLI English dev and test sets into Myanmar. The NLI and genre labels of the English dev and test sets are reused for the Myanmar datasets.

The dataset also includes Myanmar NLI training data, created by machine-translating the MultiNLI training data from English into Myanmar. As in XNLI, the existing NLI and genre labels of the English training data are reused for the Myanmar version.

In addition to the NLI dev, test and training datasets, we appended Myanmar translations to the XNLI 15-language parallel corpus, creating a 16-language parallel corpus.
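As an illustration of working with the parallel corpus, the sketch below pulls English–Myanmar sentence pairs out of it. It assumes, without confirmation from this README, that the file is tab-separated with a header row of language codes (`en`, `my`, ...). `extract_pair` is a hypothetical helper, and the one-line sample reuses the example sentence from the translation files.

```python
import csv
import io
from typing import Iterator, TextIO


def extract_pair(f: TextIO, src: str = "en", tgt: str = "my") -> Iterator[tuple[str, str]]:
    """Yield (src, tgt) sentence pairs from a tab-separated parallel corpus.

    Assumption: the first row is a header whose cells are language codes,
    e.g. "ar", "bg", ..., "my", ..., "zh".
    """
    # QUOTE_NONE: treat quote characters inside sentences as ordinary text.
    reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row[src], row[tgt]


# In-memory stand-in for the real file. For the actual corpus, pass
# open(path, encoding="utf-8", newline="") with a path of your choice.
sample = "en\tmy\nWe were watching something on TV.\tကျွန်တော်တို့ တီဗီမှာ တခုခု ကြည့်နေခဲ့သည်။\n"
pairs = list(extract_pair(io.StringIO(sample)))
print(pairs[0][0])  # We were watching something on TV.
```

Using `DictReader` keys each column by its language code, so selecting a different pair (for example `src="th"`) needs no change to the column logic, provided the header assumption holds.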

Publications

Myanmar XNLI: Building a Dataset and Exploring Low-resource Approaches to Natural Language Inference with Myanmar https://www.researchsquare.com/article/rs-4329843/

Downloads

  • Myanmar NLI Test Dataset - 5010 records (tsv)
  • Myanmar NLI Validation Dataset - 2490 records (tsv)
  • Myanmar NLI Training Data - 392,702 records (tsv.gz)
  • Parallel Corpus in 16 Languages (ar, bg, de, el, en, es, fr, hi, my, ru, sw, th, tr, ur, vi, zh) (tsv)
  • HuggingFace dataset
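To check a download against the record counts listed above, a sketch like the following counts data rows in a `.tsv` or `.tsv.gz` file. It assumes one record per line and treats the presence of a header row as a caller-supplied flag, since this README does not say whether the files include one. `count_records` is a hypothetical helper, and the demo file is synthetic.

```python
import gzip
import os
import tempfile


def count_records(path: str, has_header: bool = False) -> int:
    """Count non-empty lines in a TSV file, transparently handling .gz files.

    Assumption: each record sits on exactly one line (no embedded newlines).
    """
    opener = gzip.open if path.endswith(".gz") else open
    # "rt" opens gzip files in text mode; plain open() accepts the same mode.
    with opener(path, "rt", encoding="utf-8") as f:
        n = sum(1 for line in f if line.strip())
    return n - 1 if has_header and n > 0 else n


# Synthetic gzipped file with a header row and two records.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.tsv.gz")
    with gzip.open(demo, "wt", encoding="utf-8") as f:
        f.write("premise\thypothesis\tlabel\tgenre\n")
        f.write("a\tb\tEntailment\tface-to-face\n")
        f.write("c\td\tNeutral\tface-to-face\n")
    demo_count = count_records(demo, has_header=True)

print(demo_count)  # 2
```

If the one-record-per-line assumption holds, the training file would be expected to report 392,702.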

This dataset is licensed under the Creative Commons Attribution-NonCommercial license.

Myanmar NLI File Format

Example rows (column boundaries marked with |, English glosses in parentheses):

Sentence-1 (Premise) | Sentence-2 (Hypothesis) | Label | Genre
သင် ဒီမှာ‌နေစရာ မလိုပါဘူး။ | မင်း ထွက်သွားနိုင်တယ်။ | Entailment | face-to-face
(You don’t have to stay there. | You can leave.)
သင် ဒီမှာ‌နေစရာ မလိုပါဘူး။ | မင်း သွားချင်ရင် အိမ်ကို သွားနိုင်တယ်။ | Neutral | face-to-face
(You don’t have to stay there. | You can go home if you want to.)
သင် ဒီမှာ‌နေစရာ မလိုပါဘူး။ | မင်း အဲ့ဒီနေရာအတိအကျမှာ နေဖို့လိုတယ်။ | Contradiction | face-to-face
(You don’t have to stay there. | You need to stay in that exact spot!)
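A minimal sketch for reading these files into typed records follows. It assumes the four columns above appear in that order, separated by tabs, with an optional header row (the README shows the column names but does not say whether the files carry them). `NLIExample` and `read_nli` are hypothetical names, and the sample uses the English glosses in place of the Myanmar text.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, TextIO

LABELS = {"Entailment", "Neutral", "Contradiction"}


@dataclass(frozen=True)
class NLIExample:
    premise: str
    hypothesis: str
    label: str
    genre: str


def read_nli(f: TextIO, skip_header: bool = False) -> Iterator[NLIExample]:
    """Parse premise/hypothesis/label/genre rows from a tab-separated stream."""
    # QUOTE_NONE: quotation marks inside sentences are kept as ordinary text.
    reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    if skip_header:
        next(reader, None)
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        premise, hypothesis, label, genre = row  # assumed column order
        # Normalise casing, since the exact label spelling in the files is not documented here.
        label = label.strip().capitalize()
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        yield NLIExample(premise, hypothesis, label, genre.strip())


sample = (
    "You don’t have to stay there.\tYou can leave.\tEntailment\tface-to-face\n"
    "You don’t have to stay there.\tYou can go home if you want to.\tNeutral\tface-to-face\n"
    "You don’t have to stay there.\tYou need to stay in that exact spot!\tContradiction\tface-to-face\n"
)
examples = list(read_nli(io.StringIO(sample)))
print([e.label for e in examples])  # ['Entailment', 'Neutral', 'Contradiction']
```

Validating labels while reading surfaces a wrong column order or an unexpected header row immediately, rather than during training.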

Myanmar Translation File Format

The translation folder contains 100 files, each with 100 blocks. Each block holds a block number, an English sentence and a placeholder for its Myanmar translation. An example entry from a translation file is shown below.

114
We were watching something on TV.
ကျွန်တော်တို့ တီဗီမှာ တခုခု ကြည့်နေခဲ့သည်။
# REVIEW
# This is a comment explaining details about the problem.

The first line gives the line number of the English sentence in the XNLI corpus.

The second line contains the actual English sentence to be translated.

The third line is reserved for Myanmar translation of the English sentence.

Optional additional lines prefixed with a hash (#) may hold translator notes. These are useful for flagging translations that need review or for recording observations made during translation.

Lastly, consecutive entries are separated by a blank line.
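The entry layout described above can be parsed with a short sketch like the one below. It follows the stated rules (a number line, an English line, a Myanmar line, optional `#` note lines, blank-line separators) and additionally assumes UTF-8 text and that an entry whose Myanmar line is absent should be read with an empty translation. `TranslationEntry`, `parse_entries` and `needs_review` are hypothetical names; `needs_review` treats a `# REVIEW` note as a flag, following the example entry. Entry 115 in the sample is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationEntry:
    line_no: int   # line number of the English sentence in the XNLI corpus
    english: str
    myanmar: str   # empty string if the translation line is absent
    notes: list[str] = field(default_factory=list)  # '#'-prefixed translator notes

    @property
    def needs_review(self) -> bool:
        # Assumption: a "# REVIEW" note, as in the example entry, flags the entry.
        return any(note.upper() == "REVIEW" for note in self.notes)


def _parse_block(block: list[str]) -> TranslationEntry:
    if len(block) < 2:
        raise ValueError(f"malformed entry: {block!r}")
    line_no, english, rest = int(block[0].strip()), block[1], block[2:]
    myanmar = ""
    if rest and not rest[0].startswith("#"):
        myanmar, rest = rest[0], rest[1:]
    notes = [line.lstrip("#").strip() for line in rest if line.startswith("#")]
    return TranslationEntry(line_no, english, myanmar, notes)


def parse_entries(text: str) -> list[TranslationEntry]:
    """Split a translation file into blank-line-separated entries and parse each one."""
    entries, block = [], []
    for line in text.splitlines() + [""]:  # trailing "" flushes the final block
        if line.strip():
            block.append(line)
        elif block:
            entries.append(_parse_block(block))
            block = []
    return entries


sample = """114
We were watching something on TV.
ကျွန်တော်တို့ တီဗီမှာ တခုခု ကြည့်နေခဲ့သည်။
# REVIEW
# This is a comment explaining details about the problem.

115
A hypothetical untranslated sentence.
"""
entries = parse_entries(sample)
print(entries[0].line_no, entries[0].needs_review)  # 114 True
```

Under these assumptions, `len(parse_entries(text))` for any complete file in the translation folder would be expected to equal 100.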

The translation revision was carried out in a private Git repository; the final revised translation files have been imported into this (myXNLI) repository.

Acknowledgements

The following people contributed to each phase of myXNLI dataset development.

Phase 1 - Core Translation Team

  • Aung Kyaw Htet
  • Aye Mya Hlaing
  • Hsu Myat Mo
  • Win Pa Pa
  • Yi Mon Shwe Sin

Phase 1 - Extended Translation Team

  • Aye Nyein Mon
  • Ei Myat Myat Noe
  • Hay Mar Soe Naing
  • Hnin Nandar Zaw
  • Myint Myint Wai
  • Wai Lai Lai Phyu
  • Yadanar Oo
  • Zaw Mee

Phase 2 - Translation Revision Team

  • Aung Kyaw Htet
  • Htoo Htet Aung
  • Junie Soe
  • Thar Htet
  • Thein Aung Tan
  • Thidar Nwe
  • Thiha Kyaw Zaw
  • Yair Pike
  • Yi Sandi Soe
  • Two freelancers, with financial support from Macquarie University

Sample Relabeling

  • Htet Cho
  • Lin Thurein Tun
  • Thein Than Phyo
  • Zay Ye Htut
