Code Monkey home page Code Monkey logo

absadatasets's Introduction

ABSA datasets for PyABSA

total views total views per week total clones total clones per week

Contribute (prepare) your dataset in using PyABSA

We hope you can share your custom dataset or an available public dataset. If you want to, follow these steps:

Important: Rename your dataset filename before use it in PyABSA

Although the integrated datasets have no ids, it is recommended to assign an id for your dataset. While merge your datasets into ABSADatasets, please keep the id remained.

  • APC dataset name should be {id}.{dataset name}, and the dataset files should be named in {dataset name}.{type}.dat.atepc e.g.,
datasets
├── 101.restaurant
│    ├── restaurant.train.dat  # train_dataset
│    ├── restaurant.test.dat  # test_dataset
│    └── restaurant.valid.dat  # valid_dataset, dev set are not recognized in PyASBA, please rename dev-set to valid-set
└── others
  • ATEPC dataset files should be {id}.{dataset name}.{type}.dat.atepc, e.g.,
datasets
├── 101.restaurant
│    ├── restaurant.train.dat.atepc  # train_dataset
│    ├── restaurant.test.dat.atepc  # test_dataset
│    └── restaurant.valid.dat.atepc  # valid_dataset, dev set are not recognized in PyASBA, please rename dev-set to valid-set
└── others

I prepare a demo custom APC/ATEPC dataset which is based on Yelp dataset, if you got problem in dataset rename, please feed your data into the prepared dataset folds. Check datasets/apc_datasets/100.CustomDataset and datasets/atepc_datasets/100.CustomDatasetto view or rewrite the custom dataset.

Then, use the {id}.{dataset name} to locate your dataset, e.g.,

from pyabsa.functional import APCConfigManager
from pyabsa.functional import Trainer
from autocuda import auto_cuda

config = APCConfigManager.get_apc_config_english() # APC task
dataset = '101.restaurant' 
# dataset = '100.CustomDataset'
Trainer(config=config,
        dataset=dataset,  # train set and test set will be automatically detected
        checkpoint_save_mode=1,
        auto_device=auto_cuda()  # automatic choose CUDA or CPU
        )

It will avoid some potential problem (e.g., duplicated dataset name) while PyABSA detects the dataset.

Dataset Processing

  • Format your APC dataset according to our dataset format. (Recommended. Once you finished this step, we can help you to finish other steps) image

  • Generate the inference dataset for APC / ATEPC task (Optional. The example is available here)

  • Convert the APC dataset to ATEPC dataset, and move the transformed ATEPC datasets from apc_dataset to corresponding atepc_datasets. (Optional. The example is available here )

  • Register your dataset in PyABSA. (Optional. Register here)

Annotate Your Dataset

  • A Stand-alone browser based tool to help process data for the training set. here image1

    Once data saved, 3 files will be created:

    1. a CSV file training set for classic sentiment analysis
    2. a TXT file training set for PyABSA
    3. a JSON file for saving unfinished work

Notice

All datasets provided are for research only, we do not hold any Copyright of any datasets. These datasets follow their original licenses (if any).

Datasets source:

MAMS https://github.com/siat-nlp/MAMS-for-ABSA

SemEval 2014: https://alt.qcri.org/semeval2014/task4/index.php?id=data-and-tools

SemEval 2015: https://alt.qcri.org/semeval2015/task12/index.php?id=data-and-tools

SemEval 2016: https://alt.qcri.org/semeval2016/task5/index.php?id=data-and-tools

Chinese: https://www.sciencedirect.com/science/article/abs/pii/S0950705118300972?via%3Dihub

Shampoo: brightgems@github

MOOC: jmc-123@github with GPL License

Twitter: https://dl.acm.org/doi/10.5555/2832415.2832437

Television & TShirt: https://github.com/rajdeep345/ABSA-Reproducibility

Yelp: WeiLi9811@github

absadatasets's People

Contributors

yangheng95 avatar lpfy avatar xumayi avatar jmc-123 avatar symabbas avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.