corpus_parsing_scripts's Introduction

Text Dataset Parsing Scripts

This repository contains a collection of scripts that parse four NLP datasets (three aspect-based sentiment analysis datasets and one sentiment analysis dataset). Each script stores the parsed data in a SQLite database, making minimal changes to the individual texts in the datasets.
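As a rough illustration of the "parse, then store in SQLite" pattern described above, here is a minimal Python sketch using the standard-library `sqlite3` module. It assumes the scripts are written in Python, which this README does not state. The table name `reviews`, its columns, and the function names are hypothetical and chosen only for this example; the actual scripts may use a different schema.

```python
import sqlite3


def store_reviews(conn, rows):
    """Insert (text, sentiment) pairs into a hypothetical `reviews` table.

    The text is stored exactly as given (no stripping or normalisation),
    mirroring the "minimal changes" goal described above.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS reviews ("
            "id INTEGER PRIMARY KEY, text TEXT NOT NULL, sentiment TEXT)"
        )
        conn.executemany(
            "INSERT INTO reviews (text, sentiment) VALUES (?, ?)", rows
        )


def fetch_reviews(conn):
    """Return all stored (text, sentiment) pairs in insertion order."""
    return conn.execute(
        "SELECT text, sentiment FROM reviews ORDER BY id"
    ).fetchall()
```

A caller would open a connection with `sqlite3.connect(...)`, pass it to `store_reviews` along with the parsed rows, and close it afterwards. Parameterised queries (`?` placeholders) keep quotes and other special characters in the review text from interfering with the SQL.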

Each of the four scripts corresponds to one dataset. For citations and more information on each dataset, see the comments at the top of each script file. The scripts do not currently extract every part of the supported datasets, since they were written specifically for my Master's thesis.

Supported Datasets:

The restaurant corpus from SemEval 2016's Aspect-Based Sentiment Analysis Task
Restaurant Reviews dataset
SOCC (SFU Opinion and Comments) corpus, which contains opinion articles from the Globe and Mail
Stanford Sentiment Treebank

The datasets are not included in this repository. See parameters.json for where the scripts expect the data to be placed by default.
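For readers who want to check where a dataset is expected before running a script, the following Python sketch shows one way to read a path out of a JSON parameters file. The structure of parameters.json is not documented here, so the flat key-to-path layout, the function names, and any key passed to `dataset_path` are assumptions made for illustration only.

```python
import json
from pathlib import Path


def load_parameters(path="parameters.json"):
    """Read the JSON parameters file and return its contents as a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def dataset_path(params, key):
    """Look up a dataset location by key.

    Assumes a flat mapping of key -> path string; the real file may be
    organised differently.
    """
    if key not in params:
        raise KeyError(f"no entry named {key!r} in parameters file")
    return Path(params[key]).expanduser()
```

Raising a `KeyError` with the missing key in the message makes a misnamed or absent entry easier to spot than a failure later when the script tries to open the data.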
