Skillenza Data Science Competition
Introduction
Skillenza, an online coding platform, hosted a data science competition.
Natural Language Processing Challenge
Description Of The Data:
The data provided are excerpts from research papers of certain categories. The text from the abstracts of the research papers has been tokenized. The following are the available attributes:
id: Unique identifier for each tuple in the data set.
category: The category to which the text in the tuple is mapped. There are 5 categories of research papers in the given data. The categories are represented as follows:
- Business
- Education
- Environment
- Politics
- Psychology
text: Tokenized text from the abstracts of the research papers.
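The three attributes above can be pictured as one row per abstract. The following is a minimal sketch, assuming the train file is a CSV with a header row named `id,category,text` and that categories are stored as the label strings listed above; neither detail is confirmed by the description, and the two sample rows are made up for illustration.

```python
import csv
import io

# The five category labels listed in the data description.
CATEGORIES = {"Business", "Education", "Environment", "Politics", "Psychology"}


def read_rows(handle):
    """Yield (id, category, text) tuples, rejecting unknown categories."""
    for row in csv.DictReader(handle):
        if row["category"] not in CATEGORIES:
            raise ValueError(f"unexpected category: {row['category']!r}")
        yield row["id"], row["category"], row["text"]


# Made-up two-row sample standing in for the real train file (assumed layout).
sample = io.StringIO(
    "id,category,text\n"
    "1,Business,market growth revenue firm\n"
    "2,Psychology,cognitive bias memory study\n"
)
rows = list(read_rows(sample))
```

If the real file encodes categories differently (for example as integers), the `CATEGORIES` set would need to be adjusted to match.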
Description Of The Files Provided:
Train: This file contains all the available features. It is to be used for training the model and for validating it.
Test: This file contains all features except the target variable. Predictions must be made on the test file only and must be written to a CSV (comma-separated values) file. Please read the Objective Of The Problem section to better understand how the solution is to be written and submitted.
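Since the Train file serves both training and validation, one option is to hold out part of it. The sketch below is an illustration rather than a prescribed method: it assumes rows are `(id, category, text)` tuples, and the function name `stratified_split`, the 20% hold-out fraction, and the seed are all hypothetical choices. It splits per category so that each category keeps roughly the same share in both parts.

```python
import random
from collections import defaultdict


def stratified_split(rows, validation_fraction=0.2, seed=0):
    """Split (id, category, text) rows into train and validation lists,
    holding out roughly the same fraction of every category."""
    by_category = defaultdict(list)
    for row in rows:
        by_category[row[1]].append(row)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, validation = [], []
    for category in sorted(by_category):
        group = by_category[category]
        rng.shuffle(group)
        # Hold out at least one row per category.
        n_val = max(1, round(len(group) * validation_fraction))
        validation.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, validation
```

Note that a category with a single row ends up entirely in the validation part under this rule, which may or may not be what is wanted.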
Objective Of The Problem: The objective is to predict the category of the text attribute for each tuple in the test file. All predictions are to be written to a CSV file that contains two attributes. The first attribute is id, the unique identifier of each tuple, and the second attribute is category, the predicted category. Please refer to the sample submission file for an example of how the solution file should look. Upload the solution file to have the solution evaluated.
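The two-attribute solution file described above might be written as in the sketch below. It assumes a header row of `id,category` and label strings as category values; the sample submission file remains the authority on the exact layout. The function name `write_submission` and the example ids are hypothetical.

```python
import csv
import io


def write_submission(handle, predictions):
    """Write (id, category) pairs as a two-column CSV with a header row."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["id", "category"])
    writer.writerows(predictions)


# Made-up predictions written to an in-memory buffer for illustration.
buffer = io.StringIO()
write_submission(buffer, [("101", "Politics"), ("102", "Education")])
print(buffer.getvalue())
```

To produce an actual file, the buffer can be replaced with a file opened via `open(path, "w", newline="")`, where `path` is whatever name the submission is expected to have.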