This is the code to train the model: a token-classification model with roberta-base (LB: 0.631) as the backbone. I also tried longformer-base (LB: 0.631). I used gradient accumulation to reach an effective batch size of 4 within the RTX 3070 Ti's memory.
I also experimented with creating CV folds based on topics extracted with LDA, reasoning that documents on similar topics would be structurally similar. Unfortunately, this hurt the model's overall performance. The LDA hyperparameters were not tuned; I picked values that produced topic clusters resembling those other users had shared on the forums.
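One way to implement this idea is to assign each document its dominant LDA topic and then distribute documents round-robin within each topic so every fold sees a similar topic mix. A hedged sketch with scikit-learn (the toy corpus, `n_components`, and fold count are placeholders, not the values used in the actual experiment):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "bake the bread in a hot oven",
    "knead the dough and let it rise",
    "simmer the sauce on low heat",
    "season the soup with fresh herbs",
    "compile the code and run the tests",
    "debug the function and fix the loop",
    "refactor the module into classes",
    "profile the script to find slow code",
]

# Fit LDA on raw term counts and take each document's dominant topic.
X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topics = lda.fit_transform(X).argmax(axis=1)

# Assign folds round-robin within each topic, so folds share a similar
# topic distribution.
n_folds = 2
folds = np.zeros(len(docs), dtype=int)
for t in np.unique(topics):
    idx = np.where(topics == t)[0]
    for i, j in enumerate(idx):
        folds[j] = i % n_folds
```

`folds` can then drive any standard train/validation split; the round-robin assignment avoids the empty-class failures a stratified splitter can hit when a topic ends up with very few documents.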