- Removed punctuation, extra spaces, and digits
- Lowercased all text
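A minimal sketch of the cleaning steps above. It assumes plain Python strings and takes "punctuation" to mean ASCII punctuation (`string.punctuation`). The helper name `clean_text` is hypothetical, not taken from the project.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Lowercase, strip punctuation and digits, and collapse whitespace."""
    text = text.lower()
    text = text.translate(_PUNCT_TABLE)  # remove punctuation
    text = re.sub(r"\d+", " ", text)     # replace digit runs with a space
    text = re.sub(r"\s+", " ", text)     # collapse runs of whitespace
    return text.strip()


print(clean_text("Alisha Solid Women's Cycling Shorts -- Pack of 3!"))
```

Digits are replaced with a space rather than deleted so that a token such as `a1b` splits into two words instead of fusing into `ab`.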
- Chose the first node of `product_category_tree` as the main category
- Removed categories with fewer than 100 rows
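The category step could look like the sketch below. It assumes `product_category_tree` holds strings shaped like `["Clothing >> Women's Clothing >> ..."]`, with `>>` separating the nodes; that layout, the `main_category` key, and both helper names are assumptions for illustration.

```python
from collections import Counter

MIN_ROWS = 100  # threshold from the notes: drop categories with < 100 rows


def main_category(tree: str) -> str:
    """Return the first node of an (assumed) '["A >> B >> C"]' category path."""
    path = tree.strip('[]" ')           # drop surrounding brackets/quotes
    return path.split(">>")[0].strip()  # first node is the main category


def filter_small_categories(rows, min_rows=MIN_ROWS):
    """Keep only rows whose main category has at least `min_rows` rows."""
    counts = Counter(row["main_category"] for row in rows)
    return [row for row in rows if counts[row["main_category"]] >= min_rows]
```

With pandas, the same logic would be `df["main_category"] = df["product_category_tree"].map(main_category)` followed by `df.groupby("main_category").filter(lambda g: len(g) >= 100)`.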
- Categorical plots using Seaborn and Matplotlib
- Strip plots of price and discounts vs. category
- Review analysis
- Word Cloud
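A word cloud sizes each word by how often it occurs, so the main preprocessing is a frequency count. The sketch below assumes English review text and uses a deliberately tiny, hypothetical stopword list; a real run would use a fuller one.

```python
import re
from collections import Counter

# Hypothetical, minimal stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "is", "it", "of", "to", "for", "this", "with"}


def word_frequencies(texts, top_n=None):
    """Count lowercase alphabetic words across texts, ignoring stopwords.

    Returns a dict of the `top_n` most common words (all words if None).
    """
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return dict(counts.most_common(top_n))
```

If the `wordcloud` package was used (the notes do not say), a dict like this can be passed to `WordCloud().generate_from_frequencies(...)`.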
- Initially used DistilBERT but switched to XLNet because it performs better at classification and has no fixed limit on the number of input tokens
- Used a pretrained XLNet model with a classification layer added on top
- Addressed class imbalance using class weights
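The notes do not say which weighting formula was used. A common choice is the "balanced" heuristic, $w_c = N / (K \cdot n_c)$ for $N$ samples, $K$ classes, and $n_c$ samples in class $c$, which is what scikit-learn's `compute_class_weight(class_weight="balanced", ...)` computes. A stdlib sketch, with the hypothetical helper name `balanced_class_weights`:

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights > 1 and common classes < 1, so each class
    contributes roughly equally to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


weights = balanced_class_weights(["Clothing"] * 6 + ["Watches"] * 2)
print(weights)
```

If the classifier is trained in PyTorch (an assumption), these values would be ordered by label index and passed as a tensor to `torch.nn.CrossEntropyLoss(weight=...)`.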
- Finally, used the description combined with the product name, brand, and product specifications as the model input
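One way to build the combined input text is sketched below. The field names, their order, and the `" | "` separator are all assumptions; the notes only say which fields were combined.

```python
def build_input_text(product: dict) -> str:
    """Join name, brand, specifications, and description into one string.

    Field names are hypothetical; missing, None, or blank fields are skipped.
    """
    fields = ("product_name", "brand", "product_specifications", "description")
    parts = [str(product.get(f) or "").strip() for f in fields]
    return " | ".join(p for p in parts if p)


example = {"product_name": "Alisha Shorts", "brand": "Alisha", "description": "Cycling shorts"}
print(build_input_text(example))
```

Placing the short fields before the long description means they survive if the tokenizer later truncates the text to a fixed `max_length`.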
Final Validation F1 Score: 74
Confusion Matrix:
- Using data augmentation to address class imbalance:
  - translation
  - replacing words with synonyms, hypernyms, and hyponyms
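Word-substitution augmentation can be sketched as below. The toy `LEXICON` is hypothetical; in practice the substitutes could come from WordNet (for example through NLTK's `wordnet` corpus reader, whose synsets expose `lemmas()`, `hypernyms()`, and `hyponyms()`). The text is assumed to be already lowercased.

```python
import random
from typing import Optional

# Hypothetical toy lexicon: word -> substitutes.
LEXICON = {
    "shorts": ["short pants", "trousers"],   # synonym, hypernym
    "watch": ["timepiece", "wristwatch"],    # hypernym, hyponym
    "shoes": ["footwear", "sneakers"],       # hypernym, hyponym
}


def augment(text: str, p: float = 0.5, rng: Optional[random.Random] = None) -> str:
    """Replace each word found in LEXICON with a random substitute with probability p."""
    if rng is None:
        rng = random.Random()
    out = []
    for word in text.split():
        subs = LEXICON.get(word)
        if subs and rng.random() < p:
            out.append(rng.choice(subs))
        else:
            out.append(word)
    return " ".join(out)


print(augment("black shorts", p=1.0, rng=random.Random(0)))
```

To rebalance classes, this would be applied repeatedly to rows from the under-represented categories, adding the new variants as extra training examples.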