A solution to the Jigsaw Unintended Bias in Toxicity Classification Kaggle competition.
Fine-tunes BERT and GPT-2 models on the training data with custom weighting schemes and auxiliary target variables.
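The weighting idea is roughly the following (a hedged sketch, not the exact scheme used in this repo; the identity column names come from the competition data, and the relative weights are illustrative):

```python
import numpy as np
import pandas as pd

# The nine identity columns evaluated by the competition metric.
IDENTITY_COLUMNS = [
    "male", "female", "homosexual_gay_or_lesbian", "christian", "jewish",
    "muslim", "black", "white", "psychiatric_or_mental_illness",
]

def compute_sample_weights(df: pd.DataFrame) -> np.ndarray:
    """Up-weight the examples the bias metric cares most about."""
    identity = (df[IDENTITY_COLUMNS].fillna(0).values >= 0.5).any(axis=1)
    toxic = df["target"].values >= 0.5
    weights = np.ones(len(df))
    weights += identity            # comments mentioning an identity
    weights += toxic & ~identity   # background positive, subgroup negative
    weights += ~toxic & identity   # background negative, subgroup positive
    return weights / weights.mean()
```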
Unfortunately I used a bugged evaluation metric function during the competition, which severely undermined the effort I put into it. I have since fixed the function and incorporated some of the custom weighting schemes shared by top competitors post-competition.
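For reference, the competition's bias metric (what the fixed function computes) is an equal-weighted combination of the overall AUC and three generalized means (p = -5) of per-identity AUCs. A condensed sketch; pass it the identity column list from the snippet above:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def power_mean(values, p=-5.0):
    return np.power(np.mean(np.power(values, p)), 1.0 / p)

def bias_metric(df, pred, identity_cols, target_col="target"):
    """Overall AUC plus power means of subgroup, BPSN and BNSP AUCs, equally weighted."""
    y = df[target_col].values >= 0.5
    subgroup_aucs, bpsn_aucs, bnsp_aucs = [], [], []
    for col in identity_cols:
        sub = df[col].fillna(0).values >= 0.5
        subgroup_aucs.append(roc_auc_score(y[sub], pred[sub]))
        # BPSN: background positives + subgroup negatives
        mask = (~sub & y) | (sub & ~y)
        bpsn_aucs.append(roc_auc_score(y[mask], pred[mask]))
        # BNSP: background negatives + subgroup positives
        mask = (~sub & ~y) | (sub & y)
        bnsp_aucs.append(roc_auc_score(y[mask], pred[mask]))
    return 0.25 * (
        roc_auc_score(y, pred)
        + power_mean(subgroup_aucs)
        + power_mean(bpsn_aucs)
        + power_mean(bnsp_aucs)
    )
```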
TODO: Try the renamed huggingface/pytorch-transformers (from huggingface/pytorch-pretrained-BERT) package and the new XLNet models.
Unfortunately this project does not pin all of its dependencies as well as my last project ceshine/imet-collection-2019 did. But this time I've included a Dockerfile that can replicate a working environment (at least at the time of writing, that is, July 2019).
Some peculiarities specific to this project:
- `pytorch-pretrained-BERT-master.zip` is included and should be installed via `pip install pytorch-pretrained-BERT-master.zip`. This is because the version I used lived on the project master branch and never made it to PyPI. The latest PyPI version is not compatible with this project.
- `pytorch_helper_bot` is included via `git subtree` to ease the cognitive load on users (it's not on PyPI yet, and I'm not planning to put it there).
Generally speaking, the essential dependencies of this project include (besides the above two):
- PyTorch >= 1.0
- NVIDIA/apex (for reducing GPU memory consumption and speeding up training on newer GPUs).
- pandas
TODO: Write down the specific versions of major dependencies that are proven to work.
I used almost exactly the same framework as ceshine/imet-collection-2019. Only this time we don't need a separate validation Kernel: the validation scoring function/metric is integrated into the `helperbot` workflow.
- Training Kernel (script): fine-tuning bert-base-uncased pretrained models - 1 epoch takes around 4.5 hours.
- Inference Kernel (script): 5 fine-tuned bert-base-uncased models — Private score 0.94356; would be in 101st place (silver medal).
- Inference Kernel (script): 5 fine-tuned bert-base-uncased models + 2 fine-tuned GPT-2 models — Private score 0.94374; would be in 85th place (silver medal).
- Inference Kernel (script): 5 fine-tuned bert-base-uncased models + 2 fine-tuned GPT-2 models ensembled using "power 3.5 weighted sum" — Private score 0.94419; would be in 70th place (silver medal).
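The "power 3.5 weighted sum" in the last kernel boils down to something like this (a sketch; the per-model weights actually used in the kernel may differ):

```python
import numpy as np

def power_weighted_sum(predictions, weights, power=3.5):
    """Weighted sum of predictions raised to a power.

    Raising probabilities to a power > 1 emphasises the more confident models;
    since the metric is AUC-based, only the ranking of the combined scores matters.
    """
    predictions = np.asarray(predictions)      # shape: (n_models, n_samples)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return np.sum(weights[:, None] * predictions ** power, axis=0)

# Hypothetical usage: blend the averaged BERT predictions with the averaged
# GPT-2 predictions (weights here are illustrative, not the ones in the kernel).
# final_scores = power_weighted_sum([bert_preds, gpt2_preds], weights=[5, 2])
```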
I used a Kaggle Dataset toxic-cache to store the tokenized training data, so the kernel doesn't need to re-tokenize the whole training set in every single run.
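Conceptually the cache works like this (a sketch; the file name, sequence length, and serialization format of the actual toxic-cache dataset may differ):

```python
from pathlib import Path

import joblib
from pytorch_pretrained_bert import BertTokenizer

CACHE_PATH = Path("cache/train_tokens.jbl")  # hypothetical file name

def load_or_tokenize(texts, max_len=220):
    """Return cached token ids if present, otherwise tokenize and cache them."""
    if CACHE_PATH.exists():
        return joblib.load(CACHE_PATH)
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    token_ids = [
        tokenizer.convert_tokens_to_ids(
            ["[CLS]"] + tokenizer.tokenize(text)[: max_len - 2] + ["[SEP]"]
        )
        for text in texts
    ]
    CACHE_PATH.parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(token_ids, CACHE_PATH)
    return token_ids
```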
Example Colab Notebook: the code is cloned directly from this GitHub repo, but the dataset, cache, and model weights live on Google Drive (you need to set them up in your own account yourself).
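The Drive setup amounts to mounting your Drive and pointing the paths at your own copies of the files (directory names below are illustrative, not the ones hard-coded in the notebook):

```python
from google.colab import drive

# Mount Google Drive so the dataset, token cache, and model weights persist
# across Colab sessions.
drive.mount("/content/drive")

# Hypothetical layout; adjust to wherever you keep the files in your own Drive.
DATA_DIR = "/content/drive/My Drive/jigsaw/data"
CACHE_DIR = "/content/drive/My Drive/jigsaw/cache"
MODEL_DIR = "/content/drive/My Drive/jigsaw/models"
```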