Adaptive Threshold MixMatch

Midterm project for KAIST CS492H: Special Topics in Computer Science <Deep Learning for Real-world Problems>.
By Donghyun Kim (GitHub) / Yongmin Lee (GitHub).

Abstract

When labeled data are abundant, unlabeled samples often contribute little in semi-supervised learning (SSL). Because proper use of unlabeled data is crucial for semi-supervised learning to be applicable to real-world problems, we must keep the training steps from being dominated by labeled data. To use unlabeled data more effectively, we devised a scheduling algorithm for managing pseudo labels in intermediate steps. Our algorithm filters pseudo labels by confidence, with a criterion that depends on the model's current learning phase. As a result, it improves the model's performance and makes it more robust when labeled data are scarce. We gained 1.17% additional top-1 accuracy compared to MixMatch. Our method can also be combined with other SSL methods, since it is compact and applies readily to other domains.
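To make the idea of phase-dependent filtering concrete, here is a minimal sketch. It is not the code from this repository: the linear ramp, its direction, the `start`/`end` values, and the function names `adaptive_threshold` and `confident_mask` are all assumptions chosen for illustration. The actual schedule is described in the report and in the Adaptive_Threshold directory.

```python
def adaptive_threshold(step, total_steps, start=0.5, end=0.95):
    """Hypothetical schedule: move the confidence threshold linearly
    from `start` to `end` as training progresses (assumed shape)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress


def confident_mask(probs, threshold):
    """Mark each unlabeled sample whose highest predicted class
    probability reaches the threshold."""
    return [max(p) >= threshold for p in probs]


# Guessed label distributions for two unlabeled samples (made-up numbers).
probs = [[0.60, 0.40], [0.97, 0.03]]

early = confident_mask(probs, adaptive_threshold(0, 100))
late = confident_mask(probs, adaptive_threshold(100, 100))
print(early, late)
```

Under this assumed schedule, both samples pass the filter at the start of training, while only the second one still passes at the end.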

Overall Structure

Our GitHub repo is divided into three parts. Please keep in mind that all code here targets the NSML environment, not local machines.

First, MixMatch_basic, Fixed_Threshold, and Adaptive_Threshold contain the finalized version of the source code that we used in our environment. This code has proper argument settings and can be used for further research. Detailed instructions can be found in each directory.

Second, Experiment_codes contains the exact source code and some information about the experiments that were conducted in NSML. Each folder is named after its session and holds the exact files and configurations for that session. This is meant for reproducing the results in our presentation and paper. However, due to randomization in validation, the exact accuracy might differ slightly.

Finally, the etc folder contains the remaining code that was used in our research but is not related to our final paper. It may include implementations of some SSL papers, code fragments, and intermediate versions of our implementation.

2020.5.11 Report and Experiment Note added.

MixMatch_basic

Contains the baseline code for MixMatch.

Fixed_Threshold

Contains MixMatch with a fixed confidence threshold for pseudo labels.
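As a rough illustration of what a fixed threshold means here, the sketch below keeps only the unlabeled samples whose guessed label distribution is confident enough. The function name `filter_pseudo_labels`, the threshold value, and the example probabilities are hypothetical and are not taken from the code in this directory.

```python
def filter_pseudo_labels(probs, threshold=0.95):
    """Return (sample_index, soft_label) pairs for samples whose top
    class probability is at least `threshold`; the rest are dropped."""
    kept = []
    for i, p in enumerate(probs):
        if max(p) >= threshold:
            kept.append((i, p))
    return kept


# Guessed label distributions for three unlabeled samples (made-up numbers).
probs = [
    [0.97, 0.02, 0.01],
    [0.40, 0.35, 0.25],
    [0.05, 0.90, 0.05],
]

kept = filter_pseudo_labels(probs, threshold=0.95)
print([i for i, _ in kept])
```

In this example only the first sample is confident enough to be used; the threshold stays the same for the whole training run.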