I've trained a binary cartoon gender classifier (ResNet50). My model takes a cartoon face as input and the output should be "female" or "male". Examples of the training data:
![frozen-disneyscreencaps com-247-0](https://user-images.githubusercontent.com/91416518/169550674-641add56-7ec5-417c-93a0-9640856de915.jpg)
In real-world applications, the classifier sometimes receives background/non-face images as input. I want to make my classifier robust against this scenario; more specifically, if the model receives a background/non-face image, it should either predict a gender ("female"/"male") with a very low confidence score or classify the image as a "background"/"ood" class.
I tried training my model with your seamless solution to achieve exactly that, but the model is producing strange confidence scores and isn't distinguishing effectively between actual cartoon faces and background/non-face images.
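Concretely, the behaviour I'm after at inference time looks roughly like this (a minimal sketch; the 0.5 threshold is just a placeholder I haven't tuned):

```python
import torch

def classify_with_rejection(model, image_tensor, threshold=0.5):
    """Sketch of the desired behaviour: return a gender only when the model
    is confident, otherwise fall back to a 'background'/'ood' label.
    The threshold value is a placeholder, not a tuned number."""
    model.eval()
    with torch.no_grad():
        logits = model(image_tensor.unsqueeze(0))  # shape [1, 2]
        probs = torch.softmax(logits, dim=1)       # class probabilities
        confidence, pred = probs.max(dim=1)
    if confidence.item() < threshold:
        return "background", confidence.item()
    # class 0 = female, class 1 = male, matching the training statistics below
    return ("female", "male")[pred.item()], confidence.item()
```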
Here's how I train my model:
```python
def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    history = []

    for epoch in range(num_epochs):
        result = {}
        # each epoch has a training and a validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # set model to training mode
            else:
                model.eval()   # set model to evaluation mode

            running_loss = 0.0
            running_corrects = 0
            correct_females = 0
            correct_males = 0

            # iterate over data
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward; track history only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
                for i, pred in enumerate(preds):
                    if labels[i] == 0 and labels[i] == pred:  # class 0 = female
                        correct_females += 1
                    if labels[i] == 1 and labels[i] == pred:  # class 1 = male
                        correct_males += 1

            if phase == 'train':
                # record & update learning rate
                result['lrs'] = get_lr(optimizer)
                scheduler.step()

            # for printing to command line
            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double().item() / dataset_sizes[phase]
            female_acc = correct_females / gender_dist[phase]['females']
            male_acc = correct_males / gender_dist[phase]['males']

            # for plotting
            result[phase + '_loss'] = epoch_loss
            result[phase + '_acc'] = epoch_acc
            result[phase + '_female_acc'] = female_acc
            result[phase + '_male_acc'] = male_acc

            # deep copy the model if it has the best validation accuracy so far
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
                torch.save(model, data_dir + '_model_best.pt')
                torch.save(history, data_dir + '_history.pt')

        # keep the per-epoch metrics so history isn't empty when saved/plotted
        history.append(result)

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, history
```
```python
# Load pretrained ResNet model and use the seamless solution
import losses  # from https://github.com/dlmacedo/entropic-out-of-distribution-detection/tree/master/losses

model = models.resnet50(pretrained=True)
num_ftrs = model.fc.in_features
model.fc = losses.IsoMaxPlusLossFirstPart(num_ftrs, len(class_names))
model = model.to(device)
criterion = losses.IsoMaxPlusLossSecondPart()

# Observe that all parameters are being optimized
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Freeze until 2nd block (included): https://raminnabati.com/post/002_adv_pytorch_freezing_layers/
ct = 0
for child in model.children():
    ct += 1
    if ct < 6:
        for param in child.parameters():
            param.requires_grad = False

# Decay LR by a factor of 0.5 every 5 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

# Train
model, history = train_model(model, criterion, optimizer, exp_lr_scheduler, num_epochs=10)
```
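For testing, I reload the saved best checkpoint, roughly like this (simplified; it relies on the `losses` module being importable since the whole model object was saved):

```python
# Reload the best checkpoint saved during training and switch to eval mode
model = torch.load(data_dir + '_model_best.pt')
model = model.to(device)
model.eval()
```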
When I test my model on unseen cartoon faces and background images, it doesn't behave as expected. With MPS, the confidence score is very low even for actual cartoon faces, not just for backgrounds. With ES and MDS, the score is an arbitrary tensor value rather than a probability (is this expected?), and the scores for actual faces and backgrounds are very close to each other.
MPS:
![encanto-animationscreencaps com-10545-1](https://user-images.githubusercontent.com/91416518/169550012-decb8a95-b714-4c59-8440-64f33fb04e29.jpg)
ES:
![encanto-animationscreencaps com-10545-1 (1)](https://user-images.githubusercontent.com/91416518/169550084-e34826d1-92c2-42c5-92c3-8fb1fac16472.jpg)
MDS:
![encanto-animationscreencaps com-10545-1 (2)](https://user-images.githubusercontent.com/91416518/169550131-5b4f2532-7010-47d8-b519-c3640e0da2a4.jpg)
For reference, this is how MPS, ES, and MDS get computed:
```python
logits = model(data)
probabilities = torch.nn.Softmax(dim=1)(logits)
if score_type == "MPS":    # the maximum probability score
    soft_out = probabilities.max(dim=1)[0]
elif score_type == "ES":   # the negative entropy score
    soft_out = (probabilities * torch.log(probabilities)).sum(dim=1)
elif score_type == "MDS":  # the minimum distance score
    soft_out = logits.max(dim=1)[0]
```
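And this is roughly how I compare the two sets of test images (a simplified sketch: `face_loader`, `background_loader`, and the 5th-percentile threshold are placeholders, and the score computation is the same snippet as above):

```python
import torch

@torch.no_grad()
def collect_scores(model, loader, score_type="MPS"):
    """Run the score computation above over a whole dataloader."""
    model.eval()
    scores = []
    for data, _ in loader:  # loaders yield (image, label) batches
        data = data.to(device)
        logits = model(data)
        probabilities = torch.nn.Softmax(dim=1)(logits)
        if score_type == "MPS":
            soft_out = probabilities.max(dim=1)[0]
        elif score_type == "ES":
            soft_out = (probabilities * torch.log(probabilities)).sum(dim=1)
        elif score_type == "MDS":
            soft_out = logits.max(dim=1)[0]
        scores.append(soft_out.cpu())
    return torch.cat(scores)

# face_loader / background_loader are placeholders for my two test sets
face_scores = collect_scores(model, face_loader, score_type="MPS")
background_scores = collect_scores(model, background_loader, score_type="MPS")

# I expected faces to score clearly higher than backgrounds, so that e.g. a
# threshold at the 5th percentile of the face scores would reject most
# backgrounds -- but the two score distributions overlap heavily.
threshold = torch.quantile(face_scores, 0.05)
print("face scores:       ", face_scores.mean().item(), face_scores.std().item())
print("background scores: ", background_scores.mean().item(), background_scores.std().item())
print("backgrounds rejected:", (background_scores < threshold).float().mean().item())
```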
How do I get my model to correctly distinguish between face and background/non-face images?