haitongli / knowledge-distillation-pytorch
A PyTorch implementation for exploring deep and shallow knowledge distillation (KD) experiments with flexibility
License: MIT License
My teacher model's accuracy is 99%, but when I try to distill knowledge, my student model's accuracy is under 10%. It seems the student model didn't learn any knowledge from the teacher model.
I use LeNet-5 as my student model, with alpha = 0.9 and temperature = 1.
Thanks for your help.
The teacher model's outputs are computed only once, before the training epochs. https://github.com/peterliht/knowledge-distillation-pytorch/blob/master/train.py#L277
This assumes the inputs are identical in every epoch. But the inputs differ from epoch to epoch because of the random transform operations, e.g. random crop and random horizontal flip.
I think the right way is to recalculate the teacher's outputs in each epoch; see the sketch below.
Is it a bug?
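For illustration, a minimal sketch of that approach (hedged: the function name train_kd_epoch and the loss_fn_kd signature are assumptions, not the repository's actual code). The teacher's outputs are recomputed on every batch inside the training loop, so random augmentations and shuffling can never desynchronize them from the student's inputs.

import torch

def train_kd_epoch(student, teacher, optimizer, loss_fn_kd, dataloader, params):
    student.train()
    teacher.eval()
    for data_batch, labels_batch in dataloader:
        if torch.cuda.is_available():
            data_batch = data_batch.cuda(non_blocking=True)
            labels_batch = labels_batch.cuda(non_blocking=True)
        with torch.no_grad():                       # teacher forward pass, no gradients
            teacher_outputs = teacher(data_batch)   # recomputed on this exact batch
        student_outputs = student(data_batch)
        loss = loss_fn_kd(student_outputs, labels_batch, teacher_outputs, params)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()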
Hi,
Very useful code and instructions! If I understand correctly, the teacher model shouldn't be updated with gradients; only the student model should compute gradients during the distillation process. I noticed that in the train_and_evaluate_kd()
function, the teacher model is set to eval() mode. But I think eval() only alters the behavior of dropout and BatchNorm; it doesn't stop gradient updates when loss.backward() is called. I think the teacher model's parameters should have requires_grad set to False.
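For illustration, a short hedged sketch of both safeguards, assuming the teacher is an ordinary nn.Module named teacher_model: eval() only changes dropout/BatchNorm behavior, while requires_grad = False and torch.no_grad() are what actually keep gradients away from the teacher.

import torch

teacher_model.eval()                        # fixes dropout / BatchNorm behavior only
for p in teacher_model.parameters():
    p.requires_grad = False                 # no gradients accumulate in the teacher

# Alternatively (or additionally), run the teacher forward pass outside autograd:
with torch.no_grad():
    teacher_outputs = teacher_model(data_batch)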
Hello @peterliht, the pre-trained teacher models are available, but do you have the corresponding student models (the 5-layer CNN, with teacher model ResNet-18 and dataset CIFAR-10) uploaded somewhere? If you could provide them, it would be a great help. Thanks.
2018-03-09 20:46:06,587:INFO: Loading the datasets...
2018-03-09 20:46:10,074:INFO: - done.
2018-03-09 20:46:10,078:INFO: Starting training for 30 epoch(s)
2018-03-09 20:51:27,485:INFO: Loading the datasets...
2018-03-09 20:51:30,918:INFO: - done.
2018-03-09 20:51:30,922:INFO: Starting training for 30 epoch(s)
2018-03-09 20:54:20,870:INFO: Loading the datasets...
2018-03-09 20:54:24,364:INFO: - done.
2018-03-09 20:54:24,368:INFO: Starting training for 30 epoch(s)
2018-03-09 20:54:24,368:INFO: Epoch 1/30
In the code, the dataloader 'shuffle' switch is set to True.
So the precomputed teacher outputs do not actually correspond to the right inputs.
The reason your temperature is larger than the original paper's setting (say T = 2) may be caused by KLDivLoss. You may try setting reduction="batchmean" in KLDivLoss. Just a guess, and others are welcome to discuss.
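For reference, a small sketch of the difference (this is standard nn.KLDivLoss behavior, not repository code): the default reduction='mean' averages over every element of the output tensor, while reduction='batchmean' sums and divides by the batch size only, which matches the mathematical KL divergence and changes the scale of the distillation term by roughly a factor of the number of classes.

import torch
import torch.nn as nn
import torch.nn.functional as F

T = 2.0
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)

log_p = F.log_softmax(student_logits / T, dim=1)   # KLDivLoss input must be log-probabilities
q = F.softmax(teacher_logits / T, dim=1)           # target must be probabilities

loss_mean = nn.KLDivLoss(reduction="mean")(log_p, q)            # divides by batch * classes
loss_batchmean = nn.KLDivLoss(reduction="batchmean")(log_p, q)  # divides by batch only
print(loss_mean.item(), loss_batchmean.item())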
Hi, I want to know whether knowledge distillation can be used for regression problems.
I have downloaded the .zip file from the Box folder, but it can't be unzipped successfully: after unzipping, the .tar file becomes a .tar.cpgz file. I have also tried unzipping it with 'unzip' and 'tar xvf' in the terminal on macOS, but failed.
Could you please send the Box folder file to my email? [email protected] Thank you!
How can I download the data in the Box folder on a server using Linux commands?
Hello peterliht,
I ran your code according to the instructions and did not modify any parameters, but found that the results vary greatly.
What parameters did you modify before releasing the code?
The following are the experimental results for resnet18:
python train.py --model_dir experiments/resnet18_distill/resnext_teacher
My experimental environment is:
python 3.5.2
pytorch 0.4.0
GPU TITAN Xp
Did you use FitNets for distilling the model?
FitNets: Hints for Thin Deep Nets
This is my situation.
I trained base_cnn in advance on the CIFAR-10 dataset to compare the performance of base_cnn and cnn_distill.
I also trained base_resnet18 as a teacher on the same dataset.
Lastly, I trained cnn_distill using the resnet18 teacher.
I got two accuracies in the respective metrics_val_best_weights.json files: 0.875 from base_cnn and 0.858 from cnn_distill.
It looks like base_cnn is better than cnn_distill.
I didn't change any parameters in base_cnn or cnn_distill, except for the augmentation value in base_cnn's params.json, which I changed from 'no' to 'yes'.
I think there would be no reason to use knowledge distillation if base_cnn has higher accuracy.
Please let me know where I went wrong.
Thanks for your time.
Hi @peterliht, thanks for your great work!
I am trying to train on my own dataset, but I get RuntimeError: size mismatch, m1: [2 x 2048], m2: [512 x 2].
I guess it is because my dataloader's shapes are different from yours; I am using this repo https://github.com/cs230-stanford/cs230-code-examples/tree/master/pytorch/vision to load my own data.
Please guide me on how to fix this issue.
Thanks in advance, and I appreciate your time.
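A guess at the cause, with a hedged sketch (assuming a torchvision ResNet is used as the backbone): m1: [2 x 2048] suggests the backbone produces 2048-dimensional features (e.g. ResNet-50 / ResNeXt), while the final linear layer was built for 512 inputs (the ResNet-18 feature size). Reading in_features from the existing fc layer avoids hard-coding either size; num_classes = 2 here is only an example for a two-class custom dataset.

import torch.nn as nn
import torchvision.models as models

num_classes = 2                                            # example: two-class custom dataset
model = models.resnet50(pretrained=True)                   # backbone with 2048-dim features before fc
model.fc = nn.Linear(model.fc.in_features, num_classes)    # 2048 -> num_classes, no hard-coded sizes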
I printed the first 32 labels of the train dataloader for the teacher net and got:
14, 8, 29, 67, 59, 49, 73, 25, 4, 76, 11, 25, 82, 6, 11, 47, 28, 43, 40, 49, 27, 92, 62, 37, 64, 22, 38, 90, 14, 16, 27, 92
while the first 32 labels of the train dataloader for the student net are:
86, 40, 14, 73, 50, 43, 40, 27, 1, 51, 11, 47, 32, 76, 28, 83, 32, 4, 52, 77, 3, 64, 24, 36, 80, 93, 96, 72, 26, 75, 47, 79
So it seems that the output indices of the teacher net and the student net are not the same at each batch.
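For illustration, a hedged sketch (plain recent PyTorch, not the repository's loaders) of why this happens and one way to check it: two independently shuffled DataLoaders draw different permutations, so teacher outputs precomputed in one order no longer line up with the student's batches. Giving both loaders identically seeded generators (or setting shuffle=False) makes the orders match, although randomized transforms would still differ, as noted in the issue above.

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(100).float().unsqueeze(1), torch.arange(100))

def first_labels(seed):
    g = torch.Generator().manual_seed(seed)
    loader = DataLoader(dataset, batch_size=32, shuffle=True, generator=g)
    _, labels = next(iter(loader))
    return labels

print(first_labels(0))   # order seen on the teacher side
print(first_labels(0))   # same seed -> same permutation
print(first_labels(1))   # different seed -> different permutation (the mismatch above)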
I modified the code, and I get an error. Does anybody have any idea why?
I am using CPU:
I have an error in this line:
---> 10 output_teacher_batch = teacher_model(data_batch).data().numpy()
TypeError: 'Tensor' object is not callable
Does anybody have an idea how to solve this?
def fetch_teacher_outputs(teacher_model, dataloader):
    teacher_model.eval()
    teacher_outputs = []
    for i, (data_batch, labels_batch) in enumerate(dataloader):
        if torch.cuda.is_available():
            data_batch, labels_batch = data_batch.cuda(async=True), labels_batch.cuda(async=True)
        data_batch, labels_batch = Variable(data_batch), Variable(labels_batch)
        output_teacher_batch = teacher_model(data_batch).data().numpy()   # <-- the line that raises the error
        teacher_outputs.append(output_teacher_batch)
    return teacher_outputs
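A hedged guess at the fix: .data is an attribute rather than a method, so writing .data() tries to call a Tensor, which raises "'Tensor' object is not callable". A sketch of a CPU-safe rewrite using torch.no_grad() and .detach() instead of the older Variable/async style:

import torch

def fetch_teacher_outputs(teacher_model, dataloader):
    teacher_model.eval()
    teacher_outputs = []
    with torch.no_grad():                          # no autograd graph for the teacher
        for data_batch, labels_batch in dataloader:
            if torch.cuda.is_available():
                data_batch = data_batch.cuda(non_blocking=True)
            # .data (no parentheses) would also work, but .detach().cpu().numpy()
            # is safe on both CPU and GPU
            output_teacher_batch = teacher_model(data_batch).detach().cpu().numpy()
            teacher_outputs.append(output_teacher_batch)
    return teacher_outputs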
I can't open the URL to get the pretrained teacher model checkpoints. Can you offer another way?
Hi, this is the error I got while executing this command; could you please check it?
python3 train.py --model_dir experiments/resnet18_distill/resnext_teacher
Loading the datasets...
Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
I suggest that both training loss functions, with and without KD, should add a softmax, because the models' outputs are raw values without softmax. Just like this:
https://github.com/peterliht/knowledge-distillation-pytorch/blob/e4c40132fed5a45e39a6ef7a77b15e5d389186f8/model/net.py#L100-L114
==>
KD_loss = nn.KLDivLoss()(F.log_softmax(outputs/T, dim=1),
                         F.softmax(teacher_outputs/T, dim=1)) * (alpha * T * T) + \
          F.cross_entropy(F.softmax(outputs, dim=1), labels) * (1. - alpha)
&
https://github.com/peterliht/knowledge-distillation-pytorch/blob/e4c40132fed5a45e39a6ef7a77b15e5d389186f8/model/net.py#L83-L97
==>
return nn.CrossEntropyLoss()(F.softmax(outputs,dim=1), labels)
Another thing: why is the first part of the KD loss function in distill_mnist.py multiplied by 2?
https://github.com/peterliht/knowledge-distillation-pytorch/blob/e4c40132fed5a45e39a6ef7a77b15e5d389186f8/mnist/distill_mnist.py#L96-L97
One more thing: it is not necessary to multiply by T*T if we distill using only soft targets.
https://github.com/peterliht/knowledge-distillation-pytorch/blob/e4c40132fed5a45e39a6ef7a77b15e5d389186f8/mnist/distill_mnist_unlabeled.py#L96-L97
Reference: Distilling the Knowledge in a Neural Network (Hinton et al.)
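For context, a minimal sketch of the loss as described in Distilling the Knowledge in a Neural Network, written with standard PyTorch calls rather than this repository's exact code: nn.KLDivLoss expects log-probabilities as its input and probabilities as its target, which is why the student side uses log_softmax and the teacher side uses softmax; F.cross_entropy takes raw logits; and the T*T factor keeps the soft-target gradients on the same scale as the hard-label term when both are combined.

import torch.nn as nn
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, alpha=0.9, T=4.0):
    # Hinton-style distillation: weighted soft-target KL term plus hard-label CE term.
    soft_targets = F.softmax(teacher_logits / T, dim=1)       # teacher probabilities
    log_student = F.log_softmax(student_logits / T, dim=1)    # student log-probabilities
    kl = nn.KLDivLoss(reduction="batchmean")(log_student, soft_targets)
    ce = F.cross_entropy(student_logits, labels)              # cross-entropy on raw logits
    # Dividing logits by T shrinks the soft term's gradients by ~1/T^2, so multiply back by T*T.
    return alpha * (T * T) * kl + (1.0 - alpha) * ce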
I'm unable to run train.py on Python 3.9. The versions stated in the requirements are wrong, and after installing the newest libraries there are a bunch of syntax errors in the program. Is there an updated version available?
Why softmax for the teacher output, but log_softmax for the student output?
I can't download the Box folder. Could someone send these files to my mailbox? Thank you so much!
Why does
import pytorch_lightning as pl
show "No module named torch._dynamo"?
How can I solve it? Thanks.