
Comments (8)

Li-ZhuoHan commented on May 27, 2024

Now my code is:

Building

# imports needed for this subclass (astroNN + Keras)
from astroNN.models import load_folder
from astroNN.models.base_bayesian_cnn import BayesianCNNBase
from astroNN.nn.losses import mse_lin_wrapper, mse_var_wrapper
from tensorflow.keras.initializers import RandomNormal
from tensorflow.keras.layers import Input, Dense, concatenate
from tensorflow.keras.models import Model

class Noah_transfer(BayesianCNNBase):
    def __init__(self, lr=0.0005, dropout_rate=0.2):
        super().__init__()
        self.initializer = RandomNormal(mean=0.0, stddev=0.05)
        self.max_epochs = 50
        self.lr = lr
        self.reduce_lr_epsilon = 0.00005
        self.reduce_lr_min = 1e-8
        self.reduce_lr_patience = 2
        self.l2 = 1e-9
        self.dropout_rate = dropout_rate
        self.input_norm_mode = 3
        self.task = 'regression'

    def model(self):
        input_tensor = Input(shape=self._input_shape['input'], name='input')
        labels_err_tensor = Input(shape=self._labels_shape['output'], name='labels_err')
        # load the pretrained model and cut it off at 'dense_1'
        noah = load_folder('Noah_giant')
        base_model = Model(inputs=noah.keras_model.input,
                           outputs=noah.keras_model.get_layer('dense_1').output)
        base_model.trainable = False  # freeze the pretrained base
        x = base_model([input_tensor], training=False)
        output = Dense(units=self._labels_shape['output'],
                       activation='linear',
                       name='output')(x)
        variance_output = Dense(units=self._labels_shape['output'],
                                activation='linear',
                                name='variance_output')(x)
        model = Model(inputs=[input_tensor, labels_err_tensor], outputs=[output, variance_output])
        model_prediction = Model(inputs=[input_tensor], outputs=concatenate([output, variance_output]))

        variance_loss = mse_var_wrapper(output, labels_err_tensor)
        output_loss = mse_lin_wrapper(variance_output, labels_err_tensor)

        return model, model_prediction, output_loss, variance_loss

Training

noah_transfer = Noah_transfer()
noah_transfer.task = 'regression'
noah_transfer.fit(input_data=x_train,
                  labels=y_train,
                  inputs_err=x_train_err,
                  labels_err=y_train_err)

Both 'model' and 'model_prediction' can be printed with summary(), but training raises an error:
Layer "model_2" expects 2 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, 4500, 1) dtype=float32>]

Call arguments received:
  • inputs={'input': 'tf.Tensor(shape=(None, 4500, 1), dtype=float32)', 'input_err': 'tf.Tensor(shape=(None, None, None), dtype=float32)', 'labels_err': 'tf.Tensor(shape=(None, 11), dtype=float32)'}
  • training=True
  • mask=None

It seems that the labels haven't been taken into training, and model_2 (i.e. 'model' in this code) received only one input (which appears to be x_train).
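A minimal plain-Keras snippet (with hypothetical shapes) reproduces the same class of error, which suggests only one of the two declared inputs is being fed at fit() time:

import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

a = Input(shape=(4,), name='input')
b = Input(shape=(2,), name='labels_err')
out = Dense(2, name='output')(a)
m = Model(inputs=[a, b], outputs=out)
m.compile(optimizer='adam', loss='mse')

x = np.zeros((8, 4))
err = np.zeros((8, 2))
y = np.zeros((8, 2))
m.fit({'input': x, 'labels_err': err}, y)  # works: both declared inputs supplied
# m.fit(x, y)  # fails: Layer "model" expects 2 input(s), but it received 1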


henrysky commented on May 27, 2024

Sorry for the late reply.

I have added a function as a first step to solve your issue. The new function transfer_weights() should transfer all the weights to a new model (except the input and possibly the output layers) and set the transferred weights as non-trainable, so when you train on the other survey, only the input/output layers are trained while the middle layers are frozen. To use this new function, do a git pull to get the latest commit onto your computer.

Here is an example:

from astroNN.models import ApogeeBCNN

# a model trained on the original survey
bneuralnet = ApogeeBCNN()
bneuralnet.fit(xdata, ydata)

# another astroNN model
bneuralnet2 = ApogeeBCNN()

# just to initialize the model with the correct input and output shape
bneuralnet2.max_epochs = 1
bneuralnet2.fit(xdata_another_survey, ydata_another_survey)
# transfer all the weights except layers with incompatible shape
bneuralnet2.transfer_weights(bneuralnet)

# training for real, the middle part of the model is not trainable
bneuralnet2.max_epochs = 60
bneuralnet2.fit(xdata_another_survey, ydata_another_survey)

# now bneuralnet2 is your new astroNN model transferred to another survey with the same architecture as the original survey


Li-ZhuoHan commented on May 27, 2024

Thank you for your reply.

The two of us seem to have different ideas: your way is to transfer the weights of the base model, while mine is to transfer the whole base model. The transfer_weights() function is a clever and effective way to do transfer learning, and it should be enough for me for now.
But I still have some doubts:

  1. Why does the training step go wrong while all the associated models (noah, base_model, model, model_prediction) can be printed with summary()?
  2. What if I want to splice two models, or add new layers directly after a base model? (The usual plain-Keras pattern is sketched below.) This may have something to do with your architecture and could be complicated to implement, so I'm not sure about it. Anyway, thanks to your efforts it works now; forgive me for leaving these doubts with you. I hope you keep improving astroNN so it benefits even more users.
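For the second doubt, the usual plain-Keras way to splice a frozen base model with a new head looks roughly like this (a minimal sketch with hypothetical shapes, independent of astroNN's internals):

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# stand-in for a pretrained base model (hypothetical sizes)
base_in = Input(shape=(16,))
base_hidden = Dense(8, activation='relu', name='dense_1')(base_in)
base = Model(base_in, base_hidden)

base.trainable = False  # freeze every layer of the base

# splice: new input -> frozen base -> new trainable head
inp = Input(shape=(16,))
x = base(inp, training=False)  # training=False keeps dropout/BN in inference mode
new_output = Dense(2, activation='linear', name='new_output')(x)
spliced = Model(inp, new_output)
spliced.compile(optimizer='adam', loss='mse')  # compile after freezing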


Li-ZhuoHan commented on May 27, 2024

There are still some bugs.

When the output layer of my transferred model and the base model have the same number of nodes, the summary says that all of my params are non-trainable, but the weights of the transferred model's output layer should be trainable. The funny thing is that, if that were really the case, my loss should stay the same during training, yet it keeps getting smaller, which means the weights are still being trained. This behavior is a departure both from what I want and from the model summary.
On the other hand, when my output layer's node count differs from the original model's, the trainable-params count is the sum of the params in the output and variance_output layers, which is right. But in the training step, it seems that all the params are still trained.
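A diagnostic I use to see which weights actually move (a hypothetical sketch, assuming the compiled Keras model is reachable as noah_transfer.keras_model): snapshot all weights, train one epoch, then compare:

import numpy as np

km = noah_transfer.keras_model
before = {l.name: [w.copy() for w in l.get_weights()] for l in km.layers}
noah_transfer.max_epochs = 1
noah_transfer.fit(input_data=x_train, labels=y_train,
                  inputs_err=x_train_err, labels_err=y_train_err)
for l in km.layers:
    changed = any(not np.allclose(o, n)
                  for o, n in zip(before[l.name], l.get_weights()))
    print(f"{l.name}: trainable={l.trainable} changed={changed}")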


henrysky commented on May 27, 2024

Yes, it does seem that supposedly non-trainable parameters still get trained somehow. I am still investigating what is going on, but most likely I need to set them to be non-trainable before compiling the model.
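For context, Keras reads the trainable flags when a model is compiled, so flipping them afterwards has no effect until you compile again; the fix should be along these lines (a sketch, not the actual astroNN code):

# freeze the transferred layers, then recompile so the freeze takes effect
for layer in model.layers[:-2]:  # hypothetical: keep the last two layers trainable
    layer.trainable = False
model.compile(optimizer='adam', loss='mse')  # trainable flags are baked in here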

As for the output layer, the current strategy is to transfer all weights with compatible shape (i.e. if the shapes of a layer's weights are the same, those weights are transferred). I think what you want is to only train the input layer? Or you can force a different output shape so that the output layers won't get transferred (e.g. train on T_eff and Log(g) for one survey and fe_h for another survey, so the output shapes differ). I think there could be a case where you have a small overlap between two surveys; then you can use the spectra from survey B but only train the input layer with labels from the original survey A.
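The shape-matching transfer amounts to something like this (a simplified sketch, not the actual transfer_weights() implementation):

def transfer_compatible_weights(src_model, dst_model):
    # copy layer weights whenever every weight array has a matching shape
    for src, dst in zip(src_model.layers, dst_model.layers):
        src_w, dst_w = src.get_weights(), dst.get_weights()
        if len(src_w) == len(dst_w) and all(
                a.shape == b.shape for a, b in zip(src_w, dst_w)):
            dst.set_weights(src_w)  # compatible shape: transfer
            dst.trainable = False   # and freeze the transferred layer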

Regarding your questions from a few days ago, what do you mean by the training step going wrong? And yes, splicing/adding layers probably requires more work, but it's not undoable per se; we just need to make the simplest case work correctly first...


Li-ZhuoHan commented on May 27, 2024

Thank you for your patience and reply.

The training step failure a few days ago happened because of model splicing, but as you said, we should make the simplest case work first, so let's talk about it later.
What really matters is that I want to train both the input layer and the output layer, whether or not the output layers have the same shape. (For now they are the same, so those weights get transferred and "locked".)
My case is that I have a model trained on spectra from survey A with labels from survey B, and now I want to transfer this model to train on spectra from survey C with labels from survey B. I don't know if it will work, but I just want to make an attempt.


henrysky commented on May 27, 2024

I think I have fixed the issue of weights still being trained even after setting trainable=False. I have also added an argument exclusion_output=False so you can exclude the output weights when transferring with transfer_weights(). You can check out the latest commit to see if it works for you.
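With the earlier example, excluding the output weights would look like this (assuming the keyword behaves as described):

# transfer everything except the output layers, which stay freshly
# initialized and trainable
bneuralnet2.transfer_weights(bneuralnet, exclusion_output=True)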


Li-ZhuoHan commented on May 27, 2024

Thank you for all the effort, it works now.

