kamisoel commented on May 12, 2024

The problem seems to be in the way the parameters are grouped. For fastai.vision models, learn.opt.param_lists returns 3 parameter lists, and freeze() deactivates gradients for the first two. In tsai models all parameters end up in a single list, so freeze() operates on an empty slice and does nothing.

## Pseudocode for fastai's freeze implementation
def freeze():
  freeze_to(-1)

def freeze_to(n):
  # freeze the first n parameter groups; n=-1 leaves only the last group trainable
  frozen_idx = n if n >= 0 else len(learn.opt.param_lists) + n
  for p in learn.opt.all_params(slice(None, frozen_idx)):
    p.requires_grad = False
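
You can see the difference directly (a quick check, assuming `learn` is an existing Learner):

# compare how the optimizer groups the parameters
learn.create_opt()                 # make sure learn.opt exists
print(len(learn.opt.param_lists))  # 3 for fastai.vision models, 1 for tsai models
# with a single group, freeze_to(-1) ends up slicing an empty range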

EDIT:
I think I found the source of the problem. fastai.vision creates its models as a Sequential of (init, body, head), so the optimizer knows which parameter groups to freeze (init and body) and which to leave trainable (the head). At least that seems to be the case from fastai.vision.learner.
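
If that's the case, passing a custom splitter to the Learner might be the proper fix. A rough sketch (the `ts_splitter` name and the `backbone`/`head` attributes are my own assumptions, not existing tsai code):

from fastai.torch_core import params
from fastai.learner import Learner

def ts_splitter(model):
  # put backbone and head into separate parameter groups so freeze() can target them
  return [params(model.backbone), params(model.head)]

learn = Learner(dls, model, splitter=ts_splitter)  # dls/model: your data and model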

kamisoel commented on May 12, 2024

My workaround for now, in case anyone else wants to use TSBERT / fine-tuning:

def freeze(learn):
  "Freeze all parameters except those of the model head"
  assert hasattr(learn.model, "head"), "you can only use this with models that have a .head attribute"
  for p in learn.model.parameters():
    p.requires_grad = False
  for p in learn.model.head.parameters():
    p.requires_grad = True

def unfreeze(learn):
  "Make all model parameters trainable again"
  for p in learn.model.parameters():
    p.requires_grad = True

def fine_tune(learn, epochs, base_lr=2e-3, freeze_epochs=1, lr_mult=100,
              pct_start=0.3, div=5.0, **kwargs):
  "Fine tune with `freeze` for `freeze_epochs` then with `unfreeze` from `epochs` using discriminative LR"
  freeze(learn)
  learn.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99, **kwargs)
  base_lr /= 2
  unfreeze(learn)
  learn.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div, **kwargs)
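
With that, fine-tuning a pre-trained model looks like this (hypothetical usage; the Learner and data setup are omitted):

# learn wraps a model with pre-trained TSBERT weights already loaded
fine_tune(learn, epochs=10, freeze_epochs=2, base_lr=2e-3)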

oguiza commented on May 12, 2024

Hi @kamisoel,
Thanks for raising this issue. It's important to fix it now that we have a way to pretrain models. I'll look into it to check what needs to be updated in tsai archs to support fine-tuning.
In the meantime, have you seen any difference in performance when using your workaround?

kamisoel commented on May 12, 2024

Hi @oguiza,
Perfect, I hope my debugging work is of use for this :)
My workaround seems to work quite well and should have more or less the same performance, since it's pretty close to the fastai implementation. It's just less flexible in splitting the model into head and body, which shouldn't be a huge problem for TSBERT, since it has the same restriction (it only works for models with a head attribute).

oguiza commented on May 12, 2024

Hi @kamisoel,
It's taken me a while, but I've already fixed this issue.
From now on, all models that have 'Plus' in their name will be able to use pre-trained weights and be fine-tuned.
Unlike vision models, where the parameters are split into 3 groups, time series models have only 2 parameter groups (backbone and head). Vision models need 3 groups because the initial layers often have to be trained as well (especially when the number of input filters differs from 3).
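
For example, this is a minimal check of the new behavior (assuming a standard tsai setup with `ts_learner`; any 'Plus' model should behave the same):

learn = ts_learner(dls, InceptionTimePlus)
learn.freeze()
print(len(learn.opt.param_lists))                                       # -> 2 (backbone, head)
print(any(p.requires_grad for p in learn.model.backbone.parameters()))  # -> False
print(all(p.requires_grad for p in learn.model.head.parameters()))      # -> True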
Based on this I've re-run the TSBERT tutorial, and the results are practically identical. So there was no benefit in fine-tuning the model in this particular case.
It'd be good if you could test the change to make sure everything is working as expected.
Thanks again for raising this issue!

oguiza commented on May 12, 2024

I will close this issue due to lack of response. If the issue persists, please feel free to re-open.

kamisoel commented on May 12, 2024

Hi @oguiza,
Thanks for the fast fix - and sorry for my lack of response ^^' The change seems to work just fine! 👍

Just another small request: would it be possible to enable pre-training for the XCM model as well? It already has a separate head and works with TSBERT.

oguiza commented on May 12, 2024

Hi @kamisoel, I'm glad to hear the issue is now fixed.
As to your 2nd request, I've already uploaded a new XCMPlus model that you can pre-train. I haven't fully tested it, but it has the same structure as the rest, so I think it should work. It's already on GitHub, and I'll create a new pip release shortly (probably later today or tomorrow).
If you try it, please let me know if it works well.
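
A quick way to smoke-test it (a sketch; the module path and constructor arguments are assumptions and may need adjusting to your data):

from tsai.models.XCMPlus import XCMPlus

# c_in = number of variables, c_out = number of classes, seq_len = number of time steps
model = XCMPlus(c_in=3, c_out=2, seq_len=100)  # placeholder values
assert hasattr(model, 'backbone') and hasattr(model, 'head')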
