Comments (8)
The problem seems to be in how the parameters are structured. For fastai.vision models, `learn.opt.param_lists` returns 3 parameter lists, and `freeze()` deactivates gradients for the first two of them. In tsai models all parameters end up in a single list, so `freeze()` operates on an empty slice and has no effect.
## Pseudocode for freeze implementation

```python
def freeze():
    freeze_to(-1)

def freeze_to(n):
    # negative n counts from the end, as in fastai's Optimizer.freeze_to
    frozen_idx = n if n >= 0 else len(learn.opt.param_lists) + n
    for p in learn.opt.all_params(slice(None, frozen_idx)):
        p.requires_grad = False
```
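To see why `freeze()` becomes a no-op when all parameters live in a single list, the logic above can be exercised with plain-Python stand-ins (`Param` and `Opt` below are hypothetical illustrations, not the real fastai classes):

```python
class Param:
    """Minimal stand-in for a tensor exposing a requires_grad flag."""
    def __init__(self):
        self.requires_grad = True

class Opt:
    """Minimal stand-in for learn.opt: a list of parameter lists."""
    def __init__(self, param_lists):
        self.param_lists = param_lists

    def all_params(self, s=slice(None)):
        # Flatten the selected parameter lists, like fastai's Optimizer.all_params
        return [p for pl in self.param_lists[s] for p in pl]

def freeze_to(opt, n):
    # Negative n counts from the end; freeze every list before that index
    frozen_idx = n if n >= 0 else len(opt.param_lists) + n
    for p in opt.all_params(slice(None, frozen_idx)):
        p.requires_grad = False

def freeze(opt):
    freeze_to(opt, -1)  # keep only the last group (the head) trainable

# fastai.vision case: 3 parameter lists -> the first two get frozen
vision_opt = Opt([[Param(), Param()], [Param()], [Param(), Param()]])
freeze(vision_opt)
print([p.requires_grad for p in vision_opt.all_params()])
# → [False, False, False, True, True]

# tsai case: one parameter list -> the frozen slice is empty, nothing freezes
tsai_opt = Opt([[Param(), Param()]])
freeze(tsai_opt)
print([p.requires_grad for p in tsai_opt.all_params()])
# → [True, True]
```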
EDIT:
I think I found the source of the problem: the vision Learner creates models as a Sequential of (init, body, head), so the optimizer knows how to freeze init and body but not the head. At least that seems to be the case from fastai.vision.learner.
My workaround for now, if anyone else wants to use TSBERT / fine-tuning:

```python
def freeze(learn):
    assert hasattr(learn.model, "head"), "you can only use this with models that have a .head attribute"
    for p in learn.model.parameters():
        p.requires_grad = False
    for p in learn.model.head.parameters():
        p.requires_grad = True

def unfreeze(learn):
    for p in learn.model.parameters():
        p.requires_grad = True

def fine_tune(learn, epochs, base_lr=2e-3, freeze_epochs=1, lr_mult=100,
              pct_start=0.3, div=5.0, **kwargs):
    "Fine-tune with `freeze` for `freeze_epochs`, then with `unfreeze` for `epochs` using discriminative LR"
    freeze(learn)
    learn.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99, **kwargs)
    base_lr /= 2
    unfreeze(learn)
    learn.fit_one_cycle(epochs, slice(base_lr / lr_mult, base_lr), pct_start=pct_start, div=div, **kwargs)
```
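The head-only freezing in the workaround can be sanity-checked without loading a real tsai model, using minimal stand-ins (`Param`, `Module`, `Model`, and `Learn` below are hypothetical classes for illustration, not tsai/fastai API):

```python
class Param:
    """Minimal stand-in for a tensor with a requires_grad flag."""
    def __init__(self):
        self.requires_grad = True

class Module:
    """Minimal stand-in for an nn.Module holding a flat list of params."""
    def __init__(self, n):
        self._params = [Param() for _ in range(n)]
    def parameters(self):
        return self._params

class Model(Module):
    """Backbone params plus a separate .head, mirroring tsai models."""
    def __init__(self, n_backbone, n_head):
        super().__init__(n_backbone)
        self.head = Module(n_head)
    def parameters(self):
        return self._params + self.head.parameters()

class Learn:
    def __init__(self, model):
        self.model = model

def freeze(learn):
    # Freeze everything, then re-enable gradients for the head only
    for p in learn.model.parameters():
        p.requires_grad = False
    for p in learn.model.head.parameters():
        p.requires_grad = True

def unfreeze(learn):
    for p in learn.model.parameters():
        p.requires_grad = True

learn = Learn(Model(n_backbone=4, n_head=2))
freeze(learn)
print([p.requires_grad for p in learn.model.parameters()])
# → [False, False, False, False, True, True]
unfreeze(learn)
print(all(p.requires_grad for p in learn.model.parameters()))
# → True
```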
Hi @kamisoel,
Thanks for raising this issue. It's important to fix it now that we have a way to pretrain models. I'll look into it to check what needs to be updated in tsai archs to support fine-tuning.
In the meantime, have you seen any difference in performance when using your workaround?
Hi @oguiza,
Perfect, I hope my debugging work is of use for this :)
My workaround seems to work quite well and should give more or less the same performance, since it's pretty close to the fastai implementation. It's just less flexible in splitting the model into head and body, which shouldn't be a huge problem for TSBERT since it has the same restriction (it only works for models with a head attribute).
Hi @kamisoel,
It's taken me a while, but I've already fixed this issue.
From now on, all models that have 'Plus' in their name will be able to use pre-trained weights and be fine-tuned.
Unlike vision models, where model parameters are split into 3 groups, time series models have only 2 groups of parameters (backbone and head). Vision models are split into 3 groups because the initial layers often need to be trained as well (especially if passing a number of filters different from 3).
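The two-group split described above can be sketched with a splitter over a backbone/head model (a hedged illustration using stand-in classes, not the actual tsai splitter):

```python
class Param:
    """Named stand-in for a model parameter."""
    def __init__(self, name):
        self.name = name

class Module:
    def __init__(self, names):
        self._params = [Param(n) for n in names]
    def parameters(self):
        return self._params

class TSModel:
    """Stand-in for a tsai 'Plus' model exposing .backbone and .head."""
    def __init__(self):
        self.backbone = Module(["conv1.weight", "conv2.weight"])
        self.head = Module(["fc.weight", "fc.bias"])

def ts_splitter(model):
    # Two parameter groups: backbone first, head last, so freeze()
    # (which freezes everything but the last group) trains only the head.
    return [list(model.backbone.parameters()), list(model.head.parameters())]

groups = ts_splitter(TSModel())
print([len(g) for g in groups])
# → [2, 2]
print([p.name for p in groups[-1]])
# → ['fc.weight', 'fc.bias']
```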
Based on this I've re-run the TSBERT tutorial, and the results are practically identical. So there was no benefit in fine-tuning the model in this particular case.
It'd be good if you could test the change to make sure everything is working as expected.
Thanks again for raising this issue!
I will close this issue for lack of response. If the issue persists, please, feel free to re-open.
Hi @oguiza
Thanks for the fast fix - and sorry for my lack of response ^^' The change seems to work just fine! 👍
Just another small request: would it be possible to allow the XCM model to be used for pre-training as well? It already has a separate head and can be used with TSBERT too.
Hi @kamisoel, I'm glad to hear the issue is now fixed.
As to your 2nd request, I've already uploaded a new XCMPlus model you can pre-train. I haven't fully tested it, but it has the same structure as the rest, so it should work. It's already on GitHub, and I'll create a new pip release shortly (probably later today or tomorrow).
If you try it, please let me know if it works well.