Comments (8)
Hi, I think the following is related. What if we just want to alternate optimizers during training, step by step? I mean cases where we, for example, use Adam for 5 epochs and then SGD for the next 5 epochs. How could we prevent training_step, which in that case has an "optimizer_idx" argument, from executing twice?
@wheatdog @sidhanthholalkere This is on master now. Override optimizer_step to update any optimizer at arbitrary intervals.
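For example, here is a minimal sketch of alternating Adam and SGD every 5 epochs that way, assuming the optimizer_step hook signature from the docs linked further down in this thread (epoch_nb, batch_nb, optimizer, optimizer_i); check the exact arguments against your installed version:

import torch
import pytorch_lightning as pl

class AlternatingModule(pl.LightningModule):
    def configure_optimizers(self):
        # Register both optimizers up front; optimizer_step decides which
        # one actually steps during a given epoch.
        adam = torch.optim.Adam(self.parameters(), lr=0.02)
        sgd = torch.optim.SGD(self.parameters(), lr=0.02)
        return [adam, sgd]

    def optimizer_step(self, epoch_nb, batch_nb, optimizer, optimizer_i):
        # Epochs 0-4 use Adam (index 0), epochs 5-9 use SGD (index 1),
        # and the pattern repeats every 5 epochs.
        active = (epoch_nb // 5) % 2
        if optimizer_i == active:
            optimizer.step()
        optimizer.zero_grad()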
First, when defining configure_optimizers, instead of

return [torch.optim.Adam(self.parameters(), lr=0.02)]

they could try

optimizer = torch.optim.Adam(self.parameters(), lr=0.02)
optimizer.skip_batch = 1
return [optimizer]
I'm sure that whoever wants to use this skipping feature would be comfortable adding a few extra lines.
To accommodate this, whenever self.optimizers = model.configure_optimizers() is called in trainer.py, you could just add the following:

for optimizer in self.optimizers:
    try:
        optimizer.skip_batch
    except AttributeError:
        optimizer.skip_batch = 0
Basically, the first part checks whether the user manually defined the skip rate, and if not, sets it to 0 (never skip).
Later on, when calling optimizer.step(), you can replace it with

if self.batch_nb % (optimizer.skip_batch + 1) == 0:
    optimizer.step()
I believe this should work with schedulers as well.
But then again, I don't know that much about PyTorch.
Nevertheless, this project looks quite exciting and I hope I can provide some help!
On another note, why would you want this feature in the first place? If it's so optimizer A can learn "faster" than B, why not just multiply B's learning rate by 1/k, so you take full advantage of all the gradients while still having B optimize more slowly than A?
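To make the moving parts concrete, here is a self-contained toy version of the idea outside Lightning (the two-layer net and hand-written loop are stand-ins, not trainer internals):

import torch

net = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Linear(8, 1))
opt_a = torch.optim.Adam(net[0].parameters(), lr=0.02)
opt_b = torch.optim.SGD(net[1].parameters(), lr=0.02)
opt_b.skip_batch = 1  # user opt-in: opt_b steps only every other batch
optimizers = [opt_a, opt_b]

# Trainer-side default for optimizers that never set skip_batch.
for optimizer in optimizers:
    try:
        optimizer.skip_batch
    except AttributeError:
        optimizer.skip_batch = 0

for batch_nb in range(4):
    loss = net(torch.randn(8, 4)).pow(2).mean()
    loss.backward()
    for optimizer in optimizers:
        if batch_nb % (optimizer.skip_batch + 1) == 0:
            optimizer.step()
            optimizer.zero_grad()

Note that gradients for the skipped optimizer accumulate until its next step, since each optimizer only zeroes its own parameter groups.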
Good suggestion, but I wonder how adding properties to the optimizer might affect loading, saving, and training continuation. It seems a bit hacky, so let's think of other alternatives as well; if this turns out to be the best way, then we can go with it.
I was thinking about maybe just allowing the configure_optimizers method to return another list with config stuff:
return [opt_a, opt_b], [sched_a], [{'skip_batch': 2}]
Something like that. But I don't love this either, haha.
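For concreteness, a hypothetical sketch of how the trainer side could unpack that return value (the function name and padding rule here are assumptions, not Lightning API):

def parse_configure_optimizers(output):
    # Expects ([optimizers], [schedulers], [config dicts]); optimizers
    # without a matching config dict fall back to skip_batch=0.
    optimizers, schedulers, configs = output
    configs = list(configs) + [{}] * (len(optimizers) - len(configs))
    for optimizer, cfg in zip(optimizers, configs):
        optimizer.skip_batch = cfg.get('skip_batch', 0)
    return optimizers, schedulers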
I fail to understand how to implement a GAN-style training scheme in pytorch-lightning. Can you give me some examples?
@wheatdog @sidhanthholalkere See #106 for the discussion and #107 for the changes to support this.
Would these changes work for you?
docs here: https://williamfalcon.github.io/pytorch-lightning/Trainer/hooks/#optimizer_step
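Building on those hooks, a GAN setup looks roughly like the sketch below; it assumes the training_step(batch, batch_nb, optimizer_idx) convention discussed in this thread, and the generator/discriminator modules and losses are placeholders:

import torch
import torch.nn.functional as F
import pytorch_lightning as pl

class GAN(pl.LightningModule):
    def __init__(self, generator, discriminator, latent_dim=100):
        super().__init__()
        self.generator = generator
        self.discriminator = discriminator
        self.latent_dim = latent_dim

    def configure_optimizers(self):
        # One optimizer per sub-network; Lightning calls training_step
        # once per optimizer, passing optimizer_idx.
        opt_g = torch.optim.Adam(self.generator.parameters(), lr=2e-4)
        opt_d = torch.optim.Adam(self.discriminator.parameters(), lr=2e-4)
        return [opt_g, opt_d]

    def training_step(self, batch, batch_nb, optimizer_idx):
        real, _ = batch
        z = torch.randn(real.size(0), self.latent_dim, device=real.device)
        ones = torch.ones(real.size(0), 1, device=real.device)
        zeros = torch.zeros(real.size(0), 1, device=real.device)
        if optimizer_idx == 0:
            # Generator step: push D(fake) toward "real".
            g_loss = F.binary_cross_entropy_with_logits(
                self.discriminator(self.generator(z)), ones)
            return {'loss': g_loss}
        # Discriminator step: real -> 1, fake (detached) -> 0.
        fake = self.generator(z).detach()
        d_loss = (F.binary_cross_entropy_with_logits(self.discriminator(real), ones)
                  + F.binary_cross_entropy_with_logits(self.discriminator(fake), zeros)) / 2
        return {'loss': d_loss}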
Is there a way to change the optimizer after n epochs? I am trying to do this by calling configure_optimizers myself; it changes the lr value, but the scheduler is not working.
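One possible workaround, sketched under the same optimizer_step-signature assumption as above: register both optimizers (each with its own scheduler) up front and gate which one steps by epoch. A PyTorch scheduler is bound to the optimizer instance it was constructed with, which is why manually re-calling configure_optimizers changes the lr but leaves the old scheduler pointing at the old optimizer.

import torch
import pytorch_lightning as pl

class SwitchAfterN(pl.LightningModule):
    def __init__(self, switch_epoch=5):
        super().__init__()
        self.switch_epoch = switch_epoch

    def configure_optimizers(self):
        adam = torch.optim.Adam(self.parameters(), lr=1e-3)
        sgd = torch.optim.SGD(self.parameters(), lr=1e-2)
        # Each scheduler is tied to its own optimizer instance.
        scheds = [torch.optim.lr_scheduler.StepLR(adam, step_size=1, gamma=0.9),
                  torch.optim.lr_scheduler.StepLR(sgd, step_size=1, gamma=0.9)]
        return [adam, sgd], scheds

    def optimizer_step(self, epoch_nb, batch_nb, optimizer, optimizer_i):
        # Adam for the first switch_epoch epochs, SGD afterwards.
        active = 0 if epoch_nb < self.switch_epoch else 1
        if optimizer_i == active:
            optimizer.step()
        optimizer.zero_grad()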