leopard-ai / betty

Betty: an automatic differentiation library for generalized meta-learning and multilevel optimization
Home Page: https://leopard-ai.github.io/betty/
License: Apache License 2.0
Need to write example documentation for MAML.
Hello, I'm not an expert on MLO. If I understand correctly, in one iteration the level-2 module and the level-1 module are each updated once (that is, optimizer.step() is called once for each)?
My question is how I can control this. For example, I want level-2 to call optimizer.step() 100 times and then have level-1 call optimizer.step() once, as in the sketch below. Thanks!
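One way this might be expressed, assuming Betty's per-problem `Config` exposes an `unroll_steps` field (as in the library's examples) that sets how many optimizer steps a lower-level problem takes before its upper-level problem steps once, is a minimal sketch like:

```python
from betty.configs import Config

# Assumption: `unroll_steps` controls how many optimizer.step() calls the
# inner (level-2) problem makes before the outer (level-1) problem steps once.
level2_config = Config(unroll_steps=100)  # level-2: 100 steps per outer update
level1_config = Config()                  # level-1: one step after each unroll
```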
betty/betty/hypergradient/cg.py, line 53 (commit 39457e1): I think `p_new` should be calculated based on `r_new` instead of `r` here. In the Wikipedia version of the conjugate gradient method (https://en.wikipedia.org/wiki/Conjugate_gradient_method), the corresponding formula uses `r_new`.
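For reference, a minimal textbook conjugate gradient iteration (not Betty's implementation; `matvec`, `num_iters`, and `eps` are illustrative) shows that the new search direction is built from the new residual `r_new`:

```python
import torch

def conjugate_gradient(matvec, b, num_iters=10, eps=1e-10):
    """Solve A x = b, where matvec(v) computes A @ v for a
    symmetric positive-definite A."""
    x = torch.zeros_like(b)
    r = b.clone()             # residual r_0 = b - A x_0, with x_0 = 0
    p = r.clone()             # initial search direction
    rs = torch.dot(r, r)
    for _ in range(num_iters):
        Ap = matvec(p)
        alpha = rs / (torch.dot(p, Ap) + eps)
        x = x + alpha * p
        r_new = r - alpha * Ap
        rs_new = torch.dot(r_new, r_new)
        beta = rs_new / (rs + eps)
        p = r_new + beta * p  # direction update uses r_new, not r
        r, rs = r_new, rs_new
    return x
```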
Hello!
I was hoping to reproduce the continued-pretraining experiment in the SAMA paper and was curious where I could find the code. The learning_by_ignoring example looks quite relevant, but the datasets used there seem different from the ones used in the experiment. Could you point me to the experiment code?
Thanks!
Training with IterativeProblem gets slower over iterations. Below is the log from the logistic regression example. The same bug was also observed with the maml example.
[2022-07-01 11:01:53] [INFO] [Problem "outer"] [Global Step 1000] [Local Step 10] loss: 0.5296119451522827
[2022-07-01 11:01:56] [INFO] [Problem "outer"] [Global Step 2000] [Local Step 20] loss: 0.3373554050922394
[2022-07-01 11:01:59] [INFO] [Problem "outer"] [Global Step 3000] [Local Step 30] loss: 0.31969979405403137
[2022-07-01 11:02:02] [INFO] [Problem "outer"] [Global Step 4000] [Local Step 40] loss: 0.31455692648887634
[2022-07-01 11:02:05] [INFO] [Problem "outer"] [Global Step 5000] [Local Step 50] loss: 0.31011053919792175
[2022-07-01 11:02:08] [INFO] [Problem "outer"] [Global Step 6000] [Local Step 60] loss: 0.3047352433204651
[2022-07-01 11:02:12] [INFO] [Problem "outer"] [Global Step 7000] [Local Step 70] loss: 0.301718533039093
[2022-07-01 11:02:15] [INFO] [Problem "outer"] [Global Step 8000] [Local Step 80] loss: 0.30068764090538025
[2022-07-01 11:02:19] [INFO] [Problem "outer"] [Global Step 9000] [Local Step 90] loss: 0.29966291785240173
[2022-07-01 11:02:22] [INFO] [Problem "outer"] [Global Step 10000] [Local Step 100] loss: 0.2992149293422699
[2022-07-01 11:02:27] [INFO] [Problem "outer"] [Global Step 11000] [Local Step 110] loss: 0.2989771068096161
[2022-07-01 11:02:31] [INFO] [Problem "outer"] [Global Step 12000] [Local Step 120] loss: 0.2986523509025574
[2022-07-01 11:02:36] [INFO] [Problem "outer"] [Global Step 13000] [Local Step 130] loss: 0.29848340153694153
[2022-07-01 11:02:42] [INFO] [Problem "outer"] [Global Step 14000] [Local Step 140] loss: 0.29845142364501953
[2022-07-01 11:02:47] [INFO] [Problem "outer"] [Global Step 15000] [Local Step 150] loss: 0.2984345257282257
[2022-07-01 11:02:53] [INFO] [Problem "outer"] [Global Step 16000] [Local Step 160] loss: 0.2983992397785187
[2022-07-01 11:03:00] [INFO] [Problem "outer"] [Global Step 17000] [Local Step 170] loss: 0.2983682453632355
[2022-07-01 11:03:08] [INFO] [Problem "outer"] [Global Step 18000] [Local Step 180] loss: 0.29832738637924194
[2022-07-01 11:03:15] [INFO] [Problem "outer"] [Global Step 19000] [Local Step 190] loss: 0.29827484488487244
[2022-07-01 11:03:23] [INFO] [Problem "outer"] [Global Step 20000] [Local Step 200] loss: 0.29820242524147034
[2022-07-01 11:03:31] [INFO] [Problem "outer"] [Global Step 21000] [Local Step 210] loss: 0.29809457063674927
[2022-07-01 11:03:38] [INFO] [Problem "outer"] [Global Step 22000] [Local Step 220] loss: 0.2979184091091156
Have you considered changing the implementation of IterativeProblem from functorch to torch.func, since the functorch APIs will be deprecated in future versions? A sketch of the corresponding API change is below.
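For reference, a minimal sketch of the migration (the model, loss function, and data here are illustrative): functorch's make_functional/grad pattern maps onto torch.func.functional_call plus torch.func.grad:

```python
import torch
from torch.func import functional_call, grad

model = torch.nn.Linear(4, 1)
params = dict(model.named_parameters())

def loss_fn(params, x, y):
    # Call the module functionally on an explicit parameter dict,
    # replacing functorch's make_functional(model).
    pred = functional_call(model, params, (x,))
    return torch.nn.functional.mse_loss(pred, y)

x, y = torch.randn(8, 4), torch.randn(8, 1)
# grad differentiates loss_fn with respect to its first argument (params),
# as functorch.grad did.
grads = grad(loss_fn)(params, x, y)  # dict of per-parameter gradients
```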
Need to write example documentation for DARTS NAS.
The code in create_hdf5.py cannot be run successfully; it fails at `assert len(fnames) == num_ex`. How can I fix this problem?
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 2
wandb: You chose 'Use an existing W&B account'
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: ERROR Error while calling W&B API: entity pod-tuning not found during upsertBucket (<Response [404]>)
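This error usually means the entity passed to wandb.init (here, pod-tuning) does not exist or your account cannot access it. A hedged sketch of the usual fix (the project name is illustrative):

```python
import wandb

# Replace the hard-coded entity with your own W&B username or team name,
# or omit `entity` entirely to fall back to your account's default entity.
wandb.init(project="betty-example", entity="your-entity-name")
```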
Currently, Betty only supports torch.nn.DataParallel. Compared to torch.nn.parallel.DistributedDataParallel, torch.nn.DataParallel is much slower even in single-machine multi-GPU settings. Therefore, we need to replace torch.nn.DataParallel with torch.nn.parallel.DistributedDataParallel for better training speed and multi-machine multi-GPU support.
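A minimal sketch of the DistributedDataParallel setup this would entail (assuming launch via torchrun, which sets RANK, WORLD_SIZE, and LOCAL_RANK in the environment):

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_ddp(model: torch.nn.Module) -> DDP:
    # torchrun populates the environment variables used by the default
    # init_method; NCCL is the standard backend for multi-GPU training.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    return DDP(model.to(local_rank), device_ids=[local_rank])
```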
It would be really nice if Betty supported SAMA as well! Thanks!
PyTorch minimizes throughput degradation by overlapping communication and computation in distributed training.
However, Betty currently performs computation first and then manually performs gradient synchronization, without using the computation-communication overlapping technique.
This is mainly because hypergradient calculation oftentimes requires second-order gradient computation as well as multiple forward-backward propagations.
To improve distributed training performance, we can replace `torch.autograd.grad` with `torch.autograd.backward`, as sketched below.
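A minimal sketch of the difference (the model and data are illustrative): torch.autograd.grad returns gradients without populating .grad, so DDP's bucketed allreduce hooks never fire, whereas torch.autograd.backward accumulates into .grad during the backward pass, which is what lets DDP overlap gradient communication with the remaining computation:

```python
import torch

model = torch.nn.Linear(10, 1)
x, y = torch.randn(4, 10), torch.randn(4, 1)

# Variant 1: returns gradients as a tuple; .grad stays None, so DDP's
# communication hooks are never triggered.
loss = torch.nn.functional.mse_loss(model(x), y)
grads = torch.autograd.grad(loss, tuple(model.parameters()))

# Variant 2: accumulates into each parameter's .grad as backward runs,
# which allows DDP to overlap allreduce with backward computation.
loss = torch.nn.functional.mse_loss(model(x), y)
torch.autograd.backward(loss)
```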