Comments (5)
When using TorchMetrics (written by the same developers as PyTorch Lightning), the metric states are automatically tracked separately on each device during steps, and gathered across devices at the end of each step and epoch. The results are made available in the hooks for those events; see "TorchMetrics in PyTorch Lightning". I don't know why they don't populate the same `trainer.callback_metrics` dictionary in either case. It might be an error on their part, but perhaps it could be solved by writing monitored metrics to a property of the model?
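The following is a minimal pure-Python simulation of the "collected at the end of each step and epoch" idea. It is not TorchMetrics code. Each simulated rank accumulates its own counts, and the synced value is computed once from the summed counts. That is roughly how a metric with sum-reduced state behaves. `AccuracyState` and `sync_and_compute` are hypothetical names, and passing a list of states stands in for the real cross-device all-reduce.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AccuracyState:
    """Per-rank running state for a toy accuracy metric (hypothetical, not TorchMetrics)."""
    correct: int = 0
    total: int = 0

    def update(self, preds: List[int], targets: List[int]) -> None:
        # Accumulate counts locally; no cross-rank communication happens here.
        self.correct += sum(int(p == t) for p, t in zip(preds, targets))
        self.total += len(targets)


def sync_and_compute(states: List[AccuracyState]) -> float:
    # Stand-in for an all-reduce with a "sum" reduction: add up every rank's
    # counts, then compute the metric once from the combined state.
    correct = sum(s.correct for s in states)
    total = sum(s.total for s in states)
    return correct / total if total else 0.0


# Two simulated ranks, each seeing a different shard of the validation data.
rank0, rank1 = AccuracyState(), AccuracyState()
rank0.update([1, 0, 1, 1], [1, 0, 0, 1])  # 3 of 4 correct on rank 0
rank1.update([0, 1], [0, 1])              # 2 of 2 correct on rank 1
print(sync_and_compute([rank0, rank1]))   # 5 of 6 correct overall
```

Averaging the two per-rank accuracies instead would give (0.75 + 1.0) / 2 = 0.875 rather than 5/6. This is why reducing the raw counts, not the per-rank results, matters.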
from optuna-examples.
Closed, because I noticed it's a duplicate of optuna/optuna#1417.
Let me summarise the points (thank you @jacanchaplais for pointing them out):
- Multi-GPU training does not work because of this error. I suppose only rank 0 has the attribute `val_acc`, and the other ranks do not. If a machine has multiple GPUs, `pytorch_lightning_simple.py` uses all of them according to this line, so this error always happens on a multi-GPU machine.
- Are there any issues with PyTorch Lightning and DDP training? If so, it would be great to mention them as comments in the example code or on the callback page.
> I suppose rank 0 has only the attribute `val_acc` and the others do not.
Also, I don't think the issue is due to which device the metrics are stored on. The line that tries to access `trainer.callback_metrics['val_acc']` comes after the `trainer.fit()` method has completed execution, so I would have thought the data would no longer be partitioned across specific devices (although I am new to GPU parallelism, so let me know if I'm wrong).
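One possible reading of the error is offered here as an assumption to check against the Lightning docs. With the subprocess-based `ddp` strategy (as opposed to `ddp_spawn`), Lightning launches one process per GPU, and each process re-runs the whole script. The line after `trainer.fit()` therefore executes once per rank, each time with its own `trainer` and its own `callback_metrics`.

The pure-Python sketch below simulates that situation. `run_training_script` and `read_val_acc` are hypothetical helpers. The rule that only rank 0 gets `val_acc` mirrors the guess made earlier in this thread, not documented behaviour.

```python
from typing import Dict, Optional


def run_training_script(rank: int) -> Dict[str, float]:
    """Hypothetical stand-in for one DDP process running the whole script.

    Each process builds its own ``callback_metrics`` dict in its own memory.
    Assumption (from the thread, not Lightning docs): only rank 0 ends up
    with "val_acc".
    """
    callback_metrics: Dict[str, float] = {}
    # ... trainer.fit(...) would run here, in every process ...
    if rank == 0:
        callback_metrics["val_acc"] = 0.91  # made-up value for illustration
    return callback_metrics


def read_val_acc(callback_metrics: Dict[str, float], rank: int) -> Optional[float]:
    # The line after fit() executes on every rank, so a bare
    # callback_metrics["val_acc"] would raise KeyError on ranks lacking the key.
    if rank != 0:
        return None  # only report the objective value from rank 0
    return callback_metrics["val_acc"]


world_size = 2
for rank in range(world_size):
    metrics = run_training_script(rank)
    print(rank, read_val_acc(metrics, rank))  # prints "0 0.91", then "1 None"
```

In real Lightning code, the rank check would normally use `trainer.is_global_zero` (or `trainer.global_rank == 0`) rather than a hand-passed `rank`.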
I submitted the failure to populate `callback_metrics` as an issue to PyTorch Lightning, since I implemented the same thing with Ray[Tune] and ran into the same problem.
Related Issues (20)
- Trial Fail
- Optuna DDP with Slurm Cluster
- License of the optuna-examples repo?
- optuna.integration.XGBoostPruningCallback is not xgboost TrainingCallback
- Updating the PyTorch Lightning example to >= 2.0
- Pruning Not Working in Pytorch
- Support Lighting instead/besides Pytorch Lighting
- Flax example
- optuna-examples/xgboost /xgboost_integration.py error
- Add `README.md` to `./dashboard`
- Pruning only on some models?
- Add Python 3.12 to the CI
- Add README to each directory if it contains multiple examples
- XGBoost callback problem
- The problem of DDP training with pytorch lightning
- NSGAIISampler number trials per generation
- intermediate values and objective value use different metrics in `LightGBMPruningCallback`
- Switch to NSGA sampler after initial trials with custom sampler
- Different results in Optuna best value and re-train
- Add comet integration example