Intro
- First, I am not sure whether this mode combination (using raytune and wandb simultaneously) has been implemented yet.
I'm really sorry if I misunderstood the progress of this work.
- Second, the error may be caused by the specific environment of my Docker container.
Circumstance
- python main.py model=tinynet_e epochs=3 num_samples=2 Dataset.train_size=0.2 logging=wandb mode=raytune
- Branch : develop (commit 6b6f63d)
- Code change : No
Error
mode=raytune
working dir: /home/CheXpert_code/kdg/CXRAIL-dev
[2022-12-21 01:57:23,674][ray.tune.tune][INFO] - Initializing Ray automatically.For cluster usage or custom Ray initialization, call ray.init(...)
before tune.run
.
2022-12-21 01:57:26,066 INFO worker.py:1529 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
Error executing job with overrides: ['model=tinynet_e', 'epochs=3', 'num_samples=2', 'Dataset.train_size=0.2', 'logging=wandb', 'mode=raytune']
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/ray/tune/tuner.py", line 272, in fit
return self._local_tuner.fit()
File "/usr/local/lib/python3.8/site-packages/ray/tune/impl/tuner_internal.py", line 420, in fit
analysis = self._fit_internal(trainable, param_space)
File "/usr/local/lib/python3.8/site-packages/ray/tune/impl/tuner_internal.py", line 532, in _fit_internal
analysis = run(
File "/usr/local/lib/python3.8/site-packages/ray/tune/tune.py", line 626, in run
callbacks = _create_default_callbacks(
File "/usr/local/lib/python3.8/site-packages/ray/tune/utils/callback.py", line 105, in _create_default_callbacks
callbacks.append(TBXLoggerCallback())
File "/usr/local/lib/python3.8/site-packages/ray/tune/logger/tensorboardx.py", line 165, in __init__
from tensorboardX import SummaryWriter
File "/usr/local/lib/python3.8/site-packages/tensorboardX/__init__.py", line 5, in <module>
from .torchvis import TorchVis
File "/usr/local/lib/python3.8/site-packages/tensorboardX/torchvis.py", line 10, in <module>
from .writer import SummaryWriter
File "/usr/local/lib/python3.8/site-packages/tensorboardX/writer.py", line 16, in <module>
from .comet_utils import CometLogger
File "/usr/local/lib/python3.8/site-packages/tensorboardX/comet_utils.py", line 7, in <module>
from .summary import _clean_tag
File "/usr/local/lib/python3.8/site-packages/tensorboardX/summary.py", line 12, in <module>
from .proto.summary_pb2 import Summary
File "/usr/local/lib/python3.8/site-packages/tensorboardX/proto/summary_pb2.py", line 16, in <module>
from tensorboardX.proto import tensor_pb2 as tensorboardX_dot_proto_dot_tensor__pb2
File "/usr/local/lib/python3.8/site-packages/tensorboardX/proto/tensor_pb2.py", line 16, in <module>
from tensorboardX.proto import resource_handle_pb2 as tensorboardX_dot_proto_dot_resource__handle__pb2
File "/usr/local/lib/python3.8/site-packages/tensorboardX/proto/resource_handle_pb2.py", line 36, in <module>
_descriptor.FieldDescriptor(
File "/usr/local/lib/python3.8/site-packages/google/protobuf/descriptor.py", line 560, in __new__
_message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
- Downgrade the protobuf package to 3.20.x or lower.
- Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "main.py", line 94, in main
raytune(hydra_cfg)
File "main.py", line 61, in raytune
analysis = tuner.fit()
File "/usr/local/lib/python3.8/site-packages/ray/tune/tuner.py", line 274, in fit
raise TuneError(
ray.tune.error.TuneError: The Ray Tune run failed. Please inspect the previous error messages for a cause. After fixing the issue, you can restart the run from scratch or continue this run. To continue this run, you can use tuner = Tuner.restore("/home/CheXpert_code/kdg/CXRAIL-dev/logs/2022-12-21_01-57-23/Dataset.train_size=0.2,epochs=3,logging=wandb,mode=raytune,model=tinynet_e,num_samples=2/trainval_2022-12-21_01-57-23")
.
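The second workaround named in the TypeError message could be tried without touching the installed packages. This is only a minimal sketch, assuming the variable is set at the very top of main.py before ray, tensorboardX, or wandb are imported. The helper name `protobuf_backend` is hypothetical and not part of the repository; I have not verified this in my container.

```python
import os

# Workaround quoted from the protobuf error message: force the pure-Python
# implementation. It must be set before google.protobuf is first imported,
# and the message itself warns that parsing will be much slower.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"


def protobuf_backend() -> str:
    """Return the protobuf implementation requested via the environment."""
    return os.environ.get("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "default")


print("protobuf backend requested:", protobuf_backend())
```

The same setting can be given on the command line instead, by prefixing the python command with PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python.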
Suspected reason
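- A dependency conflict: judging from the TypeError above, the installed protobuf seems to be newer than what the generated _pb2.py files in tensorboardX support. It may also be related to the Python version (3.8) in my container.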
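To confirm the suspected conflict, the installed versions can be checked inside the container. The following is a rough sketch, not part of the repository: the function names are hypothetical, and the 3.20 threshold is taken only from the "Downgrade the protobuf package to 3.20.x or lower" hint in the error message.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def protobuf_too_new(version: str) -> bool:
    """True if the protobuf version is newer than the 3.20.x series."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) > (3, 20)


def report() -> None:
    """Print the versions of the packages that appear in the traceback."""
    for package in ("protobuf", "tensorboardX", "ray", "wandb"):
        print(package, installed_version(package))
    protobuf = installed_version("protobuf")
    if protobuf is not None and protobuf_too_new(protobuf):
        print("protobuf is newer than 3.20.x; this matches the error above")


report()
```

If the check reports a protobuf newer than 3.20.x, the options suggested by the error message are to downgrade protobuf or to use a tensorboardX release whose generated files were rebuilt with a newer protoc.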
Related to