First of all, thanks for your innovative and excellent work! I got an error when I tried to reproduce the results of the paper (in the pretraining stage).
Could you please help me? In the meantime, I'll keep trying to figure it out and fix it myself.
12/11/2023 16:50:43 - WARNING - __main__ - Output directory (datasets/R2R/exprs/pretrain/test) already exists and is not empty.
12/11/2023 16:50:43 - INFO - __main__ - device: cuda n_gpu: 1, distributed training: False, 16-bits training: False
0%| | 0/200000 [00:00<?, ?it/s]
Some weights of MultiStepNavCMTPreTraining were not initialized from the model checkpoint at None and are newly initialized: ['bbox_head.net.2.weight', 'span_head.net.0.bias', 'span_head.net.4.bias', 'con_projection_image.bias', 'land_att.linear_out.weight', 'bbox_head.net.4.bias', 'span_head.net.4.weight', 'con_projection_text.bias', 'span_att.linear_in.weight', 'bbox_head.net.2.bias', 'bbox_head.net.0.bias', 'span_head.net.0.weight', 'span_att.linear_out.weight', 'con_projection_text.weight', 'con_projection_image.weight', 'land_att.linear_in.weight', 'span_head.net.2.weight', 'bbox_head.net.4.weight', 'bbox_head.net.0.weight', 'span_head.net.2.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
data_num: 4675 14039 121819 58065
data_num: 182945 1083659 121819 58065
data_num: 340 1021 9017 4304
data_num: 775 2325 20364 9517
12/11/2023 16:52:47 - INFO - __main__ - mlm: 1083659 samples loaded
12/11/2023 16:52:47 - INFO - __main__ - sap: 6565468 samples loaded
12/11/2023 16:52:47 - INFO - __main__ - sar: 6565468 samples loaded
12/11/2023 16:52:47 - INFO - __main__ - sprel: 6565468 samples loaded
12/11/2023 16:52:47 - INFO - __main__ - mrc: 1083659 samples loaded
12/11/2023 16:52:47 - INFO - __main__ - itm: 1083659 samples loaded
12/11/2023 16:52:47 - INFO - __main__ - gel: 121819 samples loaded
12/11/2023 16:52:47 - INFO - __main__ - mlm: 1021 samples loaded
12/11/2023 16:52:47 - INFO - __main__ - sap: 6201 samples loaded
12/11/2023 16:52:47 - INFO - __main__ - sar: 6201 samples loaded
12/11/2023 16:52:47 - INFO - __main__ - sprel: 6201 samples loaded
12/11/2023 16:52:47 - INFO - __main__ - mrc: 1021 samples loaded
12/11/2023 16:52:47 - INFO - __main__ - itm: 1021 samples loaded
12/11/2023 16:52:47 - INFO - __main__ - gel: 9017 samples loaded
12/11/2023 16:52:47 - INFO - __main__ - mlm: 2325 samples loaded
12/11/2023 16:52:47 - INFO - __main__ - sap: 13875 samples loaded
12/11/2023 16:52:47 - INFO - __main__ - sar: 13875 samples loaded
12/11/2023 16:52:47 - INFO - __main__ - sprel: 13875 samples loaded
12/11/2023 16:52:47 - INFO - __main__ - mrc: 2325 samples loaded
12/11/2023 16:52:47 - INFO - __main__ - itm: 2325 samples loaded
12/11/2023 16:52:47 - INFO - __main__ - gel: 20364 samples loaded
/home/zerowing/VLN-GELA/ada_pretrain_src/data/r2r_tasks.py:595: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at /opt/conda/conda-bld/pytorch_1699449229234/work/torch/csrc/utils/tensor_new.cpp:261.)
batch['sp_targets'] = torch.FloatTensor(batch['sp_targets'])
12/11/2023 16:52:50 - INFO - __main__ - ***** Running training with 1 GPUs *****
12/11/2023 16:52:50 - INFO - __main__ - Batch size = 64
12/11/2023 16:52:50 - INFO - __main__ - Accumulate steps = 1
12/11/2023 16:52:50 - INFO - __main__ - Num steps = 200000
Traceback (most recent call last):
File "/home/zerowing/miniforge3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/zerowing/miniforge3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/zerowing/.vscode-server/extensions/ms-python.python-2023.22.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
cli.main()
File "/home/zerowing/.vscode-server/extensions/ms-python.python-2023.22.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
run()
File "/home/zerowing/.vscode-server/extensions/ms-python.python-2023.22.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
runpy.run_path(target, run_name="__main__")
File "/home/zerowing/.vscode-server/extensions/ms-python.python-2023.22.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
return _run_module_code(code, init_globals, run_name,
File "/home/zerowing/.vscode-server/extensions/ms-python.python-2023.22.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/home/zerowing/.vscode-server/extensions/ms-python.python-2023.22.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
exec(code, run_globals)
File "/home/zerowing/VLN-GELA/ada_pretrain_src/main_r2r.py", line 578, in <module>
main(args)
File "/home/zerowing/VLN-GELA/ada_pretrain_src/main_r2r.py", line 249, in main
loss = model(batch, task=task, compute_loss=True)
File "/home/zerowing/miniforge3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/zerowing/miniforge3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/zerowing/VLN-GELA/ada_pretrain_src/model/pretrain_cmt.py", line 210, in forward
return self.forward_mrc(batch['txt_ids'], batch['txt_masks'],
File "/home/zerowing/VLN-GELA/ada_pretrain_src/model/pretrain_cmt.py", line 323, in forward_mrc
hist_mrc_targets = self._compute_masked_hidden(hist_img_probs, hist_mrc_masks)
File "/home/zerowing/VLN-GELA/ada_pretrain_src/model/pretrain_cmt.py", line 251, in _compute_masked_hidden
hidden_masked = hidden[mask].contiguous().view(-1, hidden.size(-1))
RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous
File "/home/zerowing/VLN-GELA/ada_pretrain_src/data/r2r_data.py", line 349, in get_image_feature
with h5py.File(self.img_ft_file, 'r') as f:
Because "datasets/R2R/features/pth_vit_base_patch16_224_imagenet_e2e.hdf5" doesn't exist, my solution was to rename the file "/datasets/R2R/features/pth_vit_base_patch16_224_imagenet_r2r.e2e.ft.22k.hdf5" to that name so that the program can find it.
Alternatively, edit line 42 of "ada_pretrain_src/config/pretrain_r2r.json", which reads "img_ft_file": "datasets/R2R/features/pth_vit_base_patch16_224_imagenet_e2e.hdf5", and change the file name from "pth_vit_base_patch16_224_imagenet_e2e.hdf5" to "pth_vit_base_patch16_224_imagenet_r2r.e2e.ft.22k.hdf5".
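
To catch this kind of path mismatch before a long run starts, something like the sketch below might help. It is not part of the repository: the helper names find_values and missing_feature_files are made up for illustration, and since I am not sure where "img_ft_file" sits inside the config, the code simply searches the whole JSON structure for that key. It assumes the script is run from the repository root, so that the relative "datasets/..." paths resolve.

```python
import json
import os


def find_values(node, key):
    """Yield every value stored under `key` anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            else:
                yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


def missing_feature_files(config_path, key="img_ft_file", root="."):
    """Return the configured feature-file paths that do not exist under `root`."""
    with open(config_path) as f:
        config = json.load(f)
    # Keep only string values; anything else under the key is ignored.
    paths = [p for p in find_values(config, key) if isinstance(p, str)]
    return [p for p in paths if not os.path.isfile(os.path.join(root, p))]


if __name__ == "__main__":
    # Config path taken from the comment above; adjust if yours differs.
    config_path = "ada_pretrain_src/config/pretrain_r2r.json"
    if os.path.isfile(config_path):
        for path in missing_feature_files(config_path):
            print("missing feature file:", path)
```

If it prints a path, either rename the downloaded file or update the config entry as described above.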