Comments (13)
Hello, I guess the epoch 3 is due to this line (Line 162 in 92a647c, ST3D). Meanwhile, I downloaded the checkpoint from the link and tested it, but got:
[2021-09-07 14:06:10,520 detector3d_template.py 325 INFO] ==> Loading parameters from checkpoint ../output/model_zoo/ckpt.pth to GPU
[2021-09-07 14:06:10,713 detector3d_template.py 331 INFO] ==> Checkpoint trained from version: pcdet+0.2.0+ee0831b+pyab7b158
[2021-09-07 14:06:10,854 detector3d_template.py 350 INFO] ==> Done (loaded 189/189)
[2021-09-07 14:06:10,864 eval_utils.py 40 INFO] *************** EPOCH no_number EVALUATION *****************
You can try renaming ckpt.pth to checkpoint_epoch_100.pth; I guess it will then be reported as the epoch 100 evaluation.
The st3d training takes 30 epochs. I will add it to model zoo soon.
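The renaming suggestion above is consistent with the evaluation script taking its epoch label from digits in the checkpoint path. A minimal sketch of that behaviour, assuming the label is the last run of digits in the path and falls back to no_number when there is none (the helper name epoch_label_from_path and the third example path are hypothetical, not taken from the repository):

```python
import re


def epoch_label_from_path(ckpt_path: str) -> str:
    """Return the last run of digits in the path, or 'no_number' if none exist.

    Assumption: this mirrors how the evaluation script labels the epoch;
    the real logic lives in the repository and may differ.
    """
    digit_runs = re.findall(r"\d+", ckpt_path)
    return digit_runs[-1] if digit_runs else "no_number"


if __name__ == "__main__":
    for path in (
        "../output/model_zoo/ckpt.pth",                  # no digits anywhere
        "../output/model_zoo/checkpoint_epoch_100.pth",  # digits in the file name
        "/hypothetical/ST3D/output/ckpt.pth",            # the 3 in "ST3D" is picked up
    ):
        print(path, "->", epoch_label_from_path(path))
```

Under this assumption, a digit elsewhere in the path (such as the 3 in a directory named ST3D) would be enough to produce an "EPOCH 3" label, regardless of how long the model was trained.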
from st3d.
Hello, I trained it for around 50 epochs. I will revise the config to add this term. Thank you.
Hi thanks for your response.
Just to clarify 50 epochs is for the pre-training of the nuscenes-kitti secondiou_old_anchor.yaml? When I evaluated the model_zoo checkpoint.pth file it says "EPOCH 3 EVALUATION". Also out of curiosity how long does one epoch take for you and what hardware are you using for training?
According to the OpenPCDet cfg file for second/pointpillars they only train it for 20 epochs (though their model zoo checkpoint.pth in evaluation also says something vastly different).
Hello, your concern makes sense. We decided to use 50 epochs for the nus->kitti setting (with over 20,000 frames) since we use 30 epochs for the waymo->kitti setting (with over 70,000 frames). Meanwhile, may I know which checkpoint you used for evaluation? Can you reproduce the result with that checkpoint? The epoch numbers of the two pretrained models for secondiou on nus->kitti should be 33 and 27 respectively. Maybe I uploaded the wrong checkpoint.
If 50 epochs is for the nuscenes-kitti pre-training, how many epochs are used for the nusc-kitti st3d training afterwards? Could you also provide that checkpoint in the model zoo?
I downloaded the model ckpt.pth from your model zoo in the nuScenes -> KITTI TASK, first row in the table for SECOND-IoU ROS. I've copied the download link you provided here.
When evaluating the model, this is what I see in the logs. It says EPOCH 3 EVALUATION and hence my confusion.
Regarding reproducing from the checkpoint, that is no issue. I just can't seem to reproduce it when I train it myself to 3 epochs. However, if you're saying that you train the full 28130 nuscenes samples to 50 epochs for this same ckpt.pth then that makes a bit more sense.
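One way to settle what a downloaded checkpoint actually contains is to read its stored metadata rather than rely on the label printed at evaluation time. A minimal sketch, assuming the checkpoint is a dictionary with epoch, it and version entries (the log earlier in this thread confirms only that a version string is stored; the other keys and the helper name summarize_checkpoint are assumptions):

```python
from typing import Any, Mapping


def summarize_checkpoint(ckpt: Mapping[str, Any]) -> dict:
    """Collect training metadata from a loaded checkpoint dictionary.

    Assumption: the checkpoint stores 'epoch', 'it' and 'version' keys.
    Missing keys are reported as None rather than raising.
    """
    return {key: ckpt.get(key) for key in ("epoch", "it", "version")}


if __name__ == "__main__":
    # With PyTorch installed, a real file could be loaded like this:
    #   import torch
    #   ckpt = torch.load("ckpt.pth", map_location="cpu")
    # A stand-in dictionary with placeholder values keeps this example runnable.
    placeholder_ckpt = {"epoch": 1, "it": 100, "version": "placeholder", "model_state": {}}
    print(summarize_checkpoint(placeholder_ckpt))
```

If the stored epoch disagrees with the number printed during evaluation, the printed number most likely comes from somewhere other than the checkpoint contents.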
Thanks for the clarification. Yes please, it'll help with reproducing the results if the epochs are added for all the steps. Appreciate your work.
Hi @darrenjkt and @jihanyang, I tried to run the training command like this:
python train.py --cfg_file cfgs/da-nuscenes-kitti_models/secondiou/secondiou_old_anchor_ros.yaml --batch_size 1 --epochs 50 --extra_tag st3d_infos
I'm using batch size = 1, since any number above 1 gives me CUDA out of memory error. Just wish to double check, will setting the batch size to 1 affect the training results? Thx.
May I know how much GPU memory it occupies when batch_size=1?
Dear @jihanyang,
Thank you so much for your reply. The command I ran was PVRCNN_ST3D: python train.py --cfg_file ./cfgs/da-nuscenes-kitti_models/pvrcnn_st3d/pvrcnn_st3d.yaml --batch_size 1 --epochs 50 --extra_tag st3d_infos
And here's my nvidia-smi when the above command runs:
So the program uses around 63% of my GPU memory. The training loop can run, but I just don't know if setting batch_size=1 will affect the training results.
I'm using a single Nvidia GeForce RTX 3080 GPU on my computer.
By the way, here's the CUDA out of memory error when I set batch_size=4:
Command:
python train.py --cfg_file ./cfgs/da-nuscenes-kitti_models/pvrcnn_st3d/pvrcnn_st3d.yaml --batch_size 4 --epochs 50 --extra_tag st3d_infos
Error (reporting required CUDA memory):
RuntimeError: CUDA out of memory. Tried to allocate 864.00 MiB (GPU 0; 9.78 GiB total capacity; 7.84 GiB already allocated; 196.12 MiB free; 7.99 GiB reserved in total by PyTorch)
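The figures in that message can be read off programmatically to see how far the request overshoots the free memory. A minimal sketch, assuming the message keeps the wording shown above (the helper name parse_oom_message is hypothetical, and the wording of this error may differ between PyTorch versions):

```python
import re

# Conversion factors to MiB for the two units that appear in the message.
UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom_message(message: str) -> dict:
    """Extract the memory figures (in MiB) from a CUDA out-of-memory message."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    figures = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            figures[name] = float(match.group(1)) * UNIT_TO_MIB[match.group(2)]
    return figures


if __name__ == "__main__":
    msg = (
        "RuntimeError: CUDA out of memory. Tried to allocate 864.00 MiB "
        "(GPU 0; 9.78 GiB total capacity; 7.84 GiB already allocated; "
        "196.12 MiB free; 7.99 GiB reserved in total by PyTorch)"
    )
    figures = parse_oom_message(msg)
    shortfall = figures["requested"] - figures["free"]
    print(f"requested {figures['requested']:.2f} MiB, free {figures['free']:.2f} MiB")
    print(f"shortfall {shortfall:.2f} MiB")
```

Here the single 864 MiB request exceeds the 196.12 MiB reported free, which is why the allocation fails even though the card is not completely full.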
Hello, according to the cfg:
We use batch size = 2 by default, and the GPU memory usage you report seems normal for pvrcnn. All experiments were run on a 1080 Ti with 11 GB of GPU memory.
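To make the batch-size difference concrete, the sketch below computes the number of iterations per epoch and the number of accumulation steps that would be needed to match a target batch size on a smaller GPU. It assumes the 3712 training samples shown in the progress bars in this thread and no dropped final batch. Gradient accumulation is a general technique, and nothing in this thread confirms that the ST3D training script supports it; it also does not reproduce batch-level statistics such as those used by batch normalization. Both helper names are hypothetical.

```python
import math


def iterations_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of optimizer-facing batches per epoch, keeping the last partial batch."""
    return math.ceil(num_samples / batch_size)


def accumulation_steps(target_batch_size: int, per_step_batch_size: int) -> int:
    """How many small batches must be accumulated to cover the target batch size."""
    return math.ceil(target_batch_size / per_step_batch_size)


if __name__ == "__main__":
    num_samples = 3712  # taken from the progress bars in this thread
    for batch_size in (1, 2, 4):
        print(batch_size, iterations_per_epoch(num_samples, batch_size))
    # Matching the default batch size of 2 with batch_size=1 per step:
    print("accumulation steps:", accumulation_steps(2, 1))
```

With batch_size=1 the model takes twice as many parameter updates per epoch as with the default of 2, each computed from a single sample, so results may well differ from the published ones.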
Thank you for sharing. It seems that I can only train with batch_size = 1 on my current GPU. But strangely, I came across a RecursionError when I ran the PVRCNN command. The command I ran is this:
python train.py --cfg_file ./cfgs/da-nuscenes-kitti_models/pvrcnn_st3d/pvrcnn_st3d.yaml --batch_size 1 --epochs 50 --extra_tag st3d_infos
And after 2283 out of 3712 iterations in the train loop, the program stops with this RecursionError:
RecursionError: maximum recursion depth exceeded while calling a Python object
It seems these two lines of code call each other alternately and indefinitely:
File "/home/wangweijia/Desktop/UDA/ST3D/tools/../pcdet/datasets/dataset.py", line 253, in prepare_data
  return self.__getitem__(new_index)
File "/home/wangweijia/Desktop/UDA/ST3D/tools/../pcdet/datasets/kitti/kitti_dataset.py", line 421, in __getitem__
  data_dict = self.prepare_data(data_dict=input_dict)
Here's the error:
[2021-09-29 22:51:14,537 train.py 174 INFO] Start training cfgs/da-nuscenes-kitti_models/pvrcnn_st3d/pvrcnn_st3d(st3d_infos)
[2021-09-29 22:51:14,935 train_st_utils.py 103 INFO] ==> Loading pseudo labels from /home/wangweijia/Desktop/UDA/ST3D/output/cfgs/da-nuscenes-kitti_models/pvrcnn_st3d/pvrcnn_st3d/st3d_infos/ps_label/ps_label_e0.pkl
epochs: 0%| | 0/50 [00:00<?, ?it/s]
/home/wangweijia/Desktop/UDA/ST3D/tools/../pcdet/models/roi_heads/pvrcnn_head.py:135: UserWarning: This overload of nonzero is deprecated: | 0/3712 [00:00<?, ?it/s]
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
dense_idx = faked_features.nonzero() # (N, 3) [x_idx, y_idx, z_idx]
generate_ps_e0: 100%|████████████████████████████████████████████████████████████████████████████████████████| 3712/3712 [16:46<00:00, 3.69it/s, pos_ps_box=20.000(20.001), ign_ps_box=0.000(0.001)]
epochs: 0%| | 0/50 [1:25:13<?, ?it/s, st_loss=2.664(4.211), pos_ps_box=19.8, ign_ps_box=0]
Traceback (most recent call last):█████████████████████████████████████████▎ | 2283/3712 [1:07:44<25:31, 1.07s/it, total_it=2283, pos_ps_box=19, ign_ps_box=0]
  File "train.py", line 205, in <module>
    main()
  File "train.py", line 197, in main
    ema_model=None
  File "/home/wangweijia/Desktop/UDA/ST3D/tools/train_utils/train_st_utils.py", line 157, in train_model_st
    dataloader_iter=dataloader_iter, ema_model=ema_model
  File "/home/wangweijia/Desktop/UDA/ST3D/tools/train_utils/train_st_utils.py", line 42, in train_one_epoch_st
    target_batch = next(dataloader_iter)
  File "/home/wangweijia/anaconda3/envs/st3d/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/wangweijia/anaconda3/envs/st3d/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "/home/wangweijia/anaconda3/envs/st3d/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/home/wangweijia/anaconda3/envs/st3d/lib/python3.6/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
RecursionError: Caught RecursionError in DataLoader worker process 3.
Original Traceback (most recent call last):
  File "/home/wangweijia/anaconda3/envs/st3d/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/wangweijia/anaconda3/envs/st3d/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/wangweijia/anaconda3/envs/st3d/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/wangweijia/Desktop/UDA/ST3D/tools/../pcdet/datasets/kitti/kitti_dataset.py", line 421, in __getitem__
    data_dict = self.prepare_data(data_dict=input_dict)
  File "/home/wangweijia/Desktop/UDA/ST3D/tools/../pcdet/datasets/dataset.py", line 253, in prepare_data
    return self.__getitem__(new_index)
(the above two frames repeat many times)
  File "/home/wangweijia/Desktop/UDA/ST3D/tools/../pcdet/datasets/kitti/kitti_dataset.py", line 421, in __getitem__
    data_dict = self.prepare_data(data_dict=input_dict)
  File "/home/wangweijia/Desktop/UDA/ST3D/tools/../pcdet/datasets/dataset.py", line 248, in prepare_data
    data_dict=data_dict
  File "/home/wangweijia/Desktop/UDA/ST3D/tools/../pcdet/datasets/processor/data_processor.py", line 134, in forward
    data_dict = cur_processor(data_dict=data_dict)
  File "/home/wangweijia/Desktop/UDA/ST3D/tools/../pcdet/datasets/processor/data_processor.py", line 74, in transform_points_to_voxels
    voxel_output = voxel_generator.generate(points)
  File "/home/wangweijia/anaconda3/envs/st3d/lib/python3.6/site-packages/spconv/utils/__init__.py", line 258, in generate
    self.height_high_threshold)
  File "/home/wangweijia/anaconda3/envs/st3d/lib/python3.6/site-packages/spconv/utils/__init__.py", line 75, in points_to_voxel
    voxelmap_shape = tuple(np.round(voxelmap_shape).astype(np.int32).tolist())
  File "<__array_function__ internals>", line 6, in round
  File "/home/wangweijia/anaconda3/envs/st3d/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 3637, in round
    return around(a, decimals=decimals, out=out)
  File "<__array_function__ internals>", line 6, in around
  File "/home/wangweijia/anaconda3/envs/st3d/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 3262, in around
    return _wrapfunc(a, 'round', decimals=decimals, out=out)
  File "/home/wangweijia/anaconda3/envs/st3d/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 58, in _wrapfunc
    return bound(*args, **kwds)
RecursionError: maximum recursion depth exceeded while calling a Python object
I'll continue looking into this, but please suggest if you've come across similar issues. By the way, I'm using ST3D's version with OpenPCDet 0.3.
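The traceback shows prepare_data handing off to self.__getitem__(new_index), which calls prepare_data again, so every rejected sample adds two frames to the call stack. A minimal sketch of one way to avoid exhausting the stack, assuming the fallback simply picks another index whenever a sample is unusable (the rejection condition is not shown in this thread, and the class ResamplingDataset below is a hypothetical toy, not the pcdet implementation):

```python
import random
from typing import List, Optional


class ResamplingDataset:
    """Toy dataset that picks another index when a sample is unusable."""

    def __init__(self, samples: List[Optional[dict]], max_retries: int = 10000,
                 seed: int = 0) -> None:
        self.samples = samples          # None marks an unusable sample
        self.max_retries = max_retries  # upper bound on resampling attempts
        self.rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> dict:
        # Loop instead of calling self.__getitem__ recursively, so a long run
        # of unusable samples cannot exhaust the interpreter's call stack.
        for _ in range(self.max_retries):
            sample = self.samples[index]
            if sample is not None:
                return sample
            index = self.rng.randrange(len(self))
        raise RuntimeError(
            f"no usable sample found after {self.max_retries} retries"
        )


if __name__ == "__main__":
    # 50 unusable samples and a single usable one.
    dataset = ResamplingDataset([None] * 50 + [{"frame_id": "usable"}])
    print(dataset[0])
```

A bounded loop also turns the failure into an explicit error that names the real problem (almost no usable samples), rather than a RecursionError raised from whichever unrelated call happens to be deepest.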
It seems that some other issues have discussed this situation; you can refer to them. I have found some errors in ST3D with openpcdet v0.3.0, so I suggest that you re-pull the repo and install it with openpcdet v0.2.0 for reproduction.