
anuragranj / cc

488 stars · 38 watchers · 63 forks · 137 KB

Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation

Home Page: https://research.nvidia.com/publication/2018-05_Adversarial-Collaboration-Joint

License: MIT License

Python 100.00%
adversarial adversarial-collaboration optical-flow depth-prediction camera-pose motion-segmentation unsupervised-learning deep-learning

cc's People

Contributors: anuragranj

cc's Issues

RuntimeError: Caught RuntimeError in replica 1 on device 1.

python3 train.py /media/disk1/xgl/cc-pil/formatted --dispnet DispResNet6 --posenet PoseNetB6 --masknet MaskNet6 --flownet Back2Future --pretrained-disp /media/disk1/xgl/cc-pil/geometry/dispnet_k.pth.tar --pretrained-pose /media/disk1/xgl/cc-pil/geometry/posenet.pth.tar --pretrained-flow /media/disk1/xgl/cc-pil/geometry/back2future.pth.tar --pretrained-mask /media/disk1/xgl/cc-pil/geometry/masknet.pth.tar -b4 -m0.1 -pf 0.5 -pc 1.0 -s0.1 -c0.3 --epoch-size 1000 --log-output -f 0 --nlevels 6 --lr 1e-4 -wssim 0.997 --with-flow-gt --with-depth-gt --epochs 100 --smoothness-type edgeaware --fix-masknet --fix-flownet --log-terminal --name EXPERIMENT_NAME
=> will save everything to checkpoints/EXPERIMENT_NAME
=> fetching scenes in '/media/disk1/xgl/cc-pil/formatted'
588 samples found in 5 train scenes
154 samples found in 1 valid scenes
=> creating model
=> using pre-trained weights for explainabilty and pose net
=> using pre-trained weights for explainabilty and pose net
=> using pre-trained weights from /media/disk1/xgl/cc-pil/geometry/dispnet_k.pth.tar
=> using pre-trained weights for FlowNet
=> setting adam solver

N/A% (0 of 100) | | Elapsed Time: 0:00:00 ETA: --:--:--

N/A% (0 of 147) | | Elapsed Time: 0:00:00 ETA: --:--:--

N/A% (0 of 38) | | Elapsed Time: 0:00:00 ETA: --:--:--

/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/functional.py:2941: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/functional.py:3121: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/functional.py:1625: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/functional.py:3384: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
warnings.warn("Default grid_sample and affine_grid behavior has changed "
Traceback (most recent call last):
File "train.py", line 784, in
main()
File "train.py", line 353, in main
train_loss = train(train_loader, disp_net, pose_net, mask_net, flow_net, optimizer, args.epoch_size, logger, training_writer)
File "train.py", line 463, in train
flow_fwd, flow_bwd, _ = flow_net(tgt_img_var, ref_imgs_var[1:3])
File "/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
output = module(*input, **kwargs)
File "/home/xgl/anaconda3/envs/cuda10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
File "/media/disk1/xgl/cc-pil/models/back2future.py", line 174, in forward
corr6_fwd = corr6_fwd.index_select(1,self.idx_fwd)

Excuse me, can you help me solve this problem? Thank you very much.
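
One common cause of this error under nn.DataParallel, assuming idx_fwd is created as a plain tensor attribute in back2future.py, is that only registered parameters and buffers are copied to each replica's device, so a plain attribute stays on GPU 0 while replica 1 runs on GPU 1. A minimal sketch of the buffer-based fix (the class and argument names here are illustrative, not the repo's):

import torch
import torch.nn as nn

class CorrRemap(nn.Module):
    def __init__(self, idx_fwd):
        super().__init__()
        # register_buffer makes DataParallel copy the index tensor to
        # every replica's device; a plain attribute (self.idx_fwd = ...)
        # would stay on cuda:0 and break index_select on other replicas.
        self.register_buffer('idx_fwd', idx_fwd)

    def forward(self, corr6_fwd):
        return corr6_fwd.index_select(1, self.idx_fwd)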

about log-terminal

When I set --log-terminal=True, the following error occurs:

self.epoch_bar = progressbar.ProgressBar(maxval=n_epochs, fd=Writer(self.t, (0, h-s+e)))

TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'

and I found that at logger.py line 16, h = self.t.height, and h is None.

Could you help me with this? Thank you very much
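
This typically happens when stdout is not attached to a real TTY (for example, when output is piped to a file), in which case the terminal height comes back as None and h - s + e fails. A minimal guard, assuming logger.py uses blessings' Terminal as in similar projects, and with an assumed fallback height of 24:

from blessings import Terminal  # assumed; per the issue, self.t in logger.py

t = Terminal()
h = t.height
if h is None:   # stdout is not a real TTY (piped/redirected output)
    h = 24      # assumed fallback height so h - s + e stays an int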

spatial_correlation_sampler

Hey, I got this error when I tried to run the code.
import spatial_correlation_sampler_backend as correlation

lib/python3.6/site-packages/spatial_correlation_sampler_backend.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZTIN3c1021AutogradMetaInterfaceE

CUDA 10, gcc 7, PyTorch 1.0

Training script

Thanks for your nice work and code release.

  • I wonder if there is any script for the training procedure in Alg. 1 of your paper? I followed your paper and wrote my own version (without ground truth) as follows. Is it correct?
## Train dispnet & posenet
python3 train.py $data_root --dispnet DispResNet6 --posenet PoseNetB6 \
  --masknet MaskNet6 --flownet Back2Future  -b 4 -pc 1.0 -pf 0.0 -m 0.0 -c 0.0 -s 0.1 \
  --epoch-size 1000 --log-output -f 0 --nlevels 6 --lr 1e-4 -wssim 0.997  \
  --epochs $epochs --smoothness-type edgeaware  --fix-masknet --fix-flownet \
  --log-terminal --name $EXPERIMENT_NAME 

## Train flownet
python3 train.py $data_root --dispnet DispResNet6 --posenet PoseNetB6 \
  --masknet MaskNet6 --flownet Back2Future  -b 4 -pc 0.0 -pf 1.0 -m 0.0 -c 0.0 -s 0.1 \
  --epoch-size 1000 --log-output -f 0 --nlevels 6 --lr 1e-4 -wssim 0.997  \
  --epochs $epochs --smoothness-type edgeaware  --fix-dispnet --fix-posenet  --fix-masknet \
  --log-terminal --name $EXPERIMENT_NAME  --resume

## Train masknet
python3 train.py $data_root --dispnet DispResNet6 --posenet PoseNetB6 \
  --masknet MaskNet6 --flownet Back2Future  -b 4 -pc 1.0 -pf 0.5 -m 0.0 -c 0.0 -s 0.1 \
  --epoch-size 1000 --log-output -f 0 --nlevels 6 --lr 1e-4 -wssim 0.997  \
  --epochs $epochs --smoothness-type edgeaware  --fix-dispnet --fix-posenet --fix-flownet \
  --log-terminal --name $EXPERIMENT_NAME  --resume

while true
do

    ## Train dispnet & posenet
    python3 train.py $data_root --dispnet DispResNet6 --posenet PoseNetB6 \
      --masknet MaskNet6 --flownet Back2Future  -b 4 -pc 1.0 -pf 0.5 -m 0.05 -c 0.0 -s 0.1 \
      --epoch-size 1000 --log-output -f 0 --nlevels 6 --lr 1e-4 -wssim 0.997  \
      --epochs $epochs --smoothness-type edgeaware  --fix-masknet --fix-flownet \
      --log-terminal --name $EXPERIMENT_NAME  --resume

    ## Train flownet
    python3 train.py $data_root --dispnet DispResNet6 --posenet PoseNetB6 \
      --masknet MaskNet6 --flownet Back2Future  -b 4 -pc 0.0 -pf 1.0 -m 0.005 -c 0.0 -s 0.1 \
      --epoch-size 1000 --log-output -f 0 --nlevels 6 --lr 1e-4 -wssim 0.997  \
      --epochs $epochs --smoothness-type edgeaware  --fix-dispnet --fix-posenet  --fix-masknet \
      --log-terminal --name $EXPERIMENT_NAME  --resume

    ## Train masknet
    python3 train.py $data_root --dispnet DispResNet6 --posenet PoseNetB6 \
      --masknet MaskNet6 --flownet Back2Future  -b 4 -pc 1.0 -pf 0.5 -m 0.005 -c 0.3 -s 0.1 \
      --epoch-size 1000 --log-output -f 0 --nlevels 6 --lr 1e-4 -wssim 0.997  \
      --epochs $epochs --smoothness-type edgeaware  --fix-dispnet --fix-posenet --fix-flownet \
      --log-terminal --name $EXPERIMENT_NAME  --resume

done
  • Besides, I found that if --fix-flownet is not given in the script, then there will be an error. I wonder if it is caused by the flownet model.
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
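
For context, this is autograd's generic complaint when a tensor saved for the backward pass is mutated in place; a minimal reproduction, unrelated to this repo's code:

import torch

x = torch.ones(3, requires_grad=True)
y = torch.sigmoid(x)   # sigmoid's backward re-uses its output y
y += 1                 # in-place edit bumps y's version counter
y.sum().backward()     # RuntimeError: one of the variables needed for
                       # gradient computation has been modified by an
                       # inplace operation

So if the flownet's forward uses in-place ops (e.g. add_ or ReLU(inplace=True)) on tensors its backward needs, training it without --fix-flownet could fail exactly like this.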

Details about the training algorithm

Hi @anuragranj, I should say that your work is impressive! I would like to implement your methods, but I am confused by some details, and I would appreciate it if you could give me some insights.

About the training algorithm: are the first three initializations of pose/depth, flow, and motion just one iteration, or several iterations of gradient descent? On each iteration of the main loop, does each step (collaboration and competition) run 100K iterations? (I mean: 100K competition, 100K collaboration, 100K competition, and so on.)

Thanks in advance,

run_inference.py pretrained weight

When I tried to run cc on my own dataset using run_inference.py, I got the following error:

python run_inference.py --output-disp --pretrained ./downloads/geometry/dispnet_k.pth.tar --dataset-list ./output/test_files.txt --dataset-dir ./output/test_data --output-dir ./output/output --img-exts png
Traceback (most recent call last):
File "run_inference.py", line 77, in
main()
File "run_inference.py", line 37, in main
disp_net.load_state_dict(weights['state_dict'])
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 769, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DispNetS:
Missing key(s) in state_dict: "conv2.0.weight", "conv2.0.bias", ...

—————
where the --pretrained "pretrained DispNet" weights are downloaded from:
• DispNet, PoseNet, MaskNet and FlowNet in joint unsupervised learning of depth, camera motion, optical flow and motion segmentation.

Image size used in training and testing

Hi, thanks for publishing the codes!

I have a question: when you train the network, it seems that you use a small image patch, like 256×832. But in testing, do you use the original image at its larger size (around 320×1240)?

If not, how do you handle the image size? Thanks!
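
For reference, whenever the input resolution changes, the pinhole intrinsics have to be scaled with it. A minimal sketch, assuming a 3×3 intrinsics matrix K and bilinear resizing (not necessarily this repo's exact preprocessing):

import torch
import torch.nn.functional as F

def resize_with_intrinsics(img, K, out_h, out_w):
    # img: (1, 3, H, W) float tensor; K: (3, 3) intrinsics tensor
    _, _, in_h, in_w = img.shape
    img_out = F.interpolate(img, size=(out_h, out_w),
                            mode='bilinear', align_corners=False)
    K_out = K.clone()
    K_out[0] *= out_w / in_w  # scale fx and cx with the width
    K_out[1] *= out_h / in_h  # scale fy and cy with the height
    return img_out, K_out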

About the correlation operation

corr6_fwd = corr6_fwd.index_select(1,self.idx_fwd)

Thanks for sharing. Can you explain why the index_select function is used here?
index_select remaps the cost volume according to idx_fwd. Is this remapping better for optical flow learning?

Looking forward to your reply.
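
For readers unfamiliar with the call: index_select along dim 1 simply reorders the channels of the cost volume, so each displacement hypothesis lands at a fixed, consistent channel position for the decoder to learn against. A toy illustration (the permutation below is made up, not the repo's idx_fwd):

import torch

corr = torch.arange(2 * 4 * 3 * 3, dtype=torch.float32).view(2, 4, 3, 3)
idx = torch.tensor([3, 2, 1, 0])      # illustrative channel permutation
remapped = corr.index_select(1, idx)  # same values, channels reordered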

About Rigidity masks

Hi @anuragranj !

I am training the model using all the pre-trained models that you published online. However, I do not get proper rigidity masks. Do you know why this is happening? As I understand it, rigidity masks are the thresholded difference between the rigid flow and the flow generated by the flownet (I haven't changed the code at all). Below is the picture:

[Screenshot 2019-11-26 at 07:21:07]

Also, could you share the optimizer checkpoint from the time of your training, so that I could visualize all the results at once?
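
As a reference for the thresholding described above, a minimal sketch of a rigidity mask from the flow discrepancy (the threshold value here is an assumption, not the repo's default):

import torch

def rigidity_mask(rigid_flow, flow, thresh=1e-2):
    # flows: (B, 2, H, W); returns 1 where the rigid (camera-motion)
    # flow explains the full flow, 0 where the pixel looks non-rigid
    diff = (rigid_flow - flow).norm(p=2, dim=1)
    return (diff < thresh).float()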

flownetC6 pretrained model

Hi, thank you for sharing the code. I downloaded your pretrained models. It seems there's no pretrained model for flownetC6. Could you please share it?

Regarding training loop.

Hi @anuragranj !

It is not clear from train.py how you were able to train following the algorithm described in the paper. How did you do that?

Thank you!

Running test_flow on a real-world video

Hi,

Is there any demo script for producing optical flow on a test video? Specifically, I am interested in running test_flow.py on a video, but the current code reads data (tgt_img, ref_imgs, intrinsics, intrinsics_inv) from a validation dataset loader. How do I extract this information from a test video?

Thanks

About the output of the posenet

Hey, I trained all the nets according to the readme file. When I tested the posenet, I found that the camera pose output is in 6-DoF format and is then converted to a 3*4 transformation matrix.

In a 3*4 transformation matrix, we usually take the last column to describe the translational motion of the camera. But when the test data are pictures at a street corner, I feel that the network does not perform well at describing the translation.

For example, I chose five images as input, all from the KITTI dataset, showing a street corner. In the output transformation matrix, of the three variables describing the translation, I would expect at least two to change substantially, but only one does. This obviously does not describe the positional change of the camera while the vehicle is turning. Detailed data are attached.

Where did I go wrong?

Thanks.
test_data.zip
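
For context on the conversion mentioned above, a minimal sketch of turning a 6-DoF vector (tx, ty, tz plus Euler angles rx, ry, rz) into a 3*4 [R | t] matrix; the rotation convention below is an assumption and may differ from this repo's pose-to-matrix conversion:

import math
import numpy as np

def pose_vec_to_mat(vec):
    # vec: [tx, ty, tz, rx, ry, rz], angles in radians
    tx, ty, tz, rx, ry, rz = vec
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    R = Rz @ Ry @ Rx                  # rotation order is an assumption
    t = np.array([[tx], [ty], [tz]])
    return np.hstack([R, t])          # 3x4 [R | t]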

Which weights to load

Hi, I have two questions about the training details.

  1. Which weights do you use when starting a new CC training phase: the "best" weights selected through validation during the previous phase, or the weights saved at the end of the previous phase?

  2. When training depth & pose in the competition step, the loss weights λ are set to (1.0, 0.5, 0.05, 0). Since the flownet and masknet are fixed, why set the flow-loss λ to 0.5 and the explainability-loss λ to 0.05? (It doesn't make much difference anyway.)

Thank you.

Regarding cityscapes dataset.

Hi @anuragranj !

The Cityscapes dataset is about 324 GB, which means I would need 700 GB of disk space in total. Is there an efficient way to download the dataset?

Thank you!

Versions of torch and torchvision in the experiment implementation

Hi anuragranj,

The requirements.txt of your code lists the packages needed to run it. However, I encountered problems when installing spatial-correlation-sampler, which may be the result of a version conflict between spatial-correlation-sampler v0.2.1 and torch v0.4.1.

So I wonder which versions of spatial-correlation-sampler, torch, and torchvision you used; I want to pin them so that the installation succeeds.

Thanks!

Code does not reproduce results from Figure 4

Dear Anurag,

I ran test_flow.py in an effort to reproduce the results shown in Figure 4 of the paper. However, my qualitative results differed quite a bit from those reported in that paper.

[image: fig4_failure2reproduce]

Comparing Figure 4 from the paper with my results, you immediately see that the soft consensus mask has opposite contrast from that shown in the paper. (The paper says that high values of m indicate static scene pixels.) It would not be a problem if the direction of the contrast were just flipped. But even when assuming flipped contrasts, the comparison still puzzles me.

In the left-most one of my examples, it seems like there is some kind of saturation effect (or ceiling/floor effect), which produces a white rim around the image, especially at the bottom and the sides. I presume that this falsely indicates that these peripheral pixels are nonrigid. You can also see this to some extent in the original figure but it is not as strong. Consequently, the model predicts large patches of nonrigid motion: train tracks and trees on the left and the grass on the right. The example in the middle shows a similar problem: There are quite large white areas where no black is seen in the original. This may explain why the model predicts too much nonrigid motion on the right side where there is just grass under shadow. The fourth example from the left also shows too much nonrigid motion on the right side where there is only a building. Maybe the motion segmentation does not work properly? Just guessing...

I ran the code as follows:

 ipython --pdb -- test_flow.py \
     --pretrained-disp ../../cc-models/geometry/dispnet_k.pth.tar \
     --pretrained-pose ../../cc-models/geometry/posenet.pth.tar \
     --pretrained-mask ../../cc-models/geometry/masknet.pth.tar \
     --pretrained-flow ../../cc-models/geometry/back2future.pth.tar \
     --kitti-dir ../../stimuli/kitti2015 \
     --output-dir ../../results/competitive_collaboration/kitti2015_test_flow_demo/thresh_1e-2

Cheers,
Michael

Large EPE for optical flow evaluation

Hey Anurag, thanks for your wonderful work!

I got the following results when running test_flow.py:

          epe_total  epe_sp  epe_mv    Fl      epe_total_gt_mask  epe_sp_gt_mask  epe_mv_gt_mask  Fl_gt_mask
Errors    81.6802    6.4460  174.5231  0.6040  36.2934            6.3183          183.9449        0.4385

This seems pretty large, especially the EPE.

I ran the evaluation in the same way as mentioned in the readme, with the following script:

model_dir=/path/to/cc_models/geometry
cd /path/to/cc/
python test_flow.py \
    --pretrained-disp="$model_dir/dispnet_cs_k.pth.tar" \
    --pretrained-pose="$model_dir/posenet.pth.tar" \
    --pretrained-mask="$model_dir/masknet.pth.tar" \
    --pretrained-flow="$model_dir/back2future.pth.tar" \
    --kitti-dir=/path/to/kitti_data/

All other settings are left at their defaults. My PyTorch and CUDA versions are 1.4.0 and 10.0.

Is there something wrong, or did I miss something?

Thanks in advance!
Best,

Unable to train Dispnet and Posenet.

Hi @anuragranj !

After freshly checking out your code, I was trying to train the dispnet and posenet with the following command:
python3 train.py /cdtemp/ezorfa/gateway30/datasets/kitti/kitti_cc --dispnet DispResNet6 --posenet PoseNetB6 --masknet MaskNet6 --flownet Back2Future -b 4 -pc 1.0 -pf 0.0 -c 0.0 -s 0.1 --log-output -f 50 --nlevels 6 --lr 1e-4 -wssim 0.997 --epochs 100 --smoothness-type edgeaware --fix-masknet --fix-flownet --log-terminal --name cc_depth --epoch-size 1000

But even after 30K iterations there seems to be no hint of training:

[Screenshot 2019-12-09 at 08:19:06]

[Screenshot 2019-12-09 at 08:18:50]

Previously, I also trained for over 200K iterations, but I didn't see any result or any hint of training; it just stays a white image all the time. Could you suggest or hint at what I might be doing wrong?

Appreciate your help,
Thank you!

--spatial-normalize

Could I ask whether you used --spatial-normalize? How should I use it and set the hyperparameters such as -s? I added --spatial-normalize to train the depth and pose networks, but the results are very poor.
However, I found that another paper used it and saw a performance improvement.
I found that the --spatial-normalize method comes from the paper below. Thank you.

C. Wang, J. M. Buenaposada, R. Zhu, and S. Lucey, "Learning Depth from Monocular Videos using Direct Methods," in Conference on Computer Vision and Pattern Recognition, 2018, pp. 2022-2030.

The output images' format should be changed from CHW to HWC

Tensors in PyTorch are formatted as CHW (BCHW) by default, so if you want to output the results of depth, flow, and mask, you should change them to HWC format.
For example, at test_flow.py line 180:

row1_viz_im = Image.fromarray((255*row1_viz).astype('uint8'))
row2_viz_im = Image.fromarray((row2_viz).astype('uint8'))

This will raise TypeError("Cannot handle this data type").
You should transpose/permute the array into HWC layout like this:

row1_viz_im = Image.fromarray((255*row1_viz).astype('uint8').transpose((1,2,0)))
row2_viz_im = Image.fromarray((row2_viz).astype('uint8').transpose((1,2,0)))

How can I train on my own dataset based on your pre-trained model?

I have an indoor stereo image dataset with a total of 8,000 images. The scene contains a salient mobile robot that keeps moving within the camera's field of view. I have resized my dataset images to the size required by your code, and I am training with the parameters in your open-source code, starting from your pre-trained model. I want a model that performs good indoor monocular depth estimation and can segment the moving robot, but the model I obtain is very poor and I can't see anything visually. Do you have any suggestions?

About GPU memory

Hi, is a single 1080 Ti GPU (11 GB) enough to train the whole model?
