marvinteichmann / multinet

Real-time Joint Semantic Reasoning for Autonomous Driving

License: MIT License

Python 100.00%

autonomous-driving computer-vision deep-learning real-time tensorflow

multinet's Issues

demo getting tensorflow error processing weights

I am getting an error while trying to read the VGG16 weights. I am running CUDA 8 and TensorFlow 10 on Ubuntu 16 with Python 3. It appears to be a version conflict with the saved npy format.

Traceback (most recent call last):
  File "demo.py", line 413, in <module>
    tf.app.run()
  File "/home/richard/.virtualenvs/cvp3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "demo.py", line 311, in main
    load_out = load_united_model(logdir)
  File "demo.py", line 273, in load_united_model
    logits = modules['arch'].inference(hypes, image, train=False)
  File "RUNS/MultiNet_ICCV/detection/architecture.py", line 28, in inference
    random_init_fc8=True)
  File "/home/richard/opencv3-p3-code/not_working_yet/MultiNet/submodules/KittiClass/incl/tensorflow_fcn/fcn8_vgg.py", line 59, in build
    red, green, blue = tf.split(rgb, 3, 3)
  File "/home/richard/.virtualenvs/cvp3/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 918, in split
    name=name)
  File "/home/richard/.virtualenvs/cvp3/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 2239, in _split
    num_split=num_split, name=name)
  File "/home/richard/.virtualenvs/cvp3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 463, in apply_op
    (prefix, dtypes.as_dtype(input_arg.type).name))
TypeError: Input 'split_dim' of 'Split' Op has type float32 that does not match expected type of int32.
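
The traceback ends with `tf.split(rgb, 3, 3)` rejecting a float32 `split_dim`, which is what one would expect if the installed TensorFlow predates 1.0: older releases took `tf.split(split_dim, num_split, value)`, while 1.0 and later take `tf.split(value, num_or_size_splits, axis)`. The sketch below only illustrates that difference in argument order. `split_args_for_version` is a hypothetical helper, not part of MultiNet or TensorFlow, and upgrading TensorFlow is probably the simpler fix.

```python
def split_args_for_version(tf_version, value, num_splits=3, axis=3):
    """Return tf.split positional arguments in the order tf_version expects.

    Hypothetical helper, for illustration only.
    """
    major = int(tf_version.split(".")[0])
    if major >= 1:
        # TensorFlow >= 1.0: tf.split(value, num_or_size_splits, axis)
        return (value, num_splits, axis)
    # TensorFlow < 1.0: tf.split(split_dim, num_split, value)
    return (axis, num_splits, value)


# The string "rgb" stands in for the image tensor.
print(split_args_for_version("0.10.0", "rgb"))  # (3, 3, 'rgb')
print(split_args_for_version("1.4.0", "rgb"))   # ('rgb', 3, 3)
```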

Resource exhausted/ran out of memory trying to allocate while training

Hi, I am trying to run train.py as explained in the README file, and I am getting the following error.

$ python train.py --hypes hypes/multinet2.json
[I have added parts of the output log relevant here]
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 980 Ti
major: 5 minor: 2 memoryClockRate (GHz) 1.2405
pciBusID 0000:01:00.0
Total memory: 5.93GiB
Free memory: 5.70GiB

W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 245.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 278.55MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 4.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.

I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 5.40GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 5804326912
InUse: 5803421440
MaxInUse: 5803589120
NumAllocs: 719
MaxAllocSize: 1270318592

W tensorflow/core/common_runtime/bfc_allocator.cc:274]
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 2.25MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:993] Resource exhausted: OOM when allocating tensor with shape[256,256,3,3]
W tensorflow/core/kernels/queue_base.cc:294] _0_Queues_segmentation/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
W tensorflow/core/kernels/queue_base.cc:294] _1_Queues_detection/fifo_queue: Skipping cancelled enqueue attempt with queue not closed

I thought I needed to adjust the Batch_Size value in hypes/multinet2.json, but training fails even with a value of 1.
What could be the reason?
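
The allocator statistics in the log already hint at why lowering the batch size alone may not help: the process is using almost its entire limit before the failing request arrives. The sketch below redoes that arithmetic, assuming 4-byte float32 elements. `tensor_bytes` is a hypothetical helper, and the numbers are copied from the log above.

```python
from functools import reduce
from operator import mul

MIB = 1024 * 1024


def tensor_bytes(shape, bytes_per_element=4):
    """Size of a dense tensor in bytes (float32 assumed by default)."""
    return reduce(mul, shape, 1) * bytes_per_element


limit = 5804326912    # "Limit" from the allocator stats
in_use = 5803421440   # "InUse" from the allocator stats

request = tensor_bytes([256, 256, 3, 3])
print(request / MIB)              # 2.25, matching "allocate 2.25MiB"
print((limit - in_use) / MIB)     # about 0.86 MiB of headroom left
print(request > limit - in_use)   # True: even this small tensor no longer fits
```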

No module named 'annolist'

Hello,

After successfully completing all the described steps, I tried running demo.py, but the module "annolist" is not found. Could you please let me know how to fix this?

Downloading MultiNet_ICCV.zip 100.0%
2018-04-18 18:14:20,576 INFO Extracting MultiNet_pretrained.zip
2018-04-18 18:15:13,790 INFO Loading model from: RUNS/MultiNet_ICCV
2018-04-18 18:15:13,791 INFO f: <_io.TextIOWrapper name='RUNS/MultiNet_ICCV/hypes.json' mode='r' encoding='UTF-8'>
2018-04-18 18:15:13,793 INFO f: <_io.TextIOWrapper name='RUNS/MultiNet_ICCV/detection/hypes.json' mode='r' encoding='UTF-8'>
Traceback (most recent call last):
  File "demo.py", line 426, in <module>
    tf.app.run()
  File "/home/aev21/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    sys.exit(main(argv))
  File "demo.py", line 313, in main
    load_out = load_united_model(logdir)
  File "demo.py", line 254, in load_united_model
    postfix=model)
  File "incl/tensorvision/utils.py", line 213, in load_modules_from_logdir
    data_input = imp.load_source("input%s" % postfix, f)
  File "/home/aev21/anaconda3/lib/python3.6/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 684, in _load
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "RUNS/MultiNet_ICCV/detection/data_input.py", line 22, in <module>
    from utils.data_utils import (annotation_jitter, annotation_to_h5)
  File "PycharmProjects/MultiNet/submodules/KittiBox/incl/utils/data_utils.py", line 11, in <module>
    import annolist.AnnotationLib as al
ModuleNotFoundError: No module named 'annolist'
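
This report comes from a Python 3.6 interpreter, and `data_utils.py` imports `annolist` as if it were a top-level package. If `annolist` sits next to `data_utils.py` (an assumption based on the import style, not verified against the repository), Python 2 would have found it through an implicit relative import, whereas Python 3 needs that directory on `sys.path`. A minimal sketch of that workaround, with `add_sibling_dir_to_path` as a hypothetical helper name:

```python
import os
import sys


def add_sibling_dir_to_path(module_file):
    """Put the directory containing module_file at the front of sys.path."""
    module_dir = os.path.dirname(os.path.abspath(module_file))
    if module_dir not in sys.path:
        sys.path.insert(0, module_dir)
    return module_dir


# Path taken from the traceback, relative to the MultiNet checkout.
utils_dir = add_sibling_dir_to_path(
    os.path.join("submodules", "KittiBox", "incl", "utils", "data_utils.py")
)
print(utils_dir in sys.path)  # True
```

The call would have to run before `demo.py` loads the detection modules, for example near the top of the script.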

How to train with other models, like resNet, YOLO, SqueezeNet?

A question

I have seen that your MultiNet paper (v2) was updated on 8 May 2018, but the code is three years old and its most recent update was a year ago. Are all the updates in the paper reflected in the code? Thank you!

I do not understand the numbers in the paper (convolution and concatenation)

I read the paper, but there are parts I do not understand.

First, for classification, the initial 2016 paper used a 1x1 convolution, but the 2018 paper (Figure 2) uses a 3x3 convolution.
Is there a special reason for changing these numbers? In addition, the top of page 4 says 'we first apply a 1x1 convolution with 30 channels', which makes it more confusing. I wonder which number is correct.

Second, I would like to know why the concatenated features in the Detection Decoder in Figure 2 of the 2018 paper are given as 39x12x1526.
According to my calculation, the RoI Align output of 128 channels is concatenated 8 times, giving 128*8=1024 (I also wonder why I see 8 instead of 9, apart from the existing result in the middle), then come the 500 channels of the bottleneck block and finally the 6 prediction channels, so the final result should be 1024+500+6=1530. I would be very grateful if you could tell me where I went wrong. I have been thinking about this number for a long time without reaching any other conclusion.
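
The channel count in the question can be restated as a short calculation. The figures below are the questioner's own assumptions (eight RoI Align outputs of 128 channels, a 500-channel bottleneck, and 6 prediction channels), not values confirmed by the paper's authors. The sketch only makes the 4-channel gap explicit.

```python
# Channel counts as assumed in the question above (not confirmed by the authors).
roi_align = 128 * 8   # eight RoI Align outputs of 128 channels each
bottleneck = 500
prediction = 6

expected = roi_align + bottleneck + prediction
reported = 1526       # depth shown in Figure 2 of the 2018 paper

print(expected)             # 1530
print(expected - reported)  # 4
```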

I look forward to your reply. Thank you.

demo.py doesn't work

Hello,

After I installed all the required modules and followed the steps you specified, I got the following exception. Would you mind helping me get it to run? Thanks.

`(tensorflow)ilithefallen@ubuntu:~/Documents/phdThesis/libs/MultiNet$ python demo.py --gpus 0 --input data/demo/um_000005.png
2017-05-03 18:36:22,177 INFO No environment variable 'TV_PLUGIN_DIR' found. Set to '/home/ilithefallen/tv-plugins'.
2017-05-03 18:36:22,177 INFO No environment variable 'TV_STEP_SHOW' found. Set to '50'.
2017-05-03 18:36:22,177 INFO No environment variable 'TV_STEP_EVAL' found. Set to '250'.
2017-05-03 18:36:22,177 INFO No environment variable 'TV_STEP_WRITE' found. Set to '1000'.
2017-05-03 18:36:22,177 INFO No environment variable 'TV_MAX_KEEP' found. Set to '10'.
2017-05-03 18:36:22,177 INFO No environment variable 'TV_STEP_STR' found. Set to 'Step {step}/{total_steps}: loss = {loss_value:.2f}; lr = {lr_value:.2e}; {sec_per_batch:.3f} sec (per Batch); {examples_per_sec:.1f} imgs/sec'.
2017-05-03 18:36:22,179 INFO GPUs are set to: 0
2017-05-03 18:36:22,179 INFO Download URL: ftp://mi.eng.cam.ac.uk/pub/mttt2/models/MultiNet_ICCV.zip
2017-05-03 18:36:22,179 INFO Download DIR: RUNS

Downloading MultiNet_ICCV.zip 100.0%
2017-05-03 18:55:18,499 INFO Extracting MultiNet_pretrained.zip
2017-05-03 18:55:31,855 INFO f: <_io.TextIOWrapper name='RUNS/MultiNet_ICCV/hypes.json' mode='r' encoding='UTF-8'>
2017-05-03 18:55:31,855 INFO f: <_io.TextIOWrapper name='RUNS/MultiNet_ICCV/road/hypes.json' mode='r' encoding='UTF-8'>
2017-05-03 18:55:31,864 INFO f: <_io.TextIOWrapper name='RUNS/MultiNet_ICCV/detection/hypes.json' mode='r' encoding='UTF-8'>
Traceback (most recent call last):
  File "demo.py", line 413, in <module>
    tf.app.run()
  File "/home/ilithefallen/tensorflow/lib/python3.4/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "demo.py", line 311, in main
    load_out = load_united_model(logdir)
  File "demo.py", line 254, in load_united_model
    postfix=model)
  File "incl/tensorvision/utils.py", line 213, in load_modules_from_logdir
    data_input = imp.load_source("input%s" % postfix, f)
  File "/home/ilithefallen/tensorflow/lib/python3.4/imp.py", line 171, in load_source
    module = methods.load()
  File "<frozen importlib._bootstrap>", line 1220, in load
  File "<frozen importlib._bootstrap>", line 1200, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1129, in _exec
  File "<frozen importlib._bootstrap>", line 1471, in exec_module
  File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
  File "RUNS/MultiNet_ICCV/detection/data_input.py", line 22, in <module>
    from utils.data_utils import (annotation_jitter, annotation_to_h5)
  File "/home/ilithefallen/Documents/phdThesis/libs/MultiNet/submodules/KittiBox/incl/utils/data_utils.py", line 11, in <module>
    import annolist.AnnotationLib as al
ImportError: No module named 'annolist'`

ResourceExhaustedError (see above for traceback): OOM when allocating tensor of shape [7,7,512,4096] and type float [[Node: fc6/weights/Adam/Initializer/zeros = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [7,7,512,4096] values: [[[0 0 0]]]...>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

How can I solve this problem?
ResourceExhaustedError (see above for traceback): OOM when allocating tensor of shape [7,7,512,4096] and type float
[[Node: fc6/weights/Adam/Initializer/zeros = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [7,7,512,4096] values: [[[0 0 0]]]...>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
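
The failing node is an Adam slot initializer for the `fc6` weights. Adam keeps two moment accumulators per variable, each with the same shape as the variable itself, so training needs roughly three copies of this tensor before gradients and activations are counted. A rough estimate, assuming float32 storage (`tensor_bytes` is a hypothetical helper):

```python
from functools import reduce
from operator import mul


def tensor_bytes(shape, bytes_per_element=4):
    """Size of a dense tensor in bytes (float32 assumed by default)."""
    return reduce(mul, shape, 1) * bytes_per_element


fc6 = tensor_bytes([7, 7, 512, 4096])
print(fc6)                   # 411041792 bytes, i.e. 392 MiB
print(3 * fc6 / 1024 ** 3)   # weights plus two Adam slots: about 1.15 GiB
```

If that exceeds what the GPU has left, the usual options are a GPU with more memory or a configuration with a smaller memory footprint.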

About download_data.py

I ran
python download_data.py --kitti_url http://www.cvlibs.net/download.php?file=data_road.zip
or
python download_data.py --kitti_url http://kitti.is.tue.mpg.de/kitti/data_road.zip
(http://kitti.is.tue.mpg.de/kitti/data_road.zip is the URL I received after email verification.)
Then I get this error:
2017-03-11 22:30:28,331 WARNING File: DATA/vgg16.npy exists.
2017-03-11 22:30:28,331 WARNING Please delete to redownload VGG weights.
2017-03-11 22:30:28,331 ERROR Wrong url.
2017-03-11 22:30:28,331 ERROR Please visit: http://www.cvlibs.net/download.php?file=data_road.zip
2017-03-11 22:30:28,331 ERROR and request Kitti Download link.
2017-03-11 22:30:28,331 ERROR Rerun scipt using'python download_data.py --kitti_url [url]'

I checked the source code of download_data.py, and the expected URL ends with 'kitti/data_object_image_2.zip'. This is a little confusing. How can I get the right URL?
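
According to the report above, `download_data.py` accepts only a URL ending in `kitti/data_object_image_2.zip`, which would explain why both `data_road.zip` links are rejected. The sketch below mimics such a suffix check. It is an illustration based on that description, not the script's actual code, and the accepted URL in the example is a made-up placeholder.

```python
EXPECTED_SUFFIX = "kitti/data_object_image_2.zip"  # suffix quoted in the report


def has_expected_suffix(url, suffix=EXPECTED_SUFFIX):
    """Return True if url ends with the suffix the script reportedly expects."""
    return url.endswith(suffix)


print(has_expected_suffix("http://www.cvlibs.net/download.php?file=data_road.zip"))  # False
print(has_expected_suffix("http://kitti.is.tue.mpg.de/kitti/data_road.zip"))         # False
# Placeholder host, for illustration only.
print(has_expected_suffix("http://example.com/kitti/data_object_image_2.zip"))       # True
```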

About download_data.py

I ran
python download_data.py --kitti_url http://www.cvlibs.net/download.php?file=data_road.zip
(http://www.cvlibs.net/download.php?file=data_road.zip is the URL I received after email verification.)
Then I get this error:
2018-06-26 10:24:35,922 ERROR Wrong url.
2018-06-26 10:24:35,923 ERROR Please visit: http://www.cvlibs.net/download.php?file=data_road.zip
2018-06-26 10:24:35,923 ERROR and request Kitti Download link.
2018-06-26 10:24:35,923 ERROR You will receive an Email with the kitti download url
2018-06-26 10:24:35,923 ERROR Rerun and enter the received [url] using'python download_data.py --kitti_url [url]'

error running in train_utils.py and work around

I get an error when running:

python demo.py --gpus 0 --input data/demo/um_000005.png

*********************************************************************train_utils.py: commented out if use_stitching
  File "demo.py", line 413, in <module>
    tf.app.run()
  File "/home/richard/.virtualenvs/cvp3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "demo.py", line 364, in main
    min_conf=0.50, tau=subhypes['detection']['tau'])
  File "/home/richard/opencv3-p3-code/not_working_yet/MultiNet/submodules/KittiBox/incl/utils/train_utils.py", line 103, in add_rectangles
    from stitch_wrapper import stitch_rects
ImportError: ./submodules/KittiBox/submodules/utils/stitch_wrapper.so: undefined symbol: _Py_ZeroStruct


I got around this by commenting out these lines in train_utils.py:

    #if use_stitching:
    #    from stitch_wrapper import stitch_rects
    #    acc_rects = stitch_rects(all_rects, tau)
    #else:
    acc_rects = all_rects_r
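
The undefined symbol `_Py_ZeroStruct` belongs to the Python 2 C API, so the `stitch_wrapper.so` in this checkout was most likely built for Python 2 and cannot be loaded by Python 3.5. Rebuilding it with the interpreter in use is the cleaner fix. Until then, the workaround above can be written as a guarded import, so that stitching is still used whenever the extension loads. This is a sketch: `accumulate_rects` is a hypothetical wrapper, while `all_rects`, `all_rects_r`, `tau` and `stitch_rects` are the names from the snippet above.

```python
def accumulate_rects(all_rects, all_rects_r, tau, use_stitching=True):
    """Stitch rectangles if the compiled extension loads, else return them unstitched."""
    if use_stitching:
        try:
            from stitch_wrapper import stitch_rects
        except ImportError:
            # Extension missing or built for another Python: skip stitching.
            return all_rects_r
        return stitch_rects(all_rects, tau)
    return all_rects_r
```

Without stitching, overlapping detections are probably not merged, so more duplicate boxes are to be expected.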

network weighting

Very good work Marvin,

If possible, could you please explain the weighting logic used for training? In the paper it is hard to follow what exactly you did.

I see your weights in the .json files, and the subgraph selected for each training step is based on this logic in train.py. The weight values are now [1, 0], which seems to indicate that you only use subgraph[0], i.e. only the segmentation graph. Yet the training results show the detection subgraph working, as I see a car being detected with a bounding box, and the loss function seems to use both graphs regardless of these weights.

Could you please give some insight into your weighting technique in the following?

File: train.py
lt is always 0 when using multinet2.json, so is this fully weighted towards the "segmentation" graph?
line 202: weights = meta_hypes['selection']['weights']
line 229: sess.run([subgraph[model]['train_op']], feed_dict=feed_dict)

File: multinet2.json
"weights": [1, 0], and older commits had "weights": [1, 2]

 "model_list": ["segmentation", "detection"],
    "selection": {
        "random": false,
        "use_weights": true,
        "weights": [1, 0]
    }
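
One way to read a `weights` list such as `[1, 0]` or `[1, 2]` is as a deterministic weighted round-robin over `model_list`. The sketch below implements that reading only to make the question concrete. It is an assumption and has not been checked against the selection logic in `train.py`; `weighted_schedule` is a hypothetical name.

```python
def weighted_schedule(models, weights, steps):
    """Deterministic weighted round-robin: model i gets weights[i] slots per cycle."""
    cycle = [m for m, w in zip(models, weights) for _ in range(w)]
    if not cycle:
        raise ValueError("at least one weight must be positive")
    return [cycle[step % len(cycle)] for step in range(steps)]


models = ["segmentation", "detection"]
print(weighted_schedule(models, [1, 0], 3))
# ['segmentation', 'segmentation', 'segmentation']
print(weighted_schedule(models, [1, 2], 3))
# ['segmentation', 'detection', 'detection']
```

Under this reading, `[1, 0]` would never schedule a detection step, which is the behaviour the question is asking about.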

request for updating the codes

The new paper has received the best prize. Could you update the code accordingly? Thank you.

the detection time is too long.

According to the paper, MultiNet can perform detection within 100 ms. However, in my runs it takes 5 seconds. How can that be?
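
A gap this large often comes from timing the first call, which in many frameworks includes one-off setup work. Whether that applies here is an assumption, since the report does not say how the 5 seconds were measured. A framework-independent way to check is to discard a few warm-up calls before averaging. In the sketch, `run_once` stands for whatever performs a single inference (for example a wrapped `sess.run` call) and is hypothetical.

```python
import time


def mean_latency(run_once, warmup=2, runs=10):
    """Average wall-clock seconds per call, ignoring the first `warmup` calls."""
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(runs):
        run_once()
    return (time.perf_counter() - start) / runs


calls = []
latency = mean_latency(lambda: calls.append(1), warmup=2, runs=5)
print(len(calls))       # 7 calls in total: 2 warm-up + 5 timed
print(latency >= 0.0)   # True
```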

How to use the KittiBox submodule for multi-class detection

I changed the KittiBox hypes and training data to my own, and also changed "kitti_input.py".

Training works well, but when I use demo.py for detection, the boxes pass through the function
"acc_rects = stitch_rects(all_rects, tau)" and only the car boxes remain. I don't know why.
Can you give me some advice? Thank you!

How to run multiNet on different size images

Great project, thanks for sharing! I am trying to apply MultiNet to images of a different size (e.g., 940x640). Any suggestions on how to proceed? Do you have reference code or related work to recommend?

Number of ways to split should evenly divide the split dimension, but got split_dim 3 (size = 4) and num_split 3

When I run demo.py, it crashes at "output = sess.run([softmax], feed_dict=feed)" with the error "Number of ways to split should evenly divide the split dimension, but got split_dim 3 (size = 4) and num_split 3

 [[Node: Validation/Processing/split = Split[T=DT_FLOAT, num_split=3, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Validation/Processing/split/split_dim, ExpandDims)]]"
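
The message says the tensor being split has 4 entries along dimension 3, while the network splits it into 3 colour planes. A 4-channel input is what an image with an alpha channel (RGBA) would produce, so dropping the alpha channel before feeding the image is a likely fix. This is an inference from the shapes in the error, not something confirmed in the thread. A dependency-free sketch on nested lists follows; `drop_alpha` is a hypothetical helper, and with a NumPy array the equivalent would be `image[:, :, :3]`.

```python
def drop_alpha(image):
    """Keep the first three channels of every pixel in an H x W x C nested list."""
    return [[list(pixel[:3]) for pixel in row] for row in image]


rgba = [[[255, 0, 0, 255], [0, 255, 0, 128]]]  # 1 x 2 image with alpha
rgb = drop_alpha(rgba)
print(rgb)  # [[[255, 0, 0], [0, 255, 0]]]
```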

Camera Input?

Hi Marvin,

Maybe I don't understand the documentation correctly. I successfully trained the model and evaluated the demo image, as well as the multinet2 demo segmentation. Would it be possible to use the same implementation with a camera as input, instead of an image, for a real-time application?

How to run the code on a video file

Hi,
Is it possible to use a video file as input for demo.py?

Slow download of MultiNet_ICCV.zip

Hi,
I tried to run demo.py with the command given in README.md. However, the download phase is very slow and seems to freeze. The same issue happens when I try to run the KittiSeg repository. Is there any way to speed up the download?

AssertionError assert(image_height >= shape[0]) in demo.py

@MarvinTeichmann I tried multiple images with different file types, but only the demo images work. All other images produce the following error:

2018-08-12 18:54:03,523 INFO /data/cvfs/mttt2/RUNS/ICCV/MultiNet/multinet_trained/model.ckpt-99999
2018-08-12 18:54:03,524 INFO Restoring parameters from RUNS/MultiNet_ICCV/model.ckpt-99999
2018-08-12 18:54:03.683228: W tensorflow/core/framework/allocator.cc:108] Allocation of 411041792 exceeds 10% of system memory.
2018-08-12 18:54:04.106021: W tensorflow/core/framework/allocator.cc:108] Allocation of 411041792 exceeds 10% of system memory.
Traceback (most recent call last):
  File "demo.py", line 426, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "demo.py", line 340, in main
    assert(image_height >= shape[0])
AssertionError
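
The assertion at `demo.py` line 340 requires `image_height` (presumably a configured height) to be at least the height of the input image, so taller images fail before inference starts. One possible workaround is to shrink the image, preserving its aspect ratio, until it fits. The sketch computes only the target size. `fit_within` is a hypothetical helper, and the 384 x 1248 limit in the example is an assumed value that should be replaced with the one from the model's hypes file.

```python
from fractions import Fraction


def fit_within(height, width, max_height, max_width):
    """Largest (height, width) with the same aspect ratio that fits the limits."""
    scale = min(Fraction(max_height, height), Fraction(max_width, width), Fraction(1))
    return int(height * scale), int(width * scale)


# Assumed limits of 384 x 1248; substitute the values from the hypes file.
print(fit_within(640, 940, 384, 1248))    # (384, 564)
print(fit_within(375, 1242, 384, 1248))   # (375, 1242), already small enough
```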

How to evaluate my own trained model?

Hi,

I've trained a MultiNet2 (segmentation and detection) model. I was wondering how I can evaluate it on the validation set?

There is no evaluate.py in the root folder. I tried to run the evaluate.py in the submodule folder but didn't succeed. Perhaps I missed something. I'd appreciate it if you could give me some instructions. Thanks.

How to train on my own dataset with Multinet3

I want to achieve joint detection and segmentation, but I do not know how to build the training dataset. Can you tell me the format of the data?

Train MultiNet2

After reading README.md in MultiNet, I have a question about training on my own data. As the tutorial says, we train a new MultiNet by specifying 'multinet2.json'. Does this mean that we just take the trained KittiSeg and KittiBox models and recombine them? The training data of multinet2 comes from two separate models (KittiSeg and KittiBox), which means two separate training data sets. Finally, do we need to rearrange these two training data sets to fit the multinet2 model?

Crash on train.py

The script train.py crashes in line 229 sess.run([subgraph[model]['train_op']], feed_dict=feed_dict).

The exact message is

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Process finished with exit code 134

I ran the script by
python2 train.py --hypes hypes/multinet2.json

train multinet2.json out of memory seriously

@MarvinTeichmann
I've tried to train multinet2.json with one GPU (1080Ti), and it seems to crash. Did you use multiple GPUs?
The log is as follows:
2018-12-13 17:56:15.831145: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.64GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-13 17:56:15.869555: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.73GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-13 17:56:16,014 INFO Segmentation Loss was used.

The ftp link to download the vgg16.npy file isn't accessible

The FTP link seems to be broken at this time. Where else can we download it from? The only other download link I found is a mega.nz link, which our firewall won't allow me to access.

How does Detection/Decoder module work?

Hi,
I am totally confused about how the Detection Decoder works.

In the output of the module, what are the 2 labels/classes?
According to the paper, the 1st and 2nd channels of the prediction output give the confidence that an object of interest is present at a particular location.
What are the objects of interest: car/road?
Fig. 3 shows 3 crossed-out gray cells: are those the cells in the 'I don't care' area?
Is it expected that the top of the image (the sky) is not considered part of the "I don't care" area?

The last 4 channels are the bounding box coordinates (x0, y0, h, w).

Are those coordinates at the scale of the input image, or at the scale of the (39x12) feature map?
What is the "delta prediction" (the residue)? Is it the correction to be applied to the coarse estimate of the bounding box (from the prediction)?
What is the difference between the outputs of the Segmentation Decoder and the Detection Decoder? I understand that the Segmentation Decoder outputs a mask for the 2 classes, but I would have thought that the Detection Decoder outputs the coordinates of the bounding boxes.

Thank you

throwing an instance of 'std::bad_alloc' when restoring parameters from RUNS/MultiNet_ICCV/model.ckpt-99999

An error is thrown when reading model.ckpt-99999. Why does this happen?

Has the code been updated to v2?

Hi, I am really hoping for the v2 code. The paper has been updated to v2; when will the code be released?

Using multiple GPU for training

@MarvinTeichmann
I have three GPUs in my system. However, the program is using only one of them. Even though it greedily captures all the memory, it uses the computational power of a single GPU.

Is there a fix for this?

Where is the location of "DATA" directory?

I tried the demo and it got stuck because DATA/weight/vgg16.npy was missing, so I downloaded the file from another source. I created a DATA directory inside the Multinet (root) directory, giving Multinet/DATA/weights/vgg16.npy, but the file was still not found. So I chose to download it using your download script instead. I am now managing my disk space, so where is the DATA location? After the download finishes, where can I find DATA?

error in Initialize all submodules

ubuntu@ip-172-30-5-78:~/zdx/MultiNet$ git submodule update --init --recursive
fatal: reference is not a tree: 81644b23b22cb192a590543094a4b928a711d3b8
Unable to checkout '81644b23b22cb192a590543094a4b928a711d3b8' in submodule path 'submodules/KittiBox/submodules/TensorVision'
fatal: reference is not a tree: 81644b23b22cb192a590543094a4b928a711d3b8
Unable to checkout '81644b23b22cb192a590543094a4b928a711d3b8' in submodule path 'submodules/KittiClass/submodules/TensorVision'
fatal: reference is not a tree: 81644b23b22cb192a590543094a4b928a711d3b8
Unable to checkout '81644b23b22cb192a590543094a4b928a711d3b8' in submodule path 'submodules/KittiSeg/submodules/TensorVision'
fatal: reference is not a tree: 81644b23b22cb192a590543094a4b928a711d3b8
Failed to recurse into submodule path 'submodules/KittiBox'
Failed to recurse into submodule path 'submodules/KittiClass'
Failed to recurse into submodule path 'submodules/KittiSeg'
Unable to checkout '81644b23b22cb192a590543094a4b928a711d3b8' in submodule path 'submodules/TensorVision'

Some submodules issues

Hi @MarvinTeichmann

Thanks for releasing this code again!

I found some issues with the submodules and report them here:

I think you may have forgotten to commit the refs of the submodules, especially TensorVision.
The TensorVision version is not right, so I put KittiSeg's TensorVision into this project.
Now train.py runs, and I will check the final results.

Thanks again~

Unable to import stitch_wrapper

Hi,

I am using python 2.7.13 with pyenv (https://github.com/pyenv/pyenv)

when I run demo.py

python demo.py --gpus 1 --input data/demo/um_000005.png

the exceptions

  File "demo.py", line 426, in <module>
    tf.app.run()
  File "/home/tumh/.pyenv/versions/multinet/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "demo.py", line 366, in main
    min_conf=0.50, tau=subhypes['detection']['tau'])
  File "/home/tumh/MultiNet/submodules/KittiBox/incl/utils/train_utils.py", line 103, in add_rectangles
    from stitch_wrapper import stitch_rects
ImportError: /home/tumh/MultiNet/submodules/KittiBox/incl/utils/stitch_wrapper.so: undefined symbol: PyFPE_jbuf

Any idea?

thanks!

When I train the multinet2,there is a problem

Hi @MarvinTeichmann

Thanks for releasing this code again!

When I train multinet2 on the KITTI data myself, I run into a problem:

`W tensorflow/core/kernels/queue_base.cc:294] _1_Queues_detection/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
File "train.py", line 616, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 608, in main
tv_sess, start_step=start_step)
File "train.py", line 229, in run_united_training
sess.run([subgraph[model]['train_op']], feed_dict=feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 786, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 994, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1044, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1064, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[512,512,3,3]
[[Node: conv4_2_1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](conv4_1_1/Relu, conv4_2/filter/read)]]
[[Node: training/Adam_1/update/_72 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_7851_training/Adam_1/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op u'conv4_2_1/Conv2D', defined at:
File "train.py", line 616, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 595, in main
subhypes, submodules, subgraph, tv_sess = build_united_model(hypes)
File "train.py", line 535, in build_united_model
first_iter)
File "train.py", line 130, in build_training_graph
logits = encoder.inference(hypes, image, train=True)
File "/home/nextcar/MultiNet/submodules/KittiBox/hypes/../encoder/vgg.py", line 28, in inference
random_init_fc8=True)
File "/home/nextcar/MultiNet/incl/tensorflow_fcn/fcn8_vgg.py", line 88, in build
self.conv4_2 = self._conv_layer(self.conv4_1, "conv4_2")
File "/home/nextcar/MultiNet/incl/tensorflow_fcn/fcn8_vgg.py", line 155, in _conv_layer
conv = tf.nn.conv2d(bottom, filt, [1, 1, 1, 1], padding='SAME')
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 416, in conv2d
data_format=data_format, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[512,512,3,3]
[[Node: conv4_2_1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](conv4_1_1/Relu, conv4_2/filter/read)]]
[[Node: training/Adam_1/update/_72 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_7851_training/Adam_1/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/nextcar/MultiNet/submodules/KittiSeg/hypes/../inputs/kitti_seg_input.py", line 351, in enqueue_loop
sess.run(enqueue_op, feed_dict=make_feed(d))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 786, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 994, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1044, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1064, in _do_call
raise type(e)(node_def, op, message)
CancelledError: Enqueue operation was cancelled
[[Node: fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](Queues_segmentation/fifo_queue, _recv_Placeholder_2_0, _recv_Placeholder_3_0)]]

Caused by op u'fifo_queue_enqueue', defined at:
File "train.py", line 616, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 595, in main
subhypes, submodules, subgraph, tv_sess = build_united_model(hypes)
File "train.py", line 564, in build_united_model
'train', sess)
File "/home/nextcar/MultiNet/submodules/KittiSeg/hypes/../inputs/kitti_seg_input.py", line 353, in start_enqueuing_threads
enqueue_op = q.enqueue((image_pl, label_pl))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 322, in enqueue
self._queue_ref, vals, name=scope)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1587, in _queue_enqueue_v2
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()

CancelledError (see above for traceback): Enqueue operation was cancelled
[[Node: fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](Queues_segmentation/fifo_queue, _recv_Placeholder_2_0, _recv_Placeholder_3_0)]]

Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/nextcar/MultiNet/submodules/KittiBox/hypes/../inputs/kitti_input.py", line 230, in thread_loop
sess.run(enqueue_op, feed_dict=make_feed(d))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 786, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 994, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1044, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1064, in _do_call
raise type(e)(node_def, op, message)
CancelledError: Enqueue operation was cancelled
[[Node: fifo_queue_enqueue_1 = QueueEnqueueV2[Tcomponents=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](Queues_detection/fifo_queue, _recv_Placeholder_4_0, _recv_Placeholder_5_0, _recv_Placeholder_6_0, _recv_Placeholder_7_0)]]

Caused by op u'fifo_queue_enqueue_1', defined at:
File "train.py", line 616, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 595, in main
subhypes, submodules, subgraph, tv_sess = build_united_model(hypes)
File "train.py", line 564, in build_united_model
'train', sess)
File "/home/nextcar/MultiNet/submodules/KittiBox/hypes/../inputs/kitti_input.py", line 220, in start_enqueuing_threads
enqueue_op = q.enqueue((x_in, confs_in, boxes_in, mask_in))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 322, in enqueue
self._queue_ref, vals, name=scope)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1587, in _queue_enqueue_v2
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()

CancelledError (see above for traceback): Enqueue operation was cancelled
[[Node: fifo_queue_enqueue_1 = QueueEnqueueV2[Tcomponents=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](Queues_detection/fifo_queue, _recv_Placeholder_4_0, _recv_Placeholder_5_0, _recv_Placeholder_6_0, _recv_Placeholder_7_0)]]

I don't know how to solve it. Could you tell me what the reason is?

Thank you!

terminate called after throwing an instance of 'std::bad_alloc'

When I run train.py, I get:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
I am running the code on the CPU under Ubuntu 18.04.
Can you help me solve the problem?

subprocess.CalledProcessError: Command returned non-zero exit status -11

Some errors when testing

@MarvinTeichmann
File "demo.py", line 426, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
sys.exit(main(argv))
File "demo.py", line 313, in main
load_out = load_united_model(logdir)
File "demo.py", line 254, in load_united_model
postfix=model)
File "/home/jie/project/MultiNet-master/tensorvision/utils.py", line 217, in load_modules_from_logdir
objective = imp.load_source("objective%s" % postfix, f)
File "RUNS/MultiNet_ICCV/segmentation/objective.py", line 20, in <module>
from seg_utils import seg_utils as seg
ImportError: No module named seg_utils

The module seg_utils cannot be found. Where can I get this file?
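
One possible cause, offered as an assumption rather than a confirmed diagnosis, is that the directory providing seg_utils is not on Python's import path, for example because a submodule directory is empty. A minimal sketch for checking which candidate directories can actually supply the module follows; the directory names in the example are hypothetical and should be replaced with the ones in your own checkout.

```python
import os


def find_module_dirs(module_name, candidate_dirs):
    """Return the candidate directories that contain `module_name`.

    A directory counts if it holds either a package directory
    (module_name/) or a single-file module (module_name.py).
    """
    hits = []
    for directory in candidate_dirs:
        package_path = os.path.join(directory, module_name)
        file_path = os.path.join(directory, module_name + ".py")
        if os.path.isdir(package_path) or os.path.isfile(file_path):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # Hypothetical locations; adjust to your own checkout.
    candidates = ["incl", "submodules/KittiSeg/incl"]
    found = find_module_dirs("seg_utils", candidates)
    print("seg_utils found in: %s" % (found or "none of the candidates"))
```

If a directory is reported, inserting it into sys.path before the failing import is one way to test whether the path is the problem. If nothing is reported, the files themselves are probably missing from the checkout.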

Multinet Enhance

I have modified the KittiSeg model to recognize multiple classes. I then wanted to merge this MultiKittiSeg with KittiBox using the MultiNet structure, but it does not work because the input layers do not match. That means I need to modify the KittiBox model structure to fit the same input layer. How can I achieve this (MultiKittiBox)? Is it possible to do single-object detection together with multi-class segmentation using multinet2?

Training stops because the file hypes/../DATA/classification/train4.txt is missing

python train.py --hypes=hypes/multinet3.json --gpus=0

I built the DATA directory with the given scripts but it seems that part of the data is missing.

Regards,

Shai

Hardware setting for MultiNet

Hi Marvin,

As shown in the paper, MultiNet takes about 100 ms to complete all three tasks, i.e. about 10 FPS. Could you let us know what hardware you used for the experiment? I get slower speeds when I train on my machine.

Thanks,

GPU fraction

How can I edit MultiNet to use only a fraction of the GPU?
Where can I put this code to make MultiNet use only 40% of the GPU memory?

config = tf.ConfigProto()

config.gpu_options.per_process_gpu_memory_fraction = 0.4

session = tf.Session(config=config, ...)

I tried to edit the start_tv_session function in core.py, from line 174:

    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.2
    sess = tf.Session(config=config)
    # sess = tf.get_default_session()

    # Run the Op to initialize the variables.
    with sess.as_default():
        if 'init_function' in hypes:
            _initalize_variables = hypes['init_function']
            _initalize_variables(hypes)
        else:
            init = tf.global_variables_initializer()
            sess.run(init)

        # Start the queue runners.
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)

        # Instantiate a SummaryWriter to output summaries and the Graph.
        summary_writer = tf.summary.FileWriter(hypes['dirs']['output_dir'],
                                               graph=sess.graph)

        tv_session = {}
        tv_session['sess'] = sess
        tv_session['saver'] = saver
        tv_session['summary_op'] = summary_op
        tv_session['writer'] = summary_writer
        tv_session['coord'] = coord
        tv_session['threads'] = threads

    return tv_session
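
The configuration step in the snippets above can be wrapped in a small helper, sketched here under the assumption that a TensorFlow 1.x install is in use (tf.ConfigProto is not available under that name in TensorFlow 2.x). The helper name gpu_fraction_config is hypothetical and not part of MultiNet or TensorVision.

```python
def gpu_fraction_config(fraction):
    """Build a TF 1.x ConfigProto limited to `fraction` of GPU memory."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1], got %r" % (fraction,))
    # Imported lazily so the argument check works without TensorFlow.
    import tensorflow as tf  # assumes the TF 1.x API used in this thread
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = fraction
    return config
```

It would be used as sess = tf.Session(config=gpu_fraction_config(0.4)). The limit can only take effect if the session that actually runs training is created with this config, so if the training script opens its own session elsewhere, the config has to be passed at that point instead.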

real time processing camera input

Real-time processing of camera input is very slow: over a second per frame. Can you supply your system specs and also the video test data?

error in train.py ResourceExhaustedError OOM when allocating tensor with shape [1,256,92,309]

I run python train.py --hypes hypes/multinet2.json to train a multinet2 model on the KITTI dataset. I have a single NVIDIA GTX 1070 GPU with 8 GB of graphics memory, and my computer has 16 GB of RAM. Does anybody know whether my machine lacks the memory needed to train a multinet2 model on KITTI? If so, how much memory do I need to complete this job? Thanks for replying.
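
As a rough sanity check, the size of the tensor named in the error can be computed directly. This sketch assumes 4 bytes per element (float32), which the error message in the title does not state.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor; 4 bytes per element assumes float32."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


size = tensor_bytes([1, 256, 92, 309])
print("%d bytes = %.2f MiB" % (size, size / 2.0 ** 20))
# prints: 29110272 bytes = 27.76 MiB
```

Under that assumption the tensor is only about 28 MiB, so the allocation most likely failed because the GPU memory was already nearly full from other tensors, not because this one tensor is large.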

python demo.py error

Run command: python demo.py --gpus 0 --input data/demo/um_000005.png

Error info:
2017-03-16 17:08:13,898 INFO f: <open file u'RUNS/MultiNet_pretrained/detection/hypes.json', mode 'r' at 0x7f3579424ed0>
Traceback (most recent call last):
File "demo.py", line 407, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "demo.py", line 305, in main
load_out = load_united_model(logdir)
File "demo.py", line 249, in load_united_model
postfix=model)
File "incl/tensorvision/utils.py", line 213, in load_modules_from_logdir
data_input = imp.load_source("input_%s" % postfix, f)
File "RUNS/MultiNet_pretrained/detection/data_input.py", line 22, in <module>
from utils.data_utils import (annotation_jitter, annotation_to_h5)
ImportError: No module named data_utils

What is causing this error?

Thanks!

trying to run demo.py getting import PalLib not found in AnnotationLib.py

When trying to run demo.py, I get an import error: PalLib is not found in AnnotationLib.py. I am unable to locate this library to install it. I also had to add
sys.path.append("..path...) to data_utils.py so that it could find AnnotationLib.py.

KeyError: 'fcn_logits' in KittiSeg

Problem when running python train.py --hypes hypes/multinet2.json

14:17:55,887 INFO Creating Summary for: fc7/weights
2017-03-17 14:17:55,912 INFO Creating Summary for: fc7/biases
2017-03-17 14:17:55,956 INFO Creating Summary for: score_fr/weights
2017-03-17 14:17:55,980 INFO Creating Summary for: score_fr/biases
2017-03-17 14:17:56,026 INFO Creating Summary for: upscore2/up_filter
2017-03-17 14:17:56,063 INFO Creating Summary for: score_pool4/weights
2017-03-17 14:17:56,140 INFO Creating Summary for: score_pool4/biases
2017-03-17 14:17:56,184 INFO Creating Summary for: upscore4/up_filter
2017-03-17 14:17:56,223 INFO Creating Summary for: score_pool3/weights
2017-03-17 14:17:56,250 INFO Creating Summary for: score_pool3/biases
2017-03-17 14:17:56,301 INFO Creating Summary for: upscore32/up_filter
{'images': <tf.Tensor 'Inputs/ExpandDims:0' shape=(1, ?, ?, 3) dtype=float32>, 'feed2': <tf.Tensor 'pool4:0' shape=(1, ?, ?, 512) dtype=float32>, 'fcn_in': <tf.Tensor 'dropout_1/mul:0' shape=(1, ?, ?, 4096) dtype=float32>, 'feed4': <tf.Tensor 'pool3:0' shape=(1, ?, ?, 256) dtype=float32>}
Traceback (most recent call last):
File "train.py", line 614, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 593, in main
subhypes, submodules, subgraph, tv_sess = build_united_model(hypes)
File "train.py", line 533, in build_united_model
first_iter)
File "train.py", line 133, in build_training_graph
decoded_logits = objective.decoder(hypes, logits, train=True)
File "/home/hustxly/MultiNet/submodules/KittiSeg/hypes/../decoder/kitti_multiloss.py", line 49, in decoder
decoded_logits['logits'] = logits['fcn_logits']
KeyError: 'fcn_logits'

I checked the logits output, and there is no fcn_logits key:

{'images': <tf.Tensor 'Inputs/ExpandDims:0' shape=(1, ?, ?, 3) dtype=float32>, 'feed2': <tf.Tensor 'pool4:0' shape=(1, ?, ?, 512) dtype=float32>, 'fcn_in': <tf.Tensor 'dropout_1/mul:0' shape=(1, ?, ?, 4096) dtype=float32>, 'feed4': <tf.Tensor 'pool3:0' shape=(1, ?, ?, 256) dtype=float32>}
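
To make this kind of mismatch easier to spot, a lookup helper can report which keys are actually present. This is only an illustrative sketch: require_key is a hypothetical helper, not part of KittiSeg, and the dictionary below reuses the key names from the output above with placeholder values.

```python
def require_key(outputs, key):
    """Return outputs[key], or raise a KeyError listing the available keys."""
    if key not in outputs:
        raise KeyError("%r not found; available keys: %s"
                       % (key, sorted(outputs)))
    return outputs[key]


# Keys copied from the dictionary printed above; values are placeholders.
logits = {"images": None, "feed2": None, "fcn_in": None, "feed4": None}
try:
    require_key(logits, "fcn_logits")
except KeyError as err:
    print(err)
```

The message then shows directly that the encoder output offers images, feed2, fcn_in and feed4, while the decoder asks for fcn_logits, which suggests the encoder and decoder code may come from versions that do not match.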

ERROR File 'DATA/vgg16.npy' not found.

It says "Download it from https://dl.dropboxusercontent.com/u/50333326/vgg16.npy",
but the file there is empty.
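
Before starting a run, it can help to confirm that the downloaded weights file exists and is not zero bytes. A minimal sketch, assuming the path from the error message (DATA/vgg16.npy) relative to the working directory; check_weights_file is a hypothetical helper name.

```python
import os


def check_weights_file(path):
    """Return (ok, message) saying whether `path` looks like a usable download."""
    if not os.path.isfile(path):
        return False, "missing: %s" % path
    size = os.path.getsize(path)
    if size == 0:
        return False, "empty (0 bytes): %s" % path
    return True, "found %s (%d bytes)" % (path, size)


if __name__ == "__main__":
    print(check_weights_file("DATA/vgg16.npy")[1])
```

This only checks that the file is present and non-empty. It does not verify that the contents are valid weights.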

Data URL not provided

After I downloaded the data using data_download.py, I got this error when training the model.


ubuntu@ip-172-30-5-78:~/didicompetetion/Hao/MultiNet$ sudo python train.py --hypes hypes/multinet2.json
2017-03-21 03:08:13,093 INFO No environment variable 'TV_PLUGIN_DIR' found. Set to '/home/ubuntu/tv-plugins'.
2017-03-21 03:08:13,093 INFO No environment variable 'TV_STEP_SHOW' found. Set to '50'.
2017-03-21 03:08:13,093 INFO No environment variable 'TV_STEP_EVAL' found. Set to '250'.
2017-03-21 03:08:13,093 INFO No environment variable 'TV_STEP_WRITE' found. Set to '1000'.
2017-03-21 03:08:13,093 INFO No environment variable 'TV_MAX_KEEP' found. Set to '10'.
2017-03-21 03:08:13,093 INFO No environment variable 'TV_STEP_STR' found. Set to 'Step {step}/{total_steps}: loss = {loss_value:.2f}; lr = {lr_value:.2e}; {sec_per_batch:.3f} sec (per Batch); {examples_per_sec:.1f} imgs/sec'.
2017-03-21 03:08:13,094 INFO f: <open file 'hypes/multinet2.json', mode 'r' at 0x7fd421d44b70>
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-03-21 03:08:13,097 INFO Initialize training folder
2017-03-21 03:08:13,097 INFO f: <open file u'/home/ubuntu/didicompetetion/Hao/MultiNet/hypes/../submodules/KittiSeg/hypes/KittiSeg.json', mode 'r' at 0x7fd420796540>
2017-03-21 03:08:13,099 ERROR Data URL for Kitti Data not provided.
2017-03-21 03:08:13,099 ERROR Please visit: http://www.cvlibs.net/download.php?file=data_road.zip
2017-03-21 03:08:13,099 ERROR and request Kitti Download link.
2017-03-21 03:08:13,099 ERROR Enter URL in hypes/kittiSeg.json

I found the JSON file, but Kitti_url is empty. If I enter the URL retrieved in step 4, it does not work.
Any suggestions?
