marvinteichmann / multinet
Real-time Joint Semantic Reasoning for Autonomous Driving
License: MIT License
I am hitting an issue when trying to read the vgg16 weights. I am running cuda-8 and tensorflow 10 on Ubuntu 16 with python3. It appears to be a version conflict with the saved npy format.
Traceback (most recent call last):
File "demo.py", line 413, in <module>
tf.app.run()
File "/home/richard/.virtualenvs/cvp3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "demo.py", line 311, in main
load_out = load_united_model(logdir)
File "demo.py", line 273, in load_united_model
logits = modules['arch'].inference(hypes, image, train=False)
File "RUNS/MultiNet_ICCV/detection/architecture.py", line 28, in inference
random_init_fc8=True)
File "/home/richard/opencv3-p3-code/not_working_yet/MultiNet/submodules/KittiClass/incl/tensorflow_fcn/fcn8_vgg.py", line 59, in build
red, green, blue = tf.split(rgb, 3, 3)
File "/home/richard/.virtualenvs/cvp3/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 918, in split
name=name)
File "/home/richard/.virtualenvs/cvp3/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 2239, in _split
num_split=num_split, name=name)
File "/home/richard/.virtualenvs/cvp3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 463, in apply_op
(prefix, dtypes.as_dtype(input_arg.type).name))
TypeError: Input 'split_dim' of 'Split' Op has type float32 that does not match expected type of int32.
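The `TypeError` above is what one would expect if the installed TensorFlow predates 1.0: `tf.split` took `(split_dim, num_split, value)` before 1.0 and `(value, num_or_size_splits, axis)` from 1.0 on, so `tf.split(rgb, 3, 3)` hands the image tensor to `split_dim`. Upgrading TensorFlow is probably the simpler fix. As an alternative, here is a minimal sketch of a version-aware argument order; `split_call_args` is a hypothetical helper, not part of MultiNet or TensorFlow.

```python
def split_call_args(tf_version, value, num_split, axis):
    """Return positional arguments for tf.split in the order that the
    given TensorFlow version expects (hypothetical helper)."""
    major = int(tf_version.split(".")[0])
    if major >= 1:
        # TensorFlow >= 1.0: tf.split(value, num_or_size_splits, axis)
        return (value, num_split, axis)
    # TensorFlow < 1.0: tf.split(split_dim, num_split, value)
    return (axis, num_split, value)
```

In fcn8_vgg.py this could be used as `red, green, blue = tf.split(*split_call_args(tf.__version__, rgb, 3, 3))`.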
Hi, I am trying to run train.py as explained in the README file, and I am getting the following error.
$ python train.py --hypes hypes/multinet2.json
[I have added parts of the output log relevant here]
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 980 Ti
major: 5 minor: 2 memoryClockRate (GHz) 1.2405
pciBusID 0000:01:00.0
Total memory: 5.93GiB
Free memory: 5.70GiB
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 245.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 278.55MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 4.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 5.40GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 5804326912
InUse: 5803421440
MaxInUse: 5803589120
NumAllocs: 719
MaxAllocSize: 1270318592
W tensorflow/core/common_runtime/bfc_allocator.cc:274]
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 2.25MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:993] Resource exhausted: OOM when allocating tensor with shape[256,256,3,3]
W tensorflow/core/kernels/queue_base.cc:294] _0_Queues_segmentation/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
W tensorflow/core/kernels/queue_base.cc:294] _1_Queues_detection/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
I thought I needed to adjust the Batch_Size value in hypes/multinet2.json, but training fails even with a value of 1.
What could be the reason?
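The numbers in the log can be checked by hand. The failing allocation of 2.25 MiB matches a `[256,256,3,3]` tensor at 4 bytes per element (float32 is an assumption here, though it fits the reported size), and the allocator stats show less than 1 MiB left. A small sketch of that arithmetic:

```python
from functools import reduce
from operator import mul


def tensor_bytes(shape, bytes_per_element=4):
    """Size of a dense tensor; 4 bytes per element assumes float32."""
    return reduce(mul, shape, 1) * bytes_per_element


limit = 5804326912   # "Limit" from the allocator stats above
in_use = 5803421440  # "InUse" from the allocator stats above

free_bytes = limit - in_use
needed = tensor_bytes([256, 256, 3, 3])

print("free:   %.2f MiB" % (free_bytes / 2.0**20))
print("needed: %.2f MiB" % (needed / 2.0**20))
print("fits:  ", needed <= free_bytes)
```

This suggests the GPU memory was already exhausted by the time a small weight tensor was requested, so the batch size alone may not be the limiting factor.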
Hello,
After successfully completing all the described steps, I tried running demo.py, but it seems that "annolist" cannot be found. Could you please let me know how to fix this?
Downloading MultiNet_ICCV.zip 100.0%
2018-04-18 18:14:20,576 INFO Extracting MultiNet_pretrained.zip
2018-04-18 18:15:13,790 INFO Loading model from: RUNS/MultiNet_ICCV
2018-04-18 18:15:13,791 INFO f: <_io.TextIOWrapper name='RUNS/MultiNet_ICCV/hypes.json' mode='r' encoding='UTF-8'>
2018-04-18 18:15:13,793 INFO f: <_io.TextIOWrapper name='RUNS/MultiNet_ICCV/detection/hypes.json' mode='r' encoding='UTF-8'>
Traceback (most recent call last):
File "demo.py", line 426, in <module>
tf.app.run()
File "/home/aev21/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
sys.exit(main(argv))
File "demo.py", line 313, in main
load_out = load_united_model(logdir)
File "demo.py", line 254, in load_united_model
postfix=model)
File "incl/tensorvision/utils.py", line 213, in load_modules_from_logdir
data_input = imp.load_source("input%s" % postfix, f)
File "/home/aev21/anaconda3/lib/python3.6/imp.py", line 172, in load_source
module = _load(spec)
File "<frozen importlib._bootstrap>", line 684, in _load
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "RUNS/MultiNet_ICCV/detection/data_input.py", line 22, in <module>
from utils.data_utils import (annotation_jitter, annotation_to_h5)
File "PycharmProjects/MultiNet/submodules/KittiBox/incl/utils/data_utils.py", line 11, in <module>
import annolist.AnnotationLib as al
ModuleNotFoundError: No module named 'annolist'
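The traceback shows `data_utils.py` in `submodules/KittiBox/incl/utils/` importing `annolist` as a top-level module. Python 3 no longer resolves a sibling package through implicit relative imports, so one possible workaround is to put that directory on `sys.path` before the model modules are loaded. The sketch below assumes the `annolist` package sits next to `data_utils.py`, which the traceback does not confirm; `add_import_dir` is a hypothetical helper.

```python
import os
import sys


def add_import_dir(repo_root, rel_dir="submodules/KittiBox/incl/utils"):
    """Prepend repo_root/rel_dir to sys.path if the directory exists.

    Returns the absolute path that was added, or None if the directory
    is missing (for example when the submodules were not checked out).
    """
    path = os.path.abspath(os.path.join(repo_root, rel_dir))
    if not os.path.isdir(path):
        return None
    if path not in sys.path:
        sys.path.insert(0, path)
    return path
```

Calling `add_import_dir(".")` near the top of demo.py would be one place to try it. A return value of `None` hints that the submodules are missing rather than that the import path is wrong.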
How can I train with other models, such as ResNet, YOLO, or SqueezeNet?
I have seen that your MultiNet paper v2 was updated on 8 May 2018, but the code is from three years ago and the most recent update was one year ago. Are all the updates from the paper reflected in the code? Thank you!
There are parts of the paper that I do not understand.
First, for classification, the initial 2016 paper used a 1x1 convolution, but the 2018 paper (Figure 2) uses a 3x3 convolution.
Is there a special reason for changing the kernel size? In addition, the top of page 4 says 'we first apply a 1x1 convolution with 30 channels', which is even more confusing. Which number is correct?
Second, I would like to know why the concatenated features in the Detection Decoder in Figure 2 of the 2018 paper are given as 39x12x1526.
According to my calculation, the 128-channel RoI Align output is concatenated 8 times, giving 128*8=1024 channels (I also wonder why I see 8 instead of 9, apart from the existing result in the middle), followed by the 500 channels of the bottleneck block and finally the 6 prediction channels, so the result should be 1024+500+6=1530. I would be very grateful if you could tell me where I went wrong. I have been thinking about this number for a long time without reaching any other conclusion.
I look forward to your reply. Thank you.
Hello,
After installing all the required modules and following the steps you specified, I got the following exception. Would you mind helping me get it running? Thanks.
`(tensorflow)ilithefallen@ubuntu:~/Documents/phdThesis/libs/MultiNet$ python demo.py --gpus 0 --input data/demo/um_000005.png
2017-05-03 18:36:22,177 INFO No environment variable 'TV_PLUGIN_DIR' found. Set to '/home/ilithefallen/tv-plugins'.
2017-05-03 18:36:22,177 INFO No environment variable 'TV_STEP_SHOW' found. Set to '50'.
2017-05-03 18:36:22,177 INFO No environment variable 'TV_STEP_EVAL' found. Set to '250'.
2017-05-03 18:36:22,177 INFO No environment variable 'TV_STEP_WRITE' found. Set to '1000'.
2017-05-03 18:36:22,177 INFO No environment variable 'TV_MAX_KEEP' found. Set to '10'.
2017-05-03 18:36:22,177 INFO No environment variable 'TV_STEP_STR' found. Set to 'Step {step}/{total_steps}: loss = {loss_value:.2f}; lr = {lr_value:.2e}; {sec_per_batch:.3f} sec (per Batch); {examples_per_sec:.1f} imgs/sec'.
2017-05-03 18:36:22,179 INFO GPUs are set to: 0
2017-05-03 18:36:22,179 INFO Download URL: ftp://mi.eng.cam.ac.uk/pub/mttt2/models/MultiNet_ICCV.zip
2017-05-03 18:36:22,179 INFO Download DIR: RUNS
Downloading MultiNet_ICCV.zip 100.0%
2017-05-03 18:55:18,499 INFO Extracting MultiNet_pretrained.zip
2017-05-03 18:55:31,855 INFO f: <_io.TextIOWrapper name='RUNS/MultiNet_ICCV/hypes.json' mode='r' encoding='UTF-8'>
2017-05-03 18:55:31,855 INFO f: <_io.TextIOWrapper name='RUNS/MultiNet_ICCV/road/hypes.json' mode='r' encoding='UTF-8'>
2017-05-03 18:55:31,864 INFO f: <_io.TextIOWrapper name='RUNS/MultiNet_ICCV/detection/hypes.json' mode='r' encoding='UTF-8'>
Traceback (most recent call last):
File "demo.py", line 413, in <module>
tf.app.run()
File "/home/ilithefallen/tensorflow/lib/python3.4/site-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(sys.argv[:1] + flags_passthrough))
File "demo.py", line 311, in main
load_out = load_united_model(logdir)
File "demo.py", line 254, in load_united_model
postfix=model)
File "incl/tensorvision/utils.py", line 213, in load_modules_from_logdir
data_input = imp.load_source("input%s" % postfix, f)
File "/home/ilithefallen/tensorflow/lib/python3.4/imp.py", line 171, in load_source
module = methods.load()
File "<frozen importlib._bootstrap>", line 1220, in load
File "<frozen importlib._bootstrap>", line 1200, in _load_unlocked
File "<frozen importlib._bootstrap>", line 1129, in _exec
File "<frozen importlib._bootstrap>", line 1471, in exec_module
File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
File "RUNS/MultiNet_ICCV/detection/data_input.py", line 22, in <module>
from utils.data_utils import (annotation_jitter, annotation_to_h5)
File "/home/ilithefallen/Documents/phdThesis/libs/MultiNet/submodules/KittiBox/incl/utils/data_utils.py", line 11, in <module>
import annolist.AnnotationLib as al
ImportError: No module named 'annolist'`
How can I solve this problem?
ResourceExhaustedError (see above for traceback): OOM when allocating tensor of shape [7,7,512,4096] and type float
[[Node: fc6/weights/Adam/Initializer/zeros = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [7,7,512,4096] values: [[[0 0 0]]]...>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
I use
python download_data.py --kitti_url http://www.cvlibs.net/download.php?file=data_road.zip
or
python download_data.py --kitti_url http://kitti.is.tue.mpg.de/kitti/data_road.zip
(http://kitti.is.tue.mpg.de/kitti/data_road.zip is the URL I received after email verification.)
Then I get this error:
2017-03-11 22:30:28,331 WARNING File: DATA/vgg16.npy exists.
2017-03-11 22:30:28,331 WARNING Please delete to redownload VGG weights.
2017-03-11 22:30:28,331 ERROR Wrong url.
2017-03-11 22:30:28,331 ERROR Please visit: http://www.cvlibs.net/download.php?file=data_road.zip
2017-03-11 22:30:28,331 ERROR and request Kitti Download link.
2017-03-11 22:30:28,331 ERROR Rerun scipt using'python download_data.py --kitti_url [url]'
I checked the source code of download_data.py, and the expected URL ends with 'kitti/data_object_image_2.zip'. This is a little confusing. How can I get the right URL?
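Going by the report above, the script appears to accept only URLs with a particular ending, and the `download.php?file=data_road.zip` address is the page for requesting a link rather than the link itself. The sketch below reproduces such a suffix check so that candidate URLs can be tested before rerunning the script. The helper name is hypothetical, and the default suffix is taken from the report, not verified against download_data.py.

```python
def looks_like_expected_url(url, expected_suffix="kitti/data_object_image_2.zip"):
    """Return True if the URL ends with the suffix the script is
    reported to check for (assumption based on the post above)."""
    return url.strip().endswith(expected_suffix)


candidates = [
    "http://www.cvlibs.net/download.php?file=data_road.zip",
    "http://kitti.is.tue.mpg.de/kitti/data_road.zip",
]
for url in candidates:
    print(url, "->", looks_like_expected_url(url))
```

Both URLs tried in the post fail this check, which would be consistent with the "Wrong url." message.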
I use
python download_data.py --kitti_url http://www.cvlibs.net/download.php?file=data_road.zip
(http://www.cvlibs.net/download.php?file=data_road.zip is the URL I received after email verification.)
Then I get this error:
2018-06-26 10:24:35,922 ERROR Wrong url.
2018-06-26 10:24:35,923 ERROR Please visit: http://www.cvlibs.net/download.php?file=data_road.zip
2018-06-26 10:24:35,923 ERROR and request Kitti Download link.
2018-06-26 10:24:35,923 ERROR You will receive an Email with the kitti download url
2018-06-26 10:24:35,923 ERROR Rerun and enter the received [url] using'python download_data.py --kitti_url [url]'
error on running.
python demo.py --gpus 0 --input data/demo/um_000005.png
*********************************************************************train_utils.py: commented out if use_stitching
File "demo.py", line 413, in <module>
tf.app.run()
File "/home/richard/.virtualenvs/cvp3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "demo.py", line 364, in main
min_conf=0.50, tau=subhypes['detection']['tau'])
File "/home/richard/opencv3-p3-code/not_working_yet/MultiNet/submodules/KittiBox/incl/utils/train_utils.py", line 103, in add_rectangles
from stitch_wrapper import stitch_rects
ImportError: ./submodules/KittiBox/submodules/utils/stitch_wrapper.so: undefined symbol: _Py_ZeroStruct
I got around this by commenting out lines in train_utils.py:
# if use_stitching:
#     from stitch_wrapper import stitch_rects
#     acc_rects = stitch_rects(all_rects, tau)
# else:
acc_rects = all_rects_r
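`_Py_ZeroStruct` is a symbol exported by Python 2 but not by Python 3, so this `ImportError` usually means that `stitch_wrapper.so` was compiled against Python 2 and is being loaded under Python 3 (the traceback shows a python3.5 virtualenv). Rebuilding the extension with the interpreter in use is one way out. If stitching is not needed, a guarded import avoids commenting the branch out by hand. This is only a sketch: `stitch_or_passthrough` is a hypothetical wrapper, and only the `stitch_rects(all_rects, tau)` call is taken from the snippet above.

```python
def stitch_or_passthrough(all_rects, flat_rects, tau, use_stitching=True):
    """Merge overlapping boxes if the compiled extension can be loaded,
    otherwise return the unmerged boxes unchanged."""
    if use_stitching:
        try:
            from stitch_wrapper import stitch_rects
        except ImportError:
            # Extension missing or built for another Python version.
            stitch_rects = None
        if stitch_rects is not None:
            return stitch_rects(all_rects, tau)
    return flat_rects
```

Without stitching, overlapping detections of the same object are not merged, so duplicate boxes are to be expected in the output.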
Very good work, Marvin.
If possible, could you please explain the weighting logic used during training? In the paper it is hard to follow exactly what you did.
I see the weights in the .json files, and train.py selects the sub-graph to train based on them. The weight values are now [1, 0], which seems to indicate that only subgraph[0], the segmentation graph, is used. Yet the training results show the detection sub-graph working, as I can see a car being detected with a bounding box, and the loss function seems to use both graphs regardless of these weights.
Could you please give some insight into your weighting technique in the following?
File: train.py
lt is always 0 when using multinet2.json, so training is fully weighted on the "segmentation" graph then?
line 202: weights = meta_hypes['selection']['weights']
line 229: sess.run([subgraph[model]['train_op']], feed_dict=feed_dict)
File: multinet2.json
"weights": [1, 0] (older commits had "weights": [1, 2])
"model_list": ["segmentation", "detection"],
"selection": {
    "random": false,
    "use_weights": true,
    "weights": [1, 0]
}
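One way to read the `selection` block is as a round-robin schedule in which each model is picked in proportion to its integer weight. The sketch below is an assumption about that logic, not code copied from train.py, and `select_model` is a hypothetical name.

```python
def select_model(step, model_list, weights):
    """Pick the sub-model to train at this step, cycling through the
    models in proportion to their integer weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    position = step % total
    upper = 0
    for name, weight in zip(model_list, weights):
        upper += weight
        if position < upper:
            return name
    raise ValueError("weights must be non-negative")


models = ["segmentation", "detection"]
print([select_model(s, models, [1, 0]) for s in range(3)])
print([select_model(s, models, [1, 2]) for s in range(4)])
```

Under this reading, `[1, 0]` picks "segmentation" at every step, which would match the observation that `lt` is always 0, while `[1, 2]` repeats segmentation, detection, detection. If the loss behind the selected `train_op` combines both sub-graph losses, as the question suspects, the detection decoder would still receive gradients even when it is never selected.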
The new paper has won a best prize. Could you please release the updated code? Thank you.
According to the paper, MultiNet can perform detection within 100 ms. However, it takes 5 seconds when I run it. How can that be?
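A possible explanation, which nothing in this thread confirms, is that the 5 seconds include one-off costs such as loading the model and the first `sess.run` call, while the figure in the paper refers to steady-state inference. Here is a minimal sketch for separating the two under Python 3. `time_calls` is a hypothetical helper, and `fn` stands in for a closure around the inference call.

```python
import time


def time_calls(fn, warmup=1, runs=10):
    """Return (first_call_seconds, mean_seconds_after_warmup)."""
    start = time.perf_counter()
    fn()  # the first call counts as one warm-up run
    first = time.perf_counter() - start
    for _ in range(max(warmup - 1, 0)):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    mean = (time.perf_counter() - start) / runs
    return first, mean
```

If the mean after warm-up is far below the first-call time, the gap is start-up overhead and not per-image inference cost.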
I replaced the KittiBox hypes and training data with my own and also changed "kitti_input.py".
Training works well, but when I use demo.py for detection, the boxes go through the function
"acc_rects = stitch_rects(all_rects, tau)" and only the car boxes remain. I don't know why.
Can you give me some advice? Thank you!
Great project -- Thanks for sharing! I am trying to apply multiNet on different size images (e.g., 940x640), any suggestion on how to proceed? Do you have some reference codes or work to recommend?
When I run demo.py, it crashes at "output = sess.run([softmax], feed_dict=feed)" with the error "Number of ways to split should evenly divide the split dimension, but got split_dim 3 (size = 4) and num_split 3
[[Node: Validation/Processing/split = Split[T=DT_FLOAT, num_split=3, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Validation/Processing/split/split_dim, ExpandDims)]]"
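The message says that dimension 3 of the input has size 4 while the graph splits it into 3 parts. That is what one would see if the input image carries a fourth channel, for instance a PNG with an alpha channel; this is an inference from the error text, not something the post states. Here is a sketch of dropping the extra channel, written on nested lists so it runs without third-party packages (with a NumPy array the equivalent would be `image[:, :, :3]`). `drop_alpha` is a hypothetical helper.

```python
def drop_alpha(image):
    """Keep the first three channels of every pixel.

    `image` is a nested list indexed as image[row][col][channel].
    """
    return [[pixel[:3] for pixel in row] for row in image]


rgba = [[[10, 20, 30, 255], [1, 2, 3, 0]]]
print(drop_alpha(rgba))
```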
Hi Marvin,
Maybe I don't understand the documentation correctly, I successfully trained and evaluated the demo image, and as well the multinet2 demo segmentation. Would it be possible to use the same implementation but instead of using an image as input could I use the camera as input for a real-time application?
Hi,
Is it possible to use a video file as input for demo.py??
@MarvinTeichmann Tried multiple images with different file types but only the demo images work. All other images get the following error:
2018-08-12 18:54:03,523 INFO /data/cvfs/mttt2/RUNS/ICCV/MultiNet/multinet_trained/model.ckpt-99999
2018-08-12 18:54:03,524 INFO Restoring parameters from RUNS/MultiNet_ICCV/model.ckpt-99999
2018-08-12 18:54:03.683228: W tensorflow/core/framework/allocator.cc:108] Allocation of 411041792 exceeds 10% of system memory.
2018-08-12 18:54:04.106021: W tensorflow/core/framework/allocator.cc:108] Allocation of 411041792 exceeds 10% of system memory.
Traceback (most recent call last):
File "demo.py", line 426, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "demo.py", line 340, in main
assert(image_height >= shape[0])
AssertionError
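Judging from the assertion alone, demo.py expects the input image to be no taller than a configured `image_height`, so larger images fail before inference starts. One workaround is to shrink the image to fit before passing it in. The sketch below only computes the target size. `fit_within` is a hypothetical helper, and the 384x1248 limits in the example are assumed values that should be replaced by whatever the hypes file specifies.

```python
def fit_within(shape, max_height, max_width):
    """Return (height, width) that fits inside the limits while keeping
    the aspect ratio; images that already fit are left unchanged."""
    height, width = shape[0], shape[1]
    if height <= max_height and width <= max_width:
        return height, width
    # Try to fill the height limit first, using integer arithmetic.
    new_width = width * max_height // height
    if new_width <= max_width:
        return max_height, max(1, new_width)
    # Otherwise the width limit is the binding one.
    return max(1, height * max_width // width), max_width


# A 940x640 (width x height) image against assumed limits of 384x1248.
print(fit_within((640, 940), 384, 1248))
```

The actual resize would then be done with whichever image library is already in use.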
Hi,
I've trained a MultiNet2 (segmentation and detection) model. I was wondering how I can evaluate it on the validation set?
There is no evaluate.py in the root folder. I tried to run the evaluate.py in the submodule folder but didn't succeed. Perhaps I missed something. I'd appreciate it if you could give me some instructions. Thanks.
I want to do joint detection and segmentation, but I do not know how to build the training dataset. Can you tell me the format of the data?
After reading README.md in MultiNet, I have a question about training on my own data. As the tutorial says, we train a new MultiNet by specifying 'multinet2.json'. Does this mean that we just take the trained KittiSeg and KittiBox models and recombine them? The training data of multinet2 comes from the two separate models (KittiSeg and KittiBox), which means two separate training data sets. Finally, do we need to rearrange these two training data sets to fit the multinet2 model?
The script train.py crashes in line 229: sess.run([subgraph[model]['train_op']], feed_dict=feed_dict).
The exact message is
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Process finished with exit code 134
I ran the script by
python2 train.py --hypes hypes/multinet2.json
@MarvinTeichmann
I've tried to train with multinet2.json on one GPU (1080Ti), and it seems to crash. Did you use multiple GPUs?
The log is as follows:
2018-12-13 17:56:15.831145: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.64GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-13 17:56:15.869555: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.73GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-13 17:56:16,014 INFO Segmentation Loss was used.
The FTP link seems to be broken at this time. Where else can we download it from? The only other download link I found is a mega.nz link, which our firewall won't allow me to access.
Hi,
I am totally confused about how the Detection Decoder works.
In the output of the module, what are the 2 labels/classes?
According to the paper, the 1st and 2nd channels of the prediction output give the confidence that an object of interest is present at a particular location.
What are the objects of interest: car/road?
Fig. 3 shows 3 crossed gray cells: are those the cells in the 'I don't care' area?
Is it expected that the top of the image (the sky) is not treated as an 'I don't care' area?
The last 4 channels are the bounding box coordinates (x0, y0, h, w).
Are those coordinates at the scale of the input image, or at the scale of the (39x12) feature map?
What is the "delta prediction" (the residue)? Is it the correction to be applied to the coarse estimate of the bounding box (from the prediction)?
What is the difference between the outputs of the Segmentation Decoder and the Detection Decoder? I understand that the Segmentation Decoder outputs a mask for the 2 classes, but I would have thought that the Detection Decoder outputs the coordinates of the bounding boxes.
Thank you
Hi, I am really hoping for the v2 code. The paper has been updated to v2; when will the code be released?
@MarvinTeichmann
I have three GPUs in my system. However, the program is using only one of them. Even though it greedily captures all the memory, it uses the computational power of a single GPU.
Is there a fix for this?
I tried the demo and it got stuck because DATA/weight/vgg16.npy was missing, so I downloaded the file from another source. I created a DATA directory inside the Multinet (root) directory, so the path became Multinet/DATA/weights/vgg16.npy, but the file was still not found. I then decided to download it with your download script instead. I am now managing my disk space, so where is the DATA directory located? After the download finishes, where can I find DATA?
ubuntu@ip-172-30-5-78:~/zdx/MultiNet$ git submodule update --init --recursive
fatal: reference is not a tree: 81644b23b22cb192a590543094a4b928a711d3b8
Unable to checkout '81644b23b22cb192a590543094a4b928a711d3b8' in submodule path 'submodules/KittiBox/submodules/TensorVision'
fatal: reference is not a tree: 81644b23b22cb192a590543094a4b928a711d3b8
Unable to checkout '81644b23b22cb192a590543094a4b928a711d3b8' in submodule path 'submodules/KittiClass/submodules/TensorVision'
fatal: reference is not a tree: 81644b23b22cb192a590543094a4b928a711d3b8
Unable to checkout '81644b23b22cb192a590543094a4b928a711d3b8' in submodule path 'submodules/KittiSeg/submodules/TensorVision'
fatal: reference is not a tree: 81644b23b22cb192a590543094a4b928a711d3b8
Failed to recurse into submodule path 'submodules/KittiBox'
Failed to recurse into submodule path 'submodules/KittiClass'
Failed to recurse into submodule path 'submodules/KittiSeg'
Unable to checkout '81644b23b22cb192a590543094a4b928a711d3b8' in submodule path 'submodules/TensorVision'
Thanks for releasing this code again!
I found some issues with the submodules and am reporting them here:
Thanks again~
Hi,
I am using python 2.7.13 with pyenv (https://github.com/pyenv/pyenv)
when I run demo.py
python demo.py --gpus 1 --input data/demo/um_000005.png
I get the following exception:
File "demo.py", line 426, in <module>
tf.app.run()
File "/home/tumh/.pyenv/versions/multinet/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "demo.py", line 366, in main
min_conf=0.50, tau=subhypes['detection']['tau'])
File "/home/tumh/MultiNet/submodules/KittiBox/incl/utils/train_utils.py", line 103, in add_rectangles
from stitch_wrapper import stitch_rects
ImportError: /home/tumh/MultiNet/submodules/KittiBox/incl/utils/stitch_wrapper.so: undefined symbol: PyFPE_jbuf
Any idea?
thanks!
Thanks for releasing these codes again!
When I train multinet2 on the KITTI data myself, I run into a problem:
`W tensorflow/core/kernels/queue_base.cc:294] _1_Queues_detection/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
File "train.py", line 616, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 608, in main
tv_sess, start_step=start_step)
File "train.py", line 229, in run_united_training
sess.run([subgraph[model]['train_op']], feed_dict=feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 786, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 994, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1044, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1064, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[512,512,3,3]
[[Node: conv4_2_1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](conv4_1_1/Relu, conv4_2/filter/read)]]
[[Node: training/Adam_1/update/_72 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_7851_training/Adam_1/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op u'conv4_2_1/Conv2D', defined at:
File "train.py", line 616, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 595, in main
subhypes, submodules, subgraph, tv_sess = build_united_model(hypes)
File "train.py", line 535, in build_united_model
first_iter)
File "train.py", line 130, in build_training_graph
logits = encoder.inference(hypes, image, train=True)
File "/home/nextcar/MultiNet/submodules/KittiBox/hypes/../encoder/vgg.py", line 28, in inference
random_init_fc8=True)
File "/home/nextcar/MultiNet/incl/tensorflow_fcn/fcn8_vgg.py", line 88, in build
self.conv4_2 = self._conv_layer(self.conv4_1, "conv4_2")
File "/home/nextcar/MultiNet/incl/tensorflow_fcn/fcn8_vgg.py", line 155, in _conv_layer
conv = tf.nn.conv2d(bottom, filt, [1, 1, 1, 1], padding='SAME')
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 416, in conv2d
data_format=data_format, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[512,512,3,3]
[[Node: conv4_2_1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](conv4_1_1/Relu, conv4_2/filter/read)]]
[[Node: training/Adam_1/update/_72 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_7851_training/Adam_1/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/nextcar/MultiNet/submodules/KittiSeg/hypes/../inputs/kitti_seg_input.py", line 351, in enqueue_loop
sess.run(enqueue_op, feed_dict=make_feed(d))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 786, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 994, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1044, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1064, in _do_call
raise type(e)(node_def, op, message)
CancelledError: Enqueue operation was cancelled
[[Node: fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](Queues_segmentation/fifo_queue, _recv_Placeholder_2_0, _recv_Placeholder_3_0)]]
Caused by op u'fifo_queue_enqueue', defined at:
File "train.py", line 616, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 595, in main
subhypes, submodules, subgraph, tv_sess = build_united_model(hypes)
File "train.py", line 564, in build_united_model
'train', sess)
File "/home/nextcar/MultiNet/submodules/KittiSeg/hypes/../inputs/kitti_seg_input.py", line 353, in start_enqueuing_threads
enqueue_op = q.enqueue((image_pl, label_pl))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 322, in enqueue
self._queue_ref, vals, name=scope)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1587, in _queue_enqueue_v2
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()
CancelledError (see above for traceback): Enqueue operation was cancelled
[[Node: fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](Queues_segmentation/fifo_queue, _recv_Placeholder_2_0, _recv_Placeholder_3_0)]]
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/nextcar/MultiNet/submodules/KittiBox/hypes/../inputs/kitti_input.py", line 230, in thread_loop
sess.run(enqueue_op, feed_dict=make_feed(d))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 786, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 994, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1044, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1064, in _do_call
raise type(e)(node_def, op, message)
CancelledError: Enqueue operation was cancelled
[[Node: fifo_queue_enqueue_1 = QueueEnqueueV2[Tcomponents=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](Queues_detection/fifo_queue, _recv_Placeholder_4_0, _recv_Placeholder_5_0, _recv_Placeholder_6_0, _recv_Placeholder_7_0)]]
Caused by op u'fifo_queue_enqueue_1', defined at:
File "train.py", line 616, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 595, in main
subhypes, submodules, subgraph, tv_sess = build_united_model(hypes)
File "train.py", line 564, in build_united_model
'train', sess)
File "/home/nextcar/MultiNet/submodules/KittiBox/hypes/../inputs/kitti_input.py", line 220, in start_enqueuing_threads
enqueue_op = q.enqueue((x_in, confs_in, boxes_in, mask_in))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 322, in enqueue
self._queue_ref, vals, name=scope)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1587, in _queue_enqueue_v2
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()
CancelledError (see above for traceback): Enqueue operation was cancelled
[[Node: fifo_queue_enqueue_1 = QueueEnqueueV2[Tcomponents=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](Queues_detection/fifo_queue, _recv_Placeholder_4_0, _recv_Placeholder_5_0, _recv_Placeholder_6_0, _recv_Placeholder_7_0)]]`
I don't know how to solve it. Could you tell me what the reason is?
Thank you!
When I run train.py, I get:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
I am running the code on the CPU under Ubuntu 18.04.
Can you help me solve this problem?
subprocess.CalledProcessError: Command returned non-zero exit status -11
@MarvinTeichmann
File "demo.py", line 426, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
sys.exit(main(argv))
File "demo.py", line 313, in main
load_out = load_united_model(logdir)
File "demo.py", line 254, in load_united_model
postfix=model)
File "/home/jie/project/MultiNet-master/tensorvision/utils.py", line 217, in load_modules_from_logdir
objective = imp.load_source("objective%s" % postfix, f)
File "RUNS/MultiNet_ICCV/segmentation/objective.py", line 20, in
from seg_utils import seg_utils as seg
ImportError: No module named seg_utils
I cannot find seg_utils. Where can I get this file?
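One possible cause, judging only by the MultiNet-master path in the traceback, is a ZIP download that does not include the git submodules. This is a guess, not a confirmed diagnosis. A minimal sketch for checking it, assuming the submodules live in a submodules/ directory as the other tracebacks on this page suggest (the helper name find_empty_submodules is made up for illustration):

```python
import os


def find_empty_submodules(repo_root, submodules_dir="submodules"):
    """Return names of directories under `submodules_dir` that contain nothing.

    Returns None when the submodules directory itself does not exist.
    The directory name "submodules" is an assumption taken from the
    paths shown in the tracebacks, not from the project documentation.
    """
    base = os.path.join(repo_root, submodules_dir)
    if not os.path.isdir(base):
        return None
    empty = []
    for name in sorted(os.listdir(base)):
        path = os.path.join(base, name)
        # An uninitialized git submodule shows up as an empty directory.
        if os.path.isdir(path) and not os.listdir(path):
            empty.append(name)
    return empty
```

Calling find_empty_submodules(".") from the repository root lists the empty entries. If any are listed in a git clone, running git submodule update --init --recursive normally populates them; a ZIP download would need a fresh recursive clone instead.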
I have modified the KittiSeg model to recognize multiple classes. I then wanted to merge this MultiKittiSeg with KittiBox using the MultiNet structure, but it does not work because the input layers do not match. This means I need to modify the KittiBox model structure to fit the same input layer. How can I achieve this (MultiKittiBox)? Is it possible to do single-object detection with multi-class segmentation using multinet2?
python train.py --hypes=hypes/multinet3.json --gpus=0
I built the DATA directory with the given scripts but it seems that part of the data is missing.
Regards,
Hi Marvin,
As shown in the paper, MultiNet takes about 100 ms to complete all three tasks, i.e. about 10 FPS. Could you let us know what hardware you used for the experiment? I get slower speeds when I train on my machine.
Thanks,
How can I edit MultiNet to use only a fraction of the GPU?
Where can I put this code to make MultiNet use only 40% of the GPU memory?
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)
I tried to edit the start_tv_session function in core.py, from line 174:
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.2
sess = tf.Session(config=config)
# sess = tf.get_default_session()
# Run the Op to initialize the variables.
with sess.as_default():
    if 'init_function' in hypes:
        _initalize_variables = hypes['init_function']
        _initalize_variables(hypes)
    else:
        init = tf.global_variables_initializer()
        sess.run(init)
# Start the queue runners.
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
# Instantiate a SummaryWriter to output summaries and the Graph.
summary_writer = tf.summary.FileWriter(hypes['dirs']['output_dir'],
                                       graph=sess.graph)
tv_session = {}
tv_session['sess'] = sess
tv_session['saver'] = saver
tv_session['summary_op'] = summary_op
tv_session['writer'] = summary_writer
tv_session['coord'] = coord
tv_session['threads'] = threads
return tv_session
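The commented-out tf.get_default_session() line suggests that a session already exists before start_tv_session runs. As far as I know, TensorFlow 1.x applies the per-process GPU memory options when the GPU is first initialized in the process, so the config probably has to go into whichever tf.Session(...) call is executed first, rather than into a second session created later. A minimal sketch under that assumption (the helper names check_fraction and make_limited_session are made up, and where MultiNet creates its first session is not confirmed here):

```python
def check_fraction(fraction):
    """Validate a GPU memory fraction, which should lie in (0, 1]."""
    fraction = float(fraction)
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1], got %r" % fraction)
    return fraction


def make_limited_session(fraction=0.4):
    """Create a TF 1.x session capped at `fraction` of each visible GPU's memory.

    Assumes TensorFlow 1.x; the import is done lazily so that the
    validation helper above can be used without TensorFlow installed.
    """
    import tensorflow as tf

    gpu_options = tf.GPUOptions(
        per_process_gpu_memory_fraction=check_fraction(fraction))
    config = tf.ConfigProto(gpu_options=gpu_options)
    return tf.Session(config=config)
```

If the training script opens its session with something like "with tf.Session() as sess:", replacing that call with "with make_limited_session(0.4) as sess:" would be the place to try first.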
Real-time processing of camera input is very slow: over a second per frame. Can you supply your system specs and also the video test data?
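When comparing against a figure such as 100 ms per frame, it can help to time only the repeated inference calls and to skip the first few, since the first calls in a fresh session are often slower than steady state. A minimal sketch, assuming Python 3 and a hypothetical run_once callable that processes one frame (neither helper is part of MultiNet):

```python
import time


def mean_latency_ms(run_once, n_frames=20, warmup=3, clock=time.perf_counter):
    """Average wall-clock time of `run_once()` in milliseconds.

    The first `warmup` calls are executed but not timed.
    `run_once` is a hypothetical callable that processes a single frame.
    """
    for _ in range(warmup):
        run_once()
    start = clock()
    for _ in range(n_frames):
        run_once()
    elapsed = clock() - start
    return 1000.0 * elapsed / n_frames


def fps_from_latency(latency_ms):
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms
```

For example, a measured latency of 100 ms corresponds to fps_from_latency(100.0), which is 10 frames per second. Timing image loading and drawing separately from the network call shows which part dominates.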
I use python train.py --hypes hypes/multinet2.json to train a multinet2 model on the KITTI dataset. I have a single NVIDIA GTX 1070 GPU with 8 GB of graphics memory, and my computer has 16 GB of RAM. Does anybody know whether that is enough memory to train a multinet2 model on the KITTI dataset? If not, how much memory do I need to complete this job? Thanks for replying.
Run command: python demo.py --gpus 0 --input data/demo/um_000005.png
Error info:
2017-03-16 17:08:13,898 INFO f: <open file u'RUNS/MultiNet_pretrained/detection/hypes.json', mode 'r' at 0x7f3579424ed0>
Traceback (most recent call last):
File "demo.py", line 407, in
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "demo.py", line 305, in main
load_out = load_united_model(logdir)
File "demo.py", line 249, in load_united_model
postfix=model)
File "incl/tensorvision/utils.py", line 213, in load_modules_from_logdir
data_input = imp.load_source("input_%s" % postfix, f)
File "RUNS/MultiNet_pretrained/detection/data_input.py", line 22, in
from utils.data_utils import (annotation_jitter, annotation_to_h5)
ImportError: No module named data_utils
What is causing this error?
Thanks!
When trying to run demo.py, I get an error that the PalLib import in AnnotationLib.py cannot be found. I am unable to locate this library to install it. I also had to add
sys.path.append("..path...) to data_utils.py so it could find AnnotationLib.py.
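A hard-coded path in sys.path.append tends to break when the repository is moved. A minimal sketch of a path-independent alternative, assuming AnnotationLib.py sits in the same directory as data_utils.py (that location is an assumption, and add_dir_to_path is a made-up helper):

```python
import os
import sys


def add_dir_to_path(directory):
    """Put `directory` at the front of sys.path (once) and return its absolute path."""
    directory = os.path.abspath(directory)
    if not os.path.isdir(directory):
        raise IOError("not a directory: %s" % directory)
    # Avoid adding the same entry again on repeated imports.
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory
```

Inside data_utils.py this could be called as add_dir_to_path(os.path.dirname(os.path.abspath(__file__))) before the AnnotationLib import. It does not address the missing PalLib import.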
I get a problem when running python train.py --hypes hypes/multinet2.json:
14:17:55,887 INFO Creating Summary for: fc7/weights
2017-03-17 14:17:55,912 INFO Creating Summary for: fc7/biases
2017-03-17 14:17:55,956 INFO Creating Summary for: score_fr/weights
2017-03-17 14:17:55,980 INFO Creating Summary for: score_fr/biases
2017-03-17 14:17:56,026 INFO Creating Summary for: upscore2/up_filter
2017-03-17 14:17:56,063 INFO Creating Summary for: score_pool4/weights
2017-03-17 14:17:56,140 INFO Creating Summary for: score_pool4/biases
2017-03-17 14:17:56,184 INFO Creating Summary for: upscore4/up_filter
2017-03-17 14:17:56,223 INFO Creating Summary for: score_pool3/weights
2017-03-17 14:17:56,250 INFO Creating Summary for: score_pool3/biases
2017-03-17 14:17:56,301 INFO Creating Summary for: upscore32/up_filter
{'images': <tf.Tensor 'Inputs/ExpandDims:0' shape=(1, ?, ?, 3) dtype=float32>, 'feed2': <tf.Tensor 'pool4:0' shape=(1, ?, ?, 512) dtype=float32>, 'fcn_in': <tf.Tensor 'dropout_1/mul:0' shape=(1, ?, ?, 4096) dtype=float32>, 'feed4': <tf.Tensor 'pool3:0' shape=(1, ?, ?, 256) dtype=float32>}
Traceback (most recent call last):
File "train.py", line 614, in
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 593, in main
subhypes, submodules, subgraph, tv_sess = build_united_model(hypes)
File "train.py", line 533, in build_united_model
first_iter)
File "train.py", line 133, in build_training_graph
decoded_logits = objective.decoder(hypes, logits, train=True)
File "/home/hustxly/MultiNet/submodules/KittiSeg/hypes/../decoder/kitti_multiloss.py", line 49, in decoder
decoded_logits['logits'] = logits['fcn_logits']
KeyError: 'fcn_logits'
I checked the logits output, and there is no fcn_logits key:
{'images': <tf.Tensor 'Inputs/ExpandDims:0' shape=(1, ?, ?, 3) dtype=float32>, 'feed2': <tf.Tensor 'pool4:0' shape=(1, ?, ?, 512) dtype=float32>, 'fcn_in': <tf.Tensor 'dropout_1/mul:0' shape=(1, ?, ?, 4096) dtype=float32>, 'feed4': <tf.Tensor 'pool3:0' shape=(1, ?, ?, 256) dtype=float32>}
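The printed dictionary only holds images, feed2, fcn_in and feed4, so the encoder that produced it does not emit fcn_logits at all. One guess is that the encoder and the decoder come from different versions of the code. The sketch below does not fix such a mismatch; it only makes the failure report which keys were available. require_key is a made-up helper, not part of KittiSeg:

```python
def require_key(mapping, key, what="logits"):
    """Return mapping[key], or raise a KeyError that lists the keys actually present."""
    if key not in mapping:
        raise KeyError("%r missing from %s; available keys: %s"
                       % (key, what, sorted(mapping)))
    return mapping[key]
```

In the decoder, a line such as decoded_logits['logits'] = require_key(logits, 'fcn_logits') would then name the available keys in the error message instead of a bare KeyError.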
The message said "Download it from https://dl.dropboxusercontent.com/u/50333326/vgg16.npy",
but the file there is empty.
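Before starting a run, it is cheap to check whether a downloaded weights file is plausibly a NumPy file. Files in the .npy format start with the six magic bytes \x93NUMPY, so an empty file or an HTML error page saved under that name fails the check. A minimal sketch (looks_like_npy is a made-up helper and only inspects the header, not the contents):

```python
import os

NPY_MAGIC = b"\x93NUMPY"  # first six bytes of a .npy file


def looks_like_npy(path):
    """Return True if `path` is a non-empty file that starts with the .npy magic bytes."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as f:
        return f.read(len(NPY_MAGIC)) == NPY_MAGIC
```

If looks_like_npy("vgg16.npy") returns False, the download did not produce a usable file and the weights have to come from another source.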
After I downloaded the data using data_download.py, I got this error when I train the model.
ubuntu@ip-172-30-5-78:~/didicompetetion/Hao/MultiNet$ sudo python train.py --hypes hypes/multinet2.json
2017-03-21 03:08:13,093 INFO No environment variable 'TV_PLUGIN_DIR' found. Set to '/home/ubuntu/tv-plugins'.
2017-03-21 03:08:13,093 INFO No environment variable 'TV_STEP_SHOW' found. Set to '50'.
2017-03-21 03:08:13,093 INFO No environment variable 'TV_STEP_EVAL' found. Set to '250'.
2017-03-21 03:08:13,093 INFO No environment variable 'TV_STEP_WRITE' found. Set to '1000'.
2017-03-21 03:08:13,093 INFO No environment variable 'TV_MAX_KEEP' found. Set to '10'.
2017-03-21 03:08:13,093 INFO No environment variable 'TV_STEP_STR' found. Set to 'Step {step}/{total_steps}: loss = {loss_value:.2f}; lr = {lr_value:.2e}; {sec_per_batch:.3f} sec (per Batch); {examples_per_sec:.1f} imgs/sec'.
2017-03-21 03:08:13,094 INFO f: <open file 'hypes/multinet2.json', mode 'r' at 0x7fd421d44b70>
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-03-21 03:08:13,097 INFO Initialize training folder
2017-03-21 03:08:13,097 INFO f: <open file u'/home/ubuntu/didicompetetion/Hao/MultiNet/hypes/../submodules/KittiSeg/hypes/KittiSeg.json', mode 'r' at 0x7fd420796540>
2017-03-21 03:08:13,099 ERROR Data URL for Kitti Data not provided.
2017-03-21 03:08:13,099 ERROR Please visit: http://www.cvlibs.net/download.php?file=data_road.zip
2017-03-21 03:08:13,099 ERROR and request Kitti Download link.
2017-03-21 03:08:13,099 ERROR Enter URL in hypes/kittiSeg.json
I found the JSON file, but the Kitti_url entry is empty. If I enter the URL retrieved in step 4, it does not work.
Any suggestions?
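Note that the log above opens .../submodules/KittiSeg/hypes/KittiSeg.json, while the error message mentions hypes/kittiSeg.json, so it may be worth confirming that the URL was entered in the file that is actually loaded. A minimal sketch for checking this, assuming the URL is stored under a "data" section with the key "kitti_url" (both names are assumptions about the hypes layout, and get_kitti_url is a made-up helper):

```python
import json


def get_kitti_url(hypes_path, section="data", key="kitti_url"):
    """Return the URL configured in the hypes file, or None if absent or blank.

    The section and key names are assumptions about the file layout;
    adjust them to match the JSON file being inspected.
    """
    with open(hypes_path) as f:
        hypes = json.load(f)
    url = hypes.get(section, {}).get(key) or ""
    return url.strip() or None
```

Calling get_kitti_url("submodules/KittiSeg/hypes/KittiSeg.json") from the repository root shows what the training script would see. A result of None means the entry is still empty in that file.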