nwojke / cosine_metric_learning

Deep Cosine Metric Learning for Person Re-identification

License: GNU General Public License v3.0

Python 100.00%

cosine_metric_learning's Issues

Switching from cpu to gpu error

Hi,
I would like to know whether the current model can be run on a GPU simply by changing /device:CPU:0 to /device:GPU:0, or whether further steps are needed.

PS. I keep getting this error message:

InvalidArgumentError (see above for traceback): Cannot assign a device for operation map/Shape: Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available. Registered kernels:...
Could you please clarify this point?
Thank you.

About the model structure.

Training on the MARS dataset was successful, but the losses did not improve even though the accuracy reached 99%. Can you point out how to change the model structure so that feature extraction improves?

Thanks

ValueError: Checkpoint is missing variable [ball/beta]

Hi, nwojke

I want to generate the trained model and use it for deep-sort, but I have run into a problem.

First, I used the following command to generate the model.ckpt:

'python train_market1501.py --dataset_dir=./Market-1501-v15.09.15/ --loss_mode=cosine-softmax --log_dir=./output/market1501/ --run_id=cosine-softmax'

After a few iterations, I interrupted it and used the following command to generate the detections.npy:

'python generate_detections.py --mot_dir=./MOT16/train --model=./model.ckpt'

However, I got this ERROR:

Traceback (most recent call last):
File "generate_detections_V2.py", line 474, in
f = create_box_encoder(args.model, batch_size=32, loss_mode=args.loss_mode)
File "generate_detections_V2.py", line 363, in create_box_encoder
image_encoder = create_image_encoder(model_filename, batch_size, loss_mode)
File "generate_detections_V2.py", line 358, in create_image_encoder
model_filename, loss_mode)
File "generate_detections_V2.py", line 337, in _create_image_encoder
checkpoint_path, slim.get_variables_to_restore())
File "E:\Anaconda3\lib\site-packages\tensorflow\contrib\framework\python\ops\variables.py", line 612, in assign_from_checkpoint
raise ValueError(log_str)
ValueError: Checkpoint is missing variable [ball/beta]

(I made a few changes in generate_detections_V2.py, but none of them touch the network.)
Could you please show me how to fix it? Thanks a lot!

training on new person dataset

Hi,
I want to train the model on my own person data, but the images are taken in different settings, so background and illumination conditions can differ greatly.

I am unsure how to annotate/label the data. Is it necessary to annotate in MATLAB, or can I also use tools like LabelImg, where annotations are saved in XML format?

Are the image names used as labels?

Does the mars-small128.pb file provided by the author only use the network in cosine_metric_learning_master for training?

Hello. Does the mars-small128.pb file you provided only use the network in cosine_metric_learning_master for training? Have you made any changes? I saw you say it's better to look for network files than in deep sort, but I didn't find the corresponding network files in deep sort.

training the model with my own data

precated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Train set size: 58 images, 21 identities
feature dimensionality: 128
WARNING:tensorflow:From C:\tensorflow1\cosine_metric_learning-master\nets\deep_sort\network_definition.py:84: calling l2_normalize (from tensorflow.python.ops.nn_impl) with dim is deprecated and will be removed in a future version.
Instructions for updating:
dim is deprecated, use axis instead
WARNING:tensorflow:From C:\tensorflow1\cosine_metric_learning-master\train_app.py:615: sparse_softmax_cross_entropy (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.sparse_softmax_cross_entropy instead. Note that the order of the logits and labels arguments has been changed.
WARNING:tensorflow:From C:\Users\ayman\Anaconda3\envs\py35\lib\site-packages\tensorflow\contrib\losses\python\losses\loss_ops.py:433: compute_weighted_loss (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.compute_weighted_loss instead.
WARNING:tensorflow:From C:\Users\ayman\Anaconda3\envs\py35\lib\site-packages\tensorflow\contrib\losses\python\losses\loss_ops.py:146: add_arg_scope..func_with_args (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.add_loss instead.

Run ID: cosine-softmax
Log directory: ./output/market1501/cosine-softmax

WARNING:tensorflow:From C:\Users\ayman\Anaconda3\envs\py35\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py:736: Supervisor.init (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-05-28 15:10:18.081763: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-05-28 15:10:18.646344: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
name: GeForce 940MX major: 5 minor: 0 memoryClockRate(GHz): 1.2415
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 1.66GiB
2018-05-28 15:10:18.651127: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-28 15:10:19.266360: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-28 15:10:19.268433: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929] 0
2018-05-28 15:10:19.270037: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0: N
2018-05-28 15:10:19.271564: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1430 MB memory) -> physical GPU (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0)
EnqueueError: Cannot take a larger sample than population when 'replace=False'

Can you please help me fix this problem? I am new to deep learning and Python.
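The message "Cannot take a larger sample than population when 'replace=False'" is the one numpy.random.choice raises when asked for more items without replacement than the population contains. One plausible reading (an assumption, not confirmed against train_app.py) is that the batch sampler wants more identities, or more images per identity, than a 58-image, 21-identity set provides. A minimal sketch of a pre-flight check follows; the function name and the batch_size value are hypothetical, and only num_images_per_id=4 appears in the tracebacks on this page.

```python
from collections import Counter


def check_sampling_feasibility(labels, batch_size, num_images_per_id):
    """Report whether identity-balanced batches can be drawn without replacement.

    Illustrative helper: the parameter names are assumptions and are not
    taken from train_app.py.
    """
    images_per_id = Counter(labels)
    # Number of distinct identities a batch would need under this scheme.
    ids_per_batch = batch_size // num_images_per_id
    # Identities that cannot supply num_images_per_id distinct images.
    sparse_ids = sorted(
        label for label, count in images_per_id.items()
        if count < num_images_per_id)
    return {
        "num_identities": len(images_per_id),
        "ids_per_batch": ids_per_batch,
        "enough_identities": len(images_per_id) >= ids_per_batch,
        "ids_with_too_few_images": sparse_ids,
    }


# Hypothetical toy dataset: three identities, one with a single image.
labels = [0, 0, 0, 0, 1, 1, 1, 1, 2]
report = check_sampling_feasibility(labels, batch_size=16, num_images_per_id=4)
print(report)
```

If either check fails on the real label list, a smaller batch size or more images per identity would be the first things to try.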

softmargin_triplet_loss

In losses.py, line 15:
def softmargin_triplet_loss(features, labels, create_summaries=True):
The features have shape (n, m), where n is the batch size and m is the feature dimension.
But softmargin_triplet_loss is a triplet loss, so I would expect three inputs: anchor, positive, and negative.
Why is there only a single features argument?

Also, can you tell me what the labels look like?
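A single (n, m) feature matrix is enough for a triplet loss when the triplets are formed inside the batch from the labels. The sketch below shows one common scheme, batch-hard mining with a soft margin: every row acts as an anchor, its positive is the farthest row with the same label, and its negative is the closest row with a different label. Whether losses.py does exactly this is an assumption to verify against the source; the function names here are hypothetical, and labels are assumed to be a length-n vector of integer identity ids.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_soft_margin_loss(features, labels):
    """Mean soft-margin triplet loss with batch-hard mining (illustrative)."""
    losses = []
    for i, (anchor, label) in enumerate(zip(features, labels)):
        # Distances to other rows with the same identity (positives).
        pos = [euclidean(anchor, f) for j, f in enumerate(features)
               if j != i and labels[j] == label]
        # Distances to rows with a different identity (negatives).
        neg = [euclidean(anchor, f) for j, f in enumerate(features)
               if labels[j] != label]
        if not pos or not neg:
            continue  # this anchor cannot form a triplet in this batch
        # Soft margin: softplus of (hardest positive - hardest negative).
        losses.append(math.log1p(math.exp(max(pos) - min(neg))))
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical batch: n = 4 rows, m = 2 dimensions, two identities.
features = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
labels = [7, 7, 9, 9]
loss = batch_hard_soft_margin_loss(features, labels)
print(loss)
```

This only works if each batch contains several images per identity, which is why identity-balanced sampling matters for this kind of loss.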

Training a custom dataset for deep sort

I want to use deep sort for tracking objects other than people (say cars). I presume I have to train this cosine metric model to generate a .pb file to use in deep sort? Now to train this model, what is the format or structure of dataset/folders that I need? And what type of data do I need? Is a set of images with their gt_boxes enough?
What I can understand is, I need to have a directory of frames from videos ( training data images) and the gt_boxes
--> create .pb file by training cosine metric model on train data (on cropped images of specific objects(?))
--> train Faster RCNN model using training data
--> generate predictions using Faster RCNN on test data
--> create .npy files on test image directory and predicted boxes with generate_detections.py from deep sort
--> feed them into deep_sort_app.py.
Anything else I need to take care of?
Thanks for the great work!

loading weights from pretrained weights

Hello nwojke,

Awesome work. I have one question. When I load the pretrained models you provided in order to retrain the network on a different dataset, I get "variable missing" errors. Any thoughts on this? I am using the restore path argument to supply the pretrained model.

DataLossError: Unable to open table file

Hi Nwojke,
I could run the MARS training script without problems a couple of months ago, but when I run it now, I get the following errors.
NB: I took a fresh git clone of your repo today, downloaded the MARS dataset, pointed the directories correctly (as far as I can tell), and even installed tensorflow 1.4.1 as you suggested somewhere. Any idea as to what is causing this error would be highly appreciated.
The error is

DataLossError (see above for traceback): Unable to open table file /media/data/data/output/mars/cosine-softmax/mars.ckpt.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
[[Node: save/RestoreV2_43/_259 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_560_save/RestoreV2_43", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

Google search gives a lot of different causes from tensorflow version to tensorflow ckpt file format which has been updated, but I guess since this repo is quite new, this should not be the case. I have not made any changes to the code except replacing 'keepdims' by 'keep_dims' which is an old issue.

The full stack trace is :

2018-08-31 14:22:04.069807: W tensorflow/core/framework/op_kernel.cc:1192] Data loss: Unable to open table file /media/data/data/output/mars/cosine-softmax/mars.ckpt.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
Traceback (most recent call last):
File "train_mars.py", line 120, in
main()
File "train_mars.py", line 74, in main
num_images_per_id=4, image_shape=IMAGE_SHAPE, **train_kwargs)
File "/media/data/ML/test_cosine/train_app.py", line 199, in train_loop
save_interval_secs=save_interval_secs, number_of_steps=number_of_steps)
File "/media/data/ML/test_cosine/queued_trainer.py", line 408, in run
**kwargs)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 742, in train
master, start_standard_services=False, config=session_config) as sess:
File "/usr/lib/python2.7/contextlib.py", line 17, in enter
return self.gen.next()
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 792, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 953, in managed_session
start_standard_services=start_standard_services)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 708, in prepare_or_wait_for_session
init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 273, in prepare_session
config=config)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 205, in _restore_checkpoint
saver.restore(sess, ckpt.model_checkpoint_path)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1666, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file /media/data/data/output/mars/cosine-softmax/mars.ckpt.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
[[Node: save/RestoreV2_43/_259 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_560_save/RestoreV2_43", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

Caused by op u'save/RestoreV2', defined at:
File "train_mars.py", line 120, in
main()
File "train_mars.py", line 74, in main
num_images_per_id=4, image_shape=IMAGE_SHAPE, **train_kwargs)
File "/media/data/ML/test_cosine/train_app.py", line 199, in train_loop
save_interval_secs=save_interval_secs, number_of_steps=number_of_steps)
File "/media/data/ML/test_cosine/queued_trainer.py", line 404, in run
saver = tf.train.Saver(max_to_keep=max_checkpoints_to_keep)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1218, in init
self.build()
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1227, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1263, in _build
build_save=build_save, build_restore=build_restore)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 751, in _build_internal
restore_sequentially, reshape)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 427, in _AddRestoreOps
tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 267, in restore_op
[spec.tensor.dtype])[0])
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1021, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

Training Sample Dataset Error

I am training on the MARS dataset, but I am facing the following error:
"EnqueueError: Cannot take a larger sample than population when 'replace=False'"
Please guide me.

GPU training error

Hello @nwojke thank you for the great work! I really appreciate it! I was planning to train the cosine metric learning for VERI dataset as I'm trying to detect and track vehicles in my research. However, when I got all my folders ready and started training I got this error. Can you please help me figure out what it is and how I resolve the issue?

My GPU is an NVIDIA GeForce GTX 1050 Ti with 4 GB GDDR5.

NewRandomAccessFile failed to Create/Open

I can train on the MARS dataset without problems. But when I try my own dataset, I hit this error. I have tried renaming my images to match the MARS format, but that did not help. Do you have any idea what I did wrong?

(tripletloss) D:\xrv\XRV_projects\cosine_metric_learning>python train_mars.py --dataset_dir=./FR-dataset --loss_mode=cosine-softmax --log_dir=./output/FR/ --run_id=cosine-softmax
Train set size: 1537 images, 36 identities
feature dimensionality:  128
WARNING:tensorflow:From D:\xrv\XRV_projects\cosine_metric_learning\nets\deep_sort\network_definition.py:84: calling l2_normalize (from tensorflow.python.ops.nn_impl) with dim is deprecated and will be removed in a future version.
Instructions for updating:
dim is deprecated, use axis instead
WARNING:tensorflow:From D:\xrv\XRV_projects\cosine_metric_learning\train_app.py:615: sparse_softmax_cross_entropy (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.sparse_softmax_cross_entropy instead. Note that the order of the logits and labels arguments has been changed.
WARNING:tensorflow:From C:\Users\Noel Tam\AppData\Local\conda\conda\envs\tripletloss\lib\site-packages\tensorflow\contrib\losses\python\losses\loss_ops.py:434: compute_weighted_loss (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.compute_weighted_loss instead.
WARNING:tensorflow:From C:\Users\Noel Tam\AppData\Local\conda\conda\envs\tripletloss\lib\site-packages\tensorflow\contrib\losses\python\losses\loss_ops.py:147: add_arg_scope.<locals>.func_with_args (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.add_loss instead.
---------------------------------------
Run ID:  cosine-softmax
Log directory:  ./output/FR/cosine-softmax
---------------------------------------
WARNING:tensorflow:From C:\Users\Noel Tam\AppData\Local\conda\conda\envs\tripletloss\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py:737: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2019-07-17 22:58:10.377791: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-07-17 22:58:10.559399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
totalMemory: 6.00GiB freeMemory: 4.97GiB
2019-07-17 22:58:10.570178: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-07-17 22:58:11.103766: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-17 22:58:11.110903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0
2019-07-17 22:58:11.114184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N
2019-07-17 22:58:11.117979: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4722 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0, compute capability: 6.1)
ERROR: Failed to sample a full batch. Adding corrupt data.
2019-07-17 22:58:33.914975: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at whole_file_read_ops.cc:114 : Not found: NewRandomAccessFile failed to Create/Open:  : The system cannot find the path specified.
; No such process
EnqueueError: NewRandomAccessFile failed to Create/Open:  : The system cannot find the path specified.
; No such process
         [[node map/while/ReadFile (defined at D:\xrv\XRV_projects\cosine_metric_learning\train_app.py:250)  = ReadFile[_device="/job:localhost/replica:0/task:0/device:CPU:0"](map/while/TensorArrayReadV3)]]

Caused by op 'map/while/ReadFile', defined at:
  File "train_mars.py", line 119, in <module>
    main()
  File "train_mars.py", line 73, in main
    num_images_per_id=4, image_shape=IMAGE_SHAPE, **train_kwargs)
  File "D:\xrv\XRV_projects\cosine_metric_learning\train_app.py", line 188, in train_loop
    trainable_scopes=trainable_scopes)
  File "D:\xrv\XRV_projects\cosine_metric_learning\train_app.py", line 251, in create_trainer
    filename_var, back_prop=False, dtype=tf.uint8)
  File "C:\Users\Noel Tam\AppData\Local\conda\conda\envs\tripletloss\lib\site-packages\tensorflow\python\ops\functional_ops.py", line 494, in map_fn
    maximum_iterations=n)
  File "C:\Users\Noel Tam\AppData\Local\conda\conda\envs\tripletloss\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3291, in while_loop
    return_same_structure)
  File "C:\Users\Noel Tam\AppData\Local\conda\conda\envs\tripletloss\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3004, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "C:\Users\Noel Tam\AppData\Local\conda\conda\envs\tripletloss\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2939, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "C:\Users\Noel Tam\AppData\Local\conda\conda\envs\tripletloss\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3260, in <lambda>
    body = lambda i, lv: (i + 1, orig_body(*lv))
  File "C:\Users\Noel Tam\AppData\Local\conda\conda\envs\tripletloss\lib\site-packages\tensorflow\python\ops\functional_ops.py", line 483, in compute
    packed_fn_values = fn(packed_values)
  File "D:\xrv\XRV_projects\cosine_metric_learning\train_app.py", line 250, in <lambda>
    tf.read_file(x), channels=num_channels),
  File "C:\Users\Noel Tam\AppData\Local\conda\conda\envs\tripletloss\lib\site-packages\tensorflow\python\ops\gen_io_ops.py", line 531, in read_file
    "ReadFile", filename=filename, name=name)
  File "C:\Users\Noel Tam\AppData\Local\conda\conda\envs\tripletloss\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\Noel Tam\AppData\Local\conda\conda\envs\tripletloss\lib\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "C:\Users\Noel Tam\AppData\Local\conda\conda\envs\tripletloss\lib\site-packages\tensorflow\python\framework\ops.py", line 3274, in create_op
    op_def=op_def)
  File "C:\Users\Noel Tam\AppData\Local\conda\conda\envs\tripletloss\lib\site-packages\tensorflow\python\framework\ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): NewRandomAccessFile failed to Create/Open:  : The system cannot find the path specified.
; No such process
         [[node map/while/ReadFile (defined at D:\xrv\XRV_projects\cosine_metric_learning\train_app.py:250)  = ReadFile[_device="/job:localhost/replica:0/task:0/device:CPU:0"](map/while/TensorArrayReadV3)]]


(tripletloss) D:\xrv\XRV_projects\cosine_metric_learning>

EnqueueError: 'a' must be greater than 0 unless no samples are taken

I get this error when trying to train on my own dataset. The dataset has two folders, test and train, and each contains cropped pictures of my object. The error appears when I run my own training script.

When start training, seems to stuck and no feedback

@nwojke hello, thanks for your work!
I'm training this model on the VeRi dataset by running train_veri.py. The output below appeared, so training seems to have started, but I was hoping to see more feedback, such as the current iteration and the current loss. Nothing further is printed, which makes me wonder whether the program is stuck. nvidia-smi shows GPU usage consistent with training, and the log_dir contains files that appear to be training checkpoints. I have trained for about 10 hours, and there is no sign of it stopping.
Does anyone know how long it takes to train on the VeRi dataset? And is it normal to get no feedback while training? Thanks a lot!

Run ID: cosine-softmax
Log directory: ./output/veri/cosine-softmax

W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: TITAN Xp
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:03:00.0
Total memory: 11.90GiB
Free memory: 11.70GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN Xp, pci bus id: 0000:03:00.0)

training method

@nwojke Hello, is there any difference between softmax classifier training and triplet training? Which training method is better?

Is the model mars-small128.pb provided by author only trained through the Mars dataset?

@nwojke @abewley Hi, was the mars-small128.pb model provided by the author trained only on the MARS dataset?

How to evaluate Market1501 model?

Can '--mode=eval' be used for evaluation? I ask because I did not find the measurement interface in the Market1501 evaluation code.

Thanks a lot if anybody knows how to do this!

Changed the net and can get higher acc,but eval precision becomes lower?

I changed the network so that the output feature dimension is 256 and trained for about 8 hours. When I evaluate the model, precision at 1, 5, 10, and 20 is lower; precision at 1 is only 0.66.
The same network trained for 2 hours reaches a precision at 1 of 0.77. Why does the model that trained longer get lower precision?
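For context, training accuracy and retrieval precision measure different things: the first is classification on identities seen during training, the second is nearest-neighbour retrieval on held-out identities, so the two can move in opposite directions when a model overfits. The sketch below assumes "precision at k" means the fraction of queries with at least one correct identity among the k nearest gallery features under cosine distance; that definition and all names are assumptions, not the repository's evaluation code.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def rank_k_accuracy(query_feats, query_ids, gallery_feats, gallery_ids, k):
    """Fraction of queries with a correct identity in the k nearest gallery items."""
    hits = 0
    for feat, identity in zip(query_feats, query_ids):
        # Gallery indices sorted from nearest to farthest.
        ranked = sorted(range(len(gallery_feats)),
                        key=lambda j: cosine_distance(feat, gallery_feats[j]))
        if any(gallery_ids[j] == identity for j in ranked[:k]):
            hits += 1
    return hits / len(query_feats)


# Hypothetical 2-D features for two queries and three gallery images.
query_feats = [[1.0, 0.0], [0.0, 1.0]]
query_ids = [1, 2]
gallery_feats = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]
gallery_ids = [2, 2, 1]

rank1 = rank_k_accuracy(query_feats, query_ids, gallery_feats, gallery_ids, k=1)
rank2 = rank_k_accuracy(query_feats, query_ids, gallery_feats, gallery_ids, k=2)
print(rank1, rank2)
```

Tracking a metric like this on a validation split at each checkpoint would show whether the longer run has started to overfit.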

need help for VeRi model file

Hello @nwojke ,
Thanks for your work!

As you mentioned, you have trained a model on the VeRi dataset, right? Following the readme, I tried to get the dataset from the original owner, but nobody replied to my request email. Could you please share the trained VeRi model with me, or send me a copy of the dataset (the former preferred)? Thank you very much!

training time

Thanks for this nice paper and codebase. It is very helpful for comparison.

How does the training time differ, and what is the actual training time on your GPU (Mars or Market)? I am seeing very fast training on Mars: under 1 hour on a 1080 Ti.

How to set MAX_LABEL in custom dataset

Hi, @nwojke , I'm trying to train on my own dataset, I created the dataset just as Market 1501 format (naming format). Now my training dataset has 500 identities, indexing from 0001 to 0500.

Now I am confused about how to set the MAX_LABEL. I see here MAX_LABEL is set to 1501, however, the maximum id number of images inside Market1501/bounding_box_train is 1500.

Could you tell me how to set the MAX_LABEL in my case, and what's the purpose of this value?

Thanks!
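If the identity number parsed from the filename is used directly as the class index, the classifier needs at least (largest id + 1) outputs, so a label bound has to cover the largest id rather than the number of identities. Whether MAX_LABEL plays exactly this role should be checked in the dataset module; the sketch below only shows how to derive the numbers from Market-1501-style filenames. The helper names and example filenames are hypothetical.

```python
def identity_from_filename(filename):
    """Return the integer identity encoded before the first underscore."""
    return int(filename.split("_", 1)[0])


def label_range(filenames):
    """Summarise the identity ids found in a list of image filenames."""
    ids = sorted({identity_from_filename(name) for name in filenames})
    max_id = ids[-1]
    return {
        "num_identities": len(ids),
        "max_id": max_id,
        # Outputs needed if raw ids are used as class indices (0..max_id).
        "num_outputs_if_raw_ids": max_id + 1,
    }


# Hypothetical filenames in the Market-1501 naming style.
filenames = [
    "0001_c1s1_000151_00.jpg",
    "0001_c2s1_000301_00.jpg",
    "0500_c3s2_001234_00.jpg",
]
summary = label_range(filenames)
print(summary)
```

Remapping ids to a contiguous range 0..N-1 is an alternative that avoids unused classifier outputs when ids are sparse.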

how to train on other losses? when i change to triplet loss, it just pops up with no warning,why?.

Custom Dataset Format

Can anybody explain how to convert our own dataset to the format of the Mars or Market dataset? What are the steps for the conversion?

Where the author's trained model?

Where can I find the author's trained model in its original form, before conversion to .pb?

Unable to understand the significance of the 6 camera views

Could you please explain the relevance of using 6 cameras? Is it just a coincidence that both training datasets use 6 cameras? I couldn't find anywhere that this property is used. Also, I wish to train on my own data, which was not shot in a similar fashion. Is it possible to train on it?
Also, could you please describe what the .mat files contain? I couldn't decode them, so I don't know how to prepare my data for training.
A read-me file for the gt-query would be appreciated.
Your help would be really appreciated. Thanks.

Question about certain lines of code

Thanks so much for such an excellent repository! I have a question about the code, if that's alright.

In your good_mask in market1501.py, where do you set (i,j) to zero if i is a query image and j is a gallery image that starts with -1? You set the junk images in junk_mat to equal zero, but aren't there other junk images that start with -1 that aren't in junk_mat?

Thanks so much!

About model export issue to deepsort

Hi @nwojke,

I am training on MARS and want to test the model export part. I am getting files in the output/mars/cosine-softmax/ like this:

checkpoint
events.out.tfevents.1555681314.newdatapath
graph.pbtxt
model.ckpt-0.data-00000-of-00001
model.ckpt-80.data-00000-of-00001
model.ckpt-161.data-00000-of-00001
model.ckpt-0.index
model.ckpt-80.index
model.ckpt-161.index
model.ckpt-0.meta
model.ckpt-80.meta
model.ckpt-161.meta

When I tried the command for model export using

python train_mars.py --mode=freeze --restore_path=output/mars/cosine-softmax/model.ckpt-161.data-00000-of-00001

It gave me the following errors. I am not sure whether this is caused by training (bad model data) or only by the export step. Thanks in advance for your reply.

Instructions for updating:
Use standard file APIs to check for files with this prefix.
2019-04-19 10:00:09.059699: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open output/mars/cosine-softmax/model.ckpt-161.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
2019-04-19 10:00:09.061005: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open output/mars/cosine-softmax/model.ckpt-161.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
2019-04-19 10:00:09.061033: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at save_restore_tensor.cc:175 : Data loss: Unable to open table file output/mars/cosine-softmax/model.ckpt-161.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
Traceback (most recent call last):
File "/home/xiaohui/.conda/envs/MASK/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/home/xiaohui/.conda/envs/MASK/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/xiaohui/.conda/envs/MASK/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file output/mars/cosine-softmax/model.ckpt-161.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
[[{{node save/RestoreV2}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train_mars.py", line 119, in
main()
File "train_mars.py", line 113, in main
output_filename="./mars.pb")
File "/data/xiaohui/2019/cosine_metric_learning/train_app.py", line 519, in freeze
saver.restore(session, checkpoint_path)
File "/home/xiaohui/.conda/envs/MASK/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1276, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/xiaohui/.conda/envs/MASK/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/xiaohui/.conda/envs/MASK/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/xiaohui/.conda/envs/MASK/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/xiaohui/.conda/envs/MASK/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file output/mars/cosine-softmax/model.ckpt-161.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
[[node save/RestoreV2 (defined at /data/xiaohui/2019/cosine_metric_learning/train_app.py:518) ]]

Caused by op 'save/RestoreV2', defined at:
File "train_mars.py", line 119, in
main()
File "train_mars.py", line 113, in main
output_filename="./mars.pb")
File "/data/xiaohui/2019/cosine_metric_learning/train_app.py", line 518, in freeze
saver = tf.train.Saver(slim.get_variables_to_restore())
File "/home/xiaohui/.conda/envs/MASK/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 832, in init
self.build()
File "/home/xiaohui/.conda/envs/MASK/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 844, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/home/xiaohui/.conda/envs/MASK/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 881, in _build
build_save=build_save, build_restore=build_restore)
File "/home/xiaohui/.conda/envs/MASK/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 513, in _build_internal
restore_sequentially, reshape)
File "/home/xiaohui/.conda/envs/MASK/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 332, in _AddRestoreOps
restore_sequentially)
File "/home/xiaohui/.conda/envs/MASK/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 580, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/home/xiaohui/.conda/envs/MASK/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1572, in restore_v2
name=name)
File "/home/xiaohui/.conda/envs/MASK/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/home/xiaohui/.conda/envs/MASK/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/xiaohui/.conda/envs/MASK/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
op_def=op_def)
File "/home/xiaohui/.conda/envs/MASK/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in init
self._traceback = tf_stack.extract_stack()

DataLossError (see above for traceback): Unable to open table file output/mars/cosine-softmax/model.ckpt-161.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
[[node save/RestoreV2 (defined at /data/xiaohui/2019/cosine_metric_learning/train_app.py:518) ]]
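
A possible cause, offered as a sketch rather than a confirmed diagnosis: TensorFlow's V2 checkpoint format stores each checkpoint as a group of files (.index, .meta, .data-XXXXX-of-YYYYY), and the saver's restore call expects the shared prefix (here model.ckpt-161), not one of the individual files. The "not an sstable (bad magic number)" message is consistent with a .data shard being passed instead. The helper below strips the known suffixes from a path; `checkpoint_prefix` is a hypothetical name and not part of this repository.

```python
import re

# Suffixes written by TensorFlow's V2 checkpoint format.
_SUFFIX_PATTERN = re.compile(r"\.(index|meta|data-\d{5}-of-\d{5})$")


def checkpoint_prefix(path):
    """Return the checkpoint prefix for a path that may name a single checkpoint file."""
    return _SUFFIX_PATTERN.sub("", path)


print(checkpoint_prefix(
    "output/mars/cosine-softmax/model.ckpt-161.data-00000-of-00001"))
```

With that prefix, the export command would read: python train_mars.py --mode=freeze --restore_path=output/mars/cosine-softmax/model.ckpt-161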

When I trained this model on Market-1501, it was still running after 7 days, and the output files already take up 81 GB of disk space. Is this normal?

add support for a new dataset

Hi, Nicolai @nwojke
Thank you for sharing the training code of deep_sort.

I'm excited to read it and test it on another dataset. Specifically, I added train_sdd.py and sdd.py to use the UAV dataset.
The following training command is used to start training:

python train_sdd.py \
    --loss_mode=cosine-softmax \
    --log_dir=./output/sdd/ \
    --run_id=cosine-softmax

However, I got the following logging errors:

Train set size: 140670 images, 115 identities
feature dimensionality:  128
---------------------------------------
Run ID:  cosine-softmax
Log directory:  ./output/sdd/cosine-softmax
---------------------------------------
EnqueueError: Cannot feed value of shape (128, 128, 128, 3) for Tensor 'Placeholder_1:0', which has shape '(?,)'

and the following warnings:

--- Logging error ---
Traceback (most recent call last):
  File "/home/yuan2/anaconda3/lib/python3.6/logging/__init__.py", line 994, in emit
    stream.write(msg)
OSError: [Errno 5] Input/output error
Call stack:
  File "train_sdd.py", line 141, in <module>
    main()
  File "train_sdd.py", line 82, in main
    **train_kwargs)
  File "/home/yuan2/cosine_metric_learning/train_app.py", line 188, in train_loop
    trainable_scopes=trainable_scopes)
  File "/home/yuan2/cosine_metric_learning/train_app.py", line 268, in create_trainer
    feature_var, logit_var = network_factory(image_var)
  File "/home/yuan2/cosine_metric_learning/nets/deep_sort/network_definition.py", line 120, in factory_fn
    weight_decay=weight_decay)
  File "/home/yuan2/cosine_metric_learning/nets/deep_sort/network_definition.py", line 84, in create_network
    features = tf.nn.l2_normalize(features, dim=1)
  File "/home/yuan2/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 315, in new_func
    instructions)
  File "/home/yuan2/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/tf_logging.py", line 118, in warning
    _get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: calling %s (from %s) with %s is deprecated and will be removed %s.\nInstructions for updating:\n%s'
Arguments: ('/home/yuan2/cosine_metric_learning/nets/deep_sort/network_definition.py:84', 'l2_normalize', 'tensorflow.python.ops.nn_impl', 'dim', 'in a future version', 'dim is deprecated, use axis instead')
--- Logging error ---
Traceback (most recent call last):
  File "/home/yuan2/anaconda3/lib/python3.6/logging/__init__.py", line 994, in emit
    stream.write(msg)
OSError: [Errno 5] Input/output error
Call stack:
  File "train_sdd.py", line 141, in <module>
    main()
  File "train_sdd.py", line 82, in main
    **train_kwargs)
  File "/home/yuan2/cosine_metric_learning/train_app.py", line 188, in train_loop
    trainable_scopes=trainable_scopes)
  File "/home/yuan2/cosine_metric_learning/train_app.py", line 269, in create_trainer
    _create_loss(feature_var, logit_var, label_var, mode=loss_mode)
  File "/home/yuan2/cosine_metric_learning/train_app.py", line 643, in _create_loss
    _create_softmax_loss(feature_var, logit_var, label_var)
  File "/home/yuan2/cosine_metric_learning/train_app.py", line 615, in _create_softmax_loss
    logit_var, tf.cast(label_var, tf.int64))
  File "/home/yuan2/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 135, in new_func
    instructions)
  File "/home/yuan2/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/tf_logging.py", line 118, in warning
    _get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: %s (from %s) is deprecated and will be removed %s.\nInstructions for updating:\n%s'
Arguments: ('/home/yuan2/cosine_metric_learning/train_app.py:615', 'sparse_softmax_cross_entropy', 'tensorflow.contrib.losses.python.losses.loss_ops', 'after 2016-12-30', 'Use tf.losses.sparse_softmax_cross_entropy instead. Note that the order of the logits and labels arguments has been changed.')
--- Logging error ---
Traceback (most recent call last):
  File "/home/yuan2/anaconda3/lib/python3.6/logging/__init__.py", line 994, in emit
    stream.write(msg)
OSError: [Errno 5] Input/output error
Call stack:
  File "train_sdd.py", line 141, in <module>
    main()
  File "train_sdd.py", line 82, in main
    **train_kwargs)
  File "/home/yuan2/cosine_metric_learning/train_app.py", line 188, in train_loop
    trainable_scopes=trainable_scopes)
  File "/home/yuan2/cosine_metric_learning/train_app.py", line 269, in create_trainer
    _create_loss(feature_var, logit_var, label_var, mode=loss_mode)
  File "/home/yuan2/cosine_metric_learning/train_app.py", line 643, in _create_loss
    _create_softmax_loss(feature_var, logit_var, label_var)
  File "/home/yuan2/cosine_metric_learning/train_app.py", line 615, in _create_softmax_loss
    logit_var, tf.cast(label_var, tf.int64))
  File "/home/yuan2/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 136, in new_func
    return func(*args, **kwargs)
  File "/home/yuan2/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/losses/python/losses/loss_ops.py", line 435, in sparse_softmax_cross_entropy
    return compute_weighted_loss(losses, weights, scope=scope)
  File "/home/yuan2/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 135, in new_func
    instructions)
  File "/home/yuan2/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/tf_logging.py", line 118, in warning
    _get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: %s (from %s) is deprecated and will be removed %s.\nInstructions for updating:\n%s'
Arguments: ('/home/yuan2/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/losses/python/losses/loss_ops.py:435', 'compute_weighted_loss', 'tensorflow.contrib.losses.python.losses.loss_ops', 'after 2016-12-30', 'Use tf.losses.compute_weighted_loss instead.')
--- Logging error ---
Traceback (most recent call last):
  File "/home/yuan2/anaconda3/lib/python3.6/logging/__init__.py", line 994, in emit
    stream.write(msg)
OSError: [Errno 5] Input/output error
Call stack:
  File "train_sdd.py", line 141, in <module>
    main()
  File "train_sdd.py", line 82, in main
    **train_kwargs)
  File "/home/yuan2/cosine_metric_learning/train_app.py", line 188, in train_loop
    trainable_scopes=trainable_scopes)
  File "/home/yuan2/cosine_metric_learning/train_app.py", line 269, in create_trainer
    _create_loss(feature_var, logit_var, label_var, mode=loss_mode)
  File "/home/yuan2/cosine_metric_learning/train_app.py", line 643, in _create_loss
    _create_softmax_loss(feature_var, logit_var, label_var)
  File "/home/yuan2/cosine_metric_learning/train_app.py", line 615, in _create_softmax_loss
    logit_var, tf.cast(label_var, tf.int64))
  File "/home/yuan2/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 136, in new_func
    return func(*args, **kwargs)
  File "/home/yuan2/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/losses/python/losses/loss_ops.py", line 435, in sparse_softmax_cross_entropy
    return compute_weighted_loss(losses, weights, scope=scope)
  File "/home/yuan2/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 136, in new_func
    return func(*args, **kwargs)
  File "/home/yuan2/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/losses/python/losses/loss_ops.py", line 152, in compute_weighted_loss
    add_loss(mean_loss)
  File "/home/yuan2/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 135, in new_func
    instructions)
  File "/home/yuan2/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/tf_logging.py", line 118, in warning
    _get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: %s (from %s) is deprecated and will be removed %s.\nInstructions for updating:\n%s'
Arguments: ('/home/yuan2/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/losses/python/losses/loss_ops.py:152', 'add_arg_scope.<locals>.func_with_args', 'tensorflow.contrib.losses.python.losses.loss_ops', 'after 2016-12-30', 'Use tf.losses.add_loss instead.')
--- Logging error ---
Traceback (most recent call last):
  File "/home/yuan2/anaconda3/lib/python3.6/logging/__init__.py", line 994, in emit
    stream.write(msg)
OSError: [Errno 5] Input/output error
Call stack:
  File "train_sdd.py", line 141, in <module>
    main()
  File "train_sdd.py", line 82, in main
    **train_kwargs)
  File "/home/yuan2/cosine_metric_learning/train_app.py", line 199, in train_loop
    save_interval_secs=save_interval_secs, number_of_steps=number_of_steps)
  File "/home/yuan2/cosine_metric_learning/queued_trainer.py", line 408, in run
    **kwargs)
  File "/home/yuan2/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 736, in train
    init_fn=init_fn)
  File "/home/yuan2/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 135, in new_func
    instructions)
  File "/home/yuan2/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/tf_logging.py", line 118, in warning
    _get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: %s (from %s) is deprecated and will be removed %s.\nInstructions for updating:\n%s'
Arguments: ('/home/yuan2/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py:736', 'Supervisor.__init__', 'tensorflow.python.training.supervisor', 'in a future version', 'Please switch to tf.train.MonitoredTrainingSession')

Could you help me? Thank you.

Pre-training and Data Augmentation

Hi Nwojke,
I have a few questions. Since this model is not pre-trained on ImageNet and you report that performance can improve with ImageNet pre-training, how should I go about pre-training it? In your opinion, would the improvement be significant?
Also, was test-time augmentation used when reporting the results in the paper?
Finally, since the image names carry vital information such as tracklet ID, label, and camera, how can one augment the training data and still keep the image names consistent? I presume this information is used during training, so augmentation might disturb it. What are your thoughts?

Paper

Is the pre-print of the paper available now?

Training on custom data

Can this be trained on a custom dataset? For example, if I (hypothetically) wanted to use it to detect dogs in an image?

the error on training with VeRi

Hi, @nwojke
Thank you for sharing the training code of deep_sort.

I trained it on the VeRi dataset using the code you provide for it. However, it shows an error. I searched online but found no answer for this kind of problem.

Could you help me? Thank you.

Error occur while training on VeRi dataset

@nwojke Thanks for your excellent work! I trained the Deep SORT model on a Jetson TX2 and got the error below.
Can you help me figure out why this happens? Thank you very much!
##############################################
Train set size: 34267 images, 519 identites
('feature dimensionality: ', 128)
WARNING:tensorflow:From /home/nvidia/cosine_metric_learning/nets/deep_sort/network_definition.py:84: calling l2_normalize (from tensorflow.python.ops.nn_impl) with dim is deprecated and will be removed in a future version.
Instructions for updating:
dim is deprecated, use axis instead
WARNING:tensorflow:From /home/nvidia/cosine_metric_learning/train_app.py:615: sparse_softmax_cross_entropy (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.sparse_softmax_cross_entropy instead. Note that the order of the logits and labels arguments has been changed.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/losses/python/losses/loss_ops.py:434: compute_weighted_loss (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.compute_weighted_loss instead.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/losses/python/losses/loss_ops.py:147: add_loss (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.add_loss instead.

('Run ID: ', 'cosine-softmax')
('Log directory: ', './output/veri/cosine-softmax')

WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py:737: init (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2019-02-21 02:46:09.671040: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
2019-02-21 02:46:09.671174: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 4.53GiB
2019-02-21 02:46:09.671227: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2019-02-21 02:46:10.997518: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-21 02:46:10.997589: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2019-02-21 02:46:10.997617: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2019-02-21 02:46:10.997788: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3986 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2019-02-21 02:46:29.238167: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at tensor_array_ops.cc:415 : Invalid argument: TensorArray map/TensorArray_1_0: Could not write to TensorArray index 1 because the value shape is [373,413,3] which is incompatible with the TensorArray's inferred element shape: [207,229,3] (consider setting infer_shape=False).
('EnqueueError:', InvalidArgumentError())
2019-02-21 02:46:29.403975: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at tensor_array_ops.cc:415 : Invalid argument: TensorArray map/TensorArray_1_2: Could not write to TensorArray index 1 because the value shape is [608,653,3] which is incompatible with the TensorArray's inferred element shape: [420,277,3] (consider setting infer_shape=False).
('EnqueueError:', InvalidArgumentError())
2019-02-21 02:46:29.560724: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at tensor_array_ops.cc:415 : Invalid argument: TensorArray map/TensorArray_1_4: Could not write to TensorArray index 1 because the value shape is [210,181,3] which is incompatible with the TensorArray's inferred element shape: [219,511,3] (consider setting infer_shape=False).
('EnqueueError:', InvalidArgumentError())
2019-02-21 02:46:29.810188: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at tensor_array_ops.cc:415 : Invalid argument: TensorArray map/TensorArray_1_7: Could not write to TensorArray index 1 because the value shape is [167,169,3] which is incompatible with the TensorArray's inferred element shape: [328,284,3] (consider setting infer_shape=False).
('EnqueueError:', InvalidArgumentError())
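
The messages above show crops of different sizes (for example [373,413,3] and [207,229,3]) being written into one TensorArray, which expects a single element shape. One workaround, sketched here under the assumption that the images are available as NumPy arrays before they are fed to the queue, is to resize every crop to one fixed shape first. `resize_nearest` and the 128x64 target are illustrative choices, not taken from the repository; in practice a library resize routine would be the usual choice.

```python
import numpy as np


def resize_nearest(image, target_shape):
    """Nearest-neighbour resize of an (H, W, C) array to (target_h, target_w, C)."""
    target_h, target_w = target_shape
    src_h, src_w = image.shape[:2]
    # Map every output row/column back to the source pixel it samples from.
    rows = np.arange(target_h) * src_h // target_h
    cols = np.arange(target_w) * src_w // target_w
    return image[rows[:, None], cols]


# Two crops with the mismatched shapes reported in the log above.
crops = [np.zeros((373, 413, 3), np.uint8), np.zeros((207, 229, 3), np.uint8)]
batch = np.stack([resize_nearest(c, (128, 64)) for c in crops])
print(batch.shape)  # (2, 128, 64, 3)
```

Once all crops share one shape they can be stacked into a single batch array, which is the precondition the failing op was checking.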

Question About Code

I cannot understand the purpose of the "read_train_split_to_image" and "read_test_split_to_image" functions. How exactly are these functions used and linked to the rest of the code?

python train_mars.py --mode=freeze --restore_path=PATH_TO_CHECKPOINT error

DataLossError (see above for traceback): Unable to open table file /home/sdu/cosine_metric_learning/output/mars/cosine-softmax/checkpoint: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
[[node save/RestoreV2 (defined at /home/cosine_metric_learning/train_app.py:518) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

why?
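
Here the restore path points at the file named checkpoint, which is a small text index rather than a checkpoint itself, so a "not an sstable" failure would be expected. As a sketch, the prefix to pass to --restore_path can be read from that file's model_checkpoint_path entry. The helper name `latest_prefix_from_index` and the example contents are hypothetical; TensorFlow's own tf.train.latest_checkpoint(directory) serves the same purpose.

```python
import os
import re


def latest_prefix_from_index(index_text, directory):
    """Extract the most recent checkpoint prefix from the text of a `checkpoint` file."""
    match = re.search(r'^model_checkpoint_path:\s*"([^"]+)"', index_text, re.MULTILINE)
    if match is None:
        return None
    # Entries are usually relative to the directory holding the `checkpoint` file.
    return os.path.join(directory, match.group(1))


# Example contents, modelled on what TensorFlow writes (values are illustrative).
example = (
    'model_checkpoint_path: "model.ckpt-161"\n'
    'all_model_checkpoint_paths: "model.ckpt-80"\n'
    'all_model_checkpoint_paths: "model.ckpt-161"\n'
)
print(latest_prefix_from_index(example, "output/mars/cosine-softmax"))
```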

How can I use this code for a vehicle counting system?

I want to count vehicles in a video. How can I use deep sort for this?
Thanks
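
A tracker such as deep sort assigns an identity to each object across frames, so counting can be reduced to counting the distinct tracks that cross a chosen line. A minimal sketch, assuming the tracker output has already been reduced to (frame, track_id, center_y) tuples sorted by frame; this tuple format, the function name, and the line position are all assumptions for illustration and not part of deep_sort.

```python
def count_line_crossings(observations, line_y):
    """Count distinct track IDs whose centre moves from one side of y = line_y to the other."""
    last_side = {}   # track_id -> True if the track was last seen at or below the line
    counted = set()  # track IDs that have crossed at least once
    for _frame, track_id, center_y in observations:
        side = center_y >= line_y
        if track_id in last_side and last_side[track_id] != side:
            counted.add(track_id)
        last_side[track_id] = side
    return len(counted)


observations = [
    (1, 7, 90.0), (1, 8, 40.0),
    (2, 7, 105.0), (2, 8, 45.0),  # track 7 crosses y = 100
    (3, 7, 120.0), (3, 8, 50.0),  # track 8 never reaches the line
]
print(count_line_crossings(observations, line_y=100.0))  # 1
```

Keeping a set of counted IDs means a vehicle that jitters back and forth over the line is still counted once.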

The problem of model validation

According to README.md, the code splits off 10% of the training data for validation. But I found that the labels of the training data and the validation data are different. So how should I evaluate my model?
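
If the validation split is made by identity rather than by image (an assumption about the data loader that would explain why the two label sets differ), classification accuracy on the validation set is not meaningful, and the held-out identities are better suited to retrieval-style evaluation. A sketch of such a split, with `split_by_identity` as an illustrative name:

```python
import random


def split_by_identity(labels, validation_fraction, seed=0):
    """Return (train_indices, validation_indices) with no identity shared between them."""
    identities = sorted(set(labels))
    rng = random.Random(seed)
    num_validation = int(validation_fraction * len(identities))
    validation_ids = set(rng.sample(identities, num_validation))
    train = [i for i, y in enumerate(labels) if y not in validation_ids]
    validation = [i for i, y in enumerate(labels) if y in validation_ids]
    return train, validation


# Ten identities with two images each; 10% of identities are held out.
labels = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]
train, validation = split_by_identity(labels, 0.1)
print(sorted({labels[i] for i in validation}))
```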

where are the cosine softmax classifier and the corresponding loss in the code?
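
For orientation, the cosine softmax classifier replaces the usual logits with scaled cosine similarities between a unit-length feature vector and unit-length class weight vectors, followed by an ordinary cross-entropy loss. The sketch below is a plain-Python illustration of that formula, not the repository's TensorFlow code; the names `cosine_softmax` and `scale` are chosen here for the example.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


def cosine_softmax(feature, class_weights, scale):
    """Class probabilities from scaled cosine similarities."""
    unit_feature = l2_normalize(feature)
    logits = [
        scale * sum(f * w for f, w in zip(unit_feature, l2_normalize(weights)))
        for weights in class_weights
    ]
    # Subtract the maximum logit for numerical stability.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probabilities[label])


probs = cosine_softmax([3.0, 0.0], [[1.0, 0.0], [0.0, 2.0]], scale=10.0)
print(probs, cross_entropy(probs, 0))
```

Because both the feature and the weights are normalized, only the angle between them affects the prediction; the scale controls how sharply probabilities concentrate around the closest class direction.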

where is the Market-1501_baseline-v16.01.14 file?

After training successfully, I tried to run the test and noticed that the sdk directory and the Market-1501_baseline-v16.01.14 file are missing. Is there a solution?

training cosine_softmax combined with triplet loss, then using the frozen model in deepsort, gives this error

As in the title. Detailed error output:
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'net/map/TensorArray': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
TensorArrayReadV3: CPU
Enter: GPU CPU
TensorArrayV3: CPU
TensorArrayScatterV3: CPU
Placeholder: GPU CPU

Colocation members and user-requested devices:
net/images (Placeholder) /device:GPU:0
net/map/TensorArray (TensorArrayV3) /device:GPU:0
net/map/while/TensorArrayReadV3/Enter (Enter) /device:GPU:0
net/map/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3 (TensorArrayScatterV3) /device:GPU:0
net/map/while/TensorArrayReadV3 (TensorArrayReadV3) /device:GPU:0

Registered kernels:
device='GPU'; dtype in [DT_BFLOAT16]
device='GPU'; dtype in [DT_COMPLEX128]
device='GPU'; dtype in [DT_COMPLEX64]
device='GPU'; dtype in [DT_DOUBLE]
device='GPU'; dtype in [DT_FLOAT]
device='GPU'; dtype in [DT_HALF]
device='CPU'

     [[Node: net/map/TensorArray = TensorArrayV3[clear_after_read=true, dtype=DT_UINT8, dynamic_size=false, element_shape=<unknown>, identical_element_shapes=true, tensor_array_name="", _device="/device:GPU:0"](net/map/strided_slice)]]

Changing image shapes to larger dimensions

Hi,
Is there a way to change the image dimensions and train on larger images? If so, could you please explain the steps to take?
Thank you

validation on MARS

I am trying to validate on MARS, but the tips given under Testing about modifying utils/process_box_features.m do not seem to be enough. The embedding in the MARS evaluation has 1024 dimensions instead of 128. Do you have a modified branch around for evaluating the networks trained here?

Thanks!

Questions about magnet loss

I would like to ask how to assign datasets to the training set.

triplet loss or cross entropy loss?

I have attached my training and validation graphs. I am not able to reduce the cross-entropy loss below 2.4, and the gap between training and validation accuracy is 30% most of the time. I understand this is overfitting, but whatever I do, I cannot reduce it. I have attached a few images for reference. My question is that the paper shows only the triplet loss, not the cross-entropy loss shown here.

Any help would be great.

How to evaluate the Market1501 model?

I have trained the Market1501 model, but I don't know whether it is any good. What should I do? @nwojke

python train_mars.py --mode=freeze --restore_path=PATH_TO_CHECKPOINT Can anyone tell me what PATH_TO_CHECKPOINT is in the command above?

@nwojke python train_mars.py --mode=freeze --restore_path=PATH_TO_CHECKPOINT
Can anyone tell me what PATH_TO_CHECKPOINT should be in the command above?

I have got feat_query.mat and feat_test.mat. How do I use them to evaluate the Market1501 model?
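
Assuming feat_query.mat and feat_test.mat hold one feature vector per query and gallery image, the usual Market-1501 protocol ranks gallery images by distance to each query and reports CMC and mAP. The official evaluation also excludes same-camera matches, which this sketch omits. As a rough illustration only, rank-1 accuracy under cosine similarity could be computed as follows; loading the .mat files is left out, and all names and values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    """Fraction of queries whose most similar gallery item has the same identity."""
    hits = 0
    for feat, identity in zip(query_feats, query_ids):
        best = max(range(len(gallery_feats)),
                   key=lambda j: cosine_similarity(feat, gallery_feats[j]))
        hits += gallery_ids[best] == identity
    return hits / len(query_feats)


# Toy features: each query points roughly along one gallery direction.
query_feats = [[1.0, 0.1], [0.1, 1.0]]
query_ids = [1, 2]
gallery_feats = [[0.9, 0.0], [0.0, 0.8], [0.7, 0.7]]
gallery_ids = [1, 2, 3]
print(rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids))  # 1.0
```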

Tracking drones with deepsort

Great repo! I have a question about training on a custom drone dataset. I already have yolov3-tiny weight files trained on my own drone dataset, which I want to use with the deep sort model for tracking drones, but it looks like I need to do a separate training run for cosine metric learning. Is that right? If so, could you describe the detailed steps for training on a custom dataset? It would be beneficial for everyone and greatly appreciated!
