Comments (14)

MaybeShewill-CV commented on May 24, 2024

@mychina75 The original paper has training details for the COCO-Stuff dataset :)

from bisenetv2-tensorflow.

mychina75 commented on May 24, 2024

In the paper, training runs for 150K, 10K, and 20K iterations on the Cityscapes, CamVid, and COCO-Stuff datasets respectively.
But the COCO dataset has far more images than Cityscapes, so why is its iteration count so small?
Is something wrong?
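
One way to sanity-check these iteration counts is to convert them into epochs. This is only a rough sketch: the iteration counts come from the comment above, but the batch size of 16 and the image counts are assumptions that are not stated in this thread (in particular, whether the paper trains on the 10K COCO-Stuff subset should be checked against the paper itself).

```python
def epochs_seen(iterations: int, batch_size: int, num_train_images: int) -> float:
    """Approximate number of passes over the training set."""
    return iterations * batch_size / num_train_images


# Batch size and image counts below are assumptions, not values from this thread.
BATCH_SIZE = 16
for name, iters, n_images in [
    ("Cityscapes", 150_000, 2_975),
    ("COCO-Stuff (assumed 10K subset)", 20_000, 9_000),
]:
    print(f"{name}: ~{epochs_seen(iters, BATCH_SIZE, n_images):.0f} epochs")
```

If the training split is small, a modest iteration count can still amount to many epochs, so iterations alone do not say much without the split size.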

MaybeShewill-CV commented on May 24, 2024

@mychina75 You may get a satisfactory answer to that question at https://github.com/ycszen/BiSeNet (the original author's repo) :)

mychina75 commented on May 24, 2024

Thank you, I will check.
Also, there is an error when resuming training. Please take a look.

##################
2020-05-25 16:52:38.994 | INFO | trainner.human_bisenetv2_multi_gpu_trainner:__init__:229 - Initialize human bisenetv2 multi gpu trainner complete
2020-05-25 16:52:41.706 | INFO | trainner.human_bisenetv2_multi_gpu_trainner:train:319 - => Restoring weights from: ./model/coco_human/bisenetv2/ ...
WARNING:tensorflow:From /opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2020-05-25 16:52:42.368599: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key BiseNetV2/aggregation_branch/guided_aggregation_block/aggregation_features/aggregation_feature_output/bn/beta/Momentum not found in checkpoint
2020-05-25 16:52:42.376 | ERROR | trainner.human_bisenetv2_multi_gpu_trainner:train:332 - Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

2 root error(s) found.
(0) Not found: Key BiseNetV2/aggregation_branch/guided_aggregation_block/aggregation_features/aggregation_feature_output/bn/beta/Momentum not found in checkpoint
[[node loader_and_saver/save/RestoreV2 (defined at /opt/project/semantic_segmentation/bisenetv2-tensorflow-master/trainner/human_bisenetv2_multi_gpu_trainner.py:201) ]]
(1) Not found: Key BiseNetV2/aggregation_branch/guided_aggregation_block/aggregation_features/aggregation_feature_output/bn/beta/Momentum not found in checkpoint
[[node loader_and_saver/save/RestoreV2 (defined at /opt/project/semantic_segmentation/bisenetv2-tensorflow-master/trainner/human_bisenetv2_multi_gpu_trainner.py:201) ]]
[[loader_and_saver/save/RestoreV2/_37]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'loader_and_saver/save/RestoreV2':
File "tools/train_bisenetv2_human.py", line 40, in <module>
train_model()
File "tools/train_bisenetv2_human.py", line 27, in train_model
worker = multi_gpu_trainner.BiseNetV2HumanMultiTrainer() #MultiTrainer()
File "/opt/project/semantic_segmentation/bisenetv2-tensorflow-master/trainner/human_bisenetv2_multi_gpu_trainner.py", line 201, in __init__
self._loader = tf.train.Saver(self._net_var)
File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 825, in __init__
self.build()
File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 837, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 875, in _build
build_restore=build_restore)
File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 508, in _build_internal
restore_sequentially, reshape)
File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 328, in _AddRestoreOps
restore_sequentially)
File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 575, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1696, in restore_v2
name=name)
File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()

2020-05-25 16:52:42.377 | INFO | trainner.human_bisenetv2_multi_gpu_trainner:train:333 - => Can not load pretrained model weights: ./model/coco_human/bisenetv2/
2020-05-25 16:52:42.377 | INFO | trainner.human_bisenetv2_multi_gpu_trainner:train:334 - => Now it starts to train BiseNetV2 from scratch ...

MaybeShewill-CV commented on May 24, 2024

@mychina75 Which ckpt file did you use to resume training?

mychina75 commented on May 24, 2024

I set model_checkpoint_path to "./model/coco_human/bisenetv2/"
and made some changes to the restore code:
ckpt = tf.train.get_checkpoint_state(os.path.dirname(self._initial_weight))
self._loader.restore(self._sess, ckpt.model_checkpoint_path) #moself._initial_weight)

The original code, 'self._loader.restore(self._sess, self._initial_weight)',
does not work with SNAPSHOT_PATH set to './model/coco_human/bisenetv2/human_train_miou=0.4369.ckpt-1.index'
either...

MaybeShewill-CV commented on May 24, 2024

@mychina75 The snapshot file path should be ./model/coco_human/bisenetv2/human_train_miou=0.4369.ckpt-1 instead (the checkpoint prefix, without the .index suffix) :)
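
A TensorFlow checkpoint is addressed by its prefix, while the files on disk carry suffixes such as .index, .meta, or .data-00000-of-00001. The helper below is a hypothetical sketch (checkpoint_prefix is not part of this repo or of TensorFlow) that strips such a suffix so that a path copied from a directory listing can be passed to restore():

```python
import re

# Suffixes that TensorFlow appends to a checkpoint prefix on disk.
_CKPT_SUFFIX = re.compile(r"\.(?:index|meta|data-\d{5}-of-\d{5})$")


def checkpoint_prefix(path: str) -> str:
    """Return the checkpoint prefix, dropping a trailing file suffix if present."""
    return _CKPT_SUFFIX.sub("", path)


print(checkpoint_prefix(
    "./model/coco_human/bisenetv2/human_train_miou=0.4369.ckpt-1.index"
))
```

A path that is already a bare prefix is returned unchanged.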

mychina75 commented on May 24, 2024

Hmm... still the same error: Not found: Key BiseNetV2/aggregation_branch/guided_aggregation_block/aggregation_features/aggregation_feature_output/bn/beta/Momentum not found in checkpoint

################
2020-05-26 09:39:59.213 | INFO | trainner.human_bisenetv2_multi_gpu_trainner:train:319 - => Restoring weights from: ./model/coco_human/bisenetv2/human_train_miou=0.4369.ckpt-1 ...
WARNING:tensorflow:From /opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2020-05-26 09:39:59.880135: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key BiseNetV2/aggregation_branch/guided_aggregation_block/aggregation_features/aggregation_feature_output/bn/beta/Momentum not found in checkpoint
2020-05-26 09:39:59.928 | ERROR | trainner.human_bisenetv2_multi_gpu_trainner:train:332 - Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

2 root error(s) found.
(0) Not found: Key BiseNetV2/aggregation_branch/guided_aggregation_block/aggregation_features/aggregation_feature_output/bn/beta/Momentum not found in checkpoint
[[node loader_and_saver/save/RestoreV2 (defined at /project/semantic_segmentation/bisenetv2-tensorflow-master/trainner/human_bisenetv2_multi_gpu_trainner.py:201) ]]
(1) Not found: Key BiseNetV2/aggregation_branch/guided_aggregation_block/aggregation_features/aggregation_feature_output/bn/beta/Momentum not found in checkpoint
[[node loader_and_saver/save/RestoreV2 (defined at /project/semantic_segmentation/bisenetv2-tensorflow-master/trainner/human_bisenetv2_multi_gpu_trainner.py:201) ]]
[[loader_and_saver/save/RestoreV2/_223]]

MaybeShewill-CV commented on May 24, 2024

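@mychina75 How was your ckpt file generated? Are you sure the ckpt file path was entered correctly? This error means the ckpt model file does not match the current computation graph :)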
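
A mismatch like this can be narrowed down by comparing the variable names the loader expects with the keys actually stored in the checkpoint. The sketch below is a hypothetical helper, not code from this repo; the names in the example are made up, and the commented lines show how the two lists might be collected in TF 1.x.

```python
from typing import Iterable, List


def missing_from_checkpoint(graph_names: Iterable[str],
                            ckpt_names: Iterable[str]) -> List[str]:
    """Return graph variable names that have no matching key in the checkpoint."""
    ckpt = set(ckpt_names)
    return sorted(name for name in graph_names if name not in ckpt)


# In TF 1.x the inputs could be gathered roughly like this (not run here):
#   ckpt_names  = [name for name, _shape in tf.train.list_variables(ckpt_prefix)]
#   graph_names = [v.op.name for v in loader_var_list]
graph_names = ["net/bn/beta", "net/bn/beta/Momentum"]  # made-up example names
ckpt_names = ["net/bn/beta"]
print(missing_from_checkpoint(graph_names, ckpt_names))
```

Keys ending in /Momentum are optimizer slot variables, so a non-empty result consisting only of such names suggests the checkpoint was saved without optimizer state.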

mychina75 commented on May 24, 2024

I have not changed how the model is saved. It is in xxx_gpu_trainner.py:
# define saver and loader
with tf.variable_scope('loader_and_saver'):
    self._net_var = [vv for vv in tf.global_variables() if 'lr' not in vv.name]
    self._loader = tf.train.Saver(self._net_var)
    self._saver = tf.train.Saver(max_to_keep=5)
The restore happens here:
if CFG.TRAIN.RESTORE_FROM_SNAPSHOT.ENABLE:
    try:
        LOG.info('=> Restoring weights from: {:s} ... '.format(self._initial_weight))
        self._loader.restore(self._sess, self._initial_weight)
        ...

Could this be related to the FREEZE_BN setting? It defaults to ENABLE: False.
The code checks it here:
# define moving average op
with tf.variable_scope(name_or_scope='moving_avg'):
    if CFG.TRAIN.FREEZE_BN.ENABLE:
        train_var_list = [
            v for v in tf.trainable_variables() if 'beta' not in v.name and 'gamma' not in v.name
        ]
    else:
        train_var_list = tf.trainable_variables()
Does this parameter need to be saved separately?

MaybeShewill-CV commented on May 24, 2024

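@mychina75 BN is not frozen by default. If you are using a ckpt file saved during training, this problem should not occur. If you are using a ckpt file saved during inference, it will occur. I have tried this myself before without any problem; I will test it again when I have time :)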

MaybeShewill-CV commented on May 24, 2024

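@mychina75 Also, could you describe in more detail the steps that reproduce your problem? For example, which parts of the code you modified, how you started training, how you saved the parameters, and how you started restoring the weights :)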

mychina75 commented on May 24, 2024

Solved. I changed this part of *_gpu_trainner.py. It seems some variables were not being saved; after the change the .meta file grew from 7.35MB to 9.09MB. It should not affect the pb file.
# define saver and loader
with tf.variable_scope('loader_and_saver'):
    self._net_var = [vv for vv in tf.global_variables() if 'lr' not in vv.name]
    self._loader = tf.train.Saver(self._net_var)
    self._saver = tf.train.Saver(tf.global_variables(), max_to_keep=5)
---------------------->
with tf.variable_scope('loader_and_saver'):
    self._net_var = [vv for vv in tf.global_variables() if 'lr' not in vv.name]
    self._loader = tf.train.Saver(self._net_var)
    self._saver = tf.train.Saver(max_to_keep=5)

MaybeShewill-CV commented on May 24, 2024

@mychina75 Okay :)
