additive-margin-softmax's Issues

Accuracy on LFW cannot be improved beyond 0.9

Hello, I have tried several combinations of parameters while running train.py, such as epoch_size = 200, max_epoch_num = 500, and keep_probability = 0.9, with weight_decay and batch_size set to 5e-4 and 256 as suggested. But the accuracy on the LFW dataset stays around 0.9, so I am hoping for some advice on the training parameters.
What's more, I wonder whether it is because I have not used any pre-trained model. I would be grateful for a reply soon.

Some problems about align_lfw.py

As mentioned in the README.md, I should align both the training set and LFW. Does this mean that I should run align_lfw.py twice? My settings are as follows:
Step1 --"python align_lfw.py --input-dir /DataSet/lfwdataset/lfw-deepfunneled --output-dir /home/myname/lfwdataset"
Step2 --"python train.py --data_dir /home/myname/lfwdataset --random_flip --learning_rate -1 --learning_rate_schedule_file ./data/learning_rate_AM_softmax.txt --lfw_dir /home/myrname/lfwdataset --keep_probability 0.8 --weight_decay 5e-4"
Are there any problems with this? Hoping for your reply soon, thank you~

can't load saved model

When the arg 'pretrained_model' is set to the path of a saved model, it complains with these errors:

Caused by op u'save/RestoreV2_54', defined at:
File "train.py", line 528, in
main(parse_arguments(sys.argv[1:]))
File "train.py", line 162, in main
saver_load = tf.train.Saver(tf.trainable_variables(), max_to_keep=1)
File "/home/td/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1218, in init
self.build()
File "/home/td/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1227, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/home/td/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1263, in _build
build_save=build_save, build_restore=build_restore)
File "/home/td/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 751, in _build_internal
restore_sequentially, reshape)
File "/home/td/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 427, in _AddRestoreOps
tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
File "/home/td/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 267, in restore_op
[spec.tensor.dtype])[0])
File "/home/td/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1021, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/home/td/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/td/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/home/td/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ./trained_model/20180409-183909/
[[Node: save/RestoreV2_54 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_54/tensor_names, save/RestoreV2_54/shape_and_slices)]]
[[Node: save/RestoreV2_7/_497 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_490_save/RestoreV2_7", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
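
The "Failed to find any matching files" part of this NotFoundError usually means --pretrained_model points at the model directory itself rather than at a checkpoint prefix inside it. A minimal sketch, assuming standard tf.train.Saver checkpoints written by train.py (the directory name is just the one from the error above, and the dummy variable stands in for the real network graph built before restoring):

    import tensorflow as tf

    # Hedged sketch, not the repo's exact code: resolve the checkpoint *prefix*
    # with tf.train.latest_checkpoint and restore from that, instead of passing
    # the bare directory (which triggers "Failed to find any matching files").
    model_dir = './trained_model/20180409-183909/'  # directory from the error above

    # Stand-in for the real variables that train.py builds before restoring.
    dummy = tf.get_variable('dummy_weight', shape=[3], initializer=tf.zeros_initializer())

    saver = tf.train.Saver(tf.trainable_variables(), max_to_keep=1)
    ckpt = tf.train.latest_checkpoint(model_dir)  # returns a checkpoint prefix, or None
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        if ckpt is not None:
            saver.restore(sess, ckpt)
        else:
            print('No checkpoint files found under %s' % model_dir)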

Pre-trained model

Dear @Joker316701882 ,

Could you share the TensorFlow pre-trained model so we can download and play with it? For example, the model trained on VGGFace2.
Thank you very much. Btw, what a great repository!

Error when running python train.py and testing with the LFW dataset

Traceback (most recent call last):
File "train.py", line 542, in
main(parse_arguments(sys.argv[1:]))
File "train.py", line 213, in main
label_batch, lfw_paths, actual_issame, args.lfw_batch_size, args.lfw_nrof_folds, log_dir, step, summary_writer,best_accuracy,saver_save,model_dir,subdir)
File "train.py", line 313, in evaluate
_, _, accuracy, val, val_std, far = lfw.evaluate(emb_array, actual_issame, nrof_folds=nrof_folds)
File "/home/yjzx/Downloads/Additive-Margin-Softmax/lfw.py", line 43, in evaluate
np.asarray(actual_issame), 1e-3, nrof_folds=nrof_folds)
File "/home/yjzx/Downloads/Additive-Margin-Softmax/facenet.py", line 566, in calculate_val
val[fold_idx], far[fold_idx] = calculate_val_far(threshold, dist[test_set], actual_issame[test_set])
File "/home/yjzx/Downloads/Additive-Margin-Softmax/facenet.py", line 580, in calculate_val_far
val = float(true_accept) / float(n_same)
far = float(false_accept) / float(n_diff)
ZeroDivisionError: float division by zero
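
This ZeroDivisionError means n_same or n_diff is zero in calculate_val_far, i.e. the evaluation was handed an empty (or single-class) set of LFW pairs; it is worth checking that --lfw_dir and --lfw_pairs point at the aligned LFW images. A hedged sketch of a guarded version of the function (close to, but not necessarily identical to, the facenet.py code in the traceback above):

    import numpy as np

    def calculate_val_far(threshold, dist, actual_issame):
        # Guarded variant: returns 0.0 instead of dividing by zero when there are
        # no genuine (n_same) or no impostor (n_diff) pairs in this fold.
        predict_issame = np.less(dist, threshold)
        true_accept = np.sum(np.logical_and(predict_issame, actual_issame))
        false_accept = np.sum(np.logical_and(predict_issame, np.logical_not(actual_issame)))
        n_same = np.sum(actual_issame)
        n_diff = np.sum(np.logical_not(actual_issame))
        val = float(true_accept) / float(n_same) if n_same > 0 else 0.0
        far = float(false_accept) / float(n_diff) if n_diff > 0 else 0.0
        return val, far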

about cos_theta

Hi:
I have a question: in order to get cos_theta, tf.matmul is used here. When I train a net, "nan" shows up in the log, so I am confused. tf.matmul gives a matrix result, while tf.multiply gives a scalar value.
Can I use tf.multiply here?
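
For reference, a minimal sketch of how cos_theta is typically formed in AM-Softmax (not necessarily the repo's exact code): with the embeddings and the class-weight columns both L2-normalized, one tf.matmul produces cos(theta) for every (sample, class) pair at once, which the softmax over all classes needs; tf.multiply is element-wise and would not give that matrix. NaNs in the log are usually a separate issue (e.g. learning rate or a missing epsilon in the normalization) rather than the matmul itself.

    import tensorflow as tf

    batch_size, embedding_size, nrof_classes = 32, 512, 10000  # illustrative sizes

    embeddings = tf.placeholder(tf.float32, [batch_size, embedding_size])
    weights = tf.get_variable('am_softmax_weights', [embedding_size, nrof_classes],
                              initializer=tf.contrib.layers.xavier_initializer())

    # L2-normalize features (rows) and class weights (columns); one matmul then
    # gives the full [batch_size, nrof_classes] matrix of cosine similarities.
    embeddings_norm = tf.nn.l2_normalize(embeddings, 1, 1e-10)
    weights_norm = tf.nn.l2_normalize(weights, 0, 1e-10)
    cos_theta = tf.matmul(embeddings_norm, weights_norm)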

Validation on LFW

Hi, @Joker316701882. I use CASIA-WebFace as the training set and the LFW dataset as the validation set. However, during training the accuracy on LFW is always 50% and the selected threshold is always 0. To find the problem, I read the source code and found that the Euclidean distance is used to measure the distance between the two embedded features. As far as I know, AM-Softmax is an angular-margin-based face recognition algorithm. My questions are:

  1. Is it OK to just use a Euclidean-based distance to evaluate? (See the sketch after this list.)
  2. If evaluating with a Euclidean-based distance is logically sound, what causes the fixed accuracy? Is there anything I forgot to configure before training?
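
On question 1: for L2-normalized embeddings, squared Euclidean distance and cosine similarity are related by a fixed monotone transform, so thresholding one is equivalent to thresholding the other, and a Euclidean-based evaluation is consistent with an angular loss. A small numpy check of the identity ||a - b||^2 = 2 - 2*cos(a, b):

    import numpy as np

    rng = np.random.RandomState(0)
    a = rng.randn(512)
    a /= np.linalg.norm(a)          # unit-length embedding a
    b = rng.randn(512)
    b /= np.linalg.norm(b)          # unit-length embedding b

    squared_dist = np.sum((a - b) ** 2)
    cosine = np.dot(a, b)
    print(squared_dist, 2.0 - 2.0 * cosine)   # the two printed values agree

A fixed 50% accuracy with a threshold of 0 therefore points elsewhere, e.g. embeddings collapsing to near-identical vectors or the LFW pairs not loading correctly, rather than the distance metric itself.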

How can I use the function AM_logits_compute?

I tried to use your resface.py as the inference network while using FaceNet as the training logic.
But I am not sure how to use the function AM_logits_compute: neither using it to replace prelogits_center_loss nor using it to replace the 'logits' parameter of tf.nn.sparse_softmax_cross_entropy_with_logits worked.
Can you tell me how to use the function AM_logits_compute? Thanks!
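
The intended wiring is that the AM-Softmax logits replace the ordinary logits fed to tf.nn.sparse_softmax_cross_entropy_with_logits; they are not an auxiliary loss term like prelogits_center_loss. Below is a generic AM-Softmax head as a hedged sketch (the argument names and the margin m / scale s defaults are assumptions of this sketch, not necessarily the repo's; check AM_logits_compute for its actual signature):

    import tensorflow as tf

    def am_logits(embeddings, labels, nrof_classes, m=0.35, s=30.0):
        # Generic AM-Softmax head: scaled cosine logits with an additive margin m
        # subtracted from the target class only.
        embedding_size = embeddings.get_shape().as_list()[1]
        kernel = tf.get_variable('am_kernel', [embedding_size, nrof_classes],
                                 initializer=tf.contrib.layers.xavier_initializer())
        embeddings_norm = tf.nn.l2_normalize(embeddings, 1, 1e-10)
        kernel_norm = tf.nn.l2_normalize(kernel, 0, 1e-10)
        cos_theta = tf.matmul(embeddings_norm, kernel_norm)     # [batch, nrof_classes]
        one_hot = tf.one_hot(labels, nrof_classes)
        return s * (cos_theta - m * one_hot)

    # Usage sketch inside a FaceNet-style train.py:
    #   logits = am_logits(prelogits, label_batch, nrof_classes)
    #   cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
    #       labels=label_batch, logits=logits)
    #   total_loss = tf.reduce_mean(cross_entropy)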

Training time suddenly increases at some point?

Epoch: [0][82/87866] Time 0.469 Loss 22.866 RegLoss 0.000000
Epoch: [0][83/87866] Time 0.468 Loss 22.917 RegLoss 0.000000
Epoch: [0][84/87866] Time 0.491 Loss 22.838 RegLoss 0.000000
Epoch: [0][85/87866] Time 0.453 Loss 22.843 RegLoss 0.000000
Epoch: [0][86/87866] Time 0.459 Loss 22.799 RegLoss 0.000000
Epoch: [0][87/87866] Time 0.459 Loss 22.761 RegLoss 0.000000
Epoch: [0][88/87866] Time 0.473 Loss 22.805 RegLoss 0.000000
Epoch: [0][89/87866] Time 0.466 Loss 22.893 RegLoss 0.000000
Epoch: [0][90/87866] Time 0.456 Loss 22.780 RegLoss 0.000000
Epoch: [0][91/87866] Time 0.457 Loss 22.721 RegLoss 0.000000
Epoch: [0][92/87866] Time 0.484 Loss 22.789 RegLoss 0.000000
Epoch: [0][93/87866] Time 0.465 Loss 22.911 RegLoss 0.000000
Epoch: [0][94/87866] Time 0.470 Loss 22.807 RegLoss 0.000000
Epoch: [0][95/87866] Time 0.483 Loss 22.808 RegLoss 0.000000
Epoch: [0][96/87866] Time 0.459 Loss 22.879 RegLoss 0.000000
Epoch: [0][97/87866] Time 0.459 Loss 22.673 RegLoss 0.000000
Epoch: [0][98/87866] Time 0.457 Loss 22.901 RegLoss 0.000000
Epoch: [0][99/87866] Time 0.494 Loss 22.780 RegLoss 0.000000
Epoch: [0][100/87866] Time 0.471 Loss 22.781 RegLoss 0.000000
Epoch: [0][101/87866] Time 0.458 Loss 22.840 RegLoss 0.000000
Epoch: [0][102/87866] Time 0.480 Loss 22.848 RegLoss 0.000000
Epoch: [0][103/87866] Time 0.470 Loss 22.788 RegLoss 0.000000
Epoch: [0][104/87866] Time 0.461 Loss 22.811 RegLoss 0.000000
Epoch: [0][105/87866] Time 0.471 Loss 22.855 RegLoss 0.000000
Epoch: [0][106/87866] Time 0.467 Loss 22.886 RegLoss 0.000000
Epoch: [0][107/87866] Time 0.453 Loss 22.856 RegLoss 0.000000
Epoch: [0][108/87866] Time 0.462 Loss 22.899 RegLoss 0.000000
Epoch: [0][109/87866] Time 0.490 Loss 22.719 RegLoss 0.000000
Epoch: [0][110/87866] Time 0.460 Loss 22.970 RegLoss 0.000000
Epoch: [0][111/87866] Time 0.457 Loss 22.761 RegLoss 0.000000
Epoch: [0][112/87866] Time 0.468 Loss 22.725 RegLoss 0.000000
Epoch: [0][113/87866] Time 0.479 Loss 22.768 RegLoss 0.000000
Epoch: [0][114/87866] Time 0.462 Loss 22.871 RegLoss 0.000000
Epoch: [0][115/87866] Time 0.456 Loss 22.947 RegLoss 0.000000
Epoch: [0][116/87866] Time 0.472 Loss 22.737 RegLoss 0.000000
Epoch: [0][117/87866] Time 0.472 Loss 22.790 RegLoss 0.000000
Epoch: [0][118/87866] Time 0.454 Loss 22.665 RegLoss 0.000000
Epoch: [0][119/87866] Time 0.453 Loss 22.742 RegLoss 0.000000
Epoch: [0][120/87866] Time 1.965 Loss 22.788 RegLoss 0.000000
Epoch: [0][121/87866] Time 1.992 Loss 22.809 RegLoss 0.000000
Epoch: [0][122/87866] Time 1.956 Loss 22.791 RegLoss 0.000000
Epoch: [0][123/87866] Time 2.076 Loss 22.890 RegLoss 0.000000
Epoch: [0][124/87866] Time 2.076 Loss 22.677 RegLoss 0.000000
Epoch: [0][125/87866] Time 2.029 Loss 22.981 RegLoss 0.000000
Epoch: [0][126/87866] Time 2.021 Loss 22.807 RegLoss 0.000000
Epoch: [0][127/87866] Time 1.953 Loss 22.661 RegLoss 0.000000
Epoch: [0][128/87866] Time 1.979 Loss 22.750 RegLoss 0.000000
Epoch: [0][129/87866] Time 2.029 Loss 22.915 RegLoss 0.000000

Bug when training with LFW

Additive-Margin-Softmax: training with LFW evaluation gives the following error:
$python train.py --data_dir out_data1 --random_flip --learning_rate -1 --learning_rate_schedule_file ./data/learning_rate_AM_softmax.txt --lfw_dir data/lfw --keep_probability 0.8 --weight_decay 5e-4

(error output attached as images in the original issue; not captured in this text export)
Thank you!
molyswu

Hi, another question

Hi, I use MS1M as the training dataset, but I can only get acc = 0.73. Alignment was done on the dataset, and the image size is 224.

Are there any tricks here?

ValueError on training

I'm getting the following error while training. How can I sort it out?

ValueError: Cannot have number of splits n_splits=10 greater than the number of samples: 0.
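
This ValueError comes from sklearn's KFold being handed zero samples, which in this code path means no LFW pairs were loaded at all, most often because --lfw_dir does not contain aligned images whose names match pairs.txt. A hedged sanity-check sketch (both paths below are placeholders):

    import os

    lfw_dir = '/path/to/aligned/lfw'   # placeholder for the --lfw_dir argument
    pairs_file = 'data/pairs.txt'      # placeholder for the --lfw_pairs argument

    found, missing = 0, 0
    with open(pairs_file) as f:
        next(f)                        # skip the "10 300" header line
        for line in f:
            p = line.strip().split()
            if len(p) < 3:
                continue
            # matched pair: name n1 n2; mismatched pair: name1 n1 name2 n2
            refs = [(p[0], p[1]), (p[0], p[2])] if len(p) == 3 else [(p[0], p[1]), (p[2], p[3])]
            for name, idx in refs:
                stem = os.path.join(lfw_dir, name, '%s_%04d' % (name, int(idx)))
                if os.path.exists(stem + '.png') or os.path.exists(stem + '.jpg'):
                    found += 1
                else:
                    missing += 1
    print('pair images found: %d, missing: %d' % (found, missing))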

about the bias initialization in insightface and resface

First of all, I appreciate your excellent work very much. What confuses me, though, is why you initialize the bias term of the last FC layer in insightface.py using xavier_init() while in resface.py it is initialized to 0 (see the illustration below).
Is there any explanation?

thanks a lot
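
For readers of the thread, an illustration only (not the repo's exact layers) of the two bias initializations being contrasted, written with tf.contrib.slim:

    import tensorflow as tf
    import tensorflow.contrib.slim as slim

    net = tf.placeholder(tf.float32, [None, 512])   # stand-in for the penultimate features

    # Bias initialized with Xavier (as the question says insightface.py does):
    fc_xavier = slim.fully_connected(net, 512, activation_fn=None, scope='fc_xavier',
                                     biases_initializer=tf.contrib.layers.xavier_initializer())

    # Bias initialized to zero (as the question says resface.py does):
    fc_zeros = slim.fully_connected(net, 512, activation_fn=None, scope='fc_zeros',
                                    biases_initializer=tf.zeros_initializer())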

TypeError: Fetch argument None has invalid type <class 'NoneType'>

What happened, and how do I solve it? Thanks.
I want to train this model on my own dataset. I only changed the dataset path, but the error is:

2018-11-22 17:13:28.944300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-11-22 17:13:29.488915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-22 17:13:29.488996: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2018-11-22 17:13:29.489019: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2018-11-22 17:13:29.489518: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 18335 MB memory) -> physical GPU (device: 0, name: Tesla P40, pci bus id: 0000:af:00.0, compute capability: 6.1)
WARNING:tensorflow:Error encountered when serializing regularization_losses.
Type is unsupported, or the types of the items don't match field type in CollectionDef.
'NoneType' object has no attribute 'name'
Running training
training a epoch...
Traceback (most recent call last):
File "train.py", line 519, in
main(parse_arguments(sys.argv[1:]))
File "train.py", line 182, in main
total_loss, train_op, summary_op, summary_writer, regularization_losses, args.learning_rate_schedule_file)
File "train.py", line 248, in train
err, _, step, reg_loss, summary_str = sess.run([loss, train_op, global_step, regularization_losses, summary_op], feed_dict=feed_dict)
File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 905, in run
run_metadata_ptr)
File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1125, in _run
self._graph, fetches, feed_dict_tensor, feed_handles=feed_handles)
File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 427, in init
self._fetch_mapper = _FetchMapper.for_fetch(fetches)
File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 245, in for_fetch
return _ListFetchMapper(fetch)
File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 352, in init
self._mappers = [_FetchMapper.for_fetch(fetch) for fetch in fetches]
File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 352, in
self._mappers = [_FetchMapper.for_fetch(fetch) for fetch in fetches]
File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 245, in for_fetch
return _ListFetchMapper(fetch)
File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 352, in init
self._mappers = [_FetchMapper.for_fetch(fetch) for fetch in fetches]
File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 352, in
self._mappers = [_FetchMapper.for_fetch(fetch) for fetch in fetches]
File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 242, in for_fetch
type(fetch)))
TypeError: Fetch argument None has invalid type <class 'NoneType'>
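
The earlier warning "Error encountered when serializing regularization_losses ... 'NoneType' object has no attribute 'name'" and this TypeError both point to a None entry inside the fetch list handed to sess.run, here among the collected regularization losses. A hedged sketch of a guard (variable names mirror train.py, but this is not the repo's exact fix):

    import tensorflow as tf

    # Drop any None entries from the regularization-loss collection and collapse
    # the rest into a single scalar, so every fetch passed to sess.run is a real
    # tensor even when the custom dataset/model registers no regularizers.
    reg_losses = [l for l in tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
                  if l is not None]
    regularization_losses = tf.add_n(reg_losses) if reg_losses else tf.constant(0.0)
    # ... then fetch `regularization_losses` in sess.run as before.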

Trouble with loading the checkpoint

Hello! @Joker316701882
I trained the resface20 model with the VGGFace2 dataset, and I want to restore the model checkpoint with the following code, but it throws an error.

    import tensorflow as tf
    # `network` is the repo's resface module; `sess` and `save_path` are defined
    # earlier in my loading script.
    image_batch = tf.placeholder(tf.float32, shape=(None, 112, 96, 3), name='input')
    prelogits, _ = network.inference(image_batch, 1.0, phase_train=False,
                                     bottleneck_layer_size=512, weight_decay=0.0)
    embeddings = tf.nn.l2_normalize(prelogits, 1, 1e-10, name='embeddings')
    saver = tf.train.Saver()
    saver.restore(sess, tf.train.latest_checkpoint(save_path))

The error is:

NotFoundError: Key Resface/Bottleneck/BatchNorm/moving_mean not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
[[Node: save/RestoreV2/_133 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_138_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

I didn't change any code, and I trained the model just following the README.md instructions.

Please tell me how to deal with it. Thank you very much.
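
"Key Resface/Bottleneck/BatchNorm/moving_mean not found in checkpoint" means the rebuilt inference graph expects batch-norm moving statistics that the checkpoint simply does not contain (for example, a Saver built only from tf.trainable_variables() never saves the non-trainable moving_mean/moving_variance). A hedged diagnostic sketch that lists what a checkpoint actually holds (the path is a placeholder):

    import tensorflow as tf

    save_path = '/path/to/trained_model_dir'          # placeholder
    ckpt_path = tf.train.latest_checkpoint(save_path)
    if ckpt_path is None:
        print('no checkpoint found under %s' % save_path)
    else:
        reader = tf.train.NewCheckpointReader(ckpt_path)
        # Print every variable name stored in the checkpoint; compare with the
        # names the restored graph asks for (e.g. Resface/Bottleneck/BatchNorm/...).
        for name in sorted(reader.get_variable_to_shape_map()):
            print(name)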

About the Network backbone of the program

The ResNet-20 network used in this program does not contain the 7x7 convolution layer or the 3x3 max-pooling, or did I miss them somewhere? If you left them out on purpose, why?

error:'_5_batch_join/fifo_queue' is closed and has insufficient elements (requested 256, current size 0)

Hi, I ran python train.py with the LFW dataset and got the following error:
OutOfRangeError (see above for traceback): FIFOQueue '_5_batch_join/fifo_queue' is closed and has insufficient elements (requested 256, current size 0)
[[Node: batch_join = QueueDequeueUpToV2[component_types=[DT_FLOAT, DT_INT64], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch_join/fifo_queue, _arg_batch_size_0_0)]]

Thanks for your help
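
An empty, closed FIFOQueue at the very first batch usually means the enqueue threads died before producing anything, typically because --data_dir has no class sub-folders with images or some file fails to decode. A hedged sketch that walks the aligned dataset and reports unreadable images (the directory is a placeholder, and PIL is used here only as a generic decoder for the check):

    import os
    from PIL import Image

    data_dir = '/path/to/aligned/training/set'   # placeholder for --data_dir
    bad = []
    for root, _, files in os.walk(data_dir):
        for fname in files:
            if fname.lower().endswith(('.png', '.jpg', '.jpeg')):
                path = os.path.join(root, fname)
                try:
                    Image.open(path).load()      # force a full decode
                except Exception:
                    bad.append(path)
    print('unreadable images: %d' % len(bad))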

some question about the model

Hi, I want to know more about the news item from 2018/08/29.
You mentioned that the "latest experiment -- Resface20(bn) + vggface2 + weight_decay 5e-4 + batch_size 256 + momentum achieves 0.995+-0.003". I think Resface20 is the model you used, but what does "vggface2" mean here? Does it mean a pre-trained model?
Grateful for a reply soon, thanks~
