potterhsu / svhnclassifier

A TensorFlow implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks (http://arxiv.org/pdf/1312.6082.pdf)

License: GNU General Public License v3.0

svhnclassifier's Introduction

SVHNClassifier

A TensorFlow implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

Graph

Results

Accuracy

93.45% accuracy on the test dataset after about 14 hours of training

Loss

Samples

Training	Test

Inference of outside image

A predicted digit of "10" means there is no digit at that position
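
As a minimal sketch of how that convention can be read, assume the model returns a predicted length plus five per-position class indices, with class 10 standing for "no digit". The helper name `decode_prediction` is hypothetical and not part of this repository.

```python
NO_DIGIT = 10  # class index assumed to mark an empty position


def decode_prediction(length, digits):
    """Turn a predicted length and per-position classes into a string."""
    # Keep only the first `length` positions and drop any padding class.
    kept = [d for d in digits[:length] if d != NO_DIGIT]
    return "".join(str(d) for d in kept)


print(decode_prediction(2, [1, 5, 10, 10, 10]))  # prints 15
```

Positions past the predicted length are ignored, so trailing padding never reaches the output string.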

Requirements

Python 2.7
TensorFlow

h5py

In Ubuntu:
$ sudo apt-get install libhdf5-dev
$ sudo pip install h5py

Setup

Clone the source code

$ git clone https://github.com/potterhsu/SVHNClassifier
$ cd SVHNClassifier

Download SVHN Dataset format 1

Extract it into the data folder; your folder structure should then look like this:

SVHNClassifier
    - data
        - extra
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
        - test
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
        - train
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
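
A quick way to confirm the layout before converting is sketched below. It assumes only the structure shown above (three splits, each with numbered PNG files and a `digitStruct.mat`). The function name `missing_entries` is hypothetical and not part of this repository.

```python
from pathlib import Path


def missing_entries(data_dir):
    """Return the expected SVHN paths that are absent under data_dir."""
    root = Path(data_dir)
    missing = []
    for split in ("extra", "test", "train"):
        mat = root / split / "digitStruct.mat"
        if not mat.is_file():
            missing.append(str(mat))
        # Each split should also contain at least one PNG image.
        if not any((root / split).glob("*.png")):
            missing.append(str(root / split / "*.png"))
    return missing
```

Calling `missing_entries("./data")` from the repository root returns an empty list when all three splits are in place.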

Usage

(Optional) Take a glance at original images with bounding boxes
```
Open `draw_bbox.ipynb` in Jupyter
```

Convert to TFRecords format

$ python convert_to_tfrecords.py --data_dir ./data

(Optional) Test for reading TFRecords files

Open `read_tfrecords_sample.ipynb` in Jupyter
Open `donkey_sample.ipynb` in Jupyter

Train

$ python train.py --data_dir ./data --train_logdir ./logs/train

Retrain if needed

$ python train.py --data_dir ./data --train_logdir ./logs/train2 --restore_checkpoint ./logs/train/latest.ckpt

Evaluate

$ python eval.py --data_dir ./data --checkpoint_dir ./logs/train --eval_logdir ./logs/eval

Visualize
```
$ tensorboard --logdir ./logs
```

(Optional) Try to make an inference

Open `inference_sample.ipynb` in Jupyter
Open `inference_outside_sample.ipynb` in Jupyter
$ python inference.py --image /path/to/image.jpg --restore_checkpoint ./logs/train/latest.ckpt

Clean

$ rm -rf ./logs
or
$ rm -rf ./logs/train2
or
$ rm -rf ./logs/eval

svhnclassifier's Issues

How to make a web server ?

I have successfully run this project and fine-tuned it on my custom data. Thanks very much, @potterhsu!

However, I don't know how to turn it into a web server that exposes an interface which receives an image as input and returns the recognized result.

I wrote a function based on inference.py. I modified very little, just changing it from a script to a function.

Unfortunately, the script (inference.py) runs well while the function based on it crashes. The error message is:

F tensorflow/stream_executor/cuda/cuda_driver.cc: current context was not created by the StreamExecutor cuda_driver API: 0x36dc170; a CUDA runtime call was likely performed without using a StreamExecutor context
[1] 30570 abort (core dumped) python host.py

I'm new to TensorFlow. Can you help me? Thank you very much!

Can you share the checkpoints as I don't have the resources to train the model

read and convert svhn dataset

Hello potterhsu,

I noticed the following in the convert_to_tfrecords.py file:

for idx, label_of_digit in enumerate(label_of_digits):
    digits[idx] = int(label_of_digit if label_of_digit != 10 else 0)  # label 10 is essentially digit zero

Is this really correct? Zero is an actual digit, while 10 means no digit at all.
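
One way to see why both uses of 10 can coexist is sketched below. It assumes the usual SVHN convention in which the annotation files store the digit zero as label 10, and it assumes a fixed target of five positions padded with class 10 for "no digit". The names `encode_labels`, `NO_DIGIT` and `MAX_DIGITS` are illustrative, not taken from the repository.

```python
NO_DIGIT = 10   # padding class in the training target (assumption)
MAX_DIGITS = 5  # assumed fixed number of digit positions


def encode_labels(raw_labels):
    """Map raw SVHN labels to (length, fixed-size digit targets)."""
    length = len(raw_labels)
    digits = [NO_DIGIT] * MAX_DIGITS  # start fully padded
    for idx, label in enumerate(raw_labels[:MAX_DIGITS]):
        # Raw label 10 encodes the digit zero in the dataset.
        digits[idx] = 0 if int(label) == 10 else int(label)
    return length, digits


print(encode_labels([1, 10, 5]))  # prints (3, [1, 0, 5, 10, 10])
```

Under these assumptions the raw label 10 is rewritten to 0 before it reaches the target, so the only 10s left in the target are padding.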

issue about model.py/dropout = tf.layers.dropout(pool, rate=drop_rate)

Hello, when I trained the model on my own dataset, I found that this statement in model.py may have a problem:
dropout = tf.layers.dropout(pool, rate=drop_rate)
According to the TensorFlow documentation, tf.layers.dropout() takes a training argument that defaults to False. Your code does not set it, so even though rate is set, it seems the model never actually applies dropout?
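
The effect of that flag can be illustrated without TensorFlow. The sketch below is a plain-Python stand-in for inverted dropout, written to mirror the documented behaviour that `training=False` returns the input untouched. It is an illustration only, not the library's implementation.

```python
import random


def dropout(values, rate, training=False, rng=random):
    """Inverted dropout over a list of floats (illustrative stand-in)."""
    if not training or rate == 0.0:
        return list(values)  # inference mode: input passes through unchanged
    keep = 1.0 - rate
    # Keep each value with probability `keep` and rescale the survivors.
    return [v / keep if rng.random() < keep else 0.0 for v in values]


print(dropout([1.0, 2.0, 3.0], rate=0.5))                 # unchanged
print(dropout([1.0, 2.0, 3.0], rate=0.5, training=True))  # some zeros, rest doubled
```

With the default `training=False`, the `rate` argument has no effect, which is the behaviour the issue describes.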

How long did it take to train the model?

What were the hardware requirements for training the model? How long did it take?
Do you think this model could be trained on a PC with 8 GB of RAM?

inference_outside_sample.ipynb reshape

Hi, I found out that using tf.reshape causes errors. For this particular file, there is no need to use tf.reshape. Simply use tf.image.resize_images(image, [54, 54])

finetune doesn't converge

I trained the model on the SVHN dataset and got around 80% accuracy on my own dataset, so I decided to fine-tune it on my dataset.

train_layers = ['hidden10', 'digit_length', 'digit1', 'digit2', 'digit3', 'digit4']
fine_tune_var_list = [v for v in tf.trainable_variables() if v.name.split('/')[0] in train_layers]
train_op = optimizer.minimize(loss, global_step=global_step, var_list=fine_tune_var_list)

I tried learning rates from 1e-2 to 1e-5, but the accuracy always stays around 80% with a loss between 1 and 2.
How can I make it perform better?

Change input size

I saw that the code saves images with a shape of 64x64 pixels into TFRecords.
Then donkey.py reshapes data first into 64x64 then it extracts a random crop of 54x54.

I tried to change these dimensions because I want to use a larger input size.
I changed the line:
image = tf.reshape(image, [64, 64, 3])
to
image = tf.reshape(image, [128, 128, 3])

But I get the following error:

ValueError: Dimensions must be equal, but are 128 and 32 for 'sparse_softmax_cross_entropy_loss/xentropy/xentropy' (op: 'SparseSoftmaxCrossEntropyWithLogits') with input shapes: [128,7], [32].

How can I achieve my goal?
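
One possible reading of the `[128,7]` versus `[32]` mismatch is that the model flattens its last feature map with a hard-coded width. The arithmetic below is a sketch under that assumption: the flatten width `4 * 4 * 192` and the `8 x 8` feature map for the larger input are guesses for illustration, not values confirmed by this page.

```python
def rows_after_flatten(batch, height, width, channels, flat_width):
    """Row count produced by reshaping a feature map to [-1, flat_width]."""
    total = batch * height * width * channels
    if total % flat_width:
        raise ValueError("element count is not divisible by flat_width")
    return total // flat_width


FLAT_WIDTH = 4 * 4 * 192  # assumed hard-coded flatten width

print(rows_after_flatten(32, 4, 4, 192, FLAT_WIDTH))  # prints 32
print(rows_after_flatten(32, 8, 8, 192, FLAT_WIDTH))  # prints 128
```

If that is what happens, a batch of 32 larger images would be silently reshaped into 128 rows, which no longer matches the 32 labels. The flatten width (and the TFRecords image size) would then need to change together with the input size.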

Maxout units and subtractive normalization

Hello @potterhsu,

first I want to thank you for your implementation, it is doing great for me so far.

As I was reading the original paper by Goodfellow et al. I noticed that they mention the use of maxout units for the first hidden layer and relu elsewhere. Furthermore they use subtractive normalization. In your implementation, however, you use batch_normalization and relu activation functions on all layers. So, did I miss something or what were your reasons for those modifications? Thanks.

ValueError: invalid literal for int() with base 10: '

Is it possible to access a trained model somewhere?

Thanks

How to prepare and train my custom data ?

how to solve data imbalance

Hi,

What if I have more diverse samples in the dataset for certain digits and less diverse samples for the rest? In the current implementation, and also in the paper, it seems that the digit classifiers do not transfer knowledge between each other with regard to digit recognition. Are there any approaches to tackle this kind of data imbalance? Or could any transfer learning be done here? Many thanks.

Will it work for characters?

I want to recognize images that contain short strings of letters as well as digits, such as "A25" or "B03". Could this architecture work in that situation?

inference problem

Hi @potterhsu,
I have a very odd problem. I trained my model and got very good accuracy, but when I run inference on my images with a batch size lower than 16 (e.g. 1), I get random numbers, mostly 9.
Can you please help me?

How to use this project on windows

I tried to train this project on Windows.
The original version targets Linux and Python 2.
I changed some code so that the project runs on Windows with Python 3.

I hope this helps anyone who needs to run it on Windows.

Here are the changes (Python 2 -> Python 3, Linux -> Windows):

Every print statement, from
print '123'
to
print('123')

convert_to_tfrecords.py line 64, from
index = int(path_to_image_file.split("/")[-1].split('.')[0]) - 1
to
index = int(path_to_image_file.split("\\")[-1].split('.')[0]) - 1

Reason:
Linux paths use "/" but Windows paths use "\".

evaluator.py line 53, from
for _ in xrange(num_batches):
to
for _ in range(int(num_batches)):

Reason:
evaluator.py line 12 computes
num_batches = num_examples / batch_size
In Python 2, dividing two ints yields an int, but in Python 3 it yields a float.
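
The two portability points above can also be handled in a way that works on both platforms. The sketch below uses hypothetical helper names (`image_index`, `num_full_batches`) to show the idea. It is not code from the repository.

```python
import ntpath


def image_index(path_to_image_file):
    """Zero-based index from a path such as data/train/12.png."""
    # ntpath.basename splits on both "\\" and "/", so it copes with
    # Windows-style and POSIX-style paths alike.
    filename = ntpath.basename(path_to_image_file)
    return int(filename.split(".")[0]) - 1


def num_full_batches(num_examples, batch_size):
    # Floor division returns an int on both Python 2 and Python 3.
    return num_examples // batch_size


print(image_index("data\\train\\12.png"))  # prints 11
print(image_index("data/train/12.png"))    # prints 11
print(num_full_batches(100, 32))           # prints 3
```

Using `//` removes the need for the `int(...)` cast, and taking the basename avoids hard-coding either separator.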

Export the checkpoints to .pb format graph

Hello, for some reason I have to freeze the variables from the checkpoint files and export the graph to a .pb file. After studying the graph structure, I exported the .pb file successfully using the following code:

output_node_names = ['digit_length/dense/BiasAdd','stack']
with tf.Session() as sess:
    # Restore the graph
    saver = tf.train.import_meta_graph(FLAGS.meta_path)

    # Load Weight
    saver.restore(sess, FLAGS.checkpoint_path)

    # Freeze the graph
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, output_node_names)

    # Save the frozen graph
    with open('output/svhnclassification.pb', 'wb') as output:
        output.write(frozen_graph_def.SerializeToString())

But I don't know how to load this .pb file and feed an image as a NumPy array into the graph to get the prediction. Could you help me?

Thanks

How to access/interpret the confidence values?

Hello potterhsu,

The digits_predictions are obtained via:
digits_predictions = tf.argmax(digits_logits, axis=2)

But how can one access the confidence values themselves, so that confidence thresholding can be applied as mentioned in the paper (Goodfellow et al.)?

In my case digits_logits is, for example, a tensor of shape (1, 5, 11). If I understand correctly, axis 1 corresponds to the position of a digit and axis 2 contains 11 (confidence?) values for each digit (0-9, plus 10 for no digit), so that tf.argmax picks the class with the highest value. But when I looked into it, the values ranged from -16.5 to 11.5. If those are indeed the confidence values, how can one normalize them to a range between 0 and 1? Can you help me out?
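
Raw logits like these are commonly mapped to the range 0 to 1 with a softmax over the class axis. In TensorFlow that operation is available as `tf.nn.softmax`. The plain-Python sketch below shows the same computation for one image. The helper names are illustrative, and applying a threshold to the result is an assumption about how one might use it, not the paper's exact procedure.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def digit_confidences(digits_logits):
    """For each position, return (predicted class, its probability)."""
    results = []
    for row in digits_logits:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((best, probs[best]))
    return results


# Two positions with three classes each, just to keep the example short.
for cls, conf in digit_confidences([[0.0, 2.0, -1.0], [11.5, -16.5, 0.0]]):
    print(cls, round(conf, 4))
```

The predicted class is the same as the one `argmax` would pick on the logits, since softmax preserves ordering.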

Tensorboard View Scalars

Hi @potterhsu, thanks so much for documenting this, it has been very helpful to walk through. I had a question about running TensorBoard - in the training code I thought I saw references to summary = tf.summary.merge_all() which should send the scalar/image values to tb.

When I ran TensorBoard as specified, while I saw the graph structure loaded correctly, I couldn't see any graphs of the accuracy/loss over time. Do they only appear after quite some time? I have only been training for less than an hour so far.

Help with a fork of the model that doesn't converge

Hi! Thanks very much for this. I was able to get the model to train on a Google Cloud machine in about 15 hours.

I've created a fork (https://github.com/jsomers/ChessClassifier) whose goal is to recognize images of chess boards.

The idea is to say which pieces are in which square. So instead of the network terminating in 5 softmaxes of size 10, one for each of the digits, it terminates in 64 softmaxes of size 13 (13 because there are 6 possible black pieces, 6 possible white pieces, and the "null" piece, i.e., an empty square). I've removed the node that outputs a "length" because my length is always 64.

Unfortunately the model runs but doesn't train. The training accuracy is reported as "0.00000" and the loss hovers around ~60-65 indefinitely.

Is there any chance you might take a few minutes to discuss how I could get the model to converge? I have it on good authority that this problem is solvable, but perhaps my current model is broken in some way. My email address is listed on my website, which is listed on my Github profile. Thank you!

Freezing on Flask or gRPC server

I am trying to have a Flask server infer digits from an input image.

So I added server.py and implemented a view function that receives an image via POST and sends the digit inference result back.

But the server freezes whenever the function starts inference.

More precisely, the freeze starts at line 20 in model.py (self._hidden1 = nn.Sequential(...)).

The same freeze also occurs in my gRPC server.

I am guessing that torch.jit.ScriptModule, the superclass of the model, is causing something, but I am a PyTorch beginner and it is too hard for me to solve.

Can somebody help me with a solution or a hint?

About the batchnorm layer

A batchnorm layer should behave differently in the training phase and the testing phase, but the code does not seem to implement this. Could that be an issue?
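
The difference between the two phases can be sketched in plain Python. The function below is a simplified stand-in (scalar features, scale 1 and offset 0) and not the repository's layer. It only illustrates what happens if batch statistics are used at inference time with a very small batch.

```python
import math

EPS = 1e-5


def batch_norm(batch, moving_mean, moving_var, training):
    """Normalize a batch of scalars; scale=1 and offset=0 for brevity."""
    if training:
        # Training phase: statistics come from the current batch.
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
    else:
        # Testing phase: statistics come from the stored moving averages.
        mean, var = moving_mean, moving_var
    return [(x - mean) / math.sqrt(var + EPS) for x in batch]


print(batch_norm([3.0], moving_mean=1.0, moving_var=4.0, training=True))
print(batch_norm([3.0], moving_mean=1.0, moving_var=4.0, training=False))
```

With a single example in training mode the input is normalized against itself and collapses to zero, whereas the moving averages give a value close to 1.0 here. If the model does run batchnorm in training mode during inference, that would be consistent with the small-batch symptom reported in the "inference problem" issue, although this page does not confirm the cause.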