
svhnclassifier's Introduction

SVHNClassifier

A TensorFlow implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

Graph

[Image: Graph]

Results

Accuracy

[Image: Accuracy]

Accuracy: 93.45% on the test dataset after about 14 hours of training

Loss

[Image: Loss]

Samples

Training            Test
[Image: Train1]     [Image: Test1]
[Image: Train2]     [Image: Test2]

Inference of outside image

A digit value of "10" means "no digit".

Requirements

  • Python 2.7

  • TensorFlow

  • h5py

    In Ubuntu:
    $ sudo apt-get install libhdf5-dev
    $ sudo pip install h5py
    

Setup

  1. Clone the source code

    $ git clone https://github.com/potterhsu/SVHNClassifier
    $ cd SVHNClassifier
    
  2. Download SVHN Dataset format 1

  3. Extract to the data folder. Your folder structure should now look like this:

    SVHNClassifier
        - data
            - extra
                - 1.png 
                - 2.png
                - ...
                - digitStruct.mat
            - test
                - 1.png 
                - 2.png
                - ...
                - digitStruct.mat
            - train
                - 1.png 
                - 2.png
                - ...
                - digitStruct.mat
    

Usage

  1. (Optional) Take a glance at original images with bounding boxes

    Open `draw_bbox.ipynb` in Jupyter
    
  2. Convert to TFRecords format

    $ python convert_to_tfrecords.py --data_dir ./data
    
  3. (Optional) Test for reading TFRecords files

    Open `read_tfrecords_sample.ipynb` in Jupyter
    Open `donkey_sample.ipynb` in Jupyter
    
  4. Train

    $ python train.py --data_dir ./data --train_logdir ./logs/train
    
  5. Retrain if needed

    $ python train.py --data_dir ./data --train_logdir ./logs/train2 --restore_checkpoint ./logs/train/latest.ckpt
    
  6. Evaluate

    $ python eval.py --data_dir ./data --checkpoint_dir ./logs/train --eval_logdir ./logs/eval
    
  7. Visualize

    $ tensorboard --logdir ./logs
    
  8. (Optional) Try to make an inference

    Open `inference_sample.ipynb` in Jupyter
    Open `inference_outside_sample.ipynb` in Jupyter
    $ python inference.py --image /path/to/image.jpg --restore_checkpoint ./logs/train/latest.ckpt
    
  9. Clean

    $ rm -rf ./logs
    or
    $ rm -rf ./logs/train2
    or
    $ rm -rf ./logs/eval
    

svhnclassifier's People

Contributors

potterhsu


svhnclassifier's Issues

How to make a web server ?

I have successfully run this project and fine-tuned it with my custom data. Thanks very much, @potterhsu!

However, I don't know how to turn it into a web server that provides an interface which receives an image as an input parameter and returns the recognized result.

I wrote a function based on inference.py. I modified very little, just changing it from a script to a function.

Unfortunately, the script (inference.py) runs well while the function based on it crashes. The error message is:

F tensorflow/stream_executor/cuda/cuda_driver.cc: current context was not created by the StreamExecutor cuda_driver API: 0x36dc170; a CUDA runtime call was likely performed without using a StreamExecutor context
[1] 30570 abort (core dumped) python host.py

I'm a newcomer to TensorFlow. Can you help me? Thank you very much!

read and convert svhn dataset

Hello potterhsu,

I noticed the following in the convert_to_tfrecords.py file:

for idx, label_of_digit in enumerate(label_of_digits):
    digits[idx] = int(label_of_digit if label_of_digit != 10 else 0)  # label 10 is essentially digit zero

Is this really correct? Zero is an actual digit, while 10 means no digit at all.
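For context, the SVHN format 1 annotations store the digit zero under label 10, while this model reserves class 10 for "no digit" (see the README note on inference). The two meanings live in different places: one in the raw labels, the other in the model targets. Below is a minimal sketch of that mapping, assuming the target list is padded to five positions with 10. The helper names `encode_labels` and `decode_digits` are hypothetical and are not taken from the repository.

```python
NO_DIGIT = 10    # class the model uses for "no digit at this position"
MAX_DIGITS = 5   # assumed maximum sequence length


def encode_labels(raw_labels):
    """Map raw SVHN labels (zero stored as 10) to padded model targets."""
    if len(raw_labels) > MAX_DIGITS:
        raise ValueError("sequence longer than %d digits" % MAX_DIGITS)
    digits = [NO_DIGIT] * MAX_DIGITS
    for idx, label in enumerate(raw_labels):
        # In the raw annotations, label 10 stands for the digit zero.
        digits[idx] = 0 if int(label) == 10 else int(label)
    return len(raw_labels), digits


def decode_digits(length, digits):
    """Turn a (length, padded digits) pair back into a string."""
    return "".join(str(d) for d in digits[:length])


length, digits = encode_labels([1, 10, 5])   # raw labels for "105"
print(length, digits)                        # 3 [1, 0, 5, 10, 10]
print(decode_digits(length, digits))         # 105
```

Only the positions beyond the sequence length keep the value 10, so the raw label 10 and the padding class 10 never collide.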

issue about model.py/dropout = tf.layers.dropout(pool, rate=drop_rate)

Hello, when I trained the model with my own dataset, I found that the statement
dropout = tf.layers.dropout(pool, rate=drop_rate)
in model.py may have a problem.
According to TensorFlow's documentation, tf.layers.dropout() has a training argument that defaults to False. Your code doesn't set the training argument, so although you set the rate value, the model won't actually apply dropout?
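To illustrate why the flag matters, here is a framework-free sketch of inverted dropout with a `training` switch. It mirrors the documented meaning of the `training` argument but is not TensorFlow's implementation; the function name and the `rng` parameter are illustrative.

```python
import random


def dropout(values, rate, training=False, rng=None):
    """Inverted dropout over a list of floats.

    With training=False the input is returned unchanged, so a configured
    rate has no effect at all.
    """
    if not training or rate == 0.0:
        return list(values)
    rng = rng or random.Random()
    keep_prob = 1.0 - rate
    # Drop each unit with probability `rate`; scale survivors so the
    # expected value of each unit stays the same.
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
# Unchanged, because training defaults to False.
print(dropout(activations, rate=0.5))
# Each unit is either zeroed or doubled.
print(dropout(activations, rate=0.5, training=True, rng=random.Random(0)))
```

In TensorFlow 1.x the `training` argument accepts a Python boolean or a boolean tensor, so a common pattern is to feed it a flag that is true only during training steps.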

inference_outside_sample.ipynb reshape

Hi, I found out that using tf.reshape causes errors. For this particular file, there is no need to use tf.reshape. Simply use tf.image.resize_images(image, [54, 54])

finetune doesn't converge

I trained the model on the SVHN dataset and got around 80% accuracy on my own dataset, so I decided to fine-tune it on my dataset.

train_layers = ['hidden10', 'digit_length', 'digit1', 'digit2', 'digit3', 'digit4']
fine_tune_var_list = [v for v in tf.trainable_variables() if v.name.split('/')[0] in train_layers]
train_op = optimizer.minimize(loss, global_step=global_step, var_list=fine_tune_var_list)

I tried learning rates from 1e-2 to 1e-5, but the accuracy is always around 80% with a loss of around 1~2.
How can I make it perform better?
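A quick way to check which variables such a filter actually selects is to run it on the variable names alone. The names below are made up for illustration (the real ones would come from `tf.trainable_variables()`). With them, a head under a `digit5` scope, if the model has one, would stay frozen because it is not in `train_layers`.

```python
train_layers = ['hidden10', 'digit_length', 'digit1', 'digit2', 'digit3', 'digit4']

# Hypothetical variable names, standing in for tf.trainable_variables().
variable_names = [
    'hidden1/conv2d/kernel:0',
    'hidden10/dense/kernel:0',
    'digit_length/dense/bias:0',
    'digit4/dense/kernel:0',
    'digit5/dense/kernel:0',
]

# Same rule as in the post: keep a variable if its top-level scope is listed.
selected = [name for name in variable_names if name.split('/')[0] in train_layers]
frozen = [name for name in variable_names if name not in selected]

print(selected)  # variables the optimizer would update
print(frozen)    # variables left untouched
```

Printing both lists before training makes it easy to spot a layer that was left out by accident.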

Change input size

I saw that the code saves images with a shape of 64x64 pixels into TFRecords.
Then donkey.py reshapes data first into 64x64 then it extracts a random crop of 54x54.

I tried to change these dimensions because I want to use a larger size.
I changed the line:
image = tf.reshape(image, [64, 64, 3])
with
image = tf.reshape(image, [128, 128, 3])

But I get the following error:

ValueError: Dimensions must be equal, but are 128 and 32 for 'sparse_softmax_cross_entropy_loss/xentropy/xentropy' (op: 'SparseSoftmaxCrossEntropyWithLogits') with input shapes: [128,7], [32].

How can I achieve my goal?
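One possible reading of the error, sketched under assumptions about the architecture that the post does not state: if the network halves the spatial size four times with 'same' padding and then flattens with a hard-coded `[-1, 4 * 4 * 192]` reshape, a larger crop produces a larger feature map, and the `-1` silently absorbs the extra elements into the batch dimension. The number of halvings, the 192 channels, and the 118-pixel crop below are assumptions chosen for illustration.

```python
import math


def feature_side(crop_side, num_halvings=4):
    """Spatial side after repeated stride-2 pooling with 'same' padding."""
    side = crop_side
    for _ in range(num_halvings):
        side = math.ceil(side / 2)
    return side


def implied_batch(batch_size, crop_side, flat_size=4 * 4 * 192, channels=192):
    """Batch dimension produced by reshaping the features to [-1, flat_size]."""
    side = feature_side(crop_side)
    total = batch_size * side * side * channels
    return total // flat_size


print(feature_side(54), implied_batch(32, 54))    # 4 32
print(feature_side(118), implied_batch(32, 118))  # 8 128
```

Under these assumptions the logits end up with 128 rows while the labels still have 32, which matches the shapes in the error message. The flatten size would then need to be derived from the actual feature map instead of being hard-coded. Note also that `tf.reshape` cannot change the number of pixels in a stored image, so images saved at 64x64 would have to be converted again at the new resolution.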

Maxout units and subtractive normalization

Hello @potterhsu,

First, I want to thank you for your implementation; it is doing great for me so far.

As I was reading the original paper by Goodfellow et al. I noticed that they mention the use of maxout units for the first hidden layer and relu elsewhere. Furthermore they use subtractive normalization. In your implementation, however, you use batch_normalization and relu activation functions on all layers. So, did I miss something or what were your reasons for those modifications? Thanks.

how to solve data imbalance

Hi,

What if I have more diverse samples in the dataset for certain digits and less diverse samples for the rest? In the current implementation, and also in the paper, I feel the digit classifiers do not share knowledge with regard to digit recognition. Are there any approaches to tackle this kind of data imbalance problem? Or could any transfer learning be done here? Many thanks.

Will it work for characters?

I want to recognize images that include short character strings as well as digits, like "A25", "B03" and so on. Could this structure work in this situation?

inference problem

Hi @potterhsu,
I have a very odd problem. I trained my model and got very good accuracy, but when I run inference on my images with a batch size lower than 16 (e.g. 1), I get random numbers, mostly 9.
Can you please help me?

How to use this project on windows

I tried to train this project on Windows.
The original version targets Linux and Python 2.
I changed some code to run this project on Windows with Python 3.

I hope this helps anyone who needs to run it on Windows.

Here are the changes:
#python2->python3 #linux->windows

Every print
print '123'
print('123')

convert_to_tfrecords.py line 64
index = int(path_to_image_file.split("/")[-1].split('.')[0]) - 1
index = int(path_to_image_file.split("\\")[-1].split('.')[0]) - 1

reason:
Linux paths use "/" but Windows paths use "\"

evaluator.py line 53
for _ in xrange(num_batches):
for _ in range(int(num_batches)):

reason:
evaluator.py line 12
num_batches = num_examples / batch_size
In Python 2, dividing two ints yields an int, but in Python 3 it yields a float.
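A portable alternative, sketched here rather than taken from the repository, avoids hard-coding either path separator and makes the integer division explicit, so the same code can run on Linux and Windows. The helper names `image_index` and `count_batches` are hypothetical.

```python
import ntpath


def image_index(path_to_image_file):
    """Zero-based index from a file name such as '12.png'."""
    # ntpath.basename splits on both '/' and '\\', whatever the host OS is.
    file_name = ntpath.basename(path_to_image_file)
    return int(file_name.split('.')[0]) - 1


def count_batches(num_examples, batch_size):
    """Whole number of batches; '//' floors in both Python 2 and Python 3."""
    return num_examples // batch_size


print(image_index('data/train/12.png'))      # 11
print(image_index('data\\train\\12.png'))    # 11
print(count_batches(100, 32))                # 3
```

With `//` the result is already an int, so the `int(...)` wrapper around `num_batches` is no longer needed.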

Export the checkpoints to .pb format graph

Hello, for some reason, I have to freeze the variables of the checkpoint files and export the graph to a .pb file. After researching the graph structure, I exported the .pb file successfully using the following code:

output_node_names = ['digit_length/dense/BiasAdd','stack']
with tf.Session() as sess:
    # Restore the graph
    saver = tf.train.import_meta_graph(FLAGS.meta_path)

    # Load Weight
    saver.restore(sess, FLAGS.checkpoint_path)

    # Freeze the graph
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, output_node_names)

    # Save the frozen graph
    with open('output/svhnclassification.pb', 'wb') as output:
        output.write(frozen_graph_def.SerializeToString())

But then I don't know how to load this .pb file and feed an image NumPy array into the graph to get the prediction result. Could you help me?

Thanks

How to access/interpret the confidence values?

Hello potterhsu,

the digits_predictions are accessed via:
digits_predictions = tf.argmax(digits_logits, axis=2)

but how can one access the confidence values themselves, so that one can apply confidence thresholding as mentioned in the paper (Goodfellow et al.)?

In my case the digits_predictions for example is a tensor of shape (1,5,11). If I understand correctly, axis 1 corresponds to the position of a digit and axis 2 contains 11 (confidence?) values for each digit (0-9 and 10 as no digit), so that tf.argmax gets those digits with the highest values. But as I looked into it, the values ranged from -16.5 to 11.5. If those are indeed the confidence values, then how can one normalize them to a range between 0 and 1? Can you help me out?
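Values in that range are logits rather than probabilities. The standard way to map them to the range 0 to 1 is a softmax over the class axis (in TensorFlow, `tf.nn.softmax` normalises the last axis by default). A plain-Python sketch for a single digit position, with made-up logits:

```python
import math


def softmax(logits):
    """Convert one position's logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for one digit position: classes 0-9 plus 10 ("no digit").
logits = [-16.5, -3.0, 0.5, 11.5, 2.0, -1.0, -8.0, 4.0, 0.0, -2.5, 1.0]
probs = softmax(logits)

digit = probs.index(max(probs))  # same class that argmax over the logits picks
confidence = probs[digit]
print(digit, round(confidence, 3))
```

A sequence-level confidence could then be formed, for example, as the product of the length probability and the per-digit probabilities. How to combine and threshold them is a design choice that this sketch leaves open.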

Tensorboard View Scalars

Hi @potterhsu, thanks so much for documenting this, it has been very helpful to walk through. I had a question about running TensorBoard - in the training code I thought I saw references to summary = tf.summary.merge_all() which should send the scalar/image values to tb.

When I ran TensorBoard as specified, while I saw the graph structure loaded correctly, I couldn't see any graphs of the accuracy/loss over time. Do they only appear after quite some time? I have only been training for less than an hour so far.

Help with a fork of the model that doesn't converge

Hi! Thanks very much for this. I was able to get the model to train on a Google Cloud machine in about 15 hours.

I've created a fork (https://github.com/jsomers/ChessClassifier) whose goal is to recognize images of chess boards, like this one:

[Image: chess board]

The idea is to say which pieces are in which square. So instead of the network terminating in 5 softmaxes of size 10, one for each of the digits, it terminates in 64 softmaxes of size 13 (13 because there are 6 possible black pieces, 6 possible white pieces, and the "null" piece, i.e., an empty square). I've removed the node that outputs a "length" because my length is always 64.

Unfortunately, the model runs but doesn't train. The training accuracy is reported as "0.00000" and the loss hovers around 60-65 indefinitely.

Is there any chance you might take a few minutes to discuss how I could get the model to converge? I have it on good authority that this problem is solvable, but perhaps my current model is broken in some way. My email address is listed on my website, which is listed on my Github profile. Thank you!

Freezing on Flask or gRPC server

I am trying to have a Flask server infer digits from an input image.

So, I added server.py and implemented a view function that receives an image via a POST request and sends the digit inference result back.

But it freezes whenever the function starts inference.

Specifically, the freeze starts at line 20 in model.py (self._hidden1 = nn.Sequential(...)).

The same freeze also occurs in my gRPC server.

I am guessing that torch.jit.ScriptModule, which is the superclass of the model, is causing something, but I am a PyTorch beginner and it's too hard for me to solve.

Can somebody help me find a solution or give me a hint?

About the batchnorm layer

A batchnorm layer should behave differently in the training phase and the testing phase, but the code does not seem to implement that... Could that be an issue?
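To make the difference concrete, here is a framework-free sketch of the two modes, with the learned scale and shift parameters omitted. It is not the repository's code; it only illustrates why relying on batch statistics at inference time can be a problem, especially for very small batches.

```python
import math

EPSILON = 1e-3


def batch_norm(batch, training, moving_mean=0.0, moving_var=1.0):
    """Normalize a list of scalars the way a batch-norm layer would.

    training=True  -> use the mean and variance of the current batch
    training=False -> use the moving averages collected during training
    """
    if training:
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
    else:
        mean, var = moving_mean, moving_var
    return [(x - mean) / math.sqrt(var + EPSILON) for x in batch]


# With batch statistics, a batch of one always normalizes to zero.
print(batch_norm([5.0], training=True))    # [0.0]
# With stored statistics, the output depends on the input, not on the batch.
print(batch_norm([5.0], training=False, moving_mean=4.0, moving_var=4.0))
```

If a layer always runs in the first mode, its output for a given image depends on whatever else is in the batch, which is one plausible (unconfirmed) source of unstable predictions at small batch sizes.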
