tinyyolov2's Introduction

TinyYOLOv2 in Tensorflow made easier

What you can do with this code

Extract the weights from the binary file of the original YOLOv2, assign them to a TF network, save a ckpt, and perform detection on an input image or a webcam stream

What you CANNOT do with this code

Train YOLOv2 in any way, on any dataset

Description

I've been searching for a Tensorflow implementation of YOLOv2 for a while, but the darknet version and its derivatives are not really easy to understand. This one is a hopefully easier-to-understand version of Tiny YOLOv2. The weight extraction, weights structure, weight assignment, network, inference and postprocessing are made as simple as possible.

The output of this implementation on the test image "dog.jpg" is the following:

[Image: detections drawn on "dog.jpg"]

Just to be clear, this implementation is called "tiny-yolo-voc" on pjreddie's site and can be found here:

[Image: the "tiny-yolo-voc" entry on pjreddie's site]

This is a specific implementation of "tiny-yolo-voc" but the code could be re-used to import other configurations! You will need to change the network architecture and hyperparameters according to the "cfg" file you want to use.
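When adapting the code to another configuration, one useful sanity check is the number of parameters the new architecture should contain. The sketch below is only an illustration: the `LAYERS` table and the `layer_params` helper are hypothetical names, not part of this repository, and the layer shapes are my reading of the "tiny-yolo-voc" cfg. It assumes that each Batch Normalization layer stores four values per output channel and that the final layer stores a plain bias instead.

```python
# (in_dim, out_dim, kernel_size, has_batch_norm) for each conv layer.
# Shapes are assumed from the tiny-yolo-voc cfg; edit them for another cfg.
LAYERS = [
    (3, 16, 3, True),
    (16, 32, 3, True),
    (32, 64, 3, True),
    (64, 128, 3, True),
    (128, 256, 3, True),
    (256, 512, 3, True),
    (512, 1024, 3, True),
    (1024, 1024, 3, True),
    (1024, 125, 1, False),  # final 1x1 detection layer, no BN
]


def layer_params(in_dim, out_dim, ksize, has_bn):
    """Number of float values one conv layer contributes."""
    kernel = ksize * ksize * in_dim * out_dim
    # With BN: beta, gamma, moving_mean, moving_variance per output channel.
    # Without BN: a single bias per output channel.
    per_channel = 4 if has_bn else 1
    return kernel + per_channel * out_dim


counts = [layer_params(*layer) for layer in LAYERS]
total = sum(counts)
print(counts)
print(total)
```

If the total does not match the number of values read from the weights file (after the header), the layer table and the file probably describe different networks.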

The code is organized in this way:

  • weights_loader.py : loads the weights from pjreddie's binary weights file into the tensorflow network and saves the ckpt
  • net.py : contains the definition of the Tiny YOLOv2 network as defined in pjreddie's cfg file https://github.com/pjreddie/darknet/blob/master/cfg/yolov2-tiny-voc.cfg
  • test.py : performs detection on an input_image that you can define in the main. Outputs the input_image with B-Boxes
  • test_webcam.py : performs detection on the webcam. It is exactly like test.py, but some functions are slightly modified to take frames directly from the webcam as input (instead of the image_path).

To use this code:

  • Clone the project and place it where you want
  • Download the binary file (~60MB) from pjreddie's site: https://pjreddie.com/media/files/yolov2-tiny-voc.weights and place it into the folder where the scripts are
  • Launch test.py or test_webcam.py. The code is configured to run with the weights and the input image in the same folder as the scripts, with "dog.jpg" as the input image. Change input_img_path and weights_path in the main if you want to use other files.
    python3 test.py
  • If you are launching them for the first time, the weights will be extracted from the binary file and a ckpt will be created. Next time only the ckpt will be used!

Requirements:

I've implemented everything with Tensorflow 1.0, Ubuntu 16.04, Numpy 1.13.0, Python 3.4, OpenCV 3.0

How to use the binary weights file ( Only if you want to use it in other projects, here it is already done )

I struggled to understand how the binary weights file is written. I hope to save you some time by explaining how I imported the weights into a Tensorflow network:

  • Download the binary file from pjreddie's site: https://pjreddie.com/media/files/yolov2-tiny-voc.weights
  • Extract the weights from binary to a numpy float32 array with weight_array = np.fromfile(weights_path, dtype='f')
  • Delete the first 4 numbers: they belong to the file header and are not weights
  • Define a function ( load_conv_layer ) to take a part of the array and assign it to the Tensorflow variables of the net
  • IMPORTANT: the weights order is [ 'biases','gamma','moving_mean','moving_variance','kernel']
  • IMPORTANT: the 'biases' here are the beta values of the Batch Normalization. They are not the biases that are added after the conv2d: in the layers that use Batch Normalization those are all set to zero! ( According to the paper by Ioffe et al. https://arxiv.org/abs/1502.03167 )
  • IMPORTANT: the kernel weights are written in Caffe style which means they have shape = (out_dim, in_dim, height, width). They must be converted into Tensorflow style which has shape = (height, width, in_dim, out_dim)
  • IMPORTANT: to obtain correct results, Batch Normalization must be taken into account. It can be done in two ways: define the network with Batch Normalization and use the weights as they are OR define the net without BN ( this implementation ) and DENORMALIZE the weights, i.e. fold the BN parameters into the kernel and bias. ( details are in weights_loader.py )
  • To verify that the weights extraction is successful, I compare the total number of params in the network with the number of weights in the weights file. They are both 15867885 in my case.
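The steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the code in weights_loader.py: the function signature, the `take` helper and the `BN_EPSILON` value are my own choices, and the synthetic array only stands in for the result of `np.fromfile(weights_path, dtype='f')`. The header length of 4 matches the file described here; other weights files may need a different value.

```python
import numpy as np

BN_EPSILON = 1e-5  # assumed value; use the epsilon your own network defines


def load_conv_layer_bn(weights, offset, in_dim, out_dim, ksize):
    """Read one conv+BN layer at `offset`; return (kernel, bias, new_offset)."""

    def take(n):
        nonlocal offset
        chunk = weights[offset:offset + n]
        offset += n
        return chunk

    # File order: biases (BN beta), gamma, moving_mean, moving_variance, kernel
    beta = take(out_dim)
    gamma = take(out_dim)
    mean = take(out_dim)
    var = take(out_dim)
    kernel = take(out_dim * in_dim * ksize * ksize)

    # Caffe style (out, in, h, w) -> Tensorflow style (h, w, in, out)
    kernel = kernel.reshape(out_dim, in_dim, ksize, ksize).transpose(2, 3, 1, 0)

    # Fold BN into the conv:
    #   gamma * (conv(x) - mean) / sqrt(var + eps) + beta
    #   == conv(x) with kernel * scale, plus (beta - mean * scale)
    scale = gamma / np.sqrt(var + BN_EPSILON)
    folded_kernel = kernel * scale  # broadcasts over the last (out) axis
    folded_bias = beta - mean * scale
    return folded_kernel, folded_bias, offset


# Synthetic stand-in for the array read from the binary file
rng = np.random.default_rng(0)
header = np.zeros(4, dtype=np.float32)
layer = rng.random(4 * 4 + 4 * 2 * 3 * 3, dtype=np.float32)
raw = np.concatenate([header, layer])

weights = raw[4:]  # drop the 4 header values
kernel, bias, offset = load_conv_layer_bn(weights, 0, in_dim=2, out_dim=4, ksize=3)
print(kernel.shape, bias.shape, offset)
```

After the last layer has been read, the final offset should equal the length of the weights array; a mismatch usually means the layer shapes or the header length are wrong.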

How to postprocess the predictions ( Only if you want to use it in other projects, here it is already done )

Another key point is how the predictions tensor is made. It is a 13x13x125 tensor. To process it better:

  • Convert the tensor to have shape = 13x13x5x25 = grid_cells x n_boxes_in_each_cell x n_predictions_for_each_box
  • The 25 predictions are: 2 coordinates and 2 shape values (x,y,w,h), 1 Objectness score, 20 Class scores
  • Now you can access the tensor in an easy way! E.g. predictions[row, col, b, :4] will return the 2 coords and the shape of the "b" B-Box which is in the [row,col] grid cell
  • They must be postprocessed according to the parametrization of YOLOv2. In my implementation it is made like this:
import numpy as np

# predictions is the 13x13x5x25 tensor described above;
# sigmoid() and softmax() are assumed to be defined elsewhere (e.g. as small NumPy helpers)

# Pre-defined anchor shapes!
# They are not coordinates of the boxes, they are the width and height of the 5 anchors defined by YOLOv2
anchors = [1.08,1.19,  3.42,4.41,  6.63,11.38,  9.42,5.11,  16.62,10.52]
image_height = image_width = 416
n_grid_cells = 13
n_b_boxes = 5

for row in range(n_grid_cells):
  for col in range(n_grid_cells):
    for b in range(n_b_boxes):

      tx, ty, tw, th, tc = predictions[row, col, b, :5]

      # IMPORTANT: (416) / (13) = 32! The x,y coordinates are parametrized w.r.t. the top-left corner of the grid cell
      # The sigmoid keeps the offsets in [0,1], which is easier for the network to predict and learn
      # With the iterations on every grid cell at [row,col] they return to their original positions

      # The x,y coordinates are: (pre-defined coordinates of the grid cell [row,col] + parametrized offset)*32
      center_x = (float(col) + sigmoid(tx)) * 32.0
      center_y = (float(row) + sigmoid(ty)) * 32.0

      # Also the width and height must return to the original value by looking at the shape of the anchors
      roi_w = np.exp(tw) * anchors[2*b + 0] * 32.0
      roi_h = np.exp(th) * anchors[2*b + 1] * 32.0

      # Compute the final objectness score (confidence that there is an object in the B-Box)
      final_confidence = sigmoid(tc)

      # Class scores of this B-Box, turned into probabilities
      class_predictions = predictions[row, col, b, 5:]
      class_predictions = softmax(class_predictions)
      

YOLOv2 predicts parametrized values that must be converted to full size by multiplying them by 32, the ratio between the 416-pixel input and the 13-cell grid! You can find other EQUIVALENT ways to do this, but this one works fine. For example, some implementations divide by 13 and then multiply by 416 instead, which in the end equals a single multiplication by 32.
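For reuse in other projects, the same postprocessing can be written without the three nested loops. The following is a sketch under assumptions rather than the code in test.py: `decode`, `sigmoid` and `softmax` are helper names I introduce here, and the all-zeros tensor only stands in for a real 13x13x125 network output.

```python
import numpy as np

# Anchors as (width, height) pairs, in grid-cell units
ANCHORS = np.array([1.08, 1.19, 3.42, 4.41, 6.63, 11.38,
                    9.42, 5.11, 16.62, 10.52]).reshape(5, 2)
GRID, N_BOXES, N_CLASSES, STRIDE = 13, 5, 20, 32.0


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    # Subtract the max for numerical stability
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)


def decode(predictions):
    """Turn a raw (13, 13, 125) output into boxes, confidences and class probs."""
    p = predictions.reshape(GRID, GRID, N_BOXES, 5 + N_CLASSES)
    rows, cols = np.meshgrid(np.arange(GRID), np.arange(GRID), indexing="ij")

    # Cell index + sigmoid offset, scaled from grid units to pixels
    center_x = (cols[..., None] + sigmoid(p[..., 0])) * STRIDE
    center_y = (rows[..., None] + sigmoid(p[..., 1])) * STRIDE

    # Anchor shape rescaled by the exponential of the predicted value
    roi_w = np.exp(p[..., 2]) * ANCHORS[:, 0] * STRIDE
    roi_h = np.exp(p[..., 3]) * ANCHORS[:, 1] * STRIDE

    confidence = sigmoid(p[..., 4])
    class_probs = softmax(p[..., 5:])
    boxes = np.stack([center_x, center_y, roi_w, roi_h], axis=-1)
    return boxes, confidence, class_probs


# Placeholder input: a real run would pass the network output instead
boxes, confidence, class_probs = decode(np.zeros((13, 13, 125)))
print(boxes.shape, confidence.shape, class_probs.shape)
print(boxes[2, 3, 0])
```

With an all-zeros input every offset becomes sigmoid(0) = 0.5, so each box is centered in its grid cell and has exactly the size of its anchor, which makes the scaling by 32 easy to check by hand.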

Notes

  • The code runs at ~15fps on my laptop which has a 2GB Nvidia GeForce GTX 960M GPU
  • This implementation does not have the training part

If you have questions or suggestions, do not wait! I'm happy to help.

tinyyolov2's People

Contributors

simo23

tinyyolov2's Issues

class prediction in test.py

Thank you for your code.

In test.py, the class prediction is calculated per grid cell.
Why is that?
I think it should be calculated over all grid cells.

Offset and loaded weights differ by one

Hi,

I was inspired by your code on loading weights for tiny yolo v2 from coco weight file. I have implemented my yolo network as below:
[Screenshots: YOLO-Net, Yolo-Sub-Func]

However, the final offset and the number of loaded weights differ by 1. I am not able to figure out what mistake I am making. Could you please help me out?
Here are more details:
[Screenshots: LoadingWeight, LoadingWeight_2, LoadingWeight_3, WeightLoader]

Size of weight file is -> 44948600
Total number of params to load = 11237146
Loading 496 weights of conv0 ...
Loading 4736 weights of conv1 ...
Loading 18688 weights of conv2 ...
Loading 74240 weights of conv3 ...
Loading 295936 weights of conv4 ...
Loading 1181696 weights of conv5 ...
Loading 4722688 weights of conv6 ...
Loading 4720640 weights of conv7 ...
Loading 218025 weights of conv8 ...
len-> 11237145
len-> 11237146
Final offset = 11237145
Total number of params in the weight file = 11237146

Name of All Variables:
All Var [<tf.Variable 'conv0/kernel:0' shape=(3, 3, 3, 16) dtype=float32_ref>, <tf.Variable 'conv0_bn/gamma:0' shape=(16,) dtype=float32_ref>, <tf.Variable 'conv0_bn/beta:0' shape=(16,) dtype=float32_ref>, <tf.Variable 'conv0_bn/moving_mean:0' shape=(16,) dtype=float32_ref>, <tf.Variable 'conv0_bn/moving_variance:0' shape=(16,) dtype=float32_ref>, <tf.Variable 'conv1/kernel:0' shape=(3, 3, 16, 32) dtype=float32_ref>, <tf.Variable 'conv1_bn/gamma:0' shape=(32,) dtype=float32_ref>, <tf.Variable 'conv1_bn/beta:0' shape=(32,) dtype=float32_ref>, <tf.Variable 'conv1_bn/moving_mean:0' shape=(32,) dtype=float32_ref>, <tf.Variable 'conv1_bn/moving_variance:0' shape=(32,) dtype=float32_ref>, <tf.Variable 'conv2/kernel:0' shape=(3, 3, 32, 64) dtype=float32_ref>, <tf.Variable 'conv2_bn/gamma:0' shape=(64,) dtype=float32_ref>, <tf.Variable 'conv2_bn/beta:0' shape=(64,) dtype=float32_ref>, <tf.Variable 'conv2_bn/moving_mean:0' shape=(64,) dtype=float32_ref>, <tf.Variable 'conv2_bn/moving_variance:0' shape=(64,) dtype=float32_ref>, <tf.Variable 'conv3_1/kernel:0' shape=(3, 3, 64, 128) dtype=float32_ref>, <tf.Variable 'conv3_1_bn/gamma:0' shape=(128,) dtype=float32_ref>, <tf.Variable 'conv3_1_bn/beta:0' shape=(128,) dtype=float32_ref>, <tf.Variable 'conv3_1_bn/moving_mean:0' shape=(128,) dtype=float32_ref>, <tf.Variable 'conv3_1_bn/moving_variance:0' shape=(128,) dtype=float32_ref>, <tf.Variable 'conv4_1/kernel:0' shape=(3, 3, 128, 256) dtype=float32_ref>, <tf.Variable 'conv4_1_bn/gamma:0' shape=(256,) dtype=float32_ref>, <tf.Variable 'conv4_1_bn/beta:0' shape=(256,) dtype=float32_ref>, <tf.Variable 'conv4_1_bn/moving_mean:0' shape=(256,) dtype=float32_ref>, <tf.Variable 'conv4_1_bn/moving_variance:0' shape=(256,) dtype=float32_ref>, <tf.Variable 'conv5_1/kernel:0' shape=(3, 3, 256, 512) dtype=float32_ref>, <tf.Variable 'conv5_1_bn/gamma:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'conv5_1_bn/beta:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'conv5_1_bn/moving_mean:0' 
shape=(512,) dtype=float32_ref>, <tf.Variable 'conv5_1_bn/moving_variance:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'conv6_1/kernel:0' shape=(3, 3, 512, 1024) dtype=float32_ref>, <tf.Variable 'conv6_1_bn/gamma:0' shape=(1024,) dtype=float32_ref>, <tf.Variable 'conv6_1_bn/beta:0' shape=(1024,) dtype=float32_ref>, <tf.Variable 'conv6_1_bn/moving_mean:0' shape=(1024,) dtype=float32_ref>, <tf.Variable 'conv6_1_bn/moving_variance:0' shape=(1024,) dtype=float32_ref>, <tf.Variable 'conv6_3/kernel:0' shape=(3, 3, 1024, 512) dtype=float32_ref>, <tf.Variable 'conv6_3_bn/gamma:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'conv6_3_bn/beta:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'conv6_3_bn/moving_mean:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'conv6_3_bn/moving_variance:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'conv_dec/kernel:0' shape=(1, 1, 512, 425) dtype=float32_ref>, <tf.Variable 'conv_dec/bias:0' shape=(425,) dtype=float32_ref>]
Thanks

test.py crashes

[~/tinyYOLOv2]$ python3 test.py

Total number of params = 15867885
2018-12-25 11:34:48.750981: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Looking for a checkpoint...
No checkpoint found!
Loading weights from file and creating new checkpoint...
Total number of params to load = 15867885
Loading 496 weights of conv1 ...
Loading 4736 weights of conv2 ...
Loading 18688 weights of conv3 ...
Loading 74240 weights of conv4 ...
Loading 295936 weights of conv5 ...
Loading 1181696 weights of conv6 ...
Loading 4722688 weights of conv7 ...
Loading 9441280 weights of conv8 ...
Loading 128125 weights of conv9 ...
Final offset = 15867885
Total number of params in the weight file = 15867885
Saving new checkpoint to the new checkpoint directory ./ckpt/ !
Preprocessing...
Traceback (most recent call last):
File "test.py", line 249, in
tf.app.run(main=main)
File "/home/amd/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "test.py", line 237, in main
preprocessed_image = preprocessing(input_img_path,input_height,input_width)
File "test.py", line 82, in preprocessing
resized_image = cv2.resize(input_image,(input_height, input_width), interpolation = cv2.INTER_CUBIC)
cv2.error: OpenCV(3.4.4) /io/opencv/modules/imgproc/src/resize.cpp:3784: error: (-215:Assertion failed) !ssize.empty() in function 'resize'

Purpose of load_conv_layer_bn

Hi, I finally figured out the cause of my NaN value. It's due to line 53 of https://github.com/simo23/tinyYOLOv2/blob/master/weights_loader.py which is:
scale = gammas[i] / np.sqrt(var[i] + net.bn_epsilon)
because the square root of a negative number was being taken. However, because I do not fully understand the code, I am unable to figure out why the negative value exists. It definitely comes from this line:
var = loaded_weights[offset:offset+n_bn_var] at line 42, but I am not sure what var is for. I am puzzled, since the only difference between my code and yours is that I have only 1 class while you have 20, so I changed the relevant parts such as the last layer. Sorry for the trouble!

Tiny Yolo is very slow on Jetson TX1

hi simo,
hw r u doing....

I have taken your tinyYOLOv2 and run it on my GPU (GTX 980), where it works well. But I need the same tiny-yolo on a Jetson TX1. I have tried that, but it was very slow: around 2 fps, whereas on my GPU it reaches 30 fps. Before tiny-yolo I did people detection on the TX1 and it works well. My Jetson TX1 board has 4GB of RAM, and I have even added 16GB of swap (20GB in total). Could you please suggest ways to increase the speed?

implemented on :
Ubuntu 16.04
Tensorflow 1.0
Python 2.7
OpenCV 3.0
Numpy 1.13.0

thanks in advance,
parvez.

Support YOLOv2

Do you plan to support the full YOLOv2?
If yes, do you know when?
If no, do you know why?
And what could we do to help?

Thanks

Outdated weights file name

It looks like the app expects a different name for the weights file being downloaded. Renaming the file works.

mv yolov2-tiny-voc.weights ./tiny-yolo-voc.weights

Any update on training part

@simo23 Thanks for your wonderful work!
I have recently been trying a low-precision version of tiny-yolo, and I wonder if there is any update on the training part.
Best wishes!

How to open the binary weights file?

Hi, Simo! You've done an amazing project and it really helped me a lot! I have a small question: how do you open a binary weights file like tiny-yolo-voc.weights? I wanted to edit it with 'vi tiny-yolo-voc.weights' or 'gedit tiny-yolo-voc.weights' in Ubuntu 16.04 but failed. Does it need a special application to open it? Thx.

Hello,thank you for your code.

My question is: is the network in net.py based on MobilenetV1? I want to change net.py to be based on MobilenetV2. What should I do? Thank you very much!
