
Comments (10)

tuandoan998 commented on September 12, 2024
  1. OpenCV returns an array of shape (H, W, C) from cv2.imread(), but X_data needs shape (B, W, H, C). So the image has to be transposed.
  2. First, ones are not padded. https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/Preprocessor.py#L8
    Second, the padding value depends on whether the background of the image has been inverted or not.
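As a minimal sketch of that reordering, using a zero array as a stand-in for a cv2.imread result (the (H, W, C) = (64, 128, 1) shape is illustrative):

```python
import numpy as np

# Stand-in for a cv2.imread result: shape (H, W, C).
img = np.zeros((64, 128, 1), dtype=np.uint8)

img_t = np.transpose(img, (1, 0, 2))  # swap H and W -> (W, H, C)
batch = img_t[np.newaxis, ...]        # add a batch axis -> (B, W, H, C)
```

After this, batch.shape is (1, 128, 64, 1), which matches the (B, W, H, C) layout described above.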

There may be a lot of rough and unclean code, but you should read and try to understand it before asking a question. Cheers.

from handwritten-text-recognition.

tuandoan998 commented on September 12, 2024

@tuandoan998, I am trying to understand how you have pre-processed your images and labels.
What is the purpose of i_len or input_length?
https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/ImageGenerator.py#L20

You have initialised it to 30 for the word model.
https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/Parameter.py#L7

And this is the only place you have used it:
https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/ImageGenerator.py#L57

Why multiply with 30?

The output of the word model has shape (Batch_size, 32, 80) (see Resource/word_model.png).
32 is the number of timesteps. The first 2 timesteps are usually rubbish.
So i_len = 30 (= 32 - 2) is the number of timesteps used to decode the output into a label with CTC.
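To make that concrete, here is a sketch of greedy (best-path) CTC decoding that discards the first 2 of the 32 timesteps; it assumes the last of the 80 classes is the CTC blank, and the repository's actual decoding may differ:

```python
import numpy as np

BLANK = 79  # assumption: the last of the 80 classes is the CTC blank

def greedy_ctc_decode(y_pred, input_length=30):
    """Greedy-decode a (batch, timesteps, classes) softmax output with CTC rules."""
    y_pred = y_pred[:, -input_length:, :]   # keep only the last 30 of 32 timesteps
    best = y_pred.argmax(axis=-1)           # best class index per timestep
    decoded = []
    for seq in best:
        out, prev = [], None
        for t in seq:
            if t != prev and t != BLANK:    # collapse repeats, drop blanks
                out.append(int(t))
            prev = t
        decoded.append(out)
    return decoded
```

In training, the repository instead passes input_length to the CTC loss, which serves the same role of telling CTC how many timesteps are usable.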

naveen-marthala commented on September 12, 2024
  1. I didn't know that an array of shape (batch_size, width, height, channels) had to be fed. I am a newbie and this is the first project I am exploring. Thanks.
  2. My bad. And I couldn't understand this part:

depends on whether the background of the image has inverted or not.

mainly the "inverted or not" part.

It would be foolish of me to merely claim that I have read and tried to understand things, but I actually have, and I didn't write these questions the moment I had a doubt. I looked things up on the internet and asked only questions specific to the code in this repository. Cheers.

tuandoan998 commented on September 12, 2024

Take an image with values in the range [0-255] (or [0-1]): the black foreground is close to 0 and the white background is close to 255 (or 1).
In this case, the value 255 (or 1) should be used to pad around the image when necessary.
In some cases, to make training neural network models easier, people invert the colors of the image ([0-255] -> [255-0]). Then the value 0 should be padded instead, as you said.
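A hypothetical helper illustrating the point (pad_to_width is not from the repository; it just makes the padding value follow the background polarity):

```python
import numpy as np

def pad_to_width(img, target_w, inverted=False):
    """Pad a (H, W) grayscale image on the right out to target_w columns.

    White-background images are padded with 255; inverted (black-background)
    images are padded with 0, so the padding always matches the background.
    """
    pad_value = 0 if inverted else 255
    h, w = img.shape
    out = np.full((h, target_w), pad_value, dtype=img.dtype)
    out[:, :w] = img
    return out

word = np.zeros((64, 100), dtype=np.uint8)  # stand-in for a word image
padded = pad_to_width(word, 128)            # right side filled with 255
```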

This is also my first project =)))

tuandoan998 commented on September 12, 2024

The inputs also include label_length, so the labels will be truncated accordingly during training.

tuandoan998 commented on September 12, 2024

https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/Utils.py#L55
I have ignored samples with length greater than 16. You can change the config to 'max_text_len': 21 if you want.
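A sketch of that filtering step (the file names and transcriptions here are illustrative, not taken from the dataset index):

```python
max_text_len = 16  # the repository's config value

# Illustrative (word-image path, transcription) pairs; the second
# transcription is 21 characters long and so would be dropped.
samples = [
    ("a01-000u-00-01.png", "move"),
    ("j07-000-04-03.png", "electroencephalograph"),
]
kept = [(path, text) for path, text in samples if len(text) <= max_text_len]
```

With max_text_len raised to 21, both samples would be kept.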

tuandoan998 commented on September 12, 2024

Here is how, as far as I could understand, you have pre-processed your images. Please tell me where I have gone wrong or missed something.

Put briefly,

For the word model, batch size = 64.
So, you read 64 images along with their labels,
then pad all the images with zeros up to (128, 64, 1).
https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/Preprocessor.py#L38

convert them to grayscale.
https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/Preprocessor.py#L42

and store them in your batch array.
https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/ImageGenerator.py#L67

Then, get the labelled version of text:
https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/ImageGenerator.py#L68

So, for every batch, you will be sending to your model
64 images, the labels of those 64 images, a vector of shape (64, 1) filled with 30s, and finally the length of each label,
as shown here, in the inputs argument of your model.fit_generator method, right?
https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/ImageGenerator.py#L71-L76

and the outputs are a dict of zeros with shape (64, 1).
https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/ImageGenerator.py#L77

That is how, as far as I could understand, you pre-process the data and feed it to your model in each batch. What have I missed and where have I gone wrong? Please do tell me.
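Under that reading, one batch could be sketched like this (the dict keys and dtypes are illustrative, not necessarily the repository's):

```python
import numpy as np

batch_size, max_text_len, i_len = 64, 16, 30

inputs = {
    "images": np.zeros((batch_size, 128, 64, 1), dtype=np.float32),    # padded word images
    "labels": np.zeros((batch_size, max_text_len), dtype=np.float32),  # encoded transcriptions
    "input_length": np.full((batch_size, 1), i_len),                   # vector of 30s for CTC
    "label_length": np.zeros((batch_size, 1)),                         # true length of each label
}
outputs = np.zeros((batch_size, 1))  # dummy target; the CTC loss is computed in a model layer
```

The zeros in outputs are never used as real targets: with the Keras CTC pattern, the loss is produced inside the model from inputs, and the dummy output only satisfies fit_generator's API.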

I don't understand your question.

naveen-marthala commented on September 12, 2024

I have replicated the way you pre-processed your images. I now have two small doubts about the pre-processing.

  1. You have transposed the images before feeding them to your model.
    https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/ImageGenerator.py#L62
    The above line is in the next_batch method of the TextImageGenerator class in "ImageGenerator.py".
    Does it have something to do with the computation? Could you tell me the reason the images are transposed?

This is how an image would look after all the pre-processing:
[image]

  2. You have padded ones around the words. I am asking because I have seen people pad zeros after the word. What is the reason you have done it this way?

naveen-marthala commented on September 12, 2024

Since zeros are padded onto the ground-truth labels, wouldn't the model during training (and also the loss function) treat spaces the same as "no text", since both are encoded as zeros?

I mean to say, for a sentence that has fewer than 74 characters in total, after all the pre-processing it will look like this
(for the sentence: this is an example sentence.):
[image]
So, my question now is: would the model and the loss function during training treat spaces the same as empty characters, since both are encoded as zeros?
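For what it's worth, the trailing zeros and real characters stay distinguishable as long as label_length is passed to the CTC loss, because the loss only reads the first label_length entries of each label row. A sketch with an illustrative character map (chosen here so that ' ' happens to map to 0, the worst case for this concern):

```python
# Illustrative char-to-index map, NOT the repository's actual alphabet.
char_map = {c: i for i, c in enumerate(" abcdefghijklmnopqrstuvwxyz.")}  # ' ' -> 0

text = "this is an example sentence."
label = [char_map[c] for c in text]

max_len = 74                               # sentence-model label width
padded = label + [0] * (max_len - len(label))
label_length = len(label)                  # CTC reads only this many entries

# A space inside the text and a pad position share the value 0 here,
# but label_length disambiguates them: positions >= label_length are padding.
```

So the model never confuses padding with spaces; it only matters that the per-sample length is supplied alongside the padded labels.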

naveen-marthala commented on September 12, 2024

If this line:
https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/Parameter.py#L9
means the maximum length of the words across all the images in the IAM words dataset,
then there are many images with more than 16 characters; for example, the image at "..\words\j07\j07-000\j07-000-04-03.png", which is:
j07-000-04-03
has 21 characters.

So, why set the maximum text length to only 16 when it is actually 21? I am asking because this will be the length of the Y array.
