
Comments (10)

tuandoan998 commented on September 12, 2024
  1. OpenCV returns an array of shape (H, W, C) from cv2.imread(), but X_data needs shape (B, W, H, C). So the image has to be transposed.
  2. First, ones are not padded. https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/Preprocessor.py#L8
    Second, the padding value depends on whether the background of the image has been inverted or not.
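As a minimal sketch of that reordering, using a zero array as a stand-in for a cv2.imread result (the (H, W, C) = (64, 128, 1) shape is illustrative):

```python
import numpy as np

# Stand-in for a cv2.imread result: shape (H, W, C).
img = np.zeros((64, 128, 1), dtype=np.uint8)

img_t = np.transpose(img, (1, 0, 2))  # swap H and W -> (W, H, C)
batch = img_t[np.newaxis, ...]        # add a batch axis -> (B, W, H, C)
```

After this, batch.shape is (1, 128, 64, 1), which matches the (B, W, H, C) layout described above.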

There may be a lot of rough and unclean code, but you should read and try to understand it before asking a question. Cheers.

from handwritten-text-recognition.

tuandoan998 commented on September 12, 2024

@tuandoan998, I am trying to understand how you have pre-processed your images and labels.
What is the purpose of i_len or input_length?
https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/ImageGenerator.py#L20

You have initialised it to 30 for the word model.
https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/Parameter.py#L7

And this is the only place you have used it:
https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/ImageGenerator.py#L57

Why multiply with 30?

The output of the word model has shape (Batch_size, 32, 80) (see Resource/word_model.png).
32 is the number of timesteps. The first 2 timesteps are usually rubbish.
So i_len = 30 (= 32 - 2) is the number of timesteps used to decode the output into a label with CTC.
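To make that concrete, here is a sketch of greedy (best-path) CTC decoding that discards the first 2 of the 32 timesteps; it assumes the last of the 80 classes is the CTC blank, and the repository's actual decoding may differ:

```python
import numpy as np

BLANK = 79  # assumption: the last of the 80 classes is the CTC blank

def greedy_ctc_decode(y_pred, input_length=30):
    """Greedy-decode a (batch, timesteps, classes) softmax output with CTC rules."""
    y_pred = y_pred[:, -input_length:, :]   # keep only the last 30 of 32 timesteps
    best = y_pred.argmax(axis=-1)           # best class index per timestep
    decoded = []
    for seq in best:
        out, prev = [], None
        for t in seq:
            if t != prev and t != BLANK:    # collapse repeats, drop blanks
                out.append(int(t))
            prev = t
        decoded.append(out)
    return decoded
```

In training, the repository instead passes input_length to the CTC loss, which serves the same role of telling CTC how many timesteps are usable.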

naveen-marthala commented on September 12, 2024
  1. I didn't know that an array of shape (batch_size, width, height, channels) had to be fed. I am a newbie and this is the first project I am exploring. Thanks.
  2. My bad. And I couldn't understand this part:

depends on whether the background of the image has inverted or not.

mainly the "inverted or not" part.

It would be foolish of me to merely claim that I have read and tried to understand things, but I actually have, and I didn't write these questions the moment I had a doubt. I looked things up on the internet and asked only questions specific to the code in this repository. Cheers.

tuandoan998 commented on September 12, 2024

Take an image with values in the range [0-255] (or [0-1]): the black foreground is close to 0 and the white background is close to 255 (or 1).
In this case, the value 255 (or 1) should be used to pad around the image when necessary.
In some cases, to make training neural network models easier, people invert the colors of the image ([0-255] -> [255-0]). Then the value 0 should be padded instead, as you said.
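A hypothetical helper illustrating the point (pad_to_width is not from the repository; it just makes the padding value follow the background polarity):

```python
import numpy as np

def pad_to_width(img, target_w, inverted=False):
    """Pad a (H, W) grayscale image on the right out to target_w columns.

    White-background images are padded with 255; inverted (black-background)
    images are padded with 0, so the padding always matches the background.
    """
    pad_value = 0 if inverted else 255
    h, w = img.shape
    out = np.full((h, target_w), pad_value, dtype=img.dtype)
    out[:, :w] = img
    return out

word = np.zeros((64, 100), dtype=np.uint8)  # stand-in for a word image
padded = pad_to_width(word, 128)            # right side filled with 255
```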

This is also my first project =)))

tuandoan998 commented on September 12, 2024

The inputs also include label_length, so the labels will be truncated accordingly during training.

tuandoan998 commented on September 12, 2024

https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/Utils.py#L55
I have ignored samples with length greater than 16. You can change the config to 'max_text_len': 21 if you want.
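A sketch of that filtering step (the file names and transcriptions here are illustrative, not taken from the dataset index):

```python
max_text_len = 16  # the repository's config value

# Illustrative (word-image path, transcription) pairs; the second
# transcription is 21 characters long and so would be dropped.
samples = [
    ("a01-000u-00-01.png", "move"),
    ("j07-000-04-03.png", "electroencephalograph"),
]
kept = [(path, text) for path, text in samples if len(text) <= max_text_len]
```

With max_text_len raised to 21, both samples would be kept.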

tuandoan998 commented on September 12, 2024

Here is how, as far as I could understand, you have pre-processed your images. Please tell me where I have gone wrong or missed something.

Put briefly,

For the word model, batch size = 64.
So, you read 64 images along with their labels,
then pad all the images with zeros up to (128, 64, 1).
https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/Preprocessor.py#L38

convert them to grayscale.
https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/Preprocessor.py#L42

and store them in your batch array.
https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/ImageGenerator.py#L67

Then, get the labelled version of text:
https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/ImageGenerator.py#L68

So, for every batch, you will be sending to your model
64 images, the labels of those 64 images, a vector of shape (64, 1) filled with 30s, and finally the length of each label,
as shown here, in the inputs argument of your model.fit_generator method, right?
https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/ImageGenerator.py#L71-L76

and the outputs are a dict of zeros with shape (64, 1).
https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/ImageGenerator.py#L77

That is how, as far as I could understand, you pre-process the data and feed it to your model in each batch. What have I missed and where have I gone wrong? Please do tell me.
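Under that reading, one batch could be sketched like this (the dict keys and dtypes are illustrative, not necessarily the repository's):

```python
import numpy as np

batch_size, max_text_len, i_len = 64, 16, 30

inputs = {
    "images": np.zeros((batch_size, 128, 64, 1), dtype=np.float32),    # padded word images
    "labels": np.zeros((batch_size, max_text_len), dtype=np.float32),  # encoded transcriptions
    "input_length": np.full((batch_size, 1), i_len),                   # vector of 30s for CTC
    "label_length": np.zeros((batch_size, 1)),                         # true length of each label
}
outputs = np.zeros((batch_size, 1))  # dummy target; the CTC loss is computed in a model layer
```

The zeros in outputs are never used as real targets: with the Keras CTC pattern, the loss is produced inside the model from inputs, and the dummy output only satisfies fit_generator's API.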

I don't understand your question.

naveen-marthala commented on September 12, 2024

I have replicated the way you pre-processed your images. I now have two small doubts about the pre-processing.

  1. You have transposed the images before feeding them to your model.
    https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/ImageGenerator.py#L62
    The above line is in the next_batch method of the TextImageGenerator class in "ImageGenerator.py".
    Does it have something to do with the computation? Could you tell me the reason the images are transposed?

This is how an image would look after all the pre-processing:
[image]

  2. You have padded ones around the words. I am asking because I have seen people pad zeros after the word. What is the reason you have done it this way?

naveen-marthala commented on September 12, 2024

Since zeros are padded onto the ground-truth labels, wouldn't the model during training (and also the loss function) treat spaces the same as "no text", since both are encoded as zeros?

I mean to say, for a sentence that has fewer than 74 characters in total, after all the pre-processing it will look like this
(for the sentence: this is an example sentence.):
[image]
So, my question now is: would the model and the loss function during training treat spaces the same as empty characters, since both are encoded as zeros?
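For what it's worth, the trailing zeros and real characters stay distinguishable as long as label_length is passed to the CTC loss, because the loss only reads the first label_length entries of each label row. A sketch with an illustrative character map (chosen here so that ' ' happens to map to 0, the worst case for this concern):

```python
# Illustrative char-to-index map, NOT the repository's actual alphabet.
char_map = {c: i for i, c in enumerate(" abcdefghijklmnopqrstuvwxyz.")}  # ' ' -> 0

text = "this is an example sentence."
label = [char_map[c] for c in text]

max_len = 74                               # sentence-model label width
padded = label + [0] * (max_len - len(label))
label_length = len(label)                  # CTC reads only this many entries

# A space inside the text and a pad position share the value 0 here,
# but label_length disambiguates them: positions >= label_length are padding.
```

So the model never confuses padding with spaces; it only matters that the per-sample length is supplied alongside the padded labels.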

naveen-marthala commented on September 12, 2024

If this line:
https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/Parameter.py#L9
means the maximum length of the words across all the images in the IAM words dataset,
then there are many images with more than 16 characters; for example, the image at "..\words\j07\j07-000\j07-000-04-03.png", which is:
j07-000-04-03
has 21 characters.

So, why set the maximum text length to only 16 when it is actually 21? I am asking because this will be the length of the Y array.
