
Comments (7)

uakarsh avatar uakarsh commented on July 20, 2024

Have a look here: https://github.com/uakarsh/docformer/blob/master/examples/DocFormer_for_MLM.ipynb

The error occurs because the input is not batched, i.e. it has a shape of (...) rather than (batch_size, ...).

from docformer.

ynusinovich avatar ynusinovich commented on July 20, 2024

@uakarsh Thank you for your help! Does this mean that the Usage section of the README can't actually be used? I was trying to do a demo of it to my study group. I tried encoding['resized_scaled_img'] = encoding['resized_scaled_img'].unsqueeze(0) to add a batch size of 1, but that didn't work either.


uakarsh avatar uakarsh commented on July 20, 2024

It can be used; we just need to pass the argument add_batch_dim=True to the dataset.create_features function.


uakarsh avatar uakarsh commented on July 20, 2024

What you tried also won't work on its own, because there are more features than just the image, i.e. you need to unsqueeze the other features as well. I have updated the README; hope it helps.
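For reference, the manual fix would look something like the sketch below, assuming `encoding` is a dict mapping feature names to unbatched torch tensors. Only `resized_scaled_img` is a key taken from this thread; the other keys and shapes are illustrative stand-ins, not docformer's exact ones.

```python
import torch

# Illustrative unbatched encoding; "resized_scaled_img" comes from the
# discussion above, the other keys/shapes are hypothetical stand-ins.
encoding = {
    "resized_scaled_img": torch.rand(3, 224, 224),
    "x_features": torch.randint(0, 500, (512, 8)),
    "y_features": torch.randint(0, 500, (512, 8)),
}

# Add a batch dimension of 1 to every feature, not just the image.
batched = {k: v.unsqueeze(0) for k, v in encoding.items()}

print(batched["resized_scaled_img"].shape)  # torch.Size([1, 3, 224, 224])
print(batched["x_features"].shape)          # torch.Size([1, 512, 8])
```

Passing add_batch_dim=True does the equivalent of this inside create_features, which is why it is the more straightforward route.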


ynusinovich avatar ynusinovich commented on July 20, 2024

Thank you so much, it runs now! Unsqueezing each feature also works for me, but add_batch_dim is more straightforward. Are there any examples of follow-up steps (i.e., what the resulting tensor means in terms of the input image)? I can't find that in the README or examples.


uakarsh avatar uakarsh commented on July 20, 2024

Maybe you can have a look at the notebook I shared previously. In that notebook, go through the DocFormerForMLM class and look at its forward method. I'll briefly describe it here:

All the shapes below are as per the default configuration:

  1. self.embeddings encodes the spatial features of the bounding boxes (size -> (512, 768))
  2. self.resnet extracts the image features (size -> (512, 768))
  3. self.lang_emb extracts the language features from the words in the bounding boxes (size -> (512, 768))
  4. self.encoder computes the attention and forward-propagates it (size -> (512, 768))

And then, for the downstream task, linear layers are attached on top. Hope it helps.
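The four steps above can be sketched roughly as follows. This is a simplified shape illustration only, not docformer's actual implementation (the real encoder fuses the streams inside its multi-modal attention rather than via a plain sum); the variable names are stand-ins for the modules named in steps 1-4.

```python
import torch

seq_len, hidden = 512, 768  # default configuration mentioned above

# Stand-ins for the three feature streams from steps 1-3; in the real
# model these come from self.embeddings, self.resnet, and self.lang_emb.
spatial_feat = torch.rand(seq_len, hidden)
image_feat = torch.rand(seq_len, hidden)
lang_feat = torch.rand(seq_len, hidden)

# Simplified fusion: the real encoder (step 4) attends over the streams
# with spatial biases; a sum is used here only to show the shapes line up.
fused = spatial_feat + image_feat + lang_feat

print(fused.shape)  # torch.Size([512, 768])
```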


ynusinovich avatar ynusinovich commented on July 20, 2024

Ok, understood, thank you very much for your help. I'll close the issue since the example runs!

