
Comments (7)

uakarsh avatar uakarsh commented on July 20, 2024

Have a look here: https://github.com/uakarsh/docformer/blob/master/examples/DocFormer_for_MLM.ipynb

The error occurs because the input is not batched, i.e. it has a shape of (...) rather than (batch_size, ...).

from docformer.

ynusinovich avatar ynusinovich commented on July 20, 2024

@uakarsh Thank you for your help! Does this mean that the Usage section of the README can't actually be used? I was trying to do a demo of it to my study group. I tried encoding['resized_scaled_img'] = encoding['resized_scaled_img'].unsqueeze(0) to add a batch size of 1, but that didn't work either.


uakarsh avatar uakarsh commented on July 20, 2024

It can be used; we just need to pass the argument add_batch_dim=True to the dataset.create_features function.


uakarsh avatar uakarsh commented on July 20, 2024

What you tried also won't work on its own, because there are more features than just the image, i.e. you need to unsqueeze the other features as well. I have updated the README; hope it helps.
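For reference, the manual fix would look something like the sketch below, assuming `encoding` is a dict mapping feature names to unbatched torch tensors. Only `resized_scaled_img` is a key taken from this thread; the other keys and shapes are illustrative stand-ins, not docformer's exact ones.

```python
import torch

# Illustrative unbatched encoding; "resized_scaled_img" comes from the
# discussion above, the other keys/shapes are hypothetical stand-ins.
encoding = {
    "resized_scaled_img": torch.rand(3, 224, 224),
    "x_features": torch.randint(0, 500, (512, 8)),
    "y_features": torch.randint(0, 500, (512, 8)),
}

# Add a batch dimension of 1 to every feature, not just the image.
batched = {k: v.unsqueeze(0) for k, v in encoding.items()}

print(batched["resized_scaled_img"].shape)  # torch.Size([1, 3, 224, 224])
print(batched["x_features"].shape)          # torch.Size([1, 512, 8])
```

Passing add_batch_dim=True does the equivalent of this inside create_features, which is why it is the more straightforward route.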


ynusinovich avatar ynusinovich commented on July 20, 2024

Thank you so much, it runs now! Unsqueezing each feature also works for me, but add_batch_dim is more straightforward. Are there any examples of follow-up steps (i.e., what the resulting tensor means in terms of the input image)? I can't find that in the README or examples.


uakarsh avatar uakarsh commented on July 20, 2024

Maybe you can have a look at the notebook I shared previously. In that notebook, go through the DocFormerForMLM class and look at its forward method. I'll briefly describe it here:

All the shapes below are as per the default configuration:

  1. self.embeddings encodes the spatial features of the bounding boxes (size -> (512, 768))
  2. self.resnet extracts the image features (size -> (512, 768))
  3. self.lang_emb extracts the language features from the words in the bounding boxes (size -> (512, 768))
  4. self.encoder computes the attention and forward-propagates it (size -> (512, 768))

And then, for the downstream task, linear layers are attached on top. Hope it helps.
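The four steps above can be sketched roughly as follows. This is a simplified shape illustration only, not docformer's actual implementation (the real encoder fuses the streams inside its multi-modal attention rather than via a plain sum); the variable names are stand-ins for the modules named in steps 1-4.

```python
import torch

seq_len, hidden = 512, 768  # default configuration mentioned above

# Stand-ins for the three feature streams from steps 1-3; in the real
# model these come from self.embeddings, self.resnet, and self.lang_emb.
spatial_feat = torch.rand(seq_len, hidden)
image_feat = torch.rand(seq_len, hidden)
lang_feat = torch.rand(seq_len, hidden)

# Simplified fusion: the real encoder (step 4) attends over the streams
# with spatial biases; a sum is used here only to show the shapes line up.
fused = spatial_feat + image_feat + lang_feat

print(fused.shape)  # torch.Size([512, 768])
```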


ynusinovich avatar ynusinovich commented on July 20, 2024

Ok, understood, thank you very much for your help. I'll close the issue since the example runs!

