
Comments (4)

uakarsh commented on August 20, 2024

Hi,

I think "good results" is subjective; it depends on your application, i.e. how you wish to use this architecture. From my perspective, you can simply tune the hyperparameters, observe the results (I have integrated W&B to visualize training progress), and then come to a conclusion.

If you are talking about the 4th notebook (i.e. the Kaggle notebook), then no, we are not using pre-trained weights there (I have described MLM pre-training separately); the model is trained directly from scratch.

If you want to go with pre-training, here is a simple approach: pre-train DocFormer with MLM (see the 3rd notebook in the examples), then run the 4th notebook and load the relevant weights from the checkpoint you saved.
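A minimal sketch of picking the relevant weights out of a saved checkpoint, under a few assumptions that are not confirmed by the notebooks: the checkpoint is a PyTorch Lightning file with its weights under the `state_dict` key, the keys carry a `model.` wrapper prefix, and the classification head is the `linear_layer` attribute. The helper name `select_backbone_weights` and the checkpoint path are hypothetical.

```python
from typing import Any, Dict, Iterable


def select_backbone_weights(
    state_dict: Dict[str, Any],
    wrapper_prefix: str = "model.",                      # assumed Lightning attribute name
    skip_prefixes: Iterable[str] = ("linear_layer.",),   # assumed head to re-initialize
) -> Dict[str, Any]:
    """Strip the wrapper prefix and drop head weights from a state dict."""
    skip = tuple(skip_prefixes)
    selected = {}
    for key, value in state_dict.items():
        # Remove the prefix added when the model is wrapped in another module.
        if key.startswith(wrapper_prefix):
            key = key[len(wrapper_prefix):]
        # Leave out the task head so it can be trained fresh for the new task.
        if key.startswith(skip):
            continue
        selected[key] = value
    return selected


# Hypothetical usage with PyTorch (path and variable names are placeholders):
#   import torch
#   ckpt = torch.load("mlm_pretrained.ckpt", map_location="cpu")
#   weights = select_backbone_weights(ckpt["state_dict"])
#   model.load_state_dict(weights, strict=False)  # strict=False tolerates the missing head
```

Passing `strict=False` lets the freshly initialized head stay as it is while every matching backbone tensor is overwritten; it is worth inspecting the missing and unexpected keys that `load_state_dict` returns to confirm the prefixes matched.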

Hope this helps.

Regards,
Akarsh


uakarsh commented on August 20, 2024

@BakingBrains Hi,

Actually, the way I trained DocFormer on RVL-CDIP rests on some assumptions. You can check the Kaggle notebook, where I list them; they stem from limited GPU memory and time constraints.

You can check out this line of code, where I load the weights (Ckpt). If that doesn't work, the checkpoints are saved in version 5 of the Kaggle notebook (shared in examples/docformer_pl/ of the cloned docformer repo).

As for the NER task, I will be uploading the script in some time, so stay tuned. I will let you know as soon as it is up.

Hope this helps

Regards,
Akarsh


BakingBrains commented on August 20, 2024

@uakarsh Thank you.
How many epochs do you think are required to get good output for document classification?

Also, in the paper the authors mention:

DocFormer is pre-trained for 5 epochs, then we remove all
three task heads. We add one linear projection head and
fine-tune all components of the model for all downstream
tasks.

I see a linear layer being added here:

self.resnet = ResNetFeatureExtractor(hidden_dim = config['max_position_embeddings'])
self.embeddings = DocFormerEmbeddings(config)
self.lang_emb = LanguageFeatureExtractor()
self.config = config
self.dropout = nn.Dropout(config['hidden_dropout_prob'])
self.linear_layer = nn.Linear(in_features = config['hidden_size'], out_features = len(id2label)) 

This is in Section 3.2. Are we doing the same thing for document classification as well?
I had this doubt.

Thanks and Regards


BakingBrains commented on August 20, 2024

@uakarsh Thank you😄
