Hi I ran the code, it is giving me final output that is too weird irrespective of

Weird output about docformer HOT 7 CLOSED

shabie commented on August 20, 2024

Weird output

from docformer.

Comments (7)

uakarsh commented on August 20, 2024

Sorry for the delay. Could you let me know which layer you extracted the output from?

Regards,


kmr2017 commented on August 20, 2024

Hi @uakarsh

Thanks for your response.

I tried the code below:

# Assumption: the docformer source directory (the one containing
# dataset.py and modeling.py) is on the Python path.
import dataset
import modeling
from transformers import BertTokenizerFast

# Model hyperparameters
config = {
    "coordinate_size": 96,
    "hidden_dropout_prob": 0.1,
    "hidden_size": 768,
    "image_feature_pool_shape": [7, 7, 256],
    "intermediate_ff_size_factor": 4,
    "max_2d_position_embeddings": 1000,
    "max_position_embeddings": 512,
    "max_relative_positions": 8,
    "num_attention_heads": 12,
    "num_hidden_layers": 12,
    "pad_token_id": 0,
    "shape_size": 96,
    "vocab_size": 30522,
    "layer_norm_eps": 1e-12,
}

# Path to the input document image
fp = "img.jpeg"

# Tokenize the document and build the model inputs (with a batch dimension)
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
encoding = dataset.create_features(fp, tokenizer, add_batch_dim=True)

# Extract visual, text and spatial features, then run the encoder
feature_extractor = modeling.ExtractFeatures(config)
docformer = modeling.DocFormerEncoder(config)
v_bar, t_bar, v_bar_s, t_bar_s = feature_extractor(encoding)
output = docformer(v_bar, t_bar, v_bar_s, t_bar_s)  # shape (1, 512, 768)

Then I visualized the output.
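For anyone reproducing the visualization step, here is a minimal sketch of how a (1, 512, 768) output could be prepared for plotting. It is only an illustration: `to_heatmap` is a hypothetical helper, not part of docformer, and the random array stands in for the real encoder output.

```python
import numpy as np


def to_heatmap(output: np.ndarray) -> np.ndarray:
    """Drop the batch dimension and min-max scale the values to [0, 1]."""
    if output.ndim != 3 or output.shape[0] != 1:
        raise ValueError(f"expected shape (1, seq_len, hidden), got {output.shape}")
    grid = output[0].astype(np.float64)  # (seq_len, hidden)
    lo, hi = grid.min(), grid.max()
    if hi == lo:  # constant input: avoid division by zero
        return np.zeros_like(grid)
    return (grid - lo) / (hi - lo)


# Stand-in for the real encoder output (random values, NOT DocFormer features).
rng = np.random.default_rng(0)
fake_output = rng.normal(size=(1, 512, 768))

heatmap = to_heatmap(fake_output)
print(heatmap.shape)
```

With a real PyTorch tensor, the array would come from something like `output.detach().cpu().numpy()`, and the result could be rendered with `matplotlib.pyplot.imshow(heatmap, aspect="auto")`.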


uakarsh commented on August 20, 2024

Hi,

The output has shape (512, 768). It results from attention over three different inputs:

Image features of shape (512, 768)
Language features of shape (512, 768)
Spatial features of shape (512, 768)

When we perform any downstream task, we have an encoded version of these three modalities, so the pattern in the diagram you plotted is what tells the model which encoding to attend to when performing that task.

The same can be seen on page 15, Figure 11 (B), of the DocFormer paper. Hope it helps.
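To make the "three inputs, one output" point concrete, here is a deliberately simplified single-head sketch. It assumes that image and text features each get their own attention scores, that a spatial score term is shared by both, and that the two attended results are summed. This loosely follows the idea described in the paper as I understand it; it leaves out relative position terms, multiple heads, layer normalization and feed-forward layers, and it is not the repository's implementation. The function `fuse` and all parameter names are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def fuse(visual, text, spatial, params):
    """Combine three (seq_len, hidden) inputs into one (seq_len, hidden) output."""
    d = visual.shape[-1]
    # Spatial scores are computed once and added to both modalities.
    shared = (spatial @ params["q_s"]) @ (spatial @ params["k_s"]).T  # (seq, seq)
    out = np.zeros_like(visual)
    for name, feats in (("v", visual), ("t", text)):
        # Content scores for this modality.
        content = (feats @ params["q_" + name]) @ (feats @ params["k_" + name]).T
        attn = softmax((content + shared) / np.sqrt(d))
        out += attn @ (feats @ params["val_" + name])
    return out


# Small random stand-ins; the thread's real shapes are (512, 768).
rng = np.random.default_rng(0)
seq_len, hidden = 8, 16
visual, text, spatial = (rng.normal(size=(seq_len, hidden)) for _ in range(3))
names = ["q_v", "k_v", "val_v", "q_t", "k_t", "val_t", "q_s", "k_s"]
params = {n: rng.normal(scale=0.1, size=(hidden, hidden)) for n in names}

fused = fuse(visual, text, spatial, params)
print(fused.shape)
```

The point of the sketch is only that the single output tensor mixes all three inputs, so a plot of it will not look like any one modality on its own.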


kmr2017 commented on August 20, 2024

Thanks for the info. How can I do entity-level classification, as in the FUNSD dataset?


kmr2017 commented on August 20, 2024

@uakarsh


uakarsh commented on August 20, 2024

I have almost finished the training script for RVL-CDIP (document classification) and have started working on FUNSD for token classification.

You can visit my cloned repo (https://github.com/uakarsh/docformer/tree/master/examples/docformer_pl). In examples/docformer_pl you will find:

Data visualization
Dataset creation
MLM with PyTorch Lightning
Document classification with DocFormer (to be uploaded soon)

Next will be NER with FUNSD.

I will update you soon!
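Until that script is available, a minimal sketch of the general idea may help with the FUNSD question above: put a per-token linear classifier on top of the (1, 512, 768) encoder output, ignore padded positions, and take a majority vote over the tokens of each entity. Everything here is an assumption for illustration: the four labels (header, question, answer, other), the helper names, and the random weights, which stand in for a trained head. In a PyTorch training script the projection would normally be a `torch.nn.Linear(768, num_labels)` layer.

```python
import numpy as np

NUM_LABELS = 4  # assumption: header, question, answer, other


def classify_tokens(hidden, weight, bias, attention_mask):
    """Predict one label per token; padded positions are set to -1."""
    logits = hidden @ weight + bias   # (batch, seq_len, NUM_LABELS)
    preds = logits.argmax(axis=-1)    # (batch, seq_len)
    return np.where(attention_mask.astype(bool), preds, -1)


def entity_label(token_preds):
    """Majority vote over the token predictions that belong to one entity."""
    return int(np.bincount(token_preds, minlength=NUM_LABELS).argmax())


# Random stand-ins for the encoder output and an untrained classifier head.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(1, 512, 768))
weight = rng.normal(scale=0.02, size=(768, NUM_LABELS))
bias = np.zeros(NUM_LABELS)

# Pretend the first 40 positions are real tokens and the rest are padding.
attention_mask = np.zeros((1, 512), dtype=int)
attention_mask[0, :40] = 1

labels = classify_tokens(hidden, weight, bias, attention_mask)
print(labels.shape)
print(entity_label(np.array([1, 1, 2])))
```

Which tokens belong to which entity comes from the dataset annotations, so the grouping step depends on how the FUNSD features are built.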


BakingBrains commented on August 20, 2024

@uakarsh Hello,

Any update on NER with FUNSD using docformer?
