docformer's People

Contributors

shabie, uakarsh

docformer's Issues

About rvl_cdip_dataset.csv

Hi,

Can you please provide the code used to generate the CSV file that dataset_creation_for_docformer.py in the examples folder relies on?

Inference for token classification.

Hi @uakarsh, I trained DocFormer on a custom dataset and got a checkpoint file, but I don't know how to perform inference on test images. Can you show me how to run inference on images or, if possible, share a code snippet for it?
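
A minimal inference sketch, assuming a Lightning-style checkpoint and this repo's create_features pipeline. DocFormerForTokenClassification, config, and NUM_LABELS are placeholders for your own training class and settings (a wrapper along those lines is sketched under "DocFormer for Token Classification" below):

import torch
from transformers import BertTokenizerFast

import modeling, dataset  # docformer/src/docformer on sys.path

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Rebuild the model exactly as during training, then load the checkpoint.
model = DocFormerForTokenClassification(config, num_labels=NUM_LABELS)  # placeholder class
state = torch.load("checkpoint.ckpt", map_location="cpu")
model.load_state_dict(state["state_dict"], strict=False)
model.eval()

# Featurize the test image the same way as the training data.
encoding = dataset.create_features("test_image.png", tokenizer, add_batch_dim=True)

with torch.no_grad():
    logits = model(encoding)          # (1, seq_len, num_labels)
preds = logits.argmax(-1).squeeze(0)  # one predicted label id per token position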

Using pre-trained models

Hello, thank you for the great work.
I used this script to run the pre-training for MLM task: https://github.com/shabie/docformer/blob/master/examples/docformer_pl/3_Pre_training_DocFormer_Task_MLM_Task.ipynb
Afterwards, I used the resulting model in the token-classification task (using load_from_checkpoint, which copies all the weights except the linear layer, which has a different shape).

The problem is that no matter how long I run the pre-training, I always get the same metrics in the token-classification task (using that pre-trained model as a starting point).

I even tried the model from the document-classification task as a base for token classification, and I still get the exact same metrics as with the MLM-pretrained model.

Any suggestion on how to properly use the pre-trained models?
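
Identical downstream metrics usually mean the pre-trained weights were never actually copied in, e.g. because the checkpoint's key names (Lightning often prefixes them with "model.") don't match the new module's keys, so every tensor is silently skipped. A generic PyTorch sketch, not this repo's method, for transferring only the shape-compatible tensors and counting how many made it across:

import torch

ckpt = torch.load("mlm_pretrained.ckpt", map_location="cpu")
pretrained = ckpt["state_dict"]
model_state = model.state_dict()  # `model` is your token-classification module

# Keep only tensors that exist in both models with identical shapes
# (this drops the task-specific linear head automatically).
compatible = {
    k: v for k, v in pretrained.items()
    if k in model_state and v.shape == model_state[k].shape
}
print(f"transferring {len(compatible)}/{len(model_state)} tensors")  # should be nearly all
model_state.update(compatible)
model.load_state_dict(model_state)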

Pre-trained models

Thanks for the great work! Do you have any plans to release the pre-trained DocFormer model?

AssertionError:

AssertionError                            Traceback (most recent call last)
<ipython-input-8-02f52eee118a> in <module>()
     25 
     26 tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
---> 27 encoding = dataset.create_features(fp, tokenizer)
     28 
     29 feature_extractor = modeling.ExtractFeatures(config)

/content/docformer/src/docformer/dataset.py in create_features(image, tokenizer, add_batch_dim, target_size, max_seq_length, path_to_save, save_to_disk, apply_mask_for_mlm, extras_for_debugging)
    259         "y_features": torch.as_tensor(a_rel_y, dtype=torch.int32),
    260         })
--> 261     assert torch.lt(encoding["x_features"], 0).sum().item() == 0
    262     assert torch.lt(encoding["y_features"], 0).sum().item() == 0
    263 

AssertionError: 

I first tried a PNG image and later converted it to TIF, but it still gives this error.
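
The assertion only says that some entry of x_features or y_features is negative; it doesn't say which. A small debug sketch (after temporarily commenting out the two asserts in dataset.py) to locate the offending entries:

import torch

neg = encoding["x_features"] < 0
print("negative x entries:", neg.sum().item())
print("first offending indices:", neg.nonzero()[:10])
print("their values:", encoding["x_features"][neg][:10])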

Error When Following the Usage Instructions

I tried following the usage instructions you posted on a sample .jpg image of a receipt.
Every time I run it, I get an error saying, "RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 3-dimensional input of size [3, 384, 500] instead".
How do I fix that?

Full code:

import pytesseract
import sys 
sys.path.extend(['docformer/src/docformer/'])
import modeling, dataset
from transformers import BertTokenizerFast


config = {
  "coordinate_size": 96,
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "image_feature_pool_shape": [7, 7, 256],
  "intermediate_ff_size_factor": 4,
  "max_2d_position_embeddings": 1000,
  "max_position_embeddings": 512,
  "max_relative_positions": 8,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "shape_size": 96,
  "vocab_size": 30522,
  "layer_norm_eps": 1e-12,
}

fp = "images/data_sample.jpg"

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
encoding = dataset.create_features(fp, tokenizer)

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

feature_extractor = modeling.ExtractFeatures(config)
docformer = modeling.DocFormerEncoder(config)

v_bar, t_bar, v_bar_s, t_bar_s = feature_extractor(encoding)
output = docformer(v_bar, t_bar, v_bar_s, t_bar_s)  # shape (1, 512, 768)

Full error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [3], in <module>
     31 feature_extractor = modeling.ExtractFeatures(config)
     32 docformer = modeling.DocFormerEncoder(config)
---> 34 v_bar, t_bar, v_bar_s, t_bar_s = feature_extractor(encoding)
     35 output = docformer(v_bar, t_bar, v_bar_s, t_bar_s)

File ~\anaconda3\envs\docformer_env\lib\site-packages\torch\nn\modules\module.py:1102, in Module._call_impl(self, *input, **kwargs)
   1098 # If we don't have any hooks, we want to skip the rest of the logic in
   1099 # this function, and just call forward.
   1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102     return forward_call(*input, **kwargs)
   1103 # Do not call functions when jit is used
   1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~\Documents\Projects\docformer_implementation\docformer/src/docformer\modeling.py:512, in ExtractFeatures.forward(self, encoding)
    509 x_feature = encoding['x_features']
    510 y_feature = encoding['y_features']
--> 512 v_bar = self.visual_feature(image)
    513 t_bar = self.language_feature(language)
    515 v_bar_s, t_bar_s = self.spatial_feature(x_feature, y_feature)

File ~\anaconda3\envs\docformer_env\lib\site-packages\torch\nn\modules\module.py:1102, in Module._call_impl(self, *input, **kwargs)
   1098 # If we don't have any hooks, we want to skip the rest of the logic in
   1099 # this function, and just call forward.
   1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102     return forward_call(*input, **kwargs)
   1103 # Do not call functions when jit is used
   1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~\Documents\Projects\docformer_implementation\docformer/src/docformer\modeling.py:48, in ResNetFeatureExtractor.forward(self, x)
     47 def forward(self, x):
---> 48     x = self.resnet50(x)
     49     x = self.conv1(x)
     50     x = self.relu1(x)

File ~\anaconda3\envs\docformer_env\lib\site-packages\torch\nn\modules\module.py:1102, in Module._call_impl(self, *input, **kwargs)
   1098 # If we don't have any hooks, we want to skip the rest of the logic in
   1099 # this function, and just call forward.
   1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102     return forward_call(*input, **kwargs)
   1103 # Do not call functions when jit is used
   1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~\anaconda3\envs\docformer_env\lib\site-packages\torch\nn\modules\container.py:141, in Sequential.forward(self, input)
    139 def forward(self, input):
    140     for module in self:
--> 141         input = module(input)
    142     return input

File ~\anaconda3\envs\docformer_env\lib\site-packages\torch\nn\modules\module.py:1102, in Module._call_impl(self, *input, **kwargs)
   1098 # If we don't have any hooks, we want to skip the rest of the logic in
   1099 # this function, and just call forward.
   1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102     return forward_call(*input, **kwargs)
   1103 # Do not call functions when jit is used
   1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~\anaconda3\envs\docformer_env\lib\site-packages\torch\nn\modules\conv.py:446, in Conv2d.forward(self, input)
    445 def forward(self, input: Tensor) -> Tensor:
--> 446     return self._conv_forward(input, self.weight, self.bias)

File ~\anaconda3\envs\docformer_env\lib\site-packages\torch\nn\modules\conv.py:442, in Conv2d._conv_forward(self, input, weight, bias)
    438 if self.padding_mode != 'zeros':
    439     return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
    440                     weight, bias, self.stride,
    441                     _pair(0), self.dilation, self.groups)
--> 442 return F.conv2d(input, weight, bias, self.stride,
    443                 self.padding, self.dilation, self.groups)

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 3-dimensional input of size [3, 384, 500] instead
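
The ResNet weight [64, 3, 7, 7] expects a 4-dimensional (batch, channels, height, width) input, but the image tensor here is 3-dimensional, i.e. the batch dimension is missing. The create_features signature (visible in the AssertionError traceback above) takes an add_batch_dim argument, so the likely fix is:

encoding = dataset.create_features(fp, tokenizer, add_batch_dim=True)

Two further hedged observations: tesseract_cmd is set after create_features has already run, but the OCR happens inside create_features, so on Windows that line probably needs to move above it; and if you build the encoding yourself, you can add the batch dimension manually, e.g. image = image.unsqueeze(0) to go from (3, H, W) to (1, 3, H, W).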

Weird output

Hi
I ran the code, and the final output looks weird regardless of the input image. I am attaching it. Can you explain what it is?

[image attachment]

Thanks

NotImplementedError: Support for `validation_epoch_end` has been removed in v2.0.0. `DocFormer` implements this method

Ran into this issue while running the sample notebook on Colab: https://github.com/shabie/docformer/blob/master/examples/docformer_pl/document_image_classification_on_rvl_cdip/4.Document-image-classification-with-docformer.ipynb


1 if __name__ == "__main__":
----> 2     main()

6 frames
[<ipython-input-25-82eba9431764>](https://localhost:8080/#) in main()
     29         deterministic=True
     30     )
---> 31     trainer.fit(docformer, datamodule)

[/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py](https://localhost:8080/#) in fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    527         model = _maybe_unwrap_optimized(model)
    528         self.strategy._lightning_module = model
--> 529         call._call_and_handle_interrupt(
    530             self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
    531         )

[/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py](https://localhost:8080/#) in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
     40         if trainer.strategy.launcher is not None:
     41             return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
---> 42         return trainer_fn(*args, **kwargs)
     43 
     44     except _TunerExitException:

[/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py](https://localhost:8080/#) in _fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    566             model_connected=self.lightning_module is not None,
    567         )
--> 568         self._run(model, ckpt_path=ckpt_path)
    569 
    570         assert self.state.stopped

[/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py](https://localhost:8080/#) in _run(self, model, ckpt_path)
    919         self._callback_connector._attach_model_logging_functions()
    920 
--> 921         _verify_loop_configurations(self)
    922 
    923         # hook

[/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/configuration_validator.py](https://localhost:8080/#) in _verify_loop_configurations(trainer)
     34         raise ValueError("Unexpected: Trainer state fn must be set before validating loop configuration.")
     35     if trainer.state.fn == TrainerFn.FITTING:
---> 36         __verify_train_val_loop_configuration(trainer, model)
     37         __verify_manual_optimization_support(trainer, model)
     38         __check_training_step_requires_dataloader_iter(model)

[/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/configuration_validator.py](https://localhost:8080/#) in __verify_train_val_loop_configuration(trainer, model)
     82         )
     83     if callable(getattr(model, "validation_epoch_end", None)):
---> 84         raise NotImplementedError(
     85             f"Support for `validation_epoch_end` has been removed in v2.0.0. `{type(model).__name__}` implements this"
     86             " method. You can use the `on_validation_epoch_end` hook instead. To access outputs, save them in-memory as"

NotImplementedError: Support for `validation_epoch_end` has been removed in v2.0.0. `DocFormer` implements this method. You can use the `on_validation_epoch_end` hook instead. To access outputs, save them in-memory as instance attributes. You can find migration examples in https://github.com/Lightning-AI/lightning/pull/16520.
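
The error message itself describes the migration: drop the validation_epoch_end(self, outputs) hook, collect the step outputs yourself in an instance attribute, and process them in on_validation_epoch_end. A minimal sketch, assuming the notebook logs an averaged validation loss (_shared_step is a hypothetical helper standing in for the notebook's validation logic):

import torch
import pytorch_lightning as pl

class DocFormer(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.validation_outputs = []  # replaces the `outputs` argument

    def validation_step(self, batch, batch_idx):
        loss = self._shared_step(batch)  # hypothetical helper
        self.validation_outputs.append(loss)
        return loss

    # old (removed in Lightning 2.0): def validation_epoch_end(self, outputs): ...
    def on_validation_epoch_end(self):
        avg_loss = torch.stack(self.validation_outputs).mean()
        self.log("val_loss", avg_loss)
        self.validation_outputs.clear()  # reset for the next epoch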

finetune the Docformer

Can I fine-tune the DocFormer model? Could you give some instructions, please? Thanks.

There is no SEP token appended

You have a [SEP] or an equivalent token at the end, which I think is not what the authors used:

token_boxes = [[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]
unnormalized_token_boxes = [[0, 0, 0, 0]] + unnormalized_token_boxes + [[1000, 1000, 1000, 1000]]

See the first paragraph of the "Language Features" sub-section in Section 3.1 of the paper.

NER task

@uakarsh @shabie Hello

Thank you for the great work.
Can you give some more insights on the NER task?

Thanks and Regards.

DocFormer for Token Classification.

Hi,
First of all, great work. I wanted to ask whether DocFormer can be used for token classification, like the LayoutLM series of models from Microsoft Research, which supports token classification, document image classification, and visual question answering. If it can, how do we adapt the model to token classification?
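
Since the encoder returns one 768-dimensional hidden state per sequence position (shape (1, 512, 768), as in the usage snippet above), a token-classification head is just a linear layer over those states, analogous to LayoutLM's. A hypothetical sketch on top of this repo's modules:

import torch
import torch.nn as nn

import modeling  # docformer/src/docformer on sys.path

class DocFormerForTokenClassification(nn.Module):
    """Hypothetical wrapper: a linear tagging head on the encoder output."""

    def __init__(self, config, num_labels):
        super().__init__()
        self.extract = modeling.ExtractFeatures(config)
        self.encoder = modeling.DocFormerEncoder(config)
        self.dropout = nn.Dropout(config["hidden_dropout_prob"])
        self.classifier = nn.Linear(config["hidden_size"], num_labels)

    def forward(self, encoding):
        v_bar, t_bar, v_bar_s, t_bar_s = self.extract(encoding)
        hidden = self.encoder(v_bar, t_bar, v_bar_s, t_bar_s)  # (B, 512, 768)
        return self.classifier(self.dropout(hidden))           # (B, 512, num_labels)

Training it would then be a per-token cross-entropy loss, ignoring padded positions.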

checkpoint of docformer

I wonder if you could provide a DocFormer checkpoint. Training it from scratch would be difficult for me. Thank you.

DocFormerv2

Hi @shabie!
Are there any plans to release the DocFormerv2 soon? Great work!
Thanks!

Unable to convert model to onnx

Hi, I have been trying to convert the final output of DocFormer in the document-classification notebook from .ckpt to ONNX, but I am unable to do so. I have tried the standard torch.onnx.export method, but the issue lies in the "dummy_variable" input in the code. Can you please point me in the right direction?

model = DocFormer.load_from_checkpoint(CKPT_PATH)
model.eval()

# Let's create a dummy input tensor

# Export the model
torch.onnx.export(
    model,                     # model being run
    # for_conversion,
    maintup,
    # (input_ids, resized_scaled_img, x_features, y_features),  # model input (or a tuple for multiple inputs)
    "ImageClassifier.onnx",    # where to save the model
    export_params=True,        # store the trained parameter weights inside the model file
    opset_version=10,          # the ONNX version to export the model to
    do_constant_folding=True,  # whether to execute constant folding for optimization
    input_names=['resized_scaled_img', 'x_features', 'y_features', 'input_ids',
                 'resized_and_aligned_bounding_boxes', 'label'],  # the model's input names
    # the model's output names
)
print(" ")
print('Model has been converted to ONNX')
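
torch.onnx.export traces the model's forward, so the dummy input must be a tensor, or a tuple of tensors, matching forward's positional arguments; a forward that takes the docformer encoding dict needs a thin adapter first. A sketch under assumptions: the wrapper's argument names follow the input_names above, and the image shape (1, 3, 384, 500) and the (1, 512, 8) int32 coordinate features are guessed from shapes appearing elsewhere in these issues, so match them to your own encoding:

import torch
import torch.nn as nn

class ExportWrapper(nn.Module):
    """Adapter so forward takes plain tensors, as ONNX tracing expects."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, resized_scaled_img, input_ids, x_features, y_features):
        encoding = {
            "resized_scaled_img": resized_scaled_img,
            "input_ids": input_ids,
            "x_features": x_features,
            "y_features": y_features,
        }
        return self.model(encoding)

model = DocFormer.load_from_checkpoint(CKPT_PATH)
model.eval()

dummy = (
    torch.randn(1, 3, 384, 500),                # image; shape is an assumption
    torch.zeros(1, 512, dtype=torch.long),      # input_ids
    torch.zeros(1, 512, 8, dtype=torch.int32),  # x_features; last dim assumed
    torch.zeros(1, 512, 8, dtype=torch.int32),  # y_features
)
torch.onnx.export(
    ExportWrapper(model), dummy, "ImageClassifier.onnx",
    export_params=True, opset_version=11, do_constant_folding=True,
    input_names=["resized_scaled_img", "input_ids", "x_features", "y_features"],
    output_names=["logits"],
)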

How to find rvl_cdip_dataset.csv?

Hi,

Thank you for sharing your great work. Can you please also share your rvl_cdip_dataset.csv file or refer me to where I can find it?

Thanks,
Sayna

pre-training code tutorial

Hi, I was wondering if you could prepare a tutorial for pre-training DocFormer from scratch. It would be a huge help.

Shape mismatch during sanity check

Can the target size be changed? Currently, when I try to change it, it throws "mat1 and mat2 shapes cannot be multiplied (3072x768 and 192x128)"; I tried with target size (1000, 768). Setting max_position_embeddings to 768 instead throws "stack expects each tensor to be equal size, but got [767, 8] at entry 0 and [768, 8] at entry 2". I am trying to replicate the document-classification task from the notebook provided.

Error in Example: Please provide the bounding box and words or pass the argument "use_ocr" = True

Ran into this error while running the example notebook.

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
/tmp/ipykernel_33283/863471981.py in <cell line: 2>()
      1 ## Using a single batch for the forward propagation
----> 2 features = next(iter(train_data_loader))
      3 img,token,x_feat,y_feat, labels = features

~/anaconda3/envs/python3/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __next__(self)
    679                 # TODO(https://github.com/pytorch/pytorch/issues/76750)
    680                 self._reset()  # type: ignore[call-arg]
--> 681             data = self._next_data()
    682             self._num_yielded += 1
    683             if self._dataset_kind == _DatasetKind.Iterable and \

~/anaconda3/envs/python3/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _next_data(self)
    719     def _next_data(self):
    720         index = self._next_index()  # may raise StopIteration
--> 721         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    722         if self._pin_memory:
    723             data = _utils.pin_memory.pin_memory(data, self._pin_memory_device)

~/anaconda3/envs/python3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     47     def fetch(self, possibly_batched_index):
     48         if self.auto_collation:
---> 49             data = [self.dataset[idx] for idx in possibly_batched_index]
     50         else:
     51             data = self.dataset[possibly_batched_index]

~/anaconda3/envs/python3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py in <listcomp>(.0)
     47     def fetch(self, possibly_batched_index):
     48         if self.auto_collation:
---> 49             data = [self.dataset[idx] for idx in possibly_batched_index]
     50         else:
     51             data = self.dataset[possibly_batched_index]

/tmp/ipykernel_33283/2543102337.py in __getitem__(self, index)
     22         If labels are not None, then labels also
     23         '''
---> 24         encoding = create_features(self.entries[index],self.tokenizer, apply_mask_for_mlm=self.use_mlm)
     25 
     26         if self.labels==None:

~/docformer/examples/../src/docformer/dataset.py in create_features(image, tokenizer, add_batch_dim, target_size, max_seq_length, path_to_save, save_to_disk, apply_mask_for_mlm, extras_for_debugging, use_ocr, bounding_box, words)
    190 
    191     if (use_ocr == False) and (bounding_box == None or words == None):
--> 192         raise Exception('Please provide the bounding box and words or pass the argument "use_ocr" = True')
    193 
    194     if use_ocr == True:

Exception: Please provide the bounding box and words or pass the argument "use_ocr" = True
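
The exception comes from create_features, whose signature (visible in the traceback) accepts use_ocr, bounding_box, and words: either let it run OCR itself or supply both the words and their boxes. A sketch; the box format shown is an assumption, so check dataset.py for the exact convention:

# Option 1: let create_features run OCR internally.
encoding = create_features(image_path, tokenizer, use_ocr=True)

# Option 2: supply your own annotations (both arguments are required together).
encoding = create_features(
    image_path,
    tokenizer,
    use_ocr=False,
    bounding_box=[[57, 49, 205, 69], [212, 49, 300, 69]],  # example boxes (assumed x0, y0, x1, y1)
    words=["Example", "words"],
)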

not able to import docformer

I tried to install docformer with pip as well as with setup install, but I am still not able to import docformer on Ubuntu.

seems some device issue

RuntimeError: Expected tensor to have cuda DeviceType, but got tensor with cpu DeviceType (while checking arguments for einsum())

Detailed information about the pre-training dataset

Thank you for the great work! I was wondering what criteria (e.g. document words) were used to select the subset of 5 million pages from the IIT-CDIP dataset for pre-training. To compare our model with DocFormer under the same conditions, please share this information or release the pre-processing code.

Again Device issue

I am trying to run the code, but I face a problem when I execute the line below:

output = docformer(v_bar, t_bar, v_bar_s, t_bar_s)  # shape (1, 512, 768)

I get this error:

/usr/local/lib/python3.7/dist-packages/torch/functional.py in einsum(*args)
    328         return einsum(equation, *_operands)
    329
--> 330     return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
    331
    332 # Wrapper around _histogramdd and _histogramdd_bin_edges needed due to (Tensor, Tensor[]) return type

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_bmm)

This time, the failure is in wrapper_bmm.
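
The einsum failure means the model parameters are on cuda:0 while at least one tensor in the encoding is still on the CPU (or vice versa). A common fix is to move the modules and every tensor in the encoding dict to the same device before the forward pass; a sketch using the variable names from the usage snippet:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

feature_extractor = feature_extractor.to(device)
docformer = docformer.to(device)

# Move every tensor in the encoding dict to the same device as the model.
encoding = {k: (v.to(device) if torch.is_tensor(v) else v) for k, v in encoding.items()}

v_bar, t_bar, v_bar_s, t_bar_s = feature_extractor(encoding)
output = docformer(v_bar, t_bar, v_bar_s, t_bar_s)  # shape (1, 512, 768)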

Predictions are wrong.

I am training DocFormer and have completed 6 epochs, but the predictions are not good. Can you please tell me the minimum number of epochs required to get better predictions on document classification?

Also, in the Hugging Face demo, why is id2label in this order?

id2label = ['scientific_report',
            'resume',
            'memo',
            'file_folder',
            'specification',
            'news_article',
            'letter',
            'form',
            'budget',
            'handwritten',
            'email',
            'invoice',
            'presentation',
            'scientific_publication',
            'questionnaire',
            'advertisement']
