docformer's People

Contributors

shabie, uakarsh

docformer's Issues

About rvl_cdip_dataset.csv

Hi,

Can you please provide the code used to generate the CSV file that dataset_creation_for_docformer.py in the examples folder relies on?

Inference for token classification.

Hi @uakarsh, I trained DocFormer on a custom dataset and got a checkpoint file, but I don't know how to perform inference on test images. Can you show me how to run inference on images or, if possible, share a code snippet for it?
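
A minimal inference sketch, assuming a Lightning-style checkpoint and this repo's create_features pipeline. DocFormerForTokenClassification, config, and NUM_LABELS are placeholders for your own training class and settings (a wrapper along those lines is sketched under "DocFormer for Token Classification" below):

import torch
from transformers import BertTokenizerFast

import modeling, dataset  # docformer/src/docformer on sys.path

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Rebuild the model exactly as during training, then load the checkpoint.
model = DocFormerForTokenClassification(config, num_labels=NUM_LABELS)  # placeholder class
state = torch.load("checkpoint.ckpt", map_location="cpu")
model.load_state_dict(state["state_dict"], strict=False)
model.eval()

# Featurize the test image the same way as the training data.
encoding = dataset.create_features("test_image.png", tokenizer, add_batch_dim=True)

with torch.no_grad():
    logits = model(encoding)          # (1, seq_len, num_labels)
preds = logits.argmax(-1).squeeze(0)  # one predicted label id per token position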

Using pre-trained models

Hello, thank you for the great work.
I used this script to run the pre-training for MLM task: https://github.com/shabie/docformer/blob/master/examples/docformer_pl/3_Pre_training_DocFormer_Task_MLM_Task.ipynb
Afterwards, I used the resulting model in the token-classification task (using load_from_checkpoint, which copies all the weights except the linear layer, which has a different shape).

The problem is that no matter how long I run the pre-training, I always get the same metrics in the token-classification task (using that pre-trained model as a starting point).

I even tried the model from the document-classification task as a base for token classification, and I still get the exact same metrics as with the MLM-pretrained model.

Any suggestion on how to properly use the pre-trained models?
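
Identical downstream metrics usually mean the pre-trained weights were never actually copied in, e.g. because the checkpoint's key names (Lightning often prefixes them with "model.") don't match the new module's keys, so every tensor is silently skipped. A generic PyTorch sketch, not this repo's method, for transferring only the shape-compatible tensors and counting how many made it across:

import torch

ckpt = torch.load("mlm_pretrained.ckpt", map_location="cpu")
pretrained = ckpt["state_dict"]
model_state = model.state_dict()  # `model` is your token-classification module

# Keep only tensors that exist in both models with identical shapes
# (this drops the task-specific linear head automatically).
compatible = {
    k: v for k, v in pretrained.items()
    if k in model_state and v.shape == model_state[k].shape
}
print(f"transferring {len(compatible)}/{len(model_state)} tensors")  # should be nearly all
model_state.update(compatible)
model.load_state_dict(model_state)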

Pre-trained models

Thanks for the great work! Do you have any plans to release the pre-trained DocFormer model?

AssertionError:

AssertionError                            Traceback (most recent call last)
<ipython-input-8-02f52eee118a> in <module>()
     25 
     26 tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
---> 27 encoding = dataset.create_features(fp, tokenizer)
     28 
     29 feature_extractor = modeling.ExtractFeatures(config)

/content/docformer/src/docformer/dataset.py in create_features(image, tokenizer, add_batch_dim, target_size, max_seq_length, path_to_save, save_to_disk, apply_mask_for_mlm, extras_for_debugging)
    259         "y_features": torch.as_tensor(a_rel_y, dtype=torch.int32),
    260         })
--> 261     assert torch.lt(encoding["x_features"], 0).sum().item() == 0
    262     assert torch.lt(encoding["y_features"], 0).sum().item() == 0
    263 

AssertionError: 

I first tried a PNG image and later converted it to TIF, but it still gives this error.
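
The assertion only says that some entry of x_features or y_features is negative; it doesn't say which. A small debug sketch (after temporarily commenting out the two asserts in dataset.py) to locate the offending entries:

import torch

neg = encoding["x_features"] < 0
print("negative x entries:", neg.sum().item())
print("first offending indices:", neg.nonzero()[:10])
print("their values:", encoding["x_features"][neg][:10])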

Error When Following the Usage Instructions

I tried following the usage instructions you posted on a sample .jpg image of a receipt.
Every time I run it, I get an error saying, "RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 3-dimensional input of size [3, 384, 500] instead".
How do I fix that?

Full code:

import pytesseract
import sys 
sys.path.extend(['docformer/src/docformer/'])
import modeling, dataset
from transformers import BertTokenizerFast


config = {
  "coordinate_size": 96,
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "image_feature_pool_shape": [7, 7, 256],
  "intermediate_ff_size_factor": 4,
  "max_2d_position_embeddings": 1000,
  "max_position_embeddings": 512,
  "max_relative_positions": 8,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "shape_size": 96,
  "vocab_size": 30522,
  "layer_norm_eps": 1e-12,
}

fp = "images/data_sample.jpg"

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
encoding = dataset.create_features(fp, tokenizer)

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

feature_extractor = modeling.ExtractFeatures(config)
docformer = modeling.DocFormerEncoder(config)

v_bar, t_bar, v_bar_s, t_bar_s = feature_extractor(encoding)
output = docformer(v_bar, t_bar, v_bar_s, t_bar_s)  # shape (1, 512, 768)

Full error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [3], in <module>
     31 feature_extractor = modeling.ExtractFeatures(config)
     32 docformer = modeling.DocFormerEncoder(config)
---> 34 v_bar, t_bar, v_bar_s, t_bar_s = feature_extractor(encoding)
     35 output = docformer(v_bar, t_bar, v_bar_s, t_bar_s)

File ~\anaconda3\envs\docformer_env\lib\site-packages\torch\nn\modules\module.py:1102, in Module._call_impl(self, *input, **kwargs)
   1098 # If we don't have any hooks, we want to skip the rest of the logic in
   1099 # this function, and just call forward.
   1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102     return forward_call(*input, **kwargs)
   1103 # Do not call functions when jit is used
   1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~\Documents\Projects\docformer_implementation\docformer/src/docformer\modeling.py:512, in ExtractFeatures.forward(self, encoding)
    509 x_feature = encoding['x_features']
    510 y_feature = encoding['y_features']
--> 512 v_bar = self.visual_feature(image)
    513 t_bar = self.language_feature(language)
    515 v_bar_s, t_bar_s = self.spatial_feature(x_feature, y_feature)

File ~\anaconda3\envs\docformer_env\lib\site-packages\torch\nn\modules\module.py:1102, in Module._call_impl(self, *input, **kwargs)
   1098 # If we don't have any hooks, we want to skip the rest of the logic in
   1099 # this function, and just call forward.
   1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102     return forward_call(*input, **kwargs)
   1103 # Do not call functions when jit is used
   1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~\Documents\Projects\docformer_implementation\docformer/src/docformer\modeling.py:48, in ResNetFeatureExtractor.forward(self, x)
     47 def forward(self, x):
---> 48     x = self.resnet50(x)
     49     x = self.conv1(x)
     50     x = self.relu1(x)

File ~\anaconda3\envs\docformer_env\lib\site-packages\torch\nn\modules\module.py:1102, in Module._call_impl(self, *input, **kwargs)
   1098 # If we don't have any hooks, we want to skip the rest of the logic in
   1099 # this function, and just call forward.
   1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102     return forward_call(*input, **kwargs)
   1103 # Do not call functions when jit is used
   1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~\anaconda3\envs\docformer_env\lib\site-packages\torch\nn\modules\container.py:141, in Sequential.forward(self, input)
    139 def forward(self, input):
    140     for module in self:
--> 141         input = module(input)
    142     return input

File ~\anaconda3\envs\docformer_env\lib\site-packages\torch\nn\modules\module.py:1102, in Module._call_impl(self, *input, **kwargs)
   1098 # If we don't have any hooks, we want to skip the rest of the logic in
   1099 # this function, and just call forward.
   1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102     return forward_call(*input, **kwargs)
   1103 # Do not call functions when jit is used
   1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~\anaconda3\envs\docformer_env\lib\site-packages\torch\nn\modules\conv.py:446, in Conv2d.forward(self, input)
    445 def forward(self, input: Tensor) -> Tensor:
--> 446     return self._conv_forward(input, self.weight, self.bias)

File ~\anaconda3\envs\docformer_env\lib\site-packages\torch\nn\modules\conv.py:442, in Conv2d._conv_forward(self, input, weight, bias)
    438 if self.padding_mode != 'zeros':
    439     return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
    440                     weight, bias, self.stride,
    441                     _pair(0), self.dilation, self.groups)
--> 442 return F.conv2d(input, weight, bias, self.stride,
    443                 self.padding, self.dilation, self.groups)

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 3-dimensional input of size [3, 384, 500] instead
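
The ResNet weight [64, 3, 7, 7] expects a 4-dimensional (batch, channels, height, width) input, but the image tensor here is 3-dimensional, i.e. the batch dimension is missing. The create_features signature (visible in the AssertionError traceback above) takes an add_batch_dim argument, so the likely fix is:

encoding = dataset.create_features(fp, tokenizer, add_batch_dim=True)

Two further hedged observations: tesseract_cmd is set after create_features has already run, but the OCR happens inside create_features, so on Windows that line probably needs to move above it; and if you build the encoding yourself, you can add the batch dimension manually, e.g. image = image.unsqueeze(0) to go from (3, H, W) to (1, 3, H, W).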

Weird output

Hi
I ran the code, and the final output looks weird regardless of the input image. I am attaching it. Can you explain what it is?

[image attachment]

Thanks

NotImplementedError: Support for `validation_epoch_end` has been removed in v2.0.0. `DocFormer` implements this method

Ran into this issue while running the sample notebook on Colab: https://github.com/shabie/docformer/blob/master/examples/docformer_pl/document_image_classification_on_rvl_cdip/4.Document-image-classification-with-docformer.ipynb


1 if __name__ == "__main__":
----> 2     main()

6 frames
[<ipython-input-25-82eba9431764>](https://localhost:8080/#) in main()
     29         deterministic=True
     30     )
---> 31     trainer.fit(docformer, datamodule)

[/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py](https://localhost:8080/#) in fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    527         model = _maybe_unwrap_optimized(model)
    528         self.strategy._lightning_module = model
--> 529         call._call_and_handle_interrupt(
    530             self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
    531         )

[/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py](https://localhost:8080/#) in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
     40         if trainer.strategy.launcher is not None:
     41             return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
---> 42         return trainer_fn(*args, **kwargs)
     43 
     44     except _TunerExitException:

[/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py](https://localhost:8080/#) in _fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    566             model_connected=self.lightning_module is not None,
    567         )
--> 568         self._run(model, ckpt_path=ckpt_path)
    569 
    570         assert self.state.stopped

[/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py](https://localhost:8080/#) in _run(self, model, ckpt_path)
    919         self._callback_connector._attach_model_logging_functions()
    920 
--> 921         _verify_loop_configurations(self)
    922 
    923         # hook

[/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/configuration_validator.py](https://localhost:8080/#) in _verify_loop_configurations(trainer)
     34         raise ValueError("Unexpected: Trainer state fn must be set before validating loop configuration.")
     35     if trainer.state.fn == TrainerFn.FITTING:
---> 36         __verify_train_val_loop_configuration(trainer, model)
     37         __verify_manual_optimization_support(trainer, model)
     38         __check_training_step_requires_dataloader_iter(model)

[/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/configuration_validator.py](https://localhost:8080/#) in __verify_train_val_loop_configuration(trainer, model)
     82         )
     83     if callable(getattr(model, "validation_epoch_end", None)):
---> 84         raise NotImplementedError(
     85             f"Support for `validation_epoch_end` has been removed in v2.0.0. `{type(model).__name__}` implements this"
     86             " method. You can use the `on_validation_epoch_end` hook instead. To access outputs, save them in-memory as"

NotImplementedError: Support for `validation_epoch_end` has been removed in v2.0.0. `DocFormer` implements this method. You can use the `on_validation_epoch_end` hook instead. To access outputs, save them in-memory as instance attributes. You can find migration examples in https://github.com/Lightning-AI/lightning/pull/16520.
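
The error message itself describes the migration: drop the validation_epoch_end(self, outputs) hook, collect the step outputs yourself in an instance attribute, and process them in on_validation_epoch_end. A minimal sketch, assuming the notebook logs an averaged validation loss (_shared_step is a hypothetical helper standing in for the notebook's validation logic):

import torch
import pytorch_lightning as pl

class DocFormer(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.validation_outputs = []  # replaces the `outputs` argument

    def validation_step(self, batch, batch_idx):
        loss = self._shared_step(batch)  # hypothetical helper
        self.validation_outputs.append(loss)
        return loss

    # old (removed in Lightning 2.0): def validation_epoch_end(self, outputs): ...
    def on_validation_epoch_end(self):
        avg_loss = torch.stack(self.validation_outputs).mean()
        self.log("val_loss", avg_loss)
        self.validation_outputs.clear()  # reset for the next epoch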

finetune the Docformer

Can I fine-tune the DocFormer model? Could you give some instructions, please? Thanks.

There is no SEP token appended

You have a [SEP] or an equivalent token at the end, which I think is not what the authors used:

token_boxes = [[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]
unnormalized_token_boxes = [[0, 0, 0, 0]] + unnormalized_token_boxes + [[1000, 1000, 1000, 1000]]

See the first paragraph of the "Language Features" sub-section in Section 3.1 of the paper.

NER task

@uakarsh @shabie Hello

Thank you for the great work.
Can you give some more insights on the NER task?

Thanks and Regards.

DocFormer for Token Classification.

Hi,
First of all, great work. I wanted to ask whether DocFormer can be used for token classification, like the LayoutLM series of models from Microsoft Research, which supports token classification, document image classification, and visual question answering. If it can, how do we adapt the model to token classification?
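
Since the encoder returns one 768-dimensional hidden state per sequence position (shape (1, 512, 768), as in the usage snippet above), a token-classification head is just a linear layer over those states, analogous to LayoutLM's. A hypothetical sketch on top of this repo's modules:

import torch
import torch.nn as nn

import modeling  # docformer/src/docformer on sys.path

class DocFormerForTokenClassification(nn.Module):
    """Hypothetical wrapper: a linear tagging head on the encoder output."""

    def __init__(self, config, num_labels):
        super().__init__()
        self.extract = modeling.ExtractFeatures(config)
        self.encoder = modeling.DocFormerEncoder(config)
        self.dropout = nn.Dropout(config["hidden_dropout_prob"])
        self.classifier = nn.Linear(config["hidden_size"], num_labels)

    def forward(self, encoding):
        v_bar, t_bar, v_bar_s, t_bar_s = self.extract(encoding)
        hidden = self.encoder(v_bar, t_bar, v_bar_s, t_bar_s)  # (B, 512, 768)
        return self.classifier(self.dropout(hidden))           # (B, 512, num_labels)

Training it would then be a per-token cross-entropy loss, ignoring padded positions.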

checkpoint of docformer

I wonder if you could provide a DocFormer checkpoint. Training it from scratch would be difficult for me. Thank you.

DocFormerv2

Hi @shabie!
Are there any plans to release the DocFormerv2 soon? Great work!
Thanks!

Unable to convert model to onnx

Hi, I have been trying to convert the final output of DocFormer in the document-classification notebook from .ckpt to ONNX, but I am unable to do so. I have tried the standard torch.onnx.export method, but the issue lies in the "dummy_variable" input in the code. Can you please point me in the right direction?

model = DocFormer.load_from_checkpoint(CKPT_PATH)
model.eval()

# Let's create a dummy input tensor

# Export the model
torch.onnx.export(
    model,                     # model being run
    # for_conversion,
    maintup,
    # (input_ids, resized_scaled_img, x_features, y_features),  # model input (or a tuple for multiple inputs)
    "ImageClassifier.onnx",    # where to save the model
    export_params=True,        # store the trained parameter weights inside the model file
    opset_version=10,          # the ONNX version to export the model to
    do_constant_folding=True,  # whether to execute constant folding for optimization
    input_names=['resized_scaled_img', 'x_features', 'y_features', 'input_ids',
                 'resized_and_aligned_bounding_boxes', 'label'],  # the model's input names
    # the model's output names
)
print(" ")
print('Model has been converted to ONNX')
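
torch.onnx.export traces the model's forward, so the dummy input must be a tensor, or a tuple of tensors, matching forward's positional arguments; a forward that takes the docformer encoding dict needs a thin adapter first. A sketch under assumptions: the wrapper's argument names follow the input_names above, and the image shape (1, 3, 384, 500) and the (1, 512, 8) int32 coordinate features are guessed from shapes appearing elsewhere in these issues, so match them to your own encoding:

import torch
import torch.nn as nn

class ExportWrapper(nn.Module):
    """Adapter so forward takes plain tensors, as ONNX tracing expects."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, resized_scaled_img, input_ids, x_features, y_features):
        encoding = {
            "resized_scaled_img": resized_scaled_img,
            "input_ids": input_ids,
            "x_features": x_features,
            "y_features": y_features,
        }
        return self.model(encoding)

model = DocFormer.load_from_checkpoint(CKPT_PATH)
model.eval()

dummy = (
    torch.randn(1, 3, 384, 500),                # image; shape is an assumption
    torch.zeros(1, 512, dtype=torch.long),      # input_ids
    torch.zeros(1, 512, 8, dtype=torch.int32),  # x_features; last dim assumed
    torch.zeros(1, 512, 8, dtype=torch.int32),  # y_features
)
torch.onnx.export(
    ExportWrapper(model), dummy, "ImageClassifier.onnx",
    export_params=True, opset_version=11, do_constant_folding=True,
    input_names=["resized_scaled_img", "input_ids", "x_features", "y_features"],
    output_names=["logits"],
)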

How to find rvl_cdip_dataset.csv?

Hi,

Thank you for sharing your great work. Can you please also share your rvl_cdip_dataset.csv file or refer me to where I can find it?

Thanks,
Sayna

pre-training code tutorial

Hi, I was wondering if you could prepare a tutorial for pre-training DocFormer from scratch. It would be a huge help.

Shape mismatch during sanity check

Can the target size be changed? Currently, when I try to change it, it throws "mat1 and mat2 shapes cannot be multiplied (3072x768 and 192x128)"; I tried with target size (1000, 768). Setting max_position_embeddings to 768 instead throws "stack expects each tensor to be equal size, but got [767, 8] at entry 0 and [768, 8] at entry 2". I am trying to replicate the document-classification task from the notebook provided.

Error in Example: Please provide the bounding box and words or pass the argument "use_ocr" = True

Ran into this error while running the example notebook.

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
/tmp/ipykernel_33283/863471981.py in <cell line: 2>()
      1 ## Using a single batch for the forward propagation
----> 2 features = next(iter(train_data_loader))
      3 img,token,x_feat,y_feat, labels = features

~/anaconda3/envs/python3/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __next__(self)
    679                 # TODO(https://github.com/pytorch/pytorch/issues/76750)
    680                 self._reset()  # type: ignore[call-arg]
--> 681             data = self._next_data()
    682             self._num_yielded += 1
    683             if self._dataset_kind == _DatasetKind.Iterable and \

~/anaconda3/envs/python3/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _next_data(self)
    719     def _next_data(self):
    720         index = self._next_index()  # may raise StopIteration
--> 721         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    722         if self._pin_memory:
    723             data = _utils.pin_memory.pin_memory(data, self._pin_memory_device)

~/anaconda3/envs/python3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     47     def fetch(self, possibly_batched_index):
     48         if self.auto_collation:
---> 49             data = [self.dataset[idx] for idx in possibly_batched_index]
     50         else:
     51             data = self.dataset[possibly_batched_index]

~/anaconda3/envs/python3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py in <listcomp>(.0)
     47     def fetch(self, possibly_batched_index):
     48         if self.auto_collation:
---> 49             data = [self.dataset[idx] for idx in possibly_batched_index]
     50         else:
     51             data = self.dataset[possibly_batched_index]

/tmp/ipykernel_33283/2543102337.py in __getitem__(self, index)
     22         If labels are not None, then labels also
     23         '''
---> 24         encoding = create_features(self.entries[index],self.tokenizer, apply_mask_for_mlm=self.use_mlm)
     25 
     26         if self.labels==None:

~/docformer/examples/../src/docformer/dataset.py in create_features(image, tokenizer, add_batch_dim, target_size, max_seq_length, path_to_save, save_to_disk, apply_mask_for_mlm, extras_for_debugging, use_ocr, bounding_box, words)
    190 
    191     if (use_ocr == False) and (bounding_box == None or words == None):
--> 192         raise Exception('Please provide the bounding box and words or pass the argument "use_ocr" = True')
    193 
    194     if use_ocr == True:

Exception: Please provide the bounding box and words or pass the argument "use_ocr" = True
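
The exception comes from create_features, whose signature (visible in the traceback) accepts use_ocr, bounding_box, and words: either let it run OCR itself or supply both the words and their boxes. A sketch; the box format shown is an assumption, so check dataset.py for the exact convention:

# Option 1: let create_features run OCR internally.
encoding = create_features(image_path, tokenizer, use_ocr=True)

# Option 2: supply your own annotations (both arguments are required together).
encoding = create_features(
    image_path,
    tokenizer,
    use_ocr=False,
    bounding_box=[[57, 49, 205, 69], [212, 49, 300, 69]],  # example boxes (assumed x0, y0, x1, y1)
    words=["Example", "words"],
)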

not able to import docformer

I tried to install docformer with pip as well as with setup install, but I am still not able to import docformer on Ubuntu.

seems some device issue

RuntimeError: Expected tensor to have cuda DeviceType, but got tensor with cpu DeviceType (while checking arguments for einsum())

Detailed information about the pre-training dataset

Thank you for the great work! I was wondering what criteria (e.g. document words) were used to select the subset of 5 million pages from the IIT-CDIP dataset for pre-training. To compare our model with DocFormer under the same conditions, please share this information or release the pre-processing code.

Again Device issue

I am trying to run the code, but I face a problem when I execute the line below:

output = docformer(v_bar, t_bar, v_bar_s, t_bar_s)  # shape (1, 512, 768)

I get this error:

/usr/local/lib/python3.7/dist-packages/torch/functional.py in einsum(*args)
    328         return einsum(equation, *_operands)
    329
--> 330     return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
    331
    332 # Wrapper around _histogramdd and _histogramdd_bin_edges needed due to (Tensor, Tensor[]) return type

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_bmm)

This time, the failure is in wrapper_bmm.
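
The einsum failure means the model parameters are on cuda:0 while at least one tensor in the encoding is still on the CPU (or vice versa). A common fix is to move the modules and every tensor in the encoding dict to the same device before the forward pass; a sketch using the variable names from the usage snippet:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

feature_extractor = feature_extractor.to(device)
docformer = docformer.to(device)

# Move every tensor in the encoding dict to the same device as the model.
encoding = {k: (v.to(device) if torch.is_tensor(v) else v) for k, v in encoding.items()}

v_bar, t_bar, v_bar_s, t_bar_s = feature_extractor(encoding)
output = docformer(v_bar, t_bar, v_bar_s, t_bar_s)  # shape (1, 512, 768)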

Predictions are wrong.

I am training DocFormer and have completed 6 epochs, but the predictions are not good. Can you please tell me the minimum number of epochs required to get better predictions on document classification?

Also, in the Hugging Face demo, why is id2label in this order?

id2label = ['scientific_report',
            'resume',
            'memo',
            'file_folder',
            'specification',
            'news_article',
            'letter',
            'form',
            'budget',
            'handwritten',
            'email',
            'invoice',
            'presentation',
            'scientific_publication',
            'questionnaire',
            'advertisement']
