
DocEnTR

Use Python version 3.8.12


Description

PyTorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer. The model is built on top of the vit-pytorch vision transformer library. It can be used to enhance (binarize) degraded document images, as shown in the samples below.

Degraded Images | Our Binarization
(sample image pairs are shown in the repository)

Download Code

Clone the repository:

git clone https://github.com/dali92002/DocEnTR
cd DocEnTR

Requirements

  • Install the dependencies: pip install -r requirements.txt

Process Data

Data Path

We gathered the DIBCO, H-DIBCO and PALM datasets and organized them in one folder. You can download it from this link. After downloading, extract the folder named DIBCOSETS and place it in your desired data path, i.e. /YOUR_DATA_PATH/DIBCOSETS/

Data Splitting

Specify the data path, split size, and the validation and testing sets to prepare your data. In this example we set the split size to 256 x 256, the validation set to 2016 and the testing set to 2018 while running process_dibco.py:

python process_dibco.py --data_path /YOUR_DATA_PATH/ --split_size 256 --testing_dataset 2018 --validation_dataset 2016
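Conceptually, the splitting step tiles each document image into fixed-size patches. The following is a minimal sketch of that idea, not the actual code of process_dibco.py (the function name and padding strategy are illustrative assumptions):

```python
import numpy as np

def split_into_patches(img: np.ndarray, split_size: int = 256):
    """Split an H x W (x C) image into non-overlapping split_size x split_size
    patches, zero-padding the bottom/right borders so every patch is full-sized."""
    h, w = img.shape[:2]
    pad_h = (-h) % split_size
    pad_w = (-w) % split_size
    pad = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (img.ndim - 2)
    img = np.pad(img, pad, mode="constant")
    patches = []
    for y in range(0, img.shape[0], split_size):
        for x in range(0, img.shape[1], split_size):
            patches.append(img[y:y + split_size, x:x + split_size])
    return patches
```

A 300 x 500 image, for example, is padded to 512 x 512 and yields four 256 x 256 patches.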

Using DocEnTr

Training

For training, specify the desired settings (batch_size, patch_size, model_size, split_size and number of training epochs) when running train.py. For example, to train a base model with a 16 x 16 patch size and a batch size of 32, use the following command:

python train.py --data_path /YOUR_DATA_PATH/ --batch_size 32 --vit_model_size base --vit_patch_size 16 --epochs 151 --split_size 256 --validation_dataset 2016

On each epoch you will get visualization results from the validation dataset in a folder named vis+"YOUR_EXPERIMENT_SETTINGS" (it will be created automatically); in the previous case it will be named visbase_256_16. The best weights are saved in the folder named "weights".

Testing on a DIBCO dataset

To test the trained model on a specific DIBCO dataset (it should match the one specified in the Process Data section; if not, run process_dibco.py again), download the model weights from the Model Zoo section, or use your own trained weights, then run the following command. Here we test on H-DIBCO 2018 using the Base model with an 8 x 8 patch size and a batch size of 16. The binarized images will be written to ./vis+"YOUR_CONFIGS_HERE"/epoch_testing/

python test.py --data_path /YOUR_DATA_PATH/ --model_weights_path /THE_MODEL_WEIGHTS_PATH/ --batch_size 16 --vit_model_size base --vit_patch_size 8 --split_size 256 --testing_dataset 2018

Demo

In this demo, we show how to use our pretrained models to binarize a single degraded image. This is detailed with comments in the file demo.ipynb; for simplicity we made it a Jupyter notebook, so you can modify all the code parts and visualize your progressive results.
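The notebook's overall pipeline can be sketched as: pad the image, run the model patch by patch, stitch and threshold. This is a minimal sketch under assumptions, not the notebook's actual code; it assumes a model whose forward pass maps a normalized 1x1x256x256 tensor to an enhanced patch of the same shape, and the tensor layout is an illustrative guess:

```python
import numpy as np
import torch

SPLIT = 256  # patch size the model was trained on

def binarize_image(model: torch.nn.Module, img: np.ndarray) -> np.ndarray:
    """Binarize a grayscale image in [0, 255] by running the model
    patch-by-patch and stitching the outputs back together."""
    h, w = img.shape
    pad_h, pad_w = (-h) % SPLIT, (-w) % SPLIT
    padded = np.pad(img, ((0, pad_h), (0, pad_w)), constant_values=255)
    out = np.zeros_like(padded, dtype=np.float32)
    model.eval()
    with torch.no_grad():
        for y in range(0, padded.shape[0], SPLIT):
            for x in range(0, padded.shape[1], SPLIT):
                patch = padded[y:y + SPLIT, x:x + SPLIT] / 255.0
                t = torch.from_numpy(patch).float()[None, None]  # 1x1xHxW
                pred = model(t)[0, 0].numpy()
                out[y:y + SPLIT, x:x + SPLIT] = pred
    # Threshold the stitched prediction back to a binary image.
    return (out[:h, :w] > 0.5).astype(np.uint8) * 255
```

See demo.ipynb for the actual preprocessing and model construction used by the authors.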

Model Zoo

In this section we release the pre-trained weights for the best DocEnTr model variants trained on the DIBCO benchmarks.

Testing data    Model           Patch size   URL           PSNR
DIBCO 2011      DocEnTr-Base    8x8          Unavailable   20.81
DIBCO 2011      DocEnTr-Large   16x16        Unavailable   20.62
H-DIBCO 2012    DocEnTr-Base    8x8          model         22.29
H-DIBCO 2012    DocEnTr-Large   16x16        model         22.04
DIBCO 2017      DocEnTr-Base    8x8          model         19.11
DIBCO 2017      DocEnTr-Large   16x16        model         18.85
H-DIBCO 2018    DocEnTr-Base    8x8          model         19.46
H-DIBCO 2018    DocEnTr-Large   16x16        model         19.47

Citation

If you find this useful for your research, please cite it as follows:

@inproceedings{souibgui2022docentr,
  title={DocEnTr: An end-to-end document image enhancement transformer},
  author={Souibgui, Mohamed Ali and Biswas, Sanket and  Jemni, Sana Khamekhem and Kessentini, Yousri and Forn{\'e}s, Alicia and Llad{\'o}s, Josep and Pal, Umapada},
  booktitle={2022 26th International Conference on Pattern Recognition (ICPR)},
  year={2022}
}

Authors

Conclusion

Thank you for your interest in our work, and we apologize for any bugs.

Contributors

biswassanket, chenxwh, dali92002, kym6464


Issues

Unable to load the state_dict for BinModel in demo example

Working directly with the example code in https://github.com/dali92002/DocEnTR/blob/main/demo.ipynb, I tried the pretrained model parameters from the model zoo. When calling model.load_state_dict, I ran into the following error:

RuntimeError: Error(s) in loading state_dict for BinModel:
Missing key(s) in state_dict: "encoder.to_patch_embedding.2.weight", "encoder.to_patch_embedding.2.bias", "encoder.to_patch_embedding.3.weight", "encoder.to_patch_embedding.3.bias".

size mismatch for encoder.pos_embedding: copying a param with shape torch.Size([1, 1025, 768]) from checkpoint, the shape in current model is torch.Size([1, 257, 768]).

size mismatch for encoder.to_patch_embedding.1.weight: copying a param with shape torch.Size([768, 192]) from checkpoint, the shape in current model is torch.Size([768]).

size mismatch for patch_to_emb.weight: copying a param with shape torch.Size([768, 192]) from checkpoint, the shape in current model is torch.Size([768]).

size mismatch for decoder_pos_emb.weight: copying a param with shape torch.Size([1025, 768]) from checkpoint, the shape in current model is torch.Size([257, 768]).

size mismatch for to_pixels.weight: copying a param with shape torch.Size([192, 768]) from checkpoint, the shape in current model is torch.Size([768, 768]).

size mismatch for to_pixels.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([768]).
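The shapes suggest a patch-size mismatch between the checkpoint and the instantiated model: with a 256 x 256 split, 8 x 8 patches give 32 x 32 = 1024 patches (1025 positions including the class token), while 16 x 16 patches give only 256 (257 positions). A quick sanity check of that arithmetic (the function name is illustrative, not from the repository):

```python
def pos_embedding_len(split_size: int, patch_size: int) -> int:
    """Number of positional-embedding entries: one per patch plus a class token."""
    n_patches = (split_size // patch_size) ** 2
    return n_patches + 1

# The checkpoint appears to expect 8x8 patches, while the model was built with 16x16:
assert pos_embedding_len(256, 8) == 1025   # shape found in the checkpoint
assert pos_embedding_len(256, 16) == 257   # shape in the current model
```

So rebuilding the model with the patch size the weights were trained with (e.g. --vit_patch_size 8 for the Base 8x8 checkpoints) should make the state_dict load.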

Process a single image or multiple images without GT

Dear Authors,

Your solution seems remarkably good and we would like to include it in our tests for a new publication. I believe it has the potential to be among the best tested.

If providing a way to run the code on a single image is too much for now, could you modify your DIBCO test code so that it reads several images without ground truth?

Looking forward to the ability to run the demo...

Hi,
I read your paper on DocEnTR with interest and am looking forward to trying a demo (i.e., using your pretrained models to binarize a single degraded image). Do you know when that will be available?
Thanks!

Validation set

Which validation set are you training on? Is it DIBCO 2019?

[question] training epochs and GPU resource cost

I'd like to express my gratitude to the authors for the paper and code.

May I ask how many GPU hours were used to train the model presented in the official example on this data? Additionally, how many epochs were trained?
Thanks anyway.

What is the `masking_ratio`?

Hello! Could you help me figure out this part of the code?
The masking_ratio variable is created here, but it is not used anywhere else. What was the intended purpose of these lines? Was a different model originally intended here?

DocEnTR/models/binae.py

Lines 21 to 22 in 2e09b9e

assert masking_ratio > 0 and masking_ratio < 1, 'masking ratio must be kept between 0 and 1'
self.masking_ratio = masking_ratio

Update the Demo

Hi, I'm very interested in your work. Could you update your demo code?

Links to the models do not work

Hi, I tried to download the models today using the links in the repo but it says that the file no longer exists. Were they taken down?

Multi core

Hi, great paper and thank you for sharing the code.

I was able to run test.py. I made some path corrections in utils.py, since my dataset root folder was different:
gt_folder = 'data/DIBCOSETS/'+valid_data+'/gt_imgs'

I tested it on my PC with an NVIDIA GeForce GTX 1660 Ti (6 GB GDDR6 memory). The test folder contained 266 images (255x255) and processing took ~27 s.

The second test was on the CPU, where I could see that only one core was used; processing took ~12 min = 720 s.

Is there a way to run the model prediction on a multi-core CPU and optimize the time?

And I'm hoping for the code for "Process a single image or multiple images without GT" (#2).

Thanks again
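For CPU inference, PyTorch's intra-op thread pool is usually the lever here. A minimal sketch, not part of the repository's code, to place before running the model:

```python
import os
import torch

# Let PyTorch parallelize matrix multiplications across all available cores.
# (torch.set_num_interop_threads can also help, but must be called before any
# parallel work has started, so it is omitted here.)
torch.set_num_threads(os.cpu_count())

print(torch.get_num_threads())
```

Increasing --batch_size for CPU runs can also help amortize per-call overhead.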
