Code Monkey home page Code Monkey logo

deep-learning-for-document-dewarping's Introduction

Docuwarp

Codacy Badge Python version

This project is focused on dewarping document images through the usage of pix2pixHD, a GAN that is useful for general image to image translation. The objective is to take images of documents that are warped, folded, crumpled, etc. and convert the image to a "dewarped" state by using pix2pixHD to train and perform inference. All of the model code is borrowed directly from the pix2pixHD official repository.

Some of the intuition behind doing this is inspired by these two papers:

  1. DocUNet: Document Image Unwarping via A Stacked U-Net (Ma et.al)
  2. Document Image Dewarping using Deep Learning (Ramanna et.al)

Prerequisites

This project requires Python and the following Python libraries installed:

Getting Started

Installation

pip install dominate
  • Clone this repo:
git clone https://github.com/thomasjhuang/deep-learning-for-document-dewarping
cd deep-learning-for-document-dewarping

Training

  • Train the kaggle model with 256x256 crops:
python train.py --name kaggle --label_nc 0 --no_instance --no_flip --netG local --ngf 32 --fineSize 256
  • To view training results, please checkout intermediate results in ./checkpoints/kaggle/web/index.html. If you have tensorflow installed, you can see tensorboard logs in ./checkpoints/kaggle/logs by adding --tf_log to the training scripts.

Training with your own dataset

  • If you want to train with your own dataset, please generate label maps which are one-channel whose pixel values correspond to the object labels (i.e. 0,1,...,N-1, where N is the number of labels). This is because we need to generate one-hot vectors from the label maps. Please also specity --label_nc N during both training and testing.
  • If your input is not a label map, please just specify --label_nc 0 which will directly use the RGB colors as input. The folders should then be named train_A, train_B instead of train_label, train_img, where the goal is to translate images from A to B.
  • If you don't have instance maps or don't want to use them, please specify --no_instance.
  • The default setting for preprocessing is scale_width, which will scale the width of all training images to opt.loadSize (1024) while keeping the aspect ratio. If you want a different setting, please change it by using the --resize_or_crop option. For example, scale_width_and_crop first resizes the image to have width opt.loadSize and then does random cropping of size (opt.fineSize, opt.fineSize). crop skips the resizing step and only performs random cropping. If you don't want any preprocessing, please specify none, which will do nothing other than making sure the image is divisible by 32.

Testing

  • Test the model:
python test.py --name kaggle --label_nc 0 --netG local --ngf 32 --resize_or_crop crop --no_instance --no_flip --fineSize 256

The test results will be saved to a directory here: ./results/kaggle/test_latest/.

Dataset

  • I use the kaggle denoising dirty documents dataset. To train a model on the full dataset, please download it from the official website. After downloading, please put it under the datasets folder with warped images under the directory name train_A and unwarped images under the directory train_B. Your test images are warped images, and should be under the name test_A. Below is an example dataset directory structure.

        .
        ├── ...
        ├── datasets                  
        │   ├── train_A               # warped images
        │   ├── train_B               # unwarped, "ground truth" images
        │   └── test_A                # warped images used for testing
        └── ...
    

Multi-GPU training

  • Train a model using multiple GPUs (bash ./scripts/train_kaggle_256_multigpu.sh):
#!./scripts/train_kaggle_256_multigpu.sh
python train.py --name kaggle_256_multigpu --label_nc 0 --netG local --ngf 32 --resize_or_crop crop --no_instance --no_flip --fineSize 256 --batchSize 32 --gpu_ids 0,1,2,3,4,5,6,7

Training with Automatic Mixed Precision (AMP) for faster speed

  • To train with mixed precision support, please first install apex from: https://github.com/NVIDIA/apex
  • You can then train the model by adding --fp16. For example,
#!./scripts/train_512p_fp16.sh
python -m torch.distributed.launch train.py --name label2city_512p --fp16

In my test case, it trains about 80% faster with AMP on a Volta machine.

More Training/Test Details

  • Flags: see options/train_options.py and options/base_options.py for all the training flags; see options/test_options.py and options/base_options.py for all the test flags.
  • Instance map: we take in both label maps and instance maps as input. If you don't want to use instance maps, please specify the flag --no_instance.

deep-learning-for-document-dewarping's People

Contributors

codacy-badger avatar rmporsch avatar thomasjhuang avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

deep-learning-for-document-dewarping's Issues

a problem about the size of the tensor

excuse me, when i try to train this model, i meet a problem, can you help me to solve it?
The problem: "RuntimeError: The size of tensor a (245) must match the size of tensor b (256) at non-singleton dimension 2"
thank you!!!

about training on kaggle dataset

Good day!;)
I try to train model on kaggle dataset, i use a train options from readme ... but get a error ... =(
create web directory ./checkpoints/kaggle/web...
Traceback (most recent call last):
File "train.py", line 71, in
Variable(data['image']), Variable(data['feat']), infer=save_fake)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
return self.module(*inputs[0], **kwargs[0])
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/drive/Shared drives/UNLIMITED/ducunet/deep-learning-for-document-dewarping-master/models/pix2pixHD_model.py", line 163, in forward
fake_image = self.netG.forward(input_concat)
File "/content/drive/Shared drives/UNLIMITED/ducunet/deep-learning-for-document-dewarping-master/models/networks.py", line 180, in forward
output_prev = model_upsample(model_downsample(input_i) + output_prev)
RuntimeError: The size of tensor a (398) must match the size of tensor b (400) at non-singleton dimension 2

How i can solve it?;)

Is this code applicable to handwritten Chinese character dataset?

I use the script to generate some distorted handwritten Chinese character data sets, and then modify the size of training and original image, but the effect is not good. Does the principle of this code apply to handwritten Chinese character data sets? If so, do you need to modify some parameters?

No data folder

Hi! I have a bit of a struggle trying to reproduce training procedure. As I am aware, data folder is in gitignore file. I've tried to take the same folder from official pix2pix repository that you provided, but it didn't work. May be this is not due to data folder, but anyway help is appreciated

Some questions?

Thanks for this repo!
I have some questions about the results of different methods.
(1) Can pix2pixHD method gets a better result than DocUNet ?
(2) Have you changed some things about the original pix2pixHD method?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.