UVDoc: Neural Grid-based Document Unwarping

This repository contains the code for the "UVDoc: Neural Grid-based Document Unwarping" paper. If you are looking for (more information about) the UVDoc dataset, you can find it here. The full UVDoc paper can be found here.

Three requirements files are provided for the three use cases made available in this repo. Each use case is detailed below.

Demo

Note: Requirements

Before trying to unwarp a document using our model, you need to install the requirements. To do so, we advise you to create a virtual environment. Then run pip install -r requirements_demo.txt.

To try our model (available in this repo at model/best_model.pkl) on your custom images, run the following:

python demo.py --img-path [PATH/TO/IMAGE] 

You can also use a model you trained yourself by specifying the path to the model like this:

python demo.py --img-path [PATH/TO/IMAGE] --ckpt-path [PATH/TO/MODEL]
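To unwarp every image in a folder, the command above can be wrapped in a small script. The following is a minimal sketch, not part of the repository: it assumes it is run from the repository root, uses only the --img-path and --ckpt-path options shown above, and the helper names and the list of image suffixes are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional

# Assumption: which file extensions count as images; adjust to your data.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_demo_commands(image_dir: str, ckpt_path: Optional[str] = None) -> List[List[str]]:
    """Build one `demo.py` command per image found directly in `image_dir`."""
    commands = []
    for image in sorted(Path(image_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        cmd = [sys.executable, "demo.py", "--img-path", str(image)]
        if ckpt_path is not None:
            # Only pass --ckpt-path when a custom model is requested.
            cmd += ["--ckpt-path", ckpt_path]
        commands.append(cmd)
    return commands


def run_demo_on_folder(image_dir: str, ckpt_path: Optional[str] = None) -> None:
    """Run the commands one after another, stopping at the first failure."""
    for cmd in build_demo_commands(image_dir, ckpt_path):
        subprocess.run(cmd, check=True)
```

Calling run_demo_on_folder("my_images") would then process the folder with the default model, and passing a second argument selects a model you trained yourself.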

Model training

Note: Requirements

Before training a model, you need to install the requirements. To do so, we advise you to create a virtual environment. Then run pip install -r requirements_train.txt.

To train a model, you first need to get the data:

  • The UVDoc dataset can be accessed here.
  • The Doc3D dataset can be downloaded from here. We augmented this dataset with 2D grids and 3D grids that are available here.

Then, unzip the downloaded archives into the data folder. The final structure of the data folder should be as follows:

data/
├── doc3D
│   ├── grid2D
│   ├── grid3D
│   ├── bm
│   └── img
└── UVDoc
    ├── grid2d
    ├── grid3d
    ├── img
    ├── img_geom
    ├── metadata_geom
    ├── metadata_sample
    ├── seg
    ├── textures
    ├── uvmap
    ├── warped_textures
    └── wc
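Before launching a long training run, it can help to check that the data folder matches the layout above. The following is a minimal sketch, not part of the repository: the folder names are copied from the tree above, while the function name and the default data root are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# Folder names copied from the expected structure of the data folder.
EXPECTED_LAYOUT: Dict[str, List[str]] = {
    "doc3D": ["grid2D", "grid3D", "bm", "img"],
    "UVDoc": [
        "grid2d", "grid3d", "img", "img_geom", "metadata_geom",
        "metadata_sample", "seg", "textures", "uvmap",
        "warped_textures", "wc",
    ],
}


def find_missing_folders(data_root: str = "data") -> List[str]:
    """Return the expected sub-folders (as 'dataset/name') that do not exist."""
    root = Path(data_root)
    missing = []
    for dataset, subfolders in EXPECTED_LAYOUT.items():
        for sub in subfolders:
            if not (root / dataset / sub).is_dir():
                missing.append(f"{dataset}/{sub}")
    return missing
```

An empty list means every expected folder is present. The check looks only at folder names, not at the files inside them.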

Once this is done, run the following:

python train.py

Several hyperparameters, such as data augmentations, number of epochs, learning rate, or batch size can be tuned. To learn about them, please run the following:

python train.py --help

Evaluation

Note: Requirements

Before evaluating a model, you need to install the requirements. To do so, we advise you to create a virtual environment. Then run pip install -r requirements_eval.txt.

You will also need to install matlab.engine, which lets Python interface with MATLAB. To do so, first find the location of your MATLAB installation (for instance, by running matlabroot from within MATLAB). Then go to <matlabroot>/extern/engines/python and run python setup.py install. To check that the installation was successful, open a Python prompt and run import matlab.engine followed by eng = matlab.engine.start_matlab().

Finally, you might need to install Tesseract via sudo apt install tesseract-ocr libtesseract-dev.
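The matlab.engine check can also be put in a small script. The following is a minimal sketch, not part of the repository: the function name is hypothetical, and it uses only the import and the start_matlab() call mentioned above, plus the engine's quit() method to shut MATLAB down again.

```python
def start_engine_or_none():
    """Return a running MATLAB engine, or None if matlab.engine is not installed."""
    try:
        import matlab.engine  # only available after the setup.py step
    except ImportError:
        return None
    # Starting MATLAB can take several seconds.
    return matlab.engine.start_matlab()


engine = start_engine_or_none()
if engine is None:
    print("matlab.engine is not installed in this environment")
else:
    print("matlab.engine started successfully")
    engine.quit()  # close the MATLAB session again
```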

You can easily evaluate our model or a model you trained yourself using the provided script. Our model is available in this repo at model/best_model.pkl.

DocUNet benchmark

To make predictions using a model on the DocUNet benchmark, please first download the DocUNet Benchmark (available here) and place it under data to have the following structure:

data/
└── DocUNet
    ├── crop
    ├── original
    └── scan

Then run:

python docUnet_pred.py --ckpt-path [PATH/TO/MODEL]

This will create a docunet folder next to the model, containing the unwarped images.

Then to compute the metrics over these predictions, please run the following:

python docUnet_eval.py --pred-path [PATH/TO/UNWARPED]

UVDoc benchmark

To make predictions using a model on the UVDoc benchmark, please first download the UVDoc Benchmark (available here) and place it under data to have the following structure:

data/
└── UVDoc_benchmark
    ├── grid2d
    ├── grid3d
    └── ...

Then run:

python uvdocBenchmark_pred.py --ckpt-path [PATH/TO/MODEL]

This will create an output_uvdoc folder next to the model, containing the unwarped images.

Then to compute the metrics over these predictions, please run the following:

python uvdocBenchmark_eval.py --pred-path [PATH/TO/UNWARPED]
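For both benchmarks, the prediction and evaluation steps can be chained. The following is a minimal sketch, not part of the repository: the script names, options, and output folder names are taken from the text above, while the function name is hypothetical and "next to the model" is assumed to mean the directory that contains the checkpoint file.

```python
import sys
from pathlib import Path
from typing import List, Tuple

# benchmark name -> (prediction script, evaluation script, output folder name)
BENCHMARKS = {
    "docunet": ("docUnet_pred.py", "docUnet_eval.py", "docunet"),
    "uvdoc": ("uvdocBenchmark_pred.py", "uvdocBenchmark_eval.py", "output_uvdoc"),
}


def build_benchmark_commands(benchmark: str, ckpt_path: str) -> Tuple[List[str], List[str]]:
    """Return the (prediction, evaluation) commands for one benchmark."""
    pred_script, eval_script, out_name = BENCHMARKS[benchmark]
    # Assumption: the unwarped images land in a folder beside the checkpoint.
    pred_dir = Path(ckpt_path).parent / out_name
    pred_cmd = [sys.executable, pred_script, "--ckpt-path", ckpt_path]
    eval_cmd = [sys.executable, eval_script, "--pred-path", pred_dir.as_posix()]
    return pred_cmd, eval_cmd
```

Each returned command can be passed to subprocess.run(cmd, check=True) from the repository root, running the prediction command first.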

❗ Erratum

The MS-SSIM and AD values for the UVDoc benchmark reported in our paper were mistakenly calculated on only half of the UVDoc benchmark (for our method as well as for related works). We report here both the old values and the corrected values computed on the entire UVDoc benchmark:

✅ New ✅
Method      MS-SSIM   AD
DewarpNet   0.589     0.193
DocTr       0.697     0.160
DDCP        0.585     0.290
RDGR        0.610     0.280
DocGeoNet   0.706     0.168
Ours        0.785     0.119

❌ Old ❌
Method      MS-SSIM   AD
DewarpNet   0.6       0.189
DocTr       0.684     0.176
DDCP        0.591     0.334
RDGR        0.603     0.314
DocGeoNet   0.714     0.167
Ours        0.784     0.122

Resulting images

You can download the unwarped images that we used in our paper:

Citation

If you used this code or the UVDoc dataset, please consider citing our work:

@inproceedings{UVDoc,
title={{UVDoc}: Neural Grid-based Document Unwarping},
author={Floor Verhoeven and Tanguy Magne and Olga Sorkine-Hornung},
booktitle = {SIGGRAPH ASIA, Technical Papers},
year = {2023},
url={https://doi.org/10.1145/3610548.3618174}
}

uvdoc's Issues

D435 camera

Hello author, can we use an Intel RealSense D415 camera instead of the D435? Thanks for your help!

2D and 3D grid data of Doc3D dataset

Thank you for your excellent work. I really appreciate that you have open-sourced your project. I'd like to inquire about how the additional 2D and 3D grid data for the Doc3D dataset was obtained. If possible, I would appreciate it if you could share the code used to derive the 2D and 3D data you proposed from the original Doc3D dataset. Thank you very much.

Train test split

Thank you for your work!
I received the following error when trying to replicate the training of your model: "FileNotFoundError: [Errno 2] No such file or directory: './data/doc3D/traindoc.txt'". In the dataloader there are separate files for specifying the training and validation data for Doc3D. Can you please specify the split you used? Or could you please share the .txt file format?

Custom data preparation

Hello,
Thank you for your great work.
Could you please let me know how I can train on my custom dataset?
I have some sample document images, so I want to make training data from them.

Custom data

Thank you for sharing such an amazing project with the open-source community.
I'd like to ask a question regarding the creation of grid2D and grid3D from the Doc3D dataset. How did you generate them? I'm currently experimenting with training your model using a custom dataset, however, my dataset only has BM labels, and I'm encountering issues with generating grid3D labels.

Once again, thank you for your attention and time to answer my question.

Reproduce training results

Has anyone trained to reproduce the author's results? When I trained, I found that the results were very poor. It was difficult for the model to learn the change results of [-1,1].
