
royalvice / docdiff

ACM Multimedia 2023: DocDiff: Document Enhancement via Residual Diffusion Models. Also contains 1597 red seals in Chinese scenes, along with their corresponding binary masks.

Home Page: https://www.aibupt.com/

License: MIT License

Python 100.00%
deblurring diffusion-models document-binarization image-translation img2img image-to-image low-level-vision super-resolution documentation-tool ocr

docdiff's Introduction

Hi there 👋

I'm YZY

Research Interests

  • Low-level Computer Vision
    • Image Generation
    • Image Restoration
    • Diffusion Models & GANs
  • Computer Graphics
    • 3D Rendering
    • 3D Human
    • Neural Rendering
  • Hardware
    • Light Field Displays

docdiff's People

Contributors

eltociear, royalvice

docdiff's Issues

Inference time?

Has the inference time been measured, and has it been compared with GANs?

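On the optimization objective for diffusion training in the paper

Hello, great work!
In the paper, $L_{DM}$ predicts $x_0$, i.e. $x_{res}$, rather than the noise. The paper also explains why this is done. But I feel the two are still equivalent; some earlier papers that use this kind of channel-wise concatenation conditioning also predict the noise.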
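
For reference, the relation between the two parameterizations can be written out under the standard DDPM forward process. This is the textbook identity, not a formula quoted from the DocDiff paper; $\bar\alpha_t$ (the cumulative product of $1-\beta_t$) and the hats on predicted quantities are notation assumed here.

```latex
x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\, \epsilon
\quad\Longrightarrow\quad
\hat{x}_0 = \frac{x_t - \sqrt{1-\bar\alpha_t}\, \hat\epsilon}{\sqrt{\bar\alpha_t}},
\qquad
\lVert \epsilon - \hat\epsilon \rVert^2
  = \frac{\bar\alpha_t}{1-\bar\alpha_t}\, \lVert x_0 - \hat{x}_0 \rVert^2 .
```

Under this assumption the two losses differ only by a timestep-dependent weight, so either prediction can be converted into the other, but the two objectives do not weight timesteps the same way during training.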

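On input image size

Does the image at inference have to be 304×304? If I have a 1024×768 image, do I need to cut it into several 304×304 patches, run them through the model, and then stitch them back together? And if I want to train on my own dataset, do I need to cut all the images to 304×304 first?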
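
The patch-and-stitch approach described in this question can be sketched as follows. This is a minimal illustration, not the authors' confirmed procedure; `tile_starts` and `tile_boxes` are hypothetical helper names, and the last tile in each direction is shifted back so it ends at the image border (so some tiles overlap).

```python
def tile_starts(length, tile):
    """Start offsets of tiles of size `tile` that together cover [0, length)."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, tile))
    starts.append(length - tile)  # last tile is shifted back to end at the border
    return starts


def tile_boxes(width, height, tile=304):
    """(left, top, right, bottom) crop boxes covering a width x height image."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile)
        for x in tile_starts(width, tile)
    ]


boxes = tile_boxes(1024, 768)
print(len(boxes))   # 12
print(boxes[-1])    # (720, 464, 1024, 768)
```

Each box could be cropped, passed through the model, and pasted back at the same coordinates; where tiles overlap, the later tile can overwrite the earlier one or the two can be averaged.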

Inference on variable image size

Hi,
First off, thanks for sharing your work. Does DocDiff work on different document image sizes, e.g. an image of an entire document page, or does it only work on small square patches?

Thanks!

Inference Issue

This error occurs during inference for every image:

  File "/home/mepluser1/rahul_hanot/try_new/DocDiff/model/DocDiff.py", line 315, in forward
    x = torch.cat((x, s), dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 76 but got size 75 for tensor number 1 in the list.

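About the pretrained models

Are the released pretrained models only for the deblurring task? Will pretrained models for the other subtasks be released?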

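A question about training time

I am currently training with the conf.yaml you provided on a single V100 (the multi-GPU code does not seem to have been uploaded yet). The speed is about 8 hours per 10,000 iterations. Is this normal? At this rate, would finishing 1,000,000 iterations take several tens of days?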

# model
IMAGE_SIZE : [304, 304]   # load image size, if it's train mode, it will be randomly cropped to IMAGE_SIZE. If it's test mode, it will be resized to IMAGE_SIZE.
CHANNEL_X : 3             # input channel
CHANNEL_Y : 3             # output channel
TIMESTEPS : 100           # diffusion steps
SCHEDULE : 'linear'       # linear or cosine
MODEL_CHANNELS : 32       # basic channels of Unet
NUM_RESBLOCKS : 1         # number of residual blocks
CHANNEL_MULT : [1,2,3,4]  # channel multiplier of each layer
NUM_HEADS : 1

MODE : 1                  # 1 Train, 0 Test
PRE_ORI : 'True'          # if True, predict $x_0$, else predict $\epsilon$.


# train
PATH_GT :        # path of ground truth
PATH_IMG :     # path of input
BATCH_SIZE : 32           # training batch size
NUM_WORKERS : 8           # number of workers
ITERATION_MAX : 1000000   # max training iteration
LR : 0.0001               # learning rate
LOSS : 'L2'               # L1 or L2
EMA_EVERY : 100           # update EMA every EMA_EVERY iterations
START_EMA : 2000          # start EMA after START_EMA iterations
SAVE_MODEL_EVERY : 10000  # save model every SAVE_MODEL_EVERY iterations
EMA: 'True'               # if True, use EMA
CONTINUE_TRAINING : 'False'               # if True, continue training
CONTINUE_TRAINING_STEPS : 10000           # continue training from CONTINUE_TRAINING_STEPS
PRETRAINED_PATH_INITIAL_PREDICTOR : '/home/lcw/DocDiff/checksave/init.pth'    # path of pretrained initial predictor
PRETRAINED_PATH_DENOISER : '/home/lcw/DocDiff/checksave/denoiser.pth'             # path of pretrained denoiser
WEIGHT_SAVE_PATH : './checksave_LCW'          # path to save model
TRAINING_PATH : './Training'              # path of training data
BETA_LOSS : 50            # hyperparameter to balance the pixel loss and the diffusion loss
HIGH_LOW_FREQ : 'True'    # if True, training with frequency separation
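
The time estimate asked about in this issue follows directly from the reported speed. A small worked calculation, assuming the throughput stays constant at 8 hours per 10,000 iterations (`estimate_days` is a hypothetical helper name):

```python
def estimate_days(total_iterations, hours_per_chunk, chunk_iterations=10_000):
    """Total training time in days at a constant measured speed."""
    total_hours = hours_per_chunk * total_iterations / chunk_iterations
    return total_hours / 24


days = estimate_days(total_iterations=1_000_000, hours_per_chunk=8)
print(round(days, 1))  # 33.3
```

So ITERATION_MAX : 1000000 at this speed corresponds to about 800 hours, or roughly 33 days on the single V100 described.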

Issue with inference

Hi, can you help me with the inference code for super-resolution of my test images? I am trying to run your inference.py in Colab. Will it work there? Do you have any particular comments or suggestions for doing this? @Royalvice @eltociear
Thanks.

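Training set size and learning rate

Hello author, roughly how large was the dataset used to train seal removal? Also, is the learning rate related to the batch size? If I reduce the batch size, do I need to scale the learning rate manually in conf.yml? Thank you!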
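
On the batch size question, one widely used heuristic is the linear scaling rule: change the learning rate in proportion to the batch size. This is a general rule of thumb, not something the authors state for DocDiff, and `scaled_lr` is a hypothetical helper. The reference values below are the LR : 0.0001 and BATCH_SIZE : 32 from the conf.yml quoted in an earlier issue.

```python
def scaled_lr(base_lr, base_batch, new_batch):
    """Linear scaling heuristic: learning rate proportional to batch size."""
    return base_lr * new_batch / base_batch


print(scaled_lr(1e-4, 32, 8))  # 2.5e-05
```

With adaptive optimizers the best value may differ from this, so the result is better treated as a starting point for a short sweep than as a fixed setting.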

Questions about data

Hi, when training the model on my own dataset, I ran into a question.
What is the difference between the input and the training data?
As I understand it, the input is a blurred image and the ground truth is a sharp one.
So what is the training data? Is it the same as the ground truth?

watermark removal

Hi,
I tested the denoiser model on a watermarked document image (using the demo notebook), and the results show that the watermark is not removed:
Left to right: img, init_predict.cpu(), min_max(sampledImgs), and finalImgs

Native: [image: native]

Non-native: [image: non]

Is there a different model for watermark removal task?

Pretrained model for seal removal

Hi,
First off, thanks for sharing your work. Roughly when will the pretrained model for seal removal be shared?

License

Hi
Thanks for sharing your work. Can you please share its license with us?

Thanks!
@Royalvice

Inference time

Can you please add inference time information for different configurations and video cards?
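
Until such numbers are published, timings can be collected locally. The following is a minimal sketch, not part of the DocDiff code; `mean_runtime` is a hypothetical helper, and `fn` stands for whatever callable runs one inference pass on your setup.

```python
import time


def mean_runtime(fn, warmup=2, runs=10):
    """Average wall-clock seconds per call of `fn`, after a few warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


# Example with a stand-in workload instead of a real model
print(mean_runtime(lambda: sum(range(100_000))) >= 0.0)  # True
```

On a GPU, kernels run asynchronously, so the device has to be synchronized before the clock is read (in PyTorch, `torch.cuda.synchronize()`); otherwise the measured time can be misleadingly short.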

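Image denoising problem

Hello, I ran denoising on this image and found that the result seems even blurrier than before.
Original image:
[image: 608_608]
Result:
Uploading finalImgs.png…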

RuntimeError: Sizes of tensors

I tried to run inference with your model in Colab, but I got an error:

  File "/content/DocDiff/model/DocDiff.py", line 315, in forward
    x = torch.cat((x, s), dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 440 but got size 439 for tensor number 1 in the list.

 315        print(x.shape, s.shape)
 316        x = torch.cat((x, s), dim=1)
------------------------------------------------------------------------------
torch.Size([1, 128, 220, 160]) torch.Size([1, 128, 220, 160])
torch.Size([1, 128, 220, 160]) torch.Size([1, 96, 220, 160])
torch.Size([1, 128, 440, 320]) torch.Size([1, 96, 439, 319])

My conf.yml

# model
IMAGE_SIZE : [128, 128]   # load image size, if it's train mode, it will be randomly cropped to IMAGE_SIZE. If it's test mode, it will be resized to IMAGE_SIZE.
CHANNEL_X : 3             # input channel
CHANNEL_Y : 3             # output channel
TIMESTEPS : 100           # diffusion steps
SCHEDULE : 'linear'       # linear or cosine
MODEL_CHANNELS : 32       # basic channels of Unet
NUM_RESBLOCKS : 1         # number of residual blocks
CHANNEL_MULT : [1,2,3,4]  # channel multiplier of each layer
NUM_HEADS : 1

MODE : 0                  # 1 Train, 0 Test
PRE_ORI : 'True'          # if True, predict $x_0$, else predict $\epsilon$.

# test
NATIVE_RESOLUTION : 'False'               # if True, test with native resolution
DPM_SOLVER : 'False'      # if True, test with DPM_solver
DPM_STEP : 20             # DPM_solver step
BATCH_SIZE_VAL : 1        # test batch size
TEST_PATH_GT : '/content/drive/MyDrive/wight/data/'         # path of ground truth
TEST_PATH_IMG : '/content/drive/MyDrive/wight/data/'        # path of input
TEST_INITIAL_PREDICTOR_WEIGHT_PATH : '/content/drive/MyDrive/wight/init_predictor_document_deblurring.pth'   # path of initial predictor
TEST_DENOISER_WEIGHT_PATH : '/content/drive/MyDrive/wight/denoiser_document_deblurring.pth'            # path of denoiser
TEST_IMG_SAVE_PATH : './results'  

Input image has size [1654, 2339, 3]

Could you help me solve the problem?
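
The printed shapes are consistent with an odd spatial size being halved and then doubled: a size of 439 becomes 220 on the way down and 440 on the way up, which no longer matches the 439-sized skip tensor. A common workaround, offered here as an assumption rather than a fix confirmed by the authors, is to pad or resize the input so that both sides are divisible by the network's total downsampling factor. The default of 8 below is a guess based on the four entries in CHANNEL_MULT; `padded_size` is a hypothetical helper.

```python
def padded_size(height, width, multiple=8):
    """Smallest (height, width) >= the input with both sides divisible by `multiple`."""
    pad_h = (-height) % multiple
    pad_w = (-width) % multiple
    return height + pad_h, width + pad_w


# The two spatial dimensions reported for the failing input image
print(padded_size(2339, 1654))  # (2344, 1656)
```

The image would be padded up to that size before inference (for example by reflecting the border), and the output cropped back to the original size afterwards.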

SYNTHETIC DATASETS

Thanks a lot for your work! Will you release the synthetic datasets you used in the paper?

About config.PRE_ORI

Hi Yang, thank you for your kind words about DocDiff! As mentioned in the readme under "Notes!", we have been working on applying DocDiff to natural scenes with pattern diversity. We made modifications to the config.yml file by setting PRE_ORI: 'False' and TIMESTEPS: 1000. However, we encountered some problems.

In the trainer.py file of the DocDiff code, specifically lines 189 to 194, we have the following code snippet:

if self.pre_ori == 'True':
    if self.high_low_freq == 'True':
        residual_high = self.high_filter(gt.to(self.device) - init_predict)
        ddpm_loss = 2*self.loss(self.high_filter(noise_pred), residual_high) + self.loss(noise_pred, gt.to(self.device) - init_predict)
    else:
        ddpm_loss = self.loss(noise_pred, gt.to(self.device) - init_predict)
else:
    ddpm_loss = self.loss(noise_pred, noise_ref.to(self.device))

When self.pre_ori is set to 'False', the ddpm_loss causes noise_pred to learn noise_ref. However, noise_ref represents the added noise. During the training stage, the visualization of 'noise_pred.cpu() + init_predict.cpu()' will result in a noisy init_prediction!

It seems that the issue lies in the visualization step, where the noisy init_prediction is being displayed.
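
If the denoiser predicts the noise, the predicted noise has to be converted into a residual estimate before it is added to init_predict. A minimal sketch of that conversion, assuming the standard DDPM forward process; `predict_x0` and `alpha_bar_t` are hypothetical names that do not come from trainer.py, and the example uses a single scalar value in place of an image tensor.

```python
import math


def predict_x0(x_t, eps_pred, alpha_bar_t):
    """Invert x_t = sqrt(a) * x0 + sqrt(1 - a) * eps for x0, given a noise estimate."""
    return (x_t - math.sqrt(1.0 - alpha_bar_t) * eps_pred) / math.sqrt(alpha_bar_t)


# Round trip on one scalar "pixel": noise a known residual, then recover it
x0, eps, alpha_bar_t = 0.3, -1.2, 0.5
x_t = math.sqrt(alpha_bar_t) * x0 + math.sqrt(1.0 - alpha_bar_t) * eps
recovered = predict_x0(x_t, eps, alpha_bar_t)
print(abs(recovered - x0) < 1e-9)  # True
```

Under this assumption, the quantity to visualize would be the recovered residual plus init_predict, rather than noise_pred plus init_predict.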

Model deployment issue

How can we convert these models to ONNX format and deploy them? Could you please provide code for ONNX inference?

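A question about gradient backpropagation

Hello author, there is something I would like to ask you about.
In the forward method of the DocDiff class there is x__ = self.denoiser(torch.cat((noisy_image, x_.clone().detach()), dim=1), t). From the code, my current understanding is that noisy_image still carries information from x_, i.e. the final prediction will still propagate the diffusion loss back to the first Unet, yet the later x_ is detached. Is this a deliberate design choice? Thank you!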
