royalvice / docdiff

ACM Multimedia 2023: DocDiff: Document Enhancement via Residual Diffusion Models. The repository also contains 1,597 red seals from Chinese scenes, along with their corresponding binary masks.

Home Page: https://www.aibupt.com/

License: MIT License

Python 100.00%

deblurring diffusion-models document-binarization image-translation img2img image-to-image low-level-vision super-resolution documentation-tool ocr

docdiff's Introduction

Hi there 👋

I'm YZY

Research Interests

Low-level Computer Vision
- Image Generation
- Image Restoration
- Diffusion Models & GANs
Computer Graphics
- 3D Rendering
- 3D Human
- Neural Rendering
Hardware
- Light Field Displays


docdiff's People

Forkers

jinxiqinghuan thinh-huynh-re samaritan1998 dtiku-cn guoqingru0911 yys674 nguyenlecong chenhuayou ryan315 chirag-mphasis eltociear le1surels greatv huangweiboy2 songuyenerza hbulaoma wujinlonglovezhangmiao1314 duyuankai1992 aniketgurav

docdiff's Issues

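On choosing backgrounds for the seal-removal dataset

For the seal-removal task, what size of background did you use? Is the background the same size as the seal, or slightly larger?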

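About the optimization objective for diffusion training in the paper

Hi, great work!
In the paper, $L_{DM}$ predicts $x_0$, i.e., $x_{res}$, rather than the noise, and the paper explains why. But I feel the two are still equivalent, and some earlier papers that use this kind of channel-wise concatenation conditioning also predict the noise.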
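For reference, under the standard DDPM forward process the two parameterizations are related in closed form. This assumes DocDiff diffuses the residual $x_0 = x_{res}$ in the usual way, which this page does not spell out; $\bar\alpha_t$ is the cumulative product of $1-\beta_t$ and $\hat x_0$, $\hat\epsilon$ are the network's predictions:

```latex
x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon
\quad\Longrightarrow\quad
\hat\epsilon = \frac{x_t - \sqrt{\bar\alpha_t}\,\hat x_0}{\sqrt{1-\bar\alpha_t}},
\qquad
\lVert \epsilon - \hat\epsilon \rVert^2 = \frac{\bar\alpha_t}{1-\bar\alpha_t}\,\lVert x_0 - \hat x_0 \rVert^2 .
```

So one prediction can always be converted into the other at sampling time, but an unweighted L2 loss on $\hat\epsilon$ equals an L2 loss on $\hat x_0$ reweighted by the per-timestep factor $\bar\alpha_t/(1-\bar\alpha_t)$. The two objectives differ in how training effort is spread across timesteps, not in what can be represented.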

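About the input image size

At inference time, must the input image be 304×304? If I have a 1024×768 image, do I need to cut it into multiple 304×304 patches, run them through the model, and then stitch them back together? And if I want to train on my own dataset, do I first need to cut everything into 304×304 patches?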
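If the model must see fixed 304×304 inputs, one workaround is to tile the page, run each tile, and stitch the results. The sketch below is a hypothetical helper pair, not DocDiff's inference code: it edge-pads the image on the bottom and right to a multiple of the patch size, cuts non-overlapping tiles, and crops the stitched result back to the original size.

```python
import numpy as np

def split_into_patches(img, patch=304):
    """Edge-pad an H x W (x C) image to a multiple of `patch`, then cut non-overlapping tiles."""
    h, w = img.shape[:2]
    pad = ((0, -h % patch), (0, -w % patch)) + ((0, 0),) * (img.ndim - 2)
    padded = np.pad(img, pad, mode="edge")
    tiles = []
    for y in range(0, padded.shape[0], patch):
        for x in range(0, padded.shape[1], patch):
            tiles.append(((y, x), padded[y:y + patch, x:x + patch]))
    return tiles, (h, w), padded.shape

def stitch_patches(tiles, orig_hw, padded_shape):
    """Place (position, tile) pairs back on a canvas and crop away the padding."""
    out = np.zeros(padded_shape, dtype=tiles[0][1].dtype)
    for (y, x), tile in tiles:
        out[y:y + tile.shape[0], x:x + tile.shape[1]] = tile
    h, w = orig_hw
    return out[:h, :w]

# A 1024x768 page (H=768, W=1024) becomes a 3 x 4 grid of 304x304 tiles.
page = np.zeros((768, 1024, 3), dtype=np.uint8)
tiles, hw, shape = split_into_patches(page)
# ...run the model on each tile here, keeping its (y, x) position...
restored = stitch_patches(tiles, hw, shape)
```

Non-overlapping tiles can leave visible seams at tile borders; overlapping tiles with blended borders are a common refinement.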

Inference on variable image size

Hi,
First off, thanks for sharing your work. My question is: does DocDiff work on document images of different sizes, e.g., an image of an entire document page, or does it only work on small square patches?

Thanks!

Inference Issue

This error occurs during the inference phase for every image:

File "/home/mepluser1/rahul_hanot/try_new/DocDiff/model/DocDiff.py", line 315, in forward
x = torch.cat((x, s), dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 76 but got size 75 for tensor number 1 in the list.
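This error most likely comes from a U-Net skip connection: when a feature map with an odd side length is downsampled by 2 and later upsampled by 2, it comes back one pixel larger than the skip tensor it is concatenated with. A common workaround is to pad the input before inference and crop the output afterwards. The sketch below assumes three stride-2 downsampling stages (so sides must be divisible by 8), which is only a guess from CHANNEL_MULT having four entries; if unsure, a larger multiple such as 16 or 32 also works. The helper names are hypothetical and not part of DocDiff.

```python
import numpy as np

def pad_to_multiple(img, multiple=8):
    """Reflect-pad an H x W (x C) image on the bottom/right so H and W are divisible by `multiple`."""
    h, w = img.shape[:2]
    pad = ((0, -h % multiple), (0, -w % multiple)) + ((0, 0),) * (img.ndim - 2)
    return np.pad(img, pad, mode="reflect"), (h, w)

def crop_to(img, orig_hw):
    """Undo pad_to_multiple on the model output."""
    h, w = orig_hw
    return img[:h, :w]

img = np.random.default_rng(0).random((301, 150, 3))
padded, hw = pad_to_multiple(img, 8)
print(padded.shape)  # (304, 152, 3)
# ...run the model on `padded`, then crop its output with crop_to(output, hw)...
```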

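About the pretrained models

Are the released pretrained models only for the deblurring task? Will pretrained models for the other subtasks be released?

Hello, the documentation does not mention inference with the seal-removal model. If I want to use this feature, how should I modify the source code?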

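Question about training time

I am training with the conf.yaml you provided on a single V100 (the multi-GPU code does not seem to have been uploaded yet). The current speed is about 8 hours per 10,000 iterations. Is this normal? At this rate, would finishing 1,000,000 iterations really take dozens of days?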

# model
IMAGE_SIZE : [304, 304]   # load image size, if it's train mode, it will be randomly cropped to IMAGE_SIZE. If it's test mode, it will be resized to IMAGE_SIZE.
CHANNEL_X : 3             # input channel
CHANNEL_Y : 3             # output channel
TIMESTEPS : 100           # diffusion steps
SCHEDULE : 'linear'       # linear or cosine
MODEL_CHANNELS : 32       # basic channels of Unet
NUM_RESBLOCKS : 1         # number of residual blocks
CHANNEL_MULT : [1,2,3,4]  # channel multiplier of each layer
NUM_HEADS : 1

MODE : 1                  # 1 Train, 0 Test
PRE_ORI : 'True'          # if True, predict $x_0$, else predict $\epsilon$.


# train
PATH_GT :        # path of ground truth
PATH_IMG :     # path of input
BATCH_SIZE : 32           # training batch size
NUM_WORKERS : 8           # number of workers
ITERATION_MAX : 1000000   # max training iteration
LR : 0.0001               # learning rate
LOSS : 'L2'               # L1 or L2
EMA_EVERY : 100           # update EMA every EMA_EVERY iterations
START_EMA : 2000          # start EMA after START_EMA iterations
SAVE_MODEL_EVERY : 10000  # save model every SAVE_MODEL_EVERY iterations
EMA: 'True'               # if True, use EMA
CONTINUE_TRAINING : 'False'               # if True, continue training
CONTINUE_TRAINING_STEPS : 10000           # continue training from CONTINUE_TRAINING_STEPS
PRETRAINED_PATH_INITIAL_PREDICTOR : '/home/lcw/DocDiff/checksave/init.pth'    # path of pretrained initial predictor
PRETRAINED_PATH_DENOISER : '/home/lcw/DocDiff/checksave/denoiser.pth'             # path of pretrained denoiser
WEIGHT_SAVE_PATH : './checksave_LCW'          # path to save model
TRAINING_PATH : './Training'              # path of training data
BETA_LOSS : 50            # hyperparameter to balance the pixel loss and the diffusion loss
HIGH_LOW_FREQ : 'True'    # if True, training with frequency separation
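The estimate in this question checks out arithmetically: at 8 hours per 10,000 iterations, ITERATION_MAX : 1000000 means 100 × 8 = 800 hours, about 33 days on one V100. A tiny hypothetical helper (not part of the repo) for redoing this extrapolation with your own measured throughput:

```python
def eta_days(iteration_max, iters_done, hours_elapsed):
    """Extrapolate total wall-clock days from the throughput measured so far."""
    hours_per_iter = hours_elapsed / iters_done
    return iteration_max * hours_per_iter / 24

print(round(eta_days(1_000_000, 10_000, 8), 1))  # 33.3
```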

Issue with inference

Hi, could you help me with the inference code for super-resolution on my test images? I am trying to run your inference.py in Colab. Will it work there, and do you have any particular comments or suggestions for setting it up? @Royalvice @eltociear
Thanks.

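Training set size and learning rate

Hello, how large was the dataset you used when training for seal removal? Also, is the learning rate tied to the batch size? If I reduce the batch size, do I need to scale the learning rate in conf.yml manually? Thanks!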
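A common heuristic, not something this page says DocDiff prescribes, is the linear scaling rule: scale the learning rate in proportion to the batch size. A square-root rule is a gentler alternative that some practitioners prefer with Adam-style optimizers. For example, starting from LR : 0.0001 at BATCH_SIZE : 32 and dropping to a batch of 8, the linear rule gives 0.0001 × 8/32 = 2.5e-5. A minimal sketch with a hypothetical helper name:

```python
import math

def scale_lr(base_lr, base_batch, new_batch, rule="linear"):
    """Rescale a learning rate tuned at `base_batch` for a new batch size (heuristic)."""
    ratio = new_batch / base_batch
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * math.sqrt(ratio)
    raise ValueError(f"unknown rule: {rule!r}")

lr = scale_lr(1e-4, 32, 8)  # value to put in LR in conf.yml
```

Either rule is only a starting point; a short learning-rate sweep at the new batch size is the reliable check.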

Code for the synthetic dataset

Hi,
When will you release the synthesis code for the watermark masks and the mask images for the seals?

Thanks!
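Until official synthesis code is available, a seal-removal training pair can be approximated by pasting a seal onto a clean background through its binary mask, keeping the clean background as the ground truth. This is a hypothetical sketch, not the authors' pipeline: it does a hard paste (real ink may look better with alpha or multiplicative blending) and assumes the background is at least as large as the seal at the chosen position.

```python
import numpy as np

def composite_seal(background, seal, mask, top, left):
    """Paste `seal` (h x w x 3) onto a copy of `background` wherever `mask` (h x w) is nonzero.

    Returns (degraded_input, ground_truth); the ground truth is the untouched background.
    """
    h, w = mask.shape
    if top + h > background.shape[0] or left + w > background.shape[1]:
        raise ValueError("seal does not fit inside the background at this position")
    out = background.copy()
    region = out[top:top + h, left:left + w]  # a view into `out`, so writes land in `out`
    m = mask.astype(bool)
    region[m] = seal[m]
    return out, background
```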

Questions about data

Hi, when I trained the model on my dataset, a question came up.
What is the difference between the input data and the training data?
As I understand it, the input is the blurred image and the ground truth is the sharp one.
So what is the training data? Is it the same as the ground truth?
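In the training config shown earlier on this page, PATH_IMG is commented as the path of the input and PATH_GT as the path of the ground truth, so a training sample is a (degraded input, clean target) pair. The sketch below checks that two such folders pair up. Matching by identical filename is an assumption about how the data is organized, not confirmed DocDiff loader behavior, and the helper name is hypothetical.

```python
import os

IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff")

def paired_files(path_img, path_gt):
    """Match degraded inputs to ground-truth images by identical filename.

    Returns (pairs, unmatched): full-path (input, gt) pairs and filenames present in only one folder.
    """
    imgs = {f for f in os.listdir(path_img) if f.lower().endswith(IMAGE_EXTS)}
    gts = {f for f in os.listdir(path_gt) if f.lower().endswith(IMAGE_EXTS)}
    pairs = [(os.path.join(path_img, f), os.path.join(path_gt, f)) for f in sorted(imgs & gts)]
    unmatched = sorted(imgs ^ gts)
    return pairs, unmatched
```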

Hello, the redirect_uri parameter of the 火眼OCR (Huoyan OCR) app is incorrect

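Adding datasets with more noise

This open-source project provides preprocessing pipelines for generating noise:
https://github.com/sparkfish/shabby-pages
It offers a pipeline for synthesizing images with many kinds of noise.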

watermark removal

Hi,
I tested the denoiser model on a watermarked document image (using the demo notebook), and the results show that the watermark is not removed:
Left to right: img, init_predict.cpu(), min_max(sampledImgs), and finalImgs

Native:

Non-native:

Is there a different model for watermark removal task?

Pretrained model for seal removal

Hi,
First off, thanks for sharing your work. Roughly when will the pretrained seal-removal model be shared?

License

Hi
Thanks for sharing your work. Can you please share its license with us?

Thanks!
@Royalvice

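Dear author, a question about how the deblurring dataset was split

Hello, regarding the statement in the paper, "We randomly select 30,000 patches for training and 10,000 patches for testing.", could you share the exact split? Or does this random split have a large effect on restoration quality? Thanks!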
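If the exact split is not released, a seeded random split at least makes results reproducible across runs. A sketch, assuming patches are identified by a list of IDs or file names; the function name and default seed are hypothetical, and this is not the paper's split:

```python
import random

def split_patches(patch_ids, n_train=30_000, n_test=10_000, seed=0):
    """Draw disjoint train/test subsets of patch IDs, deterministically for a given seed."""
    patch_ids = list(patch_ids)
    if n_train + n_test > len(patch_ids):
        raise ValueError("not enough patches for the requested split")
    rng = random.Random(seed)  # private RNG so the global random state is untouched
    chosen = rng.sample(patch_ids, n_train + n_test)
    return chosen[:n_train], chosen[n_train:]
```

Running several seeds and reporting the spread of the metrics is one way to answer how much the random split affects restoration quality.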

Inference time

Could you please add inference-time measurements for different configurations and GPUs?
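Until official numbers are published, a generic timing harness like the one below can measure inference time locally. It is a sketch: `fn` would wrap one full DocDiff sampling call (a hypothetical wrapper you would write around inference.py). Comparing TIMESTEPS : 100 against DPM_SOLVER : 'True' with DPM_STEP : 20 is likely the most informative experiment, since the number of denoiser evaluations should dominate the cost.

```python
import statistics
import time

def benchmark(fn, warmup=3, repeats=10, sync=None):
    """Median wall-clock seconds of fn() after `warmup` untimed runs.

    For GPU work, pass sync=torch.cuda.synchronize so that asynchronously
    launched kernels are finished before each timestamp is taken.
    """
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(repeats):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        times.append(time.perf_counter() - start)
    return statistics.median(times)
```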

Image denoising problem

Hi, I ran denoising on this image, and the result seems even blurrier than the original.
Original image:

Result:

RuntimeError: Sizes of tensors

I tried to test your model by running inference in Colab, but I got an error:
File "/content/DocDiff/model/DocDiff.py", line 315, in forward
x = torch.cat((x, s), dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 440 but got size 439 for tensor number 1 in the list.

Printing the shapes just before the concatenation:

print(x.shape, s.shape)
x = torch.cat((x, s), dim=1)

gives:

torch.Size([1, 128, 220, 160]) torch.Size([1, 128, 220, 160])
torch.Size([1, 128, 220, 160]) torch.Size([1, 96, 220, 160])
torch.Size([1, 128, 440, 320]) torch.Size([1, 96, 439, 319])

My conf.yml

# model
IMAGE_SIZE : [128, 128]   # load image size, if it's train mode, it will be randomly cropped to IMAGE_SIZE. If it's test mode, it will be resized to IMAGE_SIZE.
CHANNEL_X : 3             # input channel
CHANNEL_Y : 3             # output channel
TIMESTEPS : 100           # diffusion steps
SCHEDULE : 'linear'       # linear or cosine
MODEL_CHANNELS : 32       # basic channels of Unet
NUM_RESBLOCKS : 1         # number of residual blocks
CHANNEL_MULT : [1,2,3,4]  # channel multiplier of each layer
NUM_HEADS : 1

MODE : 0                  # 1 Train, 0 Test
PRE_ORI : 'True'          # if True, predict $x_0$, else predict $\epsilon$.

# test
NATIVE_RESOLUTION : 'False'               # if True, test with native resolution
DPM_SOLVER : 'False'      # if True, test with DPM_solver
DPM_STEP : 20             # DPM_solver step
BATCH_SIZE_VAL : 1        # test batch size
TEST_PATH_GT : '/content/drive/MyDrive/wight/data/'         # path of ground truth
TEST_PATH_IMG : '/content/drive/MyDrive/wight/data/'        # path of input
TEST_INITIAL_PREDICTOR_WEIGHT_PATH : '/content/drive/MyDrive/wight/init_predictor_document_deblurring.pth'   # path of initial predictor
TEST_DENOISER_WEIGHT_PATH : '/content/drive/MyDrive/wight/denoiser_document_deblurring.pth'            # path of denoiser
TEST_IMG_SAVE_PATH : './results'

The input image has size [1654, 2339, 3].

Could you help me solve the problem?
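The printed shapes point to the cause: a side of 439 is downsampled to 220 (rounding up), then upsampled back to 440, which no longer matches the 439-pixel skip tensor. The hypothetical helper below traces one spatial side through the encoder and decoder and reports the first mismatch. It assumes stride-2 downsampling with ceil rounding, as the printed 439 → 220 and 319 → 160 suggest, and exact ×2 upsampling. Padding the input so each side is divisible by 2 to the power of the number of downsampling stages avoids the mismatch.

```python
import math

def first_skip_mismatch(size, num_down):
    """Return (upsampled, skip) for the first decoder level whose sizes differ, else None."""
    sizes = [size]
    for _ in range(num_down):
        sizes.append(math.ceil(sizes[-1] / 2))  # stride-2 downsampling, rounding up
    cur = sizes[-1]
    for skip in reversed(sizes[:-1]):
        cur *= 2  # x2 upsampling in the decoder
        if cur != skip:
            return cur, skip  # torch.cat would fail here
    return None

print(first_skip_mismatch(439, 1))  # (440, 439)
print(first_skip_mismatch(440, 3))  # None
```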

SYNTHETIC DATASETS

Thanks a lot for your work! Will you release the synthetic datasets you used in the paper?

About config.PRE_ORI

Hi Yang, thank you for your kind words about DocDiff! As mentioned in the readme under "Notes!", we have been working on applying DocDiff to natural scenes with pattern diversity. We made modifications to the config.yml file by setting PRE_ORI: 'False' and TIMESTEPS: 1000. However, we encountered some problems.

In the trainer.py file of the DocDiff code, specifically lines 189 to 194, we have the following code snippet:

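if self.pre_ori == 'True':
    # Predict x_0, i.e. the residual gt - init_predict
    if self.high_low_freq == 'True':
        # Frequency separation: extra weight on the high-frequency part of the residual
        residual_high = self.high_filter(gt.to(self.device) - init_predict)
        ddpm_loss = 2*self.loss(self.high_filter(noise_pred), residual_high) + self.loss(noise_pred, gt.to(self.device) - init_predict)
    else:
        ddpm_loss = self.loss(noise_pred, gt.to(self.device) - init_predict)
else:
    # Predict epsilon: the target is the noise that was added (noise_ref)
    ddpm_loss = self.loss(noise_pred, noise_ref.to(self.device))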
When self.pre_ori is 'False', ddpm_loss trains noise_pred to match noise_ref, which is the noise that was added rather than the residual. As a result, during training the visualization of noise_pred.cpu() + init_predict.cpu() shows a noisy version of init_predict!

It seems that the issue lies in the visualization step, where the noisy init_prediction is being displayed.
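One way to get a meaningful training-time preview with PRE_ORI: 'False' is to convert the predicted noise back into a residual estimate before adding init_predict. The sketch below assumes the standard DDPM forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ applied to the residual and a linear beta schedule from 1e-4 to 0.02; both are assumptions, not read from trainer.py. It uses NumPy arrays in place of torch tensors, and the function names are hypothetical.

```python
import numpy as np

def linear_alpha_bar(timesteps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, timesteps)
    return np.cumprod(1.0 - betas)

def residual_from_noise(x_t, eps_pred, alpha_bar_t):
    """Invert x_t = sqrt(a) * x_res + sqrt(1 - a) * eps to estimate x_res from predicted noise."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

def preview_image(init_predict, x_t, eps_pred, alpha_bar_t):
    """Hypothetical replacement for noise_pred + init_predict when the network predicts noise."""
    return init_predict + residual_from_noise(x_t, eps_pred, alpha_bar_t)

rng = np.random.default_rng(0)
init_predict = rng.random((3, 8, 8))
x_res = 0.1 * rng.normal(size=(3, 8, 8))  # stand-in for gt - init_predict
eps = rng.normal(size=(3, 8, 8))
a = linear_alpha_bar(1000)[500]
x_t = np.sqrt(a) * x_res + np.sqrt(1 - a) * eps
# With a perfect noise prediction, the preview recovers gt = init_predict + x_res.
assert np.allclose(preview_image(init_predict, x_t, eps, a), init_predict + x_res)
```

At large t, $1/\sqrt{\bar\alpha_t}$ is large, so errors in the predicted noise are strongly amplified; previews are most readable at small t.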