
OSEDiff's Introduction

One-Step Effective Diffusion Network for Real-World Image Super-Resolution

¹The Hong Kong Polytechnic University, ²OPPO Research Institute

[paper]


🔥 News

  • [2024.07] Release OSEDiff-SD21base.
  • [2024.06] This repo is created.

🎬 Overview

overview

🔧 Dependencies and Installation

  1. Clone repo

    git clone https://github.com/cswry/OSEDiff.git
    cd OSEDiff
  2. Install dependent packages

    conda create -n OSEDiff python=3.10 -y
    conda activate OSEDiff
    pip install --upgrade pip
    pip install -r requirements.txt
  3. Download Models

Dependent Models

⚡ Quick Inference

python test_osediff.py \
-i preset/datasets/test_dataset/input \
-o preset/datasets/test_dataset/output \
--osediff_path preset/models/osediff.pkl \
--pretrained_model_name_or_path SD21BASE_PATH \
--ram_ft_path DAPE_PATH \
--ram_path RAM_PATH

📷 Results

benchmark

Quantitative Comparisons (click to expand)

Visual Comparisons (click to expand)

📧 Contact

If you have any questions, please feel free to contact: [email protected]

🎓Citations

@article{wu2024one,
  title={One-Step Effective Diffusion Network for Real-World Image Super-Resolution},
  author={Wu, Rongyuan and Sun, Lingchen and Ma, Zhiyuan and Zhang, Lei},
  journal={arXiv preprint arXiv:2406.08177},
  year={2024}
}


OSEDiff's Issues

About the implementation of the method

Hello!
I've also emailed you the same question, but you seem to have missed it.
I've read your paper One-Step Effective Diffusion Network for Real-World Image Super-Resolution, found it very interesting, and tried to reproduce it following the paper.
However, I found the pseudocode provided in the appendix (Algorithm 1) a little confusing.
Based on my understanding of the paper, E_\phi and E_\theta in line 2 should be E_\phi' and E_\phi respectively, since E_\phi is the pretrained model and we shouldn't re-initialize it.
E_\theta and E_\theta' in line 13 should likewise be E_\phi and E_\phi', and E_\theta' in line 14 should be E_\phi', which is consistent with the symbols used in Eq. 7.
Could you confirm whether this reading is correct? Thank you!
I also have another question. Is the frozen regularizer used in the VSD loss exactly the pretrained model, i.e. SD 2.1? And is the trainable regularizer initialized from the pretrained model with LoRA? If so, I would expect the VSD loss to be almost 0 at the beginning of training.
I am not sure if my understanding is correct; please correct me if it is not.

huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/notebook/data/group/LowLevelLLM/LLM/bert-base-uncased'. Use `repo_type` argument if needed.

Hi,
Thank you for the wonderful work.
On a Windows system, this error occurs during code execution:

python test_osediff.py -i input -o output --osediff_path preset/models/osediff.pkl --pretrained_model_name_or_path preset/models/stable-diffusion-2-1-base/ --ram_ft_path preset/models/DAPE.pth --ram_path preset/models/ram_swin_large_14m.pth
Traceback (most recent call last):
  File "C:\Users\Miki\OSEDiff\test_osediff.py", line 68, in <module>
    DAPE = ram(pretrained=args.ram_path,
  File "C:\Users\Miki\OSEDiff\ram\models\ram_lora.py", line 329, in ram
    model = RAMLora(**kwargs)
  File "C:\Users\Miki\OSEDiff\ram\models\ram_lora.py", line 109, in __init__
    self.tokenizer = init_tokenizer()
  File "C:\Users\Miki\OSEDiff\ram\models\utils.py", line 132, in init_tokenizer
    tokenizer = BertTokenizer.from_pretrained('/home/notebook/data/group/LowLevelLLM/LLM/bert-base-uncased', local_files_only=True)
  File "C:\Users\Miki\anaconda3\envs\osediff\lib\site-packages\transformers\tokenization_utils_base.py", line 1770, in from_pretrained
    resolved_vocab_files[file_id] = cached_file(
  File "C:\Users\Miki\anaconda3\envs\osediff\lib\site-packages\transformers\utils\hub.py", line 409, in cached_file
    resolved_file = hf_hub_download(
  File "C:\Users\Miki\anaconda3\envs\osediff\lib\site-packages\huggingface_hub\utils\_validators.py", line 106, in _inner_fn
    validate_repo_id(arg_value)
  File "C:\Users\Miki\anaconda3\envs\osediff\lib\site-packages\huggingface_hub\utils\_validators.py", line 154, in validate_repo_id
    raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/notebook/data/group/LowLevelLLM/LLM/bert-base-uncased'. Use `repo_type` argument if needed.



As a temporary workaround I use utils.py from your previous repo:
https://github.com/cswry/SeeSR/blob/main/ram/models/utils.py

And everything works. :)
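
The underlying cause is the hard-coded absolute path in ram/models/utils.py. A minimal sketch of the kind of change the workaround makes (not the exact upstream code; it assumes you either have access to the Hugging Face Hub or substitute a local directory that exists on your machine):

    # ram/models/utils.py (sketch): replace the machine-specific absolute
    # path with the public Hub repo id, or with your own local folder.
    from transformers import BertTokenizer

    def init_tokenizer():
        # "bert-base-uncased" resolves against the Hugging Face Hub;
        # pass a local directory path instead if you work offline.
        tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
        # ... keep the remaining special-token setup from the original file ...
        return tokenizer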

fp16 inference + tile option?

Is it possible to implement? I can only upscale images up to 640x480 pixels. (PC specs: RTX 3090 24GB, 64GB RAM, Ryzen 7950X)
Thanks
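
For anyone experimenting in the meantime, a generic overlap-and-stitch loop is one way to approximate a tile option. A minimal sketch, assuming a hypothetical upscale_fn callable that wraps the OSEDiff forward pass (PIL image in, 4x-larger PIL image out); seam blending and fp16 casting are left out:

    from PIL import Image

    def upscale_tiled(img: Image.Image, upscale_fn, tile=512, overlap=32, scale=4):
        """Split `img` into overlapping tiles, run `upscale_fn` (hypothetical
        wrapper around the OSEDiff forward pass) on each tile, and paste the
        results into one canvas. Later tiles simply overwrite the overlap
        region; proper seam handling would feather/blend the overlaps."""
        w, h = img.size
        out = Image.new("RGB", (w * scale, h * scale))
        step = tile - overlap
        for top in range(0, h, step):
            for left in range(0, w, step):
                box = (left, top, min(left + tile, w), min(top + tile, h))
                patch = img.crop(box)
                up = upscale_fn(patch)  # peak memory now scales with `tile`, not the full frame
                out.paste(up, (left * scale, top * scale))
        return out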

Output results not as good as SeeSR?

I thought the output should be better, or at least the same but faster? I tried many images that I had tested with SeeSR, and all the OSEDiff results are worse. :(

Input:
image

SeeSR: (Using SD-Turbo)
image

OSEDiff:
Star Trek Ds9 - 6x13 - Far Beyond The Stars x264 ac3 03
Facial details are not very good and the texture of the wall is gone; only the gray part of the outfit is much better. Not sure why it isn't as good?

Thank you for your hard work on this; I just don't understand why it isn't as good as SeeSR. It is faster, however!

Problem calculating the metrics

I read the quantitative comparison in your paper. Why are some of the evaluation metrics of the compared methods different from those reported in their original papers, for example the FID score of the PASD model, where the gap is not small but an order of magnitude? I noticed that you all use pyiqa to calculate the metrics, so why are the computed results so different?

Will the training code be released?

Thanks for your wonderful work!!!
The balance of effectiveness and efficiency in OSEDiff is very impressive!!!
I trained my network with the VSD loss, but my training code probably differs from yours in some way and the results are terrible, so would you release the training code in the future?
Hoping for your reply!

AttributeError: 'UNet2DConditionModel' object has no attribute 'add_adapter'.

I have a question: when I run test_osediff.py, this error occurs: AttributeError: 'UNet2DConditionModel' object has no attribute 'add_adapter'. Did you mean: 'set_adapters'?
According to the error log, the failing line is self.unet.add_adapter(lora_conf_encoder, adapter_name="default_encoder") in the following method:

    def load_ckpt(self, model):
        # load unet lora
        lora_conf_encoder = LoraConfig(r=model["rank_unet"], init_lora_weights="gaussian",
                                       target_modules=model["unet_lora_encoder_modules"])
        lora_conf_decoder = LoraConfig(r=model["rank_unet"], init_lora_weights="gaussian",
                                       target_modules=model["unet_lora_decoder_modules"])
        lora_conf_others = LoraConfig(r=model["rank_unet"], init_lora_weights="gaussian",
                                      target_modules=model["unet_lora_others_modules"])
        self.unet.add_adapter(lora_conf_encoder, adapter_name="default_encoder")
        self.unet.add_adapter(lora_conf_decoder, adapter_name="default_decoder")
        self.unet.add_adapter(lora_conf_others, adapter_name="default_others")

Why does the UNet2DConditionModel object not define the 'add_adapter' method? My diffusers version is 0.25.0.
Please help, thanks!

Question about MANIQA Metric

Great job! But when I reproduced the results, I found a significant difference in the MANIQA metric. May I ask which version of MANIQA you are using? Also, could you provide the code used for the metric testing?
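
Not from the authors, but for reference the benchmark numbers in this line of work are commonly computed with the pyiqa toolbox. A minimal sketch of how MANIQA (and FID, relevant to the previous issue) are typically evaluated with it; the file and folder names are placeholders, and pyiqa downloads its own default metric weights, which may differ from the original MANIQA release and could explain part of the gap:

    import torch
    import pyiqa

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # pyiqa ships its own default checkpoint for each metric; the exact
    # weights/version can differ from the original MANIQA repository.
    maniqa = pyiqa.create_metric("maniqa", device=device)
    fid = pyiqa.create_metric("fid", device=device)

    # placeholder paths: one restored image, plus restored/reference folders
    score = maniqa("results/0001.png")
    fid_score = fid("results/", "ground_truth/")
    print(float(score), float(fid_score))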

OSEDiff video upscaling script

Hi. I made a batch video upscaling script using OSEDiff. The script assumes the models are in the default location (you can specify them at runtime or edit the paths as needed). It loads video files from the input folder, extracts frames, upscales them with OSEDiff, then merges the frames and the original audio into an output video file in the output directory and deletes the temporary files. Keep in mind that the FFV1 codec is used at maximum quality, so the output file can be large; you can change the codec yourself if you wish. To run the script you need ffmpeg, and it must be on your PATH so that ffmpeg can be executed directly.

The results are not perfect because the model is not intended for video files, but it can help others test the script. Cheers

https://pastebin.com/XdNZAKr3
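
Not the linked script itself, but the core of such a pipeline is essentially two ffmpeg calls wrapped around the OSEDiff batch inference. A rough sketch, assuming the frame rate matches the source video, that test_osediff.py keeps the input frame names for its outputs, and that the output container (e.g. .mkv) supports FFV1; the model paths are the same placeholders as in the Quick Inference command:

    import os
    import subprocess

    def upscale_video(video_in, video_out, fps=24,
                      frames="tmp_frames", frames_up="tmp_frames_up"):
        os.makedirs(frames, exist_ok=True)
        os.makedirs(frames_up, exist_ok=True)

        # 1. extract frames as lossless PNGs
        subprocess.run(["ffmpeg", "-i", video_in, f"{frames}/%06d.png"], check=True)

        # 2. upscale every frame with OSEDiff (same flags as Quick Inference)
        subprocess.run(["python", "test_osediff.py", "-i", frames, "-o", frames_up,
                        "--osediff_path", "preset/models/osediff.pkl",
                        "--pretrained_model_name_or_path", "SD21BASE_PATH",
                        "--ram_ft_path", "DAPE_PATH", "--ram_path", "RAM_PATH"],
                       check=True)

        # 3. re-encode the upscaled frames with FFV1 and copy the original audio
        subprocess.run(["ffmpeg", "-framerate", str(fps), "-i", f"{frames_up}/%06d.png",
                        "-i", video_in, "-map", "0:v", "-map", "1:a?",
                        "-c:v", "ffv1", "-c:a", "copy", video_out], check=True)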

Question about using the network

Hello, and thank you for your contribution. I would like to ask whether a super-resolution network is suitable for domain-transfer tasks, for example converting night-time images to daytime, or converting photos taken with phone A to look like photos taken with phone B.

questions about the output.

Hello, I'm sorry to bother you. When I use landscape or flower images, there is no output result.

For human images, the colors of the eyes and lips become vivid. Is there any way to adjust this?
Capture
Capture2

Is LoRA training necessary?

Is LoRA training necessary? What would happen if it were changed to full-parameter fine-tuning? How do you view this?

SDXL-Turbo?

Will OSEDiff support SDXL-Turbo?

Thanks. Sorry for so many questions.

When trying to adapt the model to SDXL, I am unable to properly pass time_ids to UNet2DConditionModel:

    elif self.config.addition_embed_type == "text_time":
        # SDXL - style
        if "text_embeds" not in added_cond_kwargs:
            raise ValueError(
                f"{self.__class__} has the config param `addition_embed_type` set to 'text_time' which requires the keyword argument `text_embeds` to be passed in `added_cond_kwargs`"
            )
        text_embeds = added_cond_kwargs.get("text_embeds")
        if "time_ids" not in added_cond_kwargs:
            raise ValueError(
                f"{self.__class__} has the config param `addition_embed_type` set to 'text_time' which requires the keyword argument `time_ids` to be passed in `added_cond_kwargs`"
            )
        time_ids = added_cond_kwargs.get("time_ids")
        time_embeds = self.add_time_proj(time_ids.flatten())
        time_embeds = time_embeds.reshape((text_embeds.shape[0], -1))
        add_embeds = torch.concat([text_embeds, time_embeds], dim=-1)
        add_embeds = add_embeds.to(emb.dtype)
        aug_emb = self.add_embedding(add_embeds)
        
      
    model_pred = self.unet(
        lq_latent,
        self.timesteps,
        encoder_hidden_states=prompt_embeds,
        added_cond_kwargs={
            "text_embeds": prompt_embeds,
            "time_ids": self.timesteps,
        },
    ).sample

Here self.timesteps is only a placeholder, as it is not the correct value.
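
For context: the SDXL UNet expects text_embeds to be the pooled output of the second text encoder and time_ids to be the six micro-conditioning values (original size, crop coordinates, target size), not the denoising timestep. A sketch of how they are usually assembled, following the pattern of the diffusers SDXL pipelines; pooled_prompt_embeds and the 512x512 sizes are placeholders for your own values:

    import torch

    batch = lq_latent.shape[0]

    # six micro-conditioning values per sample:
    # orig H, orig W, crop top, crop left, target H, target W
    add_time_ids = torch.tensor(
        [[512, 512, 0, 0, 512, 512]],
        dtype=prompt_embeds.dtype, device=prompt_embeds.device,
    ).repeat(batch, 1)

    model_pred = self.unet(
        lq_latent,
        self.timesteps,
        encoder_hidden_states=prompt_embeds,          # per-token embeddings
        added_cond_kwargs={
            "text_embeds": pooled_prompt_embeds,      # pooled output of the second text encoder
            "time_ids": add_time_ids,                 # size/crop conditioning, not the timestep
        },
    ).sample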

Windows error: expected str, bytes or os.PathLike object, not NoneType

Hi, has anyone met this error?

    include_dir += [os.path.join(os.environ.get("CUDA_PATH"), "include")]
      File "C:\Users\Demo-NT\AppData\Local\anaconda3\envs\OSEDIFF\lib\ntpath.py", line 104, in join
        path = os.fspath(path)
    TypeError: expected str, bytes or os.PathLike object, not NoneType
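
The failing line calls os.environ.get("CUDA_PATH"), which returns None when that environment variable is not set, so the path join fails. One plausible workaround is to point CUDA_PATH at your CUDA toolkit installation before the offending import runs; the path below is only an example and depends on the CUDA version installed on your machine. Setting the variable once in the Windows system environment settings achieves the same thing without touching the code.

    import os

    # Example path only: adjust to the CUDA toolkit actually installed,
    # and set it before importing the package that builds the CUDA extension.
    os.environ.setdefault(
        "CUDA_PATH",
        r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1",
    )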

Some confusion about VSD loss implementation

Hi, thanks for your wonderful work~
I'm a little confused about the implementation of the VSD loss.
I followed your paper and read ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation.
I thought the VSD loss is a pixel-wise gradient backpropagated from the network to the input LQ, i.e. a pixel-wise calculation between the pretrained regularizer's output and the fine-tuned regularizer's output, whereas the LPIPS and MSE losses are scalars. I'm really confused about how the VSD loss is implemented and how it is combined with the data loss.
Hoping for your reply~
P.S.: the picture shows ProlificDreamer's implementation.
(screenshot: ProlificDreamer's VSD loss implementation)
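
Not the authors' code, but in public DreamFusion/ProlificDreamer reimplementations (and in the SpecifyGradient helper used in the next issue) the per-pixel score-distillation gradient is folded into a scalar surrogate loss whose gradient w.r.t. the latents equals that per-pixel gradient; the scalar data losses (MSE, LPIPS) are then simply added on top. A minimal sketch, where grad, latents, loss_data and lambda_vsd are assumed to come from your own training loop:

    import torch
    import torch.nn.functional as F

    # grad: per-pixel VSD gradient built from the two regularizers' noise
    # predictions (whatever sign/weighting convention your derivation uses)
    grad = torch.nan_to_num(grad.detach())

    # Surrogate scalar loss: d(loss_vsd)/d(latents) == grad, so backward()
    # pushes the per-pixel gradient into the generator through `latents`.
    target = (latents - grad).detach()
    loss_vsd = 0.5 * F.mse_loss(latents, target, reduction="sum")

    loss_total = loss_data + lambda_vsd * loss_vsd   # data losses stay ordinary scalars
    loss_total.backward()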

Reproducing the training process with multiple losses hits "RuntimeError: Expected to mark a variable ready only once"

This is very solid work; I reproduced the training process based on the paper and previous issues.

I use the following method to implement multiple losses, including loss_data, loss_vsd and loss_diff. By controlling which parameters require gradients, loss_diff only updates unet_reg.

    loss_data_mse = F.mse_loss(output_image, pixel_values)
    loss_data_lpips = lpips_loss(output_image, pixel_values).mean()
    loss_data = loss_data_mse + 2 * loss_data_lpips

    # VSD gradient from the frozen and LoRA regularizers, folded into a scalar loss
    w = (1 - alphas[timesteps])
    w = w.view(bsz, 1, 1, 1)
    grad = w * (noise_pred_reg_lora.detach() - noise_pred_reg_frozen)
    grad = torch.nan_to_num(grad)
    loss_vsd = SpecifyGradient.apply(latents, grad)

    # diffusion loss that should update only the LoRA regularizer (unet_reg)
    loss_diff = F.mse_loss(noise_pred_reg_lora.float(), target.float(), reduction="mean")

    loss_lora = loss_data + loss_vsd

    # freeze the regularizer, keep the generator (LoRA UNet + VAE) trainable
    for param in unet_reg.parameters():
        param.requires_grad = False
    for param in unet_lora.parameters():
        param.requires_grad = original_trainable_status['unet_lora'][param]
    for param in vae.parameters():
        param.requires_grad = original_trainable_status['vae'][param]

    accelerator.backward(loss_lora, retain_graph=True)

    # now freeze the generator and backprop the diffusion loss into the regularizer only
    for param in unet_reg.parameters():
        param.requires_grad = original_trainable_status['unet_reg'][param]
    for param in unet_lora.parameters():
        param.requires_grad = False
    for param in vae.parameters():
        param.requires_grad = False

    accelerator.backward(loss_diff)

    if accelerator.sync_gradients:
        params_to_clip = params_to_optimize
        accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)

    if accelerator.sync_gradients:
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad(set_to_none=args.set_grads_to_none)

    # restore the original trainable flags
    for param in unet_lora.parameters():
        param.requires_grad = original_trainable_status['unet_lora'][param]
    for param in unet_reg.parameters():
        param.requires_grad = original_trainable_status['unet_reg'][param]
    for param in vae.parameters():
        param.requires_grad = original_trainable_status['vae'][param]

However, when training with accelerator’s DDP, I keep encountering the following error. Can anyone please guide me on how to solve it? Thank you very much.

RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the forward function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop.2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple checkpoint functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.

When will you release the training code?

I'm really curious about the effectiveness of the VSD loss, and I want to retrain this model for a demonstration. It would be very kind of you to release the training code as soon as possible, thanks.
