
OSEDiff's Introduction

One-Step Effective Diffusion Network for Real-World Image Super-Resolution

¹The Hong Kong Polytechnic University, ²OPPO Research Institute

[paper]


🔥 News

  • [2024.07] Release OSEDiff-SD21base.
  • [2024.06] This repo is created.

🎬 Overview

overview

🔧 Dependencies and Installation

  1. Clone repo

    git clone https://github.com/cswry/OSEDiff.git
    cd OSEDiff
  2. Install dependent packages

    conda create -n OSEDiff python=3.10 -y
    conda activate OSEDiff
    pip install --upgrade pip
    pip install -r requirements.txt
  3. Download Models

Dependent Models

⚡ Quick Inference

python test_osediff.py \
-i preset/datasets/test_dataset/input \
-o preset/datasets/test_dataset/output \
--osediff_path preset/models/osediff.pkl \
--pretrained_model_name_or_path SD21BASE_PATH \
--ram_ft_path DAPE_PATH \
--ram_path RAM_PATH

📷 Results

benchmark

Quantitative Comparisons (click to expand)

Visual Comparisons (click to expand)

📧 Contact

If you have any questions, please feel free to contact: [email protected]

🎓Citations

@article{wu2024one,
  title={One-Step Effective Diffusion Network for Real-World Image Super-Resolution},
  author={Wu, Rongyuan and Sun, Lingchen and Ma, Zhiyuan and Zhang, Lei},
  journal={arXiv preprint arXiv:2406.08177},
  year={2024}
}


OSEDiff's Issues

About the implementation of the method

Hello!
I've also emailed you the same question, but you seem to have missed it.
I've read your paper One-Step Effective Diffusion Network for Real-World Image Super-Resolution, found it very interesting, and tried to reproduce it following the paper.
However, I found the pseudocode provided in the appendix (Algorithm 1) a little confusing.
Based on my understanding of the paper, E_\phi and E_\theta in line 2 should be E_\phi' and E_\phi respectively, since E_\phi is the pretrained model and we shouldn't re-initialize it.
E_\theta and E_\theta' in line 13 should likewise be E_\phi and E_\phi', and E_\theta' in line 14 should be E_\phi', which is consistent with the symbols used in Eq. 7.
Could you confirm whether this reading is correct? Thank you!
I also have another question. Is the frozen regularizer used in the VSD loss exactly the pretrained model, i.e. SD 2.1? And is the trainable regularizer initialized from the pretrained model with LoRA? If so, I would expect the VSD loss to be almost 0 at the beginning of training.
I am not sure if my understanding is correct; please correct me if it is not.

huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/notebook/data/group/LowLevelLLM/LLM/bert-base-uncased'. Use `repo_type` argument if needed.

Hi,
Thank you for the wonderful work.
On a Windows system, this error occurs during code execution:

python test_osediff.py -i input -o output --osediff_path preset/models/osediff.pkl --pretrained_model_name_or_path preset/models/stable-diffusion-2-1-base/ --ram_ft_path preset/models/DAPE.pth --ram_path preset/models/ram_swin_large_14m.pth
Traceback (most recent call last):
  File "C:\Users\Miki\OSEDiff\test_osediff.py", line 68, in <module>
    DAPE = ram(pretrained=args.ram_path,
  File "C:\Users\Miki\OSEDiff\ram\models\ram_lora.py", line 329, in ram
    model = RAMLora(**kwargs)
  File "C:\Users\Miki\OSEDiff\ram\models\ram_lora.py", line 109, in __init__
    self.tokenizer = init_tokenizer()
  File "C:\Users\Miki\OSEDiff\ram\models\utils.py", line 132, in init_tokenizer
    tokenizer = BertTokenizer.from_pretrained('/home/notebook/data/group/LowLevelLLM/LLM/bert-base-uncased', local_files_only=True)
  File "C:\Users\Miki\anaconda3\envs\osediff\lib\site-packages\transformers\tokenization_utils_base.py", line 1770, in from_pretrained
    resolved_vocab_files[file_id] = cached_file(
  File "C:\Users\Miki\anaconda3\envs\osediff\lib\site-packages\transformers\utils\hub.py", line 409, in cached_file
    resolved_file = hf_hub_download(
  File "C:\Users\Miki\anaconda3\envs\osediff\lib\site-packages\huggingface_hub\utils\_validators.py", line 106, in _inner_fn
    validate_repo_id(arg_value)
  File "C:\Users\Miki\anaconda3\envs\osediff\lib\site-packages\huggingface_hub\utils\_validators.py", line 154, in validate_repo_id
    raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/notebook/data/group/LowLevelLLM/LLM/bert-base-uncased'. Use `repo_type` argument if needed.



As a temporary workaround I use utils.py from your previous repo:
https://github.com/cswry/SeeSR/blob/main/ram/models/utils.py

And everything works. :)
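
The underlying cause is the hard-coded absolute path in ram/models/utils.py. A minimal sketch of the kind of change the workaround makes (not the exact upstream code; it assumes you either have access to the Hugging Face Hub or substitute a local directory that exists on your machine):

    # ram/models/utils.py (sketch): replace the machine-specific absolute
    # path with the public Hub repo id, or with your own local folder.
    from transformers import BertTokenizer

    def init_tokenizer():
        # "bert-base-uncased" resolves against the Hugging Face Hub;
        # pass a local directory path instead if you work offline.
        tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
        # ... keep the remaining special-token setup from the original file ...
        return tokenizer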

fp16 inference + tile option?

Is it possible to implement? I can only upscale images up to 640x480 pixels. (PC specs: RTX 3090 24GB, 64GB RAM, Ryzen 7950X)
Thanks
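
For anyone experimenting in the meantime, a generic overlap-and-stitch loop is one way to approximate a tile option. A minimal sketch, assuming a hypothetical upscale_fn callable that wraps the OSEDiff forward pass (PIL image in, 4x-larger PIL image out); seam blending and fp16 casting are left out:

    from PIL import Image

    def upscale_tiled(img: Image.Image, upscale_fn, tile=512, overlap=32, scale=4):
        """Split `img` into overlapping tiles, run `upscale_fn` (hypothetical
        wrapper around the OSEDiff forward pass) on each tile, and paste the
        results into one canvas. Later tiles simply overwrite the overlap
        region; proper seam handling would feather/blend the overlaps."""
        w, h = img.size
        out = Image.new("RGB", (w * scale, h * scale))
        step = tile - overlap
        for top in range(0, h, step):
            for left in range(0, w, step):
                box = (left, top, min(left + tile, w), min(top + tile, h))
                patch = img.crop(box)
                up = upscale_fn(patch)  # peak memory now scales with `tile`, not the full frame
                out.paste(up, (left * scale, top * scale))
        return out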

Output results not as good as SeeSR?

I thought the output should be better, or at least the same but faster? I tried many images that I had tested with SeeSR, and all the OSEDiff results are worse. :(

Input:
image

SeeSR: (Using SD-Turbo)
image

OSEDiff:
Star Trek Ds9 - 6x13 - Far Beyond The Stars x264 ac3 03
Facial details are not very good and the texture of the wall is gone; only the gray part of the outfit is much better. Not sure why it isn't as good?

Thank you for your hard work on this; I just don't understand why it isn't as good as SeeSR. It is faster, however!

Problem calculating the metrics

I read the quantitative comparison in your paper. Why are some of the evaluation metrics of the compared methods different from those reported in their original papers, for example the FID score of the PASD model, where the gap is not small but an order of magnitude? I noticed that you all use pyiqa to calculate the metrics, so why are the computed results so different?

Will the training code be released?

Thanks for your wonderful work!!!
The balance of effectiveness and efficiency in OSEDiff is very impressive!!!
I trained my network with the VSD loss, but my training code probably differs from yours in some way and the results are terrible, so would you release the training code in the future?
Hoping for your reply!

AttributeError: 'UNet2DConditionModel' object has no attribute 'add_adapter'.

I have a question: when I run test_osediff.py, this error occurs: AttributeError: 'UNet2DConditionModel' object has no attribute 'add_adapter'. Did you mean: 'set_adapters'?
According to the error log, the failing line is self.unet.add_adapter(lora_conf_encoder, adapter_name="default_encoder") in the following method:

    def load_ckpt(self, model):
        # load unet lora
        lora_conf_encoder = LoraConfig(r=model["rank_unet"], init_lora_weights="gaussian",
                                       target_modules=model["unet_lora_encoder_modules"])
        lora_conf_decoder = LoraConfig(r=model["rank_unet"], init_lora_weights="gaussian",
                                       target_modules=model["unet_lora_decoder_modules"])
        lora_conf_others = LoraConfig(r=model["rank_unet"], init_lora_weights="gaussian",
                                      target_modules=model["unet_lora_others_modules"])
        self.unet.add_adapter(lora_conf_encoder, adapter_name="default_encoder")
        self.unet.add_adapter(lora_conf_decoder, adapter_name="default_decoder")
        self.unet.add_adapter(lora_conf_others, adapter_name="default_others")

Why does the UNet2DConditionModel object not define the 'add_adapter' method? My diffusers version is 0.25.0.
Please help, thanks!

Question about MANIQA Metric

Great job! But when I reproduced the results, I found a significant difference in the MANIQA metric. May I ask which version of MANIQA you are using? Also, could you provide the code used for the metric testing?
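
Not from the authors, but for reference the benchmark numbers in this line of work are commonly computed with the pyiqa toolbox. A minimal sketch of how MANIQA (and FID, relevant to the previous issue) are typically evaluated with it; the file and folder names are placeholders, and pyiqa downloads its own default metric weights, which may differ from the original MANIQA release and could explain part of the gap:

    import torch
    import pyiqa

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # pyiqa ships its own default checkpoint for each metric; the exact
    # weights/version can differ from the original MANIQA repository.
    maniqa = pyiqa.create_metric("maniqa", device=device)
    fid = pyiqa.create_metric("fid", device=device)

    # placeholder paths: one restored image, plus restored/reference folders
    score = maniqa("results/0001.png")
    fid_score = fid("results/", "ground_truth/")
    print(float(score), float(fid_score))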

OSEDiff video upscaling script

Hi. I made a batch video upscaling script using OSEDiff. The script assumes the models are in the default location (you can specify them at runtime or edit the paths as needed). It loads video files from the input folder, extracts frames, upscales them with OSEDiff, then merges the frames and the original audio into an output video file in the output directory and deletes the temporary files. Keep in mind that the FFV1 codec is used at maximum quality, so the output file can be large; you can change the codec yourself if you wish. To run the script you need ffmpeg, and it must be on your PATH so that ffmpeg can be executed directly.

The results are not perfect because the model is not intended for video files, but it can help others test the script. Cheers

https://pastebin.com/XdNZAKr3
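
Not the linked script itself, but the core of such a pipeline is essentially two ffmpeg calls wrapped around the OSEDiff batch inference. A rough sketch, assuming the frame rate matches the source video, that test_osediff.py keeps the input frame names for its outputs, and that the output container (e.g. .mkv) supports FFV1; the model paths are the same placeholders as in the Quick Inference command:

    import os
    import subprocess

    def upscale_video(video_in, video_out, fps=24,
                      frames="tmp_frames", frames_up="tmp_frames_up"):
        os.makedirs(frames, exist_ok=True)
        os.makedirs(frames_up, exist_ok=True)

        # 1. extract frames as lossless PNGs
        subprocess.run(["ffmpeg", "-i", video_in, f"{frames}/%06d.png"], check=True)

        # 2. upscale every frame with OSEDiff (same flags as Quick Inference)
        subprocess.run(["python", "test_osediff.py", "-i", frames, "-o", frames_up,
                        "--osediff_path", "preset/models/osediff.pkl",
                        "--pretrained_model_name_or_path", "SD21BASE_PATH",
                        "--ram_ft_path", "DAPE_PATH", "--ram_path", "RAM_PATH"],
                       check=True)

        # 3. re-encode the upscaled frames with FFV1 and copy the original audio
        subprocess.run(["ffmpeg", "-framerate", str(fps), "-i", f"{frames_up}/%06d.png",
                        "-i", video_in, "-map", "0:v", "-map", "1:a?",
                        "-c:v", "ffv1", "-c:a", "copy", video_out], check=True)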

Question about using the network

Hello, and thank you for your contribution. I would like to ask whether a super-resolution network is suitable for domain-transfer tasks, for example converting night-time images to daytime, or converting photos taken with phone A to look like photos taken with phone B.

questions about the output.

Hello, I'm sorry to bother you. When I use landscape or flower images, there is no output result.

For human images, the colors of the eyes and lips become vivid. Is there any way to adjust this?
Capture
Capture2

Is LoRA training necessary?

Is LoRA training necessary? What would happen if it were changed to full-parameter fine-tuning? How do you view this?

SDXL-Turbo?

Will OSEDiff support SDXL-Turbo?

Thanks. Sorry for so many questions.

When trying to adapt the model to SDXL, I am unable to properly pass time_ids to UNet2DConditionModel:

    elif self.config.addition_embed_type == "text_time":
        # SDXL - style
        if "text_embeds" not in added_cond_kwargs:
            raise ValueError(
                f"{self.__class__} has the config param `addition_embed_type` set to 'text_time' which requires the keyword argument `text_embeds` to be passed in `added_cond_kwargs`"
            )
        text_embeds = added_cond_kwargs.get("text_embeds")
        if "time_ids" not in added_cond_kwargs:
            raise ValueError(
                f"{self.__class__} has the config param `addition_embed_type` set to 'text_time' which requires the keyword argument `time_ids` to be passed in `added_cond_kwargs`"
            )
        time_ids = added_cond_kwargs.get("time_ids")
        time_embeds = self.add_time_proj(time_ids.flatten())
        time_embeds = time_embeds.reshape((text_embeds.shape[0], -1))
        add_embeds = torch.concat([text_embeds, time_embeds], dim=-1)
        add_embeds = add_embeds.to(emb.dtype)
        aug_emb = self.add_embedding(add_embeds)
        
      
    model_pred = self.unet(
        lq_latent,
        self.timesteps,
        encoder_hidden_states=prompt_embeds,
        added_cond_kwargs={
            "text_embeds": prompt_embeds,
            "time_ids": self.timesteps,
        },
    ).sample

Here self.timesteps is only a placeholder, as it is not the correct value.
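
For context: the SDXL UNet expects text_embeds to be the pooled output of the second text encoder and time_ids to be the six micro-conditioning values (original size, crop coordinates, target size), not the denoising timestep. A sketch of how they are usually assembled, following the pattern of the diffusers SDXL pipelines; pooled_prompt_embeds and the 512x512 sizes are placeholders for your own values:

    import torch

    batch = lq_latent.shape[0]

    # six micro-conditioning values per sample:
    # orig H, orig W, crop top, crop left, target H, target W
    add_time_ids = torch.tensor(
        [[512, 512, 0, 0, 512, 512]],
        dtype=prompt_embeds.dtype, device=prompt_embeds.device,
    ).repeat(batch, 1)

    model_pred = self.unet(
        lq_latent,
        self.timesteps,
        encoder_hidden_states=prompt_embeds,          # per-token embeddings
        added_cond_kwargs={
            "text_embeds": pooled_prompt_embeds,      # pooled output of the second text encoder
            "time_ids": add_time_ids,                 # size/crop conditioning, not the timestep
        },
    ).sample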

Windows error: expected str, bytes or os.PathLike object, not NoneType

Hi, has anyone met this error?

    include_dir += [os.path.join(os.environ.get("CUDA_PATH"), "include")]
      File "C:\Users\Demo-NT\AppData\Local\anaconda3\envs\OSEDIFF\lib\ntpath.py", line 104, in join
        path = os.fspath(path)
    TypeError: expected str, bytes or os.PathLike object, not NoneType
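
The failing line calls os.environ.get("CUDA_PATH"), which returns None when that environment variable is not set, so the path join fails. One plausible workaround is to point CUDA_PATH at your CUDA toolkit installation before the offending import runs; the path below is only an example and depends on the CUDA version installed on your machine. Setting the variable once in the Windows system environment settings achieves the same thing without touching the code.

    import os

    # Example path only: adjust to the CUDA toolkit actually installed,
    # and set it before importing the package that builds the CUDA extension.
    os.environ.setdefault(
        "CUDA_PATH",
        r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1",
    )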

Some confusion about VSD loss implementation

Hi, thanks for your wonderful work~
I'm a little confused about the implementation of the VSD loss.
I followed your paper and read ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation.
I thought the VSD loss is a pixel-wise gradient backpropagated from the network to the input LQ, i.e. a pixel-wise calculation between the pretrained regularizer's output and the fine-tuned regularizer's output, whereas the LPIPS and MSE losses are scalars. I'm really confused about how the VSD loss is implemented and how it is combined with the data loss.
Hoping for your reply~
P.S.: the picture shows ProlificDreamer's implementation.
(screenshot: ProlificDreamer's VSD loss implementation)
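
Not the authors' code, but in public DreamFusion/ProlificDreamer reimplementations (and in the SpecifyGradient helper used in the next issue) the per-pixel score-distillation gradient is folded into a scalar surrogate loss whose gradient w.r.t. the latents equals that per-pixel gradient; the scalar data losses (MSE, LPIPS) are then simply added on top. A minimal sketch, where grad, latents, loss_data and lambda_vsd are assumed to come from your own training loop:

    import torch
    import torch.nn.functional as F

    # grad: per-pixel VSD gradient built from the two regularizers' noise
    # predictions (whatever sign/weighting convention your derivation uses)
    grad = torch.nan_to_num(grad.detach())

    # Surrogate scalar loss: d(loss_vsd)/d(latents) == grad, so backward()
    # pushes the per-pixel gradient into the generator through `latents`.
    target = (latents - grad).detach()
    loss_vsd = 0.5 * F.mse_loss(latents, target, reduction="sum")

    loss_total = loss_data + lambda_vsd * loss_vsd   # data losses stay ordinary scalars
    loss_total.backward()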

Reproducing the training process with multiple losses hits "RuntimeError: Expected to mark a variable ready only once"

This is very solid work; I reproduced the training process based on the paper and previous issues.

I use the following method to implement multiple losses, including loss_data, loss_vsd and loss_diff. By controlling which parameters require gradients, loss_diff only updates unet_reg.

    loss_data_mse = F.mse_loss(output_image, pixel_values)
    loss_data_lpips = lpips_loss(output_image, pixel_values).mean()
    loss_data = loss_data_mse + 2 * loss_data_lpips

    # VSD gradient from the frozen and LoRA regularizers, folded into a scalar loss
    w = (1 - alphas[timesteps])
    w = w.view(bsz, 1, 1, 1)
    grad = w * (noise_pred_reg_lora.detach() - noise_pred_reg_frozen)
    grad = torch.nan_to_num(grad)
    loss_vsd = SpecifyGradient.apply(latents, grad)

    # diffusion loss that should update only the LoRA regularizer (unet_reg)
    loss_diff = F.mse_loss(noise_pred_reg_lora.float(), target.float(), reduction="mean")

    loss_lora = loss_data + loss_vsd

    # freeze the regularizer, keep the generator (LoRA UNet + VAE) trainable
    for param in unet_reg.parameters():
        param.requires_grad = False
    for param in unet_lora.parameters():
        param.requires_grad = original_trainable_status['unet_lora'][param]
    for param in vae.parameters():
        param.requires_grad = original_trainable_status['vae'][param]

    accelerator.backward(loss_lora, retain_graph=True)

    # now freeze the generator and backprop the diffusion loss into the regularizer only
    for param in unet_reg.parameters():
        param.requires_grad = original_trainable_status['unet_reg'][param]
    for param in unet_lora.parameters():
        param.requires_grad = False
    for param in vae.parameters():
        param.requires_grad = False

    accelerator.backward(loss_diff)

    if accelerator.sync_gradients:
        params_to_clip = params_to_optimize
        accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)

    if accelerator.sync_gradients:
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad(set_to_none=args.set_grads_to_none)

    # restore the original trainable flags
    for param in unet_lora.parameters():
        param.requires_grad = original_trainable_status['unet_lora'][param]
    for param in unet_reg.parameters():
        param.requires_grad = original_trainable_status['unet_reg'][param]
    for param in vae.parameters():
        param.requires_grad = original_trainable_status['vae'][param]

However, when training with accelerator’s DDP, I keep encountering the following error. Can anyone please guide me on how to solve it? Thank you very much.

RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the forward function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop.2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple checkpoint functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.

When will you release the training code?

I'm really curious about the effectiveness of the VSD loss, and I want to retrain this model for a demonstration. It would be very kind of you to release the training code as soon as possible, thanks.
