cswry / seesr
[CVPR2024] SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution
License: Apache License 2.0
Why is shuffle disabled in the training dataloader? This is strange, since random shuffling during training is standard practice. Furthermore, when I fine-tune SeeSR on my own data, the results are even worse than with the pretrained SeeSR. I use the same training setup as yours, only with my own data.
Hello,
I followed your installation instructions, but when I run python gradio_seesr.py
I get the following:
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.0.1+cu118 with CUDA 1108 (you have 2.0.1+cpu)
Python 3.8.10 (you have 3.8.18)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
Traceback (most recent call last):
File "gradio_seesr.py", line 19, in <module>
from pipelines.pipeline_seesr import StableDiffusionControlNetPipeline
File "E:\AI\SeeSR\pipelines\pipeline_seesr.py", line 25, in <module>
from torchvision.utils import save_image
ModuleNotFoundError: No module named 'torchvision'
I tried reinstalling xformers, installing torchvision, upgrading PyTorch, and upgrading Python to 3.8.18 and also to 3.10.11. After some of these steps it starts asking me to install missing modules, so I keep installing them until I reach a point where it says torch.cuda.is_available()
should be True but it is False, and I couldn't do anything after that. (All of this was done with the conda environment active.)
I have a desktop PC with a 4080, and I use other AI interfaces such as automatic1111 and ComfyUI without any issue.
I am using Miniconda3-latest.
I would appreciate any help that lets me run the demo locally.
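One possible reading of the log above: the xFormers warning reports "you have 2.0.1+cpu", and a "+cpu" build tag normally means the installed PyTorch wheel has no CUDA support, which would be consistent with torch.cuda.is_available() returning False. The sketch below only classifies a version string; the helper name describe_torch_build is hypothetical, and in practice you would pass it torch.__version__.

```python
import re

def describe_torch_build(version):
    """Classify a torch version string by its build tag (the text after '+')."""
    match = re.match(r"^([^+]+)(?:\+(.+))?$", version.strip())
    if match is None:
        return "unrecognized version string"
    base, tag = match.group(1), match.group(2)
    if tag == "cpu":
        return f"{base}: CPU-only build, CUDA cannot be used"
    if tag and tag.startswith("cu"):
        return f"{base}: CUDA build ({tag})"
    return f"{base}: no build tag, check torch.version.cuda instead"

# Version strings taken from the xFormers warning above.
print(describe_torch_build("2.0.1+cpu"))
print(describe_torch_build("2.0.1+cu118"))
```

If the first case applies, reinstalling PyTorch and torchvision as a matching CUDA build inside the same conda environment is the usual remedy; the exact install command depends on your setup and is not shown here.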
I encountered the following problem when running the sd-turbo web UI:
Error: OSError: Incorrect path_or_model_id: 'preset/models/models--stabilityai--sd-turbo/snapshots/1681ed09e0cff58eeb41e878a49893228b78b94c/feature_extractor'. Please provide either the path to a local folder or the repo_id of a model on the Hub.
The sd-turbo file listing on Hugging Face contains no feature_extractor folder.
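Thank you for your excellent work. I am trying to fine-tune SeeSR. Because of GPU memory limits, I reduced the image size to 256×256. On a single GPU, running
python train_seesr.py ## arguments omitted
uses 23 GB of GPU memory and runs on a 4090. But when I try multiple GPUs with
CUDA_VISIBLE_DEVICES="0,1" accelerate launch train_seesr.py ## arguments omitted
an OOM error always occurs. I have carefully checked the shapes of the input tensors to make sure they match the single-GPU case, but I cannot find the cause. Thank you for your help!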
The work you've done on this article is truly commendable, providing me with a wealth of inspiration and insight. I have a question. Regarding the "TCA module" discussed in your article, I haven't been able to locate the exact position and module within the code. Could you assist me in identifying its location?
Hello,
I followed the Quick Inference instructions, but when I run python test_seesr.py I get the following:
Traceback (most recent call last):
File "test_seesr.py", line 268, in <module>
main(args)
File "test_seesr.py", line 167, in main
pipeline = load_seesr_pipeline(args, accelerator, enable_xformers_memory_efficient_attention)
File "test_seesr.py", line 83, in load_seesr_pipeline
unet = UNet2DConditionModel.from_pretrained(args.seesr_model_path, subfolder="unet")
File "D:\Compiler\anaconda3\envs\seesrf\lib\site-packages\diffusers\models\modeling_utils.py", line 618, in from_pretrained
model_file = _get_model_file(
File "D:\Compiler\anaconda3\envs\seesrf\lib\site-packages\diffusers\utils\hub_utils.py", line 284, in _get_model_file
raise EnvironmentError(
OSError: Error no file named diffusion_pytorch_model.bin found in directory preset/models/seesr.
I checked the folder and found that the file was not in Google Drive either.
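A quick way to see which weight formats are actually present in a model subfolder is sketched below. It assumes, without confirmation from the thread, that the download contains a .safetensors file instead of the .bin file the installed diffusers version looks for; list_weight_files is a hypothetical helper and the temporary directory only stands in for preset/models/seesr.

```python
import tempfile
from pathlib import Path

WEIGHT_SUFFIXES = (".bin", ".safetensors")

def list_weight_files(model_dir, subfolder="unet"):
    """Return the weight files found in model_dir/subfolder, sorted by name."""
    folder = Path(model_dir) / subfolder
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.iterdir() if p.suffix in WEIGHT_SUFFIXES)

# Demo on a throwaway directory that mimics a download with only safetensors weights.
with tempfile.TemporaryDirectory() as tmp:
    unet_dir = Path(tmp) / "unet"
    unet_dir.mkdir()
    (unet_dir / "config.json").touch()
    (unet_dir / "diffusion_pytorch_model.safetensors").touch()
    found = list_weight_files(tmp)

print(found)
```

If only a .safetensors file shows up, the mismatch is between the file format and the loader, not a missing download.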
Great work on this, with lots of improvements. This is a great and memory-efficient approach.
When I navigate to the sd-turbo model using the provided link, there is no feature_extractor. I used the SD 2.1 feature extractor.
I think the line below is a bug? It seems to be looking for the unet in two places. I assume the correct path is args.seesr_model_path?
unet = UNet2DConditionModel.from_pretrained_orig(args.pretrained_model_path, args.seesr_model_path, subfolder="unet", use_image_cross_attention=True)
Very good results, but I am noticing color banding not present on the input images. Added "color banding" and "oil painting" to negative prompts, but it still appears. Happens even at 12 steps. Doesn't matter if I use "wavelet" "adain" or "nofix". Anything that can be done? Thank you.
Even on a color that is the same but different shades:
Also, so I don't open another issue, was wondering if there are any ways to get the output to look closer to the input. The upscale looks 95% close, and much cleaner, but still some slight differences happen. Basically I am wondering if it is possible to get 99% close to a 1:1 super resolution of the input image but with the enhanced clarity and details.
Sometimes areas of the input that are out of focus on purpose, get forced into focus, any way to prevent that?
Hello!
When will the code be released?
thanks
Hi, I would like to know how you made the SeeSR training data, in particular the epoch setting and the ratio of added face images.
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory C:/SeeSR-main/preset/models/stable-diffusion-2-base.
python utils_data/make_paired_data.py \
--gt_path PATH_1 PATH_2 ... \
--save_dir preset/datasets/train_datasets/training_for_dape \
--epoch 1
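Shouldn't the save directory here be changed to preset/datasets/train_datasets/training_for_seesr? Otherwise it will overwrite the DAPE data.
Following the authors' experimental setup, I built paired images from DIV2K, Flickr2K, 10,000 FFHQ images, and OST (about 23,000 images in total) and used them to fine-tune DAPE, with the settings taken from dape.yaml. However, the model does not converge (l_logits stays around 0.5), and at inference it produces no useful tag information (everything is null). How did others solve the convergence problem?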
The model imports the module ‘triton’, but this module only has a Linux version and is not compatible with Windows or Mac. Has anyone succeeded in running it on Win10? What should I do?
Error information:
A matching Triton is not available, some optimizations will not be enabled
Traceback (most recent call last):
File "D:\ProgramData\anaconda3\envs\SeeSR\lib\site-packages\xformers\__init__.py", line 55, in _is_triton_available
from xformers.triton.softmax import softmax as triton_softmax # noqa
File "D:\ProgramData\anaconda3\envs\SeeSR\lib\site-packages\xformers\triton\softmax.py", line 11, in <module>
import triton
ModuleNotFoundError: No module named 'triton'
I am trying to run with 8 GB of VRAM.
All models appear to load as expected, and the code runs until the image is passed into the pipeline (i.e. right up to the inference point).
To avoid OOM issues I have set:
--vae_decoder_tiled_size=64
--vae_encoder_tiled_size=512
--latent_tiled_size=40
--latent_tiled_overlap=2
The issue seems to be at the tokenizer:
Traceback (most recent call last):
File "/home/outsider/Desktop/coding/SeeSR/test_seesr.py", line 284, in <module>
main(args)
File "/home/outsider/Desktop/coding/SeeSR/test_seesr.py", line 233, in main
image = pipeline(
File "/home/outsider/Desktop/coding/SeeSR/utils/vaehook.py", line 440, in wrapper
ret = fn(*args, **kwargs)
File "/home/outsider/anaconda3/envs/sd2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/outsider/Desktop/coding/SeeSR/pipelines/pipeline_seesr.py", line 944, in __call__
prompt_embeds, ram_encoder_hidden_states = self._encode_prompt(
File "/home/outsider/Desktop/coding/SeeSR/pipelines/pipeline_seesr.py", line 356, in _encode_prompt
text_inputs = self.tokenizer(
File "/home/outsider/anaconda3/envs/sd2/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2561, in __call__
encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
File "/home/outsider/anaconda3/envs/sd2/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2667, in _call_one
return self.encode_plus(
File "/home/outsider/anaconda3/envs/sd2/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2740, in encode_plus
return self._encode_plus(
File "/home/outsider/anaconda3/envs/sd2/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 652, in _encode_plus
return self.prepare_for_model(
File "/home/outsider/anaconda3/envs/sd2/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3219, in prepare_for_model
encoded_inputs = self.pad(
File "/home/outsider/anaconda3/envs/sd2/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3024, in pad
encoded_inputs = self._pad(
File "/home/outsider/anaconda3/envs/sd2/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3409, in _pad
encoded_inputs["attention_mask"] = encoded_inputs["attention_mask"] + [0] * difference
OverflowError: cannot fit 'int' into an index-sized integer
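The failing line pads the attention mask with [0] * difference, so this error appears when the target length is astronomically large. One plausible cause, which is an assumption and not confirmed in this thread, is that the tokenizer was loaded without a usable maximum length, so padding to "max length" falls back to a huge sentinel value. The sketch below reproduces only the arithmetic; pad_mask is a hypothetical stand-in for the library's padding step, and int(1e30) is an illustrative sentinel.

```python
def pad_mask(attention_mask, max_length):
    """Right-pad an attention mask with zeros up to max_length."""
    difference = max_length - len(attention_mask)
    return attention_mask + [0] * difference

# A sane maximum length pads normally.
print(pad_mask([1, 1, 1], 5))

# A sentinel-sized maximum length cannot be used as a list repeat count.
try:
    pad_mask([1, 1, 1], int(1e30))
    outcome = "padded"
except OverflowError as exc:
    outcome = f"OverflowError: {exc}"
print(outcome)
```

If this is the cause, checking that the tokenizer folder of the base model contains its configuration files, and printing pipeline.tokenizer.model_max_length before the call, should confirm it.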
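Hello, does the sd-turbo version require retraining the ControlNet (with the number of timesteps set to 2)? Or is it enough to reuse the one already trained with SD2.1 and only swap out the SD part?
Hi, could you explain how exactly the sd-turbo-based training was done? I tried simply replacing sd2-base with turbo and found that the trained results are blurrier than the baseline.
(Originally written in Chinese for convenience.)
Is SeeSR's main improvement over PASD in the representation branch (which is trained on low-quality images), with the SD and ControlNet parts being roughly the same?
Also, does SeeSR not apply explicit enhancement to the ControlNet input the way PASD does?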
The config attributes {'sigma_max': None, 'sigma_min': None, 'timestep_type': 'discrete'} were passed to DDPMScheduler, but are not expected and will be ignored. Please verify your scheduler_config.json configuration file.
The config attributes {'dropout': 0.0, 'reverse_transformer_layers_per_block': None} were passed to UNet2DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Do these two error messages have any impact on the output?
Hi, I am training my own model and want to load a pretrained ControlNet using "--controlnet_model_name_or_path", but I get an error:
Traceback (most recent call last):
File "train_seesr.py", line 1000, in
down_block_res_samples, mid_block_res_sample = controlnet(
File "/opt/conda/envs/seesr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/seesr/lib/python3.8/site-packages/accelerate/utils/operations.py", line 687, in forward
return model_forward(*args, **kwargs)
File "/opt/conda/envs/seesr/lib/python3.8/site-packages/accelerate/utils/operations.py", line 675, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/opt/conda/envs/seesr/lib/python3.8/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/XXX//SeeSR-main/models/controlnet.py", line 766, in forward
sample, res_samples = downsample_block(
File "/opt/conda/envs/seesr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/XXX//SeeSR-main/models/unet_2d_blocks.py", line 1238, in forward
hidden_states = attn(
File "/opt/conda/envs/seesr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/seesr/lib/python3.8/site-packages/diffusers/models/transformer_2d.py", line 315, in forward
hidden_states = block(
File "/opt/conda/envs/seesr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/seesr/lib/python3.8/site-packages/diffusers/models/attention.py", line 218, in forward
attn_output = self.attn2(
File "/opt/conda/envs/seesr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/seesr/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 420, in forward
return self.processor(
File "/opt/conda/envs/seesr/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 948, in __call__
key = attn.to_k(encoder_hidden_states, scale=scale)
File "/opt/conda/envs/seesr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/seesr/lib/python3.8/site-packages/diffusers/models/lora.py", line 224, in forward
out = super().forward(hidden_states)
File "/opt/conda/envs/seesr/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1232x1024 and 768x320)
Steps: 0%| | 0/8000 [00:02<?, ?it/s]
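Reading the shapes in the error: the conditioning passed to attn.to_k has 1024 features per token, while that projection expects 768. A possible explanation, stated here as an assumption, is that the loaded ControlNet checkpoint was built for a base model with 768-dimensional text embeddings, whereas stable-diffusion-2-base produces 1024-dimensional ones. The sketch below only checks the shape arithmetic; linear_input_mismatch is a hypothetical helper.

```python
def linear_input_mismatch(input_shape, weight_t_shape):
    """Return None if input @ weight_t is valid, else describe the mismatch."""
    in_features_given = input_shape[-1]
    in_features_expected = weight_t_shape[0]
    if in_features_given == in_features_expected:
        return None
    return f"input has {in_features_given} features, layer expects {in_features_expected}"

# Shapes copied from the RuntimeError above: mat1 is the input, mat2 the transposed weight.
message = linear_input_mismatch((1232, 1024), (768, 320))
print(message)
```

Comparing cross_attention_dim in the config.json of the ControlNet checkpoint with that of the base model's unet would show whether the two were trained for the same text encoder.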
Inference is too slow for large images. What can I do?
I have a question: when processing the SeeSR data, the folders are created, but the images are not saved into them. Why?
Originally posted by @aulaywang in #20 (comment)
I followed the Hugging Face link, downloaded the file 512-base-ema.ckpt, and placed it in the project's preset/models/stable-diffusion-2-base folder, but at runtime I get an error about a missing configuration file:
OSError: Error no file named scheduler_config.json found in directory preset/models/stable-diffusion-2-base.
I could not find scheduler_config.json at the Hugging Face link either.
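The error suggests the loader expects a diffusers-style folder (one subfolder per component, each with its own config) instead of a single .ckpt file. The sketch below checks a folder against a list of expected files; the list is an assumption based on the usual diffusers layout and may not be complete, and missing_files is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Assumed diffusers-style layout; adjust if the repository differs.
EXPECTED_FILES = [
    "model_index.json",
    "scheduler/scheduler_config.json",
    "text_encoder/config.json",
    "tokenizer/vocab.json",
    "unet/config.json",
    "vae/config.json",
]

def missing_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected relative paths that do not exist under model_dir."""
    root = Path(model_dir)
    return [rel for rel in expected if not (root / rel).is_file()]

# Demo: a folder that holds only the single checkpoint file from the report.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "512-base-ema.ckpt").touch()
    missing = missing_files(tmp)

print(missing)
```

On the Hugging Face model page, scheduler_config.json normally sits inside the scheduler subfolder, so it is easy to miss when looking only at the top-level file list.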
Hi, I used 500 face images (covering only 6 identities) to train this model, but when I test on the training data the result is strange: I test with person A, but the output looks like person B. Why?
As it was done with SD-Turbo, could it also be done with SUPIR as well?
Hello, thank you for sharing the code of SeeSR!
When I read it, I found it did not seem to perform cross attention between the 'ram_encoder_hidden_states' and the resnet output during training.
The screen shot is as follows, please give me some advice.
From your command
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7," accelerate launch train_seesr.py \
--pretrained_model_name_or_path="preset/models/stable-diffusion-2-base" \
--output_dir="./experience/seesr" \
--root_folders 'preset/datasets/training_datasets' \
--ram_ft_path 'preset/models/DAPE.pth' \
--enable_xformers_memory_efficient_attention \
--mixed_precision="fp16" \
--resolution=512 \
--learning_rate=5e-5 \
--train_batch_size=2 \
--gradient_accumulation_steps=2 \
--null_text_ratio=0.5 --dataloader_num_workers=0 \
--checkpointing_steps=10000
there is no --use_ram_encoder. Thus, use_image_cross_attention will be False. While forwarding, it will skip to the else branch, omitting image_encoder_hidden_states.
Please give me some instructions, thanks!
In utils_data/make_tags.py, why not use the LoRA fine-tuned model? I found that during training you just encode the tags directly, without making any change.
Hi. Thank you for your wonderful project.
However, I have not succeeded in running it:
C:\Users\mikazmaj\SeeSR\pipelines\pipeline_seesr.py:42: FutureWarning: Importing `DiffusionPipeline` or `ImagePipelineOutput` from diffusers.pipeline_utils is deprecated. Please import from diffusers.pipelines.pipeline_utils instead.
from diffusers.pipeline_utils import DiffusionPipeline
/encoder/layer/0/crossattention/self/query is tied
/encoder/layer/0/crossattention/self/key is tied
/encoder/layer/0/crossattention/self/value is tied
/encoder/layer/0/crossattention/output/dense is tied
/encoder/layer/0/crossattention/output/LayerNorm is tied
/encoder/layer/0/intermediate/dense is tied
/encoder/layer/0/output/dense is tied
/encoder/layer/0/output/LayerNorm is tied
/encoder/layer/1/crossattention/self/query is tied
/encoder/layer/1/crossattention/self/key is tied
/encoder/layer/1/crossattention/self/value is tied
/encoder/layer/1/crossattention/output/dense is tied
/encoder/layer/1/crossattention/output/LayerNorm is tied
/encoder/layer/1/intermediate/dense is tied
/encoder/layer/1/output/dense is tied
/encoder/layer/1/output/LayerNorm is tied
Loading default thretholds from .txt....
--------------
preset/models/ram_swin_large_14m.pth
--------------
Traceback (most recent call last):
File "C:\Users\mikazmaj\SeeSR\test_seesr.py", line 265, in <module>
main(args)
File "C:\Users\mikazmaj\SeeSR\test_seesr.py", line 168, in main
model = load_tag_model(args, accelerator.device)
File "C:\Users\mikazmaj\SeeSR\test_seesr.py", line 125, in load_tag_model
model = ram(pretrained='preset/models/ram_swin_large_14m.pth',
File "C:\Users\mikazmaj\SeeSR\ram\models\ram_lora.py", line 325, in ram
model, msg = load_checkpoint_swinlarge(model, pretrained, kwargs)
File "C:\Users\mikazmaj\SeeSR\ram\models\utils.py", line 296, in load_checkpoint_swinlarge
raise RuntimeError('checkpoint url or path is invalid')
RuntimeError: checkpoint url or path is invalid
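The checkpoint path in the traceback is relative, so it is resolved against the directory the script is launched from. Assuming the loader raises this error when the path is neither a URL nor an existing file (an assumption about its behavior), a small check like the one below shows where Python is actually looking. explain_checkpoint_path is a hypothetical helper, and the temporary directory stands in for the working directory.

```python
import os
import tempfile

def explain_checkpoint_path(path, cwd=None):
    """Resolve a possibly relative path and report whether a file exists there."""
    base = os.getcwd() if cwd is None else cwd
    resolved = os.path.normpath(os.path.join(base, path))
    return {"resolved": resolved, "is_file": os.path.isfile(resolved)}

# Demo: an empty working directory, so the checkpoint is not found.
with tempfile.TemporaryDirectory() as tmp:
    info = explain_checkpoint_path("preset/models/ram_swin_large_14m.pth", cwd=tmp)

print(info["is_file"])
```

Calling it without the cwd argument from the SeeSR folder prints the absolute location that must contain ram_swin_large_14m.pth.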
Hello, thank you for sharing the code of SeeSR! That's an amazing work indeed!
When I tried to calculate the FID metric using the code in basicsr/metrics/fid.py, the result was quite confusing.
Here is how I calculated FID. If there is any problem, feel free to point it out.
First, I defined a function which outputs a list of all images.
Then, I processed the SR images and the GT images separately, using inception_v3 to extract features, and calculated FID.
As for the results, with the parameter normalize_input set to False, I got FID = 118.71415109991415.
With normalize_input set to True, I got FID = 126.1874280351401.
Both values are considerably higher than the one reported in the paper, which makes me think that some step went wrong.
Could you please give me some advice? And should the parameter normalize_input be set to True or not?
Thanks a lot!
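For reference, FID is the Fréchet distance between two Gaussians fitted to the Inception features of the two image sets. The notation below is mine (SR and GT subscripts for the super-resolved and ground-truth sets); it is the standard definition, not something taken from the SeeSR code.

```latex
\mathrm{FID} = \lVert \mu_{\mathrm{SR}} - \mu_{\mathrm{GT}} \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_{\mathrm{SR}} + \Sigma_{\mathrm{GT}}
  - 2 \left( \Sigma_{\mathrm{SR}} \Sigma_{\mathrm{GT}} \right)^{1/2} \right)
```

Because the means and covariances are estimated from the extracted features, anything that changes those features or their sample statistics (input normalization, resizing, the number of images, the choice of reference set) changes the score, so a value is only comparable to a published one when the whole evaluation protocol matches.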
Excellent work! I followed the README to run inference; however, the MANIQA score I get is lower than in the original paper (0.5050 vs 0.6198). I would greatly appreciate it if you could provide the calculation code.
The input to vae.encode is 'pixel_values' [2, 3, 512, 512], however, the output 'latents' is [2, 4, 64, 64]. Why is the channel dimension different?
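The shapes are consistent with how Stable Diffusion's VAE works: it maps a 3-channel RGB image to a 4-channel latent and reduces each spatial dimension by a factor of 8, so the latent channels are a learned representation, not color channels. A minimal sketch of the shape arithmetic follows; latent_shape is a hypothetical helper, and the defaults of 8 and 4 are the usual Stable Diffusion values.

```python
def latent_shape(pixel_shape, scale_factor=8, latent_channels=4):
    """Map a (batch, channels, height, width) image shape to the VAE latent shape."""
    batch, _channels, height, width = pixel_shape
    return (batch, latent_channels, height // scale_factor, width // scale_factor)

print(latent_shape((2, 3, 512, 512)))
```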
The link you provided does not contain the DAPE model.
Hello!
I used the following settings and ran the training code on an NVIDIA GeForce RTX 3090 (24 GB). However, I ran into a CUDA out-of-memory error. Is it because the VRAM of the 3090 Ti graphics card is insufficient for training?
CUDA_VISIBLE_DEVICES="0," accelerate launch train_seesr.py \
--pretrained_model_name_or_path="preset/models/stable-diffusion-2-base" \
--output_dir="./experience/seesr" \
--root_folders 'preset/datasets/train_datasets/training_for_seesr' \
--ram_ft_path 'preset/models/DAPE.pth' \
--enable_xformers_memory_efficient_attention \
--mixed_precision="fp16" \
--resolution=512 \
--learning_rate=5e-5 \
--train_batch_size=1 \
--gradient_accumulation_steps=2 \
--null_text_ratio=0.5 \
--dataloader_num_workers=0 \
--checkpointing_steps=10000
SeeSR-main/test_seesr.py", line 125, in load_tag_model
model = ram(pretrained='preset/models/ram_swin_large_14m.pth',
SeeSR-main/ram/models/ram_lora.py", line 319, in ram
model = RAMLora(**kwargs)
SeeSR-main/ram/models/ram_lora.py", line 107, in __init__
self.tokenizer = init_tokenizer()
SeeSR-main/ram/models/utils.py", line 131, in init_tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
python3.10/site-packages/transformers/tokenization_utils_base.py", line 1785, in from_pretrained
raise EnvironmentError(
OSError: Can't load tokenizer for 'bert-base-uncased'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'bert-base-uncased' is the correct path to a directory containing all relevant files for a BertTokenizer tokenizer.
I downloaded all the files in the bert-base-uncased directory as well as ram_swin_large_14m.pth, but I keep getting this error! I don't know what the problem is.
Hello! Your work is excellent! While replicating the experiment, I tried to load the pretrained files from Google Drive, but loading failed. I would like to know how to solve this. Thank you!
Traceback (most recent call last):
File "test_seesr.py", line 265, in <module>
main(args)
File "test_seesr.py", line 167, in main
pipeline = load_seesr_pipeline(args, accelerator, enable_xformers_memory_efficient_attention)
File "test_seesr.py", line 83, in load_seesr_pipeline
unet = UNet2DConditionModel.from_pretrained(args.seesr_model_path, subfolder="unet")
File "/root/miniconda3/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 646, in from_pretrained
raise ValueError(
ValueError: Cannot load <class 'models.unet_2d_condition.UNet2DConditionModel'> from preset/models/seesr because the following keys are missing:
The seesr_model_path is 'Seesr-main/preset/models/seesr'.
I don't know if it's because of the safetensors file format. The command-line parameters are the same as those in the README.
Thanks for your wonderful work! I'm encountering an error when trying to load the 'unet' or 'controlnet' models. The error mentions that it couldn't connect to 'https://huggingface.co/' to load the model, even though I believe my internet connection is working fine. Do you have any suggestions on how to fix this issue?
The error message is as follows:
OSError: We couldn't connect to 'https://huggingface.co' to load this model, couldn't find it in the cached files and it looks like preset/models/seesr is not the path to a directory containing a config.json file. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/diffusers/installation#offline-mode'.
Thanks for your great work. I put the weights in the directory like this:
/SeeSR-main/preset/models
--DAPE.pth
--seesr
--stable-diffusion-2-base
And ran the command:
python test_seesr.py \
--pretrained_model_path preset/models/stable-diffusion-2-base \
--prompt None \
--seesr_model_path preset/models/seesr \
--ram_ft_path preset/models/DAPE.pth \
--image_path preset/datasets/test_datasets \
--output_dir preset/datasets/output \
--start_point lr \
--num_inference_steps 50 \
--guidance_scale 5.5 \
--process_size 512
However, it fails with OSError: Error no file named diffusion_pytorch_model.bin found in directory preset/models/seesr.
How can I fix this problem? Thanks a lot.
The default setting is 1; is this sufficient for fine-tuning the model?
I get this error when trying to use from_pretrained_orig():
File "C:\Stuff\Apps\SeeSR\gradio_seesr_turbo.py", line 48, in <module>
unet = UNet2DConditionModel.from_pretrained_orig(seesr_model_path, subfolder="unet")
TypeError: UNet2DConditionModel.from_pretrained_orig() missing 1 required positional argument: 'seesr_model_path'
I changed it to from_pretrained() and it appears to work fine; however, the turbo output is blurrier than I would have expected.
https://imgsli.com/MjY3ODY4/0/1
Is the blurriness just the nature of the turbo model or is it because I'm loading something wrong?