nota-netspresso / bk-sdm
A Compressed Stable Diffusion for Efficient Text-to-Image Generation [ICCV'23 Demo] [ICML'23 Workshop]
License: Other
Hi there!
I'd like to ask whether you support, or plan to support, the SDXL model. It's quite heavy, and making it faster and more lightweight would have insane benefits for the community.
Thanks for your work!
Could the authors share the code for calculating the number of model parameters (Param.) and the computational complexity (MACs) of the pipeline? Thank you very much!
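For the parameter half of this question, one common approach is to sum `numel()` over a module's parameters. The sketch below assumes each pipeline component is a PyTorch `torch.nn.Module`; the helper name `count_params_millions` and the stand-in classes are hypothetical, and this is not necessarily how the authors computed their numbers. MACs need a separate profiler and are not covered here.

```python
def count_params_millions(module):
    """Return the parameter count of a module, in millions.

    Works with any object exposing the nn.Module.parameters() interface.
    """
    return sum(p.numel() for p in module.parameters()) / 1e6


# Hypothetical stand-ins so the sketch runs without PyTorch installed.
# With a real diffusers pipeline you would pass e.g. pipe.unet instead.
class FakeParam:
    def __init__(self, n):
        self._n = n

    def numel(self):
        return self._n


class FakeModule:
    def __init__(self, sizes):
        self._params = [FakeParam(n) for n in sizes]

    def parameters(self):
        return iter(self._params)


demo = FakeModule([1_000_000, 500_000])
print(f"{count_params_millions(demo):.1f}M")
```

Calling the helper on the U-Net, text encoder, and image decoder separately would give a per-component breakdown.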
Hi there,
It's me again. I am curious whether you tried different combinations of lambda for feat_loss and out_loss, or perhaps added a lambda for the task_loss?
In my training runs, feat_loss seems to contribute most of the total loss.
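To make the weighting question concrete, here is a small sketch that assumes the total loss is a plain weighted sum of the three terms, with weights named after the `--lambda_sd`, `--lambda_kd_output`, and `--lambda_kd_feat` flags of the training script. The weighted-sum form and the helper names are assumptions, and the loss values are illustrative numbers taken from a progress-bar line posted elsewhere on this page.

```python
def total_loss(sd_loss, kd_output_loss, kd_feat_loss,
               lambda_sd=1.0, lambda_kd_output=1.0, lambda_kd_feat=1.0):
    """Weighted sum of the three loss terms (assumed form of the objective)."""
    return (lambda_sd * sd_loss
            + lambda_kd_output * kd_output_loss
            + lambda_kd_feat * kd_feat_loss)


def feat_share(sd_loss, kd_output_loss, kd_feat_loss, **lambdas):
    """Fraction of the total contributed by the feature-KD term."""
    lam_feat = lambdas.get("lambda_kd_feat", 1.0)
    total = total_loss(sd_loss, kd_output_loss, kd_feat_loss, **lambdas)
    return lam_feat * kd_feat_loss / total


# Illustrative values only.
losses = dict(sd_loss=0.185, kd_output_loss=0.0447, kd_feat_loss=58.6)

print(f"total with all lambdas = 1: {total_loss(**losses):.2f}")
print(f"feature share, lambda_kd_feat = 1:    {feat_share(**losses):.3f}")
print(f"feature share, lambda_kd_feat = 0.01: {feat_share(**losses, lambda_kd_feat=0.01):.3f}")
```

With equal weights the feature term accounts for nearly all of the sum, which matches the observation above; whether down-weighting it helps quality is an empirical question this sketch cannot answer.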
I downloaded the stable-diffusion-v1-4 checkpoint from CompVis but still have this problem. I have tried installing transformers==4.25, 4.27, and so on, but that didn't work. Here are the error details:
bash scripts/kd_train_toy.sh
The following values were not passed to accelerate launch and had defaults used instead:
--num_processes was set to a value of 1
--num_machines was set to a value of 1
--mixed_precision was set to a value of 'no'
--dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
/home/lzj/miniconda3/envs/bk-sdm/lib/python3.8/site-packages/accelerate/accelerator.py:249: FutureWarning: logging_dir is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use project_dir instead.
warnings.warn(
./results/toy_bk_small/log_loss.csv
03/11/2024 21:34:33 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
Traceback (most recent call last):
File "src/kd_train_text_to_image.py", line 914, in <module>
main()
File "src/kd_train_text_to_image.py", line 429, in main
noise_scheduler = DDPMScheduler.from_pretrained(args.pretrained_model_name_or_path, subfolder="scheduler")
File "/home/lzj/miniconda3/envs/bk-sdm/lib/python3.8/site-packages/diffusers/schedulers/scheduling_utils.py", line 139, in from_pretrained
config, kwargs, commit_hash = cls.load_config(
File "/home/lzj/miniconda3/envs/bk-sdm/lib/python3.8/site-packages/diffusers/configuration_utils.py", line 331, in load_config
raise EnvironmentError(
OSError: Error no file named scheduler_config.json found in directory CompVis/stable-diffusion-v1-4.
Traceback (most recent call last):
File "/home/lzj/miniconda3/envs/bk-sdm/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/lzj/miniconda3/envs/bk-sdm/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/home/lzj/miniconda3/envs/bk-sdm/lib/python3.8/site-packages/accelerate/commands/launch.py", line 923, in launch_command
simple_launcher(args)
File "/home/lzj/miniconda3/envs/bk-sdm/lib/python3.8/site-packages/accelerate/commands/launch.py", line 579, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/lzj/miniconda3/envs/bk-sdm/bin/python', 'src/kd_train_text_to_image.py', '--pretrained_model_name_or_path', 'CompVis/stable-diffusion-v1-4', '--train_data_dir', '/home/lzj/work/data/preprocessed_11k', '--use_ema', '--resolution', '512', '--center_crop', '--random_flip', '--train_batch_size', '2', '--gradient_checkpointing', '--mixed_precision=fp16', '--learning_rate', '5e-05', '--max_grad_norm', '1', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--report_to=all', '--max_train_steps=20', '--seed', '1234', '--gradient_accumulation_steps', '4', '--checkpointing_steps', '5', '--valid_steps', '5', '--lambda_sd', '1.0', '--lambda_kd_output', '1.0', '--lambda_kd_feat', '1.0', '--use_copy_weight_from_teacher', '--unet_config_path', './src/unet_config', '--unet_config_name', 'bk_small', '--output_dir', './results/toy_bk_small']' returned non-zero exit status 1.
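The OSError says the name was treated as a directory that lacks scheduler_config.json. A quick way to check whether a local folder has the diffusers layout is sketched below. The list of expected files is an assumption based on the usual layout of a diffusers-format Stable Diffusion repository, and the helper name `missing_files` is hypothetical.

```python
from pathlib import Path

# Files a diffusers-format Stable Diffusion folder is expected to contain.
# This list is an assumption based on the usual repository layout.
EXPECTED = [
    "model_index.json",
    "scheduler/scheduler_config.json",
    "text_encoder/config.json",
    "tokenizer/vocab.json",
    "unet/config.json",
    "vae/config.json",
]


def missing_files(model_dir):
    """Return the expected files that are absent from model_dir."""
    root = Path(model_dir)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]


path = "CompVis/stable-diffusion-v1-4"  # value passed to --pretrained_model_name_or_path
if Path(path).is_dir():
    # A local folder with this name exists relative to the working directory.
    print("local folder found; missing:", missing_files(path))
else:
    print("no local folder with that name")
```

If a local folder with that name exists but only holds a single checkpoint file, the missing-file list will be long; in that case the folder is probably shadowing the intended model ID.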
[Question]
I have another question.
I split the LAION-aesthetic V2 5+ dataset into several subsets (e.g., 5M, 10M, 89M) and made a metadata.csv for each subset.
Then, when I tried to train on a subset with multiple GPUs, I got the error below.
I guess the problem is caused by the data itself.
FYI, I didn't pre-process the data beyond setting the resolution (512x512) when I downloaded it.
Did you also face this problem?
Or did you do any pre-processing of the LAION data?
Steps: 0%| | 283/400000 [35:52<813:24:06, 7.33s/it, kd_feat_loss=58.6, kd_output_loss=0.0447, lr=5e-5, sd_loss=0.185, step_loss=58.9]
Traceback (most recent call last):
File "/home/user01/bk-sdm/src/kd_train_text_to_image.py", line 1171, in <module>
main()
File "/home/user01/bk-sdm/src/kd_train_text_to_image.py", line 961, in main
for step, batch in enumerate(train_dataloader):
File "/home/user01/anaconda3/envs/kd-sdm/lib/python3.9/site-packages/accelerate/data_loader.py", line 388, in __iter__
next_batch = next(dataloader_iter)
File "/home/user01/anaconda3/envs/kd-sdm/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
data = self._next_data()
File "/home/user01/anaconda3/envs/kd-sdm/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 671, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/user01/anaconda3/envs/kd-sdm/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 56, in fetch
data = self.dataset.__getitems__(possibly_batched_index)
File "/home/user01/anaconda3/envs/kd-sdm/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2715, in __getitems__
return [{col: array[i] for col, array in batch.items()} for i in range(n_examples)]
File "/home/user01/anaconda3/envs/kd-sdm/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2715, in <listcomp>
return [{col: array[i] for col, array in batch.items()} for i in range(n_examples)]
File "/home/user01/anaconda3/envs/kd-sdm/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2715, in <dictcomp>
return [{col: array[i] for col, array in batch.items()} for i in range(n_examples)]
IndexError: index 63 is out of bounds for dimension 0 with size 63
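Since the subsets were built by hand, one thing that can be ruled out cheaply is a mismatch between metadata.csv and the files on disk. The sketch below is only a sanity check under stated assumptions: it assumes the image column is called `file_name` (the imagefolder convention) and that paths are relative to the data folder. The helper name `find_missing_images` is hypothetical, and a clean result does not prove the data is fine.

```python
import csv
from pathlib import Path


def find_missing_images(data_dir, metadata_name="metadata.csv", column="file_name"):
    """List files named in the metadata CSV that are absent from data_dir."""
    root = Path(data_dir)
    missing = []
    with open(root / metadata_name, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if not (root / row[column]).is_file():
                missing.append(row[column])
    return missing
```

Usage would be `find_missing_images("/path/to/subset")`, where the path is a placeholder for one subset folder; an empty list means every listed image exists.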
Hi there,
I am trying to distill the U-Net of SD inpainting 1.5 into a smaller U-Net using your code. (I switched the pipeline to inpainting and changed the input data accordingly.)
I have trained for 130K steps with a batch size of 64.
Right now the kd_feat_loss is around 20.
What kd_feat_loss did you have when you finished distilling the U-Net in your experiments?
Thank you.
To incorporate the below feature
In addition, the base training script src/kd_train_text_to_image.py logs only the total loss to W&B and one may be interested in each particular contribution. I added image logging to W&B as well.
Hi, I'm really impressed by your work and nice code.
When I ran the training code with multi-gpu setting, I encountered this error.
Traceback (most recent call last):
File "/home/user01/BK-SDM/src/kd_train_text_to_image.py", line 891, in <module>
main()
File "/home/user01/BK-SDM/src/kd_train_text_to_image.py", line 766, in main
a_stu = acts_stu[m_stu]
KeyError: 'up_blocks.0'
Could you check this?
Thanks in advance :)
pip install -U datasets
This solves the issue of loading the data.
Thanks for your great work. May I ask a question about GPU memory? You write:
A toy script can be used to verify the code executability and find the batch size that matches your GPU. With a batch size of 8 (=4×2), training BK-SDM-Base for 20 iterations takes about 5 minutes and 22GB GPU memory.
With a batch size of 256 (=4×64), training BK-SDM-Base for 50K iterations takes about 300 hours and 53GB GPU memory. With a batch size of 64 (=4×16), it takes 60 hours and 28GB GPU memory.
That is a batch size increase of about 32x (from 2 to 64), but GPU memory increases by less than 3x (from 22GB to 53GB). Why does memory grow so slowly? Is diffusers more GPU-efficient than pytorch-lightning (which SD v1.5 used)?
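One way to read the quoted numbers is as a fixed cost (weights, optimizer state, and similar) plus a cost that grows with the per-step batch size. The sketch below fits a straight line through two of the quoted points and checks it against the third. The linear model is an assumption rather than a measurement, and it takes the second factor in "4×2", "4×16", and "4×64" to be the per-step batch size.

```python
def fit_line(p1, p2):
    """Return (intercept, slope) of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return y1 - slope * x1, slope


# (per-step batch size, reported GPU memory in GB) from the quoted README text.
fixed_gb, per_sample_gb = fit_line((16, 28), (64, 53))
predicted_at_2 = fixed_gb + per_sample_gb * 2

print(f"fixed cost ~{fixed_gb:.1f} GB, per sample ~{per_sample_gb:.2f} GB")
print(f"predicted at batch size 2: ~{predicted_at_2:.1f} GB (README reports 22 GB)")
```

Under this reading most of the 22GB at batch size 2 is fixed cost, so memory cannot scale in proportion to batch size.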
Thanks very much
Hi @bokyeong1015 , thanks for your great work!
I modified diffusers/train_text_to_image.py and used your fine-tuning strategy on a 212k subset of LAION. But when I run the training code, loading the dataset takes a very long time, and there is still no response in the terminal after 40 minutes. Is this caused by the large number of images or by a bug in my code?
# Imports this excerpt relies on; `args` is the argparse namespace of the surrounding script.
import os
import time

from datasets import load_dataset

# In distributed training, the load_dataset function guarantees that only one local process can concurrently
if args.dataset_name is not None:
    # Downloading and loading a dataset from the hub.
    dataset = load_dataset(
        args.dataset_name,
        args.dataset_config_name,
        cache_dir=args.cache_dir,
        data_dir=args.train_data_dir,
    )
else:
    data_files = {}
    if args.train_data_dir is not None:
        data_files["train"] = os.path.join(args.train_data_dir, "**")
    print("*** load dataset: start")
    t0 = time.time()
    dataset = load_dataset(
        "imagefolder",
        # data_files=data_files,
        cache_dir=args.cache_dir,
        split="train",
        data_dir=args.train_data_dir,
    )
    print(f"*** load dataset: end --- {time.time()-t0} sec")
    # See more about loading custom images at
    # https://huggingface.co/docs/datasets/v2.4.0/en/image_load#imagefolder

# Preprocessing the datasets.
# We need to tokenize inputs and targets.
# column_names = dataset["train"].column_names
##############################################################################################
column_names = dataset.column_names
image_column = column_names[0]
caption_column = column_names[1]
###################################################################################################
This is the dataset-loading code. How long should the load_dataset call take?
Thanks for your great work, looking forward to your reply!
Best wishes,
Qianli
We note that the README says training BK-SDM-Base needs 50K iterations, while "kd_train.py" sets --max_train_steps=400K. Can we assume that 50K is good enough?
Thanks for generously open-sourcing your work. There is an earlier work similar to yours, called SnapFusion, which also aims to speed up Stable Diffusion.
According to the results in their paper, they achieved better results through an efficient U-Net and step distillation, but unfortunately that work is not open source.
Do you have any opinion on this work? https://snap-research.github.io/SnapFusion/
Thanks for your paper and code. My question is how the model performs when the EMA option is not used, i.e., when I don't pass the "--use_ema" option.
To incorporate the below feature
The original src/generate.py generates images one by one, which underutilizes the GPU; as a consequence, generating 30k images takes a while. I've added batched image generation to speed it up.
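The batching idea can be sketched independently of the pipeline: split the prompt list into fixed-size chunks and process one chunk per call. This is a minimal illustration, not the code from this PR; the helper name `batched` and the placeholder prompts are hypothetical.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


prompts = [f"prompt {i}" for i in range(10)]  # placeholder prompts
for batch in batched(prompts, batch_size=4):
    # A diffusers pipeline accepts a list of prompts, so each batch could be
    # passed in a single call instead of looping over prompts one by one.
    print(len(batch), batch[0])
```

The last chunk may be smaller than `batch_size`, so any code that saves images should index by the actual chunk length.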
[Inquiry]
Hi, I tried this method but found that the performance was very poor. My experimental configuration was to train on the laion_11k data for 10k steps, with bk_tiny as the U-Net. I also switched the pipeline to inpainting and changed the input data accordingly. Do you have any suggestions? Thanks.
1. We can download the 212K dataset from
https://netspresso-research-code-release.s3.us-east-2.amazonaws.com/data/improved_aesthetics_6.5plus/preprocessed_212k.tar.gz
but we cannot download the 2.3M dataset from
https://netspresso-research-code-release.s3.us-east-2.amazonaws.com/data/improved_aesthetics_6.5plus/preprocessed_2256k.tar.gz
2. We also tried the bash method:
bash scripts/get_laion_data.sh preprocessed_2256k
With a batch size of 64 (256 = 4×64), training BK-SDM-Base on a single A100 for 50K iterations takes about 300 hours.
With a batch size of 16 (64 = 4×16), training BK-SDM-Base on a single A100 for 50K iterations takes about 60 hours???
Is it in fact 600 hours??
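To see what the two quoted figures imply, the arithmetic below converts them into seconds per iteration and samples per second. It only restates the numbers in the question (50K iterations, 300 and 60 hours, total batch sizes of 256 and 64) and does not settle which figure is correct; the function names are hypothetical.

```python
def seconds_per_iteration(hours, iterations):
    """Average wall-clock seconds spent on one iteration."""
    return hours * 3600 / iterations


def samples_per_second(hours, iterations, batch_size):
    """Average number of training samples processed per second."""
    return iterations * batch_size / (hours * 3600)


for batch_size, hours in [(256, 300), (64, 60)]:
    s_it = seconds_per_iteration(hours, 50_000)
    rate = samples_per_second(hours, 50_000, batch_size)
    print(f"batch {batch_size}: {s_it:.2f} s/it, {rate:.1f} samples/s")
```

Taken at face value, the 4x smaller batch would need to run 5x faster per iteration for the 60-hour figure to hold, which is the gap the question points at.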
Hi! I ran this line of code to generate samples to compute FID:
!python3 src/generate.py --model_id nota-ai/bk-sdm-base --save_dir ./results/bk-sdm-base
Then I got this error:
0/30000 | COCO_val2014_000000000042.jpg **A small dog is curled up on top of the shoes** | 25 steps
Total 751.9M (U-Net 579.4M; TextEnc 123.1M; ImageDec 49.5M)
100% 25/25 [00:03<00:00, 8.14it/s]
Traceback (most recent call last):
File "/content/BK-SDM/src/generate.py", line 53, in <module>
img = pipeline.generate(prompt = val_prompt,
File "/content/BK-SDM/src/utils/inference_pipeline.py", line 34, in generate
out = self.pipe(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 706, in __call__
do_denormalize = [not has_nsfw for has_nsfw in has_nsfw_concept]
TypeError: 'bool' object is not iterable
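The last frame iterates over `has_nsfw_concept`, so the TypeError means that value was a single bool rather than one flag per image. Where the bool comes from is not visible in the traceback. As a hedged illustration of the shape the pipeline expects, the hypothetical helper below normalizes such a value to a per-image list.

```python
def per_image_flags(has_nsfw_concept, num_images):
    """Normalize a safety-checker result to one boolean per image."""
    if has_nsfw_concept is None:
        return [False] * num_images
    if isinstance(has_nsfw_concept, bool):
        return [has_nsfw_concept] * num_images
    return [bool(flag) for flag in has_nsfw_concept]


flags = per_image_flags(False, num_images=1)
# Same comprehension as in the failing frame, now over a list.
do_denormalize = [not has_nsfw for has_nsfw in flags]
print(do_denormalize)
```

If the safety checker has been replaced by a custom callable, having it return a list (or `None`) instead of a bare `False` would be the analogous change.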
preprocessed_2256k: 2,256,472 image-text pairs (182GB tar.gz; 204GB data folder)
I found that the total number of training iterations is 400,000. May I ask how many days it takes you to train a distilled model? I use 8×V100 and can only complete around 3,800 iterations in one night (from 19:55 to 10:00 the next day).
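The numbers in this question are enough for a rough projection. The sketch below assumes the speed stays constant for the whole run; the calendar date is arbitrary because only the clock times are given.

```python
from datetime import datetime, timedelta

# Only the clock times come from the question; the date is arbitrary.
start = datetime(2024, 1, 1, 19, 55)
end = datetime(2024, 1, 2, 10, 0)
elapsed = end - start

done, total = 3_800, 400_000
sec_per_it = elapsed.total_seconds() / done
remaining = timedelta(seconds=sec_per_it * (total - done))
full_run_days = sec_per_it * total / 86_400

print(f"{sec_per_it:.1f} s/it")
print(f"remaining: about {remaining.days} days")
print(f"full run:  about {full_run_days:.0f} days")
```

At that pace 400,000 iterations would take roughly two months.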
Do you guys have any plan to fine-tune it for inpainting?
Thanks.
@bokyeong1015 Hi, thanks for sharing this wonderful work. I had a few queries and requests.
Thanks in advance.
Could the authors share the code for producing the heat map in Figure 8? I really appreciate your nice work and kind help.
response to #10 (comment)
I want to run the zero-shot MS-COCO evaluation on an intermediate checkpoint trained in a multi-GPU setting, but I'm not sure how to specify my checkpoint.
Could you give me some hints?
In your instruction (2), you enter a model_id.
Could I change the model_id to my checkpoint path?
If so, I don't know which file should be specified.
I guess it is unet_ema/diffusion_pytorch_model.bin. Am I right?
Thanks in advance.
Hi folks!
Simply amazing work here 🔥
I am Sayak, one of the maintainers of 🧨 diffusers at HF. I see all the weights of BK-SDM are already diffusers-compatible. This is really amazing!
I wanted to know if there is any plan to also open-source the distillation pre-training code. I think that will be beneficial to the community.
Additionally, are there any plans to do this for SDXL as well?
Greetings!
These tiny models are amazing! I love the fp16 versions.
Could you please, in the future, make models that are based on 1.5 and mixed with uncensored models such as lyriel or deliberate, for better faces and anatomy?
kind regards
Hi, thanks for your great work!
I currently have an A100 GPU server that is not connected to the internet. I can configure the environment offline. Can I replicate your work offline? Could you please provide me with some guidance? Thank you.
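For an air-gapped server, the Hugging Face libraries can be told not to attempt network access through environment variables. The sketch below sets them from Python before anything else is imported. The local model path is a hypothetical placeholder, and the model and dataset files still have to be copied to the server by other means.

```python
import os

# Environment variables read by the Hugging Face libraries; set them before
# importing diffusers, transformers, or datasets.
OFFLINE_FLAGS = {
    "HF_HUB_OFFLINE": "1",
    "TRANSFORMERS_OFFLINE": "1",
    "HF_DATASETS_OFFLINE": "1",
}
os.environ.update(OFFLINE_FLAGS)

# Hypothetical path: a copy of the model repository transferred to the server.
LOCAL_MODEL_DIR = "/path/to/stable-diffusion-v1-4"

print({name: os.environ[name] for name in OFFLINE_FLAGS})
print("model dir:", LOCAL_MODEL_DIR)
```

The local folder would then be passed to the training script via `--pretrained_model_name_or_path` in place of the Hub ID.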
Hi, thanks to your excellent work, I have conducted many experiments.
When I trained on a subset of LAION-aesthetic-5+ (about 89M pairs), my training process was killed without any specific error message :(
It may have happened in load_dataset.
I guess the training set is too big, but I'm not sure.
I think this problem may be caused by Hugging Face's datasets library.
Have you ever faced this problem? And have you tried training your model on a much bigger training set?
Thanks in advance :)
Hi, thank you for sharing your awesome work.
How can I reproduce your DreamBooth quantitative results in Table 5?
Would you provide the evaluation code?