Comments (8)
Okay, thank you for the suggestion. I will give it a try.
from arldm.
@SaulZhang Hi, thanks for your inquiry! For the first question: since the Stable Diffusion pretrained model (PTM) targets 512*512, I guess performance will drop, so I recommend trying fp16 training at 512*512 resolution. For the second question, the answer is yes! You can also increase the batch size. We use a single GPU to avoid sampling the same case multiple times, which would affect the FID score. (For normal experiments, feel free to use multiple GPUs and a large batch size.)
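To illustrate the point about sampling the same case more than once, here is a minimal sketch in plain Python. It assumes the duplicates come from padding, as a distributed sampler does when the dataset size is not divisible by the number of GPUs (PyTorch's `DistributedSampler` behaves this way with its default `drop_last=False`). The helper names `padded_shards` and `duplicated_indices` are hypothetical and are not part of the arldm code.

```python
import math
from collections import Counter


def padded_shards(num_samples: int, world_size: int) -> list[list[int]]:
    """Split sample indices across ranks, padding by wrapping around to the start.

    Assumption: this mimics a padding distributed sampler without shuffling.
    """
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size
    # Indices past the end of the dataset wrap around, so early samples repeat.
    indices = [i % num_samples for i in range(total)]
    # Each rank takes every world_size-th index, starting at its own rank.
    return [indices[rank:total:world_size] for rank in range(world_size)]


def duplicated_indices(shards: list[list[int]]) -> list[int]:
    """Return the sample indices that appear more than once across all ranks."""
    counts = Counter(i for shard in shards for i in shard)
    return sorted(i for i, n in counts.items() if n > 1)


if __name__ == "__main__":
    shards = padded_shards(num_samples=10, world_size=4)
    print(shards)
    print(duplicated_indices(shards))
```

With a single rank nothing is padded, so every test case is generated exactly once and the FID statistics are computed over the intended set.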
Thanks for your reply. I have tried setting precision=16 in Trainer and also setting freeze_clip/freeze_blip/freeze_resnet=True. Unfortunately, these changes only allow a single A100 40G GPU to handle a maximum resolution of about 470x470. If I don't freeze the weights of Stable Diffusion, could that reduce the impact of image resolution?
@SaulZhang Hi, I guess it may work. You can also try enabling gradient checkpointing to save VRAM; it seems to be already implemented in Diffusers.
@Flash-321 Hello,
I have set the image size to 256x256 and performed story continuation experiments on three datasets. Although the generated images look satisfactory, I'm confused as to why the calculated FID score exceeds 400. This seems unreasonable, and it is consistent across all three datasets. Additionally, I have attached some of the stories generated on the three datasets. Could you help me understand why the FID score is so high?
@SaulZhang Hi, do you calculate the FID score across the whole dataset, or only a subset?
I calculate the FID score across the whole test set, and I have not modified the sampling code.
After thorough debugging, I've identified that the main issue lies within this particular line of code.
original_images = [Image.fromarray(im, 'RGB') for im in original_images]
And the correct code should be as follows:
original_images = [Image.fromarray(im.transpose(1,2,0), 'RGB') for im in original_images]
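For anyone hitting the same symptom, here is a minimal sketch of why the transpose matters. It assumes each entry of `original_images` is a channel-first `uint8` array of shape (3, H, W), which is what the fix above implies but which the thread does not state explicitly. Pillow expects channel-last (H, W, 3) data, so a channel-first array gets misinterpreted. The helper name `chw_to_pil` is hypothetical, and the sketch lets Pillow infer the RGB mode from the array shape rather than passing it explicitly.

```python
import numpy as np
from PIL import Image


def chw_to_pil(image: np.ndarray) -> Image.Image:
    """Convert a channel-first (3, H, W) uint8 array into an RGB PIL image."""
    if image.ndim != 3 or image.shape[0] != 3:
        raise ValueError(f"expected shape (3, H, W), got {image.shape}")
    # Move channels to the last axis: (3, H, W) -> (H, W, 3).
    hwc = np.ascontiguousarray(image.transpose(1, 2, 0), dtype=np.uint8)
    return Image.fromarray(hwc)


if __name__ == "__main__":
    # A 4x6 (height x width) image whose red channel is fully saturated.
    chw = np.zeros((3, 4, 6), dtype=np.uint8)
    chw[0] = 255
    img = chw_to_pil(chw)
    print(img.size, img.mode, img.getpixel((0, 0)))
```

If the reference images are built from the wrong layout, the FID is computed against scrambled images, which would be consistent with the very high scores reported above.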