Comments (47)

xichenpan avatar xichenpan commented on May 22, 2024

@KyonP Hi, our config is the same as https://github.com/xichenpan/ARLDM/blob/main/config.yaml
However, as discussed in this thread, it is quite strange that the PyTorch Lightning implementation cannot fully reproduce the result. I spent a few hours looking for differences between the two versions, but I have made too many modifications to tell what causes the problem. You are welcome to refer to our original implementation https://github.com/xichenpan/ARLDM/tree/a24e2e94332eb86fcc071abb83aaf341006aa622; if you find a bug in our PyTorch Lightning implementation, I am more than happy to know!
Besides, we plan to release the trained weights so that you can use them for inference.

xichenpan avatar xichenpan commented on May 22, 2024

@yangsenwxy Hi, could you please provide some visual samples, and also the training config, e.g. how many epochs and whether CLIP/BLIP/ResNet are frozen?

xichenpan avatar xichenpan commented on May 22, 2024

btw, in our experiment we just use the ckpt at 50 epochs, because the loss is not correlated with the FID score.

yangsenwxy avatar yangsenwxy commented on May 22, 2024

Can I have your WeChat? I can tell you by email: [email protected]

yangsenwxy avatar yangsenwxy commented on May 22, 2024

[config screenshot]
This is my config

xichenpan avatar xichenpan commented on May 22, 2024

Can I have your WeChat? I can tell you by email: [email protected]

@yangsenwxy Yeah, I am happy to chat. But I prefer to discuss in this thread, because this is a new implementation and we haven't trained a model using this repo. If we can figure out the problem, it may help a lot~

xichenpan avatar xichenpan commented on May 22, 2024

[config screenshot] This is my config

It seems the init_lr is too large; you could try our config of 1e-5. Also, could you provide your sampling config, such as the scheduler and steps?
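As a rough sketch of the suggested rate (the plain AdamW optimizer here is an assumption; see config.yaml for the repo's actual optimizer and scheduler):

import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # stand-in for the actual diffusion model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # init_lr = 1e-5 as suggested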

yangsenwxy avatar yangsenwxy commented on May 22, 2024

[screenshot]
I will give you some examples

xichenpan avatar xichenpan commented on May 22, 2024

[screenshot] I will give you some examples

You may follow the config given in https://github.com/Flash-321/ARLDM/blob/main/config.yaml (which is the config for the reported performance). The DDIM scheduler with 50 steps gives a FID score 2-3 points higher. We suggest using ddim-6-250 or pndm-7.5-50 (the latter is faster but gives a FID score 1-2 points higher).
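Decoding that shorthand (an assumption: scheduler-guidance-steps, so ddim-6-250 would mean the DDIM scheduler with guidance scale 6 and 250 inference steps), a sketch with Hugging Face diffusers might look like:

from diffusers import DDIMScheduler, PNDMScheduler

scheduler = DDIMScheduler()   # or PNDMScheduler() for the pndm-7.5-50 setting
scheduler.set_timesteps(250)  # number of inference steps
guidance_scale = 6.0          # classifier-free guidance weight, applied in the sampling loop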

xichenpan avatar xichenpan commented on May 22, 2024

[screenshot] I will give you some examples

Thanks a lot! Only the generated images are needed (captions are not relevant to the FID score).

yangsenwxy avatar yangsenwxy commented on May 22, 2024

Link: https://pan.baidu.com/s/1s4WU9xdS7Qn_XO-O2kvwaA
Extraction code: n1dl
-- shared via Baidu Netdisk Super VIP V5

yangsenwxy avatar yangsenwxy commented on May 22, 2024

here

xichenpan avatar xichenpan commented on May 22, 2024

here

Copy! So I think it is a training issue, since the generated visual stories are unacceptable.

xichenpan avatar xichenpan commented on May 22, 2024

@yangsenwxy Have you ever tried to train the model for 50 epochs using an init_lr of 1e-5?

yangsenwxy avatar yangsenwxy commented on May 22, 2024

I will try it

xichenpan avatar xichenpan commented on May 22, 2024

I will try it

@yangsenwxy Thanks! I have tried using a large lr, but it produces bad performance when training for many epochs. You can also try to train the model for 5 epochs using an lr of 1e-4 and see if it works. I tried this setting in my early experiments and it can also generate reasonable results. Feel free to @ me in this thread if there is an update or a further issue!

yangsenwxy avatar yangsenwxy commented on May 22, 2024

Ok, let me try. Do you mean the smaller the learning rate, the smaller the FID?

xichenpan avatar xichenpan commented on May 22, 2024

Ok, let me try. Do you mean the smaller the learning rate, the smaller the FID?

@yangsenwxy Not exactly, but a large learning rate does not work well when you train the model for a really long time.

yangsenwxy avatar yangsenwxy commented on May 22, 2024

@Flash-321 I used your exact settings for the last 50 epochs and got a FID score of 25

xichenpan avatar xichenpan commented on May 22, 2024

@yangsenwxy yeah, and do you also use the default settings during sampling?

xichenpan avatar xichenpan commented on May 22, 2024

also, may I see your learning rate curve, to find out whether the scheduler works correctly?

yangsenwxy avatar yangsenwxy commented on May 22, 2024

OK, wait for me for half an hour

xichenpan avatar xichenpan commented on May 22, 2024

Sure!

xichenpan avatar xichenpan commented on May 22, 2024

@yangsenwxy Really sorry about a bug in our code. I just checked our dataset implementation and found that while migrating to the PyTorch Lightning code, we forgot to add the normalization for the training data.
Here is our internal implementation:
https://github.com/Flash-321/ARLDM/blob/a24e2e94332eb86fcc071abb83aaf341006aa622/ARLDM.py#L47-L52
Here is the PyTorch Lightning implementation in the current branch:
https://github.com/Flash-321/ARLDM/blob/eb907e3717ac20f82dfba8e67fd55d95127de098/datasets/flintstones.py#L26-L31
This harms the PTMs, since Stable Diffusion was trained on normalized images.

My sincere apologies for this issue. The FID score should improve further after fixing this bug.
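A minimal sketch of the missing step (the resize and the exact pipeline here are assumptions; Stable Diffusion expects pixel values scaled to [-1, 1]):

import torchvision.transforms as transforms

train_transform = transforms.Compose([
    transforms.ToTensor(),                                   # PIL image -> float tensor in [0, 1]
    transforms.Resize([512, 512]),                           # the 512x512 resolution is an assumption
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),  # the missing normalization: [0, 1] -> [-1, 1]
])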

xichenpan avatar xichenpan commented on May 22, 2024


And this bug only happens in the FlintstonesSV dataset; if you have done experiments on other datasets, the performance was not affected.

yangsenwxy avatar yangsenwxy commented on May 22, 2024

I discovered this bug a long time ago, and it was already corrected during the original training

yangsenwxy avatar yangsenwxy commented on May 22, 2024

@yangsenwxy yeah, and do you also use the default settings during sampling?

Yes

yangsenwxy avatar yangsenwxy commented on May 22, 2024

also, may I see your learning rate curve, to find out whether the scheduler works correctly?

[learning rate curve screenshot]

xichenpan avatar xichenpan commented on May 22, 2024

@yangsenwxy

also, may I see your learning rate curve, to find out whether the scheduler works correctly?

[learning rate curve screenshot]

Thanks, it looks correct.

yangsenwxy avatar yangsenwxy commented on May 22, 2024

Now, FID is 25.069521856381982

xichenpan avatar xichenpan commented on May 22, 2024

Now, FID is 25.069521856381982

Got it! Could you please post 1-2 visual samples in this thread? There are also several differences between the two implementations. For example, in this version we use Stable Diffusion v1.5 instead of v1.4, but I think it doesn't matter. We also use gradient clipping in our training process; I guess this may be a factor:
https://github.com/Flash-321/ARLDM/blob/a24e2e94332eb86fcc071abb83aaf341006aa622/config/config_flintstones.json#L51-L66
Maybe you can go through the training log and see if the gradient is too large at some steps? If so, I guess adding this config may help:

trainer = Trainer(gradient_clip_val=1.0)

You can also check our original internal implementation (https://github.com/Flash-321/ARLDM/tree/a24e2e94332eb86fcc071abb83aaf341006aa622) to find other differences; we can discuss them in this thread.
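To spot large gradients in the log before enabling clipping, a hedged sketch (track_grad_norm is a PyTorch Lightning 1.x Trainer argument; it was removed in 2.x):

from pytorch_lightning import Trainer

trainer = Trainer(
    gradient_clip_val=1.0,  # clip the gradient norm, as suggested above
    track_grad_norm=2,      # log gradient 2-norms so spikes are visible in the training log
)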

yangsenwxy avatar yangsenwxy commented on May 22, 2024

Here
[screenshot]

yangsenwxy avatar yangsenwxy commented on May 22, 2024

[generated sample images]

yangsenwxy avatar yangsenwxy commented on May 22, 2024

Here are some visual samples

xichenpan avatar xichenpan commented on May 22, 2024

@yangsenwxy Copy, your experiments help a lot, thanks for that! It looks like a training issue: the val loss should be around 0.11, and the visual samples are not reasonable or coherent.
I recommend adding clip_norm and training for 5 epochs at a learning rate of 1e-4 (this can help reduce training cost), so that we can find out whether it is due to a large gradient.

yangsenwxy avatar yangsenwxy commented on May 22, 2024

OK

yangsenwxy avatar yangsenwxy commented on May 22, 2024

@Flash-321 I have added clip_norm as you suggested, but the FID is 24.87433417204761

yangsenwxy avatar yangsenwxy commented on May 22, 2024

The loss is here:
[loss curve screenshot]

yangsenwxy avatar yangsenwxy commented on May 22, 2024

[generated sample images]

xichenpan avatar xichenpan commented on May 22, 2024

@yangsenwxy Hi, thanks for your feedback. It seems the conditioning part is broken, since the frames are not coherent with each other. I will check the implementation soon and also ask my mentor to release the ckpt and training log for reference.

KyonP avatar KyonP commented on May 22, 2024

Hello. I am also having a difficult time reproducing your best-performing results on Pororo.

Can you share some config settings?

KyonP avatar KyonP commented on May 22, 2024

@xichenpan
I managed to achieve an FID score of around 18 with the PL version of your code. It looks good enough for me right now 👍

I am looking forward to your pre-trained weights.

BTW, is there any chance you will improve the inference speed of this code? It is very time-consuming.

xichenpan avatar xichenpan commented on May 22, 2024

@KyonP Great! I just asked my mentor, and he told me the release request has been approved by Alibaba. We will provide the pre-trained weights this week!

yxding95 avatar yxding95 commented on May 22, 2024

Hey, I found the same issue. @xichenpan Did you try to save the images first and then evaluate the FID, or do you only use the FID calculation in the "main.py" code? Because I found the Inception network in the code seems to be missing ".eval()", which would normalize the features and give a lower FID.
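For illustration, a minimal sketch of the point being raised, with torchvision's InceptionV3 as a stand-in for whatever main.py instantiates (a real FID computation uses pretrained weights and pooled features):

import torch
from torchvision.models import inception_v3

model = inception_v3(weights=None)  # weights omitted here for brevity
model.eval()  # the reportedly missing call: BatchNorm/Dropout switch to inference mode
with torch.no_grad():
    logits = model(torch.randn(2, 3, 299, 299))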

TimandXiyu avatar TimandXiyu commented on May 22, 2024

@xichenpan Hi, I am having a hard time finding the released weights, and I am trying to do some follow-up work on yours. If you haven't uploaded them yet, could you share them via Google Drive or another platform?

kirbu123 avatar kirbu123 commented on May 22, 2024

@xichenpan Are your checkpoints ready? If possible, would you be willing to share them? Also, could you explain how to train ARLDM on a single CUDA device, and how to beat the CUDA out-of-memory error by freezing CLIP, BLIP, or ResNet, or by other methods?
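Not an official answer, but a generic sketch of the freezing idea (the module names below are hypothetical; check the repo's model definition for the real attributes):

import torch.nn as nn

def freeze(module: nn.Module) -> None:
    # disable gradients so the optimizer skips these weights, cutting memory use
    for p in module.parameters():
        p.requires_grad = False
    module.eval()

# hypothetical usage:
# freeze(model.clip_text_encoder)
# freeze(model.blip_image_encoder)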

kirbu123 avatar kirbu123 commented on May 22, 2024

Can somebody who has done the training share checkpoints? I get CUDA out of memory when training the model.
