
Comments (13)

williamyang1991 commented on May 30, 2024

Since our pSp uses the Z+ latent code while the original pSp uses the W+ latent code,
when replacing our pSp with the original one, you should modify

img_rec, instyle = encoder(F.adaptive_avg_pool2d(I, 256), randomize_noise=False, return_latents=True,
z_plus_latent=True, return_z_plus_latent=True, resize=False)

to

# the original pSp has no options of z_plus_latent and return_z_plus_latent
img_rec, instyle = encoder(F.adaptive_avg_pool2d(I, 256), randomize_noise=False, return_latents=True, resize=False)

and modify

img_gen, _ = generator([instyle], exstyle, input_is_latent=False, z_plus_latent=True,
truncation=args.truncation, truncation_latent=0, use_res=True, interp_weights=args.weight)

to

# input_is_latent=True indicates that the input content code is in W+ space
img_gen, _ = generator([instyle], exstyle, input_is_latent=True, z_plus_latent=False, truncation=args.truncation, truncation_latent=0, use_res=True, interp_weights=args.weight)

Hope this can solve your problem.
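As a minimal sketch (the helper name and structure are my own, not part of DualStyleGAN), the flag changes above can be kept consistent between the encoder and generator calls with one small helper:

```python
# Hypothetical helper (not from the repo): build keyword arguments for the
# encoder and generator so their latent-space flags always stay consistent.
def latent_kwargs(encoder_returns_z_plus):
    if encoder_returns_z_plus:
        # DualStyleGAN's bundled pSp: ask for Z+ out, feed Z+ to the generator
        enc = dict(z_plus_latent=True, return_z_plus_latent=True)
        gen = dict(input_is_latent=False, z_plus_latent=True)
    else:
        # Original pSp: no Z+ options; it returns W+ codes
        enc = dict()
        gen = dict(input_is_latent=True, z_plus_latent=False)
    return enc, gen
```

With the original pSp you would then call `encoder(..., **enc_kwargs)` and `generator(..., **gen_kwargs)` where `enc_kwargs, gen_kwargs = latent_kwargs(False)`, matching the two modified calls shown above.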

from dualstylegan.

williamyang1991 commented on May 30, 2024

By the way, you can set use_res=False in

img_gen, _ = generator([instyle], exstyle, input_is_latent=False, z_plus_latent=True,
truncation=args.truncation, truncation_latent=0, use_res=True, interp_weights=args.weight)

This will return the reconstructed content image. If it differs a lot from the original one,
you might have fed W+ into DualStyleGAN, which by default accepts Z+ input and will falsely feed this W+ code into the mapping network before sending it to the generator.
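As an illustration of this sanity check (a hypothetical helper, assuming images are nested lists of pixel values in [-1, 1]; the threshold is an arbitrary example, not a value from the repo):

```python
# Hypothetical reconstruction check (illustrative only): with use_res=False
# the generator should roughly reproduce the content image. A large mean
# absolute difference suggests the latent space (Z+ vs W+) is mismatched.
def recon_mismatch(original, reconstructed, threshold=0.25):
    flat_o = [p for row in original for p in row]
    flat_r = [p for row in reconstructed for p in row]
    err = sum(abs(a - b) for a, b in zip(flat_o, flat_r)) / len(flat_o)
    return err, err > threshold
```

In practice you would flatten the real image tensors the same way and compare `img_gen` (with `use_res=False`) against the input image.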


GrigorySamokhin commented on May 30, 2024

Thanks for your fast reply! I will try and come back to you!


GrigorySamokhin commented on May 30, 2024

It works, thanks!


Grace-93 commented on May 30, 2024

Could you please explain how I should replace the encoder, apart from the changes above? For example, if I want to use https://github.com/eladrich/pixel2style2pixel, should I download the whole repository and put it in the encoder directory? Sorry for the basic question; I'm new to this field and don't have much experience.


williamyang1991 commented on May 30, 2024

> Could you please explain how should I replace the encoder except for the change above? Like if I want to use https://github.com/eladrich/pixel2style2pixel, should I download the whole repository and put it in the encoder directory? Sorry for this stupid question. I'm new to this field and don't have much experience.

My code just uses the encoder from https://github.com/eladrich/pixel2style2pixel

For a new encoder, you should find all .py files that have from model.encoder.psp import pSp at the top.
Then replace the encoder construction (encoder = pSp(opts).to(device).eval()) with your new encoder.
Finally, modify all calls like _, stylecode = encoder(XXXX) in those files to match the way your encoder encodes an image.
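A sketch of what such a change could look like (the adapter and backbone names here are hypothetical, not from either repo):

```python
# Hypothetical adapter: wraps any W+ encoder so it can be called with the
# signature style_transfer.py uses for pSp. pSp-specific flags such as
# z_plus_latent are accepted but ignored.
class EncoderAdapter:
    def __init__(self, backbone):
        # `backbone` is assumed to map an image to (reconstruction, w_plus)
        self.backbone = backbone

    def __call__(self, img, randomize_noise=False, return_latents=True,
                 resize=False, **ignored_psp_flags):
        img_rec, w_plus = self.backbone(img)
        return img_rec, w_plus
```

After `encoder = EncoderAdapter(my_new_encoder)`, the existing call sites keep working unchanged, but the returned code is W+, so the generator must be called with input_is_latent=True and z_plus_latent=False as described above.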


Grace-93 commented on May 30, 2024

Thanks for your fast reply! But since your pSp uses the Z+ latent code, isn't it different from the original one here: https://github.com/eladrich/pixel2style2pixel? Or can we just use the encoder you provided and simply modify style_transfer.py as you implied above to use the W+ latent code?

Below is the result I get when I try other encoders. I don't know where that male face comes from... Do you know what might be the reason?
[Screenshot: stylized result showing an unexpected male face]

I saw you saying that "If it differs a lot from the original one, you might have fed W+ into DualStyleGAN, which by default accepts Z+ input and will falsely feed this W+ code into the mapping network before sending it to the generator." Could you explain how to fix that? Thanks.


williamyang1991 commented on May 30, 2024

All Z+ codes are transformed to W+ codes before being sent to the generator.
You just need to remove this transformation if you feed W+ codes directly.
Check the code, find these transformations, and modify them.

for example, change

exstyle = generator.generator.style(latent.reshape(latent.shape[0]*latent.shape[1], latent.shape[2])).reshape(latent.shape)

to

exstyle = latent

Also note the options input_is_latent and z_plus_latent:

# style transfer
# input_is_latent: instyle is not in W space
# z_plus_latent: instyle is in Z+ space
# use_res: use extrinsic style path, or the style is not transferred
# interp_weights: weight vector for style combination of two paths
img_gen, _ = generator([instyle], exstyle, input_is_latent=False, z_plus_latent=True,
truncation=args.truncation, truncation_latent=0, use_res=True, interp_weights=args.weight)
img_gen = torch.clamp(img_gen.detach(), -1, 1)
viz += [img_gen]

For w+ code, you should use input_is_latent=True and z_plus_latent=False


Grace-93 commented on May 30, 2024

Thanks for your clear explanation. Unfortunately, I'm still getting the same male face. Below are all the changes I have made; please let me know if I missed anything. (I'm just trying to use the same encoder in W+.)

# reconstructed content image and its intrinsic style code
img_rec, instyle = encoder(F.adaptive_avg_pool2d(I, 256), randomize_noise=False, return_latents=True,
z_plus_latent=True, return_z_plus_latent=True, resize=False)

to
img_rec, instyle = encoder(F.adaptive_avg_pool2d(I, 256), randomize_noise=False, return_latents=True, resize=False)

Now instyle should be W+, right?

exstyle = generator.generator.style(latent.reshape(latent.shape[0]*latent.shape[1], latent.shape[2])).reshape(latent.shape)

to
exstyle = latent

This step removes the translation from Z+ to W+, because the encoder already returns W+.

img_gen, _ = generator([instyle], exstyle, input_is_latent=False, z_plus_latent=True,
truncation=args.truncation, truncation_latent=0, use_res=True, interp_weights=args.weight)

to
img_gen, _ = generator([instyle], exstyle, input_is_latent=True, z_plus_latent=False, truncation=args.truncation, truncation_latent=0, use_res=False, interp_weights=args.weight)
This indicates that the input content code is in W+ space. For this step, I also tried using StyleGAN without exstyle and with truncation=1, but neither changed the result.


williamyang1991 commented on May 30, 2024

In my previous answer, I just gave two examples of where you should modify the code, but they are not the only places that need changes.
You need to check all the code related to the encoder and make modifications beyond these two parts.

And you can also check the reconstruction before doing stylization:

By the way, you can set use_res=False in

https://github.com/williamyang1991/DualStyleGAN/blob/39a9e9e8d6ea51a817ab50fbcf82d336275abf15/style_transfer.py#L126-L127

This will return the reconstructed content image. If it differs a lot from the original one,
you might have fed W+ into DualStyleGAN, which by default accepts Z+ input and will falsely feed this W+ code into the mapping network before sending it to the generator.


Grace-93 commented on May 30, 2024

Thanks for your fast reply again! Theoretically, I think I understand what you said, but since I only started learning some CV this month, it's hard for me to recognize it in the code. I will look into the code more. Just want to confirm two more things:

  1. Is there any other file I need to modify other than style_transfer.py? To my understanding, the encoder returns W+ as it is, and all of the transformations between Z+ and W+ are in style_transfer.py?
  2. This line is the only line that is using the encoder right?
    # reconstructed content image and its intrinsic style code
    img_rec, instyle = encoder(F.adaptive_avg_pool2d(I, 256), randomize_noise=False, return_latents=True,
    z_plus_latent=True, return_z_plus_latent=True, resize=False)

    And by changing it to img_rec, instyle = encoder(F.adaptive_avg_pool2d(I, 256), randomize_noise=False, return_latents=True, resize=False), shouldn't we already have img_rec and instyle in W+? (This is the part that confuses me, because the original encoder should already return the reconstructed face from W+ here, and I'm not sure why the reconstructed face still differs so much from the original one.)
    Then, because we already have instyle in W+, we don't need to convert exstyle (and whatever other transformations I couldn't find) to W+ again; we could directly feed the W+ code to the generator.


williamyang1991 commented on May 30, 2024

If you only test the code and do not train the model, then you only need to modify style_transfer.py and psp.py (switching to your W+ encoder).

exstyle is always the W+ code when fed into the generator.

I suggest reading this code:

if return_latents:
    if z_plus_latent and return_z_plus_latent:
        return images, codes
    if z_plus_latent and not return_z_plus_latent:
        return images, result_latent
    else:
        return images, result_latent

It's much clearer than my explanation.

I --> z+ --> w+ --> img_rec
return_z_plus_latent only controls whether z+ or w+ is returned; both lead to the same img_rec.
If you use my encoder, you should always use z_plus_latent=True; this is for the generator.
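The branch above can be paraphrased as a self-contained sketch, with stand-in stubs for the real encoder, mapping network, and generator (the function and parameter names here are placeholders, not the real modules):

```python
# Illustrative sketch of the pSp return logic: I -> z+ -> w+ -> img_rec.
# encode_to_z_plus / map_to_w_plus / synthesize are stand-in stubs passed in
# for testability; in pSp they are the encoder, style MLP, and decoder.
def psp_forward(image, z_plus_latent, return_z_plus_latent,
                encode_to_z_plus, map_to_w_plus, synthesize):
    codes = encode_to_z_plus(image)          # I  -> z+
    result_latent = map_to_w_plus(codes)     # z+ -> w+
    images = synthesize(result_latent)       # w+ -> img_rec
    if z_plus_latent and return_z_plus_latent:
        return images, codes                 # Z+ code (for DualStyleGAN)
    return images, result_latent             # W+ code
```

Either flag setting yields the same reconstruction; only the returned latent differs.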

Finally, I cannot debug for you.


Grace-93 commented on May 30, 2024

Thanks for your reply. I will go back and check the encoder. Sorry for making you debug for me; I didn't mean to. Again, thanks a lot for your patience and help!

