
Comments (2)

phizaz commented on August 25, 2024

This is the question we have been pondering for so long as well. I don't pretend to have the full answer. The following is the best intuition I can give you.

DiffAE doesn't learn a "semantic" latent code. It learns a latent code that packs "the most bang for the buck" under the implicit bias of the decoder, in this case, a diffusion model.

There is no reason to believe that a semantic latent code, one that matches human judgment, is the same as one that packs the most bang for the buck, since the latter is driven by the strong emphasis on the reconstruction loss.
The fact that we observed DiffAE learning a semantically meaningful latent code should be considered accidental. It may be due to the dataset we used, FFHQ256, which is heavily cropped and well aligned. We should not expect this property of DiffAE to transfer to other datasets in general.
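To make the "most bang for the buck" intuition concrete, here is a toy sketch. It is not DiffAE and not a diffusion model. It assumes each "image" is just two made-up numbers, a high-variance `brightness` and a low-variance `smile`, and that a hypothetical one-number latent code can keep exactly one of them while the decoder fills in the dataset mean for the other. All names here are illustrative assumptions.

```python
import random


def reconstruction_mse(samples, kept_index):
    """Mean squared error when the code keeps one feature exactly
    and the decoder guesses the dataset mean for every other feature."""
    n_features = len(samples[0])
    means = [sum(s[i] for s in samples) / len(samples) for i in range(n_features)]
    total = 0.0
    for s in samples:
        for i in range(n_features):
            recon = s[i] if i == kept_index else means[i]
            total += (s[i] - recon) ** 2
    return total / len(samples)


def make_toy_dataset(n=2000, seed=0):
    """Hypothetical data: feature 0 varies a lot, feature 1 varies a little."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 5.0), rng.gauss(0.0, 0.5)) for _ in range(n)]


FEATURES = ("brightness", "smile")
data = make_toy_dataset()

# Reconstruction error for each choice of what the one-number code keeps.
errors = {name: reconstruction_mse(data, i) for i, name in enumerate(FEATURES)}
best = min(errors, key=errors.get)
print(errors, "-> keep", best)
```

Under a plain squared reconstruction error, the cheaper choice is to keep the high-variance feature, whether or not a human would call it the semantic one. Real encoders are of course not restricted to picking a single input coordinate, so this only illustrates the direction of the bias.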

What to do if I want to control what's kept in the latent code?
There are two ways I can think of: change the loss function to something more semantic, or use a different encoder.

It is easy to see that a different loss function would result in a different latent code, but finding a substitute for the reconstruction loss is hard.

A more doable way is through the encoder. Let's consider a CLIP encoder. It was trained on the correspondence between images and texts. It tends to capture high-level abstract concepts because of the text that describes each image. The latent code from a CLIP encoder is therefore expected to capture high-level concepts rather than precise appearance.

One could extend this intuition to another encoder such as ArcFace. Since ArcFace is trained to be invariant to most things except face identity, the latent code from this encoder should be stripped of most information except identity, i.e., the decoded face may not share the same pose as the original face.
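The point about invariant encoders can be sketched with a toy example. This assumes a made-up `ToyFace` record with only an identity and a pose, and two hypothetical encoders: one that keeps everything and one that, like an idealized identity-only encoder, discards pose. None of this is the ArcFace or DiffAE API, and a real ArcFace embedding is only approximately invariant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyFace:
    identity: str
    pose_degrees: int


def full_encoder(face):
    """Hypothetical encoder that keeps both identity and pose."""
    return (face.identity, face.pose_degrees)


def identity_only_encoder(face):
    """Hypothetical stand-in for an encoder that is invariant to pose."""
    return face.identity


def decode(code, default_pose=0):
    """Toy decoder: uses the pose if the code carries one, else a default."""
    if isinstance(code, tuple):
        identity, pose = code
        return ToyFace(identity, pose)
    return ToyFace(code, default_pose)


frontal = ToyFace("alice", 0)
profile = ToyFace("alice", 60)

# Two poses of the same person collapse to the same identity-only code.
same_code = identity_only_encoder(frontal) == identity_only_encoder(profile)

# So the decoder cannot recover the original pose from that code.
decoded_profile = decode(identity_only_encoder(profile))
roundtrip_full = decode(full_encoder(profile))
print(same_code, decoded_profile, roundtrip_full)
```

Because both poses map to the same code, whatever the decoder outputs can match at most one of them. That is the sense in which the decoded face keeps the identity but may not share the pose of the original.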

from diffae.

betterze commented on August 25, 2024

Thx for your reply. It really helps me a lot.

