
Comments (1)

pkhungurn commented on May 17, 2024

The whole snippet that you quoted, plus the one line of code above it, is an implementation of Zhou et al.'s appearance flow algorithm. I reproduce the snippet in full below:

    # Excerpt from the model's forward pass: y is an intermediate feature tensor,
    # image is the input image, and affine_grid and grid_sample come from
    # torch.nn.functional.
    # Predict a per-pixel offset and reshape it from (n, 2, h*w) to (n, h, w, 2).
    grid_change = torch.transpose(self.zhou_grid_change(y).view(n, 2, h * w), 1, 2).view(n, h, w, 2)
    device = self.zhou_grid_change.weight.device
    # Identity flow field: every output pixel samples from its own input location.
    identity = torch.Tensor([[1, 0, 0], [0, 1, 0]]).to(device).unsqueeze(0).repeat(n, 1, 1)
    base_grid = affine_grid(identity, [n, c, h, w])
    grid = base_grid + grid_change  # full flow field = identity grid + predicted offsets
    resampled = grid_sample(image, grid, mode='bilinear', padding_mode='border')

As a result, the affine_grid function does not by itself correspond to an implementation of the appearance flow algorithm. However, it is a part of that implementation, as it appears in one of the six lines of the snippet above.

I think you might be confused because you mentioned that "affine_grid [...] builds a Spatial Transformer Networks." This is not true. What affine_grid does is create a flow field that realizes the affine transformation given as one of its arguments. Jaderberg et al. have a part of their Spatial Transformer Network learn to predict this argument. The flow field produced by affine_grid is then fed to grid_sample, which finally produces a resampled image for further processing. This is why the PyTorch documentation for affine_grid says:

"This function is often used in conjunction with grid_sample() to build Spatial Transformer Networks."

Note that it says "used [...] to build Spatial Transformer Networks." So affine_grid is a part of a Spatial Transformer Network; it does not, by itself, "build" one.
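
For contrast, here is a minimal sketch (my own illustration, not code from any of the papers or from my repository) of how a Spatial Transformer-style module typically uses affine_grid: a small hypothetical localization head, theta_head, predicts the six affine parameters, affine_grid expands them into a flow field, and grid_sample warps the image with it.

    import torch
    from torch import nn
    from torch.nn.functional import affine_grid, grid_sample

    class TinySpatialTransformer(nn.Module):
        def __init__(self, c, h, w):
            super().__init__()
            # Hypothetical localization head that predicts the 2x3 affine matrix.
            self.theta_head = nn.Linear(c * h * w, 6)
            # Initialize to the identity transformation so the module starts as a no-op.
            nn.init.zeros_(self.theta_head.weight)
            self.theta_head.bias.data = torch.tensor([1., 0., 0., 0., 1., 0.])

        def forward(self, image):
            n, c, h, w = image.shape
            theta = self.theta_head(image.view(n, -1)).view(n, 2, 3)  # learned transformation
            grid = affine_grid(theta, [n, c, h, w], align_corners=False)
            return grid_sample(image, grid, mode='bilinear',
                               padding_mode='border', align_corners=False)

The network here learns only theta (six numbers per image); the flow field itself is never predicted directly.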

Moreover, even though affine_grid is a part of Spatial Transformer Networks, nothing prevents it from being used to build other networks, including Zhou et al.'s.

This brings us to the main difference between Zhou et al.'s paper and Jaderberg et al.'s paper. Zhou et al.'s network predicts the whole flow field, whereas Jaderberg et al.'s predicts the transformation parameters (affine transformations or thin plate splines) that are used to create the flow field.
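
To make the difference concrete, here is a rough shape comparison (illustrative sizes only, not from either paper):

    import torch

    n, h, w = 1, 64, 64                # example sizes
    # Jaderberg et al.: predict the transformation parameters, e.g. one 2x3 affine
    # matrix per image; affine_grid then expands it into an (n, h, w, 2) flow field.
    theta = torch.zeros(n, 2, 3)       # 6 numbers per image
    # Zhou et al.: predict the flow field itself, i.e. a 2D sampling location
    # (or offset) for every output pixel.
    flow = torch.zeros(n, h, w, 2)     # 2 * h * w = 8192 numbers per image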

My snippet implements Zhou et al.'s algorithm because it also predicts the whole flow field. That is the "grid" variable. The way it computes the grid variable is not direct, and this might have caused your confusion. It starts with "base_grid," which represents a flow field that copies every pixel to its original location. The base_grid is created by calling affine_grid with a fixed transformation (the identity). The fact that this transformation is not learned means that my snippet does not implement a Spatial Transformer Network.
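
A quick way to see what base_grid does (a standalone sketch, independent of my code) is to check that sampling with base_grid alone returns the input image unchanged:

    import torch
    from torch.nn.functional import affine_grid, grid_sample

    n, c, h, w = 1, 3, 8, 8
    image = torch.rand(n, c, h, w)
    # affine_grid with the identity transformation yields a flow field that maps
    # every output pixel back to its own location in the input.
    identity = torch.Tensor([[1, 0, 0], [0, 1, 0]]).unsqueeze(0).repeat(n, 1, 1)
    base_grid = affine_grid(identity, [n, c, h, w], align_corners=False)
    out = grid_sample(image, base_grid, mode='bilinear',
                      padding_mode='border', align_corners=False)
    print(torch.allclose(out, image, atol=1e-5))  # expected: True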

The main prediction happens when it computes the grid_change variable on the first line. grid_change acts as an offset to base_grid, so adding the two together yields the whole flow field. I did this because many pixels in the input image (especially those of the character's body) remain unchanged, so it is easier for the network to learn offsets than to create the whole flow field from scratch. A schematic version of the whole computation is sketched below.
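
As a rough sketch of the idea (the layer sizes here are placeholders, and the real zhou_grid_change in my code may be configured differently), the offset head only needs to output two channels, an x and a y offset per pixel; the transpose/view calls then rearrange its output into the (n, h, w, 2) layout that grid_sample expects:

    import torch
    from torch import nn
    from torch.nn.functional import affine_grid, grid_sample

    n, c, h, w = 1, 3, 64, 64
    feature_channels = 64                      # placeholder size
    y = torch.rand(n, feature_channels, h, w)  # stands in for the network's features
    image = torch.rand(n, c, h, w)

    # Hypothetical offset head: two output channels = (x, y) offset for every pixel.
    zhou_grid_change_head = nn.Conv2d(feature_channels, 2, kernel_size=3, padding=1)

    # (n, 2, h, w) -> (n, 2, h*w) -> (n, h*w, 2) -> (n, h, w, 2)
    grid_change = torch.transpose(zhou_grid_change_head(y).view(n, 2, h * w), 1, 2).view(n, h, w, 2)
    identity = torch.Tensor([[1, 0, 0], [0, 1, 0]]).unsqueeze(0).repeat(n, 1, 1)
    base_grid = affine_grid(identity, [n, c, h, w], align_corners=False)
    grid = base_grid + grid_change
    resampled = grid_sample(image, grid, mode='bilinear',
                            padding_mode='border', align_corners=False)
    print(resampled.shape)  # torch.Size([1, 3, 64, 64])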

I hope this answers your questions.

from talking-head-anime-demo.
