The whole snippet that you quoted, plus the one line of code above it, is an implementation of Zhou et al.'s appearance flow algorithm. I reproduce the snippet in full below:
# affine_grid and grid_sample come from torch.nn.functional
# Predict a per-pixel offset field of shape (n, h, w, 2)
grid_change = torch.transpose(self.zhou_grid_change(y).view(n, 2, h * w), 1, 2).view(n, h, w, 2)
device = self.zhou_grid_change.weight.device
# A fixed (not learned) identity affine transform, batched over n
identity = torch.Tensor([[1, 0, 0], [0, 1, 0]]).to(device).unsqueeze(0).repeat(n, 1, 1)
# Base grid: a flow field that samples every pixel from its own location
base_grid = affine_grid(identity, [n, c, h, w])
# Adding the predicted offsets yields the full appearance flow field
grid = base_grid + grid_change
resampled = grid_sample(image, grid, mode='bilinear', padding_mode='border')
As a result, the affine_grid function by itself is not an implementation of the appearance flow algorithm. However, it is a part of that implementation, as it appears on one of the six lines of the snippet above.
I think that you might be confused because you mentioned that "affine_grid [...] builds a Spatial Transformer Networks." This is not true. What affine_grid does is create a flow field that acts like the affine transformation given as one of its arguments. Jaderberg et al. have a part of their Spatial Transformer Network learn to predict this argument. The flow field produced by affine_grid is then fed to grid_sample, which finally produces a resampled image for further processing. Accordingly, the PyTorch documentation for affine_grid says:
"This function is often used in conjunction with grid_sample() to build Spatial Transformer Networks."
Note here that it is "used [...] to build Spatial Transformer Networks." So it is a part of Spatial Transformer Networks; it does not, by itself, "build" them.
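To make this concrete, here is a minimal, self-contained sketch (the shapes and values are my own illustrative choices, not from the repository): calling affine_grid with the identity transform and feeding the result to grid_sample reproduces the input image.

```python
import torch
from torch.nn.functional import affine_grid, grid_sample

n, c, h, w = 1, 3, 4, 4
image = torch.arange(n * c * h * w, dtype=torch.float32).view(n, c, h, w)

# An identity affine transform: each output pixel samples its own location.
theta = torch.tensor([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])

# The flow field: shape (n, h, w, 2), one (x, y) sampling coordinate per pixel.
grid = affine_grid(theta, [n, c, h, w], align_corners=False)

# Resample the image along the flow field; an identity grid yields
# (up to floating-point error) an identical image.
out = grid_sample(image, grid, mode='bilinear', align_corners=False)
```

The point of the sketch is that affine_grid only produces the sampling coordinates; it is grid_sample that actually moves pixels.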
Moreover, even though affine_grid is a part of Spatial Transformer Networks, nothing prevents it from being used to build other networks, including Zhou et al.'s.
This brings us to the main difference between Zhou et al.'s paper and Jaderberg et al.'s. Zhou et al.'s network predicts the whole flow field directly. Jaderberg et al.'s, in contrast, predicts the transformations (whether affine transformations or thin plate splines) that are then used to create the flow field.
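To illustrate the contrast, here is a hypothetical sketch (the layer names and sizes are my own assumptions, not taken from either paper): a Jaderberg-style head outputs only the 6 parameters of an affine transform, while a Zhou-style head outputs two offset values for every output pixel.

```python
import torch
from torch import nn

h, w = 64, 64
features = torch.rand(1, 128)  # a pooled feature vector (assumed size)

# Jaderberg et al.-style: predict the 6 parameters of an affine transform;
# affine_grid would then expand them into the full flow field.
affine_head = nn.Linear(128, 6)
theta = affine_head(features).view(1, 2, 3)

# Zhou et al.-style: predict the flow field itself, 2 values per pixel.
flow_head = nn.Linear(128, 2 * h * w)
flow = flow_head(features).view(1, h, w, 2)
```

The difference in output size (6 numbers versus 2 * h * w numbers) is exactly the difference between predicting a transformation and predicting the flow field itself.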
My snippet implements Zhou et al.'s algorithm because it also predicts the whole flow field: that is the "grid" variable. The way it computes the grid variable is not direct, and this might have caused your confusion. It starts with the "base_grid," which represents a flow field that copies every pixel to the same location. The base_grid is created by calling affine_grid with a fixed transformation (the identity). The fact that this transformation is not learned means that my snippet does not implement a Spatial Transformer Network.
The main prediction happens when it computes the grid_change variable on the first line. The grid_change acts as an offset to the base_grid, so after the two are added together, the whole flow field is obtained. I did this because many pixels in the input image (especially those of the character's body) remain unchanged, so it is easier for the network to learn the offsets than to create the whole flow field from scratch.
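That base-grid-plus-offset structure can be sketched in isolation (here zero offsets stand in for the network's prediction, so the output simply equals the input; in the real model a learned layer produces the offsets):

```python
import torch
from torch.nn.functional import affine_grid, grid_sample

n, c, h, w = 1, 3, 8, 8
image = torch.rand(n, c, h, w)

# A fixed (not learned) identity transform gives the base grid,
# which copies every pixel to the same location.
identity = torch.tensor([[1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0]]).unsqueeze(0).repeat(n, 1, 1)
base_grid = affine_grid(identity, [n, c, h, w], align_corners=False)

# Stand-in for the predicted offsets. Zero offsets leave the flow
# field unchanged, so resampling reproduces the input.
grid_change = torch.zeros(n, h, w, 2)

grid = base_grid + grid_change
resampled = grid_sample(image, grid, mode='bilinear',
                        padding_mode='border', align_corners=False)
```

With zero offsets the two grids coincide, which is exactly why learning offsets is easy: the network only has to deviate from "no change" where pixels actually move.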
I hope this answers your questions.
from talking-head-anime-demo.