anibali / dsntnn
PyTorch implementation of DSNT
License: Apache License 2.0
First of all, hats off for your effort on building and maintaining this. Keep up the good work.
My issue is that when I try to jit.trace a model that uses this layer, I get a warning similar to this one:
```
dsntnn.py:47: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return torch.linspace(first, last, length, device=device)
```
The same thing happens when exporting a model that uses dsntnn to ONNX: a model exported with a command like the one below produces this trace warning, and the exported model cannot be loaded.
```python
torch.onnx.export(model, x, "deployment/ckpts/{0}.onnx".format(model_name), export_params=False,
                  operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK)
```
How to reproduce:
```python
import torch

model = CoordRegressionNetwork(n_locations=2)
x = torch.randn(5, 3, 200, 200, requires_grad=True)
traced_script_module = torch.jit.trace(model, x)
```
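The warning arises because tracing bakes Python integers (such as the linspace length) into the graph as constants. One possible workaround, sketched below under the assumption of a fixed heatmap size (this is not part of the dsntnn API): precompute the coordinate grids as module buffers, so the traced graph contains neither linspace nor flip.

```python
import torch
from torch import nn

class FixedSoftArgmax2d(nn.Module):
    """Trace/ONNX-friendly soft-argmax for a fixed heatmap size (a sketch,
    not part of the dsntnn API). The coordinate grids are precomputed as
    buffers, so tracing never records linspace or flip.
    """
    def __init__(self, height, width):
        super().__init__()
        # Pixel-centre coordinates in (-1, 1), matching DSNT's convention.
        xs = (2 * torch.arange(width, dtype=torch.float32) + 1) / width - 1
        ys = (2 * torch.arange(height, dtype=torch.float32) + 1) / height - 1
        self.register_buffer('xs', xs)
        self.register_buffer('ys', ys)

    def forward(self, heatmaps):
        # heatmaps: (batch, channels, height, width), each map summing to 1
        x = (heatmaps.sum(dim=-2) * self.xs).sum(dim=-1)
        y = (heatmaps.sum(dim=-1) * self.ys).sum(dim=-1)
        return torch.stack([x, y], dim=-1)
```

A module like this would stand in for the coordinate layer only when deploying at a fixed resolution; training could keep the dynamic version.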
Does DSNT support only one point per heatmap?
How can I get the poses of multiple people in one image?
If I use your module to predict keypoints, the output heatmap is not the same as with other methods: the peak value is very small. Do you know how to get a reasonable confidence score?
I would like to convert the model to ONNX format, but the flip and linspace operators are not supported. Is there a workaround for this?
Thanks
Installing dsntnn with pip throws this error:
```
Could not find a version that satisfies the requirement dsntnn (from versions: )
No matching distribution found for dsntnn
```
My pip3 version: 10.0.1
Python version: 3.5.2
OS: macOS 10.13.5 (High Sierra)
Thanks for the paper and code!
I am doing pose estimation and have a problem: the heatmap for predicting the left wrist also fires a small response on the right wrist. In other words, the heatmap has two peaks: a strong peak on the left wrist and a weak peak on the right wrist.
This two-peak problem makes the dsnt prediction incorrect. Do you have any suggestions? Thanks!
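To illustrate why two peaks break the coordinate regression (a toy sketch in one dimension): the DSNT output is an expectation, so even a weak second peak pulls the prediction away from the true location.

```python
import torch

xs = torch.linspace(-1, 1, 21)   # 1D coordinate grid
h = torch.zeros(21)
h[5] = 0.8                       # strong peak at x = -0.5 ("left wrist")
h[15] = 0.2                      # weak peak at x = +0.5 ("right wrist")
print((h * xs).sum())            # expectation: -0.3, pulled away from -0.5
```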
I have trained a network that obtains facial keypoints by supervising the generation of heatmaps. The network uses a max operation to obtain the 68 keypoint coordinates of the face from the 68-channel keypoint heatmap output by an FCN. Now I want to combine this network with another network and train them together, but the max operation is not differentiable, so I want to replace it with dsnt.
So I use `batch_location_dsnt = dsntnn.dsnt(heatmap)`
(the heatmap is produced by the FCN; it is a 68 x 1 x 16 x 16 tensor),
but the batch_location_dsnt I obtain is:
```
tensor([[[ -0.5989, -0.2222]],
[[ -0.6683, -0.0225]],
[[ -0.7003, 0.1874]],
[[ -0.7120, 0.5027]],
[[ -0.6451, 0.7451]],
[[ -0.5105, 1.0081]],
[[ -0.4522, 1.1898]],
[[ -0.2934, 1.2817]],
[[ -0.0759, 0.9567]],
[[ 0.1304, 1.0607]],
[[ 0.3462, 1.4314]],
[[ 0.7308, 1.3509]],
[[ 0.8871, 1.0625]],
[[ 1.1645, 0.7980]],
[[ 1.4735, 0.5973]],
[[ 1.3658, 0.1797]],
[[ 1.2114, -0.1012]],
[[ -0.7434, -0.7085]],
[[ -0.6286, -0.7392]],
[[ -0.4630, -0.7343]],
[[ -0.2988, -0.6485]],
[[ -0.1515, -0.5185]],
[[ 0.0185, -0.5908]],
[[ 0.3039, -0.6446]],
[[ 0.5553, -0.6704]],
[[ 0.8032, -0.6359]],
[[ 0.9848, -0.4610]],
[[ -0.1231, -0.3595]],
[[ -0.2189, -0.2581]],
[[ -0.2404, -0.0784]],
[[ -0.3306, 0.1073]],
[[ -0.4281, 0.2564]],
[[ -0.3071, 0.3424]],
[[ -0.2748, 0.3945]],
[[ -0.1277, 0.3686]],
[[ 0.0404, 0.3399]],
[[ -0.5630, -0.4150]],
[[ -0.4809, -0.4761]],
[[ -0.3541, -0.4953]],
[[ -0.2261, -0.3877]],
[[ -0.4000, -0.3473]],
[[ -0.5188, -0.3881]],
[[ 0.2428, -0.3442]],
[[ 0.4070, -0.3346]],
[[ 0.5273, -0.3868]],
[[ 0.7190, -0.2441]],
[[ 0.5536, -0.2888]],
[[ 0.4207, -0.2777]],
[[ -0.3997, 0.7421]],
[[ -0.3004, 0.5801]],
[[ -0.3018, 0.5292]],
[[ -0.1713, 0.4833]],
[[ -0.0893, 0.4787]],
[[ 0.0906, 0.6432]],
[[ 0.3095, 0.7009]],
[[ 0.1567, 0.8734]],
[[ -0.0456, 1.1209]],
[[ -0.1621, 1.0680]],
[[ -0.2678, 1.0100]],
[[ -0.3905, 0.8635]],
[[ -0.3840, 0.7459]],
[[ -0.2615, 0.6243]],
[[ -0.1569, 0.5345]],
[[ -0.1064, 0.6030]],
[[ 0.2071, 0.6364]],
[[ -0.0748, 0.8947]],
[[ -0.1838, 0.7509]],
[[ -0.2617, 0.8739]]], device='cuda:0', grad_fn=<CatBackward>)
```
Obviously, [-0.5989, -0.2222] doesn't look like pixel coordinates. Why doesn't dsnt output the maximum x and y coordinates like the max operation does? How can I get the correct coordinates of the keypoints?
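For reference: DSNT returns the expected coordinate under the heatmap's probability distribution, normalized to (-1, 1), not an argmax pixel index; values of magnitude greater than 1 like the 1.43 above suggest the heatmaps were not normalized first (e.g. with dsntnn.flat_softmax). A sketch (the helper name is hypothetical) for mapping in-range outputs back to pixel coordinates, assuming the pixel-centre convention discussed later on this page:

```python
import torch

def normalized_to_pixel(coords, width, height):
    # Hypothetical helper: invert x_i = (2 * i + 1) / w - 1, the pixel-centre
    # mapping, to recover 0-indexed pixel coordinates from (-1, 1) values.
    x = (coords[..., 0] + 1) * width / 2 - 0.5
    y = (coords[..., 1] + 1) * height / 2 - 0.5
    return torch.stack([x, y], dim=-1)

# e.g. for a 16x16 heatmap, x = -0.9375 maps back to pixel column 0
```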
I want to use your model to predict 2 points that have x, y, z coordinates. What modifications should I make?
I took a look at the paper, and I am curious about the regularization part (KL, JS, etc.). I am wondering if it is possible to use a 2D Gaussian heatmap generated from the ground truth as the regularization target. What do you think of that?
The network is defined by:
```python
import torch.nn as nn
from torchvision import models

import dsntnn

class Net(nn.Module):
    def __init__(self, layers):
        super(Net, self).__init__()
        if layers == 18:
            model = models.resnet18(pretrained=True)
        elif layers == 34:
            model = models.resnet34(pretrained=True)
        # change the first layer to receive a five-channel image
        model.conv1 = nn.Conv2d(5, 64, kernel_size=7, stride=2, padding=3, bias=True)
        # change the last layer to output 32 coordinates
        # model.fc = nn.Linear(512, 32)
        # remove the final two layers (fc, avgpool)
        model = nn.Sequential(*(list(model.children())[:-2]))
        for param in model.parameters():
            param.requires_grad = True
        self.resnet = model

    def forward(self, x):
        pose_out = self.resnet(x)
        return pose_out
```
```python
class CoordRegressionNetwork(nn.Module):
    def __init__(self, n_locations, layers):
        super(CoordRegressionNetwork, self).__init__()
        self.resnet = Net(layers)
        self.hm_conv = nn.Conv2d(512, n_locations, kernel_size=1, bias=False)

    def forward(self, images):
        # 1. Run the images through our ResNet
        resnet_out = self.resnet(images)
        # 2. Use a 1x1 conv to get one unnormalized heatmap per location
        unnormalized_heatmaps = self.hm_conv(resnet_out)
        # 3. Normalize the heatmaps
        heatmaps = dsntnn.flat_softmax(unnormalized_heatmaps)
        # 4. Calculate the coordinates
        coords = dsntnn.dsnt(heatmaps)
        return coords, heatmaps
```
And the training code is as follows:
```python
for i, data in enumerate(tqdm(train_dataloader)):
    # training
    images, poses = data['image'], data['pose']
    images, poses = images.to(device), poses.to(device)
    coords, heatmaps = net(images)
    # Per-location euclidean losses
    euc_losses = dsntnn.euclidean_losses(coords, poses)
    # Per-location regularization losses
    reg_losses = dsntnn.js_reg_losses(heatmaps, poses, sigma_t=1.0)
    # Combine losses into an overall loss
    loss = dsntnn.average_loss(euc_losses + reg_losses)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    train_loss_epoch.append(loss.item())
```
I normalized the keypoint ground truth to floats in (-1, 1), but all the predicted coords are negative:
```
tensor([[[-0.1286, -0.0830],
[-0.1169, -0.0810],
[-0.1205, -0.1476],
...,
[-0.1767, -0.3881],
[-0.1970, -0.2403],
[-0.3226, -0.3909]],
[[-0.0694, -0.0165],
[-0.0744, -0.0288],
[-0.1027, -0.0873],
...,
[-0.0766, -0.3926],
[-0.1146, -0.2482],
[-0.0907, -0.1812]],
[[-0.4647, -0.3639],
[-0.4430, -0.3409],
[-0.2485, -0.2339],
...,
[-0.2906, -0.4541],
[-0.3648, -0.3034],
[-0.4190, -0.3880]],
```
and the heatmap looks strange.
Hi,
first of all: I really like the DSNT layer. It works perfectly and the idea is really cool :)
However, what would be really useful for my application is a way to get the model's confidence that the coordinate regressed by DSNT is correct. Looking at the heatmaps, the confidence should be very high when the heatmap values are close to 1 at the predicted position and close to 0 everywhere else, and it should be low when there is a large patch of low heatmap values with a single point in the middle that is only slightly higher.
So I guess what I am looking for is a way to derive the standard deviation of a Gaussian centered at the position predicted by DSNT, and then transform that deviation into a confidence value between 0 and 1.
In the end I want to have n coordinates with 3 values each: x, y, confidence.
Is there an easy way to do this with the functions already provided by DSNT?
Best,
Simon
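Following up with one possible approach (a sketch, not an existing dsntnn function; it assumes heatmaps normalized with flat_softmax and the pixel-centre grid convention): measure the spatial spread of the heatmap around the regressed coordinate, then squash it into (0, 1], e.g. with exp(-stddev / s) for some scale s chosen on validation data.

```python
import torch

def heatmap_stddev(heatmaps, coords):
    """Spatial standard deviation of each normalized heatmap around the
    DSNT-regressed coordinate, in normalized (-1, 1) units. A sharp single
    peak gives a small value; a diffuse blob gives a large one.

    heatmaps: (batch, channels, height, width), each map summing to 1
    coords:   (batch, channels, 2) as returned by dsntnn.dsnt
    """
    _, _, height, width = heatmaps.shape
    # Pixel-centre coordinate grids in (-1, 1).
    xs = (2 * torch.arange(width, dtype=heatmaps.dtype, device=heatmaps.device) + 1) / width - 1
    ys = (2 * torch.arange(height, dtype=heatmaps.dtype, device=heatmaps.device) + 1) / height - 1
    # E[(X - mu_x)^2] and E[(Y - mu_y)^2] under the heatmap distribution.
    var_x = (heatmaps.sum(dim=-2) * (xs - coords[..., 0:1]) ** 2).sum(dim=-1)
    var_y = (heatmaps.sum(dim=-1) * (ys - coords[..., 1:2]) ** 2).sum(dim=-1)
    return torch.sqrt(var_x + var_y)

# confidence = torch.exp(-heatmap_stddev(heatmaps, coords) / s) for a scale s
```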
Do the input image and target heatmap need to be normalized to the range (-1, 1) before they can be used? I ask because I saw you wrote:
> The input and target need to be put into PyTorch tensors. Importantly, the target coordinates are normalized so that they are in the range (-1, 1). The DSNT layer always outputs coordinates in this range.
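For reference, a sketch of that target normalization under the pixel-centre convention implied by the grid values discussed further down this page (-0.8, -0.4, 0, 0.4, 0.8 for width 5); the helper name is hypothetical:

```python
def pixel_to_normalized(px, py, width, height):
    # Hypothetical helper: map 0-indexed pixel coordinates into (-1, 1)
    # under the pixel-centre convention x_i = (2 * i + 1) / w - 1.
    return (2 * px + 1) / width - 1, (2 * py + 1) / height - 1
```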
Hi,
I have a question about the assumption that the DSNT layer always outputs values in (-1, 1), as stated here:
https://github.com/anibali/dsntnn/blob/master/examples/basic_usage.md
> Importantly, the target coordinates are normalized so that they are in the range (-1, 1). The DSNT layer always outputs coordinates in this range.
Especially in the first epoch, I sometimes get values that are a little outside of this range, e.g. -1.0224.
My first thought was that I had forgotten to normalize the heatmaps, but I had not:
```python
heatmaps = dsntnn.flat_softmax(unnormalized_heatmaps)
coords = dsntnn.dsnt(heatmaps)
```
Is this maybe because of numerical instability? I read the paper and, if I understand it correctly, the heatmap is interpreted as a probability distribution, and the x and y coordinates are computed by "folding" the heatmap with the grid shown in the paper.
Best,
Simon
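For what it's worth, that understanding matches the math: the output is an expectation, i.e. a convex combination of grid values, so with a properly normalized heatmap it cannot leave the grid's range, and a value like -1.0224 points to something other than the expectation itself. A quick 1D sanity check of that reasoning (a sketch):

```python
import torch

w = 10
xs = (2 * torch.arange(w, dtype=torch.float32) + 1) / w - 1  # grid in (-1, 1)
h = torch.rand(w)
h = h / h.sum()                    # weights are non-negative and sum to 1
e = (h * xs).sum()                 # 1D analogue of the DSNT expectation
assert xs.min() <= e <= xs.max()   # expectation stays inside the grid range
```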
My torch version is 0.4.1, and an error occurs when I run the basic usage guide. I have already tried pip-installing the other dsntnn version described in #6. Looking forward to your reply as soon as possible.
File ".\model.py", line 71, in <module>
coords, heatmaps = model(t)
File "C:\Python35\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File ".\model.py", line 46, in forward
coords = dsntnn.dsnt(heatmaps)
File "C:\Python35\lib\site-packages\dsntnn\__init__.py", line 79, in dsnt
return soft_argmax(heatmaps)
File "C:\Python35\lib\site-packages\dsntnn\__init__.py", line 67, in soft_argmax
return linear_expectation(heatmaps, values).flip(-1)
RuntimeError: expected flip dims axis >= 0, but got min flip dims=-1```
I'm working with 3D volumes; my base model is ResNet3D-50 (all convolutions are Conv3d), and I want to predict x, y, z coordinates. Can you please help with that? Thanks.
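For reference, a generic differentiable 3D coordinate regression, sketched independently of dsntnn's own API (whether dsntnn.dsnt handles volumetric heatmaps directly is not confirmed here):

```python
import torch

def soft_argmax_3d(heatmaps):
    """Differentiable coordinate regression from volumetric heatmaps.

    heatmaps: (batch, channels, depth, height, width), each volume
    normalized to sum to 1. Returns (batch, channels, 3) coordinates
    (x, y, z) in (-1, 1), using the pixel-centre convention.
    """
    *_, d, h, w = heatmaps.shape
    def grid(n):
        return (2 * torch.arange(n, dtype=heatmaps.dtype, device=heatmaps.device) + 1) / n - 1
    x = (heatmaps.sum(dim=(-3, -2)) * grid(w)).sum(dim=-1)  # marginal over W
    y = (heatmaps.sum(dim=(-3, -1)) * grid(h)).sum(dim=-1)  # marginal over H
    z = (heatmaps.sum(dim=(-2, -1)) * grid(d)).sum(dim=-1)  # marginal over D
    return torch.stack([x, y, z], dim=-1)
```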
I am looking at the _coord_expectation function in your code. I don't fully understand why you sum the heatmaps (Z) before taking the inner product with own_coords (X or Y), and then sum again. I thought the sum should come after the inner product of the heatmaps and own_coords. This confuses me because Figure 3 in your paper describes it differently.
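For what it's worth, the two orderings agree mathematically: summing the heatmap over the other axis first computes the marginal distribution, and the expectation of X under the marginal equals its expectation under the joint. A quick numerical check, independent of dsntnn's internals:

```python
import torch

h = torch.rand(4, 6)
h = h / h.sum()                       # normalized 2D heatmap
xs = torch.linspace(-1, 1, 6)         # per-column x coordinates
X = xs.expand(4, 6)                   # full 2D x-coordinate grid

joint = (h * X).sum()                 # E[X] via the full 2D inner product
marginal = (h.sum(dim=0) * xs).sum()  # E[X] via the column marginal first
assert torch.allclose(joint, marginal)
```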
When training a model that outputs a heatmap, training data with missing or occluded points are easily handled by setting the target output to a heatmap of all zeros. With the DSNT layer, how can this be handled? An all-zero heatmap corresponds to target coordinates of [0, 0] here, but so does a point that is very well localized at the center of the image.
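One common pattern (a sketch, not an official dsntnn recipe; coords, heatmaps, targets, and mask are hypothetical tensors here) is to keep the target coordinates but zero out the per-location losses for missing points:

```python
import dsntnn

# mask: (batch, locations) tensor with 1 for annotated points, 0 for missing
euc_losses = dsntnn.euclidean_losses(coords, targets)             # (batch, locations)
reg_losses = dsntnn.js_reg_losses(heatmaps, targets, sigma_t=1.0)
loss = dsntnn.average_loss((euc_losses + reg_losses) * mask)
```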
Hello,
I want to do landmark detection, so I am using dsntnn. When I set batch size = 1, the results are good, but with batch size = 16 the predicted points all gather in one place. Why is that, and what should I do?
I have read the paper and was wondering if there is a fix for the problem stated on page 8:
> Analysis of misclassified examples revealed that DSNT was less accurate for predicting edge case joints that lie very close to the image boundary, which is expected due to how the layer works
The reason seems to be that the X and Y grids are defined to lie in the open range (-1, 1) by the formulas on page 4. Is there a specific reason for this, or would DSNT also work when the grids span the closed range [-1, 1]?
A formula to define such a grid would be
`-1 + 2*(i-1)/(w-1)`
For a heatmap that has the width 5, the grid would have these values in the columns:
i=1 => -1
i=2 => -1 + 2/4 = -0.5
i=3 => -1 + 4/4 = 0
i=4 => -1 + 6/4 = 0.5
i=5 => -1 + 8/4 = 1
So the grid would look like
-1 | -0.5 | 0 | 0.5 | 1
instead of
-0.8 | -0.4 | 0 | 0.4 | 0.8
So my question is: is there a reason to use the second grid instead of the first one? From what I can see, the first should also work. If there is interest in this change, I could try to implement it.
The advantage would be that the system could regress coordinates exactly on the border, not just very close to it (depending on the heatmap dimensions).
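For reference, the two grids from the question side by side (a sketch; the second line reproduces the values listed above):

```python
import torch

w = 5
# The proposed grid: endpoints included, closed range [-1, 1]
corner_grid = torch.linspace(-1, 1, w)                                # -1.0, -0.5, 0.0, 0.5, 1.0
# The grid described above for DSNT: pixel centres, open range (-1, 1)
centre_grid = (2 * torch.arange(w, dtype=torch.float32) + 1) / w - 1  # -0.8, -0.4, 0.0, 0.4, 0.8
```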
Hi @anibali!
Thanks to your concise code, it's very convenient to add your dsntnn module to an Hourglass network. But when I try this and train the Hourglass network on the MPII dataset, it does not seem to converge well.
The way I add the dsntnn module into the Hourglass network:
```python
import torch.nn as nn
import dsntnn
# Residual, Hourglass and ref come from the surrounding project

class HourglassDsntNet(nn.Module):
    def __init__(self, nStack, nModules, nFeats, nRegModules):
        super(HourglassDsntNet, self).__init__()
        self.nStack = nStack
        self.nModules = nModules
        self.nFeats = nFeats
        self.nRegModules = nRegModules
        self.conv1_ = nn.Conv2d(3, 64, bias=True, kernel_size=7, stride=2, padding=3)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.r1 = Residual(64, 128)
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.r4 = Residual(128, 128)
        self.r5 = Residual(128, self.nFeats)
        _hourglass, _Residual, _lin_, _tmpOut, _ll_, _tmpOut_, _reg_ = [], [], [], [], [], [], []
        for i in range(self.nStack):
            _hourglass.append(Hourglass(4, self.nModules, self.nFeats))
            for j in range(self.nModules):
                _Residual.append(Residual(self.nFeats, self.nFeats))
            lin = nn.Sequential(nn.Conv2d(self.nFeats, self.nFeats, bias=True, kernel_size=1, stride=1),
                                nn.BatchNorm2d(self.nFeats), self.relu)
            _lin_.append(lin)
            _tmpOut.append(nn.Conv2d(self.nFeats, ref.nJoints, bias=True, kernel_size=1, stride=1))
            _ll_.append(nn.Conv2d(self.nFeats, self.nFeats, bias=True, kernel_size=1, stride=1))
            _tmpOut_.append(nn.Conv2d(ref.nJoints, self.nFeats, bias=True, kernel_size=1, stride=1))
        self.hourglass = nn.ModuleList(_hourglass)
        self.Residual = nn.ModuleList(_Residual)
        self.lin_ = nn.ModuleList(_lin_)
        self.tmpOut = nn.ModuleList(_tmpOut)
        self.ll_ = nn.ModuleList(_ll_)
        self.tmpOut_ = nn.ModuleList(_tmpOut_)

    def forward(self, x):
        x = self.conv1_(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.r1(x)
        x = self.maxpool(x)
        x = self.r4(x)
        x = self.r5(x)
        outMap = []
        outReg = []
        for i in range(self.nStack):
            hg = self.hourglass[i](x)
            ll = hg
            for j in range(self.nModules):
                ll = self.Residual[i * self.nModules + j](ll)
            ll = self.lin_[i](ll)
            tmpOutMap = self.tmpOut[i](ll)
            heatmaps = dsntnn.flat_softmax(tmpOutMap)
            outMap.append(tmpOutMap)
            tmpOutReg = dsntnn.dsnt(heatmaps)
            outReg.append(tmpOutReg)
            ll_ = self.ll_[i](ll)
            tmpOut_ = self.tmpOut_[i](tmpOutMap)
            x = x + ll_ + tmpOut_
        return outMap, outReg
```
The way I do the training procedure:
```python
for i, (input, target2D, target3D, meta) in enumerate(dataLoader):
    input_var = torch.autograd.Variable(input).float().cuda()
    target2D_var = torch.autograd.Variable(target2D).float().cuda()
    target3D_var = torch.autograd.Variable(target3D).float().cuda()
    out_map, out_reg = model(input_var)
    # filter out the joints without annotation
    filter = target3D_var[:, :, 2].unsqueeze(dim=2)
    out_reg[0] = out_reg[0] * filter
    out_reg[1] = out_reg[1] * filter
    loss_map = torch.autograd.Variable(torch.FloatTensor([0])).float().cuda()
    loss_reg = torch.autograd.Variable(torch.FloatTensor([0])).float().cuda()
    loss = torch.autograd.Variable(torch.FloatTensor([0])).float().cuda()
    for k in range(opt.nStack):
        # Per-location euclidean losses
        euc_losses = dsntnn.euclidean_losses(out_reg[k], target3D_var[:, :, :2])
        # Per-location regularization losses
        reg_losses = dsntnn.js_reg_losses(out_map[k], target3D_var[:, :, :2], sigma_t=1.0)
        # Combine losses into an overall loss
        loss += dsntnn.average_loss(euc_losses + reg_losses)
        loss_map += euc_losses
        loss_reg += reg_losses
    if split == 'train':
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```
I only trained the network for five epochs, but the results show that it does not seem to converge at all.
All the other experiment settings work fine with the pure Hourglass network.
So are there any tricks I should add to my code, or am I adding your module in an incorrect way?
It seems that you did some experiments with Hourglass in your paper; could you offer any help?
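One detail worth double-checking in the code above (an observation based on the basic usage example earlier on this page, not a confirmed diagnosis): outMap collects the unnormalized tmpOutMap, but dsntnn.js_reg_losses is later applied to out_map[k], whereas the usage example applies the regularizer to heatmaps produced by dsntnn.flat_softmax. A minimal sketch of the alternative, inside the forward loop:

```python
# Collect the normalized heatmaps so the JS regularizer sees a proper
# probability distribution (assumption: this mirrors the basic usage example).
heatmaps = dsntnn.flat_softmax(tmpOutMap)
outMap.append(heatmaps)                 # instead of outMap.append(tmpOutMap)
outReg.append(dsntnn.dsnt(heatmaps))
```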