yvanyin / diversedepth
The code and data of DiverseDepth
License: Other
test_any_diversedepth.py requires an annotations.json file.
What is this file and can it be downloaded from somewhere for the pre-trained model?
Traceback (most recent call last):
  File "/Users/foobar/opt/anaconda3/envs/diversedepth/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/foobar/opt/anaconda3/envs/diversedepth/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/foobar/Documents/Projects/DiverseDepth/tools/test_any_diversedepth.py", line 55, in <module>
    f = open(anno_path, 'r')
FileNotFoundError: [Errno 2] No such file or directory: './annotations.json'
Hello, I see that many repositories that use the KITTI dataset set a maximum depth of 80 meters. How is this maximum depth of 80 meters determined?
If the original stereo image pairs were provided, we could use recent optical flow models to improve the ground-truth depths.
Do you have plans to release that data?
Hi @YvanYin
Thanks for your excellent work, but when I test your model on my own images, the results are bad.
I hope you can release the training code.
Thanks for your amazing work!
I ran into a problem. I tried to retrain the network following the information you provided, but I did not get results similar to those of your pre-trained model; mine are much worse. I can't locate the problem. Can you give me some advice? The results are as follows:
from PIL import Image
import numpy as np
import time


class point_cloud_generator():

    def __init__(self, rgb_file, depth_file, pc_file, focal_length, scalingfactor):
        self.rgb_file = rgb_file
        self.depth_file = depth_file
        self.pc_file = pc_file
        self.focal_length = focal_length
        self.scalingfactor = scalingfactor
        # Force three color channels so the indexing in calculate() always works.
        self.rgb = Image.open(rgb_file).convert('RGB')
        self.depth = Image.open(depth_file).convert('I')
        self.width = self.rgb.size[0]
        self.height = self.rgb.size[1]

    def calculate(self):
        t1 = time.time()
        # Transpose so the arrays are indexed as [x, y], i.e. (width, height).
        depth = np.asarray(self.depth).T
        self.Z = depth / self.scalingfactor
        X = np.zeros((self.width, self.height))
        Y = np.zeros((self.width, self.height))
        for i in range(self.width):
            X[i, :] = np.full(X.shape[1], i)
        # Pinhole back-projection with the principal point at the image center.
        self.X = ((X - self.width / 2) * self.Z) / self.focal_length
        for i in range(self.height):
            Y[:, i] = np.full(Y.shape[0], i)
        self.Y = ((Y - self.height / 2) * self.Z) / self.focal_length

        # Rows of df: x, y, z, r, g, b; one column per pixel in row-major order.
        df = np.zeros((6, self.width * self.height))
        df[0] = self.X.T.reshape(-1)
        df[1] = -self.Y.T.reshape(-1)
        df[2] = -self.Z.T.reshape(-1)
        img = np.array(self.rgb)
        df[3] = img[:, :, 0:1].reshape(-1)
        df[4] = img[:, :, 1:2].reshape(-1)
        df[5] = img[:, :, 2:3].reshape(-1)
        self.df = df
        t2 = time.time()
        print('Calculate 3D point cloud done.', t2 - t1)

    def write_ply(self):
        t1 = time.time()
        float_formatter = lambda x: "%.4f" % x
        points = []
        for i in self.df.T:
            # One vertex per line: x y z r g b alpha (alpha is always 0).
            points.append("{} {} {} {} {} {} 0\n".format(
                float_formatter(i[0]), float_formatter(i[1]), float_formatter(i[2]),
                int(i[3]), int(i[4]), int(i[5])))
        with open(self.pc_file, "w") as file:
            file.write('''ply
format ascii 1.0
element vertex %d
property float x
property float y
property float z
property uchar red
property uchar green
property uchar blue
property uchar alpha
end_header
%s
''' % (len(points), "".join(points)))
        t2 = time.time()
        print("Write into .ply file done.", t2 - t1)


a = point_cloud_generator('rgb1.png', 'depth1.png', '1.ply', focal_length=50, scalingfactor=1)
a.calculate()
a.write_ply()
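The per-row and per-column loops in the class above can also be expressed with NumPy broadcasting. The following is a minimal sketch, assuming a pinhole camera with the principal point at the image center and a single focal length in pixels. `backproject` is a hypothetical helper name, and the negated Y and Z axes simply mirror the convention of the class above.

```python
import numpy as np


def backproject(depth, focal_length, scaling_factor=1.0):
    """Back-project an (H, W) depth map to an (H*W, 3) array of XYZ points.

    Assumes a pinhole camera whose principal point is the image center.
    """
    depth = np.asarray(depth, dtype=np.float64)
    height, width = depth.shape
    z = depth / scaling_factor
    # Pixel coordinate grids: u runs along columns, v runs along rows.
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    x = (u - width / 2) * z / focal_length
    y = (v - height / 2) * z / focal_length
    # Negate Y and Z to match the convention used by the class above.
    return np.stack([x, -y, -z], axis=-1).reshape(-1, 3)


# A flat synthetic depth map, used only for illustration.
points = backproject(np.full((4, 6), 2.0), focal_length=50)
print(points.shape)
```

The points come out in row-major pixel order, so they line up with `np.array(rgb).reshape(-1, 3)` if colors are needed.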
DiverseDepth/Train/data/diverse_dataset.py
Line 101 in b88c380
In the file test_any_diversedepth.py, in a commented line, the output is combined with a tanh-fcn-evaluation.
pred_depth = torch.nn.functional.tanh(pred_depth) + 1
Should this be done with every output of the network to get the right disparity scaling?
Thanks in advance.
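To make the effect of that commented-out line concrete: `tanh` maps any real value into (-1, 1), so `tanh(x) + 1` always lies in (0, 2). The sketch below uses plain `math.tanh` on a few hypothetical raw values to show the squashing. Whether the released checkpoint expects this step is not confirmed anywhere in this thread.

```python
import math


def squash(raw_value):
    """Apply the tanh(x) + 1 mapping from the commented-out line to one value."""
    return math.tanh(raw_value) + 1.0


# Hypothetical raw network outputs, chosen only for illustration.
for raw in (-5.0, 0.0, 0.5, 5.0):
    print(f"{raw:+.1f} -> {squash(raw):.4f}")
```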
Hi,
I am trying to convert the pretrained model to iOS Core ML. I first converted your pretrained model to ONNX and then used onnx-coreml to convert it to an iOS model. Unfortunately, resize operations in Core ML are performed by upsampling with an integer scale factor. I added some logging when running your model in Python and noticed that some of the resizes are not exact integer scales (in this case they are 1.98):
# ====fcn_topdown_block resizing torch.Size([1, 256, 35, 27]) to size 70 53
# ====fcn_topdown_block resizing torch.Size([1, 256, 70, 53]) to size 140 105
If I round them in the Core ML script as a hack, the model does not compile.
Is it possible to change those sizes in your model class so that they are exactly divisible by 2? I looked at the configuration file /lib/core/config.py but could not figure it out.
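One workaround sometimes used for this kind of problem is to pad or resize the input so that each side is a multiple of the encoder's largest stride, which makes every 2x upsampling step exact. The sketch below assumes a largest stride of 32 and feature sizes obtained by ceiling division. Both are assumptions inferred from the log above, not confirmed by the repository, and `round_up` and `feature_sizes` are hypothetical helpers.

```python
import math


def round_up(size, multiple=32):
    """Smallest multiple of `multiple` that is >= size."""
    return ((size + multiple - 1) // multiple) * multiple


def feature_sizes(size, strides=(8, 16, 32)):
    """Feature-map side length at each stride, assuming ceiling division."""
    return [math.ceil(size / s) for s in strides]


# 840 is a hypothetical input side that reproduces the 105 / 53 / 27 widths in the log.
print(feature_sizes(840))            # [105, 53, 27]
print(feature_sizes(round_up(840)))  # [108, 54, 27]
```

With the padded side of 864, each level is exactly twice the next coarser one (27, 54, 108), so an integer scale factor of 2 would suffice under these assumptions.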
I keep getting ModuleNotFoundError: No module named 'Minist_Test' when I run python ./Minist_Test/tools/test_depth.py --load_ckpt model.pth. Why?
Hi, I can't download the pretrained model from the link you provided. Could you share a new one, please? I'd appreciate it.
Great work. I am looking for your dataset.
Hello, thanks for sharing this great research.
I'm currently studying depth estimation - can you share your trained weights with me?
Hello, thank you for posting this dataset. I am excited to start working with it, but I have a question about the part fore depth format. All of the part fore depth images are 8-bit PNG files, whereas depths are typically stored as 16-bit values in mm (i.e. multiply by 1e-3 to get meters). What is the scale for these 8-bit files? Also, is the scale for part_in and part_out actually 1e-3 as usual? After some inspection, it seems that all of the 16-bit depth files in part_in and part_out also have a maximum value of around 255, like the 8-bit files, so these depths cannot plausibly be in mm. Without an example dataloader, these things are impossible to determine.
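The arithmetic behind that concern can be checked directly. This sketch assumes the conventional encoding mentioned above (raw integer value times 1e-3 gives meters) and uses 255 as the observed maximum. `raw_to_meters` is a hypothetical helper, not part of the dataset tooling.

```python
def raw_to_meters(raw_value, scale=1e-3):
    """Convert a raw depth value to meters under an assumed scale factor."""
    return raw_value * scale


# Largest depth representable if a maximum raw value of 255 were in millimeters.
print(raw_to_meters(255))
# For comparison, the largest depth a full 16-bit range would cover in millimeters.
print(raw_to_meters(65535))
```

Under the millimeter assumption, a maximum raw value of 255 corresponds to roughly a quarter of a meter, which supports the suspicion that a different scale is in use.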
In the paper, the authors mention that the true metric depth is transformed into the perspective of a virtual camera via an affine transformation.
What are the virtual camera intrinsic parameters you used for the paper?
Thanks for the interesting work. I have found from the paper that the Virtual Normal loss is computed as a difference between the normals of the point clouds generated from the GT depth maps and predicted depth maps.
I would like to pass the ground truth surface normals instead and compute the loss. Could you give some directions to make these changes?
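For orientation, the idea described above can be sketched as follows: a "virtual normal" is the unit normal of the plane through three 3D points, and the loss compares that normal between the ground-truth and predicted point clouds. This is a simplified pure-Python illustration for a single point triplet, not the repository's implementation, and `virtual_normal` and `normal_difference` are hypothetical names.

```python
import math


def virtual_normal(p_a, p_b, p_c):
    """Unit normal of the plane through three 3D points (cross product of two edges)."""
    ab = [b - a for a, b in zip(p_a, p_b)]
    ac = [c - a for a, c in zip(p_a, p_c)]
    n = [
        ab[1] * ac[2] - ab[2] * ac[1],
        ab[2] * ac[0] - ab[0] * ac[2],
        ab[0] * ac[1] - ab[1] * ac[0],
    ]
    length = math.sqrt(sum(v * v for v in n))
    if length == 0:
        raise ValueError("points are collinear; no unique normal")
    return [v / length for v in n]


def normal_difference(n_gt, n_pred):
    """L1 distance between two normals; zero when they coincide."""
    return sum(abs(g - p) for g, p in zip(n_gt, n_pred))


# Hypothetical triplets: the prediction gets the depth of the third point wrong.
gt_normal = virtual_normal((0, 0, 1), (1, 0, 1), (0, 1, 1))
pred_normal = virtual_normal((0, 0, 1), (1, 0, 1), (0, 1, 1.5))
print(normal_difference(gt_normal, pred_normal))
```

If ground-truth surface normals are available, they could take the place of `gt_normal` in `normal_difference`, assuming they are expressed in the same camera frame as the predicted normals.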
Hi, thanks for sharing the code of the paper.
How did you get the scaling and translation factors when evaluating on zero-shot datasets?
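A common way to obtain such factors for affine-invariant predictions is a least-squares fit of one scale and one shift per image against the valid ground-truth pixels. Whether the authors used exactly this procedure is not stated here, so the sketch below should be read as an assumption. `fit_scale_shift` is a hypothetical helper that operates on flat lists of valid pixels.

```python
def fit_scale_shift(pred, gt):
    """Closed-form least-squares fit of gt ~ scale * pred + shift."""
    n = len(pred)
    if n != len(gt) or n < 2:
        raise ValueError("need two equally sized lists with at least 2 values")
    mean_p = sum(pred) / n
    mean_g = sum(gt) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("prediction is constant; scale is undefined")
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gt))
    scale = cov / var_p
    shift = mean_g - scale * mean_p
    return scale, shift


# Synthetic data: the ground truth is an exact affine function of the prediction.
pred = [0.1, 0.2, 0.4, 0.8]
gt = [2.0 * p + 0.5 for p in pred]
scale, shift = fit_scale_shift(pred, gt)
aligned = [scale * p + shift for p in pred]
print(scale, shift)
```

The aligned prediction can then be compared with the ground truth using the usual metric-depth error measures.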
DiverseDepth/lib/core/config.py
Line 54 in 4d5b285
Should this be resnext50_32x4d_body_stride16?
AttributeError Traceback (most recent call last)
in ()
----> 1 test_any_diversedepth.RelDepthModel()
2 frames
/content/DiverseDepth/lib/models/diverse_depth_model.py in __init__(self)
11 super(RelDepthModel, self).__init__()
12 self.loss_names = ['Virtual_Normal']
---> 13 self.depth_model = DepthModel()
14
15 def forward(self, data):
/content/DiverseDepth/lib/models/diverse_depth_model.py in __init__(self)
153 #bottom_up_model = 'lateral_net.lateral_' + cfg.MODEL.ENCODER
154 bottom_up_model = network.__name__.split('.')[-1] + '.lateral_' + cfg.MODEL.ENCODER
--> 155 self.encoder_modules = get_func(bottom_up_model)()
156 #self.decoder_modules = lateral_net.fcn_topdown(cfg.MODEL.ENCODER)
157 self.decoder_modules = network.fcn_topdown()
/content/DiverseDepth/lib/utils/net_tools.py in get_func(func_name)
28 # Otherwise, assume we're referencing a module under modeling
29 module_name = 'lib.models.' + '.'.join(parts[:-1])
---> 30 module = importlib.import_module(module_name)
31 return getattr(module, parts[-1])
32 except Exception:
AttributeError: module 'lib.models.lateral_net' has no attribute 'lateral_resnext50_body_32x4d'
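When a lookup like this fails, it can help to list which names the module actually defines and compare them with the configured encoder name. The sketch below is a generic diagnostic. `candidates` is a hypothetical helper, and the `types.SimpleNamespace` object only stands in for `lib.models.lateral_net`, whose real contents are not shown here.

```python
import types


def candidates(module, prefix="lateral_"):
    """Return the sorted attribute names of `module` that start with `prefix`."""
    return sorted(name for name in dir(module) if name.startswith(prefix))


# Stand-in module with illustrative attribute names (not taken from the repository).
fake_lateral_net = types.SimpleNamespace(
    lateral_resnext50_32x4d_body_stride16=object(),
    fcn_topdown=object(),
)
print(candidates(fake_lateral_net))
```

In the real project, the same helper could be called on the result of `importlib.import_module('lib.models.lateral_net')`; the printed names show which values of `cfg.MODEL.ENCODER` can be resolved.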
Your work is excellent, and I appreciate you sharing your RGB-D dataset. Would it be possible to provide other download links, such as Google Drive? The current link seems to be your university's cloud drive, which is hard to download from in mainland China.
Hello,
First, thank you for sharing your work!
As in the title: do you have a plan to release the training code?
Hello,
The teacher_curriculum.npy file is missing from the provided DIML annotations zip file (gdrive and cloudstor). I am wondering if you had a teacher curriculum for that split or not? If so, and this is an error, could you please provide it?
Thanks for your work and for publishing your models. I have tested your pre-trained models with some of my images for a different use case. The depth prediction looks promising, but not accurate. I want to train a model by adding some data from my dataset so that it can predict accurate depth. Any suggestions on how to add custom data? Do I need any files other than train_annotations.json and valid_annotations.json?
What is the importance of teacher_curriculum.npy? How do I create this for the custom data?
Great work. I just want to ask when the dataset will be released. Is there a schedule?
Hi,
I cannot find the diverse data dataset you mention in your paper. Can you please share it?
Hello, it seems that the code for the losses defined in ModelLoss (Ranking loss, WCEL loss, SSIL loss, MSGL_Loss, YouTube3D_Ranking_Loss) is missing. Could you provide it?
Hi, thank you for your work! I am new to relative depth estimation. Why does the evaluation of depth estimation only use metric depth? Can I evaluate with relative depth instead, for example by computing RMSE and AbsRel on relative depth? Looking forward to your reply. Thank you very much!
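For reference, the two metrics mentioned are commonly defined as in the sketch below. It assumes the prediction has already been brought to the same scale as the ground truth and that both are flat lists of valid, positive depths. The function names are hypothetical.

```python
import math


def abs_rel(pred, gt):
    """Mean absolute relative error: mean(|pred - gt| / gt)."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)


def rmse(pred, gt):
    """Root mean squared error: sqrt(mean((pred - gt)^2))."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))


# Small synthetic example in which every prediction is off by 10 percent.
gt = [1.0, 2.0, 4.0]
pred = [1.1, 1.8, 4.4]
print(abs_rel(pred, gt), rmse(pred, gt))
```

Both metrics depend on the absolute scale of the values being compared, which is why a relative prediction is usually aligned to the ground truth before they are computed.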
Hi Wei Yin,
I'll give a quick paragraph of introduction before I pose my questions. I'm a computer science student working on a little side project. I made a fairly small script that turns video frames into audio in real time, and I'm looking to test its feasibility for object detection through audio. Currently it works on grayscale images, but I'm trying to change that to depth estimations. That's why I started looking at your code. I'm a bit at a loss as to how to use it, though. I tried looking into your paper, but I can't seem to open it because I'm not allowed on campus due to COVID.
Say I have extracted a video frame in my own Python module. How would I convert that image to a depth estimation? Which imports do I need, and which functions would I need to call?
How would the execution change at a lower resolution? Due to hardware limitations I'm downscaling my frames before rendering my waveforms. Should I run your code over my frames before or after downscaling?
Lastly, if I don't use it commercially, am I allowed to use this as part of my program if I want to write a research paper with it?
Kind regards,
Koen
Hello author, first of all, thank you for your outstanding work. I used your repository to test on my own dataset, and the results are very good. I see that your paper also shows figures of depth maps converted to point clouds. How did you achieve this, and what software did you use to visualize the point clouds?
Very nice job!
Could you provide the code to get absolute depth values in meters, and camera parameters to generate the pointcloud?
Thanks!
I'm attempting to run the Quick Start code on Windows but keep getting the same error. Since "export" doesn't exist on Windows, I tried using "set" instead, but that didn't seem to help.
I get this error:
Traceback (most recent call last):
  File "./Minist_Test/tools/test_depth.py", line 1, in <module>
    from Minist_Test.lib.diverse_depth_model import RelDepthModel
ModuleNotFoundError: No module named 'Minist_Test'
Am I missing a step in setting up the environment in Conda or is it something else?
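This error usually means that the directory containing the `Minist_Test` folder is not on Python's module search path. One alternative to setting an environment variable is to extend `sys.path` at the top of the script, before the failing import. The sketch below assumes the layout `<repo root>/Minist_Test/tools/test_depth.py` implied by the traceback, and `add_repo_root` is a hypothetical helper.

```python
import sys
from pathlib import Path


def add_repo_root(script_file, levels_up=2):
    """Insert the directory `levels_up` levels above the script's folder into sys.path."""
    root = Path(script_file).resolve().parents[levels_up]
    root_str = str(root)
    if root_str not in sys.path:
        sys.path.insert(0, root_str)
    return root


# Hypothetical location, used only to show which directory gets added.
root = add_repo_root("/tmp/repo/Minist_Test/tools/test_depth.py")
print(root.name)
```

Inside the actual script, the call would be `add_repo_root(__file__)`, placed above the `from Minist_Test...` import.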
using the model from here and a simple:
from lib.models.diverse_depth_model import RelDepthModel
model = RelDepthModel()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1-75d68ac643e0> in <module>()
5 print("__ ", cfg.MODEL.ENCODER)
6
----> 7 model = RelDepthModel()
8 model.eval()
9 """checkpoint = torch.load("resnext50_32x4d.pth")
6 frames
/content/DiverseDepth/lib/utils/resnext_weights_helper.py in convert_state_dict(src_dict)
39 print(k)
40 toks = k.split('.')
---> 41 if int(toks[0]) == 0:
42 name = 'res%d.' % res_id + 'conv1.' + toks[-1]
43 elif int(toks[0]) == 1:
ValueError: invalid literal for int() with base 10: 'model_state_dict'
k == "model_state_dict"
even when trying like this:
for k, v in src_dict["model_state_dict"].items():
k == "depth_model.encoder_modules.topdown_lateral_modules.0.lateral.conv1.weight"
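The traceback suggests that a full checkpoint, with a top-level `model_state_dict` entry and keys such as `depth_model.encoder_modules...`, is reaching a converter that expects backbone weight keys beginning with an integer. A minimal sketch of unwrapping such a checkpoint before loading follows. `unwrap_state_dict` is a hypothetical helper, plain integers stand in for tensors, and loading the result through `model.load_state_dict(...)` is an assumption about the intended usage.

```python
def unwrap_state_dict(checkpoint, wrapper_key="model_state_dict", strip_prefix=""):
    """Return the inner weight dict, optionally removing a key prefix."""
    # Fall back to the checkpoint itself when it is not wrapped.
    state = checkpoint.get(wrapper_key, checkpoint)
    if strip_prefix:
        state = {
            (k[len(strip_prefix):] if k.startswith(strip_prefix) else k): v
            for k, v in state.items()
        }
    return dict(state)


# Stand-in checkpoint with the nesting described above; the values are placeholders.
checkpoint = {
    "model_state_dict": {"depth_model.encoder_modules.conv1.weight": 1},
    "epoch": 3,
}
print(list(unwrap_state_dict(checkpoint)))
print(list(unwrap_state_dict(checkpoint, strip_prefix="depth_model.")))
```

Whether the prefix needs stripping depends on which module receives the weights: the full model would keep the `depth_model.` prefix, while a bare depth model would not.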