Comments (13)

yinjunbo avatar yinjunbo commented on June 14, 2024 2

Finally, I found that it's a problem with the version of g++/gcc.
The default version used when compiling with the Makefile is 5.4.1, and that turns out to cause the problem.
When I change the version of g++/gcc to 4.8.4, everything works well.
CC = gcc-4.8 -O2 -pthread
CXX = g++-4.8
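
A possible explanation for the gcc 5 vs 4.8 difference, not confirmed anywhere in this thread: gcc 5 and later default to a new C++ ABI, while the official TensorFlow pip packages of that period were built with gcc 4, and the TensorFlow custom-op guide suggests adding `-D_GLIBCXX_USE_CXX11_ABI=0` when compiling ops with gcc >= 5. A minimal sketch of that rule follows; the helper names `gcc_major` and `extra_abi_flags` are made up for illustration and are not part of this repo's Makefile.

```python
def gcc_major(dumpversion_output):
    """Parse the major version from `gcc -dumpversion` output, e.g. '5.4.1' -> 5."""
    return int(dumpversion_output.strip().split(".")[0])


def extra_abi_flags(major):
    """Return the old-ABI define for gcc >= 5, otherwise no extra flags.

    Assumption: TensorFlow itself was built with the old (gcc 4) ABI.
    """
    return ["-D_GLIBCXX_USE_CXX11_ABI=0"] if major >= 5 else []


# The two compiler versions mentioned in the comment above.
flags_for_5 = extra_abi_flags(gcc_major("5.4.1"))
flags_for_4 = extra_abi_flags(gcc_major("4.8.4"))
```

If that is the cause here, appending the flag to the compile line might be an alternative to downgrading the compiler, but nobody in the thread reports trying it.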

from flownet2-tf.

yinjunbo avatar yinjunbo commented on June 14, 2024 1

The same problem happened to me, and I commented out the code as @chuchienshu did.
But a new problem came up: the training process stops at an early step (0, 30, 60, etc., at random) and reports "LossTensor is inf or nan".
Have you met the same problem? Thank you very much! @sampepose @chuchienshu @vladpaunescu

I discovered that the problem may be due to compiling the code against the wrong TensorFlow version (1.2). I changed the version to 1.3 and recompiled the code, and finally it works.

yinjunbo avatar yinjunbo commented on June 14, 2024 1

@myhooo Try adding this code before "return...":
image_as, image_bs, flows = map(lambda x: tf.expand_dims(x, 0), [image_a, image_b, flow])
and don't forget to change the corresponding variables in tf.train.batch.
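
Why this probably helps, sketched in plain Python with no TensorFlow: the `ValueError: need more than 3 values to unpack` reported in this thread comes from unpacking a shape into four names, which only works if the tensor has a leading batch dimension. The shape values below are hypothetical, and `add_batch_dim` only imitates what `tf.expand_dims(x, 0)` does to a shape.

```python
def unpack_nhwc(shape):
    """Mimic `_, height, width, _ = shape`: requires exactly 4 dimensions."""
    _, height, width, _ = shape
    return height, width


def add_batch_dim(shape):
    """Shape-level analogue of tf.expand_dims(x, 0): prepend a size-1 dimension."""
    return [1] + list(shape)


single_image_shape = [384, 512, 3]  # hypothetical (height, width, channels)

try:
    unpack_nhwc(single_image_shape)
    unpack_failed = False
except ValueError:
    unpack_failed = True  # three values cannot fill four targets

# With the extra leading dimension the same unpacking succeeds.
height, width = unpack_nhwc(add_batch_dim(single_image_shape))
```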

vladpaunescu avatar vladpaunescu commented on June 14, 2024

Please enable debug to check the exact error:

net = FlowNetS(debug=True)

If the error is the following (as it was in my case):

Couldn't open CUDA library libcupti.so.8.0. LD_LIBRARY_PATH: /usr/local/cuda-8.0/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/local/cuda/lib64:/usr/local/cuda/lib:/opt/ffmpeg/lib
2017-09-29 18:35:26.710728: F ./tensorflow/stream_executor/lib/statusor.h:212] Non-OK-status: status_ status: Failed precondition: could not dlopen DSO: libcupti.so.8.0; dlerror: libcupti.so.8.0: cannot open shared object file: No such file or directory

then please add /usr/local/cuda/extras/CUPTI/lib64/ to your LD_LIBRARY_PATH.
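
A quick way to check whether that directory is already on the path, as a minimal sketch using only the standard library. The helper name `dir_on_library_path` is made up for illustration, and the comparison is purely textual, so it does not resolve symlinks such as /usr/local/cuda pointing to /usr/local/cuda-8.0.

```python
import os


def dir_on_library_path(directory, library_path):
    """Return True if `directory` is one of the colon-separated entries."""
    wanted = os.path.normpath(directory)
    entries = [entry for entry in library_path.split(":") if entry]
    return any(os.path.normpath(entry) == wanted for entry in entries)


CUPTI_DIR = "/usr/local/cuda/extras/CUPTI/lib64/"

# LD_LIBRARY_PATH exactly as quoted in the error message.
reported = (
    "/usr/local/cuda-8.0/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:"
    "/usr/local/cuda/lib:/usr/local/cuda/lib64:/usr/local/cuda/lib:/opt/ffmpeg/lib"
)

found_before = dir_on_library_path(CUPTI_DIR, reported)
found_after = dir_on_library_path(CUPTI_DIR, reported + ":" + CUPTI_DIR)

# The same check against the environment of the current process.
found_now = dir_on_library_path(CUPTI_DIR, os.environ.get("LD_LIBRARY_PATH", ""))
```

The variable should be exported in the shell before starting Python, since the dynamic loader reads it when the process starts.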

tensorflow/tensorflow#8830

vladpaunescu avatar vladpaunescu commented on June 14, 2024

Actually, the Segmentation Fault is still happening to me.
After running
gdb python
run src/flownet_s/train.py
I found this:

QueueRunner: corrupted record at 275253737

Later edit:

It seems the SIGSEGV is raised in the C++ augmentation library, preprocessing.so:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff7a7fc700 (LWP 13435)]
std::_Function_handler<void(long long int, long long int), tensorflow::Augment(tensorflow::OpKernelContext*, const Device&, int, int, int, int, int, int, int, float const*, float*, float const*, float*) [with Device = Eigen::ThreadPoolDevice]::<lambda(tensorflow::int64, tensorflow::int64)> >::_M_invoke(const std::_Any_data &, <unknown type in /mnt/hdd1/git/caffe_experiments/obstacle-detection/flow/flownet2-tf/src/./ops/build/preprocessing.so, CU 0x72971, DIE 0xb7a61>, <unknown type in /mnt/hdd1/git/caffe_experiments/obstacle-detection/flow/flownet2-tf/src/./ops/build/preprocessing.so, CU 0x72971, DIE 0xb7a66>) (__functor=..., __args#0=<unknown type in /mnt/hdd1/git/caffe_experiments/obstacle-detection/flow/flownet2-tf/src/./ops/build/preprocessing.so, CU 0x72971, DIE 0xb7a61>, __args#1=<unknown type in /mnt/hdd1/git/caffe_experiments/obstacle-detection/flow/flownet2-tf/src/./ops/build/preprocessing.so, CU 0x72971, DIE 0xb7a66>) at /usr/include/c++/5/functional:1871
1871 (*_Base::_M_get_pointer(__functor))(
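
As a complement to gdb, Python's standard `faulthandler` module can dump the Python-level traceback of every thread when the process receives SIGSEGV. This is only a sketch under assumptions: the thread never states the Python version (`faulthandler` is in the standard library from Python 3.3, with a backport on PyPI for Python 2), and the log file name is hypothetical. It shows which Python call was in flight, not the C++ frame inside the op.

```python
import faulthandler

# Keep the file object alive for the lifetime of the process: faulthandler
# writes to its file descriptor directly when a fatal signal arrives.
log_file = open("faulthandler.log", "w")  # hypothetical file name

# Dump the tracebacks of all Python threads on SIGSEGV, SIGFPE, SIGABRT, etc.
faulthandler.enable(file=log_file, all_threads=True)

enabled = faulthandler.is_enabled()
```

Placed at the top of the training script, this needs no other changes; on Python 3, starting the interpreter with `python -X faulthandler` has a similar effect and writes to stderr.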

chuchienshu avatar chuchienshu commented on June 14, 2024

Thanks for your kindness, bro. I bow to your judgement, but I'm not able to change the code in preprocessing.cc. Fortunately, I got a good result by commenting out part of that code and making a few careful modifications. Thanks all the same! @vladpaunescu

sampepose avatar sampepose commented on June 14, 2024

vladpaunescu avatar vladpaunescu commented on June 14, 2024

@chuchienshu No worries 👍

Can you share the fix? I still can't train any model at all; all I get is a segmentation fault.
Also, it would help to know the exact TensorFlow version the code was implemented on, as well as the Python version and the CUDA and cuDNN versions. @sampepose, can you kindly add the requirements to README.md?

vladpaunescu avatar vladpaunescu commented on June 14, 2024

Hi!
I managed to train without any data augmentation. However, it would be nice to have the C++ code working, so any advice would be of great value to me.

Vlad

chuchienshu avatar chuchienshu commented on June 14, 2024

@sampepose I just removed the data augmentation code in src/dataloader.py,
as shown below.

'''crop = [dataset_config['PREPROCESS']['crop_height'],
                dataset_config['PREPROCESS']['crop_width']]
        config_a = config_to_arrays(dataset_config['PREPROCESS']['image_a'])
        ......
 # Perform flow augmentation using spatial parameters from data augmentation
            flows = _preprocessing_ops.flow_augmentation(
                flows, transforms_from_a, transforms_from_b, crop)'''

@vladpaunescu Looks like you did the same, and I wish the C++ code worked, too. It looks awesome!

myhooo avatar myhooo commented on June 14, 2024

@yinjunbo Hi, I hit the same problem, 'segmentation fault (core dumped)', and I removed the data augmentation code as chuchienshu said. However, now there are problems like this:
File "/home/hmy/flownet2_tf/src/flownet2/train.py", line 22, in <module>
  './checkpoints/FlowNetSD/flownet-SD.ckpt-0': ('FlowNet2/FlowNetSD', 'FlowNet2')
File "src/net.py", line 99, in train
  predictions = self.model(inputs, training_schedule)
File "src/flownet2/flownet2.py", line 19, in model
  _, height, width, _ = inputs['input_a'].shape.as_list()
ValueError: need more than 3 values to unpack
Have you run into the same situation? If you have solved this kind of problem, it would be very kind of you to share some tips with me, thanks~

myhooo avatar myhooo commented on June 14, 2024

@yinjunbo Thank you very much~ It seems that you have recompiled successfully, congratulations!
I'd also like to know: did you only change the version of TensorFlow?
Thank you in advance~

yinjunbo avatar yinjunbo commented on June 14, 2024

@myhooo You're welcome. I've recompiled it without data augmentation, and version 1.2.0 seems to work.
