Comments (15)

Cysu commented on May 18, 2024

Sorry, but we modified the official Caffe for our project, so we rely on OpenMPI if you want to use multiple GPUs (we use multi-GPU only for testing, not for training).

We also highly recommend installing cuDNN v5. After downloading and extracting it, replace /path/to/cudnn in the cmake command with your own directory path. For example, if you copy the cuDNN files to /usr/local/cuda, the cmake command should be

cmake .. -DUSE_MPI=ON -DCUDNN_INCLUDE=/usr/local/cuda/include -DCUDNN_LIBRARY=/usr/local/cuda/lib64/libcudnn.so
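If cuDNN lives somewhere else, a small helper can build the two flags and catch a wrong path before cmake runs. This is only a sketch: the function name cudnn_cmake_args is made up for illustration, and it assumes the usual layout of include/cudnn.h and lib64/libcudnn.so under the prefix you pass in.

```shell
# Hypothetical helper (not part of person_search): print the cuDNN cmake
# flags for a given install prefix, or fail if the expected files are missing.
cudnn_cmake_args() {
    local root="${1:-/usr/local/cuda}"
    local header="$root/include/cudnn.h"
    local library="$root/lib64/libcudnn.so"
    if [ ! -e "$header" ]; then
        echo "missing $header" >&2
        return 1
    fi
    if [ ! -e "$library" ]; then
        echo "missing $library" >&2
        return 1
    fi
    echo "-DCUDNN_INCLUDE=$root/include -DCUDNN_LIBRARY=$library"
}
```

It could then be used as cmake .. -DUSE_MPI=ON $(cudnn_cmake_args /usr/local/cuda), which expands to the same command as above when the files are in place.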

from person_search.

Jcdidi commented on May 18, 2024

Thanks. But my environment is a single server with 4 GPUs. Can I still use OpenMPI?

Cysu commented on May 18, 2024

Sure. You can change these two lines to

mpirun -n 4 python2 tools/eval_test.py \
  --gpu 0,1,2,3 \
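To make the single-GPU and multi-GPU cases explicit, here is a rough sketch that derives the launcher from the GPU list: one MPI process per GPU, and no mpirun at all for a single GPU. The function name eval_launch_cmd is hypothetical, and the remaining arguments that the real script passes after --gpu are left out.

```shell
# Hypothetical helper: print the launch command for a comma-separated GPU list.
# Only the launcher and the --gpu argument are shown; the evaluation script's
# other arguments are omitted here.
eval_launch_cmd() {
    local gpus="$1"
    local n
    # Count the non-empty entries in the comma-separated list.
    n=$(echo "$gpus" | tr ',' '\n' | grep -c .)
    if [ "$n" -le 1 ]; then
        echo "python2 tools/eval_test.py --gpu $gpus"
    else
        echo "mpirun -n $n python2 tools/eval_test.py --gpu $gpus"
    fi
}
```

For example, eval_launch_cmd 0,1,2,3 prints the mpirun -n 4 form above, while eval_launch_cmd 0 prints the plain python2 form.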

Jcdidi commented on May 18, 2024

Um, thanks.
"boost >= 1.55 (A tip for Ubuntu 14.04: sudo apt-get autoremove libboost1.54* then sudo apt-get install libboost1.55-all-dev)"
Does it have to be >= 1.55?

Cysu commented on May 18, 2024

Yes. It should be >= 1.55.
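One way to check which Boost the compiler will see is to read the BOOST_VERSION macro from the installed header. A minimal sketch, assuming the header is at /usr/include/boost/version.hpp (the usual location for the Ubuntu packages, but it may differ on your machine); the function name boost_at_least_1_55 is made up for illustration.

```shell
# Hypothetical helper: succeed if the Boost header reports version >= 1.55.
# BOOST_VERSION encodes major * 100000 + minor * 100 + patch, so 1.55.0 is 105500.
boost_at_least_1_55() {
    local header="${1:-/usr/include/boost/version.hpp}"
    local v
    v=$(awk '$1 == "#define" && $2 == "BOOST_VERSION" { print $3 }' "$header" 2>/dev/null)
    if [ -z "$v" ]; then
        echo "could not read BOOST_VERSION from $header" >&2
        return 2
    fi
    echo "found Boost $((v / 100000)).$((v / 100 % 1000))"
    [ "$v" -ge 105500 ]
}
```

Running boost_at_least_1_55 with no argument prints the detected version and returns a non-zero status if it is older than 1.55.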

Jcdidi commented on May 18, 2024

jxd@amax-1080:~/person_search-master$ experiments/scripts/eval_test.sh resnet50 50000 resnet50
[amax-1080:00334] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[amax-1080:00334] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[amax-1080:00334] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[amax-1080:00334] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[amax-1080:00334] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_crs_none: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)


A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: amax-1080
Framework: crs
Component: none

[amax-1080:00334] *** Process received signal ***
[amax-1080:00334] Signal: Segmentation fault (11)
[amax-1080:00334] Signal code: Address not mapped (1)
[amax-1080:00334] Failing at address: 0x28
[amax-1080:00334] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330) [0x7f65b2b0a330]
[amax-1080:00334] [ 1] /usr/lib/libmpi.so.1(mca_base_select+0x11e) [0x7f652bf16f1e]
[amax-1080:00334] [ 2] /usr/lib/libmpi.so.1(opal_crs_base_select+0x7e) [0x7f652beff28e]
[amax-1080:00334] [ 3] /usr/lib/libmpi.so.1(opal_cr_init+0x3fc) [0x7f652bf1ff1c]
[amax-1080:00334] [ 4] /usr/lib/libmpi.so.1(opal_init+0x1d0) [0x7f652bf28810]
[amax-1080:00334] [ 5] /usr/lib/libmpi.so.1(orte_init+0x37) [0x7f652beb86e7]
[amax-1080:00334] [ 6] /usr/lib/libmpi.so.1(ompi_mpi_init+0x174) [0x7f652be78024]
[amax-1080:00334] [ 7] /usr/lib/libmpi.so.1(PMPI_Init_thread+0xd4) [0x7f652be8f7f4]
[amax-1080:00334] [ 8] /usr/local/lib/python2.7/dist-packages/mpi4py/MPI.so(initMPI+0x4716) [0x7f652c27d0a6]
[amax-1080:00334] [ 9] python2(_PyImport_LoadDynamicModule+0x9b) [0x427992]
[amax-1080:00334] [10] python2() [0x55642f]
[amax-1080:00334] [11] python2() [0x4e2dec]
[amax-1080:00334] [12] python2() [0x556cf1]
[amax-1080:00334] [13] python2() [0x569c08]
[amax-1080:00334] [14] python2(PyEval_CallObjectWithKeywords+0x6b) [0x4c8c8b]
[amax-1080:00334] [15] python2(PyEval_EvalFrameEx+0x2958) [0x5264a8]
[amax-1080:00334] [16] python2() [0x567d14]
[amax-1080:00334] [17] python2(PyRun_FileExFlags+0x92) [0x465bf4]
[amax-1080:00334] [18] python2(PyRun_SimpleFileExFlags+0x2ee) [0x46612d]
[amax-1080:00334] [19] python2(Py_Main+0xb5e) [0x466d92]
[amax-1080:00334] [20] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f65b2756f45]
[amax-1080:00334] [21] python2() [0x577c2e]
[amax-1080:00334] *** End of error message ***
jxd@amax-1080:~/person_search-master$

I did not do the pretraining and am using the trained model directly.
As you said, I can run the test without MPI, so I followed "use only one GPU, remove the mpirun -n 8 in L14 and change L16 to --gpu 0", but it shows the error above. How can I solve it? Thanks.
In addition, when I use MPI as you advised, it shows the same kind of errors.

Cysu commented on May 18, 2024

It seems that you have different versions of OpenMPI installed. Say you compiled OpenMPI and installed it into a local directory such as /home/jxd/openmpi. Then please add the following lines to your ~/.bashrc:

export PATH=/home/jxd/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/home/jxd/openmpi/lib:$LD_LIBRARY_PATH

Restart the terminal, run rm -rf build, and compile Caffe again.

Jcdidi commented on May 18, 2024

Hello, I have successfully installed OpenMPI and tested that it works. Then I ran cmake for Caffe successfully, but I still get the errors above. I also tried training, and it hits the same errors.
Thanks!

jxd@amax-1080:~/person_search-master$ experiments/scripts/train.sh 0 --set EXP_DIR resnet50

+ set -e
+ export PYTHONUNBUFFERED=True
+ PYTHONUNBUFFERED=True
+ GPU_ID=0
+ NET=resnet50
+ DATASET=psdb
+ array=($@)
+ len=4
+ EXTRA_ARGS='--set EXP_DIR resnet50'
+ EXTRA_ARGS_SLUG=--set_EXP_DIR_resnet50
+ case $DATASET in
+ TRAIN_IMDB=psdb_train
+ TEST_IMDB=psdb_test
+ PT_DIR=psdb
+ ITERS=50000
++ date +%Y-%m-%d_%H-%M-%S
+ LOG=experiments/logs/psdb_train_resnet50_--set_EXP_DIR_resnet50.txt.2017-03-08_08-49-53
+ exec
++ tee -a experiments/logs/psdb_train_resnet50_--set_EXP_DIR_resnet50.txt.2017-03-08_08-49-53
+ echo Logging output to experiments/logs/psdb_train_resnet50_--set_EXP_DIR_resnet50.txt.2017-03-08_08-49-53
Logging output to experiments/logs/psdb_train_resnet50_--set_EXP_DIR_resnet50.txt.2017-03-08_08-49-53
+ python2 tools/train_net.py --gpu 0 --solver models/psdb/resnet50/solver.prototxt --weights data/imagenet_models/resnet50.caffemodel --imdb psdb_train --iters 50000 --cfg experiments/cfgs/resnet50.yml --rand --set EXP_DIR resnet50
[amax-1080:22914] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[amax-1080:22914] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[amax-1080:22914] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[amax-1080:22914] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[amax-1080:22914] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_crs_none: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)

A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: amax-1080
Framework: crs
Component: none

[amax-1080:22914] *** Process received signal ***
[amax-1080:22914] Signal: Segmentation fault (11)
[amax-1080:22914] Signal code: Address not mapped (1)
[amax-1080:22914] Failing at address: 0x28
[amax-1080:22914] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330) [0x7f0a35507330]
[amax-1080:22914] [ 1] /usr/lib/libmpi.so.1(mca_base_select+0x11e) [0x7f09acb8bf1e]
[amax-1080:22914] [ 2] /usr/lib/libmpi.so.1(opal_crs_base_select+0x7e) [0x7f09acb7428e]
[amax-1080:22914] [ 3] /usr/lib/libmpi.so.1(opal_cr_init+0x3fc) [0x7f09acb94f1c]
[amax-1080:22914] [ 4] /usr/lib/libmpi.so.1(opal_init+0x1d0) [0x7f09acb9d810]
[amax-1080:22914] [ 5] /usr/lib/libmpi.so.1(orte_init+0x37) [0x7f09acb2d6e7]
[amax-1080:22914] [ 6] /usr/lib/libmpi.so.1(ompi_mpi_init+0x174) [0x7f09acaed024]
[amax-1080:22914] [ 7] /usr/lib/libmpi.so.1(PMPI_Init_thread+0xd4) [0x7f09acb047f4]
[amax-1080:22914] [ 8] /usr/local/lib/python2.7/dist-packages/mpi4py/MPI.so(initMPI+0x4716) [0x7f09acef20a6]
[amax-1080:22914] [ 9] python2(_PyImport_LoadDynamicModule+0x9b) [0x427992]
[amax-1080:22914] [10] python2() [0x55642f]
[amax-1080:22914] [11] python2() [0x4e2dec]
[amax-1080:22914] [12] python2() [0x556cf1]
[amax-1080:22914] [13] python2() [0x569c08]
[amax-1080:22914] [14] python2(PyEval_CallObjectWithKeywords+0x6b) [0x4c8c8b]
[amax-1080:22914] [15] python2(PyEval_EvalFrameEx+0x2958) [0x5264a8]
[amax-1080:22914] [16] python2() [0x567d14]
[amax-1080:22914] [17] python2(PyRun_FileExFlags+0x92) [0x465bf4]
[amax-1080:22914] [18] python2(PyRun_SimpleFileExFlags+0x2ee) [0x46612d]
[amax-1080:22914] [19] python2(Py_Main+0xb5e) [0x466d92]
[amax-1080:22914] [20] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f0a35153f45]
[amax-1080:22914] [21] python2() [0x577c2e]
[amax-1080:22914] *** End of error message ***
experiments/scripts/train.sh: line 47: 22914 Segmentation fault (core dumped) python2 tools/train_net.py --gpu ${GPU_ID} --solver models/${PT_DIR}/${NET}/solver.prototxt --weights data/imagenet_models/${NET}.caffemodel --imdb ${TRAIN_IMDB} --iters ${ITERS} --cfg experiments/cfgs/${NET}.yml --rand ${EXTRA_ARGS}

Cysu commented on May 18, 2024

Could you please check the output of the following commands:

which mpirun
ldd $(which mpirun) | grep mpi
ldd caffe/build/install/bin/caffe | grep mpi

Jcdidi commented on May 18, 2024

Yeah, maybe I did not build Caffe successfully, since there is no output for it?

ldd: caffe/build/install/bin/caffe: No such file or directory

jxd@amax-1080:~$ which mpirun
/usr/local/openmpi/bin/mpirun
jxd@amax-1080:~$ ldd $(which mpirun) | grep mpi
libopen-rte.so.12 => /usr/local/openmpi/lib/libopen-rte.so.12 (0x00007f75c7edc000)
libopen-pal.so.13 => /usr/local/openmpi/lib/libopen-pal.so.13 (0x00007f75c7bfe000)
jxd@amax-1080:~$ ldd caffe/build/install/bin/caffe | grep mpi
ldd: caffe/build/install/bin/caffe: No such file or directory

Cysu commented on May 18, 2024

OK. You have another self-compiled openmpi installed at /usr/local/openmpi. So you need to add these lines to ~/.bashrc:

export PATH=/usr/local/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH

Restart the terminal, remove the build directory under caffe, and recompile it following the steps in the README file.

Jcdidi commented on May 18, 2024

Yes, I added these lines to ~/.bashrc and recompiled yesterday. Are there two OpenMPI installations on the system?
I will now remove the build directory and recompile again. Thanks.

Cysu commented on May 18, 2024

Right. Your previous log complains:

mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)

So you have a system-installed OpenMPI under /usr/lib and a self-installed one at /usr/local/openmpi.
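The stack traces earlier in the thread show mpi4py's MPI.so calling into /usr/lib/libmpi.so.1, so the Python extension is worth checking as well as mpirun. A small sketch that filters ldd output down to the directories the MPI libraries resolve to; the function name mpi_lib_dirs is hypothetical.

```shell
# Hypothetical helper: read `ldd` output on stdin and print the unique
# directories from which libmpi / libopen-rte / libopen-pal are resolved.
mpi_lib_dirs() {
    awk '$2 == "=>" && $1 ~ /^lib(mpi|open-rte|open-pal)/ { print $3 }' \
        | sed 's|/[^/]*$||' \
        | LC_ALL=C sort -u
}
```

For example, ldd /usr/local/lib/python2.7/dist-packages/mpi4py/MPI.so | mpi_lib_dirs (path taken from the trace above). If the output lists /usr/lib while ldd $(which mpirun) | mpi_lib_dirs lists /usr/local/openmpi/lib, the two are using different installations.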

Jcdidi commented on May 18, 2024

Thanks a lot!
I found the issue. I added this line to ~/.bashrc:

export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so

all detection:
recall = 79.37%
ap = 74.82%
labeled only detection:
recall = 97.76%
search ranking:
mAP = 75.41%
top- 1 = 78.48%
top- 5 = 90.07%
top-10 = 92.34%
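Exporting LD_PRELOAD from ~/.bashrc applies it to every program started from that shell. A narrower variant, sketched here as an assumption rather than something tested in this thread, sets it only for the command being launched. The names with_local_mpi and MPI_LIB are made up for illustration; the default path is the one from the export line above.

```shell
# Hypothetical wrapper: run one command with the self-installed libmpi
# preloaded, without changing the environment of the rest of the shell.
# MPI_LIB (an assumed variable name) can override the library path.
with_local_mpi() {
    env LD_PRELOAD="${MPI_LIB:-/usr/local/openmpi/lib/libmpi.so}" "$@"
}
```

For example, with_local_mpi experiments/scripts/eval_test.sh resnet50 50000 resnet50 would run the evaluation with the preload in effect for that process tree only.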

Cysu commented on May 18, 2024

Good to hear that! Will close the issue for now, and please feel free to reopen it if there are further problems.
