byteps's Introduction

BytePS

BytePS is a high-performance and general distributed training framework. It supports TensorFlow, Keras, PyTorch, and MXNet, and can run on either TCP or RDMA networks.

BytePS outperforms existing open-source distributed training frameworks by a large margin. For example, on BERT-large training, BytePS can achieve ~90% scaling efficiency with 256 GPUs (see below), which is much higher than Horovod+NCCL. In certain scenarios, BytePS can double the training speed compared with Horovod+NCCL.

News

  • BytePS paper has been accepted to OSDI'20. The code to reproduce the end-to-end evaluation is available here.
  • Support gradient compression.
  • v0.2.4
    • Fix compatibility issue with tf2 + standalone keras
    • Add support for tensorflow.keras
    • Improve robustness of broadcast
  • v0.2.3
    • Add DistributedDataParallel module for PyTorch
    • Fix the problem of different CPU tensors using the same name
    • Add skip_synchronize api for PyTorch
    • Add the option for lazy/non-lazy init
  • v0.2.0
    • Largely improve RDMA performance by enforcing page aligned memory.
    • Add IPC support for RDMA. Now support colocating servers and workers without sacrificing much performance.
    • Fix a hanging bug in BytePS server.
    • Fix RDMA-related segmentation fault problem during fork() (e.g., used by PyTorch data loader).
    • New feature: Enable mixed use of colocated and non-colocated servers, along with a smart tensor allocation strategy.
    • New feature: Add bpslaunch as the command to launch tasks.
    • Add support for pip install: pip3 install byteps

Performance

We show our experiment on BERT-large training, which is based on GluonNLP toolkit. The model uses mixed precision.

We use Tesla V100 32GB GPUs and set the batch size to 64 per GPU. Each machine has 8 V100 GPUs (32GB memory) with NVLink enabled. Machines are interconnected with a 100 Gbps RDMA network. This is the same hardware setup you can get on AWS.

BytePS achieves ~90% scaling efficiency for BERT-large with 256 GPUs. The code is available here. As a comparison, Horovod+NCCL has only ~70% scaling efficiency even after expert parameter tuning.
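Assuming the usual definition of scaling efficiency (aggregate throughput on N GPUs divided by N times the single-GPU throughput), the percentages above reduce to one line of arithmetic. The sketch below uses hypothetical throughput numbers chosen only to make the arithmetic exact; they are not measurements from this benchmark.

```python
def scaling_efficiency(throughput_n: float, n_gpus: int, throughput_1: float) -> float:
    """Aggregate N-GPU throughput relative to perfect linear scaling."""
    if n_gpus <= 0 or throughput_1 <= 0:
        raise ValueError("n_gpus and throughput_1 must be positive")
    return throughput_n / (n_gpus * throughput_1)


# Hypothetical numbers: 100 samples/s on one GPU, 23040 samples/s on 256 GPUs.
eff = scaling_efficiency(23040.0, 256, 100.0)
print(f"{eff:.0%}")  # prints "90%"
```

An efficiency of 1.0 means perfectly linear scaling; the gap below 1.0 is time lost to communication and synchronization.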

BERT-Large

With slower networks, BytePS offers even more performance advantages: up to 2x the speed of Horovod+NCCL. You can find more evaluation results at performance.md.

Goodbye MPI, Hello Cloud

How can BytePS outperform Horovod by so much? One of the main reasons is that BytePS is designed for cloud and shared clusters, and throws away MPI.

MPI was born in the HPC world and is good for a cluster built with homogeneous hardware and for running a single job. However, the cloud (or an in-house shared cluster) is different.

This leads us to rethink the best communication strategy, as explained here. In short, BytePS uses NCCL only inside a machine and re-implements the inter-machine communication.
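The two-level idea (reduce inside each machine first, then across machines) can be illustrated with a toy, pure-Python reduction. This is not BytePS code: lists of floats stand in for GPU gradient buffers, and the intra-machine step merely plays the role that NCCL plays in BytePS.

```python
from typing import List

Vector = List[float]


def elementwise_sum(vectors: List[Vector]) -> Vector:
    """Sum equal-length vectors element by element."""
    return [sum(values) for values in zip(*vectors)]


def hierarchical_sum(machines: List[List[Vector]]) -> Vector:
    """Two-level reduction; machines[m][g] is the gradient of GPU g on machine m."""
    # Step 1: each machine reduces its own GPUs locally.
    per_machine = [elementwise_sum(gpus) for gpus in machines]
    # Step 2: only one partial sum per machine needs to cross the network.
    return elementwise_sum(per_machine)


machines = [
    [[1.0, 2.0], [3.0, 4.0]],  # machine 0, two GPUs
    [[5.0, 6.0], [7.0, 8.0]],  # machine 1, two GPUs
]
print(hierarchical_sum(machines))  # [16.0, 20.0]
```

The result equals a flat sum over all GPUs; the benefit is that network traffic scales with the number of machines rather than the number of GPUs.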

BytePS also incorporates many acceleration techniques such as hierarchical strategy, pipelining, tensor partitioning, NUMA-aware local communication, priority-based scheduling, etc.
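Two of these techniques, tensor partitioning and priority-based scheduling, can be sketched together under simplifying assumptions: each gradient is split into fixed-size chunks, and chunks are ordered by a priority of `-layer_index`, mirroring the `priority=-index` argument that appears in a traceback further down this page. The partition size, the heap-based queue, and the helper names are illustrative choices, not BytePS internals; a real scheduler also reorders dynamically as gradients become ready during backpropagation, which this static sketch does not model.

```python
import heapq
from typing import List, Tuple


def partition(num_bytes: int, partition_bytes: int) -> List[Tuple[int, int]]:
    """Split a tensor of num_bytes into (offset, length) chunks of at most partition_bytes."""
    if partition_bytes <= 0:
        raise ValueError("partition_bytes must be positive")
    return [(offset, min(partition_bytes, num_bytes - offset))
            for offset in range(0, num_bytes, partition_bytes)]


def send_order(layer_sizes: List[int], partition_bytes: int) -> List[Tuple[int, int, int]]:
    """Return (layer, offset, length) chunks in the order they would be sent.

    Each layer gets priority -layer_index, so layers nearer the input, whose
    parameters the next forward pass needs first, go out first.
    """
    heap: List[Tuple[int, int, int, int]] = []
    for layer, size in enumerate(layer_sizes):
        priority = -layer
        for offset, length in partition(size, partition_bytes):
            # heapq pops the smallest tuple, so store -priority to pop the
            # highest priority first; offset keeps a layer's chunks in order.
            heapq.heappush(heap, (-priority, offset, layer, length))
    order = []
    while heap:
        _, offset, layer, length = heapq.heappop(heap)
        order.append((layer, offset, length))
    return order


# Hypothetical sizes: layer 0 holds 10 bytes, layer 1 holds 4 bytes.
print(send_order([10, 4], partition_bytes=4))
# [(0, 0, 4), (0, 4, 4), (0, 8, 2), (1, 0, 4)]
```

Smaller partitions let high-priority chunks overtake large low-priority tensors sooner, at the cost of more per-message overhead.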

Quick Start

We provide a step-by-step tutorial for running benchmark training tasks. The simplest way to start is to use our Docker images. Refer to the documentation for how to launch distributed jobs and for more detailed configurations. Once you can start BytePS, read the best practice guide to get the best performance.

Below, we explain how to install BytePS by yourself. There are two options.

Install by pip

pip3 install byteps

Build from source code

You can try out the latest features by directly installing from master branch:

git clone --recursive https://github.com/bytedance/byteps
cd byteps
python3 setup.py install

Notes for above two options:

  • BytePS assumes that you have already installed one or more of the following frameworks: TensorFlow / PyTorch / MXNet.
  • BytePS depends on CUDA and NCCL. You should specify the NCCL path with export BYTEPS_NCCL_HOME=/path/to/nccl. By default it points to /usr/local/nccl.
  • The installation requires gcc>=4.9. If you are working on CentOS/Redhat and have gcc<4.9, you can try yum install devtoolset-7 before everything else. In general, we recommend using gcc 4.9 for best compatibility (how to pin gcc).
  • RDMA support: During setup, the script will automatically detect the RDMA header file. If you want to use RDMA, make sure your RDMA environment has been properly installed and tested before installing BytePS (install on Ubuntu-18.04).
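A small pre-install check for the NCCL and gcc notes might look like the following. The helper names are hypothetical; only the `BYTEPS_NCCL_HOME` variable, its `/usr/local/nccl` default, and the gcc>=4.9 requirement come from the notes. Looking for `include/nccl.h` under the NCCL home is an assumption about a typical NCCL layout.

```python
import os
import shutil
import subprocess
from typing import Mapping, Optional, Tuple

DEFAULT_NCCL_HOME = "/usr/local/nccl"  # default stated in the notes
MIN_GCC = (4, 9)                       # minimum stated in the notes


def nccl_home(env: Optional[Mapping[str, str]] = None) -> str:
    """Return BYTEPS_NCCL_HOME if set, otherwise the documented default."""
    env = os.environ if env is None else env
    return env.get("BYTEPS_NCCL_HOME", DEFAULT_NCCL_HOME)


def parse_gcc_version(text: str) -> Tuple[int, int]:
    """Parse `gcc -dumpversion` output such as "5.4.0" or "9" into (major, minor)."""
    parts = text.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def gcc_ok(version: Tuple[int, int]) -> bool:
    """True if the version meets the documented gcc>=4.9 requirement."""
    return version >= MIN_GCC


def installed_gcc_version() -> Optional[Tuple[int, int]]:
    """Ask the gcc on PATH for its version; None if gcc is missing or silent."""
    if shutil.which("gcc") is None:
        return None
    result = subprocess.run(["gcc", "-dumpversion"], capture_output=True, text=True)
    if result.returncode != 0 or not result.stdout.strip():
        return None
    return parse_gcc_version(result.stdout)


if __name__ == "__main__":
    home = nccl_home()
    header = os.path.join(home, "include", "nccl.h")  # assumed NCCL layout
    print(f"NCCL home: {home} (nccl.h present: {os.path.isfile(header)})")
    version = installed_gcc_version()
    if version is None:
        print("gcc not found")
    else:
        print(f"gcc {version[0]}.{version[1]} meets >= 4.9: {gcc_ok(version)}")
```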

Examples

Basic examples are provided under the example folder.

To reproduce the end-to-end evaluation in our OSDI'20 paper, find the code at this repo.

Use BytePS in Your Code

Though totally different at its core, BytePS is highly compatible with Horovod interfaces (thank you, Horovod community!). We chose Horovod interfaces to minimize your effort in testing BytePS.

If your tasks rely only on Horovod's allreduce and broadcast, you should be able to switch to BytePS in 1 minute. Simply replace import horovod.tensorflow as hvd with import byteps.tensorflow as bps, and then replace all hvd in your code with bps. If your code invokes hvd.allreduce directly, you should also replace it with bps.push_pull.
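The substitution is mechanical enough to sketch as a text rewrite. The helper below is hypothetical (not part of BytePS) and applies only the three rules stated above; review the diff afterwards, since a word-boundary rewrite also touches matching words inside strings and comments.

```python
import re


def horovod_to_byteps(source: str) -> str:
    """Apply the three rewrite rules above to Horovod TensorFlow code (hypothetical helper)."""
    # Rule 1: swap the import line.
    source = source.replace("import horovod.tensorflow as hvd",
                            "import byteps.tensorflow as bps")
    # Rule 3 before rule 2, so hvd.allreduce becomes bps.push_pull, not bps.allreduce.
    source = re.sub(r"\bhvd\.allreduce\b", "bps.push_pull", source)
    # Rule 2: every remaining standalone `hvd` identifier becomes `bps`.
    return re.sub(r"\bhvd\b", "bps", source)


before = "import horovod.tensorflow as hvd\nhvd.init()\ng = hvd.allreduce(g)\n"
print(horovod_to_byteps(before))
# import byteps.tensorflow as bps
# bps.init()
# g = bps.push_pull(g)
```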

Many of our examples were copied from Horovod and modified in this way. For instance, compare the MNIST example for BytePS and Horovod.

BytePS also supports other native APIs, e.g., PyTorch Distributed Data Parallel and TensorFlow Mirrored Strategy. See DistributedDataParallel.md and MirroredStrategy.md for usage.

Limitations and Future Plans

BytePS does not support pure CPU training for now. One reason is that BytePS's cheap-PS assumption does not hold for CPU training. Consequently, you need CUDA and NCCL to build and run BytePS.

We would like to have the features below, and there is no fundamental difficulty in implementing them in the BytePS architecture. However, they are not implemented yet:

  • Sparse model training
  • Fault-tolerance
  • Straggler-mitigation

Publications

  1. [OSDI'20] "A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters". Yimin Jiang, Yibo Zhu, Chang Lan, Bairen Yi, Yong Cui, Chuanxiong Guo.

  2. [SOSP'19] "A Generic Communication Scheduler for Distributed DNN Training Acceleration". Yanghua Peng, Yibo Zhu, Yangrui Chen, Yixin Bao, Bairen Yi, Chang Lan, Chuan Wu, Chuanxiong Guo. (Code is at bytescheduler branch)

byteps's People

Contributors

azuresol, bobzhuyb, changlan, dbonner, eric-haibin-lin, gongwei-130, haoxintong, hugozhl, jasonliu747, jasperzhong, joapolarbear, juncgu, laochonlam, pengyanghua, pleasantrabbit, un-knight, vincentleemax, wuyifan18, ymjiang

byteps's Issues

How did you get the horovod & bytePS performance

I have the same hardware environment and the same network, but I could not reproduce your results; I get roughly half of your numbers. Are there any best practices or tips? Thanks very much! For BytePS with 1 instance and 8 GPUs, I get similar test results.

which version of tensorflow is required by byteps

I installed tensorflow 1.10, but when I ran byteps, the logs showed AttributeError: 'module' object has no attribute 'v1'. I think it might be caused by my mismatched tf. So, which version of tensorflow does byteps support?

$ DMLC_ROLE=worker DMLC_PS_ROOT_URI=12.12.10.12 DMLC_PS_ROOT_PORT=9000 DMLC_WORKER_ID=0 DMLC_NUM_WORKER=1 DMLC_NUM_SERVER=1 python launcher/launch.py python example/tensorflow/tensorflow_mnist.py
BytePS launching worker
INFO:tensorflow:Create CheckpointSaverHook.
Traceback (most recent call last):
  File "example/tensorflow/tensorflow_mnist.py", line 160, in <module>
    tf.app.run()
  File "/home/shuai/.conda/envs/byteps/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "example/tensorflow/tensorflow_mnist.py", line 152, in main
    config=config) as mon_sess:
  File "/home/shuai/.conda/envs/byteps/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 421, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/shuai/.conda/envs/byteps/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 832, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/shuai/.conda/envs/byteps/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 548, in __init__
    h.begin()
  File "/home/shuai/.conda/envs/byteps/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/tensorflow/__init__.py", line 107, in begin
    self.bcast_op = broadcast_global_variables(self.root_rank)
  File "/home/shuai/.conda/envs/byteps/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/tensorflow/__init__.py", line 66, in broadcast_global_variables
    return broadcast_variables(tf.global_variables(), root_rank, scope)
  File "/home/shuai/.conda/envs/byteps/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/tensorflow/__init__.py", line 78, in broadcast_variables
    for var in variables])
  File "/home/shuai/.conda/envs/byteps/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/tensorflow/ops.py", line 116, in broadcast
    scope = tf.compat.v1.get_default_graph().get_name_scope()
AttributeError: 'module' object has no attribute 'v1'
^C^C^C^C^C^C^C^C^C^CTerminated

BytePS requires root permission

Describe the bug
BytePS creates its sockets at these hard-coded paths:

#define BASE_SOCKET_PATH_RECV "/usr/local/socket_recv_"
#define BASE_SOCKET_PATH_SEND "/usr/local/socket_send_"

As a result, BytePS cannot be used without root permission. Running BytePS without root permission gives:

[2019-06-27 11:01:05.326916: F byteps/common/communicator.cc:135] Check failed: (ret) >= (0) /usr/local/socket_send_0 bind failed: Permission denied

"set_mempolicy: Operation not permitted" and performance degradation in 8GPU with single machine

Describe the bug
I use 4 GPUs (1080 Ti) in a single machine and it performs well, but when I use 8 GPUs, BytePS's performance degrades from 161.8 img/sec per GPU to 17.4 img/sec per GPU, and there are warnings saying "set_mempolicy: Operation not permitted".

To Reproduce
Steps to reproduce the behavior:
Just change the number of GPUs in the step-by-step tutorial.

Environment (please complete the following information):

  • OS: ubuntu
  • GCC version: Ubuntu 5.4.0-6ubuntu1~16.04.11
  • CUDA and NCCL version: CUDA Version 9.0.176
  • Framework (TF, PyTorch, MXNet): TensorFlow
  • Using the BytePS Docker image

Has anyone tested this with BERT training?

I'm very curious about the performance on BERT training. I am currently training BERT with Horovod on 32 V100 GPUs, but the speedup ratio is very low. If anyone has tested this, please let me know.

more workers (distributed mode) perform worse than single worker (standalone mode)?

I'm running the PyTorch ResNet-50 benchmark but get the following unexpected results. The GPUs are the same, and the workers are on the same LAN, but the network bandwidth between worker, server, and scheduler differs somewhat. Is there something wrong with my experiment?

1. Single worker with 4 GPUs: 156 img/sec per GPU.

2. 2 workers with 4 GPUs each (8 GPUs in total): 40~50 img/sec per GPU.

Build Error with mxnet extension

Describe the bug
Building BytePS with the MXNet extension fails.

The output of import byteps.mxnet as bps is:

OSError: /home/anaconda3/lib/python3.5/site-packages/byteps-0.1.0-py3.5-linux-x86_64.egg/byteps/mxnet/c_lib.cpython-35m-x86_64-linux-gnu.so: cannot open shared object file: No such file or directory

Envs

  • OS: ubuntu16.04 and 18.04
  • GCC version: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
  • CUDA and NCCL version: cuda10.0
  • MXNet version: cu100 1.5.0b20190418

To Reproduce

python setup.py install

Error Info:

byteps/mxnet/tensor_util.cc: In static member function ‘static void byteps::mxnet::TensorUtil::ResizeNd(mxnet::NDArray*, int, int64_t*)’:
byteps/mxnet/tensor_util.cc:139:29: error: no matching function for call to ‘mxnet::TShape::TShape(int&)’
   TShape mx_shape(nDimension);
                             ^

In mxnet/tuple.h there is no constructor that takes only a dimension, so I changed the code from TShape mx_shape(nDimension) to TShape mx_shape(nDimension, 0).
Then it works fine for me.

I'm not sure whether the cause is the MXNet version.

Why not use workers' CPUs as PS?

Hello,

May I ask why the workers' CPUs are not used as the PS?

Why should the PS be placed on a physically different machine from the workers? Data has to be transferred between the workers and the PS.

If the local CPU were used as the PS, the communication cost between the local server and worker should be lower.

Can you help me understand this point?

kvstore='device' cause `TypeError: bad operand type for unary -: 'str'`

Describe the bug
I set kvstore='device' in the model.fit() method. As a result, at byteps/byteps/mxnet/__init__.py line 50 the variable index is a str rather than an int, which raises TypeError: bad operand type for unary -: 'str'

To Reproduce

  1. The environment and code are the same as the referenced code.
  2. I only changed the argument kvstore=None to kvstore='device' in model.fit() in train_bps.py.
  3. I got an exception:
    Traceback (most recent call last):
      File "_ctypes/callbacks.c", line 315, in 'calling callback function'
      File "/usr/local/lib/python2.7/dist-packages/mxnet/kvstore.py", line 85, in updater_handle
        updater(key, lhs, rhs)
      File "/usr/local/lib/python2.7/dist-packages/mxnet/optimizer/optimizer.py", line 1531, in __call__
        self.optimizer.update_multi_precision(index, weight, grad, self.states[index])
      File "/usr/local/lib/python2.7/dist-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/mxnet/__init__.py", line 61, in update_multi_precision
        self._do_push_pull(index, grad)
      File "/usr/local/lib/python2.7/dist-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/mxnet/__init__.py", line 52, in _do_push_pull
        byteps_push_pull(grad, version=0, priority=-index,
    TypeError: bad operand type for unary -: 'str'
    However, the worker process did not exit.
    I also found that the index that caused the TypeError was the same as the name variable in _update_params_on_kvstore() in mxnet/model.py.

Environment (please complete the following information):

  • OS: 16.04.3 LTS
  • GCC version: 4.9
  • CUDA and NCCL version: 9.0 and d7a58cfa5865c4f627a128c3238cc72502649881
  • Framework (TF, PyTorch, MXNet): MXNet
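The traceback shows `priority=-index` being evaluated with a string key, which Python rejects. One defensive option, sketched here as a hypothetical helper rather than the project's actual fix, is to map each string key to a stable integer the first time it is seen and negate that instead. It assumes every worker encounters the keys in the same order, which is plausible when all workers iterate parameters identically but is not verified here.

```python
from typing import Dict, Union


class PriorityMapper:
    """Map int or str parameter keys to integer priorities (hypothetical helper)."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def priority(self, index: Union[int, str]) -> int:
        if isinstance(index, int):
            return -index  # existing behaviour for integer keys
        # `-index` on a str raises the reported TypeError, so give each string
        # key a consecutive id in first-seen order and negate that instead.
        if index not in self._ids:
            self._ids[index] = len(self._ids)
        return -self._ids[index]


try:
    -"fc1_weight"
except TypeError as err:
    print(err)  # bad operand type for unary -: 'str'

mapper = PriorityMapper()
print(mapper.priority(3), mapper.priority("fc1_weight"), mapper.priority("fc1_bias"))
# -3 0 -1
```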

run example failed

Describe the bug
I installed BytePS with gcc 4.9 and TensorFlow 1.11.0.

When I run python3 keras_mnist.py, the following error occurs:
byteps-0.1.0-py3.6-linux-x86_64.egg/byteps/tensorflow/c_lib.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZNK10tensorflow8OpKernel4nameB5cxx11Ev

Environment (please complete the following information):

  • OS: ubuntu 18.04
  • GCC version: gcc49
  • CUDA and NCCL version: cuda 10, nccl 2.4
  • Framework (TF, PyTorch, MXNet): tf = 1.11.0 keras = 2.2.4

Parameter sharding ?

Hi,

I am just curious. In your document, you specify that:

  • PS: do gradient reduction
  • Worker: apply gradient update

So does it mean that each worker should hold all the parameters?
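Read literally, that split means the servers only sum gradients while each worker keeps a full copy of the parameters and applies the same update. A toy single-process simulation of that division of labour follows; it uses plain SGD and made-up numbers, makes no BytePS calls, and does not model how tensors are spread across multiple servers.

```python
from typing import List

Vector = List[float]


def server_reduce(grads: List[Vector]) -> Vector:
    """Parameter-server role: element-wise sum of the workers' gradients."""
    return [sum(g) for g in zip(*grads)]


def worker_update(params: Vector, summed: Vector, n_workers: int, lr: float) -> Vector:
    """Worker role: average the summed gradient and apply an SGD step locally."""
    return [p - lr * s / n_workers for p, s in zip(params, summed)]


# Every worker starts from the same full replica of the parameters.
replicas = [[1.0, 1.0], [1.0, 1.0]]
local_grads = [[0.25, 0.5], [0.75, 0.0]]

summed = server_reduce(local_grads)  # [1.0, 0.5]
replicas = [worker_update(r, summed, n_workers=2, lr=0.5) for r in replicas]
print(replicas)  # [[0.75, 0.875], [0.75, 0.875]]
```

Because every worker applies the same update to the same starting values, the replicas stay identical without the servers ever storing the parameters.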

CUDA runtime error when running with pytorch benchmark_byteps.py

Describe the bug
I got a CUDA runtime error when running the PyTorch benchmark_byteps.py.

Error info:

BytePS launching worker
running benchmark...
Model: resnet50
Batch size: 32
Number of GPUs: 1
Running warmup...
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument
Traceback (most recent call last):
  File "/usr/local/byteps/example/pytorch/benchmark_byteps.py", line 109, in <module>
    timeit.timeit(benchmark_step, number=args.num_warmup_batches)
  File "/usr/lib/python2.7/timeit.py", line 237, in timeit
    return Timer(stmt, setup, timer).timeit(number)
  File "/usr/lib/python2.7/timeit.py", line 202, in timeit
    timing = self.inner(it, self.timer)
  File "/usr/lib/python2.7/timeit.py", line 100, in inner
    _func()
  File "/usr/local/byteps/example/pytorch/benchmark_byteps.py", line 90, in benchmark_step
    output = model(data)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torchvision/models/resnet.py", line 150, in forward
    x = self.conv1(x)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/conv.py", line 320, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:405

To Reproduce
Steps to reproduce the behavior:
I followed the step-by-step tutorial and used the official bytepsimage/worker_pytorch image.

Environment (please complete the following information):
Same as the official BytePS PyTorch worker image.

NCCL error: invalid usage

Describe the bug
When running example/pytorch/train_mnist_byteps.py, an error is raised at bps.broadcast_parameters(model.state_dict(), root_rank=0):
byteps/common/core_loops.cc :309] Check failed: r == ncclSuccess NCCL error: invalid usage

To Reproduce
Steps to reproduce the behavior:

export NVIDIA_VISIBLE_DEVICES=4,5,6,7  # say you have 4 GPUs
export CUDA_VISIBLE_DEVICES=4,5,6,7  # say you have 4 GPUs
export DMLC_WORKER_ID=0 # your worker id
export DMLC_NUM_WORKER=1 # you only have one worker
export DMLC_ROLE=worker # your role is worker

export DMLC_NUM_SERVER=1
export DMLC_PS_ROOT_URI=127.0.0.1
export DMLC_PS_ROOT_PORT=1234

python byteps/launcher/launch.py python example/pytorch/train_mnist_byteps.py

Expected behavior
No error.

Environment (please complete the following information):

  • OS: Ubuntu 16.04
  • GCC version: 5.4.0
  • CUDA and NCCL version: CUDA 9.0, NCCL 2.4.8
  • Framework (TF, PyTorch, MXNet): Pytorch

How to run distributed TensorFlow

Hi,

I want to run distributed TensorFlow on my physical machine, with the scheduler, the server, and the only worker all on the same physical machine (without Docker). I only installed TensorFlow, without MXNet or PyTorch. When I run the command

$ DMLC_ROLE=scheduler DMLC_PS_ROOT_URI=12.12.10.12 DMLC_PS_ROOT_PORT=9000 DMLC_NUM_WORKER=1 DMLC_NUM_SERVER=1 python launcher/launch.py

the output is

BytePS launching scheduler
BYTEPS_SERVER_MXNET_PATH env not set

The server's output is similar. Do I have to install MXNet to run distributed training, even though I only use TensorFlow?

tensorflow distribute run error

I followed the step-by-step tutorial to run the distributed TensorFlow demo with Docker, and it fails at around step 10000:

[06:42:50] 3rdparty/ps-lite/include/dmlc/logging.h:276: [06:42:50] src/./zmq_van.h:304: Check failed: 0 failed to receive message. errno: 88 Socket operation on non-socket

Stack trace returned 6 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/tensorflow/c_lib.so(+0x31ef8) [0x7f0ce560eef8]
[bt] (1) /usr/local/lib/python2.7/dist-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/tensorflow/c_lib.so(+0x32e19) [0x7f0ce560fe19]
[bt] (2) /usr/local/lib/python2.7/dist-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/tensorflow/c_lib.so(+0x83c08) [0x7f0ce5660c08]
[bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f0d6e4b1c80]
[bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f0e0dc2a6ba]
[bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f0e0d96041d]

terminate called after throwing an instance of 'dmlc::Error'
what(): [06:42:50] src/./zmq_van.h:304: Check failed: 0 failed to receive message. errno: 88 Socket operation on non-socket

Stack trace returned 6 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/tensorflow/c_lib.so(+0x31ef8) [0x7f0ce560eef8]
[bt] (1) /usr/local/lib/python2.7/dist-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/tensorflow/c_lib.so(+0x32e19) [0x7f0ce560fe19]
[bt] (2) /usr/local/lib/python2.7/dist-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/tensorflow/c_lib.so(+0x83c08) [0x7f0ce5660c08]
[bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f0d6e4b1c80]
[bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f0e0dc2a6ba]
[bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f0e0d96041d]

/usr/local/byteps/example/tensorflow/run_tensorflow_byteps.sh: line 14: 333 Aborted (core dumped) python $path/tensorflow_mnist.py $@
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/byteps/launcher/launch.py", line 18, in worker
    subprocess.check_call(command, env=my_env, stdout=sys.stdout, stderr=sys.stderr, shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '/usr/local/byteps/example/tensorflow/run_tensorflow_byteps.sh --model ResNet50 --num-iters 1000' returned non-zero exit status 134

Bug in byteps.mxnet broadcast_parameters when processing gluon deferred initialization parameters.

Describe the bug

for _, p in sorted(params.items()):
    try:
        tensors.append(p.data())
    except mx.gluon.parameter.DeferredInitializationError:
        # Inject wrapper method with post-initialization broadcast to
        # handle parameters with deferred initialization
        global parameter_index
        byteps_declare_tensor(p.data(), "parameter_"+str(parameter_index))
        new_init = _append_broadcast_init(p, root_rank, parameter_index)
        parameter_index += 1

When broadcasting Block.collect_params(), each iteration of the loop:

  1. tries p.data();
  2. if it catches DeferredInitializationError, calls p.data() again.

The second call then raises the same exception.

In example/mxnet-gluon, I use model.summary(...) to initialize parameters before broadcasting them to other processes. But there should be a better way to address this.

One possible way is to put the broadcast in DistributedTrainer. When the first training iteration runs, all parameters with deferred initialization finish their initialization, and then we can run declare_tensor and push_pull.
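The proposed direction (declare and broadcast a deferred parameter only once it has been initialized) can be sketched without MXNet. `Param`, `DeferredInit`, `broadcast_ready`, and `broadcast_deferred` below are hypothetical stand-ins for `mx.gluon.Parameter`, `DeferredInitializationError`, and the BytePS declare/push_pull pair; the point is that the `except` branch records the parameter instead of calling `p.data()` again.

```python
from typing import Callable, Dict, List, Optional

Broadcast = Callable[[str, List[float]], None]


class DeferredInit(Exception):
    """Stand-in for mx.gluon.parameter.DeferredInitializationError."""


class Param:
    """Stand-in for a Gluon parameter whose shape may be unknown until first use."""

    def __init__(self, name: str, value: Optional[List[float]] = None) -> None:
        self.name = name
        self._value = value

    def data(self) -> List[float]:
        if self._value is None:
            raise DeferredInit(self.name)
        return self._value

    def finish_init(self, value: List[float]) -> None:
        """Simulate the initialization that happens on the first forward pass."""
        self._value = value


def broadcast_ready(params: Dict[str, Param], broadcast: Broadcast) -> List[Param]:
    """Broadcast initialized parameters now and return the deferred ones."""
    deferred: List[Param] = []
    for name, p in sorted(params.items()):
        try:
            value = p.data()
        except DeferredInit:
            # Calling p.data() again here would raise the same error, which is
            # the bug described above; remember the parameter instead.
            deferred.append(p)
            continue
        broadcast(name, value)
    return deferred


def broadcast_deferred(deferred: List[Param], broadcast: Broadcast) -> None:
    """Call after the first forward pass, once deferred shapes are known."""
    for p in deferred:
        broadcast(p.name, p.data())


sent: List[str] = []
params = {"dense0_weight": Param("dense0_weight"), "dense0_bias": Param("dense0_bias", [0.0])}
pending = broadcast_ready(params, lambda name, value: sent.append(name))
params["dense0_weight"].finish_init([0.1, 0.2])  # first forward pass
broadcast_deferred(pending, lambda name, value: sent.append(name))
print(sent)  # ['dense0_bias', 'dense0_weight']
```

In a real trainer, the `broadcast_deferred` step would naturally live in the first call of the training step, which matches the suggestion to move the broadcast into DistributedTrainer.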

Can I use my own container?

Hello. I have been using Horovod for TensorFlow distributed training for a long time. Given the performance comparison, I really want to try BytePS. However, I have a customized container that includes some dependencies and, most importantly, our own version of TensorFlow. Is it possible to run with our own containers, or to make some modifications to the official container?

Stuck with keras example provided by byteps

My command is:
python /usr/local/byteps/launcher/launch.py /usr/local/byteps/example/keras/run_keras.sh --batch-size=32 --num-iters=2
Both workers get stuck: [screenshots of the two workers]
Meanwhile, the server and scheduler print:
~# python /usr/local/byteps/launcher/launch.py
BytePS launching server
[08:26:36] [08:26:36] src/customer.ccsrc/kvstore/././kvstore_dist_server.h:166: Enable zero copy of pull operations.
:226: Use seperate thread to process pull requests from each worker.
[08:26:36] src/customer.cc:239: Server uses 1 threads to process push requests.
[08:26:36] src/customer.cc:93: Server inits Pull Thread-1
[08:26:36] src/customer.cc:93: Server inits Pull Thread-0
[08:26:36] src/customer.cc:137: Server inits Push Thread-0
[08:26:36] src/customer.cc:258: All threads inited, ready to process message
[08:26:36] src/./zmq_van.h:285: Start ZMQ recv thread

python /usr/local/byteps/launcher/launch.py
BytePS launching scheduler
[08:26:22] src/./zmq_van.h:285: Start ZMQ recv thread

Following the instructions at https://github.com/bytedance/byteps/blob/master/docs/step-by-step-tutorial.md, I changed the environment variable to export EVAL_TYPE=mnist_advanced.
I also tried the TensorFlow workers with the MNIST example (/usr/local/byteps/example/tensorflow/tensorflow_mnist.py), and it worked well.
I'm confused. Could you help me out?

Failure in mxnet-gluon example (RDMA)

Describe the bug
I am running the mxnet-gluon example. However, the server crashes at the beginning of training (after building connections with the others).

Experiment setup:
bytedance-mxnet, byteps (4f40347)
one scheduler, one worker (on the same machine); one worker with 4 GPUs (on the other machine)

If I use TCP, there is no error and the job runs normally.
However, if I use RDMA (DMLC_ENABLE_RDMA=1), the server crashes:

[12:03:48] src/./rdma_van.h:572: Connecting to S[8]
[12:03:49] src/./rdma_van.h:572: Connecting to S[8]
[12:03:49] src/./rdma_van.h:572: Connecting to S[8]
[12:03:49] src/van.cc:306: S[8] is connected to others
[12:03:49] src/van.cc:446: ? => 1. Meta: request=1, timestamp=1, control={ cmd=BARRIER, barrier_group=7 }. THIS IS NOT DATA MSG!
[12:03:49] src/van.cc:471: 1 => 8. Meta: request=0, timestamp=4, control={ cmd=BARRIER, barrier_group=1505746176 }. THIS IS NOT DATA MSG!
terminate called after throwing an instance of 'dmlc::Error'
  what():  [12:04:11] src/./rdma_van.h:134: Check failed: p

Just before that, the worker sends a data message to the server:

[12:04:11] src/./rdma_van.h:690: send push key=0, val_len=2000, val_addr=69268529152000, rkey=1909622
[12:04:11] src/van.cc:446: ? => 8. Meta: request=1, timestamp=0, app_id=0, customer_id=0, simple_app=0, push=1, head=0, key=0, data_type={ UINT64 OTHER INT32 } Body: data_size=8 data_size=2000 data_size=4

If I run the regular mxnet example with RDMA, there is no error.

The error on the server side is inside ps-lite:
https://github.com/bytedance/ps-lite/blob/cd02f17b39c59fced4ae4f7688e8a8cdb54a96ca/src/rdma_van.h#L134

char *Alloc(size_t size) {
  ...
  char *p =
      reinterpret_cast<char *>(aligned_alloc(kAlignment, new_mem_size));
  CHECK(p);

Thank you

The speed of iteration of distributed training is slower than single instance's.

Describe the bug
I ran both single-node training and distributed training. Single-node training runs at about 100 samples/sec, while distributed training runs at about 50 samples/sec per node. Single-node training uses one node with a GPU, and distributed training uses two nodes with 1 GPU per node (2 GPUs in total). The combined throughput of the two nodes is about 100 samples/sec, the same as single-node training, but it uses twice the resources. Why?

To Reproduce
Steps to reproduce the behavior:

  1. Building Docker images
    I used Docker to run the tasks; here are my Dockerfiles:
    Dockerfile.server, Dockerfile.worker.mxnet.cu90
  2. I have 4 nodes to run distributed training with BytePS: 2 workers, 2 servers, and 1 scheduler. All nodes have a network speed of 20000Mb/s (ethtool bond0 | grep Speed).
    192.168.0.101: worker-0
    192.168.0.104: worker-1
    192.168.0.56: server-0, scheduler
    192.168.0.7: server-1
  3. Run distributed training
    192.168.0.101 worker-0:
export DMLC_ROLE=worker;
export DMLC_PS_ROOT_URI=192.168.0.56;
export DMLC_PS_ROOT_PORT=9000;
export DMLC_WORKER_ID=0;
export DMLC_NUM_WORKER=2;
export DMLC_NUM_SERVER=2;
python /usr/local/byteps/launcher/launch.py python train_bps.py --network r100 --loss arcface --dataset retina --models-root ./models/r100-5x5 --device 0 --per-batch-size 60

192.168.0.104 worker-1:

export DMLC_ROLE=worker;
export DMLC_PS_ROOT_URI=192.168.0.56;
export DMLC_PS_ROOT_PORT=9000;
export DMLC_WORKER_ID=1;
export DMLC_NUM_WORKER=2;
export DMLC_NUM_SERVER=2;
python /usr/local/byteps/launcher/launch.py python train_bps.py --network r100 --loss arcface --dataset retina --models-root ./models/r100-5x5 --device 0 --per-batch-size 60

192.168.0.56 server-0:

export DMLC_ROLE=server;
export DMLC_PS_ROOT_URI=192.168.0.56;
export DMLC_PS_ROOT_PORT=9000;
export DMLC_NUM_WORKER=2;
export DMLC_NUM_SERVER=2;
python /usr/local/byteps/launcher/launch.py

192.168.0.7 server-1:

export DMLC_ROLE=server;
export DMLC_PS_ROOT_URI=192.168.0.56;
export DMLC_PS_ROOT_PORT=9000;
export DMLC_NUM_WORKER=2;
export DMLC_NUM_SERVER=2;
python /usr/local/byteps/launcher/launch.py

192.168.0.56 scheduler:

export DMLC_ROLE=scheduler;
export DMLC_PS_ROOT_URI=192.168.0.56;
export DMLC_PS_ROOT_PORT=9000;
export DMLC_NUM_WORKER=2;
export DMLC_NUM_SERVER=2;
python /usr/local/byteps/launcher/launch.py
  4. Distributed logs:
    worker-0.log
    worker-1.log
    server-0.log
    server-1.log
    scheduler.log

  5. Run single-node training:

python train.py --network r100 --loss arcface --dataset retina --models-root ./models/r100-5x5 --device 0 --per-batch-size 60
  6. Single-node training log:
    single-node-training.log
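The five per-node export blocks above differ only in role and worker id, so they can be generated from one description of the cluster. A minimal sketch: `dmlc_env` and `as_exports` are hypothetical helpers, and the variables are exactly the `DMLC_*` ones used in this report.

```python
from typing import Dict, Optional


def dmlc_env(role: str, root_uri: str, root_port: int, num_worker: int,
             num_server: int, worker_id: Optional[int] = None) -> Dict[str, str]:
    """Build the DMLC_* environment for one BytePS process (hypothetical helper)."""
    if role not in ("scheduler", "server", "worker"):
        raise ValueError(f"unknown role: {role}")
    env = {
        "DMLC_ROLE": role,
        "DMLC_PS_ROOT_URI": root_uri,
        "DMLC_PS_ROOT_PORT": str(root_port),
        "DMLC_NUM_WORKER": str(num_worker),
        "DMLC_NUM_SERVER": str(num_server),
    }
    if role == "worker":
        # Only workers carry an id, and it must be in [0, num_worker).
        if worker_id is None or not 0 <= worker_id < num_worker:
            raise ValueError("workers need a worker_id in [0, num_worker)")
        env["DMLC_WORKER_ID"] = str(worker_id)
    return env


def as_exports(env: Dict[str, str]) -> str:
    """Render the environment as `export K=V;` lines like the ones above."""
    return "\n".join(f"export {k}={v};" for k, v in env.items())


common = dict(root_uri="192.168.0.56", root_port=9000, num_worker=2, num_server=2)
print(as_exports(dmlc_env("worker", worker_id=1, **common)))
```

Generating every node's environment from a single `common` dict makes mismatched `DMLC_NUM_WORKER` or `DMLC_NUM_SERVER` values across nodes, a frequent cause of hangs, less likely.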

Environment (please complete the following information):

  • OS: Ubuntu 16.04.3 LTS
  • GCC version: 4.9
  • CUDA and NCCL version: 9.0 and d7a58cfa5865c4f627a128c3238cc72502649881
  • Framework (TF, PyTorch, MXNet): MXNet

TypeError: bytes or integer address expected instead of str instance

Describe the bug
bps.push_pull fails on a plain tensor.

To Reproduce

In [1]: import tensorflow as tf

In [2]: import byteps.tensorflow as bps
WARNING: Logging before flag parsing goes to stderr.
W0627 11:36:47.010180 139917697820480 deprecation_wrapper.py:119] From /private/home/yuxinwu/.local/lib/python3.6/site-packages/byteps/tensorflow/__init__.py:79: The name tf.train.SessionRunHook is deprecated. Please use tf.estimator.SessionRunHook instead.

W0627 11:36:47.010504 139917697820480 deprecation_wrapper.py:119] From /private/home/yuxinwu/.local/lib/python3.6/site-packages/byteps/tensorflow/__init__.py:111: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.


In [3]: bps.push_pull(tf.constant([0.0]))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-fc0e9eb4f2bb> in <module>()
----> 1 bps.push_pull(tf.constant([0.0]))

~/.local/lib/python3.6/site-packages/byteps/tensorflow/__init__.py in push_pull(tensor, scope, average, device_dense, device_sparse, compression)
     50         byteps_size = tf.cast(size(), dtype=tensor.dtype)
     51         tensor_compressed, ctx = compression.compress(tensor)
---> 52         summed_tensor_compressed = _push_pull(tensor_compressed, scope)
     53         summed_tensor = compression.decompress(summed_tensor_compressed, ctx)
     54         new_tensor = (tf.div(summed_tensor, byteps_size)

~/.local/lib/python3.6/site-packages/byteps/tensorflow/ops.py in _push_pull(tensor, scope, name)
     80     if name is None and not _executing_eagerly():
     81         name = 'BytePSPushPull_%s' % _normalize_name(tensor.name)
---> 82     TF_LIB_CTYPES.byteps_tensorflow_declare_tensor(ctypes.c_char_p(scope+name))
     83     return C_LIB.byteps_push_pull(tensor, name=name)
     84 

TypeError: bytes or integer address expected instead of str instance

Environment (please complete the following information):

  • OS: ubuntu 18.04
  • GCC version:
  • CUDA and NCCL version:
  • Framework (TF, PyTorch, MXNet): TF 1.14.0-rc1
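The traceback ends at `ctypes.c_char_p(scope+name)` in `byteps/tensorflow/ops.py`. On Python 3, `ctypes.c_char_p` accepts `bytes` (or an integer address) but not `str`, which produces exactly this `TypeError`. A minimal sketch of the likely fix is to encode the name before wrapping it. `declare_name` is a hypothetical helper, not BytePS code, and UTF-8 is an assumed encoding:

```python
import ctypes


def declare_name(scope, name):
    """Hypothetical helper: build a C string from a tensor scope and name.

    ctypes.c_char_p needs bytes on Python 3, so a str is encoded first
    (UTF-8 is an assumption here).
    """
    full_name = scope + name
    if isinstance(full_name, str):
        full_name = full_name.encode("utf-8")
    return ctypes.c_char_p(full_name)


if __name__ == "__main__":
    try:
        ctypes.c_char_p("BytePSPushPull_example")  # a str is rejected on Python 3
    except TypeError as err:
        print("str rejected:", err)
    print(declare_name("", "BytePSPushPull_example").value)  # b'BytePSPushPull_example'
```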

Run the distributed training on Kubernetes

After the successful single run on Kubernetes with the workaround, I tried to run distributed training with 2 workers on Kubernetes. However, only one worker runs, and the other one always hangs. I assigned just 1 device (with 0 as the device tag), but the running worker reported benchmarking on 2 GPUs. The running worker has 2 GPUs, and the hanging worker has only 1 GPU.

  1. How did you benchmark: bare metal or Kubernetes?
  2. Does it work if a worker has just 1 GPU? Is there any requirement on the GPU model?
  3. Is there a Kubernetes operator to set up BytePS?

Worse performance than PS

Hi,

I ran an experiment following the guide at https://github.com/bytedance/byteps/blob/master/docs/best-practice.md#multi-machine-distributed-mode. I have 4 machines, each with one ConnectX-3 NIC and two K40c GPUs. I set the NIC speed to 10 Gbps and use TCP only, not RoCE. Two machines are workers, the other two are parameter servers, and the scheduler runs on server-0. Only the workers use GPUs (1 or 2 each). I run the experiment with the Docker images you provide.

For single-GPU training (non-distributed mode), the training speed is 54 imgs/sec. With distributed training, the total speed is 51.2 imgs/sec in the 1-GPU-worker setting and 103.1 imgs/sec in the 2-GPU-worker setting. In contrast, training with a regular PS architecture reaches 63.08 imgs/sec and 121.51 imgs/sec, respectively. Why is BytePS slower than PS? Is something wrong with my experiment?

The specific commands are as follows:
scheduler:

docker pull bytepsimage/byteps_server
docker run -it --net=host bytepsimage/byteps_server bash
export DMLC_NUM_WORKER=2 
export DMLC_ROLE=scheduler
export DMLC_NUM_SERVER=2
export DMLC_PS_ROOT_URI=12.12.11.12
export DMLC_PS_ROOT_PORT=9000
export DMLC_INTERFACE=eth3.10
python /usr/local/byteps/launcher/launch.py

server-0 and server-1:

docker pull bytepsimage/byteps_server
docker run -it --net=host bytepsimage/byteps_server bash
export DMLC_NUM_WORKER=2 
export DMLC_ROLE=server  
export DMLC_NUM_SERVER=2
export DMLC_PS_ROOT_URI=12.12.11.12
export DMLC_PS_ROOT_PORT=9000
export DMLC_INTERFACE=eth3.10
python /usr/local/byteps/launcher/launch.py

worker-0:

docker pull bytepsimage/worker_tensorflow
nvidia-docker run -it --net=host --shm-size=32768m bytepsimage/worker_tensorflow bash
export NVIDIA_VISIBLE_DEVICES=0  # or 0,1 for the 2-GPU worker
export DMLC_WORKER_ID=0 
export DMLC_NUM_WORKER=2
export DMLC_ROLE=worker
export DMLC_NUM_SERVER=2 
export DMLC_PS_ROOT_URI=12.12.11.12 
export DMLC_PS_ROOT_PORT=9000
export DMLC_INTERFACE=eth3.10
export EVAL_TYPE=benchmark
python /usr/local/byteps/launcher/launch.py /usr/local/byteps/example/tensorflow/run_tensorflow_byteps.sh  --model ResNet50 --num-iters 10

worker-1:

docker pull bytepsimage/worker_tensorflow
nvidia-docker run -it --net=host --shm-size=32768m bytepsimage/worker_tensorflow bash
export NVIDIA_VISIBLE_DEVICES=0 
export DMLC_WORKER_ID=1
export DMLC_NUM_WORKER=2
export DMLC_ROLE=worker
export DMLC_NUM_SERVER=2 
export DMLC_PS_ROOT_URI=12.12.11.12 
export DMLC_PS_ROOT_PORT=9000
export DMLC_INTERFACE=eth3.10
export EVAL_TYPE=benchmark
python /usr/local/byteps/launcher/launch.py /usr/local/byteps/example/tensorflow/run_tensorflow_byteps.sh  --model ResNet50 --num-iters 10

The GPU/NIC interconnect topology is as follows:
worker-0:

$ nvidia-smi topo -m
        GPU0    GPU1    mlx4_0  CPU Affinity
GPU0     X      PHB     PHB     8-15,24-31
GPU1    PHB      X      PHB     8-15,24-31
mlx4_0  PHB     PHB      X

worker-1:

$ nvidia-smi topo -m
        GPU0    GPU1    mlx5_0  mlx4_0  CPU Affinity
GPU0     X      SYS     PHB     SYS     0-7,16-23
GPU1    SYS      X      SYS     PHB     8-15,24-31
mlx5_0  PHB     SYS      X      SYS
mlx4_0  SYS     PHB     SYS      X
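In these matrices, `PHB` means the path crosses PCIe and a PCIe host bridge (typically the CPU), while `SYS` means it also crosses the interconnect between NUMA nodes (e.g. QPI/UPI). On worker-1, GPU0 reaches `mlx4_0` only via `SYS`. A small parser that reports each GPU's link to a given NIC makes such cross-socket paths easy to spot. It is a sketch that assumes the whitespace-separated layout shown above, not an official NVIDIA tool:

```python
WORKER0 = """
        GPU0    GPU1    mlx4_0  CPU Affinity
GPU0     X      PHB     PHB     8-15,24-31
GPU1    PHB      X      PHB     8-15,24-31
mlx4_0  PHB     PHB      X
"""

WORKER1 = """
        GPU0    GPU1    mlx5_0  mlx4_0  CPU Affinity
GPU0     X      SYS     PHB     SYS     0-7,16-23
GPU1    SYS      X      SYS     PHB     8-15,24-31
mlx5_0  PHB     SYS      X      SYS
mlx4_0  SYS     PHB     SYS      X
"""


def gpu_nic_links(topo_text, nic):
    """Return {gpu_name: link_type} for one NIC from `nvidia-smi topo -m` text.

    Assumes a header row of device names followed by one row per device,
    with whitespace-separated cells as in the output above.
    """
    lines = [ln for ln in topo_text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    if nic not in header:
        raise ValueError("NIC %r not found in header" % nic)
    col = header.index(nic)
    links = {}
    for row in lines[1:]:
        cells = row.split()
        # cells[0] is the row's device name; cells[1:] line up with the
        # header's device columns, and trailing CPU-affinity cells are ignored.
        if cells[0].startswith("GPU") and len(cells) > col + 1:
            links[cells[0]] = cells[col + 1]
    return links


if __name__ == "__main__":
    for name, text in (("worker-0", WORKER0), ("worker-1", WORKER1)):
        print(name, gpu_nic_links(text, "mlx4_0"))
```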

I can't reproduce your results with 4 RDMA machines.

My results are below. I use 2 machines as workers and the other 2 as servers, and I ran the scripts following "Distributed Training with RDMA". I only get 2723 images/s with 16 GPUs, while your result is almost 5000 images/s. What's wrong with my training?

Machines with RDMA (80 Gbps):

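| GPUs | Images/s per GPU | Speedup ratio vs. single card | Setup |
|------|------------------|-------------------------------|-------|
| 1 | 335.1 | 1 | single machine |
| 2 | 309.4 | 0.923306476 | single machine |
| 4 | 310.6 | 0.926887496 | single machine |
| 8 | 306.6 | 0.914950761 | single machine |
| 16 | 170.4 | 0.508 | two workers, two servers |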
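The speedup-ratio column is the per-GPU throughput divided by the single-card throughput (335.1 images/s), and the aggregate rate is the per-GPU rate times the number of GPUs. For 16 GPUs that gives 16 × 170.4 ≈ 2726 images/s, roughly the ~2723 images/s quoted above. A short sketch that reproduces both columns from the table's numbers:

```python
SINGLE_CARD = 335.1  # images/s on one GPU, from the table

# (number of GPUs, images/s per GPU) as reported in the issue
RESULTS = [(1, 335.1), (2, 309.4), (4, 310.6), (8, 306.6), (16, 170.4)]


def scaling(results, single=SINGLE_CARD):
    """Return (gpus, per-GPU speedup ratio, aggregate images/s) rows."""
    return [(n, per_gpu / single, n * per_gpu) for n, per_gpu in results]


if __name__ == "__main__":
    for n, ratio, total in scaling(RESULTS):
        print("%2d GPUs: ratio %.3f, total %.1f images/s" % (n, ratio, total))
```

A ratio near 1.0 means near-linear scaling; the drop to about 0.51 at 16 GPUs marks the point where cross-machine communication starts to dominate in this setup.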

src/./zmq_van.h:304: Check failed: 0 failed to receive message. errno: 88 Socket operation on non-socket

@ymjiang, I hit a problem similar to closed issue 49 in the PyTorch benchmark. I used 8 GPUs in total with 2 workers. How can I fix it?

Iter #992: 43.5 img/sec per GPU
Iter #993: 46.1 img/sec per GPU
Iter #994: 41.1 img/sec per GPU
Iter #995: 48.6 img/sec per GPU
Iter #996: 50.1 img/sec per GPU
Iter #997: 49.4 img/sec per GPU
Iter #998: 42.5 img/sec per GPU
Traceback (most recent call last):
File "/usr/local/byteps/example/pytorch/benchmark_byteps.py", line 132, in
raise Exception
Exception
Traceback (most recent call last):
File "/usr/local/byteps/example/pytorch/benchmark_byteps.py", line 132, in
raise Exception
Exception
Traceback (most recent call last):
File "/usr/local/byteps/example/pytorch/benchmark_byteps.py", line 132, in
raise Exception
Exception
Iter #999: 41.3 img/sec per GPU
[16:25:36] 3rdparty/ps-lite/include/dmlc/logging.h:276: [16:25:36] src/./zmq_van.h:304: Check failed: 0 failed to receive message. errno: 88 Socket operation on non-socket

Stack trace returned 6 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/torch/c_lib.so(+0x2e1e8) [0x7f0cc47b81e8]
[bt] (1) /usr/local/lib/python2.7/dist-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/torch/c_lib.so(+0x2f109) [0x7f0cc47b9109]
[bt] (2) /usr/local/lib/python2.7/dist-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/torch/c_lib.so(+0x8c4e8) [0x7f0cc48164e8]
[bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f0cd56edc80]
[bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f0cda1f06ba]
[bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f0cd9f2641d]

About GPU utilization and speed

I have 4 GPU nodes, but each node has only one 1080 Ti.
I followed A Step-by-Step Tutorial, used TensorFlow, and tested both single-node and distributed training with Docker.

When I use only a single GPU node, the result is 128.3 img/sec per GPU. nvidia-smi shows Volatile GPU-Util at 99%, and the result looks normal.
But when I use all 4 nodes (one scheduler, one server, and two GPU workers), the result is about 17 img/sec per GPU (shown only on worker0), and nvidia-smi shows Volatile GPU-Util fluctuating rapidly. The GPU seems to be underutilized.
How can I solve this problem?
I have more similar nodes, but each has only one 1080 Ti GPU. If I use 10 or 20 identical nodes, can I achieve better results?
Finally, running the exports on every node is cumbersome. You could consider writing a script that uses ssh to set up the entire cluster; in my previous work, such a script was very helpful.
Thank you
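One way to act on the ssh suggestion is a small driver that builds, for every node, the `export DMLC_*` lines from the tutorial followed by the `launch.py` call, and runs each over `ssh`. The sketch below only builds the argument lists so it can be checked without a cluster. It assumes BytePS is installed directly on each host (not inside Docker), that the scheduler's address is the PS root, and that host addresses are placeholders; `node_command` and `cluster_commands` are hypothetical helpers, not part of BytePS:

```python
import shlex

LAUNCHER = "python /usr/local/byteps/launcher/launch.py"


def node_command(role, root_uri, root_port, num_worker, num_server,
                 worker_id=None, script=""):
    """Build one node's shell command: DMLC_* exports, then launch.py."""
    env = {
        "DMLC_ROLE": role,
        "DMLC_PS_ROOT_URI": root_uri,
        "DMLC_PS_ROOT_PORT": str(root_port),
        "DMLC_NUM_WORKER": str(num_worker),
        "DMLC_NUM_SERVER": str(num_server),
    }
    if role == "worker":
        env["DMLC_WORKER_ID"] = str(worker_id)
    exports = "; ".join("export %s=%s" % (k, shlex.quote(v)) for k, v in env.items())
    cmd = "%s; %s" % (exports, LAUNCHER)
    if script:  # workers also pass the training script and its arguments
        cmd += " " + script
    return cmd


def cluster_commands(scheduler, servers, workers, port=9000, script=""):
    """Return one ["ssh", host, command] argv per process to start."""
    n_w, n_s = len(workers), len(servers)
    plan = [(scheduler, node_command("scheduler", scheduler, port, n_w, n_s))]
    plan += [(h, node_command("server", scheduler, port, n_w, n_s)) for h in servers]
    plan += [(h, node_command("worker", scheduler, port, n_w, n_s, i, script))
             for i, h in enumerate(workers)]
    return [["ssh", host, cmd] for host, cmd in plan]


if __name__ == "__main__":
    argvs = cluster_commands(
        "10.0.0.1", ["10.0.0.1"], ["10.0.0.2", "10.0.0.3"],
        script="/usr/local/byteps/example/tensorflow/run_tensorflow_byteps.sh")
    for argv in argvs:
        # subprocess.Popen(argv) would start this process remotely (not done here)
        print(" ".join(shlex.quote(a) for a in argv))
```

Worker IDs are assigned from the order of the `workers` list, so the same list always maps to the same `DMLC_WORKER_ID` values.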

distributed training hang

I followed the step-by-step tutorial to run distributed training with MXNet and TensorFlow; both hang.
I have 3 nodes: on the first node I run the scheduler and the server, and on the second and third nodes I run worker0 and worker1.
Commands and logs are as follows:

scheduler:

[root@--0002 ~]# docker run -it --net=host bytepsimage/byteps_server bash
root@--0002:~# export DMLC_NUM_WORKER=2 
root@--0002:~# export DMLC_ROLE=scheduler 
root@--0002:~# export DMLC_NUM_SERVER=1 
root@--0002:~# export DMLC_PS_ROOT_URI=10.94.1.77 
root@--0002:~# export DMLC_PS_ROOT_PORT=1234  
root@--0002:~# python /usr/local/byteps/launcher/launch.py
BytePS launching scheduler

server:

[root@--0002 ~]# docker run -it --net=host bytepsimage/byteps_server bash
root@--0002:~# export DMLC_NUM_WORKER=2 
root@--0002:~# export DMLC_ROLE=server  
root@--0002:~# export DMLC_NUM_SERVER=1 
root@--0002:~# export DMLC_PS_ROOT_URI=10.94.1.77
root@--0002:~# export DMLC_PS_ROOT_PORT=1234  
root@--0002:~# 
root@--0002:~# python /usr/local/byteps/launcher/launch.py
BytePS launching server
[[03:13:29] src/customer.cc:03:13:29] 226: Use seperate thread to process pull requests from each worker.
src/kvstore/././kvstore_dist_server.h:166: Enable zero copy of pull operations.
[03:13:29[] src/customer.cc:239: Server uses 1 threads to process push requests.
03:13:29] src/customer.cc:137: Server inits Push Thread-0
[03:13:29] src/customer.cc:93: Server inits Pull Thread-1
[03:13:29] src/customer.cc:93: Server inits Pull Thread-0
[03:13:29] src/customer.cc:258: All threads inited, ready to process message 
[03:13:29] src/./zmq_van.h:285: Start ZMQ recv thread

worker0

[root@0001 ~]# docker run --name worker-0 -it --net=host --shm-size=32768m bytepsimage/worker_mxnet bash
root@0001:~# export NVIDIA_VISIBLE_DEVICES=0,1,2,3  
root@0001:~# export DMLC_WORKER_ID=0 
root@0001:~# export DMLC_NUM_WORKER=2 
root@0001:~# export DMLC_ROLE=worker
root@0001:~# export DMLC_NUM_SERVER=1 
root@0001:~# export DMLC_PS_ROOT_URI=10.94.1.77 
root@0001:~# export DMLC_PS_ROOT_PORT=1234
root@0001:~# 
root@0001:~# export EVAL_TYPE=benchmark 
root@0001:~# python /usr/local/byteps/launcher/launch.py \
>        /usr/local/byteps/example/mxnet/start_mxnet_byteps.sh \
>        --benchmark 1 --batch-size=32  
BytePS launching worker
[03:14:57] src/customer.cc:363: Do not use thread pool for receiving.
[03:14:57] src/./zmq_van.h:285: Start ZMQ recv thread
[03:15:12] src/./zmq_van.h:285: Start ZMQ recv thread
[03:15:12] src/./zmq_van.h:285: Start ZMQ recv thread

worker1

[root@0002 ~]# docker run --name worker-1 -it --net=host --shm-size=32768m train.registry.docker.com:5000/bytepsimage/worker_mxnet bash
root@0002:~# export NVIDIA_VISIBLE_DEVICES=0,1,2,3 
root@0002:~# export DMLC_WORKER_ID=1
root@0002:~# export DMLC_NUM_WORKER=2 
root@0002:~# export DMLC_ROLE=worker 
root@0002:~# export DMLC_NUM_SERVER=1 
root@0002:~# export DMLC_PS_ROOT_URI=10.94.1.77
root@0002:~# export DMLC_PS_ROOT_PORT=1234
root@0002:~# 
root@0002:~# export EVAL_TYPE=benchmark 
root@0002:~# python /usr/local/byteps/launcher/launch.py \
>        /usr/local/byteps/example/mxnet/start_mxnet_byteps.sh \
>        --benchmark 1 --batch-size=32 
BytePS launching worker
[03:15:17] src/customer.cc:363: Do not use thread pool for receiving.
[03:15:17] src/./zmq_van.h:285: Start ZMQ recv thread
[03:15:17] src/./zmq_van.h:285: Start ZMQ recv thread

In addition, after I press Ctrl+C to abort the process, ps -ef | grep python shows that the process is still there:

[03:15:17] src/./zmq_van.h:285: Start ZMQ recv thread
^C[[[2019-07-03 03:24:17.2019-07-03 03:24:172019-07-03 03:24:17579377..: 579377579381F: :  FFbyteps/common/communicator.cc  :byteps/common/communicator.cc206byteps/common/communicator.cc:] :206Check failed: (rc) >= (0) Interrupted system call, rank=0206] 
] Check failed: (rc) >= (0) Interrupted system call, rank=2Check failed: (rc) >= (0) Interrupted system call, rank=1

^C^C^C^C^C^C^C^C/usr/local/byteps/example/mxnet/start_mxnet_byteps.sh: line 5:    32 Aborted                 (core dumped) python $path/train_imagenet_byteps.py $@
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/byteps/launcher/launch.py", line 19, in worker
    subprocess.check_call(command, env=my_env, stdout=sys.stdout, stderr=sys.stderr, shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '/usr/local/byteps/example/mxnet/start_mxnet_byteps.sh --benchmark 1 --batch-size=32' returned non-zero exit status -2

/usr/local/byteps/example/mxnet/start_mxnet_byteps.sh: line 5:    26 Aborted                 (core dumped) python $path/train_imagenet_byteps.py $@
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/byteps/launcher/launch.py", line 19, in worker
    subprocess.check_call(command, env=my_env, stdout=sys.stdout, stderr=sys.stderr, shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '/usr/local/byteps/example/mxnet/start_mxnet_byteps.sh --benchmark 1 --batch-size=32' returned non-zero exit status -2

Traceback (most recent call last):
  File "/usr/local/byteps/launcher/launch.py", line 38, in <module>
    t[i].join() 
  File "/usr/lib/python2.7/threading.py", line 940, in join
    self.__block.wait()
  File "/usr/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
KeyboardInterrupt
root@bms-bf19-0002:~# /usr/local/byteps/example/mxnet/start_mxnet_byteps.sh: line 5:    28 Aborted                 (core dumped) python $path/train_imagenet_byteps.py $@
^C
root@0002:~# ^C
root@0002:~# ps -ef | grep python
root         31     25 99 03:15 pts/0    00:30:46 python /usr/local/byteps/example/mxnet/train_imagenet_byteps.py --benchmark 1 --batch-size=32
root        768      1  0 03:25 pts/0    00:00:00 grep --color=auto python

Also, the server and scheduler did not exit when I pressed Ctrl+C:

BytePS launching server
[[03:13:29] src/customer.cc:03:13:29] 226: Use seperate thread to process pull requests from each worker.
src/kvstore/././kvstore_dist_server.h:166: Enable zero copy of pull operations.
[03:13:29[] src/customer.cc:239: Server uses 1 threads to process push requests.
03:13:29] src/customer.cc:137: Server inits Push Thread-0
[03:13:29] src/customer.cc:93: Server inits Pull Thread-1
[03:13:29] src/customer.cc:93: Server inits Pull Thread-0
[03:13:29] src/customer.cc:258: All threads inited, ready to process message 
[03:13:29] src/./zmq_van.h:285: Start ZMQ recv thread
^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C
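The tracebacks show that `launch.py` starts each training command with `subprocess.check_call(..., shell=True)` inside a thread, and one `train_imagenet_byteps.py` process (PID 31) survives after the launcher exits. One common pattern makes cleanup explicit: start each command as the leader of its own process group and, on interrupt, signal every group, escalating to `SIGKILL` after a grace period. This is a sketch of that general pattern under the assumption of Python 3 on a POSIX system, not the BytePS launcher itself:

```python
import os
import signal
import subprocess


def start_all(commands, env=None):
    """Start each shell command as the leader of a new process group."""
    return [subprocess.Popen(cmd, shell=True, env=env, start_new_session=True)
            for cmd in commands]


def stop_all(procs, sig=signal.SIGTERM, grace=5.0):
    """Signal every process group; SIGKILL any that outlive the grace period."""
    for p in procs:
        if p.poll() is None:
            try:
                os.killpg(p.pid, sig)  # p.pid is also the group id here
            except ProcessLookupError:
                pass  # the group already exited
    codes = []
    for p in procs:
        try:
            codes.append(p.wait(timeout=grace))
        except subprocess.TimeoutExpired:
            os.killpg(p.pid, signal.SIGKILL)
            codes.append(p.wait())
    return codes


def run_all(commands, env=None):
    """Wait for all commands; on Ctrl+C, tear down every child group."""
    procs = start_all(commands, env)
    try:
        return [p.wait() for p in procs]
    except KeyboardInterrupt:
        # Children in their own sessions do not see the terminal's SIGINT,
        # so forward it explicitly to each group.
        return stop_all(procs, signal.SIGINT)
```

Because each child runs in its own session, the terminal no longer signals it directly; the launcher becomes solely responsible for teardown, which is what makes the shutdown path predictable.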

Install error: fatal error: numa.h: No such file or directory

Describe the bug
I hit this error when installing BytePS:
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
byteps/common/global.cc:19:18: fatal error: numa.h: No such file or directory
compilation terminated.
INFO: Unable to build TensorFlow plugin, will skip it.

Traceback (most recent call last):
File "lib/python3.7/distutils/unixccompiler.py", line 118, in _compile
extra_postargs)
File "lib/python3.7/distutils/ccompiler.py", line 909, in spawn
spawn(cmd, dry_run=self.dry_run)
File "lib/python3.7/distutils/spawn.py", line 36, in spawn
_spawn_posix(cmd, search_path, dry_run=dry_run)
File "lib/python3.7/distutils/spawn.py", line 159, in _spawn_posix
% (cmd, exit_status))
distutils.errors.DistutilsExecError: command 'gcc' failed with exit status 1

To Reproduce
python setup.py install

Environment (please complete the following information):

  • OS:ubuntu16.04
  • GCC version:gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0
  • CUDA and NCCL version:10.0
  • Framework (TF, PyTorch, MXNet):TF
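`byteps/common/global.cc` includes `numa.h`, so the build needs the libnuma development headers. On Debian/Ubuntu these come from the `libnuma-dev` package (`apt-get install libnuma-dev`). A small pre-build check can catch the problem before `python setup.py install` reaches the compiler. The sketch below uses a hypothetical `find_header` helper and an assumed list of include directories, plus any directories in GCC's `CPATH` variable:

```python
import os
import sys

DEFAULT_INCLUDE_DIRS = ("/usr/include", "/usr/local/include")


def find_header(name, dirs=DEFAULT_INCLUDE_DIRS):
    """Return the first path where header `name` exists, or None."""
    extra = os.environ.get("CPATH", "")  # extra include dirs GCC also searches
    for d in list(dirs) + [p for p in extra.split(os.pathsep) if p]:
        path = os.path.join(d, name)
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    if find_header("numa.h") is None:
        sys.exit("numa.h not found; on Ubuntu try: apt-get install libnuma-dev")
    print("numa.h found")
```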

MXNet 1.5 and CUDA 10.1 support

The Apache MXNet 1.5 release vote has passed. Will BytePS add an MXNet 1.5.0 pip build for CUDA 10.1? The new version includes new operators for BERT training.

Does BytePS support FP16?

Mixed precision can speed up training, but I cannot find any FP16 information in your project. Do you support FP16?
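For context, FP16 gradient compression in Horovod-style frameworks usually means casting a tensor to half precision before communication and back afterwards, which halves the bytes on the wire. The `push_pull` traceback in the `TypeError` report above does show a `compression` argument with `compress`/`decompress` steps. The sketch below illustrates only the byte-level idea with the standard library's half-precision `struct` format `'e'`; it is not the BytePS implementation, and whether a given BytePS version ships an FP16 compressor is not confirmed here:

```python
import struct


def compress_fp16(values):
    """Pack floats as IEEE 754 half precision (2 bytes each).

    Raises OverflowError for magnitudes beyond the fp16 range (~65504).
    """
    return struct.pack("<%de" % len(values), *values)


def decompress_fp16(payload):
    """Unpack half-precision bytes back to Python floats."""
    return list(struct.unpack("<%de" % (len(payload) // 2), payload))


if __name__ == "__main__":
    grads = [0.1, -2.5, 3.14159, 1e-3]
    packed = compress_fp16(grads)
    full = struct.pack("<%df" % len(grads), *grads)
    print(len(full), "bytes as fp32 ->", len(packed), "bytes as fp16")
    print(decompress_fp16(packed))  # close to, but not exactly, the inputs
```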

Broadcast op cannot be created inside name scope

Describe the bug
I can run the example synthetic_benchmark.py successfully after fixing some previously reported bugs. However, after making this change:

-bcast_op = bps.broadcast_global_variables(0)
+with tf.name_scope("test"):
+    bcast_op = bps.broadcast_global_variables(0)

and running it again:

python3 byteps/launcher/launch.py python3 byteps/example/tensorflow/synthetic_benchmark.py

It produces the error:

2019-06-27 12:19:26.837903: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
[2019-06-27 12:19:27.271525: F byteps/common/global.cc:265] Check failed: _name_to_cxt.find(name) != _name_to_cxt.end() test/BytePSBroadcast_bn4b_branch2a_gamma_0 is not initialized

It looks like when the broadcast op is created inside a name scope, the tensor is declared in byteps without the scope, but is then looked up with the scope, causing a mismatch.
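The diagnosis above can be modeled with a toy registry: if the declare step records the bare name while the lookup uses the scope-qualified name, the lookup fails just like the `Check failed ... is not initialized` message. Everything below (`TensorRegistry`, `buggy`, `fixed`) is hypothetical and only illustrates the mismatch; the fix it demonstrates is to compute the full, scope-qualified name once and use that same string for both steps:

```python
class TensorRegistry:
    """Toy stand-in for a name -> context table (hypothetical)."""

    def __init__(self):
        self._contexts = {}

    def declare(self, name):
        self._contexts[name] = {"initialized": True}

    def lookup(self, name):
        if name not in self._contexts:
            raise KeyError("%s is not initialized" % name)
        return self._contexts[name]


def full_name(scope, op_name):
    """Join a name scope such as 'test/' and an op name into one key."""
    return scope + op_name


def buggy(registry, scope, op_name):
    registry.declare(op_name)                          # declared without the scope
    return registry.lookup(full_name(scope, op_name))  # looked up with it


def fixed(registry, scope, op_name):
    key = full_name(scope, op_name)                    # one key for both steps
    registry.declare(key)
    return registry.lookup(key)


if __name__ == "__main__":
    try:
        buggy(TensorRegistry(), "test/", "BytePSBroadcast_example")
    except KeyError as err:
        print("buggy:", err)
    print("fixed:", fixed(TensorRegistry(), "test/", "BytePSBroadcast_example"))
```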

distributed benchmark has problems

I have three nodes:
first node: scheduler and server
second node: worker0
third node: worker1
The problem is that the worker nodes hang.
The first node shows: BytePS launching scheduler, BytePS launching server
Second node:
[screenshot]
The third node has the same problem:
[screenshot]
Please tell me how to solve this problem. Thanks!

Best regards

AttributeError: module 'byteps.tensorflow' has no attribute 'allreduce'

Describe the bug
It was claimed in the README that:

If your tasks only rely on Horovod's allreduce and broadcast, you should be able to switch to BytePS in 1 minute. Simply replace import horovod.tensorflow as hvd by import byteps.tensorflow as bps, and then replace all hvd in your code by bps.

However, bps.allreduce does not exist.

To Reproduce
Steps to reproduce the behavior:

  1. import byteps.tensorflow as bps; bps.allreduce

Error:

AttributeError: module 'byteps.tensorflow' has no attribute 'allreduce'
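Until the APIs match, a thin alias can keep Horovod-style code running. This sketch assumes that `push_pull` is the BytePS counterpart of `allreduce`, based on the `push_pull(tensor, scope, average, device_dense, device_sparse, compression)` signature visible in the `TypeError` traceback earlier on this page. The helper `add_allreduce_alias` is hypothetical, and a stand-in module is used so the example runs without BytePS installed:

```python
import types


def add_allreduce_alias(module):
    """Give `module` an `allreduce` that forwards to its `push_pull`.

    An existing `allreduce` is left untouched.
    """
    if not hasattr(module, "allreduce"):
        def allreduce(tensor, average=True, **kwargs):
            # Horovod's classic allreduce averages by default; pass that on.
            return module.push_pull(tensor, average=average, **kwargs)
        module.allreduce = allreduce
    return module


if __name__ == "__main__":
    fake_bps = types.SimpleNamespace(
        push_pull=lambda tensor, average=True, **kw: ("push_pull", tensor, average))
    add_allreduce_alias(fake_bps)
    print(fake_bps.allreduce([1.0, 2.0]))
    # With BytePS installed (assumption): import byteps.tensorflow as bps; add_allreduce_alias(bps)
```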

PyTorch example failed

Describe the bug
Following the instructions in step-by-step-tutorials.md, I failed to run the example.

To Reproduce
Steps to reproduce the behavior:

docker pull bytepsimage/worker_pytorch

nvidia-docker run --shm-size=32768m -it bytepsimage/worker_pytorch bash

# now you are in docker environment
export NVIDIA_VISIBLE_DEVICES=0,1,2,3  # say you have 4 GPUs 
export DMLC_WORKER_ID=0 # your worker id
export DMLC_NUM_WORKER=1 # you only have one worker
export DMLC_ROLE=worker # your role is worker

# the following value does not matter for non-distributed jobs 
export DMLC_NUM_SERVER=1 
export DMLC_PS_ROOT_URI=10.0.0.1 
export DMLC_PS_ROOT_PORT=1234 

export EVAL_TYPE=benchmark 
python /usr/local/byteps/launcher/launch.py \
       /usr/local/byteps/example/pytorch/start_pytorch_byteps.sh \
       --model resnet50 --num-iters 1000      

The error messages are attached below

root@265e564096d1:~# python /usr/local/byteps/launcher/launch.py \
>        /usr/local/byteps/example/pytorch/start_pytorch_byteps.sh \
>        --model resnet50 --num-iters 1000
BytePS launching worker
running benchmark...
running benchmark...
running benchmark...
running benchmark...
[2019-06-27 17:46:54.407767: F byteps/common/global.cc:101] Check failed: getenv("DMLC_NUM_SERVER") error: env DMLC_NUM_SERVER not set
[2019-06-27 17:46:54.428154: F byteps/common/global.cc:101] Check failed: getenv("DMLC_NUM_SERVER") error: env DMLC_NUM_SERVER not set
[2019-06-27 17:46:54.437652: F byteps/common/global.cc:101] Check failed: getenv("DMLC_NUM_SERVER") error: env DMLC_NUM_SERVER not set
[2019-06-27 17:46:54.453323: F byteps/common/global.cc:101] Check failed: getenv("DMLC_NUM_SERVER") error: env DMLC_NUM_SERVER not set
/usr/local/byteps/example/pytorch/start_pytorch_byteps.sh: line 20:   220 Aborted                 (core dumped) python $path/benchmark_byteps.py $@
/usr/local/byteps/example/pytorch/start_pytorch_byteps.sh: line 20:   218 Aborted                 (core dumped) python $path/benchmark_byteps.py $@
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/byteps/launcher/launch.py", line 18, in worker
    subprocess.check_call(command, env=my_env, stdout=sys.stdout, stderr=sys.stderr, shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '/usr/local/byteps/example/pytorch/start_pytorch_byteps.sh --model resnet50 --num-iters 1000' returned non-zero exit status 134
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/byteps/launcher/launch.py", line 18, in worker
    subprocess.check_call(command, env=my_env, stdout=sys.stdout, stderr=sys.stderr, shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '/usr/local/byteps/example/pytorch/start_pytorch_byteps.sh --model resnet50 --num-iters 1000' returned non-zero exit status 134


/usr/local/byteps/example/pytorch/start_pytorch_byteps.sh: line 20:   216 Aborted                 (core dumped) python $path/benchmark_byteps.py $@
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/byteps/launcher/launch.py", line 18, in worker
    subprocess.check_call(command, env=my_env, stdout=sys.stdout, stderr=sys.stderr, shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '/usr/local/byteps/example/pytorch/start_pytorch_byteps.sh --model resnet50 --num-iters 1000' returned non-zero exit status 134

/usr/local/byteps/example/pytorch/start_pytorch_byteps.sh: line 20:   219 Aborted                 (core dumped) python $path/benchmark_byteps.py $@
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/byteps/launcher/launch.py", line 18, in worker
    subprocess.check_call(command, env=my_env, stdout=sys.stdout, stderr=sys.stderr, shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '/usr/local/byteps/example/pytorch/start_pytorch_byteps.sh --model resnet50 --num-iters 1000' returned non-zero exit status 134

Environment (please complete the following information):

  • Docker version 18.09.1, build 4c52b90
  • 8 1080Ti GPUs.
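Every worker aborts in `byteps/common/global.cc` because `DMLC_NUM_SERVER` is not visible in its environment, even though the instructions export it. A quick pre-flight check, run in the same shell right before `launch.py`, shows which `DMLC_*` variables the child processes will actually inherit. The helper and the required-variable list are assumptions drawn from the commands on this page, not an official BytePS check:

```python
import os

# DMLC_* variables used by the commands on this page (assumed required set).
COMMON = ("DMLC_ROLE", "DMLC_NUM_WORKER", "DMLC_NUM_SERVER",
          "DMLC_PS_ROOT_URI", "DMLC_PS_ROOT_PORT")


def missing_vars(env=None):
    """Return the required DMLC_* names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    required = list(COMMON)
    if env.get("DMLC_ROLE") == "worker":
        required.append("DMLC_WORKER_ID")
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        raise SystemExit("not set: " + ", ".join(missing))
    print("all DMLC_* variables present")
```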

Error in `python': munmap_chunk(): invalid pointer

Describe the bug
Hi, I was following the Step-by-Step tutorial and building from source.

Single-machine training (with 2 local GPUs) works, but when I try to run distributed training by changing only
DMLC_NUM_WORKER=1
to
DMLC_NUM_WORKER=2

all 3 frameworks fail.

MXNet

(base) laochanlam@ubuntu-X299-UD4-Pro:~/git/byteps_org$ BYTEPS_LOG_LEVEL=TRACE DMLC_ROLE=worker DMLC_PS_ROOT_URI=10.0.0.1 DMLC_PS_ROOT_PORT=9000 DMLC_WORKER_ID=0 DMLC_NUM_WORKER=2 DMLC_NUM_SERVER=1 python launcher/launch.py example/mxnet/start_mxnet_byteps.sh
BytePS launching worker
[2019-07-18 14:49:39. 48896: D byteps/common/communicator.cc:63] Using Communicator=Socket
[2019-07-18 14:49:39. 48991: D byteps/common/communicator.cc:151] Init socket at /tmp/socket_send_0
[2019-07-18 14:49:39. 49012: D byteps/common/communicator.cc:151] Init socket at /tmp/socket_recv_0
[2019-07-18 14:49:39. 49045: D byteps/common/communicator.cc:121] This is ROOT device, rank=0, all sockets create successfully
[2019-07-18 14:49:39. 49053: D byteps/common/global.cc:96] Partition bound set to 4096000 bytes, aligned to 4096000 bytes
[2019-07-18 14:49:39. 49059: D byteps/common/global.cc:116] Number of worker=2, launching distributed job
[14:49:39] src/./zmq_van.h:61: BYTEPS_ZMQ_MAX_SOCKET set to 1024
[14:49:39] src/./zmq_van.h:66: BYTEPS_ZMQ_NTHREADS set to 4
[14:49:39] src/customer.cc[2019-07-18 14:49:39. 49224: D byteps/common/communicator.cc:158:] 363Listening on socket 0
: Do not use thread pool for receiving.
[14:49:39] src/van.cc:357: Bind to role=worker, ip=101.6.101.94, port=53083, is_recovery=0
[[14:49:39] src/van.cc:446: ? => 1. Meta: request=0, timestamp=0, control={ cmd=ADD_NODE, node={ role=worker, ip=101.6.101.94, port=53083, is_recovery=0 } }. THIS IS NOT DATA MSG!
*** Error in `python': munmap_chunk(): invalid pointer: 0x00007ffd04d1c308 ***
14:49:39] src/./zmq_van.h:286: Start ZMQ recv thread
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fb747dfa7e5]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x1a8)[0x7fb747e07698]
/home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/mxnet/c_lib.so(+0x30659)[0x7fb6a741c659]
/home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/mxnet/c_lib.so(+0x7f1b2)[0x7fb6a746b1b2]
/home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/mxnet/c_lib.so(+0x8c2e6)[0x7fb6a74782e6]
/home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/mxnet/c_lib.so(+0x7a04c)[0x7fb6a746604c]
/home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/mxnet/c_lib.so(_ZN6byteps6common12BytePSGlobal4InitEv+0x556)[0x7fb6a742f896]
/home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/mxnet/c_lib.so(byteps_init+0x1f)[0x7fb6a74179df]
/home/laochanlam/anaconda2/lib/python2.7/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c)[0x7fb7471c4ec0]
/home/laochanlam/anaconda2/lib/python2.7/lib-dynload/../../libffi.so.6(ffi_call+0x22d)[0x7fb7471c487d]
/home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x4de)[0x7fb747cb399e]
/home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_ctypes.so(+0x9b61)[0x7fb747ca9b61]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7fb748b2fb73]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x3bb9)[0x7fb748bc6119]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x7fee)[0x7fb748bca54e]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7e9)[0x7fb748bcba99]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCode+0x1a)[0x7fb748bcbcba]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(+0x10101d)[0x7fb748be501d]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(PyRun_FileExFlags+0x78)[0x7fb748be61c8]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0xe8)[0x7fb748be73e8]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(Py_Main+0xbac)[0x7fb748bf967c]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fb747da3830]
python(+0x107f)[0x556d9050607f]
======= Memory map: ========
556d90505000-556d90506000 r--p 00000000 08:01 150867207                  /home/laochanlam/anaconda2/bin/python2.7
556d90506000-556d90507000 r-xp 00001000 08:01 150867207                  /home/laochanlam/anaconda2/bin/python2.7
556d90507000-556d90508000 r--p 00002000 08:01 150867207                  /home/laochanlam/anaconda2/bin/python2.7
556d90508000-556d90509000 r--p 00002000 08:01 150867207                  /home/laochanlam/anaconda2/bin/python2.7
556d90509000-556d9050a000 rw-p 00003000 08:01 150867207                  /home/laochanlam/anaconda2/bin/python2.7
556d91d3f000-556d93be6000 rw-p 00000000 00:00 0                          [heap]
7fb690000000-7fb690021000 rw-p 00000000 00:00 0 
7fb690021000-7fb694000000 ---p 00000000 00:00 0 
7fb698000000-7fb698021000 rw-p 00000000 00:00 0 
7fb698021000-7fb69c000000 ---p 00000000 00:00 0 
7fb69c000000-7fb69c021000 rw-p 00000000 00:00 0 
7fb69c021000-7fb6a0000000 ---p 00000000 00:00 0 
7fb6a0000000-7fb6a0021000 rw-p 00000000 00:00 0 
7fb6a0021000-7fb6a4000000 ---p 00000000 00:00 0 
7fb6a6beb000-7fb6a6bec000 ---p 00000000 00:00 0 
7fb6a6bec000-7fb6a73ec000 rwxp 00000000 00:00 0 
7fb6a73ec000-7fb6a7410000 r--p 00000000 08:01 151128831                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/mxnet/c_lib.so
7fb6a7410000-7fb6a7567000 r-xp 00024000 08:01 151128831                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/mxnet/c_lib.so
7fb6a7567000-7fb6abff3000 r--p 0017b000 08:01 151128831                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/mxnet/c_lib.so
7fb6abff3000-7fb6abff4000 ---p 04c07000 08:01 151128831                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/mxnet/c_lib.so
7fb6abff4000-7fb6abffb000 r--p 04c07000 08:01 151128831                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/mxnet/c_lib.so
7fb6abffb000-7fb6abffd000 rw-p 04c0e000 08:01 151128831                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/mxnet/c_lib.so
7fb6abffd000-7fb6ac000000 rw-p 00000000 00:00 0 
7fb6ac000000-7fb6ac02b000 rw-p 00000000 00:00 0 
7fb6ac02b000-7fb6b0000000 ---p 00000000 00:00 0 
7fb6b0000000-7fb6b0021000 rw-p 00000000 00:00 0 
7fb6b0021000-7fb6b4000000 ---p 00000000 00:00 0 
7fb6b4000000-7fb6b402b000 rw-p 00000000 00:00 0 
7fb6b402b000-7fb6b8000000 ---p 00000000 00:00 0 
7fb6b87f9000-7fb6b87fa000 ---p 00000000 00:00 0 
7fb6b87fa000-7fb6b8ffa000 rwxp 00000000 00:00 0 
7fb6b8ffa000-7fb6b8ffb000 ---p 00000000 00:00 0 
7fb6b8ffb000-7fb6b97fb000 rwxp 00000000 00:00 0 
7fb6b97fb000-7fb6b97fc000 ---p 00000000 00:00 0 
7fb6b97fc000-7fb6b9ffc000 rwxp 00000000 00:00 0 
7fb6b9ffc000-7fb6b9ffd000 ---p 00000000 00:00 0 
7fb6b9ffd000-7fb6ba7fd000 rwxp 00000000 00:00 0 
7fb6ba7fd000-7fb6ba7fe000 ---p 00000000 00:00 0 
7fb6ba7fe000-7fb6baffe000 rwxp 00000000 00:00 0 
7fb6baffe000-7fb6bafff000 ---p 00000000 00:00 0 
7fb6bafff000-7fb6bb7ff000 rwxp 00000000 00:00 0 
7fb6bb7ff000-7fb6bb800000 ---p 00000000 00:00 0 
7fb6bb800000-7fb6bc000000 rwxp 00000000 00:00 0 
7fb6bc000000-7fb6bc02b000 rw-p 00000000 00:00 0 
7fb6bc02b000-7fb6c0000000 ---p 00000000 00:00 0 
7fb6c002e000-7fb6c002f000 ---p 00000000 00:00 0 
7fb6c002f000-7fb6c082f000 rwxp 00000000 00:00 0 
7fb6c082f000-7fb6c08a4000 r-xp 00000000 103:03 53744467                  /usr/local/cuda-10.0/lib64/libcudart.so.10.0.130
7fb6c08a4000-7fb6c0aa4000 ---p 00075000 103:03 53744467                  /usr/local/cuda-10.0/lib64/libcudart.so.10.0.130
7fb6c0aa4000-7fb6c0aa8000 rw-p 00075000 103:03 53744467                  /usr/local/cuda-10.0/lib64/libcudart.so.10.0.130
7fb6c0aa8000-7fb6c0aa9000 rw-p 00000000 00:00 0 
7fb6c0aa9000-7fb6c0ab3000 r-xp 00000000 103:03 36708927                  /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0
7fb6c0ab3000-7fb6c0cb2000 ---p 0000a000 103:03 36708927                  /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0
7fb6c0cb2000-7fb6c0cb3000 r--p 00009000 103:03 36708927                  /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0
7fb6c0cb3000-7fb6c0cb4000 rw-p 0000a000 103:03 36708927                  /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0
7fb6c0cdb000-7fb6c0fdb000 rw-p 00000000 00:00 0 
7fb6c0fdb000-7fb6c0ffa000 r--p 00000000 08:01 151001721                  /home/laochanlam/anaconda2/lib/libssl.so.1.1
7fb6c0ffa000-7fb6c104b000 r-xp 0001f000 08:01 151001721                  /home/laochanlam/anaconda2/lib/libssl.so.1.1
7fb6c104b000-7fb6c1064000 r--p 00070000 08:01 151001721                  /home/laochanlam/anaconda2/lib/libssl.so.1.1
7fb6c1064000-7fb6c106d000 r--p 00088000 08:01 151001721                  /home/laochanlam/anaconda2/lib/libssl.so.1.1
7fb6c106d000-7fb6c1071000 rw-p 00091000 08:01 151001721                  /home/laochanlam/anaconda2/lib/libssl.so.1.1
7fb6c1071000-7fb6c109c000 r--p 00000000 08:01 152831643                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/cryptography/hazmat/bindings/_openssl.so
7fb6c109c000-7fb6c10f1000 r-xp 0002b000 08:01 152831643                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/cryptography/hazmat/bindings/_openssl.so
7fb6c10f1000-7fb6c1110000 r--p 00080000 08:01 152831643                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/cryptography/hazmat/bindings/_openssl.so
7fb6c1110000-7fb6c1111000 ---p 0009f000 08:01 152831643                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/cryptography/hazmat/bindings/_openssl.so
7fb6c1111000-7fb6c1121000 r--p 0009f000 08:01 152831643                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/cryptography/hazmat/bindings/_openssl.so
7fb6c1121000-7fb6c1128000 rw-p 000af000 08:01 152831643                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/cryptography/hazmat/bindings/_openssl.so
7fb6c1128000-7fb6c1168000 rw-p 00000000 00:00 0 
7fb6c1168000-7fb6c1170000 r--p 00000000 08:01 152570654                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/_cffi_backend.so
7fb6c1170000-7fb6c1185000 r-xp 00008000 08:01 152570654                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/_cffi_backend.so
7fb6c1185000-7fb6c118d000 r--p 0001d000 08:01 152570654                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/_cffi_backend.so
7fb6c118d000-7fb6c118e000 ---p 00025000 08:01 152570654                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/_cffi_backend.so
7fb6c118e000-7fb6c118f000 r--p 00025000 08:01 152570654                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/_cffi_backend.so
7fb6c118f000-7fb6c1194000 rw-p 00026000 08:01 152570654                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/_cffi_backend.so
7fb6c1194000-7fb6c1197000 rw-p 00000000 00:00 0 
7fb6c1197000-7fb6c1198000 r--p 00000000 08:01 152831641                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/cryptography/hazmat/bindings/_constant_time.so
7fb6c1198000-7fb6c1199000 r-xp 00001000 08:01 152831641                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/cryptography/hazmat/bindings/_constant_time.so
7fb6c1199000-7fb6c119a000 r--p 00002000 08:01 152831641                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/cryptography/hazmat/bindings/_constant_time.so
7fb6c119a000-7fb6c119b000 r--p 00002000 08:01 152831641                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/cryptography/hazmat/bindings/_constant_time.so
7fb6c119b000-7fb6c119c000 rw-p 00003000 08:01 152831641                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/cryptography/hazmat/bindings/_constant_time.so
7fb6c119c000-7fb6c121c000 rw-p 00000000 00:00 0 
7fb6c121c000-7fb6c1296000 r--p 00000000 08:01 150875217                  /home/laochanlam/anaconda2/lib/libcrypto.so.1.1
7fb6c1296000-7fb6c145c000 r-xp 0007a000 08:01 150875217                  /home/laochanlam/anaconda2/lib/libcrypto.so.1.1
7fb6c145c000-7fb6c14e4000 r--p 00240000 08:01 150875217                  /home/laochanlam/anaconda2/lib/libcrypto.so.1.1
7fb6c14e4000-7fb6c150f000 r--p 002c7000 08:01 150875217                  /home/laochanlam/anaconda2/lib/libcrypto.so.1.1
7fb6c150f000-7fb6c1511000 rw-p 002f2000 08:01 150875217                  /home/laochanlam/anaconda2/lib/libcrypto.so.1.1
7fb6c1511000-7fb6c1514000 rw-p 00000000 00:00 0 
7fb6c1514000-7fb6c1517000 r--p 00000000 08:01 150867019                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/unicodedata.so
7fb6c1517000-7fb6c151a000 r-xp 00003000 08:01 150867019                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/unicodedata.so
7fb6c151a000-7fb6c15ac000 r--p 00006000 08:01 150867019                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/unicodedata.so
7fb6c15ac000-7fb6c15ad000 r--p 00097000 08:01 150867019                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/unicodedata.so
7fb6c15ad000-7fb6c15c0000 rw-p 00098000 08:01 150867019                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/unicodedata.so
7fb6c15c0000-7fb6c1840000 rw-p 00000000 00:00 0 
7fb6c1840000-7fb6c1841000 ---p 00000000 00:00 0 
7fb6c1841000-7fb6c2041000 rwxp 00000000 00:00 0 
7fb6c2041000-7fb6c2081000 rw-p 00000000 00:00 0 
7fb6c2081000-7fb6c2083000 r--p 00000000 08:01 153752756                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/stats/mvn.so
7fb6c2083000-7fb6c208f000 r-xp 00002000 08:01 153752756                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/stats/mvn.so
7fb6c208f000-7fb6c2094000 r--p 0000e000 08:01 153752756                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/stats/mvn.so
7fb6c2094000-7fb6c2095000 ---p 00013000 08:01 153752756                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/stats/mvn.so
7fb6c2095000-7fb6c2096000 r--p 00013000 08:01 153752756                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/stats/mvn.so
7fb6c2096000-7fb6c2098000 rw-p 00014000 08:01 153752756                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/stats/mvn.so
7fb6c2098000-7fb6c21cf000 rw-p 00000000 00:00 0 
7fb6c21cf000-7fb6c21d1000 r--p 00000000 08:01 153752749                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/stats/statlib.so
7fb6c21d1000-7fb6c21d8000 r-xp 00002000 08:01 153752749                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/stats/statlib.so
7fb6c21d8000-7fb6c21da000 r--p 00009000 08:01 153752749                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/stats/statlib.so
7fb6c21da000-7fb6c21db000 r--p 0000a000 08:01 153752749                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/stats/statlib.so
7fb6c21db000-7fb6c21dc000 rw-p 0000b000 08:01 153752749                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/stats/statlib.so
7fb6c21dc000-7fb6c235c000 rw-p 00000000 00:00 0 
7fb6c235c000-7fb6c2364000 r--p 00000000 08:01 153752770                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/stats/_stats.so
7fb6c2364000-7fb6c23af000 r-xp 00008000 08:01 153752770                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/stats/_stats.so
7fb6c23af000-7fb6c23b9000 r--p 00053000 08:01 153752770                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/stats/_stats.so
7fb6c23b9000-7fb6c23ba000 r--p 0005c000 08:01 153752770                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/stats/_stats.so
7fb6c23ba000-7fb6c23bf000 rw-p 0005d000 08:01 153752770                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/stats/_stats.so
7fb6c23bf000-7fb6c243f000 rw-p 00000000 00:00 0 
7fb6c243f000-7fb6c2442000 r--p 00000000 08:01 153878944                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/lsoda.so
7fb6c2442000-7fb6c2455000 r-xp 00003000 08:01 153878944                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/lsoda.so
7fb6c2455000-7fb6c2459000 r--p 00016000 08:01 153878944                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/lsoda.so
7fb6c2459000-7fb6c245a000 r--p 00019000 08:01 153878944                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/lsoda.so
7fb6c245a000-7fb6c245b000 rw-p 0001a000 08:01 153878944                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/lsoda.so
7fb6c245b000-7fb6c245c000 rw-p 00000000 00:00 0 
7fb6c245c000-7fb6c245f000 r--p 00000000 08:01 153878940                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/_dop.so
7fb6c245f000-7fb6c2473000 r-xp 00003000 08:01 153878940                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/_dop.so
7fb6c2473000-7fb6c2477000 r--p 00017000 08:01 153878940                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/_dop.so
7fb6c2477000-7fb6c2478000 r--p 0001a000 08:01 153878940                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/_dop.so
7fb6c2478000-7fb6c247a000 rw-p 0001b000 08:01 153878940                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/_dop.so
7fb6c247a000-7fb6c247d000 r--p 00000000 08:01 153878959                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/vode.so
7fb6c247d000-7fb6c24a5000 r-xp 00003000 08:01 153878959                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/vode.so
7fb6c24a5000-7fb6c24aa000 r--p 0002b000 08:01 153878959                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/vode.so
7fb6c24aa000-7fb6c24ab000 ---p 00030000 08:01 153878959                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/vode.so
7fb6c24ab000-7fb6c24ac000 r--p 00030000 08:01 153878959                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/vode.so
7fb6c24ac000-7fb6c24ae000 rw-p 00031000 08:01 153878959                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/vode.so
7fb6c24ae000-7fb6c24ee000 rw-p 00000000 00:00 0 
7fb6c24ee000-7fb6c24f0000 r--p 00000000 08:01 153878952                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/_quadpack.so
7fb6c24f0000-7fb6c2507000 r-xp 00002000 08:01 153878952                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/_quadpack.so
7fb6c2507000-7fb6c250a000 r--p 00019000 08:01 153878952                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/_quadpack.so
7fb6c250a000-7fb6c250b000 r--p 0001b000 08:01 153878952                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/_quadpack.so
7fb6c250b000-7fb6c250c000 rw-p 0001c000 08:01 153878952                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/_quadpack.so
7fb6c250c000-7fb6c250e000 r--p 00000000 08:01 153878939                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/_odepack.so
7fb6c250e000-7fb6c251f000 r-xp 00002000 08:01 153878939                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/_odepack.so
7fb6c251f000-7fb6c2522000 r--p 00013000 08:01 153878939                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/_odepack.so
7fb6c2522000-7fb6c2523000 r--p 00015000 08:01 153878939                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/_odepack.so
7fb6c2523000-7fb6c2524000 rw-p 00016000 08:01 153878939                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/integrate/_odepack.so
7fb6c2524000-7fb6c2564000 rw-p 00000000 00:00 0 
7fb6c2564000-7fb6c2566000 r--p 00000000 08:01 153878931                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_nnls.so
7fb6c2566000-7fb6c256c000 r-xp 00002000 08:01 153878931                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_nnls.so
7fb6c256c000-7fb6c256e000 r--p 00008000 08:01 153878931                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_nnls.so
7fb6c256e000-7fb6c256f000 r--p 00009000 08:01 153878931                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_nnls.so
7fb6c256f000-7fb6c2570000 rw-p 0000a000 08:01 153878931                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_nnls.so
7fb6c2570000-7fb6c2571000 r--p 00000000 08:01 153878927                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_zeros.so
7fb6c2571000-7fb6c2573000 r-xp 00001000 08:01 153878927                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_zeros.so
7fb6c2573000-7fb6c2574000 r--p 00003000 08:01 153878927                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_zeros.so
7fb6c2574000-7fb6c2575000 r--p 00003000 08:01 153878927                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_zeros.so
7fb6c2575000-7fb6c2576000 rw-p 00004000 08:01 153878927                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_zeros.so
7fb6c2576000-7fb6c25b6000 rw-p 00000000 00:00 0 
7fb6c25b6000-7fb6c25bc000 r--p 00000000 08:01 153878943                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_lsq/givens_elimination.so
7fb6c25bc000-7fb6c25d6000 r-xp 00006000 08:01 153878943                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_lsq/givens_elimination.so
7fb6c25d6000-7fb6c25dc000 r--p 00020000 08:01 153878943                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_lsq/givens_elimination.so
7fb6c25dc000-7fb6c25dd000 r--p 00025000 08:01 153878943                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_lsq/givens_elimination.so
7fb6c25dd000-7fb6c25e0000 rw-p 00026000 08:01 153878943                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_lsq/givens_elimination.so
7fb6c25e0000-7fb6c25e2000 r--p 00000000 08:01 153878950                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_minpack.so
7fb6c25e2000-7fb6c25fe000 r-xp 00002000 08:01 153878950                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_minpack.so
7fb6c25fe000-7fb6c2600000 r--p 0001e000 08:01 153878950                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_minpack.so
7fb6c2600000-7fb6c2601000 r--p 0001f000 08:01 153878950                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_minpack.so
7fb6c2601000-7fb6c2602000 rw-p 00020000 08:01 153878950                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_minpack.so
7fb6c2602000-7fb6c2604000 r--p 00000000 08:01 153878942                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_slsqp.so
7fb6c2604000-7fb6c2616000 r-xp 00002000 08:01 153878942                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_slsqp.so
7fb6c2616000-7fb6c2619000 r--p 00014000 08:01 153878942                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_slsqp.so
7fb6c2619000-7fb6c261a000 r--p 00016000 08:01 153878942                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_slsqp.so
7fb6c261a000-7fb6c261b000 rw-p 00017000 08:01 153878942                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_slsqp.so
7fb6c261b000-7fb6c261d000 r--p 00000000 08:01 153878945                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_cobyla.so
7fb6c261d000-7fb6c2636000 r-xp 00002000 08:01 153878945                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_cobyla.so
7fb6c2636000-7fb6c2638000 r--p 0001b000 08:01 153878945                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_cobyla.so
7fb6c2638000-7fb6c2639000 ---p 0001d000 08:01 153878945                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_cobyla.so
7fb6c2639000-7fb6c263a000 r--p 0001d000 08:01 153878945                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_cobyla.so
7fb6c263a000-7fb6c263b000 rw-p 0001e000 08:01 153878945                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_cobyla.so
7fb6c263b000-7fb6c263c000 r--p 00000000 08:01 153878929                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/moduleTNC.so
7fb6c263c000-7fb6c2643000 r-xp 00001000 08:01 153878929                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/moduleTNC.so
7fb6c2643000-7fb6c2644000 r--p 00008000 08:01 153878929                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/moduleTNC.so
7fb6c2644000-7fb6c2645000 r--p 00008000 08:01 153878929                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/moduleTNC.so
7fb6c2645000-7fb6c2646000 rw-p 00009000 08:01 153878929                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/moduleTNC.so
7fb6c2646000-7fb6c2648000 r--p 00000000 08:01 153878951                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_lbfgsb.so
7fb6c2648000-7fb6c2661000 r-xp 00002000 08:01 153878951                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_lbfgsb.so
7fb6c2661000-7fb6c2665000 r--p 0001b000 08:01 153878951                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_lbfgsb.so
7fb6c2665000-7fb6c2666000 r--p 0001e000 08:01 153878951                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_lbfgsb.so
7fb6c2666000-7fb6c2667000 rw-p 0001f000 08:01 153878951                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_lbfgsb.so
7fb6c2667000-7fb6c26a7000 rw-p 00000000 00:00 0 
7fb6c26a7000-7fb6c26ad000 r--p 00000000 08:01 153878948                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_group_columns.so
7fb6c26ad000-7fb6c26cd000 r-xp 00006000 08:01 153878948                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_group_columns.so
7fb6c26cd000-7fb6c26d3000 r--p 00026000 08:01 153878948                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_group_columns.so
7fb6c26d3000-7fb6c26d4000 ---p 0002c000 08:01 153878948                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_group_columns.so
7fb6c26d4000-7fb6c26d5000 r--p 0002c000 08:01 153878948                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_group_columns.so
7fb6c26d5000-7fb6c26d8000 rw-p 0002d000 08:01 153878948                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_group_columns.so
7fb6c26d8000-7fb6c26e0000 r--p 00000000 08:01 153878957                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_trlib/_trlib.so
7fb6c26e0000-7fb6c2718000 r-xp 00008000 08:01 153878957                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_trlib/_trlib.so
7fb6c2718000-7fb6c2721000 r--p 00040000 08:01 153878957                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_trlib/_trlib.so
7fb6c2721000-7fb6c2722000 ---p 00049000 08:01 153878957                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_trlib/_trlib.so
7fb6c2722000-7fb6c2723000 r--p 00049000 08:01 153878957                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_trlib/_trlib.so
7fb6c2723000-7fb6c2727000 rw-p 0004a000 08:01 153878957                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/_trlib/_trlib.so
7fb6c2727000-7fb6c2729000 r--p 00000000 08:01 153878934                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/minpack2.so
7fb6c2729000-7fb6c2730000 r-xp 00002000 08:01 153878934                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/minpack2.so
7fb6c2730000-7fb6c2732000 r--p 00009000 08:01 153878934                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/minpack2.so
7fb6c2732000-7fb6c2733000 r--p 0000a000 08:01 153878934                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/minpack2.so
7fb6c2733000-7fb6c2734000 rw-p 0000b000 08:01 153878934                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/optimize/minpack2.so
7fb6c2734000-7fb6c2834000 rw-p 00000000 00:00 0 
7fb6c2834000-7fb6c2859000 r-xp 00000000 08:01 151001950                  /home/laochanlam/anaconda2/lib/liblzma.so.5.2.4
7fb6c2859000-7fb6c2a58000 ---p 00025000 08:01 151001950                  /home/laochanlam/anaconda2/lib/liblzma.so.5.2.4
7fb6c2a58000-7fb6c2a59000 r--p 00024000 08:01 151001950                  /home/laochanlam/anaconda2/lib/liblzma.so.5.2.4
7fb6c2a59000-7fb6c2a5a000 rw-p 00025000 08:01 151001950                  /home/laochanlam/anaconda2/lib/liblzma.so.5.2.4
7fb6c2a5a000-7fb6c2a64000 r--p 00000000 08:01 151913051                  /home/laochanlam/anaconda2/lib/libzstd.so.1.3.7
7fb6c2a64000-7fb6c2af0000 r-xp 0000a000 08:01 151913051                  /home/laochanlam/anaconda2/lib/libzstd.so.1.3.7
7fb6c2af0000-7fb6c2afc000 r--p 00096000 08:01 151913051                  /home/laochanlam/anaconda2/lib/libzstd.so.1.3.7
7fb6c2afc000-7fb6c2afd000 ---p 000a2000 08:01 151913051                  /home/laochanlam/anaconda2/lib/libzstd.so.1.3.7
7fb6c2afd000-7fb6c2afe000 r--p 000a2000 08:01 151913051                  /home/laochanlam/anaconda2/lib/libzstd.so.1.3.7
7fb6c2afe000-7fb6c2aff000 rw-p 000a3000 08:01 151913051                  /home/laochanlam/anaconda2/lib/libzstd.so.1.3.7
7fb6c2aff000-7fb6c2b09000 r--p 00000000 08:01 151913812                  /home/laochanlam/anaconda2/lib/libtiff.so.5.4.0
7fb6c2b09000-7fb6c2b4c000 r-xp 0000a000 08:01 151913812                  /home/laochanlam/anaconda2/lib/libtiff.so.5.4.0
7fb6c2b4c000-7fb6c2b77000 r--p 0004d000 08:01 151913812                  /home/laochanlam/anaconda2/lib/libtiff.so.5.4.0
7fb6c2b77000-7fb6c2b78000 ---p 00078000 08:01 151913812                  /home/laochanlam/anaconda2/lib/libtiff.so.5.4.0
7fb6c2b78000-7fb6c2b7c000 r--p 00078000 08:01 151913812                  /home/laochanlam/anaconda2/lib/libtiff.so.5.4.0
7fb6c2b7c000-7fb6c2b7d000 rw-p 0007c000 08:01 151913812                  /home/laochanlam/anaconda2/lib/libtiff.so.5.4.0
7fb6c2b7d000-7fb6c2bb8000 r-xp 00000000 08:01 150874453                  /home/laochanlam/anaconda2/lib/libjpeg.so.9.2.0
7fb6c2bb8000-7fb6c2db7000 ---p 0003b000 08:01 150874453                  /home/laochanlam/anaconda2/lib/libjpeg.so.9.2.0
7fb6c2db7000-7fb6c2db8000 r--p 0003a000 08:01 150874453                  /home/laochanlam/anaconda2/lib/libjpeg.so.9.2.0
7fb6c2db8000-7fb6c2db9000 rw-p 0003b000 08:01 150874453                  /home/laochanlam/anaconda2/lib/libjpeg.so.9.2.0
7fb6c2db9000-7fb6c2dcb000 r--p 00000000 08:01 152700944                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/PIL/_imaging.so
7fb6c2dcb000-7fb6c2e17000 r-xp 00012000 08:01 152700944                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/PIL/_imaging.so
7fb6c2e17000-7fb6c2e25000 r--p 0005e000 08:01 152700944                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/PIL/_imaging.so
7fb6c2e25000-7fb6c2e29000 r--p 0006b000 08:01 152700944                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/PIL/_imaging.so
7fb6c2e29000-7fb6c2e2c000 rw-p 0006f000 08:01 152700944                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/PIL/_imaging.so
7fb6c2e2c000-7fb6c2ead000 rw-p 00000000 00:00 0 
7fb6c2ead000-7fb6c2eb3000 r--p 00000000 08:01 153752760                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/spatial/_hausdorff.so
7fb6c2eb3000-7fb6c2ed2000 r-xp 00006000 08:01 153752760                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/spatial/_hausdorff.so
7fb6c2ed2000-7fb6c2ed8000 r--p 00025000 08:01 153752760                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/spatial/_hausdorff.so
7fb6c2ed8000-7fb6c2ed9000 ---p 0002b000 08:01 153752760                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/spatial/_hausdorff.so
7fb6c2ed9000-7fb6c2eda000 r--p 0002b000 08:01 153752760                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/spatial/_hausdorff.so
7fb6c2eda000-7fb6c2edd000 rw-p 0002c000 08:01 153752760                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/spatial/_hausdorff.so
7fb6c2edd000-7fb6c2ee0000 r--p 00000000 08:01 153752752                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/spatial/_distance_wrap.so
7fb6c2ee0000-7fb6c2ef3000 r-xp 00003000 08:01 153752752                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/spatial/_distance_wrap.so
7fb6c2ef3000-7fb6c2ef6000 r--p 00016000 08:01 153752752                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/spatial/_distance_wrap.so
7fb6c2ef6000-7fb6c2ef7000 r--p 00018000 08:01 153752752                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/spatial/_distance_wrap.so
7fb6c2ef7000-7fb6c2ef8000 rw-p 00019000 08:01 153752752                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/spatial/_distance_wrap.so
7fb6c2ef8000-7fb6c2efe000 r--p 00000000 08:01 153752759                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/spatial/_voronoi.so
7fb6c2efe000-7fb6c2f1c000 r-xp 00006000 08:01 153752759                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/spatial/_voronoi.so
7fb6c2f1c000-7fb6c2f22000 r--p 00024000 08:01 153752759                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/spatial/_voronoi.so
7fb6c2f22000-7fb6c2f23000 r--p 00029000 08:01 153752759                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/spatial/_voronoi.so
7fb6c2f23000-7fb6c2f26000 rw-p 0002a000 08:01 153752759                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/spatial/_voronoi.so
7fb6c2f26000-7fb6c2f66000 rw-p 00000000 00:00 0 
7fb6c2f66000-7fb6c2f71000 r--p 00000000 08:01 153752778                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/spatial/qhull.so
7fb6c2f71000-7fb6c3003000 r-xp 0000b000 08:01 153752778                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/spatial/qhull.so
7fb6c3003000-7fb6c302a000 r--p 0009d000 08:01 153752778                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/scipy/spatial/qhull.so
example/mxnet/start_mxnet_byteps.sh: line 5: 13709 Aborted                 (core dumped) python $path/train_imagenet_byteps.py $@
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/laochanlam/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/laochanlam/anaconda2/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "launcher/launch.py", line 19, in worker
    subprocess.check_call(command, env=my_env, stdout=sys.stdout, stderr=sys.stderr, shell=True)
  File "/home/laochanlam/anaconda2/lib/python2.7/subprocess.py", line 190, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command 'example/mxnet/start_mxnet_byteps.sh' returned non-zero exit status 134

PyTorch

(base) laochanlam@ubuntu-X299-UD4-Pro:~/git/byteps_org$ DMLC_ROLE=worker DMLC_PS_ROOT_URI=10.0.0.1 DMLC_PS_ROOT_PORT=9000 DMLC_WORKER_ID=0 DMLC_NUM_WORKER=2 DMLC_NUM_SERVER=1 EVAL_TYPE=benchmark python launcher/launch.py example/pytorch/start_pytorch_byteps.sh 
BytePS launching worker
running benchmark...
[14:53:42] src/./zmq_van.h:61: BYTEPS_ZMQ_MAX_SOCKET set to 1024
[14:53:42] src/./zmq_van.h:66: BYTEPS_ZMQ_NTHREADS set to 4
[14:53:42] src/customer.cc:363: Do not use thread pool for receiving.
[14:53:42] src/van.cc:357: Bind to role=worker, ip=101.6.101.94, port=37199, is_recovery=0
[14:53:42] src/./zmq_van.h:286: Start ZMQ recv thread
[14:53:42] src/van.cc:446: ? => 1. Meta: request=0, timestamp=0, control={ cmd=ADD_NODE, node={ role=worker, ip=101.6.101.94, port=37199, is_recovery=0 } }. THIS IS NOT DATA MSG!
*** Error in `python': munmap_chunk(): invalid pointer: 0x00007ffc3e62b1c8 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f2d7a5637e5]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x1a8)[0x7f2d7a570698]
/home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/torch/c_lib.so(+0x34e39)[0x7f2cf8c12e39]
/home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/torch/c_lib.so(+0x89782)[0x7f2cf8c67782]
/home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/torch/c_lib.so(+0x968b6)[0x7f2cf8c748b6]
/home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/torch/c_lib.so(+0x8461c)[0x7f2cf8c6261c]
/home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/torch/c_lib.so(_ZN6byteps6common12BytePSGlobal4InitEv+0x556)[0x7f2cf8c26076]
/home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/torch/c_lib.so(byteps_init+0x1f)[0x7f2cf8c0e1bf]
/home/laochanlam/anaconda2/lib/python2.7/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c)[0x7f2d4898aec0]
/home/laochanlam/anaconda2/lib/python2.7/lib-dynload/../../libffi.so.6(ffi_call+0x22d)[0x7f2d4898a87d]
/home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x4de)[0x7f2d48ba199e]
/home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_ctypes.so(+0x9b61)[0x7f2d48b97b61]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f2d7b298b73]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x3bb9)[0x7f2d7b32f119]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x7fee)[0x7f2d7b33354e]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7e9)[0x7f2d7b334a99]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCode+0x1a)[0x7f2d7b334cba]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(+0x10101d)[0x7f2d7b34e01d]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(PyRun_FileExFlags+0x78)[0x7f2d7b34f1c8]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0xe8)[0x7f2d7b3503e8]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(Py_Main+0xbac)[0x7f2d7b36267c]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f2d7a50c830]
python(+0x107f)[0x55eae0b4107f]
======= Memory map: ========
200000000-400200000 ---p 00000000 00:00 0 
10000000000-10100000000 ---p 00000000 00:00 0 
55eae0b40000-55eae0b41000 r--p 00000000 08:01 150867207                  /home/laochanlam/anaconda2/bin/python2.7
55eae0b41000-55eae0b42000 r-xp 00001000 08:01 150867207                  /home/laochanlam/anaconda2/bin/python2.7
55eae0b42000-55eae0b43000 r--p 00002000 08:01 150867207                  /home/laochanlam/anaconda2/bin/python2.7
55eae0b43000-55eae0b44000 r--p 00002000 08:01 150867207                  /home/laochanlam/anaconda2/bin/python2.7
55eae0b44000-55eae0b45000 rw-p 00003000 08:01 150867207                  /home/laochanlam/anaconda2/bin/python2.7
55eae2ab3000-55eae430a000 rw-p 00000000 00:00 0                          [heap]
7f2cc8000000-7f2cc8021000 rw-p 00000000 00:00 0 
7f2cc8021000-7f2ccc000000 ---p 00000000 00:00 0 
7f2ccc000000-7f2ccc021000 rw-p 00000000 00:00 0 
7f2ccc021000-7f2cd0000000 ---p 00000000 00:00 0 
7f2cd0000000-7f2cd0021000 rw-p 00000000 00:00 0 
7f2cd0021000-7f2cd4000000 ---p 00000000 00:00 0 
7f2cd4000000-7f2cd4021000 rw-p 00000000 00:00 0 
7f2cd4021000-7f2cd8000000 ---p 00000000 00:00 0 
7f2cd8000000-7f2cd8021000 rw-p 00000000 00:00 0 
7f2cd8021000-7f2cdc000000 ---p 00000000 00:00 0 
7f2ce0000000-7f2cf0000000 ---p 00000000 00:00 0 
7f2cf03d9000-7f2cf03da000 ---p 00000000 00:00 0 
7f2cf03da000-7f2cf0bda000 rw-p 00000000 00:00 0 
7f2cf0bda000-7f2cf0bdb000 ---p 00000000 00:00 0 
7f2cf0bdb000-7f2cf13db000 rw-p 00000000 00:00 0 
7f2cf13db000-7f2cf13dc000 ---p 00000000 00:00 0 
7f2cf13dc000-7f2cf1bdc000 rw-p 00000000 00:00 0 
7f2cf1bdc000-7f2cf1bdd000 ---p 00000000 00:00 0 
7f2cf1bdd000-7f2cf23dd000 rw-p 00000000 00:00 0 
7f2cf23dd000-7f2cf23de000 ---p 00000000 00:00 0 
7f2cf23de000-7f2cf2bde000 rw-p 00000000 00:00 0 
7f2cf2bde000-7f2cf8bde000 ---p 00000000 00:00 0 
7f2cf8bde000-7f2cf8c06000 r--p 00000000 08:01 151258785                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/torch/c_lib.so
7f2cf8c06000-7f2cf8d63000 r-xp 00028000 08:01 151258785                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/torch/c_lib.so
7f2cf8d63000-7f2cfd7f3000 r--p 00185000 08:01 151258785                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/torch/c_lib.so
7f2cfd7f3000-7f2cfd7fa000 r--p 04c14000 08:01 151258785                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/torch/c_lib.so
7f2cfd7fa000-7f2cfd7fc000 rw-p 04c1b000 08:01 151258785                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/torch/c_lib.so
7f2cfd7fc000-7f2cfd7ff000 rw-p 00000000 00:00 0 
7f2cfd7ff000-7f2d0751c000 r-xp 00000000 08:01 154014683                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/torch/lib/libcaffe2.so
7f2d0751c000-7f2d0771c000 ---p 09d1d000 08:01 154014683                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/torch/lib/libcaffe2.so
7f2d0771c000-7f2d077fa000 r--p 09d1d000 08:01 154014683                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/torch/lib/libcaffe2.so
7f2d077fa000-7f2d07955000 rw-p 09dfb000 08:01 154014683                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/torch/lib/libcaffe2.so
7f2d07955000-7f2d079cb000 rw-p 00000000 00:00 0 
7f2d079cb000-7f2d07dda000 rw-p 0ad8e000 08:01 154014683                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/torch/lib/libcaffe2.so
7f2d07dda000-7f2d41e0d000 r-xp 00000000 08:01 154014692                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/torch/lib/libcaffe2_gpu.so
7f2d41e0d000-7f2d4200d000 ---p 3a033000 08:01 154014692                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/torch/lib/libcaffe2_gpu.so
7f2d4200d000-7f2d42112000 r--p 3a033000 08:01 154014692                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/torch/lib/libcaffe2_gpu.so
7f2d42112000-7f2d43533000 rw-p 3a138000 08:01 154014692                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/torch/lib/libcaffe2_gpu.so
7f2d43533000-7f2d442ed000 rw-p 00000000 00:00 0 
7f2d442ed000-7f2d45a50000 rw-p 3e9f7000 08:01 154014692                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/torch/lib/libcaffe2_gpu.so
7f2d45a50000-7f2d45ad0000 rw-p 00000000 00:00 0 
7f2d45b09000-7f2d45bc4000 r-xp 00000000 08:01 154276261                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/random/mtrand.so
7f2d45bc4000-7f2d45dc4000 ---p 000bb000 08:01 154276261                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/random/mtrand.so
7f2d45dc4000-7f2d45de9000 rw-p 000bb000 08:01 154276261                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/random/mtrand.so
7f2d45de9000-7f2d45e2b000 rw-p 00000000 00:00 0 
7f2d45e2b000-7f2d45e34000 r-xp 00000000 08:01 154276046                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/fft/fftpack_lite.so
7f2d45e34000-7f2d46033000 ---p 00009000 08:01 154276046                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/fft/fftpack_lite.so
7f2d46033000-7f2d46034000 rw-p 00008000 08:01 154276046                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/fft/fftpack_lite.so
7f2d46034000-7f2d46074000 rw-p 00000000 00:00 0 
7f2d46074000-7f2d4609d000 r-xp 00000000 08:01 154275981                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/linalg/_umath_linalg.so
7f2d4609d000-7f2d4629c000 ---p 00029000 08:01 154275981                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/linalg/_umath_linalg.so
7f2d4629c000-7f2d462a1000 rw-p 00028000 08:01 154275981                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/linalg/_umath_linalg.so
7f2d462a1000-7f2d462a5000 r-xp 00000000 08:01 154275986                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/linalg/lapack_lite.so
7f2d462a5000-7f2d464a5000 ---p 00004000 08:01 154275986                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/linalg/lapack_lite.so
7f2d464a5000-7f2d464a8000 rw-p 00004000 08:01 154275986                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/linalg/lapack_lite.so
7f2d464a8000-7f2d46528000 rw-p 00000000 00:00 0 
7f2d46528000-7f2d4652a000 r--p 00000000 08:01 150866969                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_random.so
7f2d4652a000-7f2d4652b000 r-xp 00002000 08:01 150866969                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_random.so
7f2d4652b000-7f2d4652c000 r--p 00003000 08:01 150866969                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_random.so
7f2d4652c000-7f2d4652d000 r--p 00003000 08:01 150866969                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_random.so
7f2d4652d000-7f2d4652e000 rw-p 00004000 08:01 150866969                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_random.so
7f2d4652e000-7f2d465b5000 r--p 00000000 08:01 150867250                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_hashlib.so
7f2d465b5000-7f2d46754000 r-xp 00087000 08:01 150867250                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_hashlib.so
7f2d46754000-7f2d467cd000 r--p 00226000 08:01 150867250                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_hashlib.so
7f2d467cd000-7f2d467fc000 r--p 0029e000 08:01 150867250                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_hashlib.so
7f2d467fc000-7f2d467ff000 rw-p 002cd000 08:01 150867250                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_hashlib.so
7f2d467ff000-7f2d46842000 rw-p 00000000 00:00 0 
7f2d46842000-7f2d46843000 r--p 00000000 08:01 150866965                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/grp.so
7f2d46843000-7f2d46844000 r-xp 00001000 08:01 150866965                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/grp.so
7f2d46844000-7f2d46845000 r--p 00002000 08:01 150866965                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/grp.so
7f2d46845000-7f2d46846000 r--p 00002000 08:01 150866965                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/grp.so
7f2d46846000-7f2d46847000 rw-p 00003000 08:01 150866965                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/grp.so
7f2d46847000-7f2d4684b000 r--p 00000000 08:01 150867017                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/bz2.so
7f2d4684b000-7f2d4685d000 r-xp 00004000 08:01 150867017                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/bz2.so
7f2d4685d000-7f2d46860000 r--p 00016000 08:01 150867017                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/bz2.so
7f2d46860000-7f2d46861000 r--p 00018000 08:01 150867017                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/bz2.so
7f2d46861000-7f2d46864000 rw-p 00019000 08:01 150867017                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/bz2.so
7f2d46864000-7f2d46866000 r--p 00000000 08:01 150866983                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/zlib.so
7f2d46866000-7f2d46869000 r-xp 00002000 08:01 150866983                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/zlib.so
7f2d46869000-7f2d4686a000 r--p 00005000 08:01 150866983                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/zlib.so
7f2d4686a000-7f2d4686b000 r--p 00005000 08:01 150866983                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/zlib.so
7f2d4686b000-7f2d4686d000 rw-p 00006000 08:01 150866983                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/zlib.so
7f2d4686d000-7f2d4692d000 rw-p 00000000 00:00 0 
7f2d4692d000-7f2d46931000 r--p 00000000 08:01 150867007                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/cPickle.so
7f2d46931000-7f2d4693f000 r-xp 00004000 08:01 150867007                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/cPickle.so
7f2d4693f000-7f2d46942000 r--p 00012000 08:01 150867007                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/cPickle.so
7f2d46942000-7f2d46943000 ---p 00015000 08:01 150867007                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/cPickle.so
7f2d46943000-7f2d46944000 r--p 00015000 08:01 150867007                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/cPickle.so
7f2d46944000-7f2d46945000 rw-p 00016000 08:01 150867007                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/cPickle.so
7f2d46945000-7f2d48945000 rw-p 00000000 00:00 0 
7f2d48945000-7f2d48985000 rw-p 00000000 00:00 0 
7f2d48985000-7f2d4898c000 r-xp 00000000 08:01 150874486                  /home/laochanlam/anaconda2/lib/libffi.so.6.0.4
7f2d4898c000-7f2d48b8c000 ---p 00007000 08:01 150874486                  /home/laochanlam/anaconda2/lib/libffi.so.6.0.4
7f2d48b8c000-7f2d48b8d000 r--p 00007000 08:01 150874486                  /home/laochanlam/anaconda2/lib/libffi.so.6.0.4
7f2d48b8d000-7f2d48b8e000 rw-p 00008000 08:01 150874486                  /home/laochanlam/anaconda2/lib/libffi.so.6.0.4
7f2d48b8e000-7f2d48b96000 r--p 00000000 08:01 150867013                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_ctypes.so
7f2d48b96000-7f2d48ba6000 r-xp 00008000 08:01 150867013                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_ctypes.so
7f2d48ba6000-7f2d48bac000 r--p 00018000 08:01 150867013                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_ctypes.so
7f2d48bac000-7f2d48bad000 r--p 0001d000 08:01 150867013                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_ctypes.so
7f2d48bad000-7f2d48bb1000 rw-p 0001e000 08:01 150867013                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_ctypes.so
7f2d48bb1000-7f2d4abb1000 rw-p 00000000 00:00 0 
7f2d4abb1000-7f2d4ad49000 r-xp 00000000 08:01 154275839                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/core/umath.so
7f2d4ad49000-7f2d4af49000 ---p 00198000 08:01 154275839                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/core/umath.so
7f2d4af49000-7f2d4af4f000 rw-p 00198000 08:01 154275839                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/core/umath.so
7f2d4af4f000-7f2d4af91000 rw-p 00000000 00:00 0 
7f2d4af91000-7f2d4af96000 r--p 00000000 08:01 150867010                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/datetime.so
7f2d4af96000-7f2d4afa4000 r-xp 00005000 08:01 150867010                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/datetime.so
7f2d4afa4000-7f2d4afa8000 r--p 00013000 08:01 150867010                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/datetime.so
7f2d4afa8000-7f2d4afa9000 ---p 00017000 08:01 150867010                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/datetime.so
7f2d4afa9000-7f2d4afaa000 r--p 00017000 08:01 150867010                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/datetime.so
7f2d4afaa000-7f2d4afad000 rw-p 00018000 08:01 150867010                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/datetime.so
7f2d4afad000-7f2d4afae000 rw-p 00000000 00:00 0 
7f2d4afae000-7f2d50fae000 rw-p 00000000 00:00 0 
7f2d50fae000-7f2d50faf000 ---p 00000000 00:00 0 
7f2d50faf000-7f2d517af000 rw-p 00000000 00:00 0 
7f2d517af000-7f2d517b0000 ---p 00000000 00:00 0 
7f2d517b0000-7f2d51fb0000 rw-p 00000000 00:00 0 
7f2d51fb0000-7f2d51fb1000 ---p 00000000 00:00 0 
7f2d51fb1000-7f2d527b1000 rw-p 00000000 00:00 0 
7f2d527b1000-7f2d527b2000 ---p 00000000 00:00 0 
7f2d527b2000-7f2d52fb2000 rw-p 00000000 00:00 0 
7f2d52fb2000-7f2d58fb2000 rw-p 00000000 00:00 0 
7f2d59645000-7f2d5a42d000 r-xp 00000000 103:03 36716961                  /usr/lib/x86_64-linux-gnu/libcuda.so.430.34
7f2d5a42d000-7f2d5a62d000 ---p 00de8000 103:03 36716961                  /usr/lib/x86_64-linux-gnu/libcuda.so.430.34
7f2d5a62d000-7f2d5a7a5000 rw-p 00de8000 103:03 36716961                  /usr/lib/x86_64-linux-gnu/libcuda.so.430.34
7f2d5a7a5000-7f2d5a7b5000 rw-p 00000000 00:00 0 
7f2d5a7b5000-7f2d607b5000 rw-p 00000000 00:00 0 
7f2d607b5000-7f2d607b6000 ---p 00000000 00:00 0 
7f2d607b6000-7f2d60fb6000 rw-p 00000000 00:00 0 
7f2d60fb6000-7f2d66fb6000 rw-p 00000000 00:00 0 
7f2d67569000-7f2d675b0000 r-xp 00000000 103:03 36716966                  /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.430.34
7f2d675b0000-7f2d677b0000 ---p 00047000 103:03 36716966                  /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.430.34
7f2d677b0000-7f2d677b2000 rw-p 00047000 103:03 36716966                  /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.430.34
7f2d677b2000-7f2d677b7000 rw-p 00000000 00:00 0 
7f2d677b7000-7f2d697b7000 rw-p 00000000 00:00 0 
7f2d69809000-7f2d69849000 rw-p 00000000 00:00 0 
7f2d69849000-7f2d69853000 r-xp 00000000 103:03 36708927                  /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0
7f2d69853000-7f2d69a52000 ---p 0000a000 103:03 36708927                  /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0
7f2d69a52000-7f2d69a53000 r--p 00009000 103:03 36708927                  /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0
7f2d69a53000-7f2d69a54000 rw-p 0000a000 103:03 36708927                  /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0
7f2d69a7b000-7f2d69a81000 r--p 00000000 08:01 150867018                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/pyexpat.so
7f2d69a81000-7f2d69aab000 r-xp 00006000 08:01 150867018                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/pyexpat.so
7f2d69aab000-7f2d69abc000 r--p 00030000 08:01 150867018                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/pyexpat.so
7f2d69abc000-7f2d69abf000 r--p 00040000 08:01 150867018                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/pyexpat.so
7f2d69abf000-7f2d69ac1000 rw-p 00043000 08:01 150867018                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/pyexpat.so
7f2d69ac1000-7f2d69e82000 rw-p 00000000 00:00 0 
7f2d69e82000-7f2d69ea7000 r-xp 00000000 08:01 151001950                  /home/laochanlam/anaconda2/lib/liblzma.so.5.2.4
7f2d69ea7000-7f2d6a0a6000 ---p 00025000 08:01 151001950                  /home/laochanlam/anaconda2/lib/liblzma.so.5.2.4
7f2d6a0a6000-7f2d6a0a7000 r--p 00024000 08:01 151001950                  /home/laochanlam/anaconda2/lib/liblzma.so.5.2.4
7f2d6a0a7000-7f2d6a0a8000 rw-p 00025000 08:01 151001950                  /home/laochanlam/anaconda2/lib/liblzma.so.5.2.4
7f2d6a0a8000-7f2d6a0b2000 r--p 00000000 08:01 151913051                  /home/laochanlam/anaconda2/lib/libzstd.so.1.3.7
example/pytorch/start_pytorch_byteps.sh: line 20: 14049 Aborted                 (core dumped) python $path/benchmark_byteps.py $@
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/laochanlam/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/laochanlam/anaconda2/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "launcher/launch.py", line 19, in worker
    subprocess.check_call(command, env=my_env, stdout=sys.stdout, stderr=sys.stderr, shell=True)
  File "/home/laochanlam/anaconda2/lib/python2.7/subprocess.py", line 190, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command 'example/pytorch/start_pytorch_byteps.sh' returned non-zero exit status 134

TensorFlow

(base) laochanlam@ubuntu-X299-UD4-Pro:~/git/byteps_org$ DMLC_ROLE=worker DMLC_PS_ROOT_URI=10.0.0.1 DMLC_PS_ROOT_PORT=9000 DMLC_WORKER_ID=0 DMLC_NUM_WORKER=2 DMLC_NUM_SERVER=1 EVAL_TYPE=benchmark python launcher/launch.py example/tensorflow/run_tensorflow_byteps.sh 
BytePS launching worker
Run synthetic benchmark...
[14:56:55] src/./zmq_van.h:61: BYTEPS_ZMQ_MAX_SOCKET set to 1024
[14:56:55] src/./zmq_van.h:66: BYTEPS_ZMQ_NTHREADS set to 4
[14:56:55] src/customer.cc:363: Do not use thread pool for receiving.
[14:56:55] src/van.cc:357: Bind to role=worker, ip=101.6.101.94, port=34677, is_recovery=0
[14:56:55] src/van.cc:446: ? => 1. Meta: request=0, timestamp=0, control={ cmd=ADD_NODE, node={ role=worker, ip=101.6.101.94, port=34677, is_recovery=0 } }. THIS IS NOT DATA MSG!
[14:56:55] src/./zmq_van.h:286: Start ZMQ recv thread
*** Error in `python': munmap_chunk(): invalid pointer: 0x00007fff1d4dcc28 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f3b2d5f17e5]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x1a8)[0x7f3b2d5fe698]
/home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/tensorflow/c_lib.so(+0x33199)[0x7f3a93c09199]
/home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/tensorflow/c_lib.so(+0x7a3e2)[0x7f3a93c503e2]
/home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/tensorflow/c_lib.so(+0x87486)[0x7f3a93c5d486]
/home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/tensorflow/c_lib.so(+0x7527c)[0x7f3a93c4b27c]
/home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/tensorflow/c_lib.so(_ZN6byteps6common12BytePSGlobal4InitEv+0x556)[0x7f3a93c1b666]
/home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/tensorflow/c_lib.so(byteps_init+0x1f)[0x7f3a93c0452f]
/home/laochanlam/anaconda2/lib/python2.7/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c)[0x7f3af9f9fec0]
/home/laochanlam/anaconda2/lib/python2.7/lib-dynload/../../libffi.so.6(ffi_call+0x22d)[0x7f3af9f9f87d]
/home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x4de)[0x7f3afa1b699e]
/home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_ctypes.so(+0x9b61)[0x7f3afa1acb61]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f3b2e326b73]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x3bb9)[0x7f3b2e3bd119]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x7fee)[0x7f3b2e3c154e]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x7e9)[0x7f3b2e3c2a99]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCode+0x1a)[0x7f3b2e3c2cba]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(+0x10101d)[0x7f3b2e3dc01d]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(PyRun_FileExFlags+0x78)[0x7f3b2e3dd1c8]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0xe8)[0x7f3b2e3de3e8]
/home/laochanlam/anaconda2/bin/../lib/libpython2.7.so.1.0(Py_Main+0xbac)[0x7f3b2e3f067c]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f3b2d59a830]
python(+0x107f)[0x55ef6404907f]
======= Memory map: ========
55ef64048000-55ef64049000 r--p 00000000 08:01 150867207                  /home/laochanlam/anaconda2/bin/python2.7
55ef64049000-55ef6404a000 r-xp 00001000 08:01 150867207                  /home/laochanlam/anaconda2/bin/python2.7
55ef6404a000-55ef6404b000 r--p 00002000 08:01 150867207                  /home/laochanlam/anaconda2/bin/python2.7
55ef6404b000-55ef6404c000 r--p 00002000 08:01 150867207                  /home/laochanlam/anaconda2/bin/python2.7
55ef6404c000-55ef6404d000 rw-p 00003000 08:01 150867207                  /home/laochanlam/anaconda2/bin/python2.7
55ef64a26000-55ef690d5000 rw-p 00000000 00:00 0                          [heap]
7f3a7c000000-7f3a7c021000 rw-p 00000000 00:00 0 
7f3a7c021000-7f3a80000000 ---p 00000000 00:00 0 
7f3a80000000-7f3a80021000 rw-p 00000000 00:00 0 
7f3a80021000-7f3a84000000 ---p 00000000 00:00 0 
7f3a84000000-7f3a84021000 rw-p 00000000 00:00 0 
7f3a84021000-7f3a88000000 ---p 00000000 00:00 0 
7f3a88000000-7f3a88021000 rw-p 00000000 00:00 0 
7f3a88021000-7f3a8c000000 ---p 00000000 00:00 0 
7f3a8c000000-7f3a8c021000 rw-p 00000000 00:00 0 
7f3a8c021000-7f3a90000000 ---p 00000000 00:00 0 
7f3a913d1000-7f3a913d2000 ---p 00000000 00:00 0 
7f3a913d2000-7f3a91bd2000 rw-p 00000000 00:00 0 
7f3a91bd2000-7f3a91bd3000 ---p 00000000 00:00 0 
7f3a91bd3000-7f3a923d3000 rw-p 00000000 00:00 0 
7f3a923d3000-7f3a923d4000 ---p 00000000 00:00 0 
7f3a923d4000-7f3a92bd4000 rw-p 00000000 00:00 0 
7f3a92bd4000-7f3a92bd5000 ---p 00000000 00:00 0 
7f3a92bd5000-7f3a933d5000 rw-p 00000000 00:00 0 
7f3a933d5000-7f3a933d6000 ---p 00000000 00:00 0 
7f3a933d6000-7f3a93bd6000 rw-p 00000000 00:00 0 
7f3a93bd6000-7f3a93bfd000 r--p 00000000 08:01 151128839                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/tensorflow/c_lib.so
7f3a93bfd000-7f3a93d4c000 r-xp 00027000 08:01 151128839                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/tensorflow/c_lib.so
7f3a93d4c000-7f3a987d8000 r--p 00176000 08:01 151128839                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/tensorflow/c_lib.so
7f3a987d8000-7f3a987df000 r--p 04c01000 08:01 151128839                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/tensorflow/c_lib.so
7f3a987df000-7f3a987e1000 rw-p 04c08000 08:01 151128839                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/byteps-0.1.0-py2.7-linux-x86_64.egg/byteps/tensorflow/c_lib.so
7f3a987e1000-7f3a990e5000 rw-p 00000000 00:00 0 
7f3a990e5000-7f3a990fd000 r-xp 00000000 08:01 154533992                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/fast_tensor_util.so
7f3a990fd000-7f3a992fc000 ---p 00018000 08:01 154533992                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/fast_tensor_util.so
7f3a992fc000-7f3a992fd000 r--p 00017000 08:01 154533992                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/fast_tensor_util.so
7f3a992fd000-7f3a992ff000 rw-p 00018000 08:01 154533992                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/fast_tensor_util.so
7f3a992ff000-7f3a9943f000 rw-p 00000000 00:00 0 
7f3a9943f000-7f3a9966d000 r-xp 00000000 08:01 154010518                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/google/protobuf/pyext/_message.so
7f3a9966d000-7f3a9986d000 ---p 0022e000 08:01 154010518                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/google/protobuf/pyext/_message.so
7f3a9986d000-7f3a9987e000 rw-p 0022e000 08:01 154010518                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/google/protobuf/pyext/_message.so
7f3a9987e000-7f3a99881000 rw-p 00000000 00:00 0 
7f3a99881000-7f3a99882000 r-xp 00000000 08:01 154010501                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/google/protobuf/internal/_api_implementation.so
7f3a99882000-7f3a99a81000 ---p 00001000 08:01 154010501                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/google/protobuf/internal/_api_implementation.so
7f3a99a81000-7f3a99a82000 rw-p 00000000 08:01 154010501                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/google/protobuf/internal/_api_implementation.so
7f3a99a82000-7f3a99cc2000 rw-p 00000000 00:00 0 
7f3a99ce7000-7f3a99ce9000 r--p 00000000 08:01 150866981                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/select.so
7f3a99ce9000-7f3a99ceb000 r-xp 00002000 08:01 150866981                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/select.so
7f3a99ceb000-7f3a99cec000 r--p 00004000 08:01 150866981                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/select.so
7f3a99cec000-7f3a99ced000 r--p 00004000 08:01 150866981                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/select.so
7f3a99ced000-7f3a99cef000 rw-p 00005000 08:01 150866981                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/select.so
7f3a99cef000-7f3a99cf1000 r--p 00000000 08:01 150866994                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_json.so
7f3a99cf1000-7f3a99cf8000 r-xp 00002000 08:01 150866994                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_json.so
7f3a99cf8000-7f3a99cf9000 r--p 00009000 08:01 150866994                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_json.so
7f3a99cf9000-7f3a99cfa000 ---p 0000a000 08:01 150866994                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_json.so
7f3a99cfa000-7f3a99cfb000 r--p 0000a000 08:01 150866994                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_json.so
7f3a99cfb000-7f3a99cfc000 rw-p 0000b000 08:01 150866994                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_json.so
7f3a99cfc000-7f3a99dfc000 rw-p 00000000 00:00 0 
7f3a99dfc000-7f3a99ea9000 r--p 00000000 08:01 150867251                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_ssl.so
7f3a99ea9000-7f3a9a0a4000 r-xp 000ad000 08:01 150867251                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_ssl.so
7f3a9a0a4000-7f3a9a13e000 r--p 002a8000 08:01 150867251                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_ssl.so
7f3a9a13e000-7f3a9a176000 r--p 00341000 08:01 150867251                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_ssl.so
7f3a9a176000-7f3a9a17f000 rw-p 00379000 08:01 150867251                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_ssl.so
7f3a9a17f000-7f3a9a282000 rw-p 00000000 00:00 0 
7f3a9a282000-7f3a9a2c9000 r-xp 00000000 103:03 36716966                  /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.430.34
7f3a9a2c9000-7f3a9a4c9000 ---p 00047000 103:03 36716966                  /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.430.34
7f3a9a4c9000-7f3a9a4cb000 rw-p 00047000 103:03 36716966                  /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.430.34
7f3a9a4cb000-7f3a9a4d0000 rw-p 00000000 00:00 0 
7f3a9a4d0000-7f3a9c959000 r-xp 00000000 103:03 36833743                  /usr/local/cuda-9.0/lib64/libcurand.so.9.0.176
7f3a9c959000-7f3a9cb58000 ---p 02489000 103:03 36833743                  /usr/local/cuda-9.0/lib64/libcurand.so.9.0.176
7f3a9cb58000-7f3a9df2a000 rw-p 02488000 103:03 36833743                  /usr/local/cuda-9.0/lib64/libcurand.so.9.0.176
7f3a9df2a000-7f3a9e434000 rw-p 00000000 00:00 0 
7f3a9e434000-7f3aa6262000 r-xp 00000000 103:03 36833767                  /usr/local/cuda-9.0/lib64/libcufft.so.9.0.176
7f3aa6262000-7f3aa6462000 ---p 07e2e000 103:03 36833767                  /usr/local/cuda-9.0/lib64/libcufft.so.9.0.176
7f3aa6462000-7f3aa6471000 rw-p 07e2e000 103:03 36833767                  /usr/local/cuda-9.0/lib64/libcufft.so.9.0.176
7f3aa6471000-7f3aa64d5000 rw-p 00000000 00:00 0 
7f3aa64d5000-7f3ac0266000 r-xp 00000000 103:03 36701489                  /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.1
7f3ac0266000-7f3ac0466000 ---p 19d91000 103:03 36701489                  /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.1
7f3ac0466000-7f3ac0529000 rw-p 19d91000 103:03 36701489                  /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.1
7f3ac0529000-7f3ac05bb000 rw-p 00000000 00:00 0 
7f3ac05bb000-7f3ac13a3000 r-xp 00000000 103:03 36716961                  /usr/lib/x86_64-linux-gnu/libcuda.so.430.34
7f3ac13a3000-7f3ac15a3000 ---p 00de8000 103:03 36716961                  /usr/lib/x86_64-linux-gnu/libcuda.so.430.34
7f3ac15a3000-7f3ac171b000 rw-p 00de8000 103:03 36716961                  /usr/lib/x86_64-linux-gnu/libcuda.so.430.34
7f3ac171b000-7f3ac172b000 rw-p 00000000 00:00 0 
7f3ac172b000-7f3ac172e000 r--p 00000000 08:01 150867372                  /home/laochanlam/anaconda2/lib/libgcc_s.so.1
7f3ac172e000-7f3ac173b000 r-xp 00003000 08:01 150867372                  /home/laochanlam/anaconda2/lib/libgcc_s.so.1
7f3ac173b000-7f3ac173e000 r--p 00010000 08:01 150867372                  /home/laochanlam/anaconda2/lib/libgcc_s.so.1
7f3ac173e000-7f3ac173f000 r--p 00012000 08:01 150867372                  /home/laochanlam/anaconda2/lib/libgcc_s.so.1
7f3ac173f000-7f3ac1740000 rw-p 00013000 08:01 150867372                  /home/laochanlam/anaconda2/lib/libgcc_s.so.1
7f3ac1740000-7f3ac17d4000 r--p 00000000 08:01 150873887                  /home/laochanlam/anaconda2/lib/libstdc++.so.6.0.25
7f3ac17d4000-7f3ac1839000 r-xp 00094000 08:01 150873887                  /home/laochanlam/anaconda2/lib/libstdc++.so.6.0.25
7f3ac1839000-7f3ac1870000 r--p 000f9000 08:01 150873887                  /home/laochanlam/anaconda2/lib/libstdc++.so.6.0.25
7f3ac1870000-7f3ac187a000 r--p 0012f000 08:01 150873887                  /home/laochanlam/anaconda2/lib/libstdc++.so.6.0.25
7f3ac187a000-7f3ac187e000 rw-p 00139000 08:01 150873887                  /home/laochanlam/anaconda2/lib/libstdc++.so.6.0.25
7f3ac187e000-7f3ac1881000 rw-p 00000000 00:00 0 
7f3ac1881000-7f3ac1888000 r-xp 00000000 103:03 13632009                  /lib/x86_64-linux-gnu/librt-2.23.so
7f3ac1888000-7f3ac1a87000 ---p 00007000 103:03 13632009                  /lib/x86_64-linux-gnu/librt-2.23.so
7f3ac1a87000-7f3ac1a88000 r--p 00006000 103:03 13632009                  /lib/x86_64-linux-gnu/librt-2.23.so
7f3ac1a88000-7f3ac1a89000 rw-p 00007000 103:03 13632009                  /lib/x86_64-linux-gnu/librt-2.23.so
7f3ac1a89000-7f3ac1a8c000 r--p 00000000 08:01 150866998                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/array.so
7f3ac1a8c000-7f3ac1a92000 r-xp 00003000 08:01 150866998                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/array.so
7f3ac1a92000-7f3ac1a94000 r--p 00009000 08:01 150866998                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/array.so
7f3ac1a94000-7f3ac1a95000 r--p 0000a000 08:01 150866998                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/array.so
7f3ac1a95000-7f3ac1a97000 rw-p 0000b000 08:01 150866998                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/array.so
7f3ac1a97000-7f3ac1a9c000 r--p 00000000 08:01 150867008                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_socket.so
7f3ac1a9c000-7f3ac1aa6000 r-xp 00005000 08:01 150867008                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_socket.so
7f3ac1aa6000-7f3ac1aaa000 r--p 0000f000 08:01 150867008                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_socket.so
7f3ac1aaa000-7f3ac1aab000 r--p 00012000 08:01 150867008                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_socket.so
7f3ac1aab000-7f3ac1ab0000 rw-p 00013000 08:01 150867008                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_socket.so
7f3ac1ab0000-7f3ac1b19000 r-xp 00000000 103:03 36833747                  /usr/local/cuda-9.0/lib64/libcudart.so.9.0.176
7f3ac1b19000-7f3ac1d18000 ---p 00069000 103:03 36833747                  /usr/local/cuda-9.0/lib64/libcudart.so.9.0.176
7f3ac1d18000-7f3ac1d1c000 rw-p 00068000 103:03 36833747                  /usr/local/cuda-9.0/lib64/libcudart.so.9.0.176
7f3ac1d1c000-7f3ac1d1d000 rw-p 00000000 00:00 0 
7f3ac1d1d000-7f3ac66cc000 r-xp 00000000 103:03 36833876                  /usr/local/cuda-9.0/lib64/libcusolver.so.9.0.176
7f3ac66cc000-7f3ac68cc000 ---p 049af000 103:03 36833876                  /usr/local/cuda-9.0/lib64/libcusolver.so.9.0.176
7f3ac68cc000-7f3ac6906000 rw-p 049af000 103:03 36833876                  /usr/local/cuda-9.0/lib64/libcusolver.so.9.0.176
7f3ac6906000-7f3ac6918000 rw-p 00000000 00:00 0 
7f3ac6918000-7f3ac9b09000 r-xp 00000000 103:03 36833794                  /usr/local/cuda-9.0/lib64/libcublas.so.9.0.176
7f3ac9b09000-7f3ac9d08000 ---p 031f1000 103:03 36833794                  /usr/local/cuda-9.0/lib64/libcublas.so.9.0.176
7f3ac9d08000-7f3ac9d3f000 rw-p 031f0000 103:03 36833794                  /usr/local/cuda-9.0/lib64/libcublas.so.9.0.176
7f3ac9d3f000-7f3ac9d4e000 rw-p 00000000 00:00 0 
7f3ac9d4e000-7f3acaa6a000 r-xp 00000000 08:01 154533900                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/tensorflow/libtensorflow_framework.so
7f3acaa6a000-7f3acac69000 ---p 00d1c000 08:01 154533900                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/tensorflow/libtensorflow_framework.so
7f3acac69000-7f3acacb0000 r--p 00d1b000 08:01 154533900                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/tensorflow/libtensorflow_framework.so
7f3acacb0000-7f3acacb3000 rw-p 00d62000 08:01 154533900                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/tensorflow/libtensorflow_framework.so
7f3acacb3000-7f3acacbc000 rw-p 00000000 00:00 0 
7f3acacbc000-7f3af88d2000 r-xp 00000000 08:01 154533902                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
7f3af88d2000-7f3af8ad2000 ---p 2dc16000 08:01 154533902                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
7f3af8ad2000-7f3af8eab000 r--p 2dc16000 08:01 154533902                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
7f3af8eab000-7f3af8ece000 rw-p 2dfef000 08:01 154533902                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
7f3af8ece000-7f3af9006000 rw-p 00000000 00:00 0 
7f3af9008000-7f3af900b000 r--p 00000000 08:01 150866978                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/termios.so
7f3af900b000-7f3af900c000 r-xp 00003000 08:01 150866978                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/termios.so
7f3af900c000-7f3af900d000 r--p 00004000 08:01 150866978                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/termios.so
7f3af900d000-7f3af900e000 r--p 00004000 08:01 150866978                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/termios.so
7f3af900e000-7f3af9010000 rw-p 00005000 08:01 150866978                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/termios.so
7f3af9010000-7f3af9012000 r--p 00000000 08:01 150866987                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_csv.so
7f3af9012000-7f3af9015000 r-xp 00002000 08:01 150866987                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_csv.so
7f3af9015000-7f3af9016000 r--p 00005000 08:01 150866987                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_csv.so
7f3af9016000-7f3af9017000 r--p 00005000 08:01 150866987                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_csv.so
7f3af9017000-7f3af9019000 rw-p 00006000 08:01 150866987                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/_csv.so
7f3af9019000-7f3af9022000 r--p 00000000 08:01 150867380                  /home/laochanlam/anaconda2/lib/libgomp.so.1.0.0
7f3af9022000-7f3af9035000 r-xp 00009000 08:01 150867380                  /home/laochanlam/anaconda2/lib/libgomp.so.1.0.0
7f3af9035000-7f3af903d000 r--p 0001c000 08:01 150867380                  /home/laochanlam/anaconda2/lib/libgomp.so.1.0.0
7f3af903d000-7f3af903e000 r--p 00023000 08:01 150867380                  /home/laochanlam/anaconda2/lib/libgomp.so.1.0.0
7f3af903e000-7f3af903f000 rw-p 00024000 08:01 150867380                  /home/laochanlam/anaconda2/lib/libgomp.so.1.0.0
7f3af903f000-7f3af90fa000 r-xp 00000000 08:01 154276261                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/random/mtrand.so
7f3af90fa000-7f3af92fa000 ---p 000bb000 08:01 154276261                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/random/mtrand.so
7f3af92fa000-7f3af931f000 rw-p 000bb000 08:01 154276261                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/random/mtrand.so
7f3af931f000-7f3af9361000 rw-p 00000000 00:00 0 
7f3af9361000-7f3af936a000 r-xp 00000000 08:01 154276046                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/fft/fftpack_lite.so
7f3af936a000-7f3af9569000 ---p 00009000 08:01 154276046                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/fft/fftpack_lite.so
7f3af9569000-7f3af956a000 rw-p 00008000 08:01 154276046                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/fft/fftpack_lite.so
7f3af956a000-7f3af95ea000 rw-p 00000000 00:00 0 
7f3af95ea000-7f3af95eb000 r--p 00000000 08:01 150866964                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/future_builtins.so
7f3af95eb000-7f3af95ec000 r-xp 00001000 08:01 150866964                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/future_builtins.so
7f3af95ec000-7f3af95ed000 r--p 00002000 08:01 150866964                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/future_builtins.so
7f3af95ed000-7f3af95ee000 r--p 00002000 08:01 150866964                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/future_builtins.so
7f3af95ee000-7f3af95ef000 rw-p 00003000 08:01 150866964                  /home/laochanlam/anaconda2/lib/python2.7/lib-dynload/future_builtins.so
7f3af95ef000-7f3af962f000 rw-p 00000000 00:00 0 
7f3af962f000-7f3af9658000 r-xp 00000000 08:01 154275981                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/linalg/_umath_linalg.so
7f3af9658000-7f3af9857000 ---p 00029000 08:01 154275981                  /home/laochanlam/anaconda2/lib/python2.7/site-packages/numpy/linalg/_umath_linalg.so
example/tensorflow/run_tensorflow_byteps.sh: line 14: 14749 Aborted                 (core dumped) python $path/synthetic_benchmark.py $@
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/laochanlam/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/laochanlam/anaconda2/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "launcher/launch.py", line 19, in worker
    subprocess.check_call(command, env=my_env, stdout=sys.stdout, stderr=sys.stderr, shell=True)
  File "/home/laochanlam/anaconda2/lib/python2.7/subprocess.py", line 190, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command 'example/tensorflow/run_tensorflow_byteps.sh' returned non-zero exit status 134
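The exit status 134 in the `CalledProcessError` follows the POSIX shell convention: a child killed by a signal is reported as 128 plus the signal number. That matches the `Aborted (core dumped)` line above, since 134 - 128 = 6, which is SIGABRT. The helper name below is hypothetical; it is a minimal sketch of that decoding using only the standard `signal` module.

```python
import signal

def describe_exit_status(status):
    """Decode a shell exit status; values above 128 usually mean 128 + signal number."""
    if status > 128:
        signum = status - 128
        try:
            # signal.Signals maps a signal number to its symbolic name (Python 3.5+).
            return "killed by " + signal.Signals(signum).name
        except ValueError:
            return "exit status %d (unknown signal %d)" % (status, signum)
    return "exited with status %d" % status

print(describe_exit_status(134))
```

On Linux, 134 decodes to signal 6 (SIGABRT), i.e. the Python process running `synthetic_benchmark.py` aborted, and the launcher thread merely propagated that status.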


To Reproduce
Steps to reproduce the behavior:

  1. python setup.py install
  2. BYTEPS_LOG_LEVEL=TRACE DMLC_ROLE=worker DMLC_PS_ROOT_URI=10.0.0.1 DMLC_PS_ROOT_PORT=9000 DMLC_WORKER_ID=0 DMLC_NUM_WORKER=2 DMLC_NUM_SERVER=1 python launcher/launch.py example/mxnet/start_mxnet_byteps.sh
  3. See error

Expected behavior
Works.

Screenshots
N/A

Environment (please complete the following information):

  • OS: Ubuntu 16.04
  • GCC version: 4.9 (following Dockerfile.worker.pytorch.*)
  • CUDA and NCCL version: CUDA 9.0 with NCCL 2.4.7, and CUDA 10.0 with NCCL 2.3.7 (I deployed on two machines; both show the same error)
  • Framework (TF, PyTorch, MXNet): TF, PyTorch, MXNet

Additional context
Training with 2 GPUs on a single machine works, so I think the problem is not related to CUDA or NCCL.

Core dump when running the TensorFlow benchmark

I have successfully installed BytePS using "python setup.py install".
When I run the benchmark, BytePS dumps core.

Core backtrace:
[screenshot]

Environment:
1. TF version: 1.14
2. CUDA version: 9.0
3. NCCL version: 2.4.7 for CUDA 9.0
4. OS: Ubuntu 16.04
5. g++: 5.4.0

Script (copied from step_by_step_tutorial.md):
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64/:$LD_LIBRARY_PATH
export NVIDIA_VISIBLE_DEVICES=0,1,2,3
export DMLC_WORKER_ID=0
export DMLC_NUM_WORKER=1
export DMLC_ROLE=worker
export DMLC_NUM_SERVER=1
export DMLC_PS_ROOT_URI=x.x.x.x
export DMLC_PS_ROOT_PORT=9999
export EVAL_TYPE=benchmark

python /home/mark/mark/code/byteps/launcher/launch.py \
/home/mark/mark/code/byteps/example/tensorflow/run_tensorflow_byteps.sh \
--model ResNet50 --num-iters 1000

Failure in distributed training with MXNet

Describe the bug
Hi,
I am following this note and want to run distributed training using BytePS in MXNet. Both MXNet (bytedance/incubator-mxnet) and BytePS (cb88d29) are built from source with RDMA enabled (USE_RDMA=1). The basic MXNet test (tests/test_mxnet.py) passes with a single worker in this environment.

However, when I run the MXNet benchmark, it keeps crashing with the following error:

mxnet.base.MXNetError: [00:05:18] src/storage/./pinned_memory_storage.h:61: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: all CUDA-capable devices are busy or unavailable

I have one scheduler, one server, and one worker (with 4 GPUs) in my experiment. The scheduler and the server share the same node, and the worker runs on another node. Each of them runs in its own virtualenv, without using the Dockerfiles in this repo.

More details follow.

To Reproduce

  • scheduler
export DMLC_ENABLE_RDMA=1
export DMLC_INTERFACE='ib0'
export DMLC_PS_ROOT_URI='192.168.0.103'
export DMLC_PS_ROOT_PORT=40001
export BYTEPS_FORCE_DISTRIBUTED=1
export DMLC_NUM_SERVER=1
export DMLC_NUM_WORKER=1
export DMLC_ROLE='scheduler'
python /{byteps_root}/launcher/launch.py
  • server
export DMLC_ENABLE_RDMA=1
export DMLC_INTERFACE='ib0'
export DMLC_PS_ROOT_URI='192.168.0.103'
export DMLC_PS_ROOT_PORT=40001
export BYTEPS_FORCE_DISTRIBUTED=1
export DMLC_NUM_SERVER=1
export DMLC_NUM_WORKER=1
export DMLC_ROLE='server'
python /{byteps_root}/launcher/launch.py
  • worker-0
export DMLC_ENABLE_RDMA=1
export DMLC_INTERFACE='ib0'
export DMLC_PS_ROOT_URI='192.168.0.103'
export DMLC_PS_ROOT_PORT=40001
export BYTEPS_FORCE_DISTRIBUTED=1
export DMLC_NUM_SERVER=1
export DMLC_NUM_WORKER=1
export DMLC_ROLE='worker'
export EVAL_TYPE=benchmark 
export DMLC_WORKER_ID=0
export NVIDIA_VISIBLE_DEVICES=0,1,2,3

python /{byteps_root}/launcher/launch.py /{byteps_root}/example/mxnet/start_mxnet_byteps.sh --benchmark 1 --batch-size=32
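In the three launch scripts above, the scheduler, server, and worker export the same rendezvous settings and differ only in `DMLC_ROLE` plus the worker-only variables. A hypothetical consistency check along those lines is sketched below; the helper name and the choice of which keys must match are assumptions for illustration, not part of BytePS.

```python
# Rendezvous settings that every role in the scripts above exports with identical values.
SHARED_KEYS = ("DMLC_PS_ROOT_URI", "DMLC_PS_ROOT_PORT", "DMLC_NUM_SERVER", "DMLC_NUM_WORKER")

def find_mismatches(role_envs):
    """Return {key: {role: value}} for shared keys whose values differ across roles (hypothetical helper)."""
    mismatches = {}
    for key in SHARED_KEYS:
        values = {role: env.get(key) for role, env in role_envs.items()}
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches

# Values copied from the scheduler/server/worker-0 scripts above.
common = {"DMLC_PS_ROOT_URI": "192.168.0.103", "DMLC_PS_ROOT_PORT": "40001",
          "DMLC_NUM_SERVER": "1", "DMLC_NUM_WORKER": "1"}
role_envs = {
    "scheduler": dict(common, DMLC_ROLE="scheduler"),
    "server": dict(common, DMLC_ROLE="server"),
    "worker-0": dict(common, DMLC_ROLE="worker", DMLC_WORKER_ID="0"),
}
print(find_mismatches(role_envs))  # {}
```

For this report the three configurations agree, which fits the observation that the RDMA connections and ADD_NODE succeed; the failure happens later, inside the worker.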

After launching them, I can see them building RDMA connections and executing ADD_NODE. However, the worker also reports the following errors:

...
[01:19:44] src/./rdma_van.h:572: Connecting to W[9]
[01:19:44] src/van.cc:306: W[9] is connected to others
INFO:root:start with arguments Namespace(batch_size=32, benchmark=1, cpu_train=False, data_nthreads=4, data_train=None, data_train_idx='', data_val=None, data_val_idx='', disp_batches=20, dtype='float32', gc_threshold=0.5, gc_type='none', image_shape='3,224,224', initializer='default', kv_store='device', load_epoc$
=None, loss='', lr=0.1, lr_factor=0.1, lr_step_epochs='30,60', macrobatch_size=0, max_random_aspect_ratio=0.25, max_random_h=36, max_random_l=50, max_random_rotate_angle=10, max_random_s=50, max_random_scale=1, max_random_shear_ratio=0.1, min_random_scale=1, model_prefix=None, mom=0.9, monitor=0, network='resnet',
num_classes=1000, num_epochs=80, num_examples=1281167, num_layers=50, optimizer='sgd', pad_size=0, random_crop=1, random_mirror=1, rgb_mean='123.68,116.779,103.939', test_io=0, top_k=0, warmup_epochs=5, warmup_strategy='linear', wd=0.0001)
INFO:root:start with arguments Namespace(batch_size=32, benchmark=1, cpu_train=False, data_nthreads=4, data_train=None, data_train_idx='', data_val=None, data_val_idx='', disp_batches=20, dtype='float32', gc_threshold=0.5, gc_type='none', image_shape='3,224,224', initializer='default', kv_store='device', load_epoc$
=None, loss='', lr=0.1, lr_factor=0.1, lr_step_epochs='30,60', macrobatch_size=0, max_random_aspect_ratio=0.25, max_random_h=36, max_random_l=50, max_random_rotate_angle=10, max_random_s=50, max_random_scale=1, max_random_shear_ratio=0.1, min_random_scale=1, model_prefix=None, mom=0.9, monitor=0, network='resnet',
num_classes=1000, num_epochs=80, num_examples=1281167, num_layers=50, optimizer='sgd', pad_size=0, random_crop=1, random_mirror=1, rgb_mean='123.68,116.779,103.939', test_io=0, top_k=0, warmup_epochs=5, warmup_strategy='linear', wd=0.0001)
INFO:root:start with arguments Namespace(batch_size=32, benchmark=1, cpu_train=False, data_nthreads=4, data_train=None, data_train_idx='', data_val=None, data_val_idx='', disp_batches=20, dtype='float32', gc_threshold=0.5, gc_type='none', image_shape='3,224,224', initializer='default', kv_store='device', load_epoc$
=None, loss='', lr=0.1, lr_factor=0.1, lr_step_epochs='30,60', macrobatch_size=0, max_random_aspect_ratio=0.25, max_random_h=36, max_random_l=50, max_random_rotate_angle=10, max_random_s=50, max_random_scale=1, max_random_shear_ratio=0.1, min_random_scale=1, model_prefix=None, mom=0.9, monitor=0, network='resnet',
num_classes=1000, num_epochs=80, num_examples=1281167, num_layers=50, optimizer='sgd', pad_size=0, random_crop=1, random_mirror=1, rgb_mean='123.68,116.779,103.939', test_io=0, top_k=0, warmup_epochs=5, warmup_strategy='linear', wd=0.0001)
INFO:root:start with arguments Namespace(batch_size=32, benchmark=1, cpu_train=False, data_nthreads=4, data_train=None, data_train_idx='', data_val=None, data_val_idx='', disp_batches=20, dtype='float32', gc_threshold=0.5, gc_type='none', image_shape='3,224,224', initializer='default', kv_store='device', load_epoc$
=None, loss='', lr=0.1, lr_factor=0.1, lr_step_epochs='30,60', macrobatch_size=0, max_random_aspect_ratio=0.25, max_random_h=36, max_random_l=50, max_random_rotate_angle=10, max_random_s=50, max_random_scale=1, max_random_shear_ratio=0.1, min_random_scale=1, model_prefix=None, mom=0.9, monitor=0, network='resnet',
num_classes=1000, num_epochs=80, num_examples=1281167, num_layers=50, optimizer='sgd', pad_size=0, random_crop=1, random_mirror=1, rgb_mean='123.68,116.779,103.939', test_io=0, top_k=0, warmup_epochs=5, warmup_strategy='linear', wd=0.0001)
Traceback (most recent call last):
  File "/{byteps-root}/example/mxnet/train_imagenet_byteps.py", line 66, in <module>
    fit.fit(args, sym, data.get_rec_iter)
  File "/{byteps-root}/example/mxnet/common/fit_byteps.py", line 161, in fit
    (train, val) = data_loader(args, (bps.rank(), bps.size()))
  File "/{byteps-root}/example/mxnet/common/data_byteps.py", line 116, in get_rec_iter
    args.num_examples / args.batch_size, np.float32)
  File "/{byteps-root}/example/mxnet/common/data_byteps.py", line 85, in __init__
    self.data = mx.nd.array(data, dtype=self.dtype, ctx=mx.Context('cpu_pinned', 0))
  File "/{virtualenv-path}/lib/python3.6/site-packages/mxnet-1.5.0-py3.6.egg/mxnet/ndarray/utils.py", line 146, in array
    return _array(source_array, ctx=ctx, dtype=dtype)
  File "/{virtualenv-path}/lib/python3.6/site-packages/mxnet-1.5.0-py3.6.egg/mxnet/ndarray/ndarray.py", line 2503, in array
    arr = empty(source_array.shape, ctx, dtype)
  File "/{virtualenv-path}/lib/python3.6/site-packages/mxnet-1.5.0-py3.6.egg/mxnet/ndarray/ndarray.py", line 3892, in empty
    return NDArray(handle=_new_alloc_handle(shape, ctx, False, dtype))
  File "/{virtualenv-path}/lib/python3.6/site-packages/mxnet-1.5.0-py3.6.egg/mxnet/ndarray/ndarray.py", line 140, in _new_alloc_handle
Traceback (most recent call last):
  File "/{byteps-root}/example/mxnet/train_imagenet_byteps.py", line 66, in <module>
    ctypes.byref(hdl)))
  File "/{virtualenv-path}/lib/python3.6/site-packages/mxnet-1.5.0-py3.6.egg/mxnet/base.py", line 252, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [01:19:45] src/storage/./pinned_memory_storage.h:61: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: all CUDA-capable devices are busy or unavailable
...

However, if the worker has only one GPU, like this:

export NVIDIA_VISIBLE_DEVICES=0

then the example runs without any problems.

It seems like one worker subprocess (started by launch.py) is trying to use a busy GPU when allocating an NDArray. Did I make a mistake or miss any configuration in this experiment?
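The earlier traceback (`launcher/launch.py`, line 19, in `worker`) shows the launcher running the training command with `subprocess.check_call(..., shell=True)` from a thread, and the log above prints `start with arguments` four times. The simplified sketch below assumes the launcher starts one local worker process per entry in `NVIDIA_VISIBLE_DEVICES`; the `BYTEPS_LOCAL_RANK` / `BYTEPS_LOCAL_SIZE` variable names are an assumption about how each process learns its slot. It only builds the per-process environments instead of spawning anything, to show why `0,1,2,3` yields four processes while `0` yields one.

```python
def local_worker_envs(base_env, command):
    """Sketch: one (command, env) pair per visible GPU, mirroring how the launcher appears to fan out.

    BYTEPS_LOCAL_RANK / BYTEPS_LOCAL_SIZE are assumed variable names.
    """
    devices = base_env.get("NVIDIA_VISIBLE_DEVICES")
    local_size = len(devices.split(",")) if devices else 1
    envs = []
    for local_rank in range(local_size):
        env = dict(base_env)  # each process gets its own copy of the environment
        env["BYTEPS_LOCAL_RANK"] = str(local_rank)
        env["BYTEPS_LOCAL_SIZE"] = str(local_size)
        envs.append((command, env))
    return envs

# A real launcher would pass each env to subprocess.check_call(command, env=env, shell=True)
# from its own thread; here we only count the processes that would be started.
four_gpus = local_worker_envs({"NVIDIA_VISIBLE_DEVICES": "0,1,2,3"}, "start_mxnet_byteps.sh")
one_gpu = local_worker_envs({"NVIDIA_VISIBLE_DEVICES": "0"}, "start_mxnet_byteps.sh")
print(len(four_gpus), len(one_gpu))  # 4 1
```

One detail in the traceback may be relevant: `data_byteps.py` line 85 allocates with `mx.Context('cpu_pinned', 0)`, so every local worker process pins memory through device 0 rather than its own GPU. Whether that is what triggers "all CUDA-capable devices are busy or unavailable" here (for example, under an exclusive GPU compute mode) is not established by the report.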

Thank you

Expected behavior
None
Screenshots
None

Environment (please complete the following information):

  • OS: Red Hat 7.5 (on PowerPC ppc64le)
  • GCC version: 5.4.0
  • Python version: 3.6.2 (Anaconda)
  • CUDA and NCCL version: 9.2
  • Framework : MXNet

Additional context
None

RDMA compile error

Hi,
I tried to enable RDMA for BytePS, but the build log reports byteps/3rdparty/ps-lite/src/./rdma_van.h:663: undefined reference to 'ibv_reg_mr', with similar errors for other ibv_* functions. Compared with the install process without RDMA, I only added BYTEPS_USE_RDMA=1 before installation. It seems that I need to specify the location of my libibverbs.a. If so, would you mind adding support for customizing the location of libibverbs?
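An "undefined reference" at link time usually means the library providing the symbol was not on the link line (or, with static archives, was listed before the objects that need it). As a first diagnostic, the sketch below reports whether a shared libibverbs is visible to the standard system library search. It relies only on Python's `ctypes.util.find_library`; note that this looks for the shared library, not the static `libibverbs.a` mentioned above, and the helper name is hypothetical.

```python
from ctypes.util import find_library

def locate_ibverbs():
    """Return the shared libibverbs name found by the system search (e.g. 'libibverbs.so.1'), or None."""
    return find_library("ibverbs")

name = locate_ibverbs()
if name is None:
    print("libibverbs not found; is the libibverbs development package installed?")
else:
    print("found " + name)
```

If the library is found here but the build still fails, the problem is more likely in how the build passes linker flags than in where the library is installed.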
