When calling MobulaOP in a subprocess, it gets stuck. Environment: l

Not working with multiple processes about mobulaop HOT 13 CLOSED

wkcn commented on September 23, 2024 1

Not working with multiple processes

Comments (13)

YutingZhang commented on September 23, 2024 1

Thanks!

FYI, if you move import mxnet as mx into foo(), the bug can disappear. But this is generally not feasible, because mxnet is usually imported in the main process. It may be related to how mxnet works with subprocesses.
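
For readers who want to see the shape of that workaround, here is a minimal sketch of deferring the import into the worker function. It assumes nothing about MXNet itself: math is only a stand-in for the heavy library so the sketch runs anywhere, and the name heavy_lib is hypothetical.

```python
from concurrent import futures


def foo():
    # Deferred import: the library is loaded inside the worker process,
    # not in the parent. `math` is a stand-in for `import mxnet as mx`.
    import math as heavy_lib
    return heavy_lib.sqrt(16.0)


def main():
    # The parent only creates the pool and never imports the library.
    with futures.ProcessPoolExecutor(1) as ex:
        return ex.submit(foo).result()
```

Call main() under an if __name__ == "__main__": guard, as in the examples in this thread. Whether this avoids the hang depends on the MXNet version in use.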

YutingZhang commented on September 23, 2024 1

I tried that, but it does not work. Example code:

from concurrent import futures

import sys
import mxnet as mx

import mobula
# Import Custom Operator Dynamically
mobula.op.load('./AdditionOP')

def foo():

    AdditionOP = mobula.op.AdditionOP

    a = mx.nd.array([1, 2, 3])
    b = mx.nd.array([4, 5, 6])

    a.attach_grad()
    b.attach_grad()

    with mx.autograd.record():
        c = AdditionOP(a, b)

    dc = mx.nd.array([7, 8, 9])
    c.backward(dc)

    assert ((a + b).asnumpy() == c.asnumpy()).all()
    assert (a.grad.asnumpy() == dc.asnumpy()).all()
    assert (b.grad.asnumpy() == dc.asnumpy()).all()

    print('Okay :-)')
    print('a + b = c \n {} + {} = {}'.format(a.asnumpy(), b.asnumpy(), c.asnumpy()))

def main():
    ex = futures.ProcessPoolExecutor(1)
    r = ex.submit(foo)
    r.result()

if __name__ == "__main__":
    main()
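
When reproducing a hang like this, it can help to put a timeout on the future so the parent reports the problem instead of blocking forever. The helper below is a sketch, and the name result_or_timeout is hypothetical; it relies only on the standard Future.result(timeout=...) behaviour.

```python
from concurrent import futures


def result_or_timeout(executor, fn, timeout_s):
    """Submit fn and wait at most timeout_s seconds for its result.

    Returns (True, value) on success and (False, None) if the wait
    timed out, which is how a stuck worker would show up.
    """
    future = executor.submit(fn)
    try:
        return True, future.result(timeout=timeout_s)
    except futures.TimeoutError:
        return False, None
```

With the example above, this would be called as result_or_timeout(ex, foo, 30.0) in place of r.result(). A timeout only stops the waiting; it does not stop the worker, so a genuinely hung subprocess still has to be terminated separately.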

wkcn commented on September 23, 2024

Thanks for your report!
I will check it.

wkcn commented on September 23, 2024

Moving import mobula and mobula.op.load('./AdditionOP') outside foo() may work, since MobulaOP registers the operator into MXNet when mobula.op.load('./AdditionOP') is called.
I will add a check to avoid duplicate registration.

wkcn commented on September 23, 2024

@YutingZhang
Hi! I found the bug is not related to MobulaOP.
It seems that MXNet triggers the bug.

from concurrent import futures

import sys
sys.path.append('../../')  # Add MobulaOP path before importing mobula

import mxnet as mx
from mobula.testing import assert_almost_equal

class AdditionOP(mx.operator.CustomOp):
    def __init__(self):
        super(AdditionOP, self).__init__()
    def forward(self, is_train, req, in_data, out_data, aux):
        out_data[0][:] = in_data[0] + in_data[1]
    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        in_grad[0][:] = out_grad[0]
        in_grad[1][:] = out_grad[0]

@mx.operator.register("AdditionOP")
class AdditionOPProp(mx.operator.CustomOpProp):
    def __init__(self):
        super(AdditionOPProp, self).__init__()
    def list_arguments(self):
        return ['a', 'b']
    def list_outputs(self):
        return ['output']
    def infer_shape(self, in_shape):
        return in_shape, [in_shape[0]]
    def create_operator(self, ctx, shapes, dtypes):
        return AdditionOP()

def foo():
    a = mx.nd.array([1, 2, 3])
    b = mx.nd.array([4, 5, 6])

    a.attach_grad()
    b.attach_grad()

    print("REC")
    with mx.autograd.record():
        c = mx.nd.Custom(a, b, op_type='AdditionOP')

    dc = mx.nd.array([7, 8, 9])
    c.backward(dc)

    assert_almost_equal(a + b, c)
    assert_almost_equal(a.grad, dc)
    assert_almost_equal(b.grad, dc)

    print('Okay :-)')
    print('a + b = c \n {} + {} = {}'.format(a.asnumpy(), b.asnumpy(), c.asnumpy()))

def main():
    ex = futures.ProcessPoolExecutor(1)
    r = ex.submit(foo)
    r.result()

if __name__ == '__main__':
    main()

YutingZhang commented on September 23, 2024

So mx.nd.Custom is the actual problem ... MXNet just has lots of bugs when running in a subprocess ...

wkcn commented on September 23, 2024

Yes.

YutingZhang commented on September 23, 2024

@wkcn I sent an email to your live.cn address :)

wkcn commented on September 23, 2024

Mail received. Thank you! : )

wkcn commented on September 23, 2024

Hi @YutingZhang, the two test cases you gave now pass with the latest MXNet and MobulaOP : )

YutingZhang commented on September 23, 2024

@wkcn Thanks a lot! Did you work around the problem in MobulaOP? Or is it due to MxNet's update on CustomOP (you also contributed to this)?

wkcn commented on September 23, 2024

@YutingZhang It is due to MXNet’s update, and other contributors fixed it.

wkcn commented on September 23, 2024

Closing this since the problem has been addressed. : )
