
mobulaop's Introduction

MobulaOP


What is it?

MobulaOP is a simple and flexible cross-framework operator toolkit.

You can write custom operators in Python/C++/C/CUDA/HIP/TVM without rebuilding the deep learning framework from source.

How to use it?

[Tutorial (Chinese)]

[Tutorial]

  • Add an addition operator [Code]
import mobula

@mobula.op.register
class MyFirstOP:
    def forward(self, x, y):
        return x + y
    def backward(self, dy):
        # The gradient of x + y with respect to each input is dy.
        return [dy, dy]
    def infer_shape(self, in_shape):
        # Both inputs must share a shape; the single output reuses it.
        assert in_shape[0] == in_shape[1]
        return in_shape, [in_shape[0]]

# MXNet
import mxnet as mx
a = mx.nd.array([1, 2, 3])
b = mx.nd.array([4, 5, 6])
c = MyFirstOP(a, b)
print(c)  # [5, 7, 9]

# PyTorch
import torch
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])
c = MyFirstOP(a, b)
print(c)  # [5, 7, 9]

# NumPy
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
op = MyFirstOP[np.ndarray]()
c = op(a, b)
print(c)  # [5, 7, 9]

# CuPy
import cupy as cp
a = cp.array([1, 2, 3])
b = cp.array([4, 5, 6])
op = MyFirstOP[cp.ndarray]()
c = op(a, b)
print(c) # [5, 7, 9]
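
The backward method above defines the operator's gradient. A minimal sketch of checking it with MXNet autograd (this assumes the MyFirstOP registration above; the pattern mirrors the AdditionOP example later on this page):

# Gradient check for MyFirstOP (sketch)
import mxnet as mx
a = mx.nd.array([1, 2, 3])
b = mx.nd.array([4, 5, 6])
a.attach_grad()
b.attach_grad()
with mx.autograd.record():
    c = MyFirstOP(a, b)
c.backward(mx.nd.array([7, 8, 9]))
print(a.grad)  # [7, 8, 9], since backward returns [dy, dy]
print(b.grad)  # [7, 8, 9]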
  • Use custom operators without rebuilding the deep learning framework from source [Code]
# Use ROIAlign operator
import mxnet as mx
import numpy as np
import mobula

# Load ROIAlign Module
mobula.op.load('ROIAlign')

ctx = mx.cpu(0)
dtype = np.float32
N, C, H, W = 2, 3, 4, 4

data = mx.nd.array(np.arange(N*C*H*W).astype(dtype).reshape((N, C, H, W)))
rois = mx.nd.array(np.array([[0, 1, 1, 3, 3]], dtype=dtype))

data.attach_grad()
with mx.autograd.record():
    # mx.nd.NDArray and mx.sym.Symbol are both available as the inputs.
    output = mobula.op.ROIAlign(data=data, rois=rois, pooled_size=(2, 2), spatial_scale=1.0, sampling_ratio=1)
output.backward()  # populate data.grad; without this the printed gradient stays zero

print(output.asnumpy(), data.grad.asnumpy())
  • Import Custom C++ Operator Dynamically [Code]
import mobula
# Import Custom Operator Dynamically
mobula.op.load('./AdditionOP')

import mxnet as mx
a = mx.nd.array([1, 2, 3])
b = mx.nd.array([4, 5, 6])
c = mobula.op.AdditionOP(a, b)

print('a + b = c \n {} + {} = {}'.format(a.asnumpy(), b.asnumpy(), c.asnumpy()))

How to get it?

# Clone the project
git clone https://github.com/wkcn/MobulaOP

# Enter the directory
cd MobulaOP

# Install MobulaOP
pip install -v -e .
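
# Sanity-check the installation (a quick smoke test; the tutorial
# examples above exercise the full compile path)
python -c "import mobula"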

mobulaop's People

Contributors

chop2, damnull, kohillyang, merrymercy, mgno32, wkcn


mobulaop's Issues

[Question] Using types other than float32?

Is there a way to specify the data type of the outputs (other than always using float32)?

And, in general, does MobulaOP support mixed types when implementing a kernel?

Thanks!

undefined symbol: MXShallowCopyNDArray

When I test the tutorials, I get the warning:

/root/test/MobulaOP/mobula/glue/mx.py:44: UserWarning: Using asynchronous execution for MXNet failed, since /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so: undefined symbol: MXShallowCopyNDArray
It will drop the performance.

But I have tried different versions of MXNet (1.5.0, 1.5.1, 1.6.0b20190729) and get the same warning.

compile error

Running python test_mul_func.py, an error occurs:

[10:21:12] src/engine/engine.cc:55: MXNet start using engine: NaiveEngine
/wls/tf_workspace/MobulaOP/mobula/glue/mx.py:44: UserWarning: Using asynchronous execution for MXNet failed, since /home/weishuyi/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so: undefined symbol: MXShallowCopyNDArray
It will drop the performance.
Recommend using the latest version of MXNet
mkdir -p /wls/tf_workspace/MobulaOP/mobula/build/cpu/src
g++ /wls/tf_workspace/MobulaOP/mobula/src/defines.cpp -std=c++11 -DUSING_CUDA=0 -DUSING_HIP=0 -DUSING_OPENMP=0 -DHOST_NUM_THREADS=40 -O3 -DUSING_CBLAS=0 -I/wls/tf_workspace/MobulaOP/mobula/./ -I/wls/tf_workspace/MobulaOP/mobula/./inc -I/wls/tf_workspace/MobulaOP/mobula/../3rdparty/dlpack/include -I/wls/tf_workspace/MobulaOP/mobula/../3rdparty/tvm_packed_func -fPIC -Werror -Wall -Wextra -pedantic -Wcast-align -Wcast-qual -Wctor-dtor-privacy -Wdisabled-optimization -Wformat=2 -Winit-self -Wmissing-include-dirs -Wold-style-cast -Woverloaded-virtual -Wredundant-decls -Wshadow -Wsign-promo -Wundef -fdiagnostics-show-option -c -o /wls/tf_workspace/MobulaOP/mobula/build/cpu/src/defines.o
g++ /wls/tf_workspace/MobulaOP/mobula/src/context.cpp -std=c++11 -DUSING_CUDA=0 -DUSING_HIP=0 -DUSING_OPENMP=0 -DHOST_NUM_THREADS=40 -O3 -DUSING_CBLAS=0 -I/wls/tf_workspace/MobulaOP/mobula/./ -I/wls/tf_workspace/MobulaOP/mobula/./inc -I/wls/tf_workspace/MobulaOP/mobula/../3rdparty/dlpack/include -I/wls/tf_workspace/MobulaOP/mobula/../3rdparty/tvm_packed_func -fPIC -Werror -Wall -Wextra -pedantic -Wcast-align -Wcast-qual -Wctor-dtor-privacy -Wdisabled-optimization -Wformat=2 -Winit-self -Wmissing-include-dirs -Wold-style-cast -Woverloaded-virtual -Wredundant-decls -Wshadow -Wsign-promo -Wundef -fdiagnostics-show-option -c -o /wls/tf_workspace/MobulaOP/mobula/build/cpu/src/context.o
mkdir -p MulElemWise/build/MulElemWise/build/cpu
g++ MulElemWise/build/cpu/MulElemWise_wrapper.cpp -std=c++11 -DUSING_CUDA=0 -DUSING_HIP=0 -DUSING_OPENMP=0 -DHOST_NUM_THREADS=40 -O3 -DUSING_CBLAS=0 -I/wls/tf_workspace/MobulaOP/mobula/./ -I/wls/tf_workspace/MobulaOP/mobula/./inc -I/wls/tf_workspace/MobulaOP/mobula/../3rdparty/dlpack/include -I/wls/tf_workspace/MobulaOP/mobula/../3rdparty/tvm_packed_func -fPIC -Werror -Wall -Wextra -pedantic -Wcast-align -Wcast-qual -Wctor-dtor-privacy -Wdisabled-optimization -Wformat=2 -Winit-self -Wmissing-include-dirs -Wold-style-cast -Woverloaded-virtual -Wredundant-decls -Wshadow -Wsign-promo -Wundef -fdiagnostics-show-option -c -o MulElemWise/build/MulElemWise/build/cpu/MulElemWise_wrapper.o
In file included from /wls/tf_workspace/MobulaOP/mobula/./inc/mobula_op.h:5:0,
from MulElemWise/build/cpu/MulElemWise_wrapper.cpp:8:
/wls/tf_workspace/MobulaOP/mobula/./inc/glue_mx.h: In function ‘void RegisterMXAPI(void*, void*, void*, void*, void*)’:
/wls/tf_workspace/MobulaOP/mobula/./inc/glue_mx.h:45:76: error: ISO C++ forbids casting between pointer-to-function and pointer-to-object [-Werror=pedantic]
reinterpret_cast<decltype(MXShallowCopyNDArray)>(shallow_copy_ndarray);
^
/wls/tf_workspace/MobulaOP/mobula/./inc/glue_mx.h:46:73: error: ISO C++ forbids casting between pointer-to-function and pointer-to-object [-Werror=pedantic]
MXNDArrayFree = reinterpret_cast<decltype(MXNDArrayFree)>(ndarray_free);
^
/wls/tf_workspace/MobulaOP/mobula/./inc/glue_mx.h:48:74: error: ISO C++ forbids casting between pointer-to-function and pointer-to-object [-Werror=pedantic]
reinterpret_cast<decltype(MXNDArrayGetContext)>(ndarray_get_context);
^
/wls/tf_workspace/MobulaOP/mobula/./inc/glue_mx.h:50:70: error: ISO C++ forbids casting between pointer-to-function and pointer-to-object [-Werror=pedantic]
reinterpret_cast<decltype(MXNDArrayToDLPack)>(ndarray_to_dlpack);
^
/wls/tf_workspace/MobulaOP/mobula/./inc/glue_mx.h:52:73: error: ISO C++ forbids casting between pointer-to-function and pointer-to-object [-Werror=pedantic]
reinterpret_cast<decltype(MXEnginePushSyncND)>(engine_push_sync_nd);

Custom Operators Zoo

Hi there, this issue is to summarize some custom operators to be supported.
Please feel free to add it if you want any operator : )

Low performance in GPU mode

I wrote my first demo of a MobulaOP operator. The directory layout of my project:

mobula_test
  │  main.py
  └──TestOP
      └───TestOP.cpp

The content of files:
main.py:

import mobula
import mxnet as mx
from mxnet import nd
from tqdm import tqdm


if __name__ == '__main__':
    mobula.op.load('TestOP')
    ctx = mx.cpu()
    a = nd.ones((5000, 5000), ctx=ctx)
    b = nd.ones((5000, 5000), ctx=ctx)
    out = nd.empty(a.shape, ctx=ctx)

    print("cpu")
    for i in tqdm(range(1000)):
        mobula.func.TestOP(a.size, a, b, out)

    ctx = mx.gpu()
    a = nd.ones((5000, 5000), ctx=ctx)
    b = nd.ones((5000, 5000), ctx=ctx)
    out = nd.empty(a.shape, ctx=ctx)

    print("gpu")
    for i in tqdm(range(1000)):
        mobula.func.TestOP(a.size, a, b, out)

TestOP.cpp:

template<typename DType>
MOBULA_KERNEL TestOP_kernel(const int n, const DType* a, const DType* b, DType* out)
{
    // parfor maps the lambda over n indices; the same kernel source
    // is compiled for both the CPU and the CUDA build.
    parfor(n, [&](int i)
    {
        out[i] = a[i] + b[i];
    });
}

Time cost: CPU 14 s, GPU 226 s on an i7-7700K and a 1080 Ti. Both CPU and GPU usage are at 100%.
OS environment: Windows 10 1809, CUDA 10.0
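
For reference, a fairer timing sketch: MXNet's engine is asynchronous, so calling mx.nd.waitall() after the loop ensures the measured time covers the kernels themselves rather than just the launches (TestOP and the shapes are taken from the code above):

import time
import mxnet as mx
from mxnet import nd
import mobula

mobula.op.load('TestOP')

def bench(ctx, iterations=1000):
    a = nd.ones((5000, 5000), ctx=ctx)
    b = nd.ones((5000, 5000), ctx=ctx)
    out = nd.empty(a.shape, ctx=ctx)
    nd.waitall()  # finish allocation before timing
    start = time.time()
    for _ in range(iterations):
        mobula.func.TestOP(a.size, a, b, out)
    nd.waitall()  # wait for all pending kernels to complete
    return time.time() - start

print('cpu: %.2f s' % bench(mx.cpu()))
print('gpu: %.2f s' % bench(mx.gpu()))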

Not working with multiple processes

When calling MobulaOP in a subprocess, it gets stuck.

Environment: latest MXNet nightly build and Python 3.6.5

Example code, modified from dynamic_import_op.py, to reproduce the error:

from concurrent import futures

import sys
import mxnet as mx

def foo():
    import mobula
    # Import Custom Operator Dynamically
    mobula.op.load('./AdditionOP')
    AdditionOP = mobula.op.AdditionOP

    a = mx.nd.array([1, 2, 3])
    b = mx.nd.array([4, 5, 6])

    a.attach_grad()
    b.attach_grad()

    with mx.autograd.record():
        c = AdditionOP(a, b)

    dc = mx.nd.array([7, 8, 9])
    c.backward(dc)

    assert ((a + b).asnumpy() == c.asnumpy()).all()
    assert (a.grad.asnumpy() == dc.asnumpy()).all()
    assert (b.grad.asnumpy() == dc.asnumpy()).all()

    print('Okay :-)')
    print('a + b = c \n {} + {} = {}'.format(a.asnumpy(), b.asnumpy(), c.asnumpy()))

def main():
    ex = futures.ProcessPoolExecutor(1)
    r = ex.submit(foo)
    r.result()

if __name__ == "__main__":
    main()
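
A possible workaround, under the assumption that the hang comes from forking after the MXNet/CUDA runtime is initialized in the parent: switch the worker start method to 'spawn' so the child begins with a fresh interpreter (a sketch; whether it helps depends on where the deadlock actually is).

import multiprocessing as mp
from concurrent import futures

def main():
    ex = futures.ProcessPoolExecutor(1)
    r = ex.submit(foo)
    r.result()

if __name__ == "__main__":
    # 'spawn' gives the child a clean interpreter instead of a fork
    # of the parent's already-initialized runtime state.
    mp.set_start_method('spawn')
    main()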

CustomOp in Python and C++ for prediction

Hello, it's very useful! I have a problem: I define a custom operator in MXNet (Python) and train a model. Now I want to load the model (.json & .params) with MXNet (C++). Can you give me some advice? Thanks.

Rename the package

The current package name of this project is mobula.
However, that name is already taken by the project mobula.

I will rename the package of MobulaOP.

Leveraging framework specific math helpers

Hi @wkcn, really nice work.
I'd like to ask whether it is possible for MobulaOP to leverage the existing math helpers in deep learning frameworks, such as ATen in PyTorch and mshadow in MXNet.
Writing everything in vanilla C++ is prohibitively cumbersome, which prevents the adoption of MobulaOP in practice.

Does MobulaOP support CuPy?

Is it possible to support CuPy (a NumPy-like API accelerated with CUDA)?

For example:

a = cupy.array([1, 2, 3])
b = cupy.array([4, 5, 6])
out = cupy.empty(a.shape)
mobula.func.mul_elemwise(a.size, a, b, out)

Traceback (most recent call last):
  File "D:\Miniconda3\envs\python35\lib\site-packages\mobula\func.py", line 208, in __call__
    var, ptype, template_mapping, using_async)
  File "D:\Miniconda3\envs\python35\lib\site-packages\mobula\func.py", line 273, in _get_tensor_info
    raise TypeError()
TypeError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_mul_func.py", line 9, in <module>
    mobula.func.mul_elemwise(aa.size, aa, bb, outa)
  File "D:\Miniconda3\envs\python35\lib\site-packages\mobula\func.py", line 239, in __call__
    self.name, self.func.arg_types, list(map(type, args))))
TypeError: Unmatched parameters list of the function mul_elemwise:
[const int32_t, <typename const T*>, <typename const T*>, <typename T*>]
vs
[<class 'int'>, <class 'cupy.core.core.ndarray'>, <class 'cupy.core.core.ndarray'>, <class 'cupy.core.core.ndarray'>]
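
(Note: the CuPy example near the top of this page, op = MyFirstOP[cp.ndarray](), suggests that CuPy support was added after this issue was filed.)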

Where to find these keywords of MXNet

Hi, I am learning from your project and want to know where to find these keywords of MXNet, like those in check_backend(b):

func_names = ['get_pointer', 'dev_id', 'wait_to_read', 'wait_to_write', 'OpGen']

Is there any page online about the meanings of these keywords?
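
These names are not MXNet keywords; they appear to be the interface each glue module in mobula/glue/ must expose so MobulaOP can hand a framework's tensors to the C++ kernels. A hypothetical sketch for illustration (the bodies below are assumptions based on the names, not the real implementation):

# Hypothetical glue-module interface (illustration only)
import ctypes

def get_pointer(arr):
    # Raw data pointer of the tensor, passed to the C++ kernel
    # (NumPy flavor shown; other backends differ).
    return arr.ctypes.data_as(ctypes.c_void_p)

def dev_id(arr):
    # Device id of the tensor; None for CPU tensors.
    return None

def wait_to_read(arr):
    # Block until pending writes to arr finish (no-op for NumPy).
    pass

def wait_to_write(arr):
    # Block until arr is safe to overwrite (no-op for NumPy).
    pass

# OpGen generates the framework-specific operator wrapper class.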

GPU backend does not work for pytorch

The following code produces wrong output.
If I change .cuda() to .cpu(), I get the correct output.

(Fix #10 is required to run this example)

# Use ROIAlign operator
import sys
sys.path.append('../') # Add MobulaOP path
import numpy as np
import mobula
# Load ROIAlign Module
mobula.op.load('ROIAlign')

dtype = np.float32
N, C, H, W = 2, 3, 4, 4

import torch

data = torch.tensor(np.arange(N*C*H*W).astype(dtype).reshape((N, C, H, W))).cuda()
rois = torch.tensor(np.array([[0, 1, 1, 3, 3]], dtype=dtype)).cuda()

output = mobula.op.ROIAlign(data=data, rois=rois, pooled_size=(2, 2), spatial_scale=1.0, sampling_ratio=1)

print("= OUTPUT =")
print(output)
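
A quick way to confirm the divergence is to compare against the CPU path, which is correct per the report above (a debugging sketch; torch.cuda.synchronize() rules out an unfinished kernel):

torch.cuda.synchronize()  # make sure the GPU kernel has finished
output_cpu = mobula.op.ROIAlign(data=data.cpu(), rois=rois.cpu(), pooled_size=(2, 2), spatial_scale=1.0, sampling_ratio=1)
print(np.allclose(output.cpu().numpy(), output_cpu.numpy()))  # False under this bug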

LICENSE Problem

In this project, I used the header file functional-gcc4_9.h, which is under the GPL license, to address an ABI compatibility problem. I need to resolve the license problem.

I have removed the GPL-licensed files from the master branch.
In addition, there is a GPL branch, https://github.com/wkcn/MobulaOP/tree/master-GPL, which keeps the gcc compatibility.

Lack of comments

Too much of the code lacks comments. I need to add them.

Todo List:

  • Python Code
  • C++/C Code

Is Gluon supported?

Firstly, thanks for the work! It makes MXNet easier to use.

My question is as follows: can an op created with MobulaOP be called from Gluon?
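
Since MobulaOP operators accept both mx.nd.NDArray and mx.sym.Symbol (see the ROIAlign example above), wrapping one in a Gluon block should work. A minimal sketch, assuming the AdditionOP from the tutorial:

import mxnet as mx
from mxnet import gluon
import mobula

mobula.op.load('./AdditionOP')

class AdditionBlock(gluon.HybridBlock):
    def hybrid_forward(self, F, x, y):
        # MobulaOP dispatches on the input type (NDArray or Symbol),
        # so the block can also be hybridized.
        return mobula.op.AdditionOP(x, y)

block = AdditionBlock()
print(block(mx.nd.array([1, 2, 3]), mx.nd.array([4, 5, 6])))  # [5, 7, 9]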

Question about the PyTorch example

Hi, I have tried your basic example on MulElemWise, but got this kind of error.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/alexhu/Source/MobulaOP/mobula/glue/common.py", line 158, in __call__
    return backend.op_gen(glue_mod, op=self.op, name=self.name)(*args, **new_kwargs)
  File "/home/alexhu/Source/MobulaOP/mobula/glue/th.py", line 41, in __call__
    return self.cache[self.name](*pars[0], **pars[1])(*inputs)
  File "/home/alexhu/anaconda3/envs/slr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/alexhu/Source/MobulaOP/mobula/glue/th.py", line 105, in forward
    return torch_func.apply(self, *args, **kwargs)
  File "/home/alexhu/Source/MobulaOP/mobula/glue/th.py", line 59, in forward
    out = self._forward(*args, **kwargs)
  File "/home/alexhu/Source/MobulaOP/docs/tutorial/MulElemWise/MulElemWise.py", line 7, in forward
    mobula.func.mul_elemwise(a.size, a, b, self.y)
  File "/home/alexhu/Source/MobulaOP/mobula/func.py", line 148, in __call__
    data, var_dev_id, ctype = self._get_scalar_info(var, ptype)
  File "/home/alexhu/Source/MobulaOP/mobula/func.py", line 277, in _get_scalar_info
    var, ctypes.c_void_p) else ptype.ctype(var)
TypeError: an integer is required (got type builtin_function_or_method)

ROIAlign custom op runs slowly

Hi, I have tried the ROIAlign custom op provided by this repo in a Faster R-CNN example. I simply replaced the symbol code:

roi_pool = mx.symbol.ROIPooling(name='roi_pool', data=conv_new_1_relu, rois=rois, 
           pooled_size=(7, 7), spatial_scale=spatial_scale)

with

roi_pool = mobula.op.ROIAlign(name='roi_pool', data=conv_new_1_relu, rois=rois,
           pooled_size=(7, 7), spatial_scale=spatial_scale, sampling_ratio=0)

The running speed decreases from 0.1 s to 1~2 s, and with multiple GPUs the code cannot run in parallel and becomes much slower.
My MXNet version is 1.3.0-cu92 from pip install. What might be the problem?
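
One way to narrow this down is to time the op in isolation; mx.nd.waitall() forces MXNet's asynchronous engine to finish, so the measurement covers the kernel rather than just the call (a sketch; the shapes below are placeholders):

import time
import mxnet as mx
import mobula

mobula.op.load('ROIAlign')

ctx = mx.gpu()
data = mx.nd.random.uniform(shape=(2, 256, 14, 14), ctx=ctx)
rois = mx.nd.array([[0, 1, 1, 7, 7]], ctx=ctx)

mx.nd.waitall()  # finish setup before timing
start = time.time()
for _ in range(100):
    out = mobula.op.ROIAlign(data=data, rois=rois, pooled_size=(7, 7), spatial_scale=1.0, sampling_ratio=0)
mx.nd.waitall()  # wait for all pending kernels
print('%.4f s per call' % ((time.time() - start) / 100))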
