
Contributors: bdhirsh


pytorch_open_registration_example's Issues

Failure to build and run on nightly 2.4

Several issues:

  1. Build failures from changes to the `Allocator` class: `allocate()` is no longer `const`, and the now-required `copy_data()` method is missing.
  2. `torch.register_privateuse1_backend` has been renamed to `torch.utils.rename_privateuse1_backend`.

However, even with these changes applied, I still get a failure at runtime:

Traceback (most recent call last):
  File "/tmp/pytorch_open_registration_example/open_registration_example.py", line 102, in <module>
    x = torch.ones(4, 4, device='foo:2')
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'torch.foo'

My modifications:

diff --git a/cpp_extensions/open_registration_extension.cpp b/cpp_extensions/open_registration_extension.cpp
index b666253..89d4486 100644
--- a/cpp_extensions/open_registration_extension.cpp
+++ b/cpp_extensions/open_registration_extension.cpp
@@ -43,7 +43,7 @@ at::Tensor custom_add_Tensor(const at::Tensor & self, const at::Tensor & other,
 // A dummy allocator for our custom device, that secretly uses the CPU
 struct DummyCustomAllocator final : at::Allocator {
   DummyCustomAllocator() = default;
-  at::DataPtr allocate(size_t nbytes) const override {
+  at::DataPtr allocate(size_t nbytes) override {
     std::cout << "Custom allocator's allocate() called!" << std::endl;
     void* data = c10::alloc_cpu(nbytes);
     return {data, data, &ReportAndDelete, at::Device(at::DeviceType::PrivateUse1, 0)};
@@ -57,6 +57,11 @@ struct DummyCustomAllocator final : at::Allocator {
     c10::free_cpu(ptr);
   }
 
+  void copy_data(void* dest, const void* src, std::size_t count) const override
+  {
+        default_copy_data(dest,src,count);
+  }
+
   at::DeleterFnPtr raw_deleter() const override {
     return &ReportAndDelete;
   }
diff --git a/open_registration_example.py b/open_registration_example.py
index 9b31d69..1c1e048 100644
--- a/open_registration_example.py
+++ b/open_registration_example.py
@@ -88,7 +88,7 @@ def test(x, y):
 # Option 1: Use torch.register_privateuse1_backend("foo"), which will allow
 # "foo" as a device string to work seamlessly with pytorch's API's.
 # You may need a more recent nightly of PyTorch for this.
-torch.register_privateuse1_backend('foo')
+torch.utils.rename_privateuse1_backend('foo')
 
 # Show that in general, passing in a custom device string will fail.
 try:
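For reference, the remaining `ModuleNotFoundError: No module named 'torch.foo'` typically comes from PyTorch looking up a backend module named after the renamed device. A minimal sketch of working around it on a recent PyTorch (the empty `foo_module` here is a placeholder; a real backend would expose its device hooks on it):

```python
import types

import torch

# After the rename, "foo" parses as a device string.
torch.utils.rename_privateuse1_backend("foo")

# Some code paths import `torch.foo` as the backend's device module;
# registering one (even an empty placeholder) avoids the
# "No module named 'torch.foo'" error.
foo_module = types.ModuleType("torch.foo")
torch._register_device_module("foo", foo_module)

d = torch.device("foo", 2)
print(d)  # foo:2
```

Actually allocating a tensor on `foo` still requires the C++ extension's `empty.memory_format` kernel to be loaded; this only fixes the device-string and module lookup.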

core dumped error

I cloned the project and ran it with Python 3.8.
It crashes with a core dump, as follows:

[root@ffef2eaecd50 pytorch_open_registration_example]# python open_registration_example.py
Using /root/.cache/torch_extensions/py38_cpu as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py38_cpu/custom_device_extension...
Emitting ninja build file /root/.cache/torch_extensions/py38_cpu/custom_device_extension/build.ninja...
Building extension module custom_device_extension...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF open_registration_extension.o.d -DTORCH_EXTENSION_NAME=custom_device_extension -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/root/pytorch_open_registration_example/cpp_extensions -isystem /usr/local/lib64/python3.8/site-packages/torch/include -isystem /usr/local/lib64/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib64/python3.8/site-packages/torch/include/TH -isystem /usr/local/lib64/python3.8/site-packages/torch/include/THC -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++14 -g -c /root/pytorch_open_registration_example/cpp_extensions/open_registration_extension.cpp -o open_registration_extension.o
[2/2] c++ open_registration_extension.o -shared -L/usr/local/lib64/python3.8/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o custom_device_extension.so
Loading extension module custom_device_extension...
terminate called after throwing an instance of 'c10::Error'
  what():
Mismatch in kernel C++ signatures
  operator: aten::empty.memory_format(int[] size, *, int? dtype=None, int? layout=None, Device? device=None, bool? pin_memory=None, int? memory_format=None) -> Tensor
    registered at /root/pytorch/build/aten/src/ATen/RegisterSchema.cpp:6
  kernel 1: at::Tensor (c10::SymIntArrayRef, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>)
    dispatch key: BackendSelect
    registered at /root/pytorch/build/aten/src/ATen/RegisterBackendSelect.cpp:726
  kernel 2: at::Tensor (c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>)
    dispatch key: PrivateUse1
    registered at /root/pytorch_open_registration_example/cpp_extensions/open_registration_extension.cpp:258

Exception raised from registerKernel at /root/pytorch/aten/src/ATen/core/dispatch/OperatorEntry.cpp:117 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x55 (0x7fb9c5896855 in /usr/local/lib64/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xb1 (0x7fb9c58938c1 in /usr/local/lib64/python3.8/site-packages/torch/lib/libc10.so)
frame #2: c10::impl::OperatorEntry::registerKernel(c10::Dispatcher const&, c10::optional<c10::DispatchKey>, c10::KernelFunction, c10::optional<c10::impl::CppSignature>, std::unique_ptr<c10::FunctionSchema, std::default_delete<c10::FunctionSchema> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x341 (0x7fb9c6a90521 in /usr/local/lib64/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #3: c10::Dispatcher::registerImpl(c10::OperatorName, c10::optional<c10::DispatchKey>, c10::KernelFunction, c10::optional<c10::impl::CppSignature>, std::unique_ptr<c10::FunctionSchema, std::default_delete<c10::FunctionSchema> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x168 (0x7fb9c6a844e8 in /usr/local/lib64/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #4: torch::Library::_impl(char const*, torch::CppFunction&&) & + 0x485 (0x7fb9c6abc3a5 in /usr/local/lib64/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #5: torch::Library& torch::Library::impl<char const*, at::Tensor (*)(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>)>(char const*, at::Tensor (*&&)(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>)) & + 0x63 (0x7fb9c1043201 in /root/.cache/torch_extensions/py38_cpu/custom_device_extension/custom_device_extension.so)
frame #6: <unknown function> + 0x6e0bb (0x7fb9c10290bb in /root/.cache/torch_extensions/py38_cpu/custom_device_extension/custom_device_extension.so)
frame #7: torch::detail::TorchLibraryInit::TorchLibraryInit(torch::Library::Kind, void (*)(torch::Library&), char const*, c10::optional<c10::DispatchKey>, char const*, unsigned int) + 0x8b (0x7fb9c10308d3 in /root/.cache/torch_extensions/py38_cpu/custom_device_extension/custom_device_extension.so)
frame #8: <unknown function> + 0x6fe2c (0x7fb9c102ae2c in /root/.cache/torch_extensions/py38_cpu/custom_device_extension/custom_device_extension.so)
frame #9: <unknown function> + 0x6fe67 (0x7fb9c102ae67 in /root/.cache/torch_extensions/py38_cpu/custom_device_extension/custom_device_extension.so)
frame #10: <unknown function> + 0xfeca (0x7fb9d44aceca in /lib64/ld-linux-x86-64.so.2)
frame #11: <unknown function> + 0xffca (0x7fb9d44acfca in /lib64/ld-linux-x86-64.so.2)
frame #12: _dl_catch_exception + 0xdc (0x7fb9d2bc646c in /lib64/libc.so.6)
frame #13: <unknown function> + 0x14108 (0x7fb9d44b1108 in /lib64/ld-linux-x86-64.so.2)
frame #14: _dl_catch_exception + 0x84 (0x7fb9d2bc6414 in /lib64/libc.so.6)
frame #15: <unknown function> + 0x138fe (0x7fb9d44b08fe in /lib64/ld-linux-x86-64.so.2)
frame #16: <unknown function> + 0x11ba (0x7fb9d33d71ba in /lib64/libdl.so.2)
frame #17: _dl_catch_exception + 0x84 (0x7fb9d2bc6414 in /lib64/libc.so.6)
frame #18: _dl_catch_error + 0x33 (0x7fb9d2bc64d3 in /lib64/libc.so.6)
frame #19: <unknown function> + 0x1939 (0x7fb9d33d7939 in /lib64/libdl.so.2)
frame #20: dlopen + 0x4a (0x7fb9d33d725a in /lib64/libdl.so.2)
<omitting python frames>

Aborted (core dumped)

My PyTorch version is 1.13.0a0+git97b2dff.
Could you help me with this? Thank you!
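The "Mismatch in kernel C++ signatures" abort means the extension was compiled against headers whose `aten::empty.memory_format` schema disagrees with the installed libtorch: the dispatcher's BackendSelect kernel takes `c10::SymIntArrayRef` while the extension's PrivateUse1 kernel takes `c10::ArrayRef<long>`. Rebuilding the extension against the exact same PyTorch build (and, on builds where `size` is symbolic, declaring the kernel with `c10::SymIntArrayRef size`) should clear it. One quick diagnostic (not part of the repo) to see which schema your installed torch expects:

```python
import torch

# The dispatcher schema shows whether `size` is `SymInt[]` or `int[]`;
# the extension's kernel signature has to match the installed build.
schema = torch.ops.aten.empty.memory_format._schema
print(schema)
```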

Error: Could not run 'aten::normal_' with arguments from the 'PrivateUse1' backend

I can run the code in this repository successfully, but when I run the code below:

import torch
from utils.custom_device_mode import foo_module, enable_foo_device
a = torch.randn(4, device='privateuseone')

I get the following error. How can I fix it?

Using /home/david/.cache/torch_extensions/py38_cu121 as PyTorch extensions root...
Emitting ninja build file /home/david/.cache/torch_extensions/py38_cu121/custom_device_extension/build.ninja...
Building extension module custom_device_extension...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module custom_device_extension...
Custom aten::empty.memory_format() called!
Custom allocator's delete() called!
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    a = torch.randn(4, device='privateuseone')
NotImplementedError: Could not run 'aten::normal_' with arguments from the 'PrivateUse1' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::normal_' is only available for these backends: [CPU, CUDA, Meta, SparseCsrCPU, SparseCsrCUDA, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMeta, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].

CPU: registered at /home/david/PytorchTrans/pytorch/build/aten/src/ATen/RegisterCPU.cpp:31085 [kernel]
CUDA: registered at /home/david/PytorchTrans/pytorch/build/aten/src/ATen/RegisterCUDA.cpp:44060 [kernel]
Meta: registered at /dev/null:219 [kernel]
SparseCsrCPU: registered at /home/david/PytorchTrans/pytorch/build/aten/src/ATen/RegisterSparseCsrCPU.cpp:1128 [kernel]
SparseCsrCUDA: registered at /home/david/PytorchTrans/pytorch/build/aten/src/ATen/RegisterSparseCsrCUDA.cpp:1269 [kernel]
BackendSelect: fallthrough registered at /home/david/PytorchTrans/pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at /home/david/PytorchTrans/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:144 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at /home/david/PytorchTrans/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:491 [backend fallback]
Functionalize: registered at /home/david/PytorchTrans/pytorch/build/aten/src/ATen/RegisterFunctionalization_0.cpp:21491 [kernel]
Named: fallthrough registered at /home/david/PytorchTrans/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:11 [kernel]
Conjugate: registered at /home/david/PytorchTrans/pytorch/aten/src/ATen/ConjugateFallback.cpp:17 [backend fallback]
Negative: registered at /home/david/PytorchTrans/pytorch/aten/src/ATen/native/NegateFallback.cpp:19 [backend fallback]
ZeroTensor: registered at /home/david/PytorchTrans/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: registered at /home/david/PytorchTrans/pytorch/torch/csrc/autograd/generated/ADInplaceOrViewType_0.cpp:4733 [kernel]
AutogradOther: registered at /home/david/PytorchTrans/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:15862 [autograd kernel]
AutogradCPU: registered at /home/david/PytorchTrans/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:15862 [autograd kernel]
AutogradCUDA: registered at /home/david/PytorchTrans/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:15862 [autograd kernel]
AutogradHIP: registered at /home/david/PytorchTrans/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:15862 [autograd kernel]
AutogradXLA: registered at /home/david/PytorchTrans/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:15862 [autograd kernel]
AutogradMPS: registered at /home/david/PytorchTrans/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:15862 [autograd kernel]
AutogradIPU: registered at /home/david/PytorchTrans/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:15862 [autograd kernel]
AutogradXPU: registered at /home/david/PytorchTrans/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:15862 [autograd kernel]
AutogradHPU: registered at /home/david/PytorchTrans/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:15862 [autograd kernel]
AutogradVE: registered at /home/david/PytorchTrans/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:15862 [autograd kernel]
AutogradLazy: registered at /home/david/PytorchTrans/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:15862 [autograd kernel]
AutogradMeta: registered at /home/david/PytorchTrans/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:15862 [autograd kernel]
AutogradMTIA: registered at /home/david/PytorchTrans/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:15862 [autograd kernel]
AutogradPrivateUse1: registered at /home/david/PytorchTrans/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:15862 [autograd kernel]
AutogradPrivateUse2: registered at /home/david/PytorchTrans/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:15862 [autograd kernel]
AutogradPrivateUse3: registered at /home/david/PytorchTrans/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:15862 [autograd kernel]
AutogradNestedTensor: registered at /home/david/PytorchTrans/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp:15862 [autograd kernel]
Tracer: registered at /home/david/PytorchTrans/pytorch/torch/csrc/autograd/generated/TraceType_1.cpp:15894 [kernel]
AutocastCPU: fallthrough registered at /home/david/PytorchTrans/pytorch/aten/src/ATen/autocast_mode.cpp:487 [backend fallback]
AutocastCUDA: fallthrough registered at /home/david/PytorchTrans/pytorch/aten/src/ATen/autocast_mode.cpp:354 [backend fallback]
FuncTorchBatched: registered at /home/david/PytorchTrans/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:815 [backend fallback]
FuncTorchVmapMode: registered at /home/david/PytorchTrans/pytorch/aten/src/ATen/functorch/BatchRulesRandomness.cpp:383 [kernel]
Batched: registered at /home/david/PytorchTrans/pytorch/aten/src/ATen/LegacyBatchingRegistrations.cpp:1073 [backend fallback]
VmapMode: registered at /home/david/PytorchTrans/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:37 [kernel]
FuncTorchGradWrapper: registered at /home/david/PytorchTrans/pytorch/aten/src/ATen/functorch/TensorWrapper.cpp:210 [backend fallback]
PythonTLSSnapshot: registered at /home/david/PytorchTrans/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:152 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at /home/david/PytorchTrans/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:487 [backend fallback]
PythonDispatcher: registered at /home/david/PytorchTrans/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:148 [backend fallback]
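`torch.randn` first allocates via `aten::empty.memory_format` (which this repo registers for PrivateUse1) and then fills the tensor in place via `aten::normal_` (which it does not), hence the error. One fix is to register a `normal_` kernel for PrivateUse1 yourself. A sketch using the Python `torch.library` API, under the assumption that sampling on CPU and copying over is acceptable (the `copy_` relies on the `_copy_from` kernel the extension already registers):

```python
import torch

# IMPL-only library entry for the aten namespace.
my_lib = torch.library.Library("aten", "IMPL")

def custom_normal_(self, mean=0.0, std=1.0, *, generator=None):
    # Sketch: draw the samples on CPU, then copy them into the
    # custom-device tensor in place.
    cpu_sample = torch.normal(mean, std, size=self.shape, generator=generator)
    self.copy_(cpu_sample)
    return self

# Route aten::normal_ to our kernel for the PrivateUse1 dispatch key.
my_lib.impl("normal_", custom_normal_, "PrivateUse1")
```

With this registered (and the C++ extension loaded so allocation works), `torch.randn(4, device='privateuseone')` should dispatch `empty` to the extension and `normal_` to the Python kernel above.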
