
Comments (4)

shinobiultra commented on September 22, 2024

[pip3] torch==2.1.2
[conda] pytorch 2.2.1 py3.11_cuda12.1_cudnn8.9.2_0 pytorch

There's something unexpected going on here. Try from a fresh env?

Oh my god, this is it! I've tried my hardest not to mix and match pip and conda, but somehow this completely eluded me. A simple pip uninstall torch, keeping conda's version, fixed the issue completely.

Thank you so much for the unbelievably fast reply and for your time. Closed!
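
The traceback later in this thread shows why the uninstall helped: torchvision was loaded from the conda env (~/.conda/envs/...), while torch was loaded from ~/.local/lib/python3.11/site-packages, which is a pip copy. As a rough sketch of how to spot that kind of shadowing, the helper below lists every directory on the import path that holds a copy of a package. The function name find_package_copies is made up for illustration, and the check assumes a regular package with an __init__.py.

```python
import sys
from pathlib import Path


def find_package_copies(name, search_paths=None):
    """Return each directory on the search path that contains package `name`.

    Hypothetical helper for illustration. The first entry is normally the
    copy that `import name` picks up; any later entries are shadowed.
    """
    paths = sys.path if search_paths is None else search_paths
    copies = []
    for entry in paths:
        # An empty sys.path entry means the current working directory.
        candidate = Path(entry or ".") / name
        if (candidate / "__init__.py").is_file() and candidate not in copies:
            copies.append(candidate)
    return copies


if __name__ == "__main__":
    # More than one line of output suggests a mixed pip/conda install.
    for copy in find_package_copies("torch"):
        print(copy)
```

If this prints both a path under ~/.local and a path under the conda env, removing one of the two copies (here, the pip one) is the likely fix.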

from vision.

NicolasHug commented on September 22, 2024

[pip3] torch==2.1.2
[conda] pytorch 2.2.1 py3.11_cuda12.1_cudnn8.9.2_0 pytorch

There's something unexpected going on here. Try from a fresh env?


NicolasHug commented on September 22, 2024

Note I installed with CUDA 12.1 support but the environment pasted below (from collect_env) is from a CPU only server

This might be why. Can you try installing the CPU-only version of torch/torchvision, e.g.:

conda install pytorch torchvision torchaudio cpuonly -c pytorch


shinobiultra commented on September 22, 2024

Note I installed with CUDA 12.1 support but the environment pasted below (from collect_env) is from a CPU only server

This might be why. Can you try installing the CPU-only version of torch/torchvision, e.g.:

conda install pytorch torchvision torchaudio cpuonly -c pytorch

Well, I debug on CPU-only instances, then switch to GPUs when needed. So I just switched to a GPU instance now and the problem remains unchanged. See the result of the three imports (from the post) and collect_env output below:

Error:

{
	"name": "ValueError",
	"message": "Could not find the operator torchvision::nms. Please make sure you have already registered the operator and (if registered from C++) loaded it via torch.ops.load_library.",
	"stack": "---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[2], line 3
      1 import torch
      2 import torch.nn as nn
----> 3 import torchvision.transforms.v2 as transforms

File ~/.conda/envs/TOMI_Conda/lib/python3.11/site-packages/torchvision/__init__.py:6
      3 from modulefinder import Module
      5 import torch
----> 6 from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils
      8 from .extension import _HAS_OPS
     10 try:

File ~/.conda/envs/TOMI_Conda/lib/python3.11/site-packages/torchvision/_meta_registrations.py:163
    153     torch._check(
    154         grad.dtype == rois.dtype,
    155         lambda: (
   (...)
    158         ),
    159     )
    160     return grad.new_empty((batch_size, channels, height, width))
--> 163 @torch._custom_ops.impl_abstract(\"torchvision::nms\")
    164 def meta_nms(dets, scores, iou_threshold):
    165     torch._check(dets.dim() == 2, lambda: f\"boxes should be a 2d tensor, got {dets.dim()}D\")
    166     torch._check(dets.size(1) == 4, lambda: f\"boxes should have 4 elements in dimension 1, got {dets.size(1)}\")

File ~/.local/lib/python3.11/site-packages/torch/_custom_ops.py:253, in impl_abstract.<locals>.inner(func)
    252 def inner(func):
--> 253     custom_op = _find_custom_op(qualname, also_check_torch_library=True)
    254     custom_op.impl_abstract(_stacklevel=3)(func)
    255     return func

File ~/.local/lib/python3.11/site-packages/torch/_custom_op/impl.py:1076, in _find_custom_op(qualname, also_check_torch_library)
   1072 if not also_check_torch_library:
   1073     raise RuntimeError(
   1074         f\"Could not find custom op \\\"{qualname}\\\". Did you register it via \"
   1075         f\"the torch._custom_ops API?\")
-> 1076 overload = get_op(qualname)
   1077 result = custom_op_from_existing(overload)
   1078 return result

File ~/.local/lib/python3.11/site-packages/torch/_custom_op/impl.py:1062, in get_op(qualname)
   1060 opnamespace = getattr(torch.ops, ns)
   1061 if not hasattr(opnamespace, name):
-> 1062     error_not_found()
   1063 packet = getattr(opnamespace, name)
   1064 if not hasattr(packet, 'default'):

File ~/.local/lib/python3.11/site-packages/torch/_custom_op/impl.py:1052, in get_op.<locals>.error_not_found()
   1051 def error_not_found():
-> 1052     raise ValueError(
   1053         f\"Could not find the operator {qualname}. Please make sure you have \"
   1054         f\"already registered the operator and (if registered from C++) \"
   1055         f\"loaded it via torch.ops.load_library.\")

ValueError: Could not find the operator torchvision::nms. Please make sure you have already registered the operator and (if registered from C++) loaded it via torch.ops.load_library."
}

collect_env:

Collecting environment information...
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: linux (x86_64)
GCC version: (GCC) 10.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.28

Python version: 3.11.4 (main, Jul  5 2023, 14:15:25) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-4.18.0-513.9.1.el8_9.x86_64-x86_64-with-glibc2.28
Is CUDA available: True
CUDA runtime version: 12.2.91
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Tesla V100-SXM2-32GB
Nvidia driver version: 550.54.14
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      46 bits physical, 48 bits virtual
CPU(s):                             72
On-line CPU(s) list:                0-71
Thread(s) per core:                 2
Core(s) per socket:                 18
Socket(s):                          2
NUMA node(s):                       2
Vendor ID:                          GenuineIntel
CPU family:                         6
Model:                              85
Model name:                         Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
Stepping:                           4
CPU MHz:                            3700.000
CPU max MHz:                        3700.0000
CPU min MHz:                        1200.0000
BogoMIPS:                           5400.00
Virtualization:                     VT-x
L1d cache:                          1.1 MiB
L1i cache:                          1.1 MiB
L2 cache:                           36 MiB
L3 cache:                           49.5 MiB
NUMA node0 CPU(s):                  0-17,36-53
NUMA node1 CPU(s):                  18-35,54-71
Vulnerability Gather data sampling: Vulnerable: No microcode
Vulnerability Itlb multihit:        KVM: Mitigation: VMX disabled
Vulnerability L1tf:                 Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:                  Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
Vulnerability Meltdown:             Mitigation; PTI
Vulnerability Mmio stale data:      Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
Vulnerability Retbleed:             Mitigation; IBRS
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; IBRS, IBPB conditional, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke flush_l1d

Versions of relevant libraries:
[pip3] flake8==7.0.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.3
[pip3] numpydoc==1.6.0
[pip3] torch==2.1.2
[pip3] triton==2.1.0
[conda] blas                      1.0                         mkl  
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] libjpeg-turbo             2.0.0                h9bf148f_0    pytorch
[conda] mkl                       2023.1.0         h213fc3f_46344  
[conda] mkl-service               2.4.0           py311h5eee18b_1  
[conda] mkl_fft                   1.3.8           py311h5eee18b_0  
[conda] mkl_random                1.2.4           py311hdb19cb5_0  
[conda] numpy                     1.26.4          py311h08b1b3b_0  
[conda] numpy-base                1.26.4          py311hf175353_0  
[conda] numpydoc                  1.6.0              pyhd8ed1ab_0    conda-forge
[conda] pytorch                   2.2.1           py3.11_cuda12.1_cudnn8.9.2_0    pytorch
[conda] pytorch-cuda              12.1                 ha16c6d3_5    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torchaudio                2.2.1               py311_cu121    pytorch
[conda] torchinfo                 1.8.0              pyhd8ed1ab_0    conda-forge
[conda] torchtriton               2.2.0                     py311    pytorch
[conda] torchvision               0.17.1              py311_cu121    pytorch
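
The library list above contains the clue that resolved this issue: pip reports torch 2.1.2 while conda reports pytorch 2.2.1, and the "PyTorch version: 2.1.2+cu121" line shows the pip copy is the one actually imported. torchvision 0.17.1 is built against torch 2.2.x, so its compiled operators such as torchvision::nms do not register against the older torch. A minimal sketch for flagging such conflicts in collect_env output follows. The function name and the CONDA_TO_PIP alias map are assumptions made for illustration and are not part of collect_env.

```python
import re

# Assumption: conda and pip name some packages differently.
# This alias map is illustrative and far from complete.
CONDA_TO_PIP = {"pytorch": "torch"}


def find_version_conflicts(collect_env_text):
    """Return {package: (pip_version, conda_version)} where the two differ."""
    pip_versions, conda_versions = {}, {}
    for raw_line in collect_env_text.splitlines():
        line = raw_line.strip()
        # Lines such as "[pip3] torch==2.1.2"
        match = re.match(r"\[pip3?\]\s+([\w.\-]+)==(\S+)", line)
        if match:
            pip_versions[match.group(1)] = match.group(2)
            continue
        # Lines such as "[conda] pytorch   2.2.1   <build>   <channel>"
        match = re.match(r"\[conda\]\s+(\S+)\s+(\S+)", line)
        if match:
            name = CONDA_TO_PIP.get(match.group(1), match.group(1))
            conda_versions[name] = match.group(2)
    shared = sorted(pip_versions.keys() & conda_versions.keys())
    return {
        name: (pip_versions[name], conda_versions[name])
        for name in shared
        if pip_versions[name] != conda_versions[name]
    }
```

Run on the list above, this would report torch (2.1.2 vs 2.2.1) and numpy (1.26.3 vs 1.26.4). A reported conflict only means two copies are installed; which one Python imports depends on the import path order.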

