
keops's Introduction




Please visit our website for documentation, contribution guidelines and tutorials.

Kernel Operations on the GPU, with autodiff, without memory overflows

The KeOps library lets you compute reductions of large arrays whose entries are given by a mathematical formula or a neural network. It combines efficient C++ routines with an automatic differentiation engine and can be used with Python (NumPy, PyTorch), Matlab and R.

It is perfectly suited to the computation of kernel matrix-vector products, K-nearest neighbors queries, N-body interactions, point cloud convolutions and the associated gradients. Crucially, it performs well even when the corresponding kernel or distance matrices do not fit into the RAM or GPU memory. Compared with a PyTorch GPU baseline, KeOps provides a 10x-100x speed-up on a wide range of geometric applications, from kernel methods to geometric deep learning.

Symbolic matrices

Why use KeOps? Math libraries represent most objects as matrices and tensors:

  • (a) Dense matrices. Variables are often encoded as dense numerical arrays $M_{i,j}$ = M[i,j]. This representation is convenient and well-supported, but also puts a heavy load on the memories of our computers. Unfortunately, large arrays are expensive to move around and may not even fit in RAM or GPU memories.

    In practice, this means that a majority of scientific programs are memory-bound. Run times for most neural networks and mathematical computations are not limited by the raw capabilities of our CPUs and CUDA cores, but by the time-consuming transfers of large arrays from memory circuits to arithmetic computing units.

  • (b) Sparse matrices. To work around this problem, a common solution is to rely on sparse matrices: tensors that have few non-zero coefficients. We represent these objects using lists of indices $(i_n, j_n)$ and values $M_n = M_{i_n, j_n}$ that correspond to a small number of non-zero entries. Matrix-vector operations are then implemented with indexing methods and scattered memory accesses.

    This method is elegant and allows us to represent large arrays with a small memory footprint. Unfortunately, it does not stream well on GPUs: parallel computing devices are wired to perform block-wise memory accesses and have a hard time dealing with lists of random indices $(i_n, j_n)$. As a consequence, when compared with dense arrays, sparse encodings only speed up computations for matrices that have fewer than 1% non-zero coefficients. This restrictive condition prevents sparse matrices from being very useful outside of graph and mesh processing.

  • (c) Symbolic matrices. KeOps provides another solution to speed up tensor programs. Our key remark is that most of the large arrays that are used in machine learning and applied mathematics share a common mathematical structure. Distance matrices, kernel matrices, point cloud convolutions and attention layers can all be described as symbolic tensors: given two collections of vectors $(x_i)$ and $(y_j)$, their coefficients $M_{i,j}$ at location $(i,j)$ are given by mathematical formulas $F(x_i, y_j)$ that are evaluated on data samples $x_i$ and $y_j$.

    These objects are not "sparse" in the traditional sense... but can nevertheless be described efficiently using a mathematical formula F and relatively small data arrays $(x_i)$ and $(y_j)$. The main purpose of the KeOps library is to provide support for this abstraction with all the perks of a deep learning library:

    • A transparent interface with CPU and GPU integration.
    • Numerous tutorials and benchmarks.
    • Full support for automatic differentiation, batch processing and approximate computations.
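To make the three representations concrete, here is a small NumPy sketch (not KeOps code) that computes the same Gaussian-kernel matrix-vector product $a_i = \sum_j \exp(-\|x_i - y_j\|^2)\, b_j$ in the dense (a), sparse (b) and symbolic (c) styles. The sparsification threshold `1e-3`, the helper names `gaussian_F` and `symbolic_matvec`, and the `block_size` are illustrative assumptions; the row-blocked loop is only a CPU caricature of the tiled CUDA scheme used by KeOps. The last lines also work out the memory cost at the sizes of the PyTorch example that follows.

```python
import numpy as np

def gaussian_F(x_i, y_j):
    """Formula F(x_i, y_j) = exp(-|x_i - y_j|^2), reduced over the last (feature) axis."""
    return np.exp(-((x_i - y_j) ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
M, N, D = 500, 800, 3
x = rng.standard_normal((M, D))
y = rng.standard_normal((N, D))
b = rng.standard_normal(N)

# (a) Dense: materialize all M*N coefficients M[i, j] in memory.
K_dense = gaussian_F(x[:, None, :], y[None, :, :])  # shape (M, N)
a_dense = K_dense @ b

# (b) Sparse (COO): index lists (i_n, j_n) and values M_n for "large" entries only.
#     (Built from the dense matrix here purely for illustration.)
i_n, j_n = np.nonzero(K_dense > 1e-3)               # illustrative threshold
M_n = K_dense[i_n, j_n]
a_sparse = np.zeros(M)
np.add.at(a_sparse, i_n, M_n * b[j_n])              # scattered accumulation at random indices

# (c) Symbolic: keep only the formula F and the data arrays x, y.
#     Coefficients are recomputed on the fly, one block of rows at a time.
def symbolic_matvec(F, x, y, b, block_size=128):
    out = np.empty(len(x))
    for start in range(0, len(x), block_size):
        x_blk = x[start:start + block_size, None, :]  # (block, 1, D)
        out[start:start + block_size] = F(x_blk, y[None, :, :]) @ b
    return out

a_symbolic = symbolic_matvec(gaussian_F, x, y, b)

# Memory at the scale of the PyTorch example (M = 1e6, N = 2e6, D = 3, float32):
BIG_M, BIG_N, BIG_D, BYTES = 10**6, 2 * 10**6, 3, 4
dense_bytes = BIG_M * BIG_N * BYTES           # full kernel matrix
data_bytes = (BIG_M + BIG_N) * BIG_D * BYTES  # x and y only
print(f"dense: {dense_bytes / 1e12:.0f} TB, symbolic data: {data_bytes / 1e6:.0f} MB")
```

The dense matrix would need 8 TB at that scale, while the symbolic representation only stores 36 MB of point coordinates; the sparse version trades exactness for memory, since every dropped entry is bounded by the threshold.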

In practice, KeOps symbolic tensors are both fast and memory-efficient. We take advantage of the structure of CUDA registers to bypass costly memory transfers between arithmetic and memory circuits. This allows us to provide a 10x-100x speed-up to PyTorch GPU programs in a wide range of settings.
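The reason a reduction over $j$ can run with a linear memory footprint is that it never needs a full row of coefficients: partial results can be accumulated tile by tile. The NumPy sketch below illustrates this principle on the CPU; it is not the KeOps CUDA kernel, and the function name `streaming_lse_argmin` and the `tile` size are hypothetical. It computes $\log \sum_j \exp(-\|x_i - y_j\|^2)$ with a numerically stable running maximum $m$ (updating $S \leftarrow S\, e^{m - m'} + \sum_{j \in \text{tile}} e^{s_{ij} - m'}$) together with a running ArgMin, so that only an (M, tile) block of coefficients exists at any time.

```python
import numpy as np

def streaming_lse_argmin(x, y, tile=256):
    """Row-wise LogSumExp_j(-|x_i - y_j|^2) and ArgMin_j |x_i - y_j|^2, visiting y tile by tile."""
    M = x.shape[0]
    run_max = np.full(M, -np.inf)          # running max of the exponents s_ij
    run_sum = np.zeros(M)                  # running sum of exp(s_ij - run_max)
    best_d = np.full(M, np.inf)            # running min of squared distances
    best_j = np.zeros(M, dtype=np.int64)   # index of that minimum
    for start in range(0, y.shape[0], tile):
        y_t = y[start:start + tile]
        d = ((x[:, None, :] - y_t[None, :, :]) ** 2).sum(-1)  # (M, tile) block only
        s = -d
        new_max = np.maximum(run_max, s.max(axis=1))
        # Rescale the previous partial sum to the new max, then add this tile.
        run_sum = run_sum * np.exp(run_max - new_max) + np.exp(s - new_max[:, None]).sum(axis=1)
        run_max = new_max
        # ArgMin update: adopt the tile's best candidate only if it strictly improves.
        j_loc = d.argmin(axis=1)
        d_loc = d[np.arange(M), j_loc]
        better = d_loc < best_d
        best_d = np.where(better, d_loc, best_d)
        best_j = np.where(better, start + j_loc, best_j)
    return run_max + np.log(run_sum), best_j

rng = np.random.default_rng(1)
x = rng.standard_normal((300, 3))
y = rng.standard_normal((1000, 3))
lse, jmin = streaming_lse_argmin(x, y, tile=128)
```

Keeping the running maximum avoids overflow in `exp`, and the strict `<` comparison reproduces the "first index wins" convention of a dense `argmin`.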

Using our Python interface, a typical sample of code looks like:

# Create two arrays with 3 columns and a (huge) number of rows, on the GPU
import torch  # NumPy, Matlab and R are also supported
M, N, D = 1000000, 2000000, 3
x = torch.randn(M, D, device="cuda", requires_grad=True)  # x.shape = (1e6, 3)
y = torch.randn(N, D, device="cuda")                      # y.shape = (2e6, 3)

# Turn our dense Tensors into KeOps symbolic variables with "virtual"
# dimensions at positions 0 and 1 (for "i" and "j" indices):
from pykeops.torch import LazyTensor
x_i = LazyTensor(x.view(M, 1, D))  # x_i.shape = (1e6, 1, 3)
y_j = LazyTensor(y.view(1, N, D))  # y_j.shape = (1, 2e6, 3)

# We can now perform large-scale computations, without memory overflows:
D_ij = ((x_i - y_j)**2).sum(dim=2)  # Symbolic (1e6,2e6,1) matrix of squared distances
K_ij = (- D_ij).exp()               # Symbolic (1e6,2e6,1) Gaussian kernel matrix

# We come back to vanilla PyTorch Tensors or NumPy arrays using
# reduction operations such as .sum(), .logsumexp() or .argmin()
# on one of the two "symbolic" dimensions 0 and 1.
# Here, the kernel density estimation   a_i = sum_j exp(-|x_i-y_j|^2)
# is computed using a CUDA scheme that has a linear memory footprint and
# outperforms standard PyTorch implementations by two orders of magnitude.
a_i = K_ij.sum(dim=1)  # Genuine torch.cuda.FloatTensor, a_i.shape = (1e6, 1)

# Crucially, KeOps fully supports automatic differentiation!
g_x = torch.autograd.grad((a_i ** 2).sum(), [x])  # tuple with one tensor, g_x[0].shape = (1e6, 3)
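On a problem small enough to materialize, the two quantities computed above can be written out densely. The NumPy sketch below is a reference written for this page (the helper `kde_and_grad` is hypothetical, not part of KeOps): it computes $a_i = \sum_j \exp(-\|x_i - y_j\|^2)$ and the gradient of $\sum_i a_i^2$ with respect to $x_i$, which by the chain rule is $-4\, a_i \sum_j \exp(-\|x_i - y_j\|^2)\,(x_i - y_j)$. Comparing it with `a_i` and `g_x[0]` from a KeOps run on matching small inputs is one way to sanity-check an installation.

```python
import numpy as np

def kde_and_grad(x, y):
    """Dense reference for a_i = sum_j exp(-|x_i - y_j|^2) and d/dx of sum_i a_i^2."""
    diff = x[:, None, :] - y[None, :, :]    # (M, N, D): x_i - y_j
    K = np.exp(-(diff ** 2).sum(axis=-1))   # (M, N) Gaussian kernel matrix
    a = K.sum(axis=1, keepdims=True)        # (M, 1), same layout as a_i above
    # d(a_i^2)/dx_i = 2 a_i * sum_j K_ij * (-2 (x_i - y_j))
    g = -4.0 * a * (K[:, :, None] * diff).sum(axis=1)  # (M, D)
    return a, g

rng = np.random.default_rng(2)
x = rng.standard_normal((50, 3))
y = rng.standard_normal((80, 3))
a, g = kde_and_grad(x, y)
```

This dense version needs O(M·N·D) memory, which is exactly what the symbolic KeOps formulation avoids at scale.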

KeOps allows you to get the most out of your hardware without compromising on usability. It provides:

  • Linear (instead of quadratic) memory footprint for numerous types of computations.
  • Support for a wide range of mathematical formulas that can be composed at will.
  • Seamless computation of derivatives and gradients, up to arbitrary orders.
  • Sum, LogSumExp, Min, Max but also ArgMin, ArgMax or K-min reductions.
  • A conjugate gradient solver for large-scale spline interpolation and Gaussian process regression.
  • Transparent integration with standard packages, such as the SciPy solvers for linear algebra.
  • An interface for block-sparse and coarse-to-fine strategies.
  • Support for multi-GPU configurations.
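Iterative solvers such as conjugate gradient only touch the matrix through matrix-vector products, which is why they pair naturally with symbolic kernel matrices. The sketch below solves $(K + \lambda I)\,a = b$ for a Gaussian kernel using only a `matvec` callback. It is a textbook CG loop, not the solver shipped with KeOps; the callback is a plain NumPy function standing in for a symbolic reduction, and the ridge term `lam`, the tolerance and the iteration cap are illustrative assumptions.

```python
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A a = b for symmetric positive definite A, given only the map v -> A v."""
    a = np.zeros_like(b)
    r = b - matvec(a)    # residual
    p = r.copy()         # search direction
    rr = r @ r
    if rr == 0:
        return a
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rr / (p @ Ap)
        a += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) < tol * np.linalg.norm(b):
            break
        p = r + (rr_new / rr) * p
        rr = rr_new
    return a

rng = np.random.default_rng(3)
x = rng.standard_normal((200, 2))
b = rng.standard_normal(200)
lam = 0.1  # ridge term: keeps K + lam * I well conditioned

def kernel_matvec(v):
    """(K + lam I) v with K_ij = exp(-|x_i - x_j|^2), evaluated one block of rows at a time."""
    out = np.empty_like(v)
    for s in range(0, len(x), 64):
        K_blk = np.exp(-((x[s:s + 64, None, :] - x[None, :, :]) ** 2).sum(-1))
        out[s:s + 64] = K_blk @ v
    return out + lam * v

a = conjugate_gradient(kernel_matvec, b)
```

Because the solver never sees `K` itself, swapping `kernel_matvec` for a routine that never stores the full matrix leaves the algorithm unchanged.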

More details are provided below:

Projects using KeOps

Symbolic matrices are to geometric learning what sparse matrices are to graph processing. KeOps can thus be used in a wide range of settings, from shape analysis (registration, geometric deep learning, optimal transport...) to machine learning (kernel methods, k-means, UMAP...), Gaussian processes, computational biology and physics.

KeOps provides core routines for the following projects and libraries:

  • GPyTorch (from Cornell, Columbia and the University of Pennsylvania) and Falkon (from the University of Genoa and the Sierra Inria team), two libraries for Gaussian process regression that now scale up to billion-scale datasets.

  • Deformetrica, a computational anatomy software from the Aramis Inria team.

  • The Gudhi library for topological data analysis and higher dimensional geometry understanding, from the DataShape Inria team.

  • GeomLoss, a PyTorch package for Chamfer (Hausdorff) distances, Kernel (Sobolev) divergences and Earth Mover's (Wasserstein) distances. It provides optimal transport solvers that scale up to millions of samples in seconds.

  • The deep graph matching consensus module, for learning and refining structural correspondences between graphs.

  • FshapesTk and the Shapes toolbox, two research-oriented LDDMM toolkits.

  • HyenaDNA, for the parallel computation of the Vandermonde matrix multiplication kernel and of the reductions used in the S4D kernel.

Licensing, citation, academic use

This library is licensed under the permissive MIT license, which is fully compatible with both academic and commercial applications.

If you use this code in a research paper, please cite our original publication:

Charlier, B., Feydy, J., Glaunès, J. A., Collin, F.-D. & Durif, G. Kernel Operations on the GPU, with Autodiff, without Memory Overflows. Journal of Machine Learning Research 22, 1–6 (2021).

@article{JMLR:v22:20-275,
  author  = {Benjamin Charlier and Jean Feydy and Joan Alexis Glaunès and François-David Collin and Ghislain Durif},
  title   = {Kernel Operations on the GPU, with Autodiff, without Memory Overflows},
  journal = {Journal of Machine Learning Research},
  year    = {2021},
  volume  = {22},
  number  = {74},
  pages   = {1-6},
  url     = {http://jmlr.org/papers/v22/20-275.html}
}

For applications to geometric (deep) learning, you may also consider our NeurIPS 2020 paper:

@article{feydy2020fast,
    title={Fast geometric learning with symbolic matrices},
    author={Feydy, Jean and Glaun{\`e}s, Joan and Charlier, Benjamin and Bronstein, Michael},
    journal={Advances in Neural Information Processing Systems},
    volume={33},
    year={2020}
}

Authors

Please contact us for any bug report, question or feature request by filing a report on our GitHub issue tracker!

Core library - KeOps, PyKeOps, KeOpsLab:

R bindings - RKeOps:

Contributors:

Beyond explicit code contributions, KeOps has grown out of numerous discussions with applied mathematicians and machine learning experts. We would especially like to thank Alain Trouvé, Stanley Durrleman, Gabriel Peyré and Michael Bronstein for their valuable suggestions and financial support.

KeOps was awarded an open science prize by the French Ministry of Higher Education and Research in 2023 ("Espoir - Documentation").

keops's People

Contributors

adam-coogan, amelievernay, bcharlier, chloesrcb, davidlapous, djsutherland, dogukantai, dvolgyes, fradav, fwilliams, gdurif, haguettaz, jeanfeydy, joanglaunes, keckj, kpoeppel, kshitij12345, louis-pujol, mdiazmel, mvinyard, rubenalv, tanglef, turakar


keops's Issues

Compile Issue - No module named 'libKeOpstorchd5f55273e3'

I'm trying to get KeOps working and I'm having compiler issues that appear to be related to compiling the PyTorch bindings.

I read through similar issues (https://github.com/getkeops/keops/issues/28, https://github.com/getkeops/keops/issues/49, https://github.com/getkeops/keops/issues/8), but these mention fixes in the v1.4 release. I'm running into the following issue using the v1.4 pip install.

Test script:

import pykeops
pykeops.verbose = True
pykeops.build_type = 'Debug'
pykeops.clean_pykeops()
pykeops.test_torch_bindings()
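The nvcc command lines in the terminal output below forward an `-Xcompiler` list that contains `-std=c++17` next to flags such as `-march=nocona`, `-mtune=haswell` and `-fno-plt`, which resemble a conda compiler environment's default `CXXFLAGS`, while the reported CUDA toolkit is 10.1. One plausible explanation, not confirmed in this report, is that nvcc releases older than CUDA 11.0 lack C++17 support and therefore reject libstdc++ headers parsed in C++17 mode. The hypothetical helper below checks a flags string for such a `-std=` option; the CUDA-version cutoff it encodes is an assumption to verify against NVIDIA's documentation for your toolkit.

```python
import os
import re
import shlex
from typing import List

# Assumption: C++17 support in nvcc arrived with CUDA 11.0.
CXX17_OR_LATER = {"17", "1z", "20", "2a", "23", "2b"}

def std_flags(flags: str) -> List[str]:
    """Return every -std=... option found in a compiler-flags string."""
    return [tok for tok in shlex.split(flags) if tok.startswith("-std=")]

def too_new_for_nvcc(std_flag: str, cuda_major: int) -> bool:
    """Heuristic: flag C++17-or-later standards when the CUDA major version is below 11."""
    m = re.fullmatch(r"-std=(?:c|gnu)\+\+(\w+)", std_flag)
    return bool(m) and m.group(1) in CXX17_OR_LATER and cuda_major < 11

if __name__ == "__main__":
    # CUDA 10.x is what the log reports; adjust cuda_major for your system.
    for flag in std_flags(os.environ.get("CXXFLAGS", "")):
        if too_new_for_nvcc(flag, cuda_major=10):
            print(f"CXXFLAGS contains {flag}, which CUDA 10.x nvcc may not accept")
```

If the check fires, rebuilding from a shell where `CXXFLAGS` does not request C++17 would be a cheap way to test the hypothesis.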

Terminal output:

Compiling libKeOpstorchd5f55273e3 in /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3:
       formula: Sum_Reduction(SqNorm2(x - y),1)
       aliases: x = Vi(0,3); y = Vj(1,3);
       dtype  : float32
... -- The CXX compiler identification is GNU 7.3.0
-- Check for working CXX compiler: /home/ubuntu/anaconda3/envs/pytorch_p36/bin/x86_64-conda_cos6-linux-gnu-c++
-- Check for working CXX compiler: /home/ubuntu/anaconda3/envs/pytorch_p36/bin/x86_64-conda_cos6-linux-gnu-c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Compute properties automatically set to: -DMAXIDGPU=0;-DMAXTHREADSPERBLOCK0=1024;-DSHAREDMEMPERBLOCK0=49152
-- The CUDA compiler identification is NVIDIA 10.1.243
-- Check for working CUDA compiler: /usr/local/cuda-10.1/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-10.1/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- The CUDA Host CXX Compiler: /home/ubuntu/anaconda3/envs/pytorch_p36/bin/x86_64-conda_cos6-linux-gnu-c++
-- Autodetected CUDA architecture(s):  3.7
-- Using shared_obj_name: libKeOpstorchd5f55273e3
-- First i variables detected is 0
-- First j variables detected is 1
-- Compiled formula is Sum_Reduction(SqNorm2(x - y),1); auto x = Vi(0,3); auto y = Vj(1,3); where the number of args is 2.
-- Found PythonInterp: /home/ubuntu/anaconda3/envs/pytorch_p36/bin/python3.6 (found suitable version "3.6.5", minimum required is "3.6")
-- Found PythonLibs: /home/ubuntu/anaconda3/envs/pytorch_p36/lib/libpython3.6m.so
-- pybind11 v2.3.dev1
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- LTO enabled
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(79): error: inline specifier allowed on function declarations only

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(437): error: argument list for class template "std::pair" is missing

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(437): error: expected a ")"

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(437): error: template parameter "_T1" may not be redeclared in this scope

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(437): error: expected a ";"

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/utility(366): error: inline specifier allowed on function declarations only

6 errors detected in the compilation of "/tmp/tmpxft_00002eb0_00000000-6_link_autodiff.cpp1.ii".
CMake Error at keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.Debug.cmake:279 (message):
  Error generating file
  /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o


make[3]: *** [CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o] Error 1
make[2]: *** [CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/all] Error 2
make[1]: *** [CMakeFiles/libKeOpstorchd5f55273e3.dir/rule] Error 2
make: *** [libKeOpstorchd5f55273e3] Error 2

--------------------- MAKE DEBUG -----------------
Command '['cmake', '--build', '.', '--target', 'libKeOpstorchd5f55273e3', '--', 'VERBOSE=1']' returned non-zero exit status 2.
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -S/home/ubuntu/.local/lib/python3.6/site-packages/pykeops -B/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/make -f CMakeFiles/Makefile2 libKeOpstorchd5f55273e3
make[1]: Entering directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -S/home/ubuntu/.local/lib/python3.6/site-packages/pykeops -B/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E cmake_progress_start /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles 5
/usr/bin/make -f CMakeFiles/Makefile2 CMakeFiles/libKeOpstorchd5f55273e3.dir/all
make[2]: Entering directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
/usr/bin/make -f CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/build.make CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/depend
make[3]: Entering directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
[ 20%] Building NVCC (Device) object CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
cd /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core && /usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E make_directory /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/.
cd /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core && /usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -D verbose:BOOL=1 -D build_configuration:STRING=Debug -D generated_file:STRING=/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o -D generated_cubin_file:STRING=/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.cubin.txt -P /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.Debug.cmake
-- Removing /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E remove /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
-- Generating dependency file: /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.NVCC-depend
/usr/local/cuda-10.1/bin/nvcc -M -D__CUDACC__ /home/ubuntu/.local/lib/python3.6/site-packages/pykeops/keops/core/link_autodiff.cu -o /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.NVCC-depend -m64 -DkeopslibKeOpstorchd5f55273e3_EXPORTS -DMAXIDGPU=0 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=float -DC_CONTIGUOUS=1 -D__TYPEACC__=float -DSUM_SCHEME=1 -DMODULE_NAME=libKeOpstorchd5f55273e3 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=0 -DUSE_HALF=0 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_float -Xcompiler ,\"-fvisibility-inlines-hidden\",\"-std=c++17\",\"-fmessage-length=0\",\"-march=nocona\",\"-mtune=haswell\",\"-ftree-vectorize\",\"-fPIC\",\"-fstack-protector-strong\",\"-fno-plt\",\"-O2\",\"-ffunction-sections\",\"-pipe\",\"-isystem\",\"/home/ubuntu/anaconda3/envs/pytorch_p36/include\",\"-DUSE_OPENMP\",\"-fopenmp\",\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-g\",\"-O0\",\"-g\" -gencode arch=compute_37,code=sm_37 --use_fast_math --compiler-options=-fPIC -ccbin /home/ubuntu/anaconda3/envs/pytorch_p36/bin/x86_64-conda_cos6-linux-gnu-c++ --pre-include=libKeOpstorchd5f55273e3.h -DNVCC -I/usr/local/cuda-10.1/include -I/home/ubuntu/.local/lib/python3.6/site-packages/pykeops -I/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/keops -I/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3 -I/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/include -I/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/include/torch/csrc/api/include
-- Generating temporary cmake readable file: /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -D input_file:FILEPATH=/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.NVCC-depend -D output_file:FILEPATH=/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp -D verbose=1 -P /usr/local/lib/python3.5/dist-packages/cmake/data/share/cmake-3.13/Modules/FindCUDA/make2cmake.cmake
-- Copy if different /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp to /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E copy_if_different /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend
-- Removing /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp and /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.NVCC-depend
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E remove /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.NVCC-depend
-- Generating /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
/usr/local/cuda-10.1/bin/nvcc /home/ubuntu/.local/lib/python3.6/site-packages/pykeops/keops/core/link_autodiff.cu -c -o /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o -m64 -DkeopslibKeOpstorchd5f55273e3_EXPORTS -DMAXIDGPU=0 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=float -DC_CONTIGUOUS=1 -D__TYPEACC__=float -DSUM_SCHEME=1 -DMODULE_NAME=libKeOpstorchd5f55273e3 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=0 -DUSE_HALF=0 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_float -Xcompiler ,\"-fvisibility-inlines-hidden\",\"-std=c++17\",\"-fmessage-length=0\",\"-march=nocona\",\"-mtune=haswell\",\"-ftree-vectorize\",\"-fPIC\",\"-fstack-protector-strong\",\"-fno-plt\",\"-O2\",\"-ffunction-sections\",\"-pipe\",\"-isystem\",\"/home/ubuntu/anaconda3/envs/pytorch_p36/include\",\"-DUSE_OPENMP\",\"-fopenmp\",\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-g\",\"-O0\",\"-g\" -gencode arch=compute_37,code=sm_37 --use_fast_math --compiler-options=-fPIC -ccbin /home/ubuntu/anaconda3/envs/pytorch_p36/bin/x86_64-conda_cos6-linux-gnu-c++ --pre-include=libKeOpstorchd5f55273e3.h -DNVCC -I/usr/local/cuda-10.1/include -I/home/ubuntu/.local/lib/python3.6/site-packages/pykeops -I/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/keops -I/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3 -I/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/include -I/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/include/torch/csrc/api/include
-- Removing /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E remove /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/build.make:63: recipe for target 'CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o' failed
make[3]: Leaving directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
CMakeFiles/Makefile2:331: recipe for target 'CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/all' failed
make[2]: Leaving directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
CMakeFiles/Makefile2:306: recipe for target 'CMakeFiles/libKeOpstorchd5f55273e3.dir/rule' failed
make[1]: Leaving directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
Makefile:196: recipe for target 'libKeOpstorchd5f55273e3' failed

--------------------- ----------- -----------------
Done.
Compiling libKeOpstorchd5f55273e3 in /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3:
       formula: Sum_Reduction(SqNorm2(x - y),1)
       aliases: x = Vi(0,3); y = Vj(1,3);
       dtype  : float32
... -- The CXX compiler identification is GNU 7.3.0
-- Check for working CXX compiler: /home/ubuntu/anaconda3/envs/pytorch_p36/bin/x86_64-conda_cos6-linux-gnu-c++
-- Check for working CXX compiler: /home/ubuntu/anaconda3/envs/pytorch_p36/bin/x86_64-conda_cos6-linux-gnu-c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Compute properties automatically set to: -DMAXIDGPU=0;-DMAXTHREADSPERBLOCK0=1024;-DSHAREDMEMPERBLOCK0=49152
-- The CUDA compiler identification is NVIDIA 10.1.243
-- Check for working CUDA compiler: /usr/local/cuda-10.1/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-10.1/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- The CUDA Host CXX Compiler: /home/ubuntu/anaconda3/envs/pytorch_p36/bin/x86_64-conda_cos6-linux-gnu-c++
-- Autodetected CUDA architecture(s):  3.7
-- Using shared_obj_name: libKeOpstorchd5f55273e3
-- First i variables detected is 0
-- First j variables detected is 1
-- Compiled formula is Sum_Reduction(SqNorm2(x - y),1); auto x = Vi(0,3); auto y = Vj(1,3); where the number of args is 2.
-- Found PythonInterp: /home/ubuntu/anaconda3/envs/pytorch_p36/bin/python3.6 (found suitable version "3.6.5", minimum required is "3.6")
-- Found PythonLibs: /home/ubuntu/anaconda3/envs/pytorch_p36/lib/libpython3.6m.so
-- pybind11 v2.3.dev1
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- LTO enabled
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(79): error: inline specifier allowed on function declarations only

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(437): error: argument list for class template "std::pair" is missing

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(437): error: expected a ")"

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(437): error: template parameter "_T1" may not be redeclared in this scope

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(437): error: expected a ";"

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/utility(366): error: inline specifier allowed on function declarations only

6 errors detected in the compilation of "/tmp/tmpxft_0000308a_00000000-6_link_autodiff.cpp1.ii".
CMake Error at keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.Debug.cmake:279 (message):
  Error generating file
  /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o


make[3]: *** [CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o] Error 1
make[2]: *** [CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/all] Error 2
make[1]: *** [CMakeFiles/libKeOpstorchd5f55273e3.dir/rule] Error 2
make: *** [libKeOpstorchd5f55273e3] Error 2

--------------------- MAKE DEBUG -----------------
Command '['cmake', '--build', '.', '--target', 'libKeOpstorchd5f55273e3', '--', 'VERBOSE=1']' returned non-zero exit status 2.
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -S/home/ubuntu/.local/lib/python3.6/site-packages/pykeops -B/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/make -f CMakeFiles/Makefile2 libKeOpstorchd5f55273e3
make[1]: Entering directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -S/home/ubuntu/.local/lib/python3.6/site-packages/pykeops -B/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E cmake_progress_start /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles 5
/usr/bin/make -f CMakeFiles/Makefile2 CMakeFiles/libKeOpstorchd5f55273e3.dir/all
make[2]: Entering directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
/usr/bin/make -f CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/build.make CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/depend
make[3]: Entering directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
[ 20%] Building NVCC (Device) object CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
cd /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core && /usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E make_directory /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/.
cd /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core && /usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -D verbose:BOOL=1 -D build_configuration:STRING=Debug -D generated_file:STRING=/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o -D generated_cubin_file:STRING=/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.cubin.txt -P /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.Debug.cmake
-- Removing /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E remove /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
-- Generating dependency file: /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.NVCC-depend
/usr/local/cuda-10.1/bin/nvcc -M -D__CUDACC__ /home/ubuntu/.local/lib/python3.6/site-packages/pykeops/keops/core/link_autodiff.cu -o /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.NVCC-depend -m64 -DkeopslibKeOpstorchd5f55273e3_EXPORTS -DMAXIDGPU=0 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=float -DC_CONTIGUOUS=1 -D__TYPEACC__=float -DSUM_SCHEME=1 -DMODULE_NAME=libKeOpstorchd5f55273e3 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=0 -DUSE_HALF=0 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_float -Xcompiler ,\"-fvisibility-inlines-hidden\",\"-std=c++17\",\"-fmessage-length=0\",\"-march=nocona\",\"-mtune=haswell\",\"-ftree-vectorize\",\"-fPIC\",\"-fstack-protector-strong\",\"-fno-plt\",\"-O2\",\"-ffunction-sections\",\"-pipe\",\"-isystem\",\"/home/ubuntu/anaconda3/envs/pytorch_p36/include\",\"-DUSE_OPENMP\",\"-fopenmp\",\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-g\",\"-O0\",\"-g\" -gencode arch=compute_37,code=sm_37 --use_fast_math --compiler-options=-fPIC -ccbin /home/ubuntu/anaconda3/envs/pytorch_p36/bin/x86_64-conda_cos6-linux-gnu-c++ --pre-include=libKeOpstorchd5f55273e3.h -DNVCC -I/usr/local/cuda-10.1/include -I/home/ubuntu/.local/lib/python3.6/site-packages/pykeops -I/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/keops -I/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3 -I/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/include -I/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/include/torch/csrc/api/include
-- Generating temporary cmake readable file: /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -D input_file:FILEPATH=/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.NVCC-depend -D output_file:FILEPATH=/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp -D verbose=1 -P /usr/local/lib/python3.5/dist-packages/cmake/data/share/cmake-3.13/Modules/FindCUDA/make2cmake.cmake
-- Copy if different /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp to /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E copy_if_different /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend
-- Removing /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp and /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.NVCC-depend
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E remove /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.NVCC-depend
-- Generating /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
/usr/local/cuda-10.1/bin/nvcc /home/ubuntu/.local/lib/python3.6/site-packages/pykeops/keops/core/link_autodiff.cu -c -o /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o -m64 -DkeopslibKeOpstorchd5f55273e3_EXPORTS -DMAXIDGPU=0 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=float -DC_CONTIGUOUS=1 -D__TYPEACC__=float -DSUM_SCHEME=1 -DMODULE_NAME=libKeOpstorchd5f55273e3 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=0 -DUSE_HALF=0 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_float -Xcompiler ,\"-fvisibility-inlines-hidden\",\"-std=c++17\",\"-fmessage-length=0\",\"-march=nocona\",\"-mtune=haswell\",\"-ftree-vectorize\",\"-fPIC\",\"-fstack-protector-strong\",\"-fno-plt\",\"-O2\",\"-ffunction-sections\",\"-pipe\",\"-isystem\",\"/home/ubuntu/anaconda3/envs/pytorch_p36/include\",\"-DUSE_OPENMP\",\"-fopenmp\",\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-g\",\"-O0\",\"-g\" -gencode arch=compute_37,code=sm_37 --use_fast_math --compiler-options=-fPIC -ccbin /home/ubuntu/anaconda3/envs/pytorch_p36/bin/x86_64-conda_cos6-linux-gnu-c++ --pre-include=libKeOpstorchd5f55273e3.h -DNVCC -I/usr/local/cuda-10.1/include -I/home/ubuntu/.local/lib/python3.6/site-packages/pykeops -I/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/keops -I/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3 -I/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/include -I/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/include/torch/csrc/api/include
-- Removing /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E remove /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/build.make:63: recipe for target 'CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o' failed
make[3]: Leaving directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
CMakeFiles/Makefile2:331: recipe for target 'CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/all' failed
make[2]: Leaving directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
CMakeFiles/Makefile2:306: recipe for target 'CMakeFiles/libKeOpstorchd5f55273e3.dir/rule' failed
make[1]: Leaving directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
Makefile:196: recipe for target 'libKeOpstorchd5f55273e3' failed

--------------------- ----------- -----------------
Done.
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/test/install.py", line 55, in test_torch_bindings
    if torch.allclose(my_conv(x, y).view(-1), torch.tensor(expected_res).type(torch.float32)):
  File "/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py", line 396, in __call__
    device_id, ranges, self.accuracy_flags, *args)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py", line 22, in forward
    myconv = LoadKeOps(formula, aliases, dtype, 'torch', optional_flags).import_module()
  File "/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/common/keops_io.py", line 52, in import_module
    return importlib.import_module(self.dll_name)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'libKeOpstorchd5f55273e3'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Optimal Transport/compile_test.py", line 5, in <module>
    pykeops.test_torch_bindings()    # perform the compilation
  File "/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/test/install.py", line 66, in test_torch_bindings
    print(my_conv(x, y))
  File "/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py", line 396, in __call__
    device_id, ranges, self.accuracy_flags, *args)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py", line 22, in forward
    myconv = LoadKeOps(formula, aliases, dtype, 'torch', optional_flags).import_module()
  File "/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/common/keops_io.py", line 52, in import_module
    return importlib.import_module(self.dll_name)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'libKeOpstorchd5f55273e3'

Compiler settings:

gcc -v

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.4.0-6ubuntu1~16.04.12' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)
g++ -v

Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.4.0-6ubuntu1~16.04.12' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)

Compilers are installed locally using Anaconda.

cmake version 3.13.3

nvcc version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

Any assistance with this would be helpful.
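One detail that may be worth ruling out: the nvcc banner above reports release 10.0, while the failing build invokes /usr/local/cuda-10.1/bin/nvcc. A minimal sketch for comparing the two, using hypothetical helpers (nvcc_release, toolkit_release_from_path) that just pattern-match the version text and the toolkit path; this is a diagnostic aid, not part of PyKeOps:

```python
import re
from typing import Optional

# Hypothetical helpers (not part of PyKeOps) for comparing CUDA toolkit versions.

def nvcc_release(version_text: str) -> Optional[str]:
    """Return the 'release X.Y' number from `nvcc --version` output, if present."""
    match = re.search(r"release (\d+\.\d+)", version_text)
    return match.group(1) if match else None


def toolkit_release_from_path(nvcc_path: str) -> Optional[str]:
    """Return X.Y from a toolkit path such as /usr/local/cuda-X.Y/bin/nvcc, if present."""
    match = re.search(r"cuda-(\d+\.\d+)", nvcc_path)
    return match.group(1) if match else None


banner = "Cuda compilation tools, release 10.0, V10.0.130"  # from the banner above
build_nvcc = "/usr/local/cuda-10.1/bin/nvcc"                  # from the build log above

reported = nvcc_release(banner)
used_by_build = toolkit_release_from_path(build_nvcc)
if reported != used_by_build:
    # prints: reported nvcc is 10.0, but the build used 10.1
    print(f"reported nvcc is {reported}, but the build used {used_by_build}")
```

On a live system the banner text could come from subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout instead of a hard-coded string.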

Double broadcasting strange behaviour

Hello. I am trying to do "double broadcasting", and I am getting some strange behavior.
Consider 4 point clouds of 64 points each in 3D space.

import torch 
from pykeops.torch import LazyTensor
l = torch.randn(4,64,3) #4 point-clouds

# Sum of squared distances between every pair of point clouds:
l0= l[:,None,:,None,:]
l1= l[None,:,None,:,:]
print(((l0-l1)**2).sum(4).sum(3).sum(2))

#Now try with pykeops
l0=LazyTensor(l0)
l1=LazyTensor(l1)
print(((l0-l1)**2).sum(4).shape) # The reported shape is wrong (!): dense broadcasting gives (4, 4, 64, 64), but ...
print(((l0-l1)**2).sum(4).sum(3).sum(2)[:,:,0]) # ... the output is right, apart from an extra singleton dimension.

Output

tensor([[21983.7969, 23900.2793, 23794.3164, 22217.5625],
        [23900.2773, 24554.8984, 24448.7891, 23295.2148],
        [23794.3164, 24448.7871, 23484.6074, 22879.8828],
        [22217.5625, 23295.2129, 22879.8828, 21782.1055]])
(4, 1, 64, 64)
tensor([[21983.7949, 23900.2793, 23794.3164, 22217.5625],
        [23900.2773, 24554.8984, 24448.7891, 23295.2129],
        [23794.3164, 24448.7871, 23484.6055, 22879.8828],
        [22217.5625, 23295.2129, 22879.8848, 21782.1055]])
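For reference, the expected shapes follow ordinary broadcasting: l0 has shape (4, 1, 64, 1, 3) and l1 has shape (1, 4, 1, 64, 3), so the difference broadcasts to (4, 4, 64, 64, 3) and .sum(4) should have shape (4, 4, 64, 64) rather than the (4, 1, 64, 64) reported above. The sketch below uses NumPy (whose broadcasting rules match PyTorch's) to check those shapes, and cross-checks the dense result against the closed form sum over a,b of |x_a - y_b|^2 = n*sum|x_a|^2 + n*sum|y_b|^2 - 2<sum x_a, sum y_b> for clouds of n points each. It does not exercise KeOps itself.

```python
import numpy as np

rng = np.random.default_rng(0)
l = rng.standard_normal((4, 64, 3))  # 4 point clouds of 64 points in 3D

l0 = l[:, None, :, None, :]  # shape (4, 1, 64, 1, 3)
l1 = l[None, :, None, :, :]  # shape (1, 4, 1, 64, 3)

sq = ((l0 - l1) ** 2).sum(4)  # broadcasts to (4, 4, 64, 64)
print(sq.shape)               # (4, 4, 64, 64)
dense = sq.sum(3).sum(2)      # (4, 4): total squared distance for each pair of clouds

# Closed form for clouds p and q with n points each:
#   sum_{a,b} |x_a - y_b|^2 = n*sum_a |x_a|^2 + n*sum_b |y_b|^2 - 2 <sum_a x_a, sum_b y_b>
n = l.shape[1]
norms = (l ** 2).sum(axis=(1, 2))  # (4,): sum of squared norms per cloud
sums = l.sum(axis=1)               # (4, 3): sum of the points of each cloud
closed = n * norms[:, None] + n * norms[None, :] - 2 * sums @ sums.T

print(np.allclose(dense, closed))  # True
```

The closed form also hints at why the reduction is cheap: it only needs per-cloud sums, not the full (4, 4, 64, 64) array.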

Shared object compiled without CUDA support

Hi team! I ran into an issue when running the test sample.

My build configuration is:

g++ Version: 7.3.0
gcc Version: 7.3.0
cmake Version: 3.14.0
nvcc Version: 10.0.130
Nvidia Driver Version: 440.64
CUDA Version: 10.1
PyTorch Version: 1.3.1

Many thanks!
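In verbose build logs like the ones on this page (enabled with pykeops.verbose = True), a GPU build passes -DUSE_CUDA=1 to nvcc, whereas a CPU-only build passes -DUSE_CUDA=0 to the C++ compiler. A minimal sketch of a hypothetical log scanner (cuda_flags_in_log and built_with_cuda are illustrative names, not PyKeOps functions) for spotting which case a build fell into:

```python
import re
from typing import List

# Hypothetical helpers (not part of PyKeOps) for scanning a verbose build log.

def cuda_flags_in_log(log_text: str) -> List[int]:
    """Return every value of the -DUSE_CUDA=<n> define found in a build log."""
    return [int(v) for v in re.findall(r"-DUSE_CUDA=(\d+)", log_text)]


def built_with_cuda(log_text: str) -> bool:
    """True only if the define appears and every occurrence is 1."""
    flags = cuda_flags_in_log(log_text)
    return bool(flags) and all(f == 1 for f in flags)


# Shortened compile lines modelled on the logs on this page.
cpu_line = "c++ -DC_CONTIGUOUS=1 -DUSE_CUDA=0 -DUSE_DOUBLE=0 -c link_autodiff.cpp"
gpu_line = "nvcc link_autodiff.cu -c -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=float"

print(built_with_cuda(cpu_line))  # False
print(built_with_cuda(gpu_line))  # True
```

Note that -DCUDA_BLOCK_SIZE=192 is not matched, because the pattern requires the exact USE_CUDA name.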

Support for multiprocessing within PyTorch data loaders

Hello. Can PyKeOps be used in subprocesses, e.g. inside PyTorch DataLoader workers?

import torch
import pykeops.torch

def nearest(a,b):
    print(a.shape,b.shape)
    a=a[:,None,:]
    b=b[None,:,:]
    a= pykeops.torch.LazyTensor(a)
    b= pykeops.torch.LazyTensor(b)
    return ((a-b)**2).sum(2).argmin(1).flatten()

class dataset(torch.utils.data.Dataset):
    def __len__(self):
        return 10
    def __getitem__(self,k):
        return nearest(torch.randn(10,3),torch.randn(10,3))

for x in torch.utils.data.DataLoader(dataset(), batch_size=None, num_workers=0):
    print(x)
#This works with num_workers==0
    
for x in torch.utils.data.DataLoader(dataset(), batch_size=None, num_workers=1):
    print(x)
#This fails for num_workers==1
# RuntimeError: DataLoader worker (pid(s) 6743) exited unexpectedly

pytorch_scatter improvement

Hello there,

I would like to ask some questions about KeOps.

I have been using PyTorch Geometric a lot for my work on graphs.
It uses pytorch_scatter (https://github.com/rusty1s/pytorch_scatter) at its core, together with the MessagePassing class (https://github.com/rusty1s/pytorch_geometric/blob/master/torch_geometric/nn/conv/message_passing.py), which uses torch.index_select to build the messages.

I also found this paper, which implements a smarter hierarchical scatter method.

I was wondering if KeOps could be used to implement a symbolic message function, and maybe also the HAG aggregation, within a new pytorch_scatter.

This would not only drastically reduce memory usage, but could also speed up training and inference.

What are your thoughts on that?

Best,
Thomas Chaton.
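For readers less familiar with scatter-based message passing, a minimal NumPy sketch of the gather/scatter-add pattern described above (this is plain NumPy, not pytorch_scatter or KeOps code; the toy graph is made up for illustration). It also shows that the same sum aggregation equals a product with a dense adjacency matrix, which is the memory cost a symbolic formulation would try to avoid:

```python
import numpy as np

# Toy graph: 4 nodes, 5 directed edges (source -> target), made up for illustration.
source = np.array([0, 1, 2, 3, 3])
target = np.array([1, 2, 0, 0, 1])
x = np.arange(8, dtype=float).reshape(4, 2)  # node features, shape (num_nodes, 2)

# 1. Gather: one message per edge, copied from the source node (like index_select).
messages = x[source]  # shape (num_edges, 2)

# 2. Scatter-add: sum the messages arriving at each target node.
out = np.zeros_like(x)
np.add.at(out, target, messages)  # unbuffered, so repeated targets accumulate
# out[0] = x[2] + x[3], out[1] = x[0] + x[3], out[2] = x[1], out[3] = 0

# Same aggregation as a dense (num_nodes x num_nodes) adjacency product.
adjacency = np.zeros((4, 4))
np.add.at(adjacency, (target, source), 1.0)  # adjacency[t, s] counts edges s -> t
dense_out = adjacency @ x
```

The scatter version touches only the edges, whereas the dense version stores a full num_nodes x num_nodes matrix.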

pybind11 does not find python interpreter if not present in system path

This issue is similar to issue #49, but this time the shipped CMakeLists.txt does not find the Python interpreter even when the version is supplied. At some point pybind11 invokes FindPythonInterp.cmake, which fails because PYTHON_EXECUTABLE has not been defined and no matching Python executable can be found in the system path.

As the path to the running Python interpreter can be obtained with sys.executable, a simple fix is to insert '-DPYTHON_EXECUTABLE=' + sys.executable into pykeops/common/compile_routines.py at line 54:

--- compile_routines.py	2020-05-05 13:47:43.688013050 +0000
+++ compile_routines.py	2020-05-05 13:48:17.340202073 +0000
@@ -51,6 +51,7 @@
                      '-Dshared_obj_name=' + dllname,
                      '-D__TYPE__=' + c_type[dtype],
                      '-DPYTHON_LANG=' + lang,
+                     '-DPYTHON_EXECUTABLE=' + sys.executable,
                      '-DPYBIND11_PYTHON_VERSION=' + str(sys.version_info.major) + '.' +str(sys.version_info.minor),
                      '-DC_CONTIGUOUS=1',
                     ] + optional_flags
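A self-contained sketch of the patched argument list, for trying the idea outside PyKeOps. The function cmake_args is a hypothetical stand-in: the real compile_routines.py builds this list inline, looks the type up via c_type[dtype], and may pass other arguments around it.

```python
import sys
from typing import List

def cmake_args(dllname: str, c_type: str, lang: str, optional_flags: List[str]) -> List[str]:
    """Hypothetical stand-in for the CMake argument list built in compile_routines.py."""
    return [
        "-Dshared_obj_name=" + dllname,
        "-D__TYPE__=" + c_type,
        "-DPYTHON_LANG=" + lang,
        # Point pybind11's FindPythonInterp at the interpreter running this code,
        # so it no longer has to search the system path.
        "-DPYTHON_EXECUTABLE=" + sys.executable,
        "-DPYBIND11_PYTHON_VERSION={}.{}".format(sys.version_info.major, sys.version_info.minor),
        "-DC_CONTIGUOUS=1",
    ] + optional_flags


args = cmake_args("libKeOpstorch_example", "float", "torch", [])
print(args[3])  # -DPYTHON_EXECUTABLE=<path of the current interpreter>
```

Passing the interpreter explicitly makes the build follow whichever environment (e.g. a conda env) actually imported pykeops.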

unary('Max')

I was using ((LazyTensor(XX[:,None,:])-LazyTensor(XX[None,:,:]))**2).sum(-1) for the squared L^2 norm and naively tried (LazyTensor(XX[:,None,:])-LazyTensor(XX[None,:,:])).abs().max(-1) for the sup norm, but it doesn't seem to be implemented. Adding a struct Max (similar to Sum) that derives from UnaryOp lets me call unary('Max',dimres=1), and this seems to work. The only tricky parts are avoiding std::max (or other host functions) and getting the initial value right. I kept 0, which was fine for a sup norm, but in general I guess it should be -infinity or lowest (from numeric_limits, depending on the type), unless we take advantage of F::DIM>0, initialize with outF[0], and start the iteration from k=1.
Is max (and min) missing because of a lack of time and demand, or is there some reason why it would be a bad idea?
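The initial-value point can be illustrated in plain Python (this is not KeOps code; max_reduce_init and max_reduce_first are hypothetical names). The sup norm is ||x - y||_inf = max_k |x_k - y_k|, so seeding with 0 is safe there because every term is nonnegative, but it breaks a general max over negative values; seeding with -infinity or with the first element is correct in both cases:

```python
import math
from typing import Sequence

def max_reduce_init(values: Sequence[float], init: float) -> float:
    """Running max starting from a fixed initial value."""
    acc = init
    for v in values:
        acc = v if v > acc else acc  # explicit comparison, analogous to not relying on std::max
    return acc


def max_reduce_first(values: Sequence[float]) -> float:
    """Running max seeded with the first element (requires at least one value)."""
    acc = values[0]
    for v in values[1:]:
        acc = v if v > acc else acc
    return acc


diffs = [-1.5, 0.25, -3.0]
sup_norm = max_reduce_init([abs(d) for d in diffs], 0.0)  # 3.0: 0 is safe since |d| >= 0
wrong = max_reduce_init([-2.0, -5.0], 0.0)                # 0.0: wrong for all-negative input
right = max_reduce_init([-2.0, -5.0], -math.inf)          # -2.0
seeded = max_reduce_first([-2.0, -5.0])                   # -2.0
```

Seeding with the first element avoids choosing a type-dependent sentinel, at the cost of requiring a nonempty input, which matches the F::DIM>0 condition mentioned above.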

This KeOps shared object has been compiled without cuda - Failed to build bindings

Hello friends,
Thanks a lot for KeOps: amazing library and great examples :)
Unfortunately, I couldn't build this example, nor run pykeops.test_torch_bindings().

Below are my machine specs and the error itself.
nvcc version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

gcc

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.5.0-3ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) 

g++

Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.5.0-3ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)

clang

clang version 8.0.0-3~ubuntu18.04.2 (tags/RELEASE_800/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

cmake

cmake version 3.10.2

CMake suite maintained and supported by Kitware (kitware.com/cmake).

PyTorch 1.4

(testenv2) name@station:~/repos/docBert$ python
Python 3.7.7 (default, May  6 2020, 10:21:04) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pykeops
>>> 
>>> pykeops.verbose = True
>>> pykeops.clean_pykeops()  
/home/name/.cache/pykeops-1.4-cpython-37/libKeOpstorchc33cb27a33.so has been removed.
/home/name/.cache/pykeops-1.4-cpython-37/libKeOpstorchc33cb27a33.cpython-37m-x86_64-linux-gnu.so has been removed.
>>> pykeops.test_torch_bindings() 
Compiling libKeOpstorch11f5758313 in /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313:
       formula: Sum_Reduction(SqNorm2(x - y),1)
       aliases: x = Vi(0,3); y = Vj(1,3); 
       dtype  : float32
... -- The CXX compiler identification is GNU 7.3.0
-- Check for working CXX compiler: /home/name/anaconda3/envs/testenv2/bin/x86_64-conda_cos6-linux-gnu-c++
-- Check for working CXX compiler: /home/name/anaconda3/envs/testenv2/bin/x86_64-conda_cos6-linux-gnu-c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Using shared_obj_name: libKeOpstorch11f5758313
-- First i variables detected is 0
-- First j variables detected is 1
-- Compiled formula is Sum_Reduction(SqNorm2(x - y),1); auto x = Vi(0,3); auto y = Vj(1,3); where the number of args is 2.
-- Found PythonInterp: /home/name/anaconda3/envs/testenv2/bin/python3.7 (found suitable version "3.7.7", minimum required is "3.7") 
-- Found PythonLibs: /home/name/anaconda3/envs/testenv2/lib/libpython3.7m.so
-- pybind11 v2.3.dev1
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- LTO enabled
-- Configuring done
-- Generating done
-- Build files have been written to: /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313

/usr/bin/cmake -H/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops -B/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/make -f CMakeFiles/Makefile2 libKeOpstorch11f5758313
make[1]: Entering directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'
/usr/bin/cmake -H/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops -B/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles 4
/usr/bin/make -f CMakeFiles/Makefile2 CMakeFiles/libKeOpstorch11f5758313.dir/all
make[2]: Entering directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'
/usr/bin/make -f CMakeFiles/keopslibKeOpstorch11f5758313.dir/build.make CMakeFiles/keopslibKeOpstorch11f5758313.dir/depend
make[3]: Entering directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'
cd /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313 && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops /home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313 /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313 /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles/keopslibKeOpstorch11f5758313.dir/DependInfo.cmake --color=
Dependee "/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles/keopslibKeOpstorch11f5758313.dir/DependInfo.cmake" is newer than depender "/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles/keopslibKeOpstorch11f5758313.dir/depend.internal".
Dependee "/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles/CMakeDirectoryInformation.cmake" is newer than depender "/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles/keopslibKeOpstorch11f5758313.dir/depend.internal".
Scanning dependencies of target keopslibKeOpstorch11f5758313
make[3]: Leaving directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'
/usr/bin/make -f CMakeFiles/keopslibKeOpstorch11f5758313.dir/build.make CMakeFiles/keopslibKeOpstorch11f5758313.dir/build
make[3]: Entering directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'
[ 25%] Building CXX object CMakeFiles/keopslibKeOpstorch11f5758313.dir/keops/core/link_autodiff.cpp.o
/home/name/anaconda3/envs/testenv2/bin/x86_64-conda_cos6-linux-gnu-c++  -DC_CONTIGUOUS=1 -DMODULE_NAME=libKeOpstorch11f5758313 -DSUM_SCHEME=1 -DUSE_CUDA=0 -DUSE_DOUBLE=0 -DUSE_HALF=0 -D_GLIBCXX_USE_CXX11_ABI=0 -D__TYPEACC__=float -D__TYPE__=float -DkeopslibKeOpstorch11f5758313_EXPORTS -I/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops -I/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops/keops -I/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313 -I/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/torch/include -I/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/torch/include/torch/csrc/api/include  -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/name/anaconda3/envs/testenv/include -DUSE_OPENMP -fopenmp -Wall -Wno-unknown-pragmas -fmax-errors=2 -O3 -DNDEBUG -O3 -fPIC   -include libKeOpstorch11f5758313.h -std=gnu++14 -o CMakeFiles/keopslibKeOpstorch11f5758313.dir/keops/core/link_autodiff.cpp.o -c /home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops/keops/core/link_autodiff.cpp
[ 50%] Linking CXX shared library libKeOpstorch11f5758313.so
/usr/bin/cmake -E cmake_link_script CMakeFiles/keopslibKeOpstorch11f5758313.dir/link.txt --verbose=1
/home/name/anaconda3/envs/testenv2/bin/x86_64-conda_cos6-linux-gnu-c++ -fPIC -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/name/anaconda3/envs/testenv/include -DUSE_OPENMP -fopenmp -Wall -Wno-unknown-pragmas -fmax-errors=2 -O3 -DNDEBUG -O3 -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/home/name/anaconda3/envs/testenv/lib -Wl,-rpath-link,/home/name/anaconda3/envs/testenv/lib -L/home/name/anaconda3/envs/testenv/lib -shared -Wl,-soname,libKeOpstorch11f5758313.so -o libKeOpstorch11f5758313.so CMakeFiles/keopslibKeOpstorch11f5758313.dir/keops/core/link_autodiff.cpp.o 
/usr/bin/cmake -E copy /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/libKeOpstorch11f5758313.so /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/../
make[3]: Leaving directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'
[ 50%] Built target keopslibKeOpstorch11f5758313
/usr/bin/make -f CMakeFiles/libKeOpstorch11f5758313.dir/build.make CMakeFiles/libKeOpstorch11f5758313.dir/depend
make[3]: Entering directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'
cd /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313 && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops /home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313 /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313 /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles/libKeOpstorch11f5758313.dir/DependInfo.cmake --color=
Dependee "/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles/libKeOpstorch11f5758313.dir/DependInfo.cmake" is newer than depender "/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles/libKeOpstorch11f5758313.dir/depend.internal".
Dependee "/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles/CMakeDirectoryInformation.cmake" is newer than depender "/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles/libKeOpstorch11f5758313.dir/depend.internal".
Scanning dependencies of target libKeOpstorch11f5758313
make[3]: Leaving directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'
/usr/bin/make -f CMakeFiles/libKeOpstorch11f5758313.dir/build.make CMakeFiles/libKeOpstorch11f5758313.dir/build
make[3]: Entering directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'
[ 75%] Building CXX object CMakeFiles/libKeOpstorch11f5758313.dir/torch/generic/generic_red.cpp.o
/home/name/anaconda3/envs/testenv2/bin/x86_64-conda_cos6-linux-gnu-c++  -DC_CONTIGUOUS=1 -DMODULE_NAME=libKeOpstorch11f5758313 -DSUM_SCHEME=1 -DUSE_CUDA=0 -DUSE_DOUBLE=0 -DUSE_HALF=0 -D_GLIBCXX_USE_CXX11_ABI=0 -D__TYPEACC__=float -D__TYPE__=float -DlibKeOpstorch11f5758313_EXPORTS -I/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops -I/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops/keops -I/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313 -I/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/torch/include -I/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops/pybind11/include -I/home/name/anaconda3/envs/testenv2/include/python3.7m  -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/name/anaconda3/envs/testenv/include -DUSE_OPENMP -fopenmp -Wall -Wno-unknown-pragmas -fmax-errors=2 -O3 -DNDEBUG -O3 -fPIC -fvisibility=hidden   -flto -fno-fat-lto-objects -include torch_headers.h -std=gnu++14 -o CMakeFiles/libKeOpstorch11f5758313.dir/torch/generic/generic_red.cpp.o -c /home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops/torch/generic/generic_red.cpp
[100%] Linking CXX shared module libKeOpstorch11f5758313.cpython-37m-x86_64-linux-gnu.so
/usr/bin/cmake -E cmake_link_script CMakeFiles/libKeOpstorch11f5758313.dir/link.txt --verbose=1
/home/name/anaconda3/envs/testenv2/bin/x86_64-conda_cos6-linux-gnu-c++ -fPIC -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/name/anaconda3/envs/testenv/include -DUSE_OPENMP -fopenmp -Wall -Wno-unknown-pragmas -fmax-errors=2 -O3 -DNDEBUG -O3 -Wl,-rpath,$ORIGIN -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/home/name/anaconda3/envs/testenv/lib -Wl,-rpath-link,/home/name/anaconda3/envs/testenv/lib -L/home/name/anaconda3/envs/testenv/lib -shared  -o libKeOpstorch11f5758313.cpython-37m-x86_64-linux-gnu.so CMakeFiles/libKeOpstorch11f5758313.dir/torch/generic/generic_red.cpp.o -Wl,-rpath,/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313 -flto libKeOpstorch11f5758313.so 
/usr/bin/strip /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/libKeOpstorch11f5758313.cpython-37m-x86_64-linux-gnu.so
/usr/bin/cmake -E copy /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/libKeOpstorch11f5758313.cpython-37m-x86_64-linux-gnu.so /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/../
make[3]: Leaving directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'
[100%] Built target libKeOpstorch11f5758313
make[2]: Leaving directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'
/usr/bin/cmake -E cmake_progress_start /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles 0
make[1]: Leaving directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'

Done.
Traceback (most recent call last):
  File "/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops/test/install.py", line 55, in test_torch_bindings
    if torch.allclose(my_conv(x, y).view(-1), torch.tensor(expected_res).type(torch.float32)):
  File "/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops/torch/generic/generic_red.py", line 396, in __call__
    device_id, ranges, self.accuracy_flags, *args)
  File "/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops/torch/generic/generic_red.py", line 44, in forward
    result = myconv.genred_pytorch(tagCPUGPU, tag1D2D, tagHostDevice, device_id, ranges, *args)
RuntimeError: [KeOps] This KeOps shared object has been compiled without cuda support: 
 1) to perform computations on CPU, simply set tagHostDevice to 0
 2) to perform computations on GPU, please recompile the formula with a working version of cuda.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops/test/install.py", line 66, in test_torch_bindings
    print(my_conv(x, y))
  File "/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops/torch/generic/generic_red.py", line 396, in __call__
    device_id, ranges, self.accuracy_flags, *args)
  File "/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops/torch/generic/generic_red.py", line 44, in forward
    result = myconv.genred_pytorch(tagCPUGPU, tag1D2D, tagHostDevice, device_id, ranges, *args)
RuntimeError: [KeOps] This KeOps shared object has been compiled without cuda support: 
 1) to perform computations on CPU, simply set tagHostDevice to 0
 2) to perform computations on GPU, please recompile the formula with a working version of cuda.

Eigenvalues and eigenvectors with KeOps

I would like to implement a formula that involves the eigenvalues and eigenvectors of the Gram matrix. Do you think it's possible to compute them with KeOps (e.g. using KernelSolve) or not?
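One possible route, sketched under assumptions: KernelSolve targets linear systems rather than eigenproblems, but iterative eigensolvers such as Lanczos (here SciPy's `eigsh`) only access the Gram matrix through products `K @ v`, which is the kind of matrix-free reduction KeOps is built for. The sketch below uses a Gaussian kernel and a blocked NumPy matrix-vector product as a stand-in for a KeOps reduction; `gaussian_matvec` and `top_eigenpairs` are hypothetical names, and the kernel, `sigma` and block size are illustrative choices, not anything taken from KeOps.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh


def gaussian_matvec(x, sigma, v, block=256):
    """Return K @ v, where K[i, j] = exp(-|x_i - x_j|^2 / (2 * sigma^2)).

    Rows of K are generated block by block and discarded after use, so
    the full N-by-N Gram matrix is never stored (a CPU stand-in for a
    matrix-free KeOps reduction).
    """
    v = np.ravel(v)  # eigsh may hand over shape (N,) or (N, 1)
    n = x.shape[0]
    out = np.empty(n, dtype=np.float64)
    for start in range(0, n, block):
        stop = min(start + block, n)
        # Squared distances between this block of points and all points.
        d2 = ((x[start:stop, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
        out[start:stop] = np.exp(-d2 / (2.0 * sigma**2)) @ v
    return out


def top_eigenpairs(x, sigma, k=3):
    """Largest k eigenpairs of the Gaussian Gram matrix of x, via Lanczos."""
    n = x.shape[0]
    op = LinearOperator(
        (n, n), matvec=lambda v: gaussian_matvec(x, sigma, v), dtype=np.float64
    )
    # K is symmetric positive semi-definite, so its largest-magnitude and
    # largest-algebraic eigenvalues coincide; "LA" asks for the latter.
    vals, vecs = eigsh(op, k=k, which="LA")
    return vals, vecs


rng = np.random.default_rng(0)
x = rng.standard_normal((500, 3))
vals, vecs = top_eigenpairs(x, sigma=1.0, k=3)
print(vals)
```

This only yields a few extreme eigenpairs; a full eigendecomposition would still need the dense matrix. Swapping `gaussian_matvec` for a KeOps kernel reduction is the presumed way to scale this to large N, but that substitution is an assumption and is not exercised here.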

Compilation error in the backward pass of sumsoftmaxweight

Hi,

Thanks for the great library; I can still hardly believe the amazing performance.

When executing this script, I get a compilation error in the backward pass through the sumsoftmaxweight reduction.

import torch
import pykeops
pykeops.verbose = True
from pykeops.torch import LazyTensor

N, D = 1000, 10
v = torch.randn((1, N, D), dtype=torch.float32, requires_grad=True).cuda()

v_i = LazyTensor(v[:, :, None])
v_j = LazyTensor(v[:, None, :])
D_ij = v_i - v_j

result = LazyTensor.sumsoftmaxweight(D_ij.sum(-1), D_ij, axis=1)

loss = (1. * result).sum()
print(f'loss: {loss}') # forward is successful
loss.backward()
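For checking results on a small problem, it can help to have a dense reference of what the reduction is meant to compute. The formula KeOps prints in the compilation log of this report, `Max_SumShiftExp_Reduction(F, 1, Concat(IntCst(1), G))`, suggests that sumsoftmaxweight accumulates a running maximum together with max-shifted exponential sums of `[1, G]` and divides the two at the end. The sketch below is our reading of that formula, not KeOps' implementation: `sumsoftmaxweight` and `sumsoftmaxweight_vjp_f` are hypothetical NumPy helpers that reduce over the second (j) index of plain arrays, without the batch dimension used in the script, and the gradient is derived by hand from the softmax.

```python
import numpy as np


def sumsoftmaxweight(f, g):
    """Dense reference for out[i] = sum_j softmax_j(f[i, :])[j] * g[i, j, :].

    f: (M, N) scores, g: (M, N, E) values. The maximum over j is
    subtracted before exponentiating (max-shift trick), so large scores
    do not overflow.
    """
    m = f.max(axis=1, keepdims=True)       # (M, 1) running maximum
    e = np.exp(f - m)                      # (M, N), all entries <= 1
    num = (e[:, :, None] * g).sum(axis=1)  # (M, E): shifted sum of exp * G
    den = e.sum(axis=1, keepdims=True)     # (M, 1): shifted sum of exp * 1
    return num / den


def sumsoftmaxweight_vjp_f(f, g, grad_out):
    """Gradient of sum(grad_out * out) with respect to the scores f.

    With w = softmax_j(f) and out_i = sum_j w_ij g_ij, one has
    d out_i / d f_ij = w_ij * (g_ij - out_i).
    """
    w = np.exp(f - f.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)      # (M, N) softmax weights
    out = (w[:, :, None] * g).sum(axis=1)  # (M, E)
    # Contract (g_ij - out_i) with the incoming gradient for each (i, j).
    proj = ((g - out[:, None, :]) * grad_out[:, None, :]).sum(axis=-1)
    return w * proj


rng = np.random.default_rng(0)
f = rng.standard_normal((4, 6))
g = rng.standard_normal((4, 6, 3))
print(sumsoftmaxweight(f, g).shape)  # (4, 3)
```

Comparing a small KeOps run (and its gradient, once the backward pass compiles) against these helpers is one way to tell a numerical issue apart from a formula-construction issue.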

This is the output:

loss: 6528.2978515625
Compiling libKeOpstorch509fe71999 in /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37//build-libKeOpstorch509fe71999:
       formula: Grad_WithSavedForward(Max_SumShiftExpWeight_Reduction(Sum((Var(0,10,0) - Var(1,10,1))),1,Concat(IntCst(1),(Var(0,10,0) - Var(1,10,1)))), Var(0,10,0), Var(2,12,1), Var(3,12,1))
       aliases: Var(0,10,0); Var(1,10,1); Var(2,12,1); Var(3,12,1); 
       dtype  : float32
... /home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Extract.h(23): error: static assertion failed with "Index out of bound in Extract"
          detected during:
            instantiation of class "keops::Extract<F, START, DIM_> [with F=keops::Extract<keops::Var<2, 12, 1>, 1, 11>, START=1, DIM_=11]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Subtract.h(134): here
            instantiation of class "keops::Subtract_Alias<FA, keops::Zero<DIM>> [with FA=keops::IdOrZero<keops::Var<0, 10, 0>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>, DIM=10]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Subtract.h(23): here
            instantiation of type "keops::Subtract<keops::IdOrZero<keops::Var<0, 10, 0>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>, keops::IdOrZero<keops::Var<1, 10, 1>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>>" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Subtract.h(48): here
            instantiation of type "keops::Subtract_Impl<FA, FB>::DiffT<keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>> [with FA=keops::Var<0, 10, 0>, FB=keops::Var<1, 10, 1>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Concat.h(37): here
            instantiation of type "keops::Concat_Impl<F, G>::DiffTG<keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>> [with F=keops::IntConstant<1>, G=keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Concat.h(40): here
            [ 2 instantiation contexts not shown ]
            instantiation of type "keops::Grad<keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>>" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/reductions/Sum_Reduction.h(72): here
            instantiation of type "keops::Sum_Reduction_Impl<F, tagI>::DiffT<keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>, void> [with F=keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, tagI=1]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/autodiff/Grad.h(16): here
            instantiation of type "keops::Grad<keops::Sum_Reduction<keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, 1>, keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>>" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/reductions/Max_SumShiftExp_Reduction.h(114): here
            instantiation of type "keops::Max_SumShiftExp_Reduction<F, tagI, G_>::DiffT<keops::Var<0, 10, 0>, keops::Var<2, 12, 1>, keops::Var<3, 12, 1>> [with F=keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, tagI=1, G_=keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/autodiff/Grad.h(20): here
            instantiation of type "keops::Grad_WithSavedForward<keops::Max_SumShiftExp_Reduction<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, 1, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, keops::Var<0, 10, 0>, keops::Var<2, 12, 1>, keops::Var<3, 12, 1>>" 
/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/libKeOpstorch509fe71999.h(21): here

/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Subtract.h(134): error: static assertion failed with "Dimensions must be the same for Subtract"
          detected during:
            instantiation of class "keops::Subtract_Alias<FA, keops::Zero<DIM>> [with FA=keops::IdOrZero<keops::Var<0, 10, 0>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>, DIM=10]" 
(23): here
            instantiation of type "keops::Subtract<keops::IdOrZero<keops::Var<0, 10, 0>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>, keops::IdOrZero<keops::Var<1, 10, 1>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>>" 
(48): here
            instantiation of type "keops::Subtract_Impl<FA, FB>::DiffT<keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>> [with FA=keops::Var<0, 10, 0>, FB=keops::Var<1, 10, 1>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Concat.h(37): here
            instantiation of type "keops::Concat_Impl<F, G>::DiffTG<keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>> [with F=keops::IntConstant<1>, G=keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Concat.h(40): here
            instantiation of type "keops::Concat_Impl<F, G>::DiffT<keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>> [with F=keops::IntConstant<1>, G=keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Scal.h(59): here
            instantiation of type "keops::Scal_Impl<FA, FB>::DiffT<keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>> [with FA=keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, FB=keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/autodiff/Grad.h(16): here
            instantiation of type "keops::Grad<keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>>" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/reductions/Sum_Reduction.h(72): here
            instantiation of type "keops::Sum_Reduction_Impl<F, tagI>::DiffT<keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>, void> [with F=keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, tagI=1]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/autodiff/Grad.h(16): here
            instantiation of type "keops::Grad<keops::Sum_Reduction<keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, 1>, keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>>" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/reductions/Max_SumShiftExp_Reduction.h(114): here
            instantiation of type "keops::Max_SumShiftExp_Reduction<F, tagI, G_>::DiffT<keops::Var<0, 10, 0>, keops::Var<2, 12, 1>, keops::Var<3, 12, 1>> [with F=keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, tagI=1, G_=keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/autodiff/Grad.h(20): here
            instantiation of type "keops::Grad_WithSavedForward<keops::Max_SumShiftExp_Reduction<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, 1, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, keops::Var<0, 10, 0>, keops::Var<2, 12, 1>, keops::Var<3, 12, 1>>" 
/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/libKeOpstorch509fe71999.h(21): here

/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Add.h(152): error: static assertion failed with "Dimensions must be the same for Add"
          detected during:
            instantiation of class "keops::Add_Alias<keops::Zero<DIM>, FB> [with FB=keops::Subtract<keops::IdOrZero<keops::Var<0, 10, 0>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>, keops::IdOrZero<keops::Var<1, 10, 1>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>>, DIM=10]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/norms/Scalprod.h(33): here
            instantiation of type "keops::Add<keops::Zero<10>, keops::Subtract<keops::IdOrZero<keops::Var<0, 10, 0>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>, keops::IdOrZero<keops::Var<1, 10, 1>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>>>" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Concat.h(40): here
            instantiation of type "keops::Concat_Impl<F, G>::DiffT<keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>> [with F=keops::IntConstant<1>, G=keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Scal.h(59): here
            instantiation of type "keops::Scal_Impl<FA, FB>::DiffT<keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>> [with FA=keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, FB=keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/autodiff/Grad.h(16): here
            instantiation of type "keops::Grad<keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>>" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/reductions/Sum_Reduction.h(72): here
            instantiation of type "keops::Sum_Reduction_Impl<F, tagI>::DiffT<keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>, void> [with F=keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, tagI=1]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/autodiff/Grad.h(16): here
            instantiation of type "keops::Grad<keops::Sum_Reduction<keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, 1>, keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>>" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/reductions/Max_SumShiftExp_Reduction.h(114): here
            instantiation of type "keops::Max_SumShiftExp_Reduction<F, tagI, G_>::DiffT<keops::Var<0, 10, 0>, keops::Var<2, 12, 1>, keops::Var<3, 12, 1>> [with F=keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, tagI=1, G_=keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/autodiff/Grad.h(20): here
            instantiation of type "keops::Grad_WithSavedForward<keops::Max_SumShiftExp_Reduction<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, 1, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, keops::Var<0, 10, 0>, keops::Var<2, 12, 1>, keops::Var<3, 12, 1>>" 
/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/libKeOpstorch509fe71999.h(21): here

/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Add.h(41): error: static assertion failed with "Dimensions must be the same for Add"
          detected during:
            instantiation of class "keops::Add_Impl<FA, FB> [with FA=keops::Subtract<keops::IdOrZero<keops::Var<0, 10, 0>, keops::Var<0, 10, 0>, keops::SumT<keops::Mult<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Scalprod<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>>, 10>>, keops::IdOrZero<keops::Var<1, 10, 1>, keops::Var<0, 10, 0>, keops::SumT<keops::Mult<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Scalprod<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>>, 10>>>, FB=keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Add<keops::Zero<10>, keops::Subtract<keops::IdOrZero<keops::Var<0, 10, 0>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>, keops::IdOrZero<keops::Var<1, 10, 1>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>>>>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/reductions/Reduction.h(26): here
            instantiation of class "keops::Reduction<F_, tagI_> [with F_=keops::Grad<keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>>, tagI_=0]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/reductions/Sum_Reduction.h(22): here
            instantiation of class "keops::Sum_Reduction_Impl<F, tagI> [with F=keops::Grad<keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>>, tagI=0]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/pre_headers.h(40): here
            instantiation of class "keops::KeopsNS<F> [with F=keops::Grad_WithSavedForward<keops::Max_SumShiftExp_Reduction<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, 1, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, keops::Var<0, 10, 0>, keops::Var<2, 12, 1>, keops::Var<3, 12, 1>>]" 
/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/libKeOpstorch509fe71999.h(21): here

4 errors detected in the compilation of "/tmp/tmpxft_00003e29_00000000-6_link_autodiff.cpp1.ii".
CMake Error at keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.Release.cmake:279 (message):
  Error generating file
  /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/./keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o


make[3]: *** [CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o] Error 1
make[2]: *** [CMakeFiles/keopslibKeOpstorch509fe71999.dir/all] Error 2
make[1]: *** [CMakeFiles/libKeOpstorch509fe71999.dir/rule] Error 2
make: *** [libKeOpstorch509fe71999] Error 2
-- The CXX compiler identification is GNU 7.4.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Compute properties automatically set to: -DMAXIDGPU=6;-DMAXTHREADSPERBLOCK0=1024;-DSHAREDMEMPERBLOCK0=49152;-DMAXTHREADSPERBLOCK1=1024;-DSHAREDMEMPERBLOCK1=49152;-DMAXTHREADSPERBLOCK2=1024;-DSHAREDMEMPERBLOCK2=49152;-DMAXTHREADSPERBLOCK3=1024;-DSHAREDMEMPERBLOCK3=49152;-DMAXTHREADSPERBLOCK4=1024;-DSHAREDMEMPERBLOCK4=49152;-DMAXTHREADSPERBLOCK5=1024;-DSHAREDMEMPERBLOCK5=49152;-DMAXTHREADSPERBLOCK6=1024;-DSHAREDMEMPERBLOCK6=49152
-- The CUDA compiler identification is NVIDIA 10.0.130
-- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- The CUDA Host CXX Compiler: /usr/bin/c++
-- Autodetected CUDA architecture(s): 6.1 6.1 6.1 6.1 6.1 6.1 6.1 
-- Using shared_obj_name: libKeOpstorch509fe71999
-- Found PythonInterp: /home_sdc/rremme_tmp/anaconda3/envs/main/bin/python3.7 (found version "3.7.4") 
-- Found PythonLibs: /home_sdc/rremme_tmp/anaconda3/envs/main/lib/libpython3.7m.so
-- pybind11 v2.3.dev1
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- LTO enabled
-- Configuring done
-- Generating done
-- Build files have been written to: /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999


--------------------- MAKE DEBUG -----------------
Command '['cmake', '--build', '.', '--target', 'libKeOpstorch509fe71999', '--', 'VERBOSE=1']' returned non-zero exit status 2.
/usr/bin/cmake -H/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops -B/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/make -f CMakeFiles/Makefile2 libKeOpstorch509fe71999
make[1]: Entering directory '/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999'
/usr/bin/cmake -H/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops -B/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles 5
/usr/bin/make -f CMakeFiles/Makefile2 CMakeFiles/libKeOpstorch509fe71999.dir/all
make[2]: Entering directory '/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999'
/usr/bin/make -f CMakeFiles/keopslibKeOpstorch509fe71999.dir/build.make CMakeFiles/keopslibKeOpstorch509fe71999.dir/depend
make[3]: Entering directory '/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999'
[ 20%] Building NVCC (Device) object CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o
cd /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core && /usr/bin/cmake -E make_directory /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/.
cd /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core && /usr/bin/cmake -D verbose:BOOL=1 -D build_configuration:STRING=Release -D generated_file:STRING=/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/./keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o -D generated_cubin_file:STRING=/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/./keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.cubin.txt -P /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.Release.cmake
-- Removing /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/./keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o
/usr/bin/cmake -E remove /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/./keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o
-- Generating dependency file: /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.NVCC-depend
/usr/local/cuda-10.0/bin/nvcc -M -D__CUDACC__ /home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/link_autodiff.cu -o /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.NVCC-depend -m64 -DkeopslibKeOpstorch509fe71999_EXPORTS -DMAXIDGPU=6 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -DMAXTHREADSPERBLOCK1=1024 -DSHAREDMEMPERBLOCK1=49152 -DMAXTHREADSPERBLOCK2=1024 -DSHAREDMEMPERBLOCK2=49152 -DMAXTHREADSPERBLOCK3=1024 -DSHAREDMEMPERBLOCK3=49152 -DMAXTHREADSPERBLOCK4=1024 -DSHAREDMEMPERBLOCK4=49152 -DMAXTHREADSPERBLOCK5=1024 -DSHAREDMEMPERBLOCK5=49152 -DMAXTHREADSPERBLOCK6=1024 -DSHAREDMEMPERBLOCK6=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=float -DC_CONTIGUOUS=1 -DMODULE_NAME=libKeOpstorch509fe71999 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=0 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_float -Xcompiler ,\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-O3\",\"-DNDEBUG\",\"-O3\" -gencode arch=compute_61,code=sm_61 --use_fast_math --compiler-options=-fPIC -ccbin /usr/bin/c++ --pre-include=libKeOpstorch509fe71999.h -DNVCC -I/usr/local/cuda-10.0/include -I/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops -I/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops -I/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999 -I/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/torch/include -I/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/torch/include/torch/csrc/api/include
-- Generating temporary cmake readable file: /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.depend.tmp
/usr/bin/cmake -D input_file:FILEPATH=/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.NVCC-depend -D output_file:FILEPATH=/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.depend.tmp -D verbose=1 -P /usr/share/cmake-3.10/Modules/FindCUDA/make2cmake.cmake
-- Copy if different /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.depend.tmp to /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.depend
/usr/bin/cmake -E copy_if_different /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.depend.tmp /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.depend
-- Removing /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.depend.tmp and /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.NVCC-depend
/usr/bin/cmake -E remove /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.depend.tmp /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.NVCC-depend
-- Generating /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/./keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o
/usr/local/cuda-10.0/bin/nvcc /home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/link_autodiff.cu -c -o /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/./keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o -m64 -DkeopslibKeOpstorch509fe71999_EXPORTS -DMAXIDGPU=6 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -DMAXTHREADSPERBLOCK1=1024 -DSHAREDMEMPERBLOCK1=49152 -DMAXTHREADSPERBLOCK2=1024 -DSHAREDMEMPERBLOCK2=49152 -DMAXTHREADSPERBLOCK3=1024 -DSHAREDMEMPERBLOCK3=49152 -DMAXTHREADSPERBLOCK4=1024 -DSHAREDMEMPERBLOCK4=49152 -DMAXTHREADSPERBLOCK5=1024 -DSHAREDMEMPERBLOCK5=49152 -DMAXTHREADSPERBLOCK6=1024 -DSHAREDMEMPERBLOCK6=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=float -DC_CONTIGUOUS=1 -DMODULE_NAME=libKeOpstorch509fe71999 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=0 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_float -Xcompiler ,\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-O3\",\"-DNDEBUG\",\"-O3\" -gencode arch=compute_61,code=sm_61 --use_fast_math --compiler-options=-fPIC -ccbin /usr/bin/c++ --pre-include=libKeOpstorch509fe71999.h -DNVCC -I/usr/local/cuda-10.0/include -I/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops -I/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops -I/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999 -I/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/torch/include -I/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/torch/include/torch/csrc/api/include
-- Removing /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/./keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o
/usr/bin/cmake -E remove /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/./keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o
CMakeFiles/keopslibKeOpstorch509fe71999.dir/build.make:63: recipe for target 'CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o' failed
make[3]: Leaving directory '/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999'
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/keopslibKeOpstorch509fe71999.dir/all' failed
make[2]: Leaving directory '/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999'
CMakeFiles/Makefile2:79: recipe for target 'CMakeFiles/libKeOpstorch509fe71999.dir/rule' failed
make[1]: Leaving directory '/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999'
Makefile:118: recipe for target 'libKeOpstorch509fe71999' failed

--------------------- ----------- -----------------
Done.
Traceback (most recent call last):
  File "sumsoftmaxweight_bug.py", line 17, in <module>
    loss.backward()
  File "/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/torch/tensor.py", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/torch/autograd/function.py", line 77, in apply
    return self._forward_cls.backward(self, *args)
  File "/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/torch/generic/generic_red.py", line 123, in backward
    grad = genconv(formula_g, aliases_g, backend, dtype, device_id, ranges, *args_g)
  File "/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/torch/generic/generic_red.py", line 21, in forward
    ['-DPYTORCH_INCLUDE_DIR=' + ';'.join(include_dirs)]).import_module()
  File "/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/common/keops_io.py", line 52, in import_module
    return importlib.import_module(self.dll_name)
  File "/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'libKeOpstorch509fe71999'

I was able to run all the PyTorch examples without any problems.

Efficiency of covariance calculation with high dimensionality

Hi,
First of all, thank you for this amazing library.

I encountered some behaviour I did not expect and was wondering whether there is anything I can do about it.
When computing a kernel between N x D matrices with large D, KeOps seems to slow down a lot.

A simple code sample to reproduce this:

import torch
from pykeops.torch import Genred
import timeit

a = torch.randn(10000, 700, requires_grad=False, dtype=torch.float64)
c = torch.randn(10000, 700, requires_grad=False, dtype=torch.float64)
v = torch.randn(10000, 2, requires_grad=False, dtype=torch.float64)

formula = '(X|Y) * v'
aliases = [
    'X = Vi(%d)' % (a.shape[1]),
    'Y = Vj(%d)' % (c.shape[1]),
    'v = Vi(%d)' % (v.shape[1]),
]
mmv = Genred(formula, aliases, reduction_op='Sum', axis=1, dtype='float64')
mmv(a, c, v)

timeit.repeat("mmv(a, c, v, backend='GPU_1D'); torch.cuda.synchronize()", globals=globals(), number=1, repeat=5)
timeit.repeat('(a @ c.T) @ v', globals=globals(), number=1, repeat=5)

The KeOps function takes ~6 s to run (on the GPU), while the naive PyTorch version takes ~0.4 s (on a 24-core CPU). I find this interesting because if we reduce D to, e.g., 7, KeOps is massively faster!
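A possible explanation, sketched with dense NumPy arrays rather than KeOps: a KeOps reduction evaluates the D-dimensional dot product (x_i|y_j) for every one of the N x N pairs, whereas for a linear kernel the same matrix-vector product can be reassociated as a @ (c.T @ v) and never touch an N x N object at all. Also note that, as written, the alias declares v as Vi (indexed by i) while the sum runs over j (axis=1), so the formula computes sum_j (x_i|y_j) v_i rather than the (a @ c.T) @ v product it is timed against; matching the baseline would need 'v = Vj(2)'. The function names below are illustrative only.

```python
import numpy as np


def linear_kernel_matvec_dense(x, y, v):
    """sum_j (x_i|y_j) v_j via the full (M, N) Gram matrix, like (a @ c.T) @ v."""
    gram = x @ y.T  # (M, N): O(M*N*D) flops and O(M*N) memory
    return gram @ v  # (M, Dv)


def linear_kernel_matvec_reassociated(x, y, v):
    """Same result by associativity: x @ (y.T @ v), never forming the (M, N) matrix."""
    return x @ (y.T @ v)  # (D, Dv) intermediate: O((M + N)*D*Dv) flops


def formula_with_vi(x, y, v_i):
    """What '(X|Y) * v' sums over j when v is declared as Vi:
    sum_j (x_i|y_j) v_i = (x_i | sum_j y_j) v_i."""
    return (x @ y.sum(axis=0))[:, None] * v_i


rng = np.random.default_rng(0)
x = rng.standard_normal((200, 70))
y = rng.standard_normal((300, 70))
v_j = rng.standard_normal((300, 2))  # indexed by j, as in (a @ c.T) @ v
v_i = rng.standard_normal((200, 2))  # indexed by i, as in the alias 'v = Vi(2)'

print(np.allclose(linear_kernel_matvec_dense(x, y, v_j),
                  linear_kernel_matvec_reassociated(x, y, v_j)))  # True
```

In the original report M = N = 10000, so an i-indexed and a j-indexed v have the same shape, which is why the call runs without a shape error.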

I'm sure there is something simple that I am clearly missing. Please let me know if this is the case.
Thanks,
Giacomo

Pykeops cannot find cuda

Hello, great work!

With this sample code, pykeops cannot see the GPUs with CUDA 9.1 and CMake 3.12.1. Is that normal?

(base) hjanati@drago3:~/code/nips19$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

I made sure to install pykeops with all its dependencies (pykeops[full]).

import numpy as np
import pykeops
pykeops.verbose = True
from pykeops.numpy import Genred

x = np.arange(1, 10).reshape(-1, 3).astype('float32')
y = np.arange(3, 9 ).reshape(-1, 3).astype('float32')

my_conv = Genred('-SqNorm2(x-y)', ['x = Vi(3)', 'y = Vj(3)'])
print(my_conv(x, y))

Here is the output:

Compiling libKeOpsnumpy73a835aa5f in /home/parietal/hjanati/.cache/pykeops-1.0.2/:
       formula: Sum_Reduction(-SqNorm2(x-y),1)
       aliases: x = Vi(0,3); y = Vj(1,3); 
       dtype  : float64
... CMake Warning at /home/parietal/hjanati/miniconda3/share/cmake-3.14/Modules/FindCUDA.cmake:893 (message):
  Expecting to find librt for libcudart_static, but didn't find it.
Call Stack (most recent call first):
  keops/cuda.cmake:8 (find_package)
  CMakeLists.txt:11 (include)


-- No GPU detected. USE_CUDA set to FALSE.
-- Using shared_obj_name: libKeOpsnumpy73a835aa5f
-- pybind11 v2.2.4
-- Configuring done
-- Generating done
-- Build files have been written to: /home/parietal/hjanati/.cache/pykeops-1.0.2

Scanning dependencies of target keopslibKeOpsnumpy73a835aa5f
[ 25%] Building CXX object CMakeFiles/keopslibKeOpsnumpy73a835aa5f.dir/keops/core/link_autodiff.cpp.o
[ 50%] Linking CXX shared library libKeOpsnumpy73a835aa5f.so
[ 50%] Built target keopslibKeOpsnumpy73a835aa5f
Scanning dependencies of target libKeOpsnumpy73a835aa5f
[ 75%] Building CXX object CMakeFiles/libKeOpsnumpy73a835aa5f.dir/numpy/generic/generic_red.cpp.o
[100%] Linking CXX shared module libKeOpsnumpy73a835aa5f.cpython-36m-x86_64-linux-gnu.so
[100%] Built target libKeOpsnumpy73a835aa5f

Done. 
Traceback (most recent call last):
  File "keopstest.py", line 10, in <module>
    print(my_conv(x, y))
  File "/home/parietal/hjanati/miniconda3/lib/python3.6/site-packages/pykeops/numpy/generic/generic_red.py", line 224, in __call__
    out = self.myconv.genred_numpy(nx, ny, tagCpuGpu, tag1D2D, 0, device_id, ranges, *args)
RuntimeError: [KeOps] This KeOps shared object has been compiled without cuda support: 
 1) to perform computations on CPU, simply set tagHostDevice to 0
 2) to perform computations on GPU, please recompile the formula with a working version of cuda.
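While the CUDA detection problem is being sorted out, the error message itself points at running on the CPU. In the PyKeOps versions used in these reports the Genred call takes a backend keyword (the timing example above passes backend='GPU_1D'), and backend='CPU' is assumed here to select the CPU route. To sanity-check whatever comes back, here is a dense NumPy reference for the formula as compiled, Sum_Reduction(-SqNorm2(x-y),1): x is indexed by i, y by j, and the sum runs over j. The helper name is hypothetical.

```python
import numpy as np


def neg_sqdist_sum_reference(x, y):
    """Dense reference for Sum_Reduction(-SqNorm2(x-y), 1) with x = Vi(3), y = Vj(3).

    Returns an (M, 1) array whose i-th entry is sum_j -|x_i - y_j|^2.
    """
    diff = x[:, None, :] - y[None, :, :]  # (M, N, 3): all pairwise differences
    sqdist = (diff ** 2).sum(axis=2)  # (M, N): squared Euclidean distances
    return -sqdist.sum(axis=1, keepdims=True)  # reduce over j (axis=1)


x = np.arange(1, 10).reshape(-1, 3).astype('float32')
y = np.arange(3, 9).reshape(-1, 3).astype('float32')
print(neg_sqdist_sum_reference(x, y))  # values: -87, -15, -51
```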

[Bug] Error when back-propagating through matmul with large dimension

When back-propagating through operations that involve matmul with a large matrix dimension, I run into the following error:

RuntimeError: [KeOps] Arg number 6 : is not contiguous. Please provide 'contiguous' data array, as KeOps does not support strides. If you're getting this error in the 'backward' pass of a code using torch.sum() on the output of a KeOps routine, you should consider replacing 'a.sum()' with '(1. * a).sum()' or 'torch.dot(a.view(-1), torch.ones_like(a).view(-1))'. 

This starts happening exactly when the last dimension exceeds 80, so I was able to trace it back to the special-casing here:

if pykeops.gpu_available and v_.shape[-1] > 80 :
    # custom method when last dim of v is large
    # we have :
    # K._shape = (batchdimsK,M,N,1)
    # v_.shape = (batchdimsv,1,N,Nv)
    # we expand v_ to get same shape as K :
    v_ = self.tools.view(v_,[1]*(len(self._shape)-len(v_.shape))+list(v_.shape)) # (1,..,1,batchdimsv,1,N,Nv)
    # (NB if K has less batch dims than v it does nothing)
    # now we shift the Nv dim from last to first position
    v_ = self.tools.permute(v_,[len(v_.shape)-1]+list(range(0,len(v_.shape)-1))) # (Nv,1,..,1,batchdimsv,1,N)
    v_ = self.tools.contiguous(v_)
    # we add a dummy dimension at the end (maybe not necessary ?)
    v_ = self.tools.view(v_,list(v_.shape)+[1]) # (Nv,1,..,1,batchdimsv,1,N,1)
    v_ = LazyTensor(v_)
    Kv = (self*v_).sum(dim=len(v_._shape)-2) # (Nv,outbatchdims,M,1)
    Kv = self.tools.permute(Kv,list(range(1,len(Kv.shape)))+[0]) # (outbatchdims,M,1,Nv)
    Kv = self.tools.contiguous(Kv)
    Kv = self.tools.view(Kv,list(Kv.shape[:-2])+[Kv.shape[-1]]) # (outbatchdims,M,Nv)
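The shape bookkeeping described in the comments of that branch can be mirrored with dense NumPy arrays, assuming no batch dimensions (K of shape (M, N), v of shape (N, Nv)) and a fully materialized K, unlike a LazyTensor. It also shows why the explicit contiguous calls are there: a permuted array is a strided view, which is exactly what the KeOps error message refuses. The function name is hypothetical and this is not the pykeops implementation.

```python
import numpy as np


def kv_large_nv(K, v):
    """Dense analogue of the large-Nv branch: K (M, N), v (N, Nv) -> K @ v (M, Nv)."""
    # Shift the Nv dim from last to first position: (N, Nv) -> (Nv, N).
    v_t = v.T  # a strided view, not C-contiguous
    v_t = np.ascontiguousarray(v_t)  # counterpart of tools.contiguous(v_)
    # Add a dummy dim so that v_t broadcasts against K: (Nv, 1, N).
    v_b = v_t[:, None, :]
    # Multiply by K viewed as (1, M, N) and reduce over N: (Nv, M).
    Kv = (K[None, :, :] * v_b).sum(axis=2)
    # Shift Nv back to the last position and make the result contiguous: (M, Nv).
    return np.ascontiguousarray(Kv.T)


rng = np.random.default_rng(0)
K = rng.standard_normal((50, 40))
v = rng.standard_normal((40, 81))  # Nv = 81 > 80 is the case that takes the custom branch
out = kv_large_nv(K, v)
print(np.allclose(out, K @ v), v.T.flags['C_CONTIGUOUS'], out.flags['C_CONTIGUOUS'])  # True False True
```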

If I comment out that block and use the else branch for everything, I don't observe this issue, so something problematic must be going on there.

I don't have a stripped-down repro, but the following notebook, which uses gpytorch, is fairly concise:
keops_backward_issue.ipynb.txt

cc @jacobrgardner, @gpleiss

Issue when discrepancy between available CUDA device at build time / runtime

Hey, first off, thanks for the library!

I ran into some weird issues today when trying to use a kernel on 'cuda:1' when the kernel was built on a machine with only 2 GPUs. I hit this because I use a shared home filesystem (and hence a shared .cache folder) on a cluster where I have access to machines with varying numbers of GPUs.

Here is how to reproduce it on a machine with 2 GPUs:

test.py :

import torch
from pykeops.torch import LazyTensor

def test(data):
	neigh_state = LazyTensor(data[None, :, :])
	state = LazyTensor(data[:, None, :])
	all_distances = ((neigh_state - state) ** 2).sum(dim=2)
	return (- all_distances).logsumexp(dim=1)

tensor = torch.randn(10,128).to('cuda:0')
print(torch.cuda.device_count())
test(tensor)
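For reference, here is what test() computes, written with dense NumPy arrays: data[:, None, :] plays the role of the i-indexed variable, data[None, :, :] the j-indexed one, and logsumexp(dim=1) reduces over j. This is only a cross-check for small inputs, since it materializes the full (N, N) distance matrix that the LazyTensor version avoids; the helper name is hypothetical.

```python
import numpy as np


def dense_logsumexp_reference(data):
    """Dense counterpart of test(): for each i, log sum_j exp(-|x_i - x_j|^2)."""
    diff = data[None, :, :] - data[:, None, :]  # (N, N, D): x_j - x_i
    all_distances = (diff ** 2).sum(axis=2)  # (N, N) squared distances
    s = -all_distances
    # Numerically stable log-sum-exp over j (axis=1), keeping a trailing dim of size 1.
    m = s.max(axis=1, keepdims=True)
    return m + np.log(np.exp(s - m).sum(axis=1, keepdims=True))


data = np.random.default_rng(0).standard_normal((10, 128))
print(dense_logsumexp_reference(data).shape)  # (10, 1)
```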

1. Run CUDA_VISIBLE_DEVICES=0 python test.py. This should build a kernel.
2. Change 'cuda:0' to 'cuda:1' in test.py.
3. Run python test.py.

This fails with the error:
invalid Gpu device number. If the number of available Gpus is > 12, add required lines at the end of function SetGpuProps and recompile.

Recompiling is not a great option for me, as I might run different experiments using the same kernel on machines with different numbers of available GPUs.
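The build logs in these reports hint at the mechanism: the GPU properties are baked into the compiled object at build time, as one -DMAXIDGPU flag plus a -DMAXTHREADSPERBLOCKk / -DSHAREDMEMPERBLOCKk pair per visible device (MAXIDGPU=6 with seven pairs on one machine, MAXIDGPU=0 with a single pair on another). Under CUDA_VISIBLE_DEVICES=0 only one device is visible, so a cached kernel built that way presumably only knows about device 0. The Python below is a hypothetical model of that behaviour, not pykeops code; both function names are made up for illustration.

```python
def compute_props_flags(devices):
    """Hypothetical model of the build-time flags. `devices` lists one
    (max_threads_per_block, shared_mem_per_block) pair per visible GPU."""
    flags = [f"-DMAXIDGPU={len(devices) - 1}"]
    for k, (threads, shmem) in enumerate(devices):
        flags.append(f"-DMAXTHREADSPERBLOCK{k}={threads}")
        flags.append(f"-DSHAREDMEMPERBLOCK{k}={shmem}")
    return flags


def device_supported(flags, device_id):
    """Under this model, a kernel compiled with `flags` only addresses ids 0..MAXIDGPU."""
    maxid = next(int(f.split("=")[1]) for f in flags if f.startswith("-DMAXIDGPU="))
    return 0 <= device_id <= maxid


# Built with CUDA_VISIBLE_DEVICES=0 on a 2-GPU machine: only one device is seen.
built_flags = compute_props_flags([(1024, 49152)])
print(built_flags)
print(device_supported(built_flags, 0), device_supported(built_flags, 1))  # True False
```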

minres solver

Hi,

Thank you for working on this amazing project! I've had a lot of luck using it for large-scale GP regression. Are there any plans to implement a MINRES solver? That would be very useful for kernels that are not necessarily positive definite. As an example, I'm interested in using KeOps for radial basis function interpolation with the conditionally positive definite cubic and thin-plate spline kernels. In this setting, you need to solve linear systems with a symmetric, but not positive definite, kernel matrix.
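To illustrate the request: with a conditionally positive definite kernel such as the cubic phi(r) = r^3, the interpolation system is usually augmented with a low-degree polynomial tail, giving a symmetric saddle-point matrix [[Phi, P], [P^T, 0]] that is indefinite, so conjugate gradients do not apply but MINRES does. The sketch below uses SciPy's minres on a LinearOperator. Phi is a dense NumPy array here purely for self-containment; with KeOps one would presumably supply the Phi @ w product from a symbolic reduction instead. The linear tail, the point grid and the test function are assumptions for illustration.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, minres


def cubic_rbf_saddle_operator(points):
    """LinearOperator for [[Phi, P], [P^T, 0]] with Phi_ij = |p_i - p_j|^3."""
    n, d = points.shape
    diff = points[:, None, :] - points[None, :, :]
    Phi = np.sqrt((diff ** 2).sum(axis=2)) ** 3  # dense stand-in for a KeOps matrix
    P = np.hstack([np.ones((n, 1)), points])  # linear polynomial tail, (n, d + 1)
    m = P.shape[1]

    def matvec(z):
        z = np.ravel(z)
        w, c = z[:n], z[n:]  # RBF weights and polynomial coefficients
        return np.concatenate([Phi @ w + P @ c, P.T @ w])

    return LinearOperator((n + m, n + m), matvec=matvec, dtype=np.float64), m


# 6 x 6 grid on the unit square, sampled from a smooth test function.
g = np.linspace(0.0, 1.0, 6)
points = np.array([(a, b) for a in g for b in g])
f = np.sin(3 * points[:, 0]) + points[:, 1] ** 2

A, m = cubic_rbf_saddle_operator(points)
rhs = np.concatenate([f, np.zeros(m)])  # interpolation conditions + side constraints P^T w = 0
sol, info = minres(A, rhs, maxiter=500)
print(info, np.linalg.norm(A.matvec(sol) - rhs) / np.linalg.norm(rhs))
```

Only matrix-vector products with the symmetric operator are needed, which is what makes MINRES a natural fit for KeOps-style symbolic matrices.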

Best,
David

compilation error with test script

Dear keops team,
thank you for providing such an amazing package!
When trying to set up pykeops on one of my machines, I got a compilation error on the test scripts, which I do not understand.

Specifications:
Cuda 10.1
GCC 7.4.0
Python 3.7.4
Pykeops 1.2

Here is the script:

import numpy as np
import pykeops
pykeops.verbose = True

import pykeops.numpy as pknp

x = np.arange(1, 10).reshape(-1, 3).astype('float32')
y = np.arange(3, 9).reshape(-1, 3).astype('float32')

my_conv = pknp.Genred('SqNorm2(x - y)', ['x = Vi(3)', 'y = Vj(3)'])
print(my_conv(x, y))

Here is the output:

(keops_roman) sdamrich@sirherny:~/mirrored_code/mod_shift/keops$ python keops_numpy_test.py 
Compiling libKeOpsnumpy5ac3d464a2 in /home/sdamrich/.cache/pykeops-1.2-cpython-37//build-libKeOpsnumpy5ac3d464a2:
       formula: Sum_Reduction(SqNorm2(x - y),1)
       aliases: x = Vi(0,3); y = Vj(1,3); 
       dtype  : float64
... -- The CXX compiler identification is GNU 7.4.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Compute properties automatically set to: -DMAXIDGPU=0;-DMAXTHREADSPERBLOCK0=1024;-DSHAREDMEMPERBLOCK0=49152
-- The CUDA compiler identification is NVIDIA 10.1.105
-- Check for working CUDA compiler: /usr/local/cuda-10.1/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-10.1/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- The CUDA Host CXX Compiler: /usr/bin/c++
-- Autodetected CUDA architecture(s): 6.1 
-- Using shared_obj_name: libKeOpsnumpy5ac3d464a2
-- Found PythonInterp: /home/sdamrich/anaconda3/envs/keops_roman/bin/python3.7 (found version "3.7.4") 
-- Found PythonLibs: /home/sdamrich/anaconda3/envs/keops_roman/lib/libpython3.7m.so
-- pybind11 v2.3.dev1
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- LTO enabled
-- Configuring done
-- Generating done
-- Build files have been written to: /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2

/usr/include/c++/7/bits/basic_string.tcc: In instantiation of ‘static std::basic_string<_CharT, _Traits, _Alloc>::_Rep* std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’:
/usr/include/c++/7/bits/basic_string.tcc:578:28:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&, std::forward_iterator_tag) [with _FwdIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’
/usr/include/c++/7/bits/basic_string.h:5042:20:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct_aux(_InIterator, _InIterator, const _Alloc&, std::__false_type) [with _InIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’
/usr/include/c++/7/bits/basic_string.h:5063:24:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&) [with _InIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’
/usr/include/c++/7/bits/basic_string.tcc:656:134:   required from ‘std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’
/usr/include/c++/7/bits/basic_string.h:6688:95:   required from here
/usr/include/c++/7/bits/basic_string.tcc:1067:16: error: cannot call member function ‘void std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_M_set_sharable() [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’ without object
       __p->_M_set_sharable();
       ~~~~~~~~~^~
/usr/include/c++/7/bits/basic_string.tcc: In instantiation of ‘static std::basic_string<_CharT, _Traits, _Alloc>::_Rep* std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’:
/usr/include/c++/7/bits/basic_string.tcc:578:28:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&, std::forward_iterator_tag) [with _FwdIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.h:5042:20:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct_aux(_InIterator, _InIterator, const _Alloc&, std::__false_type) [with _InIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.h:5063:24:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&) [with _InIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.tcc:656:134:   required from ‘std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’
/usr/include/c++/7/bits/basic_string.h:6693:95:   required from here
/usr/include/c++/7/bits/basic_string.tcc:1067:16: error: cannot call member function ‘void std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_M_set_sharable() [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’ without object
CMake Error at keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.Release.cmake:279 (message):
  Error generating file
  /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o


make[3]: *** [CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o] Error 1
make[2]: *** [CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/all] Error 2
make[1]: *** [CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/rule] Error 2
make: *** [libKeOpsnumpy5ac3d464a2] Error 2

--------------------- MAKE DEBUG -----------------
Command '['cmake', '--build', '.', '--target', 'libKeOpsnumpy5ac3d464a2', '--', 'VERBOSE=1']' returned non-zero exit status 2.
/usr/bin/cmake -H/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops -B/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/make -f CMakeFiles/Makefile2 libKeOpsnumpy5ac3d464a2
make[1]: Entering directory '/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
/usr/bin/cmake -H/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops -B/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles 5
/usr/bin/make -f CMakeFiles/Makefile2 CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/all
make[2]: Entering directory '/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
/usr/bin/make -f CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/build.make CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/depend
make[3]: Entering directory '/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
[ 20%] Building NVCC (Device) object CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o
cd /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core && /usr/bin/cmake -E make_directory /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/.
cd /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core && /usr/bin/cmake -D verbose:BOOL=1 -D build_configuration:STRING=Release -D generated_file:STRING=/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o -D generated_cubin_file:STRING=/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.cubin.txt -P /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.Release.cmake
-- Removing /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o
/usr/bin/cmake -E remove /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o
-- Generating dependency file: /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.NVCC-depend
/usr/local/cuda-10.1/bin/nvcc -M -D__CUDACC__ /home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/keops/core/link_autodiff.cu -o /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.NVCC-depend -m64 -DkeopslibKeOpsnumpy5ac3d464a2_EXPORTS -DMAXIDGPU=0 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=double -DC_CONTIGUOUS=1 -DMODULE_NAME=libKeOpsnumpy5ac3d464a2 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=1 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_double -Xcompiler ,\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-O3\",\"-DNDEBUG\",\"-O3\" -gencode arch=compute_61,code=sm_61 --use_fast_math --compiler-options=-fPIC -ccbin /usr/bin/c++ --pre-include=libKeOpsnumpy5ac3d464a2.h -DNVCC -I/usr/local/cuda-10.1/include -I/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops -I/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/keops -I/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2
-- Generating temporary cmake readable file: /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp
/usr/bin/cmake -D input_file:FILEPATH=/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.NVCC-depend -D output_file:FILEPATH=/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp -D verbose=1 -P /usr/share/cmake-3.10/Modules/FindCUDA/make2cmake.cmake
-- Copy if different /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp to /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend
/usr/bin/cmake -E copy_if_different /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend
-- Removing /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp and /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.NVCC-depend
/usr/bin/cmake -E remove /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.NVCC-depend
-- Generating /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o
/usr/local/cuda-10.1/bin/nvcc /home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/keops/core/link_autodiff.cu -c -o /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o -m64 -DkeopslibKeOpsnumpy5ac3d464a2_EXPORTS -DMAXIDGPU=0 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=double -DC_CONTIGUOUS=1 -DMODULE_NAME=libKeOpsnumpy5ac3d464a2 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=1 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_double -Xcompiler ,\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-O3\",\"-DNDEBUG\",\"-O3\" -gencode arch=compute_61,code=sm_61 --use_fast_math --compiler-options=-fPIC -ccbin /usr/bin/c++ --pre-include=libKeOpsnumpy5ac3d464a2.h -DNVCC -I/usr/local/cuda-10.1/include -I/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops -I/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/keops -I/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2
-- Removing /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o
/usr/bin/cmake -E remove /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o
CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/build.make:63: recipe for target 'CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o' failed
make[3]: Leaving directory '/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
CMakeFiles/Makefile2:141: recipe for target 'CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/all' failed
make[2]: Leaving directory '/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
CMakeFiles/Makefile2:79: recipe for target 'CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/rule' failed
make[1]: Leaving directory '/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
Makefile:118: recipe for target 'libKeOpsnumpy5ac3d464a2' failed

--------------------- ----------- -----------------
Done.
Traceback (most recent call last):
  File "keops_numpy_test.py", line 12, in <module>
    my_conv = pknp.Genred('SqNorm2(x - y)', ['x = Vi(3)', 'y = Vj(3)'])
  File "/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/numpy/generic/generic_red.py", line 114, in __init__
    self.myconv = LoadKEops(self.formula, self.aliases, self.dtype, 'numpy').import_module()
  File "/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/common/keops_io.py", line 52, in import_module
    return importlib.import_module(self.dll_name)
  File "/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'libKeOpsnumpy5ac3d464a2'

When running the same script in the same conda environment on a different machine with older CUDA (9.2), everything works like a charm:

(keops_roman) sdamrich@sfb1129gpu02:~/keops$ python keops_numpy_test.py 
Compiling libKeOpsnumpy5ac3d464a2 in /export/home/sdamrich/.cache/pykeops-1.2-cpython-37//build-libKeOpsnumpy5ac3d464a2:
       formula: Sum_Reduction(SqNorm2(x - y),1)
       aliases: x = Vi(0,3); y = Vj(1,3); 
       dtype  : float64
... -- The CXX compiler identification is GNU 7.4.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Compute properties automatically set to: -DMAXIDGPU=7;-DMAXTHREADSPERBLOCK0=1024;-DSHAREDMEMPERBLOCK0=49152;-DMAXTHREADSPERBLOCK1=1024;-DSHAREDMEMPERBLOCK1=49152;-DMAXTHREADSPERBLOCK2=1024;-DSHAREDMEMPERBLOCK2=49152;-DMAXTHREADSPERBLOCK3=1024;-DSHAREDMEMPERBLOCK3=49152;-DMAXTHREADSPERBLOCK4=1024;-DSHAREDMEMPERBLOCK4=49152;-DMAXTHREADSPERBLOCK5=1024;-DSHAREDMEMPERBLOCK5=49152;-DMAXTHREADSPERBLOCK6=1024;-DSHAREDMEMPERBLOCK6=49152;-DMAXTHREADSPERBLOCK7=1024;-DSHAREDMEMPERBLOCK7=49152
-- The CUDA compiler identification is NVIDIA 9.2.148
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- The CUDA Host CXX Compiler: /usr/bin/c++
-- Autodetected CUDA architecture(s): 6.1 6.1 6.1 6.1 6.1 6.1 6.1 6.1 
-- Using shared_obj_name: libKeOpsnumpy5ac3d464a2
-- Found PythonInterp: /export/home/sdamrich/anaconda3/envs/keops_roman/bin/python3.7 (found version "3.7.4") 
-- Found PythonLibs: /export/home/sdamrich/anaconda3/envs/keops_roman/lib/libpython3.7m.so
-- pybind11 v2.3.dev1
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- LTO enabled
-- Configuring done
-- Generating done
-- Build files have been written to: /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2

Generated /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o successfully.
/usr/bin/cmake -H/export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops -B/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/make -f CMakeFiles/Makefile2 libKeOpsnumpy5ac3d464a2
make[1]: Entering directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
/usr/bin/cmake -H/export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops -B/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles 5
/usr/bin/make -f CMakeFiles/Makefile2 CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/all
make[2]: Entering directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
/usr/bin/make -f CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/build.make CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/depend
make[3]: Entering directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
[ 20%] Building NVCC (Device) object CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o
cd /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core && /usr/bin/cmake -E make_directory /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/.
cd /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core && /usr/bin/cmake -D verbose:BOOL=1 -D build_configuration:STRING=Release -D generated_file:STRING=/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o -D generated_cubin_file:STRING=/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.cubin.txt -P /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.Release.cmake
-- Removing /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o
/usr/bin/cmake -E remove /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o
-- Generating dependency file: /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.NVCC-depend
/usr/local/cuda/bin/nvcc -M -D__CUDACC__ /export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/keops/core/link_autodiff.cu -o /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.NVCC-depend -m64 -DkeopslibKeOpsnumpy5ac3d464a2_EXPORTS -DMAXIDGPU=7 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -DMAXTHREADSPERBLOCK1=1024 -DSHAREDMEMPERBLOCK1=49152 -DMAXTHREADSPERBLOCK2=1024 -DSHAREDMEMPERBLOCK2=49152 -DMAXTHREADSPERBLOCK3=1024 -DSHAREDMEMPERBLOCK3=49152 -DMAXTHREADSPERBLOCK4=1024 -DSHAREDMEMPERBLOCK4=49152 -DMAXTHREADSPERBLOCK5=1024 -DSHAREDMEMPERBLOCK5=49152 -DMAXTHREADSPERBLOCK6=1024 -DSHAREDMEMPERBLOCK6=49152 -DMAXTHREADSPERBLOCK7=1024 -DSHAREDMEMPERBLOCK7=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=double -DC_CONTIGUOUS=1 -DMODULE_NAME=libKeOpsnumpy5ac3d464a2 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=1 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_double -Xcompiler ,\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-O3\",\"-DNDEBUG\",\"-O3\" -gencode arch=compute_61,code=sm_61 --use_fast_math --compiler-options=-fPIC -ccbin /usr/bin/c++ --pre-include=libKeOpsnumpy5ac3d464a2.h -DNVCC -I/usr/local/cuda/include -I/export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops -I/export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/keops -I/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2
-- Generating temporary cmake readable file: /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp
/usr/bin/cmake -D input_file:FILEPATH=/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.NVCC-depend -D output_file:FILEPATH=/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp -D verbose=1 -P /usr/share/cmake-3.10/Modules/FindCUDA/make2cmake.cmake
-- Copy if different /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp to /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend
/usr/bin/cmake -E copy_if_different /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend
-- Removing /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp and /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.NVCC-depend
/usr/bin/cmake -E remove /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.NVCC-depend
-- Generating /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o
/usr/local/cuda/bin/nvcc /export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/keops/core/link_autodiff.cu -c -o /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o -m64 -DkeopslibKeOpsnumpy5ac3d464a2_EXPORTS -DMAXIDGPU=7 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -DMAXTHREADSPERBLOCK1=1024 -DSHAREDMEMPERBLOCK1=49152 -DMAXTHREADSPERBLOCK2=1024 -DSHAREDMEMPERBLOCK2=49152 -DMAXTHREADSPERBLOCK3=1024 -DSHAREDMEMPERBLOCK3=49152 -DMAXTHREADSPERBLOCK4=1024 -DSHAREDMEMPERBLOCK4=49152 -DMAXTHREADSPERBLOCK5=1024 -DSHAREDMEMPERBLOCK5=49152 -DMAXTHREADSPERBLOCK6=1024 -DSHAREDMEMPERBLOCK6=49152 -DMAXTHREADSPERBLOCK7=1024 -DSHAREDMEMPERBLOCK7=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=double -DC_CONTIGUOUS=1 -DMODULE_NAME=libKeOpsnumpy5ac3d464a2 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=1 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_double -Xcompiler ,\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-O3\",\"-DNDEBUG\",\"-O3\" -gencode arch=compute_61,code=sm_61 --use_fast_math --compiler-options=-fPIC -ccbin /usr/bin/c++ --pre-include=libKeOpsnumpy5ac3d464a2.h -DNVCC -I/usr/local/cuda/include -I/export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops -I/export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/keops -I/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2
cd /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops /export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/DependInfo.cmake --color=
Dependee "/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/DependInfo.cmake" is newer than depender "/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/depend.internal".
Dependee "/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/CMakeDirectoryInformation.cmake" is newer than depender "/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/depend.internal".
Scanning dependencies of target keopslibKeOpsnumpy5ac3d464a2
make[3]: Leaving directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
/usr/bin/make -f CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/build.make CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/build
make[3]: Entering directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
[ 40%] Linking CUDA device code CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/cmake_device_link.o
/usr/bin/cmake -E cmake_link_script CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/dlink.txt --verbose=1
/usr/local/cuda/bin/nvcc   -O3 -DNDEBUG -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o -o CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/cmake_device_link.o  -L/usr/local/cuda/targets/x86_64-linux/lib/stubs  -L/usr/local/cuda/targets/x86_64-linux/lib 
[ 60%] Linking CXX shared library libKeOpsnumpy5ac3d464a2.so
/usr/bin/cmake -E cmake_link_script CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/link.txt --verbose=1
/usr/bin/c++ -fPIC  -Wall -Wno-unknown-pragmas -fmax-errors=2 -O3 -DNDEBUG -O3  -shared -Wl,-soname,libKeOpsnumpy5ac3d464a2.so -o libKeOpsnumpy5ac3d464a2.so CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/cmake_device_link.o  -L/usr/local/cuda/targets/x86_64-linux/lib/stubs  -L/usr/local/cuda/targets/x86_64-linux/lib /usr/local/cuda/lib64/libcudart_static.a -lpthread -ldl /usr/lib/x86_64-linux-gnu/librt.so -lcudadevrt -lcudart_static -lrt -lpthread -ldl 
/usr/bin/cmake -E copy /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/libKeOpsnumpy5ac3d464a2.so /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/../
make[3]: Leaving directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
[ 60%] Built target keopslibKeOpsnumpy5ac3d464a2
/usr/bin/make -f CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/build.make CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/depend
make[3]: Entering directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
cd /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops /export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/DependInfo.cmake --color=
Dependee "/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/DependInfo.cmake" is newer than depender "/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/depend.internal".
Dependee "/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/CMakeDirectoryInformation.cmake" is newer than depender "/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/depend.internal".
Scanning dependencies of target libKeOpsnumpy5ac3d464a2
make[3]: Leaving directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
/usr/bin/make -f CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/build.make CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/build
make[3]: Entering directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
[ 80%] Building CXX object CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/numpy/generic/generic_red.cpp.o
/usr/bin/c++  -DCUDA_BLOCK_SIZE=192 -DC_CONTIGUOUS=1 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMAXIDGPU=7 -DMAXTHREADSPERBLOCK0=1024 -DMAXTHREADSPERBLOCK1=1024 -DMAXTHREADSPERBLOCK2=1024 -DMAXTHREADSPERBLOCK3=1024 -DMAXTHREADSPERBLOCK4=1024 -DMAXTHREADSPERBLOCK5=1024 -DMAXTHREADSPERBLOCK6=1024 -DMAXTHREADSPERBLOCK7=1024 -DMODULE_NAME=libKeOpsnumpy5ac3d464a2 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_double -DSHAREDMEMPERBLOCK0=49152 -DSHAREDMEMPERBLOCK1=49152 -DSHAREDMEMPERBLOCK2=49152 -DSHAREDMEMPERBLOCK3=49152 -DSHAREDMEMPERBLOCK4=49152 -DSHAREDMEMPERBLOCK5=49152 -DSHAREDMEMPERBLOCK6=49152 -DSHAREDMEMPERBLOCK7=49152 -DUSE_CUDA=1 -DUSE_DOUBLE=1 -D_FORCE_INLINES -D_GLIBCXX_USE_CXX11_ABI=0 -D__TYPE__=double -DlibKeOpsnumpy5ac3d464a2_EXPORTS -I/export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops -I/export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/keops -I/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 -I/usr/local/cuda/include -I/export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/pybind11/include -I/export/home/sdamrich/anaconda3/envs/keops_roman/include/python3.7m  -Wall -Wno-unknown-pragmas -fmax-errors=2 -O3 -DNDEBUG -O3 -fPIC -fvisibility=hidden   -flto -fno-fat-lto-objects -include libKeOpsnumpy5ac3d464a2.h -std=gnu++14 -o CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/numpy/generic/generic_red.cpp.o -c /export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/numpy/generic/generic_red.cpp
[100%] Linking CXX shared module libKeOpsnumpy5ac3d464a2.cpython-37m-x86_64-linux-gnu.so
/usr/bin/cmake -E cmake_link_script CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/link.txt --verbose=1
/usr/bin/c++ -fPIC  -Wall -Wno-unknown-pragmas -fmax-errors=2 -O3 -DNDEBUG -O3 -Wl,-rpath,$ORIGIN -shared  -o libKeOpsnumpy5ac3d464a2.cpython-37m-x86_64-linux-gnu.so CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/numpy/generic/generic_red.cpp.o -Wl,-rpath,/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 -flto libKeOpsnumpy5ac3d464a2.so /usr/local/cuda/lib64/libcudart_static.a -lpthread -ldl -lrt 
/usr/bin/strip /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/libKeOpsnumpy5ac3d464a2.cpython-37m-x86_64-linux-gnu.so
/usr/bin/cmake -E copy /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/libKeOpsnumpy5ac3d464a2.cpython-37m-x86_64-linux-gnu.so /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/../
make[3]: Leaving directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
[100%] Built target libKeOpsnumpy5ac3d464a2
make[2]: Leaving directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
/usr/bin/cmake -E cmake_progress_start /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles 0
make[1]: Leaving directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'

Done.
[[63.]
 [90.]]

Do you have any idea how I could get KeOps to run on the first machine?
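When comparing the two machines, it helps to have an independent reference value for the formula being compiled. The sketch below is a plain-Python version of the reduction shown in the log, Sum_Reduction(SqNorm2(x - y),1) with aliases x = Vi(0,3); y = Vj(1,3). It rests on one assumption: that the trailing 1 means the output is indexed by j (the sum runs over i), which is how I read Genred's default axis=0. The function name and the sample points are made up and are not the inputs of keops_numpy_test.py.

```python
def sum_sqnorm2_over_i(x, y):
    """Plain-Python reference for Sum_Reduction(SqNorm2(x - y), 1).

    Assumes x = Vi(0,3), y = Vj(1,3) and that the trailing 1 means the
    result is indexed by j, i.e. the sum runs over i. Returns one row
    [value] per point y_j, mirroring the (n, 1) shape of the KeOps output.
    """
    out = []
    for yj in y:
        total = 0.0
        for xi in x:
            # SqNorm2(x - y): squared Euclidean norm of the difference
            total += sum((a - b) ** 2 for a, b in zip(xi, yj))
        out.append([total])
    return out


# Hypothetical sample points, not the ones from the original script.
x = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
y = [[1.0, 2.0, 3.0], [0.0, 0.0, 1.0]]
print(sum_sqnorm2_over_i(x, y))  # [[19.0], [3.0]]
```

Running the same inputs through the compiled Genred on the working machine and comparing against this reference separates a numerical problem from a build problem.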

Feature suggestion: SIMD vectorization on CPU

Hi, thanks for the presentation you gave at Inria Parietal today ;)

I just wanted to give you a heads-up about https://github.com/QuantStack/xsimd, which might be a useful tool for making kernel computations more efficient on modern CPUs. That could help people who don't have an NVIDIA GPU at hand.

You might also be interested in xtensor, by the same developers, which provides a lazy C++ API for n-dimensional array manipulation:

https://github.com/QuantStack/xtensor

And also xeus / cling, for interactive C++ development in Jupyter notebooks:

https://github.com/QuantStack/xeus-cling (interactive demo)

Matrix multiplication of two LazyTensors

Hello, thanks for the nice package!

I am running into a problem when trying to decompose a matrix as the product of two LazyTensors. The matrix I want to represent has shape [N, N] and can be expressed as the product of an [M, N] matrix and its transpose. The snippet below shows what I'd like to do in more detail:

import torch
from pykeops.torch import LazyTensor

# Set up inputs
M, N, d = 10, 5, 2
x, y = torch.rand([M, d]), torch.rand([N, d])

# Construct kernel matrix
x_i = LazyTensor(x[:, None, :])  # (M, 1, 2)
y_j = LazyTensor(y[None, :, :])  # (1, N, 2)
D_ij = ((x_i - y_j) ** 2).sum(-1)   # (M, N): squared distances
sqrt_K = (-D_ij).exp()

K = sqrt_K.t() @ sqrt_K  # does not work
# next step: run K.solve(...)

The line defining K does not work, since __matmul__ calls view() on its argument, which LazyTensor does not support. I also don't see how to write a reduction formula for K: such formulas seem to involve only two indices, while three are needed here. Is there some other way I can construct the matrix K?
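Since K = sqrt_K^T sqrt_K, a product K v never needs a three-index formula: it can be computed as two successive two-index reductions, first u = sqrt_K v (summing over j) and then w = sqrt_K^T u (summing over i). A solver that only needs matrix-vector products, such as conjugate gradient, can then use K implicitly. The sketch below shows this in plain Python, so it says nothing about the LazyTensor API itself; my assumption is that each of the two reductions could be written as a separate KeOps reduction, but I have not checked which PyKeOps calls would express that. The ridge term alpha and all helper names are illustrative, not part of the original code.

```python
import math


def sqrt_k(xi, yj):
    """Entry A_ij = exp(-|x_i - y_j|^2) of the (M, N) matrix called sqrt_K above."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(xi, yj)))


def apply_sqrt_k(x, y, v):
    """u = A v: reduction over j, one output per x_i (length M)."""
    return [sum(sqrt_k(xi, yj) * vj for yj, vj in zip(y, v)) for xi in x]


def apply_sqrt_k_t(x, y, u):
    """w = A^T u: reduction over i, one output per y_j (length N)."""
    return [sum(sqrt_k(xi, yj) * ui for xi, ui in zip(x, u)) for yj in y]


def apply_k(x, y, v, alpha=0.0):
    """(K + alpha I) v with K = A^T A, computed without ever forming K."""
    w = apply_sqrt_k_t(x, y, apply_sqrt_k(x, y, v))
    return [wj + alpha * vj for wj, vj in zip(w, v)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=100):
    """Solve matvec(z) = b for a symmetric positive definite operator."""
    z = [0.0] * len(b)
    r = list(b)  # residual b - matvec(z) for the initial guess z = 0
    p = list(r)  # current search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs) < tol:
            break
        ap = matvec(p)
        step = rs / dot(p, ap)
        z = [zi + step * pi for zi, pi in zip(z, p)]
        r = [ri - step * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return z


# Deterministic stand-ins for torch.rand([M, d]) and torch.rand([N, d]).
M, N = 10, 5
x = [(i / M, (i * i % 7) / 7) for i in range(M)]
y = [(j / N, 0.5) for j in range(N)]
b = [1.0] * N
z = conjugate_gradient(lambda v: apply_k(x, y, v, alpha=0.1), b)
```

The small ridge term keeps the system well conditioned; with alpha = 0, K is only positive semi-definite in general and the plain CG loop above may struggle.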

Prebuilt binaries do not work with pytorch v1.5.0

Torch 1.5.0 was released four days ago, and it breaks pip installations of pykeops.

The following commands install pykeops using pip

pip3.6 install pykeops
pip3.7 install pykeops
pip3.8 install pykeops

but yield the same error on module import for every python version I tried.

This seems to be linked to a change of interface in the torch library:
pykeops-1.4-cpython-38/libKeOpstorch4770b04be2.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZTIN3c1021AutogradMetaInterfaceE

A simple temporary fix is to downgrade torch manually: pip3 install --upgrade pykeops torch==1.4.0
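Until fixed binaries are available, a script can at least warn before importing pykeops. The sketch below compares version numbers numerically, since a plain string comparison would rank "1.10" below "1.5". It assumes, based only on this report, that torch 1.5.0 and later are incompatible with the prebuilt pykeops 1.4 binaries; only 1.5.0 was actually reported, and the function names are hypothetical. importlib.metadata requires Python 3.8+, so the helper returns None on older versions.

```python
import re


def parse_version(text):
    """Turn '1.5.0+cu101' or '1.10' into a comparable tuple like (1, 5, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("unrecognised version string: {!r}".format(text))
    return tuple(int(g) if g is not None else 0 for g in match.groups())


def torch_may_break_pykeops_14(torch_version):
    # Assumption taken from this report: the prebuilt pykeops 1.4 binaries
    # fail to load with torch 1.5.0 (undefined c10 symbol); later releases
    # are treated the same way, which has not been verified.
    return parse_version(torch_version) >= (1, 5, 0)


def installed_version(dist="torch"):
    """Installed version of a distribution, or None if it cannot be determined."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        return None
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("torch")
    if found is not None and torch_may_break_pykeops_14(found):
        print("torch", found, "may not work with the prebuilt pykeops 1.4;",
              "pinning torch==1.4.0 was reported to help.")
```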

Pytorch `.contiguous()` not enough to make tensors contiguous for keops?

Hi,

Suppose I have a PyTorch tensor X. I take a slice of it and make it contiguous with X_ = X[:10].contiguous(), then use KeOps to do some computation on X_. According to the PyTorch docs, this should be enough to make the data contiguous, since .contiguous() returns a tensor with the same data laid out contiguously in memory.

KeOps works fine if I compute with X. However, it fails on X_ despite the .contiguous() call: KeOps still reports a non-contiguous array, with the following message:

RuntimeError: [KeOps] Arg number 3 : is not contiguous. Please provide 'contiguous' data array, as KeOps does not support strides. If you're getting this error in the 'backward' pass of a code using torch.sum() on the output of a KeOps routine, you should consider replacing 'a.sum()' with '(1. * a).sum()' or 'torch.dot(a.view(-1), torch.ones_like(a).view(-1))'.

Is this expected? And is there another way of making my tensor contiguous enough for KeOps?
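KeOps rejects strided inputs, so what matters is whether each argument's shape and strides describe a dense row-major layout. The sketch below mimics that test in plain Python, with strides counted in elements and size-1 dimensions ignored, which is my understanding of PyTorch's rule. It points at two things worth checking, both assumptions rather than a diagnosis: a row slice such as X[:10] of a row-major X is already contiguous, so .contiguous() should not change anything, and the error concerns argument number 3, which may not be X_ at all. It also shows why the message mentions torch.sum(): as I understand it, the gradient of a sum is an expanded tensor whose strides are 0.

```python
def c_strides(shape):
    """Row-major (C-order) strides, in elements, for a given shape."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def is_c_contiguous(shape, strides):
    """True if (shape, strides) describes a dense row-major layout.

    Dimensions of size 1 are skipped, since their stride is never used.
    """
    expected = 1
    for size, stride in zip(reversed(shape), reversed(strides)):
        if size == 1:
            continue
        if stride != expected:
            return False
        expected *= size
    return True


full = ((100, 3), c_strides((100, 3)))  # X: shape (100, 3), strides (3, 1)
rows = ((10, 3), (3, 1))                # X[:10] keeps X's strides
cols = ((100, 2), (3, 1))               # X[:, :2] skips one element per row
expanded = ((100, 3), (0, 0))           # broadcast tensor, e.g. a sum's gradient
for name, (shape, strides) in [("X", full), ("X[:10]", rows),
                               ("X[:, :2]", cols), ("expanded", expanded)]:
    print(name, is_c_contiguous(shape, strides))
```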

Library missing: libKeOpstorchf44721a1c0

I tried to run the convolution benchmark at https://www.kernel-operations.io/keops/_auto_benchmarks/plot_benchmark_convolutions.html#sphx-glr-auto-benchmarks-plot-benchmark-convolutions-py and got the following output:

Timings for 10000x10000 convolutions:
kernel: gaussian
Compiling libKeOpstorchf44721a1c0 in /home/thomas/.cache/pykeops-1.2-cpython-36//build-libKeOpstorchf44721a1c0:
formula: Sum_Reduction((Exp( -(WeightedSqDist(G_0,X_0,Y_0))) * B_0),0)
aliases: G_0 = Pm(0,1); X_0 = Vi(1,3); Y_0 = Vj(2,3); B_0 = Vj(3,3);
dtype : float32
...
--------------------- CMAKE DEBUG -----------------
Command '['cmake', '/home/thomas/.cache/pypoetry/virtualenvs/superpoint-graph-job-py3.6/lib/python3.6/site-packages/pykeops', '-DCMAKE_BUILD_TYPE=Release', '-DFORMULA_OBJ=Sum_Reduction((Exp( -(WeightedSqDist(G_0,X_0,Y_0))) * B_0),0)', '-DVAR_ALIASES=auto G_0 = Pm(0,1); auto X_0 = Vi(1,3); auto Y_0 = Vj(2,3); auto B_0 = Vj(3,3); ', '-Dshared_obj_name=libKeOpstorchf44721a1c0', '-D__TYPE__=float', '-DPYTHON_LANG=torch', '-DC_CONTIGUOUS=1', '-DPYTORCH_INCLUDE_DIR=/home/thomas/.cache/pypoetry/virtualenvs/superpoint-graph-job-py3.6/lib/python3.6/site-packages/torch/include;/home/thomas/.cache/pypoetry/virtualenvs/superpoint-graph-job-py3.6/lib/python3.6/site-packages/torch/include/torch/csrc/api/include', '-DcommandLine=cmake /home/thomas/.cache/pypoetry/virtualenvs/superpoint-graph-job-py3.6/lib/python3.6/site-packages/pykeops -DCMAKE_BUILD_TYPE=Release -DFORMULA_OBJ=Sum_Reduction((Exp( -(WeightedSqDist(G_0,X_0,Y_0))) * B_0),0) -DVAR_ALIASES=auto G_0 = Pm(0,1); auto X_0 = Vi(1,3); auto Y_0 = Vj(2,3); auto B_0 = Vj(3,3); -Dshared_obj_name=libKeOpstorchf44721a1c0 -D__TYPE__=float -DPYTHON_LANG=torch -DC_CONTIGUOUS=1 -DPYTORCH_INCLUDE_DIR=/home/thomas/.cache/pypoetry/virtualenvs/superpoint-graph-job-py3.6/lib/python3.6/site-packages/torch/include;/home/thomas/.cache/pypoetry/virtualenvs/superpoint-graph-job-py3.6/lib/python3.6/site-packages/torch/include/torch/csrc/api/include']' returned non-zero exit status 1.
-- The CXX compiler identification is GNU 7.4.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Compute properties automatically set to: -DMAXIDGPU=0;-DMAXTHREADSPERBLOCK0=1024;-DSHAREDMEMPERBLOCK0=49152
-- The CUDA compiler identification is NVIDIA 9.1.85
-- Check for working CUDA compiler: /usr/bin/nvcc
-- Check for working CUDA compiler: /usr/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- The CUDA Host CXX Compiler: /usr/bin/c++
-- Autodetected CUDA architecture(s): 7.5
-- Using shared_obj_name: libKeOpstorchf44721a1c0
-- Found PythonInterp: /home/thomas/.pyenv/shims/python3.7 (found version "1.4")
-- Configuring incomplete, errors occurred!
See also "/home/thomas/.cache/pykeops-1.2-cpython-36/build-libKeOpstorchf44721a1c0/CMakeFiles/CMakeOutput.log".
See also "/home/thomas/.cache/pykeops-1.2-cpython-36/build-libKeOpstorchf44721a1c0/CMakeFiles/CMakeError.log".


--------------------- MAKE DEBUG -----------------
Command '['cmake', '--build', '.', '--target', 'libKeOpstorchf44721a1c0', '--', 'VERBOSE=1']' returned non-zero exit status 2.


Done.

ModuleNotFoundError Traceback (most recent call last)
in
16 }
17
---> 18 g_keops = kernel_product(params, xc, yc, bc, mode='sum').cpu()
19 torch.cuda.synchronize()
20 speed_pykeops[k] = np.array(timeit.repeat(

~/.cache/pypoetry/virtualenvs/superpoint-graph-job-py3.6/lib/python3.6/site-packages/pykeops/torch/kernel_product/kernels.py in kernel_product(params, x, y, mode, backend, dtype, cuda_type, *bs)
410 if not y.__class__ in [tuple, list]: y = (y,)
411
--> 412 return FeaturesKP(kernel, gamma, x, y, bs, mode=mode, backend=backend, dtype=dtype)

~/.cache/pypoetry/virtualenvs/superpoint-graph-job-py3.6/lib/python3.6/site-packages/pykeops/torch/kernel_product/features_kernels.py in FeaturesKP(kernel, gs, xs, ys, bs, mode, backend, dtype)
163 genconv = Genred(formula, aliases, reduction_op=red, axis=axis, dtype=dtype)
164
--> 165 return genconv(*full_args, backend=backend)

~/.cache/pypoetry/virtualenvs/superpoint-graph-job-py3.6/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py in __call__(self, backend, device_id, ranges, *args)
349
350 """
--> 351 out = GenredAutograd.apply(self.formula, self.aliases, backend, self.dtype, device_id, ranges, *args)
352 nx, ny = get_sizes(self.aliases, *args)
353 nout = nx if self.axis==1 else ny

~/.cache/pypoetry/virtualenvs/superpoint-graph-job-py3.6/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py in forward(ctx, formula, aliases, backend, dtype, device_id, ranges, *args)
19
20 myconv = LoadKEops(formula, aliases, dtype, 'torch',
---> 21 ['-DPYTORCH_INCLUDE_DIR=' + ';'.join(include_dirs)]).import_module()
22
23 # Context variables: save everything to compute the gradient:

~/.cache/pypoetry/virtualenvs/superpoint-graph-job-py3.6/lib/python3.6/site-packages/pykeops/common/keops_io.py in import_module(self)
50
51 def import_module(self):
---> 52 return importlib.import_module(self.dll_name)

~/.pyenv/versions/3.6.8/lib/python3.6/importlib/__init__.py in import_module(name, package)
124 break
125 level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)
127
128

~/.pyenv/versions/3.6.8/lib/python3.6/importlib/_bootstrap.py in _gcd_import(name, package, level)

~/.pyenv/versions/3.6.8/lib/python3.6/importlib/_bootstrap.py in _find_and_load(name, import_)

~/.pyenv/versions/3.6.8/lib/python3.6/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

ModuleNotFoundError: No module named 'libKeOpstorchf44721a1c0'

ABI incompatibility: -D_GLIBCXX_USE_CXX11_ABI=0

I was getting a weird error message when I tried to use pykeops.
Basically this one:

undefined symbol: _ZN2at5ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

After doing some sleuthing, I figured out that my distribution (NixOS) compiles PyTorch with GCC 7 and thus uses the new C++11 ABI of libstdc++, which was introduced with GCC 5.1. I then fixed the issue locally by deleting the line in CMakeLists.txt that sets -D_GLIBCXX_USE_CXX11_ABI=0.

Then I dug into the PyTorch documentation but couldn't find any reference to setting -D_GLIBCXX_USE_CXX11_ABI=0 while compiling, and I couldn't figure out which GCC version is recommended for PyTorch either. I am trying to figure out who is responsible for the ABI incompatibility (Nix, the PyTorch documentation, or pykeops) so that I can open an issue in the right place. Are people still compiling PyTorch with GCC 4.8?
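Rather than hard-coding the flag, one option is to ask the installed PyTorch which libstdc++ ABI it was built with and derive the matching `-D_GLIBCXX_USE_CXX11_ABI` define from it. A minimal sketch, assuming the private attribute `torch._C._GLIBCXX_USE_CXX11_ABI` (PyTorch's own `torch.utils.cpp_extension` reads it for the same purpose, but it is an implementation detail); the helper names are hypothetical and not part of pykeops.

```python
def cxx11_abi_define(use_cxx11_abi):
    """Return the compiler define selecting the libstdc++ ABI (1 = new C++11 ABI, 0 = old)."""
    return "-D_GLIBCXX_USE_CXX11_ABI={}".format(int(use_cxx11_abi))


def torch_cxx11_abi():
    """Return True/False for the ABI the installed PyTorch was built with, or None if unknown.

    Relies on the private attribute torch._C._GLIBCXX_USE_CXX11_ABI (an
    implementation detail of PyTorch, not a stable public API).
    """
    try:
        import torch
    except ImportError:
        return None  # PyTorch not installed: nothing to match against
    return getattr(torch._C, "_GLIBCXX_USE_CXX11_ABI", None)


if __name__ == "__main__":
    abi = torch_cxx11_abi()
    if abi is None:
        print("Could not determine the C++ ABI of the installed PyTorch")
    else:
        # Use this define instead of a hard-coded -D_GLIBCXX_USE_CXX11_ABI=0.
        print("Matching flag:", cxx11_abi_define(abi))
```

On a NixOS PyTorch built with the new ABI this would report `=1`, which is the value the local CMakeLists.txt edit effectively restores.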

Why does PyKeOps require GCC >= 7?

Hi,

The installation instructions for PyKeOps list the following requirements:

A C++ compiler compatible with std=c++14: g++ version >=7 or clang++ version >=8.

But according to the GCC website

https://gcc.gnu.org/projects/cxx-status.html

GCC versions 5.x and 6.x should already fully implement C++14. Should it therefore be possible to install PyKeOps with GCC 6.3.0? Or is there another reason to require GCC >= 7? That version would already fully implement C++17.

Best regards

Sam
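Whatever the reason behind the requirement, a quick way to see which side of it a machine falls on is to parse the first line of `g++ --version`. A minimal sketch; the threshold 7 is taken from the quoted install instructions, and `parse_gcc_major` / `installed_gcc_major` are hypothetical helpers.

```python
import re
import shutil
import subprocess

MIN_GCC_MAJOR = 7  # minimum quoted from the PyKeOps installation instructions


def parse_gcc_major(version_output):
    """Extract the major version from `g++ --version` output, or None if absent."""
    first_line = version_output.splitlines()[0] if version_output else ""
    # GCC prints its version as the last dotted number on the first line,
    # e.g. "gcc (Ubuntu 7.4.0-1ubuntu1~16.04~ppa1) 7.4.0".
    majors = re.findall(r"(\d+)\.\d+(?:\.\d+)?", first_line)
    return int(majors[-1]) if majors else None


def installed_gcc_major(compiler="g++"):
    """Run `<compiler> --version` if the compiler is on PATH and parse its major version."""
    path = shutil.which(compiler)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=False)
    return parse_gcc_major(result.stdout)


if __name__ == "__main__":
    major = installed_gcc_major()
    print("g++ major version:", major)
    print("meets documented minimum:", major is not None and major >= MIN_GCC_MAJOR)
```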

ImportError due to pybind11 autodetecting python version in CMakeLists.txt

Steps to reproduce:

  1. Have two or more different versions of Python 3 installed (python3.5 and python3.6, for example):
     sudo apt-get install python3.5-dev python3.6-dev
     curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
     python3.5 get-pip.py
     python3.6 get-pip.py
  2. Install pykeops using pip (or from sources):
     pip3.5 install numpy && pip3.5 install pykeops
     pip3.6 install numpy && pip3.6 install pykeops

     This will install two different pykeops modules:
     <module 'pykeops' from '/usr/local/lib/python3.5/dist-packages/pykeops/__init__.py'>
     <module 'pykeops' from '/usr/local/lib/python3.6/dist-packages/pykeops/__init__.py'>

  3. Create a Python file test.py containing:
     import pykeops
     pykeops.clean_pykeops()          # just in case old build files are still present
     pykeops.test_numpy_bindings()    # perform the compilation
  4. The test script will fail for at least one Python version:
     • python3.5 test.py fails with ImportError: dynamic module does not define module export function (PyInit_libKeOpsnumpyb10acd1892)
     • python3.6 test.py succeeds with "pyKeOps with numpy bindings is working!"

What really happens:

This happens because pykeops/common/compile_routines.py calls cmake on pykeops/CMakeLists.txt which contains add_subdirectory(pybind11) which will detect python3.6 by default (in this specific case). This generates a shared library <CACHE_DIR>/pykeops-1.3-cpython-35/libKeOpsnumpyb10acd1892.so targeting python 3.6 instead of 3.5 and the importlib module fails to load the library.

How to fix:

The simplest solution I found is to enforce the python version directly in the build script by using the PYBIND11_PYTHON_VERSION variable. Adding set(PYBIND11_PYTHON_VERSION 3.5) at the beginning of /usr/local/lib/python3.5/dist-packages/pykeops/CMakeLists.txt fixes the problem.

I imagine this could be done automatically by detecting the Python version during the build or before installation. This fix could solve many issues, such as #2, #8, #28, #37 and others.
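The automatic version of this fix amounts to deriving the CMake flags from the interpreter that is running the build. A minimal sketch, assuming the bundled pybind11 uses its legacy `FindPythonLibsNew` lookup, which honours both `PYBIND11_PYTHON_VERSION` (the variable used in the workaround above) and `PYTHON_EXECUTABLE`; `pybind11_cmake_args` is a hypothetical helper, and how it would be wired into `compile_routines.py` is left open.

```python
import sys


def pybind11_cmake_args(version_info=sys.version_info, executable=sys.executable):
    """Build CMake -D flags that pin pybind11 to a given (default: the running) interpreter."""
    major, minor = version_info[0], version_info[1]
    return [
        "-DPYBIND11_PYTHON_VERSION={}.{}".format(major, minor),  # e.g. 3.5 under python3.5
        "-DPYTHON_EXECUTABLE={}".format(executable),  # the exact binary, not just a version
    ]


if __name__ == "__main__":
    # Hypothetical: these flags would be appended to the cmake command that builds the module.
    print(" ".join(["cmake", "<source dir>"] + pybind11_cmake_args()))
```

Passing the executable as well as the version guards against two installs of the same minor version (for example a system Python and a conda one) being confused.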

Feature Request: Support FP16

Hi!

Thank you for this library, I greatly enjoy using it!

This is more of a request for KeOps to support FP16 in PyTorch, so that we could combine KeOps with apex for even faster GPU computation.

Thank you for your consideration!

Best regards,
Robert

Is it possible to parallelize computations across GPUs?

Hi,

I'm trying to parallelize computations on multiple GPUs with KeOps, but it seems like the computation happens sequentially across the GPUs. What I'm doing is:

import torch
from gpytorch.kernels.keops import RBFKernel

# Instantiate a Module on every GPU
rbfs = [RBFKernel().to(d) for d in range(2)]

# Instantiate the tensors on every GPU
xs = [torch.randn(5000, 1).to(d) for d in range(2)]

# Create a wrapper around a keops.torch.LazyTensor on each device that carries out the kernel matrix multiplication 
lztsrs = [rbf.forward(x, x) for rbf, x in zip(rbfs, xs)]

# Get the actual Pytorch kernel tensors by multiplying by the identity matrix
res = [t.evaluate() for t in lztsrs]

However, according to the GPU usage in nvidia-smi, the matrix multiplications are happening sequentially since only one GPU has 100% utilization at a time.

On the other hand, in pytorch for example, the following will dispatch the computations in parallel and all GPUs will simultaneously have high usage:

import torch

xs = [torch.randn(30000, 30000, device=f"cuda:{i}") for i in range(2)]
res = [x @ x for x in xs]

Is there any way to run KeOps computations on each GPU in parallel in the same way?
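The PyTorch version overlaps because CUDA kernel launches are asynchronous: `x @ x` returns to Python before the GPU has finished. If a KeOps call instead blocks the calling thread until its result is ready (an assumption, consistent with the sequential `nvidia-smi` pattern described above), a generic workaround is to issue each device's call from its own host thread. A minimal sketch with the standard-library `concurrent.futures`; `run_on_each_device` is a hypothetical helper, and the calls only overlap if the wrapped function releases the GIL while it waits on the GPU, which is not verified here for KeOps.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_each_device(fn, per_device_args):
    """Call fn(*args) once per entry of per_device_args, each in its own host thread.

    Results come back in the same order as per_device_args. Calls can only run
    concurrently if fn releases the GIL while the GPU is busy (assumption).
    """
    with ThreadPoolExecutor(max_workers=max(1, len(per_device_args))) as pool:
        futures = [pool.submit(fn, *args) for args in per_device_args]
        return [future.result() for future in futures]
```

With the objects from the question, the call would be `res = run_on_each_device(lambda t: t.evaluate(), [(t,) for t in lztsrs])`; `nvidia-smi` would then show whether both GPUs are busy at the same time.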

ImportError: dynamic module does not define module export function

Hi all,

I'm facing the same problem as well. I installed gcc 7.4 and nvcc 10.0 and still getting the same problem. Any ideas ?

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

gcc --version
gcc (Ubuntu 7.4.0-1ubuntu1~16.04~ppa1) 7.4.0
Copyright (C) 2017 Free Software Foundation, Inc.

Compiling libKeOpsnumpy73a835aa5f in /home/hassanhaija/.cache/pykeops-1.0.2/:
formula: Sum_Reduction(-SqNorm2(x-y),1)
aliases: x = Vi(0,3); y = Vj(1,3);
dtype : float64
... -- The CXX compiler identification is GNU 7.4.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Compute properties automatically set to: -DMAXIDGPU=1;-DMAXTHREADSPERBLOCK0=1024;-DSHAREDMEMPERBLOCK0=49152;-DMAXTHREADSPERBLOCK1=1024;-DSHAREDMEMPERBLOCK1=49152
-- The CUDA compiler identification is NVIDIA 10.0.130
-- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- The CUDA Host CXX Compiler: /usr/bin/c++
-- Autodetected CUDA architecture(s): 6.1 6.1
-- Using shared_obj_name: libKeOpsnumpy73a835aa5f
-- Found PythonInterp: /home/hassanhaija/anaconda3/bin/python3.7 (found version "3.7.1")
-- Found PythonLibs: /home/hassanhaija/anaconda3/lib/libpython3.7m.so
-- Performing Test HAS_CPP14_FLAG
-- Performing Test HAS_CPP14_FLAG - Success
-- pybind11 v2.2.4
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- LTO enabled
-- Configuring done
-- Generating done
-- Build files have been written to: /home/hassanhaija/.cache/pykeops-1.0.2

[ 20%] Building NVCC (Device) object CMakeFiles/keopslibKeOpsnumpy73a835aa5f.dir/keops/core/keopslibKeOpsnumpy73a835aa5f_generated_link_autodiff.cu.o
Scanning dependencies of target keopslibKeOpsnumpy73a835aa5f
[ 40%] Linking CUDA device code CMakeFiles/keopslibKeOpsnumpy73a835aa5f.dir/cmake_device_link.o
[ 60%] Linking CXX shared library libKeOpsnumpy73a835aa5f.so
[ 60%] Built target keopslibKeOpsnumpy73a835aa5f
Scanning dependencies of target libKeOpsnumpy73a835aa5f
[ 80%] Building CXX object CMakeFiles/libKeOpsnumpy73a835aa5f.dir/numpy/generic/generic_red.cpp.o
[100%] Linking CXX shared module libKeOpsnumpy73a835aa5f.cpython-37m-x86_64-linux-gnu.so
[100%] Built target libKeOpsnumpy73a835aa5f

Done.
Traceback (most recent call last):
  File "/home/hassanhaija/anaconda3/envs/py36/lib/python3.6/site-packages/pykeops/common/keops_io.py", line 45, in load_keops
    return importlib.import_module(dll_name)
  File "/home/hassanhaija/anaconda3/envs/py36/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'libKeOpsnumpy73a835aa5f'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hassanhaija/anaconda3/envs/py36/lib/python3.6/site-packages/pykeops/common/keops_io.py", line 29, in _safe_compile_and_load
    return importlib.import_module(dll_name)
  File "/home/hassanhaija/anaconda3/envs/py36/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'libKeOpsnumpy73a835aa5f'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "keopstest.py", line 9, in <module>
    my_conv = Genred('-SqNorm2(x-y)', ['x = Vi(3)', 'y = Vj(3)'])
  File "/home/hassanhaija/anaconda3/envs/py36/lib/python3.6/site-packages/pykeops/numpy/generic/generic_red.py", line 114, in __init__
    self.myconv = load_keops(self.formula, self.aliases, self.dtype, 'numpy')
  File "/home/hassanhaija/anaconda3/envs/py36/lib/python3.6/site-packages/pykeops/common/keops_io.py", line 48, in load_keops
    return _safe_compile_and_load(formula, aliases, dll_name, dtype, lang, optional_flags)
  File "/home/hassanhaija/anaconda3/envs/py36/lib/python3.6/site-packages/pykeops/common/utils.py", line 70, in wrapper_filelock
    return func(*args, **kwargs)
  File "/home/hassanhaija/anaconda3/envs/py36/lib/python3.6/site-packages/pykeops/common/keops_io.py", line 34, in _safe_compile_and_load
    return importlib.import_module(dll_name)
  File "/home/hassanhaija/anaconda3/envs/py36/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: dynamic module does not define module export function (PyInit_libKeOpsnumpy73a835aa5f)

Originally posted by @hassanhaija in #2 (comment)
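The log above points at an interpreter mismatch similar to the pybind11 report: CMake found `/home/hassanhaija/anaconda3/bin/python3.7` and linked `libKeOpsnumpy73a835aa5f.cpython-37m-x86_64-linux-gnu.so`, while the traceback comes from the Python 3.6 environment `anaconda3/envs/py36`. A 3.6 interpreter will not import a `cpython-37m` extension; plausibly it picks up the untagged `libKeOpsnumpy73a835aa5f.so` built in the same directory instead, which has no `PyInit_` function. A minimal sketch (hypothetical helpers) to compare an extension's CPython tag with the running interpreter:

```python
import re
import sys


def extension_python_tag(filename):
    """Return the CPython version tag (e.g. "37") embedded in an extension filename, or None."""
    match = re.search(r"\.cpython-(\d+)", filename)
    return match.group(1) if match else None


def built_for_running_python(filename):
    """True if the file has no CPython tag or a tag matching the running interpreter."""
    tag = extension_python_tag(filename)
    running = "{}{}".format(sys.version_info[0], sys.version_info[1])
    return tag is None or tag == running
```

Under the Python 3.6 environment from the traceback, `built_for_running_python("libKeOpsnumpy73a835aa5f.cpython-37m-x86_64-linux-gnu.so")` would be `False`.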

Sample Test and GeomLoss Sample Error

Hello,

Thank you very much for your amazing work!

I am running Ubuntu 18.04.3. I installed Python 3.7 using Anaconda, then CUDA 10.1 and PyTorch, and then KeOps. I did not receive any error when installing pykeops, but when I tested the installation, none of the test scripts passed.

The error message from the Pytorch script was:

/usr/include/crt/host_config.h:121:2: error: #error -- unsupported GNU version! gcc versions later than 6 are not supported!
#error -- unsupported GNU version! gcc versions later than 6 are not supported!
^~~~~
CMake Error at keopslibKeOpstorch91c92bd508_generated_link_autodiff.cu.o.Release.cmake:219 (message):
Error generating
/home/velysianp/.cache/pykeops-1.2-cpython-37/build-
libKeOpstorch91c92bd508/CMakeFiles/keopslibKeOpstorch91c92bd508.dir/keops/core/./keopslibKeOpstorch91c92bd508_generated_link_autodiff.cu.o

make[3]: *** [CMakeFiles/keopslibKeOpstorch91c92bd508.dir/keops/core/keopslibKeOpstorch91c92bd508_generated_link_autodiff.cu.o] Error 1
make[2]: *** [CMakeFiles/keopslibKeOpstorch91c92bd508.dir/all] Error 2
make[1]: *** [CMakeFiles/libKeOpstorch91c92bd508.dir/rule] Error 2
make: *** [libKeOpstorch91c92bd508] Error 2

I checked my system with gcc --version, and it returned gcc 7.4.0, which seems to be the Ubuntu 18.04.3 x86_64 default. So I am not sure what I should do about this error.

I also received an error when I tried to run the sample code from GeomLoss:

Traceback (most recent call last):
  File "plot_optimal_transport_2D.py", line 149, in <module>
    gradient_descent( SamplesLoss("sinkhorn", p=2, blur=.1) )
  File "plot_optimal_transport_2D.py", line 107, in gradient_descent
    L_αβ = loss(x_i, y_j)
  File "/home/me/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/me/anaconda3/lib/python3.7/site-packages/geomloss/samples_loss.py", line 237, in forward
    verbose = self.verbose )
  File "/home/me/anaconda3/lib/python3.7/site-packages/geomloss/sinkhorn_samples.py", line 102, in sinkhorn_online
    C_xx, C_yy, C_xy, C_yx, ε_s, ρ, debias=debias )
  File "/home/me/anaconda3/lib/python3.7/site-packages/geomloss/sinkhorn_divergence.py", line 151, in sinkhorn_loop
    a_x = λ * softmin(ε, C_xx, α_log ) # OT(α,α)
  File "/home/me/anaconda3/lib/python3.7/site-packages/geomloss/sinkhorn_samples.py", line 69, in softmin_online
    return - ε * log_conv( x, y, f_y.view(-1,1), torch.Tensor([1/ε]).type_as(x) ).view(-1)
  File "/home/me/anaconda3/lib/python3.7/site-packages/pykeops/torch/generic/generic_red.py", line 351, in __call__
    out = GenredAutograd.apply(self.formula, self.aliases, backend, self.dtype, device_id, ranges, *args)
  File "/home/me/anaconda3/lib/python3.7/site-packages/pykeops/torch/generic/generic_red.py", line 43, in forward
    *args)
RuntimeError: [KeOps] This KeOps shared object has been compiled without cuda support: try to set tagHostDevice to 0 or recompile the formula with a working version of cuda.

I also checked my CUDA and PyTorch installations, and they turned out to be working all right. So what should I do about these errors?

Thank you very much!
Elyson

Impossible to install RKeops

When I try
install.packages("rkeops")
I get

Warning in install.packages :
  package ‘rkeops’ is not available (for R version 3.6.2) 

When I try
devtools::install_git("https://github.com/getkeops/keops", subdir = "rkeops", args="--recurse-submodules='keops/lib/sequences'")
I get

Error: Failed to install 'unknown package' from Git:
  Error in 'git2r_remote_ls': there is no TLS stream available

I'm running R3.6.2 on Ubuntu 18.04.4 :

platform       x86_64-pc-linux-gnu         
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          3                           
minor          6.2                         
year           2019                        
month          12                          
day            12                          
svn rev        77560                       
language       R                           
version.string R version 3.6.2 (2019-12-12)
nickname       Dark and Stormy Night 

Any idea what is causing this problem?

Changes between 1.1.2 and 1.1.1?

After the update from 1.1.1 to 1.1.2, I'm having issues with using keops on two different computers when running the basic installation code:

import torch
import pykeops.torch as pktorch

x = torch.arange(1, 10, dtype=torch.float32).view(-1, 3)
y = torch.arange(3, 9, dtype=torch.float32).view(-1, 3)

my_conv = pktorch.Genred('SqNorm2(x-y)', ['x = Vi(3)', 'y = Vj(3)'])
print(my_conv(x, y))

On one device, I'm getting

>>> import torch
>>> import pykeops.torch as pktorch
>>> 
>>> x = torch.arange(1, 10, dtype=torch.float32).view(-1, 3)
>>> y = torch.arange(3, 9, dtype=torch.float32).view(-1, 3)
>>> 
>>> my_conv = pktorch.Genred('SqNorm2(x-y)', ['x = Vi(3)', 'y = Vj(3)'])
>>> print(my_conv(x, y))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/alex/miniconda3/envs/gpytorch/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py", line 351, in __call__
    out = GenredAutograd.apply(self.formula, self.aliases, backend, self.dtype, device_id, ranges, *args)
  File "/home/alex/miniconda3/envs/gpytorch/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py", line 21, in forward
    ['-DPYTORCH_INCLUDE_DIR=' + ';'.join(include_dirs)]).import_module()
  File "/home/alex/miniconda3/envs/gpytorch/lib/python3.6/site-packages/pykeops/common/keops_io.py", line 52, in import_module
    return importlib.import_module(self.dll_name)
  File "/home/alex/miniconda3/envs/gpytorch/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed

On another device, I'm getting

>>> import torch
>>> import pykeops.torch as pktorch
7.4.0
>>> 
>>> x = torch.arange(1, 10, dtype=torch.float32).view(-1, 3)
>>> y = torch.arange(3, 9, dtype=torch.float32).view(-1, 3)
>>> 
>>> my_conv = pktorch.Genred('SqNorm2(x-y)', ['x = Vi(3)', 'y = Vj(3)'])
>>> print(my_conv(x, y))
Compiling libKeOpstorch91c92bd508 in /home/alex_w/.cache/pykeops-1.1.2-cpython-37//build-libKeOpstorch91c92bd508:
       formula: Sum_Reduction(SqNorm2(x-y),1)
       aliases: x = Vi(0,3); y = Vj(1,3); 
       dtype  : float32
... CMake Error at pybind11/tools/FindPythonLibsNew.cmake:127 (message):
  Python config failure: Python is 0-bit, chosen compiler is 64-bit
Call Stack (most recent call first):
  pybind11/tools/pybind11Tools.cmake:16 (find_package)
  pybind11/CMakeLists.txt:33 (include)



--------------------- CMAKE DEBUG -----------------
Command '['cmake', '/home/alex_w/miniconda3/envs/rl/lib/python3.7/site-packages/pykeops', '-DCMAKE_BUILD_TYPE=Release', '-DFORMULA_OBJ=Sum_Reduction(SqNorm2(x-y),1)', '-DVAR_ALIASES=auto x = Vi(0,3); auto y = Vj(1,3); ', '-Dshared_obj_name=libKeOpstorch91c92bd508', '-D__TYPE__=float', '-DPYTHON_LANG=torch', '-DC_CONTIGUOUS=1', '-DPYTORCH_INCLUDE_DIR=/home/alex_w/miniconda3/envs/rl/lib/python3.7/site-packages/torch/include;/home/alex_w/miniconda3/envs/rl/lib/python3.7/site-packages/torch/include/torch/csrc/api/include', '-DcommandLine=cmake /home/alex_w/miniconda3/envs/rl/lib/python3.7/site-packages/pykeops -DCMAKE_BUILD_TYPE=Release -DFORMULA_OBJ=Sum_Reduction(SqNorm2(x-y),1) -DVAR_ALIASES=auto x = Vi(0,3); auto y = Vj(1,3);  -Dshared_obj_name=libKeOpstorch91c92bd508 -D__TYPE__=float -DPYTHON_LANG=torch -DC_CONTIGUOUS=1 -DPYTORCH_INCLUDE_DIR=/home/alex_w/miniconda3/envs/rl/lib/python3.7/site-packages/torch/include;/home/alex_w/miniconda3/envs/rl/lib/python3.7/site-packages/torch/include/torch/csrc/api/include']' returned non-zero exit status 1.
-- The CXX compiler identification is GNU 7.4.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Compute properties automatically set to: -DMAXIDGPU=7;-DMAXTHREADSPERBLOCK0=1024;-DSHAREDMEMPERBLOCK0=49152;-DMAXTHREADSPERBLOCK1=1024;-DSHAREDMEMPERBLOCK1=49152;-DMAXTHREADSPERBLOCK2=1024;-DSHAREDMEMPERBLOCK2=49152;-DMAXTHREADSPERBLOCK3=1024;-DSHAREDMEMPERBLOCK3=49152;-DMAXTHREADSPERBLOCK4=1024;-DSHAREDMEMPERBLOCK4=49152;-DMAXTHREADSPERBLOCK5=1024;-DSHAREDMEMPERBLOCK5=49152;-DMAXTHREADSPERBLOCK6=1024;-DSHAREDMEMPERBLOCK6=49152;-DMAXTHREADSPERBLOCK7=1024;-DSHAREDMEMPERBLOCK7=49152
-- The CUDA compiler identification is NVIDIA 10.0.130
-- Check for working CUDA compiler: /usr/bin/nvcc
-- Check for working CUDA compiler: /usr/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- The CUDA Host CXX Compiler: /usr/bin/c++
-- Autodetected CUDA architecture(s): 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.5 
-- Using shared_obj_name: libKeOpstorch91c92bd508
-- Found PythonInterp: /home/alex_w/miniconda3/envs/rl/bin/python3.7 (found version "3.7.4") 
-- Configuring incomplete, errors occurred!
See also "/home/alex_w/.cache/pykeops-1.1.2-cpython-37/build-libKeOpstorch91c92bd508/CMakeFiles/CMakeOutput.log".
See also "/home/alex_w/.cache/pykeops-1.1.2-cpython-37/build-libKeOpstorch91c92bd508/CMakeFiles/CMakeError.log".

--------------------- ----------- -----------------
make: *** No rule to make target 'libKeOpstorch91c92bd508'.  Stop.

--------------------- MAKE DEBUG -----------------
Command '['cmake', '--build', '.', '--target', 'libKeOpstorch91c92bd508', '--', 'VERBOSE=1']' returned non-zero exit status 2.

--------------------- ----------- -----------------
Done.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/alex_w/miniconda3/envs/rl/lib/python3.7/site-packages/pykeops/torch/generic/generic_red.py", line 351, in __call__
    out = GenredAutograd.apply(self.formula, self.aliases, backend, self.dtype, device_id, ranges, *args)
  File "/home/alex_w/miniconda3/envs/rl/lib/python3.7/site-packages/pykeops/torch/generic/generic_red.py", line 21, in forward
    ['-DPYTORCH_INCLUDE_DIR=' + ';'.join(include_dirs)]).import_module()
  File "/home/alex_w/miniconda3/envs/rl/lib/python3.7/site-packages/pykeops/common/keops_io.py", line 52, in import_module
    return importlib.import_module(self.dll_name)
  File "/home/alex_w/miniconda3/envs/rl/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'libKeOpstorch91c92bd508'

I'm wondering whether anything about the builds changed between 1.1.1 and 1.1.2. Also, how can I go about debugging these issues? I tried PYKEOPS_VERBOSE=1 and clearing the cache, but it hasn't given me any additional debug information.
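pybind11's `FindPythonLibsNew.cmake` (the module in the CMake call stack above) asks the interpreter CMake found for, among other values, its pointer size via `struct.calcsize("@P")`, and reports the "N-bit" figure from it. A "0-bit" Python therefore suggests that the value CMake parsed from that interpreter's output was empty or not a number (an interpretation; the log does not show the raw output). A minimal sketch to print the same information by hand, meant to be run with the interpreter CMake reported (`/home/alex_w/miniconda3/envs/rl/bin/python3.7` in the log); `python_pointer_bits` is a hypothetical helper.

```python
import struct
import sys


def python_pointer_bits():
    """Pointer width of this interpreter in bits (64 on a normal x86_64 build)."""
    return struct.calcsize("@P") * 8


if __name__ == "__main__":
    # Compare with CMake's "Python is 0-bit, chosen compiler is 64-bit".
    print("executable:", sys.executable)
    print("version:", sys.version.split()[0])
    print("pointer size:", python_pointer_bits(), "bits")
```

If this prints 64 bits cleanly, anything else that interpreter writes to stdout at startup is a candidate for confusing the CMake-side parsing.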

ImportError: dynamic module does not define module export function (PyInit_libKeOpsnumpy5ac3d464a2)

Hello!
Something goes wrong when I run the example code below.
I have seen this bug in #17, but I think the two are not the same: in #17 it works the first time, whereas on my machine it fails.

code

import numpy as np
import pykeops.numpy as pknp

x = np.arange(1, 10).reshape(-1, 3).astype('float32')
y = np.arange(3, 9).reshape(-1, 3).astype('float32')
my_conv = pknp.Genred('SqNorm2(x - y)', ['x = Vi(3)', 'y = Vj(3)'])
res = my_conv(x, y, backend='CPU')
assert res.shape == (2, 1)
print("okay")

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

cmake --version

cmake version 3.15.3

which python

/home/lowen/anaconda3/envs/lowenEnv/bin/python

python --version

Python 3.6.8 :: Anaconda, Inc.

gcc --version

gcc (GCC) 7.4.0
Copyright © 2017 Free Software Foundation, Inc.


output

Compiling libKeOpsnumpy5ac3d464a2 in /home/lowen/.cache/pykeops-1.1.2-cpython-36//build-libKeOpsnumpy5ac3d464a2:
       formula: Sum_Reduction(SqNorm2(x - y),1)
       aliases: x = Vi(0,3); y = Vj(1,3); 
       dtype  : float64
... Done.
Traceback (most recent call last):
  File "xjbx.py", line 972, in <module>
    test_geomloss()
  File "xjbx.py", line 936, in test_geomloss
    my_conv = pknp.Genred('SqNorm2(x - y)', ['x = Vi(3)', 'y = Vj(3)'])
  File "/home/lowen/anaconda3/envs/lowenEnv/lib/python3.6/site-packages/pykeops/numpy/generic/generic_red.py", line 114, in __init__
    self.myconv = LoadKEops(self.formula, self.aliases, self.dtype, 'numpy').import_module()
  File "/home/lowen/anaconda3/envs/lowenEnv/lib/python3.6/site-packages/pykeops/common/keops_io.py", line 52, in import_module
    return importlib.import_module(self.dll_name)
  File "/home/lowen/anaconda3/envs/lowenEnv/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: dynamic module does not define module export function (PyInit_libKeOpsnumpy5ac3d464a2)

Error on GPU when trying to replicate the behaviour of torch.bmm().

Hello,
For the sake of it, I am trying to replicate the behavior of torch.bmm().
Here is a minimal example (with device and keops_backend appropriately set):

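import torch
from pykeops.torch import Genred

N = 1000                  # batch size (hypothetical value)
device = "cuda"           # the crash occurs on the GPU
keops_backend = "GPU_1D"  # or "GPU_2D"; "CPU" works

formula = "TensorDot(a, b, Ind(2,2), Ind(2,2), Ind(1), Ind(0))"
alias = ["a=Vi(4)", "b=Vi(4)"]
keops_bmm = Genred(formula, alias, reduction_op='Sum', axis=1, dtype='float32')

A = torch.rand(N, 2, 2, device=device)
B = torch.rand(N, 2, 2, device=device)

print(torch.allclose(torch.bmm(A, B), keops_bmm(A.view(-1, 4), B.view(-1, 4), backend=keops_backend).view(-1, 2, 2)))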

On CPU, the code works correctly. However, with either GPU backend, the program stops and outputs "Instruction non permise (core dumped)", i.e. "Illegal instruction (core dumped)".

Let me know if you need additional information.

Best regards,
Lex
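To separate a wrong formula from a backend crash, it can help to have a reference for what the formula is meant to compute. As we read it, with each 4-vector seen as a row-major 2x2 matrix, `Ind(1), Ind(0)` contracts axis 1 of `a` with axis 0 of `b`, i.e. the per-row product `A_i @ B_i`. A minimal NumPy sketch, assuming KeOps lays out the 2x2 blocks in the same row-major order as `A.view(-1, 4)`; `bmm_reference` is a hypothetical helper.

```python
import numpy as np


def bmm_reference(a_flat, b_flat):
    """Row-wise 2x2 matrix products of flattened (N, 4) arrays, returned flattened.

    Row n of the result is (A_n @ B_n).ravel(), where A_n = a_flat[n].reshape(2, 2).
    """
    a = a_flat.reshape(-1, 2, 2)
    b = b_flat.reshape(-1, 2, 2)
    # out[n, i, k] = sum_j a[n, i, j] * b[n, j, k]: contract axis 1 of a with axis 0 of b
    return np.einsum("nij,njk->nik", a, b).reshape(-1, 4)
```

Comparing this against the `backend='CPU'` result first would confirm the formula itself, leaving only the GPU backends in question.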

Example for 3D convolutions

Thanks a lot for this useful code, the benchmarks are impressive.

I wanted to try it out for convolutions (anisotropic kernels) on gridded data (3D images). Could you point me to example code for this use case? I couldn't find any.

That'd be great, thanks a lot in advance!
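For reference, the computation itself is a kernel matrix-vector product over voxel centres. A minimal dense NumPy sketch, assuming an axis-aligned (diagonal) anisotropic Gaussian with one width per axis; it builds the full N x N kernel matrix, which is exactly what KeOps is designed to avoid storing, so it only suits small grids. In KeOps the same formula would be expressed with symbolic `LazyTensor` objects instead; that version is not shown here. The helper names are hypothetical.

```python
import numpy as np


def grid_points(shape, spacing=(1.0, 1.0, 1.0)):
    """Coordinates of the voxel centres of a 3D grid, as an (N, 3) array in C order."""
    axes = [np.arange(n) * h for n, h in zip(shape, spacing)]
    zz, yy, xx = np.meshgrid(*axes, indexing="ij")
    return np.stack([zz.ravel(), yy.ravel(), xx.ravel()], axis=1)


def anisotropic_gaussian_conv(image, sigmas):
    """Dense O(N^2) smoothing: out_i = sum_j exp(-0.5 * sum_k ((p_ik - p_jk) / sigma_k)^2) * image_j."""
    pts = grid_points(image.shape)
    scaled = pts / np.asarray(sigmas, dtype=float)  # divide each axis by its own width
    d2 = ((scaled[:, None, :] - scaled[None, :, :]) ** 2).sum(-1)  # (N, N) scaled sq. distances
    K = np.exp(-0.5 * d2)  # full kernel matrix: the memory cost KeOps avoids
    return (K @ image.ravel()).reshape(image.shape)
```

A single bright voxel spreads further along the axis with the largest sigma, which is an easy sanity check for the anisotropy.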

ModuleNotFoundError: No module named 'libKeOpstorch99c715f463' while test bindings work

Hello.
In my previous issue, I was not able to install the KeOps Python bindings because the CUDA toolkit installed by conda was not sufficient. After installing nvcc with the NVIDIA drivers, the following script finishes with no errors:

import pykeops
pykeops.verbose = True
pykeops.clean_pykeops()  
pykeops.test_torch_bindings() 

Unfortunately, when I try to run the following example, I receive the following error:

Compiling libKeOpstorch99c715f463 in /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463:
       formula: Max_SumShiftExp_Reduction(((-(WeightedSqDist(G_0,X_0,Y_0))) + B_0),0)
       aliases: G_0 = Vj(0,4); X_0 = Vi(1,100); Y_0 = Vj(2,2); B_0 = Vj(3,1); 
       dtype  : float32
... /home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/formulas/maths/Subtract.h(35): error: static assertion failed with "Dimensions must be the same for Subtract"
          detected during:
            instantiation of class "keops::Subtract_Impl<FA, FB> [with FA=keops::_X<1, 100>, FB=keops::_Y<2, 2>]" 
/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/formulas/norms/WeightedSqNorm.h(26): here
            instantiation of type "keops::WeightedSqNorm<keops::_Y<0, 4>, keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>>" 
/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/formulas/norms/WeightedSqDist.h(14): here
            instantiation of type "keops::WeightedSqDist<keops::_Y<0, 4>, keops::_X<1, 100>, keops::_Y<2, 2>>" 
/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/libKeOpstorch99c715f463.h(27): here

/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/formulas/maths/Mult.h(30): error: static assertion failed with "Dimensions of FA and FB must be the same for Mult"
          detected during:
            instantiation of class "keops::Mult_Impl<FA, FB> [with FA=keops::_Y<0, 4>, FB=keops::TensorProd<keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>, keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>>]" 
/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/autodiff/UnaryOp.h(45): here
            instantiation of class "keops::UnaryOp_base<OP, F, NS...> [with OP=keops::Sum, F=keops::Mult<keops::_Y<0, 4>, keops::TensorProd<keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>, keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>>>, NS=<>]" 
/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/autodiff/UnaryOp.h(61): here
            instantiation of class "keops::UnaryOp<OP, F, NS...> [with OP=keops::Sum, F=keops::Mult<keops::_Y<0, 4>, keops::TensorProd<keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>, keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>>>, NS=<>]" 
/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/formulas/maths/Sum.h(20): here
            instantiation of class "keops::Sum<F> [with F=keops::Mult<keops::_Y<0, 4>, keops::TensorProd<keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>, keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>>>]" 
/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/pre_headers.h(43): here
            instantiation of class "keops::KeopsNS<F> [with F=keops::WeightedSqDist<keops::_Y<0, 4>, keops::_X<1, 100>, keops::_Y<2, 2>>]" 
/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/libKeOpstorch99c715f463.h(27): here

2 errors detected in the compilation of "/tmp/tmpxft_00000684_00000000-6_link_autodiff.cpp1.ii".
CMake Error at keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.Release.cmake:279 (message):
  Error generating file
  /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o


make[3]: *** [CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o] Error 1
make[2]: *** [CMakeFiles/keopslibKeOpstorch99c715f463.dir/all] Error 2
make[1]: *** [CMakeFiles/libKeOpstorch99c715f463.dir/rule] Error 2
make: *** [libKeOpstorch99c715f463] Error 2

--------------------- MAKE DEBUG -----------------
Command '['cmake', '--build', '.', '--target', 'libKeOpstorch99c715f463', '--', 'VERBOSE=1']' returned non-zero exit status 2.
/usr/bin/cmake -H/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops -B/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/make -f CMakeFiles/Makefile2 libKeOpstorch99c715f463
make[1]: Entering directory '/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463'
/usr/bin/cmake -H/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops -B/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles 5
/usr/bin/make -f CMakeFiles/Makefile2 CMakeFiles/libKeOpstorch99c715f463.dir/all
make[2]: Entering directory '/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463'
/usr/bin/make -f CMakeFiles/keopslibKeOpstorch99c715f463.dir/build.make CMakeFiles/keopslibKeOpstorch99c715f463.dir/depend
make[3]: Entering directory '/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463'
[ 20%] Building NVCC (Device) object CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o
cd /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core && /usr/bin/cmake -E make_directory /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/.
cd /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core && /usr/bin/cmake -D verbose:BOOL=1 -D build_configuration:STRING=Release -D generated_file:STRING=/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o -D generated_cubin_file:STRING=/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.cubin.txt -P /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.Release.cmake
-- Removing /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o
/usr/bin/cmake -E remove /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o
-- Generating dependency file: /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.NVCC-depend
/usr/local/cuda-10.1/bin/nvcc -M -D__CUDACC__ /home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/link_autodiff.cu -o /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.NVCC-depend -m64 -DkeopslibKeOpstorch99c715f463_EXPORTS -DMAXIDGPU=3 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -DMAXTHREADSPERBLOCK1=1024 -DSHAREDMEMPERBLOCK1=49152 -DMAXTHREADSPERBLOCK2=1024 -DSHAREDMEMPERBLOCK2=49152 -DMAXTHREADSPERBLOCK3=1024 -DSHAREDMEMPERBLOCK3=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=float -DC_CONTIGUOUS=1 -D__TYPEACC__=float -DSUM_SCHEME=1 -DMODULE_NAME=libKeOpstorch99c715f463 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=0 -DUSE_HALF=0 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_float -Xcompiler ,\"-DUSE_OPENMP\",\"-fopenmp\",\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-O3\",\"-DNDEBUG\",\"-O3\" -gencode arch=compute_70,code=sm_70 --use_fast_math --compiler-options=-fPIC -ccbin /usr/bin/c++ --pre-include=libKeOpstorch99c715f463.h -DNVCC -I/usr/local/cuda-10.1/include -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops -I/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463 -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/torch/include -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/torch/include/torch/csrc/api/include
-- Generating temporary cmake readable file: /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend.tmp
/usr/bin/cmake -D input_file:FILEPATH=/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.NVCC-depend -D output_file:FILEPATH=/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend.tmp -D verbose=1 -P /usr/share/cmake-3.10/Modules/FindCUDA/make2cmake.cmake
-- Copy if different /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend.tmp to /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend
/usr/bin/cmake -E copy_if_different /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend.tmp /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend
-- Removing /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend.tmp and /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.NVCC-depend
/usr/bin/cmake -E remove /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend.tmp /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.NVCC-depend
-- Generating /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o
/usr/local/cuda-10.1/bin/nvcc /home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/link_autodiff.cu -c -o /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o -m64 -DkeopslibKeOpstorch99c715f463_EXPORTS -DMAXIDGPU=3 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -DMAXTHREADSPERBLOCK1=1024 -DSHAREDMEMPERBLOCK1=49152 -DMAXTHREADSPERBLOCK2=1024 -DSHAREDMEMPERBLOCK2=49152 -DMAXTHREADSPERBLOCK3=1024 -DSHAREDMEMPERBLOCK3=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=float -DC_CONTIGUOUS=1 -D__TYPEACC__=float -DSUM_SCHEME=1 -DMODULE_NAME=libKeOpstorch99c715f463 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=0 -DUSE_HALF=0 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_float -Xcompiler ,\"-DUSE_OPENMP\",\"-fopenmp\",\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-O3\",\"-DNDEBUG\",\"-O3\" -gencode arch=compute_70,code=sm_70 --use_fast_math --compiler-options=-fPIC -ccbin /usr/bin/c++ --pre-include=libKeOpstorch99c715f463.h -DNVCC -I/usr/local/cuda-10.1/include -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops -I/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463 -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/torch/include -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/torch/include/torch/csrc/api/include
-- Removing /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o
/usr/bin/cmake -E remove /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o
CMakeFiles/keopslibKeOpstorch99c715f463.dir/build.make:63: recipe for target 'CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o' failed
make[3]: Leaving directory '/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463'
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/keopslibKeOpstorch99c715f463.dir/all' failed
make[2]: Leaving directory '/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463'
CMakeFiles/Makefile2:79: recipe for target 'CMakeFiles/libKeOpstorch99c715f463.dir/rule' failed
make[1]: Leaving directory '/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463'
Makefile:118: recipe for target 'libKeOpstorch99c715f463' failed

--------------------- ----------- -----------------
Done.
Traceback (most recent call last):
  File "/home/name/repos/docBert/models/gmm/torch_kops_gmm.py", line 210, in <module>
    cost = model.neglog_likelihood(word_features_reduced)  # Cost to minimize.
  File "/home/name/repos/docBert/models/gmm/torch_kops_gmm.py", line 156, in neglog_likelihood
    ll = self.log_likelihoods(sample)
  File "/home/name/repos/docBert/models/gmm/torch_kops_gmm.py", line 152, in log_likelihoods
    return kernel_product(self.params, sample, self.mu, self.weights_log(), mode='lse')
  File "/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/torch/kernel_product/kernels.py", line 412, in kernel_product
    return FeaturesKP(kernel, gamma, x, y, bs, mode=mode, backend=backend, dtype=dtype)
  File "/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/torch/kernel_product/features_kernels.py", line 165, in FeaturesKP
    return genconv(*full_args, backend=backend)
  File "/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py", line 396, in __call__
    device_id, ranges, self.accuracy_flags, *args)
  File "/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py", line 22, in forward
    myconv = LoadKeOps(formula, aliases, dtype, 'torch', optional_flags).import_module()
  File "/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/common/keops_io.py", line 52, in import_module
    return importlib.import_module(self.dll_name)
  File "/home/name/anaconda3/envs/testenv4/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'libKeOpstorch99c715f463'

Ideas?
Thanks!
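
The formula and the two assertion messages point to a dimension mismatch. The sample X_0 has 100 features (Vi(1,100)), but the centers Y_0 have only 2 (Vj(2,2)) and the weights G_0 have 4 = 2 * 2 entries, so Subtract and Mult reject the formula at compile time. The NumPy sketch below is a pre-flight shape check that could be run before calling kernel_product to get a readable error instead. The helper name `check_weighted_sqdist_shapes` is hypothetical. The accepted weight sizes (1, D or D*D, i.e. scalar, diagonal or full matrix) reflect our understanding of WeightedSqDist, not a checked KeOps API.

```python
import numpy as np


def check_weighted_sqdist_shapes(x, mu, gamma):
    """Hypothetical pre-flight check for WeightedSqDist(gamma, x, mu).

    x: (N, D) sample, mu: (M, D) centers, gamma: (M, K) per-center weights.
    Returns D if the shapes are compatible, raises ValueError otherwise.
    """
    d_x, d_mu, k = x.shape[-1], mu.shape[-1], gamma.shape[-1]
    if d_x != d_mu:
        raise ValueError(f"sample has {d_x} features but the centers have {d_mu}")
    # Assumed accepted weight sizes: scalar (1), diagonal (D) or full D x D matrix (D*D).
    if k not in (1, d_x, d_x * d_x):
        raise ValueError(
            f"gamma has {k} entries per center; expected 1, {d_x} or {d_x * d_x}"
        )
    return d_x
```

With the widths from this log (x: 100, mu: 2, gamma: 4), the first check fails. Either the sample has to be reduced to 2 features, or mu and gamma have to be built for 100 features.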

Windows support

(Py37A) C:\Users\Francois>pip install pykeops
Collecting pykeops
  Downloading https://files.pythonhosted.org/packages/48/72/d1576e0841b1fa6dd65de4ef203362e5eb7748215005ace2975e12ac2679/pykeops-1.3.tar.gz (301kB)
     |████████████████████████████████| 307kB 731kB/s
    ERROR: Command errored out with exit status 1:
     command: 'c:\users\francois\venvs\py37a\scripts\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\Francois\\AppData\\Local\\Temp\\pip-install-hm6v38ko\\pykeops\\setup.py'"'"'; __file__='"'"'C:\\Users\\Francois\\AppData\\Local\\Temp\\pip-install-hm6v38ko\\pykeops\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\Francois\AppData\Local\Temp\pip-install-hm6v38ko\pykeops\pip-egg-info'
         cwd: C:\Users\Francois\AppData\Local\Temp\pip-install-hm6v38ko\pykeops\
    Complete output (9 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\Francois\AppData\Local\Temp\pip-install-hm6v38ko\pykeops\setup.py", line 11, in <module>
        from pykeops import __version__ as current_version
      File "C:\Users\Francois\AppData\Local\Temp\pip-install-hm6v38ko\pykeops\pykeops\__init__.py", line 34, in <module>
        from .common.utils import clean_pykeops
      File "C:\Users\Francois\AppData\Local\Temp\pip-install-hm6v38ko\pykeops\pykeops\common\utils.py", line 1, in <module>
        import fcntl
    ModuleNotFoundError: No module named 'fcntl'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

(Py37A) C:\Users\Francois>pip install fcntl
ERROR: Could not find a version that satisfies the requirement fcntl (from versions: none)
ERROR: No matching distribution found for fcntl

A little more searching shows that fcntl is only available on Mac/Linux. It seems to be used in only one place, to lock a file, which could be done more portably with portalocker?
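
As an alternative to adding the portalocker dependency, the standard library alone can cover both platforms by switching between fcntl on POSIX and msvcrt on Windows. This is a minimal sketch, not the pykeops code. `file_lock` is a hypothetical name, and the sketch assumes that an exclusive advisory lock on a small lock file is all the build step needs.

```python
import sys
from contextlib import contextmanager

if sys.platform == "win32":
    import msvcrt

    def _lock(f):
        # Lock one byte at offset 0; msvcrt retries for about 10 s before raising OSError.
        f.seek(0)
        msvcrt.locking(f.fileno(), msvcrt.LK_LOCK, 1)

    def _unlock(f):
        f.seek(0)
        msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)

else:
    import fcntl

    def _lock(f):
        # Block until an exclusive advisory lock is granted.
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)

    def _unlock(f):
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)


@contextmanager
def file_lock(path):
    """Hold an exclusive lock on `path` (created if missing) inside a with-block."""
    with open(path, "a+") as f:
        _lock(f)
        try:
            yield f
        finally:
            _unlock(f)
```

Usage would be `with file_lock(lock_path): ...` around the code that must not run concurrently, such as a compilation step writing into a shared cache folder.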

Compilation hangs for a specific example of tensordot

Hi. I think the following example makes the compilation hang:

from pykeops.torch import LazyTensor
import torch as T

a = LazyTensor(T.rand(1, 1, 27 * 3 * 64).cuda())
b = LazyTensor(T.rand(2, 1, 20 * 8 * 27).cuda())
c = b.keops_tensordot(a, (20, 8, 27), (27, 3, 64), (2,), (0,))
output = c.sum(1)

The stdout looks like this:

Compiling libKeOpstorch5b4bdc3ab7 in /home/adnguyen/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch5b4bdc3ab7:
       formula: Sum_Reduction(TensorDot(Var(0,4320,0), Var(1,5184,2), Ind(20,8,27), Ind(27,3,64), Ind(2), Ind(0)),0)
       aliases: Var(0,4320,0); Var(1,5184,2); 
       dtype  : float32
... 

I can run other tensordot examples, as well as other formulas, so I doubt the problem is my environment. Please have a look at the problem. Thanks!
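
While the compilation hangs, a dense NumPy computation of the same contraction can serve as a reference for checking results once the KeOps version works. It also shows how large each output vector is. The sketch assumes that `b.keops_tensordot(a, (20, 8, 27), (27, 3, 64), (2,), (0,))` contracts axis 2 of b's (20, 8, 27) block with axis 0 of a's (27, 3, 64) block and flattens the result row-major, as the printed formula `TensorDot(..., Ind(2), Ind(0))` suggests. It covers the contraction only, not the final `c.sum(1)` reduction.

```python
import numpy as np

# Shapes copied from the report (random data, CPU only).
rng = np.random.default_rng(0)
a = rng.random((1, 1, 27 * 3 * 64), dtype=np.float32)  # one (27, 3, 64) block
b = rng.random((2, 1, 20 * 8 * 27), dtype=np.float32)  # two (20, 8, 27) blocks


def dense_tensordot(b, a):
    """Contract axis 2 of each (20, 8, 27) block of b with axis 0 of a's (27, 3, 64) block."""
    bb = b.reshape(b.shape[0], 20, 8, 27)
    aa = a.reshape(27, 3, 64)
    out = np.tensordot(bb, aa, axes=([3], [0]))  # (2, 20, 8, 3, 64)
    # Flatten each result block row-major: one vector of 20*8*3*64 values per row of b.
    return out.reshape(b.shape[0], -1)  # (2, 30720)


ref = dense_tensordot(b, a)
print(ref.shape)  # (2, 30720)
```

The 30720-wide output per row is large for a single KeOps variable, which may be relevant to the compile time, though that is only a guess.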

PyTorch test script fails

Hello.

The NumPy test script passes, but the PyTorch one fails. I get the following output:

(venv) jad@jad-Aspire-A717-71G ~/venv $ python3
Python 3.5.2 (default, Oct 8 2019, 13:06:37)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import pykeops.torch as pktorch
>>> x = torch.arange(1, 10, dtype=torch.float32).view(-1, 3)
>>> y = torch.arange(3, 9, dtype=torch.float32).view(-1, 3)
>>> my_conv = pktorch.Genred('SqNorm2(x-y)', ['x = Vi(3)', 'y = Vj(3)'])
>>> print(my_conv(x, y))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jad/venv/lib/python3.5/site-packages/pykeops/torch/generic/generic_red.py", line 351, in __call__
    out = GenredAutograd.apply(self.formula, self.aliases, backend, self.dtype, device_id, ranges, *args)
  File "/home/jad/venv/lib/python3.5/site-packages/pykeops/torch/generic/generic_red.py", line 21, in forward
    ['-DPYTORCH_INCLUDE_DIR=' + ';'.join(include_dirs)]).import_module()
  File "/home/jad/venv/lib/python3.5/site-packages/pykeops/common/keops_io.py", line 52, in import_module
    return importlib.import_module(self.dll_name)
  File "/home/jad/venv/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 986, in _gcd_import
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 666, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 577, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 906, in create_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
ImportError: /home/jad/.cache/pykeops-1.2-cpython-35/libKeOpstorch91c92bd508.cpython-35m-x86_64-linux-gnu.so: undefined symbol: _ZN3c1011CPUTensorIdEv

Example .pt file, and tools to convert a binary image to a .pt file?

Hi, I'm interested in testing KeOps and am trying to go through the "Surface registration" tutorial here:
http://kernel-operations.io/keops/_auto_tutorials/surface_registration/plot_LDDMM_Surface.html#sphx-glr-auto-tutorials-surface-registration-plot-lddmm-surface-py

The tutorial requires "*.pt" files as input data, e.g.:
"hippos.pt": original data (6611 vertices), etc.

I'm assuming it's a point cloud data format. Is there an example ".pt" file that we can use for testing? Furthermore, are there tools that can convert a binary mask image to the ".pt" format?

Thank you!
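A `.pt` file is usually just a Python object serialized with `torch.save`, so any tensors can be written to one. The sketch below turns a 3D binary mask into a point cloud of voxel coordinates and saves it. The function name, the `"points"` key and the file name are hypothetical. The tutorial's own files appear to store meshes, since it counts vertices, so matching their layout would additionally require extracting a surface mesh from the mask, e.g. with a marching-cubes implementation such as scikit-image's.

```python
import os
import tempfile

import numpy as np
import torch


def mask_to_point_cloud(mask, spacing=(1.0, 1.0, 1.0)):
    """Return a (K, 3) float tensor with the coordinates of the nonzero voxels."""
    idx = np.argwhere(mask > 0)                                  # (K, 3) voxel indices
    pts = idx.astype(np.float32) * np.asarray(spacing, dtype=np.float32)
    return torch.from_numpy(pts)


mask = np.zeros((8, 8, 8), dtype=np.uint8)
mask[2:6, 2:6, 2:6] = 1                                          # a 4x4x4 cube
path = os.path.join(tempfile.gettempdir(), "my_shape.pt")       # hypothetical file name
torch.save({"points": mask_to_point_cloud(mask)}, path)
loaded = torch.load(path)
print(loaded["points"].shape)  # torch.Size([64, 3])
```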

PyTorch 1.3 Deprecation Warnings

Hi,

With the release of PyTorch 1.3, data<...>() is deprecated in favor of data_ptr<...>(). This results in a lot of warnings when compiling KeOps kernels. Other libraries solve this problem by checking the PyTorch version, e.g., see here.

Compilation Failure under JupyterLab

System:

  • Ubuntu 16.04 docker
  • Python 3.6.7
  • PyTorch 1.1
  • NVCC 10.0
  • g++ 5.4.0-6
  • cmake 3.14.4
  • GNU make 4.1
  • JupyterLab 0.35.4

When I run one of the example scripts from the command line (either line by line in a python3 shell, or as 'python3 test.py'), everything works fine. When I run it from what I believe is a properly configured JupyterLab, I get cmake and make errors.

The chosen test script:

import torch
import pykeops.torch as pktorch

x = torch.arange(1, 10, dtype=torch.float32).view(-1, 3)
y = torch.arange(3, 9, dtype=torch.float32).view(-1, 3)

my_conv = pktorch.Genred('SqNorm2(x-y)', ['x = Vi(3)', 'y = Vj(3)'])
print(my_conv(x, y))

The first part of the output shows the cmake and make errors raised while Python is still running:

Compiling libKeOpstorch91c92bd508 in /root/.cache/pykeops-1.0.2/:
       formula: Sum_Reduction(SqNorm2(x-y),1)
       aliases: x = Vi(0,3); y = Vj(1,3); 
       dtype  : float32
... 
--------------------- CMAKE DEBUG -----------------
Command '['cmake', '/opt/conda/lib/python3.6/site-packages/pykeops', '-DCMAKE_BUILD_TYPE=Release', '-DFORMULA_OBJ=Sum_Reduction(SqNorm2(x-y),1)', '-DVAR_ALIASES=auto x = Vi(0,3); auto y = Vj(1,3); ', '-Dshared_obj_name=libKeOpstorch91c92bd508', '-D__TYPE__=float', '-DPYTHON_LANG=torch', '-DPYTORCH_INCLUDE_DIR=/opt/conda/lib/python3.6/site-packages/torch/include;/opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include']' returned non-zero exit status 1.
-- Configuring incomplete, errors occurred!

--------------------- ----------- -----------------

--------------------- MAKE DEBUG -----------------
Command '['cmake', '--build', '.', '--target', 'libKeOpstorch91c92bd508']' returned non-zero exit status 1.

--------------------- ----------- -----------------
Done. 

This is followed immediately by the Python errors themselves:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/pykeops/common/keops_io.py in load_keops(formula, aliases, dtype, lang, optional_flags)
     44         # high frequency path
---> 45         return importlib.import_module(dll_name)
     46     except ImportError:

/opt/conda/lib/python3.6/importlib/__init__.py in import_module(name, package)
    125             level += 1
--> 126     return _bootstrap._gcd_import(name[level:], package, level)
    127 

/opt/conda/lib/python3.6/importlib/_bootstrap.py in _gcd_import(name, package, level)

/opt/conda/lib/python3.6/importlib/_bootstrap.py in _find_and_load(name, import_)

/opt/conda/lib/python3.6/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

ModuleNotFoundError: No module named 'libKeOpstorch91c92bd508'

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/pykeops/common/keops_io.py in _safe_compile_and_load(formula, aliases, dll_name, dtype, lang, optional_flags)
     28             # already compiled, just load
---> 29             return importlib.import_module(dll_name)
     30         except ImportError:

/opt/conda/lib/python3.6/importlib/__init__.py in import_module(name, package)
    125             level += 1
--> 126     return _bootstrap._gcd_import(name[level:], package, level)
    127 

/opt/conda/lib/python3.6/importlib/_bootstrap.py in _gcd_import(name, package, level)

/opt/conda/lib/python3.6/importlib/_bootstrap.py in _find_and_load(name, import_)

/opt/conda/lib/python3.6/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

ModuleNotFoundError: No module named 'libKeOpstorch91c92bd508'

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-2-387377d109a7> in <module>
      7 
      8 my_conv = pktorch.Genred('SqNorm2(x-y)', ['x = Vi(3)', 'y = Vj(3)'])
----> 9 print(my_conv(x, y))

/opt/conda/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py in __call__(self, backend, device_id, ranges, *args)
    311 
    312         """
--> 313         out = GenredAutograd.apply(self.formula, self.aliases, backend, self.dtype, device_id, ranges, *args)
    314         nx, ny = get_sizes(self.aliases, *args)
    315         nout = nx if self.axis==1 else ny

/opt/conda/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py in forward(ctx, formula, aliases, backend, dtype, device_id, ranges, *args)
     17     def forward(ctx, formula, aliases, backend, dtype, device_id, ranges, *args):
     18 
---> 19         myconv = load_keops(formula, aliases, dtype, 'torch', ['-DPYTORCH_INCLUDE_DIR=' + ';'.join(include_dirs)])
     20 
     21         # Context variables: save everything to compute the gradient:

/opt/conda/lib/python3.6/site-packages/pykeops/common/keops_io.py in load_keops(formula, aliases, dtype, lang, optional_flags)
     46     except ImportError:
     47         # could not import (ie not compiled), safely compile/import
---> 48         return _safe_compile_and_load(formula, aliases, dll_name, dtype, lang, optional_flags)

/opt/conda/lib/python3.6/site-packages/pykeops/common/utils.py in wrapper_filelock(*args, **kwargs)
     68             with open(build_folder + '/' + lock_file_name, 'w') as f:
     69                 with FileLock(f):
---> 70                     return func(*args, **kwargs)
     71 
     72         return wrapper_filelock

/opt/conda/lib/python3.6/site-packages/pykeops/common/keops_io.py in _safe_compile_and_load(formula, aliases, dll_name, dtype, lang, optional_flags)
     32             # print(dll_name + " not found")
     33             compile_generic_routine(formula, aliases, dll_name, dtype, lang, optional_flags)
---> 34             return importlib.import_module(dll_name)
     35 
     36     # create the name from formula, aliases and dtype.

/opt/conda/lib/python3.6/importlib/__init__.py in import_module(name, package)
    124                 break
    125             level += 1
--> 126     return _bootstrap._gcd_import(name[level:], package, level)
    127 
    128 

/opt/conda/lib/python3.6/importlib/_bootstrap.py in _gcd_import(name, package, level)

/opt/conda/lib/python3.6/importlib/_bootstrap.py in _find_and_load(name, import_)

/opt/conda/lib/python3.6/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

ModuleNotFoundError: No module named 'libKeOpstorch91c92bd508'

Note that after successfully running various test scripts from the command line, /root/.cache/pykeops-1.0.2 contains the following files (this directory is in sys.path, as printed from the Jupyter notebook):

14 Jun 30 00:32 .
5 Jun 29 20:15 ..
CMakeCache.txt
CMakeFiles
Makefile
cmake_install.cmake
detect_cuda_compute_capabilities.cu
detect_cuda_props.cu
libKeOpstorch91c92bd508.cpython-36m-x86_64-linux-gnu.so
libKeOpstorch91c92bd508.h
libKeOpstorch91c92bd508.so
pybind11
pykeops_build.lock
torch_headers.h
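Since the same script works from a terminal, one way to narrow this down is to compare the environment that each process sees. This sketch reports the interpreter, whether `cmake` and `nvcc` are on `PATH`, whether the cache folder is on `sys.path`, and whether the compiled module can be found. `diagnose_import` is a hypothetical helper. It also calls `importlib.invalidate_caches()`, which the Python documentation recommends when importing a module created after the interpreter started. None of this is a confirmed diagnosis of the error above.

```python
import importlib
import importlib.util
import os
import shutil
import sys


def diagnose_import(module_name, build_dir):
    """Collect facts that often differ between a terminal and a Jupyter kernel."""
    # The import system caches directory listings; refresh them so that files
    # created after the interpreter started (e.g. a freshly compiled .so) are seen.
    importlib.invalidate_caches()
    build_dir = os.path.abspath(build_dir)
    if os.path.isdir(build_dir):
        files = sorted(f for f in os.listdir(build_dir) if f.startswith(module_name))
    else:
        files = []
    return {
        "python": sys.executable,              # which interpreter the kernel runs
        "cmake": shutil.which("cmake"),        # None if cmake is not on PATH
        "nvcc": shutil.which("nvcc"),
        "on_sys_path": build_dir in {os.path.abspath(p) for p in sys.path},
        "files": files,
        "importable": importlib.util.find_spec(module_name) is not None,
    }
```

Running `diagnose_import("libKeOpstorch91c92bd508", "/root/.cache/pykeops-1.0.2")` once in a terminal and once in the notebook, then comparing the two dictionaries, shows which of these differs.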

Compilation error on test script

I am facing a rather strange issue when testing locally with CUDA 10.2, gcc 5.4, and cmake 3.10 or cmake 3.12 (it fails with both):

>>> import numpy as np
>>> import pykeops.numpy as pknp
>>> x = np.arange(1, 10).reshape(-1, 3).astype('float32')
>>> y = np.arange(3, 9).reshape(-1, 3).astype('float32')
>>> my_conv = pknp.Genred('SqNorm2(x - y)', ['x = Vi(3)', 'y = Vj(3)'])
Compiling libKeOpsnumpy5ac3d464a2 in /home/cg260486/.cache/pykeops-1.2-cpython-35//build-libKeOpsnumpy5ac3d464a2:
       formula: Sum_Reduction(SqNorm2(x - y),1)
       aliases: x = Vi(0,3); y = Vj(1,3); 
       dtype  : float64
... make: Warning: File 'Makefile' has modification time 54 s in the future
make[1]: Warning: File 'CMakeFiles/Makefile2' has modification time 54 s in the future
make[2]: Warning: File 'CMakeFiles/Makefile2' has modification time 54 s in the future
make[3]: Warning: File 'CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/flags.make' has modification time 54 s in the future
In file included from /usr/include/c++/5/type_traits:35:0,
                 from /neurospin/optimed/Chaithya/Environments/CSMRI_sparkling/venv/lib/python3.5/site-packages/pykeops/keops/lib/sequences/include/tao/seq/concatenate.hpp:7,
                 from /neurospin/optimed/Chaithya/Environments/CSMRI_sparkling/venv/lib/python3.5/site-packages/pykeops/keops/core/formulas/maths/TensorDot.h:8,
                 from /neurospin/optimed/Chaithya/Environments/CSMRI_sparkling/venv/lib/python3.5/site-packages/pykeops/keops/keops_includes.h:33,
                 from /home/cg260486/.cache/pykeops-1.2-cpython-35/build-libKeOpsnumpy5ac3d464a2/libKeOpsnumpy5ac3d464a2.h:13,
                 from <command-line>:0:
/usr/include/c++/5/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
 #error This file requires compiler and library support \
  ^
CMake Error at keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.Release.cmake:219 (message):
  Error generating
  /home/cg260486/.cache/pykeops-1.2-cpython-35/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o


make[3]: *** [CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o] Error 1
make[2]: *** [CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/all] Error 2
make[1]: *** [CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/rule] Error 2
make: *** [libKeOpsnumpy5ac3d464a2] Error 2

I had faced a similar issue while trying to build pykeops locally, and found that passing -std=c++11 to nvcc helped. I am not sure whether this is an issue in KeOps or in cmake. Note that this works fine on Google Colab, so I think it is more likely related to my environment, but I could not find any fix online other than -std=c++11.
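The error comes from `c++0x_warning.h`, which libstdc++ triggers when a C++11 header is compiled in pre-C++11 mode. GCC 5 still defaults to `-std=gnu++98` (GCC 6 moved to `gnu++14`), which would be consistent with `-std=c++11` fixing the build with gcc 5.4 and with Colab, which ships a newer compiler, working out of the box. A quick way to check what a host compiler defaults to is to read its predefined `__cplusplus` macro. This sketch assumes a GCC-compatible compiler on `PATH`, and the helper names are hypothetical.

```python
import os
import re
import subprocess


def parse_cplusplus(macro_dump):
    """Return the integer value of __cplusplus in `-dM -E` output, or None."""
    m = re.search(r"#define __cplusplus (\d+)L", macro_dump)
    return int(m.group(1)) if m else None


def default_cxx_standard(compiler="g++"):
    """Ask the host compiler which C++ standard it uses when no -std flag is passed."""
    out = subprocess.run(
        [compiler, "-dM", "-E", "-x", "c++", os.devnull],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    ).stdout
    return parse_cplusplus(out)


# Values: 199711 -> C++98, 201103 -> C++11, 201402 -> C++14, 201703 -> C++17
```

On the machine above, `default_cxx_standard("g++")` returning 199711 would confirm that the compiler is in C++98 mode unless a `-std` flag is added.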

Kernel with norm matrix indexed by i and j

Hi! I am trying to figure out how to construct a kernel where the matrix defining the norm has i and j indices. More specifically, I want the kernel evaluated at the points x_i and y_j to be

k(x_i, y_j) = exp(-1/2 d_ij @ inverse(S_i + S_j) @ d_ij) / sqrt(det(S_i + S_j)),
d_ij = x_i - y_j.

For my specific problem, x_i and y_j are vectors of length d = 2. I want to use k with KernelSolve, and I need the PyTorch backend and the ability to take gradients.

I was wondering how best to do this in KeOps. Since d = 2, I can write out explicit expressions for everything in terms of the components of the vectors and the entries of the matrices S_i and S_j. However, this could be optimized substantially by computing det(S_i + S_j) only once for each (i, j). Is there any way to write formulas like this, or would it require something lower-level in KeOps? Thanks!

For reference, here's how I'm constructing the kernel and solver now:

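from pykeops.torch import Genred, KernelSolve

# s_i and s_j hold symmetric 2x2 matrices flattened as [s00, s01, s10, s11]
d_0 = "(Elem(x_i, 0) - Elem(y_j, 0))"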
d_1 = "(Elem(x_i, 1) - Elem(y_j, 1))"
s_00 = "(Elem(s_i, 0) + Elem(s_j, 0))"
s_01 = "(Elem(s_i, 1) + Elem(s_j, 1))"
s_11 = "(Elem(s_i, 3) + Elem(s_j, 3))"
det = f"({s_00} * {s_11} - Square({s_01}))"
s_inv_00 = f"({s_11} / {det})"
s_inv_01 = f"(-{s_01} / {det})"
s_inv_11 = f"({s_00} / {det})"
formula = (
    "Exp(-("
    f"Square({d_0}) * {s_inv_00} + "
    f"IntCst(2) * {d_0} * {d_1} * {s_inv_01} +"
    f"Square({d_1}) * {s_inv_11}"
    ") * IntInv(2))"
    f" * alpha2 * Rsqrt({det}) * theta_j"
)

aliases = [
    "x_i = Vi(2)",
    "y_j = Vj(2)",
    "s_i = Vi(4)",
    "s_j = Vj(4)",
    "alpha2 = Pm(1)",
    "theta_j = Vj(1)"
]

K = Genred(formula, aliases, axis=1)
K_inv = KernelSolve(formula, aliases, "theta_j", axis=1)
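Whatever formula ends up being used in KeOps, a small dense reference is handy to check it. The NumPy sketch below evaluates the same quantity, alpha2 * sum_j k(x_i, y_j) * theta_j. Like the `Elem` indices in the formula, it assumes each `s` row stores a symmetric 2x2 matrix flattened in row-major order as `[s00, s01, s10, s11]`. It computes det(S_i + S_j) once per (i, j) pair and uses the closed-form 2x2 inverse. The function name is hypothetical, and since it materializes M x N arrays it is only meant for small test sizes.

```python
import numpy as np


def kernel_reference(x, y, s_x, s_y, alpha2, theta):
    """Dense reference for sum_j alpha2 * k(x_i, y_j) * theta_j with S = S_i + S_j."""
    S = s_x[:, None, :] + s_y[None, :, :]           # (M, N, 4): S_i + S_j, flattened
    s00, s01, s11 = S[..., 0], S[..., 1], S[..., 3]
    det = s00 * s11 - s01 ** 2                      # det(S_i + S_j), once per (i, j)
    d = x[:, None, :] - y[None, :, :]               # (M, N, 2): d_ij = x_i - y_j
    d0, d1 = d[..., 0], d[..., 1]
    # d^T (S_i + S_j)^{-1} d via the closed-form inverse of a symmetric 2x2 matrix
    quad = (d0 ** 2 * s11 - 2 * d0 * d1 * s01 + d1 ** 2 * s00) / det
    K = np.exp(-0.5 * quad) / np.sqrt(det)          # (M, N) kernel matrix
    return alpha2 * K @ theta                       # (M, 1), reduction over j (axis=1)
```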

Extracting a band diagonal with KeOps?

I want to multiply 2D tensors where I only care about a few diagonals of the resulting matrix, and I want to run this on GPU with PyTorch. Can this be done with KeOps?

For illustration, here's the NumPy code, but what I really need is PyTorch/GPU:

import numpy as np

M = 16000  # huge, can't do the O(M^2) operations
N = 64
c = 100  # offsets -c..c-1, i.e. 2c diagonals, a lot fewer than M
t1 = np.random.rand(M, N)
t2 = np.random.rand(M, N)
r = np.zeros((M, 2 * c))  # column j + c holds the diagonal at offset j

for i in range(M):
    for j in range(-c, c):
        k = min(max(i + j, 0), M - 1)  # clamp i + j to handle the boundary
        r[i, j + c] = np.dot(t1[i], t2[k])

PS: I am running into compilation errors with the tutorial examples, but will ask about that later.
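Independently of KeOps, the band can already be computed in PyTorch without forming the M x M product, by looping over the 2c offsets instead of over the rows. Each offset needs one gather of t2 and one row-wise dot product, so the extra memory is a single M x N tensor at a time. This is a plain-PyTorch sketch, not a KeOps feature. It uses the boundary clamp `min(max(i + j, 0), M - 1)` from the question, and `band_products` is a hypothetical name.

```python
import torch


def band_products(t1, t2, c):
    """r[i, k + c] = <t1[i], t2[clamp(i + k)]> for offsets k in [-c, c)."""
    M = t1.shape[0]
    rows = torch.arange(M, device=t1.device)
    r = torch.empty(M, 2 * c, dtype=t1.dtype, device=t1.device)
    for col, k in enumerate(range(-c, c)):
        idx = (rows + k).clamp(0, M - 1)            # min(max(i + k, 0), M - 1)
        r[:, col] = (t1 * t2[idx]).sum(dim=1)       # row-wise dot products
    return r


device = "cuda" if torch.cuda.is_available() else "cpu"
t1 = torch.rand(16000, 64, device=device)
t2 = torch.rand(16000, 64, device=device)
r = band_products(t1, t2, c=100)                    # shape (16000, 200)
```

Looping over offsets keeps 2c kernel launches, which is usually cheap for c around 100. A fully vectorized gather of shape (M, 2c, N) would avoid the loop but needs 2c times more memory.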

ImportError with PyInit_libKeOpsnumpy73a835aa5f module

Hello, great work !

I couldn't run the sample code below with CUDA 10 and cmake 3.14.4:

(base) [hicham@gpuserver ~]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

I made sure to install pykeops with all dependencies (pykeops[full]).

import numpy as np
import pykeops
pykeops.verbose = True
from pykeops.numpy import Genred

x = np.arange(1, 10).reshape(-1, 3).astype('float32')
y = np.arange(3, 9 ).reshape(-1, 3).astype('float32')

my_conv = Genred('-SqNorm2(x-y)', ['x = Vi(3)', 'y = Vj(3)'])
print(my_conv(x, y))

Here is the output:

(base) [hicham@gpuserver ~]$ python keopstest.py 
Compiling libKeOpsnumpy73a835aa5f in /home/hicham/.cache/pykeops-1.0.2/:
       formula: Sum_Reduction(-SqNorm2(x-y),1)
       aliases: x = Vi(0,3); y = Vj(1,3); 
       dtype  : float64
... -- Compute properties automatically set to: -DMAXIDGPU=12;-DMAXTHREADSPERBLOCK0=1024;-DSHAREDMEMPERBLOCK0=49152;-DMAXTHREADSPERBLOCK1=1024;-DSHAREDMEMPERBLOCK1=49152;-DMAXTHREADSPERBLOCK2=1024;-DSHAREDMEMPERBLOCK2=49152;-DMAXTHREADSPERBLOCK3=1024;-DSHAREDMEMPERBLOCK3=49152;-DMAXTHREADSPERBLOCK4=1024;-DSHAREDMEMPERBLOCK4=49152;-DMAXTHREADSPERBLOCK5=1024;-DSHAREDMEMPERBLOCK5=49152;-DMAXTHREADSPERBLOCK6=1024;-DSHAREDMEMPERBLOCK6=49152;-DMAXTHREADSPERBLOCK7=1024;-DSHAREDMEMPERBLOCK7=49152;-DMAXTHREADSPERBLOCK8=1024;-DSHAREDMEMPERBLOCK8=49152;-DMAXTHREADSPERBLOCK9=1024;-DSHAREDMEMPERBLOCK9=49152;-DMAXTHREADSPERBLOCK10=1024;-DSHAREDMEMPERBLOCK10=49152;-DMAXTHREADSPERBLOCK11=1024;-DSHAREDMEMPERBLOCK11=49152;-DMAXTHREADSPERBLOCK12=1024;-DSHAREDMEMPERBLOCK12=49152
-- The CUDA Host CXX Compiler: /usr/bin/c++
-- Autodetected CUDA architecture(s):  6.0 6.0 6.0 3.7 3.7 3.7 3.7 3.7 3.7 3.7 3.7 3.7 3.7
-- Using shared_obj_name: libKeOpsnumpy73a835aa5f
-- pybind11 v2.2.4
-- Configuring done
-- Generating done
-- Build files have been written to: /home/hicham/.cache/pykeops-1.0.2

In file included from /home/hicham/miniconda3/lib/python3.7/site-packages/pykeops/numpy/generic/generic_red.cpp:2:0:
/home/hicham/miniconda3/lib/python3.7/site-packages/torch/include/pybind11/numpy.h:288:5: error: 'is_trivially_copyable' is not a member of 'std'
     std::is_trivially_copyable<T>,
     ^
/home/hicham/miniconda3/lib/python3.7/site-packages/torch/include/pybind11/numpy.h:288:5: error: 'is_trivially_copyable' is not a member of 'std'
compilation terminated due to -fmax-errors=2.
gmake[3]: *** [CMakeFiles/libKeOpsnumpy73a835aa5f.dir/numpy/generic/generic_red.cpp.o] Error 1
gmake[2]: *** [CMakeFiles/libKeOpsnumpy73a835aa5f.dir/all] Error 2
gmake[1]: *** [CMakeFiles/libKeOpsnumpy73a835aa5f.dir/rule] Error 2
gmake: *** [libKeOpsnumpy73a835aa5f] Error 2

--------------------- MAKE DEBUG -----------------
Command '['cmake', '--build', '.', '--target', 'libKeOpsnumpy73a835aa5f']' returned non-zero exit status 2.
[ 20%] Building NVCC (Device) object CMakeFiles/keopslibKeOpsnumpy73a835aa5f.dir/keops/core/keopslibKeOpsnumpy73a835aa5f_generated_link_autodiff.cu.o
[ 40%] Linking CUDA device code CMakeFiles/keopslibKeOpsnumpy73a835aa5f.dir/cmake_device_link.o
[ 60%] Linking CXX shared library libKeOpsnumpy73a835aa5f.so
[ 60%] Built target keopslibKeOpsnumpy73a835aa5f
Scanning dependencies of target libKeOpsnumpy73a835aa5f
[ 80%] Building CXX object CMakeFiles/libKeOpsnumpy73a835aa5f.dir/numpy/generic/generic_red.cpp.o

--------------------- ----------- -----------------
Done. 
Traceback (most recent call last):
  File "/home/hicham/miniconda3/lib/python3.7/site-packages/pykeops/common/keops_io.py", line 45, in load_keops
    return importlib.import_module(dll_name)
  File "/home/hicham/miniconda3/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 670, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 583, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1043, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: dynamic module does not define module export function (PyInit_libKeOpsnumpy73a835aa5f)

Compilation failures on 1.1.2

After updating to v1.1.2, I am getting compilation failures that I did not get before. To reproduce, enter keops/pykeops/test and run python unit_tests_numpy.py: the error below occurs on the first (and every) test.

Using git bisect, I determined that 466eaf2 is the first commit on which this error occurs (i.e., things work on the commit before it, and on this commit I receive the error below). A lot changed in this commit, however, so I wasn't able to find an easy fix.

Error log

Compiling libKeOpsnumpy60ff1f2397 in /home/jake.gardner/git/keops/pykeops/common/../build//build-libKeOpsnumpy60ff1f2397:
       formula: Sum_Reduction(Inv(Exp((IntCst(1) + Sum((Square((Var(0,1,0) + (Var(1,3,0) * Var(2,3,1)))) + Var(3,1,2)))))),0)
       aliases: Var(0,1,0); Var(1,3,0); Var(2,3,1); Var(3,1,2); 
       dtype  : float32
... In file included from /usr/include/c++/5/type_traits:35:0,
                 from /home/jake.gardner/git/keops/pykeops/../keops/lib/sequences/include/tao/seq/is_all.hpp:13,
                 from /home/jake.gardner/git/keops/pykeops/../keops/lib/sequences/include/tao/seq/is_any.hpp:10,
                 from /home/jake.gardner/git/keops/pykeops/../keops/lib/sequences/include/tao/seq/contains.hpp:8,
                 from /home/jake.gardner/git/keops/pykeops/../keops/core/formulas/tensordot.h:4,
                 from /home/jake.gardner/git/keops/pykeops/../keops/core/formulas/maths.h:11,
                 from /home/jake.gardner/git/keops/pykeops/../keops/core/formulas/newsyntax.h:10,
                 from /home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/libKeOpsnumpy60ff1f2397.h:16,
                 from <command-line>:0:
/usr/include/c++/5/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
 #error This file requires compiler and library support \
  ^
CMake Error at keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o.Release.cmake:219 (message):
  Error generating
  /home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/./keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o


make[3]: *** [CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o] Error 1
make[2]: *** [CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/all] Error 2
make[1]: *** [CMakeFiles/libKeOpsnumpy60ff1f2397.dir/rule] Error 2
make: *** [libKeOpsnumpy60ff1f2397] Error 2

--------------------- MAKE DEBUG -----------------
Command '['cmake', '--build', '.', '--target', 'libKeOpsnumpy60ff1f2397', '--', 'VERBOSE=1']' returned non-zero exit status 2.
/home/jake.gardner/anaconda3/lib/python3.7/site-packages/cmake/data/bin/cmake -S/home/jake.gardner/git/keops/pykeops -B/home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/make -f CMakeFiles/Makefile2 libKeOpsnumpy60ff1f2397
make[1]: Entering directory '/home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397'
/home/jake.gardner/anaconda3/lib/python3.7/site-packages/cmake/data/bin/cmake -S/home/jake.gardner/git/keops/pykeops -B/home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397 --check-build-system CMakeFiles/Makefile.cmake 0
/home/jake.gardner/anaconda3/lib/python3.7/site-packages/cmake/data/bin/cmake -E cmake_progress_start /home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles 5
/usr/bin/make -f CMakeFiles/Makefile2 CMakeFiles/libKeOpsnumpy60ff1f2397.dir/all
make[2]: Entering directory '/home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397'
/usr/bin/make -f CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/build.make CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/depend
make[3]: Entering directory '/home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397'
[ 20%] Building NVCC (Device) object CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o
cd /home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core && /home/jake.gardner/anaconda3/lib/python3.7/site-packages/cmake/data/bin/cmake -E make_directory /home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/.
cd /home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core && /home/jake.gardner/anaconda3/lib/python3.7/site-packages/cmake/data/bin/cmake -D verbose:BOOL=1 -D build_configuration:STRING=Release -D generated_file:STRING=/home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/./keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o -D generated_cubin_file:STRING=/home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/./keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o.cubin.txt -P /home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o.Release.cmake
-- Removing /home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/./keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o
/home/jake.gardner/anaconda3/lib/python3.7/site-packages/cmake/data/bin/cmake -E remove /home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/./keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o
-- Generating dependency file: /home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o.NVCC-depend
/usr/local/cuda/bin/nvcc -M -D__CUDACC__ /home/jake.gardner/git/keops/pykeops/../keops/core/link_autodiff.cu -o /home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o.NVCC-depend -m64 -DkeopslibKeOpsnumpy60ff1f2397_EXPORTS -DMAXIDGPU=1 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -DMAXTHREADSPERBLOCK1=1024 -DSHAREDMEMPERBLOCK1=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=float -DC_CONTIGUOUS=1 -DMODULE_NAME=libKeOpsnumpy60ff1f2397 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=0 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_float -Xcompiler ,\"-Wall\",\"-fmax-errors=2\",\"-fPIC\",\"-O3\",\"-DNDEBUG\",\"-O3\" -gencode arch=compute_75,code=sm_75 --use_fast_math --compiler-options=-fPIC --expt-relaxed-constexpr --pre-include=libKeOpsnumpy60ff1f2397.h -DNVCC -I/usr/local/cuda/include -I/home/jake.gardner/git/keops/pykeops -I/home/jake.gardner/git/keops/pykeops/../keops -I/home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397
CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/build.make:63: recipe for target 'CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o' failed
make[3]: Leaving directory '/home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397'
CMakeFiles/Makefile2:331: recipe for target 'CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/all' failed
make[2]: Leaving directory '/home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397'
CMakeFiles/Makefile2:269: recipe for target 'CMakeFiles/libKeOpsnumpy60ff1f2397.dir/rule' failed
make[1]: Leaving directory '/home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397'
Makefile:183: recipe for target 'libKeOpsnumpy60ff1f2397' failed

--------------------- ----------- -----------------
Done.

Why compute gradient cost in this example?

Hi,
I am referring to this specific example, in which the goal is to fit a GMM with a flexible number of components to some 2D points denoted x.

I do not understand the point of x.requires_grad = True here. In other examples (shape matching), you do update the data values (x) so that the source sample matches a target sample.
However, here x is (if I'm correct) the target sample.

Removing this line with a fixed seed yields the same results.
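Setting `x.requires_grad = True` only asks autograd to also compute the gradient of the loss with respect to x. It does not change the gradients with respect to the model parameters, which is consistent with getting identical results without that line. The toy sketch below checks this on a single Gaussian with a learnable mean `mu`; it is not the GMM example itself, and whether that example uses `x.grad` elsewhere cannot be told from this excerpt.

```python
import torch


def grads(x_requires_grad):
    torch.manual_seed(0)
    x = torch.randn(100, 2)                          # fixed data ("target" sample)
    x.requires_grad = x_requires_grad
    mu = torch.zeros(2, requires_grad=True)          # learnable parameter
    loss = 0.5 * ((x - mu) ** 2).sum(dim=1).mean()   # Gaussian NLL up to a constant
    loss.backward()
    return mu.grad.clone(), x.grad


g_with, xg_with = grads(True)
g_without, xg_without = grads(False)
print(torch.allclose(g_with, g_without))  # True
print(xg_without is None)                  # True
```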
