falkonml / falkon
Large-scale, multi-GPU capable, kernel solver
Home Page: https://falkonml.github.io/falkon/
License: MIT License
NotImplementedError: Could not run 'falkon::cuda_2d_copy_async' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend,
Please reply at the earliest.
Regards,
Hi again,
And thanks again for the help last time. As mentioned, I will try to make a PR as soon as we confirm that our experiment/method is working.
I get:
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1616554786529/work/aten/src/THC/THCCachingHostAllocator.cpp:278
when I try to set M=10^5 (I am on a laptop). Should I be getting this error, and can I expect it to work on a V100 with 32GB?
Thank you!
Best regards,
Robert
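A rough back-of-the-envelope check for the question above, assuming the Nyström preconditioner is held as one dense M × M matrix (an assumption about the internals, not confirmed in this thread): its size alone scales as M² times the bytes per element. The helper names are hypothetical.

```python
def dense_matrix_bytes(m: int, itemsize: int) -> int:
    """Bytes needed for one dense m x m matrix with the given element size."""
    return m * m * itemsize


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


M = 10**5
for name, itemsize in [("float32", 4), ("float64", 8)]:
    size = dense_matrix_bytes(M, itemsize)
    print(f"{name}: {size:,} bytes = {to_gib(size):.1f} GiB")
```

With M = 10^5, a single float32 M × M matrix is about 37.3 GiB and a float64 one about 74.5 GiB, both above the 32 GB of a V100, so it seems unlikely to fit entirely in GPU memory. Note also that the allocator named in the error (THCCachingHostAllocator) is PyTorch's pinned host-memory allocator, which points at the laptop's RAM rather than GPU memory.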
Hi, is there a reason for using the patched version of KeOps? https://github.com/FalkonML/falkon/blob/master/setup.py#L176-L177
Hi,
The current documentation only has an example for the binary cross-entropy loss (logistic regression). Would it be possible to include an example for the cross-entropy loss with more than two classes?
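For reference, the multiclass loss being asked about is the softmax cross-entropy. Below is a minimal NumPy sketch of the loss itself, assuming the model produces one score column per class. It is not a Falkon API, and `softmax_cross_entropy` is a hypothetical name.

```python
import numpy as np


def softmax_cross_entropy(scores: np.ndarray, labels: np.ndarray) -> float:
    """Mean multiclass cross-entropy.

    scores: (n, K) real-valued outputs, one column per class
    labels: (n,) integer class indices in [0, K)
    """
    # Subtract the row max before exponentiating for numerical stability.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(shifted).sum(axis=1))
    # Log-softmax probability of the true class for each row.
    log_prob_true = shifted[np.arange(len(labels)), labels] - log_norm
    return float(-log_prob_true.mean())


scores = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
labels = np.array([0, 2])
print(softmax_cross_entropy(scores, labels))
```

With K classes and all-equal scores the loss is log(K), which is a handy sanity check when wiring it into an optimizer.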
In many CUDA operations we use separate allocations for each tensor, which may lead to memory fragmentation and therefore to CUDA out-of-memory errors when there is a lot of memory pressure.
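One common mitigation for this kind of fragmentation is to make a single large allocation and carve the individual tensors out of it as views. The NumPy sketch below illustrates the idea on the CPU; `carve_views` is a hypothetical helper, and the same slicing-and-reshaping pattern applies to a flat `torch.empty` buffer on a CUDA device.

```python
import numpy as np


def carve_views(buffer: np.ndarray, shapes):
    """Split one flat, preallocated buffer into non-overlapping views with the given shapes."""
    views, offset = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        if offset + size > buffer.size:
            raise ValueError("buffer too small for requested shapes")
        # Reshaping a contiguous 1-D slice returns a view, so no new allocation happens.
        views.append(buffer[offset:offset + size].reshape(shape))
        offset += size
    return views


shapes = [(4, 3), (3, 2), (5,)]
total = sum(int(np.prod(s)) for s in shapes)
flat = np.empty(total, dtype=np.float32)  # the only allocation
a, b, c = carve_views(flat, shapes)
```

Because every view points into `flat`, freeing that one buffer releases all of them at once, and the allocator only ever sees one block of that size.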
@WackoToe
options = falkon.FalkonOptions(keops_active="force")
kernel = falkon.kernels.GaussianKernel(sigma=1, opt=options)
flk = falkon.Falkon(kernel=kernel, penalty=1e-5, M=5000, options=options)
yields the following error, which I am unable to debug. Please help.
File ~/.conda/envs/falkon_env/lib/python3.10/site-packages/falkon/models/falkon.py:132, in Falkon.__init__(self, kernel, penalty, M, center_selection, maxiter, seed, error_fn, error_every, weight_fn, options)
130 self.maxiter = maxiter
131 self.weight_fn = weight_fn
--> 132 self._init_cuda()
133 self.beta_ = None
File ~/.conda/envs/falkon_env/lib/python3.10/site-packages/falkon/models/model_utils.py:70, in FalkonBase._init_cuda(self)
68 if self.use_cuda_:
69 torch.cuda.init()
---> 70 self.num_gpus = devices.num_gpus(self.options)
File ~/.conda/envs/falkon_env/lib/python3.10/site-packages/falkon/utils/devices.py:212, in num_gpus(opt)
210 global __COMP_DATA
211 if len(__COMP_DATA) == 0:
--> 212 get_device_info(opt)
213 return len([c for c in __COMP_DATA.keys() if c >= 0])
File ~/.conda/envs/falkon_env/lib/python3.10/site-packages/falkon/utils/devices.py:200, in get_device_info(opt)
197 return __COMP_DATA
199 for g in range(0, tcd.device_count()):
--> 200 __COMP_DATA = _get_gpu_device_info(opt, g, __COMP_DATA)
202 if len(__COMP_DATA) == 0:
203 raise RuntimeError("No suitable device found. Enable option 'use_cpu' "
204 "if no GPU is available.")
File ~/.conda/envs/falkon_env/lib/python3.10/site-packages/falkon/utils/devices.py:92, in _get_gpu_device_info(opt, g, data_dict)
83 # try:
84 # from ..cuda.cudart_gpu import cuda_meminfo
85 # except Exception as e:
(...)
89 # Some of the CUDA calls in here may change the current device,
90 # this ensures it gets reset at the end.
91 with tcd.device(g):
---> 92 mem_free, mem_total = cuda_mem_get_info(g)
93 mem_used = mem_total - mem_free
94 # noinspection PyUnresolvedReferences
RuntimeError: Not compiled with CUDA support
Hello,
I am trying to pass a FixedSelector instance to center_selection in the Falkon constructor. However, the error I obtain is no different from the one with the default "Uniform" selector, which leads me to suspect that the default is used instead.
This is the code I am using:
indices_torch = torch.from_numpy(indices).reshape(-1,1)
X_centers_init = Xtrain[indices].clone()
Y_centers_init = Ytrain[indices].clone()
selector = FixedSelector(X_centers_init,Y_centers_init,indices_torch)
kernel = kernels.GaussianKernel(sigma=1.352)
model = Falkon(
    maxiter=100,
    kernel=kernel,
    penalty=1.07e-06,
    M=20000,
    center_selection=selector,
    options=options
)
Could you advise on what is going wrong here?
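A quick library-independent sanity check is to compare what each strategy produces: fixed selection should return exactly `Xtrain[indices]`, while uniform selection draws M distinct random rows. The NumPy sketch below uses hypothetical helpers and is not Falkon's `FixedSelector`. One detail worth checking in the snippet above is that `indices_torch` is reshaped into a column, whereas a flat index vector is the more usual form for row selection; whether `FixedSelector` expects one or the other is not confirmed here.

```python
import numpy as np


def uniform_centers(X, m, rng):
    """Pick m distinct rows of X uniformly at random (no replacement)."""
    idx = rng.choice(X.shape[0], size=m, replace=False)
    return X[idx], idx


def fixed_centers(X, idx):
    """Use exactly the caller-provided rows of X as centers."""
    idx = np.asarray(idx).reshape(-1)  # flatten in case a column vector was passed
    return X[idx], idx


X = np.arange(20.0).reshape(10, 2)
C_fixed, idx_fixed = fixed_centers(X, np.array([[1], [4], [7]]))
C_unif, idx_unif = uniform_centers(X, 5, np.random.default_rng(0))
```

If the fitted model gives the same error with both kinds of centers, comparing the centers stored in the model against `C_fixed` would show whether the fixed ones were actually used.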
We are stuck at 51% code coverage because the CI does not have CUDA.
Fixing this requires more effort to install the library in the CI pipeline.
Hi, I am trying to use the stochastic objective function in hopt to do gradient-based hyperparameter optimization. When I run it, the first iteration takes forever for some reason. My Falkon solver now works without problems. I took a look at the code and wrote a small replication script based on how stoch_new_compreg.py is implemented. Is there anything I did wrong in the following script?
import numpy as np
import falkon
import torch
from sklearn import datasets
from falkon.center_selection import FixedSelector
# generate a tiny dataset
n = 100
d = 5
X, Y = datasets.make_regression(n, d, random_state=11)
num_train = int(0.8 * n)
X = X.astype(np.float64)
Y = Y.astype(np.float64).reshape(-1, 1)
X_train, y_train = torch.from_numpy(X[:num_train]), torch.from_numpy(Y[:num_train])
X_test, y_test = torch.from_numpy(X[num_train:]), torch.from_numpy(Y[num_train:])
m = 10
X_centers = X_train[:m, :].clone()
center_selector = FixedSelector(centers=X_centers)
options = falkon.FalkonOptions(keops_active="no", debug=True, cpu_preconditioner=True, max_gpu_mem=12*10**9,
                               chol_force_ooc=True, min_cuda_iter_size_64=300000, cg_tolerance=1e-10)
sigma_init = torch.as_tensor(np.array([np.sqrt(d)]*d), dtype=torch.float64)
kernel = falkon.kernels.GaussianKernel(sigma=sigma_init, opt=options)
ridge = 1e-6
maxiter = 50
def error_fn(t, p):
    return torch.sqrt(torch.mean((t - p) ** 2)).item(), "RMSE"
# solve falkon first before running through gradient
flk = falkon.Falkon(kernel=kernel, center_selection=center_selector,
                    penalty=ridge, M=m, options=options, error_every=1, maxiter=maxiter, error_fn=error_fn)
flk.fit(X_train, y_train, X_train, y_train)
# gist of backward process in stoch_new_compreg.py.
# Remove trace part and only focus on the derivatives of model fitting term w.r.t. kernel bandwidths
# ridge and centers are set to non-trainable
optimize_centers = False
optimize_ridge = False
def calc_dfit_bwd(zy_knm_solve_zy, zy_solve_knm_knm_solve_zy, zy_solve_kmm_solve_zy, pen_n, t,
                  include_kmm_term):
    """Nystrom regularized data-fit backward"""
    dfit_bwd = -(
        2 * zy_knm_solve_zy[t:].sum() -
        zy_solve_knm_knm_solve_zy[t:].sum()
    )
    print(dfit_bwd)
    print(dfit_bwd.shape)
    if include_kmm_term:
        print(zy_solve_kmm_solve_zy[t:].sum().shape)
        dfit_bwd += pen_n * zy_solve_kmm_solve_zy[t:].sum()
        print(pen_n * zy_solve_kmm_solve_zy[t:].sum())
    return dfit_bwd
solve_zy = flk.alpha_.clone().to("cuda:0", copy=False)
X_centers_dev = X_centers.to("cuda:0", copy=False).requires_grad_(optimize_centers)
solve_zy_dev = solve_zy.to("cuda:0", copy=False)
penalty_dev = torch.as_tensor(ridge).to("cuda:0", copy=False).requires_grad_(optimize_ridge)
sigma_init = torch.as_tensor(np.array([np.sqrt(d)]), dtype=torch.float64).requires_grad_(True)
kernel = falkon.kernels.GaussianKernel(sigma=sigma_init, opt=options)
with torch.autograd.enable_grad():
    kernel_dev = kernel.to("cuda:0")
    kmm_dev = kernel_dev(X_centers_dev, X_centers_dev, opt=options)
    zy_solve_kmm_solve_zy = (kmm_dev @ solve_zy_dev * solve_zy_dev).sum(0)
    k_mn_zy = kernel_dev.mmv(X_centers_dev, X_train, y_train, opt=options)  # M x (T+P)
    zy_knm_solve_zy = k_mn_zy.mul(solve_zy_dev).sum(0)
    zy_solve_knm_knm_solve_zy = kernel_dev.mmv(X_train, X_centers_dev, solve_zy_dev, opt=options).square().sum(0)
    pen_n = penalty_dev * num_train
    dfit_bwd = calc_dfit_bwd(
        zy_knm_solve_zy, zy_solve_knm_knm_solve_zy, zy_solve_kmm_solve_zy, pen_n, 0,
        include_kmm_term=True)
    grads = torch.autograd.grad(
        dfit_bwd, list(kernel_dev.diff_params.values()), retain_graph=False, allow_unused=False)
I am also wondering: if we implement the gradient computation this way, would we be unable to use multiple GPUs in the backward pass?
Thanks!
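When an autograd gradient of the data-fit term looks suspicious (or is slow to obtain), one option is to cross-check it against a finite-difference estimate on a tiny problem. The NumPy sketch below re-implements the quantity returned by `calc_dfit_bwd` in the script above for a scalar bandwidth. It assumes the Gaussian kernel is exp(-||x - x'||² / (2σ²)) and that `alpha` plays the role of `solve_zy`; the helper names are hypothetical, and this is a debugging aid, not Falkon's implementation.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """exp(-||a - b||^2 / (2 sigma^2)) for every pair of rows (scalar bandwidth)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def nystrom_fit_term(sigma, X, Y, C, alpha, pen_n):
    """Same quantity as calc_dfit_bwd(..., t=0, include_kmm_term=True) above."""
    knm = gaussian_kernel(X, C, sigma)
    kmm = gaussian_kernel(C, C, sigma)
    pred = knm @ alpha                                 # K_nm alpha
    zy_knm_solve_zy = (Y * pred).sum()                 # y^T K_nm alpha
    zy_solve_knm_knm_solve_zy = (pred ** 2).sum()      # ||K_nm alpha||^2
    zy_solve_kmm_solve_zy = (alpha * (kmm @ alpha)).sum()  # alpha^T K_mm alpha
    return float(-(2 * zy_knm_solve_zy - zy_solve_knm_knm_solve_zy)
                 + pen_n * zy_solve_kmm_solve_zy)


def central_difference(f, x, h=1e-5):
    """Second-order finite-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


rng = np.random.default_rng(0)
X = rng.standard_normal((80, 5))
Y = rng.standard_normal((80, 1))
C = X[:10]
alpha = rng.standard_normal((10, 1))
pen_n = 1e-6 * 80
grad_fd = central_difference(lambda s: nystrom_fit_term(s, X, Y, C, alpha, pen_n), np.sqrt(5.0))
print(grad_fd)
```

Comparing `grad_fd` with the autograd value for the same inputs (moved to torch) separates "the gradient is wrong" from "the gradient is slow", which are different problems.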
Hi there,
I'm trying to use the hopt features for a regression problem, so I'm currently trying to adapt the hopt example using a regression dataset. This includes:
.to(dtype=torch.float32)
mclass_loss
). I'm using the following so far:
def mclass_loss(true, pred):
    mae = torch.nn.L1Loss()
    return mae(true, pred)
But I'm getting the following error right off the bat (before even calling mclass_loss):
(pytorch-gpu-1.11.0+py3.9.12) bash-4.4$ python opt_hp.py
/linkhome/rech/genimp01/uam43iy/.local/lib/python3.9/site-packages/falkon/hopt/objectives/exact_objectives/sgpr.py:75: UserWarning: torch.triangular_solve is deprecated in favor of torch.linalg.solve_triangularand will be removed in a future PyTorch release.
torch.linalg.solve_triangular has its arguments reversed and does not return a copy of one of the inputs.
X = torch.triangular_solve(B, A).solution
should be replaced with
X = torch.linalg.solve_triangular(A, B). (Triggered internally at /gpfs7kro/gpfslocalsup/src/pub/anaconda-py3/2021.05/pytorch-1.11.0+py3.9.12/pytorch-1.11.0/aten/src/ATen/native/BatchLinearAlgebra.cpp:1672.)
A = torch.triangular_solve(kmn, L, upper=False).solution / sqrt_var
Traceback (most recent call last):
File "/gpfsssd/scratch/rech/tta/uam43iy/tests/falkon_opt_pivOF/opt_hp.py", line 57, in <module>
loss = model(X_train, Y_train)
File "/gpfslocalsup/pub/anaconda-py3/2021.05/envs/pytorch-gpu-1.11.0+py3.9.12/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/linkhome/rech/genimp01/uam43iy/.local/lib/python3.9/site-packages/falkon/hopt/objectives/exact_objectives/sgpr.py", line 32, in forward
L, A, AAT, LB, c = self._calc_intermediate(X, Y)
File "/linkhome/rech/genimp01/uam43iy/.local/lib/python3.9/site-packages/falkon/hopt/objectives/exact_objectives/sgpr.py", line 82, in _calc_intermediate
c = torch.triangular_solve(AY, LB, upper=False).solution / sqrt_var
RuntimeError: torch.triangular_solve: Expected b to have at least 2 dimensions, but it has 1 dimensions instead
Is this related to the one-hot representation used in the classification problem of the example? Can the hyperparameter optimization methods be used for regression problems here?
thanks,
Arthur
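The traceback says `torch.triangular_solve` received a 1-D right-hand side, which suggests the regression targets were passed as a flat vector of shape (n,), whereas a one-hot classification target is naturally 2-D. Assuming that is the cause, the usual fix is to reshape the targets into a single column before calling the objective (for a torch tensor, `Y_train.reshape(-1, 1)` does the same). NumPy is used here so the snippet runs standalone; the helper names are hypothetical.

```python
import numpy as np


def as_column(y: np.ndarray) -> np.ndarray:
    """Return targets as a 2-D (n, 1) array; leave (n, k) arrays unchanged."""
    return y.reshape(-1, 1) if y.ndim == 1 else y


def mae(true: np.ndarray, pred: np.ndarray) -> float:
    """Mean absolute error, i.e. what torch.nn.L1Loss computes with reduction='mean'."""
    return float(np.abs(as_column(true) - as_column(pred)).mean())


y = np.array([0.5, 1.5, 2.0])
print(as_column(y).shape)  # (3, 1)
```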
This is an issue with the upload script, which doesn't know which old wheels had previously been uploaded.
I'm trying to install falkon on macOS 10.14.6 for CPU. However, I'm having some issues after running 'python setup.py develop' as suggested in Issue #2. I'm following the installation instructions using a clean virtual environment with pytorch version 1.8.1 installed, Python 3.8.9, GCC 10.2.0_4, and cmake version 3.20.0. I have already installed keops. The problem seems to be something with the file 'falkon/sparse/cpp/sparse_matmul.cpp' and cmake / gcc. Any suggestions for how to fix this are much appreciated.
running develop
running egg_info
writing falkon.egg-info/PKG-INFO
writing dependency_links to falkon.egg-info/dependency_links.txt
writing requirements to falkon.egg-info/requires.txt
writing top-level names to falkon.egg-info/top_level.txt
/Users/user/.pyenv/versions/falkon/lib/python3.8/site-packages/torch/utils/cpp_extension.py:369: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'falkon.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '*.so' found anywhere in distribution
warning: no previously-included files matching 'notebooks/**' found anywhere in distribution
warning: no previously-included files matching 'doc/_build/**' found anywhere in distribution
warning: no previously-included files matching '**/.ipynb_checkpoints/**' found anywhere in distribution
warning: no previously-included files matching '__pycache__' found anywhere in distribution
warning: no previously-included files matching '*.py[co]' found anywhere in distribution
warning: no previously-included files matching 'keops/**' found anywhere in distribution
warning: no previously-included files matching 'benchmark/**' found anywhere in distribution
writing manifest file 'falkon.egg-info/SOURCES.txt'
running build_ext
building 'falkon.sparse.sparse_helpers' extension
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include -I./falkon/sparse -I/Users/user/.pyenv/versions/falkon/lib/python3.8/site-packages/torch/include -I/Users/user/.pyenv/versions/falkon/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/Users/user/.pyenv/versions/falkon/lib/python3.8/site-packages/torch/include/TH -I/Users/user/.pyenv/versions/falkon/lib/python3.8/site-packages/torch/include/THC -I/Users/user/.pyenv/versions/falkon/include -I/Users/user/.pyenv/versions/3.8.9/include/python3.8 -c ./falkon/sparse/sparse_extension.cpp -o build/temp.macosx-10.14-x86_64-3.8/./falkon/sparse/sparse_extension.o -Xpreprocessor -fopenmp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_clang" -DPYBIND11_STDLIB="_libcpp" -DPYBIND11_BUILD_ABI="_cxxabi1002" -DTORCH_EXTENSION_NAME=sparse_helpers -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include -I./falkon/sparse -I/Users/user/.pyenv/versions/falkon/lib/python3.8/site-packages/torch/include -I/Users/user/.pyenv/versions/falkon/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/Users/user/.pyenv/versions/falkon/lib/python3.8/site-packages/torch/include/TH -I/Users/user/.pyenv/versions/falkon/lib/python3.8/site-packages/torch/include/THC -I/Users/user/.pyenv/versions/falkon/include -I/Users/user/.pyenv/versions/3.8.9/include/python3.8 -c ./falkon/sparse/cpp/sparse_matmul.cpp -o build/temp.macosx-10.14-x86_64-3.8/./falkon/sparse/cpp/sparse_matmul.o -Xpreprocessor -fopenmp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_clang" -DPYBIND11_STDLIB="_libcpp" -DPYBIND11_BUILD_ABI="_cxxabi1002" -DTORCH_EXTENSION_NAME=sparse_helpers -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
In file included from ./falkon/sparse/cpp/sparse_matmul.cpp:2:
/Users/user/.pyenv/versions/falkon/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:15:13: error: redefinition of 'parallel_for'
inline void parallel_for(
^
/Users/user/.pyenv/versions/falkon/lib/python3.8/site-packages/torch/include/ATen/ParallelNative.h:34:13: note: previous definition is here
inline void parallel_for(
^
In file included from ./falkon/sparse/cpp/sparse_matmul.cpp:2:
/Users/user/.pyenv/versions/falkon/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:64:17: error: redefinition of 'parallel_reduce'
inline scalar_t parallel_reduce(
^
/Users/user/.pyenv/versions/falkon/lib/python3.8/site-packages/torch/include/ATen/ParallelNative.h:58:17: note: previous definition is here
inline scalar_t parallel_reduce(
^
./falkon/sparse/cpp/sparse_matmul.cpp:14:9: error: no matching function for call to 'parallel_for'
torch::parallel_for(0, N, 2048, [&](int64_t start, int64_t end) {
^~~~~~~~~~~~~~~~~~~
./falkon/sparse/cpp/sparse_matmul.cpp:132:9: note: in instantiation of function template specialization 'run_parallel<unsigned char>' requested here
run_parallel<scalar_t>(
^
/Users/user/.pyenv/versions/falkon/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:15:13: note: candidate template ignored: substitution
failure [with F = (lambda at ./falkon/sparse/cpp/sparse_matmul.cpp:14:41)]
inline void parallel_for(
^
./falkon/sparse/cpp/sparse_matmul.cpp:14:9: error: no matching function for call to 'parallel_for'
torch::parallel_for(0, N, 2048, [&](int64_t start, int64_t end) {
^~~~~~~~~~~~~~~~~~~
./falkon/sparse/cpp/sparse_matmul.cpp:132:9: note: in instantiation of function template specialization 'run_parallel<signed char>' requested here
run_parallel<scalar_t>(
^
/Users/user/.pyenv/versions/falkon/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:15:13: note: candidate template ignored: substitution
failure [with F = (lambda at ./falkon/sparse/cpp/sparse_matmul.cpp:14:41)]
inline void parallel_for(
^
./falkon/sparse/cpp/sparse_matmul.cpp:14:9: error: no matching function for call to 'parallel_for'
torch::parallel_for(0, N, 2048, [&](int64_t start, int64_t end) {
^~~~~~~~~~~~~~~~~~~
./falkon/sparse/cpp/sparse_matmul.cpp:132:9: note: in instantiation of function template specialization 'run_parallel<double>' requested here
run_parallel<scalar_t>(
^
/Users/user/.pyenv/versions/falkon/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:15:13: note: candidate template ignored: substitution
failure [with F = (lambda at ./falkon/sparse/cpp/sparse_matmul.cpp:14:41)]
inline void parallel_for(
^
./falkon/sparse/cpp/sparse_matmul.cpp:14:9: error: no matching function for call to 'parallel_for'
torch::parallel_for(0, N, 2048, [&](int64_t start, int64_t end) {
^~~~~~~~~~~~~~~~~~~
./falkon/sparse/cpp/sparse_matmul.cpp:132:9: note: in instantiation of function template specialization 'run_parallel<float>' requested here
run_parallel<scalar_t>(
^
/Users/user/.pyenv/versions/falkon/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:15:13: note: candidate template ignored: substitution
failure [with F = (lambda at ./falkon/sparse/cpp/sparse_matmul.cpp:14:41)]
inline void parallel_for(
^
./falkon/sparse/cpp/sparse_matmul.cpp:14:9: error: no matching function for call to 'parallel_for'
torch::parallel_for(0, N, 2048, [&](int64_t start, int64_t end) {
^~~~~~~~~~~~~~~~~~~
./falkon/sparse/cpp/sparse_matmul.cpp:132:9: note: in instantiation of function template specialization 'run_parallel<int>' requested here
run_parallel<scalar_t>(
^
/Users/user/.pyenv/versions/falkon/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:15:13: note: candidate template ignored: substitution
failure [with F = (lambda at ./falkon/sparse/cpp/sparse_matmul.cpp:14:41)]
inline void parallel_for(
^
./falkon/sparse/cpp/sparse_matmul.cpp:14:9: error: no matching function for call to 'parallel_for'
torch::parallel_for(0, N, 2048, [&](int64_t start, int64_t end) {
^~~~~~~~~~~~~~~~~~~
./falkon/sparse/cpp/sparse_matmul.cpp:132:9: note: in instantiation of function template specialization 'run_parallel<long long>' requested here
run_parallel<scalar_t>(
^
/Users/user/.pyenv/versions/falkon/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:15:13: note: candidate template ignored: substitution
failure [with F = (lambda at ./falkon/sparse/cpp/sparse_matmul.cpp:14:41)]
inline void parallel_for(
^
./falkon/sparse/cpp/sparse_matmul.cpp:14:9: error: no matching function for call to 'parallel_for'
torch::parallel_for(0, N, 2048, [&](int64_t start, int64_t end) {
^~~~~~~~~~~~~~~~~~~
./falkon/sparse/cpp/sparse_matmul.cpp:132:9: note: in instantiation of function template specialization 'run_parallel<short>' requested here
run_parallel<scalar_t>(
^
/Users/user/.pyenv/versions/falkon/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:15:13: note: candidate template ignored: substitution
failure [with F = (lambda at ./falkon/sparse/cpp/sparse_matmul.cpp:14:41)]
inline void parallel_for(
^
9 errors generated.
error: command 'gcc' failed with exit status 1
Hi again,
Thanks for your help over the past few days. I'm running into:
cudaSafeCall() failed at /data/greyostrich/not-backed-up/nvme00/rhu/miniconda3/envs/new_nnenv/lib/python3.8/site-packages/pykeops/cmake_scripts/script_keops_formula/../../keops/core/mapreduce/GpuConv1D.cu:432 : out of memory
when running FALKON.
Setup:
X: 10^9x3
Y: 10^9x1
GaussianKernel with ls=3
penalty=1e-5
GPU: V100 32GB
CPU RAM: 180GB
8 CPU processors
Thank you for the help!
Best regards,
Robert
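For scale, the raw data in this setup is sizeable before any kernel blocks are formed. The arithmetic below assumes float32 storage (double it for float64) and uses hypothetical helper names; `num_chunks` illustrates how many pieces the data would have to be streamed in to stay under a given memory budget. It does not reproduce how KeOps or Falkon actually choose block sizes.

```python
import math


def array_bytes(rows: int, cols: int, itemsize: int) -> int:
    """Bytes for a dense rows x cols array."""
    return rows * cols * itemsize


def num_chunks(total_bytes: int, budget_bytes: int) -> int:
    """Smallest number of equal pieces so that each piece fits in the budget."""
    return math.ceil(total_bytes / budget_bytes)


n = 10**9
x_bytes = array_bytes(n, 3, 4)  # X in float32
y_bytes = array_bytes(n, 1, 4)  # Y in float32
print(f"X: {x_bytes / 2**30:.1f} GiB, Y: {y_bytes / 2**30:.1f} GiB")
print(num_chunks(x_bytes + y_bytes, 4 * 2**30))
```

At float32, X takes about 11.2 GiB and Y about 3.7 GiB, so the data fits comfortably in 180 GB of RAM; the question is how much of it, plus intermediate kernel blocks, is moved to the GPU at once.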
When running the following example with Falkon, I run into a CUDA runtime error.
Example:
`from sklearn import datasets, model_selection
import numpy as np
import torch
import falkon
from falkon.models import Falkon
from falkon.kernels import GaussianKernel
from falkon.options import FalkonOptions
Xtrain = np.random.randn(80000, 1536)
Xtest = np.random.randn(10000, 1536)
Ytrain = np.random.randn(80000, 20)
Ytest = np.random.randn(10000, 20)
Xtrain = torch.from_numpy(Xtrain)
Xtest = torch.from_numpy(Xtest)
Ytrain = torch.from_numpy(Ytrain)
Ytest = torch.from_numpy(Ytest)
print("X TRAIN SHAPE: ", Xtrain.shape, Ytrain.shape, "TEsT SHAPES: ", Xtest.shape, Ytest.shape)
kernel = GaussianKernel(sigma=5)
flk = Falkon(kernel=kernel, penalty=1e-5, M=Xtrain.shape[0])
flk.fit(Xtrain, Ytrain)`
Error:
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1616554827596/work/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=304 : OS call failed or operation not supported on this OS
Traceback (most recent call last):
File "falkon_test.py", line 26, in <module>
flk.fit(Xtrain, Ytrain)
File "/home/nehap/anaconda3/envs/falkon/lib/python3.7/site-packages/falkon/models/falkon.py", line 197, in fit
ny_points = ny_points.pin_memory()
RuntimeError: cuda runtime error (304) : OS call failed or operation not supported on this OS at /opt/conda/conda-bld/pytorch_1616554827596/work/aten/src/THC/THCCachingHostAllocator.cpp:278
Here is my .yml file:
name: falkon
channels:
I'm currently using a single TITAN RTX GPU with 24 GB of memory, and the machine has 128 GB of RAM. The example works if we reduce the number of dimensions from 1536 to 20, but with larger datasets it seems to run into this issue. We would appreciate any help with this. Thank you!
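Two things in the example above inflate memory: `np.random.randn` produces float64 arrays (which `torch.from_numpy` preserves), and `M=Xtrain.shape[0]` uses every training point as a center. The sketch below only does the arithmetic, assuming dense storage; `footprint_gib` is a hypothetical helper. It does not by itself explain error 304, which is raised while pinning host memory (`pin_memory()` in the traceback).

```python
import numpy as np


def footprint_gib(shape, dtype) -> float:
    """Memory of a dense array with this shape and dtype, in GiB, without allocating it."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize / 2**30


shapes = {"Xtrain": (80000, 1536), "Ytrain": (80000, 20), "M x M with M = n": (80000, 80000)}
for name, shape in shapes.items():
    print(f"{name}: {footprint_gib(shape, np.float64):.2f} GiB as float64, "
          f"{footprint_gib(shape, np.float32):.2f} GiB as float32")

# np.random.randn returns float64 and torch.from_numpy keeps that dtype;
# casting before conversion halves the footprint.
small = np.random.randn(4, 3).astype(np.float32)
```

An 80000 × 80000 float64 matrix alone is about 47.7 GiB, roughly twice the 24 GB on the TITAN RTX.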
Hello,
I followed the installation steps and did not run into any error while compiling the library.
However, when trying the kernel ridge regression notebook, I cannot load the module.
I get the following error trace:
import falkon
Traceback (most recent call last):
File "", line 1, in
File "/home/vignac/falkon/falkon/init.py", line 3, in
from . import kernels, sparse, center_selection, preconditioner, optim
File "/home/vignac/falkon/falkon/kernels/init.py", line 1, in
from .kernel import Kernel
File "/home/vignac/falkon/falkon/kernels/kernel.py", line 6, in
from falkon.mmv_ops.fmm_cpu import fmm_cpu_sparse, fmm_cpu
File "/home/vignac/falkon/falkon/mmv_ops/fmm_cpu.py", line 10, in
from falkon.sparse.sparse_tensor import SparseTensor
File "/home/vignac/falkon/falkon/sparse/init.py", line 2, in
from .sparse_ops import sparse_norm, sparse_square_norm, sparse_matmul
File "/home/vignac/falkon/falkon/sparse/sparse_ops.py", line 4, in
from falkon.sparse.sparse_helpers import norm_sq, norm_
ModuleNotFoundError: No module named 'falkon.sparse.sparse_helpers'
Is it just a path that is incorrect, or do you think it is a bigger problem with the installation?
Thanks,
Clement
Package versions:
nvcc: 11.4
g++: 7.5.0
cmake: 3.18.2
$ pip list
Package Version
certifi 2021.10.8
cycler 0.11.0
falkon 0.6.3
joblib 1.1.0
kiwisolver 1.3.2
matplotlib 3.4.3
numpy 1.21.4
Pillow 8.4.0
pip 21.2.4
psutil 5.8.0
pykeops 1.4.2
pyparsing 3.0.5
python-dateutil 2.8.2
scikit-learn 1.0.1
scipy 1.7.2
setuptools 58.0.4
six 1.16.0
threadpoolctl 3.0.0
torch 1.10.0+cu113 (works fine on gpu)
typing-extensions 3.10.0.2
wheel 0.37.0
Installation traces: everything seemed to be fine:
$ pip install ./keops
Processing ./keops
DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
pip 21.3 will remove support for this functionality. You can find discussion regarding this at pypa/pip#7555.
Requirement already satisfied: numpy in /home/vignac/.conda/envs/falkon/lib/python3.9/site-packages (from pykeops==1.4.2) (1.21.4)
Building wheels for collected packages: pykeops
Building wheel for pykeops (setup.py) ... done
Created wheel for pykeops: filename=pykeops-1.4.2-py3-none-any.whl size=478011 sha256=dac861f7bd93a552854c4566aaf688a1deaaf737518a862f4818e04bc8b8d16d
Stored in directory: /tmp/pip-ephem-wheel-cache-bq1e39qr/wheels/36/47/f5/4be78e0d60dfe330cfb4652a2e21c469d4f6ea7bb0d0d767df
Successfully built pykeops
Installing collected packages: pykeops
Successfully installed pykeops-1.4.2
$ pip install .
Processing /home/vignac/falkon
DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
pip 21.3 will remove support for this functionality. You can find discussion regarding this at pypa/pip#7555.
Requirement already satisfied: torch>=1.4 in /home/vignac/.conda/envs/falkon/lib/python3.9/site-packages (from falkon==0.6.3) (1.10.0+cu113)
Collecting scipy
Downloading scipy-1.7.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (39.8 MB)
|████████████████████████████████| 39.8 MB 5.5 MB/s
Requirement already satisfied: numpy in /home/vignac/.conda/envs/falkon/lib/python3.9/site-packages (from falkon==0.6.3) (1.21.4)
Collecting scikit-learn
Downloading scikit_learn-1.0.1-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (24.7 MB)
|████████████████████████████████| 24.7 MB 108.8 MB/s
Collecting psutil
Downloading psutil-5.8.0-cp39-cp39-manylinux2010_x86_64.whl (293 kB)
|████████████████████████████████| 293 kB 125.6 MB/s
Requirement already satisfied: typing-extensions in /home/vignac/.conda/envs/falkon/lib/python3.9/site-packages (from torch>=1.4->falkon==0.6.3) (3.10.0.2)
Collecting threadpoolctl>=2.0.0
Using cached threadpoolctl-3.0.0-py3-none-any.whl (14 kB)
Collecting joblib>=0.11
Using cached joblib-1.1.0-py2.py3-none-any.whl (306 kB)
Building wheels for collected packages: falkon
Building wheel for falkon (setup.py) ... done
Created wheel for falkon: filename=falkon-0.6.3-cp39-cp39-linux_x86_64.whl size=1270129 sha256=a55a9db44d82a77908f9c190aae18500e199d29563d74434bbeeff2ed977fe41
Stored in directory: /tmp/pip-ephem-wheel-cache-qj6av2bg/wheels/42/2f/de/817b4dc8ce9bdfe9d8d5b31d82288a66442e83b0509995d8a1
Successfully built falkon
Installing collected packages: threadpoolctl, scipy, joblib, scikit-learn, psutil, falkon
Successfully installed falkon-0.6.3 joblib-1.1.0 psutil-5.8.0 scikit-learn-1.0.1 scipy-1.7.2 threadpoolctl-3.0.0
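The traceback shows `falkon` being imported from `/home/vignac/falkon/falkon/`, i.e. the source checkout, while `pip install .` builds the compiled `sparse_helpers` extension into the installed wheel. One plausible explanation (an assumption, not confirmed in this thread) is that Python was started from inside the repository, so the uncompiled source tree shadows the installed package. A small standard-library check of where a module would actually be imported from:

```python
import importlib.util
from pathlib import Path


def module_origin(name: str):
    """Return the file a top-level module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def shadowed_by_cwd(name: str) -> bool:
    """True if the module would be imported from a path under the current working directory."""
    origin = module_origin(name)
    if origin is None:
        return False
    try:
        Path(origin).resolve().relative_to(Path.cwd().resolve())
        return True
    except ValueError:
        return False
```

If `module_origin("falkon")` points into the checkout, starting Python from a different directory, or installing in development mode (`pip install -e .`) so the extension is built in place, would be the next thing to try.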
I've extended the DotProductKernel with my own custom kernel and have run into some issues with memory.
It seems that KeOps gets used when evaluating the function found after doing Kernel Ridge Regression.
However, in the fitting phase, prepare_, apply_ and finalize_ get called, which don't use KeOps. Because of this, I run out of memory when trying to fit very large inputs.
Is there a way to use KeOps for the fitting phase? Is there a reason this isn't being done by default?
Thanks in advance!
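The memory issue comes down to whether the full n × M kernel matrix is ever materialised at once. The sketch below shows, in plain NumPy, the row-blocking idea that avoids this: the kernel-vector product is computed one slice of rows at a time. It is only an illustration, with a hypothetical `blocked_kernel_mmv` helper and a linear kernel as a stand-in; it is not how Falkon or KeOps dispatch custom kernels internally.

```python
import numpy as np


def linear_kernel(A, B):
    """Plain dot-product kernel, standing in for a custom DotProductKernel subclass."""
    return A @ B.T


def blocked_kernel_mmv(X1, X2, v, kernel_fn, block_rows=1024):
    """Compute kernel_fn(X1, X2) @ v one row-block at a time.

    Only a (block_rows x len(X2)) slice of the kernel matrix exists at any moment,
    so peak memory no longer scales with len(X1) * len(X2).
    """
    out = np.empty((X1.shape[0], v.shape[1]), dtype=np.result_type(X1, X2, v))
    for start in range(0, X1.shape[0], block_rows):
        stop = min(start + block_rows, X1.shape[0])
        out[start:stop] = kernel_fn(X1[start:stop], X2) @ v
    return out


rng = np.random.default_rng(1)
X1 = rng.standard_normal((50, 4))
X2 = rng.standard_normal((20, 4))
v = rng.standard_normal((20, 3))
result = blocked_kernel_mmv(X1, X2, v, linear_kernel, block_rows=7)
```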
Hi,
I tried to run FALKON with 3 GPUs but I got the following error:
`Traceback (most recent call last):
File "/home/"user"/.conda/envs/flk4/lib/python3.10/site-packages/falkon/utils/threading.py", line 15, in run
self.ret = self._target(*self._args, **self._kwargs)
File "/home/"user"//.conda/envs/flk4/lib/python3.10/site-packages/falkon/mmv_ops/fmmv.py", line 138, in mmv_run_starter
return mmv_run_thread(X1, X2, v, out, kernel, blk_n, blk_m, mem_needed, dev, tid=proc_idx)
File "/home/"user"//.conda/envs/flk4/lib/python3.10/site-packages/falkon/mmv_ops/fmmv.py", line 251, in mmv_run_thread
flat_gpu = torch.empty(size=(mem_needed,), dtype=m1.dtype, device=dev)
RuntimeError: CUDA out of memory. Tried to allocate 21.00 GiB (GPU 0; 31.75 GiB total capacity; 5.57 GiB already allocated; 20.88 GiB free; 9.56 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/"user"/.conda/envs/flk4/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/"user"/.conda/envs/flk4/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/"user"/research/knotty/run/main.py", line 38, in
alpha, acc_valid_ep3,nystrom_samples,knots_x,acc_ep2_test= run(**args,wandb_run=wandb_run)
File "/home/"user"/research/knotty/run/run.py", line 225, in run
Falkon_loss, accu_falkon = falkon_run(dataset, kernel_fn, options, p=num_knots, epochs=20,
File "/home/"user"/research/knotty/run/run.py", line 34, in falkon_run
flk.fit(x_train, y_train)
File "/home/"user"/.conda/envs/flk4/lib/python3.10/site-packages/falkon/models/falkon.py", line 264, in fit
beta = optim.solve(
File "/home/"user"/.conda/envs/flk4/lib/python3.10/site-packages/falkon/optim/conjgrad.py", line 310, in solve
B = self.kernel.mmv(M, X, y_over_n, opt=self.params)
File "/home/"user"/.conda/envs/flk4/lib/python3.10/site-packages/falkon/kernels/kernel.py", line 266, in mmv
return mmv_impl(X1, X2, v, self, out, params)
File "/home/"user"/.conda/envs/flk4/lib/python3.10/site-packages/falkon/mmv_ops/fmmv.py", line 734, in fmmv
return KernelMmvFnFull.apply(kernel, opt, out, X1, X2, v, *kernel.diff_params.values())
File "/home/"user"/.conda/envs/flk4/lib/python3.10/site-packages/falkon/mmv_ops/fmmv.py", line 695, in forward
KernelMmvFnFull.run_cpu_gpu(X1, X2, v, out, kernel, opt, False)
File "/home/"user"/.conda/envs/flk4/lib/python3.10/site-packages/falkon/mmv_ops/fmmv.py", line 641, in run_cpu_gpu
outputs = _start_wait_processes(mmv_run_starter, args)
File "/home/"user"/conda/envs/flk4/lib/python3.10/site-packages/falkon/mmv_ops/utils.py", line 59, in _start_wait_processes
outputs.append(p.join())
File "/home/"user"/.conda/envs/flk4/lib/python3.10/site-packages/falkon/utils/threading.py", line 22, in join
raise RuntimeError('Exception in thread %s' % (self.name)) from self.exc
RuntimeError: Exception in thread GPU-0
`
It works fine with 1 or 2 GPUs. I was wondering whether using 3 or more GPUs could make FALKON even faster.
Thank you for your help.
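Conceptually, adding devices splits the rows of the kernel-vector product so that each device handles roughly n/k of them. The sketch below simulates that split on the CPU with hypothetical helpers (`split_rows`, `data_parallel_mmv`) and a Gaussian kernel; it is not Falkon's scheduler. Whether a third GPU actually helps depends on how per-device buffers are sized: in the traceback, the failed request (21.00 GiB) is only slightly larger than the 20.88 GiB reported free on GPU 0, so capping per-device memory with the `max_gpu_mem` option (used in another script in this thread) may be worth trying.

```python
import numpy as np


def split_rows(n: int, n_devices: int):
    """Split range(n) into n_devices contiguous (start, stop) pieces of near-equal size."""
    base, extra = divmod(n, n_devices)
    bounds, start = [], 0
    for i in range(n_devices):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def gaussian_block(A, B, sigma):
    """exp(-||a - b||^2 / (2 sigma^2)) computed directly from pairwise differences."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def data_parallel_mmv(X1, X2, v, sigma, n_devices):
    """Simulated multi-device product: each 'device' computes its own rows of K(X1, X2) @ v."""
    parts = [gaussian_block(X1[a:b], X2, sigma) @ v for a, b in split_rows(X1.shape[0], n_devices)]
    return np.concatenate(parts, axis=0)
```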
Hi FALKON team!
While using Falkon, I stumbled on what looks like a memory bug in the library.
import torch
from falkon.kernels import LinearKernel
from falkon import Falkon
n = 50000
d = 51000
l = 10
X = torch.randn((n, d))
y = torch.nn.functional.one_hot(torch.randint(0, l, (n,))).float()
sigma = 1
penalties = [1e-4, 1e-5]
for i in range(2):
    print(f"{i}")
    kernel = LinearKernel(sigma=sigma)
    model = Falkon(
        kernel=kernel,
        penalty=penalties[i],
        M=40000,
        maxiter=10,
        seed=0,
    )
    model.fit(X, y)
    predictions = model.predict(X)
    # torch.cuda.empty_cache()
    # If the line above is commented, FALKON induces a CUDA Out of Memory error.
# If the line above is commented, FALKON induces a CUDA Out of Memory error.
Expected behavior: fit different models with no errors.
Actual behavior:
RuntimeError: CUDA out of memory. Tried to allocate 9.91 GiB (GPU 1; 31.75 GiB total capacity; 12.38 GiB already allocated; 8.59 GiB free; 21.75 GiB reserved in total by PyTorch)
In the above code, uncommenting the torch.cuda.empty_cache() line eliminates the issue.
Changing d = 51000 to d = 30000 also eliminates the issue.
Environment: PyTorch 1.9, pip.
Let me know if I can provide any further information or assistance in fixing the issue! Thanks!
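A rough calculation shows why lowering d makes a difference. A dense float32 matrix of shape (n, d) takes n·d·4 bytes. The plain-Python sketch below uses `dense_gib`, a hypothetical helper. It only counts the input X from the report, not Falkon's internal GPU buffers, so it shows the scale of the problem rather than the exact allocation in the error.

```python
def dense_gib(rows: int, cols: int, bytes_per_elem: int = 4) -> float:
    """Size in GiB of a dense rows x cols matrix (float32 by default)."""
    return rows * cols * bytes_per_elem / 2**30

# Input matrix X from the report, before and after reducing d
print(f"n=50000, d=51000: {dense_gib(50000, 51000):.2f} GiB")
print(f"n=50000, d=30000: {dense_gib(50000, 30000):.2f} GiB")
```

Between fits, torch.cuda.empty_cache() returns cached blocks that PyTorch's allocator holds but no longer uses. That is consistent with it avoiding the error in the second iteration.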
Hi,
I'm trying to reproduce how falkon computes the Laplacian kernel, as I have to outsource some models I've optimized. So far I've been largely unsuccessful. My Python implementation agrees with sklearn's metrics.pairwise.laplacian_kernel(), but gives completely different results from falkon.kernels.LaplacianKernel().
It's hard to tell what's different in the falkon implementation, as I didn't find a simple way to access the _sq_dist() method used in laplacian_core(). Is it really a Manhattan distance computed here? Is there any additional normalization? Would you have a simple NumPy implementation that reproduces falkon's results?
thanks a lot,
Arthur
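The NumPy sketch below compares the two conventions side by side. It assumes that Falkon's LaplacianKernel is k(x, x') = exp(-‖x − x'‖₂ / σ), using the Euclidean distance (the use of `_sq_dist` points that way). scikit-learn's `laplacian_kernel` instead computes exp(-γ‖x − x'‖₁), using the Manhattan distance. Check the Euclidean assumption against a few values computed with Falkon before relying on it.

```python
import numpy as np

def laplacian_l2(X1, X2, sigma):
    """exp(-||x - y||_2 / sigma): Euclidean-distance Laplacian kernel (assumed Falkon convention)."""
    sq = (X1 ** 2).sum(1)[:, None] + (X2 ** 2).sum(1)[None, :] - 2 * X1 @ X2.T
    dist = np.sqrt(np.clip(sq, 0, None))  # clip tiny negatives caused by round-off
    return np.exp(-dist / sigma)

def laplacian_l1(X1, X2, gamma):
    """exp(-gamma * ||x - y||_1): Manhattan-distance Laplacian kernel (scikit-learn convention)."""
    dist = np.abs(X1[:, None, :] - X2[None, :, :]).sum(-1)
    return np.exp(-gamma * dist)

rng = np.random.default_rng(0)
A, B = rng.normal(size=(4, 3)), rng.normal(size=(5, 3))
K_l2 = laplacian_l2(A, B, sigma=2.0)
K_l1 = laplacian_l1(A, B, gamma=1 / 2.0)  # gamma plays the role of 1 / sigma
print(np.abs(K_l2 - K_l1).max())  # nonzero: the two kernels differ
```

Since ‖v‖₁ ≥ ‖v‖₂, the L1 version always gives smaller or equal kernel values for the same scale parameter.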
Running the following code on a GPU:
kernel = falkon.kernels.GaussianKernel(sigma=1, opt=options)
flk = falkon.Falkon(kernel=kernel, penalty=1e-5, M=5000, options=options)
gives OSError: /opt/conda/lib/python3.10/site-packages/falkon/c_ext/_C.so: undefined symbol: _ZNK5torch8autograd4Node4nameEv.
Please provide the solution at the earliest.
Hi,
I'd like to implement a hyperparameter optimization procedure based on minimizing a loss function computed on a validation set, in order to preserve transferability as much as possible.
I was previously using the built-in hopt classes in the following way:
model = SGPR(
kernel=kernel, penalty_init=penalty_init, centers_init=centers_init,
opt_penalty=True, opt_centers=False)
opt_hp = torch.optim.Adam(model.parameters(), lr=lr)
for epoch in range(100):
    opt_hp.zero_grad()
    loss = model(X_train, Y_train)
    loss.backward()
    opt_hp.step()
What I'm trying to implement now should probably look like that:
model = SGPR(
kernel=kernel, penalty_init=penalty_init, centers_init=centers_init,
opt_penalty=True, opt_centers=False)
opt_hp = torch.optim.Adam(model.parameters(), lr=lr)
loss_fn = torch.nn.L1Loss()
for epoch in range(100):
    opt_hp.zero_grad()
    model(X_train, Y_train)
    loss = loss_fn(model.predict(X_val), Y_val)
    loss.requires_grad = True
    loss.backward()
    opt_hp.step()
But the loss doesn't change during optimization, so the hyperparameters are probably not updated at all. Could that be related to the computation of dLoss/dx? Should I use an instance of falkon.Falkon instead of one of the falkon.hopt.objectives to define the model? (If I remember correctly, I had issues related to KeOps or CUDA with falkon.Falkon.)
many thanks,
Arthur
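One clue comes from PyTorch itself, which only allows changing `requires_grad` on leaf tensors. The fact that `loss.requires_grad = True` is accepted therefore suggests the loss has no autograd history: the predictions were likely computed outside the graph, and backward() stops at the loss without reaching the hyperparameters.

The plain NumPy sketch below shows what a validation objective needs: the validation loss must be a differentiable function of the hyperparameter. It is not Falkon's hopt API. It uses full (non-Nyström) kernel ridge regression with a Gaussian kernel exp(-‖x − y‖²/(2σ²)), and a hand-derived gradient with respect to log λ.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """exp(-||a - b||^2 / (2 sigma^2)) for all pairs of rows."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def val_loss_and_grad(log_lam, Xtr, ytr, Xval, yval, sigma):
    """Validation MSE of kernel ridge regression and its derivative w.r.t. log(lambda)."""
    n = Xtr.shape[0]
    lam = np.exp(log_lam)
    A = gaussian_kernel(Xtr, Xtr, sigma) + n * lam * np.eye(n)
    alpha = np.linalg.solve(A, ytr)  # the "fit": alpha depends on lambda
    Kval = gaussian_kernel(Xval, Xtr, sigma)
    resid = Kval @ alpha - yval
    loss = np.mean(resid ** 2)
    # d(alpha)/d(lambda) = -n A^{-1} alpha, times d(lambda)/d(log lambda) = lambda
    dalpha = -n * lam * np.linalg.solve(A, alpha)
    grad = 2 * np.mean(resid * (Kval @ dalpha))
    return loss, grad

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=80)
Xtr, ytr, Xval, yval = X[:60], y[:60], X[60:], y[60:]

log_lam = np.log(1e-1)
for step in range(50):
    _, g = val_loss_and_grad(log_lam, Xtr, ytr, Xval, yval, sigma=1.0)
    log_lam -= 0.5 * g  # plain gradient descent on the log-penalty
loss, _ = val_loss_and_grad(log_lam, Xtr, ytr, Xval, yval, sigma=1.0)
print(f"lambda={np.exp(log_lam):.2e}, val MSE={loss:.4f}")
```

In PyTorch terms, the prediction on X_val has to be computed from the trainable parameters with autograd enabled. Only then can backward() propagate the validation loss to them.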
KeOps only works with C-contiguous tensors. This is generally fine, since passing C-contiguous tensors to falkon (the common case) makes everything work.
However, falkon should also work with Fortran-contiguous (F) inputs, possibly by transposing them appropriately.
Currently there is no check for this, so if the KeOps path is chosen, falkon will crash with an error such as
RuntimeError: [Keops] Arg at position 0: is not contiguous. Please provide 'contiguous' dara array, as KeOps does not support strides. If you're getting this error in the 'backward' pass of a code using torch.sum() on the output of a KeOps routine, you should consider replacing 'a.sum()' with 'torch.dot(a.view(-1), torch.ones_like(a).view(-1))'.
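The C-order versus F-order distinction can be illustrated with NumPy, where it behaves as it does for PyTorch tensors (`tensor.is_contiguous()` / `tensor.contiguous()`). `as_c_contiguous` is a hypothetical helper showing the kind of check that could run before the KeOps path. It copies only when needed.

```python
import numpy as np

def as_c_contiguous(a: np.ndarray) -> np.ndarray:
    """Return `a` unchanged if it is already C-contiguous, otherwise a C-ordered copy."""
    return a if a.flags["C_CONTIGUOUS"] else np.ascontiguousarray(a)

X_f = np.asfortranarray(np.arange(12, dtype=np.float64).reshape(3, 4))
print(X_f.flags["C_CONTIGUOUS"], X_f.flags["F_CONTIGUOUS"])  # False True
X_c = as_c_contiguous(X_f)
print(X_c.flags["C_CONTIGUOUS"], np.array_equal(X_c, X_f))   # True True
# The transpose of an F-ordered matrix is C-ordered without any copy
print(X_f.T.flags["C_CONTIGUOUS"])                            # True
```

Transposing avoids the copy but changes the shape, so the kernel call would have to account for the swapped axes.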
Autogenerated documentation is incomplete. Missing:
Related to #1
Installation runs with use_cpu=True but fails on a GPU due to an error in pykeops. See getkeops/keops#257 for more details.
Is it possible to ship a stable version of pykeops with pip install git+...?
Hi @Giodiro
I spent some time implementing automated doc deployment with CircleCI in this PR on another repo, so if you want, we can look at it together for this one.
The config code is in the PR I linked. Apart from that, you need to add a deploy key to GitHub and CircleCI, as detailed in Add a Github Deploy key here.
Importing falkon gives the following error. Any suggestions on what is missing? I just followed the installation steps:
sparse_helpers.so: undefined symbol: _ZN5torch3jit6tracer9addOutputEPNS0_4NodeERKN2at6TensorE
Originally posted by @jaiabhayk in #1 (comment)
This issue collects items of documentation which are missing & should be written.
Error when trying to work with GPU (I do not encounter this on CPU):
Maybe something went wrong when refactoring?
File "falkon/la_helpers/cuda_trsm.py", line 7, in
from falkon.la_helpers.cuda_la_helpers import cuda_transpose
ModuleNotFoundError: No module named 'falkon.la_helpers.cuda_la_helpers'
Do you have any clue how I could circumvent this?
Both fit and predict will make data C-contiguous if decide_keops returns True, but decide_keops does not take data dimensionality into account.
So with high-dimensional data, where KeOps would not be used, an unnecessary copy may occur.
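The sketch below outlines the proposed fix in NumPy. It folds the dimensionality check into the decision and copies only when KeOps will actually be used. `use_keops`, `prepare_input` and the `max_dim` cutoff are all hypothetical. A real threshold would have to come from benchmarks.

```python
import numpy as np

def use_keops(X: np.ndarray, keops_available: bool, max_dim: int = 50) -> bool:
    """Hypothetical decision rule: use KeOps only for low-dimensional data."""
    return keops_available and X.shape[1] <= max_dim

def prepare_input(X: np.ndarray, keops_available: bool) -> np.ndarray:
    # Pay for a C-contiguous copy only when the KeOps path will actually run.
    if use_keops(X, keops_available) and not X.flags["C_CONTIGUOUS"]:
        return np.ascontiguousarray(X)
    return X
```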
Hi, [Sorry, I just realized this might be more relevant in the PR section; I can close this and resubmit it as a PR if you wish.]
I had to use Falkon, and it worked great. Thanks for all the work you put in!
Still, I ran into a couple of errors before managing to run it on GPU. Here is how I fixed them, in case you find it useful:
If Falkon is not compiled with CUDA (WITH_CUDA in your cmake file), the extensions are not built. Running Falkon then fails with a ModuleNotFoundError at from falkon.ooc_ops.cuda import parallel_potrf in falkon/ooc_ops/ooc_potrf.py. It's a bit hard to trace this issue back to the initial compilation. Catching the exception and providing an error message might help, e.g.:
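try:
    from falkon.ooc_ops.cuda import parallel_potrf
except ModuleNotFoundError as e:
    # Re-raise with a hint instead of only printing, so execution stops here
    raise ModuleNotFoundError(
        f"Got exception {e} when importing `cuda`. Did you compile with CUDA support?"
    ) from e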
I had trouble compiling your patched version of PyKeops. I had only two CUDA compilers available: versions 10.1 and 11.0.
CUDA 10.1 had the issue described here: https://forums.developer.nvidia.com/t/cuda-10-1-nvidia-youre-now-fixing-gcc-bugs-that-gcc-doesnt-even-have/71063. It is apparently a CUDA-specific issue; I did not try to solve it.
CUDA 11.0. Except I ran into this problem: getkeops/keops#122
c++ 17 flags to CUDA. Solution is to use:
Then things worked fine. All in all, I would suggest:
WITH_CUDA is false in the setup.py.
CUDA 11.0 by checking the CMake version and applying the commit.
Anyway, thanks for the package!
We should only import KeOps when it's needed. Otherwise, if KeOps is broken for some reason, it will attempt to recompile its self-test file every time import falkon is run.
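The lazy pattern can be sketched as follows: nothing touches pykeops at import time, and the KeOps code path calls a helper like this the first time it needs KeOps. `optional_module` is a hypothetical helper. `importlib.util.find_spec` checks whether a top-level package is installed without importing it.

```python
import importlib
import importlib.util

def optional_module(name: str):
    """Return the module if it is installed, else None; it is imported only when this is called."""
    if importlib.util.find_spec(name) is None:
        return None
    return importlib.import_module(name)

# Inside a hypothetical KeOps code path (not at module level):
#   pykeops = optional_module("pykeops")
#   if pykeops is None:
#       ...fall back to the non-KeOps implementation
```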
This could allow optimization of kernel parameters with autograd.
Steps:
Hi, I tried to install falkon in Colab.
The installation was successful but trying the KernelRidgeRegression demo I get an error on flk.fit().
[pyKeOps] Compiling libKeOpstorch0977e258bf in /root/.cache/pykeops-1.5-cpython-37:
formula: Sum_Reduction(Exp(SqDist(x1 / g, x2 / g) * IntInv(-2)) * v,0)
aliases: x1 = Vi(0,13); x2 = Vj(1,13); v = Vj(2,1); g = Pm(3,1);
dtype : float64
...
--------------------- MAKE DEBUG -----------------
Command '['cmake', '--build', '.', '--target', 'KeOps_formula', '--', 'VERBOSE=1']' returned non-zero exit status 2.
--------------------- ----------- -----------------
[pyKeOps] Compiling pybind11 template libKeOps_template_660bc304e8 in /root/.cache/pykeops-1.5-cpython-37 ...
---------------------------------------------------------------------------
FileExistsError Traceback (most recent call last)
<ipython-input-12-be46bad2abb7> in <module>()
----> 1 flk.fit(Xtrain, Ytrain)
/usr/local/lib/python3.7/dist-packages/falkon/models/falkon.py in fit(self, X, Y, Xts, Yts, warm_start)
261 beta = optim.solve(
262 X, ny_points, Y, self.penalty, initial_solution=warm_start,
--> 263 max_iter=self.maxiter, callback=validation_cback)
264
265 self.alpha_ = precond.apply(beta)
/usr/local/lib/python3.7/dist-packages/falkon/optim/conjgrad.py in solve(self, X, M, Y, _lambda, initial_solution, max_iter, callback)
306 B = incore_fmmv(Knm, y_over_n, None, transpose=True, opt=self.params)
307 else:
--> 308 B = self.kernel.mmv(M, X, y_over_n, opt=self.params)
309 B = self.preconditioner.apply_t(B)
310
/usr/local/lib/python3.7/dist-packages/falkon/kernels/kernel.py in mmv(self, X1, X2, v, out, opt)
267 params = dataclasses.replace(self.params, **dataclasses.asdict(opt))
268 mmv_impl = self._decide_mmv_impl(X1, X2, v, params)
--> 269 return mmv_impl(X1, X2, v, self, out, params)
270
271 def _decide_mmv_impl(self,
/usr/local/lib/python3.7/dist-packages/falkon/kernels/distance_kernel.py in _keops_mmv_impl(self, X1, X2, v, kernel, out, opt)
283 other_vars = [self.sigma.to(device=X1.device, dtype=X1.dtype)]
284
--> 285 return self.keops_mmv(X1, X2, v, out, formula, aliases, other_vars, opt)
286
287 def extra_mem(self) -> Dict[str, float]:
/usr/local/lib/python3.7/dist-packages/falkon/kernels/keops_helpers.py in keops_mmv(self, X1, X2, v, out, formula, aliases, other_vars, opt)
70 return run_keops_mmv(X1=X1, X2=X2, v=v, other_vars=other_vars,
71 out=out, formula=formula, aliases=aliases, axis=1,
---> 72 reduction='Sum', opt=opt)
73
74 def keops_dmmv_helper(self, X1, X2, v, w, kernel, out, differentiable, opt, mmv_fn):
/usr/local/lib/python3.7/dist-packages/falkon/mmv_ops/keops.py in run_keops_mmv(X1, X2, v, other_vars, out, formula, aliases, axis, reduction, opt)
226 if comp_dev_type == 'cpu' and all([ddev.type == 'cpu' for ddev in data_devs]): # incore CPU
227 variables = [X1, X2, v] + other_vars
--> 228 out = fn(*variables, out=out, backend=backend)
229 elif comp_dev_type == 'cuda' and all([ddev.type == 'cuda' for ddev in data_devs]): # incore CUDA
230 variables = [X1, X2, v] + other_vars
/usr/local/lib/python3.7/dist-packages/pykeops/torch/generic/generic_red.py in __call__(self, out, backend, device_id, ranges, *args)
576 ny,
577 out,
--> 578 *args
579 )
580 if self.dtype in ("float16", "half"):
/usr/local/lib/python3.7/dist-packages/pykeops/torch/generic/generic_red.py in forward(ctx, formula, aliases, backend, dtype, device_id, ranges, optional_flags, rec_multVar_highdim, nx, ny, out, *args)
45 optional_flags += ['-DMULT_VAR_HIGHDIM=1']
46 myconv = LoadKeOps(
---> 47 formula, aliases, dtype, "torch", optional_flags, include_dirs
48 ).import_module()
49
/usr/local/lib/python3.7/dist-packages/pykeops/common/keops_io.py in __init__(self, formula, aliases, dtype, lang, optional_flags, include_dirs)
46 pykeops.config.build_type == "Debug"
47 ):
---> 48 self._safe_compile()
49
50 @create_and_lock_build_folder()
/usr/local/lib/python3.7/dist-packages/pykeops/common/utils.py in wrapper_filelock(*args, **kwargs)
75 lock = FileLock(os.path.join(bf, "pykeops_build2.lock"))
76 with lock:
---> 77 func_res = func(*args, **kwargs)
78
79 # clean
/usr/local/lib/python3.7/dist-packages/pykeops/common/keops_io.py in _safe_compile(self)
61 self.optional_flags,
62 self.include_dirs,
---> 63 self.build_folder,
64 )
65
/usr/local/lib/python3.7/dist-packages/pykeops/common/compile_routines.py in compile_generic_routine(formula, aliases, dllname, dtype, lang, optional_flags, include_dirs, build_folder)
244
245 template_name, is_rebuilt = get_or_build_pybind11_template(
--> 246 dtype, lang, include_dirs, use_prebuilt_formula=True
247 )
248
/usr/local/lib/python3.7/dist-packages/pykeops/common/compile_routines.py in get_or_build_pybind11_template(dtype, lang, include_dirs, use_prebuilt_formula)
65 # print('(with dtype=',dtype,', lang=',lang,', include_dirs=',include_dirs,')', flush=True)
66
---> 67 os.mkdir(template_build_folder)
68
69 command_line += ["-Dtemplate_name=" + "'{}'".format(template_name)]
FileExistsError: [Errno 17] File exists: '/root/.cache/pykeops-1.5-cpython-37//build-pybind11_template-libKeOps_template_660bc304e8'
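The final error is raised by `os.mkdir` in pykeops's `get_or_build_pybind11_template` on a build folder that already exists. For example, the folder may have been left behind by the failed CMake build reported just before. The snippet below only illustrates that failure mode in plain Python; it is not a patch for pykeops. `mkdir_twice` is a hypothetical helper. Deleting the stale folder under /root/.cache/pykeops-1.5-cpython-37 before retrying is a possible manual workaround, not a verified fix, and the underlying CMake build error would still need to be addressed.

```python
import os
import tempfile

def mkdir_twice(use_exist_ok: bool) -> str:
    """Create the same directory twice and report whether the second call fails."""
    with tempfile.TemporaryDirectory() as root:
        build_dir = os.path.join(root, "build-template")
        os.mkdir(build_dir)
        try:
            if use_exist_ok:
                os.makedirs(build_dir, exist_ok=True)  # tolerates an existing directory
            else:
                os.mkdir(build_dir)  # same call as in the traceback
        except FileExistsError:
            return "FileExistsError"
        return "ok"
```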
This option has no effect anymore
Hi!
Thank you for writing this library. It's very cool to finally be able to scale kernel regression to a billion points!
Running into the following:
RuntimeError: [KeOps] This KeOps shared object has been compiled without cuda support:
Thank you for the help!
Best regards,
Robert
I have installed PyTorch 2.0.0 and CUDA 11.7, along with pip install falkon -f https://falkon.dibris.unige.it/torch-2.0.0_cu117.html. I can access the GPU and use it for other tasks, but when I try to run Falkon I get the following error:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
Cell In[43], line 4
1 options = falkon.FalkonOptions(keops_active="no")
3 kernel = falkon.kernels.GaussianKernel(sigma=1, opt=options)
----> 4 flk = falkon.Falkon(kernel=kernel, penalty=1e-5, M=5000, options=options)
File /mambaforge/envs/sobolev/lib/python3.9/site-packages/falkon/models/falkon.py:132, in Falkon.__init__(self, kernel, penalty, M, center_selection, maxiter, seed, error_fn, error_every, weight_fn, options)
130 self.maxiter = maxiter
131 self.weight_fn = weight_fn
--> 132 self._init_cuda()
133 self.beta_ = None
File /mambaforge/envs/sobolev/lib/python3.9/site-packages/falkon/models/model_utils.py:71, in FalkonBase._init_cuda(self)
69 if self.use_cuda_:
70 torch.cuda.init()
---> 71 self.num_gpus = devices.num_gpus(self.options)
File /mambaforge/envs/sobolev/lib/python3.9/site-packages/falkon/utils/devices.py:211, in num_gpus(opt)
209 global __COMP_DATA
210 if len(__COMP_DATA) == 0:
--> 211 get_device_info(opt)
212 return len([c for c in __COMP_DATA if c >= 0])
File /mambaforge/envs/sobolev/lib/python3.9/site-packages/falkon/utils/devices.py:199, in get_device_info(opt)
196 return __COMP_DATA
198 for g in range(0, tcd.device_count()):
--> 199 __COMP_DATA = _get_gpu_device_info(opt, g, __COMP_DATA)
201 if len(__COMP_DATA) == 0:
202 raise RuntimeError("No suitable device found. Enable option 'use_cpu' "
203 "if no GPU is available.")
File /mambaforge/envs/sobolev/lib/python3.9/site-packages/falkon/utils/devices.py:91, in _get_gpu_device_info(opt, g, data_dict)
82 # try:
83 # from ..cuda.cudart_gpu import cuda_meminfo
84 # except Exception as e:
(...)
88 # Some of the CUDA calls in here may change the current device,
89 # this ensures it gets reset at the end.
90 with tcd.device(g):
---> 91 mem_free, mem_total = mem_get_info(g)
92 mem_used = mem_total - mem_free
93 # noinspection PyUnresolvedReferences
File /mambaforge/envs/sobolev/lib/python3.9/site-packages/falkon/c_ext/__init__.py:15, in _make_lazy_cuda_func.<locals>.call_cuda(*args, **kwargs)
14 def call_cuda(*args, **kwargs):
---> 15 from ._backend import _assert_has_ext
16 _assert_has_ext()
17 return getattr(torch.ops.falkon, name)(*args, **kwargs)
File /mambaforge/envs/sobolev/lib/python3.9/site-packages/falkon/c_ext/_backend.py:76
73 if not _HAS_EXT:
74 # try to import the compiled module (via setup.py)
75 lib_path = _get_extension_path("_C")
---> 76 torch.ops.load_library(lib_path)
77 _HAS_EXT = True
79 # Check torch version vs. compilation version
80 # Copyright (c) 2020 Matthias Fey <[email protected]>
81 # https://github.com/rusty1s/pytorch_scatter/blob/master/torch_scatter/__init__.py
File /mambaforge/envs/sobolev/lib/python3.9/site-packages/torch/_ops.py:643, in _Ops.load_library(self, path)
638 path = _utils_internal.resolve_library_path(path)
639 with dl_open_guard():
640 # Import the shared library into the process, thus running its
641 # static (global) initialization code in order to register custom
642 # operators with the JIT.
--> 643 ctypes.CDLL(path)
644 self.loaded_libraries.add(path)
File /mambaforge/envs/sobolev/lib/python3.9/ctypes/__init__.py:374, in CDLL.__init__(self, name, mode, handle, use_errno, use_last_error, winmode)
371 self._FuncPtr = _FuncPtr
373 if handle is None:
--> 374 self._handle = _dlopen(self._name, mode)
375 else:
376 self._handle = handle
OSError: libcusolver.so.11: cannot open shared object file: No such file or directory
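The last line means the dynamic loader could not find libcusolver.so.11 (the cuSOLVER shared library, soname version 11) on its search path when loading Falkon's compiled extension. The sketch below shows one way to diagnose this kind of failure from Python. `missing_lib_from_oserror` and `can_load` are hypothetical helpers; Falkon's `_backend.py` appears to use a similar idea for libtorch_cuda_linalg. If `can_load("libcusolver.so.11")` returns False, the library's directory probably needs to be on LD_LIBRARY_PATH. This is a diagnostic aid, not a confirmed fix.

```python
import ctypes
import re

def missing_lib_from_oserror(msg: str):
    """Extract the library name from a dlopen 'cannot open shared object file' message."""
    m = re.search(r"([\w.+-]+\.so[\w.]*): cannot open shared object file", msg)
    return m.group(1) if m else None

def can_load(lib: str) -> bool:
    """True if the dynamic loader can find and open `lib` with the current search path."""
    try:
        ctypes.CDLL(lib)
        return True
    except OSError:
        return False
```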
Hi
On my system, I can use KeOps on the GPU without problems, and I have already installed CUDA 11.6. I have a 3090 Ti on Ubuntu 22.06.
When I install falkon as instructed, using the command "pip install git+https://github.com/falkonml/falkon.git", everything is fine, with no warnings or errors. But when I test it in a notebook with import falkon, I get the following error:
OSError Traceback (most recent call last)
/tmp/ipykernel_1096354/295832182.py in
6 plt.style.use('ggplot')
7
----> 8 import falkon
~/anaconda3/envs/repo/lib/python3.8/site-packages/falkon/__init__.py in
8 "c_ext", [os.path.dirname(__file__)])
9 if spec is not None:
---> 10 torch.ops.load_library(spec.origin)
11 else:
12 raise ImportError("Failed to find C-extension. Please recompile Falkon.")
~/anaconda3/envs/repo/lib/python3.8/site-packages/torch/_ops.py in load_library(self, path)
571 # static (global) initialization code in order to register custom
572 # operators with the JIT.
--> 573 ctypes.CDLL(path)
574 self.loaded_libraries.add(path)
575
~/anaconda3/envs/repo/lib/python3.8/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error, winmode)
371
372 if handle is None:
--> 373 self._handle = _dlopen(self._name, mode)
374 else:
375 self._handle = handle
OSError: /home/mc/anaconda3/envs/repo/lib/python3.8/site-packages/falkon/c_ext.so: undefined symbol: _ZN2at4cuda28getCurrentCUDASolverDnHandleEv
It looks like something is wrong with CUDA?
Hi,
Is there a simple way to export the alphas out of a fitted falkon.Falkon or falkon.hopt.objectives model?
thanks,
Arthur
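Once the coefficients and centers are exported, a prediction is a kernel-vector product: f(x) = Σ_j α_j k(x, c_j). The NumPy sketch below rests on several assumptions:
- the kernel is Gaussian, exp(-‖x − c‖²/(2σ²));
- the fitted model stores its coefficients in `alpha_` (the attribute set in `Falkon.fit`, per the traceback above);
- the centers are stored in `ny_points_` (an assumed name; check your version).

`predict_from_export` is a hypothetical helper.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """exp(-||a - b||^2 / (2 sigma^2)) for all pairs of rows."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def predict_from_export(X, centers, alpha, sigma):
    """f(X) = K(X, centers) @ alpha, using exported Nystrom centers and coefficients."""
    return gaussian_kernel(X, centers, sigma) @ alpha

# Hypothetical export after fitting (attribute names assumed, see above):
#   centers = flk.ny_points_.cpu().numpy(); alpha = flk.alpha_.cpu().numpy()
#   np.savez("model.npz", centers=centers, alpha=alpha)
```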
I was trying to run FALKON with M ~ 256k and 512k (number of centers), but the process gets killed. How can I efficiently apply FALKON in these large-M cases?
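A rough estimate suggests why the process is killed. The Falkon preconditioner works with dense M × M matrices: the `FalkonPreconditioner.init` frame in a later traceback allocates `C` with shape `(M, M)`. Memory therefore grows as M². The plain-Python helper below (hypothetical) counts a single float32 M × M matrix. The real footprint may be larger, for example with float64 or several buffers.

```python
def mm_matrix_gib(M: int, bytes_per_elem: int = 4) -> float:
    """GiB needed for one dense M x M matrix (float32 by default)."""
    return M * M * bytes_per_elem / 2**30

for M in (50_000, 256_000, 512_000):
    print(f"M={M:>7}: {mm_matrix_gib(M):8.1f} GiB per float32 M x M matrix")
```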
Hi,
I'm currently using the automatic hyperparameter optimization features, and would like to know whether the kernel bandwidths can be optimized on a log scale rather than a linear scale.
E.g., outside of the hopt features, I can pass log-scaled bandwidths to a kernel class in the following way:
sigma_exp = torch.randn(X_train.shape[1], dtype=torch.float32)
sigma_ten = torch.pow(torch.full((len(sigma_exp),), 10), sigma_exp).requires_grad_()
kernel = falkon.kernels.GaussianKernel(sigma=sigma_ten, opt=options)
This way, if I want to update the bandwidths, I can operate on the exponents. Can I do something similar when using a falkon.hopt.objectives model as the central object, along with a torch optimizer?
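Whatever object holds the parameters, the log-scale trick is a reparameterization. You optimize u and recompute σ = 10^u from u at every step. The gradient reaching u is then σ·ln(10)·∂L/∂σ, and σ stays positive.

In PyTorch this means u (not σ) must be the leaf tensor with `requires_grad=True` that is handed to the optimizer. In the snippet above, `sigma_ten` itself is the leaf, so gradients would update σ directly rather than the exponents.

The NumPy sketch below only checks this chain rule for a Gaussian kernel with per-dimension bandwidths. It does not use Falkon's hopt API. `kernel_entry` and `grad_wrt_log10_sigma` are hypothetical helpers.

```python
import numpy as np

def kernel_entry(x, y, sigma):
    """Gaussian kernel with per-dimension bandwidths: exp(-sum(((x - y) / sigma)^2) / 2)."""
    return np.exp(-0.5 * np.sum(((x - y) / sigma) ** 2))

def grad_wrt_log10_sigma(x, y, u):
    """Gradient of kernel_entry w.r.t. the exponents u, where sigma = 10**u."""
    sigma = 10.0 ** u
    k = kernel_entry(x, y, sigma)
    dk_dsigma = k * (x - y) ** 2 / sigma ** 3  # derivative of k w.r.t. each sigma_i
    return dk_dsigma * sigma * np.log(10.0)    # chain rule: d(sigma)/du = sigma * ln(10)
```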
Currently, blockwise splitting on the CPU is not adaptive to free memory and only follows the max_cpu_mem option.
It should instead use min(max_cpu_mem, actual free memory).
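One possible adaptive rule never exceeds either limit: it sizes blocks from the smaller of `max_cpu_mem` and the currently free memory. The sketch below is a pure function so it can be tested. `rows_per_block` and its `n_buffers` parameter are hypothetical. Free memory could be obtained, for example, from `psutil.virtual_memory().available`.

```python
def rows_per_block(d: int, max_cpu_mem: float, free_mem: float,
                   bytes_per_elem: int = 8, n_buffers: int = 2) -> int:
    """Number of rows of a d-column block that fit in the memory budget.

    The budget is the smaller of the user limit (max_cpu_mem) and the memory
    currently free. n_buffers is a hypothetical count of same-sized
    temporaries allocated per block.
    """
    budget = min(max_cpu_mem, free_mem)
    return max(1, int(budget // (d * bytes_per_elem * n_buffers)))

# Unlimited max_cpu_mem, 8 GiB free, float64 data with 1000 columns
print(rows_per_block(d=1000, max_cpu_mem=float("inf"), free_mem=8 * 2**30))
```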
I tried installing falkon from source on a MacBook Pro 14 (M2 Pro). It installs without any error, but at runtime I run into the following error (see call stack) when calling fit(). Is support for Mac planned?
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
/Users/ag2435/repos/falkon/notebooks/FalkonRegression.ipynb Cell 11 line 1
----> 1 model.fit(Xtr, Ytr)
File ~/anaconda3/envs/falkon/lib/python3.10/site-packages/falkon/models/falkon.py:229, in Falkon.fit(self, X, Y, Xts, Yts, warm_start)
227 if self.weight_fn is not None:
228 ny_weight_vec = self.weight_fn(Y[ny_indices], X[ny_indices], ny_indices)
--> 229 precond.init(ny_points, weight_vec=ny_weight_vec)
231 if _use_cuda_mmv:
232 # Cache must be emptied to ensure enough memory is visible to the optimizer
233 torch.cuda.empty_cache()
File ~/anaconda3/envs/falkon/lib/python3.10/site-packages/falkon/preconditioner/flk_preconditioner.py:101, in FalkonPreconditioner.init(self, X, weight_vec)
99 else: # If sparse tensor we need fortran for kernel calculation
100 C = create_fortran((M, M), dtype=dtype, device=dev, pin_memory=self._use_cuda)
--> 101 self.kernel(X, X, out=C, opt=self.params)
102 if not is_f_contig(C):
103 C = C.T
File ~/anaconda3/envs/falkon/lib/python3.10/site-packages/falkon/kernels/kernel.py:173, in Kernel.__call__(self, X1, X2, diag, out, opt)
171 params = dataclasses.replace(self.params, **dataclasses.asdict(opt))
172 mm_impl = self._decide_mm_impl(X1, X2, diag, params)
--> 173 return mm_impl(self, params, out, diag, X1, X2)
File ~/anaconda3/envs/falkon/lib/python3.10/site-packages/falkon/mmv_ops/fmm.py:554, in fmm(kernel, opt, out, diag, X1, X2)
551 import falkon.kernels
553 if isinstance(kernel, falkon.kernels.DiffKernel):
--> 554 return KernelMmFnFull.apply(kernel, opt, out, diag, X1, X2, *kernel.diff_params.values())
555 else:
556 return KernelMmFnFull.apply(kernel, opt, out, diag, X1, X2)
File ~/anaconda3/envs/falkon/lib/python3.10/site-packages/torch/autograd/function.py:506, in Function.apply(cls, *args, **kwargs)
503 if not torch._C._are_functorch_transforms_active():
504 # See NOTE: [functorch vjp and autograd interaction]
505 args = _functorch.utils.unwrap_dead_wrappers(args)
--> 506 return super().apply(*args, **kwargs) # type: ignore[misc]
508 if cls.setup_context == _SingleLevelFunction.setup_context:
509 raise RuntimeError(
510 'In order to use an autograd.Function with functorch transforms '
511 '(vmap, grad, jvp, jacrev, ...), it must override the setup_context '
512 'staticmethod. For more details, please see '
513 'https://pytorch.org/docs/master/notes/extending.func.html')
File ~/anaconda3/envs/falkon/lib/python3.10/site-packages/falkon/mmv_ops/fmm.py:480, in KernelMmFnFull.forward(ctx, kernel, opt, out, diag, X1, X2, *kernel_params)
478 out = KernelMmFnFull.run_diag(X1, X2, out, kernel, False, is_sparse)
479 elif comp_dev_type == "cpu" and data_dev.type == "cpu":
--> 480 out = KernelMmFnFull.run_cpu_cpu(X1, X2, out, kernel, comp_dtype, opt, False)
481 elif comp_dev_type == "cuda" and data_dev.type == "cuda":
482 out = KernelMmFnFull.run_gpu_gpu(X1, X2, out, kernel, comp_dtype, opt, False)
File ~/anaconda3/envs/falkon/lib/python3.10/site-packages/falkon/mmv_ops/fmm.py:354, in KernelMmFnFull.run_cpu_cpu(X1, X2, out, kernel, dtype, options, diff)
342 @staticmethod
343 def run_cpu_cpu(X1, X2, out, kernel, dtype, options, diff):
344 args = ArgsFmm(
345 X1=X1,
346 X2=X2,
(...)
352 differentiable=diff,
353 )
--> 354 out = _call_direct(mm_run_starter, (args, -1))
355 return out
File ~/anaconda3/envs/falkon/lib/python3.10/site-packages/falkon/mmv_ops/utils.py:86, in _call_direct(target, arg)
84 args_queue.put(arg[0])
85 new_args_tuple = (-1, args_queue, arg[1])
---> 86 return target(*new_args_tuple)
File ~/anaconda3/envs/falkon/lib/python3.10/site-packages/falkon/mmv_ops/fmm.py:131, in mm_run_starter(proc_idx, queue, device_id)
129 return sparse_mm_run_thread(X1, X2, out, kernel, n, m, computation_dtype, dev, tid=proc_idx)
130 else:
--> 131 return mm_run_thread(X1, X2, out, kernel, n, m, computation_dtype, dev, tid=proc_idx)
File ~/anaconda3/envs/falkon/lib/python3.10/site-packages/falkon/mmv_ops/fmm.py:291, in mm_run_thread(m1, m2, out, kernel, n, m, comp_dt, dev, tid)
288 c_dev_out.fill_(0.0)
290 # Compute kernel sub-matrix
--> 291 kernel.compute(c_dev_m1, c_dev_m2, c_dev_out, diag=False)
293 # Copy back to host
294 if has_gpu_bufs:
File ~/anaconda3/envs/falkon/lib/python3.10/site-packages/falkon/kernels/diff_kernel.py:91, in DiffKernel.compute(self, X1, X2, out, diag)
90 def compute(self, X1: torch.Tensor, X2: torch.Tensor, out: torch.Tensor, diag: bool):
---> 91 return self.core_fn(X1, X2, out, **self.diff_params, diag=diag, **self._other_params)
File ~/anaconda3/envs/falkon/lib/python3.10/site-packages/falkon/kernels/distance_kernel.py:163, in rbf_core(mat1, mat2, out, diag, sigma)
161 mat1_div_sig = mat1 / sigma
162 mat2_div_sig = mat2 / sigma
--> 163 norm_sq_mat1 = square_norm(mat1_div_sig, -1, True) # b*n*1 or n*1
164 norm_sq_mat2 = square_norm(mat2_div_sig, -1, True) # b*m*1 or m*1
166 out = _sq_dist(mat1_div_sig, mat2_div_sig, norm_sq_mat1, norm_sq_mat2, out)
File ~/anaconda3/envs/falkon/lib/python3.10/site-packages/falkon/la_helpers/wrapper.py:129, in square_norm(mat, dim, keepdim)
128 def square_norm(mat: torch.Tensor, dim: int, keepdim: Optional[bool] = None) -> torch.Tensor:
--> 129 return c_ext.square_norm(mat, dim, keepdim)
File ~/anaconda3/envs/falkon/lib/python3.10/site-packages/falkon/c_ext/__init__.py:15, in _make_lazy_cuda_func.<locals>.call_cuda(*args, **kwargs)
14 def call_cuda(*args, **kwargs):
---> 15 from ._backend import _assert_has_ext
17 _assert_has_ext()
18 return getattr(torch.ops.falkon, name)(*args, **kwargs)
File ~/anaconda3/envs/falkon/lib/python3.10/site-packages/falkon/c_ext/_backend.py:86
84 lib_path = _get_extension_path("_C")
85 try:
---> 86 torch.ops.load_library(lib_path)
87 except OSError as e:
88 # Hack: usually ld can't find torch_cuda_linalg.so which is in TORCH_LIB_PATH
89 # if we load it first, then load_library will work.
90 # TODO: This will only work on linux.
91 if (missing_lib := lib_from_oserror(e)).startswith("libtorch_cuda_linalg"):
File ~/anaconda3/envs/falkon/lib/python3.10/site-packages/torch/_ops.py:643, in _Ops.load_library(self, path)
638 path = _utils_internal.resolve_library_path(path)
639 with dl_open_guard():
640 # Import the shared library into the process, thus running its
641 # static (global) initialization code in order to register custom
642 # operators with the JIT.
--> 643 ctypes.CDLL(path)
644 self.loaded_libraries.add(path)
File ~/anaconda3/envs/falkon/lib/python3.10/ctypes/__init__.py:374, in CDLL.__init__(self, name, mode, handle, use_errno, use_last_error, winmode)
371 self._FuncPtr = _FuncPtr
373 if handle is None:
--> 374 self._handle = _dlopen(self._name, mode)
375 else:
376 self._handle = handle
OSError: dlopen(/Users/ag2435/anaconda3/envs/falkon/lib/python3.10/site-packages/falkon/c_ext/_C.so, 0x0006): symbol not found in flat namespace '__ZN2at6native14lapackCholeskyIdEEvciPT_iPi'
What's the Python version requirement for Falkon? Is it 3.6? I haven't seen it mentioned anywhere except in setup.py, where 3.6 appears to be the requirement. Am I right?
Thanks.
Hey,
After checking kernel.py, I think it should be possible to write a custom kernel and use it in Falkon, but I'm not sure under which conditions I'll get proper speedups with my custom kernel. Let me describe the kernel I have:
Suppose that, given a pair of data points (x1, x2), my kernel is deterministic. That is, I have a function that computes k(x1, x2) directly (not super fast and not trivial to compute, but deterministic). So I think in this case no training (.fit) is needed in Falkon, am I right? Moreover, I can't write the KeOps routine to compute my kernel. If I use DiffKernel as my parent class, I can't write _keops_mmv_impl, and I have to set KeOps to False.
In such a case, do I need to compute the full n^2 kernel matrix to compute the KRR predictions, or will my space and time complexity be O(n sqrt(n))?
P.S.: In my case, kernel computation is expensive, and I'd like to minimize the number of kernel computation calls.
Thanks for the great work!
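A small sketch can check two points.

First, even with a fixed, deterministic kernel, KRR still needs a fit. The kernel is given, but the coefficients α must still be solved for from the training labels.

Second, with M Nyström centers, the fit only needs kernel values between the n training points and the M centers, plus the M × M block among the centers. That is n·M + M² evaluations instead of n², which is O(n√n) for M ≈ √n.

The NumPy sketch below uses a direct Nyström KRR solve, (K_nmᵀK_nm + λnK_mm)α = K_nmᵀy, with a black-box pairwise kernel, and counts the calls. It is not Falkon's solver. Falkon's iterative solver appears to recompute kernel blocks at each conjugate-gradient iteration unless they are precomputed, so its evaluation count can be a multiple of n·M. `CountingKernel`, `nystrom_krr_fit` and `nystrom_predict` are hypothetical names.

```python
import numpy as np

class CountingKernel:
    """Wraps a scalar kernel k(x1, x2) and counts how many times it is evaluated."""
    def __init__(self, k):
        self.k, self.calls = k, 0

    def matrix(self, A, B):
        out = np.empty((len(A), len(B)))
        for i, a in enumerate(A):
            for j, b in enumerate(B):
                out[i, j] = self.k(a, b)
        self.calls += len(A) * len(B)
        return out

def nystrom_krr_fit(X, y, kernel, M, lam, rng):
    """Nystrom KRR: solve (Knm^T Knm + lam * n * Kmm) alpha = Knm^T y."""
    n = len(X)
    centers = X[rng.choice(n, size=M, replace=False)]
    Knm = kernel.matrix(X, centers)        # n * M evaluations
    Kmm = kernel.matrix(centers, centers)  # M * M evaluations
    alpha = np.linalg.solve(Knm.T @ Knm + lam * n * Kmm, Knm.T @ y)
    return centers, alpha

def nystrom_predict(Xnew, centers, alpha, kernel):
    return kernel.matrix(Xnew, centers) @ alpha  # len(Xnew) * M evaluations

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0])
kern = CountingKernel(lambda a, b: np.exp(-np.sum((a - b) ** 2) / 2))
M = int(np.sqrt(len(X)))  # M = sqrt(n)
centers, alpha = nystrom_krr_fit(X, y, kern, M, lam=1e-3, rng=rng)
print(kern.calls, "kernel evaluations vs", len(X) ** 2, "for the full matrix")
```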
With this combination, the wheel expects torch_cuda_cu.so and torch_cuda_cpp.so (i.e. the PyTorch used for building had the split-libraries option enabled). But when torch is installed with conda, only torch_cuda.so is available (PyTorch built without the split-libraries option).
Hello, I wanted to reinstall Falkon on a Linux server (because it could not find CUDA). I deleted the environment and followed all the installation steps again.
Unfortunately, now I keep having the following error:
"ImportError: Failed to find C-extension. Please recompile Falkon."
Thanks a lot,
Clement
It is extremely slow. Example:
#!/usr/bin/env python3
from falkon import Falkon
from falkon.kernels import GaussianKernel
from falkon.options import FalkonOptions
import numpy as np
import time
import torch
def build_dataset():
    X = torch.rand(10000, 28)
    f = lambda x: torch.sin(x)
    Y = f(X)
    return X, Y
def single_sigma():
    sigma = 2.8
    lam = 1e-5
    ITERS = 10
    SEED = 4242
    config = {
        'kernel': GaussianKernel(sigma=sigma),
        'penalty': lam,
        'M': 200,
        'maxiter': ITERS,
        'seed': SEED,
        'options': FalkonOptions()
    }
    return Falkon(**config)
def multi_sigma():
    sigma = torch.tensor([2.8 for _ in range(28)])
    lam = 1e-5
    ITERS = 10
    SEED = 4242
    config = {
        'kernel': GaussianKernel(sigma=sigma),
        'penalty': lam,
        'M': 200,
        'maxiter': ITERS,
        'seed': SEED,
        'options': FalkonOptions()
    }
    return Falkon(**config)
def multi_sigma_matrix():
    sigma = 2.8 * torch.eye(28, 28)
    lam = 1e-5
    ITERS = 10
    SEED = 4242
    config = {
        'kernel': GaussianKernel(sigma=sigma),
        'penalty': lam,
        'M': 200,
        'maxiter': ITERS,
        'seed': SEED,
        'options': FalkonOptions()
    }
    return Falkon(**config)
def test_fit(X, Y, flk):
    st = time.time()
    flk.fit(X, Y)
    end = time.time()
    return end - st
X, Y = build_dataset()
print("[->] Single sigma => dataset fitted in {} seconds".format(test_fit(X, Y, single_sigma())))
print("[->] Multi sigma => dataset fitted in {} seconds".format(test_fit(X, Y, multi_sigma())))
print("[->] Multi sigma (using a matrix with sigmas in the diagonal) => dataset fitted in {} seconds".format(test_fit(X, Y, multi_sigma_matrix())))
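For the vector case, a Gaussian kernel with per-dimension length-scales σ_i gives exactly the same values as a scalar Gaussian kernel with σ = 1 applied to inputs pre-divided by σ, since ‖(x − y)/σ‖ = ‖x/σ − y/σ‖. The `rbf_core` frame in a traceback above (`mat1 / sigma`) suggests Falkon computes the vector case this way. A workaround worth benchmarking is therefore to rescale the data once and use a scalar sigma. The NumPy check below assumes the kernel exp(-‖(x − y)/σ‖²/2). It does not cover the full-matrix sigma case, and it makes no claim about which path Falkon runs faster.

```python
import numpy as np

def gauss(A, B, sigma):
    """exp(-0.5 * ||(a - b) / sigma||^2); sigma may be a scalar or a per-dimension vector."""
    diff = (A[:, None, :] - B[None, :, :]) / sigma
    return np.exp(-0.5 * (diff ** 2).sum(-1))

rng = np.random.default_rng(0)
X = rng.random((6, 28))
sigma_vec = rng.uniform(1.0, 4.0, size=28)

K_vec = gauss(X, X, sigma_vec)                       # per-dimension bandwidths
K_scaled = gauss(X / sigma_vec, X / sigma_vec, 1.0)  # rescale inputs once, scalar sigma = 1
print(np.allclose(K_vec, K_scaled))  # True
```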
Hi again!
Thanks again for the help last time.
This time, I'd like to replace the falkon.mmv_ops in the InCoreFalkon solver with a homemade mmv_ops for a research project.
What would be the cleanest and simplest way to do this?
Thank you!
Best regards,
Robert