rapidsai / raft

Home Page: https://docs.rapids.ai/api/raft/stable/

License: Apache License 2.0

Shell 0.46% Python 3.25% CMake 0.99% C++ 15.76% Cuda 64.32% Cython 2.87% C 0.21% HTML 0.02% Perl 0.01% Jupyter Notebook 12.10% Dockerfile 0.01%
anns building-blocks clustering cuda distance gpu information-retrieval linear-algebra llm machine-learning

raft's Introduction

 RAFT: Reusable Accelerated Functions and Tools for Vector Search and More

Important

The vector search and clustering algorithms in RAFT are being migrated to a new library dedicated to vector search called cuVS. We will continue to support the vector search algorithms in RAFT during this move, but will no longer update them after the RAPIDS 24.06 (June) release. We plan to complete the migration by the RAPIDS 24.08 (August) release.

RAFT tech stack

Contents


  1. Useful Resources
  2. What is RAFT?
  3. Use cases
  4. Is RAFT right for me?
  5. Getting Started
  6. Installing RAFT
  7. Codebase structure and contents
  8. Contributing
  9. References

Useful Resources

What is RAFT?

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.

By taking a primitives-based approach to algorithm development, RAFT

  • accelerates algorithm construction time
  • reduces the maintenance burden by maximizing reuse across projects, and
  • centralizes core reusable computations, allowing future optimizations to benefit all algorithms that use them.

While not exhaustive, the following general categories help summarize the accelerated functions in RAFT:

  • Nearest Neighbors: vector search, neighborhood graph construction, epsilon neighborhoods, pairwise distances
  • Basic Clustering: spectral clustering, hierarchical clustering, k-means
  • Solvers: combinatorial optimization, iterative solvers
  • Data Formats: sparse & dense, conversions, data generation
  • Dense Operations: linear algebra, matrix and vector operations, reductions, slicing, norms, factorization, least squares, SVD & eigenvalue problems
  • Sparse Operations: linear algebra, eigenvalue problems, slicing, norms, reductions, factorization, symmetrization, components & labeling
  • Statistics: sampling, moments and summary statistics, metrics, model evaluation
  • Tools & Utilities: common tools and utilities for developing CUDA applications, multi-node multi-GPU infrastructure

RAFT is a C++ header-only template library with an optional shared library that

  1. can speed up compile times for common template types, and
  2. provides host-accessible "runtime" APIs, which don't require a CUDA compiler to use

In addition to being a C++ library, RAFT also provides 2 Python libraries:

  • pylibraft - lightweight Python wrappers around RAFT's host-accessible "runtime" APIs.
  • raft-dask - multi-node multi-GPU communicator infrastructure for building distributed algorithms on the GPU with Dask.

RAFT is a C++ header-only template library with optional shared library and lightweight Python wrappers

Use cases

Vector Similarity Search

RAFT contains state-of-the-art implementations of approximate nearest neighbors search (ANNS) algorithms on the GPU, such as:

  • Brute force. Performs a brute force nearest neighbors search without an index.
  • IVF-Flat and IVF-PQ. Use an inverted file index structure to map contents to their locations. IVF-PQ additionally uses product quantization to reduce the memory usage of vectors. These methods were originally popularized by the FAISS library.
  • CAGRA (Cuda Anns GRAph-based). Uses a fast ANNS graph construction and search implementation optimized for the GPU. CAGRA outperforms state-of-the-art CPU methods (e.g. HNSW) for large batch queries, single queries, and graph construction time.

Projects that use the RAFT ANNS algorithms for accelerating vector search include: Milvus, Redis, and Faiss.

Please see the example Jupyter notebook to get started with RAFT for vector search in Python.

Information Retrieval

RAFT contains a catalog of reusable primitives for composing algorithms that require fast neighborhood computations, such as

  1. Computing distances between vectors and computing kernel Gram matrices
  2. Performing ball radius queries for constructing epsilon neighborhoods
  3. Clustering points to partition a space for smaller and faster searches
  4. Constructing neighborhood "connectivities" graphs from dense vectors

Machine Learning

RAFT's primitives are used in several RAPIDS libraries, including cuML, cuGraph, and cuOpt, to build many end-to-end machine learning algorithms that span a large spectrum of different applications, including

  • data generation
  • model evaluation
  • classification and regression
  • clustering
  • manifold learning
  • dimensionality reduction.

RAFT is also used by the popular collaborative filtering library implicit for recommender systems.

Is RAFT right for me?

RAFT contains low-level primitives for accelerating applications and workflows. Data source providers and application developers may find specific tools -- like ANN algorithms -- very useful. RAFT is not intended to be used directly by data scientists for discovery and experimentation. For data science tools, please see the RAPIDS website.

Getting started

RAPIDS Memory Manager (RMM)

RAFT relies heavily on RMM, which eases the burden of configuring different allocation strategies globally across the libraries that use it.

Multi-dimensional Arrays

The APIs in RAFT accept the mdspan multi-dimensional array view for representing data in higher dimensions, similar to the ndarray in the NumPy Python library. RAFT also contains the corresponding owning mdarray structure, which simplifies the allocation and management of multi-dimensional data in both host and device (GPU) memory.

The mdarray forms a convenience layer over RMM and can be constructed in RAFT using a number of different helper functions:

#include <raft/core/device_resources.hpp>
#include <raft/core/device_mdarray.hpp>

raft::device_resources handle;

int n_rows = 10;
int n_cols = 10;

auto scalar = raft::make_device_scalar<float>(handle, 1.0);
auto vector = raft::make_device_vector<float>(handle, n_cols);
auto matrix = raft::make_device_matrix<float>(handle, n_rows, n_cols);

C++ Example

Most of the primitives in RAFT accept a raft::device_resources object for the management of resources which are expensive to create, such as CUDA streams, stream pools, and handles to other CUDA libraries like cuBLAS and cuSOLVER.

The example below demonstrates creating a RAFT handle and using it with device_matrix and device_vector to allocate memory, generate random clusters, and compute pairwise Euclidean distances:

#include <raft/core/device_resources.hpp>
#include <raft/core/device_mdarray.hpp>
#include <raft/random/make_blobs.cuh>
#include <raft/distance/distance.cuh>

raft::device_resources handle;

int n_samples = 5000;
int n_features = 50;

auto input = raft::make_device_matrix<float, int>(handle, n_samples, n_features);
auto labels = raft::make_device_vector<int, int>(handle, n_samples);
auto output = raft::make_device_matrix<float, int>(handle, n_samples, n_samples);

raft::random::make_blobs(handle, input.view(), labels.view());

auto metric = raft::distance::DistanceType::L2SqrtExpanded;
raft::distance::pairwise_distance(handle, input.view(), input.view(), output.view(), metric);

It's also possible to create raft::device_mdspan views to invoke the same API with raw pointers and shape information:

#include <raft/core/device_resources.hpp>
#include <raft/core/device_mdspan.hpp>
#include <raft/random/make_blobs.cuh>
#include <raft/distance/distance.cuh>

raft::device_resources handle;

int n_samples = 5000;
int n_features = 50;

float *input;
int *labels;
float *output;

...
// Allocate input, labels, and output pointers
...

auto input_view = raft::make_device_matrix_view(input, n_samples, n_features);
auto labels_view = raft::make_device_vector_view(labels, n_samples);
auto output_view = raft::make_device_matrix_view(output, n_samples, n_samples);

raft::random::make_blobs(handle, input_view, labels_view);

auto metric = raft::distance::DistanceType::L2SqrtExpanded;
raft::distance::pairwise_distance(handle, input_view, input_view, output_view, metric);

Python Example

The pylibraft package contains a Python API for RAFT algorithms and primitives. pylibraft integrates nicely into other libraries by being very lightweight with minimal dependencies and accepting any object that supports the __cuda_array_interface__, such as CuPy's ndarray. The number of RAFT algorithms exposed in this package is continuing to grow from release to release.

The example below demonstrates computing the pairwise Euclidean distances between CuPy arrays. Note that CuPy is not a required dependency for pylibraft.

import cupy as cp

from pylibraft.distance import pairwise_distance

n_samples = 5000
n_features = 50

in1 = cp.random.random_sample((n_samples, n_features), dtype=cp.float32)
in2 = cp.random.random_sample((n_samples, n_features), dtype=cp.float32)

output = pairwise_distance(in1, in2, metric="euclidean")

The output array in the above example is of type raft.common.device_ndarray, which supports __cuda_array_interface__, making it interoperable with other libraries like CuPy, Numba, PyTorch, and RAPIDS cuDF that also support it. CuPy supports DLPack, which also enables zero-copy conversion from raft.common.device_ndarray to JAX and TensorFlow.

Below is an example of converting the output pylibraft.device_ndarray to a CuPy array:

cupy_array = cp.asarray(output)

And converting to a PyTorch tensor:

import torch

torch_tensor = torch.as_tensor(output, device='cuda')

Or converting to a RAPIDS cuDF dataframe:

import cudf

cudf_dataframe = cudf.DataFrame(output)

When the corresponding library is installed and available in your environment, this conversion can also be done automatically by all RAFT compute APIs by setting a global configuration option:

import pylibraft.config
pylibraft.config.set_output_as("cupy")  # All compute APIs will return cupy arrays
pylibraft.config.set_output_as("torch") # All compute APIs will return torch tensors

You can also specify a callable that accepts a pylibraft.common.device_ndarray and performs a custom conversion. The following example converts all output to numpy arrays:

pylibraft.config.set_output_as(lambda device_ndarray: device_ndarray.copy_to_host())

pylibraft also supports writing to a pre-allocated output array so any __cuda_array_interface__ supported array can be written to in-place:

import cupy as cp

from pylibraft.distance import pairwise_distance

n_samples = 5000
n_features = 50

in1 = cp.random.random_sample((n_samples, n_features), dtype=cp.float32)
in2 = cp.random.random_sample((n_samples, n_features), dtype=cp.float32)
output = cp.empty((n_samples, n_samples), dtype=cp.float32)

pairwise_distance(in1, in2, out=output, metric="euclidean")

Installing

RAFT's C++ and Python libraries can both be installed through conda, and the Python libraries can also be installed through pip.

Installing C++ and Python through Conda

The easiest way to install RAFT is through conda, and several packages are provided.

  • libraft-headers C++ headers
  • libraft (optional) C++ shared library containing pre-compiled template instantiations and runtime API.
  • pylibraft (optional) Python library
  • raft-dask (optional) Python library for deployment of multi-node multi-GPU algorithms that use the RAFT raft::comms abstraction layer in Dask clusters.
  • raft-ann-bench (optional) Benchmarking tool for easily producing benchmarks that compare RAFT's vector search algorithms against other state-of-the-art implementations.
  • raft-ann-bench-cpu (optional) Reproducible benchmarking tool similar to above, but doesn't require CUDA to be installed on the machine. Can be used to test in environments with competitive CPUs.

Use one of the following commands, depending on your CUDA version, to install all of the RAFT packages with conda (replace rapidsai with rapidsai-nightly to install more up-to-date but less stable nightly packages). mamba is preferred over the conda command.

# for CUDA 11.8
mamba install -c rapidsai -c conda-forge -c nvidia raft-dask pylibraft cuda-version=11.8
# for CUDA 12.0
mamba install -c rapidsai -c conda-forge -c nvidia raft-dask pylibraft cuda-version=12.0

Note that the above commands will also install libraft-headers and libraft.

You can also install the conda packages individually using the mamba command above. For example, if you'd like to install RAFT's headers and pre-compiled shared library to use in your project:

# for CUDA 12.0
mamba install -c rapidsai -c conda-forge -c nvidia libraft libraft-headers cuda-version=12.0

If installing the C++ APIs, please see using libraft for more information on using the pre-compiled shared library. You can also refer to the example C++ template project for a ready-to-go CMake configuration that you can drop into your project and build against the installed RAFT development artifacts above.

Installing Python through Pip

pylibraft and raft-dask both have experimental packages that can be installed through pip:

pip install pylibraft-cu11 --extra-index-url=https://pypi.nvidia.com
pip install raft-dask-cu11 --extra-index-url=https://pypi.nvidia.com

These packages statically build RAFT's pre-compiled instantiations, so the C++ headers and pre-compiled shared library won't be readily available to use in your code.

The build instructions contain more details on building RAFT from source and including it in downstream projects. You can also find a more comprehensive CPM code snippet in the Building RAFT C++ and Python from source section of the build instructions.

You can find an example RAFT project template in the cpp/template directory, which demonstrates how to build a new application with RAFT or incorporate RAFT into an existing CMake project.

Contributing

If you are interested in contributing to the RAFT project, please read our Contributing guidelines. Refer to the Developer Guide for details on the developer guidelines, workflows, and principles.

References

When citing RAFT generally, please consider referencing this Github project.

@misc{rapidsai,
  title={Rapidsai/raft: RAFT contains fundamental widely-used algorithms and primitives for data science, Graph and machine learning.},
  url={https://github.com/rapidsai/raft},
  journal={GitHub},
  publisher={Nvidia RAPIDS},
  author={Rapidsai},
  year={2022}
}

If citing the sparse pairwise distances API, please consider using the following bibtex:

@article{nolet2021semiring,
  title={Semiring primitives for sparse neighborhood methods on the gpu},
  author={Nolet, Corey J and Gala, Divye and Raff, Edward and Eaton, Joe and Rees, Brad and Zedlewski, John and Oates, Tim},
  journal={arXiv preprint arXiv:2104.06357},
  year={2021}
}

If citing the single-linkage agglomerative clustering APIs, please consider the following bibtex:

@misc{nolet2023cuslink,
      title={cuSLINK: Single-linkage Agglomerative Clustering on the GPU},
      author={Corey J. Nolet and Divye Gala and Alex Fender and Mahesh Doijade and Joe Eaton and Edward Raff and John Zedlewski and Brad Rees and Tim Oates},
      year={2023},
      eprint={2306.16354},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

If citing CAGRA, please consider the following bibtex:

@misc{ootomo2023cagra,
      title={CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs},
      author={Hiroyuki Ootomo and Akira Naruse and Corey Nolet and Ray Wang and Tamas Feher and Yong Wang},
      year={2023},
      eprint={2308.15136},
      archivePrefix={arXiv},
      primaryClass={cs.DS}
}

If citing the k-selection routines, please consider the following bibtex:

@proceedings{10.1145/3581784,
    title = {SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
    year = {2023},
    isbn = {9798400701092},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    abstract = {Started in 1988, the SC Conference has become the annual nexus for researchers and practitioners from academia, industry and government to share information and foster collaborations to advance the state of the art in High Performance Computing (HPC), Networking, Storage, and Analysis.},
    location = {Denver, CO, USA}
}

If citing the nearest neighbors descent API, please consider the following bibtex:

@inproceedings{10.1145/3459637.3482344,
    author = {Wang, Hui and Zhao, Wan-Lei and Zeng, Xiangxiang and Yang, Jianye},
    title = {Fast K-NN Graph Construction by GPU Based NN-Descent},
    year = {2021},
    isbn = {9781450384469},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3459637.3482344},
    doi = {10.1145/3459637.3482344},
    abstract = {NN-Descent is a classic k-NN graph construction approach. It is still widely employed in machine learning, computer vision, and information retrieval tasks due to its efficiency and genericness. However, the current design only works well on CPU. In this paper, NN-Descent has been redesigned to adapt to the GPU architecture. A new graph update strategy called selective update is proposed. It reduces the data exchange between GPU cores and GPU global memory significantly, which is the processing bottleneck under GPU computation architecture. This redesign leads to full exploitation of the parallelism of the GPU hardware. In the meantime, the genericness, as well as the simplicity of NN-Descent, are well-preserved. Moreover, a procedure that allows to k-NN graph to be merged efficiently on GPU is proposed. It makes the construction of high-quality k-NN graphs for out-of-GPU-memory datasets tractable. Our approach is 100-250\texttimes{} faster than the single-thread NN-Descent and is 2.5-5\texttimes{} faster than the existing GPU-based approaches as we tested on million as well as billion scale datasets.},
    booktitle = {Proceedings of the 30th ACM International Conference on Information \& Knowledge Management},
    pages = {1929–1938},
    numpages = {10},
    keywords = {high-dimensional, nn-descent, gpu, k-nearest neighbor graph},
    location = {Virtual Event, Queensland, Australia},
    series = {CIKM '21}
}

raft's People

Contributors

achirkin, afender, ahendriksen, ajschmidt8, aschaffer, ayodeawe, bdice, benfred, cjnolet, dantegd, divyegala, enp1s0, galipremsagar, gputester, hlinsen, jjacobelli, lowener, mdoijade, mfoerste4, nyrio, raydouglass, robertmaynard, seunghwak, teju85, tfeher, trivialfis, trxcllnt, viclafargue, vyasr, wphicks

raft's Issues

[BUG] Cleanup C++ warnings introduced by RAFT headers

Environment: Ubuntu 16.04 and 18.04, CUDA 10.0 to 10.2, Python 3.6 and 3.7

In file included from /opt/conda/envs/rapids/conda-bld/libcugraph_1591999335369/work/cpp/build/raft/src/raft/cpp/include/raft/handle.hpp:34:0,

                 from /opt/conda/envs/rapids/conda-bld/libcugraph_1591999335369/work/cpp/include/graph.hpp:21,

                 from /opt/conda/envs/rapids/conda-bld/libcugraph_1591999335369/work/cpp/include/algorithms.hpp:18,

                 from /opt/conda/envs/rapids/conda-bld/libcugraph_1591999335369/work/cpp/tests/centrality/katz_centrality_test.cu:2:

/opt/conda/envs/rapids/conda-bld/libcugraph_1591999335369/work/cpp/build/raft/src/raft/cpp/include/raft/linalg/cublas_wrappers.h:60:1: warning: multi-line comment [-Wcomment]

 // #define CUBLAS_CHECK_NO_THROW(call)                                          \

 ^

/opt/conda/envs/rapids/conda-bld/libcugraph_1591999335369/work/cpp/build/raft/src/raft/cpp/include/raft/linalg/cusolver_wrappers.h:60:1: warning: multi-line comment [-Wcomment]

 // #define CUSOLVER_CHECK_NO_THROW(call)                                          \

 ^

/opt/conda/envs/rapids/conda-bld/libcugraph_1591999335369/work/cpp/build/raft/src/raft/cpp/include/raft/sparse/cusparse_wrappers.h:61:1: warning: multi-line comment [-Wcomment]

 // #define CUSPARSE_CHECK_NO_THROW(call)                                          \

 ^

[BUG] raft::linalg::reduce won't compile when InType != OutType

Describe the bug
I am reducing a boolean matrix to an integer vector (cf code sample in the section below).
I only need coalescedReduction, but the reduce function instantiates stridedReduction too, and this is where the problem happens, at line 150 of strided_reduction.cuh: the if statement is not evaluated at compile time (it's not a C++17 if constexpr), so the compiler attempts to instantiate stridedSummationKernel even when the input and output types differ, in which case it can't compile.

Note that for the moment I solved the problem for my use case by calling coalescedReduction directly instead of reduce.

Steps/Code to reproduce bug
This is approximately what my code looks like:

bool* in;
int* out;
raft::linalg::reduce(
  out, in, N, N, 0, true, true, stream, false,
  [] __device__(bool value, int idx) {
    return static_cast<int>(value);
  },
  raft::Sum<int>(), raft::Nop<int>());

Compilation error:

include/raft/linalg/strided_reduction.cuh(152): error: no instance of function template "raft::linalg::stridedSummationKernel" matches the argument list
argument types are: (int *, const __nv_bool *, int, int, int, lambda [](__nv_bool, int)->int)
detected during: instantiation of "void raft::linalg::stridedReduction(OutType *, const InType *, IdxType, IdxType, OutType, cudaStream_t, __nv_bool, MainLambda, ReduceLambda, FinalLambda) [with InType=__nv_bool, OutType=int, IdxType=int, MainLambda=lambda [](__nv_bool, int)->int, ReduceLambda=raft::Sum<int>, FinalLambda=raft::Nop<int, int>]"

Expected behavior
The reduction should support different input and output types as suggested by the template parameters and the docs.
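A self-contained sketch of the if constexpr pattern the fix calls for (the function names below are illustrative stand-ins, not RAFT's actual internals):

#include <iostream>
#include <type_traits>

// Fast path that only compiles when input and output types match,
// standing in for stridedSummationKernel (illustrative).
template <typename T>
void same_type_sum(T* out, const T* in) { *out = static_cast<T>(*in + *in); }

// Generic path, standing in for the general strided reduction.
template <typename InT, typename OutT, typename Op>
void generic_reduce(OutT* out, const InT* in, Op op) { *out = op(*in); }

// With a plain `if`, both branches are instantiated and the bool -> int case
// fails to compile; `if constexpr` discards the untaken branch at compile time.
template <typename InT, typename OutT, typename Op>
void reduce(OutT* out, const InT* in, Op op) {
  if constexpr (std::is_same_v<InT, OutT>) {
    same_type_sum(out, in);
  } else {
    generic_reduce(out, in, op);
  }
}

int main() {
  bool in = true;
  int out = 0;
  reduce(&out, &in, [](bool v) { return static_cast<int>(v); });
  std::cout << out << std::endl;  // prints 1
}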

[BUG] handle_t destructor throws exceptions

Describe the bug

The raft::handle_t destructor calls destroy_resources(), which can throw a variety of exceptions. It's hard/impossible to catch those if stack unwinding is in progress. I see a TODO to use NO_THROW and enable logging, but personally I don't like to see log messages about failed dtors due to a previous error.

void destroy_resources() {
///@todo: enable *_NO_THROW variants once we have enabled logging

Is it necessary to check the destroy calls at all?

Example code that terminates
inline void transpose(raft::handle_t& handle, float* out, float* in, size_t n_rows, size_t n_cols) {
  const float alpha = 1.0f;
  const float beta = 0.0f;
  const int lda = n_rows;
  const int ldb = n_cols;
  const int ldc = lda;
  DeviceBuffer<float> garbage(1);
  auto tStatus = raft::linalg::cublasgeam(handle.get_cublas_handle(),
                              CUBLAS_OP_T,
                              CUBLAS_OP_N,
                              n_cols, n_rows,
                              &alpha, in, lda,
                              &beta, garbage.Data(), ldb,
                              out, ldc,
                              handle.get_stream());
  if (tStatus != CUBLAS_STATUS_SUCCESS)
    throw std::runtime_error(raft::linalg::detail::cublas_error_to_string(tStatus));
}
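For context, a minimal self-contained example of why this terminates: when a destructor throws while another exception is already unwinding the stack, the C++ runtime calls std::terminate() before any handler runs (illustrative, not RAFT code):

#include <stdexcept>

// A resource whose destructor throws, standing in for destroy_resources().
struct throwing_resource {
  ~throwing_resource() noexcept(false) { throw std::runtime_error("destroy failed"); }
};

int main() {
  try {
    throwing_resource r;
    throw std::runtime_error("original error");  // begins stack unwinding...
    // ...r's destructor then throws a second exception -> std::terminate()
  } catch (const std::exception&) {
    return 1;  // never reached; the program aborts before the handler runs
  }
}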

[DOC] Build Instructions for cpp incorrect

Report incorrect documentation

The docs for building point to a private repo, not to this one. When I changed the repo it points to, the provided CMake only brought over raft.hpp. I ended up giving up on having it use ExternalProject_Add, since it did not bring over the files it needed for some reason, and just ended up pointing directly to a manually cloned version of the repository using the RAFT_DIR environment variable.

Location of incorrect documentation
https://github.com/rapidsai/raft/blob/branch-0.15/BUILD.md

Describe the problems or issues found in the documentation
The private repo link.
The provided CMake instructions did not leave my include directory in a state that could be used.
If I am not mistaken, RAFT_INCLUDE_DIR is not set in the case that RAFT_PATH was set.

Steps taken to verify documentation is incorrect
Tried it out

Suggested fix for documentation
Put the right link in.
Make sure the CMake configuration works whether or not RAFT_PATH is set.

[FEA] Move k-means from cuml to RAFT

cuML exposes a k-means feature, and another k-means implementation has been introduced as part of moving spectral clustering to RAFT.
We should avoid duplicating implementations of the same feature.

We could move cuML's kmeans backend to RAFT and use it in spectral clustering (and potentially in cuGraph as well later on).

[FEA] Need to integrate device property queries in handle.hpp and cudart_utils.cuh

Is your feature request related to a problem? Please describe.
RAFT's handle.hpp stores the entire set of device properties at the beginning to make future queries low cost, but cudart_utils.h does not exploit this and individually queries device properties again.

const cudaDeviceProp& get_device_properties() const {

inline int get_shared_memory_per_block() {

inline int get_multi_processor_count() {

Describe the solution you'd like
Better integrate the two: either handle_t provides device property query functions, or cudart_utils.cuh internally calls handle functions.
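A minimal sketch of the cached-properties approach, assuming the handle queries cudaDeviceProp once at construction (illustrative, not RAFT's actual handle_t):

#include <cuda_runtime_api.h>

// Query the device properties once and serve all later queries from the
// cached copy instead of calling into the CUDA runtime again.
struct cached_device_props {
  cached_device_props() {
    cudaGetDevice(&dev_);
    cudaGetDeviceProperties(&prop_, dev_);
  }
  const cudaDeviceProp& get_device_properties() const { return prop_; }
  int get_shared_memory_per_block() const { return static_cast<int>(prop_.sharedMemPerBlock); }
  int get_multi_processor_count() const { return prop_.multiProcessorCount; }

 private:
  int dev_{0};
  cudaDeviceProp prop_{};
};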

[FEA] Deploy raft globally

Currently, each repo carries its own RAFT under its own include directory. The versions of the different copies are not necessarily the same.

Ideally, we would adopt RMM style for global deployment of RAFT.

That mostly requires

  • always supporting the latest raft version
  • shipping a new conda package

[BUG] Invalid write in raft::sparse::distance::classic_csr_semiring_spmv_smem_kernel

Describe the bug
I ran into an error while testing pairwise distance caused by an invalid write (according to cuda-memcheck) in raft::sparse::distance::classic_csr_semiring_spmv_smem_kernel

Steps/Code to reproduce bug
Easy python code that you can use on my branch of cuml implementing pairwise dist API: https://github.com/lowener/cuml/tree/019-expose-spmv

import cupyx
import cupy as cp
from cuml.metrics import pairwise_distances as pd

X = cupyx.scipy.sparse.random(20, 10000, dtype=cp.float64, random_state=123, density=0.01)
pd(X, metric='l1')

Environment details (please complete the following information):

  • Environment location: Cloud
  • Method of RAFT install:
    • docker run --gpus all rapidsai/rapidsai-dev-nightly:0.19-cuda11.0-devel-ubuntu20.04-py3.8
    • I'm joining cuml print_env.sh results: env.txt

Additional context
Core dump message:

terminate called after throwing an instance of 'raft::cuda_error'
  what():  CUDA error encountered at: file=/rapids/cuml/cpp/build/raft/src/raft/cpp/include/raft/handle.hpp line=270: call='cudaEventDestroy(event_)', Reason=cudaErrorMisalignedAddress:misaligned address
Obtained 28 stack frames
#0 in /opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/raft/common/handle.cpython-38-x86_64-linux-gnu.so(_ZN4raft9exception18collect_call_stackEv+0x46) [0x7f52e024cf96]
#1 in /opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/raft/common/handle.cpython-38-x86_64-linux-gnu.so(_ZN4raft10cuda_errorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x69) [0x7f52e024d6b9]
#2 in /opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/raft/common/handle.cpython-38-x86_64-linux-gnu.so(_ZN4raft8handle_t17destroy_resourcesEv+0x5dd) [0x7f52e024e30d]
#3 in /opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/raft/common/handle.cpython-38-x86_64-linux-gnu.so(_ZN4raft8handle_tD1Ev+0x30) [0x7f52e024e4d0]
#4 in /opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/raft/common/handle.cpython-38-x86_64-linux-gnu.so(+0x2bf69) [0x7f52e0247f69]
#5 in /opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/metrics/pairwise_distances.cpython-38-x86_64-linux-gnu.so(+0x31b98) [0x7f52ac052b98]
#6 in python(PyObject_Call+0x255) [0x55fde3dd22b5]
#7 in python(_PyEval_EvalFrameDefault+0x21c1) [0x55fde3e7ede1]
#8 in python(_PyEval_EvalCodeWithName+0x2c3) [0x55fde3e5d503]
#9 in python(_PyFunction_FastCallDict+0x1b2) [0x55fde3d792e8]
#10 in /opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/metrics/pairwise_distances.cpython-38-x86_64-linux-gnu.so(+0x2500c) [0x7f52ac04600c]
#11 in /opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/metrics/pairwise_distances.cpython-38-x86_64-linux-gnu.so(+0x280f0) [0x7f52ac0490f0]
#12 in python(PyObject_Call+0x255) [0x55fde3dd22b5]
#13 in python(_PyEval_EvalFrameDefault+0x21c1) [0x55fde3e7ede1]
#14 in python(_PyEval_EvalCodeWithName+0x2c3) [0x55fde3e5d503]
#15 in python(_PyFunction_Vectorcall+0x378) [0x55fde3e5e8d8]
#16 in python(_PyEval_EvalFrameDefault+0x1782) [0x55fde3e7e3a2]
#17 in python(_PyEval_EvalCodeWithName+0x2c3) [0x55fde3e5d503]
#18 in python(PyEval_EvalCodeEx+0x39) [0x55fde3e5e559]
#19 in python(PyEval_EvalCode+0x1b) [0x55fde3f019ab]
#20 in python(+0x254a43) [0x55fde3f01a43]
#21 in python(+0x26e6b3) [0x55fde3f1b6b3]
#22 in python(+0x2735b2) [0x55fde3f205b2]
#23 in python(PyRun_SimpleFileExFlags+0x1b2) [0x55fde3f20792]
#24 in python(Py_RunMain+0x36d) [0x55fde3f20d0d]
#25 in python(Py_BytesMain+0x39) [0x55fde3f20ec9]
#26 in /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f53b7b610b3]
#27 in python(+0x1e9369) [0x55fde3e96369]

Aborted (core dumped)

And here's the output of cuda memcheck:
pairwisedist_cudamemcheck.txt

[FEA] Draw a worker stream from a parent handle and set it as the user stream in a returned handle

Is your feature request related to a problem? Please describe.
Many functions (including all graph prims) accept a handle and execute on the handle's user stream (accessed by handle_t.get_stream()). This is an issue if the caller is also provided with a single handle but wants to run these functions concurrently. We need to leverage the pool of worker streams of the parent handle without having to change the signature and internals of low-level functions.

Describe the solution you'd like
One way is to add a raft::handle_t get_handle_from_pool feature that draws a worker stream from the parent handle and sets it as the user stream of the returned handle, as sketched below.
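A self-contained sketch of the idea (a toy handle, not RAFT's actual handle_t): the parent owns a pool of worker streams, and the helper returns a lightweight child whose user stream is one of the workers, so existing functions that call get_stream() can run concurrently without signature changes.

#include <cstddef>
#include <vector>
#include <cuda_runtime_api.h>

struct handle {
  cudaStream_t user_stream{nullptr};
  std::vector<cudaStream_t> worker_pool;
  cudaStream_t get_stream() const { return user_stream; }
};

// Draw the i-th worker stream from the parent's pool and make it the
// user stream of the returned child handle.
handle get_handle_from_pool(const handle& parent, std::size_t i) {
  handle child;
  child.user_stream = parent.worker_pool[i % parent.worker_pool.size()];
  return child;
}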

Describe alternatives you've considered
Pass streams explicitly or rewrite prims to access worker streams from the handle

Additional context
Needed for rapidsai/cugraph#957

[FEA] also reconcile error_utils from cugraph

Is your feature request related to a problem? Please describe.
As mentioned by @afender here, it would be nice to also merge common parts of cugraph into RAFT, especially the error_utils component.

Describe the solution you'd like
RAFT error_utils must be an intersection of commonalities between cuml and cugraph.

Describe alternatives you've considered
For the initial version, it consists of stuff from cuml only!

[BUG] Rename "master" branch.

Decades from now, will history students wonder why an entire industry decided to rename a core part of its infrastructure, or will they wonder why it was named that way to begin with?

Let's rename the "master" branch.

rapidsai/cudf#5560

Test color array in MST

We should leverage MST color array in the process of connecting knn graphs. This will allow skipping distance computation for vertices in the same component.

We did plan for this UI in MST but did not spend much time on testing/validation since it wasn't exposed to the end-user.

Now is a good time to add a test and validate this.

+cc @divyegala @cjnolet

[BUG] RAFT header conflicts

Describe the bug
RAFT headers do not play nicely with some other RAPIDS libraries because of duplicated names

in rapids-js, just doing

#include <cugraph/graph.hpp>

results in:

./../../.cache/cpm/raft/39a4ee56e5ca66c2be3eb2b78767b9d0d46759cc/cpp/include/raft/cudart_utils.h:52: error: "CUDA_TRY" redefined [-Werror]
   52 | #define CUDA_TRY(call)                                                        \
      | 
In file included from ../../../cudf/src/node_cudf/utilities/error.hpp:20,
                 from ../../../cudf/src/node_cudf/scalar.hpp:17,
                 from ../../../cudf/src/node_cudf/column.hpp:17,
                 from ../../src/node_cugraph/graph_coo.hpp:17,
                 from ../../src/graph_coo.cpp:15:
../../../.cache/cpm/cudf/6bbbff12b33dfcdeef951b16babffd1472c03bdb/cpp/include/cudf/utilities/error.hpp:102: note: this is the location of the previous definition
  102 | #define CUDA_TRY(call)                                            \
      | 

Currently, to mitigate this, we have an internal/graph.hpp header that does the following:

#ifdef CUDA_TRY
#undef CUDA_TRY
#endif
#ifdef CHECK_CUDA
#undef CHECK_CUDA
#endif
#include <cugraph/graph.hpp>
#ifdef CHECK_CUDA
#undef CHECK_CUDA
#endif
#ifdef CUDA_TRY
#undef CUDA_TRY
#endif

But it would be better if these RAPIDS tools could coordinate to avoid these clashes.

cc @trxcllnt

[BUG] compiler warnings (due to using deprecated functions)

Describe the bug
cusparseS(or D)gthr, cusparseS(or D)gemmi, and rmm::mr::get_default_resource are deprecated. RAFT uses them, which generates messy warnings and makes it harder to find real problems.

/home/seunghwak/RAPIDS/development/cugraph/cpp/build/raft/src/raft/cpp/include/raft/sparse/cusparse_wrappers.h: In function ‘cusparseStatus_t raft::sparse::cusparsegthr(cusparseHandle_t, int, const T*, T*, int*, cudaStream_t) [with T = double; cusparseHandle_t = cusparseContext*; cudaStream_t = CUstream_st*]’:
/home/seunghwak/RAPIDS/development/cugraph/cpp/build/raft/src/raft/cpp/include/raft/sparse/cusparse_wrappers.h:130:48: warning: ‘cusparseStatus_t cusparseDgthr(cusparseHandle_t, int, const double*, double*, const int*, cusparseIndexBase_t)’ is deprecated: please use cusparseGather instead [-Wdeprecated-declarations]
                        CUSPARSE_INDEX_BASE_ZERO);
/home/seunghwak/RAPIDS/development/cugraph/cpp/build/raft/src/raft/cpp/include/raft/sparse/cusparse_wrappers.h: In function ‘cusparseStatus_t raft::sparse::cusparsegemmi(cusparseHandle_t, int, int, int, int, const T*, const T*, int, const T*, const int*, const int*, const T*, T*, int, cudaStream_t) [with T = float; cusparseHandle_t = cusparseContext*; cudaStream_t = CUstream_st*]’:
/home/seunghwak/RAPIDS/development/cugraph/cpp/build/raft/src/raft/cpp/include/raft/sparse/cusparse_wrappers.h:211:61: warning: ‘cusparseStatus_t cusparseSgemmi(cusparseHandle_t, int, int, int, int, const float*, const float*, int, const float*, const int*, const int*, const float*, float*, int)’ is deprecated: please use cusparseSpMM instead [-Wdeprecated-declarations]
                         cscColPtrB, cscRowIndB, beta, C, ldc);
/home/seunghwak/RAPIDS/development/cugraph/cpp/build/raft/src/raft/cpp/include/raft/mr/device/allocator.hpp: In member function ‘virtual void* raft::mr::device::default_allocator::allocate(std::size_t, cudaStream_t)’:
/home/seunghwak/RAPIDS/development/cugraph/cpp/build/raft/src/raft/cpp/include/raft/mr/device/allocator.hpp:40:26: warning: ‘rmm::mr::device_memory_resource* rmm::mr::get_default_resource()’ is deprecated [-Wdeprecated-declarations]
     void* ptr = rmm::mr::get_default_resource()->allocate(n, stream);

Steps/Code to reproduce bug
Compile with CUDA 11.

Expected behavior
No warnings.

[FEA] Add bounded_queue data structure

In work on rapidsai/cuml#3410, I introduced a bounded queue for performance reasons. This is a general enough data structure with a common enough use case that it should probably be made available in RAFT.

The basic idea of this data structure is that when you know that there is some reasonable upper limit on the size of your queue and you can get an approximate sense of what that limit is before you begin filling it (or even an approximate sense that you update periodically as you fill it), then it is better to simply store the data in a vector and keep track of a "front" and "end" index into that vector than to use a stdlib queue. This is primarily because the C++ standard for stdlib queue has certain requirements on when memory gets de- or re-allocated that can result in a non-optimal (in terms of runtime) memory allocation strategy when you know that running out of memory for the underlying vector would not be a problem.
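A minimal host-side sketch of the structure described above (illustrative, not the cuML implementation): storage is reserved up front from the known upper limit, and push/pop just move indices, avoiding the reallocation behavior mandated for stdlib queues.

#include <cstddef>
#include <vector>

template <typename T>
class bounded_queue {
 public:
  // capacity is the (approximate) known upper limit on queue size
  explicit bounded_queue(std::size_t capacity) { data_.reserve(capacity); }

  void push(T value) { data_.push_back(std::move(value)); }

  // caller is expected to check !empty() first
  T pop() { return data_[front_++]; }

  bool empty() const { return front_ == data_.size(); }

 private:
  std::vector<T> data_;   // backing vector, reserved once
  std::size_t front_{0};  // "front" index; popped elements are not freed
};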

[FEA] lap_kernels.cu numeric_limits/infinity

Once CUDA 10.2 is the minimum supported version of CUDA, we could modify lap_kernels.cu to use cuda::std::numeric_limits<weight_t>::max() instead of passing infinity into the kernels.
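A sketch of what that change might look like (a hypothetical kernel, shown only to illustrate the libcu++ usage):

#include <cuda/std/limits>

// Initialize with the in-kernel numeric limit instead of passing
// infinity down as a kernel argument.
template <typename weight_t>
__global__ void init_distances(weight_t* dist, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) dist[i] = cuda::std::numeric_limits<weight_t>::max();
}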

[FEA] Interface support to reserve an address space without actually allocating it

Describe the solution you'd like
@seunghwak has a very good point here about having support for reserving memory (aka over-subscription) using CUDA's virtual memory management APIs. I believe this is a good improvement to our existing Allocator, device_buffer, and host_buffer interfaces. Thus, filing this issue so that this feature item is not lost.

Additional context
Ref: https://devblogs.nvidia.com/introducing-low-level-gpu-virtual-memory-management/

[FEA] Logging abstraction

We have started some discussions about this already, but want to formalize the requirement for it.

We need to standardize upon some abstraction / API for logging where the actual logging implementation can be easily plugged in underneath.

As we are hoping this repository can someday be used by additional projects, aside from cuml and cugraph, it would be beneficial to not assume too much about the actual logging implementation that will be used.

[BUG] Lanczos solver isn't reproducible even with seed

We noticed this specifically on CUDA 11.2. It looks like the spectral clustering and Lanczos solver aren't returning the same results when given the same input and seed. We noticed this in UMAP specifically in cuML.

[FEA] Update RAFT comms to use ncclSend/ncclRecv for P2P

Is your feature request related to a problem? Please describe.
Currently, RAFT uses UCX for P2P, and UCX does not take a cudaStream_t. This requires additional cudaStreamSynchronize() operations, which lower performance and are also easy to forget, leading to timing bugs that are very tricky to debug.

Describe the solution you'd like
Use ncclSend/ncclRecv for P2P with device memory. Use UCX only for P2P with host memory.

https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/p2p.html
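A sketch of the stream-ordered pattern this enables (a hypothetical helper; error checking omitted): the send/recv pair is enqueued on the caller's stream, so no extra cudaStreamSynchronize() is needed.

#include <nccl.h>

void p2p_exchange(ncclComm_t comm, const float* sendbuf, float* recvbuf,
                  size_t count, int peer, cudaStream_t stream) {
  ncclGroupStart();  // group the send and recv to avoid deadlock
  ncclSend(sendbuf, count, ncclFloat, peer, comm, stream);
  ncclRecv(recvbuf, count, ncclFloat, peer, comm, stream);
  ncclGroupEnd();
}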

[FEA] Sketching device function.

Aside from being reusable and useful on its own, it is specifically going to be useful for some upcoming algorithms, such as count-min sketch-based sparse approx nearest neighbors and for the approximate construction of massive sparse count matrices.

It will be very useful to have some sketching primitives / device functions that we can use to build these algorithms.

[BUG] Fix clang-tidy errors

Describe the bug
In the past we had to disable parsing of .cu files in run-clang-tidy.py due to difficulties in trying to get them working properly. So, we never tidied our cuda source files!

Steps/Code to reproduce bug
Today, if we patch this script with the following change:

diff --git a/cpp/scripts/run-clang-tidy.py b/cpp/scripts/run-clang-tidy.py
index 23260d2..fdb0abc 100644
--- a/cpp/scripts/run-clang-tidy.py
+++ b/cpp/scripts/run-clang-tidy.py
@@ -36,7 +36,7 @@ def parse_args():
                            help="Path to cmake-generated compilation database")
     argparser.add_argument("-exe", type=str, default="clang-tidy",
                            help="Path to clang-tidy exe")
-    argparser.add_argument("-ignore", type=str, default="[.]cu$",
+    argparser.add_argument("-ignore", type=str, default=None,
                            help="Regex used to ignore files from checking")
     argparser.add_argument("-select", type=str, default=None,
                            help="Regex used to select files for checking")
@@ -123,6 +123,10 @@ def get_tidy_args(cmd, exe):
         # replace nvcc's "-gencode ..." with clang's "--cuda-gpu-arch ..."
         archs = get_gpu_archs(command)
         command.extend(archs)
+        # clang-tooling 8.0.1 doesn't support linking with greater sm_70 archs
+        # so, we need to disable libdevice linkage, until we are in a position
+        # to upgrade clang-tooling
+        command.extend(["-nocudalib", "--no-cuda-version-check"])
         while True:
             loc = remove_item_plus_one(command, "-gencode")
             if loc < 0:

and then run the script, we get a ton of errors which we never caught before (and thus never fixed). A full list of all the errors is here: run.log

Expected behavior
tidy check should pass on all source files, irrespective of device or host code.

[FEA] Use RMM directly instead of adapters

We should use RMM's device_buffer/device_uvector directly; they have a ton of advantages over the raft/cuml device buffers, such as:

  1. C++11/14 features such as copy/move constructors and assignment operators. These will be a blessing to work with and allow us to take ownership of data across APIs without the need to copy.
  2. Allowing us to take ownership of raw pointers and store them in buffers. This is especially beneficial since the size is stored in the buffer object, which helps minimize confusion (at least at a developer level) by removing the need to pass multiple explicit size arguments between functions, and creates a barrier against out-of-bounds accesses.
  3. Using vanilla RMM shifts the dependency from us having to maintain our own buffers (with RMM under the hood) to letting RMM maintain the API.

[FEA] Time-out mechanism for communication collectives.

Is your feature request related to a problem? Please describe.
client.submit(...) can hang if one worker fails (e.g., RAFT_EXPECTS throws an exception) while the remaining workers continue and reach a communication collective (which will not return until all the workers participate in the collective).

Describe the solution you'd like
It is much better to abort a job by throwing an exception than to hang indefinitely. If communication collectives throw an exception on timeout (we need to decide how long is too long... in particular, when the data size is large and the interconnect is slow), we can avoid hangs.

Describe alternatives you've considered
In the case of MPI, the entire job gets aborted if one process aborts; this model may not be applicable to Dask.
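One possible shape for such a timeout, sketched against the plain CUDA runtime (a hypothetical helper, not a proposed RAFT API): record an event after the collective and poll it against a deadline instead of blocking in cudaStreamSynchronize().

#include <chrono>
#include <stdexcept>
#include <thread>
#include <cuda_runtime_api.h>

void sync_with_timeout(cudaStream_t stream, std::chrono::milliseconds limit) {
  cudaEvent_t ev;
  cudaEventCreateWithFlags(&ev, cudaEventDisableTiming);
  cudaEventRecord(ev, stream);
  auto deadline = std::chrono::steady_clock::now() + limit;
  // cudaEventQuery returns cudaErrorNotReady while work is still pending
  while (cudaEventQuery(ev) == cudaErrorNotReady) {
    if (std::chrono::steady_clock::now() > deadline) {
      cudaEventDestroy(ev);
      throw std::runtime_error("collective timed out");
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  cudaEventDestroy(ev);
}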

Start doc for RAFT "principles"

Just a simple outline of principles we plan to follow in RAFT to help the ML and graph teams coordinate.

  • header only
  • comms
  • priorities for different versions (eg. what will be moved and when)
  • abstraction paradigms (eg. resource management, etc...)

[FEA] Add getDeviceCapability function

Spawned from this cuML PR comment: rapidsai/cuml#3058 (comment)

This is a simple utility method that would be good to have in raft.

#include <utility>  // pair
#include <cuda_runtime_api.h>   // cudaGetDevice, cudaDeviceGetAttribute
#include <raft/cudart_utils.h>  // CUDA_CHECK

/** helper method to get the compute capability version numbers */
inline std::pair<int, int> getDeviceCapability() {
  int devId;
  CUDA_CHECK(cudaGetDevice(&devId));
  int major, minor;
  CUDA_CHECK(
    cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, devId));
  CUDA_CHECK(
    cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, devId));
  return std::make_pair(major, minor);
}

[ENH] waive the 2B size limit in raft::comms::allgatherv

While recvcounts are already size_t, displs are still 32-bit ints.
In practice, displs are typically bigger than recvcounts.

The underlying ncclBroadcast-based implementation is not limited to 32 bits, so displs should be size_t.

https://www.mpich.org/static/docs/latest/www3/MPI_Allgatherv.html

https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/colls.html?highlight=ncclbroadcast

void allgatherv(const void *sendbuf, void *recvbuf, const size_t recvcounts[],
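For illustration, a sketch of an allgatherv built from one ncclBroadcast per rank with size_t displs throughout (hypothetical code; counts and displacements are in elements, and error checking is omitted):

#include <nccl.h>

void allgatherv(const float* sendbuf, float* recvbuf,
                const size_t recvcounts[], const size_t displs[],
                int nranks, ncclComm_t comm, cudaStream_t stream) {
  ncclGroupStart();
  for (int root = 0; root < nranks; ++root) {
    // sendbuf is only read on the root rank of each broadcast;
    // size_t displs let the receive offset exceed 2^31 elements
    ncclBroadcast(sendbuf, recvbuf + displs[root], recvcounts[root],
                  ncclFloat, root, comm, stream);
  }
  ncclGroupEnd();
}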

[QST] Should we include standard library?

What is your question?
I ran into an issue (which was easy to solve) when including <raft/mr/allocator.hpp>:
Basically, if you include nothing else, std::size_t isn't declared and the compiler doesn't know what to do with the std::size_t types in that file. It's easy enough to just include <cstddef> before including <raft/mr/allocator.hpp>, but maybe it makes sense to do this directly in raft.
Should I look for these cases in raft and submit a PR?
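For reference, the workaround described above:

#include <cstddef>                // declares std::size_t
#include <raft/mr/allocator.hpp>  // now compiles without other includes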

[FEA] Remove `Euc` prefix from distance_type.h

This prefix is redundant in the distances that are actually derived from Euclidean (such as EucExpandedL2) but doesn't make sense in other distances like EucUnexpandedL1. We should probably remove this prefix since many of the other recently added metrics are not derived from Euclidean.

[FEA] Move `label/classlabels.cuh` from cuml to RAFT

The MSF currently returns a colors array with colors that are not guaranteed to be drawn from a monotonically increasing set.

The algorithm to connect a knn graph currently uses the colors array as-is, but the algorithm would be more efficient and save a lot of memory if the array sizes didn't depend on the max label but instead on the number of unique labels. This can be accomplished easily by re-labeling the colors array so they are drawn from a monotonically increasing set.
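A host-side sketch of that relabeling (illustrative, not the cuML/RAFT prim): map arbitrary color values onto 0..n_unique-1 so downstream array sizes depend on the number of unique labels rather than on the maximum label value.

#include <cstddef>
#include <unordered_map>
#include <vector>

std::vector<int> make_monotonic(const std::vector<int>& colors) {
  std::unordered_map<int, int> remap;  // old color -> dense label
  std::vector<int> out(colors.size());
  for (std::size_t i = 0; i < colors.size(); ++i) {
    // assign the next dense label the first time a color is seen
    auto it = remap.try_emplace(colors[i], static_cast<int>(remap.size())).first;
    out[i] = it->second;
  }
  return out;
}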

[FEA] Move PartDescriptor & Data abstractions from cuML

These abstractions have been used in the Dask packages of cuML to enable the representation of Dask partitions as input to our multi-node multi-GPU algorithms w/ the communicator. It would be useful for these abstractions to be placed in RAFT in order to create a more unified API for the inputs to primitives.

It would also be nice to include some Cython code for constructing and managing instances of PartDescriptor in order to ease the boilerplate needed and reduce duplication across primitives.

[FEA] Upgrade clang version to include support for cuda-11

Is your feature request related to a problem? Please describe.
We currently are using clang version 8.0.1 which doesn't even support cuda-10! This especially causes problems with clang-tidy if we are enabling tidy checks on device-side code as is being done in PR #85 .

Describe the solution you'd like
We'll have to upgrade our clang version to the one that supports cuda-11 in order to also be able to run tidy checks on our .cu source files. However, one issue that could arise due to this is from clang-format which can introduce churn due to formatting logic updates between versions. This needs to be ascertained, first.

[FEA] Add cuFFT helpers

Spawned from this cuML PR comment: rapidsai/cuml#3058 (comment)

That PR adds the below code, which would be useful in raft.

#include <cufft.h>
#include <raft/error.hpp>

namespace raft {

/**
 * @brief Exception thrown when a cuFFT error is encountered.
 */
struct cufft_error : public raft::exception {
  explicit cufft_error(char const* const message) : raft::exception(message) {}
  explicit cufft_error(std::string const& message) : raft::exception(message) {}
};

const char* getCufftErrStr(cufftResult status) {
  // https://docs.nvidia.com/cuda/cufft/index.html#cufftresult
  switch (status) {
    case CUFFT_SUCCESS:
      return "The cuFFT operation was successful.";
    case CUFFT_INVALID_PLAN:
      return "cuFFT was passed an invalid plan handle.";
    case CUFFT_ALLOC_FAILED:
      return "cuFFT failed to allocate GPU or CPU memory.";
    case CUFFT_INVALID_VALUE:
      return "User specified an invalid pointer or parameter.";
    case CUFFT_INTERNAL_ERROR:
      return "Driver or internal cuFFT library error.";
    case CUFFT_EXEC_FAILED:
      return "Failed to execute an FFT on the GPU.";
    case CUFFT_SETUP_FAILED:
      return "The cuFFT library failed to initialize.";
    case CUFFT_INVALID_SIZE:
      return "User specified an invalid transform size.";
    case CUFFT_INCOMPLETE_PARAMETER_LIST:
      return "Missing parameters in call.";
    case CUFFT_INVALID_DEVICE:
      return "Execution of a plan was on different GPU than plan creation.";
    case CUFFT_PARSE_ERROR:
      return "Internal plan database error.";
    case CUFFT_NO_WORKSPACE:
      return "No workspace has been provided prior to plan execution.";
    case CUFFT_NOT_IMPLEMENTED:
      return "Function does not implement functionality for parameters given.";
    case CUFFT_NOT_SUPPORTED:
      return "Operation is not supported for parameters given.";
    default:
      return "Unknown error.";
  }
}

/**
 * @brief Error checking macro for cuFFT functions.
 *
 * Invokes a cuFFT function. If the call does not return CUFFT_SUCCESS, throws
 * an exception detailing the error that occurred.
 */
#define CUFFT_TRY(call)                                                     \
  do {                                                                      \
    const cufftResult status = call;                                        \
    if (status != CUFFT_SUCCESS) {                                          \
      std::string msg{};                                                    \
      SET_ERROR_MSG(msg,                                                    \
                    "cuFFT error encountered at: ", "call='%s', Reason=%s", \
                    #call, raft::getCufftErrStr(status));                   \
      throw raft::cufft_error(msg);                                         \
    }                                                                       \
  } while (0)

class CuFFTHandle {
 public:
  CuFFTHandle() { CUFFT_TRY(cufftCreate(&handle)); }
  ~CuFFTHandle() { cufftDestroy(handle); }
  operator cufftHandle() const { return handle; }

 private:
  cufftHandle handle;
};

}  // namespace raft

[FEA] Scatter and Gather calls in cuml comms

There are a few places in cuML where we are using UCX endpoints directly to perform scatter and gather calls. We originally kept the API minimal, since users could compose their own scatter and gather calls out of the isend and irecv calls.

However, this can be error-prone, and we're already finding small, subtle places where we diverged from the MPI spec, causing unexpected behaviors. While I don't think we should claim to adhere strictly to the MPI spec, I do think that including the scatter and gather calls in the comms would alleviate some of this, since they are common communication primitives.
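To illustrate the kind of composition involved, here is a gather built from point-to-point calls, written against NCCL for concreteness (hypothetical code, not the cuML/UCX implementation):

#include <nccl.h>

void gather(const float* sendbuf, float* recvbuf, size_t count, int root,
            int rank, int nranks, ncclComm_t comm, cudaStream_t stream) {
  ncclGroupStart();
  ncclSend(sendbuf, count, ncclFloat, root, comm, stream);  // every rank sends
  if (rank == root) {
    // the root receives each rank's contribution at offset r * count
    for (int r = 0; r < nranks; ++r) {
      ncclRecv(recvbuf + static_cast<size_t>(r) * count, count, ncclFloat, r,
               comm, stream);
    }
  }
  ncclGroupEnd();
}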

[FEA] Clean up class labels / merge labels prims

The class labels primitives are being moved over to RAFT quickly so that they can be used to support SLHC in 0.19. Just to avoid any surprises, we're keeping the function signatures compatible with the ones in cuml for now. These prims could use a little tender loving care, though, as they haven't changed much since they were created.

[FEA] MST kernels

Minimum Spanning Tree (MST)

Input

  • Format: CSR - single GPU
  • Weighted: Yes
  • Directed: No
  • Size
    • V=1M-2B
    • E=1M-2B
    • EF > 16
  • Need to support multiple components

Output

Set of edges present in the MST. A boolean array of size E where mst[i] is true if edge [i] is in the MST, false otherwise.

SW

Both cuML and cuGraph need this; RAFT would be a natural place to host the device code.

  • cuML can leverage this as part of hierarchical clustering. KNN forms a reduced neighborhood graph that is used to construct the MST (as an optimization to shrink the space of total potential edges). In other words, the Sparse KNN would provide the graph that is then used to construct the MST instead of the KNN using the MST directly.
  • cuGraph can add python bindings and expose it as a graph algo in the API.

Literature

Other considerations

If we reduce this to k-NN graphs, then we can further optimize, but we lose a lot on the generic aspect. In this case we would have a CSR of fixed degree, and we would know, for instance, that the edge with the shortest distance is the first edge of each CSR row.


Thanks to @cjnolet for helping to scope this new feature

[QST] Why is raft not something that is installed like a utils library we can all use?

Given that all of the projects in RAPIDS are trying to be on the same branch and follow the same standards, why is this not just a package that gets installed with RAPIDS, so we can use it without necessarily having to build it, and can include it by assuming it's just in "conda_prefix/include" for the C++ side and in lib for the Python stuff?

It seems strange that there will be a cuml.raft and a blazingsql.raft, because that's how raft gets added to packages right now.

[TASK] Refactor & clean up templates in many sparse prims

There are many places in the sparse prims where templates could be cleaned up (and used). We had originally adopted using int for the row & column arrays in many of the prims because that's what cusparse & scipy were limited to using, but we've since started using template types for those in many cuml algorithms (and more recent sparse prims), so it's time those get cleaned up.

[FEA] Add gather & gatherv to raft::comms_t

Is your feature request related to a problem? Please describe.
RAFT comms currently has allgather & allgatherv but lacks gather & gatherv

Describe the solution you'd like
Add gather & gatherv to raft::comms_t
