
graphblast's Introduction

Gunrock: CUDA/C++ GPU Graph Analytics


Gunrock1 is a CUDA library for graph processing designed specifically for the GPU. It uses a high-level, bulk-synchronous/asynchronous, data-centric abstraction focused on operations on vertex or edge frontiers. Gunrock balances performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies, particularly fine-grained load balancing, with a high-level programming model that lets programmers quickly develop new graph primitives, with small code size and minimal GPU programming knowledge, that scale from one to many GPUs on a node.

Branch  | Purpose                                                                                         | Version | Status
main    | Default branch, ported from gunrock/essentials; serves as the official release branch.         | ≥ 2.x.x | Active
develop | Development feature branch, ported from gunrock/essentials.                                     | ≥ 2.x.x | Active
master  | Previous release branch for the gunrock/gunrock 1.x.x interface; preserves all commit history. | ≤ 1.x.x | Deprecated
dev     | Previous development branch for gunrock/gunrock; all changes now merged into master.           | ≤ 1.x.x | Deprecated

Quick Start Guide

Before building Gunrock, make sure you have the CUDA Toolkit2 installed on your system. Other external dependencies, such as NVIDIA/thrust, NVIDIA/cub, etc., are automatically fetched using CMake.

git clone https://github.com/gunrock/gunrock.git
cd gunrock
mkdir build && cd build
cmake .. 
make sssp # or for all algorithms, use: make -j$(nproc)
bin/sssp ../datasets/chesapeake/chesapeake.mtx

Implementing Graph Algorithms

For a detailed explanation, please see the full documentation. The following example uses Gunrock's data-centric, bulk-synchronous programming model and its simple APIs to implement Breadth-First Search on the GPU. The example skips the setup phase of creating the problem_t and enactor_t structs and jumps straight into the actual algorithm.

We first prepare our frontier with the initial source vertex to begin push-based BFS traversal. A simple f->push_back(source) places the initial vertex we will use for our first iteration.

void prepare_frontier(frontier_t* f,
                      gcuda::multi_context_t& context) override {
  auto P = this->get_problem();
  f->push_back(P->param.single_source);
}

We then begin our iterative loop, which iterates until a convergence condition has been met. If no condition has been specified, the loop converges when the frontier is empty.

void loop(gcuda::multi_context_t& context) override {
  auto E = this->get_enactor();   // Pointer to enactor interface.
  auto P = this->get_problem();   // Pointer to problem (data) interface.
  auto G = P->get_graph();        // Graph that we are processing.

  auto single_source = P->param.single_source;  // Initial source node.
  auto distances = P->result.distances;         // Distances array for BFS.
  auto visited = P->visited.data().get();       // Visited map.
  auto iteration = this->iteration;             // Iteration we are on.

  // Following lambda expression is applied on every source,
  // neighbor, edge, weight tuple during the traversal.
  // Our intent here is to find and update the minimum distance when found.
  // And return which neighbor goes in the output frontier after traversal.
  auto search = [=] __host__ __device__(
                      vertex_t const& source,    // ... source
                      vertex_t const& neighbor,  // neighbor
                      edge_t const& edge,        // edge
                      weight_t const& weight     // weight (tuple).
                      ) -> bool {
    auto old_distance =
      math::atomic::min(&distances[neighbor], iteration + 1);
    return (iteration + 1 < old_distance);
  };

  // Execute advance operator on the search lambda expression.
  // Uses load_balance_t::block_mapped algorithm (try others for perf. tuning.)
  operators::advance::execute<operators::load_balance_t::block_mapped>(
    G, E, search, context);
}

See include/gunrock/algorithms/bfs.hxx for the complete implementation.
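If a different stopping criterion is needed, the default check can be overridden in the same style as the methods above. The following is a minimal sketch only; it assumes the enactor exposes an is_converged virtual and an active frontier as in gunrock/essentials, and max_iterations is a hypothetical member:

bool is_converged(gcuda::multi_context_t& context) override {
  // Hypothetical cap on the iteration count; otherwise fall back to
  // the default behavior of converging once the frontier is empty.
  if (this->iteration >= max_iterations)
    return true;
  return this->active_frontier->is_empty();
}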

How to Cite Gunrock & Essentials

Thank you for citing our work.

@article{Wang:2017:GGG,
  author       = {Yangzihao Wang and Yuechao Pan and Andrew Davidson
                  and Yuduo Wu and Carl Yang and Leyuan Wang and
                  Muhammad Osama and Chenshan Yuan and Weitang Liu and
                  Andy T. Riffel and John D. Owens},
  title        = {{G}unrock: {GPU} Graph Analytics},
  journal      = {ACM Transactions on Parallel Computing},
  year         = 2017,
  volume       = 4,
  number       = 1,
  month        = aug,
  pages        = {3:1--3:49},
  doi          = {10.1145/3108140},
  ee           = {http://arxiv.org/abs/1701.01170},
  acmauthorize = {https://dl.acm.org/doi/10.1145/3108140?cid=81100458295},
  url          = {http://escholarship.org/uc/item/9gj6r1dj},
  code         = {https://github.com/gunrock/gunrock},
  ucdcite      = {a115},
}
@InProceedings{Osama:2022:EOP,
  author    = {Muhammad Osama and Serban D. Porumbescu and John D. Owens},
  title     = {Essentials of Parallel Graph Analytics},
  booktitle = {Proceedings of the Workshop on Graphs, Architectures,
               Programming, and Learning},
  year      = 2022,
  series    = {GrAPL 2022},
  month     = may,
  pages     = {314--317},
  doi       = {10.1109/IPDPSW55747.2022.00061},
  url       = {https://escholarship.org/uc/item/2p19z28q},
}

Copyright & License

Gunrock is copyright The Regents of the University of California. The library, examples, and all source code are released under Apache 2.0.

Footnotes

  1. This repository has been moved from https://github.com/gunrock/essentials, and the previous history is preserved with tags and under the master branch. Read more about Gunrock and Essentials in our vision paper: Essentials of Parallel Graph Analytics.

  2. CUDA v11.5.1 or higher is recommended because of its support for stream-ordered memory allocators.
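For context, a stream-ordered allocation and free look like this (a minimal standalone CUDA sketch of the feature the footnote refers to, not Gunrock code):

#include <cuda_runtime.h>

int main() {
  cudaStream_t stream;
  cudaStreamCreate(&stream);
  void* buf = nullptr;
  cudaMallocAsync(&buf, 1 << 20, stream);  // allocation ordered on the stream
  // ... launch kernels that use buf on `stream` ...
  cudaFreeAsync(buf, stream);              // free, also in stream order
  cudaStreamSynchronize(stream);
  cudaStreamDestroy(stream);
  return 0;
}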

graphblast's People

Contributors

aydinbuluc, benbrock, ctcyang, hanke580


graphblast's Issues

Error while running gmis algorithm

I've tested the mis algorithm on the graph test_mis.mtx and got this error:
Cuda error in file './graphblas/backend/cuda/reduce.hpp' in line 40 : invalid configuration argument.

System configuration:

GPU: NVIDIA GeForce GT 1030
NVIDIA-SMI 440.33.01
OS: Ubuntu 16.04 (run in a Docker container)
CUDA version: 9.2
g++ version: 4.9

Installation failure

I was trying to install GraphBlast on Ubuntu 16.04; I checked the software versions (CUDA 9.1, Boost 1.58, g++ 4.9.3) and followed the instructions step by step, but got errors.

Also, the flags recommended in this issue didn't help.

Incorrect MinPlus semiring output

I built the library using CMake with the corresponding script and tried to run the test/gspgemm.cu example, modified slightly so it reads my own matrix. One run with PlusMultipliesSemiring produced the correct output. Then I ran the same example with MinimumPlusSemiring and got the same result as with PlusMultipliesSemiring, which is incorrect for the min-plus semiring.

The minimal reproducible example (note: a separate issue is that uncommenting the lines at the bottom leads to a segfault):

#define GRB_USE_CUDA
#define private public
#include <iostream>
#include <algorithm>
#include <string>

#include <cstdio>
#include <cstdlib>

#include <boost/program_options.hpp>

#include "graphblas/graphblas.hpp"
#include "test/test.hpp"

int main( int argc, char** argv )
{
  bool DEBUG = true;

  std::vector<graphblas::Index> a_row_indices, b_row_indices;
  std::vector<graphblas::Index> a_col_indices, b_col_indices;
  std::vector<float> a_values, b_values;
  graphblas::Index a_num_rows, a_num_cols, a_num_edges;
  graphblas::Index b_num_rows, b_num_cols, b_num_edges;
  char* dat_name;

  // Load A
  std::cout << "loading A" << std::endl;
  readMtx("path/to/example/matrix", &a_row_indices, &a_col_indices,
      &a_values, &a_num_rows, &a_num_cols, &a_num_edges, 1, false);
  graphblas::Matrix<float> a(a_num_rows, a_num_cols);
  a.build(&a_row_indices, &a_col_indices, &a_values, a_num_edges, GrB_NULL,
     GrB_NULL);
  if(DEBUG) a.print();

  // Load B: the same matrix again, i.e. we compute A squared
  std::cout << "loading B" << std::endl;
  readMtx("path/to/example/matrix", &b_row_indices, &b_col_indices,
      &b_values, &b_num_rows, &b_num_cols, &b_num_edges, 1, false);
  graphblas::Matrix<float> b(b_num_rows, b_num_cols);
  b.build(&b_row_indices, &b_col_indices, &b_values, b_num_edges, GrB_NULL,
     GrB_NULL );
  if(DEBUG) b.print();

  //
  graphblas::Matrix<float> c(a_num_rows, b_num_cols);
  graphblas::Descriptor desc;
  desc.descriptor_.debug_ = true;
  graphblas::mxm<float,float,float,float>(
      &c,
      GrB_NULL,
      GrB_NULL,
      graphblas::MinimumPlusSemiring<float>(),
      &a,
      &b,
      &desc
  );
  if(DEBUG) c.print();

//   // Multiply using GPU array initialization.
//   graphblas::Matrix<float> A(a_num_rows, a_num_cols);
//   graphblas::Matrix<float> B(b_num_rows, b_num_cols);
//   graphblas::Matrix<float> C(a_num_rows, b_num_cols);

//   A.build(a.matrix_.sparse_.d_csrRowPtr_, a.matrix_.sparse_.d_csrColInd_, a.matrix_.sparse_.d_csrVal_, a.matrix_.sparse_.nvals_);
//   B.build(b.matrix_.sparse_.d_csrRowPtr_, b.matrix_.sparse_.d_csrColInd_, b.matrix_.sparse_.d_csrVal_, b.matrix_.sparse_.nvals_);

//   desc.descriptor_.debug_ = true;

//   graphblas::mxm<T, T, T, T>(&C, GrB_NULL, GrB_NULL, graphblas::PlusMultipliesSemiring<float>(),
//                              &A, &B, &desc);

  // Multiply using CPU array initialization.
  // TODO(ctcyang): Add EXPECT_FAIL, because require pointers to be GPU.
  /*graphblas::Matrix<float> a_(a_num_rows, a_num_cols);
  graphblas::Matrix<float> b_(b_num_rows, b_num_cols);
  graphblas::Matrix<float> c_(a_num_rows, b_num_cols);

  a_.build(a.matrix_.sparse_.h_csrRowPtr_, a.matrix_.sparse_.h_csrColInd_, a.matrix_.sparse_.h_csrVal_, a.matrix_.sparse_.nvals_);
  b_.build(b.matrix_.sparse_.h_csrRowPtr_, b.matrix_.sparse_.h_csrColInd_, b.matrix_.sparse_.h_csrVal_, b.matrix_.sparse_.nvals_);

  desc.descriptor_.debug_ = true;

  graphblas::mxm<T, T, T, T>(&c_, GrB_NULL, GrB_NULL, graphblas::PlusMultipliesSemiring<float>(),
                             &a_, &b_, &desc);*/
}

example matrix:

%%MatrixMarket matrix coordinate real general
3 3 6
1   1   1
1   2   2
1   3   3
2   1   2
2   3   1
3   3   2

gspgemm with PlusMultipliesSemiring yields:

%%MatrixMarket matrix coordinate real general
3 3 7
1   1   5
1   2   2
1   3   11
2   1   2
2   2   4
2   3   8
3   3   4

which is correct, while MinimumPlusSemiring yields the same result as above, when the correct output is:

%%MatrixMarket matrix coordinate real general
3 3 7
1   1   2
1   2   3
1   3   3
2   1   3
2   2   4
2   3   3
3   3   4
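For reference, the expected MinimumPlusSemiring output above can be checked with a few lines of dense C++ (a standalone sketch that ignores sparsity and models missing entries as +infinity; not GraphBLAST code):

#include <algorithm>
#include <cstdio>

int main() {
  const float INF = 1e30f;  // stands in for "no entry"
  // Dense copy of the 3x3 example matrix.
  float A[3][3] = {{1, 2, 3}, {2, INF, 1}, {INF, INF, 2}};
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j) {
      float c = INF;
      for (int k = 0; k < 3; ++k)
        c = std::min(c, A[i][k] + A[k][j]);  // min-plus inner product
      if (c < INF / 2)  // skip entries that stayed "empty"
        std::printf("%d %d %g\n", i + 1, j + 1, c);
    }
}

This reproduces the seven triples listed above.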

Latest commits are breaking Makefile builds

Hi. I just noticed that some commits made yesterday render the software unbuildable (via the Makefile method):

==> 29525: graphblast: Executing phase: 'edit'                                                                                                                                                                                                                                                                     [82/11013]
==> 29525: graphblast: Executing phase: 'build'
==> Error: ProcessError: Command exited with status 2:
    'make' '-j12'

18 errors found in build log:
     17    nvcc -g -gencode arch=compute_35,code=compute_35 -gencode arch=compute_60,code=compute_60 -O3 -use_fast_math -w -std=c++11 -o bin/ggc example/ggc.cu -Iext/moderngpu/include/ -Iex
           t/cub/cub/ -I/data/ctcyang/boost_1_58_0/ -I./ ext/moderngpu/src/mgpucontext.cu ext/moderngpu/src/mgpuutil.cpp -L/data/ctcyang/boost_1_58_0/stage/lib/ -lboost_program_options -lcu
           blas -lcusparse -lcurand
     18    mkdir -p bin
     19    mkdir -p bin
     20    nvcc -g -gencode arch=compute_35,code=compute_35 -gencode arch=compute_60,code=compute_60 -O3 -use_fast_math -w -std=c++11 -o bin/ggc_cusparse example/ggc_cusparse.cu -Iext/moder
           ngpu/include/ -Iext/cub/cub/ -I/data/ctcyang/boost_1_58_0/ -I./ ext/moderngpu/src/mgpucontext.cu ext/moderngpu/src/mgpuutil.cpp -L/data/ctcyang/boost_1_58_0/stage/lib/ -lboost_pr
           ogram_options -lcublas -lcusparse -lcurand
     21    nvcc -g -gencode arch=compute_35,code=compute_35 -gencode arch=compute_60,code=compute_60 -O3 -use_fast_math -w -std=c++11 -o bin/gpr example/gpr.cu -Iext/moderngpu/include/ -Iex
           t/cub/cub/ -I/data/ctcyang/boost_1_58_0/ -I./ ext/moderngpu/src/mgpucontext.cu ext/moderngpu/src/mgpuutil.cpp -L/data/ctcyang/boost_1_58_0/stage/lib/ -lboost_program_options -lcu
           blas -lcusparse -lcurand
     22    nvcc -g -gencode arch=compute_35,code=compute_35 -gencode arch=compute_60,code=compute_60 -O3 -use_fast_math -w -std=c++11 -o bin/gtc example/gtc.cu -Iext/moderngpu/include/ -Iex
           t/cub/cub/ -I/data/ctcyang/boost_1_58_0/ -I./ ext/moderngpu/src/mgpucontext.cu ext/moderngpu/src/mgpuutil.cpp -L/data/ctcyang/boost_1_58_0/stage/lib/ -lboost_program_options -lcu
           blas -lcusparse -lcurand
  >> 23    ./graphblas/backend/cuda/reduce.hpp(160): error: function template "graphblas::backend::reduceInner(T *, BinaryOpT, MonoidT, const graphblas::backend::SparseMatrix<a> *, graphbla
           s::backend::Descriptor *)" has already been defined
     24    
  >> 25    ./graphblas/backend/cuda/reduce.hpp(160): error: function template "graphblas::backend::reduceInner(T *, BinaryOpT, MonoidT, const graphblas::backend::SparseMatrix<a> *, graphbla
           s::backend::Descriptor *)" has already been defined
     26    
  >> 27    ./graphblas/backend/cuda/reduce.hpp(160): error: function template "graphblas::backend::reduceInner(T *, BinaryOpT, MonoidT, const graphblas::backend::SparseMatrix<a> *, graphbla
           s::backend::Descriptor *)" has already been defined
     28    
  >> 29    ./graphblas/backend/cuda/reduce.hpp(160): error: function template "graphblas::backend::reduceInner(T *, BinaryOpT, MonoidT, const graphblas::backend::SparseMatrix<a> *, graphbla
           s::backend::Descriptor *)" has already been defined
     30    
  >> 31    ./graphblas/backend/cuda/reduce.hpp(160): error: function template "graphblas::backend::reduceInner(T *, BinaryOpT, MonoidT, const graphblas::backend::SparseMatrix<a> *, graphbla
           s::backend::Descriptor *)" has already been defined
     32    
  >> 33    ./graphblas/backend/cuda/reduce.hpp(160): error: function template "graphblas::backend::reduceInner(T *, BinaryOpT, MonoidT, const graphblas::backend::SparseMatrix<a> *, graphbla
           s::backend::Descriptor *)" has already been defined
     34    
  >> 35    ./graphblas/backend/cuda/reduce.hpp(160): error: function template "graphblas::backend::reduceInner(T *, BinaryOpT, MonoidT, const graphblas::backend::SparseMatrix<a> *, graphbla
           s::backend::Descriptor *)" has already been defined
     36    
  >> 37    ./graphblas/backend/cuda/reduce.hpp(160): error: function template "graphblas::backend::reduceInner(T *, BinaryOpT, MonoidT, const graphblas::backend::SparseMatrix<a> *, graphbla
           s::backend::Descriptor *)" has already been defined
     38    
  >> 39    ./graphblas/backend/cuda/reduce.hpp(160): error: function template "graphblas::backend::reduceInner(T *, BinaryOpT, MonoidT, const graphblas::backend::SparseMatrix<a> *, graphbla
           s::backend::Descriptor *)" has already been defined
     40    
     41    1 error detected in the compilation of "/cache//tmpxft_000074a3_00000000-14_gpr.compute_60.cpp1.ii".
  >> 42    make: *** [gpr] Error 1
     43    make: *** Waiting for unfinished jobs....
     44    1 error detected in the compilation of "/cache//tmpxft_000074a6_00000000-14_gtc.compute_60.cpp1.ii".
  >> 45    make: *** [gtc] Error 1
     46    1 error detected in the compilation of "/cache//tmpxft_0000749f_00000000-14_ggc.compute_60.cpp1.ii".
  >> 47    make: *** [ggc] Error 1
     48    1 error detected in the compilation of "/cache//tmpxft_00007493_00000000-14_gbfs.compute_60.cpp1.ii".
  >> 49    make: *** [gbfs] Error 1
     50    1 error detected in the compilation of "/cache//tmpxft_00007495_00000000-14_gsssp.compute_60.cpp1.ii".
  >> 51    make: *** [gsssp] Error 1
     52    1 error detected in the compilation of "/cache//tmpxft_00007497_00000000-14_glgc.compute_60.cpp1.ii".
     53    1 error detected in the compilation of "/cache//tmpxft_00007494_00000000-14_gdiameter.compute_60.cpp1.ii".
  >> 54    make: *** [glgc] Error 1
  >> 55    make: *** [gdiameter] Error 1
     56    1 error detected in the compilation of "/cache//tmpxft_0000749e_00000000-14_gmis.compute_60.cpp1.ii".
     57    1 error detected in the compilation of "/cache//tmpxft_000074a2_00000000-14_ggc_cusparse.compute_60.cpp1.ii".
  >> 58    make: *** [gmis] Error 1
  >> 59    make: *** [ggc_cusparse] Error 1

I am attempting to build it using the Spack HPC package manager, to which the graphblast package will be added tomorrow or so (details can be found at spack/spack#17289).

Which versions of Boost, gcc, and CUDA do you use to build the latest version of the package? Thanks.

Feature: Runtime Kernel Fusion

If you have a DAG of binary operations, you can traverse it in some topological order and generate proper bitcode for your GPU kernel. OmniSci does it on parsed SQL queries, more specifically on different filter combinations, etc. TensorFlow does it based on the equation that needs to be minimized for gradient descent. Can we do it for GraphBLAS?

If you want to keep your kernel code as simple as possible, with minimal branches, there's no way to do it at compile time: as with SQL queries and TensorFlow optimizations, you don't know the exact details of the queries/equations until runtime.

Both TensorFlow and OmniSci do real-time code generation using LLVM.
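As a toy illustration of the traversal idea, here is a CPU-side sketch that walks a small expression DAG in topological order and composes one fused per-element function, standing in for the bitcode a real LLVM-based backend would emit (all names are illustrative, not a proposed GraphBLAST API):

#include <cstdio>
#include <functional>
#include <vector>

// A node in a tiny DAG of elementwise binary operations.
struct Node {
  char op;            // '+', '*', or 0 for a leaf (input vector)
  int lhs, rhs;       // operand node indices (unused for leaves)
  const float* data;  // leaf input (null for interior nodes)
};

// Recursively fuse the subtree rooted at `root` into one function,
// so the whole expression evaluates in a single pass over the data.
std::function<float(int)> fuse(const std::vector<Node>& dag, int root) {
  const Node& n = dag[root];
  if (n.op == 0) {
    const float* d = n.data;
    return [d](int i) { return d[i]; };
  }
  auto l = fuse(dag, n.lhs);
  auto r = fuse(dag, n.rhs);
  if (n.op == '+') return [l, r](int i) { return l(i) + r(i); };
  return [l, r](int i) { return l(i) * r(i); };
}

int main() {
  float a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8};
  // DAG for (a + b) * a, stored in topological order.
  std::vector<Node> dag = {
      {0, -1, -1, a}, {0, -1, -1, b}, {'+', 0, 1, nullptr}, {'*', 2, 0, nullptr}};
  auto fused = fuse(dag, 3);
  for (int i = 0; i < 4; ++i) std::printf("%g ", fused(i));  // 6 16 30 48
  std::printf("\n");
}

A real implementation would JIT this composition into a single GPU kernel rather than chaining std::function calls, but the topological traversal is the same.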

Algorithm names are too short

I found it hard to decipher algorithm names such as lgc vs. gc. One would think the former is a special case of the latter, but in fact they are completely different, because the "c" refers to "clustering" in lgc but to "coloring" in gc. Perhaps longer names with more comments would help?

Code Cleanup

Some TODOs:

  • make include order follow the Google C++ style guide
  • make code follow clang-tidy
  • get rid of #define private public, either by making all classes structs or by adding accessors (see the sketch after this list)
  • move unit tests into each folder, with names like spgemm_test.cu
  • add Makefile support for unit tests
  • get rid of env vars by making a global static context object
  • make CSR-only vs. CSR+CSC a per-matrix choice rather than a global one
  • change std::cout's to GLOG or some other logging library
  • get rid of the Boost dependencies on program_options and test, using a command-line flag library such as gflags together with googletest
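On the #define private public point, a minimal sketch of the accessor alternative (illustrative names, not the current GraphBLAST classes):

#include <utility>
#include <vector>

// Instead of redefining `private`, expose a read-only accessor for
// the internal state that unit tests need to inspect.
class Matrix {
 public:
  explicit Matrix(std::vector<float> vals) : vals_(std::move(vals)) {}
  const std::vector<float>& values() const { return vals_; }  // test hook
 private:
  std::vector<float> vals_;
};

int main() {
  Matrix m({1.0f, 2.0f, 3.0f});
  return m.values().size() == 3 ? 0 : 1;  // tests read via the accessor
}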

GraphBLAS Compilation with the CUB error

Hi,

Thank you for your contributions to GraphBLAST.
I am a beginner with GraphBLAS. I tried to compile GraphBLAST on a server running Ubuntu 20.04 with an NVIDIA RTX 3080 Ti GPU and CUDA 11.5. When I execute make -j16, I get the error information listed below.

(graphblast) bizhao.shi@dasys21-lc:~/research/compiler/graphblast/build$ make                                                                                       
Scanning dependencies of target graphblas                                                                                                                           
[  1%] Linking CXX static library libgraphblas.a                                                                                                                    
[  1%] Built target graphblas                                                                                                                                       
[  3%] Building NVCC (Device) object CMakeFiles/gspgemm.dir/ext/moderngpu/src/gspgemm_generated_mgpucontext.cu.o                                                    
nvcc warning : The 'compute_35', 'compute_37', 'sm_35', and 'sm_37' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-ta
rgets to suppress warning).                                                                                                                                         
nvcc warning : The 'compute_35', 'compute_37', 'sm_35', and 'sm_37' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-ta
rgets to suppress warning).                                                                                                                                         
ptxas info    : 0 bytes gmem                                                                                                                                        
ptxas info    : Compiling entry function '_ZN4mgpu17KernelVersionShimEv' for 'sm_35'                                                                                
ptxas info    : Function properties for _ZN4mgpu17KernelVersionShimEv                                                                                               
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads                                                                                                  
ptxas info    : Used 2 registers, 320 bytes cmem[0]                                                                                                                 
/rshome/bizhao.shi/research/compiler/graphblast/ext/moderngpu/src/mgpucontext.cu:126:6: warning: ‘template<class> class std::auto_ptr’ is deprecated [-Wdeprecated-d
eclarations]                                                                                                                                                        
  126 | std::auto_ptr<DeviceGroup> deviceGroup;                                                                                                                     
      |      ^~~~~~~~                                                                                                                                               
/usr/include/c++/9/bits/unique_ptr.h:53:25: note: declared here                                                                                                     
   53 |   template<typename> class auto_ptr;                                                                                                                        
      |                         ^~~~~~~~                                                                                                                            
/rshome/bizhao.shi/research/compiler/graphblast/ext/moderngpu/src/mgpucontext.cu:216:6: warning: ‘template<class> class std::auto_ptr’ is deprecated [-Wdeprecated-d
eclarations]                                                                                                                                                        
  216 | std::auto_ptr<ContextGroup> contextGroup;                                                                                                                   
      |      ^~~~~~~~                                                                                                                                               
/usr/include/c++/9/bits/unique_ptr.h:53:25: note: declared here                                                                                                     
   53 |   template<typename> class auto_ptr;                                                                                                                        
      |                         ^~~~~~~~
[  4%] Building NVCC (Device) object CMakeFiles/gspgemm.dir/test/gspgemm_generated_gspgemm.cu.o
nvcc warning : The 'compute_35', 'compute_37', 'sm_35', and 'sm_37' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
In file included from /usr/local/cuda/include/thrust/detail/config/config.h:27,
                 from /usr/local/cuda/include/thrust/detail/config.h:23,
                 from /usr/local/cuda/include/thrust/iterator/iterator_facade.h:35,
                 from /rshome/bizhao.shi/research/compiler/graphblast/ext/cub/cub/device/../iterator/arg_index_input_iterator.cuh:48,
                 from /rshome/bizhao.shi/research/compiler/graphblast/ext/cub/cub/device/device_reduce.cuh:41,
                 from /rshome/bizhao.shi/research/compiler/graphblast/ext/cub/cub/cub.cuh:53,
                 from /rshome/bizhao.shi/research/compiler/graphblast/./graphblas/backend/cuda/spmspv_inner.hpp:8,
                 from /rshome/bizhao.shi/research/compiler/graphblast/./graphblas/backend/cuda/cuda.hpp:13,
                 from /rshome/bizhao.shi/research/compiler/graphblast/./graphblas/graphblas.hpp:16,
                 from /rshome/bizhao.shi/research/compiler/graphblast/test/gspgemm.cu:12:
/usr/local/cuda/include/thrust/detail/config/cpp_dialect.h:131:13: warning: Thrust requires at least C++14. C++11 is deprecated but still supported. C++11 support will be removed in a future release. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.
  131 |      THRUST_COMPILER_DEPRECATION_SOFT(C++14, C++11);
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                                                                                                           
In file included from /usr/local/cuda/include/thrust/system/cuda/config.h:33,
                 from /usr/local/cuda/include/thrust/system/cuda/detail/execution_policy.h:35,
                 from /usr/local/cuda/include/thrust/iterator/detail/device_system_tag.h:23,
                 from /usr/local/cuda/include/thrust/iterator/detail/iterator_facade_category.h:22,
                 from /usr/local/cuda/include/thrust/iterator/iterator_facade.h:37,
                 from /rshome/bizhao.shi/research/compiler/graphblast/ext/cub/cub/device/../iterator/arg_index_input_iterator.cuh:48,
                 from /rshome/bizhao.shi/research/compiler/graphblast/ext/cub/cub/device/device_reduce.cuh:41,
                 from /rshome/bizhao.shi/research/compiler/graphblast/ext/cub/cub/cub.cuh:53,
                 from /rshome/bizhao.shi/research/compiler/graphblast/./graphblas/backend/cuda/spmspv_inner.hpp:8,
                 from /rshome/bizhao.shi/research/compiler/graphblast/./graphblas/backend/cuda/cuda.hpp:13,
                 from /rshome/bizhao.shi/research/compiler/graphblast/./graphblas/graphblas.hpp:16,
                 from /rshome/bizhao.shi/research/compiler/graphblast/test/gspgemm.cu:12:
/usr/local/cuda/include/cub/util_namespace.cuh:46:2: error: #error CUB requires a definition of CUB_NS_QUALIFIER when CUB_NS_PREFIX/POSTFIX are defined.
   46 | #error CUB requires a definition of CUB_NS_QUALIFIER when CUB_NS_PREFIX/POSTFIX are defined.
      |  ^~~~~
CMake Error at gspgemm_generated_gspgemm.cu.o.cmake:220 (message):
  Error generating
  /rshome/bizhao.shi/research/compiler/graphblast/build/CMakeFiles/gspgemm.dir/test/./gspgemm_generated_gspgemm.cu.o


make[2]: *** [CMakeFiles/gspgemm.dir/build.make:65: CMakeFiles/gspgemm.dir/test/gspgemm_generated_gspgemm.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:101: CMakeFiles/gspgemm.dir/all] Error 2
make: *** [Makefile:84: all] Error 2

I would like to ask for a solution to this CUB error, thanks!

Error: eWiseAdd sparse-sparse not implemented yet!

I tested all the algorithms from /example on the graphs in /data/small, and some of them (ggc, gmis, gpr) give the error message: Error: eWiseAdd sparse-sparse not implemented yet!. Could you tell me whether you're planning to implement this operation? If yes, when?

Also, I've got this error while running gmis:
Error: Feature not implemented yet!

To reproduce:
make -j8
bin/gpr --niter 0 --mxvmode 0 --directed 2 data/small/test_pr.mtx

System configuration:

GPU: NVIDIA GeForce GT 1030
NVIDIA-SMI 440.33.01
OS: Ubuntu 16.04 (run in a Docker container)
CUDA version: 9.2
g++ version: 4.9

Cannot reproduce PR number on socLiveJournal data

Hey,

I am trying to reproduce your PageRank numbers on soc-LiveJournal1. I installed CUDA 9.1 and gcc 5.4.0, and downloaded the data from here. I removed the leading comment lines and added the required ones to convert the data to an .mtx file:

%%MatrixMarket matrix coordinate pattern general
4847571 4847571 68993773

I compiled the PageRank example (using the Makefile, with make gpr). Here is the output after running it:

 -> % ./run_pr.sh
bin/gpr --timing 1 --mxvmode 0 --niter 100 --max_niter 1000 data/topc-datasets/soc-LiveJournal1.mtx
Undirected due to mtx: 0
Undirected due to cmd: 0
Undirected: 0
Remove self-loop: 1
Reading data/topc-datasets/.soc-LiveJournal1.mtx.d.nosl.bin
Allocate 4847572
Allocate 82170360
Allocate 82170360
Do not allocate 4847571 0x7f92c5367010
Do not allocate 68475300 0x7f92b19f2010
Do not allocate 68475300 0x7f929e07d010
Do not allocate 4847571 0x7f92c5367010
Do not allocate 68475300 0x7f92b19f2010
Do not allocate 68475300 0x7f929e07d010
output:
[0]:4.39931e-06 [1]:2.28478e-06 [2]:1.69434e-06 [3]:1.92988e-06 [4]:1.22651e-06 [5]:2.72542e-06 [6]:1.1593e-06 [7]:6.15954e-07 [8]:1.37225e-06 [9]:7.86397e-07 [10]:1.81577e-06 [11]:1.33162e-06 [12]:9.23166e-06 [13]:1.69406e-06 [14]:3.96623e-07 [15]:1.44772e-06 [16]:1.2477e-06 [17]:3.52301e-07 [18]:9.64398e-06 [19]:1.0859e-06 [20]:1.2196e-06 [21]:3.94725e-07 [22]:5.35532e-07 [23]:5.91095e-07 [24]:2.39766e-07 [25]:1.27003e-06 [26]:4.75216e-07 [27]:1.35969e-06 [28]:4.24832e-07 [29]:2.08527e-07 [30]:1.8405e-06 [31]:1.94205e-06 [32]:5.78724e-07 [33]:7.93741e-07 [34]:7.44357e-08 [35]:6.85211e-07 [36]:2.066e-07 [37]:8.49346e-07 [38]:1.00569e-06 [39]:6.59399e-07
CPU PR finished in 935.825134 msec. Search depth is: 3. Resultant: 0.000000

CORRECT
cpu, 957.859,
warmup, 294.753, 0
tight, 264.629
vxm, 290.707

CORRECT 

If I interpret the numbers correctly, one GPU iteration takes 290 ms, whereas in the paper you mention that an iteration takes 21 ms. Also, my GPU is a Tesla P100, which I believe is better than the one used in your experiments. What am I doing wrong? I would appreciate any help!

Thank you for your time in advance.

Requests for Apps

  • Graph coloring (Requester: Aydin)
  • Maximal independent set (no requester)
  • Minimum spanning tree (Requester: Scientist at NVIDIA, for computer vision)
  • Strongly connected components (no requester)
  • Weakly connected components (no requester)
  • Triangle counting (no requester)
  • Topological sort using direction-optimization and the (logical_and, logical_and) semiring (no requester)

Graphblast with sms >= 70

Hello,

Thank you for hosting Graphblast on a public repo to help the research community.

I was wondering whether there is any plan to get GraphBLAST working on the latest SMs. I am finding the mgpu version leveraged by GraphBLAST a bit challenging to get working on the latest SMs. I tried to put some patches into the mgpu version currently used by GraphBLAST, in particular for the synchronization primitives (mostly the shuffles and ballots suggested by @neoblizz in the mgpu repo), and I am encountering hangs for algorithms such as BFS on medium-size matrices.

I would really appreciate any insight. Thanks in advance!

Double cudaFree when using raw GPU pointer variant Matrix::build

Running gspgemm unit test results in:
Cuda error in file '/home/ctcyang/graphblast-ben/./graphblas/backend/cuda/sparse_matrix.hpp' in line 167 : invalid device pointer.

The way to fix this is to either:

  1. add a member variable per Matrix called owner_, which is set to true only when Matrix::build is called using a non-raw GPU pointer variant; or
  2. add a wrapper struct around the GPU memory and use std::shared_ptr to keep track of it, clearing the memory when the reference count drops to zero (see the sketch after this list). The downside of this approach is that it makes interacting with third-party libraries more difficult.
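A minimal sketch of option 2, wrapping the device allocation in a std::shared_ptr whose deleter calls cudaFree (names here are illustrative, not the actual GraphBLAST API):

#include <cuda_runtime.h>
#include <memory>

// The buffer is released exactly once, by whichever handle dies last.
using DeviceBuffer = std::shared_ptr<float>;

DeviceBuffer make_device_buffer(size_t n) {
  float* raw = nullptr;
  cudaMalloc(&raw, n * sizeof(float));
  return DeviceBuffer(raw, [](float* p) { cudaFree(p); });
}

// The raw-pointer build variant would hand back a non-owning handle
// with a no-op deleter (the owner_ == false case of option 1):
DeviceBuffer wrap_external(float* raw) {
  return DeviceBuffer(raw, [](float*) { /* caller frees */ });
}

int main() {
  DeviceBuffer a = make_device_buffer(1024);
  DeviceBuffer b = a;  // shared ownership; refcount is now 2
  return 0;            // cudaFree runs once, when both handles die
}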

Installation error

Hi There,

I tried to install the code on my Ubuntu 16.04 node, with gcc 4.8.5 and CUDA 9.0.

But I got the following errors. I hope you can let me know how to solve this problem.
mkdir -p bin
nvcc -g -gencode arch=compute_60,code=compute_60 -O3 -use_fast_math -w -std=c++11 -o bin/gbfs example/gbfs.cu -Iext/moderngpu/include/ -Iext/cub/cub/ -Iext/boost_1_58_/ -I./ ext/moderngpu/src/mgpucontext.cu ext/moderngpu/src/mgpuutil.cpp -Lext/boost_1_58_0/stage/lib/ -lboost_program_options -lcublas -lcusparse -lcurand
/tmp/tmpxft_0000012c_00000000-13_gbfs.o: In function boost::program_options::variables_map::operator[](std::string const&) const':
/usr/include/boost/program_options/variables_map.hpp:155: undefined reference to boost::program_options::abstract_variables_map::operator[](std::string const&) const' /usr/include/boost/program_options/variables_map.hpp:155: undefined reference to boost::program_options::abstract_variables_map::operator[](std::string const&) const'
/usr/include/boost/program_options/variables_map.hpp:155: undefined reference to boost::program_options::abstract_variables_map::operator[](std::string const&) const' /usr/include/boost/program_options/variables_map.hpp:155: undefined reference to boost::program_options::abstract_variables_map::operator[](std::string const&) const'
/usr/include/boost/program_options/variables_map.hpp:155: undefined reference to boost::program_options::abstract_variables_map::operator[](std::string const&) const' /tmp/tmpxft_0000012c_00000000-13_gbfs.o:/usr/include/boost/program_options/variables_map.hpp:155: more undefined references to boost::program_options::abstract_variables_map::operator[](std::string const&) const' follow
/tmp/tmpxft_0000012c_00000000-13_gbfs.o: In function parseArgs(int, char**, boost::program_options::variables_map*)': /vpublic01/frog/zhengzhigao/addtest/graphblast/./graphblas/util.hpp:41: undefined reference to boost::program_options::options_description::options_description(std::string const&, unsigned int, unsigned int)'
/tmp/tmpxft_0000012c_00000000-13_gbfs.o: In function boost::program_options::typed_value<std::string, char>::xparse(boost::any&, std::vector<std::string, std::allocator<std::string> > const&) const': /usr/include/boost/program_options/detail/value_semantic.hpp:167: undefined reference to boost::program_options::validate(boost::any&, std::vector<std::string, std::allocatorstd::string > const&, std::string*, int)'
/tmp/tmpxft_0000012c_00000000-13_gbfs.o: In function boost::program_options::typed_value<bool, char>::xparse(boost::any&, std::vector<std::string, std::allocator<std::string> > const&) const': /usr/include/boost/program_options/detail/value_semantic.hpp:167: undefined reference to boost::program_options::validate(boost::any&, std::vector<std::string, std::allocatorstd::string > const&, bool*, int)'
/tmp/tmpxft_0000012c_00000000-13_gbfs.o: In function boost::program_options::validation_error::validation_error(boost::program_options::validation_error::kind_t, std::string const&, std::string const&, int)': /usr/include/boost/program_options/errors.hpp:372: undefined reference to boost::program_options::validation_error::get_template(boost::program_options::validation_error::kind_t)'
/usr/include/boost/program_options/errors.hpp:372: undefined reference to boost::program_options::error_with_option_name::error_with_option_name(std::string const&, std::string const&, std::string const&, int)' /tmp/tmpxft_0000012c_00000000-13_gbfs.o: In function boost::program_options::variables_map::operator[](std::string const&) const':
/usr/include/boost/program_options/variables_map.hpp:155: undefined reference to boost::program_options::abstract_variables_map::operator[](std::string const&) const' /usr/include/boost/program_options/variables_map.hpp:155: undefined reference to boost::program_options::abstract_variables_map::operator[](std::string const&) const'
/usr/include/boost/program_options/variables_map.hpp:155: undefined reference to boost::program_options::abstract_variables_map::operator[](std::string const&) const' /usr/include/boost/program_options/variables_map.hpp:155: undefined reference to boost::program_options::abstract_variables_map::operator[](std::string const&) const'
/usr/include/boost/program_options/variables_map.hpp:155: undefined reference to boost::program_options::abstract_variables_map::operator[](std::string const&) const' /tmp/tmpxft_0000012c_00000000-13_gbfs.o:/usr/include/boost/program_options/variables_map.hpp:155: more undefined references to boost::program_options::abstract_variables_map::operator[](std::string const&) const' follow
/tmp/tmpxft_0000012c_00000000-13_gbfs.o: In function boost::program_options::typed_value<float, char>::name() const': /usr/include/boost/program_options/detail/value_semantic.hpp:19: undefined reference to boost::program_options::arg'
/tmp/tmpxft_0000012c_00000000-13_gbfs.o: In function boost::program_options::typed_value<bool, char>::name() const': /usr/include/boost/program_options/detail/value_semantic.hpp:19: undefined reference to boost::program_options::arg'
/tmp/tmpxft_0000012c_00000000-13_gbfs.o: In function boost::program_options::typed_value<std::string, char>::name() const': /usr/include/boost/program_options/detail/value_semantic.hpp:19: undefined reference to boost::program_options::arg'
/tmp/tmpxft_0000012c_00000000-13_gbfs.o: In function boost::program_options::typed_value<int, char>::name() const': /usr/include/boost/program_options/detail/value_semantic.hpp:19: undefined reference to boost::program_options::arg'
/tmp/tmpxft_0000012c_00000000-13_gbfs.o: In function to_internal<std::basic_string<char> >': /usr/include/boost/program_options/detail/convert.hpp:79: undefined reference to boost::program_options::to_internal(std::string const&)'
/tmp/tmpxft_0000012c_00000000-13_gbfs.o: In function basic_command_line_parser': /usr/include/boost/program_options/detail/parsers.hpp:39: undefined reference to boost::program_options::detail::cmdline::cmdline(std::vector<std::string, std::allocatorstd::string > const&)'
/tmp/tmpxft_0000012c_00000000-13_gbfs.o: In function boost::program_options::basic_command_line_parser<char>::extra_parser(boost::function1<std::pair<std::string, std::string>, std::string const&>)': /usr/include/boost/program_options/detail/parsers.hpp:77: undefined reference to boost::program_options::detail::cmdline::set_additional_parser(boost::function1<std::pair<std::string, std::string>, std::string const&>)'
/tmp/tmpxft_0000012c_00000000-13_gbfs.o: In function to_internal<std::basic_string<char> >': /usr/include/boost/program_options/detail/convert.hpp:79: undefined reference to boost::program_options::to_internal(std::string const&)'
/tmp/tmpxft_0000012c_00000000-13_gbfs.o: In function void boost::program_options::validate<int, char>(boost::any&, std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, int*, long)': /usr/include/boost/program_options/detail/value_semantic.hpp:92: undefined reference to boost::program_options::invalid_option_value::invalid_option_value(std::string const&)'
/tmp/tmpxft_0000012c_00000000-13_gbfs.o: In function void boost::program_options::validate<float, char>(boost::any&, std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, float*, long)': /usr/include/boost/program_options/detail/value_semantic.hpp:92: undefined reference to boost::program_options::invalid_option_value::invalid_option_value(std::string const&)'
/tmp/tmpxft_0000012c_00000000-13_gbfs.o:(.rodata._ZTVN5boost16exception_detail19error_info_injectorINS_15program_options20invalid_option_valueEEE[_ZTVN5boost16exception_detail19error_info_injectorINS_15program_options20invalid_option_valueEEE]+0x30): undefined reference to boost::program_options::error_with_option_name::substitute_placeholders(std::string const&) const' /tmp/tmpxft_0000012c_00000000-13_gbfs.o:(.rodata._ZTVN5boost16exception_detail10clone_implINS0_19error_info_injectorINS_15program_options20invalid_option_valueEEEEE[_ZTVN5boost16exception_detail10clone_implINS0_19error_info_injectorINS_15program_options20invalid_option_valueEEEEE]+0x38): undefined reference to boost::program_options::error_with_option_name::substitute_placeholders(std::string const&) const'
/tmp/tmpxft_0000012c_00000000-13_gbfs.o:(.rodata._ZTVN5boost16exception_detail19error_info_injectorINS_15program_options16validation_errorEEE[_ZTVN5boost16exception_detail19error_info_injectorINS_15program_options16validation_errorEEE]+0x30): undefined reference to boost::program_options::error_with_option_name::substitute_placeholders(std::string const&) const' /tmp/tmpxft_0000012c_00000000-13_gbfs.o:(.rodata._ZTVN5boost16exception_detail10clone_implINS0_19error_info_injectorINS_15program_options16validation_errorEEEEE[_ZTVN5boost16exception_detail10clone_implINS0_19error_info_injectorINS_15program_options16validation_errorEEEEE]+0x38): undefined reference to boost::program_options::error_with_option_name::substitute_placeholders(std::string const&) const'
/tmp/tmpxft_0000012c_00000000-13_gbfs.o:(.rodata._ZTVN5boost15program_options16validation_errorE[_ZTVN5boost15program_options16validation_errorE]+0x30): undefined reference to boost::program_options::error_with_option_name::substitute_placeholders(std::string const&) const' /tmp/tmpxft_0000012c_00000000-13_gbfs.o:(.rodata._ZTVN5boost15program_options20invalid_option_valueE[_ZTVN5boost15program_options20invalid_option_valueE]+0x30): more undefined references to boost::program_options::error_with_option_name::substitute_placeholders(std::string const&) const' follow
/tmp/tmpxft_0000012c_00000000-13_gbfs.o:(.rodata._ZTVN5boost15program_options11typed_valueIicEE[_ZTVN5boost15program_options11typed_valueIicEE]+0x38): undefined reference to boost::program_options::value_semantic_codecvt_helper<char>::parse(boost::any&, std::vector<std::string, std::allocator<std::string> > const&, bool) const' /tmp/tmpxft_0000012c_00000000-13_gbfs.o:(.rodata._ZTVN5boost15program_options11typed_valueISscEE[_ZTVN5boost15program_options11typed_valueISscEE]+0x38): undefined reference to boost::program_options::value_semantic_codecvt_helper::parse(boost::any&, std::vector<std::string, std::allocatorstd::string > const&, bool) const'
/tmp/tmpxft_0000012c_00000000-13_gbfs.o:(.rodata._ZTVN5boost15program_options11typed_valueIbcEE[_ZTVN5boost15program_options11typed_valueIbcEE]+0x38): undefined reference to boost::program_options::value_semantic_codecvt_helper<char>::parse(boost::any&, std::vector<std::string, std::allocator<std::string> > const&, bool) const' /tmp/tmpxft_0000012c_00000000-13_gbfs.o:(.rodata._ZTVN5boost15program_options11typed_valueIfcEE[_ZTVN5boost15program_options11typed_valueIfcEE]+0x38): undefined reference to boost::program_options::value_semantic_codecvt_helper::parse(boost::any&, std::vector<std::string, std::allocatorstd::string > const&, bool) const'
collect2: error: ld returned 1 exit status
Makefile:17: recipe for target 'gbfs' failed
make: *** [gbfs] Error 1

Runtime error 12

Hi,

I tried compiling this against CUDA 11.0 on Ubuntu 18.04 with the NVIDIA HPC SDK version 2021_212, and running

../bin/gbfs ../data/small/chesapeake.mtx

prints a plethora of error messages:

Runtime error: reduceInner(val, accum, op, &u->sparse_, desc) returned 12 at /home/mgara/software/graphblast/./graphblas/backend/cuda/operations.hpp:1012

none of which are very informative. At this point I assume this project is highly experimental and unstable?
