Comments (13)
Hi @mahmoodn, just to clarify, you would like the ability to select the grid and block sizes for the various kernels? Or do you mean you would like to know which grid/block were selected for which kernel?
from gunrock.
Basically, I was trying to change the grid size via command line to see the effect, but I found some hard coded parts as you mentioned before.
I assume the default grid size is automatically calculated by the device specification. So that could be a reference number for performance.
from gunrock.
I assume the default grid size is automatically calculated by the device specification. So that could be a reference number for performance.
And you understand the correctness depends on the block and grid sizes? If so, you can pick an application:
- https://github.com/gunrock/gunrock/blob/main/include/gunrock/algorithms/bfs.hxx#L125
- Then look at what operators are being used, in BFS case it is advance.
- Then you look at what load-balance algorithm is being used, in this case it is:
operators::load_balance_t::block_mapped
- Then you go to block-mapped and can update grid block sizes: https://github.com/gunrock/gunrock/blob/main/include/gunrock/framework/operators/advance/block_mapped.hxx#L190
- Grid size: https://github.com/gunrock/gunrock/blob/main/include/gunrock/framework/operators/advance/block_mapped.hxx#L194
from gunrock.
If I understand correctly, you are saying that modifying the grid size may affect the correctness of the BFS. is that right? Do you mean that for example, changing 512 to 1024 may cause the program not to visit all vertices? How that is possible? In this case, how the grid size is really determined?
According to this section
std::size_t num_elements = (input_type == advance_io_type_t::graph)
? G.get_number_of_vertices()
: input.get_number_of_elements();
// Set-up and launch block-mapped advance.
using namespace gcuda::launch_box;
using launch_t =
launch_box_t<launch_params_dynamic_grid_t<fallback, dim3_t<256>>>;
launch_t launch_box;
launch_box.calculate_grid_dimensions_strided(num_elements);
the num_elements
is either G.get_number_of_vertices()
or input.get_number_of_elements()
. In case of a graph with millions of vertices, e.g. soc-twitter2010, that should be CTA size of millions. Is that right? But it doesn't...
from gunrock.
Oh, my bad, changing the block size from 256 in the example above to a larger number should not impact the correctness. However, the grid size is being calculated based on the block size and number of elements to process. So, you don't really change that.
the num_elements is either G.get_number_of_vertices() or input.get_number_of_elements(). In case of a graph with millions of vertices, e.g. soc-twitter2010, that should be CTA size of millions. Is that right? But it doesn't...
You have it opposite. Grid size is very large, while block size is small.
Just to clarify, grid size is number of blocks being launched and block size is number of threads per block.
from gunrock.
OK let me rephrase my question again and see if we on the same track.
What I am going to do is to change the grid size (the number of blocks that are put on the GPU SMs). I want to do that because I am working with GPU simulator. So, I can change the number of SMs. Assume an architecture has 64 SMs, and the number of blocks (grid size) is 128. So, each SM logically receives 2 blocks. Now, if I change the number of SMs to 128, then each SM receives 1 block. No matter how many threads are in the block (the default is 256 as you mentioned), I want to change the number of blocks with the grid size parameter.
I was hoping that the command line option as I mentioned in the previous topic is the solution, but you said that parameter is kind of obsolete. No problem then... If I have to change some hardcoded parameters, I was wondering what are they for the BFS application?
From the previous posts, I see
std::size_t num_elements = (input_type == advance_io_type_t::graph)
? G.get_number_of_vertices()
: input.get_number_of_elements();
// Set-up and launch block-mapped advance.
using namespace gcuda::launch_box;
using launch_t =
launch_box_t<launch_params_dynamic_grid_t<fallback, dim3_t<256>>>;
launch_t launch_box;
launch_box.calculate_grid_dimensions_strided(num_elements);
I assume that the num_elements
is the thing that I want to change. Do you agree with that? No matter how this parameter gets the value, is it safe to use
launch_box.calculate_grid_dimensions_strided(64);
or
launch_box.calculate_grid_dimensions_strided(512);
for any given graph size?
from gunrock.
I see! I now understand your problem well. If you don't care about the size of the graph and you can make up a synthetic graph, you can change the number of elements. Num elements correspond to the number of vertices in a frontier for BFS.
This is how grid dimension is calculated:
void calculate_grid_dimensions_strided(std::size_t num_elements) {
grid_dimensions = dimensions_t(
(num_elements + block_dimensions.x - 1) / block_dimensions.x, 1, 1);
}
Specifically, it is doing a ceil division of num_elements
and block_dimensions.x
, where;
num_elements
is number of vertices to process inadvance
operator (vertices in frontier).block_dimensions.x
is the size of the block; number of threads within a block: e.g. 256.
So, if you want to get to a specific grid size number, you can either control it by changing number of elements (which is changing every iteration for BFS as vertices get visited), or the block size (which is static).
from gunrock.
I would also note that the kernel can be written slightly differently as well to accommodate ANY grid size, even 1 without impacting the correctness, see this example: https://developer.nvidia.com/blog/cuda-pro-tip-write-flexible-kernels-grid-stride-loops/
from gunrock.
OK if I understand it correctly, the grid size is a function of number of vertices in the frontier and that is num_elements
. Then that means, it is not generally possible to set a constant large size for any graph. Also, there is no guarantee that a big graph uses large grid size because it depends on the a fraction of vertices in the frontier.
Regarding the synthetic graph, do you have any idea about the graph shape for using large grid sizes? For example, does it make sense that for a large fully connected graph with 10M vertices, to have grid sizes of 16K or 32K?
from gunrock.
One more question. Is it possible to print the number of elements (grid size) at every BFS iteration? Is this the write place?
std::size_t num_elements = (input_type == advance_io_type_t::graph)
? G.get_number_of_vertices()
: input.get_number_of_elements();
// print num_elements
from gunrock.
Regarding the synthetic graph, do you have any idea about the graph shape for using large grid sizes? For example, does it make sense that for a large fully connected graph with 10M vertices, to have grid sizes of 16K or 32K?
Yeah, if a graph queues in all 10M vertices of the frontier (which can and does happen for large graphs), the grid size would be ceil(10M/256) = 39063
.
from gunrock.
Just before this line: https://github.com/gunrock/gunrock/blob/main/include/gunrock/framework/operators/advance/block_mapped.hxx#L209 you can print both num_elements and grid size:
std::cout << "Num Elements: " << num_elements << "Grid dimensions: " << launch_box.params_t::grid_dimensions.x << std::endl;
I believe that should work. If not, just do a ceil div and calculate the grid size yourself: num_elements + 256- 1) / 256
from gunrock.
OK this new version has changed a lot since the previous version. I will close this topic for the moment.
from gunrock.
Related Issues (20)
- Gunrock does not build on Cuda 11.7 HOT 11
- Counting number of triangles HOT 1
- Gunrock/SM : Cuda out of memory / Read Access Violation HOT 10
- Try to update graph then run sssp on new graph HOT 7
- Followup on issues relating to benchmarking GunrockSM HOT 1
- Possible memory leakage issue in using gunrock::sssp::run HOT 5
- Looking for example on how to conduct graphcut on using push relabel? HOT 1
- Query on Gunrock Build Settings: RDC HOT 2
- Essentials compiling for arch 7.5 even when setting -DCMAKE_CUDA_ARCHITECTURES=70
- SpGEMM - `c_nz_idx` in `gustavsons` kernel goes out of bounds
- Request for CPU and GPU code review and optimization for DAWN algorithm implementation HOT 9
- Guidance on installing as library HOT 5
- Question about grid size HOT 7
- How gunrock support edge advance? HOT 2
- Internal Compiler Error (codegen) when trying to build gunrock HOT 4
- Cmake CUDA option HOT 1
- Building failed when fetching RAPIDS HOT 2
- Compile issue HOT 1
- Cannot compile benchmarks HOT 4
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from gunrock.