Hi Regarding the latest version, <a href="https://github.com/gunrock/gunrock/relea

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

Question about 2.0.0 changes about gunrock HOT 13 CLOSED

mahmoodn commented on June 11, 2024

Question about 2.0.0 changes

from gunrock.

Comments (13)

neoblizz commented on June 11, 2024

Hi @mahmoodn, just to clarify, you would like the ability to select the grid and block sizes for the various kernels? Or do you mean you would like to know which grid/block were selected for which kernel?

from gunrock.

mahmoodn commented on June 11, 2024

Basically, I was trying to change the grid size via command line to see the effect, but I found some hard coded parts as you mentioned before.
I assume the default grid size is automatically calculated by the device specification. So that could be a reference number for performance.

from gunrock.

neoblizz commented on June 11, 2024

I assume the default grid size is automatically calculated by the device specification. So that could be a reference number for performance.

And you understand the correctness depends on the block and grid sizes? If so, you can pick an application:

https://github.com/gunrock/gunrock/blob/main/include/gunrock/algorithms/bfs.hxx#L125
Then look at what operators are being used, in BFS case it is advance.
Then you look at what load-balance algorithm is being used, in this case it is: operators::load_balance_t::block_mapped
Then you go to block-mapped and can update grid block sizes: https://github.com/gunrock/gunrock/blob/main/include/gunrock/framework/operators/advance/block_mapped.hxx#L190
Grid size: https://github.com/gunrock/gunrock/blob/main/include/gunrock/framework/operators/advance/block_mapped.hxx#L194

from gunrock.

mahmoodn commented on June 11, 2024

If I understand correctly, you are saying that modifying the grid size may affect the correctness of the BFS. is that right? Do you mean that for example, changing 512 to 1024 may cause the program not to visit all vertices? How that is possible? In this case, how the grid size is really determined?

According to this section

  std::size_t num_elements = (input_type == advance_io_type_t::graph)
                                 ? G.get_number_of_vertices()
                                 : input.get_number_of_elements();

  // Set-up and launch block-mapped advance.
  using namespace gcuda::launch_box;
  using launch_t =
      launch_box_t<launch_params_dynamic_grid_t<fallback, dim3_t<256>>>;

  launch_t launch_box;

  launch_box.calculate_grid_dimensions_strided(num_elements);

the num_elements is either G.get_number_of_vertices() or input.get_number_of_elements(). In case of a graph with millions of vertices, e.g. soc-twitter2010, that should be CTA size of millions. Is that right? But it doesn't...

from gunrock.

neoblizz commented on June 11, 2024

Oh, my bad, changing the block size from 256 in the example above to a larger number should not impact the correctness. However, the grid size is being calculated based on the block size and number of elements to process. So, you don't really change that.

the num_elements is either G.get_number_of_vertices() or input.get_number_of_elements(). In case of a graph with millions of vertices, e.g. soc-twitter2010, that should be CTA size of millions. Is that right? But it doesn't...

You have it opposite. Grid size is very large, while block size is small.

Just to clarify, grid size is number of blocks being launched and block size is number of threads per block.

from gunrock.

mahmoodn commented on June 11, 2024

OK let me rephrase my question again and see if we on the same track.
What I am going to do is to change the grid size (the number of blocks that are put on the GPU SMs). I want to do that because I am working with GPU simulator. So, I can change the number of SMs. Assume an architecture has 64 SMs, and the number of blocks (grid size) is 128. So, each SM logically receives 2 blocks. Now, if I change the number of SMs to 128, then each SM receives 1 block. No matter how many threads are in the block (the default is 256 as you mentioned), I want to change the number of blocks with the grid size parameter.

I was hoping that the command line option as I mentioned in the previous topic is the solution, but you said that parameter is kind of obsolete. No problem then... If I have to change some hardcoded parameters, I was wondering what are they for the BFS application?

From the previous posts, I see

  std::size_t num_elements = (input_type == advance_io_type_t::graph)
                                 ? G.get_number_of_vertices()
                                 : input.get_number_of_elements();

  // Set-up and launch block-mapped advance.
  using namespace gcuda::launch_box;
  using launch_t =
      launch_box_t<launch_params_dynamic_grid_t<fallback, dim3_t<256>>>;

  launch_t launch_box;

  launch_box.calculate_grid_dimensions_strided(num_elements);

I assume that the num_elements is the thing that I want to change. Do you agree with that? No matter how this parameter gets the value, is it safe to use

launch_box.calculate_grid_dimensions_strided(64);
or
launch_box.calculate_grid_dimensions_strided(512);

for any given graph size?

from gunrock.

neoblizz commented on June 11, 2024

I see! I now understand your problem well. If you don't care about the size of the graph and you can make up a synthetic graph, you can change the number of elements. Num elements correspond to the number of vertices in a frontier for BFS.

This is how grid dimension is calculated:

  void calculate_grid_dimensions_strided(std::size_t num_elements) {
    grid_dimensions = dimensions_t(
        (num_elements + block_dimensions.x - 1) / block_dimensions.x, 1, 1);
  }

Specifically, it is doing a ceil division of num_elements and block_dimensions.x, where;

num_elements is number of vertices to process in advance operator (vertices in frontier).
block_dimensions.x is the size of the block; number of threads within a block: e.g. 256.

So, if you want to get to a specific grid size number, you can either control it by changing number of elements (which is changing every iteration for BFS as vertices get visited), or the block size (which is static).

from gunrock.

neoblizz commented on June 11, 2024

I would also note that the kernel can be written slightly differently as well to accommodate ANY grid size, even 1 without impacting the correctness, see this example: https://developer.nvidia.com/blog/cuda-pro-tip-write-flexible-kernels-grid-stride-loops/

from gunrock.

mahmoodn commented on June 11, 2024

OK if I understand it correctly, the grid size is a function of number of vertices in the frontier and that is num_elements. Then that means, it is not generally possible to set a constant large size for any graph. Also, there is no guarantee that a big graph uses large grid size because it depends on the a fraction of vertices in the frontier.

Regarding the synthetic graph, do you have any idea about the graph shape for using large grid sizes? For example, does it make sense that for a large fully connected graph with 10M vertices, to have grid sizes of 16K or 32K?

from gunrock.

mahmoodn commented on June 11, 2024

One more question. Is it possible to print the number of elements (grid size) at every BFS iteration? Is this the write place?

  std::size_t num_elements = (input_type == advance_io_type_t::graph)
                                 ? G.get_number_of_vertices()
                                 : input.get_number_of_elements();
 // print num_elements

from gunrock.

neoblizz commented on June 11, 2024

Regarding the synthetic graph, do you have any idea about the graph shape for using large grid sizes? For example, does it make sense that for a large fully connected graph with 10M vertices, to have grid sizes of 16K or 32K?

Yeah, if a graph queues in all 10M vertices of the frontier (which can and does happen for large graphs), the grid size would be ceil(10M/256) = 39063.

from gunrock.

neoblizz commented on June 11, 2024

Just before this line: https://github.com/gunrock/gunrock/blob/main/include/gunrock/framework/operators/advance/block_mapped.hxx#L209 you can print both num_elements and grid size:

std::cout << "Num Elements: " << num_elements << "Grid dimensions: " << launch_box.params_t::grid_dimensions.x << std::endl;

I believe that should work. If not, just do a ceil div and calculate the grid size yourself: num_elements + 256- 1) / 256

from gunrock.

mahmoodn commented on June 11, 2024

OK this new version has changed a lot since the previous version. I will close this topic for the moment.

from gunrock.

Question about 2.0.0 changes about gunrock HOT 13 CLOSED

Comments (13)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent