Code Monkey home page Code Monkey logo

Comments (13)

neoblizz avatar neoblizz commented on June 11, 2024

Hi @mahmoodn, just to clarify, you would like the ability to select the grid and block sizes for the various kernels? Or do you mean you would like to know which grid/block were selected for which kernel?

from gunrock.

mahmoodn avatar mahmoodn commented on June 11, 2024

Basically, I was trying to change the grid size via command line to see the effect, but I found some hard coded parts as you mentioned before.
I assume the default grid size is automatically calculated by the device specification. So that could be a reference number for performance.

from gunrock.

neoblizz avatar neoblizz commented on June 11, 2024

I assume the default grid size is automatically calculated by the device specification. So that could be a reference number for performance.

And you understand the correctness depends on the block and grid sizes? If so, you can pick an application:

from gunrock.

mahmoodn avatar mahmoodn commented on June 11, 2024

If I understand correctly, you are saying that modifying the grid size may affect the correctness of the BFS. is that right? Do you mean that for example, changing 512 to 1024 may cause the program not to visit all vertices? How that is possible? In this case, how the grid size is really determined?

According to this section

  std::size_t num_elements = (input_type == advance_io_type_t::graph)
                                 ? G.get_number_of_vertices()
                                 : input.get_number_of_elements();

  // Set-up and launch block-mapped advance.
  using namespace gcuda::launch_box;
  using launch_t =
      launch_box_t<launch_params_dynamic_grid_t<fallback, dim3_t<256>>>;

  launch_t launch_box;

  launch_box.calculate_grid_dimensions_strided(num_elements);

the num_elements is either G.get_number_of_vertices() or input.get_number_of_elements(). In case of a graph with millions of vertices, e.g. soc-twitter2010, that should be CTA size of millions. Is that right? But it doesn't...

from gunrock.

neoblizz avatar neoblizz commented on June 11, 2024

Oh, my bad, changing the block size from 256 in the example above to a larger number should not impact the correctness. However, the grid size is being calculated based on the block size and number of elements to process. So, you don't really change that.

the num_elements is either G.get_number_of_vertices() or input.get_number_of_elements(). In case of a graph with millions of vertices, e.g. soc-twitter2010, that should be CTA size of millions. Is that right? But it doesn't...

You have it opposite. Grid size is very large, while block size is small.

Just to clarify, grid size is number of blocks being launched and block size is number of threads per block.

from gunrock.

mahmoodn avatar mahmoodn commented on June 11, 2024

OK let me rephrase my question again and see if we on the same track.
What I am going to do is to change the grid size (the number of blocks that are put on the GPU SMs). I want to do that because I am working with GPU simulator. So, I can change the number of SMs. Assume an architecture has 64 SMs, and the number of blocks (grid size) is 128. So, each SM logically receives 2 blocks. Now, if I change the number of SMs to 128, then each SM receives 1 block. No matter how many threads are in the block (the default is 256 as you mentioned), I want to change the number of blocks with the grid size parameter.

I was hoping that the command line option as I mentioned in the previous topic is the solution, but you said that parameter is kind of obsolete. No problem then... If I have to change some hardcoded parameters, I was wondering what are they for the BFS application?

From the previous posts, I see

  std::size_t num_elements = (input_type == advance_io_type_t::graph)
                                 ? G.get_number_of_vertices()
                                 : input.get_number_of_elements();

  // Set-up and launch block-mapped advance.
  using namespace gcuda::launch_box;
  using launch_t =
      launch_box_t<launch_params_dynamic_grid_t<fallback, dim3_t<256>>>;

  launch_t launch_box;

  launch_box.calculate_grid_dimensions_strided(num_elements);

I assume that the num_elements is the thing that I want to change. Do you agree with that? No matter how this parameter gets the value, is it safe to use

launch_box.calculate_grid_dimensions_strided(64);
or
launch_box.calculate_grid_dimensions_strided(512);

for any given graph size?

from gunrock.

neoblizz avatar neoblizz commented on June 11, 2024

I see! I now understand your problem well. If you don't care about the size of the graph and you can make up a synthetic graph, you can change the number of elements. Num elements correspond to the number of vertices in a frontier for BFS.

This is how grid dimension is calculated:

  void calculate_grid_dimensions_strided(std::size_t num_elements) {
    grid_dimensions = dimensions_t(
        (num_elements + block_dimensions.x - 1) / block_dimensions.x, 1, 1);
  }

Specifically, it is doing a ceil division of num_elements and block_dimensions.x, where;

  • num_elements is number of vertices to process in advance operator (vertices in frontier).
  • block_dimensions.x is the size of the block; number of threads within a block: e.g. 256.

So, if you want to get to a specific grid size number, you can either control it by changing number of elements (which is changing every iteration for BFS as vertices get visited), or the block size (which is static).

from gunrock.

neoblizz avatar neoblizz commented on June 11, 2024

I would also note that the kernel can be written slightly differently as well to accommodate ANY grid size, even 1 without impacting the correctness, see this example: https://developer.nvidia.com/blog/cuda-pro-tip-write-flexible-kernels-grid-stride-loops/

from gunrock.

mahmoodn avatar mahmoodn commented on June 11, 2024

OK if I understand it correctly, the grid size is a function of number of vertices in the frontier and that is num_elements. Then that means, it is not generally possible to set a constant large size for any graph. Also, there is no guarantee that a big graph uses large grid size because it depends on the a fraction of vertices in the frontier.

Regarding the synthetic graph, do you have any idea about the graph shape for using large grid sizes? For example, does it make sense that for a large fully connected graph with 10M vertices, to have grid sizes of 16K or 32K?

from gunrock.

mahmoodn avatar mahmoodn commented on June 11, 2024

One more question. Is it possible to print the number of elements (grid size) at every BFS iteration? Is this the write place?

  std::size_t num_elements = (input_type == advance_io_type_t::graph)
                                 ? G.get_number_of_vertices()
                                 : input.get_number_of_elements();
 // print num_elements

from gunrock.

neoblizz avatar neoblizz commented on June 11, 2024

Regarding the synthetic graph, do you have any idea about the graph shape for using large grid sizes? For example, does it make sense that for a large fully connected graph with 10M vertices, to have grid sizes of 16K or 32K?

Yeah, if a graph queues in all 10M vertices of the frontier (which can and does happen for large graphs), the grid size would be ceil(10M/256) = 39063.

from gunrock.

neoblizz avatar neoblizz commented on June 11, 2024

Just before this line: https://github.com/gunrock/gunrock/blob/main/include/gunrock/framework/operators/advance/block_mapped.hxx#L209 you can print both num_elements and grid size:

std::cout << "Num Elements: " << num_elements << "Grid dimensions: " << launch_box.params_t::grid_dimensions.x << std::endl;

I believe that should work. If not, just do a ceil div and calculate the grid size yourself: num_elements + 256- 1) / 256

from gunrock.

mahmoodn avatar mahmoodn commented on June 11, 2024

OK this new version has changed a lot since the previous version. I will close this topic for the moment.

from gunrock.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.