Comments (3)

kaushikcfd commented on June 11, 2024

Passing default_tag=None to add_prefetch and parallelizing the prefetch by explicitly calling split_inames might help.

from loopy.

kaushikcfd commented on June 11, 2024

Also, if the workload is coming from MIRGE-Com, it might be worth checking whether such large batched einsums are still relevant. See illinois-ceesd/mirgecom#777 for context.

nchristensen commented on June 11, 2024

Good point. I'll dump out the kernels for the current y3 driver to see if anything has changed in terms of batch sizes.

In any case, setting default_tag=None on its own doesn't affect the overall scaling. It does change the profiling results somewhat:

   Ordered by: cumulative time
   List reduced from 467 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    8.846    8.846 __init__.py:791(prefetch_and_project)
        1    0.000    0.000    6.842    6.842 data.py:302(add_prefetch)
        1    0.000    0.000    6.842    6.842 data.py:153(add_prefetch_for_single_kernel)
        1    0.001    0.001    6.761    6.761 precompute.py:353(precompute_for_single_kernel)
35274/7782    4.665    0.000    4.703    0.001 __init__.py:936(wrapper)
        1    0.001    0.001    3.908    3.908 array_buffer_map.py:196(__init__)
        1    0.000    0.000    3.814    3.814 array_buffer_map.py:173(compute_bounds)
        1    0.000    0.000    3.788    3.788 array_buffer_map.py:162(find_var_base_indices_and_shape_from_inames)
        1    0.000    0.000    3.788    3.788 array_buffer_map.py:165(<listcomp>)
        2    0.000    0.000    3.788    1.894 tools.py:379(base_index_and_length)
       44    2.102    0.048    2.107    0.048 __init__.py:769(_number_to_expr_like)
       44    0.000    0.000    2.106    0.048 __init__.py:801(expr_like_add)
        3    0.000    0.000    2.059    0.686 __init__.py:1061(obj_project_out_except)
        2    0.000    0.000    2.004    1.002 translation_unit.py:677(_collective_transform)
        1    0.001    0.001    2.004    2.004 decouple_domain.py:38(decouple_domain)
        4    0.000    0.000    0.852    0.213 tools.py:352(op)
        2    0.779    0.390    0.780    0.390 isl_helpers.py:576(find_max_of_pwaff_with_params)
        2    0.000    0.000    0.479    0.240 tools.py:370(dim_max)
        2    0.478    0.239    0.478    0.239 tools.py:339(_get_dim_max)
 6450/630    0.012    0.000    0.435    0.001 __init__.py:256(__call__)
1636/1166    0.002    0.000    0.422    0.000 __init__.py:752(wrapper)
      153    0.001    0.000    0.401    0.003 instruction.py:858(with_transformed_expressions)
      119    0.001    0.000    0.389    0.003 symbolic.py:134(map_reduction)
  190/162    0.001    0.000    0.386    0.002 __init__.py:524(map_sum)
  190/162    0.000    0.000    0.385    0.002 __init__.py:525(<listcomp>)
        1    0.000    0.000    0.379    0.379 precompute.py:302(map_kernel)
        5    0.000    0.000    0.379    0.076 symbolic.py:1370(__call__)
        2    0.000    0.000    0.379    0.189 precompute.py:320(<lambda>)
        2    0.000    0.000    0.378    0.189 symbolic.py:1314(map_call)
        1    0.000    0.000    0.378    0.378 precompute.py:227(map_substitution)
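For anyone who wants to produce a listing in the same format: the header lines above ("Ordered by: cumulative time", "List reduced from ... due to restriction <30>") are what Python's cProfile/pstats print when the stats are sorted by cumulative time and limited to 30 rows. A minimal sketch follows. Note that `transform_kernel` and `profile_cumulative` are hypothetical names used only for illustration; the former stands in for whatever code ends up calling add_prefetch, and this is not necessarily the script that generated the numbers above.

```python
import cProfile
import io
import pstats


def transform_kernel():
    # Hypothetical stand-in for the loopy transformation being profiled
    # (e.g. the code path that eventually calls add_prefetch).
    return sum(i * i for i in range(100_000))


def profile_cumulative(func, limit=30):
    """Run func under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()

    # Collect the report in a string instead of printing to stdout directly.
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time and keep at most `limit` rows.
    stats.sort_stats("cumulative").print_stats(limit)
    return result, buf.getvalue()


result, report = profile_cumulative(transform_kernel)
print(report)
```

Sorting by cumulative time puts the outermost callers first, which is why the top rows of such a listing tend to be call-chain entries with near-zero tottime.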
