Comments (3)
Passing default_tag=None to add_prefetch and parallelizing the prefetch by explicitly calling split_inames might help.
from loopy.
Also, if the workload comes from MIRGE-Com, it might be worth evaluating whether such large batched einsums are still relevant. See illinois-ceesd/mirgecom#777 for context.
Good point. I'll dump out the kernels for the current y3 driver to see if anything has changed in terms of batch sizes.
In any case, just setting default_tag=None doesn't affect the overall scaling. It does change the profiling results somewhat.
Ordered by: cumulative time
List reduced from 467 to 30 due to restriction <30>

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         1    0.000    0.000    8.846    8.846 __init__.py:791(prefetch_and_project)
         1    0.000    0.000    6.842    6.842 data.py:302(add_prefetch)
         1    0.000    0.000    6.842    6.842 data.py:153(add_prefetch_for_single_kernel)
         1    0.001    0.001    6.761    6.761 precompute.py:353(precompute_for_single_kernel)
35274/7782    4.665    0.000    4.703    0.001 __init__.py:936(wrapper)
         1    0.001    0.001    3.908    3.908 array_buffer_map.py:196(__init__)
         1    0.000    0.000    3.814    3.814 array_buffer_map.py:173(compute_bounds)
         1    0.000    0.000    3.788    3.788 array_buffer_map.py:162(find_var_base_indices_and_shape_from_inames)
         1    0.000    0.000    3.788    3.788 array_buffer_map.py:165(<listcomp>)
         2    0.000    0.000    3.788    1.894 tools.py:379(base_index_and_length)
        44    2.102    0.048    2.107    0.048 __init__.py:769(_number_to_expr_like)
        44    0.000    0.000    2.106    0.048 __init__.py:801(expr_like_add)
         3    0.000    0.000    2.059    0.686 __init__.py:1061(obj_project_out_except)
         2    0.000    0.000    2.004    1.002 translation_unit.py:677(_collective_transform)
         1    0.001    0.001    2.004    2.004 decouple_domain.py:38(decouple_domain)
         4    0.000    0.000    0.852    0.213 tools.py:352(op)
         2    0.779    0.390    0.780    0.390 isl_helpers.py:576(find_max_of_pwaff_with_params)
         2    0.000    0.000    0.479    0.240 tools.py:370(dim_max)
         2    0.478    0.239    0.478    0.239 tools.py:339(_get_dim_max)
  6450/630    0.012    0.000    0.435    0.001 __init__.py:256(__call__)
 1636/1166    0.002    0.000    0.422    0.000 __init__.py:752(wrapper)
       153    0.001    0.000    0.401    0.003 instruction.py:858(with_transformed_expressions)
       119    0.001    0.000    0.389    0.003 symbolic.py:134(map_reduction)
   190/162    0.001    0.000    0.386    0.002 __init__.py:524(map_sum)
   190/162    0.000    0.000    0.385    0.002 __init__.py:525(<listcomp>)
         1    0.000    0.000    0.379    0.379 precompute.py:302(map_kernel)
         5    0.000    0.000    0.379    0.076 symbolic.py:1370(__call__)
         2    0.000    0.000    0.379    0.189 precompute.py:320(<lambda>)
         2    0.000    0.000    0.378    0.189 symbolic.py:1314(map_call)
         1    0.000    0.000    0.378    0.378 precompute.py:227(map_substitution)
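
For anyone who wants to reproduce a listing in this format, here is a minimal sketch using only the standard library's cProfile and pstats modules. The thread does not show the exact profiling invocation, so this is an assumption about how such output could be produced. The names profile_call and example_workload are hypothetical; example_workload is only a stand-in for the actual loopy transformation call.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, limit=30, **kwargs):
    """Run func under cProfile; return (result, report text).

    The report is sorted by cumulative time and restricted to the
    first `limit` entries, matching the header lines in the listing.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, buffer.getvalue()


def example_workload(n):
    # Hypothetical stand-in for the transformation being profiled.
    return sum(i * i for i in range(n))


result, report = profile_call(example_workload, 10_000)
print(report)
```

In a real run, the call to example_workload would be replaced by the function that applies the prefetch transformation, so that the entries under add_prefetch show up in the report.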