Comments (25)
Absent evidence that this is actually causing considerable memory usage, this is going to go on the backlog for now. I'll continue to encourage all users to think about functors as stateless objects that wrap statically compiled functions that have just been given the same name so the runtime knows how to refer to them uniformly across processes.
from legion.
I'll continue to encourage all users to think about functors as stateless objects that wrap statically compiled functions that have just been given the same name so the runtime knows how to refer to them uniformly across processes.
Are you saying that the interface really ought to be
struct ProjectionFunctor {
LogicalRegion (*project_region)(const Mappable *mappable, unsigned index,
LogicalRegion upper_bound,
const DomainPoint &point);
LogicalRegion (*project_partition)(const Mappable *mappable, unsigned index,
LogicalPartition upper_bound,
const DomainPoint &point);
LogicalRegion (*project_region0)(LogicalRegion upper_bound,
const DomainPoint &point,
const Domain &launch_domain);
LogicalRegion (*project_partition0)(LogicalPartition upper_bound,
const DomainPoint &point,
const Domain &launch_domain);
void (*invert_region)(LogicalRegion region, LogicalRegion upper_bound,
const Domain &launch_domain,
std::vector<DomainPoint> &ordered_points);
void (*invert_partition)(LogicalRegion region, LogicalPartition upper_bound,
const Domain &launch_domain,
std::vector<DomainPoint> &ordered_points);
bool is_exclusive,is_functional,is_invertible;
unsigned depth;
};
so that there is no *this
at all? Maybe I'm misunderstanding what "the same name" means here.
from legion.
I'll continue to encourage all users to think about functors as stateless objects that wrap statically compiled functions that have just been given the same name so the runtime knows how to refer to them uniformly across processes.
What's the point of dynamic projection functor registration then? I've definitely had use cases like this where I need a projection functor to keep track of some extra information (essentially treating it as a closure) stored in the functor itself.
from legion.
If all you want to do is build a closure, you don't need dynamic registration of projection functors. Just use the projection_args
interface and set your projection functor as non-functional:
Lines 1084 to 1085 in 91e4df2
This is how Regent implements closures. There is one, statically registered projection functor and it gets instantiated with the dynamic values of the enclosed values at runtime.
Non-functional projection functors have some cost, but I would imagine that dynamic projection functors have even higher cost, so I think overall we shouldn't be encouraging dynamic projection functors for use cases that aren't truly dynamic.
from legion.
oh wow, TIL about this corner of the API.
from legion.
This is how Regent implements closures. There is one, statically registered projection functor and it gets instantiated with the dynamic values of the enclosed values at runtime.
How does the single registered functor receive those arguments? Do they eventually become Mappable::mapper_data
? It seems weird to copy them every time you launch a task, but in our case it's just an integer for each element of the launch domain, so maybe that's OKβ¦
from legion.
Projection arguments are attached to the region requirement, so you need to query the kind of mappable you've got and then fetch the argument out of the object's region requirements:
Lines 3963 to 3972 in 91e4df2
Here is the (C API) code that Regent uses to do this:
legion/language/src/regent/codegen.t
Lines 3064 to 3081 in 91e4df2
from legion.
OK. It seems like it would be eminently useful to have
class ProjectionFunctor {
// ...
protected:
static const RegionRequirement* get_region_requirement(const Mappable *, unsigned);
};
or just
class Mappable {
// ...
public:
virtual const RegionRequirement* get_region_requirement(unsigned) const = 0;
};
from legion.
This seems reasonable to me, but we should run it by @lightsighter.
from legion.
I don't want to do that. If the runtime is going to add an API it needs to be something that the user fundamentally cannot do on their own. I'm not going to support "convenience" methods that create multiple ways of doing things and only make the API bigger and create confusion.
from legion.
I'm not going to support "convenience" methods that create multiple ways of doing things and only make the API bigger and create confusion.
In what way is it less confusing to have users reimplement obscure counting logic based entirely on implementation details and run the risk of having it silently fail when the sequence of region requirements for some type changes? Moreover, they would be doing that inside one branch of a standard "exhaustive enumeration of derived classes" anti-pattern that falls over whenever any new type is added.
Fundamentally, this RegionRequirement
-fetching code is going to exist, either once where you can maintain it, or in multiple places where you can't (except that you have to maintain the copy in Regent and the C API anyway). You could always avoid "multiple ways of doing things" by making that one place be before calling ProjectionFunctor::project
; just (deprecate and) replace the current const Mappable*,unsigned
overloads with const Mappable*,const RegionRequirement&
ones.
from legion.
In what way is it less confusing to have users reimplement obscure counting logic based entirely on implementation details
That is the C API and not the C++ API, and I neither designed nor implemented it. In the C++ API you don't have to do any "obscure counting logic based entirely on implementation details".
Fundamentally, this RegionRequirement-fetching code is going to exist, either once where you can maintain it, or in multiple places where you can't (except that you have to maintain the copy in Regent and the C API anyway).
If someone wants to put it in the C API then I wouldn't complain, but I don't want it in the C++ API.
You could always avoid "multiple ways of doing things" by making that one place be before calling ProjectionFunctor::project; just (deprecate and) replace the current const Mappable*,unsigned overloads with const Mappable*,const RegionRequirement& ones.
That is overly specific and there are other projection functors that actually need the index of the region requirement that they are projecting. This current API is the most general. Anything else is a specialization and therefore unnecessary for the runtime to provide as users can specialize how they like.
from legion.
In the C++ API you don't have to do any "obscure counting logic based entirely on implementation details".
I must have misunderstood something: for a projection applied to a copy operation, what C++ call would you make to select the correct RegionRequirement
based just on the unsigned
index?
from legion.
I must have misunderstood something: for a projection applied to a copy operation, what C++ call would you make to select the correct RegionRequirement based just on the unsigned index?
It all depends on what order you put your region requirements in the copy launcher. There's no other reasonable order to expect the user to know.
from legion.
How could it depend on the order in which the client populated those vectors? If what you mean is that it's based on the order of the members, such that dst_requirements
always immediately precedes src_indirect_requirements
, yes that's the order I would guess if I had to guess an order, but I think that still counts as an implementation detail since one could just as easily say that all sources come before all destinations or so.
from legion.
If what you mean is that it's based on the order of the members, such that dst_requirements always immediately precedes src_indirect_requirements
Exactly, what else could it reasonably be? Nothing. It's user visible and doesn't have anything to do with the implementation inside the runtime.
from legion.
Ok, here is a short Regent code that demonstrates Regent's approach to code generation for multi-level projection functors and why those projection functors cannot be "functional" per #1309 (comment)). You can find the code at the bottom, but here are the important parts:
- A cross-product is the cartesian product of two partitions. Importantly, these can dynamic in Regent. We don't have a fixed or static number of them in a program, and they can be generated dynamically. When you read the code below, please pretend that
N
is a dynamic number and that the partitionp2
is more interesting than just the same partition over and over; i.e., it could have been a genuinely different partition every time and the code would still work. - At the point of
inner_task
, we no nothing about where the cross-productcp
comes from and how it is generated. So whatever thecross_product
type contains, it needs to capture enough information to do a launch on a (potentially dynamically-generated) cross-product. (Remember that Regent supports separate compilation and thatinner_task
might not be in the same compilation unit asmain
.) - Despite this, Regent actually generates static code for the projection functor at line 25. It can do this because the shape of the region tree is fixed (it is always two levels). But it does not know either the identity of the top-level partition
p1
or the identity of any of the second-level partitions. - Right now, the way we solve this is by capturing the colors of the second-level partitions. We enforce that all second-level partitions must have the same color. Then for a cross-product of
L
levels, it is sufficient to track theLogicalPartition
at the top, along withL-1
colors for the remaining levels. - But inside of a projection functor, Legion provides no way to get access to the
L-1
colors. In fact, there might be multiple cross-products on the same region tree, so we cannot make any naive assumptions about which partition we need to grab as we walk down the tree. Nor can we (I think) tag partitions in a way that would be helpful---it would amount to doing colors in a different way. The only way to implementinner_task
correctly is to capture theL-1
colors in thecross_product
object and then to close over that object in theRegionRequirement
. That's what I was referring to in this comment: #1410 (comment) - Just to be clear, multi-level projection functors are only one case where we close over data in a projection functor. This is pretty fundamental to a lot of use cases in Regent (e.g., periodic boundary conditions, where you may need to do something like
p[i + 1 % N]
whereN
is a dynamically determined number based on the partitioning of the mesh. But while these numbers are "dynamic" in the sense that they cannot be constant-folded into the code at compile-time, they are usually stable in that there are (typically, though not always) a small number of values actually passed into them at runtime.
I am open to alternatives, but they need to support this level of expressiveness and not make unreasonable assumptions of what the compiler is able to determine (i.e., we will never be able to do a static analysis that enumerates all the cross-products a given task may be called with).
import "regent"
fspace fs {
value : int,
color1 : int1d,
color2 : int1d,
-- ...
}
task leaf_task(r : region(ispace(int1d), fs), cp_point : int2d)
where reads writes(r.value) do
for i in r do
r[i].value += cp_point.x * cp_point.y
end
end
task inner_task(r : region(ispace(int1d), fs),
p1 : partition(disjoint, r, ispace(int1d)),
p2 : partition(disjoint, r, ispace(int1d)),
cp : cross_product(p1, p2),
cp_space : ispace(int2d))
where reads writes(r.value) do
__demand(__index_launch)
for i in cp_space do
leaf_task(cp[i.x][i.y], i)
end
end
task main()
var r = region(ispace(int1d, 10), fs)
fill(r.value, 0)
for i in r do
r[i].color1 = i/2
r[i].color2 = i%2
end
var p1 = partition(r.color1, ispace(int1d, 5))
var N = 2
for n = 0, N do
var p2 = partition(r.color2, ispace(int1d, 2))
var cp = cross_product(p1, p2)
var cp_space = ispace(int2d, {5, 2})
inner_task(r, p1, p2, cp, cp_space)
end
end
regentlib.start(main)
from legion.
There are two purposes of projection functors (in decreasing order of priority):
- Provide a way for Legion to recognize when multiple different index space task operations are using the same sets of data.
- Provide a portable name for how to compute projections across multiple processes (without needing to resort to
dlsym
chicanery).
If we didn't need to do (1) then you can imagine a whole spectrum of possible solutions ranging from registering a brand-new projection pure functor for every single dynamic index space operation all the way to the other extreme of having exactly one projection functor which can introspect the entire heap (and even some parts of remote heaps through the runtime API) to decide how to perform a projection. However, we don't live in that world. It's very important that whatever solution anybody proposes makes (1) a priority and forces users to grapple with their ability to "compress" the description of sets accessed by multiple projection functors across index space operations.
So far I've prioritized (1) over ease of use and I still think that is the right trade-off because I'm tired of users not using Legion correctly and then having to explain to them why their performance is bad so I'm just protecting myself. That doesn't mean we shouldn't still support the "uncompressible" cases, but I don't want to make it easy for that to be the default that everyone jumps to.
The other constraint I will put on the design is that Legion will not be in the business of providing closures. I've tentatively proposed to @elliottslaughter that I would be willing to support the following API:
class ProjectionFunctor {
...
virtual LogicalRegion project(const DomainPoint &point, const void *args, size_t arglen) = 0;
...
};
It would be up to users to build their own closures for each and every region requirement to use it and Legion will use the "buffer of bits" as a key in a memoization table for determining when it is safe to replay the results. I still feel like most users are going to default to this even when they actually can compress their description of region tree accesses without needing it, but I'm tired of arguing.
But it does not know either the identity of the top-level partition p1 or the identity of any of the second-level partitions.
This part is confusing to me. It doesn't need to know the identity of the partitions; it just needs to know the names of the colors. Why not just have a "template" (not a literal C++ template) and instantiate it and register it right before dispatching the task if you haven't encountered the colors before? (This is what Legate does.)
But inside of a projection functor, Legion provides no way to get access to the L-1 colors.
This part is not true. All users have total access to the runtime and the context-free methods for traversing the region tree to explore and it. You can get every single node in the region tree and each of their colors to aid in your decision making if you had to.
Just to be clear, multi-level projection functors are only one case where we close over data in a projection functor. This is pretty fundamental to a lot of use cases in Regent (e.g., periodic boundary conditions, where you may need to do something like p[i + 1 % N] where N is a dynamically determined number based on the partitioning of the mesh.
This is definitely a compressible case. I can easily write a projection functor that handles this as a function of the launch domain which will also be parametrized on N
. Special cases that are "regular" like this are also "compressible". Yes, there will be branches in the code for your projection functor, but it will still be pure functional with the existing arguments.
from legion.
It's user visible and doesn't have anything to do with the implementation inside the runtime.
This is of course technically true, but it is quite a creative use of an index. It calls to mind situations where I could tolerate some interface design until I tried writing the user-facing documentation for it, at which point it became unconscionably embarrassing.
Legion will not be in the business of providing closures
What exactly does this mean? Dynamic registration of projection functors does support them being closures, albeit with this very issue with destroying them. The "tentative[] API" would require Legion to maintain, copy, and compare the byte-buffers rather than just using the address of such a closure, although maybe the coordination cost of dynamic registration is greater than the cost of such copying and comparison.
instantiate it and register it right before dispatching the task if you haven't encountered the colors before?
That's basically what we did first (although we just reused the functor for several related objects we knew used the same pattern rather than maintaining a global cache). It's those last-minute registrations that leak memory (though perhaps not on a practically relevant scale) and raise the issue that "dynamic projection functors have even higher cost", but they certainly do accomplish the goal of "recogniz[ing] when multiple different index space task operations are using the same sets of data".
from legion.
at which point it became unconscionably embarrassing.
I'm not embarrassed; the semantics are obvious.
What exactly does this mean?
It means I'm not going to provide an overload of the ProjectionFunctor::project
method for every single use case that some random user decides to propose.
from legion.
I don't think anyone has proposed any new overloads except for the "tentative[] API". I know that I just wanted either to delete these objects (in an appropriately deferred fashion) or to not have to write 38 lines of code to get the relevant const RegionRequirement&
for my unique such object. I don't care which one of those we pursue; that choice should be made to best support the memoization that Legion needs to do.
Actually, now I remember that I suggested adding const RegionRequirement&
overloads to avoid having to have a convenience helper, but I'm willing to believe that the index has other purposes such that that's a bad idea (and so I forgot about it already).
from legion.
Was about to quote you back to yourself. π
that choice should be made to best support the memoization that Legion needs to do.
I don't actually care that it is the best memoization for the implementation. I just want one memoization that is conducive to getting users to describe the reuse as much as possible.
from legion.
I don't actually care that it is the best memoization for the implementation. I just want one memoization that is conducive to getting users to describe the reuse as much as possible.
I think we're in violent agreement here: "best support" was meant to include "users can naturally provide the relevant information" as well as "the implementation can use the information effectively". Still, we want the implementation to be efficient; do you have a theory as to whether dynamic projection functor registration (and pointer comparisons) or static (and single) functor registration (and associated copies and comparisons of the state data, which in our case is typically a few bytes per node or up to a few bytes per core) is more desirable in that regard?
from legion.
Static registration and more dynamic state data that happens to turn out the same is probably going to be better for the runtime. Part of the 'key' in the memoization is always going to be projection functor ID. The runtime isn't going to do comparisons of the regions used by different projection functors. Effectively we want a cheap test: if two index space operations are using the same functor, with the same upper bound region/partition, and same state data then we can safely conclude that they are using the same sets of logical regions.
from legion.
This direction sounds reasonable; we have already rewritten our functor to use the set_projection_args
approach.
It's worth pointing out that, just as we would otherwise have a theoretically unbounded number of functors, we will have a somewhat smaller but still unbounded number of distinct state-data blobs. Legion would have to manage the lifetime of the cache, of course; I'd be glad to announce to Legion the beginning and end of the lifetime of each "reference" to a given blob-value if that helped. (I say reference because we could have overlapping lifetimes of two different usages of the same sequence of bytes.)
If we do intend for people to use this separate state data in preference to separate functors, it would be good to document that; perhaps ideally we would deprecate having state in a ProjectionFunctor
at all (in which case this issue per se could be closed), although maybe someone needs much more data than we do and can't afford to serialize it for the blob approach.
I do still find it surprising that fetching the relevant const RegionRequirement&
to gain access to the blob is so complicated, but additional documentation here (at least to say that each suitable Mappable
object has an associated sequence of RegionRequirement
objects into which the appropriate index refers) would certainly be sufficient to make it usable.
from legion.
Related Issues (20)
- Freeing/Refreshing trace IDs
- Invalid allocation payload buffer size passed to network module HOT 7
- Legion: S3D poor weak scaling performance on Frontier HOT 24
- cmake complains that gcc 12.2.0 doesn't support C++17 HOT 5
- Regent: `__parallel_prefix` incorrect on Frontier HOT 3
- Regent: ROCm 5.1 retiring from Frontier HOT 2
- Legion: default mapper disregards field constraints for region requirements with reduction privileges HOT 8
- Realm: API to estimate multi-hop copy bandwidth HOT 1
- Default mapper: mark `default_make_instance` and `default_create_custom_instances` as virtual HOT 5
- `legion_utilities.h` UBSAN Error HOT 2
- Legion Prof: Framebuffer instances missing index space bounds HOT 3
- Regent: Add support for external attach
- CUDA Compiler Detection fails when using Kokkos with unsupported Clang as CUDA compiler
- Potential scalability hazard in Realm GASNet-EX layer HOT 11
- Realm: Print messages at -errlevel or above to stderr if -logfile unspecified HOT 2
- Legion types are not hashable HOT 6
- Realm: error when compiling with HIP with NVIDIA GPUs HOT 5
- Legion: warning `βit.Legion::Domain::DomainPointIterator::is_validβ may be used uninitialized` HOT 5
- S3D: Subranks single node performance HOT 8
- legion_prof_rs: load archive when navigating to directory in browser HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
π Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. πππ
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google β€οΈ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from legion.