The runtime takes ownership for deleting the projection functor after the

OK. It seems like it would be eminently useful to have <div class="highlight high

This seems reasonable to me, but we should run it by <a class="user-mention notranslat

ProjectionFunctor destruction,about stanfordlegion/legion

Comments (25)

lightsighter commented on July 17, 2024

Absent evidence that this is actually causing considerable memory usage, this is going to go on the backlog for now. I'll continue to encourage all users to think about functors as stateless objects that wrap statically compiled functions that have just been given the same name so the runtime knows how to refer to them uniformly across processes.

from legion.

opensdh commented on July 17, 2024

I'll continue to encourage all users to think about functors as stateless objects that wrap statically compiled functions that have just been given the same name so the runtime knows how to refer to them uniformly across processes.

Are you saying that the interface really ought to be

struct ProjectionFunctor {
  LogicalRegion (*project_region)(const Mappable *mappable, unsigned index,
                                  LogicalRegion upper_bound,
                                  const DomainPoint &point);
  LogicalRegion (*project_partition)(const Mappable *mappable, unsigned index,
                                     LogicalPartition upper_bound,
                                     const DomainPoint &point);

  LogicalRegion (*project_region0)(LogicalRegion upper_bound,
                                   const DomainPoint &point,
                                   const Domain &launch_domain);
  LogicalRegion (*project_partition0)(LogicalPartition upper_bound,
                                      const DomainPoint &point,
                                      const Domain &launch_domain);

  void (*invert_region)(LogicalRegion region, LogicalRegion upper_bound,
                        const Domain &launch_domain,
                        std::vector<DomainPoint> &ordered_points);
  void (*invert_partition)(LogicalRegion region, LogicalPartition upper_bound,
                           const Domain &launch_domain,
                           std::vector<DomainPoint> &ordered_points);

  bool is_exclusive,is_functional,is_invertible;
  unsigned depth;
};

so that there is no *this at all? Maybe I'm misunderstanding what "the same name" means here.

from legion.

rohany commented on July 17, 2024

I'll continue to encourage all users to think about functors as stateless objects that wrap statically compiled functions that have just been given the same name so the runtime knows how to refer to them uniformly across processes.

What's the point of dynamic projection functor registration then? I've definitely had use cases like this where I need a projection functor to keep track of some extra information (essentially treating it as a closure) stored in the functor itself.

from legion.

elliottslaughter commented on July 17, 2024

If all you want to do is build a closure, you don't need dynamic registration of projection functors. Just use the projection_args interface and set your projection functor as non-functional:

legion/runtime/legion.h

Lines 1084 to 1085 in 91e4df2

    
           const void* get_projection_args(size_t *size) const; 
        
           void set_projection_args(const void *args, size_t size, bool own = false);

This is how Regent implements closures. There is one, statically registered projection functor and it gets instantiated with the dynamic values of the enclosed values at runtime.

Non-functional projection functors have some cost, but I would imagine that dynamic projection functors have even higher cost, so I think overall we shouldn't be encouraging dynamic projection functors for use cases that aren't truly dynamic.

from legion.

rohany commented on July 17, 2024

oh wow, TIL about this corner of the API.

from legion.

opensdh commented on July 17, 2024

This is how Regent implements closures. There is one, statically registered projection functor and it gets instantiated with the dynamic values of the enclosed values at runtime.

How does the single registered functor receive those arguments? Do they eventually become Mappable::mapper_data? It seems weird to copy them every time you launch a task, but in our case it's just an integer for each element of the launch domain, so maybe that's OK…

from legion.

elliottslaughter commented on July 17, 2024

Projection arguments are attached to the region requirement, so you need to query the kind of mappable you've got and then fetch the argument out of the object's region requirements:

legion/runtime/legion.h

Lines 3963 to 3972 in 91e4df2

    
           virtual MappableType get_mappable_type(void) const = 0; 
        
           virtual const Task* as_task(void) const { return NULL; } 
        
           virtual const Copy* as_copy(void) const { return NULL; } 
        
           virtual const InlineMapping* as_inline(void) const { return NULL; } 
        
           virtual const Acquire* as_acquire(void) const { return NULL; } 
        
           virtual const Release* as_release(void) const { return NULL; } 
        
           virtual const Close* as_close(void) const { return NULL; } 
        
           virtual const Fill* as_fill(void) const { return NULL; } 
        
           virtual const Partition* as_partition(void) const { return NULL; } 
        
           virtual const MustEpoch* as_must_epoch(void) const { return NULL; }

Here is the (C API) code that Regent uses to do this:

legion/language/src/regent/codegen.t

Lines 3064 to 3081 in 91e4df2

    
           var mappable_type = c.legion_mappable_get_type(mappable) 
        
           if mappable_type == c.TASK_MAPPABLE then 
        
             var task = c.legion_mappable_as_task(mappable) 
        
             [requirement] = c.legion_task_get_requirement(task, idx) 
        
           elseif mappable_type == c.COPY_MAPPABLE then 
        
             var copy = c.legion_mappable_as_copy(mappable) 
        
             [requirement] = c.legion_copy_get_requirement(copy, idx) 
        
           elseif mappable_type == c.FILL_MAPPABLE then 
        
             var fill = c.legion_mappable_as_fill(mappable) 
        
             std.assert(idx == 0, "projection index for fill is not zero") 
        
             [requirement] = c.legion_fill_get_requirement(fill) 
        
           elseif mappable_type == c.INLINE_MAPPABLE then 
        
             var mapping = c.legion_mappable_as_inline_mapping(mappable) 
        
             std.assert(idx == 0, "projection index for inline mapping is not zero") 
        
             [requirement] = c.legion_inline_get_requirement(mapping) 
        
           else 
        
             std.assert(false, "unhandled mappable type") 
        
           end

from legion.

opensdh commented on July 17, 2024

OK. It seems like it would be eminently useful to have

class ProjectionFunctor {
  // ...
protected:
  static const RegionRequirement* get_region_requirement(const Mappable *, unsigned);
};

or just

class Mappable {
  // ...
public:
  virtual const RegionRequirement* get_region_requirement(unsigned) const = 0;
};

from legion.

elliottslaughter commented on July 17, 2024

This seems reasonable to me, but we should run it by @lightsighter.

from legion.

lightsighter commented on July 17, 2024

I don't want to do that. If the runtime is going to add an API it needs to be something that the user fundamentally cannot do on their own. I'm not going to support "convenience" methods that create multiple ways of doing things and only make the API bigger and create confusion.

from legion.

opensdh commented on July 17, 2024

I'm not going to support "convenience" methods that create multiple ways of doing things and only make the API bigger and create confusion.

In what way is it less confusing to have users reimplement obscure counting logic based entirely on implementation details and run the risk of having it silently fail when the sequence of region requirements for some type changes? Moreover, they would be doing that inside one branch of a standard "exhaustive enumeration of derived classes" anti-pattern that falls over whenever any new type is added.

Fundamentally, this RegionRequirement-fetching code is going to exist, either once where you can maintain it, or in multiple places where you can't (except that you have to maintain the copy in Regent and the C API anyway). You could always avoid "multiple ways of doing things" by making that one place be before calling ProjectionFunctor::project; just (deprecate and) replace the current const Mappable*,unsigned overloads with const Mappable*,const RegionRequirement& ones.

from legion.

lightsighter commented on July 17, 2024

In what way is it less confusing to have users reimplement obscure counting logic based entirely on implementation details

That is the C API and not the C++ API, and I neither designed nor implemented it. In the C++ API you don't have to do any "obscure counting logic based entirely on implementation details".

Fundamentally, this RegionRequirement-fetching code is going to exist, either once where you can maintain it, or in multiple places where you can't (except that you have to maintain the copy in Regent and the C API anyway).

If someone wants to put it in the C API then I wouldn't complain, but I don't want it in the C++ API.

You could always avoid "multiple ways of doing things" by making that one place be before calling ProjectionFunctor::project; just (deprecate and) replace the current const Mappable*,unsigned overloads with const Mappable*,const RegionRequirement& ones.

That is overly specific and there are other projection functors that actually need the index of the region requirement that they are projecting. This current API is the most general. Anything else is a specialization and therefore unnecessary for the runtime to provide as users can specialize how they like.

from legion.

opensdh commented on July 17, 2024

In the C++ API you don't have to do any "obscure counting logic based entirely on implementation details".

I must have misunderstood something: for a projection applied to a copy operation, what C++ call would you make to select the correct RegionRequirement based just on the unsigned index?

from legion.

lightsighter commented on July 17, 2024

I must have misunderstood something: for a projection applied to a copy operation, what C++ call would you make to select the correct RegionRequirement based just on the unsigned index?

It all depends on what order you put your region requirements in the copy launcher. There's no other reasonable order to expect the user to know.

from legion.

opensdh commented on July 17, 2024

How could it depend on the order in which the client populated those vectors? If what you mean is that it's based on the order of the members, such that dst_requirements always immediately precedes src_indirect_requirements, yes that's the order I would guess if I had to guess an order, but I think that still counts as an implementation detail since one could just as easily say that all sources come before all destinations or so.

from legion.

lightsighter commented on July 17, 2024

If what you mean is that it's based on the order of the members, such that dst_requirements always immediately precedes src_indirect_requirements

Exactly, what else could it reasonably be? Nothing. It's user visible and doesn't have anything to do with the implementation inside the runtime.

from legion.

elliottslaughter commented on July 17, 2024

Ok, here is a short Regent code that demonstrates Regent's approach to code generation for multi-level projection functors and why those projection functors cannot be "functional" per #1309 (comment)). You can find the code at the bottom, but here are the important parts:

A cross-product is the cartesian product of two partitions. Importantly, these can dynamic in Regent. We don't have a fixed or static number of them in a program, and they can be generated dynamically. When you read the code below, please pretend that N is a dynamic number and that the partition p2 is more interesting than just the same partition over and over; i.e., it could have been a genuinely different partition every time and the code would still work.
At the point of inner_task, we no nothing about where the cross-product cp comes from and how it is generated. So whatever the cross_product type contains, it needs to capture enough information to do a launch on a (potentially dynamically-generated) cross-product. (Remember that Regent supports separate compilation and that inner_task might not be in the same compilation unit as main.)
Despite this, Regent actually generates static code for the projection functor at line 25. It can do this because the shape of the region tree is fixed (it is always two levels). But it does not know either the identity of the top-level partition p1 or the identity of any of the second-level partitions.
Right now, the way we solve this is by capturing the colors of the second-level partitions. We enforce that all second-level partitions must have the same color. Then for a cross-product of L levels, it is sufficient to track the LogicalPartition at the top, along with L-1 colors for the remaining levels.
But inside of a projection functor, Legion provides no way to get access to the L-1 colors. In fact, there might be multiple cross-products on the same region tree, so we cannot make any naive assumptions about which partition we need to grab as we walk down the tree. Nor can we (I think) tag partitions in a way that would be helpful---it would amount to doing colors in a different way. The only way to implement inner_task correctly is to capture the L-1 colors in the cross_product object and then to close over that object in the RegionRequirement. That's what I was referring to in this comment: #1410 (comment)
Just to be clear, multi-level projection functors are only one case where we close over data in a projection functor. This is pretty fundamental to a lot of use cases in Regent (e.g., periodic boundary conditions, where you may need to do something like p[i + 1 % N] where N is a dynamically determined number based on the partitioning of the mesh. But while these numbers are "dynamic" in the sense that they cannot be constant-folded into the code at compile-time, they are usually stable in that there are (typically, though not always) a small number of values actually passed into them at runtime.

I am open to alternatives, but they need to support this level of expressiveness and not make unreasonable assumptions of what the compiler is able to determine (i.e., we will never be able to do a static analysis that enumerates all the cross-products a given task may be called with).

import "regent"

fspace fs {
  value : int,
  color1 : int1d,
  color2 : int1d,
  -- ...
}

task leaf_task(r : region(ispace(int1d), fs), cp_point : int2d)
where reads writes(r.value) do
  for i in r do
    r[i].value += cp_point.x * cp_point.y
  end
end

task inner_task(r : region(ispace(int1d), fs),
                p1 : partition(disjoint, r, ispace(int1d)),
                p2 : partition(disjoint, r, ispace(int1d)),
                cp : cross_product(p1, p2),
                cp_space : ispace(int2d))
where reads writes(r.value) do
  __demand(__index_launch)
  for i in cp_space do
    leaf_task(cp[i.x][i.y], i)
  end
end

task main()
  var r = region(ispace(int1d, 10), fs)
  fill(r.value, 0)
  for i in r do
    r[i].color1 = i/2
    r[i].color2 = i%2
  end
  var p1 = partition(r.color1, ispace(int1d, 5))

  var N = 2
  for n = 0, N do
    var p2 = partition(r.color2, ispace(int1d, 2))
    var cp = cross_product(p1, p2)
    var cp_space = ispace(int2d, {5, 2})
    inner_task(r, p1, p2, cp, cp_space)
  end
end
regentlib.start(main)

from legion.

lightsighter commented on July 17, 2024

There are two purposes of projection functors (in decreasing order of priority):

Provide a way for Legion to recognize when multiple different index space task operations are using the same sets of data.
Provide a portable name for how to compute projections across multiple processes (without needing to resort to dlsym chicanery).

If we didn't need to do (1) then you can imagine a whole spectrum of possible solutions ranging from registering a brand-new projection pure functor for every single dynamic index space operation all the way to the other extreme of having exactly one projection functor which can introspect the entire heap (and even some parts of remote heaps through the runtime API) to decide how to perform a projection. However, we don't live in that world. It's very important that whatever solution anybody proposes makes (1) a priority and forces users to grapple with their ability to "compress" the description of sets accessed by multiple projection functors across index space operations.

So far I've prioritized (1) over ease of use and I still think that is the right trade-off because I'm tired of users not using Legion correctly and then having to explain to them why their performance is bad so I'm just protecting myself. That doesn't mean we shouldn't still support the "uncompressible" cases, but I don't want to make it easy for that to be the default that everyone jumps to.

The other constraint I will put on the design is that Legion will not be in the business of providing closures. I've tentatively proposed to @elliottslaughter that I would be willing to support the following API:

class ProjectionFunctor {
...
     virtual LogicalRegion project(const DomainPoint &point, const void *args, size_t arglen) = 0;
...
};

It would be up to users to build their own closures for each and every region requirement to use it and Legion will use the "buffer of bits" as a key in a memoization table for determining when it is safe to replay the results. I still feel like most users are going to default to this even when they actually can compress their description of region tree accesses without needing it, but I'm tired of arguing.

But it does not know either the identity of the top-level partition p1 or the identity of any of the second-level partitions.

This part is confusing to me. It doesn't need to know the identity of the partitions; it just needs to know the names of the colors. Why not just have a "template" (not a literal C++ template) and instantiate it and register it right before dispatching the task if you haven't encountered the colors before? (This is what Legate does.)

But inside of a projection functor, Legion provides no way to get access to the L-1 colors.

This part is not true. All users have total access to the runtime and the context-free methods for traversing the region tree to explore and it. You can get every single node in the region tree and each of their colors to aid in your decision making if you had to.

Just to be clear, multi-level projection functors are only one case where we close over data in a projection functor. This is pretty fundamental to a lot of use cases in Regent (e.g., periodic boundary conditions, where you may need to do something like p[i + 1 % N] where N is a dynamically determined number based on the partitioning of the mesh.

This is definitely a compressible case. I can easily write a projection functor that handles this as a function of the launch domain which will also be parametrized on N. Special cases that are "regular" like this are also "compressible". Yes, there will be branches in the code for your projection functor, but it will still be pure functional with the existing arguments.

from legion.

opensdh commented on July 17, 2024

It's user visible and doesn't have anything to do with the implementation inside the runtime.

This is of course technically true, but it is quite a creative use of an index. It calls to mind situations where I could tolerate some interface design until I tried writing the user-facing documentation for it, at which point it became unconscionably embarrassing.

Legion will not be in the business of providing closures

What exactly does this mean? Dynamic registration of projection functors does support them being closures, albeit with this very issue with destroying them. The "tentative[] API" would require Legion to maintain, copy, and compare the byte-buffers rather than just using the address of such a closure, although maybe the coordination cost of dynamic registration is greater than the cost of such copying and comparison.

instantiate it and register it right before dispatching the task if you haven't encountered the colors before?

That's basically what we did first (although we just reused the functor for several related objects we knew used the same pattern rather than maintaining a global cache). It's those last-minute registrations that leak memory (though perhaps not on a practically relevant scale) and raise the issue that "dynamic projection functors have even higher cost", but they certainly do accomplish the goal of "recogniz[ing] when multiple different index space task operations are using the same sets of data".

from legion.

lightsighter commented on July 17, 2024

at which point it became unconscionably embarrassing.

I'm not embarrassed; the semantics are obvious.

What exactly does this mean?

It means I'm not going to provide an overload of the ProjectionFunctor::project method for every single use case that some random user decides to propose.

from legion.

opensdh commented on July 17, 2024

I don't think anyone has proposed any new overloads except for the "tentative[] API". I know that I just wanted either to delete these objects (in an appropriately deferred fashion) or to not have to write 38 lines of code to get the relevant const RegionRequirement& for my unique such object. I don't care which one of those we pursue; that choice should be made to best support the memoization that Legion needs to do.

Actually, now I remember that I suggested adding const RegionRequirement& overloads to avoid having to have a convenience helper, but I'm willing to believe that the index has other purposes such that that's a bad idea (and so I forgot about it already).

from legion.

lightsighter commented on July 17, 2024

Was about to quote you back to yourself. 😉

that choice should be made to best support the memoization that Legion needs to do.

I don't actually care that it is the best memoization for the implementation. I just want one memoization that is conducive to getting users to describe the reuse as much as possible.

from legion.

opensdh commented on July 17, 2024

I don't actually care that it is the best memoization for the implementation. I just want one memoization that is conducive to getting users to describe the reuse as much as possible.

I think we're in violent agreement here: "best support" was meant to include "users can naturally provide the relevant information" as well as "the implementation can use the information effectively". Still, we want the implementation to be efficient; do you have a theory as to whether dynamic projection functor registration (and pointer comparisons) or static (and single) functor registration (and associated copies and comparisons of the state data, which in our case is typically a few bytes per node or up to a few bytes per core) is more desirable in that regard?

from legion.

lightsighter commented on July 17, 2024

Static registration and more dynamic state data that happens to turn out the same is probably going to be better for the runtime. Part of the 'key' in the memoization is always going to be projection functor ID. The runtime isn't going to do comparisons of the regions used by different projection functors. Effectively we want a cheap test: if two index space operations are using the same functor, with the same upper bound region/partition, and same state data then we can safely conclude that they are using the same sets of logical regions.

from legion.

opensdh commented on July 17, 2024

This direction sounds reasonable; we have already rewritten our functor to use the set_projection_args approach.

It's worth pointing out that, just as we would otherwise have a theoretically unbounded number of functors, we will have a somewhat smaller but still unbounded number of distinct state-data blobs. Legion would have to manage the lifetime of the cache, of course; I'd be glad to announce to Legion the beginning and end of the lifetime of each "reference" to a given blob-value if that helped. (I say reference because we could have overlapping lifetimes of two different usages of the same sequence of bytes.)

If we do intend for people to use this separate state data in preference to separate functors, it would be good to document that; perhaps ideally we would deprecate having state in a ProjectionFunctor at all (in which case this issue per se could be closed), although maybe someone needs much more data than we do and can't afford to serialize it for the blob approach.

I do still find it surprising that fetching the relevant const RegionRequirement& to gain access to the blob is so complicated, but additional documentation here (at least to say that each suitable Mappable object has an associated sequence of RegionRequirement objects into which the appropriate index refers) would certainly be sufficient to make it usable.

from legion.

ProjectionFunctor destruction about legion HOT 25 OPEN

Comments (25)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

	const void* get_projection_args(size_t *size) const;
	void set_projection_args(const void *args, size_t size, bool own = false);

	virtual MappableType get_mappable_type(void) const = 0;
	virtual const Task* as_task(void) const { return NULL; }
	virtual const Copy* as_copy(void) const { return NULL; }
	virtual const InlineMapping* as_inline(void) const { return NULL; }
	virtual const Acquire* as_acquire(void) const { return NULL; }
	virtual const Release* as_release(void) const { return NULL; }
	virtual const Close* as_close(void) const { return NULL; }
	virtual const Fill* as_fill(void) const { return NULL; }
	virtual const Partition* as_partition(void) const { return NULL; }
	virtual const MustEpoch* as_must_epoch(void) const { return NULL; }

	var mappable_type = c.legion_mappable_get_type(mappable)
	if mappable_type == c.TASK_MAPPABLE then
	var task = c.legion_mappable_as_task(mappable)
	[requirement] = c.legion_task_get_requirement(task, idx)
	elseif mappable_type == c.COPY_MAPPABLE then
	var copy = c.legion_mappable_as_copy(mappable)
	[requirement] = c.legion_copy_get_requirement(copy, idx)
	elseif mappable_type == c.FILL_MAPPABLE then
	var fill = c.legion_mappable_as_fill(mappable)
	std.assert(idx == 0, "projection index for fill is not zero")
	[requirement] = c.legion_fill_get_requirement(fill)
	elseif mappable_type == c.INLINE_MAPPABLE then
	var mapping = c.legion_mappable_as_inline_mapping(mappable)
	std.assert(idx == 0, "projection index for inline mapping is not zero")
	[requirement] = c.legion_inline_get_requirement(mapping)
	else
	std.assert(false, "unhandled mappable type")
	end