
mottosso commented on July 4, 2024

It's a broad topic; here are some broad thoughts.

The original intent of extraction was always to perform serialisation only, and not involve itself with location or interaction with databases. In this case, extracting into a temporary directory and passing this directory on to integration is exactly aligned with this.

Integration then is the complete opposite. It doesn't generate any data of its own, but merely "mediates" the data and aligns it with the overall pipeline.

Where Collection represents the "input" of a processing graph, Integration is the "output". In between, data may "fan out" and become divided into smaller tasks, but in the end it must all pass through integration, i.e. "fan in", if the content is ever to see the light of day.

We also want to implement versioning. So that could be additional required data.

Canonically, no process should ever know about existing assets or the state of existing assets until it comes to integration. In the case of versioning, which requires knowledge about which is the currently highest version in order to increment it, this would have to happen solely during integration.

This means that an integrator is free to not only produce final outputs, but also to communicate and gather information (unrelated to validation and extraction) in order to make its final decision. An integrator is always assumed to be right, so no validation is ever required here, nor serialisation, which in most cases should converge into plain file-copying and persistence of data within each Instance and/or Context.
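As a concrete sketch of versioning living solely inside integration, an integrator-side helper could scan the publish directory at publish time. This is a hypothetical, stdlib-only illustration that assumes a `*_v###.*` naming convention; it is not Pyblish's actual API:

```python
import re
from pathlib import Path

def next_version(publish_dir):
    """Return the next version number for an asset's publish directory.

    Assumes already-published files are named like ``myasset_v001.ma``;
    returns 1 when nothing has been published yet. Intended to run only
    during integration, so no earlier stage needs to know about disk state.
    """
    pattern = re.compile(r"_v(\d+)\.")
    versions = [
        int(match.group(1))
        for path in Path(publish_dir).iterdir()
        if (match := pattern.search(path.name))
    ]
    return max(versions, default=0) + 1
```

Because the scan happens at the last possible moment, two publishes racing each other is the only remaining hazard; collection and validation stay entirely ignorant of existing assets.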

Though this will always override it for any instance that has been Collected, which might be more annoying than what we gain from removing this duplication in code.

Not sure what you mean here, but if you mean that the first instance will create a temporary directory, whereas subsequent instances would be written to an already existing temporary directory, then that's perfectly fine and intended.

The temporary directory is much like Git's "staging area" in that it holds an arbitrary amount of information, but does so temporarily until it all is converged, or integrated, with the rest of the data.
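A minimal sketch of such a shared staging area, assuming the context behaves like a dictionary and `commitDir` is the agreed-upon key (both are assumptions for illustration, not Pyblish's actual API):

```python
import tempfile

def commit_dir(context):
    """Return the context's shared staging directory, creating it on
    first use. Subsequent extractors reuse the same directory, much
    like files accumulating in Git's staging area before a commit.
    """
    if "commitDir" not in context:
        context["commitDir"] = tempfile.mkdtemp(prefix="pyblish_")
    return context["commitDir"]
```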

from pyblish-magenta.

BigRoy commented on July 4, 2024

Canonically, no process should ever know about existing assets or the state of existing assets until it comes to integration. In the case of versioning, which requires knowledge about which is the currently highest version in order to increment it, this would have to happen solely during integration.

This means that an integrator is free to not only produce final outputs, but also to communicate and gather information (unrelated to validation and extraction) in order to make its final decision. An integrator is always assumed to be right, so no validation is ever required here, nor serialisation, which in most cases should converge into plain file-copying and persistence of data within each Instance and/or Context.

Why would it be up to the Integrator to acquire the data (e.g. the current highest version) as opposed to the Collector?

This would also limit Validations (eg. for versioning) like this: https://github.com/mkolar/pyblish-kredenc/blob/master/plugins/common/validate_version_number.py

I feel it might be nice to have the Selector provide data about the current highest published version of the asset. I was thinking about having an Integrator ordered -0.1 that is toggled off by default for Increment Version. Only if this is toggled on will it Incrementally Publish. It's up to the artist to ensure the changes he made won't break anything. What do you think?


Either way. I would love to see a simple pseudocode example on what the Collector does, what the Extractor does and what the Integrator does.

mottosso commented on July 4, 2024

Why would it be up to the Integrator to acquire the data (eg. about the current highest version) as opposed to the Collector?

Because it isn't related to the quality of what you are outputting. If a version on disk is faulty, then that is a fault carried over from a previous publish.

Either way. I would love to see a simple pseudocode example on what the Collector does, what the Extractor does and what the Integrator does.

Sure, I'll have a look at this.

mottosso commented on July 4, 2024

Either way. I would love to see a simple pseudocode example on what the Collector does, what the Extractor does and what the Integrator does.

I've mocked up an example for you here.
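The linked gist isn't reproduced here, but a rough stdlib-only sketch of the three roles might look like the following. All names, the "model" family, and the `commitDir` key are hypothetical stand-ins, not the actual pyblish-magenta implementation:

```python
import shutil
import tempfile
from pathlib import Path

def collect(scene):
    """Collector: find instances in the work scene; no file I/O yet."""
    return [{"name": name, "family": "model"} for name in scene["nodes"]]

def extract(instance, context):
    """Extractor: serialise into a shared temporary staging directory,
    without knowing or caring where the data will finally live."""
    staging = context.setdefault("commitDir", tempfile.mkdtemp())
    path = Path(staging) / (instance["name"] + ".mb")
    path.write_text("<binary placeholder>")  # stands in for a real export
    instance["commitDir"] = staging

def integrate(instance, publish_dir, version):
    """Integrator: move staged files to their final, versioned home."""
    src = Path(instance["commitDir"]) / (instance["name"] + ".mb")
    dst = Path(publish_dir) / "{0}_v{1:03d}.mb".format(instance["name"], version)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(str(src), str(dst))
    return dst
```

The point of the split: only `integrate` ever touches the published location, so only it needs knowledge of versioning and pipeline layout.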

mottosso commented on July 4, 2024

I feel it might be nice to have the Selector provide data about the current highest published version of the asset. I was thinking about having an Integrator ordered -0.1 that is toggled off by default for Increment Version. Only if this is toggled on will it Incrementally Publish. It's up to the artist to ensure the changes he made won't break anything. What do you think?

It would be nice and convenient, but also break encapsulation. Think about it. That data doesn't need validation, it has already been saved to disk. The damage is already done.

Furthermore, that data isn't part of what an artist has produced, it's part of what previous Integrators have produced. If anyone should be warned about an invalid version or bad naming convention on already written files, it should be the developer who produced the integrator.

BigRoy commented on July 4, 2024

It would be nice and convenient, but also break encapsulation. Think about it. That data doesn't need validation, it has already been saved to disk. The damage is already done.

This isn't correct. The damage wouldn't have been done if the Validator had caught it before Extraction. Plus, it wouldn't even be in the "damaging" position if it were validated after Extraction; it would only be stored in the temporary location.

I think it's not that we're validating whether previous extractions went alright, but whether the version we are integrating now is up to par with our requirements.

Though, as you state, it's definitely not up to the artist to decide where it goes, unless there's user-defined data that influences "as what type of data it gets extracted". A good example could be publishing shader variations (which we do a lot in our pipeline). For example, we build a red, blue and yellow bottle of wine. Each individual variation (for a single asset) could be validated as to whether it's named correctly, whether it already exists, etc. The point being: when a user can interact with data that influences Integration, we want it validated, because it's prone to human error.

But I think it's good to see where the ship leads us if we keep it purely implemented in Integration.


I've mocked up an example for you here.
https://gist.github.com/mottosso/863e97d6f9d08a0d9eee

Some questions that come to mind:

  1. How do we let the Extractors extract to the correct temporary location without having to redesign Extractors per pipeline? Should we add a Selector that sets up extractDir data? (Or an Extractor that is ordered -0.1, whatever makes more sense). Do we let multiple Extractors extract to the same directory? If so, what do we do on naming conflicts? Or how do we ensure there are no naming conflicts?
  2. What data do we provide so that the integrator knows how to rename a file in the end? This is partially dependent on the structure for how we want files to be integrated. Do we smash it all into a single published folder for an asset?

mottosso commented on July 4, 2024

This isn't correct. The damage wouldn't have been done if the Validator catches it before Extraction.

Are we talking about looking at existing files on disk, and validating whether those files are valid, during the publish of a new file?

Here's what I'm hearing.

MyAsset
├── publish
│   ├── myasset_v001.ma
│   ├── myasset_v002.ma
│   └── myasset_v003.ma
└── dev

When we're about to publish MyAsset once more, it would then create myasset_v004.ma.

You would like to (1) include myasset_v001-3.ma during collection of MyAsset, and (2) validate these versions? I'm sure this isn't what you mean.

A good example could be publishing shader variations (which we do a lot in our pipeline). For example we build a red, blue and yellow bottle of wine.

I guarantee you that there is a better way to solve this exact thing which doesn't involve integration to be validated.

I invite you to produce this asset in the \Pyblish_sandbox\magenta directory and I'll gladly walk you through how this can happen without complicating integration.

How do we let the Extractors extract to the correct temporary location without having to redesign Extractors per pipeline? Do we let multiple Extractors extract to the same directory? If so, what do we do on naming conflicts? Or how do we ensure there are no naming conflicts?

Yes, that's right, multiple extractors write to the same directory; that's what this is doing. The directory is a generic staging area. Each extractor could create its own little subdirectory if needed, but in general the data each extractor produces should be unique enough not to need that.

The way I handled this in Napoleon was to create one subdirectory per family, and typically I only extracted a single family via a single extractor.
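That per-family layout can be sketched as follows, with `family_dir` as a hypothetical helper name rather than anything from Napoleon itself:

```python
import os

def family_dir(staging, family):
    """Return (and create if needed) one subdirectory per family
    inside the shared staging area, which sidesteps naming conflicts
    between extractors writing to the same staging directory."""
    path = os.path.join(staging, family)
    os.makedirs(path, exist_ok=True)
    return path
```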

What data do we provide so that the integrator knows how to rename a file in the end?

It depends on what file we're talking about.

Let's take the model from ben in The Deal as an example. Ben is extracted as e.g. ben.mb, his parent (temporary) directory is stored in his instance as e.g. commitDir.

/tmp
└── ben.mb

In this case, an integrator with support for model families would come to expect models to be stored in this manner, a name and a suffix, and could simply move this exact file into the appropriate directory and give it an appropriate name.

In case a playblast and a gif are also present...

/tmp
├── ben.mov
├── ben.gif
└── ben.mb

The integrator will now need to support gifs and playblasts to properly manage their final locations; once it does, it will know what to do with files in whichever format they are expected to reside, for example by making the distinction based on their suffix.

So you see, there needs to be an interplay between extractors and integrators: an "API" or "contract" to which both have agreed. Any extractor going rogue and producing things an integrator isn't expecting will simply not get integrated. No harm done.
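One way such a contract could be expressed is a suffix-to-destination table on the integrator's side. The table below is a hypothetical example; the real agreement between extractors and integrators is pipeline-specific:

```python
from pathlib import Path

# Hypothetical contract: which final subdirectory each known suffix maps to.
DESTINATIONS = {".mb": "model", ".mov": "playblast", ".gif": "preview"}

def route(staged_file):
    """Return the final subdirectory for a staged file, or None for
    files the integrator doesn't recognise; unrecognised files are
    simply skipped rather than integrated, so a rogue extractor
    causes no harm."""
    return DESTINATIONS.get(Path(staged_file).suffix)
```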

BigRoy commented on July 4, 2024

So much has changed since this discussion that I'm not even sure how to "relate" it to the current state of Magenta. If this is still relevant, I think it would be great to briefly outline exactly what we need to fix or add; otherwise, let's close the discussion.
