
mottosso commented on July 4, 2024

It's a broad topic; here are some broad thoughts.

The original intent of extraction was always to perform serialisation only, and not involve itself with location or interaction with databases. In this case, extracting into a temporary directory and passing this directory on to integration is exactly aligned with this.

Integration then is the complete opposite. It doesn't generate any data of its own, but merely "mediates" the data and aligns it with the overall pipeline.

Where Collection represents the "input" of a processing graph, Integration is the "output". In between, data may "fan out" and become divided into smaller tasks, but in the end it must all pass through integration, i.e. "fan in", if the content is ever to see the light of day.

We also want to implement versioning. So that could be additional required data.

Canonically, no process should ever know about existing assets or the state of existing assets until it comes to integration. In the case of versioning, which requires knowledge about which is the currently highest version in order to increment it, this would have to happen solely during integration.

This means that an integrator is free to not only produce final outputs, but also to communicate and gather information (unrelated to validation and extraction) in order to make its final decision. An integrator is always assumed to be right, so no validation is ever required here, nor serialisation, which in most cases should converge into plain file-copying and persistence of data within each Instance and/or Context.
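As a concrete sketch of versioning living solely inside integration, an integrator-side helper could scan the publish directory at publish time. This is a hypothetical, stdlib-only illustration that assumes a `*_v###.*` naming convention; it is not Pyblish's actual API:

```python
import re
from pathlib import Path

def next_version(publish_dir):
    """Return the next version number for an asset's publish directory.

    Assumes already-published files are named like ``myasset_v001.ma``;
    returns 1 when nothing has been published yet. Intended to run only
    during integration, so no earlier stage needs to know about disk state.
    """
    pattern = re.compile(r"_v(\d+)\.")
    versions = [
        int(match.group(1))
        for path in Path(publish_dir).iterdir()
        if (match := pattern.search(path.name))
    ]
    return max(versions, default=0) + 1
```

Because the scan happens at the last possible moment, two publishes racing each other is the only remaining hazard; collection and validation stay entirely ignorant of existing assets.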

Though this will always override it for any instance that has been Collected, which might be more annoying than what we gain from removing this duplication in code.

Not sure what you mean here, but if you mean that the first instance will create a temporary directory, whereas subsequent instances would be written to an already existing temporary directory, then that's perfectly fine and intended.

The temporary directory is much like Git's "staging area" in that it holds an arbitrary amount of information, but does so temporarily until it all is converged, or integrated, with the rest of the data.
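A minimal sketch of such a shared staging area, assuming the context behaves like a dictionary and `commitDir` is the agreed-upon key (both are assumptions for illustration, not Pyblish's actual API):

```python
import tempfile

def commit_dir(context):
    """Return the context's shared staging directory, creating it on
    first use. Subsequent extractors reuse the same directory, much
    like files accumulating in Git's staging area before a commit.
    """
    if "commitDir" not in context:
        context["commitDir"] = tempfile.mkdtemp(prefix="pyblish_")
    return context["commitDir"]
```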

from pyblish-magenta.

BigRoy commented on July 4, 2024

Canonically, no process should ever know about existing assets or the state of existing assets until it comes to integration. In the case of versioning, which requires knowledge about which is the currently highest version in order to increment it, this would have to happen solely during integration.

This means that an integrator is free to not only produce final outputs, but also to communicate and gather information (unrelated to validation and extraction) in order to make its final decision. An integrator is always assumed to be right, so no validation is ever required here, nor serialisation, which in most cases should converge into plain file-copying and persistence of data within each Instance and/or Context.

Why would it be up to the Integrator to acquire the data (e.g. the current highest version) as opposed to the Collector?

This would also limit Validations (eg. for versioning) like this: https://github.com/mkolar/pyblish-kredenc/blob/master/plugins/common/validate_version_number.py

I feel it might be nice to have the Selector provide data about the current highest published version of the asset. I was thinking about having an Integrator ordered -0.1 that is toggled off by default for Increment Version. Only if this is toggled on will it Incrementally Publish. It's up to the artist to ensure the changes he made won't break anything. What do you think?


Either way. I would love to see a simple pseudocode example on what the Collector does, what the Extractor does and what the Integrator does.

mottosso commented on July 4, 2024

Why would it be up to the Integrator to acquire the data (eg. about the current highest version) as opposed to the Collector?

Because it isn't related to the quality of what you are outputting. If a version on disk is faulty, then that is a fault carried over from a previous publish.

Either way. I would love to see a simple pseudocode example on what the Collector does, what the Extractor does and what the Integrator does.

Sure, I'll have a look at this.

mottosso commented on July 4, 2024

Either way. I would love to see a simple pseudocode example on what the Collector does, what the Extractor does and what the Integrator does.

I've mocked up an example for you here.
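The linked gist isn't reproduced here, but a rough stdlib-only sketch of the three roles might look like the following. All names, the "model" family, and the `commitDir` key are hypothetical stand-ins, not the actual pyblish-magenta implementation:

```python
import shutil
import tempfile
from pathlib import Path

def collect(scene):
    """Collector: find instances in the work scene; no file I/O yet."""
    return [{"name": name, "family": "model"} for name in scene["nodes"]]

def extract(instance, context):
    """Extractor: serialise into a shared temporary staging directory,
    without knowing or caring where the data will finally live."""
    staging = context.setdefault("commitDir", tempfile.mkdtemp())
    path = Path(staging) / (instance["name"] + ".mb")
    path.write_text("<binary placeholder>")  # stands in for a real export
    instance["commitDir"] = staging

def integrate(instance, publish_dir, version):
    """Integrator: move staged files to their final, versioned home."""
    src = Path(instance["commitDir"]) / (instance["name"] + ".mb")
    dst = Path(publish_dir) / "{0}_v{1:03d}.mb".format(instance["name"], version)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(str(src), str(dst))
    return dst
```

The point of the split: only `integrate` ever touches the published location, so only it needs knowledge of versioning and pipeline layout.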

mottosso commented on July 4, 2024

I feel it might be nice to have the Selector provide data about the current highest published version of the asset. I was thinking about having an Integrator ordered -0.1 that is toggled off by default for Increment Version. Only if this is toggled on will it Incrementally Publish. It's up to the artist to ensure the changes he made won't break anything. What do you think?

It would be nice and convenient, but also break encapsulation. Think about it. That data doesn't need validation, it has already been saved to disk. The damage is already done.

Furthermore, that data isn't part of what an artist has produced, it's part of what previous Integrators have produced. If anyone should be warned about an invalid version or bad naming convention on already written files, it should be the developer who produced the integrator.

BigRoy commented on July 4, 2024

It would be nice and convenient, but also break encapsulation. Think about it. That data doesn't need validation, it has already been saved to disk. The damage is already done.

This isn't correct. The damage wouldn't have been done if the Validator had caught it before Extraction. Plus, it wouldn't even be in the "damaging" position if it were validated after Extraction; it would only be stored in the temporary location.

I think it's not that we're validating whether previous extractions went alright, but whether the version we are integrating now is up to par with our requirements.

Though, as you state, it's definitely not up to the artist to decide where it goes, unless there's user-defined data that influences "as what type of data it gets extracted". A good example could be publishing shader variations (which we do a lot in our pipeline). For example, we build a red, blue and yellow bottle of wine. Each individual variation (for a single asset) could be validated as to whether it's named correctly, whether it already exists, etc. The point being: when a user can interact with data that influences Integration, we want it validated, because it's prone to human error.

But I think it's good to see where the ship leads us if we keep it purely implemented in Integration.


I've mocked up an example for you here.
https://gist.github.com/mottosso/863e97d6f9d08a0d9eee

Some questions that come to mind:

  1. How do we let the Extractors extract to the correct temporary location without having to redesign Extractors per pipeline? Should we add a Selector that sets up extractDir data? (Or an Extractor that is ordered -0.1, whatever makes more sense). Do we let multiple Extractors extract to the same directory? If so, what do we do on naming conflicts? Or how do we ensure there are no naming conflicts?
  2. What data do we provide so that the integrator knows how to rename a file in the end? This is partially dependent on the structure for how we want files to be integrated. Do we smash it all into a single published folder for an asset?

mottosso commented on July 4, 2024

This isn't correct. The damage wouldn't have been done if the Validator catches it before Extraction.

Are we talking about looking at existing files on disk, and validating whether those files are valid, during the publish of a new file?

Here's what I'm hearing.

MyAsset
├── publish
│   ├── myasset_v001.ma
│   ├── myasset_v002.ma
│   └── myasset_v003.ma
└── dev

When we're about to publish MyAsset once more, it would then create myasset_v004.ma.

You would like to (1) include myasset_v001-3.ma during collection of MyAsset, and (2) validate these versions? I'm sure this isn't what you mean.

A good example could be publishing shader variations (which we do a lot in our pipeline). For example we build a red, blue and yellow bottle of wine.

I guarantee you that there is a better way to solve this exact thing which doesn't involve integration to be validated.

I invite you to produce this asset in the \Pyblish_sandbox\magenta directory and I'll gladly walk you through how this can happen without complicating integration.

How do we let the Extractors extract to the correct temporary location without having to redesign Extractors per pipeline? Do we let multiple Extractors extract to the same directory? If so, what do we do on naming conflicts? Or how do we ensure there are no naming conflicts?

Yes, that's right, multiple extractors write to the same directory; that's what this is doing. The directory is a generic staging area. Each extractor could create its own little subdirectory if needed, but in general the data each extractor produces should be unique enough not to need that.

The way I handled this in Napoleon was to create one subdirectory per family, and typically I only extracted a single family via a single extractor.
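That per-family layout can be sketched as follows, with `family_dir` as a hypothetical helper name rather than anything from Napoleon itself:

```python
import os

def family_dir(staging, family):
    """Return (and create if needed) one subdirectory per family
    inside the shared staging area, which sidesteps naming conflicts
    between extractors writing to the same staging directory."""
    path = os.path.join(staging, family)
    os.makedirs(path, exist_ok=True)
    return path
```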

What data do we provide so that the integrator knows how to rename a file in the end?

It depends on what file we're talking about.

Let's take the model from ben in The Deal as an example. Ben is extracted as e.g. ben.mb, his parent (temporary) directory is stored in his instance as e.g. commitDir.

/tmp
└── ben.mb

In this case, an integrator with support for model families would come to expect models to be stored in this manner, a name and a suffix, and could simply move this exact file into the appropriate directory and give it an appropriate name.

In case a playblast and a gif are also present...

/tmp
├── ben.mov
├── ben.gif
└── ben.mb

The integrator will now need to support gifs and playblasts to properly manage their final locations; once it does, it will know what to do with files in whichever format they are expected to reside, for example by making the distinction based on their suffix.

So you see, there needs to be an interplay between extractors and integrators: an "API" or "contract" to which both have agreed. Any extractor going rogue and producing things an integrator isn't expecting will simply not get integrated. No harm done.
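One way such a contract could be expressed is a suffix-to-destination table on the integrator's side. The table below is a hypothetical example; the real agreement between extractors and integrators is pipeline-specific:

```python
from pathlib import Path

# Hypothetical contract: which final subdirectory each known suffix maps to.
DESTINATIONS = {".mb": "model", ".mov": "playblast", ".gif": "preview"}

def route(staged_file):
    """Return the final subdirectory for a staged file, or None for
    files the integrator doesn't recognise; unrecognised files are
    simply skipped rather than integrated, so a rogue extractor
    causes no harm."""
    return DESTINATIONS.get(Path(staged_file).suffix)
```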

BigRoy commented on July 4, 2024

So much has changed since this discussion that I'm not even sure how to "relate" it to the current state of Magenta. If this is still relevant, I think it would be great to briefly outline exactly what we need to fix or add; otherwise, let's close the discussion.
