RFC for Model Versioning across all PyTorch Domain libraries
(Opening a new issue so as not to mix it with the ongoing discussion.)
I feel that separating the `dataclass` and the `Enum` makes the code more readable.
```python
@dataclass
class ResNet50Weights(Enum):
    ImageNet1K_RefV1 = (
        "https://path/to/weights.pth",              # url: weights URL/path
        partial(ImageNetPreprocessing, width=224),  # transforms: preprocessing transform constructor
        {"num_classes": 1000, "Acc@1": 76.130, "classes": [...]},  # meta: arbitrary metadata
        # Other customizable fields go here
    )
```
```python
@dataclass
class ResNet50Config:
    url: str
    transforms: Any
    meta: Dict[str, Any]
    latest: bool


class ResNet50Weights(Enum):
    ImageNet1K_RefV1 = ResNet50Config(...)
    ...
```
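To make the comparison concrete, here is a minimal self-contained sketch of the separated design (the field set follows the snippets above; `frozen=True` is my addition so the config values stay immutable and hashable):

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class ResNet50Config:
    url: str
    transforms: Optional[Callable] = None
    meta: Dict[str, Any] = field(default_factory=dict)


class ResNet50Weights(Enum):
    ImageNet1K_RefV1 = ResNet50Config(
        url="https://path/to/weights.pth",
        meta={"num_classes": 1000, "Acc@1": 76.130},
    )


# The Enum provides the canonical names; the dataclass provides typed fields.
weights = ResNet50Weights.ImageNet1K_RefV1
print(weights.value.url)                  # https://path/to/weights.pth
print(weights.value.meta["num_classes"])  # 1000
```

One nice side effect of the split: the config type can be reused across architectures, while each architecture keeps its own enum of named checkpoints.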
Overall, it looks good. Introducing the dedicated class and adding the preprocessing transform feel like major improvements.
The following are my thoughts from reading through the README and examples.
- The `Weights` class? I think the proposition can also be an idiom of best practice for flexibility, which different projects can implement without having a parent class. The `check_type` method is nice, but the check is simple enough (`isinstance`) to perform inline. Also, what do you think of allowing the behavior to be customized with duck typing, say, for in-house models?
- `latest: bool` feels a bit tedious for maintenance. When I add a latest model, I would like to be care-free about the previously-latest model. Since only one of them is supposed to be `True`, how about simply making it a hard-coded class attribute?
- `latest: bool` might not generalize well to audio. Say I have a model architecture for speech recognition: I can have multiple SOTA/latest models because of language / expected environments. (Note it is common to train and deploy multiple models with the same architecture, so that one is optimized for meeting-room dictation, another is optimized for phone conversation, etc.)
- The name `Weights` feels slightly off, because it not only contains weights but also defines the preprocessing operations (`transforms`).
- The `state_dict` method should accept `**kwargs`, which will be passed to `load_state_dict_from_url`, so that the downloading process can be customized (i.e. download location and such).
- `partial` is used for the definition of `transforms`. I think this is good, because it will not instantiate an object. But every maintainer needs to be careful not to accidentally instantiate a transform here, and maintainers have to remember it at code review.

One of the dominant scenarios for text is to use some pre-trained encoder (Roberta, BERT, XLMR, etc.) and attach a task-specific head on top of it (classification head, language-modeling head, POS-tagging head, Q&A head, etc.). I believe this is also true for vision (as well as for audio, @mthrok?). To the best of my knowledge (please correct me if I am mistaken), vision currently provides a factory function for every possible combination thereof? This approach is somewhat limiting in terms of scalability and the boilerplate-code overhead that comes with it. Also, versioning could be a bit redundant if we replicate the same weights class across each combination for the encoder part.
I wonder what folks think about extending this framework to support model composition?
As a reference, HF also explicitly provides classes for every combination. Here is one example for a Roberta encoder + Q&A task.
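One possible direction (purely illustrative; the enum entries, factory signature, and stubbed encoder below are my assumptions, not part of the RFC) is a single factory keyed on an encoder weights enum plus a head, instead of one factory per encoder/head combination:

```python
from enum import Enum
from typing import Callable


class RobertaEncoderWeights(Enum):
    # Hypothetical entries; in the RFC each value would carry url/transforms/meta.
    Base_RefV1 = "roberta.base"
    Large_RefV1 = "roberta.large"


def build_model(encoder_weights: RobertaEncoderWeights, head: Callable) -> Callable:
    """One factory for every encoder/head pairing (encoder stubbed for the sketch)."""
    def encoder(tokens):
        # Real code would load encoder_weights and run the encoder here.
        return [float(t) for t in tokens]
    return lambda tokens: head(encoder(tokens))


# A Q&A head, a classification head, etc. all go through the same factory:
qa_model = build_model(RobertaEncoderWeights.Base_RefV1, head=sum)
print(qa_model([1, 2, 3]))  # 6.0
```

This keeps the weights enum versioned once for the encoder, and heads compose on top rather than multiplying the number of factories.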
One of the common cases in text is to define a base model architecture and create bigger versions just by increasing the number of parameters in terms of number of layers, hidden dimensions, etc. Take the XLMR model for instance. There are four variations of the model, dubbed "xlmr.base", "xlmr.large", "xlmr.xl", and "xlmr.xxl".
One way to provide these models to users is to have 4 different factory functions, one for each. But the code is highly redundant, since the only difference here is the input configuration. One of the better ways would be to encode this information directly inside the Weights Enum, such that the user-facing function only needs to specify which weights to use, and internally the model factory function will create the corresponding architecture for the user.
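A minimal sketch of that idea (the config fields, URLs, and the `xlmr` factory returning a plain dict are illustrative stand-ins; a real factory would build the `nn.Module`):

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class XLMRWeightsConfig:
    url: str
    num_layers: int
    hidden_dim: int


class XLMRWeights(Enum):
    # Placeholder URLs; the architecture hyper-parameters ride along with the weights.
    Base = XLMRWeightsConfig("https://path/to/xlmr.base.pth", num_layers=12, hidden_dim=768)
    Large = XLMRWeightsConfig("https://path/to/xlmr.large.pth", num_layers=24, hidden_dim=1024)


def xlmr(weights: XLMRWeights):
    # A single factory reads the variant's configuration straight from the enum.
    cfg = weights.value
    return {"num_layers": cfg.num_layers, "hidden_dim": cfg.hidden_dim}


print(xlmr(XLMRWeights.Large))  # {'num_layers': 24, 'hidden_dim': 1024}
```

The user only ever names the weights; the factory derives the architecture from them.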
I wonder if, conceptually, the `meta` argument is the right place to specify model configuration, or whether it is reserved only for informative attributes?
A question on this weights API and its extension to other domains. For example, in medical AI people can train resnet50 on their medical datasets and probably would like to have something like
```python
@dataclass
class ResNet50Weights(Enum):
    MedNist_RefV1 = (
        "...",  # Weights URL/path
        # Other customizable fields go here
    )
```
without any imagenet or cifar10 weights.
A rather simple suggestion to such users could be to create something new like:
```python
@dataclass
class MedResNet50Weights(Enum):
    MedNist_RefV1 = (
        "...",  # Weights URL/path
        # Other customizable fields go here
    )
```
without any relationship to the implemented `ResNet50Weights` referencing ImageNet/CIFAR10.
@datumbox what are your thoughts on that?
Say I want to train from scratch, while reusing all the components from the past training configuration (the transforms, and the architecture with its model hyper-parameters).
I assume this is doable if I manually pass the correct configuration to one of the factory functions.
Is there an easy way to do this?
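One way this could look (a sketch under assumptions: the config shape mirrors the dataclass proposal above, and `Scale`/`resnet50` are hypothetical stand-ins for a real transform and factory):

```python
from dataclasses import dataclass
from enum import Enum
from functools import partial
from typing import Any, Callable, Dict


class Scale:
    """Toy transform standing in for e.g. ImageNetPreprocessing."""
    def __init__(self, factor: int):
        self.factor = factor

    def __call__(self, x):
        return x * self.factor


@dataclass(frozen=True)
class WeightsConfig:
    url: str
    transforms: Callable  # constructor, e.g. a functools.partial
    meta: Dict[str, Any]


class ResNet50Weights(Enum):
    ImageNet1K_RefV1 = WeightsConfig(
        url="https://path/to/weights.pth",
        transforms=partial(Scale, factor=2),
        meta={"num_classes": 1000},
    )


# Reuse the recipe without loading the checkpoint:
recipe = ResNet50Weights.ImageNet1K_RefV1.value
preprocess = recipe.transforms()          # same transforms as the reference training
num_classes = recipe.meta["num_classes"]  # same architecture hyper-parameters
# ...then call the factory with pretrained weights disabled, e.g.:
# model = resnet50(weights=None, num_classes=num_classes)  # hypothetical factory
print(preprocess(3), num_classes)  # 6 1000
```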
Taking text as an example, the transforms may need to download metadata for their construction. For example, the XLMR transform requires an spm tokenizer model and the corresponding vocab to create the XLMR preset transform. What is the recommended way to store those URLs? Should we create a callable that is generic, i.e. accepts URLs to the spm model and vocab, and create a partial function inside each Enum object with dedicated URLs for the corresponding model's preset transform? This way the initialization could be lazy.
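A sketch of that lazy pattern (the class name, constructor parameters, and URLs are illustrative; real code would download and load the sentencepiece model and vocab inside `__init__`):

```python
from functools import partial


class XLMRTransform:
    """Generic text transform; artifact URLs are injected, not hard-coded."""

    def __init__(self, spm_model_url: str, vocab_url: str):
        # Real code would fetch and load the artifacts here (lazily, on first use).
        self.spm_model_url = spm_model_url
        self.vocab_url = vocab_url

    def __call__(self, text: str):
        return text.split()  # stand-in for sentencepiece tokenization


# Inside each Enum entry, a partial keeps construction (and downloads) deferred:
transforms = partial(
    XLMRTransform,
    spm_model_url="https://path/to/spm.model",
    vocab_url="https://path/to/vocab.pt",
)

tokenizer = transforms()  # nothing is fetched until this point
print(tokenizer("hello world"))  # ['hello', 'world']
```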
Following point 3,
"Code change which affects the model behaviour but architecture remains the same (BC-breaking)"
and
The Weights data class which stores crucial information about the pre-trained weights.
I wonder if the weights info should be more explicit, and even warn the user or raise an error if the versions of the user's software packages (pytorch, torchvision, etc.) do not satisfy the conditions for reusing the weights?
By conditions I mean, weight info could explicitly provide that, for example, ImageNet weights are possible to use with
pytorch >= 1.5,<=2.0 and torchvision>=0.8,<=1.0
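One illustrative way to do this (the `requires` key in `meta`, the version-tuple encoding, and the checker function are all assumptions for the sketch):

```python
import warnings
from typing import Dict, Tuple

Version = Tuple[int, int]


def check_requirements(meta: dict, installed: Dict[str, Version]) -> bool:
    """Warn if installed package versions fall outside the weights' supported range."""
    ok = True
    for pkg, (lo, hi) in meta.get("requires", {}).items():
        version = installed.get(pkg)
        if version is None or not (lo <= version <= hi):
            warnings.warn(f"{pkg}=={version} is outside supported range [{lo}, {hi}]")
            ok = False
    return ok


# Encodes: pytorch>=1.5,<=2.0 and torchvision>=0.8,<=1.0
meta = {"requires": {"torch": ((1, 5), (2, 0)), "torchvision": ((0, 8), (1, 0))}}
print(check_requirements(meta, {"torch": (1, 9), "torchvision": (0, 10)}))  # True
print(check_requirements(meta, {"torch": (1, 4), "torchvision": (0, 10)}))  # False (warns)
```

Keeping the constraint in `meta` would let each weights entry carry its own compatibility window without changing the public API.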
I share the concern @mthrok raised in #1.
It is not uncommon to have post-processing, or more precisely decoding schemes, in text. This is typically the case when dealing with some kind of text-generation task (translation, summarization, etc.) where we need to convert predicted token ids into the corresponding tokens. I wonder what the recommended way of doing this is.
Could there be value in encapsulating this inside a transform class whose `__call__` method implements pre-processing, and which has a dedicated method to perform decoding/post-processing?
Also, in the case of translation, the transforms may not be fixed w.r.t. the model but may require some configuration input from the user. For example, we now have universal models that can convert back and forth among 100 different languages. But when it comes to transforms, the user would need to explicitly specify which language pair they want to work with, so that the corresponding encoding/decoding schemes can be instantiated. My understanding so far is that these transforms are static w.r.t. the corresponding model. If so, in what way can the proposed API be extended to accommodate user-configurable transforms?
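Both questions could be answered by one transform class: `__call__` does pre-processing, a `decode` method does post-processing, and user configuration (the language pair) is taken at construction time. A toy sketch (the class name, toy vocabulary, and constructor parameters are assumptions, not a proposed API):

```python
class TranslationTransform:
    """Bundles pre- and post-processing; configured per language pair."""

    def __init__(self, src_lang: str, tgt_lang: str):
        self.src_lang = src_lang
        self.tgt_lang = tgt_lang
        # Toy vocabulary standing in for per-language sentencepiece models.
        self._vocab = {"hello": 1, "world": 2}
        self._inv = {i: t for t, i in self._vocab.items()}

    def __call__(self, text: str):
        """Pre-processing: text -> token ids."""
        return [self._vocab[t] for t in text.split()]

    def decode(self, ids):
        """Post-processing: token ids -> text."""
        return " ".join(self._inv[i] for i in ids)


t = TranslationTransform(src_lang="en", tgt_lang="de")
ids = t("hello world")
print(ids)            # [1, 2]
print(t.decode(ids))  # hello world
```

Under this shape, the Weights entry could still hold a `partial` over the class, with the language pair left for the user to fill in at construction.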