RFC for Model Versioning across all PyTorch Domain libraries
(Opening a new issue so as not to mix it with the ongoing discussion.)
I feel that separating the `dataclass` and the `Enum` makes the code more readable.
```python
@dataclass
class ResNet50Weights(Enum):
    ImageNet1K_RefV1 = (
        "https://path/to/weights.pth",              # url: weights URL/path
        partial(ImageNetPreprocessing, width=224),  # transforms: preprocessing transform constructor
        {"num_classes": 1000, "Acc@1": 76.130, "classes": [...]},  # meta: arbitrary metadata
        # Other customizable fields go here
    )
```
```python
@dataclass
class ResNet50Config:
    url: str
    transforms: Any
    meta: Dict[str, Any]
    latest: bool


class ResNet50Weights(Enum):
    ImageNet1K_RefV1 = ResNet50Config(...)
    ...
```
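To make the comparison concrete, here is a minimal self-contained sketch of the separated design (the field set follows the snippets above; `frozen=True` is my addition so the config values stay immutable and hashable):

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class ResNet50Config:
    url: str
    transforms: Optional[Callable] = None
    meta: Dict[str, Any] = field(default_factory=dict)


class ResNet50Weights(Enum):
    ImageNet1K_RefV1 = ResNet50Config(
        url="https://path/to/weights.pth",
        meta={"num_classes": 1000, "Acc@1": 76.130},
    )


# The Enum provides the canonical names; the dataclass provides typed fields.
weights = ResNet50Weights.ImageNet1K_RefV1
print(weights.value.url)                  # https://path/to/weights.pth
print(weights.value.meta["num_classes"])  # 1000
```

One nice side effect of the split: the config type can be reused across architectures, while each architecture keeps its own enum of named checkpoints.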
Overall, it looks good. Introducing the dedicated class and adding the preprocessing transform feel like major improvements.
The following are my thoughts from reading through the README and examples.
- The `Weights` class? I think the proposition can also be an idiom of best practice for flexibility, which different projects can implement without having a parent class. The `check_type` method is nice, but the check is simple enough (`isinstance`) to perform inline. Also, what do you think of allowing the behavior to be customized with duck typing, say, for in-house models?
- `latest: bool` feels a bit tedious for maintenance. When I add a latest model, I would like to be care-free about the previously-latest model. Since only one of them is supposed to be `True`, how about simply making it a hard-coded class attribute?
- `latest: bool` might not generalize well to audio. Say I have a model architecture for speech recognition: I can have multiple SOTA/latest models because of language / expected environments. (Note it is common to train and deploy multiple models with the same architecture, so that one is optimized for meeting-room dictation, another is optimized for phone conversation, etc.)
- The name `Weights` feels slightly off, because it not only contains weights but also defines the preprocessing operations (`transforms`).
- The `state_dict` method should accept `**kwargs`, which will be passed to `load_state_dict_from_url`, so that the downloading process can be customized (i.e. download location and such).
- `partial` is used for the definition of `transforms`. I think this is good, because it will not instantiate an object. But every maintainer needs to be careful not to accidentally instantiate a transform here, and maintainers have to remember it at code review.

One of the dominant scenarios for text is to use some pre-trained encoder (Roberta, BERT, XLMR, etc.) and attach a task-specific head on top of it (classification head, language-modeling head, POS-tagging head, Q&A head, etc.). I believe this is also true for vision (as well as for audio, @mthrok?). To the best of my knowledge (please correct me if I am mistaken), vision currently provides a factory function for every possible combination thereof? This approach is somewhat limiting in terms of scalability and the boilerplate-code overhead that comes with it. Also, versioning could be a bit redundant if we replicate the same weights class across each combination for the encoder part.
I wonder what folks think about extending this framework to support model composition?
As a reference, HF also explicitly provides classes for every combination. Here is one example for a Roberta encoder + Q&A task.
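One possible direction (purely illustrative; the enum entries, factory signature, and stubbed encoder below are my assumptions, not part of the RFC) is a single factory keyed on an encoder weights enum plus a head, instead of one factory per encoder/head combination:

```python
from enum import Enum
from typing import Callable


class RobertaEncoderWeights(Enum):
    # Hypothetical entries; in the RFC each value would carry url/transforms/meta.
    Base_RefV1 = "roberta.base"
    Large_RefV1 = "roberta.large"


def build_model(encoder_weights: RobertaEncoderWeights, head: Callable) -> Callable:
    """One factory for every encoder/head pairing (encoder stubbed for the sketch)."""
    def encoder(tokens):
        # Real code would load encoder_weights and run the encoder here.
        return [float(t) for t in tokens]
    return lambda tokens: head(encoder(tokens))


# A Q&A head, a classification head, etc. all go through the same factory:
qa_model = build_model(RobertaEncoderWeights.Base_RefV1, head=sum)
print(qa_model([1, 2, 3]))  # 6.0
```

This keeps the weights enum versioned once for the encoder, and heads compose on top rather than multiplying the number of factories.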
One of the common cases in text is to define a base model architecture and create bigger versions just by increasing the number of parameters in terms of number of layers, hidden dimensions, etc. Take the XLMR model for instance. There are four variations of the model, dubbed "xlmr.base", "xlmr.large", "xlmr.xl", and "xlmr.xxl".
One way to provide these models to users is to have 4 different factory functions, one for each. But the code is highly redundant, since the only difference here is the input configuration. One of the better ways would be to encode this information directly inside the Weights Enum, such that the user-facing function only needs to specify which weights to use, and internally the model factory function will create the corresponding architecture for the user.
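A minimal sketch of that idea (the config fields, URLs, and the `xlmr` factory returning a plain dict are illustrative stand-ins; a real factory would build the `nn.Module`):

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class XLMRWeightsConfig:
    url: str
    num_layers: int
    hidden_dim: int


class XLMRWeights(Enum):
    # Placeholder URLs; the architecture hyper-parameters ride along with the weights.
    Base = XLMRWeightsConfig("https://path/to/xlmr.base.pth", num_layers=12, hidden_dim=768)
    Large = XLMRWeightsConfig("https://path/to/xlmr.large.pth", num_layers=24, hidden_dim=1024)


def xlmr(weights: XLMRWeights):
    # A single factory reads the variant's configuration straight from the enum.
    cfg = weights.value
    return {"num_layers": cfg.num_layers, "hidden_dim": cfg.hidden_dim}


print(xlmr(XLMRWeights.Large))  # {'num_layers': 24, 'hidden_dim': 1024}
```

The user only ever names the weights; the factory derives the architecture from them.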
I wonder if, conceptually, the `meta` argument is the right place to specify model configuration, or whether it is reserved only for informative attributes?
A question on this weights API and its extension to other domains. For example, in medical AI people can train resnet50 on their medical datasets and probably would like to have something like
```python
@dataclass
class ResNet50Weights(Enum):
    MedNist_RefV1 = (
        "...",  # Weights URL/path
        # Other customizable fields go here
    )
```
without any imagenet or cifar10 weights.
A rather simple suggestion to such users could be to create something new like:
```python
@dataclass
class MedResNet50Weights(Enum):
    MedNist_RefV1 = (
        "...",  # Weights URL/path
        # Other customizable fields go here
    )
```
without any relationship to the implemented `ResNet50Weights` referencing ImageNet/CIFAR10.
@datumbox what are your thoughts on that?
Say I want to train from scratch, while reusing all the components from the past training configuration (the transforms, and the architecture with its model hyper-parameters).
I assume this is doable if I manually pass the correct configuration to one of the factory functions.
Is there an easy way to do this?
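One way this could look (a sketch under assumptions: the config shape mirrors the dataclass proposal above, and `Scale`/`resnet50` are hypothetical stand-ins for a real transform and factory):

```python
from dataclasses import dataclass
from enum import Enum
from functools import partial
from typing import Any, Callable, Dict


class Scale:
    """Toy transform standing in for e.g. ImageNetPreprocessing."""
    def __init__(self, factor: int):
        self.factor = factor

    def __call__(self, x):
        return x * self.factor


@dataclass(frozen=True)
class WeightsConfig:
    url: str
    transforms: Callable  # constructor, e.g. a functools.partial
    meta: Dict[str, Any]


class ResNet50Weights(Enum):
    ImageNet1K_RefV1 = WeightsConfig(
        url="https://path/to/weights.pth",
        transforms=partial(Scale, factor=2),
        meta={"num_classes": 1000},
    )


# Reuse the recipe without loading the checkpoint:
recipe = ResNet50Weights.ImageNet1K_RefV1.value
preprocess = recipe.transforms()          # same transforms as the reference training
num_classes = recipe.meta["num_classes"]  # same architecture hyper-parameters
# ...then call the factory with pretrained weights disabled, e.g.:
# model = resnet50(weights=None, num_classes=num_classes)  # hypothetical factory
print(preprocess(3), num_classes)  # 6 1000
```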
Taking text as an example, the transforms may need to download metadata for their construction. For example, the XLMR transform requires an spm tokenizer model and the corresponding vocab to create the XLMR preset transform. What is the recommended way to store those URLs? Should we create a callable that is generic, i.e. accepts URLs to the spm model and vocab, and create a partial function inside each Enum object with dedicated URLs for the corresponding model's preset transform? This way the initialization could be lazy.
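A sketch of that lazy pattern (the class name, constructor parameters, and URLs are illustrative; real code would download and load the sentencepiece model and vocab inside `__init__`):

```python
from functools import partial


class XLMRTransform:
    """Generic text transform; artifact URLs are injected, not hard-coded."""

    def __init__(self, spm_model_url: str, vocab_url: str):
        # Real code would fetch and load the artifacts here (lazily, on first use).
        self.spm_model_url = spm_model_url
        self.vocab_url = vocab_url

    def __call__(self, text: str):
        return text.split()  # stand-in for sentencepiece tokenization


# Inside each Enum entry, a partial keeps construction (and downloads) deferred:
transforms = partial(
    XLMRTransform,
    spm_model_url="https://path/to/spm.model",
    vocab_url="https://path/to/vocab.pt",
)

tokenizer = transforms()  # nothing is fetched until this point
print(tokenizer("hello world"))  # ['hello', 'world']
```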
Following point 3,
"Code change which affects the model behaviour but architecture remains the same (BC-breaking)"
and
The Weights data class which stores crucial information about the pre-trained weights.
I wonder if the weights info should be more explicit, and even warn the user or raise an error if the versions of the user's software packages (pytorch, torchvision, etc.) do not satisfy the conditions for reusing the weights?
By conditions I mean, weight info could explicitly provide that, for example, ImageNet weights are possible to use with
pytorch >= 1.5,<=2.0 and torchvision>=0.8,<=1.0
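One illustrative way to do this (the `requires` key in `meta`, the version-tuple encoding, and the checker function are all assumptions for the sketch):

```python
import warnings
from typing import Dict, Tuple

Version = Tuple[int, int]


def check_requirements(meta: dict, installed: Dict[str, Version]) -> bool:
    """Warn if installed package versions fall outside the weights' supported range."""
    ok = True
    for pkg, (lo, hi) in meta.get("requires", {}).items():
        version = installed.get(pkg)
        if version is None or not (lo <= version <= hi):
            warnings.warn(f"{pkg}=={version} is outside supported range [{lo}, {hi}]")
            ok = False
    return ok


# Encodes: pytorch>=1.5,<=2.0 and torchvision>=0.8,<=1.0
meta = {"requires": {"torch": ((1, 5), (2, 0)), "torchvision": ((0, 8), (1, 0))}}
print(check_requirements(meta, {"torch": (1, 9), "torchvision": (0, 10)}))  # True
print(check_requirements(meta, {"torch": (1, 4), "torchvision": (0, 10)}))  # False (warns)
```

Keeping the constraint in `meta` would let each weights entry carry its own compatibility window without changing the public API.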
I share the concern @mthrok raised in #1.
It is not uncommon to have post-processing, or more precisely decoding schemes, in text. This is typically the case when dealing with some kind of text-generation task (translation, summarization, etc.) where we need to convert predicted token ids into the corresponding tokens. I wonder what the recommended way of doing this is.
Could there be value in encapsulating this inside a transform class whose `__call__` method implements pre-processing, and which has a dedicated method to perform decoding/post-processing?
Also, in the case of translation, the transforms may not be fixed w.r.t. the model but may require some configuration input from the user. For example, we now have universal models that can convert back and forth among 100 different languages. But when it comes to transforms, the user would need to explicitly specify which language pair they want to work with, so that the corresponding encoding/decoding schemes can be instantiated. My understanding so far is that these transforms are static w.r.t. the corresponding model. If so, in what way can the proposed API be extended to accommodate user-configurable transforms?
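Both questions could be answered by one transform class: `__call__` does pre-processing, a `decode` method does post-processing, and user configuration (the language pair) is taken at construction time. A toy sketch (the class name, toy vocabulary, and constructor parameters are assumptions, not a proposed API):

```python
class TranslationTransform:
    """Bundles pre- and post-processing; configured per language pair."""

    def __init__(self, src_lang: str, tgt_lang: str):
        self.src_lang = src_lang
        self.tgt_lang = tgt_lang
        # Toy vocabulary standing in for per-language sentencepiece models.
        self._vocab = {"hello": 1, "world": 2}
        self._inv = {i: t for t, i in self._vocab.items()}

    def __call__(self, text: str):
        """Pre-processing: text -> token ids."""
        return [self._vocab[t] for t in text.split()]

    def decode(self, ids):
        """Post-processing: token ids -> text."""
        return " ".join(self._inv[i] for i in ids)


t = TranslationTransform(src_lang="en", tgt_lang="de")
ids = t("hello world")
print(ids)            # [1, 2]
print(t.decode(ids))  # hello world
```

Under this shape, the Weights entry could still hold a `partial` over the class, with the language pair left for the user to fill in at construction.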