Comments (10)
> Along those lines, I do think that the same people who would be interested in something like this would be interested in syntactic sugar for #504, i.e. something like `file["field"][]` returns the `NamedTuple` representation or any other barebone representation

That sounds nice. This syntax `file["field"][]` sadly can't work, since `file["field"]` already tries to load the regular struct before the second `getindex` is called. Something like `file["field", :plain]` could work. HDF5.jl works with that kind of syntax.
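
A minimal sketch of why the chained form cannot work while the two-argument form can. `ToyFile` and `Point` are hypothetical stand-ins, not JLD2 types, and the `:plain` mode is only the spelling proposed in this thread:

```julia
# Hypothetical stand-in for a file: plain field data plus the type that a
# regular load would rebuild. Not part of JLD2.
struct ToyFile
    raw::Dict{String,Any}     # field values stored as NamedTuples
    types::Dict{String,Any}   # type to reconstruct on a regular load
end

# Regular load: the struct is rebuilt right here, inside the first getindex.
# A trailing `[]` would only ever see the finished struct.
Base.getindex(f::ToyFile, name::String) = f.types[name](values(f.raw[name])...)

# Two-argument form: the mode is known before anything is reconstructed.
function Base.getindex(f::ToyFile, name::String, mode::Symbol)
    mode === :plain || throw(ArgumentError("unknown mode $mode"))
    return f.raw[name]
end

struct Point
    x::Float64
    y::Float64
end

file = ToyFile(Dict{String,Any}("field" => (x = 1.0, y = 2.0)),
               Dict{String,Any}("field" => Point))

file["field"]          # a reconstructed Point
file["field", :plain]  # the stored NamedTuple, no reconstruction
```

Since `a[i, j]` lowers to a single `getindex(a, i, j)` call, the second form lets the file decide how to load before any type is touched.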
from jld2.jl.
Along those lines, I do think that the same people who would be interested in something like this would be interested in syntactic sugar for #504, i.e. something like `file["field"][]` returns the `NamedTuple` representation or any other barebone representation
from jld2.jl.
> Something like `file["field", :plain]` could work.

that sounds nice!
from jld2.jl.
I like this idea of an optional bijection.
The place to start, I'd say, is to define the expected behaviour and desired API style.
E.g.
- Do you want to implement a conversion function? Or give a list of fields to keep?
- How should it compose:
  - only convert at toplevel (no nesting)
  - Partial nesting: Require that the result named tuple only contains basic types and other (same) converted structures. Do not mix with regular structs.
  - full nesting: recursively convert all instances
    Potential problem: the conversion is not a bijection. Loading of (normal) structs with converted fields will not work.
- What should it do to singleton structures? These are encoded as `nothing` (i.e. there's the type definition in the file but there's no actual data attached to it), so the info would be lost on reconstruction.
- What are basic types? ;) HDF5 has endless different number and string definitions. Also, e.g. `Dict`s are also implemented using `CustomSerialization`.
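
To make the composition options above concrete, here is a small sketch of a top-level-only conversion next to a fully recursive one. The function names, the `Basic` union, and the example structs are all assumptions for illustration; none of this is JLD2 API:

```julia
# Leaves that are stored as-is. Which types count as "basic" is an open
# question in this thread; this set is an assumption for the sketch.
const Basic = Union{Number,AbstractString,Symbol,Nothing}

# Top-level only: fields are copied unchanged, nested structs stay structs.
function plain_toplevel(x::T) where {T}
    names = fieldnames(T)
    return NamedTuple{names}(map(n -> getfield(x, n), names))
end

# Full nesting: every struct instance is converted recursively.
plain_recursive(x::Basic) = x
plain_recursive(x::AbstractArray) = map(plain_recursive, x)
function plain_recursive(x::T) where {T}
    names = fieldnames(T)
    return NamedTuple{names}(map(n -> plain_recursive(getfield(x, n)), names))
end

struct Inner
    x::Float64
end

struct Outer
    a::Int
    inner::Inner
end

obj = Outer(1, Inner(2.0))
plain_toplevel(obj)   # `inner` is still an Inner
plain_recursive(obj)  # `inner` is now a NamedTuple as well
```

Neither function keeps the type name, so two structs with identical field names and values become indistinguishable after conversion. That is the "not a bijection" problem in miniature.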
Another quirk:
JLD2 typically inlines `isbits` fields (e.g. other structs) into parent structs.
This makes it difficult to make NamedTuple reconstruction do the right thing.
One idea:
Do normal JLD2 storage, dumping everything as normal.
Give JLD2 a list of Types to reconstruct and make everything else return NamedTuples.
Another thing to consider is #487. Julia can construct arbitrarily large structs, e.g. through structs with large ntuples as fields or through code generation. JLD2 has a hard limit on the size of the type description.
Last thing is type stability / compilation / run time.
Depending on the desired behaviour, one could also implement a generic `JLD2.Compound` that is returned for all non-basic types and supports type-unstable `getindex` or `getproperty` for the fields.
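
A rough sketch of what such a generic container could look like. Only the name `Compound` comes from the comment above; the fields, the helper `stored_typename`, and the method set are assumptions, and nothing here exists in JLD2:

```julia
# Hypothetical generic container for any non-basic type.
struct Compound
    typename::String          # name of the type as recorded in the file
    fields::Dict{Symbol,Any}  # field values; lookups are type-unstable
end

# getproperty is overloaded below, so internal access goes through getfield.
stored_typename(c::Compound) = getfield(c, :typename)

Base.getproperty(c::Compound, name::Symbol) = getfield(c, :fields)[name]
Base.getindex(c::Compound, name::Symbol) = getfield(c, :fields)[name]
Base.propertynames(c::Compound) = Tuple(keys(getfield(c, :fields)))

c = Compound("Main.N", Dict{Symbol,Any}(:x => 3))
c.x    # field access by property
c[:x]  # field access by index
```

Because every field lives behind `Any`, the compiler cannot infer the result of `c.x`. That is the type-stability cost mentioned above.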
from jld2.jl.
@koehlerson you probably didn't get notifications after I first commented with an incomplete message.
What are your thoughts?
To start playing around with ideas, one doesn't really need to work with JLD2 at all.
I think it would be sufficient to use something like
    struct MockFile
        d::Dict{String,Any}
    end
and build an interface that does the desired things.
This could then be tested against.
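
One way such a mock could be fleshed out, combining it with the earlier idea of passing a list of types to reconstruct. `Stored`, `mock_save!`, `mock_load`, the `reconstruct` keyword, and the example structs are hypothetical names chosen for this sketch:

```julia
struct MockFile
    d::Dict{String,Any}
end
MockFile() = MockFile(Dict{String,Any}())

# Assumed stored form: the type's name plus its field values.
struct Stored
    typename::Symbol
    fields::NamedTuple
end

# "Normal" storage: dump every field together with the type name.
function mock_save!(f::MockFile, name::String, x::T) where {T}
    names = fieldnames(T)
    f.d[name] = Stored(nameof(T), NamedTuple{names}(map(n -> getfield(x, n), names)))
    return f
end

# Rebuild only the types listed in `reconstruct`; return a NamedTuple otherwise.
function mock_load(f::MockFile, name::String; reconstruct = ())
    s = f.d[name]
    for T in reconstruct
        nameof(T) == s.typename && return T(values(s.fields)...)
    end
    return s.fields
end

struct SimParams
    dt::Float64
    steps::Int
end

struct Latent
    residual::Float64
end

f = MockFile()
mock_save!(f, "params", SimParams(0.1, 10))
mock_save!(f, "latent", Latent(1e-3))

mock_load(f, "params"; reconstruct = (SimParams,))  # rebuilt SimParams
mock_load(f, "latent"; reconstruct = (SimParams,))  # plain NamedTuple
```

This version only handles flat structs with default constructors. Nesting and inlined `isbits` fields are exactly the parts that would need a design decision.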
from jld2.jl.
Hey sorry for the late reply!
Conversion function vs list of fields
> Do you want to implement a conversion function? Or give a list of fields to keep?

So far, I have only used some of the fields directly, but I can imagine that some people may want to compute something else before saving it. So I think some kind of function would be nice for the flexibility. Or maybe some folks want to save a representation that differs from the layout present in the struct's fields (thinking of #487 and the variety of array layouts, ML model layouts, etc.).
Composability
> How should it compose:

I'm not sure if I can follow 100%, but what I would imagine is that you either have the bijection or you have the full nesting approach, i.e.

> full nesting: recursively convert all instances

Why I'm unsure if I can follow is the following

> Loading of (normal) structs with converted fields will not work.

Do you mean that you cannot recreate the struct that was saved? From my perspective this would be the desired behavior: you opt out of the bijection and only save e.g. a NamedTuple with "crucial information" in a simpler form (primitives, isbitstype, ...). If a user wants to rebuild something, nothing is guaranteed, and it's up to the user to save as much as needed to rebuild it on their own with some custom function `rebuild(file, typename)` in their codebase. But maybe I don't understand the statement correctly, or perhaps my view on this issue is too biased by my personal use case.
> What should it do to singleton structures? These are encoded as `nothing` (i.e. there's the type definition in the file but there's no actual data attached to it), so the info would be lost on reconstruction.

This is a very good point, since in Ferrite.jl we also have some singletons that are exposed to the user. Personally, I'd be fine with saving a string, but I'm not sure if there is any downside to it.
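
A small sketch of the "save a string" idea for singletons. The types `Linear` and `Quadratic`, the registry, and both helper functions are hypothetical and only illustrate the round trip:

```julia
# Hypothetical singleton types standing in for user-facing marker types.
struct Linear end
struct Quadratic end

# Registry from stored name to instance; the package would have to maintain it.
const SINGLETONS = Dict{String,Any}(
    "Linear" => Linear(),
    "Quadratic" => Quadratic(),
)

# Store only the bare type name as a string.
encode_singleton(x) = String(nameof(typeof(x)))

# Look the instance up again on load; unknown names raise a KeyError.
decode_singleton(name::AbstractString) = SINGLETONS[name]

s = encode_singleton(Quadratic())
decode_singleton(s)
```

One downside is visible in the sketch itself: `nameof` drops the module path and any type parameters, so a parametric singleton or two types with the same name in different modules would need a richer key.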
> What are basic types? ;) HDF5 has endless different number and string definitions. Also, e.g. `Dict`s are also implemented using `CustomSerialization`.

Maybe the set of types supported by HDF5.jl? https://juliaio.github.io/HDF5.jl/stable/#Supported-data-types If I understand correctly, what you are saying is that a `Dict` is also deconstructed in some "stable across versions" way? If so, then it's of course possible to include that too. I guess the custom serialization dispatches for these objects are somewhere in JLD2.jl. Nonetheless, supporting exactly what HDF5 supports feels somewhat robust, even though this is purely a gut feeling and I have no rational argument for it :D
Other remarks
> Do normal JLD2 storage, dumping everything as normal.
> Give JLD2 a list of Types to reconstruct and make everything else return NamedTuples.

This sounds nice, especially since in my head it somewhat solves a user experience problem. Usually I have one big simulation struct with the parameters and I want to reconstruct it, but everything else is okay to be a `NamedTuple`, especially intermediate "latent" results.

> Another thing to consider is #487. Julia can construct arbitrarily large structs i.e. through structs with large ntuples as fields or through code generation. JLD2 has a hard limit on the size of the type description.

Does this mean that the ML model from the issue is serialized by JLD2 and the type parameters are too large, which could be dodged by utilizing the "plain HDF5" approach with NamedTuple?

> To start playing around with ideas, one doesn't really need to work with JLD2 at all.
> I think it would be sufficient to use something like

That sounds nice, I will do that as soon as you have given some feedback, because I'm quite unsure to what extent the ideas make sense. My thinking is probably a bit too narrow towards my specific problem, so I'm happy to hear other perspectives :)
from jld2.jl.
> Why I'm unsure if I can follow is the following
> > Loading of (normal) structs with converted fields will not work.

Here's an example of a fundamental problem.
This already errors and would also error when you don't try to reconstruct types.
The only way to change this, I guess, would be to disallow non-concrete element types (except Any).
Here, the parent "structure" was just an array, but the same issue would appear for abstract type-restricted struct fields.
    julia> using JLD2

    julia> struct N <: Real; x::Int; end

    julia> arr = Real[1, 2, N(3)]
    3-element Vector{Real}:
     1
     2
     N(3)

    julia> jldsave("test.jld2"; arr)

    ## new session
    julia> using JLD2

    julia> load("test.jld2")
    ┌ Warning: type Main.N does not exist in workspace; reconstructing
    └ @ JLD2 ~/.julia/dev/JLD2.jl/src/data/reconstructing_datatypes.jl:605
    Error encountered while load FileIO.File{FileIO.DataFormat{:JLD2}, String}("test.jld2").
    Fatal error:
    ERROR: MethodError: Cannot `convert` an object of type JLD2.ReconstructedStatic{:N, (:x,), Tuple{Int64}} to an object of type Real
from jld2.jl.
I'm sorry, this is a complex topic and so my answers will be a bit disorganized.
In my view, there are a few problems that could be addressed here.
1. Some objects have type signatures that JLD2 cannot store. `CustomSerialization` cannot help here, because it converts the data but still tries to encode the original type signature for loading.
2. Some objects are heavily nested and immutable. JLD2 tries to inline immutable fields, which yields a struct that is too large for the HDF5 standard (64 KB is the maximum for a type description). This is what happens in #487.
   Both (1) and (2) need a different encoding in the file and, if reconstruction is desired, some new way to encode the type signature.
3. Some very basic Julia objects such as `Vector{Real}` can be almost impossible to reconstruct in JLD2 currently. Anyone can define new subtypes that may be missing on load. It is not possible to detect this on the type level, and NamedTuples won't be `<: Real`. One could possibly try to reconstruct all abstract types, e.g. `Real` as `Any`, but someone would have to try.

[Quick background info: When loading, JLD2 retrieves all the type infos and generates fairly efficient and type-stable code that then loads the data from top to bottom. This is necessary to make things fast but inevitably fails when unexpected types pop up somewhere in between.]
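
The third point can be reproduced without JLD2. In this sketch a plain `NamedTuple` stands in for the reconstructed `N(3)`. The variable names are made up, and widening to `Any` is only the untested idea mentioned above, not existing JLD2 behaviour:

```julia
# NamedTuple stand-in for a reconstructed `N(3)` whose type is missing on load.
reconstructed = (x = 3,)
elements = Any[1, 2, reconstructed]

# Rebuilding with the declared element type fails: a NamedTuple cannot be
# converted to Real, so the typed array literal throws a MethodError.
fits_real = try
    Real[elements...]
    true
catch err
    err isa MethodError || rethrow()
    false
end

# Widening the abstract element type to Any accepts every entry.
widened = Any[elements...]
```

The price of widening is that the loaded array no longer has the element type that was saved, which any code dispatching on `Vector{Real}` would notice.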
from jld2.jl.
> This sounds nice, especially since this solves in my head somewhat a user experience problem. Usually I have one big simulation struct with the parameters and I want to reconstruct it, but everything else is okay to be a NamedTuple especially intermediate "latent" results

A place to start experimenting might be the usability of `typemap`.
Referencing #504 and the typemap section in the docs.
Here is an experimental package I built at some point. It has some tooling for retrieving type info from JLD2 files.
This could be used to help generate the `typemap` Dicts.
https://github.com/JonasIsensee/JLD3.jl/
The second (orthogonal) approach would be to implement a function that does all the conversions you can think of prior to handing it to JLD2.
from jld2.jl.
Please test out #522.
I can't say it's elegant but it worked for my test cases.
from jld2.jl.