Comments (8)
I had one for aligning dimensions, here:
https://github.com/mcabbott/NamedPlus.jl/blob/master/src/permute.jl
But I have not thought in any detail about how aligning by keys should work. It's going to `cat` on some zeros, along the indicated `dims`, such that broadcasting will succeed?
AxisArrays also has some `join` methods which I haven't ever used.
from axiskeys.jl.
Hmmm, that seems to work a bit differently than I was thinking. A common use case we have is wanting to select just the data with intersecting keys for multiple `KeyedArray`s. Right now, we just do this by taking the `intersect` or `setdiff` (depending on the use case) of the key values, and then using that for our manual selection. For example, I'd like to do something like:
aligned_A, aligned_B = align(A, B; join=:inner, on=:mydim, default=missing)
This seems pretty close to what the xarray function does, because we want the same number of returned arrays as passed in, rather than combining them.
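The manual `intersect`-based selection described above can be sketched in plain Julia. This is only an illustration under stated assumptions: `align_inner` is a hypothetical helper, not an AxisKeys.jl function, and each array is represented as a separate key vector plus a matrix whose first dimension those keys label.

```julia
# Hypothetical helper (not part of AxisKeys.jl): keep only the rows whose keys
# occur in both inputs. keysA labels the rows of A, keysB labels the rows of B.
function align_inner(keysA::AbstractVector, A::AbstractMatrix,
                     keysB::AbstractVector, B::AbstractMatrix)
    common = intersect(keysA, keysB)                       # order follows keysA
    ia = Int[findfirst(isequal(k), keysA) for k in common] # rows of A to keep
    ib = Int[findfirst(isequal(k), keysB) for k in common] # rows of B to keep
    return common, A[ia, :], B[ib, :]
end

keysA = [:a, :b, :c]; A = [1 2; 3 4; 5 6]
keysB = [:c, :a, :d]; B = [10 20; 30 40; 50 60]
common, A2, B2 = align_inner(keysA, A, keysB, B)
# common holds the shared keys; A2 and B2 have their rows in that same order
```

Both outputs come back with the same number of rows in the same key order, which is the "same number of returned arrays as passed in" behaviour asked for here.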
from axiskeys.jl.
OK now I read the xarray doc, a bit.
It sounds a bit like whether you intersect or unite the keys ought to be per-dimension, not per function call. Maybe it looks like this?
A′, B′, C′ = collate(A, B, C; union=:y, fill=missing)
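A rough sketch of what the `union=... , fill=missing` case might do along a single dimension, written with Base Julia only. The names `reindex` and `align_outer` are hypothetical and the data are plain key and value vectors rather than KeyedArrays, so this shows the idea and not a proposed implementation.

```julia
# Hypothetical helpers (not part of AxisKeys.jl).
# reindex: look up each target key in ks; use `fill` where the key is absent.
function reindex(ks::AbstractVector, v::AbstractVector, target::AbstractVector;
                 fill=missing)
    lookup = Dict(k => i for (i, k) in enumerate(ks))
    T = Union{eltype(v), typeof(fill)}        # widen eltype so `fill` fits
    return T[haskey(lookup, k) ? v[lookup[k]] : fill for k in target]
end

# align_outer: extend both vectors to the union of their keys.
function align_outer(keysA, a, keysB, b; fill=missing)
    target = union(keysA, keysB)              # keysA first, then new keys of keysB
    return target, reindex(keysA, a, target; fill=fill),
                   reindex(keysB, b, target; fill=fill)
end

target, a2, b2 = align_outer(["a", "b"], [1.0, 2.0], ["b", "c"], [20.0, 30.0])
```

Because the fill value may have a different type from the data, the result's element type has to be widened, which is one design question a real `collate` would need to settle.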
The NamedPlus function is, I think, quite close to PyTorch's `align_to` / `align_as`; these are much simpler, of course. The most basic use is just to give the target names:
A′ = align(A, (:x, :y))
Should these be the same function? If the function which trims/extends according to keys gets differing dimension names, does it permute them to line up? By what rule? `A′ = align(A, B)` just uses B's names, and pushes all others later. Since B is untouched, there was no need to return it, but perhaps that wouldn't hurt. Maybe if you omit the keywords, then it only looks at dimension names. And the full thing could be
A′, B′, C′ = align(A, B, C, (:x, :y); intersect=:y, fill=NaN, equal=:z)
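The name-only rule mentioned above (take the target's names first, push all others later) can be sketched as a permutation computed from two tuples of names. `align_perm` is a hypothetical name, and this sketch simply skips target names the array does not have, which may differ from what NamedPlus actually does.

```julia
# Hypothetical helper: permutation that puts the dimensions named in `target`
# first (in target order) and all remaining dimensions after them.
function align_perm(names::Tuple, target::Tuple)
    first_part = Int[findfirst(==(n), names) for n in target if n in names]
    rest = Int[i for i in eachindex(names) if !(i in first_part)]
    return vcat(first_part, rest)
end

A = reshape(collect(1:24), 2, 3, 4)            # think of dims as (:y, :z, :x)
perm = align_perm((:y, :z, :x), (:x, :y))      # :x and :y first, :z pushed later
A2 = permutedims(A, perm)                      # eager; a lazy wrapper is also possible
```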
from axiskeys.jl.
> It sounds a bit like whether you intersect or unite the keys ought to be per-dimension, not per function call.

Sure. I'd be fine with supporting a `union`, `setdiff` or `intersect` per dimension in case you wanted to mix and match. I do like the `join` terminology that xarray uses, but perhaps that's too misleading for a `KeyedArray` 🤷
I agree that these should probably be different functions, as the behaviour seems pretty different. I don't think the behaviour should be completely different depending on whether you passed keywords, but rather that passing no keywords should perform some default behaviour (e.g., `intersect` for all dims). AFAICT, the current `align` function in NamedPlus is basically just doing a lazy `permutedims`? My only concern is that `align` doesn't seem like an intuitive name for the `permutedims` behaviour, IMHO.
from axiskeys.jl.
If you think this behaviour would be useful I can work on an implementation using the name `collate`, and we can sort out the preferred signature/naming after.
from axiskeys.jl.
> I don't think the behaviour should be completely different depending on whether you passed keywords, but rather that passing no keywords should perform some default behaviour (e.g., intersect for all dims).

I agree, but was thinking these might not be so different. Are A′, B′, C′ permuted so that their dimensions line up? It's not clear to me whether or not the xarray function does this. But if ours does, then doing just that seems like a fairly neutral default behaviour, i.e. each keyword's default is like `dims=()`, because `intersect=:` as a default would seem to be in conflict with specifying only `union=:y`.
Instead of terminology from Base, I guess another possible source is DataFrames, `innerjoin` / `outerjoin` / `antijoin`:
https://juliadata.github.io/DataFrames.jl/stable/man/joins/
These functions from AxisArrays tend to make one big array, so it might be good not to collide with their terminology:
https://github.com/JuliaArrays/AxisArrays.jl/blob/master/src/combine.jl
> feel like align doesn't seem like an intuitive name for the permutedims behaviour,

I guess they are aligned in the sense of the letters lining up here... but words are weird.
dimnames(A) == (:x, :_, :z)
dimnames(B) == (:_, :y, :z, :t)
dimnames(C) == (:x, :y)
from axiskeys.jl.
> Are A′, B′, C′ permuted so that their dimensions line up?
I didn't think so, I figured that it would just align values by name regardless of the orientation of the underlying data. If you'd like to support that as well to make one consistent function then I'm fine with that.
> each keyword's default is like dims=(), because intersect=: as a default would seem to be in conflict with specifying only union=:y
Hmmm, I guess that would be a tradeoff with handling multiple dimensions simultaneously.
> Instead of terminology from Base, I guess another possible source is DataFrames, innerjoin /outerjoin/antijoin:

I think that would be nice, as it might make the transition from a DataFrames-like API easier.
> These functions from AxisArrays tend to make one big array, might be good not to collide with their terminology

Yeah, though I do think eventually supporting the same `merge` and `join` functionality would also be nice.
from axiskeys.jl.
I have named my try at this as sync() and sync_to() in the attached file. It is not particularly performance-optimised, but should be efficient when both the source and target keys are sorted. It seems to do the job for me. Perhaps you also find this useful.
Some examples:
K1, K2 = sync(K1, K2)
# sync all dimensions to the set of common keys (intersect).
K1, K2 = sync(K1, K2, type=:outer)
# sync all dimensions to the union of keys. New values are going to be set to NaN.
K1, K2 = sync(K1, K2, dims=:x)
# sync the :x dimension to the set of common keys. K1 and K2 must have :x with
# the same key type, but it can be at a different index.
K = sync_to(Dict(:x => ["foo","bar"]), K, fillval = 0)
# K's :x dimension will have "foo" and "bar" keys. If it had values at them,
# they are preserved, others are set to 0.
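The attached file is not reproduced here, so the following is not the `sync_to` implementation. It is only a minimal sketch of why sorted source and target keys can be handled efficiently: a single forward pass with one cursor suffices. The name `reindex_sorted` is hypothetical, the inputs are plain 1-based vectors, and both key vectors are assumed to be sorted.

```julia
# Hypothetical sketch: reindex values onto sorted target keys in one pass.
# Assumes src_keys and target_keys are both sorted in ascending order.
function reindex_sorted(src_keys::AbstractVector, src_vals::AbstractVector,
                        target_keys::AbstractVector; fillval=0)
    T = Union{eltype(src_vals), typeof(fillval)}
    out = Vector{T}(undef, length(target_keys))
    i = 1                                       # cursor into src_keys
    for (j, k) in enumerate(target_keys)
        # skip source keys that sort before the current target key
        while i <= length(src_keys) && isless(src_keys[i], k)
            i += 1
        end
        found = i <= length(src_keys) && isequal(src_keys[i], k)
        out[j] = found ? src_vals[i] : fillval  # keep existing value or fill
    end
    return out
end

vals = reindex_sorted(["bar", "baz"], [1, 2], ["bar", "foo"]; fillval=0)
```

The cursor never moves backwards, so the work is proportional to the combined length of the two key vectors rather than their product.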
from axiskeys.jl.