Comments (7)

kenkehoe avatar kenkehoe commented on May 31, 2024

When you use xr.align, it returns a tuple of objects, each aligned along the time dimension. Since xarray already has this idea of a tuple of xarray objects, I suggest looking into it as the way to containerize multiple xarray objects.

I have written some code that we will use to extract data from the object (and do some other QC when requested). It auto-detects whether the container is a single object or a tuple, and it uses a second keyword parameter, "datastream", to search the global attributes for the correct requested variable (since the same variable name can be in multiple objects). Because this has to work on either a tuple or an object, I don't think we can use object modifiers; it will need to be a function.

If we use the same concept as xr.align, each datastream object we read could be put into a tuple, so that we can pass that single container into a plotting routine. I would like to find a way that doesn't require merging or aligning the datasets before making a comparison plot, since aligning and adding time steps can be a lot of extra work that we don't really need for just a plot.
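A minimal sketch of what such an extraction function might look like, not the code referred to above. The function name get_variable, the global attribute name 'datastream', and the example datastream names are all assumptions made for illustration.

```python
import numpy as np
import xarray as xr


def get_variable(container, var_name, datastream=None):
    """Return var_name from a Dataset or from a tuple/list of Datasets.

    `datastream` is compared against a global attribute that is assumed
    to be called 'datastream'; the real attribute name may differ.
    """
    # Treat a single Dataset as a one-element container.
    if isinstance(container, xr.Dataset):
        datasets = (container,)
    else:
        datasets = tuple(container)

    matches = [
        ds for ds in datasets
        if var_name in ds.variables
        and (datastream is None or ds.attrs.get("datastream") == datastream)
    ]
    if not matches:
        raise KeyError(f"{var_name!r} not found for datastream {datastream!r}")
    if len(matches) > 1:
        raise ValueError(f"{var_name!r} is in several objects; pass datastream=...")
    return matches[0][var_name]


# Two illustrative datasets that share a variable name but not all time steps.
met_e13 = xr.Dataset(
    {"temp_mean": ("time", [10.0, 11.0, 12.0])},
    coords={"time": [0, 1, 2]},
    attrs={"datastream": "sgpmetE13.b1"},
)
met_e15 = xr.Dataset(
    {"temp_mean": ("time", [20.0, 21.0])},
    coords={"time": [1, 2]},
    attrs={"datastream": "sgpmetE15.b1"},
)

# xr.align returns a tuple with one Dataset per input.
aligned = xr.align(met_e13, met_e15, join="outer")
temp = get_variable(aligned, "temp_mean", datastream="sgpmetE15.b1")
```

The same call works on a single Dataset, for example get_variable(met_e13, "temp_mean"), which is why this is written as a function rather than a method.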

from act.

rcjackson avatar rcjackson commented on May 31, 2024

The way I'm thinking of doing this is using a dictionary with string keys that map to each datastream. For example, if we have 2 xarray objects ds1 and ds2, the input can be:

input_dict = {'ds1_name': ds1, 'ds2_name': ds2}

In the case of one dataset, I can make the class constructor automatically generate the dictionary from the datastream name if the user does not provide one. The user could then specify the dataset name and variable in the plot routine. Since we are just plotting data, no merging or aligning should be needed, because matplotlib should automatically account for the different timesteps.
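A rough sketch of how the constructor could normalize its input, assuming the datastream name lives in a global attribute called 'datastream'. The class name DisplaySketch, the method get, and the fallback key 'dataset_0' are hypothetical and are not the actual display class.

```python
import xarray as xr


class DisplaySketch:
    """Hypothetical container that always stores datasets in a dict."""

    def __init__(self, datasets):
        if isinstance(datasets, xr.Dataset):
            # Assumed attribute name; fall back to a placeholder key.
            name = datasets.attrs.get("datastream", "dataset_0")
            datasets = {name: datasets}
        self.datasets = dict(datasets)

    def get(self, field, dsname=None):
        """Look up a variable, requiring dsname only when it is ambiguous."""
        if dsname is None:
            if len(self.datasets) != 1:
                raise ValueError("dsname is required when several datasets are stored")
            dsname = next(iter(self.datasets))
        return self.datasets[dsname][field]


ds1 = xr.Dataset({"temperature": ("time", [1.0, 2.0, 3.0])}, coords={"time": [0, 1, 2]})
ds2 = xr.Dataset({"temperature": ("time", [4.0, 5.0])}, coords={"time": [10, 20]})

# Several datasets: the user supplies the dictionary.
display = DisplaySketch({"ds1_name": ds1, "ds2_name": ds2})
temp_2 = display.get("temperature", dsname="ds2_name")

# One dataset: the dictionary is generated automatically.
single = DisplaySketch(ds1)
temp_1 = single.get("temperature")
```

Neither dataset is reindexed here, so each variable keeps its own time steps.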

kenkehoe avatar kenkehoe commented on May 31, 2024

I think the storing of keys in the dictionary is nicer for finding the correct object, but it's different than the current method xarray already implements. Do we want to deviate from the base xarray functionality?

As long as it's documented well enough, we can transform a tuple of objects into a dictionary of objects quite simply. I think your plan is worth trying.
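One way the tuple-to-dictionary conversion could look, as a sketch only. The helper name to_dataset_dict and the 'datastream' attribute used for default keys are assumptions.

```python
import xarray as xr


def to_dataset_dict(datasets, names=None):
    """Turn a tuple of Datasets (e.g. from xr.align) into a dict keyed by name."""
    if names is None:
        # Assumed global attribute; fall back to the position in the tuple.
        names = [
            ds.attrs.get("datastream", f"dataset_{i}")
            for i, ds in enumerate(datasets)
        ]
    names = list(names)
    if len(names) != len(datasets) or len(set(names)) != len(names):
        raise ValueError("need exactly one unique name per dataset")
    return dict(zip(names, datasets))


ds_a = xr.Dataset({"foo": ("time", [1.0, 2.0])}, coords={"time": [0, 1]})
ds_b = xr.Dataset({"foo": ("time", [3.0, 4.0])}, coords={"time": [1, 2]})

aligned = xr.align(ds_a, ds_b, join="outer")
by_name = to_dataset_dict(aligned, names=["first", "second"])
by_position = to_dataset_dict(aligned)
```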

rcjackson avatar rcjackson commented on May 31, 2024

rcjackson avatar rcjackson commented on May 31, 2024

If you merge another dataset (or a dictionary including data array objects), by default the resulting dataset will be aligned on the union of all index coordinates:

In [12]: other = xr.Dataset({'bar': ('x', [1, 2, 3, 4]), 'x': list('abcd')})

In [13]: xr.merge([ds, other])
Out[13]:
<xarray.Dataset>
Dimensions:  (x: 4, y: 3)
Coordinates:
  * x        (x) object 'a' 'b' 'c' 'd'
  * y        (y) int64 10 20 30
Data variables:
    foo      (x, y) float64 0.4691 -0.2829 -1.509 -1.136 ... nan nan nan nan
    bar      (x) int64 1 2 3 4

This ensures that merge is non-destructive. xarray.MergeError is raised if you attempt to merge two variables with the same name but different values:

In [14]: xr.merge([ds, ds + 1])
MergeError: conflicting values for variable 'foo' on objects to be combined:
first value: <xarray.Variable (x: 2, y: 3)>
array([[ 0.4691123 , -0.28286334, -1.5090585 ],
       [-1.13563237,  1.21211203, -0.17321465]])
second value: <xarray.Variable (x: 2, y: 3)>
array([[ 1.4691123 ,  0.71713666, -0.5090585 ],
       [-0.13563237,  2.21211203,  0.82678535]])

The same non-destructive merging between DataArray index coordinates is used.

I know that with aircraft data, a common plot is LWC from different sensors on the same timeseries. While we would hope that the LWC would have different names in different datasets, I can see an edge case where it wouldn't. Therefore, I think using the dictionary would help avoid this conflict, so that the user doesn't have to worry about changing variable names.

kenkehoe avatar kenkehoe commented on May 31, 2024

I think plotting two or more datasets that use the same variable name will actually be common. For example, plotting all the SGP MET temp_mean values on the same plot. That is where xr.align would work but xr.merge would not: xr.merge puts all data in the same object, while xr.align keeps the objects separate. Also, there will be many cases where the data to be plotted have different variable names, but some other variable name happens to be common between the datasets. So I think multiple objects sharing a variable name will come up often.
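The difference can be illustrated with two small stand-in datasets that share a variable name. The values are made up, and compat and join are passed explicitly so the behavior does not depend on the installed xarray's defaults.

```python
import numpy as np
import xarray as xr

# Stand-ins for two MET datastreams with the same variable name.
e13 = xr.Dataset({"temp_mean": ("time", [10.0, 11.0, 12.0])}, coords={"time": [0, 1, 2]})
e15 = xr.Dataset({"temp_mean": ("time", [20.0, 21.0])}, coords={"time": [1, 2]})

# xr.align returns one Dataset per input, each reindexed to the union of times.
a13, a15 = xr.align(e13, e15, join="outer")
n_missing = int(np.isnan(a15["temp_mean"].values).sum())

# xr.merge tries to store both temp_mean variables in a single Dataset,
# and their values disagree at the shared time steps.
try:
    xr.merge([e13, e15], compat="no_conflicts", join="outer")
    merge_failed = False
except xr.MergeError:
    merge_failed = True
```

After alignment the two temp_mean variables still live in separate objects, with NaN filled in where e15 has no sample, while the merge attempt raises MergeError.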

We could set the default to show the name of the datastream with the variable only if there is more than one object. I find it helpful to show the datastream name even when the variables have different names, when there are multiple instruments. I think we should make that a plotting keyword option.
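A small sketch of how that default and keyword could interact. The function name make_label and the keyword show_datastream are hypothetical.

```python
def make_label(field, dsname, n_datasets, show_datastream=None):
    """Build a legend label for one plotted variable.

    show_datastream=None means "decide from the number of datasets";
    True or False overrides that default.
    """
    if show_datastream is None:
        show_datastream = n_datasets > 1
    return f"{dsname} {field}" if show_datastream else field


label_single = make_label("temp_mean", "sgpmetE13.b1", n_datasets=1)
label_multi = make_label("temp_mean", "sgpmetE13.b1", n_datasets=2)
label_forced = make_label("temp_mean", "sgpmetE13.b1", n_datasets=1, show_datastream=True)
```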

rcjackson avatar rcjackson commented on May 31, 2024

We now use a dict to store the datasets in the display object...closing.
