
xeofs's Introduction

xeofs logo

Versions PyPI Conda
Build & Testing Build Coverage
Code Quality Black Ruff
Documentation Docs
Citation & Licensing JOSS Zenodo License
User Engagement Downloads

Overview

xeofs is a dedicated Python package for dimensionality reduction in the realm of climate science, offering methods like PCA, known as EOF analysis within the field, and related variants. Seamlessly integrated with xarray and Dask, it's tailored for easy handling and scalable computation on large, multi-dimensional datasets, making advanced climate data analysis both accessible and efficient.

  • Multi-Dimensional: Designed for xarray objects, it applies dimensionality reduction to multi-dimensional data while maintaining data labels.
  • Dask-Integrated: Supports large datasets via Dask xarray objects (see the sketch after this list)
  • Extensive Methods: Offers various dimensionality reduction techniques
  • Adaptable Output: Provides output corresponding to the type of input, whether single or list of xr.DataArray or xr.Dataset
  • Missing Values: Handles NaN values within the data
  • Bootstrapping: Comes with a user-friendly interface for model evaluation using bootstrapping
  • Efficient: Ensures computational efficiency, particularly with large datasets through randomized SVD
  • Modular: Allows users to implement and incorporate new dimensionality reduction methods
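For example, a minimal sketch of fitting a model on Dask-backed data might look like the following (the chunk size and the use of the xarray tutorial dataset are illustrative assumptions, not part of the official documentation):

>>> import xarray as xr
>>> import xeofs as xe
>>> # Opening with `chunks` yields a Dask-backed dataset; xeofs works on it like any other xarray object
>>> t2m = xr.tutorial.open_dataset("air_temperature", chunks={"time": 500})
>>> model = xe.models.EOF(n_modes=5)
>>> model.fit(t2m, dim="time")  # doctest: +ELLIPSIS
<xeofs.models.eof.EOF object at ...>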

Installation

To install the package, use either of the following commands:

conda install -c conda-forge xeofs

or

pip install xeofs

Quickstart

To get started with xeofs, follow these steps:

Import the package

>>> import xarray as xr  # for example data only
>>> import xeofs as xe

Load example data

>>> t2m = xr.tutorial.open_dataset("air_temperature")
>>> t2m_west = t2m.isel(lon=slice(None, 20))
>>> t2m_east = t2m.isel(lon=slice(21, None))

EOF analysis

Initiate and fit the EOF/PCA model to the data:

>>> eof = xe.models.EOF(n_modes=10)
>>> eof.fit(t2m, dim="time")  # doctest: +ELLIPSIS
<xeofs.models.eof.EOF object at ...>

Now, you can access the model's EOF components and PC scores:

>>> comps = eof.components()  # EOFs (spatial patterns)
>>> scores = eof.scores()  # PCs (temporal patterns)
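The fitted model also exposes diagnostics such as the fraction of variance explained by each mode; a one-line sketch (method name as given in the xeofs documentation, please check the Reference API of your installed version):

>>> expvar = eof.explained_variance_ratio()  # fraction of total variance per mode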

Varimax-rotated EOF analysis

Initiate an EOFRotator and fit it to the EOF model to obtain a varimax-rotated EOF analysis:

>>> rotator = xe.models.EOFRotator(n_modes=3)
>>> rotator.fit(eof)  # doctest: +ELLIPSIS
<xeofs.models.eof_rotator.EOFRotator object at ...>

>>> rot_comps = rotator.components()  # Rotated EOFs (spatial patterns)
>>> rot_scores = rotator.scores()  # Rotated PCs (temporal patterns)

Maximum Covariance Analysis (MCA)

>>> mca = xe.models.MCA(n_modes=10)
>>> mca.fit(t2m_west, t2m_east, dim="time")  # doctest: +ELLIPSIS
<xeofs.models.mca.MCA object at ...>

>>> comps1, comps2 = mca.components()  # Singular vectors (spatial patterns)
>>> scores1, scores2 = mca.scores()  # Expansion coefficients (temporal patterns)

Varimax-rotated MCA

>>> rotator = xe.models.MCARotator(n_modes=10)
>>> rotator.fit(mca)  # doctest: +ELLIPSIS
<xeofs.models.mca_rotator.MCARotator object at ...>

>>> rot_comps = rotator.components()  # Rotated singular vectors (spatial patterns)
>>> rot_scores = rotator.scores()  # Rotated expansion coefficients (temporal patterns)

To further explore the capabilities of xeofs, check out the available documentation and examples. For a full list of currently available methods, see the Reference API.

Documentation

For a more comprehensive overview and usage examples, visit the documentation.

Contributing

Contributions are highly welcomed and appreciated. If you're interested in improving xeofs or fixing issues, please read our Contributing Guide.

License

This project is licensed under the terms of the MIT license.

Contact

For questions or support, please open a GitHub issue.

Credits

  • Randomized PCA: scikit-learn
  • EOF analysis: Python package eofs by Andrew Dawson
  • MCA: Python package xMCA by Yefee
  • CCA: Python package CCA-Zoo by James Chapman
  • ROCK-PCA: Matlab implementation by Diego Bueso

How to cite?

When using xeofs, kindly remember to cite the original references of the methods employed in your work. Additionally, if xeofs is proving useful in your research, we'd appreciate it if you could acknowledge its use with the following citation:

@article{rieger_xeofs_2024,
  author  = {Rieger, Niclas and Levang, Samuel J.},
  doi     = {10.21105/joss.06060},
  journal = {Journal of Open Source Software},
  month   = jan,
  number  = {93},
  pages   = {6060},
  title   = {{xeofs: Comprehensive EOF analysis in Python with xarray}},
  url     = {https://joss.theoj.org/papers/10.21105/joss.06060},
  volume  = {9},
  year    = {2024}
}

Contributors

aaronspring, actions-user, damienirving, malmans2, nicrie, slevang


xeofs's Issues

Why are the PC values so small?

I applied EOF from the xeofs module in S-mode and found that all the PCs are very small, between -1 and 1. How can I get PCs that look like a normalized time series, as in the eofs module?

Weird scores amplitude weighting after MCA rotation

The scores' amplitude changes after adjusting n_modes and reaches unexpected values. This seems to originate from a wrong "dim" argument within the MCA rotator class (the dot product between scores and rotation matrix uses dim="mode" instead of dim="mode1").

project new "unseen" data onto processed EOFs

Can the toolbox do that?

For example :

model = EOF(sst, n_modes=5, norm=False, dim=['lat', 'lon'])
model.solve()
expvar = model.explained_variance_ratio()
eofs = model.eofs()
pcs = model.pcs()

Then something like

model.transform(X)

which would yield the EOF values for each mode, for an input array X.

Thanks
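For reference, with the fit/transform API shown in the Quickstart above, such a projection would look roughly like this sketch (sst and X stand for the arrays in this issue; exact arguments may differ between versions):

import xeofs as xe

model = xe.models.EOF(n_modes=5)
model.fit(sst, dim="time")

new_scores = model.transform(X)  # project unseen data X onto the fitted EOFs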

Additional information on the usage of the xeofs package

Hello Team xeofs,
Can you please help me with the following:

  1. How do I set the PC scaling to 1 or 2 (square root of singular values and singular values, respectively) from the default 0 (orthonormal)?
  2. What if I want the explained variance and SCFs alongside the plot of each mode?
    I'm able to view them individually this way:
    [screenshot]
    but I want them embedded in the plot.
  3. Plus, how do I project new data onto singular vectors? I came across this in the API: project_onto_right_singular_vectors(Y: Optional[Union[DataArray, List[DataArray]]] = None, scaling: int = 0) → DataArray. What is the right way to use it?
  4. Also, is there a way to find lagged relationships/covariances using xeofs?
  5. Also, how can we try CCA with xeofs?

I'm trying to find the answers in the API but am unable to.
Please point me to relevant material so I can proceed further.
Thanks in advance.
Cheers

Dark masked data in homogeneous patterns

Hello Niclas,
I just worked with xeofs and it works really well.
However, I have dark patches (or maybe masked data; refer to the screenshots below) in my homogeneous patterns that are quite evident in the first 3 modes. These drop off after the 4th mode (second image).
[screenshots]
Is this something usual, or is something wrong with my result? In both cases, what does it actually mean?

Also, where can I change the line plot to bar graphs for the PCs?

bug in reconstruction of rotated PCA

I don't know why this happens, but the correct reconstruction should follow the structure below.

Xrec = (rot._pcs * np.sqrt(rot._explained_variance) * np.sqrt(pca.n_samples) @ rot._eofs.T)
Xrec = Xrec / pca._weights
Xrec = Xrec + pca._X_mean

Broadcasting dimensions with `xr.Dataset`

Combining xr.Dataset input with both multi-dimensional sample and feature dimensions broadcasts the dimensions, thus yielding components with inflated dimensions. The broadcast dimensions are filled with NaN and the results seem correct. Ideally, however, this broadcasting shouldn't happen and should be avoided.

In a nutshell, instead of obtaining components like the following

xarray.Dataset
    Dimensions: (sample1: 2, feature1: 2, feature2: 3)
    Coordinates:  
        sample1  (sample1)  int64  1 2
        feature1  (feature1)  <U1  'a' 'b'
        feature2  (feature2)  int64  0 1 2
    Data variables:
        da1  (sample1, feature1, feature2)   int64    0 1 2 3 4 5 6 7 8 9 10 11
        da2  (sample1, feature1)   int64    0 3 6 9
    Indexes: (3)
    Attributes: (0)

we currently get

xarray.Dataset
    Dimensions: (sample1: 2, feature1: 2, feature2: 3)
    Coordinates:
        sample1 (sample1)  int64 1 2
        feature1 (feature1)   <U1  'a' 'b'
        feature2 (feature2)  int  0 1 2
    Data variables:
        da1  (sample1, feature1, feature2)  int64   0 1 nan 3 ... 9 10 nan
        da2  (sample1, feature1, feature2)  int64   nan nan 0 nan ... 6 nan nan 9
    Indexes: (3)
    Attributes: (0)

This arises from a potential inconsistency in xarray's to_stacked_array()/to_unstacked_dataset() methods (see discussion).

improve documentation

Things to improve:

  • streamline doc strings of classes and methods
  • add relevant references for each method
  • mention that p values that are corrected for multiple testing consider each mode individually
  • add an example for complex EOF analysis
  • extend theory section

Cartopy installed but unable to import

Hi Niclas @nicrie,
I'm installing xeofs on a newer desktop. I've been working with xeofs for months now and greatly appreciate the support.

To my surprise, I'm unable to import cartopy this time.
I tried creating a new environment and installing:

(xeof_c) C:\Users\buradagi>conda install -c conda-forge xeofs cartopy
Collecting package metadata (current_repodata.json): done
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 4.10.3
  latest version: 23.5.0

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.


(xeof_c) C:\Users\buradagi>conda activate xeofs

(xeofs) C:\Users\buradagi>conda install -c conda-forge xeofs cartopy
Collecting package metadata (current_repodata.json): done
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 4.10.3
  latest version: 23.5.0

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.

When I try to import:
[screenshot]

What could be the issue?
I have created almost 4 new environments now and tried multiple approaches.
Cartopy is installed, but in a Jupyter Notebook I'm unable to import it.

support for GWPCA

I would like to utilize xeofs in my research due to its compatibility with xarray datasets/data arrays, which makes working with them convenient. I am interested in determining whether xeofs supports the calculation of Geographically Weighted PCA (GWPCA). Is this possible?

Explained Local Variance EOF modes

Dear Niclas,

First, thank you so much for your xMCA and xeofs packages, they are excellent tools!
I am having a problem in defining the local explained variance, as in the following article:
https://www.researchgate.net/publication/252804054_A_coupled_bio-physical_model_HAMSOM-ECO_for_the_North_Sea_frontal_regions
It is more of a mathematical problem than a programming one, but I decided to ask for your help because I am stuck on this.
I hope you can help me and I thank you in advance.

Best regards,

Felipe

Complex MCA & CCA related

Greetings Niclas @nicrie ,
I was going through this paper of yours (https://arxiv.org/pdf/2105.04618.pdf): how do you perform complex rotated MCA?
For example, the routine MCA code in xeofs is:

mca = MCA(
    X, Y,
    n_modes=20,
    dim='time',
    norm=False,
    weights_X='coslat',
    weights_Y='coslat'
)

I'm going through the documentation (link: https://xeofs.readthedocs.io/en/latest/_autosummary/xeofs.models.ComplexMCA.html), yet I don't understand how to call ComplexMCA.
I tried "cmca = ComplexMCA()", but it says it isn't defined.

Also, I see that CCA has been added; can you please point me to its documentation too?

Lastly, my data is daily, not for all months, just 3 months per year (continuing for 20 years).
When I plot the MCA time series, I get something like:
[figure]
whereas I wanted a continuous one plotted across the years of my data, like:
[figure]

Any assistance and support will be greatly appreciated

Differences from other packages?

Would it be useful to include a small paragraph discussing the notable differences from other packages? E.g., I was wondering what's the difference from eofs.

wrong order of clip in cosine weighting

I stumbled over the following RuntimeWarning:

RuntimeWarning: invalid value encountered in sqrt
  return np.sqrt(np.cos(np.deg2rad(data))).clip(0, 1)

It seems to me that the right order should be np.sqrt(np.cos(np.deg2rad(data)).clip(0, 1)), i.e. the values should be clipped after the cosine and before the square root.
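In code, the fix boils down to something like this sketch (the function name is illustrative, not an xeofs internal):

import numpy as np


def sqrt_cos_lat_weights(lat):
    """Square root of cosine-latitude weights, clipped before taking the root."""
    # Clip first so that floating-point noise near the poles cannot yield a
    # slightly negative cosine, which would make np.sqrt emit the RuntimeWarning.
    return np.sqrt(np.clip(np.cos(np.deg2rad(lat)), 0, 1))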

Differences between T-mode and S-mode in EOFs

Dear Team xeofs,
I'm using xeofs for EOF analysis.
I accidentally used it along the spatial dimensions, but the API description says the traditional way is to use it along the time dimension.
[screenshot]

I got more information on this from: http://dx.doi.org/10.1016/j.rse.2014.03.015
However, now I'm a bit confused, as the paper talks about four modes of extended EOFs: in S-mode, sS-mode and tS-mode; and in T-mode, sT-mode and tT-mode. I just want to know which of these the T-mode here refers to: tS or sT?

make sanity check optional

Currently, xeofs.preprocessing.Stacker performs a sanity check after transforming the data to detect isolated NaNs. In the case of dask arrays, this forces computation, which may blow up the overall computation time. It would be desirable to be able to choose whether or not to perform this check.

Usually, NaNs in the SVD solver will trigger a LinAlgError. Alternatively, we could catch that error and re-raise it with a more informative error message.
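A sketch of that second option (the function and default solver here are illustrative, not xeofs internals):

import numpy as np


def decompose(matrix, svd_func=np.linalg.svd):
    """Run an SVD and re-raise failures with a more informative message."""
    try:
        return svd_func(matrix, full_matrices=False)
    except np.linalg.LinAlgError as err:
        raise np.linalg.LinAlgError(
            "SVD did not converge. This often indicates isolated NaNs in the "
            "input data; consider enabling the NaN sanity check or removing "
            "NaNs before fitting."
        ) from err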

fix ignoring unrecognized parameters

Currently, providing an unrecognized parameter to a model produces no error message; e.g., the following does not raise an error:

xe.models.EOF(n_modes=5, nonsense_parameter=3)

However, this is problematic because the recent major release changed some parameter names; e.g., performing a standardized EOF analysis now requires standardize=True, in contrast to version 0.x.y, which required norm=True.
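A sketch of how unrecognized parameters could be rejected at construction time (illustrative only, not the actual xeofs implementation; the allowed-parameter set is invented):

class StrictModel:
    """Reject keyword arguments that the model does not recognize."""

    _allowed_params = {"n_modes", "standardize", "use_coslat"}

    def __init__(self, **kwargs):
        unknown = set(kwargs) - self._allowed_params
        if unknown:
            # Fail loudly instead of silently ignoring e.g. the old `norm` parameter.
            raise TypeError(f"Unrecognized parameter(s): {', '.join(sorted(unknown))}")
        self.params = dict(kwargs)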

generalize internal data structure

Ideally something like:

DataContainer
    - exclude: List
    + data: Dict
    + compute()

Data
    + sample_dims: Sequence
    + feature_dims: Sequence

InputData(Data)

Components(Data)

Scores(Data)

OtherData(Data)
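In Python terms, a rough sketch of this layout could look like the following (names mirror the outline above; the xr.DataArray value type and the dict/list defaults are assumptions):

from dataclasses import dataclass, field
from typing import Dict, List, Sequence

import xarray as xr


@dataclass
class Data:
    """Base container holding the dimension metadata shared by all data kinds."""
    sample_dims: Sequence[str]
    feature_dims: Sequence[str]


class InputData(Data): ...
class Components(Data): ...
class Scores(Data): ...
class OtherData(Data): ...


@dataclass
class DataContainer:
    """Holds all model arrays and can trigger (dask) computation on demand."""
    data: Dict[str, xr.DataArray] = field(default_factory=dict)
    exclude: List[str] = field(default_factory=list)

    def compute(self) -> None:
        # Materialize every lazily evaluated array except the excluded ones.
        for key, value in self.data.items():
            if key not in self.exclude:
                self.data[key] = value.compute()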

Unable to install xeofs

My issue currently is very basic.
I'm unable to install xeofs. I already have xMCA and eofs installed. Does that cause any issues?
Screenshot below (the installation is taking forever):

[screenshot]

Maximum Covariance Analysis is slower than the one in nicrie/xmca

Hello, I recently migrated the dependency from nicrie/xmca to nicrie/xeofs, and it seems that the SVD decomposition has slowed down significantly. I found that the calculation function was changed from np.linalg.svd to sklearn.utils.extmath.randomized_svd. I personally don't know the reason for this replacement. Maybe there are some tricks to speed up the calculation?

minimal reproducible code (xmca==1.4.2, xeofs==1.0.5)

import pooch
import xarray as xr

pooch.retrieve(url="https://downloads.psl.noaa.gov/Datasets/noaa.oisst.v2/sst.mnmean.nc", known_hash=None, path='.', fname='sst.mnmean.nc')
pooch.retrieve(url="https://downloads.psl.noaa.gov/Datasets/cmap/enh/precip.mon.mean.nc", known_hash=None, path='.', fname='precip.mon.mean.nc')

data_input1 = xr.open_dataset('sst.mnmean.nc', chunks='auto').sst.sel(time=slice('1982-01-01', '2022-12-31'))
data_input2 = xr.open_dataset('precip.mon.mean.nc', chunks='auto').precip.sel(time=slice('1982-01-01', '2022-12-31'))

This calculates very fast on my computer, about 5 seconds (it makes no difference whether the data is chunked with dask):

from xmca.xarray import xMCA

mca = xMCA(data_input1, data_input2)
mca.normalize()
mca.apply_coslat()
mca.solve(complexify=False)

But the calculation using xeofs is very slow; the result cannot be obtained even after several minutes:

from xeofs.models import MCA

model = MCA(n_modes = 5, standardize = False, use_coslat = True)
model.fit(data_input1, data_input2, dim = 'time')

Underlying theory: covariance or correlation PCA/EOF?

Hi Niclas,

After my extensive reading on the topic of PCA/EOF, I was wondering whether the multivariate xeofs example here (https://xeofs.readthedocs.io/en/latest/auto_examples/1uni/plot_multivariate-eof.html#sphx-glr-auto-examples-1uni-plot-multivariate-eof-py) uses the covariance or the correlation matrix.
I wanted to run a multivariate EOF for three variables at each grid box of WRF output, and my supervisor has recommended using a correlation-based PCA since the variables are different. I understand that your example uses subsets of the same variable, but I am wondering whether it is suitable to replace these subsets with different variables.
Many thanks.

support flexible input type

Currently, xr.DataArray and lists of xr.DataArray are supported as input types. Extend the functionality to also accept xr.Dataset as input.

Add serialization methods

One thing missing from most EOF libraries is a method to quickly serialize/deserialize the solver to disk. This becomes necessary for any sort of real-time use case, where you may want to take new data and project it onto the components using transform() with a pre-computed solver.

What do you think about adding this to xeofs? One can of course use pickle for this, but that quickly leads to compatibility issues with new versions. It should be straightforward to dump to netCDF or zarr, since we just need to save a few arrays and the essential attributes used to instantiate the solver. Perhaps a .save() method on the models, plus .load() as a classmethod or some other loading utility function?
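A sketch of what the proposed round trip could look like from the user side (the .save()/.load() methods and the file name are hypothetical at this point; data and new_data stand in for arbitrary xarray objects):

import xeofs as xe

model = xe.models.EOF(n_modes=10)
model.fit(data, dim="time")

model.save("eof_model.nc")                     # hypothetical: dump components, scores and init attributes
restored = xe.models.EOF.load("eof_model.nc")  # hypothetical classmethod
new_scores = restored.transform(new_data)      # project fresh data without refitting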

add type hints

That should support code maintenance and documentation in the future.

In our case, with subclasses, we probably need TypeVar and Generic.

Here's an example:

from abc import ABC, abstractmethod
from typing import TypeVar, Generic

T = TypeVar('T')

class A(ABC, Generic[T]):
    @abstractmethod
    def method(self, input: T) -> T:
        pass

class B(A[int]):
    def method(self, input: int) -> int:
        # Implement the method specifically for ints
        pass

class C(A[float]):
    def method(self, input: float) -> float:
        # Implement the method specifically for floats
        pass

In this example, A is an abstract base class and uses a type variable T. It declares an abstract method method that takes an argument of type T and returns a value of type T. B and C inherit from A, specializing T to int and float respectively. Each of them implements method for their respective types.

This approach ensures that the subclasses B and C will work with the specific types you want (int and float respectively), while maintaining the generality of the base class A.

If we want to ensure that A should only be instantiated with certain types, we can add constraints to the TypeVar:

T = TypeVar('T', int, float)

With this, T can only be int or float, and trying to create a class that inherits from A[str] for example, would result in a type error.

Provide standard kwarg for model fitting random seed

I am trying to make PCA fitting using xeofs.models.EOF reproducible by always selecting the randomized solver and passing a random_state kwarg to the solver. Unfortunately, not all solvers that xeofs uses internally, in this case, use a random_state parameter, one also uses seed, and neither accepts the other. At the moment, I need to loop over these possibilities, try each, and catch the type error if it was the wrong one. Could the model init or fit method take a random_state or seed parameter that is then passed to the correct kwarg for the chosen solver? Thanks for your help and the fantastic library!
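The workaround described above amounts to something like this sketch (how the keyword ultimately reaches the solver depends on the installed xeofs version, so the model factory is left as a placeholder):

def fit_with_seed(make_model, data, seed=42):
    """Try the two seed keyword names used by the different solvers."""
    last_error = None
    for kwarg in ("random_state", "seed"):
        try:
            # Placeholder: construct the model, forwarding the kwarg to the chosen solver.
            model = make_model(**{kwarg: seed})
            model.fit(data, dim="time")
            return model
        except TypeError as err:
            last_error = err
    raise TypeError("Neither 'random_state' nor 'seed' was accepted by the solver.") from last_error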

create a flexible class to create test data in conftest.py

Something along these lines:

import numpy as np
import xarray as xr
import pytest

# Define your fixtures
@pytest.fixture
def regular_data():
    return xr.DataArray(np.random.rand(5, 5))

@pytest.fixture
def regular_dataset():
    return xr.Dataset({'var1': (("x", "y"), np.random.rand(5, 5))})

@pytest.fixture
def nan_data():
    return xr.DataArray(np.random.rand(5, 5), dims=("x", "y")).where(lambda x: x > 0.2)

@pytest.fixture
def nan_dataset():
    return xr.Dataset({'var1': (("x", "y"), np.random.rand(5, 5))}).where(lambda x: x > 0.2)

# Define the parametrized fixture
@pytest.fixture
def model_data(request):
    data_type = request.param[0]
    fill_type = request.param[1]

    if fill_type == 'regular':
        return request.getfixturevalue(f'{data_type}_data')
    elif fill_type == 'nans':
        return request.getfixturevalue(f'{data_type}_dataset')
    else:
        raise ValueError(f'Invalid argument: {fill_type}')

# Use the fixture in your tests
@pytest.mark.parametrize('model_data', [('nan', 'regular'), ('nan', 'nans'), ('regular', 'regular'), ('regular', 'nans')], indirect=True)
def test_model_data(model_data):
    print(model_data)


About compatibility of xeofs with other packages

xeofs is awesome and user friendly, but I just wonder how many packages it can safely coexist with. I have found that it can cause compatibility problems when installing other packages, such as cdo or ncview. While I can work around this by creating a new environment for xeofs, I am just curious about its package dependencies.

add more test cases

  • Hilbert transform / padding: verify that the real and imaginary parts are zero after the transformation (padding may introduce a shift)
  • explained variance fraction is always <= 1 (see the test sketch after this list)
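For the second item, a minimal test sketch could look like the following (the synthetic data and model settings are illustrative):

import numpy as np
import xarray as xr
import xeofs as xe


def test_explained_variance_ratio_is_bounded():
    rng = np.random.default_rng(0)
    data = xr.DataArray(
        rng.standard_normal((20, 10, 10)), dims=("time", "lat", "lon")
    )
    model = xe.models.EOF(n_modes=5)
    model.fit(data, dim="time")
    expvar = model.explained_variance_ratio()
    # Each fraction is non-negative and the retained modes explain at most 100%.
    assert (expvar >= 0).all()
    assert float(expvar.sum()) <= 1.0 + 1e-10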
