
xeofs's Introduction

xeofs logo

Versions PyPI Conda
Build & Testing Build Coverage
Code Quality Black Ruff
Documentation Docs
Citation & Licensing JOSS Zenodo License
User Engagement Downloads

Overview

xeofs is a dedicated Python package for dimensionality reduction in the realm of climate science, offering methods like PCA, known as EOF analysis within the field, and related variants. Seamlessly integrated with xarray and Dask, it's tailored for easy handling and scalable computation on large, multi-dimensional datasets, making advanced climate data analysis both accessible and efficient.

  • Multi-Dimensional: Designed for xarray objects, it applies dimensionality reduction to multi-dimensional data while maintaining data labels.
  • Dask-Integrated: Supports large datasets via Dask xarray objects (see the sketch after this list)
  • Extensive Methods: Offers various dimensionality reduction techniques
  • Adaptable Output: Provides output corresponding to the type of input, whether single or list of xr.DataArray or xr.Dataset
  • Missing Values: Handles NaN values within the data
  • Bootstrapping: Comes with a user-friendly interface for model evaluation using bootstrapping
  • Efficient: Ensures computational efficiency, particularly with large datasets through randomized SVD
  • Modular: Allows users to implement and incorporate new dimensionality reduction methods
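For example, a minimal sketch of fitting a model on Dask-backed data might look like the following (the chunk size and the use of the xarray tutorial dataset are illustrative assumptions, not part of the official documentation):

>>> import xarray as xr
>>> import xeofs as xe
>>> # Opening with `chunks` yields a Dask-backed dataset; xeofs works on it like any other xarray object
>>> t2m = xr.tutorial.open_dataset("air_temperature", chunks={"time": 500})
>>> model = xe.models.EOF(n_modes=5)
>>> model.fit(t2m, dim="time")  # doctest: +ELLIPSIS
<xeofs.models.eof.EOF object at ...>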

Installation

To install the package, use either of the following commands:

conda install -c conda-forge xeofs

or

pip install xeofs

Quickstart

To get started with xeofs, follow these steps:

Import the package

>>> import xarray as xr  # for example data only
>>> import xeofs as xe

Load example data

>>> t2m = xr.tutorial.open_dataset("air_temperature")
>>> t2m_west = t2m.isel(lon=slice(None, 20))
>>> t2m_east = t2m.isel(lon=slice(21, None))

EOF analysis

Initiate and fit the EOF/PCA model to the data:

>>> eof = xe.models.EOF(n_modes=10)
>>> eof.fit(t2m, dim="time")  # doctest: +ELLIPSIS
<xeofs.models.eof.EOF object at ...>

Now, you can access the model's EOF components and PC scores:

>>> comps = eof.components()  # EOFs (spatial patterns)
>>> scores = eof.scores()  # PCs (temporal patterns)
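The fitted model also exposes diagnostics such as the fraction of variance explained by each mode; a one-line sketch (method name as given in the xeofs documentation, please check the Reference API of your installed version):

>>> expvar = eof.explained_variance_ratio()  # fraction of total variance per mode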

Varimax-rotated EOF analysis

Initiate an EOFRotator and fit it to the EOF model to obtain a varimax-rotated EOF analysis:

>>> rotator = xe.models.EOFRotator(n_modes=3)
>>> rotator.fit(eof)  # doctest: +ELLIPSIS
<xeofs.models.eof_rotator.EOFRotator object at ...>

>>> rot_comps = rotator.components()  # Rotated EOFs (spatial patterns)
>>> rot_scores = rotator.scores()  # Rotated PCs (temporal patterns)

Maximum Covariance Analysis (MCA)

>>> mca = xe.models.MCA(n_modes=10)
>>> mca.fit(t2m_west, t2m_east, dim="time")  # doctest: +ELLIPSIS
<xeofs.models.mca.MCA object at ...>

>>> comps1, comps2 = mca.components()  # Singular vectors (spatial patterns)
>>> scores1, scores2 = mca.scores()  # Expansion coefficients (temporal patterns)

Varimax-rotated MCA

>>> rotator = xe.models.MCARotator(n_modes=10)
>>> rotator.fit(mca)  # doctest: +ELLIPSIS
<xeofs.models.mca_rotator.MCARotator object at ...>

>>> rot_comps = rotator.components()  # Rotated singular vectors (spatial patterns)
>>> rot_scores = rotator.scores()  # Rotated expansion coefficients (temporal patterns)

To further explore the capabilities of xeofs, check out the available documentation and examples. For a full list of currently available methods, see the Reference API.

Documentation

For a more comprehensive overview and usage examples, visit the documentation.

Contributing

Contributions are highly welcomed and appreciated. If you're interested in improving xeofs or fixing issues, please read our Contributing Guide.

License

This project is licensed under the terms of the MIT license.

Contact

For questions or support, please open a GitHub issue.

Credits

  • Randomized PCA: scikit-learn
  • EOF analysis: Python package eofs by Andrew Dawson
  • MCA: Python package xMCA by Yefee
  • CCA: Python package CCA-Zoo by James Chapman
  • ROCK-PCA: Matlab implementation by Diego Bueso

How to cite?

When using xeofs, kindly remember to cite the original references of the methods employed in your work. Additionally, if xeofs is proving useful in your research, we'd appreciate it if you could acknowledge its use with the following citation:

@article{rieger_xeofs_2024,
  author  = {Rieger, Niclas and Levang, Samuel J.},
  doi     = {10.21105/joss.06060},
  journal = {Journal of Open Source Software},
  month   = jan,
  number  = {93},
  pages   = {6060},
  title   = {{xeofs: Comprehensive EOF analysis in Python with xarray}},
  url     = {https://joss.theoj.org/papers/10.21105/joss.06060},
  volume  = {9},
  year    = {2024}
}

Contributors

aaronspring, actions-user, damienirving, malmans2, nicrie, slevang


xeofs's Issues

Why are the PC values so small?

I applied EOF from the xeofs module in S-mode and found that all the PCs are very small, between -1 and 1. How can I get PCs that look like a normalized time series, as in the eofs module?

Weird scores amplitude weighting after MCA rotation

The scores' amplitude changes after adjusting n_modes and reaches unexpected values. This seems to originate from a wrong "dim" argument within the MCA rotator class (the dot product between scores and rotation matrix uses dim="mode" instead of dim="mode1").

project new "unseen" data onto processed EOFs

Can the toolbox do that?

For example :

model = EOF(sst, n_modes=5, norm=False, dim=['lat', 'lon'])
model.solve()
expvar = model.explained_variance_ratio()
eofs = model.eofs()
pcs = model.pcs()

Then something like

model.transform(X)

which would yield the EOF values for each mode, for an input array X.

Thanks
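For reference, with the fit/transform API shown in the Quickstart above, such a projection would look roughly like this sketch (sst and X stand for the arrays in this issue; exact arguments may differ between versions):

import xeofs as xe

model = xe.models.EOF(n_modes=5)
model.fit(sst, dim="time")

new_scores = model.transform(X)  # project unseen data X onto the fitted EOFs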

Additional information on the usage of the xeofs package

Hello Team xeofs,
Can you please help me with the following:

  1. How do I set the PC scaling to 1 or 2 (square root of singular values and singular values, respectively) from the default 0 (orthonormal)?
  2. What if I want the explained variance and SCFs alongside the plot of each mode?
    I'm able to view them individually this way:
    [screenshot]
    but I want them embedded in the plot.
  3. Plus, how do I project new data onto singular vectors? I came across this in the API: project_onto_right_singular_vectors(Y: Optional[Union[DataArray, List[DataArray]]] = None, scaling: int = 0) → DataArray. What is the right way to use it?
  4. Also, is there a way to find lagged relationships/covariances using xeofs?
  5. Also, how can we try CCA with xeofs?

I'm trying to find the answers in the API but am unable to.
Please point me to relevant material so I can proceed further.
Thanks in advance.
Cheers

Dark masked data in homogeneous patterns

Hello Niclas,
I just worked with xeofs and it works really well.
However, I have dark patches (or maybe masked data; refer to the screenshots below) in my homogeneous patterns that are quite evident in the first 3 modes. These drop off after the 4th mode (second image).
[screenshots]
Is this something usual, or is something wrong with my result? In both cases, what does it actually mean?

Also, where can I change the line plot to bar graphs for the PCs?

bug in reconstruction of rotated PCA

I don't know why this happens, but the correct reconstruction should follow the structure below.

Xrec = (rot._pcs * np.sqrt(rot._explained_variance) * np.sqrt(pca.n_samples) @ rot._eofs.T)
Xrec = Xrec / pca._weights
Xrec = Xrec + pca._X_mean

Broadcasting dimensions with `xr.Dataset`

Combining xr.Dataset input with both multi-dimensional sample and feature dimensions broadcasts the dimensions, thus yielding components with inflated dimensions. The broadcast dimensions are filled with NaN and the results seem correct. Ideally, however, this broadcasting shouldn't happen and should be avoided.

In a nutshell, instead of obtaining components like the following

xarray.Dataset
    Dimensions: (sample1: 2, feature1: 2, feature2: 3)
    Coordinates:  
        sample1  (sample1)  int64  1 2
        feature1  (feature1)  <U1  'a' 'b'
        feature2  (feature2)  int64  0 1 2
    Data variables:
        da1  (sample1, feature1, feature2)   int64    0 1 2 3 4 5 6 7 8 9 10 11
        da2  (sample1, feature1)   int64    0 3 6 9
    Indexes: (3)
    Attributes: (0)

we currently get

xarray.Dataset
    Dimensions: (sample1: 2, feature1: 2, feature2: 3)
    Coordinates:
        sample1 (sample1)  int64 1 2
        feature1 (feature1)   <U1  'a' 'b'
        feature2 (feature2)  int  0 1 2
    Data variables:
        da1  (sample1, feature1, feature2)  int64   0 1 nan 3 ... 9 10 nan
        da2  (sample1, feature1, feature2)  int64   nan nan 0 nan ... 6 nan nan 9
    Indexes: (3)
    Attributes: (0)

This arises from a potential inconsistency in xarray's to_stacked_array()/to_unstacked_dataset() methods (see discussion).

improve documentation

Things to improve:

  • streamline doc strings of classes and methods
  • add relevant references for each method
  • mention that p values that are corrected for multiple testing consider each mode individually
  • add an example for complex EOF analysis
  • extend theory section

Cartopy installed but unable to import

Hi Niclas @nicrie,
I'm installing xeofs on a newer desktop. I've been working with xeofs for months now and greatly appreciate the support.

To my surprise, I'm unable to import cartopy this time.
I tried creating a new environment and installing:

(xeof_c) C:\Users\buradagi>conda install -c conda-forge xeofs cartopy
Collecting package metadata (current_repodata.json): done
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 4.10.3
  latest version: 23.5.0

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.


(xeof_c) C:\Users\buradagi>conda activate xeofs

(xeofs) C:\Users\buradagi>conda install -c conda-forge xeofs cartopy
Collecting package metadata (current_repodata.json): done
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 4.10.3
  latest version: 23.5.0

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.

When I try to import:
[screenshot]

What could be the issue?
I have created almost 4 new environments now and tried multiple approaches.
Cartopy is installed, but in a Jupyter Notebook I'm unable to import it.

support for GWPCA

I would like to utilize xeofs in my research due to its compatibility with xarray datasets/data arrays, which makes working with them convenient. I am interested in determining whether xeofs supports the calculation of Geographically Weighted PCA (GWPCA). Is this possible?

Explained Local Variance EOF modes

Dear Niclas,

First, thank you so much for your xMCA and xeofs packages, they are excellent tools!
I am having a problem in defining the local explained variance, as in the following article:
https://www.researchgate.net/publication/252804054_A_coupled_bio-physical_model_HAMSOM-ECO_for_the_North_Sea_frontal_regions
It is more of a mathematical problem than a programming one, but I decided to ask for your help because I am stuck on this.
I hope you can help me and I thank you in advance.

Best regards,

Felipe

Complex MCA & CCA related

Greetings Niclas @nicrie ,
I was going through this paper of yours (https://arxiv.org/pdf/2105.04618.pdf): how do you perform complex rotated MCA?
For example, the routine MCA code in xeofs is:

mca = MCA(
    X, Y,
    n_modes=20,
    dim='time',
    norm=False,
    weights_X='coslat',
    weights_Y='coslat'
)

I'm going through the documentation (link: https://xeofs.readthedocs.io/en/latest/_autosummary/xeofs.models.ComplexMCA.html), yet I don't understand how to call ComplexMCA.
I tried "cmca = ComplexMCA()", but it says it isn't defined.

Also, I see that CCA has been added; can you please point me to its documentation too?

Lastly, my data is daily, not for all months, just 3 months per year (continuing for 20 years).
When I plot the MCA time series, I get something like:
[figure]
whereas I wanted a continuous one plotted across the years of my data, like:
[figure]

Any assistance and support will be greatly appreciated

Differences from other packages?

Would it be useful to include a small paragraph discussing the notable differences from other packages? E.g., I was wondering what's the difference from eofs.

wrong order of clip in cosine weighting

I stumbled over the following RuntimeWarning:

RuntimeWarning: invalid value encountered in sqrt
  return np.sqrt(np.cos(np.deg2rad(data))).clip(0, 1)

It seems to me that the right order should be np.sqrt(np.cos(np.deg2rad(data)).clip(0, 1)), i.e. the values should be clipped after the cosine and before the square root.
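In code, the fix boils down to something like this sketch (the function name is illustrative, not an xeofs internal):

import numpy as np


def sqrt_cos_lat_weights(lat):
    """Square root of cosine-latitude weights, clipped before taking the root."""
    # Clip first so that floating-point noise near the poles cannot yield a
    # slightly negative cosine, which would make np.sqrt emit the RuntimeWarning.
    return np.sqrt(np.clip(np.cos(np.deg2rad(lat)), 0, 1))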

Differences between T-mode and S-mode in EOFs

Dear Team xeofs,
I'm using xeofs for EOF analysis.
I accidentally used it along the spatial dimensions, but the API description says the traditional way is to use it along the time dimension.
[screenshot]

I got more information on this from: http://dx.doi.org/10.1016/j.rse.2014.03.015
However, now I'm a bit confused, as the paper talks about four modes of extended EOFs: in S-mode, sS-mode and tS-mode; and in T-mode, sT-mode and tT-mode. I just want to know which of these the T-mode here refers to: tS or sT?

make sanity check optional

Currently, xeofs.preprocessing.Stacker performs a sanity check after transforming the data to detect isolated NaNs. In the case of dask arrays, this forces computation, which may blow up the overall computation time. It would be desirable to be able to choose whether or not to perform this check.

Usually, NaNs in the SVD solver will trigger a LinAlgError. Alternatively, we could catch that error and re-raise it with a more informative error message.
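A sketch of that second option (the function and default solver here are illustrative, not xeofs internals):

import numpy as np


def decompose(matrix, svd_func=np.linalg.svd):
    """Run an SVD and re-raise failures with a more informative message."""
    try:
        return svd_func(matrix, full_matrices=False)
    except np.linalg.LinAlgError as err:
        raise np.linalg.LinAlgError(
            "SVD did not converge. This often indicates isolated NaNs in the "
            "input data; consider enabling the NaN sanity check or removing "
            "NaNs before fitting."
        ) from err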

fix ignoring unrecognized parameters

Currently, providing an unrecognized parameter to a model produces no error message; e.g., the following does not raise an error:

xe.models.EOF(n_modes=5, nonsense_parameter=3)

However, this is problematic because the recent major release changed some parameter names; e.g., performing a standardized EOF analysis now requires standardize=True, in contrast to version 0.x.y, which required norm=True.
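A sketch of how unrecognized parameters could be rejected at construction time (illustrative only, not the actual xeofs implementation; the allowed-parameter set is invented):

class StrictModel:
    """Reject keyword arguments that the model does not recognize."""

    _allowed_params = {"n_modes", "standardize", "use_coslat"}

    def __init__(self, **kwargs):
        unknown = set(kwargs) - self._allowed_params
        if unknown:
            # Fail loudly instead of silently ignoring e.g. the old `norm` parameter.
            raise TypeError(f"Unrecognized parameter(s): {', '.join(sorted(unknown))}")
        self.params = dict(kwargs)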

generalize internal data structure

Ideally something like:

DataContainer
    - exclude: List
    + data: Dict
    + compute()

Data
    + sample_dims: Sequence
    + feature_dims: Sequence

InputData(Data)

Components(Data)

Scores(Data)

OtherData(Data)
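In Python terms, a rough sketch of this layout could look like the following (names mirror the outline above; the xr.DataArray value type and the dict/list defaults are assumptions):

from dataclasses import dataclass, field
from typing import Dict, List, Sequence

import xarray as xr


@dataclass
class Data:
    """Base container holding the dimension metadata shared by all data kinds."""
    sample_dims: Sequence[str]
    feature_dims: Sequence[str]


class InputData(Data): ...
class Components(Data): ...
class Scores(Data): ...
class OtherData(Data): ...


@dataclass
class DataContainer:
    """Holds all model arrays and can trigger (dask) computation on demand."""
    data: Dict[str, xr.DataArray] = field(default_factory=dict)
    exclude: List[str] = field(default_factory=list)

    def compute(self) -> None:
        # Materialize every lazily evaluated array except the excluded ones.
        for key, value in self.data.items():
            if key not in self.exclude:
                self.data[key] = value.compute()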

Unable to install xeofs

My issue currently is very basic.
I'm unable to install xeofs. I already have xMCA and eofs installed. Does that cause any issues?
Screenshot below (the installation is taking forever):

[screenshot]

Maximum Covariance Analysis is slower than the one in nicrie/xmca

Hello, I recently migrated the dependency from nicrie/xmca to nicrie/xeofs, and it seems that the SVD decomposition has slowed down significantly. I found that the calculation function was changed from np.linalg.svd to sklearn.utils.extmath.randomized_svd. I personally don't know the reason for this replacement. Maybe there are some tricks to speed up the calculation?

minimal reproducible code (xmca==1.4.2, xeofs==1.0.5)

import pooch
import xarray as xr

pooch.retrieve(url="https://downloads.psl.noaa.gov/Datasets/noaa.oisst.v2/sst.mnmean.nc", known_hash=None, path='.', fname='sst.mnmean.nc')
pooch.retrieve(url="https://downloads.psl.noaa.gov/Datasets/cmap/enh/precip.mon.mean.nc", known_hash=None, path='.', fname='precip.mon.mean.nc')

data_input1 = xr.open_dataset('sst.mnmean.nc', chunks='auto').sst.sel(time=slice('1982-01-01', '2022-12-31'))
data_input2 = xr.open_dataset('precip.mon.mean.nc', chunks='auto').precip.sel(time=slice('1982-01-01', '2022-12-31'))

This calculates very fast on my computer, about 5 seconds (it makes no difference whether the data is chunked with dask):

from xmca.xarray import xMCA

mca = xMCA(data_input1, data_input2)
mca.normalize()
mca.apply_coslat()
mca.solve(complexify=False)

But the calculation using xeofs is very slow; the result cannot be obtained even after several minutes:

from xeofs.models import MCA

model = MCA(n_modes = 5, standardize = False, use_coslat = True)
model.fit(data_input1, data_input2, dim = 'time')

Underlying theory: covariance or correlation PCA/EOF?

Hi Niclas,

After my extensive reading on the topic of PCA/EOF, I was wondering whether the multivariate xeofs example here (https://xeofs.readthedocs.io/en/latest/auto_examples/1uni/plot_multivariate-eof.html#sphx-glr-auto-examples-1uni-plot-multivariate-eof-py) uses the covariance or the correlation matrix.
I wanted to run a multivariate EOF for three variables at each grid box of WRF output, and my supervisor has recommended using a correlation-based PCA since the variables are different. I understand that your example uses subsets of the same variable, but I am wondering whether it is suitable to replace these subsets with different variables.
Many thanks.

support flexible input type

Currently, xr.DataArray and lists of xr.DataArray are supported as input types. Extend the functionality to also accept xr.Dataset as input.

Add serialization methods

One thing missing from most EOF libraries is a method to quickly serialize/deserialize the solver to disk. This becomes necessary for any sort of real-time use case, where you may want to take new data and project it onto the components using transform() with a pre-computed solver.

What do you think about adding this to xeofs? One can of course use pickle for this, but that quickly leads to compatibility issues with new versions. It should be straightforward to dump to netCDF or zarr, since we just need to save a few arrays and the essential attributes used to instantiate the solver. Perhaps a .save() method on the models, plus .load() as a classmethod or some other loading utility function?
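A sketch of what the proposed round trip could look like from the user side (the .save()/.load() methods and the file name are hypothetical at this point; data and new_data stand in for arbitrary xarray objects):

import xeofs as xe

model = xe.models.EOF(n_modes=10)
model.fit(data, dim="time")

model.save("eof_model.nc")                     # hypothetical: dump components, scores and init attributes
restored = xe.models.EOF.load("eof_model.nc")  # hypothetical classmethod
new_scores = restored.transform(new_data)      # project fresh data without refitting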

add type hints

That should support code maintenance and documentation in the future.

In our case, with subclasses, we probably need TypeVar and Generic.

Here's an example:

from abc import ABC, abstractmethod
from typing import TypeVar, Generic

T = TypeVar('T')

class A(ABC, Generic[T]):
    @abstractmethod
    def method(self, input: T) -> T:
        pass

class B(A[int]):
    def method(self, input: int) -> int:
        # Implement the method specifically for ints
        pass

class C(A[float]):
    def method(self, input: float) -> float:
        # Implement the method specifically for floats
        pass

In this example, A is an abstract base class and uses a type variable T. It declares an abstract method method that takes an argument of type T and returns a value of type T. B and C inherit from A, specializing T to int and float respectively. Each of them implements method for their respective types.

This approach ensures that the subclasses B and C will work with the specific types you want (int and float respectively), while maintaining the generality of the base class A.

If we want to ensure that A should only be instantiated with certain types, we can add constraints to the TypeVar:

T = TypeVar('T', int, float)

With this, T can only be int or float, and trying to create a class that inherits from A[str] for example, would result in a type error.

Provide standard kwarg for model fitting random seed

I am trying to make PCA fitting using xeofs.models.EOF reproducible by always selecting the randomized solver and passing a random_state kwarg to the solver. Unfortunately, not all solvers that xeofs uses internally, in this case, use a random_state parameter, one also uses seed, and neither accepts the other. At the moment, I need to loop over these possibilities, try each, and catch the type error if it was the wrong one. Could the model init or fit method take a random_state or seed parameter that is then passed to the correct kwarg for the chosen solver? Thanks for your help and the fantastic library!
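The workaround described above amounts to something like this sketch (how the keyword ultimately reaches the solver depends on the installed xeofs version, so the model factory is left as a placeholder):

def fit_with_seed(make_model, data, seed=42):
    """Try the two seed keyword names used by the different solvers."""
    last_error = None
    for kwarg in ("random_state", "seed"):
        try:
            # Placeholder: construct the model, forwarding the kwarg to the chosen solver.
            model = make_model(**{kwarg: seed})
            model.fit(data, dim="time")
            return model
        except TypeError as err:
            last_error = err
    raise TypeError("Neither 'random_state' nor 'seed' was accepted by the solver.") from last_error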

create a flexible class to create test data in conftest.py

Something along these lines:

import numpy as np
import xarray as xr
import pytest

# Define your fixtures
@pytest.fixture
def regular_data():
    return xr.DataArray(np.random.rand(5, 5))

@pytest.fixture
def regular_dataset():
    return xr.Dataset({'var1': (("x", "y"), np.random.rand(5, 5))})

@pytest.fixture
def nan_data():
    return xr.DataArray(np.random.rand(5, 5), dims=("x", "y")).where(lambda x: x > 0.2)

@pytest.fixture
def nan_dataset():
    return xr.Dataset({'var1': (("x", "y"), np.random.rand(5, 5))}).where(lambda x: x > 0.2)

# Define the parametrized fixture
@pytest.fixture
def model_data(request):
    data_type = request.param[0]
    fill_type = request.param[1]

    if fill_type == 'regular':
        return request.getfixturevalue(f'{data_type}_data')
    elif fill_type == 'nans':
        return request.getfixturevalue(f'{data_type}_dataset')
    else:
        raise ValueError(f'Invalid argument: {fill_type}')

# Use the fixture in your tests
@pytest.mark.parametrize('model_data', [('nan', 'regular'), ('nan', 'nans'), ('regular', 'regular'), ('regular', 'nans')], indirect=True)
def test_model_data(model_data):
    print(model_data)


About compatibility of xeofs with other packages

xeofs is awesome and user friendly, but I just wonder how many packages it can safely coexist with. I have found that it can cause compatibility problems when installing other packages, such as cdo or ncview. While I can work around this by creating a new environment for xeofs, I am just curious about its package dependencies.

add more test cases

  • Hilbert transform / padding: verify that the real and imaginary parts are zero after the transformation (padding may introduce a shift)
  • explained variance fraction is always <= 1 (see the test sketch after this list)
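For the second item, a minimal test sketch could look like the following (the synthetic data and model settings are illustrative):

import numpy as np
import xarray as xr
import xeofs as xe


def test_explained_variance_ratio_is_bounded():
    rng = np.random.default_rng(0)
    data = xr.DataArray(
        rng.standard_normal((20, 10, 10)), dims=("time", "lat", "lon")
    )
    model = xe.models.EOF(n_modes=5)
    model.fit(data, dim="time")
    expvar = model.explained_variance_ratio()
    # Each fraction is non-negative and the retained modes explain at most 100%.
    assert (expvar >= 0).all()
    assert float(expvar.sum()) <= 1.0 + 1e-10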
