Comments (3)
I've made a small tweak that eliminates at least one unnecessary computation. Feel free to push to the branch for any improvements regarding this issue.
from xeofs.
This also leads to some serious inefficiencies if we set `compute=False`, because we end up computing the SVD multiple times. With the current structure, it's unavoidable that we compute during the sign flip and then again when we compute the `DataContainer` objects. Although I wonder if this step could be moved as some postprocessing layer after whatever our first compute call is. May take some significant restructuring.
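To make the postprocessing idea concrete, here is a rough sketch in plain NumPy rather than the actual xeofs code. `flip_signs` is a hypothetical helper, and the convention it uses (largest-magnitude entry of each mode made positive) is an assumption, not necessarily what xeofs does. The point is only that the flip can run on results that are already in memory, so it does not need its own compute pass.

```python
import numpy as np


def flip_signs(U, Vt):
    """Flip each mode so its largest-magnitude entry in U is positive.

    Hypothetical helper: the actual sign convention in xeofs may differ.
    """
    # Row index of the largest-magnitude entry in every column of U.
    idx = np.argmax(np.abs(U), axis=0)
    signs = np.sign(U[idx, np.arange(U.shape[1])])
    signs[signs == 0] = 1.0  # leave all-zero modes untouched
    # Flipping a column of U and the matching row of Vt keeps the product unchanged.
    return U * signs, Vt * signs[:, None]


rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))

# One decomposition, already in memory ...
U, s, Vt = np.linalg.svd(X, full_matrices=False)
# ... and the sign flip applied afterwards as cheap postprocessing.
U_flipped, Vt_flipped = flip_signs(U, Vt)
X_rebuilt = (U_flipped * s) @ Vt_flipped
```

Because the same sign is applied to a column of `U` and the matching row of `Vt`, the reconstruction is unaffected by the flip.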
`DataContainer.compute()` is particularly inefficient though in this respect, because the loop will end up computing the SVD for each object we need to compute:
https://github.com/nicrie/xeofs/blob/1f38a5b818a5d5fc55720e1119e6f686aef89168/xeofs/data_container/data_container.py#L29-L36
Easily optimized with something like this, where `dask.compute` will optimize the task graph for all objects simultaneously:
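    import dask
    from dask.diagnostics import ProgressBar

    def compute(self, verbose=False):
        # Collect every entry that is allowed to be computed.
        computed_data = {k: v for k, v in self.items() if self._allow_compute[k]}
        # A single dask.compute call lets dask merge the task graphs,
        # so shared intermediates such as the SVD are evaluated once.
        if verbose:
            with ProgressBar():
                computed_data = dask.compute(computed_data)[0]
        else:
            computed_data = dask.compute(computed_data)[0]
        for k, v in computed_data.items():
            self[k] = v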
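
For anyone who wants to see the difference in isolation, here is a small toy example (not xeofs code) that counts how often a shared lazy step runs. `expensive_svd` is just a stand-in for the decomposition, and the dictionary keys are made up for illustration. The synchronous scheduler is used only to keep the counter deterministic.

```python
import dask
from dask import delayed

calls = {"svd": 0}


def expensive_svd(x):
    # Stand-in for the costly decomposition; counts how often it runs.
    calls["svd"] += 1
    return x * 2


def build():
    # Two outputs that depend on the same lazy intermediate.
    svd = delayed(expensive_svd)(21)
    return {"components": svd + 1, "scores": svd - 1}


# Computing each object in a loop re-runs the shared step every time.
calls["svd"] = 0
looped = {k: v.compute(scheduler="synchronous") for k, v in build().items()}
calls_looped = calls["svd"]

# A single dask.compute call merges the graphs and runs it once.
calls["svd"] = 0
batched = dask.compute(build(), scheduler="synchronous")[0]
calls_batched = calls["svd"]
```

Both approaches return the same values; only the number of times the shared step is evaluated differs.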
Indeed, off the top of my head, just some additional thoughts:
In the `Sanitizer`
- make the sanity check optional for those cases where we know/expect to not have any NaNs (cross-ref #83)
- `self.is_valid_sample` is computed twice, once in `fit` and then again in `transform`. It is sufficient to compute it in `transform` only and then assign it to `self.is_valid_sample` if it does not exist yet
- use `dask.compute()` to avoid redundant computations for `self.is_valid_feature` and `self.is_valid_sample` (not sure if actually feasible, since we can compute valid samples only when we know the valid features)
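
A rough sketch of the first two points, using NumPy instead of xarray to keep it short. `MiniSanitizer` and its `check_nans` flag are hypothetical names, and the mask definitions (all-NaN columns and rows are invalid) are an assumption about what the real `Sanitizer` does.

```python
import numpy as np


class MiniSanitizer:
    """Toy stand-in for the Sanitizer; not the xeofs implementation."""

    def __init__(self, check_nans=True):
        self.check_nans = check_nans  # hypothetical opt-out flag
        self.is_valid_feature = None
        self.is_valid_sample = None

    def fit(self, X):
        # Only the feature mask is needed here: drop all-NaN columns.
        self.is_valid_feature = ~np.isnan(X).all(axis=0)
        return self

    def transform(self, X):
        X = X[:, self.is_valid_feature]
        # The sample mask depends on the valid features, so compute it here.
        is_valid_sample = ~np.isnan(X).all(axis=1)
        if self.is_valid_sample is None:
            self.is_valid_sample = is_valid_sample  # stored on first call only
        X = X[is_valid_sample]
        if self.check_nans and np.isnan(X).any():
            raise ValueError("Input contains isolated NaNs.")
        return X


X = np.array([[1.0, np.nan, 2.0], [np.nan, np.nan, np.nan], [3.0, np.nan, 4.0]])
sanitizer = MiniSanitizer(check_nans=False).fit(X)
X_clean = sanitizer.transform(X)
```

With `check_nans=False` the final NaN scan is skipped entirely, which is the part that would otherwise trigger an extra pass over the data.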
I hoped that in many cases the decomposition result would fit into memory (say if you only need 10 modes). In that case `compute=True` is already not too bad. Apart from the `Sanitizer` check there shouldn't be any redundant computations.