Comments (8)
Concerning compute_persistence_pm: I found a way. I transfer the initialisation time axis into the index values of control.time; then I can easily add lags to the index.
import numpy as np
import xarray as xr
from xskillscore import pearson_r, rmse

dim = 'time'
nlags = ds.time.size
init_month_index = 0  # January init; use 10 for November
init_years = ds['initialization'].values

# collect the control timestamps that correspond to each initialization
init_cftimes = []
for year in init_years:
    init_cftimes.append(control.sel(time=str(year)).isel(time=init_month_index).time)
init_cftimes = xr.concat(init_cftimes, 'time')

# translate those timestamps into integer positions along control.time
control_times = list(control.time.values)
init_index = [control_times.index(t) for t in init_cftimes.time.values]

# persistence skill: correlate the control at the init positions with itself `lag` steps later
metric = pearson_r  # or rmse
plag = []
for lag in range(1, 1 + nlags):
    init_index_plus_lag = [i + lag for i in init_index]
    ref = control.isel({dim: init_index_plus_lag})
    fct = control.isel({dim: init_index})
    ref[dim] = fct[dim]
    plag.append(metric(ref, fct, dim=dim))
pers_new = xr.concat(plag, 'time')
pers_new['time'] = np.arange(1, 1 + nlags)
Decide if we want to maintain a separate significance level for init/uninit and persistence. If this is the case, a dimension distinction such as "quantile_persistence" and "quantile_ensemble" should be made to make plotting easy. The graphics plot was breaking in the notebook if the significance levels were different.
I want to keep different significance levels for the calculation. For the plotting I haven't implemented that yet; it now raises an error.
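A minimal sketch of that dimension split, assuming two bootstrapped confidence-interval arrays with different significance levels; the names quantile_persistence and quantile_ensemble are only placeholders. Renaming the quantile dimension per result keeps both in one Dataset without the coordinate clash that broke the plot.

import numpy as np
import xarray as xr

lead = np.arange(1, 11)
# hypothetical CI bounds: persistence at 5-95%, ensemble at 2.5-97.5%
pers_ci = xr.DataArray(np.random.rand(2, lead.size), dims=['quantile', 'lead'],
                       coords={'quantile': [0.05, 0.95], 'lead': lead})
ens_ci = xr.DataArray(np.random.rand(2, lead.size), dims=['quantile', 'lead'],
                      coords={'quantile': [0.025, 0.975], 'lead': lead})
# merging both on the same 'quantile' dim would align to the union of the coords and pad with NaN;
# renaming the dimension per result avoids that
ds_ci = xr.Dataset({
    'persistence': pers_ci.rename({'quantile': 'quantile_persistence'}),
    'ensemble': ens_ci.rename({'quantile': 'quantile_ensemble'}),
})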
I am still unsure how to gather all the results into a dataset nicely (I would prefer to have only the variable name as a single data_var here).
Fix pytest to deal with datetime64[ns]. (I think this was the problem you identified?)
Changed compute_persistence_pm. Didn't adapt compute_persistence yet, but it should be easily adaptable.
Revise the perfect_model notebook with your new bootstrap functions so that the whole thing compiles as you see fit.
Compiles.
Separate significance level: actually it shouldn't be different levels, although it is much harder to beat the persistence forecast than the uninitialized one in the first lead years.
I like the consolidated approach, but it leads to some data_vars which will only contain NaN. As soon as we put them all in a dataset, all dimensions are available to all DataArrays and many will end up as NaN. In theory we should avoid useless NaN fields.
As we use Datasets, this should in the end allow users to get results for more variables. Therefore I would opt for a result where the only data_var is the variable itself and the rest of the information is somehow stored in the coordinates.
But somehow you also have that kind of problem, because you have a threshold for your p-value from the z-score (at least implicitly) and then decide whether the p-value is acceptable or not.
I like the consolidated approach, but it leads to some data_vars which will only contain NaN. As soon as we put them all in a dataset, all dimensions are available to all DataArrays and many will end up as NaN. In theory we should avoid useless NaN fields.
I think xarray handles this with its broadcasting. So when it goes into a dataset, the DataArrays only maintain the dimensions they have going in. The only time NaNs appear is if the dimensions mismatch, like when the quantile coordinates mismatched with different significance levels.
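A small illustration of that behaviour with two toy DataArrays: variables with disjoint dimensions keep their own shape inside a Dataset, while two variables sharing a dimension with different coordinate values get aligned to the union of the coordinates and padded with NaN.

import numpy as np
import xarray as xr

a = xr.DataArray(np.ones(3), dims='lead', coords={'lead': [1, 2, 3]})
b = xr.DataArray(np.ones(2), dims='member', coords={'member': [0, 1]})
ds_ok = xr.Dataset({'a': a, 'b': b})
# no NaN: 'a' stays 1-D over lead, 'b' stays 1-D over member

c = xr.DataArray(np.ones(2), dims='quantile', coords={'quantile': [0.05, 0.95]})
d = xr.DataArray(np.ones(2), dims='quantile', coords={'quantile': [0.025, 0.975]})
ds_nan = xr.Dataset({'c': c, 'd': d})
# NaNs appear: both variables are reindexed onto the union [0.025, 0.05, 0.95, 0.975]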
As we use Datasets, this should in the end allow users to get results for more variables. Therefore I would opt for a result where the only data_var is the variable itself and the rest of the information is somehow stored in the coordinates.
Yeah, I agree that the current implementation isn't perfect, although see my discussion comments in https://github.com/bradyrx/climpred/pull/86. I think the current bootstrap_perfect_model does too much. It should only bootstrap, i.e., return a bootstrapped form of the control run. Currently it has switches to do all sorts of significance testing which can get wrapped into the class-based system. Perhaps just bootstrapping each variable in a dataset will prevent these issues above.
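A hedged sketch of what a bootstrap-only helper could look like under that reading, i.e. just resampling random chunks of the control run into a pseudo-ensemble; the function name bootstrap_control and its arguments are hypothetical, not climpred API.

import numpy as np
import xarray as xr

def bootstrap_control(control, nmember, length, dim='time'):
    """Resample `nmember` random chunks of `length` steps from the control run
    and stack them as members of an uninitialized pseudo-ensemble."""
    nt = control[dim].size
    members = []
    for _ in range(nmember):
        start = np.random.randint(0, nt - length + 1)
        chunk = control.isel({dim: slice(start, start + length)})
        chunk[dim] = np.arange(length)  # relabel so chunks align along `dim`
        members.append(chunk)
    return xr.concat(members, dim='member')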
I also had the idea to somehow put the p-values and CIs in coordinates.
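One hedged way of expressing that idea, assuming a skill DataArray over a lead dimension; the p-values and CI bounds ride along as non-dimension coordinates, so the resulting Dataset keeps only the variable itself as a data_var (all names here are illustrative).

import numpy as np
import xarray as xr

lead = np.arange(1, 11)
skill = xr.DataArray(np.random.rand(lead.size), dims='lead', coords={'lead': lead}, name='tos')
skill = skill.assign_coords(
    p=('lead', np.random.rand(lead.size)),        # bootstrapped p-value per lead
    ci_low=('lead', np.random.rand(lead.size)),   # lower confidence bound
    ci_high=('lead', np.random.rand(lead.size)),  # upper confidence bound
)
ds_skill = skill.to_dataset()  # single data_var 'tos'; p/ci_low/ci_high travel as coordinates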
How would you split up bootstrap_perfect_model? I thought about it but didn't really come to a better idea yet:
- If we only do bootstrapping, we get very large arrays (1000 x nlon x nlat). I didn't want to keep these, therefore I just calculate the CIs and p_value and return those.
What do you mean by a bootstrapped form?
It should only bootstrap, i.e., return a bootstrapped form of the control run.
Totally agree on:
Currently it has switches to do all sorts of significance testing which can get wrapped into the class-based system.
Well, we could split the for _ in range(bootstrap): part into a separate function, but there are few things to do with that output. Therefore in bootstrap_pm I just extract p_value and CI.
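For context, a hedged sketch of that extraction step on its own, assuming the resampling loop has already produced a skill array with a 'bootstrap' dimension (names and the p-value convention are placeholders, not the actual bootstrap_pm internals):

def ci_and_pvalue(bootstrapped_skill, reference_skill, ci=(0.05, 0.95), dim='bootstrap'):
    # confidence interval: quantiles over the bootstrap dimension
    ci_bounds = bootstrapped_skill.quantile(list(ci), dim=dim)
    # p-value: fraction of bootstrap samples in which the reference beats the bootstrapped skill
    p_value = (bootstrapped_skill < reference_skill).mean(dim)
    return ci_bounds, p_value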
We could also write the function so that it does only one comparison: vs. persistence or vs. uninitialized. But persistence and uninitialized have different dimensions anyway (persistence has lead years, uninitialized not really in a meaningful way), therefore I just put them all in this one big function.
What do you mean by a bootstrapped form?
I think I was confusing this with the _pseudo_ens function. My thinking was that, so you don't have to run _pseudo_ens many times, you could append its output to a special category on the PerfectModelEnsemble object that can be referenced. I understand it as generating an ensemble with the same dimensions as the initialized one, but simulating an uninitialized form.
How would you split up bootstrap_perfect_model? I thought about it but didn't really come to a better idea yet: If we only do bootstrapping, we get very large arrays (1000 x nlon x nlat). I didn't want to keep these, therefore I just calculate the CIs and p_value and return those.
I see now that bootstrap_perfect_model is mainly just there to get CIs and p_values. So perhaps it should just return an object with the same lat/lon dimensions with variables "p", "upper", and "lower" for the confidence intervals, or something similar. But this would only work for DataArrays, I think.
Well, we could split the for _ in range(bootstrap): part into a separate function, but there are few things to do with that output. Therefore in bootstrap_pm I just extract p_value and CI.
We could also write the function so that it does only one comparison: vs. persistence or vs. uninitialized. But persistence and uninitialized have different dimensions anyway (persistence has lead years, uninitialized not really in a meaningful way), therefore I just put them all in this one big function.
Agreed on these points. You can clean it up a little bit in https://github.com/bradyrx/climpred/pull/87, but don't worry too much about it. Let's get the bootstrapping working, persistence fixed, pytest, etc. in https://github.com/bradyrx/climpred/pull/87 and then get the object-oriented system merged. Then with the object-oriented system we can work on cleaning things up "under the hood".
Implemented in https://github.com/bradyrx/climpred/pull/87