Accessing intermediate results about xyzpy HOT 12 CLOSED

AdrianSosic commented on August 22, 2024

Accessing intermediate results


Comments (12)

jcmgray commented on August 22, 2024

@AdrianSosic

this package is really fantastic, it solves exactly the problems that I've been struggling with for years! Thanks for your work!

Thanks! Glad it's useful.

With regard to your query: it would definitely be a nice feature to have, but nothing is implemented yet. For harvesting, if you want some kind of staged progress, you could manually loop over some of the values in combos:

combos = {'a': range(10), 'b': range(10)}
for a_val in combos.pop('a'):
    harvester.harvest_combos({**combos, 'a': [a_val]})

A kwarg like save_every={'a': 2} might be a nice API to do this automatically (i.e. slice the 'a' values in steps of 2).
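Until something like that exists, the slicing can be approximated outside the library. The following is only a minimal sketch: staged_combos is a hypothetical helper, not part of xyzpy, and it only builds the sub-dictionaries; each one would then be passed to harvester.harvest_combos as in the loop above.

```python
def staged_combos(combos, key, step):
    """Yield copies of ``combos`` with ``combos[key]`` cut into chunks of ``step``.

    Hypothetical helper for illustration only, not part of xyzpy.
    """
    values = list(combos[key])
    rest = {k: v for k, v in combos.items() if k != key}
    for start in range(0, len(values), step):
        yield {**rest, key: values[start:start + step]}


combos = {'a': range(10), 'b': range(10)}
stages = list(staged_combos(combos, 'a', 2))
# Each stage could then be harvested on its own, e.g.
#     for stage in stages:
#         harvester.harvest_combos(stage)
```

This keeps the external loop to a single level, but it is still a workaround rather than built-in behaviour.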

On the other hand, Crop feels like the more natural place to put functionality relating to this kind of persistence. One approach here would be a reap method that defaults to an all-nan result for missing results and, of course, doesn't delete the disk data afterwards. Every time you ran it, it would happily merge in only the new data.
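To illustrate the all-nan idea independently of xyzpy, here is a rough sketch using plain NumPy. The function partial_result and the finished mapping are illustrative stand-ins (assumptions, not the Crop implementation): the output shape is inferred from the combos, and every case without a stored result stays NaN.

```python
import itertools

import numpy as np


def partial_result(combos, finished):
    """Build a full-size array that is NaN wherever a case has not finished.

    ``combos`` maps argument names to value lists and ``finished`` maps
    argument tuples to scalar results. Both are illustrative stand-ins.
    """
    names = list(combos)
    # The shape of the full result follows from the number of values per argument.
    shape = tuple(len(combos[name]) for name in names)
    out = np.full(shape, np.nan)
    for idx in itertools.product(*(range(size) for size in shape)):
        case = tuple(combos[name][i] for name, i in zip(names, idx))
        if case in finished:
            out[idx] = finished[case]
    return out


combos = {'a': [1], 'b': [1, 2, 3]}
finished = {(1, 1): 2.0, (1, 2): 3.0}  # the case b=3 is still running
result = partial_result(combos, finished)
```

Calling it again later with a larger finished mapping simply fills in more entries, which is the "merge in only the new data" behaviour described above.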

Would either of those suit?


AdrianSosic commented on August 22, 2024

Hey again, thanks for your immediate answer!

Concerning the first idea:
Yes, in principle it solves the problem. Yet, it comes with a number of drawbacks:

It's an external workaround, making the code less compact again and thus reducing the benefits of using xyzpy in the first place
--> the suggested kwarg option would be nice
Also, in many scenarios, I would like to be able to access the results after each case, which would require as many external loops as there are dimensions in order to cover the entire product space of combinations
--> in this case, there would be no more reason to use the package since its purpose is exactly to take over this task
More importantly, by looping externally, I lose the built-in parallelization functionality
If I manually parallelize the external loop(s), am I guaranteed that the package handles the file access correctly, i.e., that there will be no data loss when the different processes write to the same file?

Concerning the second idea:
Yes, this would be a perfectly suited solution. Are you planning to add this feature in the future? Nevertheless, it would be nice to have a simple solution like the first one in addition, especially for working only on one machine.


jcmgray commented on August 22, 2024

For the moment I think it would make sense to add this functionality to Crop, both for the reasons you list and I think just conceptually.

It should also be quite easy; the only slightly tricky part may be inferring the sequence of shapes of the all-nan result. I might try adding something in the next few days, unless you want to give it a try?


AdrianSosic commented on August 22, 2024

I wish I could contribute but, as I said, I've just started using xarray/xyzpy and don't feel confident enough to work on the underlying code since I have not yet fully understood all details of the packages =/ Maybe I can help in the future when I have more experience with them!


jcmgray commented on August 22, 2024

Of course, no worries. I would like to use this functionality, so I will add it shortly. If you have any preferences or ideas for the API, let me know. I was thinking of something along the lines of:

crop.reap(allow_incomplete=True)


jcmgray commented on August 22, 2024

@AdrianSosic I've added this functionality in 2ce4895. If you get a chance, let me know if it's working for you.


AdrianSosic commented on August 22, 2024

Hey, thanks a lot for adding the functionality. I think the allow_incomplete option should be fine. One issue that could cause problems is when the function itself returns np.nan for some inputs. An alternative might be to use masked arrays (or to provide the option to choose the default value).
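To make the concern concrete, here is a small sketch with NumPy masked arrays. It is only an illustration of the idea, not something xyzpy provides: the arrays values and computed are made-up example data, with one result that genuinely evaluates to NaN and one that has not been computed yet.

```python
import numpy as np

# Three cases: the second genuinely evaluates to NaN,
# the third has not been computed yet (its stored value is a placeholder).
values = np.array([1.5, np.nan, 0.0])
computed = np.array([True, True, False])

# Mask the missing entries, independent of the value stored there.
partial = np.ma.masked_array(values, mask=~computed)

# Missing entries are counted via the mask, not via NaN checks.
n_missing = int(np.ma.count_masked(partial))

# Entries that are NaN but not masked are genuine function outputs.
genuine_nan = np.isnan(partial.data) & ~np.ma.getmaskarray(partial)
```

With a plain NaN fill, the second and third cases would be indistinguishable; with the mask, they are not.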

However, I am getting some unexpected behavior:

One thing that is a bit weird is that, when I try to grow some sown combos using a new Crop object (e.g. one that was created on a different machine), I need to explicitly pass the correct batchsize information again. Otherwise, I get the error can't multiply sequence by non-int of type 'NoneType'. Shouldn't the batchsize be automatically loaded from the sown combos on the disk?
When I load the incomplete result using c.reap(allow_incomplete=True), I get a tuple of Datasets instead of a single merged Dataset.

Any thoughts?


jcmgray commented on August 22, 2024

Hmm, if you could provide minimal examples that would be enormously helpful - as well as debugging they would be good starting points for unit tests.

With regard to using np.nan, this is essentially fixed because it's what pandas and xarray use for missing data. Things like merging datasets would be much trickier if another value were used.


AdrianSosic commented on August 22, 2024

Sure, here is an example:

Run the following code to let the seeds grow:

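import xyzpy as xyz
import xarray as xr
from time import sleep


def fn(a, b):
    # Simulate a very long-running case so that one result stays missing
    if b == 3:
        sleep(10000)
    y = xr.Dataset({'sum': a+b, 'diff': a-b})
    return y

combos = dict(
    a=[1],
    b=[1, 2, 3]
)

# var_names=None because fn already returns a labelled Dataset
runner = xyz.Runner(fn, var_names=None)
harvester = xyz.Harvester(runner, 'test.h5')
crop = harvester.Crop(name='fn', batchsize=1)
crop.sow_combos(combos)
# Grow the three batches one after another; the b == 3 case blocks in sleep()
for i in range(1, 4):
    crop.grow(i)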

If you then, while the code is running, access the intermediate results via

import xyzpy as xyz

c = xyz.Crop(name='fn', batchsize=1)
X = c.reap(allow_incomplete=True)

you get as output a one-element tuple containing a tuple of Datasets.

Moreover, if you remove the kwarg batchsize=1 in the latter crop, you receive the error can't multiply sequence by non-int of type 'NoneType'.


jcmgray commented on August 22, 2024

Thanks for the example! I think both are fixed by automatically loading on-disk information, if it exists, for any new crop. I'll push an update shortly, once I have a test in place.


jcmgray commented on August 22, 2024

@AdrianSosic, this should be fixed in 1eb2b1d, let me know if it's not working for you.


AdrianSosic commented on August 22, 2024

@jcmgray, great, seems to work perfectly, thanks! If I encounter any other issues, I will let you know!
