Comments (12)

jcmgray commented on August 22, 2024

@AdrianSosic

This package is really fantastic, it solves exactly the problems that I've been struggling with for years! Thanks for your work!

Thanks! Glad it's useful.


With regard to your query: it would definitely be a nice feature to have, but nothing is implemented yet. For harvesting, if you want some kind of staged progress, you could manually loop over some of the values in combos:

combos = {'a': range(10), 'b': range(10)}
# Harvest one value of 'a' at a time, so results are saved after each stage.
for a_val in combos.pop('a'):
    harvester.harvest_combos({**combos, 'a': [a_val]})

A kwarg like save_every={'a': 2} might be a nice API to do this automatically (i.e. slice the 'a' values in steps of 2).
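For illustration, the slicing that such a kwarg would perform could look roughly like the sketch below. Note that `chunk` and `chunked_combos` are hypothetical helpers written for this example, not part of xyzpy, and `save_every` is only the name floated above.

```python
from itertools import product


def chunk(values, size):
    """Split a sequence of values into consecutive lists of at most `size` items."""
    values = list(values)
    return [values[i:i + size] for i in range(0, len(values), size)]


def chunked_combos(combos, save_every):
    """Yield one smaller combos dict per chunk of the keys named in `save_every`.

    Hypothetical helper: `save_every` maps a key to a chunk size, e.g. {'a': 2}.
    """
    keys = list(save_every)
    chunk_lists = [chunk(combos[k], save_every[k]) for k in keys]
    for pieces in product(*chunk_lists):
        # Keep all other keys whole and swap in the current chunk(s).
        yield {**combos, **dict(zip(keys, pieces))}


combos = {'a': range(10), 'b': range(10)}
stages = list(chunked_combos(combos, save_every={'a': 2}))
```

Each element of `stages` could then be passed to `harvester.harvest_combos` in turn, so that results are written to disk after every two values of 'a'.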

On the other hand, Crop feels like the more natural place to put functionality relating to this kind of persistence. One approach here would be a reap method that defaults to an all-nan result for missing results and of course doesn't delete the disk data afterwards. Every time you ran this, it would happily merge in only the new data.
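The idea of defaulting to nan for missing results can be sketched in plain Python as follows. This is only a toy model under stated assumptions: `reap_incomplete` is a hypothetical function, results are stored in a dict keyed by argument tuples rather than in xyzpy's on-disk format, and each result is a single number.

```python
import math
from itertools import product


def reap_incomplete(combos, finished):
    """Build the full result grid, using nan wherever no result exists yet.

    `finished` maps a tuple of argument values to its result and is left
    untouched, mirroring the idea of not deleting the data on disk.
    """
    keys = list(combos)
    grid = {}
    for point in product(*(combos[k] for k in keys)):
        grid[point] = finished.get(point, math.nan)
    return grid


combos = {'a': [1], 'b': [1, 2, 3]}
finished = {(1, 1): 2, (1, 2): 3}   # the b == 3 case has not finished yet
partial = reap_incomplete(combos, finished)

finished[(1, 3)] = 4                # later, the last case completes
complete = reap_incomplete(combos, finished)
```

Calling the function again after more cases have finished fills in the previously missing entries without recomputing anything.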

Would either of those suit?

from xyzpy.

AdrianSosic commented on August 22, 2024

Hey again, thanks for your immediate answer!

Concerning the first idea:
Yes, in principle it solves the problem. Yet, it comes with a number of drawbacks:

  • It's an external workaround, making the code less compact again and thus reducing the benefits of using xyzpy in the first place
    --> the suggested kwarg option would be nice
  • Also, in many scenarios, I would like to be able to access the results after each case, which would require as many external loops as there are dimensions in order to cover the entire product space of combinations
    --> in this case, there would be no more reason to use the package since its purpose is exactly to take over this task
  • More importantly, by external looping, I lose the built-in functionality of parallelization
  • If I manually parallelize the external loop(s), am I guaranteed that the package handles the file access correctly, i.e., that there will be no data loss when the different processes write to the same file?
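Whether xyzpy guards against concurrent writes is not answered in this thread. One generic way to sidestep the question when parallelizing an external loop is to let the workers only compute and return results, while a single place merges them. The sketch below assumes a toy `fn` and a plain dict as the result store, and it uses threads instead of processes purely to stay self-contained; `run_stage` is a hypothetical helper.

```python
from concurrent.futures import ThreadPoolExecutor


def fn(a, b):
    """Toy stand-in for the function being evaluated."""
    return a + b


def run_stage(a_val, b_values):
    """Compute every result for one value of 'a' and return them, writing nothing."""
    return {(a_val, b): fn(a_val, b) for b in b_values}


combos = {'a': range(4), 'b': range(3)}
results = {}
with ThreadPoolExecutor(max_workers=2) as pool:
    for stage in pool.map(lambda a: run_stage(a, combos['b']), combos['a']):
        # Only this loop touches the shared store, so workers never write at once.
        results.update(stage)
```

With this pattern, the same single-writer loop could save to a file after each stage instead of updating a dict.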

Concerning the second idea:
Yes, this would be a perfectly suited solution. Are you planning to add this feature in the future? Nevertheless, it would be nice to have a simple solution like the first one in addition, especially for working only on one machine.

jcmgray commented on August 22, 2024

For the moment, I think it would make sense to add this functionality to Crop, both for the reasons you list and, I think, just conceptually.

It should also be quite easy; the only slightly tricky part is probably inferring the sequence of shapes of the all-nan result. I might try adding something in the next few days, unless you want to give it a try?

AdrianSosic commented on August 22, 2024

I wish I could contribute but, as I said, I've just started using xarray/xyzpy and don't feel confident enough to work on the underlying code since I have not yet fully understood all details of the packages =/ Maybe I can help in the future when I have more experience with them!

jcmgray commented on August 22, 2024

Of course, no worries. I would like to use this functionality myself, so I will add it shortly. If you have any preferences or ideas for the API, let me know. I was thinking of something along the lines of:

crop.reap(allow_incomplete=True)

jcmgray commented on August 22, 2024

@AdrianSosic I've added this functionality in 2ce4895. If you get a chance, let me know if it's working for you.

AdrianSosic commented on August 22, 2024

Hey, thanks a lot for adding the functionality. I think the allow_incomplete option should be fine. One issue that could cause problems is when the function itself returns np.nan for some inputs. An alternative might be to use masked arrays (or to provide the option to choose the default value).
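To make the ambiguity concrete, here is a small plain-Python sketch of the masked-array idea: a separate boolean mask records which entries are missing, so a nan that the function itself returned can be told apart from a result that does not exist yet. `fill_missing` and its `fill_value` parameter are hypothetical names for this example, not xyzpy options; the mask convention (True means missing) follows the one used by NumPy masked arrays.

```python
import math


def fill_missing(points, finished, fill_value=math.nan):
    """Return (values, mask), where mask[i] is True if points[i] has no result yet."""
    values = [finished.get(p, fill_value) for p in points]
    mask = [p not in finished for p in points]
    return values, mask


points = [1, 2, 3]
finished = {1: 0.5, 2: math.nan}   # the function itself returned nan for input 2
values, mask = fill_missing(points, finished)
```

Here `values[1]` and `values[2]` are both nan, but only `mask[2]` is True, which identifies input 3 as the one still missing.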

However, I am getting some unexpected behavior:

  • One thing that is a bit weird is that, when I try to grow some sown combos using a new Crop object (e.g. one that was created on a different machine), I need to explicitly pass the correct batchsize information again. Otherwise, I get the error "can't multiply sequence by non-int of type 'NoneType'". Shouldn't the batchsize be automatically loaded from the sown combos on the disk?
  • When I load the incomplete result using c.reap(allow_incomplete=True), I get a tuple of Datasets instead of a single merged Dataset.

Any thoughts?

jcmgray commented on August 22, 2024

Hmm, if you could provide minimal examples, that would be enormously helpful. As well as helping with debugging, they would be good starting points for unit tests.

With regard to using np.nan, this is essentially fixed because it's what pandas and xarray use for missing data. Things like merging datasets would be much trickier if another value were used.

AdrianSosic commented on August 22, 2024

Sure, here is an example:

Run the following code to let the seeds grow:

import xyzpy as xyz
import xarray as xr
from time import sleep


def fn(a, b):
    if b == 3:
        sleep(10000)
    y = xr.Dataset({'sum': a+b, 'diff': a-b})
    return y

combos = dict(
    a=[1],
    b=[1, 2, 3]
)

runner = xyz.Runner(fn, var_names=None)
harvester = xyz.Harvester(runner, 'test.h5')
crop = harvester.Crop(name='fn', batchsize=1)
crop.sow_combos(combos)
for i in range(1, 4):
    crop.grow(i)

If you then, while the code is running, access the intermediate results via

import xyzpy as xyz

c = xyz.Crop(name='fn', batchsize=1)
X = c.reap(allow_incomplete=True)

you get as output a one-element tuple containing a tuple of Datasets.

Moreover, if you remove the kwarg batchsize=1 in the latter crop, you receive the error "can't multiply sequence by non-int of type 'NoneType'".

jcmgray commented on August 22, 2024

Thanks for the example! I think both are fixed by automatically loading on-disk information, if it exists, for any new crop. I'll push an update shortly, once I have a test in place.
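The general pattern described here, falling back to saved settings when a new object is created without them, might look like the sketch below. It is not the actual xyzpy implementation: `CropSettings`, the `settings.json` file name, and the JSON layout are all assumptions made for illustration.

```python
import json
import os
import tempfile


class CropSettings:
    """Hypothetical stand-in for a crop that remembers its batchsize on disk."""

    def __init__(self, folder, batchsize=None):
        self.path = os.path.join(folder, 'settings.json')
        if batchsize is None and os.path.exists(self.path):
            # Nothing passed explicitly: fall back to what was saved earlier.
            with open(self.path) as f:
                batchsize = json.load(f)['batchsize']
        self.batchsize = batchsize

    def save(self):
        """Write the current settings next to the (imagined) sown data."""
        with open(self.path, 'w') as f:
            json.dump({'batchsize': self.batchsize}, f)


folder = tempfile.mkdtemp()
first = CropSettings(folder, batchsize=1)
first.save()                      # analogous to sowing on the first machine
second = CropSettings(folder)     # a fresh object with no batchsize given
```

In this sketch `second.batchsize` is read back from disk, so later arithmetic on it no longer sees `None`.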

jcmgray commented on August 22, 2024

@AdrianSosic, this should be fixed in 1eb2b1d, let me know if it's not working for you.

AdrianSosic commented on August 22, 2024

@jcmgray, great, seems to work perfectly, thanks! If I encounter any other issues, I will let you know!
