Comments (4)

rabernat commented on June 10, 2024

This is a tricky issue. One problem in our stack is that we currently outsource nearly all actual parallelism to Dask. (The one exception is fsspec's async capabilities, which are hidden behind a separate thread that runs an async event loop.)

Ideally, there would be a single runtime responsible for actually implementing concurrent data access and I/O. If all the libraries implemented async methods, then that responsibility could be placed entirely with the user, i.e. you could write code like

```python
# Hypothetical: assumes every layer of the stack exposes async methods
async def my_processing_function():
    await xr.open_dataset(...)
    # which would call
    await zarr.open_group(...)
    # which would call
    await object_store.get_object(...)
```

The user would be responsible for starting an event loop and running the coroutine. The event loop would manage the concurrency for the whole stack and everything would be fine.
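To make the single-runtime idea concrete, here is a minimal sketch using Python's `asyncio`. The layers (`get_object`, `open_group`, `open_dataset`) are hypothetical toy coroutines, not the real xarray, Zarr, or object-store functions. Each layer awaits the one below it, and the one event loop started by `asyncio.run` schedules all of the simulated I/O, so the fetches overlap instead of running back to back.

```python
import asyncio
import time


# Hypothetical stand-ins for each layer of the stack; these are NOT the real
# xarray, Zarr, or object-store APIs, just toy coroutines with the same shape.
async def get_object(key: str) -> bytes:
    # Simulated network I/O: awaiting hands control back to the single event loop
    await asyncio.sleep(0.1)
    return key.encode()


async def open_group(path: str) -> dict[str, bytes]:
    # Fetch two hypothetical metadata objects for this group concurrently
    keys = [f"{path}/meta", f"{path}/attrs"]
    blobs = await asyncio.gather(*(get_object(k) for k in keys))
    return dict(zip(keys, blobs))


async def open_dataset(path: str) -> dict[str, bytes]:
    # The top layer simply awaits the layer below it
    return await open_group(path)


async def my_processing_function(paths: list[str]) -> list[dict[str, bytes]]:
    # Every fetch from every layer is scheduled on the same event loop
    return list(await asyncio.gather(*(open_dataset(p) for p in paths)))


# The user starts the one event loop and runs the top-level coroutine
start = time.perf_counter()
results = asyncio.run(my_processing_function(["a.zarr", "b.zarr", "c.zarr"]))
elapsed = time.perf_counter() - start  # close to 0.1 s rather than 0.6 s: the six fetches overlap
```

No layer creates threads or its own loop; concurrency is decided entirely by whoever runs the top-level coroutine.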

In Zarr we are in the process of adding async methods. That raises the question: should Xarray add them too?

If not, then Xarray has to decide how to call async code. It could use the fsspec approach of managing an async event loop on another thread, or it could manage a thread pool of its own. How would either of these interact with Dask, fsspec, Zarr, etc.? The futures approach proposed here is one example of how to add concurrency within Xarray.

I feel like this conundrum really illustrates the limitations of the modularity we value so much in our stack. I have no idea what the "right" answer is. However, my perspective has been strongly influenced by writing Rust code with Tokio, which does not suffer from this delegation problem. It's a very different situation from Python.

rabernat commented on June 10, 2024

Would that be compatible with async stores?
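One standard-library building block relevant to this question: `asyncio.wrap_future` turns a `concurrent.futures.Future` into something an event loop can await, so executor-based loads and reads from an async store can at least be scheduled side by side. A minimal sketch, where `blocking_load` and `native_async_read` are hypothetical stand-ins; note that it still relies on two mechanisms (a thread pool plus an event loop) rather than a single runtime.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_load(name: str) -> str:
    # Hypothetical blocking read, standing in for something like var.compute()
    time.sleep(0.05)
    return f"{name}-loaded"


async def native_async_read(name: str) -> str:
    # Hypothetical read from an async store
    await asyncio.sleep(0.05)
    return f"{name}-async"


async def load_all(names: list[str], executor: ThreadPoolExecutor) -> list[str]:
    # wrap_future makes each concurrent.futures.Future awaitable on the running loop
    threaded = [asyncio.wrap_future(executor.submit(blocking_load, n)) for n in names]
    native = [native_async_read(n) for n in names]
    # Thread-pool work and native async I/O are awaited together
    return list(await asyncio.gather(*threaded, *native))


with ThreadPoolExecutor(max_workers=4) as pool:
    loaded = asyncio.run(load_all(["temp", "salt"], pool))
```

`asyncio.gather` returns results in the order the awaitables were passed, regardless of which finished first.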

TomNicholas commented on June 10, 2024

This idea of passing an arbitrary concurrent executor to xarray seems potentially related to #7810, which suggests allowing open_mfdataset(parallel=True) to use something other than dask.delayed to parallelize the opening of the files.
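A minimal sketch of the executor-agnostic idea, assuming a hypothetical `open_all` helper (not an xarray API) that accepts any `concurrent.futures.Executor` and maps an opener over the paths. `fake_open` is a stand-in for something like `xr.open_dataset`.

```python
from collections.abc import Callable, Iterable
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import TypeVar

T = TypeVar("T")


def open_all(paths: Iterable[str], opener: Callable[[str], T], executor: Executor) -> list[T]:
    """Hypothetical helper: open every path with ``opener`` on the caller's executor."""
    # Executor.map is part of the abstract Executor interface, so any conforming
    # executor can be swapped in; results come back in the order of ``paths``
    return list(executor.map(opener, paths))


def fake_open(path: str) -> dict[str, str]:
    # Stand-in for something like xr.open_dataset(path)
    return {"source": path}


with ThreadPoolExecutor() as pool:
    datasets = open_all(["a.nc", "b.nc", "c.nc"], fake_open, pool)
```

In real use the opened datasets would then be combined, e.g. with `xr.combine_by_coords`. Swapping in a `ProcessPoolExecutor` should also work as long as `opener` and its return values are picklable.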

dcherian commented on June 10, 2024

FWIW this appears to do what I wanted with Zarr at least, i.e. issue concurrent loads per variable.

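```python
from collections.abc import Hashable
from concurrent.futures import ThreadPoolExecutor, as_completed

import numpy as np
import xarray as xr


def concurrent_compute(ds: xr.Dataset) -> xr.Dataset:
    """Load each variable of ``ds`` into memory, issuing the loads concurrently."""
    # Shallow copy: new Variable objects, so assigning .data below leaves ``ds`` as-is
    result = ds.copy()

    def load_variable_data(name: Hashable, var: xr.Variable) -> tuple[Hashable, np.ndarray]:
        # compute() performs the actual read (e.g. from a Zarr store) for one variable
        return name, var.compute().data

    with ThreadPoolExecutor(max_workers=None) as executor:
        futures = [
            executor.submit(load_variable_data, k, v)
            for k, v in result.variables.items()
            # Dimension coordinates are already in memory, and recent xarray
            # versions refuse assignment to IndexVariable.data
            if not isinstance(v, xr.IndexVariable)
        ]
        # Write each array back as soon as its load finishes
        for future in as_completed(futures):
            name, loaded = future.result()
            result.variables[name].data = loaded
    return result


loaded = concurrent_compute(ds)  # ``ds``: an existing, lazily opened Dataset
```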
