Comments (6)
I took a quick look at this by comparing a few cluster spinup times with the existing coiled-runtime=0.0.3 release (i.e. the current default software environment):

%time cluster = coiled.Cluster()

and a modified version of coiled-runtime=0.0.3 that doesn't include libraries not strictly needed on the cluster, which I'm calling coiled-runtime-core. Specifically, using this conda environment file:
name: coiled-runtime-core
channels:
- conda-forge
dependencies:
- python ==3.9
- pip
- coiled
# - nodejs ==17.8.0
# - nb_conda_kernels ==2.3.1
- numpy ==1.21.5
- pandas ==1.3.5
- dask ==2022.1.0
- distributed ==2022.1.0
- fsspec ==2022.3.0
- s3fs ==2022.3.0
- gcsfs ==2022.3.0
- pyarrow ==7.0.0
- python-snappy ==0.6.0
# - jupyterlab ==3.3.2
# - dask-labextension ==5.2.0
- lz4 ==4.0.0
# - ipywidgets ==7.7.0
- numba ==0.55.1
- scikit-learn ==1.0.2
# - python-graphviz ==0.19.1
- click ==8.0.0
- xarray ==0.20.2
- zarr ==2.11.3
and
%time cluster = coiled.Cluster(software="jrbourbeau/coiled-runtime-core")
The corresponding cluster spinup times were:

- coiled-runtime: 2min 12s, 2min 16s, 1min 57s
- coiled-runtime-core: 2min 11s, 2min 2s, 1min 51s
To me these times look identical given the spread in spinup times for a single, specified software environment. Because of this I think we should stick with a single coiled-runtime metapackage, at least for now. Thoughts from others?
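For reference, a quick stdlib-only check of those numbers (values copied from the runs above) shows the difference in means is well within the run-to-run spread:

```python
from statistics import mean

# Spinup times from the three runs above, converted to seconds.
runtime = [2 * 60 + 12, 2 * 60 + 16, 1 * 60 + 57]  # coiled-runtime
core = [2 * 60 + 11, 2 * 60 + 2, 1 * 60 + 51]      # coiled-runtime-core

diff = mean(runtime) - mean(core)
spread = max(runtime) - min(runtime)  # run-to-run spread of a single env

print(f"mean difference: {diff:.1f}s, run-to-run spread: {spread}s")
# → mean difference: 7.0s, run-to-run spread: 19s
```

A ~7 second mean difference against a ~19 second spread over only three runs supports treating the two environments as indistinguishable here.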
from benchmarks.
> Perhaps it could be difficult to ensure exactly same versions of all the non-optional packages

I think this issue is talking about something different. There are some packages already included and pinned in coiled-runtime, like jupyterlab and dask-labextension, that are commonly used alongside Dask but don't need to be installed on the cluster because they are needed purely client-side. This is nice because users don't need to worry about manually installing these packages, but it comes at a cost because these extra packages contribute to overall cluster startup times. This issue is concerned specifically with these sorts of packages.
Sounds good. I'm proposing we add a benchmark which monitors how long it takes a cluster to spin up over in #172. This will help inform future decisions around adding new packages / optional dependencies
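One minimal shape such a spinup benchmark could take (this is a sketch under assumptions, not the actual #172 implementation) is a small timing context manager wrapped around cluster creation:

```python
import time
from contextlib import contextmanager


@contextmanager
def record_duration(results, label):
    """Record the wall-clock duration of the enclosed block in `results`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


# Hypothetical usage in a benchmark (assumes coiled is importable and
# a Coiled account is configured; software environment name is illustrative):
#
# results = {}
# with record_duration(results, "cluster_spinup"):
#     cluster = coiled.Cluster(software="coiled/coiled-runtime")
# cluster.close()
```

Tracking `results` over scheduled runs would surface any spinup regression introduced by a new package or optional dependency.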
Thanks for raising this issue @hayesgb. Totally agree that packages like jupyterlab and matplotlib generally don't need to be installed on workers or schedulers. Have we tried comparing cluster spinup times with and without, for example, jupyterlab? I'm curious about how much this slows cluster spinup. For example, is this a 30 second impact (where removing jupyterlab would be a big win) or a 3 second impact?
Perhaps it could be difficult to ensure exactly the same versions of all the non-optional packages... For some use cases it might also be tricky: if a user has a custom function that (for example) generates and saves a matplotlib image, and it is submitted to a worker that doesn't have matplotlib, then the function will fail.
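One way to guard against that failure mode is to check, before submitting work, whether the needed package is importable on the workers. A stdlib-only sketch (the check_worker_dependency helper is hypothetical; on a live cluster you could execute it on each worker via distributed's Client.run):

```python
import importlib.util


def check_worker_dependency(module_name):
    """Return True if `module_name` is importable in this interpreter.

    Imports inside a submitted function resolve wherever the function
    executes -- on a Dask worker, that means the worker's software
    environment, not the client's. Running this check on each worker
    (e.g. client.run(check_worker_dependency, "matplotlib")) catches a
    missing package before a task fails with ModuleNotFoundError.
    """
    return importlib.util.find_spec(module_name) is not None
```

This only detects presence, not version mismatches, so it complements rather than replaces pinning the same versions on client and cluster.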
@jrbourbeau this sounds very reasonable. Let's stick with what we have for now and re-evaluate in the future if we add more packages.