microsoft / planetary-computer-containers
Container definitions for the Planetary Computer
License: MIT License
Hi there, I'm just starting to try out the Planetary Computer and it's working pretty well so far! I just bumped into this issue, though, when trying to import opencv (a dependency of one of the packages I'm using).
Specifically, the error is ImportError: libGL.so.1: cannot open shared object file: No such file or directory. Looking at https://stackoverflow.com/questions/55313610/importerror-libgl-so-1-cannot-open-shared-object-file-no-such-file-or-directo, it seems that a system package needs to be installed.
Minimal example to reproduce. In a terminal, run:
mamba install opencv
Then, in a Jupyter notebook, run:
import cv2
This produces:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
Input In [1], in <module>
----> 1 import cv2
File /srv/conda/envs/notebook/lib/python3.8/site-packages/cv2/__init__.py:8, in <module>
5 import importlib
6 import sys
----> 8 from .cv2 import *
9 from .cv2 import _registerMatType
10 from . import mat_wrapper
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
Will update each separately for easier review:
Outstanding issues:
planetary-computer-containers/python/requirements.txt
Lines 1 to 4 in df53aee
As a team that uses the Planetary Computer but relies on TensorFlow for our ML modeling work, it would be great to have a GPU option with TensorFlow (v2.4.0 or later) so we can easily access PC data and develop models in one place. I tried installing TensorFlow into the current GPU option, but it doesn't work because the CUDA development and runtime libraries aren't installed. On a related note, it would be great to have TensorFlow in the Python CPU option as well.
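Whether TensorFlow can use the GPU depends on the CUDA runtime and cuDNN libraries being loadable, not just on the driver. A minimal sketch for checking both from a notebook. It assumes the library stems cudart, cublas and cudnn (the exact set a given TensorFlow release loads varies) and uses tf.config.list_physical_devices, which exists in the 2.4+ releases requested above. Note that ctypes.util.find_library searches the system linker paths, so it may miss libraries that live only inside a conda environment.

```python
import ctypes.util
from typing import Dict, Iterable, List, Optional

# Assumed library stems for the CUDA runtime, cuBLAS and cuDNN; the set a
# particular TensorFlow build needs depends on its version.
DEFAULT_CUDA_LIBS = ("cudart", "cublas", "cudnn")


def find_cuda_libraries(names: Iterable[str] = DEFAULT_CUDA_LIBS) -> Dict[str, bool]:
    """Report whether the linker search path can locate lib<name> for each name."""
    return {name: ctypes.util.find_library(name) is not None for name in names}


def tensorflow_gpu_names() -> Optional[List[str]]:
    """Return the GPU device names TensorFlow sees, or None if it is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]
```

An empty list from tensorflow_gpu_names() together with False entries from find_cuda_libraries() would match the situation described above.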
Thanks!
Hi again, I'm trying to use onnxruntime to run a neural network as a follow-up to #32 (comment). CPU execution works fine, but GPU execution doesn't seem to work for some reason.
Steps to reproduce on the gpu-pytorch container:
pip install onnxruntime-gpu
Then restart the kernel before running the following:
import onnxruntime
print(onnxruntime.__version__)
print(onnxruntime.get_available_providers())
# 1.11.0
# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
So it seems to know that there is a CUDA-capable GPU. But when I try to start an onnxruntime session, it only picks up the CPU. Get a sample .onnx file, e.g. from https://media.githubusercontent.com/media/onnx/models/main/vision/object_detection_segmentation/tiny-yolov2/model/tinyyolov2-7.onnx
ort_session = onnxruntime.InferenceSession(
path_or_bytes="tinyyolov2-7.onnx",
providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
input_name = ort_session.get_inputs()[0].name
print(input_name)
This produces a warning:
2022-04-15 15:09:38.624858540 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:552 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
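When the CUDA provider cannot be created, onnxruntime logs a warning like the one above and keeps going with the remaining providers, so the session quietly runs on CPU. A minimal sketch that makes the fallback explicit by comparing the requested providers with InferenceSession.get_providers(). The helper names are hypothetical; the model path is the sample file from above.

```python
from typing import List, Sequence

REQUESTED = ["CUDAExecutionProvider", "CPUExecutionProvider"]


def session_providers(model_path: str, requested: Sequence[str] = REQUESTED) -> List[str]:
    """Create a session and return the providers onnxruntime actually enabled."""
    import onnxruntime  # imported lazily so the comparison helper works without it

    session = onnxruntime.InferenceSession(model_path, providers=list(requested))
    return session.get_providers()


def dropped_providers(requested: Sequence[str], active: Sequence[str]) -> List[str]:
    """Providers that were requested but left out of the session."""
    return [p for p in requested if p not in active]
```

Calling dropped_providers(REQUESTED, session_providers("tinyyolov2-7.onnx")) would list CUDAExecutionProvider when the fallback described above occurs, which is easier to assert on than a log line.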
Looking at the output of nvidia-smi, though, the CUDA version is 11.0, which should be fine if I understand https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements correctly:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000001:00:00.0 Off | Off |
| N/A 30C P8 11W / 70W | 0MiB / 16127MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
So I'm wondering whether some other library needs to be added to the container to make onnxruntime's GPU execution work. Maybe this is related to microsoft/onnxruntime#11092.
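One caveat when reading the table: the CUDA Version field printed by nvidia-smi is the newest CUDA version the installed driver supports, not the CUDA toolkit or runtime present in the container. The CUDA and cuDNN libraries listed on the onnxruntime requirements page are separate from the driver. A minimal sketch that extracts both header fields so they can be logged alongside the active providers; the regex assumes the header layout shown above, and the function names are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Assumes the nvidia-smi header layout, e.g.
# "| NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0 |"
_HEADER = re.compile(
    r"Driver Version:\s*(?P<driver>[\d.]+)\s+CUDA Version:\s*(?P<cuda>[\d.]+)"
)


def parse_nvidia_smi_header(text: str) -> Optional[Dict[str, str]]:
    """Return the driver version and the driver's maximum supported CUDA version."""
    match = _HEADER.search(text)
    if match is None:
        return None
    return {"driver": match.group("driver"), "max_cuda": match.group("cuda")}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '11.0' into (11, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))
```

Comparing version_tuple(header["max_cuda"]) against the CUDA version a given onnxruntime-gpu release was built for shows whether the driver is new enough; it says nothing about whether the runtime libraries are installed.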
I'd also like to ask whether there's room to add onnxruntime to the gpu-pytorch image. Happy to submit a pull request to add it.
Hi there,
I was wondering whether it's possible to enable NVIDIA GPUDirect Storage on the Microsoft Planetary Computer. This could enable reading Zarr files from cloud storage directly into GPU memory, and we'd be excited to get a demo use case running (xref xarray-contrib/xbatcher#87).
Packages that need to be installed:
References:
We might need to check whether the Azure cluster supports GPUDirect Storage first, but if it does, I can open PRs to add these to the PyTorch and/or TensorFlow containers 😄
I started a clean GPU PyTorch instance and tried to run the tutorial landcover.ipynb, but it failed at:
cluster = LocalCUDACluster(threads_per_worker=4)
2022-06-14 11:31:42,398 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
Unable to start CUDA Context
Traceback (most recent call last):
File "/srv/conda/envs/notebook/lib/python3.8/site-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/srv/conda/envs/notebook/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
func = self.__getitem__(name)
File "/srv/conda/envs/notebook/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/srv/conda/envs/notebook/lib/python3.8/site-packages/dask_cuda/initialize.py", line 41, in _create_cuda_context
ctx = has_cuda_context()
File "/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 120, in has_cuda_context
running_processes = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
File "/srv/conda/envs/notebook/lib/python3.8/site-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
File "/srv/conda/envs/notebook/lib/python3.8/site-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
I was not sure where to report this issue.
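The last frames show the mismatch: this pynvml release calls nvmlDeviceGetComputeRunningProcesses_v2, but the libnvidia-ml.so.1 provided by the host's NVIDIA driver does not export that symbol, which suggests the driver is older than the pynvml release expects. A minimal sketch that checks for the symbol directly with ctypes; the helper name is hypothetical.

```python
import ctypes
from typing import Optional


def library_exports(library: Optional[str], symbol: str) -> Optional[bool]:
    """Return True if `library` loads and exports `symbol`, False if it loads but
    lacks the symbol, or None if the library cannot be loaded at all.

    Passing None loads the symbols already linked into the running process.
    """
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        return None
    # Looking up an unknown symbol on a CDLL raises AttributeError,
    # which hasattr converts to False.
    return hasattr(lib, symbol)
```

library_exports("libnvidia-ml.so.1", "nvmlDeviceGetComputeRunningProcesses_v2") returning False would confirm the condition behind the traceback without starting a LocalCUDACluster.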
https://planetarycomputer.microsoft.com/docs/tutorials/landcover/ relies on stac-vrt, but it's missing from the environments here.
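Before running a tutorial, it can help to check which of its imports are missing from the current image. A minimal sketch using importlib.util.find_spec; it assumes the stac-vrt package installs a top-level module named stac_vrt, and the function name is hypothetical.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

In the environments described here, missing_modules(["stac_vrt"]) would be expected to return ["stac_vrt"].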
arrow 6.0.0 was just released. We should update it for the Python and R images in the next build (conda-forge/r-arrow-feedstock#43).
The R bindings include a nice way to work with duckdb, which is available through conda-forge as r-duckdb.
conda-forge/staged-recipes#13992 was an attempt to package this with conda-forge. It ran into an issue described in mlverse/torch#341 (perhaps something about how mlverse/torch bundles binaries?). It doesn't look straightforward to do via conda / conda-forge.
Perhaps we can get some or all of the binaries from the Linux binaries RStudio provides.
Could the make utility please be added to the standard install so that it is available in the terminal?
Running which make shows that it isn't available, and users do not have permission to install it through apt-get.
Rationale: it's quite common to use make as a dependency manager for data pipelines (example Software Carpentry lesson), and it's normally part of a standard installation.
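The property that makes make useful for data pipelines is its timestamp rule: a target is rebuilt only when it is missing or older than one of its prerequisites. A minimal Python sketch of that rule, for illustration only (make itself also handles dependency chains, pattern rules and much more); the function name is hypothetical.

```python
import os
from typing import Iterable


def needs_rebuild(target: str, prerequisites: Iterable[str]) -> bool:
    """Mimic make's basic check: rebuild if the target is missing or any
    prerequisite was modified more recently than the target."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

Having make available means pipelines can rely on this behavior without each user reimplementing it.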
Hi,
I wasn't sure where to post this, but I noticed that the Python environment on the JupyterHub doesn't yet match the package versions in the conda lock file.
For example, the conda lock file pins stackstac==0.4.0, yet I'm seeing version 0.3.1 of stackstac in the hub environment.
Does it take some time for the container files to be updated?
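Until the hub image is rebuilt, one way to see which packages drift from the lock file is to compare installed versions against the pins. A minimal sketch using importlib.metadata (Python 3.8+). The pins are written by hand here because the lock file's format is not shown on this page, and the function name is hypothetical.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple


def version_mismatches(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Compare pinned versions with what is installed.

    Returns {name: (pinned, installed)} for every package whose installed
    version differs from the pin; installed is None if the package is absent.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

On the hub as described above, version_mismatches({"stackstac": "0.4.0"}) would be expected to report {"stackstac": ("0.4.0", "0.3.1")}.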
Thanks, Ogi
Is there a specific reason why TensorFlow is being pulled from main as opposed to conda-forge? We have TensorFlow 2.7.0 working on conda-forge now, so if you don't object, let's update it.
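To confirm which channel an installed package came from, conda keeps one JSON record per installed package under <prefix>/conda-meta, and those records include a channel field. A minimal sketch that reads them; it assumes the record layout used by recent conda versions, and the function name is hypothetical.

```python
import json
import sys
from pathlib import Path
from typing import Dict


def package_channels(prefix: str = sys.prefix, name: str = "tensorflow") -> Dict[str, str]:
    """Map '<name>-<version>' to the channel recorded in conda-meta for that package."""
    found = {}
    # Each installed package has a <name>-<version>-<build>.json record here;
    # a missing directory (non-conda environment) simply yields no records.
    for record_path in Path(prefix, "conda-meta").glob("*.json"):
        record = json.loads(record_path.read_text())
        if record.get("name") == name:
            key = f"{record['name']}-{record.get('version', '?')}"
            found[key] = record.get("channel", "unknown")
    return found
```

Running package_channels() in the notebook environment shows whether the installed TensorFlow build came from main or from conda-forge.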
Also, in the future, it may be worthwhile to consider using NVIDIA's containers as the base instead of pangeo's. Nothing against pangeo (I am a big fan!), but for GPU-related work I think it's safer to rely on NVIDIA or the package providers (Google/TensorFlow or PyTorch), as they do more rigorous testing.
The issue is that, oftentimes, the volunteers at conda-forge (pangeo builds on conda-forge; I am a contributor at conda-forge) cannot keep up with the demanding builds of PyTorch and TensorFlow. Currently PyTorch is in good shape, but we do not have the full ecosystem; for TensorFlow we have 2.7.0, but we are somewhat stuck, and we also don't have the full ecosystem. Any small tweak (especially with TensorFlow, e.g. trying 2.8.0 through pip) will break everything, since one would need to stitch together the CUDA-related packages. Pulling in a GPU-ready container from NVIDIA or Google will likely be safer, but will almost certainly be larger in terms of storage.
Thanks for the good work!!