microsoft / planetary-computer-containers


51.0 5.0 11.0 782 KB

Container definitions for the Planetary Computer

License: MIT License

Dockerfile 13.91% Python 22.05% Shell 40.23% Makefile 23.81%


planetary-computer-containers's Issues

ImportError: libGL.so.1: cannot open shared object file

Hi there, just starting to try out Planetary Computer and it's working pretty well so far! I just bumped into this issue, though, when trying to import opencv (which is a dependency of one of the packages I'm using).

Specifically, the error is ImportError: libGL.so.1: cannot open shared object file: No such file or directory. Looking at https://stackoverflow.com/questions/55313610/importerror-libgl-so-1-cannot-open-shared-object-file-no-such-file-or-directo, it seems like there's some package that needs to be installed.

Minimal working example to reproduce. In a terminal, run:

mamba install opencv

then in a Jupyter notebook, try:

import cv2

produces

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Input In [1], in <module>
----> 1 import cv2

File /srv/conda/envs/notebook/lib/python3.8/site-packages/cv2/__init__.py:8, in <module>
      5 import importlib
      6 import sys
----> 8 from .cv2 import *
      9 from .cv2 import _registerMatType
     10 from . import mat_wrapper

ImportError: libGL.so.1: cannot open shared object file: No such file or directory
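
One way to narrow this down without importing cv2 is to ask the dynamic loader directly whether the library can be opened. The sketch below is not part of the original report; `can_load_shared_library` is a hypothetical helper built on the standard-library `ctypes.CDLL` call, which raises `OSError` when a shared object cannot be found or loaded.

```python
import ctypes


def can_load_shared_library(name):
    """Return True if the dynamic loader can open `name`, else False."""
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # libGL.so.1 is the library named in the ImportError above.
    print("libGL.so.1 loadable:", can_load_shared_library("libGL.so.1"))
```

If this prints False, the library is missing at the system level rather than from the conda environment. On Debian or Ubuntu based images a system package such as libgl1 usually provides it, and the headless OpenCV build (opencv-python-headless on PyPI) is meant to avoid the dependency; which of the two suits this container is an assumption to verify.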

updating environments

Will update each separately for easier review:

pytorch #48
python #49
tensorflow #45

Outstanding issues:

We can probably remove these and add them to the conda env. Instead of doing this now, I will wait until these essential updates go through and I can submit another PR later for cleanup/tidying.

planetary-computer-containers/python/requirements.txt

Lines 1 to 4 in df53aee

    
odc-algo>=0.2.0a3
odc-stac>=0.2.0a6
azure-data-tables
stac-geoparquet

GPU option with tensorflow

As a team that is using Planetary Computer but relies on Tensorflow for our ML modeling work, it would be great to have a GPU option with Tensorflow (v 2.4.0 or above) so we can easily access PC data and develop models in one place. I tried installing tensorflow into the current GPU option but it doesn't end up working because the CUDA development and runtime libraries aren't installed. On a related note, it would be great to have Tensorflow in the python CPU option as well.

Thanks!
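
To see which CUDA libraries the loader can find in a given image, something like the following may help. It is only a sketch: `locate_libraries` is a hypothetical helper around the standard-library `ctypes.util.find_library`, and the library names (cudart, cublas, cudnn) are assumptions about what a GPU build of TensorFlow typically needs, not a list taken from this repository.

```python
import ctypes.util


def locate_libraries(short_names):
    """Map each short library name to what the loader search finds, or None."""
    return {name: ctypes.util.find_library(name) for name in short_names}


if __name__ == "__main__":
    # Assumed names: CUDA runtime, cuBLAS and cuDNN.
    for name, found in locate_libraries(["cudart", "cublas", "cudnn"]).items():
        print(f"{name}: {found or 'not found'}")
```

A "not found" result only means the system search did not locate the library; copies shipped inside a conda environment may be resolved through other paths.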

Getting onnxruntime to work with CUDAExecutionProvider on gpu-pytorch container

Hi again, just trying to use onnxruntime to run a neural network as a follow up from #32 (comment). The CPU execution works fine, but it seems that the GPU execution isn't working for some reason.

Steps to reproduce on the gpu-pytorch container.

pip install onnxruntime-gpu

then restart the kernel before running the below

import onnxruntime

print(onnxruntime.__version__)
print(onnxruntime.get_available_providers())
# 1.11.0
# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']

so it seems to know that there is a CUDA-capable GPU. But when I try to get an onnxruntime session going, it only picks up the CPU. Get a sample .onnx file, e.g. from https://media.githubusercontent.com/media/onnx/models/main/vision/object_detection_segmentation/tiny-yolov2/model/tinyyolov2-7.onnx

ort_session = onnxruntime.InferenceSession(
    path_or_bytes="tinyyolov2-7.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
input_name = ort_session.get_inputs()[0].name
print(input_name)

produces a warning:

2022-04-15 15:09:38.624858540 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:552 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

Looking at the output of nvidia-smi though, the CUDA version is 11.0 which should be ok if I understand https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements correctly:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000001:00:00.0 Off |                  Off |
| N/A   30C    P8    11W /  70W |      0MiB / 16127MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

So I'm wondering if there's some other library that needs to be added to the container to make onnxruntime's GPU execution work. Maybe related to microsoft/onnxruntime#11092

Another thing I'd like to ask is whether there's room to get onnxruntime into the gpu-pytorch image. Happy to submit a pull request to add it in.
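
A note on the check above: the "CUDA Version" field in nvidia-smi reports the highest CUDA version the driver supports, not which CUDA or cuDNN runtime libraries are installed in the container, so it does not rule out a missing library. To confirm which providers a session actually ended up with, a small sketch along these lines can be used. `dropped_providers` and `check_session_providers` are hypothetical helpers; `InferenceSession.get_providers()` is the onnxruntime call that lists the providers in use.

```python
def dropped_providers(requested, active):
    """Return the requested providers that the session did not activate."""
    return [name for name in requested if name not in active]


def check_session_providers(model_path, requested):
    """Create a session and report providers that were silently dropped."""
    import onnxruntime  # imported lazily so the helper above has no dependencies

    session = onnxruntime.InferenceSession(model_path, providers=requested)
    # get_providers() lists the providers the session is actually using.
    return dropped_providers(requested, session.get_providers())
```

Given the warning shown earlier, calling `check_session_providers("tinyyolov2-7.onnx", ["CUDAExecutionProvider", "CPUExecutionProvider"])` on the affected container would be expected to report the CUDA provider as dropped.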

NVIDIA GPU direct storage

Hi there,

I was wondering whether it's possible to enable NVIDIA GPU Direct Storage on Microsoft Planetary Computer. This could enable reading Zarr files directly into GPU memory from cloud storage, and we'd be excited to have a demo use-case running (xref xarray-contrib/xbatcher#87).

Packages that need to be installed:

nvidia-gds (via apt)
kvikIO (via conda)

References:

Might need to check if the Azure cluster supports GPU direct storage first, but if it does, I can open up PRs to add these into the Pytorch and/or Tensorflow containers 😄

Unable to start CUDA Context

I started a clean GPU PyTorch instance and tried to run the tutorial landcover.ipynb, but it failed at:

cluster = LocalCUDACluster(threads_per_worker=4)

2022-06-14 11:31:42,398 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
Unable to start CUDA Context
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
    _nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
  File "/srv/conda/envs/notebook/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
    func = self.__getitem__(name)
  File "/srv/conda/envs/notebook/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/dask_cuda/initialize.py", line 41, in _create_cuda_context
    ctx = has_cuda_context()
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 120, in has_cuda_context
    running_processes = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
    raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found

I was not sure where to report this issue.
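
The first traceback shows pynvml asking for a symbol that the installed libnvidia-ml.so.1 does not export, which likely means the driver library is older than the Python bindings expect. A quick way to check this is sketched below; `library_has_symbol` is a hypothetical helper using only the standard-library `ctypes` module, and the library path is the one printed in the traceback.

```python
import ctypes


def library_has_symbol(library_path, symbol_name):
    """Return True if the library loads and exports `symbol_name`."""
    try:
        library = ctypes.CDLL(library_path)
    except OSError:
        return False  # the library itself could not be loaded
    try:
        getattr(library, symbol_name)
    except AttributeError:
        return False  # loaded, but the symbol is not exported
    return True


if __name__ == "__main__":
    nvml_path = "/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1"  # path from the traceback
    for symbol in ("nvmlDeviceGetComputeRunningProcesses",
                   "nvmlDeviceGetComputeRunningProcesses_v2"):
        print(symbol, library_has_symbol(nvml_path, symbol))
```

If the unversioned function is present but the `_v2` variant is not, the mismatch is between the driver and the installed pynvml or distributed versions rather than a missing GPU.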

Add stac-vrt

https://planetarycomputer.microsoft.com/docs/tutorials/landcover/ relies on stac-vrt, but it's missing from the environments here.

Add voilà

https://github.com/voila-dashboards/voila

Add arrow 6.0, r-duckdb

arrow 6.0.0 was just released. We should update it for the python and R images in the next build (conda-forge/r-arrow-feedstock#43).

The R bindings include a nice way to work with duckdb, which is available through conda-forge as r-duckdb.

Add torch backend for sits [R]

conda-forge/staged-recipes#13992 was an attempt to package with conda-forge. Ran into an issue described in mlverse/torch#341 (perhaps something about how mlverse/torch is bundling binaries?). Doesn't look straightforward to do via conda / conda-forge.

Perhaps we can get some / all of the binaries from the Linux binaries RStudio provides.

`make` build tool not available

Please could the make utility be added to the standard install so that it is available in the terminal?
which make shows that it isn't available, and users do not have permission to install it through apt-get.

Rationale: it's quite common to use make as a dependency manager for data pipelines (for example, in a Software Carpentry lesson), and it's normally part of a standard installation.
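
The same check can be scripted from a notebook, which is handy for verifying an image after a rebuild. This is a minimal sketch using the standard-library `shutil.which`; `missing_tools` is a hypothetical helper, and every tool name other than make is illustrative.

```python
import shutil


def missing_tools(tool_names):
    """Return the tools from `tool_names` that are not on PATH."""
    return [name for name in tool_names if shutil.which(name) is None]


if __name__ == "__main__":
    # "make" is the tool requested above; the others are illustrative.
    print("Missing:", missing_tools(["make", "git", "curl"]))
```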

Curious when the Jupyter Hub python container will match the image definitions

Hi,

Wasn't sure where to post this, but I noticed that the python environment on the Jupyter Hub doesn't yet match the package versions in the conda lock file.

For example, the conda lock file set-up has stackstac==0.4.0, yet I'm seeing version 0.3.1 of stackstac on the hub environment.

Does it take some time for the container files to be updated?

Thanks, Ogi
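
A comparison like the one described above can be automated from inside the hub environment. The sketch below uses the standard-library `importlib.metadata` module; `version_mismatches` is a hypothetical helper, and the pins passed to it would have to be copied from the lock file by hand or by a parser not shown here.

```python
from importlib import metadata


def version_mismatches(expected_versions):
    """Return {package: (expected, installed)} for packages that differ.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, expected in expected_versions.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # Pin taken from the report above; extend with other lock-file entries.
    print(version_mismatches({"stackstac": "0.4.0"}))
```

An empty dictionary means every listed package matches its pin.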

update tensorflow

Is there a specific reason why TensorFlow is being pulled from main as opposed to conda-forge? We have TensorFlow 2.7.0 functioning in conda-forge now, so if you don't object, let's update it.

planetary-computer-containers/gpu-tensorflow/environment.yml

Line 108 in b5e229f

- pkgs/main/linux-64::tensorflow-gpu>=2

Also, in the future, it may be worthwhile to consider pulling in NVIDIA's containers as base instead of pangeo --- nothing against pangeo (I am a big fan!) but for GPU-related activities, I think it's safer to rely on NVIDIA or the package providers (Google/TensorFlow or PyTorch) as they do more rigorous testing.

The issue is that, oftentimes, the set of volunteers at conda-forge (pangeo builds on conda-forge; I am a contributor at conda-forge) cannot keep up with the demanding builds of PyTorch and TensorFlow (currently, PyTorch is good, but we do not have the full ecosystem; for TensorFlow, we have 2.7.0, but we are sort of stuck --- we also don't have the full ecosystem). Any little tweak (especially with TensorFlow, e.g. trying 2.8.0 through pip) will break everything as one would need to stitch together the cuda-related packages. Pulling in a GPU-ready container through NVIDIA or Google will likely be safer, but will almost definitely be larger in terms of storage.

Thanks for the good work!!

