
planetary-computer-containers's Issues

ImportError: libGL.so.1: cannot open shared object file

Hi there, just starting to try out Planetary Computer and it's working pretty well so far! I just bumped into this issue, though, when trying to import OpenCV (which is a dependency of one of the packages I'm using).

Specifically, the error is ImportError: libGL.so.1: cannot open shared object file: No such file or directory. Looking at https://stackoverflow.com/questions/55313610/importerror-libgl-so-1-cannot-open-shared-object-file-no-such-file-or-directo, it seems a system package providing libGL needs to be installed.

Minimal working example to reproduce. In a terminal:

mamba install opencv

then in a Jupyter notebook, try:

import cv2

produces

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Input In [1], in <module>
----> 1 import cv2

File /srv/conda/envs/notebook/lib/python3.8/site-packages/cv2/__init__.py:8, in <module>
      5 import importlib
      6 import sys
----> 8 from .cv2 import *
      9 from .cv2 import _registerMatType
     10 from . import mat_wrapper

ImportError: libGL.so.1: cannot open shared object file: No such file or directory
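
For reference, here's a minimal probe from Python (a sketch using ctypes; the usual fixes are installing a system libGL package in the image, or switching to a headless OpenCV build that doesn't link against it):

import ctypes

# Probe for the OpenGL runtime that cv2's compiled module links against.
try:
    ctypes.CDLL("libGL.so.1")
    print("libGL.so.1 is present")
except OSError:
    # Matches the ImportError above: the shared object is missing from the image.
    print("libGL.so.1 is missing")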

updating environments

Will update each separately for easier review:

Outstanding issues:

  • We can probably remove these and add them to the conda env. Instead of doing this now, I will wait until these essential updates go through, and I can submit another PR later for cleanup and tidying (a quick version-check sketch follows the list below).
    odc-algo>=0.2.0a3
    odc-stac>=0.2.0a6
    azure-data-tables
    stac-geoparquet
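
As a sanity check once the env update lands, something like this (a sketch; package names as listed above) could confirm the pins actually resolve in the built image:

from importlib.metadata import PackageNotFoundError, version

# Check that the packages slated for the conda env are present in the image.
for pkg in ["odc-algo", "odc-stac", "azure-data-tables", "stac-geoparquet"]:
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")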

GPU option with tensorflow

As a team that is using Planetary Computer but relies on TensorFlow for our ML modeling work, it would be great to have a GPU option with TensorFlow (v2.4.0 or above) so we can easily access PC data and develop models in one place. I tried installing TensorFlow into the current GPU option, but it doesn't end up working because the CUDA development and runtime libraries aren't installed. On a related note, it would be great to have TensorFlow in the Python CPU option as well.
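
A quick way to see the failure (a sketch, assuming TensorFlow has been pip-installed into the GPU image):

import tensorflow as tf

# With the CUDA runtime libraries missing from the image, no GPU devices are
# registered and this prints an empty list.
print(tf.config.list_physical_devices("GPU"))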

Thanks!

Getting onnxruntime to work with CUDAExecutionProvider on gpu-pytorch container

Hi again, just trying to use onnxruntime to run a neural network as a follow-up from #32 (comment). CPU execution works fine, but it seems that GPU execution isn't working for some reason.

Steps to reproduce on the gpu-pytorch container:

pip install onnxruntime-gpu

then restart the kernel before running the following:

import onnxruntime

print(onnxruntime.__version__)
print(onnxruntime.get_available_providers())
# 1.11.0
# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']

So it seems to know that there is a CUDA-capable GPU. But when I try to get an onnxruntime session going, it only picks up the CPU. Get a sample .onnx file, e.g. from https://media.githubusercontent.com/media/onnx/models/main/vision/object_detection_segmentation/tiny-yolov2/model/tinyyolov2-7.onnx

ort_session = onnxruntime.InferenceSession(
    path_or_bytes="tinyyolov2-7.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
input_name = ort_session.get_inputs()[0].name
print(input_name)

produces a warning:

2022-04-15 15:09:38.624858540 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:552 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
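
The fallback can be confirmed on the session itself via get_providers():

# The session silently falls back to CPU when the CUDA provider fails to load.
print(ort_session.get_providers())
# ['CPUExecutionProvider']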

Looking at the output of nvidia-smi though, the CUDA version is 11.0, which should be OK if I understand https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements correctly:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000001:00:00.0 Off |                  Off |
| N/A   30C    P8    11W /  70W |      0MiB / 16127MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

So I'm wondering if there's some other library that needs to be added to the container to make onnxruntime's GPU execution work. Maybe related to microsoft/onnxruntime#11092
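
For diagnosis, here's a rough probe of the CUDA libraries onnxruntime-gpu loads at session creation (a sketch; the exact sonames are my assumption based on the requirements page linked above):

import ctypes

# Any of these missing would explain the CUDAExecutionProvider failure.
for lib in ("libcudart.so.11.0", "libcublas.so.11", "libcudnn.so.8"):
    try:
        ctypes.CDLL(lib)
        print(lib, "found")
    except OSError:
        print(lib, "missing")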

Another thing I'd like to ask: is there room to get onnxruntime into the gpu-pytorch image? Happy to submit a pull request to add it.

NVIDIA GPU direct storage

Hi there,

I was wondering whether it's possible to enable NVIDIA GPU Direct Storage on the Microsoft Planetary Computer. This could enable reading Zarr files directly into GPU memory from cloud storage, and we'd be excited to have a demo use case running (xref xarray-contrib/xbatcher#87).

Packages that need to be installed:

References:

We might need to check whether the Azure cluster supports GPU Direct Storage first, but if it does, I can open PRs to add these into the PyTorch and/or TensorFlow containers 😄
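
The usage we have in mind looks roughly like this (a hypothetical sketch; it assumes kvikio's Zarr integration is available, and the store path is made up):

import zarr
import kvikio.zarr

# Hypothetical: read Zarr chunks straight into GPU memory via GPU Direct Storage.
store = kvikio.zarr.GDSStore("/path/to/data.zarr")  # made-up path
arr = zarr.open(store, mode="r")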

Unable to start CUDA Context

I started a clean GPU PyTorch instance, where I tried to run the tutorial landcover.ipynb, but it failed at:

cluster = LocalCUDACluster(threads_per_worker=4)
2022-06-14 11:31:42,398 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
Unable to start CUDA Context
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
    _nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
  File "/srv/conda/envs/notebook/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
    func = self.__getitem__(name)
  File "/srv/conda/envs/notebook/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/dask_cuda/initialize.py", line 41, in _create_cuda_context
    ctx = has_cuda_context()
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 120, in has_cuda_context
    running_processes = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
    raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found

I was not sure where to report this issue.
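
A quick driver check narrows this down (a sketch using pynvml, which the traceback shows is already installed; the *_v2 NVML entry points were added in drivers newer than the one the symbol error suggests):

import pynvml

# An older NVIDIA driver on the node would explain the undefined-symbol error above.
pynvml.nvmlInit()
print(pynvml.nvmlSystemGetDriverVersion())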

`make` build tool not available

Please could the make utility be added to the standard install so that it is available in the terminal?
which make shows that it isn't available, and users do not have permission to install it through apt-get.

Rationale: it's quite common to use make as a dependency manager for data pipelines (see, for example, the Software Carpentry lesson), and it's normally part of a standard installation.
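
For completeness, the same check from Python (a minimal sketch):

import shutil

# Mirrors `which make`; prints None because make is not on the image.
print(shutil.which("make"))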

Curious when the JupyterHub Python container will match the image definitions

Hi,

Wasn't sure where to post this, but I noticed that the Python environment on the JupyterHub doesn't yet match the package versions in the conda lock file.

For example, the conda lock file pins stackstac==0.4.0, yet I'm seeing version 0.3.1 of stackstac in the hub environment.
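
That is, this is what I see (a minimal check, assuming stackstac exposes __version__):

import stackstac

# Prints 0.3.1 on the hub, while the lock file pins 0.4.0.
print(stackstac.__version__)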

Does it take some time for the container images to be updated?

Thanks, Ogi

update tensorflow

Is there a specific reason why TensorFlow is being pulled from main as opposed to conda-forge? We have TensorFlow 2.7.0 functioning in conda-forge now, so if you don't object, let's update it.

- pkgs/main/linux-64::tensorflow-gpu>=2
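
If the env file keeps the same channel::package syntax, the swap would look roughly like this (a sketch; I'm assuming conda-forge publishes a tensorflow-gpu metapackage alongside tensorflow):

- conda-forge::tensorflow-gpu>=2.7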

Also, in the future, it may be worthwhile to consider pulling in NVIDIA's containers as the base instead of Pangeo's. Nothing against Pangeo (I am a big fan!), but for GPU-related activities I think it's safer to rely on NVIDIA or the package providers (Google/TensorFlow or PyTorch), as they do more rigorous testing.

The issue is that, oftentimes, the volunteers at conda-forge (Pangeo builds on conda-forge; I am a conda-forge contributor) cannot keep up with the demanding builds of PyTorch and TensorFlow. Currently PyTorch is in good shape, though we do not have the full ecosystem; for TensorFlow we have 2.7.0, but we are somewhat stuck, and we don't have the full ecosystem there either. Any little tweak (especially with TensorFlow, e.g. trying 2.8.0 through pip) will break everything, as one would need to stitch the CUDA-related packages together by hand. Pulling in a GPU-ready container from NVIDIA or Google will likely be safer, but will almost certainly be larger in terms of storage.

Thanks for the good work!!
