
ngff-zarr's Introduction

ngff-zarr



A lean and kind Open Microscopy Environment (OME) Next Generation File Format (NGFF) Zarr implementation.

✨ Features

  • Minimal dependencies
  • Work with arbitrary Zarr store types
  • Lazy, parallel, and web ready -- no local filesystem required
  • Process extremely large datasets
  • Multiple downscaling methods
  • Supports Python>=3.8
  • Implements version 0.4 of the OME-Zarr NGFF specification

Documentation

More information on command line usage, the Python API, library features, and how to contribute can be found in our documentation.

License

ngff-zarr is distributed under the terms of the MIT license.

ngff-zarr's People

Contributors

tbirdso, thewtex


ngff-zarr's Issues

Dask task graphs are huge (non-performant) for large data

I think this will be a tougher issue to solve than typos, math bugs, and formatting; so it will warrant some conversation with @thewtex and @tbirdso.

I have ngff-zarr running well on small in-memory sized datasets (e.g. 512x512x512 voxels). When formatted with the nested directory style dimension separator these files open just fine with the napari-ome-zarr plugin (just my first check for compatibility with other tools - I'll use other visualization software later as well). So my next task was to run on real data.

My real image is {'z': 1596, 'y': 8983, 'x': 5451} voxels and I am attempting the following:

ngff_image = to_ngff_image(
    my_big_zarr_array,
    dims=('z', 'y', 'x'),
    scale={a:b for a, b in zip('zyx', my_voxel_spacing)},
    axes_units={a:'micrometer' for a in 'zyx'},
)

multiscales = to_multiscales(
    ngff_image,
    scale_factors=[{'z':4, 'y':8, 'x':8}],
    chunks=(64, 128, 128),
)

to_ngff_zarr(
    './my_data_as_ome_ngff_zarr.zarr',
    multiscales,
)

Importantly, this does actually work: if I wait long enough I get the file I want. However, there is a performance issue that IMO will be lethal for the package if not resolved. For this (4, 8, 8) downsample level, the task graph has about 2.5M tasks. The dask distributed scheduler really can't handle graphs of this size well; it emits a warning that the compressed task graph itself is about 220MB, which it knows is huge, and says it will cause long slowdowns.

It takes about 45 minutes for the scheduler to finish parsing all those tasks (just about right with the 2.5M tasks and the documented 0.001 seconds per task), and during this time the workers are allocated but idle. And this has to happen for every scale level. In a real scenario I might have 10 or more scale levels I want to create. So that's a huge amount of time to be paying for cpus that are just waiting for the scheduler to finish parsing the task graph. Making matters worse, this is actually only a small to medium size image. We could be running this on things that are much much bigger.
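For context, a quick back-of-envelope calculation (my numbers, not profiled) of the output chunk count for this image and the chunk shape from the snippet above:

```python
import math

# Chunk count for the image and output chunks described above.
shape = {'z': 1596, 'y': 8983, 'x': 5451}
chunks = {'z': 64, 'y': 128, 'x': 128}

n_chunks = 1
for dim in shape:
    n_chunks *= math.ceil(shape[dim] / chunks[dim])

print(n_chunks)  # 76325 output chunks for a single scale level
```

With read, overlap/smoothing, downsampling, rechunking, and write tasks all touching each chunk, a few tens of tasks per chunk plausibly lands in the millions.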

I feel that there are three different options for approaching this problem:
(1) small hacks, maybe something will help
(2) optimize the task graph by reworking the series of dask array operations to simplify simplify simplify
(3) abandon dask in as many places as possible

I tried three different things for (1):
(a) ran dask.array.optimize on the big graph (only eliminated about 10K tasks and at the expense of running this function which was long)
(b) forced even in-memory size scale levels to run through the serialization stuff (resulted in 6 task graphs each about 500K tasks, so the total time when breaking the problem up was even longer)
(c) forced every scale to run through this "Minimize task graph depth" block (just resulted in the same graph as before)

So now I want to decide between options (2) and (3). I'd love for there to be a solution with (2). It would be great to stay consistent with the sort of intended dask workflow and also to not have to refactor your package too much. But while building my package bigstream, and running into the same kind of huge task graph problems, I ultimately settled on (3) and everything got really performant really fast. After all, we're just doing a bunch of smoothing and downsampling here. It's not rocket science. Having all of that be consistent with dask.array and coordinated (or micromanaged) by a scheduler is nice, but not if it turns the program into a bureaucracy.
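To illustrate what (3) could look like, here is a minimal plain-numpy block-mean downsampling sketch (hypothetical; not bigstream's actual code, and not what ngff-zarr currently does):

```python
import numpy as np

def block_mean_downsample(arr, factors):
    """Downsample by integer factors via block averaging, with no task
    graph at all. Trims trailing samples that don't divide evenly."""
    trimmed = arr[tuple(slice(0, (s // f) * f) for s, f in zip(arr.shape, factors))]
    # Reshape each axis into (n_blocks, block_size) pairs, then average
    # over the block-size axes.
    new_shape = []
    for s, f in zip(trimmed.shape, factors):
        new_shape.extend([s // f, f])
    return trimmed.reshape(new_shape).mean(axis=tuple(range(1, 2 * arr.ndim, 2)))

vol = np.arange(64, dtype=np.float32).reshape(4, 4, 4)
small = block_mean_downsample(vol, (2, 2, 2))
print(small.shape)  # (2, 2, 2)
```

Each output chunk only needs the corresponding input blocks, so the bookkeeping stays trivial compared with a scheduler-managed graph.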

So, sorry for the long message, but I'd love to discuss this with you guys to determine where to go next.

IPython shell command failure

Overview

Attempting to run ngff-zarr from an IPython cell in Jupyter Notebook yields an error.

Steps to Reproduce

In a Jupyter Notebook cell:

!ngff-zarr -i "path/to/image.nii.gz" -o "path/to/image.zarr"

Expected behavior

CLI runs and OME-Zarr image is generated successfully

Observed behavior

Traceback (most recent call last):
  File "C:\venvs\venv-itk\lib\site-packages\rich\live.py", line 122, in start
    self.refresh()
  File "C:\venvs\venv-itk\lib\site-packages\rich\live.py", line 241, in refresh
    with self.console:
  File "C:\venvs\venv-itk\lib\site-packages\rich\console.py", line 864, in __exit__
    self._exit_buffer()
  File "C:\venvs\venv-itk\lib\site-packages\rich\console.py", line 822, in _exit_buffer
    self._check_buffer()
  File "C:\venvs\venv-itk\lib\site-packages\rich\console.py", line 2027, in _check_buffer
    legacy_windows_render(buffer, LegacyWindowsTerm(self.file))
  File "C:\venvs\venv-itk\lib\site-packages\rich\_windows_renderer.py", line 17, in legacy_windows_render
    term.write_styled(text, style)
  File "C:\venvs\venv-itk\lib\site-packages\rich\_win32_console.py", line 442, in write_styled
    self.write_text(text)
  File "C:\venvs\venv-itk\lib\site-packages\rich\_win32_console.py", line 403, in write_text
    self.write(text)
  File "C:\Users\tom.birdsong\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-2: character maps to <undefined>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\tom.birdsong\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\tom.birdsong\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\venvs\venv-itk\Scripts\ngff-zarr.exe\__main__.py", line 7, in <module>
  File "C:\venvs\venv-itk\lib\site-packages\ngff_zarr\cli.py", line 192, in main
    with Live(initial, console=console) as live:
  File "C:\venvs\venv-itk\lib\site-packages\rich\live.py", line 166, in __enter__
    self.start(refresh=self._renderable is not None)
  File "C:\venvs\venv-itk\lib\site-packages\rich\live.py", line 128, in start
    self.stop()
  File "C:\venvs\venv-itk\lib\site-packages\rich\live.py", line 147, in stop
    with self.console:
  File "C:\venvs\venv-itk\lib\site-packages\rich\console.py", line 864, in __exit__
    self._exit_buffer()
  File "C:\venvs\venv-itk\lib\site-packages\rich\console.py", line 822, in _exit_buffer
    self._check_buffer()
  File "C:\venvs\venv-itk\lib\site-packages\rich\console.py", line 2027, in _check_buffer
    legacy_windows_render(buffer, LegacyWindowsTerm(self.file))
  File "C:\venvs\venv-itk\lib\site-packages\rich\_windows_renderer.py", line 17, in legacy_windows_render
    term.write_styled(text, style)
  File "C:\venvs\venv-itk\lib\site-packages\rich\_win32_console.py", line 442, in write_styled
    self.write_text(text)
  File "C:\venvs\venv-itk\lib\site-packages\rich\_win32_console.py", line 403, in write_text
    self.write(text)
  File "C:\Users\tom.birdsong\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-2: character maps to <undefined>

Additional Notes

Copying and pasting the same command into a shell outside of Jupyter Notebook results in expected execution:

ngff-zarr -i "path/to/image.nii.gz" -o "path/to/image.zarr"
╭─────────────────────────────────────────────────── NGFF OME-Zarr ────────────────────────────────────────────────────╮
│   4/4 0:00:14 Writing scales ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00                                   │
╰───────────────────────────────────────────────────── generation ─────────────────────────────────────────────────────╯
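The root cause appears to be that rich's box-drawing characters cannot be encoded by the legacy Windows console code page (cp1252) that the Jupyter-spawned subprocess inherits. A minimal reproduction of the failure mode, independent of ngff-zarr:

```python
# cp1252 has no mapping for the box-drawing characters rich uses for
# its progress display, so encoding them raises UnicodeEncodeError.
try:
    "━━━".encode("cp1252")
except UnicodeEncodeError as exc:
    print(exc)  # 'charmap' codec can't encode characters in position 0-2 ...
```

Setting `PYTHONUTF8=1` (or `PYTHONIOENCODING=utf-8`) in the environment before launching Jupyter may work around it, though I have not verified this on Windows.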

Document anti-aliasing filters

Methods list, theory, implementations, label vs intensity image, performance-fidelity trade-offs, hardware limitations.

Describe recursive downsampling strategy as suggested by @GFleishman in #40.

set downsampling factors in the cli

I noticed that the default behavior of the cli appears to be anisotropic downsampling, e.g. x and y dimensions get downsampled but not z. Is there a way to change this via the cli? I have isotropic data, and so I want to downsample it isotropically.
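Until a CLI flag exists for this, the Python API's `to_multiscales` accepts explicit per-axis factors (as in the `scale_factors=[{'z': 4, 'y': 8, 'x': 8}]` example elsewhere in these issues). A small helper to build isotropic factor dicts, assuming each entry is specified relative to the original image (the helper itself is plain Python I wrote for illustration, not part of ngff-zarr):

```python
# Build isotropic per-axis scale factors for n levels, suitable for
# passing as to_multiscales(..., scale_factors=...).
def isotropic_scale_factors(n_levels, dims=('z', 'y', 'x'), base=2):
    return [{d: base ** level for d in dims} for level in range(1, n_levels + 1)]

print(isotropic_scale_factors(3))
# [{'z': 2, 'y': 2, 'x': 2}, {'z': 4, 'y': 4, 'x': 4}, {'z': 8, 'y': 8, 'x': 8}]
```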

Image data not displayed in napari

Creating an ome-ngff-zarr image with this package, then dragging into napari does not produce the expected result. An image layer does open, but no data is displayed. More details here: ome/napari-ome-zarr#91

This is solved, just submitting here for documentation purposes. Will submit a PR with the fix.

Enable kvikio GDSStore in cli

Provide an option to enable it in the CLI.

See test/test_to_ngff_zarr_kvikio.py.

More interesting once the serialized result is compatible with other zarr implementations.

In the kvikio zarr notebook, it mentions:

Some algorithms, such as LZ4, can be used interchangeably on CPU and GPU but Zarr will always use the compressor used to write the Zarr file. We are working with the Zarr team to fix this shortcoming but for now, we will use a workaround where we patch the metadata manually.

xref:

CC @madsbk @ivirshup @jakirkham @joshmoore

Fail to open converted zarr file in Neuroglancer

Overview

After converting a NIFTI file to Zarr with the NGFF-Zarr CLI I am unable to open the resulting file in Neuroglancer due to unrecognized metadata.

Expected Behavior

Data loads in neuroglancer

Observed Behavior

Error parsing "axes" property: Error parsing "unit" property: Unsupported unit: null

Steps to Reproduce

  1. ngff-zarr -i ./Ex_561_Em_593_res4_registered.nii.gz -o ./Ex_561_Em_593_res4_registered.zarr
  2. Upload Ex_561_Em_593_res4_registered.zarr to s3 bucket
  3. Attempt to view in neuroglancer

Other Notes

  • NIFTI data has spacing of [0.025, 0.025, 0.025] implied to be in units of millimeters
  • Output .zmetadata file verified to contain null units (see attachment for complete output):
    zmetadata.txt
{
    "metadata": {
        ".zattrs": {
            "multiscales": [
                {
                    "@type": "ngff:Image",
                    "axes": [
                        {
                            "name": "z",
                            "type": "space",
                            "unit": null
                        },
                        {
                            "name": "y",
                            "type": "space",
                            "unit": null
                        },
                        {
                            "name": "x",
                            "type": "space",
                            "unit": null
                        }
                    ],
...
}
  • Need to determine whether null is a valid axis unit in the NGFF-Zarr spec. If so, find a workaround for working with Neuroglancer; if not, update NGFF-Zarr or its dependencies to adhere to the specification.
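If a Neuroglancer-side workaround is needed in the meantime, a hypothetical post-hoc patch of the metadata could look like the sketch below (the `millimeter` unit comes from the NIFTI spacing noted above; the function name and approach are mine, not part of ngff-zarr):

```python
import json

# Hypothetical workaround: replace null spatial axis units in the
# .zattrs metadata with the known physical unit so Neuroglancer can
# parse the "axes" property.
def patch_axis_units(zattrs: dict, unit: str = "millimeter") -> dict:
    for multiscale in zattrs.get("multiscales", []):
        for axis in multiscale.get("axes", []):
            if axis.get("type") == "space" and axis.get("unit") is None:
                axis["unit"] = unit
    return zattrs

zattrs = {"multiscales": [{"axes": [{"name": "z", "type": "space", "unit": None}]}]}
print(json.dumps(patch_axis_units(zattrs)))
```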

Unit Test Failures

Overview

When running unit tests in the recent [email protected] release I see several unit tests failures, apparently due to missing or misversioned packages.

Steps to Reproduce

Follow unit tests instructions in README:

pip install -e ".[test,dask-image,itk,cli]"
pytest

Errors Description

test\test_cli_input_to_ngff_image.py ....F                                                                       [ 17%]
test\test_detect_cli_input_backend.py ...                                                                        [ 27%]
test\test_from_ngff_zarr.py ..                                                                                   [ 34%]
test\test_itk_image_to_ngff_image.py .....                                                                       [ 51%]
test\test_large_serialization.py .                                                                               [ 55%]
test\test_memory_usage.py .                                                                                      [ 58%]
test\test_ngff_image_scale_factors.py ..                                                                         [ 65%]
test\test_ngff_image_to_itk_image.py F..F.F                                                                      [ 86%]
test\test_task_count.py .                                                                                        [ 89%]
test\test_to_ngff_zarr_dask_image.py .                                                                           [ 93%]
test\test_to_ngff_zarr_itk.py ..                                                                                 [100%]
====================================================== FAILURES =======================================================
________________________________________ test_cli_input_to_ngff_image_imageio _________________________________________

input_images = {'brain_two_components': NgffImage(data=dask.array<array, shape=(250, 350, 300, 2), dtype=int16, chunksize=(250, 350, ...: 0.0}, name='image', axes_units=None), 'lung_series': WindowsPath('C:/repos/ngff-zarr/test/data/input/lung_series/*')}

    def test_cli_input_to_ngff_image_imageio(input_images):
        input = [test_data_dir / "input" / "cthead1.png",]
>       image = cli_input_to_ngff_image(ConversionBackend.IMAGEIO, input)

test\test_cli_input_to_ngff_image.py:33:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

backend = <ConversionBackend.IMAGEIO: 'imageio'>
input = [WindowsPath('C:/repos/ngff-zarr/test/data/input/cthead1.png')], output_scale = 0

    def cli_input_to_ngff_image(backend: ConversionBackend, input, output_scale: int=0) -> NgffImage:
...
        elif backend is ConversionBackend.IMAGEIO:
            try:
                import imageio
            except ImportError:
                print('[red]Please install the [i]imageio[/i] package.')
                sys.exit(1)
>           import imageio.v3 as iio
E           ModuleNotFoundError: No module named 'imageio.v3'

ngff_zarr\cli_input_to_ngff_image.py:57: ModuleNotFoundError
________________________________________________ test_2d_itkwasm_image ________________________________________________

input_images = {'brain_two_components': NgffImage(data=dask.array<array, shape=(250, 350, 300, 2), dtype=int16, chunksize=(250, 350, ...: 0.0}, name='image', axes_units=None), 'lung_series': WindowsPath('C:/repos/ngff-zarr/test/data/input/lung_series/*')}

    def test_2d_itkwasm_image(input_images):
        itk_image = itk.imread(test_data_dir / "input" / "cthead1.png")
        itk_image_dict = itk.dict_from_image(itk_image)
        itkwasm_image = itkwasm.Image(**itk_image_dict)
        ngff_image = itk_image_to_ngff_image(itkwasm_image)
>       itkwasm_image_back = ngff_image_to_itk_image(ngff_image)

test\test_ngff_image_to_itk_image.py:54:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

ngff_image = NgffImage(data=dask.array<array, shape=(256, 256), dtype=uint8, chunksize=(256, 256), chunktype=numpy.ndarray>, dims=('y', 'x'), scale={'y': 1.0, 'x': 1.0}, translation={'y': 0.0, 'x': 0.0}, name='image', axes_units=None)
wasm = True

    def ngff_image_to_itk_image(
        ngff_image: NgffImage,
        wasm: bool = True,
        ):

>       from itkwasm import IntTypes, PixelTypes
E       ImportError: cannot import name 'IntTypes' from 'itkwasm' (C:\Users\tom.birdsong\Anaconda3\envs\venv-itk\lib\site-packages\itkwasm\__init__.py)

ngff_zarr\ngff_image_to_itk_image.py:35: ImportError
__________________________________________________ test_2d_itk_image __________________________________________________

input_images = {'brain_two_components': NgffImage(data=dask.array<array, shape=(250, 350, 300, 2), dtype=int16, chunksize=(250, 350, ...: 0.0}, name='image', axes_units=None), 'lung_series': WindowsPath('C:/repos/ngff-zarr/test/data/input/lung_series/*')}

    def test_2d_itk_image(input_images):
        itk_image = itk.imread(test_data_dir / "input" / "cthead1.png")
        ngff_image = itk_image_to_ngff_image(itk_image)
>       itk_image_back = ngff_image_to_itk_image(ngff_image, wasm=False)

test\test_ngff_image_to_itk_image.py:12:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

ngff_image = NgffImage(data=dask.array<array, shape=(256, 256), dtype=uint8, chunksize=(256, 256), chunktype=numpy.ndarray>, dims=('y', 'x'), scale={'y': 1.0, 'x': 1.0}, translation={'y': 0.0, 'x': 0.0}, name='image', axes_units=None)
wasm = False

    def ngff_image_to_itk_image(
        ngff_image: NgffImage,
        wasm: bool = True,
        ):

>       from itkwasm import IntTypes, PixelTypes
E       ImportError: cannot import name 'IntTypes' from 'itkwasm' (C:\Users\tom.birdsong\Anaconda3\envs\venv-itk\lib\site-packages\itkwasm\__init__.py)

ngff_zarr\ngff_image_to_itk_image.py:35: ImportError

Versions

ngff-zarr [email protected]
Python 3.8.5
Windows 10

Improper intermediate downscale visible size

Overview

Observed an issue where itkwidgets.view(image) with three levels shows incorrect spacing for Level 1, while Level 0 and Level 2 appear correctly.

Level 0 (original image)

scale0

Level 1 (improper scale, pancake image)

scale1

Level 2 (apparently correct scale, lowest resolution)

scale2

Versions

itkwidgets[all]==1.0a32
itk==v5.3.0

Steps to Reproduce

3D image file: https://drive.google.com/file/d/1rKB7a65EIPvtNYclea1C7cvOhGhbCG_0/view?usp=share_link

itkwidgets.view(image)

Additional Notes

Original posted in itkwidgets: InsightSoftwareConsortium/itkwidgets#655

Observed after converting from NIFTI to OME-Zarr with ngff-zarr CLI. Image below shows where top-right chunk matches expected image bounds (yellow) but remaining chunks appear larger than expected. It appears that the issue is limited to Level 1 downsampling, whereas Levels 0 and 2 for the image appear as expected within the image bounds.

neuroglancer-ngff-zarr-scaling

cc @thewtex

add imagecodecs dependency

Would there be any objection to adding imagecodecs as a dependency here? I'm asking because I tried this library on some of our tiff files and tifffile failed to load them because of a missing LZW codec. Installing imagecodecs solved the problem. Happy to submit a PR if there's interest.

ngff_zarr.to_ngff_zarr: "ValueError: path containing '.' or '..' segment not allowed"

Hi, thanks for this great package.

Just wanted to report here a problem I ran into when trying to read and write an example dataset available on https://idr.github.io/ome-ngff-samples/.

import ngff_zarr

ngff_multiscales = ngff_zarr.from_ngff_zarr(
    'https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0101A/13457537.zarr')

ngff_zarr.to_ngff_zarr('test.zarr', ngff_multiscales)

Reading from S3 works nicely, but while writing to disk zarr complains about

ValueError: path containing '.' or '..' segment not allowed

It seems that ngff_zarr.from_ngff_zarr doesn't like dataset paths at the top level (such as '0' and '1' in the example file).

Empty `multiscales/multiscaleTransformations` metadata field

Expected Behavior

Under the ome-ngff 0.4 schema, any NGFF multiscaleTransformations entry must have at least one transformation sub-item.

https://github.com/ome/ngff/blob/main/0.4/schemas/image.schema#L169

Observed Behavior

ngff-zarr outputs a multiscales/coordinateTransformations entry with zero transformations in .zattrs and .zmetadata.

Sample .zattrs:

{
    "multiscales": [
        {
            "@type": "ngff:Image",
            "axes": [
                ...
            ],
            "coordinateTransformations": [],          # <---------- here
            "datasets": [
                {
                    "coordinateTransformations": [
                        {
                            "scale": [
                                0.01600000075995922,
                                0.014399999752640724,
                                0.014399999752640724
                            ],
                            "type": "scale"
                        },
                        {
                            "translation": [
                                7.4079999923706055,
                                18.417600631713867,
                                13.3056001663208
                            ],
                            "type": "translation"
                        }
                    ],
                    "path": "scale0/image"
                },
               ...

Steps to Reproduce

> ngff-zarr -i "image.nii.gz" -o "image.zarr" -u "x" "millimeter" "y" "millimeter" "z" "millimeter" -m itk_gaussian

Additional Notes

Out-of-spec multiscaleTransformations entry currently results in failure to read image with https://github.com/InsightSoftwareConsortium/ITKIOOMEZarrNGFF . Can be worked around by manually deleting the multiscales/multiscaleTransformations entry in .zattrs.
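The manual deletion described above can be sketched as a small metadata transform (key names follow the sample `.zattrs` above; this is a workaround sketch, not a fix in ngff-zarr itself):

```python
# Drop the empty top-level "coordinateTransformations" entry from the
# multiscales metadata so ITKIOOMEZarrNGFF can read the image.
def drop_empty_transforms(zattrs: dict) -> dict:
    for multiscale in zattrs.get("multiscales", []):
        if multiscale.get("coordinateTransformations") == []:
            del multiscale["coordinateTransformations"]
    return zattrs

zattrs = {"multiscales": [{"coordinateTransformations": [], "datasets": []}]}
print(drop_empty_transforms(zattrs))  # {'multiscales': [{'datasets': []}]}
```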

dask-image dependency missing?

@thewtex thanks a lot for writing this library!

Running the code below, I am getting an error at the last line (see code).

# dependencies
# mamba create -n ngff-zarr python=3.9
# pip install 'ngff-zarr'

import ngff_zarr

# open
input_multiscales = ngff_zarr.from_ngff_zarr("/Users/tischer/Documents/ome-zarr-image-analysis-nextflow/data/xy_8bit__nuclei_PLK1_control.ome.zarr")

# inspect one resolution
num_resolutions = len(input_multiscales.images)
image = input_multiscales.images[0]

data = image.data
scale = image.scale
translation = image.translation
units = image.axes_units

# process one resolution
numpy_array = data.compute()
print("some pixel value:", numpy_array[0, 0, 0, 1, 1])
spatial_indices = [index for index, value in enumerate(image.dims) if value in ('x', 'y', 'z')]
numpy_array = numpy_array + 2
print("same pixel after adding 2:", numpy_array[0, 0, 0, 1, 1])

# save one resolution
output_image = ngff_zarr.to_ngff_image(numpy_array, image.dims, image.scale, image.translation, "processed", image.axes_units )

# Error: ModuleNotFoundError: No module named 'dask_image'
# Fix: pip install dask-image
output_multiscales = ngff_zarr.to_multiscales(output_image, 1)

Am I doing something wrong or should dask-image be a dependency of ngff-zarr?

Document `NGFF-Zarr` Specification

It is unclear from the README what version of the OME-NGFF specification is implemented by this project, though it appears to be 0.4.
This is important for understanding what features are supported, such as planned rotation coordinate transformations in the upcoming 0.5 spec.

It would be helpful to add discussion to the README linking to the OME-NGFF specification and possibly describing some of the relevant features.

Multiscales bug

Seems like a scale level is being skipped? My expectation for this example is scale levels with sizes 512, 256, 128, and 64. But scale3 is 32. ome_ngff_zarr.zarr/.zattrs also shows voxel spacings of: 1, 2, 4, 16 - so consistent with the voxel grid sizes - but not what I thought I would get.
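The arithmetic spelling out the mismatch:

```python
# Expected: halve the grid at each level -> sizes 512, 256, 128, 64,
# with spacings 1, 2, 4, 8.
expected_sizes = [512 // 2 ** level for level in range(4)]
print(expected_sizes)  # [512, 256, 128, 64]

# Observed spacings are 1, 2, 4, 16, so the per-level factors are:
observed_spacing = [1, 2, 4, 16]
factors = [b // a for a, b in zip(observed_spacing, observed_spacing[1:])]
print(factors)  # [2, 2, 4] -- the last step jumps by 4, skipping the 64-voxel level
```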

Screen Shot 2023-07-12 at 3 12 09 PM
