

AICSImageIO

Warning

AICSImageIO is now in maintenance mode only. Please take a look at its compatible successor, bioio (see here for the migration guide).


Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Pure Python


Features

  • Supports reading metadata and imaging data for:

    • OME-TIFF
    • TIFF
    • ND2 -- (pip install aicsimageio[nd2])
    • DV -- (pip install aicsimageio[dv])
    • CZI -- (pip install "aicspylibczi>=3.1.1" "fsspec>=2022.8.0")
    • LIF -- (pip install "readlif>=0.6.4")
    • PNG, GIF, etc. -- (pip install aicsimageio[base-imageio])
    • Files supported by Bio-Formats -- (pip install aicsimageio bioformats_jar) (Note: requires java and maven, see below for details.)
  • Supports writing metadata and imaging data for:

    • OME-TIFF
    • PNG, GIF, etc. -- (pip install aicsimageio[base-imageio])
  • Supports reading and writing to fsspec supported file systems wherever possible:

    • Local paths (i.e. my-file.png)
    • HTTP URLs (i.e. https://my-domain.com/my-file.png)
    • s3fs (i.e. s3://my-bucket/my-file.png)
    • gcsfs (i.e. gcs://my-bucket/my-file.png)

    See Cloud IO Support for more details.

Installation

Stable Release: pip install aicsimageio
Development Head: pip install git+https://github.com/AllenCellModeling/aicsimageio.git

AICSImageIO is supported on Windows, Mac, and Ubuntu. For other platforms, you will likely need to build from source.

Extra Format Installation

TIFF and OME-TIFF reading and writing is always available after installing aicsimageio, but extra supported formats can be optionally installed using [...] syntax.

  • For a single additional supported format (e.g. ND2): pip install aicsimageio[nd2]
  • For a single additional supported format (e.g. ND2), development head: pip install "aicsimageio[nd2] @ git+https://github.com/AllenCellModeling/aicsimageio.git"
  • For a single additional supported format (e.g. ND2), specific tag (e.g. v4.0.0.dev6): pip install "aicsimageio[nd2] @ git+https://github.com/AllenCellModeling/[email protected]"
  • For faster OME-TIFF reading with tile tags: pip install aicsimageio[bfio]
  • For multiple additional supported formats: pip install aicsimageio[base-imageio,nd2]
  • For all additional supported (and openly licensed) formats: pip install aicsimageio[all]
  • Due to the GPL license, LIF support is not included with the [all] extra, and must be installed manually with pip install aicsimageio "readlif>=0.6.4"
  • Due to the GPL license, CZI support is not included with the [all] extra, and must be installed manually with pip install aicsimageio "aicspylibczi>=3.1.1" "fsspec>=2022.8.0"
  • Due to the GPL license, Bio-Formats support is not included with the [all] extra, and must be installed manually with pip install aicsimageio bioformats_jar. Important: Bio-Formats support also requires java and mvn executables in the environment. The simplest method is to install bioformats_jar from conda: conda install -c conda-forge bioformats_jar (which will additionally bring in the openjdk and maven packages).

Documentation

For full package documentation please visit allencellmodeling.github.io/aicsimageio.

Quickstart

Full Image Reading

If your image fits in memory:

from aicsimageio import AICSImage

# Get an AICSImage object
img = AICSImage("my_file.tiff")  # selects the first scene found
img.data  # returns 5D TCZYX numpy array
img.xarray_data  # returns 5D TCZYX xarray data array backed by numpy
img.dims  # returns a Dimensions object
img.dims.order  # returns string "TCZYX"
img.dims.X  # returns size of X dimension
img.shape  # returns tuple of dimension sizes in TCZYX order
img.get_image_data("CZYX", T=0)  # returns 4D CZYX numpy array

# Get the id of the current operating scene
img.current_scene

# Get a list of valid scene ids
img.scenes

# Change scene using name
img.set_scene("Image:1")
# Or by scene index
img.set_scene(1)

# Use the same operations on a different scene
# ...

Full Image Reading Notes

The .data and .xarray_data properties will load the whole scene into memory. The .get_image_data function will load the whole scene into memory and then retrieve the specified chunk.

Delayed Image Reading

If your image doesn't fit in memory:

from aicsimageio import AICSImage

# Get an AICSImage object
img = AICSImage("my_file.tiff")  # selects the first scene found
img.dask_data  # returns 5D TCZYX dask array
img.xarray_dask_data  # returns 5D TCZYX xarray data array backed by dask array
img.dims  # returns a Dimensions object
img.dims.order  # returns string "TCZYX"
img.dims.X  # returns size of X dimension
img.shape  # returns tuple of dimension sizes in TCZYX order

# Pull only a specific chunk in-memory
lazy_t0 = img.get_image_dask_data("CZYX", T=0)  # returns out-of-memory 4D dask array
t0 = lazy_t0.compute()  # returns in-memory 4D numpy array

# Get the id of the current operating scene
img.current_scene

# Get a list of valid scene ids
img.scenes

# Change scene using name
img.set_scene("Image:1")
# Or by scene index
img.set_scene(1)

# Use the same operations on a different scene
# ...

Delayed Image Reading Notes

The .dask_data and .xarray_dask_data properties and the .get_image_dask_data function will not load any piece of the imaging data into memory until you specifically call .compute on the returned Dask array. In doing so, you will only then load the selected chunk in-memory.
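The pattern behind this is standard dask laziness: record what to read, do nothing until .compute() is called. A stdlib-only toy sketch of the idea (LazyChunk and fake_read_plane are illustrative stand-ins, not aicsimageio or dask API):

```python
class LazyChunk:
    """Toy stand-in for a dask chunk: records the work, runs it on .compute()."""
    def __init__(self, read_fn, *args):
        self._read_fn = read_fn
        self._args = args

    def compute(self):
        # Only now does any "reading" happen
        return self._read_fn(*self._args)

reads = []

def fake_read_plane(t):
    reads.append(t)        # stands in for an actual file read
    return [[t]]           # tiny fake "plane"

lazy_t0 = LazyChunk(fake_read_plane, 0)   # nothing read yet: reads == []
t0 = lazy_t0.compute()                    # the read happens here: reads == [0]
```

A real dask array generalizes this by holding a whole graph of such deferred reads and materializing only the chunks you ask for.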

Mosaic Image Reading

Read stitched data or single tiles as a dimension.

Readers that support mosaic tile stitching:

  • LifReader
  • CziReader

AICSImage

If the file format reader supports stitching mosaic tiles together, the AICSImage object will default to stitching the tiles back together.

img = AICSImage("very-large-mosaic.lif")
img.dims.order  # T, C, Z, big Y, big X, (S optional)
img.dask_data  # Dask chunks fall on tile boundaries, pull YX chunks out of the image

This behavior can be manually turned off:

img = AICSImage("very-large-mosaic.lif", reconstruct_mosaic=False)
img.dims.order  # M (tile index), T, C, Z, small Y, small X, (S optional)
img.dask_data  # Chunks use normal ZYX

If the reader does not support stitching tiles together the M tile index will be available on the AICSImage object:

img = AICSImage("some-unsupported-mosaic-stitching-format.ext")
img.dims.order  # M (tile index), T, C, Z, small Y, small X, (S optional)
img.dask_data  # Chunks use normal ZYX

Reader

If the file format reader detects mosaic tiles in the image, the Reader object will store the tiles as a dimension.

If tile stitching is implemented, the Reader can also return the stitched image.

reader = LifReader("very-large-mosaic.lif")
reader.dims.order  # M, T, C, Z, tile size Y, tile size X, (S optional)
reader.dask_data  # normal operations, can use M dimension to select individual tiles
reader.mosaic_dask_data  # returns stitched mosaic - T, C, Z, big Y, big X, (S optional)

Single Tile Absolute Positioning

There are functions available on both the AICSImage and Reader objects to help with single tile positioning:

img = AICSImage("very-large-mosaic.lif")
img.mosaic_tile_dims  # Returns a Dimensions object with just Y and X dim sizes
img.mosaic_tile_dims.Y  # 512 (for example)

# Get the tile start indices (top left corner of tile)
y_start_index, x_start_index = img.get_mosaic_tile_position(12)
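For intuition, a call like get_mosaic_tile_position(12) ultimately resolves a tile index to pixel offsets. A toy sketch of that mapping for a simple row-major grid with no overlap (the function and its parameters are hypothetical, not the library API, which reads positions from metadata):

```python
def tile_position(tile_index, grid_cols, tile_h, tile_w):
    """Map a row-major tile index to (y_start, x_start) pixel offsets."""
    row, col = divmod(tile_index, grid_cols)
    return row * tile_h, col * tile_w

# Tile 12 in a 5-column grid of 512x512 tiles sits at row 2, col 2
y, x = tile_position(12, grid_cols=5, tile_h=512, tile_w=512)
print(y, x)  # 1024 1024
```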

Metadata Reading

from aicsimageio import AICSImage

# Get an AICSImage object
img = AICSImage("my_file.tiff")  # selects the first scene found
img.metadata  # returns the metadata object for this file format (XML, JSON, etc.)
img.channel_names  # returns a list of string channel names found in the metadata
img.physical_pixel_sizes.Z  # returns the Z dimension pixel size as found in the metadata
img.physical_pixel_sizes.Y  # returns the Y dimension pixel size as found in the metadata
img.physical_pixel_sizes.X  # returns the X dimension pixel size as found in the metadata

Xarray Coordinate Plane Attachment

If aicsimageio finds coordinate information for the spatial-temporal dimensions of the image in metadata, you can use xarray for indexing by coordinates.

from aicsimageio import AICSImage

# Get an AICSImage object
img = AICSImage("my_file.ome.tiff")

# Get the first ten seconds (not frames)
first_ten_seconds = img.xarray_data.loc[:10]  # returns an xarray.DataArray

# Get the first ten major units (usually micrometers, not indices) in Z
first_ten_um_in_z = img.xarray_data.loc[:, :, :10]

# Get the first ten major units (usually micrometers, not indices) in Y
first_ten_um_in_y = img.xarray_data.loc[:, :, :, :10]

# Get the first ten major units (usually micrometers, not indices) in X
first_ten_um_in_x = img.xarray_data.loc[:, :, :, :, :10]

See xarray "Indexing and Selecting Data" Documentation for more information.
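Under the hood, label-based selection like this reduces to mapping coordinate values to integer indices. A minimal numpy sketch of the idea (the coordinate spacing and stack size here are made up for illustration; xarray does this bookkeeping for you):

```python
import numpy as np

# Hypothetical Z coordinates in micrometers attached to a 20-plane stack
z_coords = np.arange(20) * 1.5          # 0.0, 1.5, 3.0, ...
stack = np.random.rand(20, 64, 64)      # Z, Y, X

# Select planes whose Z coordinate is <= 10 um (label-based, not index-based)
stop = np.searchsorted(z_coords, 10.0, side="right")
first_ten_um = stack[:stop]
print(first_ten_um.shape)  # (7, 64, 64): planes at Z = 0.0 .. 9.0 um
```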

Cloud IO Support

File-System Specification (fsspec) allows for common object storage services (S3, GCS, etc.) to act like normal filesystems by following the same base specification across them all. AICSImageIO utilizes this standard specification to make it possible to read directly from remote resources when the specification is installed.

from aicsimageio import AICSImage

# Get an AICSImage object
img = AICSImage("http://my-website.com/my_file.tiff")
img = AICSImage("s3://my-bucket/my_file.tiff")
img = AICSImage("gcs://my-bucket/my_file.tiff")

# Or read with specific filesystem creation arguments
img = AICSImage("s3://my-bucket/my_file.tiff", fs_kwargs=dict(anon=True))
img = AICSImage("gcs://my-bucket/my_file.tiff", fs_kwargs=dict(anon=True))

# All other normal operations work just fine

Remote reading requires that the file-system specification implementation for the target backend is installed.

  • For s3: pip install s3fs
  • For gs: pip install gcsfs

See the list of known implementations.
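fsspec's dispatch is essentially scheme-based: the URL scheme selects a filesystem implementation. A simplified stdlib-only sketch of that lookup (the registry contents and class names below are illustrative, not fsspec's real registry):

```python
from urllib.parse import urlsplit

# Illustrative scheme -> implementation mapping
REGISTRY = {
    "": "LocalFileSystem",       # plain paths like my-file.png
    "file": "LocalFileSystem",
    "http": "HTTPFileSystem",
    "https": "HTTPFileSystem",
    "s3": "S3FileSystem",        # provided by s3fs
    "gcs": "GCSFileSystem",      # provided by gcsfs
}

def filesystem_for(uri: str) -> str:
    """Return the filesystem implementation name for a URI's scheme."""
    scheme = urlsplit(uri).scheme
    try:
        return REGISTRY[scheme]
    except KeyError:
        raise ValueError(f"No filesystem implementation installed for {scheme!r}")

print(filesystem_for("s3://my-bucket/my_file.tiff"))  # S3FileSystem
```

This is why reading an s3:// path fails with an import-style error unless s3fs is installed: the real registry only contains backends that are importable.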

Saving to OME-TIFF

The simplest method to save your image as an OME-TIFF file with key pieces of metadata is to use the save function.

from aicsimageio import AICSImage

AICSImage("my_file.czi").save("my_file.ome.tiff")

Note: By default aicsimageio will generate only a portion of metadata to pass along from the reader to the OME model. This function currently does not do a full metadata translation.

For finer grain customization of the metadata, scenes, or if you want to save an array as an OME-TIFF, the writer class can also be used to customize as needed.

import numpy as np
from aicsimageio.writers import OmeTiffWriter

image = np.random.rand(10, 3, 1024, 2048)
OmeTiffWriter.save(image, "file.ome.tif", dim_order="ZCYX")

See OmeTiffWriter documentation for more details.
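Internally, a writer given a non-TCZYX dim_order has to reconcile the array shape with the declared order. A numpy sketch of one way to normalize an arbitrary order to TCZYX before writing (the helper below is hypothetical, not the OmeTiffWriter implementation, and the array sizes are kept small for illustration):

```python
import numpy as np

def to_tczyx(data: np.ndarray, dim_order: str) -> np.ndarray:
    """Transpose/expand `data` so its axes follow TCZYX."""
    target = "TCZYX"
    # Insert size-1 axes for any dimension missing from dim_order
    for dim in target:
        if dim not in dim_order:
            data = np.expand_dims(data, axis=0)
            dim_order = dim + dim_order
    # Reorder axes to match TCZYX
    return np.transpose(data, [dim_order.index(d) for d in target])

image = np.random.rand(10, 3, 16, 32)   # ZCYX
normalized = to_tczyx(image, "ZCYX")
print(normalized.shape)  # (1, 3, 10, 16, 32)
```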

Other Writers

AICSImage.save is usually a good default, but there are other image writers available. For more information, please refer to our writers documentation.

Benchmarks

AICSImageIO is benchmarked using asv. You can find the benchmark results for every commit to main starting at the 4.0 release on our benchmarks page.

Development

See our developer resources for information related to developing the code.

Citation

If you find aicsimageio useful, please cite this repository as:

Eva Maxfield Brown, Dan Toloudis, Jamie Sherman, Madison Swain-Bowden, Talley Lambert, AICSImageIO Contributors (2021). AICSImageIO: Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Pure Python [Computer software]. GitHub. https://github.com/AllenCellModeling/aicsimageio

bibtex:

@misc{aicsimageio,
  author    = {Brown, Eva Maxfield and Toloudis, Dan and Sherman, Jamie and Swain-Bowden, Madison and Lambert, Talley and {AICSImageIO Contributors}},
  title     = {AICSImageIO: Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Pure Python},
  year      = {2021},
  publisher = {GitHub},
  url       = {https://github.com/AllenCellModeling/aicsimageio}
}

Free software: BSD-3-Clause

(The LIF component is licensed under GPLv3 and is not included in this package)
(The Bio-Formats component is licensed under GPLv2 and is not included in this package)
(The CZI component is licensed under GPLv3 and is not included in this package)


aicsimageio's Issues

Allow passing `range` or a sequence of integers to image data getters

Use Case

Please provide a use case to help us understand your request in context
Currently, when requesting chunk or slice data from AICSImage.get_image_data and dask_data, the user can only specify a single integer for each dimension not explicitly requested, e.g.:

AICSImage.get_image_data("ZYX", S=0, T=0, C=0)

That is the limit of the request / parameter format. However, there is an obvious use case, especially with timeseries data, for range requests.

AICSImage.get_image_data("TZYX", S=0, T=range(0, 100, 5), C=0)

This means: gather every fifth timepoint and merge them into a single array.
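Such a request is plain fancy indexing underneath. A numpy sketch of what a T=range(0, 100, 5) selection would gather (array sizes are made up for illustration):

```python
import numpy as np

data = np.random.rand(100, 2, 5, 32, 32)   # TCZYX

# Gather every fifth timepoint for channel 0 into one array (TZYX)
subset = data[np.arange(0, 100, 5), 0]
print(subset.shape)  # (20, 5, 32, 32)
```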

Solution

Please describe your ideal solution
The above use case gives some examples but here are more:

Get the first and second channels

AICSImage.get_image_data("CZYX", S=0, T=0, C=range(0, 2))

Get the first and second channels using a Tuple[int] of indices

AICSImage.get_image_data("CZYX", S=0, T=0, C=(0, 1))

Get the first and second channels using a List[int] of indices

AICSImage.get_image_data("CZYX", S=0, T=0, C=[0, 1])

The last one is an idea I believe has value when you combine it with metadata:

AICSImage.get_image_data(
    "CZYX",
    S=0,
    T=0,
    C=[channels.index(c) for c in channels if c in ["EGFP", "RFP"]]
)

Alternatives

Please describe any alternatives you've considered, even if you've dismissed them
Using normal dask and numpy array slicing:

img = AICSImage("my_file.tiff")
lazy_chunk = img.get_image_dask_data("ZYX", S=0, T=0, C=0)
middle_slice = lazy_chunk[20:30, :].compute()

It's fine and I wouldn't discourage people from doing it, but I don't see why adding the range / list of index functionality is bad either.

Other Comments

If the user provides a Sequence[int] or range for a dimension, they must also include that dimension in the dim_order string parameter, i.e. the following would error because Z is requested as a range but isn't in the dim_order string:

AICSImage.get_image_data("YX", S=0, T=0, C=0, Z=range(10))
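A small sketch of the proposed validation rule (the helper below is illustrative, not library code):

```python
def validate_dim_selection(dim_order: str, **dims) -> None:
    """Reject range/sequence selections for dims absent from dim_order."""
    for dim, value in dims.items():
        if isinstance(value, (range, list, tuple)) and dim not in dim_order:
            raise ValueError(
                f"{dim} was given a multi-index selection {value!r} "
                f"but is not in dim_order {dim_order!r}"
            )

validate_dim_selection("TZYX", T=range(0, 100, 5), C=0)   # ok: T is in dim_order
# validate_dim_selection("YX", Z=range(10))               # would raise ValueError
```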

Optimize Delayed Reads

Use Case

Please provide a use case to help us understand your request in context
Currently, many of the readers reopen and reindex the file on individual chunk reads; with large files this is a serious performance hit. It would be great to reduce the number of times the file needs to be indexed as much as possible.

Solution

Please describe your ideal solution
There is a proposal to, on Reader initialization, construct a numpy array that stores the buffer offsets of the chunks. So if you have a 4D image and your chunks are 2D, you could store a 2D numpy array of buffer offsets that get passed to an actual read_chunk function. Not much has been done to experiment with this, but we are in the process of seeing how it may work.
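The offset-table idea can be sketched with a plain byte buffer: index the chunk offsets once, then every chunk read is a single seek-and-read instead of a full reparse (the layout and sizes below are invented for illustration):

```python
import io
import numpy as np

# Fake "file": a 4x3 grid of 8x8 uint8 chunks stored contiguously
chunk_shape = (8, 8)
chunk_nbytes = 8 * 8
buffer = io.BytesIO(bytes(range(256)) * ((4 * 3 * chunk_nbytes) // 256))

# Index once: a 2D array of byte offsets, one per (row, col) chunk
offsets = (np.arange(4 * 3) * chunk_nbytes).reshape(4, 3)

def read_chunk(fp, offset):
    """Read one chunk with a single seek; no reindexing of the file."""
    fp.seek(offset)
    return np.frombuffer(fp.read(chunk_nbytes), dtype=np.uint8).reshape(chunk_shape)

chunk = read_chunk(buffer, offsets[2, 1])
print(chunk.shape)  # (8, 8)
```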

Alternatives

Please describe any alternatives you've considered, even if you've dismissed them
No others have appeared as front-runners. But note that the numpy array of buffer offsets may work for some file formats and not for others. We can optimize readers independently of one another.

Discussion: Readers holding onto both the file pointer and the entire data block

Currently in v3.0.* all of the aicsimageio.readers submodules / classes hold onto the file pointer after reading the data in. That isn't the problem; the problem is that they also hold onto the entire already-read data block, which leads to more than a doubling of memory when using AICSImage. After discussion with @toloudis, this is because the AICSImage was originally designed to keep using self.reader operations to access data for the user.

A minimal example to show that there is at least a doubling of memory when using AICSImage:

from aicsimageio import AICSImage
im = AICSImage("aicsimageio/tests/resources/s_1_t_1_c_10_z_1.ome.tiff")
id(im.reader.data)  # returns memory location of reader data numpy array
id(im.data)  # returns memory location of AICSImage data numpy array

With @toloudis's most recent PR (#44), which adds a context manager to AICSImage, and after brief discussion with @heeler, I think it is perfectly acceptable and desirable to encourage the following behavior:

from aicsimageio import AICSImage
with AICSImage("aicsimageio/tests/resources/s_1_t_1_c_10_z_1.ome.tiff") as im:
    # some operation

But, I think we should make any operation that is done by the reader be an on-demand operation. If someone calls self.reader.data the reader object uses the open file pointer to re-read the data block instead of having it stored in memory. AICSImage will still store the transformed data, so that users have fast access to the transformed data but if they really want to use the reader, then they will have to read the file each time.

API "UX" changes suggested by @heeler that I agree with, would be changing reader.data to reader.get_data() and reader.metadata to reader.get_metadata() for two reasons:

  1. It makes it much more explicit about "Hey this will be reading the file again"
  2. It sets us up nicely for the long-term goal of chunked reading (e.g. reader.get_data(Z=0, T=1))

Thoughts?
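The proposed split between cached transformed data and on-demand reader access might look like the following toy sketch (these classes mimic the suggested API shape only; they are not the actual v3 implementation, and the "read" is simulated with a counter):

```python
class Reader:
    """Holds only the file path; every get_data() call re-reads the file."""
    def __init__(self, path):
        self._path = path
        self.read_count = 0

    def get_data(self):
        self.read_count += 1          # stands in for an actual file read
        return f"bytes of {self._path}"

class AICSImage:
    """Caches the transformed data; defers raw access to the reader."""
    def __init__(self, path):
        self.reader = Reader(path)
        self._data = None

    @property
    def data(self):
        if self._data is None:        # read once, keep the transformed copy
            self._data = self.reader.get_data()
        return self._data

im = AICSImage("s_1_t_1_c_10_z_1.ome.tiff")
im.data                  # first access triggers a read
im.data                  # second access hits the cache
im.reader.get_data()     # explicit reader call re-reads the file
print(im.reader.read_count)  # 2
```

The memory win is that only one copy of the data block lives on the object; raw reader access costs an explicit re-read, which is exactly the "get_data will read the file again" contract described above.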

Rewrite DefaultReader to use dask

Update DefaultReader to use dask and support more file types in a delayed fashion. Specifically, do delayed frame reads for iterable image types.

Required for #56

Add local cluster spawner

There was interest in having a utility function with config options passthrough to spawn a local dask cluster so that the dask operations have short computation time.

Make `OmeTiffWriter` use filename as image name when `None` provided

Currently, all the images produced when no specific image name is provided default to IMAGE0. This is fine, but a minor enhancement would be to use the filename, minus its suffixes, that was passed into the writer.

with writers.OmeTiffWriter("my_file.ome.tiff") as im_writer:
    im_writer.save(data)

Produced OME-XML has my_file_0 as the first image name.
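Deriving the default image name from the filename could be as simple as stripping all suffixes with pathlib. A sketch of the proposed behavior (the helper is hypothetical, not writer code):

```python
from pathlib import Path

def default_image_name(path: str, index: int = 0) -> str:
    """Use the filename minus all suffixes, with the image index appended."""
    p = Path(path)
    while p.suffix:              # strip .ome.tiff, .czi, etc., one suffix at a time
        p = p.with_suffix("")
    return f"{p.name}_{index}"

print(default_image_name("my_file.ome.tiff"))  # my_file_0
```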

Rewrite Writers Module

Stabilize the API, fix bugs, add documentation, make it easy to add new writers if and when needed, etc. More issues related to each individual writer to come.

Write generalized "Given some reader function" produce dask array

After reviewing #99 and seeing that its _daread is an exact copy of CZIReader._daread, it would be useful to expose a _daread function on the base Reader so that reader authors only have to write the _imread function, as long as it conforms to spec. This is an entirely optional implementation; authors can still implement their own readers however they want, but so far there are two readers with the same code and I suspect that number will only grow as we address #89.

Fix OME-Tiff metadata parsing to actually use dims

Discovered by @donovanr: OME-TIFF files are correctly parsing metadata to detect accurate dim sizes, but they aren't using that information to actually provide the AICSImage container with the accurate dims.

im = AICSImage(some_ome_file)
im.dims  # returns "STCZYX"
im.data.shape  # returns [1, 1, 1, 10, 1768, 1768]
im.reader.size_c  # returns 10

AICSImage known_dims kwarg does nothing

The imread function in this library forwards its kwargs to AICSImage class. There is a documented kwarg on the AICSImage constructor called known_dims, but it looks like it doesn't do anything. If it is not implemented yet, it should be removed from the docstrings (or else be implemented).
I would also propose calling it "dims", at least at the imread level.

Fix Generalizable Id Generator

The current ID Generation template only runs if the calling object (node) contains an @Id attribute. We should be able to call the template and get an ID attribute back out regardless of if the node has a pre-existing @Id attribute or not.

CZI dimensions not being read properly for multi scene

Reproducible using the largest czi in the resources directory:

from aicsimageio import AICSImage

im = AICSImage("s_3_t_1_c_3_z_5.czi")
im.data.shape  # returns (3, 1, 3, 5, 4029, 5476)

It finds the correct Scene size but something is wonky about the Y and X. Y should be 325 and X should be 475.

For comparison here is the output of reading the smallest czi:

im = AICSImage("s_1_t_1_c_1_z_1.czi")
im.data.shape  # returns (1, 1, 1, 1, 325, 475)

Move test resources to open S3 bucket

The test resources directory is now at ~80MB which is okay but any larger and the repo starts to get expensive to clone.

To stop this, move the test files to an open S3 bucket and write a helper script to pull the data prior to CI.

file streams don't work with AICSImage - Is this intended?

>>> fp = open(fname, "rb")
>>> fp
<_io.BufferedReader name='/Users/jamies/Sandbox/Python/pylibczi/pylibczi/tests/resources/s_1_t_1_c_1_z_1.czi'>
>>> img = AICSImage(fp)
Traceback (most recent call last):
...
TypeError: Reader only accepts types: [str, pathlib.Path, bytes, io.BytesIO], received: <class '_io.BufferedReader'>

opening it as text "r" doesn't work either but that seems right.

Update release procedure

Update release procedure so that we are no longer pushing to stable to release.

This or something like it seems promising.

Deprecate AICSImage.view_napari

Having a function that is highly dependent on a different projects API for a single utility should be removed. An example can be placed in the documentation instead with pointers to napari-aicsimageio as well.

Channel & Z Switched?

When loading in a tif file, it seems that the channels and z dimension are switched in the order in which they are presented in the 6D array. So instead of STCZYX it's STZCYX.

Was just wondering if someone could confirm this. Additionally, whether there are any other ordering issues.

Rewrite TiffReader to use dask

Do a simple update of TiffReader to support dask. For now I won't be writing delayed plane or z stack reads but simply keeping the existing code and wrapping the numpy array in a dask array. We can optimize chunks later.

Required for #56

Write benchmark suite and integrate into CI

Use Case

Please provide a use case to help us understand your request in context
Related to #104

Monitor image read performance on every commit to master to watch for performance hits introduced with each patch.

Solution

Please describe your ideal solution
Look into asv, which is used by pandas for benchmarking; that's generally a good sign.

On every commit to master, run the benchmarks with different configurations on fargate instances? Potentially?

GitHub Actions server replicates no cluster
Fargate single node 8 vCPU replicates standard local-cluster
Fargate single node 32 vCPU replicates nice local-cluster
Fargate multi node 32 * 1 vCPU replicates standard distributed cluster
Fargate multi node 8 * 4 vCPU replicates standard distributed cluster (many CPU configuration)
Fargate multi node 100 * 1 vCPU replicates nice distributed cluster
Fargate multi node 25 * 4 vCPU replicates nice distributed cluster (many CPU configuration)

Each benchmark suite publishes image and CSV to S3 quilt package to version benchmarks

Single gather job at the end to pull all individual CSVs together and create final benchmark image, upload to S3 at standard URL so that the image can be displayed on README / documentation.

This would require building and publishing a Docker image prior to running the fargate clusters.

Why have two different distributed Fargate cluster configurations for each level? We still aren't entirely sure if having many cores on a single worker is faster or slower than having many single-core workers. Our assumption is that having many cores on a single worker will reduce the amount of transfer between workers, but it's also just a test of how the reads will hold up under more varied conditions.

Alternatives

Please describe any alternatives you've considered, even if you've dismissed them
Use the janky script currently written with a few patches.

Can't import package without network access

System and Software

  • aicsimageio Version: 3.1.2
  • Python Version: 3.8.1
  • Operating System: Docker container built from python:3 base image

Description

Without network access, the aicsimageio package cannot be imported.

We're trying to use this as part of a pipeline written in CWL, with each tool encapsulated in a Docker container. By default, the cwltool command-line workflow runner invokes Docker with --net=none, which causes imports of aicsimageio to fail.

This is straightforward for us to fix/work around, but it seems like this package should probably be usable without network access.

Expected Behavior

import aicsimageio should succeed without network access.

Reproduction

$ docker run -it --net=none hubmap/codex-scripts
Python 3.8.1 (default, Feb  2 2020, 08:37:37) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import aicsimageio
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/toolz/functoolz.py", line 456, in memof
    return cache[k]
KeyError: ('8.8.8.8', 80)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/distributed/utils.py", line 138, in _get_ip
    sock.connect((host, port))
OSError: [Errno 101] Network is unreachable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/aicsimageio/__init__.py", line 4, in <module>
    from .aics_image import AICSImage  # noqa: F401
  File "/usr/local/lib/python3.8/site-packages/aicsimageio/aics_image.py", line 9, in <module>
    from distributed import Client, LocalCluster
  File "/usr/local/lib/python3.8/site-packages/distributed/__init__.py", line 3, in <module>
    from .actor import Actor, ActorFuture
  File "/usr/local/lib/python3.8/site-packages/distributed/actor.py", line 6, in <module>
    from .client import Future, default_client
  File "/usr/local/lib/python3.8/site-packages/distributed/client.py", line 44, in <module>
    from .batched import BatchedSend
  File "/usr/local/lib/python3.8/site-packages/distributed/batched.py", line 8, in <module>
    from .core import CommClosedError
  File "/usr/local/lib/python3.8/site-packages/distributed/core.py", line 17, in <module>
    from .comm import (
  File "/usr/local/lib/python3.8/site-packages/distributed/comm/__init__.py", line 25, in <module>
    _register_transports()
  File "/usr/local/lib/python3.8/site-packages/distributed/comm/__init__.py", line 16, in _register_transports
    from . import inproc
  File "/usr/local/lib/python3.8/site-packages/distributed/comm/inproc.py", line 75, in <module>
    global_manager = Manager()
  File "/usr/local/lib/python3.8/site-packages/distributed/comm/inproc.py", line 39, in __init__
    self.ip = get_ip()
  File "/usr/local/lib/python3.8/site-packages/distributed/utils.py", line 162, in get_ip
    return _get_ip(host, port, family=socket.AF_INET)
  File "/usr/local/lib/python3.8/site-packages/toolz/functoolz.py", line 460, in memof
    cache[k] = result = func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/distributed/utils.py", line 147, in _get_ip
    addr_info = socket.getaddrinfo(
  File "/usr/local/lib/python3.8/socket.py", line 918, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution

Environment

N/A

get_image_dask does not handle chunk_by_dims

This code

if __name__ == "__main__":
	with AICSImage("20190809_R02.czi", chunk_by_dims=['Y','X']) as raw:
		img = raw.get_image_dask_data("TYX", S=0, C=0, Z=12)
		print(img.shape)

gives this error:

Process Dask Worker process (from Nanny):
Traceback (most recent call last):
  File "/home/matheus.viana/anaconda3/envs/aicsimageio-test/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/matheus.viana/anaconda3/envs/aicsimageio-test/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/matheus.viana/anaconda3/envs/aicsimageio-test/lib/python3.7/site-packages/distributed/process.py", line 191, in _run
    target(*args, **kwargs)
  File "/home/matheus.viana/anaconda3/envs/aicsimageio-test/lib/python3.7/site-packages/distributed/nanny.py", line 674, in _run
    worker = Worker(**worker_kwargs)
  File "/home/matheus.viana/anaconda3/envs/aicsimageio-test/lib/python3.7/site-packages/distributed/worker.py", line 638, in __init__
    **kwargs
TypeError: __init__() got an unexpected keyword argument 'chunk_by_dims'

Detect No Cluster and read single shot

Use Case

Please provide a use case to help us understand your request in context
(benchmark chart omitted: read times with and without a cluster)

Make the blue dots comparable to the orange dots on "No Cluster".

Speed up image reading when no cluster.

Solution

Please describe your ideal solution
Do the image read with the base libraries but still read into a dask array.

Alternatives

Please describe any alternatives you've considered, even if you've dismissed them

populate_TiffData is assuming a dimension_order

The OME-TIFF writer was modified to accept a dimension_order, but the automatic generation of TiffData elements still assumes XYCZT. This means the writer is still broken if you use any other dimension ordering. A fix is in progress.
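Generating TiffData/plane indices has to follow the declared dimension order rather than a hardcoded XYCZT. A sketch of the idea with itertools.product (the helper and dim sizes are illustrative, not the writer's code):

```python
from itertools import product

def plane_indices(dim_order: str, sizes: dict):
    """Yield per-plane index dicts in the file's declared plane order.

    The first non-YX entry of dim_order varies slowest, the last fastest.
    """
    plane_dims = [d for d in dim_order if d not in "YX"]
    for combo in product(*(range(sizes[d]) for d in plane_dims)):
        yield dict(zip(plane_dims, combo))

# A ZCYX file with 2 Z planes and 3 channels: Z varies slowest, C fastest
planes = list(plane_indices("ZCYX", {"Z": 2, "C": 3}))
print(planes[:4])
# [{'Z': 0, 'C': 0}, {'Z': 0, 'C': 1}, {'Z': 0, 'C': 2}, {'Z': 1, 'C': 0}]
```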

Allow arbitrary dim ordering for `writers`

The writers module has been updated with docstrings but hasn't been updated to follow our new spec. Currently it saves the data out in TZCYX instead of our new STCZYX.

LIF / Leica File Reader

As cross-team projects are approaching and with the FISH project we have already encountered areas where our current readers can't support a team's workflow. Adding a reader to support this file format would increase adoption.

Rewrite in Dask

As files become too large to hold in memory at once, and having seen the usefulness of dask in general to distribute operations and optimize pipelines for us: rewrite all readers to no longer hold onto bytes IO objects and instead just hold onto the filename, using delayed, distributed readers for each type of reader.

API Changes:

  • We lose support for streams
  • All readers have an exposed dask_data property
  • AICSImage objects have an exposed dask_data property
  • By default, all operations are routed through dask_data
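The "hold only the filename, read on demand" pattern can be sketched without Dask itself; in the real proposal the read would additionally be wrapped in dask.delayed so any worker can execute it. The class below is an illustrative stand-in, not aicsimageio code.

```python
from pathlib import Path

class LazyReader:
    """Illustrative sketch: keep only the filename, never an open byte
    stream, and materialize data on first access. A Path (unlike an open
    file handle) is picklable, so instances can be shipped to workers."""

    def __init__(self, filename):
        self._path = Path(filename)
        self._data = None

    @property
    def data(self):
        if self._data is None:  # read only on demand
            self._data = self._path.read_bytes()
        return self._data
```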

Add black formatting checks

With multiple authors and my nitpickiness getting to me, add black to the repo like all other cookiecutter-generated repos.
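One common way to wire black in is a pre-commit hook; the `rev` below is illustrative and should be pinned to the current release.

```yaml
# .pre-commit-config.yaml (rev is illustrative; pin to the current release)
repos:
  - repo: https://github.com/psf/black
    rev: 19.3b0
    hooks:
      - id: black
```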

Add `AICSImage.save` function

If a user is working with the AICSImage object, they should be able to just call save and write it out as OME-TIFF, or at least specify a format as a parameter. Very related to #16.
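A format parameter could dispatch on the file suffix to a registered writer. This is a hypothetical sketch of that dispatch (the `save` signature and `writers` registry are assumptions, not the aicsimageio API):

```python
from pathlib import Path

def save(image_data, filepath, writers):
    """Write image_data using the writer registered for filepath's suffix.

    Hypothetical sketch: `writers` maps a lowercase suffix like ".tiff"
    to a callable taking (data, path)."""
    suffix = Path(filepath).suffix.lower()
    try:
        writer = writers[suffix]
    except KeyError:
        raise ValueError(f"No writer registered for {suffix!r}") from None
    return writer(image_data, filepath)
```

An `AICSImage.save` method would then just call this with `self.data` and a default OME-TIFF writer when no suffix matches.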

get_physical_pixel_size should be a function on AICSImage

Use Case

AICSImage supports get_channel_names, but you have to get the reader to call get_physical_pixel_size. As a user, I want a simple way to get the physical pixel size from an AICSImage object.

Solution

Make get_physical_pixel_size behave the same way as get_channel_names
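The requested delegation is the same pattern get_channel_names already uses. In this sketch, `Reader` is a stand-in class and the pixel-size values are made up for illustration:

```python
class Reader:
    """Stand-in for an aicsimageio reader; values are illustrative."""
    def get_physical_pixel_size(self):
        return (0.108, 0.108, 0.290)  # (X, Y, Z) sizes, e.g. in microns

class AICSImage:
    def __init__(self, reader):
        self.reader = reader

    def get_physical_pixel_size(self):
        # Delegate to the underlying reader, exactly as get_channel_names does
        return self.reader.get_physical_pixel_size()
```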

Add `reader.format` attribute that will give image format back as a string

Somewhat specific to DefaultReader, but it would be nice for all readers to expose an attribute that simply returns the string format (file suffix / extension) of whatever image data was read in.

im = aicsimageio.AICSImage({some_buffer_or_file_missing_extension})
im.format  # returns ".png"
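When the input is a buffer with no filename, the format could be recovered by sniffing well-known magic bytes. This sniffer is a hypothetical sketch, not an existing aicsimageio API:

```python
import io

# Hypothetical magic-byte table for a few common formats
_MAGIC_TO_SUFFIX = {
    b"\x89PNG\r\n\x1a\n": ".png",
    b"GIF8": ".gif",          # covers GIF87a and GIF89a
    b"II*\x00": ".tiff",      # little-endian TIFF
    b"MM\x00*": ".tiff",      # big-endian TIFF
}

def sniff_format(buffer):
    """Return a suffix string like ".png" for a seekable binary buffer."""
    head = buffer.read(8)
    buffer.seek(0)  # leave the buffer positioned where we found it
    for magic, suffix in _MAGIC_TO_SUFFIX.items():
        if head.startswith(magic):
            return suffix
    return None
```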

Extracting ROIs from Metadata?

Hi,

Does the metadata parser hold ROIs in the AICSImage struct? Or is this something that we have to extract ourselves?

dask_utils.spawn_cluster_and_client

import logging
from typing import Optional, Tuple

from distributed import Client, LocalCluster

log = logging.getLogger(__name__)

def spawn_cluster_and_client(
    address: Optional[str] = None,
    **kwargs
) -> Tuple[Optional[LocalCluster], Optional[Client]]:
    """
    If provided an address, create a Dask Client connection.
    If not provided an address, create a LocalCluster and Client connection.
    If not provided an address, other Dask kwargs are accepted and passed down to the LocalCluster object.
    """
    if address is not None:
        client = Client(address)
        log.info(f"Connected to Remote Dask Cluster: {client}")
    else:
        cluster = LocalCluster(**kwargs)
        client = Client(cluster)
        log.info(f"Connected to Local Dask Cluster: {client}")

    return cluster, client

If you provide an address, cluster is never defined. I'd just make the first line of the function:
cluster = None
and we should be good.

implemented on bugfix/remote_client

pixels_physical_size in omeTifWriter not applying to image

System and Software

  • aicsimageio Version: Latest - downloaded today
  • Python Version: 3.6
  • Operating System: Windows

Description

I am trying to save a segmentation from my Jupyter Notebook to my computer as TIFF. Below is the code that I am trying to use:
[attached screenshot: the save code]

Expected Behavior

I expected that the saved image would have voxel spacing as I specified above. After opening the image in ImageJ, this is not the case. Image info from ImageJ attached below:
[attached screenshot: ImageJ image info]

Am I doing anything wrong here?

Write delayed CZI reader

Use aicspylibczi to update the CZIReader class to return a delayed dask array for planes or on-demand Z-stacks.

Required for #56

Remove git lfs from repository

The repo currently cannot even be cloned, which will break build procedures, and it's also just nice to reduce the size of the repo.

Allow `AbstractBufferedFile` as input to AICSImage and Readers

Use Case

Please provide a use case to help us understand your request in context
If I want to read a file from an S3, GCS, or Azure bucket, I currently need to first download the file locally and then read it. This is not optimal for two reasons:

  1. The user may not have enough storage on the local machine in the first place.
  2. If the compute is being done on EC2, GC Compute Engine, etc., there is very little value in copying the file to the worker.

Solution

Please describe your ideal solution

AbstractBufferedFile is the ABC from fsspec that is the basis for the file-system handlers for S3, GCS, Azure, etc. (s3fs, for example). By allowing all children of AbstractBufferedFile, we allow S3File and the like; we are not returning to open-buffer handling. S3File is similar to Path in the sense that it is just a pointer to something, and notably, S3File and AbstractBufferedFile objects in general can be pickled, meaning distributed reads to these targets still work. The underlying actual reading libraries (tifffile, aicspylibczi, readlif, etc.) simply need to support reading from an open buffer.
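The key property can be demonstrated with the stdlib alone: an open local file handle cannot be pickled, while a lightweight pointer object (the role fsspec's AbstractBufferedFile subclasses like S3File play) can. `FilePointer` below is a toy stand-in, not fsspec code.

```python
import os
import pickle
import tempfile

# An open local file handle cannot cross process boundaries...
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"image bytes")
    path = f.name

handle = open(path, "rb")
try:
    pickle.dumps(handle)  # raises TypeError: open buffers are not picklable
    handle_pickles = True
except TypeError:
    handle_pickles = False
finally:
    handle.close()

# ...but a pointer object that reopens the target on demand pickles fine.
class FilePointer:
    """Toy stand-in for a picklable file pointer (cf. fsspec's S3File)."""
    def __init__(self, path):
        self.path = path
    def read(self):
        with open(self.path, "rb") as fh:
            return fh.read()

restored = pickle.loads(pickle.dumps(FilePointer(path)))
```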

Alternatives

Please describe any alternatives you've considered, even if you've dismissed them
As mentioned above, download the file in full then read.

Notes

Some work has already been done to support this: here
