
Comments (31)

jbms commented on May 14, 2024

Thanks, this is already helpful.

Some syntax help geared towards interoperability would be significant. How to get domain's origin and shape into std::vectors?

auto shape_span = store.domain().shape();
std::vector<int64_t> shape(shape_span.begin(), shape_span.end());
// similar for origin

How to get underlying element type (dtype)?

tensorstore::DataType dtype = store.dtype();
if (dtype == tensorstore::dtype_v<int32_t>) { /* ... */ }

How to get entire metadata as nlohmann::json?

TensorStore doesn't specifically support zarr user-defined metadata, so this relies on its generic "json" driver:

auto attrs_store = tensorstore::Open<::nlohmann::json, 0>({{"driver", "json"}, {"kvstore", {{"driver", "file"}, {"path", ".../.zattrs"}}}}).result().value();

// Sets attrs_array to a rank-0 array of ::nlohmann::json
auto attrs_array_result = tensorstore::Read(attrs_store).result();

::nlohmann::json attrs;
if (attrs_array_result.ok()) {
  attrs = attrs_array_result.value()();
} else if (absl::IsNotFound(attrs_array_result.status())) {
  attrs = ::nlohmann::json::object_t();
} else {
  return attrs_array_result.status();
}

What is the C++ equivalent of x = dataset_3d[15000:15100, 15000:15100, 20000]?

auto x = tensorstore::Read<tensorstore::zero_origin>(dataset_3d | tensorstore::Dims(0, 1).HalfOpenInterval({15000, 15000}, {15100, 15100}) | tensorstore::Dims(2).IndexSlice(20000)).result().value();

How to get buffer pointer (pointer to first element in memory)? I assume this is possible only for the entire currently buffered chunk, and not its sub-views. Or an equivalent question: how to get elements of an array into an std::vector<dtype>?

Following the example above:

void *ptr = x.data();
auto dtype = x.dtype();

You can also read directly into an std::vector by adapting it into a tensorstore::Array:

std::vector<int32_t> vec(100 * 100);
auto arr = tensorstore::Array(vec.data(), {100, 100}, tensorstore::c_order);
tensorstore::Read(dataset_3d | tensorstore::Dims(0, 1).HalfOpenInterval({15000, 15000}, {15100, 15100}) | tensorstore::Dims(2).IndexSlice(20000), tensorstore::UnownedToShared(arr)).value();

How to write an n-dimensional array with shape given in std::vector and const dtype * buffer?

std::vector<int64_t> shape{100, 100};
const int32_t *buffer = ...;
auto arr = tensorstore::Array(buffer, shape, tensorstore::c_order);
tensorstore::Write(tensorstore::UnownedToShared(arr), dataset_3d | tensorstore::Dims(0, 1).HalfOpenInterval({15000, 15000}, {15100, 15100}) | tensorstore::Dims(2).IndexSlice(20000)).value();

from tensorstore.

jbms commented on May 14, 2024

Here the error is actually that there is no error: Result::status() currently can only be used if the result is in an error state --- we should fix that, though.

To fix your example:

#include "tensorstore/context.h"
#include "tensorstore/open.h"

int
main(int argc, char ** argv)
{
  tensorstore::Context context = tensorstore::Context::Default();

  std::string path = "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/s0";

  auto openFuture =
    tensorstore::Open({ { "driver", "zarr" }, { "kvstore", { { "driver", "file" }, { "path", path } } } },
                      context,
                      tensorstore::OpenMode::open,
                      tensorstore::RecheckCached{ false },
                      tensorstore::ReadWriteMode::read);

  auto result = openFuture.result();
  if (result.ok())
  {
    std::cout << "status OK";
    auto store = result.value();
    std::cout << store.domain().shape();
  }
  else
  {
    std::cout << "status BAD\n" << result.status();
    return EXIT_FAILURE;
  }

  return EXIT_SUCCESS;
}

jbms commented on May 14, 2024

A TensorStore object logically contains a driver and an IndexTransform.

tensorstore::Dims(0).HalfOpenInterval(0, shape[0]) is a DimExpression. It can be called as a function on an IndexTransform, and also on other supported objects like TensorStore objects that contain an IndexTransform, in which case it just applies to the contained IndexTransform.

The operator| "pipeline operator" syntax is just a convenience syntax for calling the right hand side as a function:

store = (std::move(store) | tensorstore::Dims(0).HalfOpenInterval(0, shape[0])).value();

is equivalent to

store = tensorstore::Dims(0).HalfOpenInterval(0, shape[0])(std::move(store)).value();

However, as you noted, the operator| "pipeline operator" is not associative, and TensorStore doesn't include a type that can directly hold a sequence of (unapplied) DimExpression objects. In principle you could store them in e.g. std::vector<std::function<Result<TensorStore<>>(TensorStore<>)>>, but additionally, you shouldn't normally let DimExpression objects outlive the full expression in which they are constructed, because they can easily hold dangling references to temporaries.

Instead, if you want to compose multiple indexing operations without applying them directly to a TensorStore object, you can use an IndexTransform object:

tensorstore::IndexTransform<> transform = tensorstore::IdentityTransform(store.domain());
for (unsigned d = 0; d < shape.size(); ++d)
{
    transform = (std::move(transform) | tensorstore::Dims(d).HalfOpenInterval(0, shape[d] / 2)).value();
}

auto x = tensorstore::Read<tensorstore::zero_origin>(store | transform).value();

toloudis commented on May 14, 2024

Thanks to guidance from this issue I have begun to integrate OME-Zarr in my volume viewer as seen here:
allen-cell-animated/agave#73
https://github.com/allen-cell-animated/agave/pull/73/files#diff-c2505cb0ef29a0b26d0eedacbc8049b5b3dd87214baa7ff4ae72a5bb23d7168f

jbms commented on May 14, 2024

We are working on adding C++ API documentation and examples, but unfortunately haven't had the time to finish that yet. In the meantime you can look here for some usage examples:

https://github.com/google/tensorstore/blob/master/tensorstore/driver/zarr/driver_test.cc

sameeul commented on May 14, 2024

Thanks. If I may ask some naive questions based on the example code there...
I have the following code sample

int main(int argc, char** argv) {
  tensorstore::Context context = tensorstore::Context::Default();
  TENSORSTORE_CHECK_OK_AND_ASSIGN(auto store, tensorstore::Open({{"driver", "zarr"},
                            {"kvstore", {{"driver", "file"},
                                         {"path", "p01_x01_y01_wx0_wy0_c1.ome.zarr"}}
                            }},
                            context,
                            tensorstore::OpenMode::open,
                            tensorstore::RecheckCached{false},
                            tensorstore::ReadWriteMode::read).result());

  return 0;
}

When I run this, I get the following message: INVALID_ARGUMENT: Error parsing object member "driver": "zarr" is not registered.

It is not very clear to me on how to register a driver.

jbms commented on May 14, 2024

Assuming you are using bazel to build, drivers are registered by including the appropriate driver target as a dependency in your build. For zarr and file drivers you need:

//tensorstore/driver/zarr and //tensorstore/kvstore/file

To include all drivers you can instead add as dependencies:

//tensorstore:all_drivers
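As a sketch, a cc_binary rule wired up this way might look like the following; the `@tensorstore` repository name and the `:open` target label are assumptions about how your workspace imports the project, while the two driver labels come from the comment above:

```python
cc_binary(
    name = "zarr_reader",  # hypothetical target name
    srcs = ["main.cc"],
    deps = [
        "@tensorstore//tensorstore:open",          # tensorstore::Open (assumed label)
        "@tensorstore//tensorstore/driver/zarr",   # registers the "zarr" driver
        "@tensorstore//tensorstore/kvstore/file",  # registers the "file" kvstore driver
    ],
)
```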

sameeul commented on May 14, 2024

Thanks! Worked like a charm!

laramiel commented on May 14, 2024

You can also look at some of the C++ examples in:

https://github.com/google/tensorstore/tree/master/tensorstore/examples

dzenanz commented on May 14, 2024

Having a C++ example and/or tutorial would be good. Tests are focused on testing - which would be totally fine if a user example existed.

dzenanz commented on May 14, 2024

My attempt at reading a zarr file and examining what I read ran into a road block:

#include "tensorstore/context.h"
#include "tensorstore/open.h"

int
main(int argc, char ** argv)
{
  tensorstore::Context context = tensorstore::Context::Default();

  std::string path = "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr";
  // std::string path = "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/s0";

  auto store = tensorstore::Open({ { "driver", "zarr" }, { "kvstore", { { "driver", "file" }, { "path", path } } } },
                                 context,
                                 tensorstore::OpenMode::open,
                                 tensorstore::RecheckCached{ false },
                                 tensorstore::ReadWriteMode::read)
                 .result();
  std::cout << store.domain().shape();

  return EXIT_SUCCESS;
}
13>------ Build started: Project: tester, Configuration: Debug x64 ------
13>tester.cpp
13>C:\Misc\Tester\tester.cpp(18,22): error C2039: 'domain': is not a member of 'tensorstore::Result<tensorstore::TensorStore<void,-1,tensorstore::ReadWriteMode::dynamic>>'
13>C:\Misc\Tester\_deps\tensorstore-src\tensorstore/util/future_impl.h(502): message : see declaration of 'tensorstore::Result<tensorstore::TensorStore<void,-1,tensorstore::ReadWriteMode::dynamic>>'
13>Done building project "tester.vcxproj" -- FAILED.

dzenanz commented on May 14, 2024

My attempt was inspired by

EXPECT_THAT(store.domain().shape(), ::testing::ElementsAre(100, 100));

jbms commented on May 14, 2024

tensorstore::Open returns a Future<TensorStore<>>, calling result() gives you a Result<TensorStore<>> (which holds either a TensorStore<> value (indicating success) or an error absl::Status). If you instead call value() you will get a plain TensorStore<> object:

auto store = tensorstore::Open({{"driver", "zarr"}, {"kvstore", {{"driver", "file"}, {"path", path}}}}, ...).value();

dzenanz commented on May 14, 2024

I think I have an example which "works", but it does not open example zarr files.

#include "tensorstore/context.h"
#include "tensorstore/open.h"

int
main(int argc, char ** argv)
{
  tensorstore::Context context = tensorstore::Context::Default();

  std::string path = "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr";
  // std::string path = "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/s0";

  auto openFuture =
    tensorstore::Open({ { "driver", "zarr" }, { "kvstore", { { "driver", "file" }, { "path", path } } } },
                      context,
                      tensorstore::OpenMode::open,
                      tensorstore::RecheckCached{ false },
                      tensorstore::ReadWriteMode::read);

  auto status = openFuture.result().status();
  if (status.ok())
  {
    std::cout << "status OK";
    auto store = openFuture.value();
    std::cout << store.domain().shape();
  }
  else
  {
    std::cout << "status BAD\n" << status;
    return EXIT_FAILURE;
  }

  return EXIT_SUCCESS;
}
status BAD
NOT_FOUND: Error opening "zarr" driver: Metadata at local file "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/.zarray" does not exist [tensorstore_spec='{\"context\":{\"cache_pool\":{},\"data_copy_concurrency\":{},\"file_io_concurrency\":{}},\"driver\":\"zarr\",\"kvstore\":{\"driver\":\"file\",\"path\":\"C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/\"},\"recheck_cached_data\":false,\"recheck_cached_metadata\":false}']
C:\Misc\Tester\Debug\tester.exe (process 16716) exited with code 1.

sameeul commented on May 14, 2024

I think the issue is related to the OME-Zarr file structure. An OME-Zarr container can hold multiple datasets (each of them a plain zarr array). So the top-level directory has a .zattrs file which gives you the dataset names. You can parse that JSON file to get a dataset name, append it to your path, and read that via tensorstore. In your case, one of these paths would be C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/s0 .

dzenanz commented on May 14, 2024

Sadly, no. If I use C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/s0 as the path, I get a crash:

C:\Misc\Tester\_deps\tensorstore-src\tensorstore/util/result.h:506: CHECK failed: !has_value()
C:\Misc\Tester\Debug\tester.exe (process 11244) exited with code 3.

dzenanz commented on May 14, 2024

Thanks, this is already helpful.

Some syntax help geared towards interoperability would be significant:

- How to get the domain's origin and shape into std::vectors?
- How to get the underlying element type (dtype)?
- How to get the entire metadata as nlohmann::json?
- What is the C++ equivalent of x = dataset_3d[15000:15100, 15000:15100, 20000]?
- How to get a buffer pointer (pointer to the first element in memory)? I assume this is possible only for the entire currently buffered chunk, and not its sub-views. Or an equivalent question: how to get the elements of an array into an std::vector<dtype>?
- How to write an n-dimensional array with shape given in an std::vector and a const dtype * buffer?

Direct answers are preferable, but even pointers to most similar/relevant code currently existing in examples or tests would be good. https://github.com/google/tensorstore/blob/f6da8b5696a04cb6f30fab07183756d0d67d5eaa/tensorstore/driver/zarr/driver_test.cc is 3000 lines long, and seems to be heavily oriented towards testing. Other examples also aren't oriented towards education of interoperability. I am trying to convert these 150 lines from using netCDF's NCZarr to using tensorstore.

dzenanz commented on May 14, 2024

This is super-helpful, thank you. Updated example:

#include "tensorstore/context.h"
#include "tensorstore/open.h"
#include "tensorstore/index_space/dim_expression.h"

int
main(int argc, char ** argv)
{
  tensorstore::Context context = tensorstore::Context::Default();

  std::string path = "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/s0";
  // std::string path = "C:/Dev/ITKIOOMEZarrNGFF/test/zarr_implementations/examples/zarr.zr";

  auto openFuture =
    tensorstore::Open({ { "driver", "zarr" }, { "kvstore", { { "driver", "file" }, { "path", path } } } },
                      context,
                      tensorstore::OpenMode::open,
                      tensorstore::RecheckCached{ false },
                      tensorstore::ReadWriteMode::read);

  auto result = openFuture.result();
  if (result.ok())
  {
    auto store = result.value();
    auto domain = store.domain();
    std::cout << "domain.shape(): " << domain.shape() << std::endl;
    std::cout << "domain.origin(): " << domain.origin() << std::endl;
    auto shape_span = store.domain().shape();

    std::vector<int64_t> shape(shape_span.begin(), shape_span.end());

    tensorstore::DataType dtype = store.dtype();
    std::cout << "dtype: " << dtype << std::endl;
    if (dtype == tensorstore::dtype_v<uint16_t>)
    {
      auto x = tensorstore::Read<tensorstore::zero_origin>(store).result().value();

      auto * p = reinterpret_cast<uint16_t *>(x.data());
      std::cout << "p: " << *p << " " << p[1] << " " << p[2] << " " << p[3] << " " << p[4] << std::endl;
    }
    else
    {
      std::cerr << "Unsupported dtype";
      return EXIT_FAILURE;
    }
  }
  else
  {
    std::cout << "status BAD\n" << result.status();
    return EXIT_FAILURE;
  }


  // JSON uses a separate driver
  auto attrs_store =
    tensorstore::Open<::nlohmann::json, 0>(
      { { "driver", "json" },
        { "kvstore", { { "driver", "file" }, { "path", "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/.zattrs" } } } })
      .result()
      .value();

  // Sets attrs_array to a rank-0 array of ::nlohmann::json
  auto attrs_array_result = tensorstore::Read(attrs_store).result();

  ::nlohmann::json attrs;
  if (attrs_array_result.ok())
  {
    attrs = attrs_array_result.value()();
    std::cout << "attrs: " << attrs << std::endl;
  }
  else if (absl::IsNotFound(attrs_array_result.status()))
  {
    attrs = ::nlohmann::json::object_t();
  }
  else
  {
    std::cout << "Error: " << attrs_array_result.status();
  }

  return EXIT_SUCCESS;
}

If I attempt to customize the read region:

      auto dimSpec = tensorstore::Dims(0).HalfOpenInterval(0, shape[0]);
      for (unsigned d = 1; d < shape.size(); ++d)
      {
        dimSpec = dimSpec | tensorstore::Dims(d).HalfOpenInterval(0, shape[d] / 2);
      }
      auto x = tensorstore::Read<tensorstore::zero_origin>(store | dimSpec).result().value();

I get a compile error:

13>C:\Misc\Tester\tester.cpp(38,83): error C2678: binary '|': no operator found which takes a left-hand operand of type 'tensorstore::DimExpression<tensorstore::internal_index_space::IntervalSliceOp<tensorstore::Index,tensorstore::Index,tensorstore::Index>,tensorstore::internal_index_space::DimensionList<std::array<tensorstore::DimensionIndex,1>>>' (or there is no acceptable conversion)
13>C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30133\include\cstddef(42,27): message : could be 'std::byte std::operator |(const std::byte,const std::byte) noexcept' [found using argument-dependent lookup]
13>C:\Misc\Tester\_deps\tensorstore-src\tensorstore/data_type.h(376,42): message : or       'tensorstore::DataTypeConversionFlags tensorstore::operator |(tensorstore::DataTypeConversionFlags,tensorstore::DataTypeConversionFlags)' [found using argument-dependent lookup]
13>C:\Misc\Tester\_deps\tensorstore-src\tensorstore/open_mode.h(72,27): message : or       'tensorstore::OpenMode tensorstore::operator |(tensorstore::OpenMode,tensorstore::OpenMode)' [found using argument-dependent lookup]
13>C:\Misc\Tester\_deps\tensorstore-src\tensorstore/open_mode.h(117,32): message : or       'tensorstore::ReadWriteMode tensorstore::operator |(tensorstore::ReadWriteMode,tensorstore::ReadWriteMode)' [found using argument-dependent lookup]
13>C:\Misc\Tester\_deps\tensorstore-src\tensorstore/index_space/alignment.h(55,41): message : or       'tensorstore::DomainAlignmentOptions tensorstore::operator |(tensorstore::DomainAlignmentOptions,tensorstore::DomainAlignmentOptions)' [found using argument-dependent lookup]
13>C:\Misc\Tester\_deps\tensorstore-src\tensorstore/resize_options.h(55,36): message : or       'tensorstore::ResolveBoundsMode tensorstore::operator |(tensorstore::ResolveBoundsMode,tensorstore::ResolveBoundsMode)' [found using argument-dependent lookup]
13>C:\Misc\Tester\_deps\tensorstore-src\tensorstore/resize_options.h(128,29): message : or       'tensorstore::ResizeMode tensorstore::operator |(tensorstore::ResizeMode,tensorstore::ResizeMode)' [found using argument-dependent lookup]
13>C:\Misc\Tester\tester.cpp(38,83): message : while trying to match the argument list '(tensorstore::DimExpression<tensorstore::internal_index_space::IntervalSliceOp<tensorstore::Index,tensorstore::Index,tensorstore::Index>,tensorstore::internal_index_space::DimensionList<std::array<tensorstore::DimensionIndex,1>>>, tensorstore::DimExpression<tensorstore::internal_index_space::IntervalSliceOp<tensorstore::Index,tensorstore::Index,tensorstore::Index>,tensorstore::internal_index_space::DimensionList<std::array<tensorstore::DimensionIndex,1>>>)'

Why can't I do this index composition outside of a function call? And what is the proper syntax to accomplish this?

dzenanz commented on May 14, 2024

I got my example working. Thank you.

#include "tensorstore/context.h"
#include "tensorstore/open.h"
#include "tensorstore/index_space/dim_expression.h"

#include "itkImageFileReader.h"

void
jsonRead()
{
  // JSON uses a separate driver
  auto attrs_store =
    tensorstore::Open<::nlohmann::json, 0>(
      { { "driver", "json" },
        { "kvstore", { { "driver", "file" }, { "path", "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/.zattrs" } } } })
      .result()
      .value();

  // Sets attrs_array to a rank-0 array of ::nlohmann::json
  auto attrs_array_result = tensorstore::Read(attrs_store).result();

  ::nlohmann::json attrs;
  if (attrs_array_result.ok())
  {
    attrs = attrs_array_result.value()();
    std::cout << "attrs: " << attrs << std::endl;
  }
  else if (absl::IsNotFound(attrs_array_result.status()))
  {
    attrs = ::nlohmann::json::object_t();
  }
  else
  {
    std::cout << "Error: " << attrs_array_result.status();
  }
}

int
exampleRead()
{
  tensorstore::Context context = tensorstore::Context::Default();

  std::string path = "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/s0";
  // std::string path = "C:/Dev/ITKIOOMEZarrNGFF/test/zarr_implementations/examples/zarr.zr";

  auto openFuture =
    tensorstore::Open({ { "driver", "zarr" }, { "kvstore", { { "driver", "file" }, { "path", path } } } },
                      context,
                      tensorstore::OpenMode::open,
                      tensorstore::RecheckCached{ false },
                      tensorstore::ReadWriteMode::read);

  auto result = openFuture.result();
  if (result.ok())
  {
    auto store = result.value();
    auto domain = store.domain();
    std::cout << "domain.shape(): " << domain.shape() << std::endl;
    std::cout << "domain.origin(): " << domain.origin() << std::endl;
    auto shape_span = store.domain().shape();

    std::vector<int64_t> shape(shape_span.begin(), shape_span.end());

    tensorstore::DataType dtype = store.dtype();
    std::cout << "dtype: " << dtype << std::endl;
    if (dtype == tensorstore::dtype_v<uint16_t>)
    {
      tensorstore::IndexTransform<> transform = tensorstore::IdentityTransform(store.domain());
      for (unsigned d = 0; d < shape.size(); ++d)
      {
        transform = (std::move(transform) | tensorstore::Dims(d).HalfOpenInterval(0, shape[d] / 2)).value();
      }
      auto x = tensorstore::Read<tensorstore::zero_origin>(store | transform).value();

      auto * p = reinterpret_cast<uint16_t *>(x.data());
      std::cout << "p: " << *p << " " << p[1] << " " << p[2] << " " << p[3] << " " << p[4] << std::endl;
    }
    else
    {
      std::cerr << "Unsupported dtype";
      return EXIT_FAILURE;
    }
  }
  else
  {
    std::cout << "status BAD\n" << result.status();
    return EXIT_FAILURE;
  }

  return EXIT_SUCCESS;
}

void
exampleWrite()
{
  using ImageType = itk::Image<short, 3>;
  auto                 image = itk::ReadImage<ImageType>("C:/a/DzZ_T1.mha");
  auto                 size = image->GetLargestPossibleRegion().GetSize(); // ijk = 512x512x12
  std::vector<int64_t> shape(size.rbegin(), size.rend());                  // kji

  auto context = tensorstore::Context::Default();
  auto dataset = tensorstore::Open(
                   {
                     { "driver", "zarr" },
                     { "kvstore", { { "driver", "file" }, { "path", "C:/a/DzZ_T1.zarr" } } },
                     { "metadata",
                       {
                         { "compressor", { { "id", "blosc" } } },
                         { "dtype", "<i2" },
                         { "shape", shape },
                         // { "dimension_separator", "." },
                       } },
                   },
                   context,
                   tensorstore::OpenMode::create | tensorstore::OpenMode::delete_existing,
                   tensorstore::ReadWriteMode::read_write)
                   .result();

  auto arr = tensorstore::Array(image->GetBufferPointer(), shape, tensorstore::c_order);

  auto writeFuture = tensorstore::Write(tensorstore::UnownedToShared(arr), dataset);
  auto result = writeFuture.result();
  if (result.ok())
  {
    std::cout << "Written successfully\n";
  }
  else
  {
    std::cout << "Write error:\n" << result.status();
    return;
  }
}

int
main(int argc, char ** argv)
{
  jsonRead();
  exampleWrite();
  return exampleRead();
}

I now realize that zip is not on the list of KvStore drivers. What would be the best way to achieve functionality similar to zarr.storage.ZipStore? What would be the easiest way?

jbms commented on May 14, 2024

Do you need read/write support for zip or just read-only?

Best way would be to implement a "zip" kvstore driver in TensorStore (which functions as an adapter on top of a "base" kvstore) --- that has been on our todo list.

Easiest way --- would be to do the zipping and unzipping externally, using a temp directory.

dzenanz commented on May 14, 2024

Ideally, both read and write. The main need for zip store is for use in JavaScript and WebAssembly, where a single file can be passed around as a blob, and a directory is very inconvenient (to say the least).

jbms commented on May 14, 2024

Clearly zip may indeed be useful in some cases. But one of the main use cases for the zarr format is to enable efficient "partial I/O" of large arrays, meaning just reading the desired portion as needed rather than always reading the entire thing. Passing around the entire array as a zip file defeats that purpose. Instead you could retrieve individual chunks on demand from a server, e.g. as done in https://github.com/google/neuroglancer

dzenanz commented on May 14, 2024

I completely agree. But when developing a web application, starting off with a zip file is a lot easier than having to immediately deal with cloud storage and its complexity (authentication, tolerating network faults etc.).

thewtex commented on May 14, 2024

Read and write zip support is important for working with these datasets in a way that reduces the number of inodes on local filesystems, and likewise the number of objects in JavaScript / WebAssembly.

Passing around the entire array as a zip file defeats that purpose.

We can still retrieve parts of the data on demand through HTTP range requests.

thewtex commented on May 14, 2024

We have found that a good Zip library is minizip-ng.

thewtex commented on May 14, 2024

With CMake options:

option(MZ_COMPAT "Enables compatibility layer" OFF)
option(MZ_ZLIB "Enables ZLIB compression" OFF)
option(MZ_BZIP2 "Enables BZIP2 compression" OFF)
option(MZ_LZMA "Enables LZMA & XZ compression" OFF)
option(MZ_ZSTD "Enables ZSTD compression" OFF)
option(MZ_PKCRYPT "Enables PKWARE traditional encryption" OFF)
option(MZ_WZAES "Enables WinZIP AES encryption" OFF)
option(MZ_OPENSSL "Enables OpenSSL encryption" OFF)
option(MZ_LIBBSD "Build with libbsd for crypto random" OFF)
option(MZ_SIGNING "Enables zip signing support" OFF)
option(MZ_ICONV "Enables iconv string encoding conversion library" OFF)

dzenanz commented on May 14, 2024

This function works for writing JSON:

// JSON file path, e.g. "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/.zgroup"
void
writeJson(nlohmann::json json, std::string path)
{
  auto attrs_store = tensorstore::Open<nlohmann::json, 0>(
                       { { "driver", "json" }, { "kvstore", { { "driver", "file" }, { "path", path } } } },
                       tsContext,
                       tensorstore::OpenMode::create | tensorstore::OpenMode::delete_existing,
                       tensorstore::ReadWriteMode::read_write)
                       .result()
                       .value();
  auto writeFuture = tensorstore::Write(tensorstore::MakeScalarArray(json), attrs_store);

  auto result = writeFuture.result();
  if (!result.ok())
  {
    itkGenericExceptionMacro(<< "There was an error writing metadata to file '" << path
                             << ". Error details: " << result.status());
  }
}

Can I make it pretty-print the JSON instead of writing the minified version?

dzenanz commented on May 14, 2024

I see you gave netCDF a shot as well, @toloudis 😄

toloudis commented on May 14, 2024

I see you gave netCDF a shot as well, @toloudis 😄

Nothing against it, but when I attempted it I found the integration with CMake, and getting it to build, hard. tensorstore just sort of worked right away with not too many lines of CMake.

crl123 commented on May 14, 2024

// JSON file path, e.g. "C:/Dev/ITKIOOMEZarrNGFF/v0.4/cyx.ome.zarr/.zgroup"
void
writeJson(nlohmann::json json, std::string path)
{
  auto attrs_store = tensorstore::Open<nlohmann::json, 0>(
                       { { "driver", "json" }, { "kvstore", { { "driver", "file" }, { "path", path } } } },
                       tsContext,
                       tensorstore::OpenMode::create | tensorstore::OpenMode::delete_existing,
                       tensorstore::ReadWriteMode::read_write)
                       .result()
                       .value();
  auto writeFuture = tensorstore::Write(tensorstore::MakeScalarArray(json), attrs_store);

  auto result = writeFuture.result();
  if (!result.ok())
  {
    itkGenericExceptionMacro(<< "There was an error writing metadata to file '" << path
                             << ". Error details: " << result.status());
  }
}

I tried to replicate the writeJson example, but it gives me the following error:

/home/TensorStore/code/build/_deps/tensorstore-src/tensorstore/array.h:945:55:   required from 'tensorstore::Array<ElementTagType, Rank, OriginKind, LayoutContainerKind>::Storage::Storage(PointerInit&&, LayoutInit&&) [with PointerInit = std::shared_ptr<nlohmann::basic_json<> >; LayoutInit = tensorstore::StridedLayout<0>; ElementTagType = tensorstore::Shared<nlohmann::basic_json<> >; long int Rank = 0; tensorstore::ArrayOriginKind OriginKind = tensorstore::ArrayOriginKind::zero; tensorstore::ContainerKind LayoutContainerKind = tensorstore::ContainerKind::container]'
/home/TensorStore/code/build/_deps/tensorstore-src/tensorstore/array.h:526:9:   required from 'tensorstore::Array<ElementTagType, Rank, OriginKind, LayoutContainerKind>::Array(SourcePointer, SourceLayout&&) [with SourcePointer = std::shared_ptr<nlohmann::basic_json<> >; SourceLayout = tensorstore::StridedLayout<0>; std::enable_if_t<IsPairImplicitlyConvertible<SourcePointer, SourceLayout, tensorstore::ElementPointer<SourceTag>, tensorstore::StridedLayout<R, O, C> > >* <anonymous> = 0; ElementTagType = tensorstore::Shared<nlohmann::basic_json<> >; long int Rank = 0; tensorstore::ArrayOriginKind OriginKind = tensorstore::ArrayOriginKind::zero; tensorstore::ContainerKind LayoutContainerKind = tensorstore::ContainerKind::container]'
/home/TensorStore/code/build/_deps/tensorstore-src/tensorstore/array.h:1147:52:   required from 'tensorstore::SharedArray<Element, 0> tensorstore::MakeScalarArray(const Element&) [with Element = nlohmann::basic_json<>; SharedArray<Element, 0> = Array<Shared<nlohmann::basic_json<> >, 0, tensorstore::ArrayOriginKind::zero, tensorstore::ContainerKind::container>]'
/home/TensorStore/code/write_file.cpp:33:69:   required from here
/home/TensorStore/code/build/_deps/tensorstore-src/tensorstore/util/element_pointer.h:331:56: error: call of overloaded 'static_pointer_cast<tensorstore::ElementPointer<tensorstore::Shared<nlohmann::basic_json<> > >::Element>(std::enable_if_t<true, std::shared_ptr<nlohmann::basic_json<> > >)' is ambiguous
  331 |                  internal::static_pointer_cast<Element>(
      |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
  332 |                      internal_element_pointer::ConvertPointer<Pointer>(
      |                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  333 |                          std::forward<SourcePointer>(pointer)))) {}
      |                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /home/TensorStore/code/build/_deps/tensorstore-src/tensorstore/internal/intrusive_ptr.h:122,
                 from /home/TensorStore/code/build/_deps/tensorstore-src/tensorstore/context_impl_base.h:35,
                 from /home/TensorStore/code/build/_deps/tensorstore-src/tensorstore/context.h:30,
                 from /home/TensorStore/code/write_file.cpp:1:
/home/TensorStore/code/build/_deps/tensorstore-src/tensorstore/internal/memory.h:65:27: note: candidate: 'std::shared_ptr<_Tp> tensorstore::internal::static_pointer_cast(std::shared_ptr<_Tp>&&) [with T = nlohmann::basic_json<>]'
   65 | inline std::shared_ptr<T> static_pointer_cast(std::shared_ptr<T>&& other) {
      |                           ^~~~~~~~~~~~~~~~~~~
In file included from /usr/include/c++/12/memory:77,
                 from /home/TensorStore/code/build/_deps/absl-src/absl/status/internal/status_internal.h:17,
                 from /home/TensorStore/code/build/_deps/absl-src/absl/status/status.h:59,
                 from /home/TensorStore/code/build/_deps/tensorstore-src/tensorstore/context.h:28:
/usr/include/c++/12/bits/shared_ptr.h:745:5: note: candidate: 'std::shared_ptr<_Tp> std::static_pointer_cast(shared_ptr<_Tp>&&) [with _Tp = nlohmann::basic_json<>; _Up = nlohmann::basic_json<>]'
  745 |     static_pointer_cast(shared_ptr<_Up>&& __r) noexcept
      |     ^~~~~~~~~~~~~~~~~~~
/usr/include/c++/12/bits/shared_ptr.h:700:5: note: candidate: 'std::shared_ptr<_Tp> std::static_pointer_cast(const shared_ptr<_Tp>&) [with _Tp = nlohmann::basic_json<>; _Up = nlohmann::basic_json<>]'
  700 |     static_pointer_cast(const shared_ptr<_Up>& __r) noexcept
      |     ^~~~~~~~~~~~~~~~~~~
In file included from /usr/include/c++/12/bits/shared_ptr.h:53:
/usr/include/c++/12/bits/shared_ptr_base.h:1929:5: note: candidate: 'std::__shared_ptr<_Tp1, _Lp> std::static_pointer_cast(const __shared_ptr<_Tp2, _Lp>&) [with _Tp = nlohmann::basic_json<>; _Tp1 = nlohmann::basic_json<>; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]'
 1929 |     static_pointer_cast(const __shared_ptr<_Tp1, _Lp>& __r) noexcept
      |     ^~~~~~~~~~~~~~~~~~~

I am using the latest version 0.1.31 of TensorStore, running on Ubuntu 22.04.

jbms commented on May 14, 2024

It looks like you are building in C++20 mode, which we haven't yet tested, but definitely intend to support.

The issue is this overload here:

inline std::shared_ptr<T> static_pointer_cast(std::shared_ptr<T>&& other) {

We need to disable that overload when in C++20 mode.
