
eriksjolund / st_exp_protobuf


File format for spatial gene data. The file contains a small header, gene expression data and image tiles from a high-resolution photo. The header serves as a table of contents, so a single image tile can be retrieved without reading the whole file.

License: Other

JavaScript 3.36% CMake 9.98% Protocol Buffer 4.26% C++ 80.57% C 1.01% Shell 0.82%
file-format gene-expression image-tiles demo protobuf

st_exp_protobuf's Issues

What information should be stored in the header?

Information we want to store can be placed either inside the header or in a byte range
outside the header.
This is still an open question: what information should be stored inside the header?
The choice affects both the number of bytes read and the number of read operations needed to retrieve a given piece of information from the file.

Right now the gene names and the spot coordinate information are stored outside the header but maybe we should put them inside the header?
Advantage: The osd-spot-viewer
(https://github.com/eriksjolund/osd-spot-viewer/blob/master/from_layout/index.html)
would make fewer read calls. The parsing code would also be somewhat simpler.

Drawback: The header would get bigger. Clients that do not need that information, and only want
to retrieve other data from the file, would still pay the extra cost of reading and parsing it.
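Anything stored outside the header has to be fetched with a separate ranged read. The following is a minimal C++ sketch of one such read from a local file, assuming the offset and size are already known (for instance from a byte range listed in the header). The function name `read_byte_range` is hypothetical and not part of the st_exp_protobuf code.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Read exactly `size` bytes starting at `offset` from the file at `path`.
// Each call costs one open, one seek and one read.
std::vector<char> read_byte_range(const std::string& path, std::uint64_t offset,
                                  std::uint64_t size) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    in.seekg(static_cast<std::streamoff>(offset));
    std::vector<char> buf(static_cast<std::size_t>(size));
    if (!in.read(buf.data(), static_cast<std::streamsize>(size))) {
        throw std::runtime_error("short read or seek past end of file");
    }
    return buf;
}
```

Moving the gene names and spot coordinates into the header would replace calls like this with a single, larger header read.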

Store arrays of byte ranges more efficiently

Instead of storing byte ranges like this

message FileRegion {
   required uint64 regionOffset = 1;
   required uint64 regionSize = 2;
}

message InterestingByteRanges {
    repeated FileRegion fileRegions = 1;
}

I think we could store them like this instead:

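message FileRegions {
    // Each start position also defines the end position of the previous entry,
    // which only works if the regions are stored contiguously and in order.
    // To define the end position of the last entry we would have to
    // add an extra start position at the end of the array.
    // In proto2, [packed = true] stores all values as one run of varints.
    repeated uint64 startPositions = 1 [packed = true];
}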

message InterestingByteRanges {
    required FileRegions fileRegions = 1;
}

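That should reduce the size of the header, because the per-entry overhead of an embedded FileRegion message (a tag and a length prefix, plus a tag for each of its two fields) goes away and each range needs one number instead of two.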
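A minimal C++ sketch of both directions of the proposed encoding, plus a rough byte count of the two layouts following the protobuf wire format. It assumes the regions are contiguous and sorted, that all field numbers are below 16 (so every tag fits in one byte) and that `startPositions` is declared with `[packed = true]`. `Region`, `to_start_positions`, `to_regions` and the size functions are hypothetical names, not part of the repository.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Same information as one FileRegion message (hypothetical C++ mirror).
struct Region {
    std::uint64_t offset;
    std::uint64_t size;
};

// Encode n contiguous regions as n + 1 start positions; the last entry is
// the end position of the last region.
std::vector<std::uint64_t> to_start_positions(const std::vector<Region>& regions) {
    std::vector<std::uint64_t> starts;
    for (std::size_t i = 0; i < regions.size(); ++i) {
        if (i > 0 && regions[i].offset != regions[i - 1].offset + regions[i - 1].size) {
            throw std::invalid_argument("regions must be contiguous and sorted");
        }
        starts.push_back(regions[i].offset);
    }
    if (!regions.empty()) {
        starts.push_back(regions.back().offset + regions.back().size);
    }
    return starts;
}

// Decode n + 1 start positions back into n regions.
std::vector<Region> to_regions(const std::vector<std::uint64_t>& starts) {
    std::vector<Region> regions;
    for (std::size_t i = 0; i + 1 < starts.size(); ++i) {
        if (starts[i + 1] < starts[i]) {
            throw std::invalid_argument("start positions must be non-decreasing");
        }
        regions.push_back({starts[i], starts[i + 1] - starts[i]});
    }
    return regions;
}

// Number of bytes protobuf uses for an unsigned varint.
std::size_t varint_size(std::uint64_t v) {
    std::size_t n = 1;
    while (v >= 0x80) {
        v >>= 7;
        ++n;
    }
    return n;
}

// Bytes for `repeated FileRegion fileRegions = 1`: each element is a tag,
// a length prefix and an embedded message with two tagged varint fields.
std::size_t file_region_layout_bytes(const std::vector<Region>& regions) {
    std::size_t total = 0;
    for (const Region& r : regions) {
        std::size_t body = 1 + varint_size(r.offset) + 1 + varint_size(r.size);
        total += 1 + varint_size(body) + body;
    }
    return total;
}

// Bytes for `required FileRegions fileRegions = 1` whose packed
// `startPositions` field is one tag, one length prefix and a run of varints.
std::size_t start_positions_layout_bytes(const std::vector<std::uint64_t>& starts) {
    std::size_t payload = 0;
    for (std::uint64_t s : starts) payload += varint_size(s);
    // An empty packed field is not written at all.
    std::size_t body = payload == 0 ? 0 : 1 + varint_size(payload) + payload;
    return 1 + varint_size(body) + body;
}
```

For the three regions {100, 20}, {120, 5} and {125, 30} this counts 18 bytes for the FileRegion layout and 9 bytes for the packed start positions {100, 120, 125, 155}.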

Should the barcode be included or not?

Right now the barcode is included in the st_exp_protobuf format.
Maybe we should leave it out of the format?

To remove it, the file st_exp.proto would have to be modified here:

message Spot {
  // We limit the total number of spots to (2^32-1) as the spot id is uint32_t.
  // Therefore we limit the grid coordinates to be max (2^16-1).
  required uint32 xCoordGrid = 1;  // integer in range: 0 ... (2^16-1) 
  required uint32 yCoordGrid = 2;  // integer in range: 0 ... (2^16-1)
  required float xCoordPhyscial = 3;  // millimeter
  required float yCoordPhyscial = 4;  // millimeter
  required string barcode = 5;
}
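The reasoning in the schema comment can be checked with a small C++ sketch: each grid coordinate has 2^16 possible values, so there are 2^16 × 2^16 = 2^32 grid positions, which is the number of values a uint32_t can represent. Packing the two coordinates into one 32-bit key, as below, is only an illustration under that assumption; it is not how st_exp_protobuf assigns spot ids, and `grid_key` is a hypothetical name.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical: combine two 16-bit grid coordinates into one 32-bit key.
// The upper 16 bits hold yCoordGrid and the lower 16 bits hold xCoordGrid.
std::uint32_t grid_key(std::uint32_t x, std::uint32_t y) {
    if (x > 0xFFFF || y > 0xFFFF) {
        throw std::out_of_range("grid coordinate exceeds 2^16-1");
    }
    return (y << 16) | x;
}
```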

Use correct image alignment information and not the identity matrix

Right now the crickconvert command does not store a correct imageAlignment, i.e. one that relates
image pixels to micrometers in the physical world.

$ grep "imageAlignment =" protobuf_schema/st_exp_protobuf/fullsize_image.proto 
  repeated float imageAlignment = 5;
$ 

It just uses an identity matrix.

$ grep identity_matrix{ c++/serialize_to_st_exp_protobuf/serialize_to_st_exp_protobuf.cc
  const std::array<float, 4> identity_matrix{1.0, 0.0, 0.0, 1.0};
$ 

The reason for this is that the crick file format does not contain any information about how the photo pixels relate to physical distances. This should be fixed.

As long as imageAlignment contains incorrect values, it doesn't make sense to add
a scale bar (e.g. https://pages.nist.gov/OpenSeadragonScalebar/) to a viewer.
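A minimal C++ sketch of what a correct imageAlignment could look like for an unrotated photo with square pixels, and how a viewer could derive a scale bar length from it. It assumes the four floats form a 2x2 matrix in row-major order that maps pixel coordinates to micrometers; the `repeated float imageAlignment` field shown above does not confirm that layout, and `Alignment`, `scale_alignment`, `pixel_to_micrometers` and `scalebar_pixels` are hypothetical names.

```cpp
#include <array>
#include <cmath>

// Assumed layout {a, b, c, d}, mapping a pixel (px, py) to micrometers:
//   um_x = a * px + b * py
//   um_y = c * px + d * py
using Alignment = std::array<float, 4>;

// Alignment for an unrotated photo with square pixels of known size.
Alignment scale_alignment(float micrometers_per_pixel) {
    return {micrometers_per_pixel, 0.0f, 0.0f, micrometers_per_pixel};
}

std::array<float, 2> pixel_to_micrometers(const Alignment& m, float px, float py) {
    return {m[0] * px + m[1] * py, m[2] * px + m[3] * py};
}

// Length in pixels of a horizontal scale bar covering `micrometers`,
// using the length of the image of the pixel x unit vector (a, c).
float scalebar_pixels(const Alignment& m, float micrometers) {
    float um_per_px = std::hypot(m[0], m[2]);
    return micrometers / um_per_px;
}
```

With the identity matrix that crickconvert currently writes, `scalebar_pixels` would treat one pixel as one micrometer, which is exactly the kind of misleading scale bar the issue warns about.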
