
Comments (7)

ryanmwhitephd commented on May 21, 2024

Histograms and protocol buffers in the context of R are presented in this paper:
https://arxiv.org/abs/1401.7372

from physt.

janpipek commented on May 21, 2024

Link: https://github.com/ryanmackenziewhite/physt/tree/protobuf


janpipek commented on May 21, 2024

Hi Ryan,

Your suggestion looks nice. I have to study protocol buffers to properly understand all the implications and benefits, so sorry for the delay (both past and future).

Just to give a quick first impression (which may change after proper study):

  • I like the option and I would surely include it.
  • I would probably try to put protobuf support in a separate module, not make it an integral part of the class (but this is negotiable ;-)).
  • We cannot force anyone to name their histograms. From the analysis perspective, naming is an extra step that makes sense only when storing. If the protocol requires a name, we may come up with some auto-naming, or even throw an exception, no problem. The name just cannot be a requirement for constructing histograms.
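
The auto-naming idea from the last point could look roughly like the sketch below. It is only an illustration: `name_for_storage` and the `histogram_N` naming pattern are hypothetical and not part of physt, and arbitrary objects stand in for histograms.

```python
from typing import Dict, Iterable, Optional, Tuple


def name_for_storage(
    items: Iterable[Tuple[Optional[str], object]],
    prefix: str = "histogram_",
) -> Dict[str, object]:
    """Build a name -> histogram mapping, inventing names only when storing.

    Hypothetical helper (not a physt function): histograms may stay
    anonymous during analysis and get a name at storage time.
    """
    named: Dict[str, object] = {}
    counter = 0
    for name, histogram in items:
        if name is None:
            # Auto-naming: skip generated names that are already taken.
            while f"{prefix}{counter}" in named:
                counter += 1
            name = f"{prefix}{counter}"
        elif name in named:
            # The alternative policy: refuse to store and raise.
            raise ValueError(f"Duplicate histogram name: {name!r}")
        named[name] = histogram
    return named
```

With this approach the name lives only in the stored mapping, so constructing a histogram never requires one.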

One additional note:

  • I am (very slowly) rethinking the structure of physt (see branch "v04"). I don't have the time (or mental abilities) to do it right (revisiting the design now and then before committing myself to it), so it will not come in the near future, but it's worth expecting. If you are interested in commenting on the ideas, you'd be welcome.

Thanks a lot. I am starting with your article :-) and will try to come back with better-informed comments asap.

Cheers,
Jan


ryanmwhitephd commented on May 21, 2024

Hi,

Thanks for the reply. I implemented it really quickly to get some idea of how it might work. I, too, need to think about this further; I've been mulling over how to do this somewhat correctly, with little progress so far. For metadata, protocol buffers in general appear to be the way to properly maintain, sustain, and pass the information around. The cross-language capability and versioning are the key features that make this an ideal format (for any metadata).

After some further thought, I have come to the same conclusions.

  • A separate module, more of a wrapper around physt, that provides the conversion to/from protobuf messages.
  • Naming of histograms is important when creating collections, so again it should be separate from physt and not required. With protobuf, we can use:
    map<string, Histogram>
  • If properly designed, underlying changes to how physt works should not change the data model. A histogram is a histogram: bins, frequencies, errors2, overflow, underflow. Everything else would be extensible metadata that should be easy to version with protocol buffers.
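
To make the data model above concrete, here is a minimal sketch in Python. It rests on several assumptions: `HistogramRecord`, `dump_collection` and `load_collection` are hypothetical names, the field layout is just one reading of the list above, and JSON stands in for the protobuf wire format so that the example runs without generated message classes.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class HistogramRecord:
    """Hypothetical storage record; field names follow the list above."""

    bins: List[List[float]]  # [left, right] edges of each bin
    frequencies: List[float]  # one value per bin
    errors2: List[float]  # squared errors, one value per bin
    underflow: float = 0.0
    overflow: float = 0.0
    # Everything else goes into extensible metadata.
    meta: Dict[str, str] = field(default_factory=dict)


def dump_collection(collection: Dict[str, HistogramRecord]) -> str:
    """Serialize a name -> record mapping (the map<string, Histogram> idea).

    JSON is an assumption made for this sketch; it replaces protobuf here.
    """
    return json.dumps({name: asdict(rec) for name, rec in collection.items()})


def load_collection(payload: str) -> Dict[str, HistogramRecord]:
    """Inverse of dump_collection."""
    return {
        name: HistogramRecord(**data)
        for name, data in json.loads(payload).items()
    }
```

The name appears only as a key of the collection, never inside the record, which keeps naming optional for a single histogram.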

TensorFlow has a histogram proto, but it lacks some important information that is contained in physt. Nevertheless, it is worthwhile as a reference:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/summary.proto

Ryan


janpipek commented on May 21, 2024

Hi Ryan,

I finally got around to reading about protobufs and your code. I partly copied lines from your commits and partly wrote code from scratch (in order to fit the restructured physt.io module and to deal with multidimensional histograms). Please have a look (version as of 915a0cb) and let me know if the new release is to your liking.

I am still in doubt about the Collection: whether to make it a collectively manipulable entity (with shared binnings, metadata etc.)... This part of the API will probably change (hopefully keeping old messages readable).

Also, let me know if you want more credit than a mention in the README; you deserve it for the idea and the initial implementation.

Thanks


ryanmwhitephd commented on May 21, 2024


janpipek commented on May 21, 2024

Hi, you are welcome :-)

