Create a collection of histograms in a file. Utility function to merge multiple files.

Histogram collections about physt HOT 7 OPEN

janpipek commented on May 21, 2024

Histogram collections
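The request in the issue description (store a collection of histograms in one file, plus a utility to merge multiple such files) could look roughly like the sketch below. It assumes a hypothetical JSON layout in which each file maps histogram names to their bin edges and frequencies; this is not physt's actual file format, and the function names are illustrative only. Same-named histograms are summed, and histograms with different binnings are rejected rather than silently combined.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List

# Hypothetical on-disk layout (NOT physt's format):
# {"name": {"edges": [0.0, 1.0, 2.0], "frequencies": [3.0, 4.0]}, ...}
Collection = Dict[str, Dict[str, List[float]]]


def save_collection(collection: Collection, path: Path) -> None:
    """Write a whole collection of histograms into a single JSON file."""
    path.write_text(json.dumps(collection))


def load_collection(path: Path) -> Collection:
    """Read a collection written by save_collection."""
    return json.loads(path.read_text())


def merge_files(paths: Iterable[Path]) -> Collection:
    """Merge several collection files, summing histograms with equal names."""
    merged: Collection = {}
    for path in paths:
        for name, hist in load_collection(path).items():
            if name not in merged:
                merged[name] = hist
                continue
            target = merged[name]
            # Adding frequencies only makes sense for identical binnings.
            if target["edges"] != hist["edges"]:
                raise ValueError(f"Histogram {name!r} has incompatible binnings")
            # Build a new list, so no previously loaded data is modified.
            target["frequencies"] = [
                a + b for a, b in zip(target["frequencies"], hist["frequencies"])
            ]
    return merged
```

Keying the file by histogram name mirrors the `map<string, Histogram>` idea discussed later in the thread.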


Comments (7)

ryanmwhitephd commented on May 21, 2024

Histograms and protocol buffers in the context of R are presented in this paper:
https://arxiv.org/abs/1401.7372


janpipek commented on May 21, 2024

Link: https://github.com/ryanmackenziewhite/physt/tree/protobuf


janpipek commented on May 21, 2024

Hi Ryan,

your suggestion looks nice. I have to study protocol buffers to properly understand all the implications and benefits, so sorry for the delay (both past and future).

Just to give a short initial feeling (which may change after proper studying):

I like the option and I would surely include it.
I would probably try to put protobuf support in a separate module, not as an integral part of the class (but this is negotiable ;-)).
We cannot force anyone to name their histograms. From an analysis perspective, naming is an extra step that makes sense only when storing. If the protocol requires names, we may come up with some auto-naming, or even throw an exception (no problem); the name just cannot be a requirement for constructing histograms.
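A minimal sketch of the "auto-naming or exception" option, assuming names are resolved only at storage time so that constructing a histogram never requires one. The function name and the `histogram_<index>` scheme are hypothetical, not part of physt: explicit names are kept, unnamed entries get generated names, and any clash raises an exception.

```python
from typing import Dict, List, Optional, Tuple, TypeVar

H = TypeVar("H")  # any histogram-like object


def name_for_storage(
    items: List[Tuple[Optional[str], H]], prefix: str = "histogram"
) -> Dict[str, H]:
    """Build a name -> histogram mapping just before saving.

    Explicit names are kept as-is; unnamed histograms (name is None)
    receive '<prefix>_<index>' names. A duplicate name raises ValueError.
    """
    result: Dict[str, H] = {}
    for index, (name, histogram) in enumerate(items):
        key = name if name is not None else f"{prefix}_{index}"
        if key in result:
            raise ValueError(f"Duplicate histogram name: {key!r}")
        result[key] = histogram
    return result
```

Because naming happens in this storage helper rather than in the histogram constructor, analysis code stays unaffected.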

One additional note:

I am (very slowly) rethinking the structure of physt (see branch "v04"). I don't have the time (or mental abilities) to do it right (re-iterating the design now and then before committing myself to it), so it will not come in the near future, but it's worth expecting. If you are interested in commenting on the ideas, you are welcome.

Thanks a lot and I am starting with your article :-) I will try to come back with more competent comments asap.

Cheers,
Jan


ryanmwhitephd commented on May 21, 2024

Hi,

Thanks for the reply. I implemented it really quickly to get some idea of how it might work. I, too, would likely need to think about this further; I've been mulling over how to do this somewhat correctly, with little progress. For metadata, protocol buffers in general appear to be the right way to maintain, sustain, and pass the information around. Their cross-language capability and versioning are the key features that make them an ideal format (for any metadata).

After some further thought, I have reached the same conclusions:

A separate module, essentially a wrapper around physt, that provides the conversion to/from protobuf messages.
Naming of histograms is important when creating collections, but it should again be kept separate from physt and not be required. With protobuf, we can use:
map<string, Histogram>
If properly designed, underlying changes to how physt works should not change the data model. A histogram is a histogram: bins, frequencies, errors2, overflow, underflow. Everything else would be extensible metadata that should be easy to version with the protocol buffer.
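To make the proposed data model concrete, here is a hypothetical Python mirror of it, with plain dictionaries standing in for protobuf messages. Each histogram carries only bins, frequencies, errors2, underflow and overflow; everything else goes into an open-ended `meta` dictionary, and a collection is a name-to-histogram mapping analogous to `map<string, Histogram>`. The class, field names and the `version` key are assumptions for illustration, not physt's actual schema. In the spirit of protobuf's compatibility rules, unknown keys are ignored and missing optional fields fall back to defaults.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Histogram:
    """Hypothetical core of a 1-D histogram; everything else lives in `meta`."""
    bin_edges: List[float]
    frequencies: List[float]
    errors2: List[float]
    underflow: float = 0.0
    overflow: float = 0.0
    meta: Dict[str, Any] = field(default_factory=dict)

    def to_message(self) -> Dict[str, Any]:
        # `version` lets future readers branch on the schema revision.
        return {
            "version": 1,
            "bin_edges": self.bin_edges,
            "frequencies": self.frequencies,
            "errors2": self.errors2,
            "underflow": self.underflow,
            "overflow": self.overflow,
            "meta": self.meta,
        }

    @classmethod
    def from_message(cls, message: Dict[str, Any]) -> "Histogram":
        # Unknown keys are ignored; missing optional keys get defaults.
        frequencies = list(message["frequencies"])
        return cls(
            bin_edges=list(message["bin_edges"]),
            frequencies=frequencies,
            # Assumption: without stored errors2, use unweighted (Poisson) errors.
            errors2=list(message.get("errors2", frequencies)),
            underflow=message.get("underflow", 0.0),
            overflow=message.get("overflow", 0.0),
            meta=dict(message.get("meta", {})),
        )


# Analogue of the protobuf map<string, Histogram>.
HistogramMap = Dict[str, Histogram]


def collection_to_message(collection: HistogramMap) -> Dict[str, Any]:
    return {name: h.to_message() for name, h in collection.items()}


def collection_from_message(message: Dict[str, Any]) -> HistogramMap:
    return {name: Histogram.from_message(m) for name, m in message.items()}
```

Keeping the core fields fixed and pushing everything else into `meta` is what would let the internals of physt change without touching the stored format.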

TensorFlow has a histogram proto, but it lacks some important information that physt contains. Nevertheless, it is worthwhile as a reference:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/summary.proto

Ryan


janpipek commented on May 21, 2024

Hi Ryan,

finally I got to reading about protobufs and your code. I partly copied lines from your commits and partly wrote code from scratch (in order to fit the restructured physt.io module and to deal with multidimensional histograms). Please have a look (version as of 915a0cb) and let me know if you find the new release to your liking.

I am still in doubt about the Collection: whether to make it a collectively manipulable entity (with shared binnings, metadata, etc.). This part of the API will probably change (hopefully keeping old messages readable).
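One way to picture the "collectively manipulable entity" option is a collection that owns a single binning shared by all its members, so that operations such as rebinning apply to every histogram at once. The sketch below is hypothetical (class and method names are illustrative, not physt's API), handles only 1-D histograms, and simply drops out-of-range values where a real design would count them as underflow/overflow.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List


class SharedBinningCollection:
    """Illustrative collection whose members all share one set of bin edges."""

    def __init__(self, bin_edges: List[float]) -> None:
        if len(bin_edges) < 2 or any(b <= a for a, b in zip(bin_edges, bin_edges[1:])):
            raise ValueError("need at least two strictly increasing bin edges")
        self.bin_edges = list(bin_edges)
        self.frequencies: Dict[str, List[float]] = {}

    def create(self, name: str) -> None:
        """Add an empty member histogram using the shared binning."""
        if name in self.frequencies:
            raise KeyError(f"{name!r} already exists")
        self.frequencies[name] = [0.0] * (len(self.bin_edges) - 1)

    def fill(self, name: str, values: Iterable[float]) -> None:
        """Count values into one member; out-of-range values are dropped."""
        counts = self.frequencies[name]
        for value in values:
            index = bisect_right(self.bin_edges, value) - 1
            # Bins are [a, b), except the last one, which is closed on the right.
            if value == self.bin_edges[-1]:
                index = len(counts) - 1
            if 0 <= index < len(counts):
                counts[index] += 1

    def rebin(self, factor: int) -> None:
        """Merge every `factor` adjacent bins, for all members at once."""
        n_bins = len(self.bin_edges) - 1
        if factor < 1 or n_bins % factor:
            raise ValueError("factor must divide the number of bins")
        self.bin_edges = self.bin_edges[::factor]
        for name, counts in self.frequencies.items():
            self.frequencies[name] = [
                sum(counts[i:i + factor]) for i in range(0, n_bins, factor)
            ]
```

The trade-off is that such a collection cannot hold histograms with different binnings, which a plain name-to-histogram mapping allows.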

Also, let me know if you want more credit than a mention in the README; you deserve it for the idea and the initial implementation.

Thanks


ryanmwhitephd commented on May 21, 2024

Hi Jan, apologies for the delay. I have not had much of a chance to look into the details, but at a quick glance this looks great. I should get back to the histogram work in the next few weeks. If something comes up, I will let you know. Thanks for the mention in the README! That's more than enough credit!

On Fri, Sep 21, 2018 at 14:42 Jan Pipek wrote: Hi Ryan, finally I got to reading about protobufs and your code. I partly copied lines from your commits, partly wrote code from scratch (in order to fit with restructured physt.io module and to deal with multidimensional histograms). Please have a look and let me know if you find the new release to your liking. Also let me know if you want more credits than a mention in README, you deserve it for the idea and the initial implementation.


janpipek commented on May 21, 2024

Hi, you are welcome :-)
