Histograms and protocol buffers in the context of R are presented in this paper:
https://arxiv.org/abs/1401.7372
from physt.
Link: https://github.com/ryanmackenziewhite/physt/tree/protobuf
Hi Ryan,
your suggestion looks nice. I have to study protocol buffers to properly understand all the implications and benefits, so sorry for the delay (both past and future).
Just to give a short initial impression (which may change after proper study):
- I like the option and I would surely include it.
- I would probably try to put protobuf support in a separate module, not make it an integral part of the class (but this is negotiable ;-)).
- We cannot force anyone to name their histograms. From the analysis perspective, naming is an extra step that makes sense only when storing. If the protocol requires a name, we may come up with some auto-naming, or even throw an exception, no problem; the name just cannot be a requirement for constructing histograms.
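The naming point above can be sketched in a few lines of Python. This is only an illustration of the idea (names stay optional until storage time, then are either generated or rejected); the function `name_for_storage`, its `prefix` and `strict` parameters, and the `(name, histogram)` pair layout are all hypothetical and are not part of physt or of any protobuf API.

```python
from typing import Dict, Iterable, Optional, Tuple, TypeVar

H = TypeVar("H")  # any histogram-like object; physt itself is not needed here


def name_for_storage(
    histograms: Iterable[Tuple[Optional[str], H]],
    prefix: str = "histogram",  # hypothetical default prefix for generated names
    strict: bool = False,       # hypothetical switch: raise instead of auto-naming
) -> Dict[str, H]:
    """Assign names only at storage time; construction never requires one."""
    items = list(histograms)
    # Explicit names are reserved first so generated names cannot collide with them.
    taken = {name for name, _ in items if name is not None}
    named: Dict[str, H] = {}
    counter = 0
    for name, hist in items:
        if name is None:
            if strict:
                raise ValueError("A name is required to store this histogram.")
            while f"{prefix}_{counter}" in taken:
                counter += 1
            name = f"{prefix}_{counter}"
            taken.add(name)
        elif name in named:
            raise ValueError(f"Duplicate histogram name: {name!r}")
        named[name] = hist
    return named
```

With `strict=False`, unnamed histograms receive generated names such as `histogram_0`; with `strict=True`, the same input raises a `ValueError`, which corresponds to the "throw an exception" option.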
One additional note:
- I am (very slowly) rethinking the structure of physt (see branch "v04"). I don't have the time (or mental capacity) to do it right (revisiting the design now and then before committing myself to it), so it will not come in the near future, but it's worth expecting. If you are interested in commenting on the ideas, you'd be welcome.
Thanks a lot and I am starting with your article :-) I will try to come back with more competent comments asap.
Cheers,
Jan
Hi,
Thanks for the reply. I implemented it really quickly to get some idea of how it might work. I, too, would likely need to think about this further. I've been mulling over how to do this somewhat correctly, with little progress. For metadata, protocol buffers in general appear to be the way to properly maintain, sustain, and pass the information around. The cross-language capability and versioning are the key features that make this an ideal format (for any metadata).
After some further thought, I have come to the same conclusions.
- A separate module, more of a wrapper around physt, that provides the conversion to/from protobuf messages.
- Naming of histograms is important when creating collections, so again it should be separate from physt and not required. With protobuf, we can use:
map<string, Histogram>
- If properly designed, underlying changes to how physt works should not change the data model. A histogram is a histogram: bins, frequencies, errors2, overflow, underflow. Everything else would be extensible metadata that should be easy to version with protocol buffers.
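To make the data model in the list above concrete, here is a minimal Python sketch. It is an assumption-laden stand-in, not the physt implementation and not a protobuf message: it uses a plain dataclass and JSON instead of generated protobuf classes, covers only one-dimensional histograms, and the names `HistogramData`, `dumps_collection` and `loads_collection` are hypothetical. The name-to-histogram dictionary mirrors the `map<string, Histogram>` idea.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class HistogramData:
    """Hypothetical storage model: the fixed core fields plus free-form metadata."""
    bins: List[List[float]]        # one [left, right] edge pair per bin (1D only)
    frequencies: List[float]
    errors2: List[float]           # squared errors, one per bin
    underflow: float = 0.0
    overflow: float = 0.0
    meta: Dict[str, str] = field(default_factory=dict)  # extensible part

    def __post_init__(self) -> None:
        if not (len(self.bins) == len(self.frequencies) == len(self.errors2)):
            raise ValueError("bins, frequencies and errors2 must have equal length")


def dumps_collection(histograms: Dict[str, HistogramData]) -> str:
    """Serialize a name -> histogram mapping (JSON stands in for protobuf here)."""
    return json.dumps({name: asdict(h) for name, h in histograms.items()}, sort_keys=True)


def loads_collection(payload: str) -> Dict[str, HistogramData]:
    """Rebuild the mapping from the serialized form."""
    return {name: HistogramData(**fields) for name, fields in json.loads(payload).items()}


collection = {
    "energy": HistogramData(
        bins=[[0.0, 1.0], [1.0, 2.0]],
        frequencies=[3.0, 5.0],
        errors2=[3.0, 5.0],
        meta={"title": "Energy"},
    )
}
restored = loads_collection(dumps_collection(collection))
```

Because names live only in the dictionary keys, the histogram objects themselves stay unnamed, and new metadata keys can be added without touching the core fields.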
Tensorflow has a histogram proto, but lacks some important information that is contained in physt. Nevertheless, worthwhile as reference:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/summary.proto
Ryan
Hi Ryan,
I finally got around to reading about protobufs and your code. I partly copied lines from your commits and partly wrote code from scratch (in order to fit with the restructured physt.io module and to deal with multidimensional histograms). Please have a look (version as of 915a0cb) and let me know if the new release is to your liking.
I am still in doubt about the Collection: whether to make it a collectively manipulable entity (with shared binnings, metadata, etc.)... This part of the API will probably change (hopefully keeping old messages readable).
Also let me know if you want more credit than a mention in the README; you deserve it for the idea and the initial implementation.
Thanks
Hi, you are welcome :-)