Comments (17)
(I've removed the 3.0 milestone, because this can be done as a non-breaking change, since we have separated the definitions of entropies from the discrete estimators.)
from complexitymeasures.jl.
I wouldn't name the field `frequencies`; I would name it `counts`. I always found "frequency" an odd word to use for integer numbers.
`DiscreteInfoEstimator` does not depend on the probabilities estimation in any way, so there is no reason to make it a field. So I vote that no changes be made.
I was looking into implementing these now, but came across something we need to resolve:
Many of these estimators are functions of the raw counts of the outcomes, not the normalised counts (i.e. probabilities). To accommodate this, I propose that we simply add another field `frequencies` to `Probabilities`, where `frequencies[i]` corresponds to `probs[i]`. This way, both counts and probabilities are readily available.
What do you think, @Datseris?
This sounds like a big change, because many internal functions already compute probabilities, mainly `fasthist`, so I can't even count off the top of my head how many files you would need to alter to correctly track this change "everywhere" :(
But I can't think of a different way, if you really need the true counts.
So let's go with this. This means that the internal `normed` argument of the `Probabilities` constructor must be dropped. Actually, this is a good way for you to find out which methods will fail: just remove `normed`, as we must now never give pre-normed probabilities. The test suite will tell you how many fail.
Let's do this PR before anything else to keep things clean. I am really worried about this change; I fear it may break many things.
> But I can't think of a different way, if you really need the true counts.
Yes, several of the estimators do require the raw counts. I can attempt a PR and see how involved it will be.
> So let's go with this. This means that the internal `normed` argument of the `Probabilities` constructor must be dropped. Actually, this is a good way for you to find out which methods will fail: just remove `normed`, as we must now never give pre-normed probabilities. The test suite will tell you how many fail.
Agreed.
Given the current API, I am not sure how to estimate frequencies given arbitrary input `x` to `Probabilities`. I guess one method should dispatch on `Array{<:AbstractFloat}` and one on `Array{<:Int}`. The first dispatch tries to somehow magically estimate raw counts by extracting a count quantum `1/minimum(p)`, then multiplying everything by the quantum and rounding to integer. The second method assumes the array contains counts already?
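As a sketch, that reconstruction could look like the following (the helper `approximate_counts` is hypothetical, not part of the package API; it assumes the rarest outcome was observed exactly once):

```julia
# Hypothetical helper: reconstruct approximate integer counts from a
# probability vector, assuming the smallest probability corresponds
# to a single observation.
function approximate_counts(p::AbstractVector{<:AbstractFloat})
    quantum = 1 / minimum(p)          # scale factor ("count quantum")
    return round.(Int, p .* quantum)  # multiply and round to integers
end
```

For example, `approximate_counts([0.1, 0.3, 0.6])` gives `[1, 3, 6]`.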
Hm. Do we actually have to estimate frequencies all the time? I think we can do
```julia
struct Probabilities
    probs::AbstractVector
    counts::Union{Nothing, AbstractVector}
end
```
This way, if `isnothing(counts)`, then only `MLEntropy` can be used (or other estimators that operate directly on probabilities). On the other hand, if `!isnothing(counts)`, then entropy estimators that demand raw counts can also be used.
The user doesn't even need to know that `Probabilities` stores counts. We just make sure to also include counts wherever possible. Trying to call `entropy(WhateverFancyEstimator(Shannon()), est, x)` will then error generically, because counts are not defined.
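The generic failure could be implemented with a small guard, sketched below (`require_counts` and the simplified struct are hypothetical stand-ins, not the package's actual code):

```julia
# Simplified stand-in for the proposed struct, repeated for self-containment.
struct Probabilities{T}
    probs::Vector{T}
    counts::Union{Nothing, Vector{Int}}
end

# Hypothetical guard: estimators that need raw counts call this first,
# so any `Probabilities` built without counts errors generically.
function require_counts(p::Probabilities)
    isnothing(p.counts) && error("this estimator requires raw counts, but none are stored")
    return p.counts
end
```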
Well, there is nothing stopping us from estimating the counts with the method I described, so why not do it altogether. Anyway, the user will never know of the existence of the `counts` field.
> Well, there is nothing stopping us from estimating the counts with the method I described, so why not do it altogether. Anyway, the user will never know of the existence of the `counts` field.
For the integer version, it is straightforward:
```julia
# If given an integer vector, it is assumed that its elements are counts of the different outcomes
Probabilities(x::AbstractVector{Int})
```
But for the float version, I don't see how that would work. If I input, say, `x = [0.1, 0.3, 0.4]`, and I want to convert it to a probability vector by normalizing, I have no idea how many counts underlie those initial fractions. To get actual counts, I'd need to also specify `n` (the total number of outcomes)?
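A quick illustration of the ambiguity: different count vectors can normalize to exactly the same probability vector, so the probabilities alone cannot pin down the counts.

```julia
# Two different count vectors that yield exactly the same probabilities.
c1 = [1, 3, 4]
c2 = [2, 6, 8]
p1 = c1 ./ sum(c1)
p2 = c2 ./ sum(c2)
p1 == p2  # true: the counts are not recoverable from probabilities alone
```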
To me, it seems like there should be three constructors:

1. `Probabilities(::AbstractVector{<:AbstractFloat})`: the current behaviour; leaves `counts` as `nothing`.
2. `Probabilities(::AbstractVector{<:Integer})`: treats the input as counts.
3. `Probabilities(::AbstractVector{<:AbstractFloat}, ::Int)`: like the first, but also defines counts, because the total number of observations is known.
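The three constructors might be sketched as follows (illustrative only; this simplified struct is a stand-in for the real `Probabilities` type, not the package's implementation):

```julia
# Simplified stand-in for the proposed type.
struct Probabilities{T<:AbstractFloat}
    probs::Vector{T}
    counts::Union{Nothing, Vector{Int}}
end

# 1. Floats: normalize; the underlying counts are unknown.
Probabilities(x::AbstractVector{<:AbstractFloat}) =
    Probabilities(collect(x ./ sum(x)), nothing)

# 2. Integers: treat the input as counts and normalize them.
Probabilities(x::AbstractVector{<:Integer}) =
    Probabilities(collect(x ./ sum(x)), collect(Int, x))

# 3. Floats plus the total number of observations n: recover counts by scaling.
function Probabilities(x::AbstractVector{<:AbstractFloat}, n::Int)
    p = x ./ sum(x)
    return Probabilities(collect(p), round.(Int, p .* n))
end
```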
We could of course just create imaginary counts whose ratio respects the initial input data, but I feel uneasy doing so, because we're pretending to know information we don't have.
Ah, but the input data gives `n` automatically, so scaling like you proposed should work (up to rounding errors).
> We could of course just create imaginary counts whose ratio respects the initial input data, but I feel uneasy doing so, because we're pretending to know information we don't have.
That's what we should do. It's fine. Besides, we don't expect users to directly initialize `Probabilities`. Instead, they should give input data to the `probabilities` function.
Hey @Datseris,
Since we're moving to 3.0 due to the new infoestimator-stores-the-definition API, would it make sense to do something similar for the discrete info estimators? We have the old-style syntax:

```julia
struct FancyDiscreteEst{I} <: DiscreteInfoEstimator
    measure::I # the info measure, e.g. `Shannon()`
end

function information(est::FancyDiscreteEst{<:Shannon}, pest::ProbabilitiesEstimator, x)
    probs = probabilities(pest, x)
    # ...
end
```
Or, we could let the `DiscreteInfoEstimator` store the `ProbabilitiesEstimator` too, so that we get:
```julia
struct FancyDiscreteEst{I, P <: ProbabilitiesEstimator} <: DiscreteInfoEstimator
    measure::I # the info measure, e.g. `Shannon()`
    probest::P # e.g. `CountOccurrences()`
end

function information(est::FancyDiscreteEst{<:Shannon}, x)
    probs = probabilities(est.probest, x)
    # ...
end
```
Any preference?
I don't think it's a huge problem to have two different signatures for `information` - we already do - so I am slightly leaning towards the first alternative. The reason is that it is more pedagogic: one needs to pick both an entropy estimator AND a probabilities/frequencies estimator to estimate an entropy from data. That gets hidden a bit in the second alternative.
> `DiscreteInfoEstimator` does not depend on the probabilities estimation in any way, so there is no reason to make it a field. So I vote that no changes be made.
Ok, then I'll just stick with `information(est::DiscreteInfoEstimator, pest::ProbabilitiesEstimator, x)`.