
Comments (5)

michaelosthege commented on May 30, 2024

@andrybicio your concern is valid.
The clipping in the model is just done as a numerical safeguard. However, the lower bound of 10% of the maximum may be outdated, as many countries have significantly increased their testing volume since that line was written.
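
As a minimal sketch of that safeguard (names hypothetical; the actual model code may differ):

import numpy as np

daily_tests = np.array([120.0, 80.0, 0.0, 950.0, 1000.0])  # hypothetical test counts

# clip to at least 10% of the maximum so very low counts cannot
# destabilize the model numerically
clipped = np.clip(daily_tests, 0.1 * daily_tests.max(), None)
print(clipped)  # [ 120.  100.  100.  950. 1000.]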

The weekend/weekday exposure effect is not so much in the model itself, but more in get_scaling_factor:

# the scaling factor is calculated from a comparison between
# (test_adjusted_positive * exposure) vs. sum(observed_positive)
# but only the dates where both case and test count are available must be considered

For most countries the overall fluctuation of testing volume has a larger amplitude than the within-week fluctuation, so I don't expect this overestimation to be very pronounced.
Nevertheless, I would be happy to discuss alternatives to exposure_profile = exposure / idata.constant_data.exposure.max().
Maybe using max(weekly_averages) instead of max(daily)?
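
A rough sketch of that alternative (assuming pandas and a hypothetical daily test-count series; names are made up):

import pandas as pd

# hypothetical daily test counts with a pronounced weekend dip
daily_tests = pd.Series(
    [400, 350, 900, 950, 1000, 980, 940] * 4,  # weekend dip first
    index=pd.date_range("2021-01-02", periods=28),  # 2021-01-02 is a Saturday
)

# current approach: normalize by the single busiest day
profile_daily_max = daily_tests / daily_tests.max()

# alternative: normalize by the largest 7-day rolling average
# (min_periods tolerates occasional gaps in the data)
weekly_avg = daily_tests.rolling(window=7, min_periods=4).mean()
profile_weekly_max = daily_tests / weekly_avg.max()

With the weekly maximum as the normalizer, weekend dips pull it down less, so the busiest weekdays can exceed 1.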


andrybicio commented on May 30, 2024

From an epidemiological point of view, given the timescales we know for COVID, strong variations should not occur from one day to the next. Averaging over a week therefore seems like a sensible way to compute a consistent and realistic positive rate. The positive rate does change over the course of a wave (it peaks while the epidemic is spreading exponentially), but it does not make large jumps.

One more question about how this works: once exposure has been initialized, the positive rate is computed for the dates where both test counts and positive counts are available. Then, essentially, the maximum value of exposure_profile (which, if I understood correctly, is an array of daily positive rates?) is used as the exposure factor that multiplies total_positive and in turn yields total_inferred?

If so, I would consider weekly averages (or a rolling average with a 7-day window) as a solution. Assuming that the minimum rate is at least 10% is a really strong assumption (it does lead to an overestimation), and the rate is not constant throughout an epidemic.


michaelosthege commented on May 30, 2024

No, the code doesn't look at the fraction of positives (the positive rate), but only at the total number of tests.
exposure_profile is just a vector that is proportional to the total number of tests per day.
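
In other words (a minimal sketch, assuming a plain numpy array of daily test counts; names hypothetical):

import numpy as np

daily_tests = np.array([900.0, 950.0, 1000.0, 400.0, 350.0])  # hypothetical counts

# proportional to the total tests per day, scaled so the busiest day is 1.0
exposure_profile = daily_tests / daily_tests.max()
print(exposure_profile)  # [0.9  0.95 1.   0.4  0.35]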

Feel free to open a PR for the week-based exposure, but keep in mind that there are sometimes gaps in the data.


andrybicio commented on May 30, 2024
  • Does it mean that if the number of tests falls below observed.daily_tests.max() * 0.1, due to weekends or any other cause, it is forced up to that value? Or does this only happen at the start, since that is how the prior acts?

  • If so (i.e. it only has a strong effect at the start), how does exposure act later on? Does it force the number to stay above a certain threshold? What is its domain?

  • Moreover, it is still not clear to me how scale_factor is computed: p_observe is defined as the sum of p_delay, which sums to 0.9999999 (i.e. all infected are tested). Then: scale_factor = total_observed / total_inferred / p_observe.
    But as far as I understand, total_observed < total_inferred. How can the scale factor become larger than 1, given that normalizing with respect to p_delay is like multiplying by 1? (See the arithmetic sketch after this list.)

  • Does exposure_profile have domain [0, 1], with the value 1 on the day when the most tests were made?
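
A small numeric sketch of the arithmetic behind the scale_factor question (all numbers are made up for illustration, assuming the formula quoted above):

# hypothetical counts, chosen only to illustrate the question above
total_observed = 8_000.0   # sum of reported positive cases
total_inferred = 10_000.0  # test-volume-adjusted positives
p_observe = 0.9999999      # sum of the p_delay distribution, effectively 1

scale_factor = total_observed / total_inferred / p_observe
print(scale_factor)  # ~0.8, i.e. below 1 whenever total_observed < total_inferred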


michaelosthege commented on May 30, 2024

It might be easier if you print/plot the variables yourself. Here is the trace for "DE / all" from a few days ago:
2021-02-12 DE_all.zip

In a Jupyter notebook just load and display it with ArviZ:

import arviz

idata = arviz.from_netcdf("trace.nc")
idata

The trace object contains all the information that goes into the calculation of the scaling factor.
The helper function pymc3.gp.util.plot_gp_dist is ideal for plotting the densities:

import pymc3
from matplotlib import pyplot

# helper variable that has all chains stacked together
posterior = idata.posterior.stack(sample=("chain", "draw"))

fig, ax = pyplot.subplots()

# plot the posterior density of the exposure over time
pymc3.gp.util.plot_gp_dist(
    ax=ax,
    x=posterior.exposure.date_with_testcounts.values,
    samples=posterior.exposure.values.T,
)

pyplot.show()

You can then copy the intermediate variables from the scaling factor calculation and plot them one after another.
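
For example, to see which variables each group of the trace contains (a small sketch using the standard ArviZ/xarray API):

# list the variables available in each group of the trace
for group in ("posterior", "constant_data", "observed_data"):
    if hasattr(idata, group):
        print(group, list(getattr(idata, group).data_vars))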

