MCMC Issues

@robin-thompson has an example simulated dataset that breaks our incremental MCMC but not the standard MCMC.

We looked into it in our meeting but didn't get anywhere. It might just be that a gamma distribution is not appropriate, but incremental and non-incremental MCMC should give identical results.

Could be related to #39. Should be considered critical until understood.

Add details on state 7.3

In state 7.3, we suggest giving an overview of how this is done (Anne to write an explanation of the sampling).
In state 7.3, we suggest creating three clearly separated sections: one on sample size (n1, n2), one on the mean SI, and one on the SI standard deviation (Std SI). Perhaps graphics could help users understand what each part corresponds to?

Fix SISample option

At the moment this doesn't work with the example CSV file. We need to add some processing for that file (at least as.matrix after reading it in, probably also a transpose, and maybe more; to be checked).
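A minimal Python sketch of that processing (the app itself is R, so this only illustrates the steps). It assumes, hypothetically, that the example file has no header row and stores one SI sample per row, whereas the estimation step wants one sample per column; `read_si_sample_csv` is an illustrative name, not a function in the app.

```python
import csv
import io


def read_si_sample_csv(text, samples_in_rows=True):
    """Parse a CSV of serial-interval (SI) samples into a numeric matrix.

    Hypothetical helper. Assumes no header row and, by default, one SI
    sample per row; the result has one sample per column.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    # Equivalent of as.matrix(): every cell must parse as a number.
    matrix = [[float(cell) for cell in row] for row in rows]
    if len({len(row) for row in matrix}) != 1:
        raise ValueError("SI sample file has rows of different lengths")
    if samples_in_rows:
        # Transpose so that each column holds one SI sample.
        matrix = [list(col) for col in zip(*matrix)]
    return matrix
```

Whether a transpose is needed at all depends on how the example file is laid out, which the issue notes still has to be checked.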

Incidence data table is incorrect

Currently, the incidence data table shows imported + local cases combined, rather than having a separate column for each. This is because EpiEstim stays consistent with its old behaviour (ref: issue), so we need to account for this in the app.

Stop installing packages using install_github, when possible

Currently, in server.R we use install_github to install EpiEstim and coarseDataTools from the hackout3 branches of their GitHub repositories.

Before a production release of this app, the hackout3 branches in those repositories should be merged into master and published to CRAN. Then, we can stop using install_github, which I think also means we can stop requiring devtools.

Remove misc files from the repo

There are certain files/folders that shouldn't be in this repository or don't seem to be used at the moment.
Specifically:

Unused:

Shouldn't be in repo:

anne_old_shiny/*
elisabeth_old_shiny/*

@robin-thompson Do you know what all those unused files are for? Can we remove them? Should we have another pre-loaded dataset for the MERSData?

@robin-thompson and @annecori, I realise the old app versions are there for reference at the moment, but at some point, they need to go.

Link to publication, when possible

Once the publication is publicly available, we should link to it. Currently, there is a TODO to add the link in:

README.md
The description at the top of ui.R
The footer in the wiki

Initialisation of MCMC

At the moment, in state 7.1 we ask the user to provide initial parameter values for the MCMC. In EpiEstim this is now optional: if they are not provided, EpiEstim computes 'smart' starting values. I suggest using this in the app and not giving the user the option to alter these values.

Download plot 404s if nothing has been run yet

This is arguably correct, since there is nothing to download (the tables download empty CSV files instead). It would be nice to handle this case more gracefully than by returning a 404, though; it must be a common problem.

Documentation

We need to document everything. For now, this should be done inside a README.md in the project root.

Add license

We need to add a license to the code.

Assuming we're planning on it being completely open source, we should probably just use an MIT license.

@robin-thompson / @annecori any thoughts/objections?

Status doesn't reset unless you're looking at the plot

If you're viewing a table, the run finishes, but the status still says "processing", the "Go" button stays greyed out and the "Stop" button remains available.

This is probably left over from when the plot was the only output, and we now need to implement the same logic for when you're looking at tables.

Limit MCMC

Currently, we run into problems if too many users try to run MCMC at once. We should:

Implement a "Too many jobs processing, please try again later" type message when multiple MCMC jobs are already running. This should prevent them from actually being queued and locking up the app.
Try to estimate how long the MCMC will take to run, then (a) tell the user this time and check they want to continue, and (b) refuse runs that would take more than a set amount of time. Note that (b) should be disabled when the user downloads the app.
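Both points could be combined in a single gatekeeper. The Python sketch below is illustrative only: `McmcJobLimiter`, `max_jobs`, `seconds_per_iteration` and `max_seconds` are hypothetical names, and the run-time estimate is a crude linear one (iterations times a measured per-iteration cost). Setting `max_seconds=None` disables the time limit, as a downloaded copy of the app would.

```python
import threading


class McmcJobLimiter:
    """Refuse (rather than queue) MCMC jobs when the server is busy or the run is too long."""

    def __init__(self, max_jobs, seconds_per_iteration, max_seconds=None):
        self.max_jobs = max_jobs
        self.seconds_per_iteration = seconds_per_iteration
        self.max_seconds = max_seconds  # None = no time limit (local install)
        self._running = 0
        self._lock = threading.Lock()

    def estimate_seconds(self, n_iterations):
        # Crude linear estimate based on a measured cost per iteration.
        return n_iterations * self.seconds_per_iteration

    def try_start(self, n_iterations):
        """Return (started, message) for display to the user."""
        estimate = self.estimate_seconds(n_iterations)
        if self.max_seconds is not None and estimate > self.max_seconds:
            return False, (f"Estimated run time {estimate:.0f}s is too long for the "
                           "hosted app; please run it locally.")
        with self._lock:
            if self._running >= self.max_jobs:
                return False, "Too many jobs processing, please try again later."
            self._running += 1
        return True, f"Estimated run time: {estimate:.0f}s."

    def finish(self):
        # Call when a job completes or is stopped, to free its slot.
        with self._lock:
            self._running = max(0, self._running - 1)
```

In practice the per-iteration cost would have to be measured per SI method, since different endpoints run very different models.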

SIFromData and NonParametricSI are very similar

@robin-thompson has pointed out that these two seem very similar; however, they're about as far apart as they could possibly be in the decision tree.

NonParametricSI takes a vector, whereas SIFromData takes a matrix. However, providing a vector to SIFromData should give the same results as giving it to NonParametricSI. Should these be combined? Currently, you can answer "no" to "Do you want to use patient data?" but then end up having to upload data for NonParametricSI, which is confusing.

@annecori Do you agree that they are the same?

Uploaded data validation

When files are uploaded, we should check they are in the correct format, and perhaps be able to handle certain inconsistencies.

There are two functions in utils.R, processSerialIntervalData and processIncidenceData, which make a start on this. For example, they check that there are the correct number of columns. They should also cope with transposed data inputs.

There is still plenty of room for improvement here.
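A Python sketch of the kind of checks processIncidenceData could grow into. It is loosely modelled on the description above, not on the actual R code, and it assumes (hypothetically) one row per time step with either 1 column (incidence) or 2 columns (local, imported).

```python
def process_incidence_data(matrix, allowed_columns=(1, 2)):
    """Validate incidence data given as a list of rows; return integer rows."""
    if not matrix or not matrix[0]:
        raise ValueError("Incidence data is empty")
    n_rows, n_cols = len(matrix), len(matrix[0])
    if any(len(row) != n_cols for row in matrix):
        raise ValueError("Incidence data has rows of different lengths")
    # Cope with transposed input: e.g. a single long row instead of a column.
    if n_cols not in allowed_columns and n_rows in allowed_columns:
        matrix = [list(col) for col in zip(*matrix)]
        n_cols = len(matrix[0])
    if n_cols not in allowed_columns:
        raise ValueError(f"Expected one of {allowed_columns} columns, got {n_cols}")
    for row in matrix:
        for value in row:
            if value < 0 or value != int(value):
                raise ValueError(f"Incidence must be a non-negative integer, got {value}")
    return [[int(value) for value in row] for row in matrix]
```

Note that a square input (e.g. 2 by 2) is ambiguous and is left untransposed here; real data will rarely be that short.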

Need to write dataset file descriptions into docs

State 3.1 doesn't have correct title

There should be an "Incidence Data" title.

(Add this to documentation project when fixed, since the image for this state will need updating)

Input validation

(Migrated from slack)

Currently, there are two functions, getSIState and getIncidenceState, in server.R.

These essentially look through the pivotal inputs in the decision tree to work out which final state the user was in when they clicked go.

Either these functions, or some other supplementary functions, should do some input validation (to make sure ALL necessary inputs are present, and of the correct form). The current functions do this to some extent, but we need to ensure this is done properly.

Seeds for preloaded data

When using pre-loaded data, the data is the saved output of an MCMC run done in advance, so setting the seed does not change it.

There are basically three options:

have a big warning explaining that this is what happens
save MCMC output for, say, 10 different seeds
run MCMC even when using preloaded data

@annecori and @jstockwin think the first option is best, and @robin-thompson thinks the second might be better. We need to decide what to do about this.

We could potentially ask the question specifically ("do you want to choose MCMC params etc yourself, or just use our pre-run MCMC output?"). This might be a bit of a pain to code, though.

Make into a package

The app should be made into a package "EpiEstimApp". This will make it easier for users to download.

N.B. At some point, we're likely to implement some kind of limit on the MCMC processing time for the hosted version. This should not be the case in the packaged version.

Remove Horrible MCMC Code

Currently, MCMC is run incrementally, 80 iterations at a time:

1. R runs 80 (more) MCMC iterations, sends the data to the client and returns (freeing the thread).
2. The client pauses, so the thread is free for a while to handle other work, then sends the data back to R to continue.
3. Steps 1 and 2 repeat until the required number of samples is reached.

During step 2, R is free to do other things, like serve other users, and there is a client-side check for whether the user has pressed "Stop", in which case the data is not sent back to R.
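The chunked scheme can be sketched in a few lines of Python. This is a toy, not the app's code: a random-walk Metropolis sampler on a standard normal target stands in for the real MCMC, and `stop_requested` is a hypothetical callback standing in for the client-side "Stop" check. The key points are that the chain resumes from its last value and that one random generator is carried across chunks.

```python
import math
import random


def standard_normal_log_density(x):
    # Toy target distribution standing in for the real posterior.
    return -0.5 * x * x


def metropolis_chunk(start, n_iter, rng, log_density, step=0.5):
    """Run n_iter random-walk Metropolis steps from `start`; return the samples."""
    samples = []
    x = start
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)
        log_ratio = log_density(proposal) - log_density(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)
    return samples


def run_incremental(n_samples, stop_requested, chunk_size=80, seed=1):
    """Run the chain in chunks, checking for a stop request between chunks."""
    rng = random.Random(seed)  # a single generator shared by all chunks
    samples, state = [], 0.0
    while len(samples) < n_samples:
        if stop_requested():
            break
        n_iter = min(chunk_size, n_samples - len(samples))
        chunk = metropolis_chunk(state, n_iter, rng, standard_normal_log_density)
        samples.extend(chunk)
        state = chunk[-1]  # the next chunk resumes from the chain's current position
    return samples
```

In the app, the pause between chunks happens on the client, so the chain state (and ideally the RNG state) has to travel to the browser and back.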

This makes the code logic messy and hard to follow, and sending data back and forth like this is far from ideal. We'd like to improve this if at all possible; ideas are welcome. Keep an eye on http://stackoverflow.com/questions/41610354/calling-a-shiny-javascript-callback-from-within-a-future.

Allow user to set the seed

Ref discussion in #42.

We should have a field which allows the user to choose a seed, so they can obtain reproducible results.

Discussion Point 1:
It would also make sense to allow them to set a seed for the MCMC. However, we really do need a different MCMC seed for each block of 80 iterations, so this is non-trivial. Perhaps drawing a random seed for each MCMC block is fine: the run will still be reproducible, since the "random" seeds will be the same whenever the overall seed is the same. Does that make sense? Thoughts, @annecori?
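The idea in Discussion Point 1 can be sketched as: seed one generator with the user's seed, then draw one seed per 80-iteration block from it. `chunk_seeds` is a hypothetical helper written in Python for illustration; seeds are kept below 2^31 - 1 so they would also fit R's integer range for `set.seed`.

```python
import random


def chunk_seeds(master_seed, n_chunks):
    """Derive one seed per MCMC block from a single user-supplied seed.

    The same master_seed always yields the same list, so a whole run is
    reproducible, yet each block gets its own seed.
    """
    seeder = random.Random(master_seed)
    return [seeder.randrange(2**31 - 1) for _ in range(n_chunks)]
```

If the user leaves the seed blank, the app could pick a master seed at random and display it, so the run can still be reproduced later.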

Discussion Point 2:
We're giving the user a lot of options, and most people will not want to set the seed. Should we have some kind of semi-hidden "advanced" options? If so, how much should go in them? For example, I feel that most of the MCMC parameters (burn-in, thin) are almost always fine at their defaults; should these count as advanced? Also, how should this fit into the UI? I'm not sure what the best approach is. Perhaps we could have an "advanced" button near "next" which displays the advanced options?

Site metadata/favicon

The site should have metadata with it, and we need some sort of logo for the favicon.

We'll need a short description and some keywords for the metadata.

Find example files

(Migrated from slack)

Find example files to assess states 2.1, 6.2 and following, 6.3 and following, 7.2 and following.

Width is wrong?

Just tried a width of 5 but it seemed to result in widths of 4?

Remove Erlang distribution from app

The decision has been made that it is no longer supported and so should be completely removed as an option.

Need to write download instructions into docs

Input validation/error handling

This is an overarching issue which supersedes #5 and #35. The codebase has changed significantly (mainly in #43), and the comments in the older issues are no longer very relevant.

There is a handleError function in server.R. This function needs to be expanded substantially to handle a wide variety of errors and pass useful messages on to the client. Enclosing an input or group of inputs in a div of class errorBox allows a red box to be drawn around those inputs. This should be used wherever possible.

Further, the handleState function should validate all inputs (e.g. check that they are integers, positive, etc.) and throw errors if not. These errors will be caught by handleError, which should handle them appropriately.

The logic might seem a little messy (throwing an error and then handling it properly elsewhere), but some of the errors come from external sources like EpiEstim, and I think this is a nice and explicit way of handling them.

Any errors not explicitly handled are thrown as a JavaScript alert. Currently, all but two errors are handled in this way. We need to make this much better. Below is a list, which should be kept up to date, of all errors which need to be nicely handled. Please update the list as you come across new errors.

Bad initial MCMC params, which cause the chain not to converge.
All errors thrown by the process functions when a file is uploaded
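The throw-then-handle pattern described above might look like this in outline. The Python below only mirrors the roles of handleState and handleError in server.R; `InputError`, `validate_positive_integer`, the `"n_iter"` input id and the returned dictionaries are all hypothetical. The idea is that validation errors carry the id of the offending input, so the handler can highlight its errorBox, while anything unrecognised falls back to a generic alert, as happens today.

```python
class InputError(Exception):
    """A validation failure tied to a specific input id."""

    def __init__(self, input_id, message):
        super().__init__(message)
        self.input_id = input_id


def validate_positive_integer(inputs, input_id):
    value = inputs.get(input_id)
    if value is None or str(value).strip() == "":
        raise InputError(input_id, f"{input_id} is required.")
    try:
        number = int(str(value))
    except ValueError:
        raise InputError(input_id, f"{input_id} must be a whole number.") from None
    if number <= 0:
        raise InputError(input_id, f"{input_id} must be positive.")
    return number


def handle_error(error):
    """Map an error to what the client should show."""
    if isinstance(error, InputError):
        # Highlight the input's errorBox and show the message under "next".
        return {"highlight": error.input_id, "message": str(error)}
    # Unrecognised errors (e.g. from EpiEstim) fall back to a generic alert.
    return {"alert": f"Unexpected error: {error}"}


def handle_state(inputs):
    try:
        return {"n_iter": validate_positive_integer(inputs, "n_iter")}
    except Exception as error:  # deliberately broad: external errors go through handle_error too
        return handle_error(error)
```

Each new entry in the list above would then become either a specific validation check or a specific branch in the handler.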

Imported cases are broken

Currently, the incidenceData is loaded and then run through EpiEstim::process_I, which changes the first "local" case to an "imported" case. We then upload the imported data, subtract the imported cases from the local ones, and add a new local column. Because EpiEstim has already reclassified the first row, we end up with a negative entry there.
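A small worked example of the arithmetic, in Python for illustration. `first_day_imported` is a simplified, hypothetical stand-in for the process_I behaviour described above (all first-day local cases become imported), not EpiEstim's actual implementation. With total incidence [3, 5, 4] and uploaded imported cases [2, 1, 0], the current order gives a local count of -2 on day 1; splitting first and processing once keeps every entry non-negative.

```python
def first_day_imported(local, imported):
    """Simplified stand-in: local cases on the first day are reclassified as imported."""
    local, imported = list(local), list(imported)
    imported[0] += local[0]
    local[0] = 0
    return local, imported


def split_current(total, uploaded_imported):
    # Current order: process the totals first, then subtract the uploaded imported cases.
    local, _ = first_day_imported(total, [0] * len(total))
    return [loc - imp for loc, imp in zip(local, uploaded_imported)], list(uploaded_imported)


def split_fixed(total, uploaded_imported):
    # Proposed order: split into local and imported first, then process once.
    local = [tot - imp for tot, imp in zip(total, uploaded_imported)]
    if min(local) < 0:
        raise ValueError("More imported cases than total cases on some day")
    return first_day_imported(local, uploaded_imported)
```

Usage: `split_current([3, 5, 4], [2, 1, 0])[0]` gives `[-2, 4, 4]`, while `split_fixed([3, 5, 4], [2, 1, 0])` gives `([0, 4, 4], [3, 1, 0])`.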

Update readme etc

The readme still links to shiny.jakestockwin.co.uk, which is now running a very outdated version of the app. Instead, we should link to the installation instructions in the wiki.

window width is wrong

I think the window width is currently wrong: when one chooses a 7-day window, the window actually used is 8 days.
Also, the estimation can only ever start at time step 2 after the first incident case, so for a weekly window the first interval to consider should be T.Start = 2 and T.End = 8.
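The off-by-one is easiest to see by counting days inclusively: a window of W days starting on day s ends on day s + W - 1, so T.Start = 2 with a 7-day window gives T.End = 8 (days 2 to 8 inclusive). A Python sketch of sliding windows under that convention; `sliding_windows` is a hypothetical helper, and it assumes days are numbered from 1 with the last window ending on the final day.

```python
def sliding_windows(n_days, window=7, first_start=2):
    """Inclusive (t_start, t_end) pairs for sliding windows of `window` days."""
    starts = range(first_start, n_days - window + 2)
    return [(s, s + window - 1) for s in starts]
```

For example, `sliding_windows(10)` returns `[(2, 8), (3, 9), (4, 10)]`; every pair spans exactly 7 days. Computing T.End as s + W instead would produce the 8-day windows reported above.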

There should be a way to download data/graphs

This is simply something that needs implementing.

Check MCMC convergence?

It's maybe worth checking some kind of convergence criteria when running MCMC, and showing a warning if there are concerns about the convergence?
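One cheap single-chain check is a Geweke-style comparison of the early and late parts of the chain. The Python sketch below is a simplified version: it uses plain sample variances, which ignores autocorrelation (the full Geweke diagnostic uses spectral density estimates, as in `geweke.diag` from R's coda package), so it is only a rough screen. The 10% and 50% fractions match coda's defaults; a warning threshold of |z| well above 2 is an informal choice, not a rule.

```python
import math
import statistics


def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke z-score: early-chain mean versus late-chain mean."""
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[n - int(last * n):]
    if len(a) < 2 or len(b) < 2:
        raise ValueError("Chain too short for a convergence check")
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se
```

Because autocorrelation is ignored, this version will flag well-mixed but correlated chains more often than the full diagnostic would.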

Issues in the decision tree

In decision tree, under state 6.1 suggest writing 'Run SIFromSample (Preloaded posterior SI distr)'
In decision tree, under state 6.3 suggest writing 'Run SIFromSample (Uploaded posterior SI distr)'
In decision tree, under state 7.2 suggest writing 'Run NonParametricSI (Uploaded SI distr)'
In decision tree, under state 7.3 suggest writing 'Run NonParametricSI (Preloaded SI distr)'
In decision tree, over states 5.3 and 5.4 I think uncertainty=TRUE/FALSE is swapped around?

Rewording text in shiny app

Store preloaded SI Sample data as csv not RData

Currently, we store the SI Sample Data as a .RData file. I think this is legacy code from when we had to have the full MCMC object, not just the samples.

We should probably store these as CSV files instead. However, there is one file per distribution type, which might be slightly annoying.

It would be nice for the user to be able to view the csv files in this repo so they can see how their own files should be formatted.

dataset size

One of the datasets is HUGE!
In SIPosteriorSamples, the file Rotavirus_SISamples_L.csv is 5.2 MB, while all the others are about 50 kB.
I suspect something is wrong with it.

ALPHA RELEASE


Today I am pleased to announce that I now consider the app to be in alpha release. This means all known bugs have been ironed out, and the app is as I expect it to be at release, aside from any additional aesthetic changes.

However, the app is largely untested. I don't understand the details of how EpiEstim etc. work well enough to make my own data and then sanity-check the output. Therefore, we need as much help as possible to test the app.

At the same time, I would also like to test the documentation, currently stored in the wiki of this repository, here. I am therefore deliberately not giving you any instructions on how to install or use the app, as you should be able to find everything you need in the documentation.

What do I need to do?

Please go to the documentation page here to find installation instructions, and from that page, you should easily be able to find the "interactive documentation". This will guide you through how to use the app step by step.

The main thing that needs testing is when you upload your own files, so try to click "own data" rather than "pre-loaded data" as much as possible. You may need to generate some data yourselves.

What if I find an issue or the documentation is unclear?

In either case, please submit a new issue using this issue tracker or email Jake or Robin. In that issue, please explain the problem in as much detail as possible. If necessary and possible, please also include the files that trigger the error.

Before submitting an issue, please check the issue list to ensure nobody else has already reported the same thing.

Testing for errors

The app should also throw user-friendly errors where possible. In particular, if an input is bad, it should draw a red box around the input and show a user-friendly error at the bottom of the page (under the next button). If instead the app either (a) completely crashes or (b) opens a popup window, please let me know. A popup window means the app has fallen back to its generic error handling; in most cases I would rather it show a nicer error and highlight the bad input.

The only current exception to this is if you choose a low number of MCMC iterations and the convergence check fails. In this case, you will get a popup, which is currently intentional (although if anyone has better ideas about how to handle this then let me know).

If you could try throwing a few stupid inputs at the app to try and make it fail, that would be good too.

Questions/Suggestions

If you have a general question about the app or want to mention something else that's not really an issue, then please comment on this issue below.

Design Ideas

I'm a developer, but not a designer. If anyone has any good ideas for how we could make the app look better, I'm all ears.

How can I help?

In general, if you raise an issue and are interested in helping to fix it, let me know!
If you're submitting an issue about the documentation, writing what you think it should say would be helpful.

If you know your way around the RSelenium and testthat packages, there are a ton of tests that need writing - again, let me know if you'd like to help out.

LICENSE is not listed in DESCRIPTION

* checking top-level files ... NOTE
File
  LICENSE
is not mentioned in the DESCRIPTION file.

Seed text is wrong in state 9.1

It currently says:

Set a seed to be used by EpiEstim. A random one will be chosen this is left blank

but it should be describing the MCMC seed parameter.

Pressing "Stop" makes "Go" not work unless things have changed

Because everything is reactive on the inputs, if nothing has changed clicking "go" a second time does nothing.

This is nice, but it breaks if "Stop" is pressed and interrupts the output: pressing "Go", "Stop", "Go" means nothing happens. Perhaps not re-running anyway is bad UI; the user might be confused, and might also want to test out the randomness.
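One minimal fix is to keep caching by inputs but have "Stop" discard the cached result, so the next "Go" always starts a fresh run. The Python class below sketches that logic; `RunController` and `compute` (a stand-in for the EpiEstim call) are hypothetical names, and the run is synchronous for simplicity. An alternative is to make the run depend on the Go button's click count (e.g. via Shiny's `eventReactive`), so every click re-runs even with unchanged inputs.

```python
class RunController:
    """Cache results by inputs, but never reuse a run that was stopped."""

    def __init__(self, compute):
        self.compute = compute  # hypothetical stand-in for the EpiEstim call
        self.last_inputs = None
        self.last_result = None
        self.runs = 0

    def go(self, inputs):
        key = tuple(sorted(inputs.items()))
        if key != self.last_inputs or self.last_result is None:
            self.runs += 1
            self.last_result = self.compute(inputs)
            self.last_inputs = key
        return self.last_result

    def stop(self):
        # The interrupted run produced no usable output, so forget it.
        self.last_inputs = None
        self.last_result = None
```

Usage: after `go`, `stop`, `go` with identical inputs, `runs` is 2 rather than 1.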

Edit default values

As explained in column E in this document: https://docs.google.com/spreadsheets/d/1c2h1lEZ5uF9PGZ57iamFoL3kz9Bz8rKMiZ1U1hM-MsA/edit#gid=0

Clientside Input Validation

Currently, the user will not get an error until they click "Go".

Further, some of R Shiny's max/min conditions do not throw errors when they are broken (for example, the MCMC initial parameters can be negative without a proper error being thrown).

We should add some basic clientside validation in the JavaScript and check the inputs on the current page when the user clicks next. We can then highlight the input in red and give specific errors, which will be much nicer than the current behaviour.

Allow Imported Cases

EpiEstim now handles imported cases. We should do this too.

Progress bar

Add a progress bar for all runs involving uncertainty and/or MCMC.

TESTING

There is a lot of testing which should happen. This issue is pretty blank for now but should be updated to add places in need of testing.

It may also be worth looking into automated unit/end2end testing. I don't know how easy this is in R, or if it's even worth it.

TODO

Suite 001 - Basic

Write connection test

Suite 002 - States

Suite 003 - Error checks

Suite 004 - End to end (e2e) tests

Write tests to compare runs with different incidence data
Write tests to compare runs with endpoint 8.1: SIFromSample (preloaded)
Write tests to compare runs with endpoint 9.1: SIFromData
Write tests to compare runs with endpoint 8.3: SIFromSample (uploaded)
Write tests to compare runs with endpoint 7.3: UncertainSI
Write tests to compare runs with endpoint 8.4: ParametricSI
Write tests to compare runs with endpoint 9.2: NonParametricSI (uploaded)
Write tests to compare runs with endpoint 9.3: NonParametricSI (preloaded)

Plots

We now have our own code to handle the plots. This is because values$epiEstimOutput is a reactive value, and we want the plots to update whenever this is updated.

EpiEstim has a large block of code here to decide exactly what to plot, which we would really like to reuse. Ideally, EpiEstim's plots function would take the entire EpiEstim output object and produce all three plots as appropriate, perhaps by moving the linked code block from EstimateR into plots?

@annecori Thoughts?

Limit MCMC (2)

When hosting the app, we should try to estimate how long MCMC is going to take, and if it's too long we should tell the user to download EpiEstimApp locally and run it themselves, as our server can only cope with so many MCMC processes running at once.

Incremental MCMC Seed issues

MCMCpack sets a seed for some reason. This means multiple runs are identical.

Since we're running MCMC incrementally, 80 iterations at a time, the random sequence will be the same for every 80 iterations. This may be biasing our MCMC chain. We should set seeds manually.

Ref: nickreich/coarseDataTools#45
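The effect is easy to reproduce in miniature. In the Python sketch below (illustrative only, not MCMCpack's code), `chunk_with_fixed_seed` mimics a sampler that resets the generator to the same seed on every call, so every 80-draw block is identical; `chunk_with_shared_rng` carries the generator state across calls, so successive blocks differ. Either carrying the state over or passing a distinct seed per block would avoid the repetition.

```python
import random


def chunk_with_fixed_seed(n_iter, seed=1):
    # Mimics a sampler that resets the RNG to the same seed on every call.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_iter)]


def chunk_with_shared_rng(n_iter, rng):
    # The generator (and its internal state) persists between calls.
    return [rng.random() for _ in range(n_iter)]
```

With a fixed seed, two consecutive 80-draw blocks are equal element for element, which is exactly the repetition that could bias the incremental chain.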

Documentation needs updating following wording changes

#104 updates wording. The documentation needs to be updated with this.
UPDATE: #107 updated the datasets, so the documentation needs to be updated here too.
UPDATE: In testing, a few typos have been fixed. Added the relevant states to the list.

States:
