jstockwin / epiestimapp
Source code for the EpiEstim app.
Home Page: https://github.com/jstockwin/EpiEstimApp/wiki
License: MIT License
@robin-thompson has an example simulated dataset which breaks our incremental MCMC but not the standard MCMC.
We had a look into it in our meeting, but never really got anywhere. It might just be that gamma is not appropriate, but incremental vs non-incremental MCMC should be identical...
Could be related to #39. Should be considered critical until understood.
At the moment this doesn't work from the example csv file. We need to add some processing to that file (at least as.matrix after reading it in, probably also a transpose, and maybe more; to be checked).
Currently, the incidence data table shows the imported cases + local cases, rather than having a column for each. This is EpiEstim being consistent with old behaviour patterns (ref: issue), and so we need to account for this in the app.
Currently, in server.R we use install_github to install EpiEstim and CoarseDataTools from GitHub using the hackout3 branches.
Before a production release of this app, the hackout3 branches in those repositories should be merged into master and published to CRAN. Then, we can stop using install_github, which I think also means we can stop requiring devtools.
There are certain files/folders that shouldn't be in this repository or don't seem to be used at the moment.
Specifically:
Unused:
stochasticSEIRModel3.R
datasets/MERSData.csv
datasets/MERSData.xlsx
datasets/otherData.csv
datasets/otherData.txt
Shouldn't be in repo:
anne_old_shiny/*
elisabeth_old_shiny/*
@robin-thompson Do you know what all those unused files are for? Can we remove them? Should we have another pre-loaded dataset for the MERSData?
@robin-thompson and @annecori, I realise the old app versions are there for reference at the moment, but at some point, they need to go.
Once the publication is available to the public, links should be provided to it. Currently, there is a TODO to link it in:
README.md
ui.R
At the moment, in state 7.1, we ask the user to provide initial parameter values for the MCMC. In EpiEstim this is now optional: if they are not provided, some code computes 'smart' starting values. I suggest using this in the app and not giving the user the option to alter them.
Obviously, this is fairly accurate as there is nothing to download. The tables download empty csv files. It would be nice to handle this case slightly better than by throwing 404s, though. It must be a common problem.
We need to document everything. For now, this should be done inside a README.md in the project root.
We need to add a license to the code.
Assuming we're planning on it being completely open source, we should probably just use an MIT license.
@robin-thompson / @annecori any thoughts/objections?
If you're in a table, everything finishes, but you're left in a state where the status says "processing", the "go" button is greyed out and the "stop" button is available.
This is probably historic from when the plot was the only option, and we now need to implement the same logic for when you're looking at tables.
Currently, we run into a few problems if too many users are trying to run MCMC. We should
@robin-thompson has pointed out that these two seem pretty similar; however, they're about as far away as they could possibly be in the decision tree.
NonParametricSI takes a vector, whereas SIFromData takes a matrix; however, providing a vector to SIFromData should give the same results as giving it to NonParametricSI. Should these be combined? Currently, you can answer "no" to "Do you want to use patient data?", but then end up having to upload data for NonParametricSI, which seems bad.
@annecori Do you agree that they are the same?
When files are uploaded, we should check they are in the correct format, and perhaps be able to handle certain inconsistencies.
There are two functions in utils.R, processSerialIntervalData and processIncidenceData, which make a start on this. They, for example, check there are the correct number of columns. They should also cope with transposed data inputs.
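As a rough illustration of the kind of check described above, here is a minimal sketch in base R. The function normalise_incidence is hypothetical and is not the app's processIncidenceData. It assumes that incidence data should have one column (total cases) or two columns (local and imported), with one row per time step, and that a table with only one or two rows but many columns has probably been transposed.

```r
# Hypothetical helper, not the app's processIncidenceData.
# Assumes one column (total cases) or two (local, imported), one row per time step.
normalise_incidence <- function(x) {
  m <- as.matrix(x)
  # A wide table with only 1 or 2 rows is most likely a transposed upload.
  if (nrow(m) <= 2 && ncol(m) > 2) {
    m <- t(m)
  }
  if (!ncol(m) %in% c(1, 2)) {
    stop("Incidence data must have 1 or 2 columns, found ", ncol(m))
  }
  if (!is.numeric(m) || anyNA(m) || any(m < 0)) {
    stop("Incidence data must be non-negative numbers with no missing values")
  }
  unname(m)
}

wide <- data.frame(t(c(1, 0, 3, 5, 2)))  # one row, five columns
tall <- normalise_incidence(wide)
dim(tall)  # five rows, one column after transposing
```

The transpose heuristic is deliberately conservative: a genuine two-day, two-column upload is left alone, and anything ambiguous falls through to the column-count error rather than being silently reshaped.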
There are lots of improvements to be done here.
(Migrated from slack)
Currently, there are two functions, getSIState and getIncidenceState, in server.R.
These essentially look through the pivotal inputs in the decision tree to work out which final state the user was in when they clicked go.
Either these functions, or some other supplementary functions, should do some input validation (to make sure ALL necessary inputs are present, and of the correct form). The current functions do this to some extent, but we need to ensure this is done properly.
When using pre-loaded data, the data is saved AFTER the MCMC run, so setting the seed will not change this.
There are basically two options:
@annecori and @jstockwin prefer the first option, and @robin-thompson thinks the second might be better. We need to work out what to do about this.
We could potentially ask the question specifically ("do you want to choose MCMC params etc yourself, or just use our pre-run MCMC output?"). This might be a bit of a pain to code, though.
The app should be made into a package "EpiEstimApp". This will make it easier for users to download.
N.B. At some point, we're likely to implement some kind of limit on the MCMC processing time for the hosted version. This should not be the case in the packaged version.
Currently, MCMC is run incrementally, 80 iterations at a time.
During 2, R is free to do other things, like handle other users, and there is a client-side check for if the user has pressed "Stop", in which case the data does not get sent back to R.
The code logic is pretty messy and hard to interpret because of this, and further sending data around like this is far from ideal. We'd like to improve this if at all possible. Open to ideas. Keep an eye on http://stackoverflow.com/questions/41610354/calling-a-shiny-javascript-callback-from-within-a-future.
Ref discussion in #42.
We should have a field which allows the user to choose a seed, so they can obtain reproducible results.
Discussion Point 1:
It would also make sense to allow them to set a seed for the MCMC. However, we really do need a different MCMC seed for each 80 iterations, so this is non-trivial. Perhaps using a random seed for the mcmc is fine, and it'll still be reproducible since the "random" seed will be the same if the overall seed is the same. Does that make sense? Thoughts, @annecori?
Discussion Point 2:
We're giving the user a lot of options. Most people will not wish to set the seed. Should we have some kind of semi-hidden "advanced" options? If so, how much should go in them? For example, I feel that most of the MCMC params (burnin, thin) are almost always fine at their defaults, should these count as advanced? Also, how should this fit in with the UI? I'm not really sure what the best way of doing this is. Perhaps near "next" we have an "advanced" button which displays the advanced options?
The site should have metadata with it, and we need some sort of logo for the favicon.
We'll need a short description and some keywords for the metadata.
(Migrated from slack)
Find example files to assess states 2.1, 6.2 and following, 6.3 and following, 7.2 and following.
Just tried a width of 5 but it seemed to result in widths of 4?
The decision has been made that it is no longer supported and so should be completely removed as an option.
This is an overarching issue which supersedes #5 and #35. The codebase has changed significantly (mainly in #43) and the comments in the older issues are no longer quite so relevant.
There is a handleError function in server.R. This function needs to be expanded massively to handle a wide variety of errors and pass useful messages on to the client. Enclosing inputs or groups of inputs with a div of class errorBox allows a red box to be highlighted around those inputs. This should be used where possible.
Further, the handleState function should validate all inputs (e.g. check they are integers, positive etc etc) and throw errors if not (the errors will be caught by handleError, which should handle them appropriately).
The logic might seem a little messy (throwing an error and then handling it properly elsewhere), but some of the errors come from external sources like EpiEstim, and I think this is a nice and explicit way of handling them.
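The throw-then-handle pattern could look roughly like the sketch below. Everything in it is illustrative: validate_positive_integer, the inputError condition class and the shape of the returned list are assumptions made for this example and are not taken from server.R. The handler here stands in for handleError by returning the message and the id of the offending input, which is what the client would need in order to highlight it.

```r
# Hypothetical validator; the names and the condition class are illustrative
# and are not taken from server.R.
validate_positive_integer <- function(value, input_id) {
  ok <- is.numeric(value) && length(value) == 1 && !is.na(value) &&
    value > 0 && value == round(value)
  if (!ok) {
    # Signal a custom condition that carries the id of the bad input.
    stop(structure(
      list(message = paste(input_id, "must be a positive integer"),
           call = NULL, input_id = input_id),
      class = c("inputError", "error", "condition")
    ))
  }
  invisible(value)
}

# Stand-in for the error handler: collect what the client needs to display.
result <- tryCatch(
  {
    validate_positive_integer(-3, "width")
    list(ok = TRUE)
  },
  inputError = function(e) {
    list(ok = FALSE, input_id = e$input_id, message = conditionMessage(e))
  }
)
result$message
```

Because the custom class also inherits from error, anything that is not caught by a specific inputError handler would still reach a generic error handler, which matches the idea of external errors (for example from EpiEstim) flowing through the same path.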
Any errors not explicitly handled are thrown as a JavaScript alert. Currently, all but two errors are handled in this way. We need to make this much better. Below is a list, which should be kept up to date, of all errors which need to be nicely handled. Please update the list as you come across new errors.
Currently, the incidenceData is loaded and then run through EpiEstim::process_I, which changes the first "local" case to an "imported" case. We then upload the imported data, subtract the imported cases from the local ones, and add a new local column. Because EpiEstim has also done this in the first row, we end up with a negative entry.
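A toy example may make the double subtraction easier to see. The numbers are invented, and the processing step is imitated by hand rather than by calling EpiEstim; the assumption is that it moves all first-day cases from the local column to the imported column. The last line shows one possible way round the problem, assuming the raw upload is still available at that point.

```r
# Toy numbers only; the processing step is imitated by hand and does not call EpiEstim.
total    <- c(3, 2, 4)   # uploaded incidence (local + imported)
imported <- c(1, 0, 1)   # uploaded imported cases

# Assumed effect of the processing step: all first-day cases count as imported.
processed <- data.frame(local = total, imported = 0)
processed$imported[1] <- processed$local[1]
processed$local[1] <- 0

# Subtracting the uploaded imported cases from the processed local column
# removes the first-day imports a second time.
naive_local <- processed$local - imported
naive_local  # the first entry is negative

# One possible workaround: derive local cases from the raw upload instead.
fixed_local <- total - imported
```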
The readme still links to shiny.jakestockwin.co.uk, which is now running a very outdated version of the app. Instead, we should link to the installation instructions in the wiki.
I think at the moment the window width is wrong, i.e. when one chooses 7 days as the window, the actual window used is 8 days.
Also, the estimation can only ever start at time step 2 after the first incident case, so for a weekly window the first interval to consider should be T.Start = 2 and T.End = 8.
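The intended indexing can be written down in a few lines of base R. The helper sliding_windows is hypothetical and only illustrates the arithmetic: with inclusive end points, a window of a given width ends at start + width - 1, so adding the full width instead gives the off-by-one described above.

```r
# Hypothetical helper: build start/end indices for sliding windows.
# n_days: number of time steps in the incidence series
# width:  window width in time steps (e.g. 7 for a weekly window)
sliding_windows <- function(n_days, width) {
  stopifnot(width >= 1, n_days >= width + 1)
  t_start <- seq(2, n_days - width + 1)  # estimation cannot start before step 2
  t_end <- t_start + width - 1           # inclusive end, so each window spans `width` steps
  data.frame(t_start = t_start, t_end = t_end)
}

windows <- sliding_windows(n_days = 30, width = 7)
head(windows, 1)  # first window: t_start = 2, t_end = 8
```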
This is simply something that needs implementing.
It's maybe worth checking some kind of convergence criteria when running MCMC, and showing a warning if there are concerns about the convergence?
width to "Choose the width of the sliding time window for R estimation"
Currently, we store the SI Sample Data as a .RData file. I think this is legacy code from when we had to have the full MCMC object, not just the samples.
We should probably store these as .csv files instead. However, there is one for each distribution type, which might be slightly annoying.
It would be nice for the user to be able to view the csv files in this repo so they can see how their own files should be formatted.
One of the datasets is HUGE!
In SIPosteriorSamples, the file Rotavirus_SISamples_L.csv is 5.2MB!
All the others are about 50k...
I think there must be an issue?
Today I am pleased to announce that I am now considering the app to be in alpha release. This means all known bugs have been ironed out, and the app is as I expect it to be when it is released aside from any additional aesthetic changes.
However, the app is largely untested. I personally don't understand the details of how EpiEstim etc is working to be able to make my own data and then sanity-check the output. Therefore, we need as much help as possible to test the app.
At the same time, I would also like to test out the documentation, currently stored in the wiki of this repository, here. I am therefore not intending to give you any instructions on how to install or use the app, as you should be able to find everything you need from the documentation.
Please go to the documentation page here to find installation instructions, and from that page, you should easily be able to find the "interactive documentation". This will guide you through how to use the app step by step.
The main thing that needs testing is when you upload your own files, so try to click "own data" rather than "pre-loaded data" as much as possible. You may need to generate some data yourselves.
In either case, please submit a new issue using this issue tracker or email us (Jake at [email protected] and Robin at [email protected]). In that issue, please explain the problem in as much detail as possible. If necessary and possible, please also give the files you are using to create the error.
Before submitting an issue, please check the issue list to ensure nobody else has already reported the same thing.
The app should also throw user-friendly errors where possible. In particular, if an input is bad it should draw a red box around the input, and give a user-friendly error at the bottom of the page (under the next button). If instead the app either (a) completely crashes or (b) opens a popup window then please let me know. A popup window is the app handling the error itself, but in most cases I would rather it throw a nicer error and highlight the bad input.
The only current exception to this is if you choose a low number of MCMC iterations and the convergence check fails. In this case, you will get a popup, which is currently intentional (although if anyone has better ideas about how to handle this then let me know).
If you could try throwing a few stupid inputs at the app to try and make it fail, that would be good too.
If you have a general question about the app or want to mention something else that's not really an issue, then please comment on this issue below.
I'm a developer, but not a designer. If anyone has any good ideas for how we could make the app look better, I'm all ears.
In general, if you raise an issue and are interested in helping to fix it, let me know!
If you're submitting an issue about documentation, writing what you think it should say would be helpful.
If you know your way around the RSelenium and testthat packages, there are a ton of tests that need writing. Again, let me know if you'd like to help out.
* checking top-level files ... NOTE
File LICENSE is not mentioned in the DESCRIPTION file.
It currently says:
Set a seed to be used by EpiEstim. A random one will be chosen this is left blank
but should be talking about the MCMC parameter.
Because everything is reactive on the inputs, if nothing has changed clicking "go" a second time does nothing.
This is nice but breaks if "Stop" is pressed and interrupts the output, as pressing "Go", "Stop", "Go" means nothing will happen. Maybe it's bad UI not to re-run anyway - the user might get confused, and also might want to test out the randomness.
As explained in column E in this document: https://docs.google.com/spreadsheets/d/1c2h1lEZ5uF9PGZ57iamFoL3kz9Bz8rKMiZ1U1hM-MsA/edit#gid=0
Currently, the user will not get an error until they click "Go".
Further, some of R Shiny's max/min conditions do not throw errors if the conditions are broken (for example MCMC init pars can be <0 and not throw a proper error).
We should add some basic client-side validation within the JavaScript, and check the inputs on the current page when the user clicks next. We can then highlight the input in red and give specific errors etc, which will be much nicer than the current behaviour.
EpiEstim now handles imported cases. We should do this too.
Add progression bar for all versions with uncertainty and/or MCMC running
There is a lot of testing which should happen. This issue is pretty blank for now but should be updated to add places in need of testing.
It may also be worth looking into automated unit/end2end testing. I don't know how easy this is in R, or if it's even worth it.
SIFromSample (preloaded)
SIFromData
SIFromSample (uploaded)
UncertainSI
ParametricSI
NonParametricSI (uploaded)
NonParametricSI (preloaded)
We now have our own code to handle the plots. This is because values$epiEstimOutput is a reactive value, and we want the plots to update whenever this is updated.
EpiEstim has a large block of code here to decide exactly what to plot. We want to use this, really. Ideally, EpiEstim's plots function would handle taking the entire EpiEstim output object and produce all three plots as appropriate, maybe moving the above linked code block from EstimateR and into plots?
@annecori Thoughts?
When hosting the app, we should try to estimate how long MCMC is going to take, and if it's too long we should tell the user to download EpiEstimApp locally and run it themselves, as our server can only cope with so many MCMC processes running at once.
MCMCpack sets a seed for some reason. This means multiple runs are identical.
Since we're running MCMC incrementally, 80 iterations at a time, the random sequence will be the same for every 80 iterations. This may be biasing our MCMC chain. We should set seeds manually.
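One way to set the seeds manually while keeping runs reproducible is to derive a separate seed for each 80-iteration chunk from a single overall seed. The sketch below uses only base R. The helper chunk_seeds is hypothetical, and how each seed is then handed to the MCMC routine depends on whatever seed argument that routine exposes, which is not shown here.

```r
# Hypothetical helper: derive one MCMC seed per chunk from a single overall seed.
# Note that set.seed changes the global RNG state of the R session.
chunk_seeds <- function(user_seed, total_iterations, chunk_size = 80) {
  n_chunks <- ceiling(total_iterations / chunk_size)
  set.seed(user_seed)
  # Sampling without replacement gives a distinct seed for every chunk.
  sample.int(.Machine$integer.max, n_chunks)
}

seeds <- chunk_seeds(user_seed = 1, total_iterations = 400)
length(seeds)  # one seed per 80-iteration chunk
```

The same overall seed always yields the same sequence of chunk seeds, so a run stays reproducible, while consecutive chunks no longer replay the same random sequence.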