hopkinsidd / flepimop
The Flexible Epidemic Modeling Pipeline
Home Page: https://flepimop.org
License: GNU General Public License v3.0
Add a config option named projection_date or similar, so that we only run the period up to the last datapoint for fitting, but project the last iteration of the last slot forward to a later date. This saves computation for long horizons.
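A minimal sketch of what such a config could look like. Note that projection_date is only a proposed name here and is not an implemented option:

```yaml
# Hypothetical sketch only -- projection_date is not implemented yet.
# Fitting uses data up to end_date; the final iteration of the last
# slot would then be projected forward to projection_date.
start_date: 2020-01-01
end_date: 2022-06-01          # last datapoint used for fitting
projection_date: 2023-06-01   # horizon for the final projection run
```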
I get this error (even after running twice):
ERROR: dependencies ‘cdlTools’, ‘ggraph’, ‘tidygraph’ are not available for package ‘flepicommon’
When some slots fail, they are carried over from resume to resume. We should provide a script that downloads the S3 bucket and duplicates the simulations with the highest likelihood to fill the gaps left by failed slots.
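The core of such a script could look like the sketch below. It assumes the S3 files have already been copied locally and only shows the slot-selection logic; the function name and input shape are illustrative:

```python
# Sketch of the gap-filling logic for failed slots. Assumes likelihoods
# have already been read from the downloaded S3 files; names are
# illustrative, not part of the actual flepiMoP codebase.
def fill_failed_slots(likelihoods):
    """likelihoods: dict slot_id -> log-likelihood, or None if the slot failed.
    Returns dict slot_id -> slot_id whose simulations should be used."""
    succeeded = {s: ll for s, ll in likelihoods.items() if ll is not None}
    best = max(succeeded, key=succeeded.get)  # highest-likelihood slot
    # Failed slots borrow the best slot's simulations; others keep their own.
    return {s: (s if ll is not None else best) for s, ll in likelihoods.items()}
```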
Right now, logs are saved in two different locations (the general log, and the one specific to filterMC). There should be one location for the user to check.
At the moment, the slurm logs are stored in the $FLEPI_DATA repository while the logs of the inference_slots are stored in the flepimop-code directory. We should store them in a single location.
Branch init_file (PR #54) adds the ability to specify:
seeding:
method: "FromFile"
seeding_file: pathtoyourfile.csv # ideally in a data/ subfolder
initial_conditions:
method: "FromFile"
initial_conditions_file: pathtoyourfile.csv/.parquet # ideally in a data/ subfolder
where this file is formatted like a seir file (nodes as columns, mc_name, …), and it will filter for the date equal to the config start_date (i.e. the same as when we do a continuation resume).
But the existing method to load initial conditions from a single file is not really great.
This method carries a warning because there is no unit test covering it and I haven't tested it in depth, but it should work. It sets initial conditions from a csv file:
initial_conditions:
method: "SetInitialConditions"
initial_conditions_file: pathtoyourfile.csv # ideally in a data/ subfolder
here the file is formatted as:
comp,place,amount
S_unvaccinated_ALPHA,01000,20
where the order of the meta-compartments is the same as in the config (e.g. you cannot write unvaccinated_S_ALPHA). This method is not really finished: for now it requires that all compartments be specified (I would like the user to be able to specify only a few, with the rest defaulting to 0), and a better syntax is needed for meta-compartments, more like the seeding syntax.
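The default-to-zero behavior requested above could be sketched as follows; this is not the current implementation, and the function name is illustrative:

```python
import csv
import io

def read_initial_conditions(text, all_compartments, places):
    """Parse a comp,place,amount CSV. Unspecified (compartment, place)
    pairs default to 0 -- the behavior the issue asks for, not the
    current SetInitialConditions implementation."""
    y0 = {(c, p): 0.0 for c in all_compartments for p in places}
    for row in csv.DictReader(io.StringIO(text)):
        y0[(row["comp"], row["place"])] = float(row["amount"])
    return y0
```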
I'd like to compile the errors we got that weren't clear to users, but the main issue is that R does not fully propagate Python tracebacks.
The goal is that the end-user doesn't need to analyze the runs by pulling from S3 and using the Studio server.
There should be postprocessing scripts that
Outcomes is very slow and produces a fragmented dataframe. Outcomes should be written in pure NumPy instead of pandas + Python, as it does not use any pandas-specific structure.
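The shape of the proposed rewrite could look like this sketch: preallocate one contiguous array and fill columns in place, rather than inserting columns into a DataFrame one outcome at a time (which fragments memory and triggers repeated copies). Names are illustrative:

```python
import numpy as np

# Sketch only -- not the actual gempyor outcomes code.
def compute_outcomes(n_days, outcome_fns):
    out = np.empty((n_days, len(outcome_fns)))  # one contiguous block
    for j, fn in enumerate(outcome_fns):
        out[:, j] = fn(n_days)                  # write each column in place
    return out
```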
and merge that with the continuation resume feature.
Looking at parameters.py, 'v2' appears to be the default value, but it should actually be 'v3'. That should be fixed.
The next flepimop update will include many parameter-name changes and a new config_version 'v4' definition; a corresponding rearrangement of parameter names will be needed in this revision, because most of the test code will be affected.
AWS has an error handler for failed slots. We should, on slurm, let the user know how many runs failed via flepibot.
Currently, mobility is inputted as absolute numbers of individuals moving between locations. This is then used to calculate a rate against the inputted geodata inside the model.
This should be revised to
Currently "data_path" is a base option in the config, but it appears to only be used for geodata and mobility. We should either make it universal (i.e., all input data goes there and is pulled from there) or get rid of it and put the data_path in the paths to geodata and mobility.
Need to update postprocessing to ensure it works for new config structure.
Also need to update postprocessing scripts to work with Hubverse formats for SMH and Flusight, and add new FluSight targets.
Update:
sim_processing_source.R
plot_predictions.R
run_sim_processing_template.R
processing_diagnostics
Introduce seff, sacct, scancel, and squeue, as well as the filesystem structure.
fix to take in the config option config$model_output_dirname
This file is very large and does not have to be written on every MCMC iteration, only on selected ones. Inference should exploit that.
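The gating logic is simple; a minimal sketch, with illustrative names and a hypothetical save_every parameter:

```python
# Sketch: write the large file only every `save_every` MCMC iterations.
# Names are illustrative, not the actual inference API.
def should_write(iteration, save_every=50, last_iteration=None):
    if last_iteration is not None and iteration == last_iteration:
        return True   # always keep the final state
    return iteration % save_every == 0
```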
There are often case-sensitivity issues with the naming of flepimop vs flepiMoP etc. For example, if you use Docker there is already a “flepimop” directory with R packages, so you have to be very careful to name the volume for the GitHub repository “flepiMoP” and always refer to it as such, or you get errors! Also, the docker repository flepimop has our custom R packages, but these are repeated in the GitHub repo under flepiMoP/flepimop. We should probably avoid relying on capitalization to separate directories, and avoid this duplication of directories with (similar? identical?) content.
We need one version of the post-processing that does not depend on anything but the config. For each spatial node, it will plot each outcome that is used for inference alongside the corresponding ground truth.
Currently both geodata and initial conditions define the population and mobility. We should reduce this confusion/redundancy.
Right now, submission on SLURM HPC does not use "blocks", so each chain is run as a single job array. However, this means that intermediate simulations are not saved to S3. We need to be able to choose the frequency of these saves.
and then copy to the /data folder for archival.
Exceptions are cheap in Python, so we should be able to raise proper errors identifying the precise module that failed (since reticulate does not propagate the traceback).
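One way to do this is to re-raise with the failing module's name folded into the message, so the key information survives even when the Python traceback is dropped on the R side. A sketch with illustrative names:

```python
# Sketch: wrap a module's entry point so failures carry the module name
# in the exception message itself (names are illustrative).
def run_module(name, fn, *args):
    try:
        return fn(*args)
    except Exception as e:
        raise RuntimeError(f"[gempyor:{name}] {type(e).__name__}: {e}") from e
```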
this should include
Right now there are separate functions for perturb_spar, perturb_hnpi, etc.; they can all be combined into a single file to avoid redundancy and make them easier to edit.
Additionally, they should be updated to not read from the config each time, but to read from the previously written files.
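The consolidation could take the shape of one generic perturbation keyed by parameter kind; the scales and names below are illustrative only:

```python
import random

# Sketch of one generic perturbation replacing perturb_spar, perturb_hnpi,
# etc. The kind only selects a perturbation scale (values illustrative).
PERTURB_SD = {"spar": 0.1, "hnpi": 0.05, "snpi": 0.05}

def perturb(kind, value, rng=random):
    return value + rng.gauss(0.0, PERTURB_SD[kind])
```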
Right now we have to run gempyor-seir and gempyor-outcomes. We should provide a single-command interface for scenario runs.
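A single entry point could simply take the stages to run as an option; the command name and flags below are hypothetical, not an existing gempyor interface:

```python
import argparse

# Hypothetical single entry point replacing gempyor-seir + gempyor-outcomes.
# The program name and flags are illustrative only.
def build_parser():
    p = argparse.ArgumentParser(prog="gempyor-run")
    p.add_argument("--config", required=True)
    p.add_argument("--stages", nargs="+", choices=["seir", "outcomes"],
                   default=["seir", "outcomes"])  # run both by default
    return p
```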
We need a tool to do some simple checks that specific data are pushed and match the intended formats. This should be in the project repos (COVID19_USA) and be done through continuous integration on github.
Using a Mac laptop with OS X 11.2. See this note for known issues: https://docs.docker.com/desktop/troubleshoot/known-issues/. You will get platform incompatibility errors, i.e. "WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested", and then will run into problems running under emulation, most notably that files updated on the host machine do not correctly update on the mounted volume in Docker.
I tried using this workaround, but Docker Desktop just hung for hours when trying to turn the Virtualization option on, and I eventually had to uninstall the whole thing: https://collabnix.com/warning-the-requested-images-platform-linux-amd64-does-not-match-the-detected-host-platform-linux-arm64-v8/
Right now on SLURM the post-processing script runs all postprocessing available and sends the reports to the csp_production chat.
We should:
Something isn't parsing correctly in the rate section of the seir_chunk function:
" rate: [\n",
paste0(sapply(X = na.omit(c(rate_seir_parts, rate_vacc_parts, rate_var_parts, rate_age_parts)),
function(x = X){ paste0(" ",x,",\n")}) ),
" ]\n"),
paste0(
" proportional_to: [\"source\"]\n",
" proportion_exponent: [[\"1\",\"1\",\"1\",\"1\"]]\n",
" rate: [", paste(na.omit(c(rate_seir_parts, rate_vacc_parts, rate_var_parts, rate_age_parts)), collapse = ", "), "]\n")),
# " rate: [", glue::glue_collapse(na.omit(c(rate_seir_parts, rate_vacc_parts, rate_var_parts, rate_age_parts)), collapse = ", "), "]\n")),
"\n")
Output is giving only the first rate, rather than pasting together each corresponding rate.
e.g.
rate: [
["r0*gamma"],
]
rather than
rate: [
["r0*gamma"],
["1", "theta1_WILD", "theta2_WILD", "thetaW2_WILD"],
["1"],
["1", "1", "1"]
]
param_from_file is hardcoded to be TRUE in print_outcomes. Variants are also hardcoded to be capitalized in print_inference_statistics and print_seeding, which causes issues with our flu setup.
There are errors being caused by hard-coding that restricts us to simulating the US. For example, inference_slot.R calls flepicommon::load_geodata_file to read the geodata file, but this function expects a column named “geoid”, whereas you're supposed to be able to name the column anything as long as it's specified as nodename in the config. And in the section reading in ground truth data there is also some US-specific handling of FIPS codes, states, etc. We don't want anything US-specific outside of the get_ground_truth function.
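The nodename-aware loading logic is straightforward; a minimal sketch (in Python for brevity, with an illustrative function name, rather than the actual flepicommon R code):

```python
import csv
import io

# Sketch of a geodata loader that honors the config's `nodename` instead
# of assuming a hard-coded "geoid" column.
def load_geodata(text, nodename):
    rows = list(csv.DictReader(io.StringIO(text)))
    if rows and nodename not in rows[0]:
        raise ValueError(f"geodata is missing the configured node column "
                         f"'{nodename}'; found {list(rows[0])}")
    return {r[nodename]: r for r in rows}
```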
In super simple configs (SEIR model with 2 subpopulations) where I tell it to seed 5 individuals S->E on a certain date, with no other initial conditions or seedings, I actually see individuals in E on the day before, and individuals in other subpopulations than the one seeded appear on that date too.
For example for this config seeding isn't supposed to happen until Feb 1, but the (attached) output shows compartments populated before then
name: sample_2pop
setup_name: minimal
start_date: 2020-01-31
end_date: 2020-05-31
data_path: data
nslots: 1

subpop_setup:
  geodata: geodata_sample_2pop.csv
  mobility: mobility_sample_2pop.csv

seeding:
  method: FromFile
  seeding_file: data/seeding_2pop.csv

compartments:
  infection_stage: ["S", "E", "I", "R"]

seir:
  integration:
    method: rk4
    dt: 1 / 10
  parameters:
    sigma:
      value: 1 / 4
    gamma:
      value: 1 / 5
    Ro:
      value: 3
  transitions:
    - source: ["S"]
      destination: ["E"]
      rate: ["Ro * gamma"]
      proportional_to: [["S"], ["I"]]
      proportion_exponent: ["1", "1"]
    - source: ["E"]
      destination: ["I"]
      rate: ["sigma"]
      proportional_to: ["E"]
      proportion_exponent: ["1"]
    - source: ["I"]
      destination: ["R"]
      rate: ["gamma"]
      proportional_to: ["I"]
      proportion_exponent: ["1"]
with this in the seeding file
Using Mac OSX 10.15 with an Intel chip and with command line tools installed. Problem first detected May 11th. When building the environment, I get an endless cycle of package conflicts that cannot be resolved even after many hours. Expected to be related to the cdlTools package.
Line 42 of inference_main.R and line 118 of inference_slot.R: there are mistakes in this code meant to catch specification of outcome scenarios that don't exist. The variables deathrate and p_death don't exist. Fixed in branch outcome_scenarios, updated to match the code for the intervention scenarios.
Add a check that the input data all align and contain the required columns. This includes:
If they are not correct, kill the job with a useful message.
Add options to specify the required columns where possible.
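A minimal sketch of such a pre-flight check; the schema contents below are illustrative, not the actual required columns:

```python
# Sketch of the proposed check: verify each input file carries its
# required columns and fail with a message naming what is missing.
REQUIRED = {"geodata": {"nodename", "population"},
            "mobility": {"ori", "dest", "amount"}}  # illustrative schemas

def check_columns(inputs):
    """inputs: dict name -> iterable of column names found in that file."""
    for name, cols in inputs.items():
        missing = REQUIRED[name] - set(cols)
        if missing:
            raise SystemExit(f"{name}: missing required columns {sorted(missing)}")
```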
This is a kind of reminder until it's supported, because the "parameter" description below is not supported yet:
outcome_modifiers:
  scenarios:
    - ReducedTesting
  modifiers:
    DelayedTesting:
      method: SinglePeriodModifier
      parameter: incidC::probability
      period_start_date: 2020-03-15
      period_end_date: 2020-05-01
      subpop: 'all'
      value: 0.5
The option to run stochastically vs deterministically should not be an environmental variable/command line input only but should also be in the config under integration::method (in addition to the deterministic methods rk4 and legacy)
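A sketch of what that config could look like; the value "stochastic" is a hypothetical method name, since the issue only asks that the choice live under integration::method:

```yaml
# Hypothetical sketch -- only rk4 and legacy exist today.
seir:
  integration:
    method: stochastic   # alongside the deterministic rk4 and legacy
    dt: 1 / 10
```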