
flepimop's Introduction

flepiMoP

Welcome to the Johns Hopkins University Infectious Disease Dynamics COVID-19 Working Group's Flexible Epidemic Modeling Pipeline (“flepiMoP”, formerly the COVID Scenario Pipeline, “CSP”), a flexible modeling framework that projects epidemic trajectories and healthcare impacts under different suites of interventions in order to aid in scenario planning. The model is generic enough to be applied at different spatial scales given shapefiles, population data, and COVID-19 confirmed case data. The pipeline has multiple components: 1) epidemic seeding; 2) disease transmission and non-pharmaceutical intervention scenarios; 3) calculation of health outcomes (hospital and ICU admissions and bed use, ventilator use, and deaths); and 4) summarization of model outputs.

We recommend that most new users use the code from the stable main branch. Please post questions to GitHub issues with the question tag. We are prioritizing direct support for individuals engaged in public health planning and emergency response.

For more details on the methods and features of our model, visit our preprint on medRxiv.

This open-source project is licensed under GPL v3.0.

Tools for using this repository

Docker

A containerized environment is a packaged environment in which all dependencies are bundled together. This means you're guaranteed to be using the same libraries and system configuration as everyone else, in any runtime environment. To learn more, Docker Curriculum is a good starting point.

Starting environment

A pre-built container can be pulled from Docker Hub via:

docker pull hopkinsidd/flepimop:latest-dev

To start the container:

docker run -it \
  -v <dir1>:/home/app/flepimop \
  -v <dir2>:/home/app/drp \
  hopkinsidd/flepimop:latest

In this command we run the Docker image hopkinsidd/flepimop. The -v flag mounts a host directory into the container at the given path: here the data folder is mounted at /home/app/drp and the flepimop folder at /home/app/flepimop inside the container.

You'll be dropped into a bash prompt, where you can run the Python or R scripts (with dependencies already installed).

Building the container

If you ever need to rebuild the container, change to the top directory of flepiMoP and run docker build -f build/docker/Dockerfile .

Note that the container build supports the amd64 CPU architecture only; other architectures are untested. If you are using an Apple Silicon (M1/M2) Mac or another arm64 machine, use Docker's BuildKit to build the image with the target platform specified, e.g.:

docker buildx build --platform linux/amd64 -f build/docker/Dockerfile .

flepimop's People

Contributors

alsnhll, csmith701, eclee25, emprzy, epicfarmer, fang19911030, herambgupta, hrmeredith12, iddynamics-group, javierps, jcblemai, jkamins7, jlessler, juanderone, jwills, kgrantz, kjsato, kkintaro, perifaws, salauer, samshah, saraloo, scnerd, shauntruelove, shwohl


flepimop's Issues

local_install.R fails in conda flepimop-env

I get this error (even after running twice):

ERROR: dependencies ‘cdlTools’, ‘ggraph’, ‘tidygraph’ are not available for package ‘flepicommon’
* removing ‘/Users/Ali/anaconda3/envs/flepimop-env/lib/R/library/flepicommon’
ERROR: dependency ‘flepicommon’ is not available for package ‘config.writer’
* removing ‘/Users/Ali/anaconda3/envs/flepimop-env/lib/R/library/config.writer’
ERROR: dependency ‘flepicommon’ is not available for package ‘inference’
* removing ‘/Users/Ali/anaconda3/envs/flepimop-env/lib/R/library/inference’
Warning messages:
1: In install.packages(loc_pkgs, type = "source", repos = NULL) :
  installation of package ‘build/../flepimop/R_packages//flepicommon’ had non-zero exit status
2: In install.packages(loc_pkgs, type = "source", repos = NULL) :
  installation of package ‘build/../flepimop/R_packages//config.writer’ had non-zero exit status
3: In install.packages(loc_pkgs, type = "source", repos = NULL) :
  installation of package ‘build/../flepimop/R_packages//inference’ had non-zero exit status

Harmonize runs logs on SLURM

Right now, logs are saved in two different locations (the general log, and the one specific to filterMC). There should be one location for the user to check.

Add check that the input data all align.

Add a check that the input data all align and contain the required columns. This includes:

  • any input files that get input in the data repos or generated in submission
  • us_data
  • time series parameter data (i.e., vaccination)
  • geodata
  • seeding population data
  • mobility
  • others?

If they are not correct, kill the job with a useful message.
Add options to specify required columns where possible.
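A minimal sketch of such a check (the function name and the column names are illustrative, not flepiMoP's actual API):

```python
def check_columns(columns, required, name):
    """Kill the job with a useful message if `columns` lacks any required column.

    columns: columns present in an input file; required: columns it must have;
    name: the input file's name, used in the error message.
    """
    missing = [c for c in required if c not in columns]
    if missing:
        # SystemExit stops the job and surfaces the message to the user
        raise SystemExit(f"{name} is missing required columns: {missing}")


# e.g. geodata must carry node-name and population columns (illustrative names)
check_columns(["subpop", "population"], ["subpop", "population"], "geodata")  # passes silently
```

The same helper would be called once per input file (geodata, mobility, seeding, time-series parameters, …) before the run starts.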

Postprocessing: provide options

Right now on SLURM the post-processing script runs all available postprocessing and sends the reports to the csp_production chat.

We should:

  • Specify whether the results are sent to Slack
  • If they are, choose either a personal channel or #csp_production
  • Allow just a subset of postprocessing scripts to be run.
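A hypothetical command-line interface for these options might look like the following sketch (the flag names are assumptions, not existing flepiMoP flags):

```python
import argparse

# Illustrative CLI sketch; none of these flags exist in flepiMoP today.
parser = argparse.ArgumentParser(description="run flepiMoP postprocessing")
parser.add_argument("--slack-channel", default=None,
                    help="Slack channel to post reports to (omit to skip Slack entirely)")
parser.add_argument("--scripts", nargs="*", default=None,
                    help="subset of postprocessing scripts to run (default: all)")

# example invocation: post to #csp_production, run a single script
args = parser.parse_args(["--slack-channel", "#csp_production",
                          "--scripts", "plot_predictions.R"])
```

With no flags given, `args.slack_channel` is None (no Slack posting) and `args.scripts` is None (run everything), which preserves today's default behavior.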

Postprocessing updates with new config structure

Need to update postprocessing to ensure it works for new config structure.

Also need to update postprocessing scripts to work with Hubverse formats for SMH and Flusight, and add new FluSight targets.

Update:

  • sim_processing_source.R
  • plot_predictions.R
  • run_sim_processing_template.R
  • processing_diagnostics
  • Write function/script/section to update formats to new hub formats
  • Write script to include new flusight targets

script: duplicate failed slots from past runs

When some slots fail, they are carried over from resume to resume. We should provide a script that downloads the S3 bucket and duplicates the simulations with the highest likelihood to fill the blanks left by failed slots.
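The core of such a script could look like this sketch (slot IDs and the likelihood bookkeeping are illustrative; the real script would also handle the S3 download and file copying):

```python
def fill_failed_slots(likelihoods, failed_slots):
    """Map each failed slot to the finished slot with the highest likelihood.

    likelihoods: dict of slot id -> log-likelihood for slots that finished;
    failed_slots: ids of slots that produced no output.
    """
    donor = max(likelihoods, key=likelihoods.get)  # best finished slot
    return {slot: donor for slot in failed_slots}


# slots 3 and 5 failed; both get refilled from slot 2, the best finished slot
plan = fill_failed_slots({1: -10.0, 2: -3.5, 4: -7.2}, [3, 5])
print(plan)  # {3: 2, 5: 2}
```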

Timezone conversion messing up discrete days for model output?

In super simple configs (an SEIR model with 2 subpopulations) where I tell it to seed 5 individuals S->E on a certain date, with no other initial conditions or seeding, I actually see individuals in E on the day before, and individuals appear on that date in subpopulations other than the one seeded.

For example, for this config, seeding isn't supposed to happen until Feb 1, but the (attached) output shows compartments populated before then:

name: sample_2pop
setup_name: minimal
start_date: 2020-01-31
end_date: 2020-05-31
data_path: data
nslots: 1

subpop_setup:
  geodata: geodata_sample_2pop.csv
  mobility: mobility_sample_2pop.csv

seeding:
  method: FromFile
  seeding_file: data/seeding_2pop.csv

compartments:
  infection_stage: ["S", "E", "I", "R"]

seir:
  integration:
    method: rk4
    dt: 1 / 10
  parameters:
    sigma:
      value: 1 / 4
    gamma:
      value: 1 / 5
    Ro:
      value: 3
  transitions:
    - source: ["S"]
      destination: ["E"]
      rate: ["Ro * gamma"]
      proportional_to: [["S"], ["I"]]
      proportion_exponent: ["1", "1"]
    - source: ["E"]
      destination: ["I"]
      rate: ["sigma"]
      proportional_to: ["E"]
      proportion_exponent: ["1"]
    - source: ["I"]
      destination: ["R"]
      rate: ["gamma"]
      proportional_to: ["I"]
      proportion_exponent: ["1"]

with this in the seeding file

[attached image: contents of the seeding file]
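The suspected mechanism can be reproduced in a few lines: a date stored as UTC midnight lands on the previous calendar day once converted to a western-hemisphere timezone (a fixed UTC-5 offset stands in for US/Eastern here, purely for illustration):

```python
from datetime import datetime, timedelta, timezone

seed_date = datetime(2020, 2, 1, tzinfo=timezone.utc)   # "Feb 1" parsed as UTC midnight
us_eastern = timezone(timedelta(hours=-5))              # fixed offset for illustration
local = seed_date.astimezone(us_eastern)

print(seed_date.date().isoformat())  # 2020-02-01
print(local.date().isoformat())      # 2020-01-31 -- one day "early"
```

If any step of the pipeline converts dates through a timezone like this before binning by calendar day, seeded individuals would show up a day before the configured date.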

outcome_modifier description format

This is a reminder until it's supported: a "parameter" description like the one below is not supported yet:

outcome_modifiers:
  scenarios:
    - ReducedTesting
  modifiers:
    DelayedTesting:
      method: SinglePeriodModifier
      parameter: incidC::probability
      period_start_date: 2020-03-15
      period_end_date: 2020-05-01
      subpop: 'all'
      value: 0.5

Revise mobility to be a rate instead of absolute numbers

Currently, mobility is input as absolute numbers of individuals moving between locations. This is then used to calculate a rate against the input geodata inside the model.

This should be revised to

  1. Be inputted as a rate from the start
  2. Have the option to be time-resolved so we can vary mobility over time if we want
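Converting the current absolute inputs to rates amounts to dividing each origin row of the mobility matrix by that origin's population; a small sketch (plain lists, illustrative numbers):

```python
def mobility_to_rates(mobility, population):
    """mobility[i][j]: individuals moving from node i to node j per day;
    population[i]: population of origin node i.  Returns per-capita rates."""
    return [[moved / population[i] for moved in row]
            for i, row in enumerate(mobility)]


# 50 of 1000 leave node 0, 20 of 400 leave node 1 -> both rates are 0.05/day
rates = mobility_to_rates([[0, 50], [20, 0]], [1000, 400])
print(rates)  # [[0.0, 0.05], [0.05, 0.0]]
```

The time-resolved variant would simply supply one such rate matrix per date.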

Remove or restructure "data_path" option and its use in config

Currently "data_path" is a base option in the config, but it appears only to be used for geodata and mobility. We should either make it universal (i.e., all input data goes there and is pulled from there) or get rid of it and put the data path into the paths to geodata and mobility.

Error messages and graceful failures

I'd like to compile errors that we got that weren't clear to the users, but the main task is that R does not fully propagate the Python traceback.

  • list out of range error while reading config
  • outcomes with the same names (including durations)

post-processing: provide full analysis

The goal is that the end-user doesn't need to analyze the runs by pulling from S3 and using the Studio server.
There should be postprocessing scripts that:

  1. Make the summary CSV for submission to the Hubs
  2. Make the pdf with the run fits that are now produced manually
  3. Run the diagnosis and analysis of the inference algorithm

conda environment does not build on Mac

Using Mac OS X 10.15 with an Intel chip and with command line tools installed. Problem first detected May 11th. When building the environment I get an endless cycle of package conflicts that cannot be resolved even after many hours. Expected to be related to the cdlTools package.

config_version lookups and v4

Looking at parameters.py, 'v2' appears to be the default value, but it should actually be 'v3'. That should be modified.

The next flepimop update will include many parameter name changes, so a new config_version 'v4' definition, and a corresponding rearrangement of parameter names, will be needed, because most of the test code will be affected.

stochastic option in config

The option to run stochastically vs. deterministically should not only be an environment variable/command-line input but should also be in the config under integration::method (in addition to the deterministic methods rk4 and legacy).

catching missing outcome scenarios

Line 42 of inference_main.R and line 118 of inference_slot.R: there are mistakes in the code that catches specification of outcome scenarios that don't exist. The variables deathrate and p_death don't exist. Fixed in branch outcome_scenarios; updated to match the code for the intervention scenarios.

Docker does not work on Macs with Apple chips

Using a Mac laptop with OS X 11.2. See this note for known issues: https://docs.docker.com/desktop/troubleshoot/known-issues/. You will get platform incompatibility errors, e.g. "WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested", and then run into problems under emulation, most notably that files updated on the host machine do not correctly update on the mounted volume in Docker.

I tried using this workaround, but Docker Desktop just hung for hours when trying to turn the Virtualization option on, and I eventually had to uninstall the whole thing: https://collabnix.com/warning-the-requested-images-platform-linux-amd64-does-not-match-the-detected-host-platform-linux-arm64-v8/

Make a single parameter perturbation file in inference

Right now, the separate functions for perturb_spar, perturb_hnpi, etc. can all be combined into a single file to avoid redundancy and to make them easier to edit.
Additionally, these should be updated not to read from the config each time, but to read from the previously written files.
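One way to combine them is a single dispatch table, sketched here with placeholder perturbation bodies (the real functions would perturb seeded parameters, NPI values, etc.):

```python
# Illustrative sketch: one dispatch table replacing the separate perturb_* files.
def perturb_spar(values):
    return values  # placeholder body

def perturb_hnpi(values):
    return values  # placeholder body

PERTURBERS = {"spar": perturb_spar, "hnpi": perturb_hnpi}

def perturb(kind, values):
    """Single entry point: look up the right perturbation function by name."""
    return PERTURBERS[kind](values)
```

Adding a new perturbation kind then means adding one function and one dictionary entry in one place, rather than a new file.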

Harmonize SLURM logs

At the moment, the SLURM logs are stored in the $FLEPI_DATA repository, while the logs of the inference slots are stored in the flepimop-code directory. We should store them in a single location.

SLURM: save some intermediate simulations

Right now, submission on SLURM HPC does not use "blocks", so each chain is run in a single job array. However, this means that intermediate simulations are not saved to S3. We need to be able to choose the frequency of these saves.

Improve syntax of single initial condition file.

Branch init_file PR #54 adds the ability to:

  • load seeding from a single file (just added):

seeding:
  method: "FromFile"
  seeding_file: pathtoyourfile.csv  # ideally in a data/ subfolder

  • load initial conditions from a single file (just added):

initial_conditions:
  method: "FromFile"
  initial_conditions_file: pathtoyourfile.csv/.parquet # ideally in a data/ subfolder

where this file is formatted like a seir file (nodes as columns, mc_name, …), and it'll filter for the date that matches the config start_date (i.e., the same as when we do a continuation resume).

But the existing method to load initial conditions from a single file is not great. This method (which carries a warning because there is no unit test covering it, and I haven't tested it in depth, but it should work) sets the initial conditions from a csv file and is configured as:

initial_conditions:
  method: "SetInitialConditions"
  initial_conditions_file: pathtoyourfile.csv # ideally in a data/ subfolder

here the file is formatted as:

comp,place,amount
S_unvaccinated_ALPHA,01000,20

where the order of the meta-compartments must be the same as in the config (e.g., you cannot say unvaccinated_S_ALPHA). This method is not really finished: for now it requires that all compartments be specified (I would like the user to be able to specify only a few, with the rest defaulting to 0), and a better syntax is needed (for meta-compartments, more like seeding's).
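The desired defaulting behaviour (unspecified compartments start at 0) could work like this sketch, which parses the comp,place,amount format shown above (the function name and in-memory representation are assumptions, not flepiMoP's API):

```python
import csv
import io

def read_initial_conditions(text, compartments, places):
    """Parse comp,place,amount rows; compartments not listed default to 0."""
    ic = {(c, p): 0 for c in compartments for p in places}  # default everything to 0
    for row in csv.DictReader(io.StringIO(text)):
        ic[(row["comp"], row["place"])] = int(row["amount"])
    return ic


csv_text = "comp,place,amount\nS_unvaccinated_ALPHA,01000,20\n"
ic = read_initial_conditions(csv_text,
                             ["S_unvaccinated_ALPHA", "E_unvaccinated_ALPHA"],
                             ["01000"])
print(ic[("S_unvaccinated_ALPHA", "01000")])  # 20
print(ic[("E_unvaccinated_ALPHA", "01000")])  # 0 (unspecified -> default)
```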

Docker issues with directory flepimop vs flepiMoP vs flepiMoP/flepimop

There are often case-sensitivity issues with the naming of flepimop vs. flepiMoP, etc. For example, if you use Docker there is already a "flepimop" directory with R packages, so you have to be careful to name the volume for the GitHub repository "flepiMoP" and always refer to it as such, or you get errors. Also, the Docker repository flepimop has our custom R packages, but these are repeated in the GitHub repo under flepiMoP/flepimop. We should probably avoid the random capitalization and relying on it to separate directories, and avoid this duplication of directories with (similar? identical?) content.

config.writer SEIR chunk incorrectly printing rates

Something isn't parsing correctly in the rate section of the seir_chunk function:

                   "      rate: [\n",
                   paste0(sapply(X = na.omit(c(rate_seir_parts, rate_vacc_parts, rate_var_parts, rate_age_parts)),
                              function(x = X){ paste0("        ",x,",\n")}) ),
                   "      ]\n"),
               paste0(
                   "      proportional_to: [\"source\"]\n",
                   "      proportion_exponent: [[\"1\",\"1\",\"1\",\"1\"]]\n",
                   "      rate: [", paste(na.omit(c(rate_seir_parts, rate_vacc_parts, rate_var_parts, rate_age_parts)), collapse = ", "), "]\n")),
                   # "      rate: [", glue::glue_collapse(na.omit(c(rate_seir_parts, rate_vacc_parts, rate_var_parts, rate_age_parts)), collapse = ", "), "]\n")),
        "\n")

The output gives only the first rate, rather than pasting together each corresponding rate, e.g.:

      rate: [
        ["r0*gamma"],
      ]

rather than

      rate: [
        ["r0*gamma"],
        ["1", "theta1_WILD", "theta2_WILD", "thetaW2_WILD"],
        ["1"],
        ["1", "1", "1"]
      ]

hardcoded requirement for US-specific geoids in geodata file

There are errors caused by hard-coding that restricts us to simulating the US. For example, inference_slot.R calls flepicommon::load_geodata_file to read the geodata file, but this function expects a column named "geoid", whereas you're supposed to be able to name the column anything as long as it's specified as nodename in the config. And in the section reading in ground truth data there is also some US-specific code regarding FIPS codes, states, etc. We don't want anything US-specific outside of the get_ground_truth function.

Simple & robust post-processing

We need one version of the post-processing that does not depend on anything but the config. For each spatial node, it will plot each outcome used for inference against the corresponding ground truth.
