
synpath's Introduction

Welcome to patient_abm!

A library for generating synthetic electronic health records in FHIR v4 format using agent-based modeling to simulate patient pathways.

See the redacted final report from the March 2021 project development for an overview of the main components as well as suggested future developments - "REDACTED_C245 ABM Patient Pathways_Final Report_V3_28042021.cleaned.pdf"

Description

The simulation models a single patient interacting with environments (hospitals, GPs, etc) which can prompt updates to the patient's record.

Patients and Environments are modelled as agents. They are python class objects of type PatientAgent and EnvironmentAgent respectively, and are located in

src/patient_abm/agent

The simulation is configured by a configuration script, and the details of the patient-environment interactions must be implemented in the intelligence layer (see the relevant sections below for more details). We have provided templates for these elements.

Project Structure

  • The main code is found in the src and template folders of the repository (see Usage below for more information)
  • The accompanying report is also available in the reports folder

Installation

pip installation

This repository has been tested using Python v3.7 and Python v3.8

Use your terminal to cd into the directory containing this README (the project root directory) and run:

pip install .

Alternatively, if you want to develop and edit the library, then run

pip install -e ".[dev]"

Environment variables

In the project root directory, run:

export PATIENT_ABM_DIR="$(pwd)"

Running a simulation

After installing patient_abm, to run a patient pathway simulation, you must:

(1) Set up the simulation configuration script config.json

(2) Implement the intelligence layer, which:

  • governs how the Patient and Environment agents interact;

  • generates new Patient record entries;

  • decides which Environment the patient should visit next and at what time;

  • optionally applies custom updates to the Patient and Environment agents.

In the folder template we provide templates for the config.json and the intelligence layer. The subfolder, also called template, contains empty template files, whereas the subfolder example contains example files.

Your config.json and intelligence_dir can be located anywhere - they do not need to be inside this repo.

After completing this, in the terminal, run:

patient_abm simulation run --config_path </path/to/config.json>

to run the simulation. Angular brackets <...> here and in the following indicate places where the user needs to supply their own values, or where values are automatically generated by the simulation. For instance, if you want to run the config.json in template/example, then this is the command

patient_abm simulation run --config_path template/example/config.json

Its outputs can be found in template/example/outputs.

This will load and validate the config.json, load the variables from the config, and then run the simulation one patient at a time, saving the outputs after each simulation.

The following folder structure and outputs are created in the save_dir defined in the config.json:

<simulation_id> /
    agents /
        patient_<patient_id>.tar
        environment_<environment_0_id>.tar
        environment_<environment_1_id>.tar
        ...
    fhir /
        bundle.json
    main.log
    patient.log

where a unique simulation_id is automatically generated for every patient in the config.json.

The configuration file config.json

The configuration file config.json contains all the information needed to initialize:

  • All the simulation Patients
  • All the simulation Environments (each Patient's 'universe'), along with the names of the interactions that the intelligence layer can apply when the Patient is present at an Environment
  • Path to the intelligence layer directory, intelligence_dir
  • Path to the save_dir directory in which the simulation outputs will be written
  • Any other simulation parameters, such as stopping conditions, logging frequency, etc.

The config.json is a file with key-value pairs:

{
    key_0: <value_0>,
    key_1: <value_1>,
    ...
}

Below we provide the definition for each key and what the user is expected to provide as the corresponding value.

patients

The key patients refers to data that should be used to initialize patient agent objects. You can enter its value in one of two ways:

  • Write the patient data directly as a list of dictionaries. Each dictionary contains the patient class initialization arguments as key-value pairs.
  • Give a path to a JSON (strongly preferred) or a CSV file that contains the same data as the list of dictionaries. A JSON is preferred because the correct datatypes are preserved, which is particularly important when a Patient attribute is a nested object (such as the Patient's conditions attribute).

Note that each patient must have the following required attributes:

  • patient_id : Union[str, int]: Unique ID for the patient.
  • gender : str: Patient gender, either "male" or "female".

There are many other optional attributes; see the documentation for the PatientAgent class in patient_abm.agent.patient.

No two patients may have the same patient_id.

Even though multiple patients can be listed here, the simulation only runs for one patient at a time; the patients do not interact.
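For illustration, a minimal patients value could look like the following (the IDs and values here are hypothetical; see the PatientAgent documentation for all accepted attributes):

```json
[
    {"patient_id": "patient_0", "gender": "female"},
    {"patient_id": "patient_1", "gender": "male"}
]
```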

environments

The key environments refers to data that should be used to initialize Environment objects. You can enter its value in one of two ways:

  • Write the environment data directly as a list of dictionaries. Each dictionary contains the environment class initialization arguments as key-value pairs.
  • Give a path to a JSON (strongly preferred) or a CSV file that contains the same data as the list of dictionaries. A JSON is preferred because the correct datatypes are preserved, which is particularly important when an Environment attribute is a nested object (such as the Environment interactions attribute).

Note that each environment must have the following required attribute:

  • environment_id : Union[str, int]: Unique ID for the environment.

There are many other optional attributes; see the documentation for the EnvironmentAgent class in patient_abm.agent.environment.

Each environment in the list must have a unique environment_id.

Each environment's interactions attribute is a list of strings referring to functions in the intelligence layer with a specific structure. For example, if the intelligence layer directory looks like

<intelligence_dir> /
    interactions /
        general.py
        gp.py
    intelligence.py

and there are functions in general.py called inpatient_encounter and outpatient_encounter, and two functions in gp.py called measure_bmi and diagnose_fever, then, supposing we had a GP environment, its interactions list might be

interactions = [
    "general.inpatient_encounter",
    "gp.measure_bmi",
    "gp.diagnose_fever"
]

Note that default interactions located in src/patient_abm/intelligence/interactions/default get added to every environment as well. These are currently automatically added but in future could be amended.
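The "<module>.<function>" naming suggests each interaction string is resolved to a function by importing the named module from the intelligence layer directory. A minimal sketch of that mechanism (illustrated with a stdlib module standing in for a real interactions module; the library's actual resolution logic may differ):

```python
import importlib

def resolve_interaction(name):
    """Resolve a '<module>.<function>' string to a callable, mirroring
    how an interaction string like 'gp.measure_bmi' could be looked up."""
    module_name, _, function_name = name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, function_name)

# Demonstrated with a stdlib module in place of an interactions module:
sqrt = resolve_interaction("math.sqrt")
print(sqrt(16.0))  # 4.0
```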

intelligence_dir

The key intelligence_dir refers to the directory that contains the intelligence layer. Its value is the path string to that directory.

save_dir

The key save_dir refers to the directory in which the simulation outputs should be saved. Its value is the path string to that directory.

initial_environment_ids

The key initial_environment_ids refers to the initial Environment that each patient should visit, given by the Environment's environment_id. Its value is a dictionary, which can take several formats:

  • {from_id: <environment_id>}: all Patients will start from the Environment with that <environment_id>.
  • {from_id: [<environment_id_0>, <environment_id_1>, ...]}: the list of environment IDs must be as long as the number of Patients; each Patient will start from the Environment in the corresponding position in the list.
  • {from_probability: [<p_0>, <p_1>, ...]}: the list of probabilities p_i must be as long as the number of Environments and sum to 1. The distribution is sampled once for each Patient.
  • {from_json: '</path/to/ids.json>'}: a JSON file containing initial environment IDs, one for each Patient.
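The from_probability option amounts to sampling each patient's starting environment from a categorical distribution. A sketch of how that choice could be made (sample_initial_environment is a hypothetical helper, not part of the library):

```python
import random

def sample_initial_environment(environment_ids, probabilities, rng=random):
    """Pick one starting environment for a patient from a categorical
    distribution over all environments (probabilities must sum to 1)."""
    return rng.choices(environment_ids, weights=probabilities, k=1)[0]

# Hypothetical environment IDs and probabilities:
rng = random.Random(0)  # seeded for reproducibility
start = sample_initial_environment(["gp", "hospital", "pharmacy"],
                                   [0.7, 0.2, 0.1], rng=rng)
print(start)  # one of "gp", "hospital", "pharmacy"
```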

stopping_condition

The key stopping_condition refers to the condition that should cause the simulation while loop to terminate. The simulation can always terminate early if a death interaction is applied. Its value is a dictionary, which can take several formats:

  • {max_num_steps: <max_num_steps>}, the maximum number of steps (an integer) in the simulation.
  • {max_real_time: {<unit>: <value>}}, maximum real time the simulation should run for. The subdictionary {<unit>: <value>} is passed to python's datetime.timedelta function and so should respect the parameter names there.
  • {max_patient_time: {<unit>: <value>}}, maximum patient time the simulation should run for. The subdictionary {<unit>: <value>} is passed to python's datetime.timedelta function and so should respect the parameter names there.
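Because the subdictionary is unpacked into datetime.timedelta, any of its keyword arguments (weeks, days, hours, minutes, seconds, etc.) are valid units. A sketch of the check such a stopping condition implies (the variable names are illustrative, not the library's):

```python
from datetime import datetime, timedelta

# e.g. {"max_real_time": {"hours": 2}} in config.json
max_real_time = timedelta(**{"hours": 2})

start = datetime(2021, 3, 24, 9, 0, 0)
now = datetime(2021, 3, 24, 11, 30, 0)

# The simulation loop would stop once elapsed time exceeds the limit:
print(now - start > max_real_time)  # True (2.5 hours > 2 hours)
```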

hard_stop

The key hard_stop refers to a hard upper bound on the number of simulation steps. It exists to prevent the simulation loop from running indefinitely if the stopping condition is never met. An integer value is expected.

log_every

The key log_every refers to the number of simulation steps that should execute between logging. Its value is an integer n; logging will happen every n-th simulation step.

log_intermediate

If log_every > 1, simulation information generated between logged steps would otherwise be lost. log_intermediate is a boolean which, if set to true, ensures intermediate log information is collected and then written to the logger at the next log_every-th step. If false, only the information at every log_every-th simulation step is written.
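One way to picture this behaviour (a sketch, not the library's actual implementation): intermediate messages are buffered and flushed on every log_every-th step when log_intermediate is true, and dropped otherwise.

```python
def run_logging(num_steps, log_every, log_intermediate):
    """Return the messages that would reach the logger."""
    written, buffer = [], []
    for step in range(1, num_steps + 1):
        buffer.append(f"step {step}")
        if step % log_every == 0:
            if log_intermediate:
                written.extend(buffer)   # flush everything collected so far
            else:
                written.append(buffer[-1])  # only the current step
            buffer = []
    return written

print(run_logging(4, 2, False))  # ['step 2', 'step 4']
print(run_logging(4, 2, True))   # ['step 1', 'step 2', 'step 3', 'step 4']
```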

log_patient_record

A boolean value which, if set to true, adds the latest patient record entry to the logger. This should be used mainly for debugging. The full patient record is always stored in the saved patient agent tar file, and it can be recovered from there.

fhir_server_validate

At the end of the simulation, the patient record is converted into a FHIR Bundle resource and validated. Validation can be done using an "offline" method via the python fhir.resources library, or "online" by sending the bundle to the HAPI FHIR server (http://hapi.fhir.org/baseR4). If fhir_server_validate is true, the online method is used.

patient_record_duplicate_action

When new patient entries are added to the patient record, a validation step checks whether the entry already exists; this is to prevent duplication. patient_record_duplicate_action decides the action to take if a duplicate is found: if it is set to "add", the new entry is added anyway, whereas if it is set to "skip", the entry won't be added.
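The two actions can be pictured as follows (a sketch that assumes duplicate entries compare equal; the library's actual duplicate check may be more involved):

```python
def add_entry(record, entry, duplicate_action="skip"):
    """Append entry to the patient record, honouring the duplicate policy."""
    if entry in record and duplicate_action == "skip":
        return record  # duplicate found: do not add
    record.append(entry)  # "add", or no duplicate found
    return record

record = [{"resource_type": "Encounter", "id": 1}]
add_entry(record, {"resource_type": "Encounter", "id": 1}, "skip")
print(len(record))  # 1: duplicate skipped
add_entry(record, {"resource_type": "Encounter", "id": 1}, "add")
print(len(record))  # 2: duplicate added anyway
```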

Breast cancer pathway config

As an illustration of how a breast cancer pathway might look, we have provided a config.json for this in template/breast_cancer. This is simply an initial version of how the script could be configured for such a pathway; the template and the intelligence layer can be extended to facilitate more complex dynamics.

The intelligence layer

The intelligence layer is a directory of python scripts. The location of the directory is given by the field intelligence_dir in the config.json. The structure of intelligence_dir is as follows:

<intelligence_dir> /
    interactions /
        <interactions_0>.py
        <interactions_1>.py
        ...
    intelligence.py

The intelligence.py script must contain a function called intelligence. More information about the intelligence layer and how it should be structured is provided inside the respective files in template/template/<intelligence_dir>.

nox and tests

nox is used to check that code is correctly formatted and to run the test suite. To use nox, cd into the project root directory and run:

nox

Tests can also be run from this directory via

pytest tests

Notebooks

There are two demo notebooks in the notebooks folder.

patient-agent.ipynb

In this notebook we introduce the patient agent and its methods including:

  • initializing with comorbidities
  • adding properties to conditions, such as severity
  • updating the patient record
  • the patient record internal representation and converting to FHIR

simulation.ipynb

In this notebook we walk through how to run a simulation with a very simple intelligence layer and interactions. Please see above for more information about the simulation configuration script and the intelligence layer (we will not go into detail about the intelligence layer in the notebook). Here we will be using the files in template/example, and going through the main processes that are called when patient_abm.simulation.run.simulate is executed (which is the function called by the CLI command patient_abm simulation run).

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

See CONTRIBUTING.md for detailed guidance.

License

Distributed under the MIT License. See LICENSE for more information.

Contact

To find out more about the Analytics Unit visit our project website or get in touch at [email protected].

Acknowledgements


synpath's Issues

Inclusion of SNOMED to set additional Patient conditions

To fully specify a Patient medical condition, a disease name and code are required, and we recommend using a coding system like
SNOMED.

To do this currently, the user would need to manually write the relevant SNOMED information into a Patient record entry inside of an Intelligence Layer interaction function.

An alternative to this would be to connect the Intelligence Layer to a SNOMED server. In this case, the Intelligence Layer could contain functionality to generate a keyword and use it to query the SNOMED server; the response would then automatically populate the Patient condition name and code fields.

It is very likely that the response would contain multiple codes for the same keyword (perhaps even codes for some other SNOMED entity types, e.g. procedures). A mechanism for selecting a single code would need to be implemented. This could be achieved by designing a process to filter out irrelevant codes. If there are multiple relevant codes (e.g. multiple codes for “long covid”) a probability distribution could be assigned to the codes and then a single one may be sampled.

A simpler version of this could be implemented if, say, it was known a priori that only a limited set of SNOMED codes might be
required. These could be preloaded from SNOMED and written to, for example, a python dictionary, and this is then made accessible to the Intelligence Layer.

If a SNOMED query resulted in multiple codes in the response, even though a single code would be selected for updating the patient record, it is still possible to store all of the responses. This could be done by adding a patient attribute that tracks all returned SNOMED codes (along with the query and other relevant parameters); these codes would then be stored with the patient and be accessible at the end of the simulation. The codes could also be written to the logs. This raises a further option: the user could define an "event of interest", i.e. a condition such as "multiple SNOMED codes" that prompts a message alerting the user that this event has occurred during the simulation. This "alert" system would need to be implemented.
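The simpler preloaded variant described above could be sketched like this; the keyword mapping, codes, and select_condition helper are all hypothetical (placeholder codes, not real SNOMED identifiers):

```python
import random

# Hypothetical keyword -> candidate (code, name) mapping, preloaded from
# SNOMED ahead of the simulation (placeholder codes, not real identifiers)
PRELOADED_CODES = {
    "fever": [("12345", "Fever")],
    "long covid": [("67890", "Post-COVID syndrome A"),
                   ("67891", "Post-COVID syndrome B")],
}

def select_condition(keyword, rng=random):
    """Return a single (code, name) pair for a keyword, sampling
    uniformly when multiple candidate codes exist."""
    candidates = PRELOADED_CODES[keyword]
    return rng.choice(candidates)

print(select_condition("fever"))  # ('12345', 'Fever')
```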

Incorporation of dynamics within the Environment layer

There are versions of environment dynamics which may not be too onerous to implement.

For example, consider a GP in-hours service. A simple rules-based model could capture:

  • Low capacity / long wait times, between the hours of 8am-10am and 3pm-6pm, Mon-Fri
  • High capacity / short wait times, between the hours of 10am-3pm, Mon-Fri
  • And no capacity otherwise (including bank holidays).

Each Environment has capacity and wait_time placeholder attributes which could be used to store such information, and this could
then be used by the interaction function to decide on whether the patient needs to wait an amount of wait_time before creating the
first record entry. Also, if the interaction layer has access to other environments, it could use this information to decide on the time of next interaction or to schedule an appointment.

More complex versions where more patients or other external events are taken into consideration could be employed, but this would require far more engineering.

Expansion to multi-agent simulations, where multiple patient pathways can be simulated concurrently

The Alpha data model handles multiple patients independently.

Whilst it is technically possible to initialise a simulation with many patients in the config.json, the model will simulate patients
one after the other in a loop - all with independent environments and no interactions between patients whatsoever.
This process should be trivially parallelizable; one small enhancement would be to use a tool like python’s multiprocessing library to parallelise the simulation.
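That small enhancement could be sketched as follows; simulate_one is a hypothetical stand-in for running patient_abm on a single patient, not a real library function:

```python
import multiprocessing

def simulate_one(patient_id):
    """Hypothetical stand-in for running one patient's simulation; in
    practice this would call into patient_abm's simulation machinery."""
    return f"simulated {patient_id}"

def simulate_all(patient_ids, processes=4):
    """Run the independent per-patient simulations in parallel.

    Safe to parallelise because patients never interact in the
    current model."""
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(simulate_one, patient_ids)

if __name__ == "__main__":
    print(simulate_all(["p0", "p1", "p2"]))
```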

A true multi-patient agent-based simulation would involve significant changes to the codebase. If the user wishes to go down this route, they should be aware of the following:

  • It might be possible to still have the Intelligence Layer focussing only on updating the record of one central patient at a
    time - however, it would need to be aware of other patients in its vicinity (e.g. for modelling the likelihood of one patient
    infecting another with COVID-19). This would require tracking all patient locations and the distances between them.
  • Modelling patients further away (i.e. in other environments that the central patient may need to visit next, to inform
    waiting times / capacity) would require the ability to recognise when multiple patients are occupying the same
    environment, and functions to prevent forbidden events such as two unrelated patients having the same GP
    appointment at the same time.

Depending on the logic required to process and update patients and environments, this could be quite a complex enhancement and it may be more efficient to rewrite the framework.

Intelligence Layer updated to include access to multiple environments

We identified several situations where it might be beneficial for the Intelligence Layer interactions to have access to all the
environments at every step of the simulation e.g. one environment might need to access another environment’s storage of patient
data.

Another example is in relation to environment dynamics - it may help to know the capacity / wait times of other environments in
order to determine the time of the next interaction, or schedule an appointment.

Our recommendation is to implement a version of the simulation where, in each step, the Intelligence Layer receives all environments. The patient would still directly interact with one reference environment only; the other environments would only serve to help the interaction take some actions. The patient's health record update would still look as if the patient had visited a single environment.

Expansion to multiple patient pathways for the same Patient Agent

We have considered the breast cancer pathway in detail and used it to inform the development of the Alpha data model. The breast cancer pathway has helped us shape the structure of all three layers, but in particular it has influenced the attributes of the Environment.

An issue with the breast cancer pathway is that it is quite linear and very well-specified. There is therefore a risk that either: a) the resulting Alpha data model is not flexible enough to incorporate another pathway, or even multiple pathways at the same time (although we have tested the thinking of our current model on other types of cancer pathway); b) a significant amount of work is required in order to expand the Alpha data model to other patient pathways.

Throughout the development of the Alpha data model, we have been careful to try and maintain the generalisability of the model. This has been achieved by implementing quite multi-purpose Patient and Environment objects, which can in future be subclassed to more specific versions, and deferring all pathway-specific logic to the Intelligence Layer, which the user is free to implement. In principle, we therefore believe that no significant changes should be required for the core codebase (which contains the Patient and Environment agents) to work in multiple patient pathway scenarios. Instead, the complexity would fall into the Intelligence Layer.

Nevertheless, there may be certain elements in some pathways that require changes to the core codebase; for instance, the social care or mental health pathways may require features that are not yet implemented, such as patient mood. This will require further consideration of how to record such observations (e.g. in python, calling setattr on the agent objects; note this is currently used to set patient agent kwargs as attributes).

Library of generic go-to interactions which can be applied across multiple pathways

It is very likely that many pathways will have a common set of interactions, for example, booking an appointment, or the chance of
the patient developing a cold or flu and then going to the GP and being prescribed some medication.

To save rewriting code, it would make sense to develop a bank of common interaction functions which could be imported across
different pathways.

Some simple common interactions (such as the cold example above) may not be too complex. Some work would be required to
implement a method for referring to these go-to interaction modules inside the Template Language, especially if the scripts
containing them live in different directories, but this should not be too difficult.

Developing common functions for something such as all cancer pathways would be a more complex task, but would undoubtedly
be a powerful resource.

Implement ability to convert to other data formats (e.g. PRSB) - templating language and in the code repository

A key guiding principle for the Alpha data model was to develop a model which laid the foundation for producing realistic patient
health records.

There are numerous formats and architectures for representing electronic health records - e.g. HL7v2, FHIR, art-decor (from PRSB), openEHR. Due to the time constraints in this project, we focussed on generating synthetic patient records in one of these: FHIR (v4).

There are then two potential ways of converting to other formats:
a. Finding or building a data converter which translates from FHIR v4 to the desired format. E.g. there seems to be an HL7v2 to FHIR converter (https://github.com/LinuxForHealth/hl7v2-fhir-converter), although we have heard anecdotally that these converters are not very successful. As for PRSB formats, from our understanding, some of the PRSB standards (such as the Core Information Standard) are based on FHIR, hence this conversion may be possible as well (see also enhancement 1 under ‘Patient Agent’). There are also tools to translate between different versions of FHIR (e.g. https://www.hl7.org/fhir/r3maps.html)
b. Writing a converter directly to / from the internal language we have built. For user-friendliness, we have developed an “internal patient record representation” which is a simplified language for creating Patient Agent record entries. This saves the user from writing a lot of boilerplate code in the Intelligence Layer and simplifies the logic (for instance, the differing names of fields representing dates in different FHIR resource types). To import / export to FHIR v4, we have written a converter which maps from this internal language to / from FHIR. Only a few important fields from the most common resources are currently used, but there is scope to include more depending on the need.

The converter could be built directly into the core codebase (as it currently is). Or, in a future version, it could be exposed to the user via a template language, which could allow the user to define new maps between fields and FHIR resources.

These converters should be built in a modular fashion.

Ability to capture / store other interactions / communications beyond the patient agent

As well as generating a patient health record update, the Intelligence Layer could produce data to accompany a particular
interaction - such as the image of a scan, or a discharge letter for a patient.

Currently we have not modelled an interaction function that can produce such data - however, each environment has an attribute
patient_data that could be used to store such information.

patient_data is a python collections.defaultdict(list), so that the keys can be set as the patient_id and the values are lists
containing the data. We envisaged it could be used as follows:

  • Patient and environment interaction using interaction function - interaction generates some data, for example, a scan
  • The data could look like a dictionary:
    {
        "real_time": "2021-03-24 15:55:25",
        "patient_time": "2021-03-24 15:55:25",
        "environment_database_name": "PACS",
        "visible_to_environment_ids": [1, 3, 15],
        "patient_record_indices": [22, 23, 24],
        "interaction_name": "write_letter",
        "content_type": "image",
        "content": <scan.png>
    }
  • This would then be appended to the patient_data[patient_id] list
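The described usage of patient_data maps directly onto python's collections.defaultdict; a sketch with hypothetical field values:

```python
from collections import defaultdict

# Each environment holds patient_data, keyed by patient_id
patient_data = defaultdict(list)

# An interaction produces accompanying data for patient "p1"
# (illustrative values, not real library output):
scan_record = {
    "interaction_name": "write_letter",
    "content_type": "image",
    "content": "scan.png",
    "visible_to_environment_ids": [1, 3, 15],
}
patient_data["p1"].append(scan_record)  # key created on first access

print(len(patient_data["p1"]))  # 1
```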

Improve the interface for inputting data on the pathway to make more user friendly

The Alpha data model was built to be as pathway-agnostic as possible - we did not want the model to rely on having a pre-defined
pathway graph (such as the Synthea approach - https://github.com/synthetichealth/synthea).

The pathway manifests itself through the decisions made by the Intelligence Layer. Each interaction calculates a distribution over the environments; this distribution is then sampled and the next Environment chosen.

One downside to this approach is that if a user does want to apply the simulation to a particular pathway, they would need to be
proficient enough at python to write the functionality that calculates these transition probabilities. This raises a barrier to
entry for non-technical users who wish to use the model on a particular pathway.

A way to ameliorate this issue would be to design an interface that makes it easy to define a pathway graph (and possibly the
transition probabilities) as part of the model configuration. The graph and probabilities would then need to be parsed by the model
and fed into the Intelligence Layer.

An issue would still be that the probabilities of transitioning between environments would change over time, and would depend on
the Patient state - these probabilities would be unlikely to equal the initial input probabilities, so the Intelligence Layer would still need to compute the changing distributions (that is, if a realistic and somewhat flexible model was still desired).

There could be several ways to capture and encode the graphical data:

  • Using a tool like neo4j or networkX
  • Defining an adjacency matrix
  • Each environment could hold a list of “next_environment_ids”
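Both the adjacency-matrix and next_environment_ids options could be captured by a plain mapping from each environment to its successors and transition probabilities; a hypothetical sketch (the graph and weights are invented for illustration):

```python
import random

# Hypothetical pathway graph: environment_id -> (successors, probabilities)
PATHWAY = {
    "gp":       (["clinic", "gp"],   [0.6, 0.4]),
    "clinic":   (["hospital", "gp"], [0.3, 0.7]),
    "hospital": (["gp"],             [1.0]),
}

def next_environment(current_id, rng=random):
    """Sample the next environment from the configured transition
    probabilities; a real intelligence layer would adjust these
    probabilities based on the patient's state."""
    successors, probabilities = PATHWAY[current_id]
    return rng.choices(successors, weights=probabilities, k=1)[0]

print(next_environment("hospital"))  # 'gp' (its only successor)
```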

Finally, to make the template language generally more robust, a tool like Jinja could be employed.

Implement a private FHIR validator https://www.hl7.org/fhir/validation.html

Validating the FHIR health record that is generated by the Alpha data model can currently be done in two ways.

Offline - using a python library called fhir.resources (https://github.com/nazrulworld/fhir.resources):

  • Pros: it is fast
  • Cons:
    o It is not clear how closely the library keeps up with new FHIR standards
    o It seems not to be rigorous enough - we have found examples that pass offline validation checks but fail online

Online - by connecting to the HAPI FHIR server (https://hapifhir.io/):

  • Pros: it is listed as an official FHIR test server (https://wiki.hl7.org/Publicly_Available_FHIR_Servers_for_testing), so we expect
    it is kept up to date with current FHIR standards
  • Cons: it is slow

To combine the best of both worlds, namely, a fast validator that keeps up to date with changing FHIR standards, we recommend
implementing a private FHIR validation server in the future. This could have the added benefit of building a custom version that, for
instance, validates FHIR data that is also compliant with PRSB core information standards.

We would recommend implementing validation at the point when the config is being read, and when an output is generated. This will protect against an invalid entry / output being generated.

Background disease measures / probabilities (e.g. based on patient age etc.)

For more complex and long-term patient pathways, it may be useful to model the chance that a patient spontaneously develops an illness during the simulation.

To make this kind of event realistic, it would be useful to know the probability of a patient getting ill given their demographics and historical health record. Determining these probabilities could come from a lookup table, which would have to be made available to the Intelligence Layer. Alternatively a more complex version could be some model which takes in relevant patient features and returns the distribution over illnesses.

This feature could be enabled and disabled by the user via a config attribute.

Note a subtlety: patient dynamics are currently fully governed by the Intelligence Layer. This means that a patient state is essentially static until it interacts with its next environment. A patient therefore can only “develop an illness” during an interaction. We could of course model that illness as having started before (by artificially setting an earlier start date) but this kind of dynamic may be limiting. Instead, a future version of the model could enable patient dynamics even without an environment.
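The lookup-table variant could be sketched as follows; the age bands and probabilities here are invented for illustration, not derived from any real data:

```python
import random

# Hypothetical per-step illness probabilities by age band (illustrative only)
ILLNESS_PROBABILITY = {
    (0, 18): 0.01,
    (18, 65): 0.02,
    (65, 120): 0.05,
}

def develops_illness(age, rng=random):
    """Sample whether a patient spontaneously develops an illness this
    step, given their age band; ages outside all bands never fall ill."""
    for (lower, upper), p in ILLNESS_PROBABILITY.items():
        if lower <= age < upper:
            return rng.random() < p
    return False

print(develops_illness(70, rng=random.Random(0)))  # False
```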

Capturing of hidden variables

Hidden variables are typically used in agent-based simulations. An example of such a variable would be the presence of an infection before the agent has started displaying any symptoms or has been diagnosed by a doctor.

Whilst the Alpha data model does not explicitly make use of them, they could be incorporated via the Patient kwargs variable,
which can be set in the config script.

If common hidden variables were identified, these could be added as Patient attributes, perhaps with some naming convention
(such as a prefix hidden_) to clearly indicate their scope. Adding hidden variables and acting on their values will require changes to the core codebase, but should not be too complex to implement (although this depends on their intended use).
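Since patient kwargs are currently set as attributes via setattr, a hidden_-prefixed variable could be attached as in this sketch (a stand-in class, not the real PatientAgent):

```python
class Patient:  # stand-in for patient_abm's PatientAgent
    def __init__(self, patient_id, **kwargs):
        self.patient_id = patient_id
        # Mirrors the library's behaviour of setting kwargs as attributes
        for key, value in kwargs.items():
            setattr(self, key, value)

# A hidden variable, marked by the hidden_ naming convention:
patient = Patient("p1", hidden_infection_status="infected")
print(patient.hidden_infection_status)  # infected
```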
