aidh-ms / pyaki

Python package to detect AKI within time series data.

License: MIT License

Python 100.00%

aki disease-detection health intensive-care

pyaki's Introduction

pyAKI

Python package to detect AKI within time series data.

The goal of this package is to establish well-tested, comprehensive functions for detecting Acute Kidney Injury (AKI) in time series data, according to the Kidney Disease: Improving Global Outcomes (KDIGO) criteria established in 2012 ¹.

Installation

pip install git+https://github.com/aidh-ms/pyAKI

Usage

import pandas as pd

from pyAKI.probes import Dataset, DatasetType
from pyAKI.kdigo import Analyser

data = [
    Dataset(DatasetType.URINEOUTPUT, pd.DataFrame()),
    Dataset(DatasetType.CREATININE, pd.DataFrame()),
    Dataset(DatasetType.DEMOGRAPHICS, pd.DataFrame()),
    Dataset(DatasetType.RRT, pd.DataFrame()),
]

analyser = Analyser(data)
results: pd.DataFrame = analyser.process_stays()

Tests

pytest --cov=. test/

Acknowledgement

We encourage all users to use pyAKI in their scientific work. If you do, please use the following citation:

@misc{porschen2024pyaki,
    title={pyAKI - An Open Source Solution to Automated KDIGO classification},
    author={Christian Porschen and Jan Ernsting and Paul Brauckmann and Raphael Weiss and Till Würdemann and Hendrik Booke and Wida Amini and Ludwig Maidowski and Benjamin Risse and Tim Hahn and Thilo von Groote},
    year={2024},
    eprint={2401.12930},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

Our paper can be found on arXiv.

¹ Kidney Disease: Improving Global Outcomes (KDIGO) Acute Kidney Injury Work Group. KDIGO Clinical Practice Guideline for Acute Kidney Injury. Kidney inter., Suppl. 2012; 2: 1–138.

pyaki's Issues

Expand Testing On Baseline Methods

So far we do not test whether our baseline methods work correctly. We should add tests for each implemented baseline method.

Check Preprocessor For Double Processing

        for preprocessor in preprocessors: # TODO: Check Preprocessor For Double Processing

https://github.com/AI2MS/pyAKI/blob/4c6340f25b98fb89e478d12cfa16d93e7e8c48e9/pyAKI/kdigo.py#L88

Fix Type Hints

Column Name Agnostic

Right now, we require columns to be identified by fixed names such as stay_id and charttime. We should let users provide custom column names.

Fix coveralls

Add logging

I think we should add some logging to the modules so that it will get easier to debug.
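
A minimal sketch of module-level logging with the standard library, assuming the usual convention that a library logs through a named logger and leaves handler setup to the application. The logger name and the `count_stays` helper are hypothetical and not part of pyAKI.

```python
import logging

# Hypothetical logger name; inside a package this would usually be
# logging.getLogger(__name__).
logger = logging.getLogger("pyAKI.example")
# A NullHandler keeps the library silent unless the application configures logging.
logger.addHandler(logging.NullHandler())


def count_stays(stay_ids):
    """Hypothetical helper: count distinct stays and log what was received."""
    unique = set(stay_ids)
    logger.debug("received %d rows covering %d stays", len(stay_ids), len(unique))
    return len(unique)
```

An application that wants to see these messages while debugging can call `logging.basicConfig(level=logging.DEBUG)` before running the analysis.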

Interpolate Urine Output

We want to define an interpolation method for the urine output. SciPy's interp1d seems to be a good fit for the job, together with a cumulative sum for accurately calculating the urine output in every hour. The user should be able to define a threshold for missing data that decides when NaN values are filled.
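
The idea can be sketched without SciPy: build the cumulative volume at each measurement, interpolate that curve linearly at the hour boundaries, and take differences. This sketch swaps interp1d for a hand-written linear interpolation so that it is dependency free. The function name, the `max_gap_h` threshold and the input layout (hours since admission, volume collected since the previous measurement) are assumptions for illustration.

```python
import math
from bisect import bisect_right


def hourly_urine_output(times_h, volumes_ml, max_gap_h=6.0):
    """Spread measured urine volumes over hourly bins.

    times_h: strictly increasing measurement times in hours (at least two).
    volumes_ml: volume collected since the previous measurement; the first
        entry only marks the start of observation and is ignored.
    max_gap_h: hypothetical threshold; hours that touch a gap between two
        measurements longer than this are reported as NaN.
    """
    cumulative = [0.0]
    for volume in volumes_ml[1:]:
        cumulative.append(cumulative[-1] + volume)

    def segment(t):
        # Index i of the interval [times_h[i], times_h[i + 1]] that contains t.
        return min(bisect_right(times_h, t) - 1, len(times_h) - 2)

    def cumulative_at(t):
        i = segment(t)
        t0, t1 = times_h[i], times_h[i + 1]
        fraction = (t - t0) / (t1 - t0)
        return cumulative[i] + fraction * (cumulative[i + 1] - cumulative[i])

    start = times_h[0]
    result = []
    # Only complete hours are reported; a trailing partial hour is dropped.
    for k in range(int(times_h[-1] - start)):
        lo, hi = start + k, start + k + 1
        first, last = segment(lo), segment(hi - 1e-9)
        too_sparse = any(
            times_h[i + 1] - times_h[i] > max_gap_h for i in range(first, last + 1)
        )
        result.append(math.nan if too_sparse else cumulative_at(hi) - cumulative_at(lo))
    return result
```

For example, 100 ml collected over the first two hours and 60 ml over the third yields 50, 50 and 60 ml for the three hourly bins.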

New Baseline: Lowest Creatinine within first 7 days

UKM physicians have requested a new baseline method: they want to define the baseline creatinine for a patient as the lowest creatinine value within the first 7 days of the ICU stay. A creatinine value on day 3, for example, should only be compared to creatinine values from day 1 or 2. In the end, we want 7 baseline values for the first 7 days; after that, the last value is kept for the rest of the stay.
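
One possible reading of this request, as a sketch: the baseline for a given day is the lowest value seen on earlier days, and the baseline of day 7 is carried forward unchanged afterwards. The function name and the input layout (one lowest creatinine value per day, NaN when nothing was measured) are assumptions.

```python
import math


def rolling_minimum_baseline(daily_min_creatinine, window_days=7):
    """Return one baseline creatinine per day of the stay.

    daily_min_creatinine[d] is the lowest creatinine measured on day d
    (0-based), or NaN if nothing was measured that day. The baseline for
    day d uses earlier days only; from day `window_days` onwards the
    baseline of the last day in the window is kept.
    """
    baselines = []
    lowest = math.nan
    for day, value in enumerate(daily_min_creatinine):
        # The baseline for this day is built from earlier days only.
        baselines.append(lowest)
        # Stop updating once the baseline for the last window day is fixed.
        if day + 1 < window_days and not math.isnan(value):
            lowest = value if math.isnan(lowest) else min(lowest, value)
    return baselines
```

The first day has no earlier value to compare against, so its baseline is NaN in this sketch; a real implementation would need a fallback for that case.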

User input hardening

Add test for complete analyzer

So far we do not test the whole pipeline end to end; we should do that as a usability check.

Assignees: aegis301
Labels: enhancement
Milestone:

Clean Up

This repo definitely needs some cleanup before publication!

Bug: Probes not working without preprocessor

Right now, if no preprocessor is provided, the pipeline stops working. Applying the standard preprocessors by default is fine, but we should let users opt out of preprocessing if they have already preprocessed the data on their own.

Add Baseline Methods

Users might wish for additional baseline methods in pyAKI. We should implement them and test them on artificial data.
Methods of interest:

fixed value: Users might want to supply a fixed creatinine value for each patient, e.g. from a dataframe containing preoperative creatinine.
calculation based on eGFR, weight and height: Users might provide a dataframe containing weight and height and wish to compute the baseline creatinine under the assumption that patients had normal kidney function before admission. We could rearrange this formula: Est. creatinine clearance = [(140 - age (yr)) × weight (kg)] / [72 × serum Cr (mg/dL)] (multiply by 0.85 for women), assuming a normal eGFR of 75.
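
Solving the formula above for serum creatinine gives a possible estimated baseline. This is a sketch only: the function name and parameters are hypothetical, and it uses age, weight and sex because those are the inputs the quoted formula actually contains.

```python
def estimated_baseline_creatinine(age_years, weight_kg, female, assumed_clearance=75.0):
    """Rearranged clearance formula, solved for serum creatinine in mg/dL.

    clearance = (140 - age) * weight * sex_factor / (72 * creatinine)
    =>  creatinine = (140 - age) * weight * sex_factor / (72 * clearance)

    assumed_clearance is the "normal" value of 75 mentioned in the issue.
    """
    sex_factor = 0.85 if female else 1.0
    return (140.0 - age_years) * weight_kg * sex_factor / (72.0 * assumed_clearance)
```

For a 60-year-old man weighing 72 kg this gives 80 / 75, roughly 1.07 mg/dL; the same inputs for a woman give a value 15% lower.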

implement kdigo_rel_crea_criterion function

For now, this function should take two NumPy vectors as input: one vector containing the urine output values, one vector containing the respective timestamps.

It should output two vectors:

The first vector should contain the hourly timestamps
The second vector should contain the kdigo stages at each hour according to the kdigo urine output criteria

Handle nan values in probes

Probes should just restart the staging whenever nan values are in one of the time windows.

Best solution: Users can define a threshold for how many values may be missing. If no more than that are missing, the mean of the rest of the window should be used for imputation.
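
A sketch of that imputation rule on a single window, using plain lists. The function name and the `max_missing_fraction` parameter are hypothetical; windows with too many gaps are returned unchanged so that a probe can skip them.

```python
import math


def impute_window(values, max_missing_fraction=0.5):
    """Fill NaNs with the mean of the observed values in the window.

    If more than max_missing_fraction of the window is missing (or nothing
    was observed at all), the window is returned unchanged.
    """
    n_missing = sum(1 for v in values if math.isnan(v))
    if n_missing == 0 or n_missing == len(values):
        return list(values)
    if n_missing / len(values) > max_missing_fraction:
        return list(values)
    observed = [v for v in values if not math.isnan(v)]
    mean = sum(observed) / len(observed)
    return [mean if math.isnan(v) else v for v in values]
```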

Return nan as Stage if Input is NaN

Right now, if we use a threshold for imputing missing values, the interpolator correctly skips interpolation and returns NaN. However, when the probe encounters a window that contains only NaN values, it still returns AKI stage 0. It should return np.nan instead.
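
The intended behaviour can be sketched as a guard in front of any staging function. `stage_window` and its `classify` argument are hypothetical names, not pyAKI's probe interface.

```python
import math


def stage_window(values, classify):
    """Return NaN when the window holds no observed value, else the stage.

    classify is any function that maps the observed (non-NaN) values of a
    window to a KDIGO stage.
    """
    observed = [v for v in values if not math.isnan(v)]
    if not observed:
        # Nothing was measured: "unknown" is not the same as stage 0.
        return math.nan
    return classify(observed)
```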

Absolute Creatinine Criterion does not get calculated correctly

Creatinine > 4 is considered KDIGO stage III, regardless of baseline values.

Implement process_stays method

        pass  # TODO: Implement process_stays method

https://github.com/AI2MS/pyAKI/blob/7a196f4822f1a7cd45c5636dedffea2a229fd688/pyAKI/kdigo.py#L105

Rename CRRT to RRT

All CRRT entries should be renamed to RRT, since the definition covers dialysis of any kind, not just continuous renal replacement therapy.

Update setup.py

Handle negative Data

We should reject negative input data and return an error message when a user provides it.
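
A minimal validation sketch, assuming a plain list of numbers in which NaN marks a missing value. The function name is hypothetical.

```python
import math


def reject_negative(values, name):
    """Raise ValueError if any observed value is negative; NaNs are allowed."""
    bad = [i for i, v in enumerate(values) if not math.isnan(v) and v < 0]
    if bad:
        raise ValueError(f"{name} contains negative values at positions {bad}")
    return values
```

Calling `reject_negative([1.0, -0.1], "creatinine")` raises a `ValueError` that names the offending position, which gives the user something concrete to fix.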

implement kdigo_uo_criterion function

For now, this function should take two NumPy vectors as input: one vector containing the urine output values, one vector containing the respective timestamps.

It should output two vectors:

The first vector should contain the hourly timestamps
The second vector should contain the kdigo stages at each hour according to the kdigo urine output criteria
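
A sketch of such a function, under simplifying assumptions: the input is already hourly and weight-normalised (ml/kg/h), plain lists stand in for NumPy vectors, and a criterion counts as met only when every hour of the trailing run is below the threshold. The thresholds follow the KDIGO urine output criteria (below 0.5 ml/kg/h for 6 to 12 h: stage 1; for 12 h or more: stage 2; below 0.3 ml/kg/h for 24 h or more, or anuria for 12 h or more: stage 3). The function name is hypothetical.

```python
def kdigo_uo_criterion_sketch(hours, rates_ml_per_kg_h):
    """Return (hours, stages) with one KDIGO urine output stage per hour."""
    stages = []
    run_below_05 = run_below_03 = run_anuria = 0
    for rate in rates_ml_per_kg_h:
        # Length of the current uninterrupted run below each threshold.
        # A NaN rate fails every comparison and therefore resets all runs.
        run_below_05 = run_below_05 + 1 if rate < 0.5 else 0
        run_below_03 = run_below_03 + 1 if rate < 0.3 else 0
        run_anuria = run_anuria + 1 if rate == 0 else 0
        if run_below_03 >= 24 or run_anuria >= 12:
            stages.append(3)
        elif run_below_05 >= 12:
            stages.append(2)
        elif run_below_05 >= 6:
            stages.append(1)
        else:
            stages.append(0)
    return list(hours), stages
```

With three normal hours followed by twelve hours at 0.4 ml/kg/h, stage 1 starts at the sixth low hour and stage 2 at the twelfth.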

Build CLI

We should offer a CLI to process input data.

E.g.: process_aki_stages [path_to_data_csv]
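
A possible skeleton for such a command using argparse. Only the positional path comes from the example above; the `--output` option and the body of `main` are assumptions, and the actual analysis call is left as a comment.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="process_aki_stages",
        description="Compute KDIGO AKI stages for the time series in a CSV file.",
    )
    parser.add_argument("path_to_data_csv", help="CSV file with the input data")
    # Hypothetical option, not part of the proposal above.
    parser.add_argument("--output", default="aki_stages.csv", help="where to write the stages")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # A real implementation would load the CSV here, build the Dataset list
    # and run the analyser on it.
    print(f"would process {args.path_to_data_csv} -> {args.output}")
    return 0
```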

Write Tests for Preprocessors

Change Back Threshold Default

Generate Docstrings for Current State

So far we do not have any documentation on the current state of the project. We definitely should though 😆 So generate docstrings for all files, functions and classes.

Handle Missing Data

Sometimes users will not provide data for a certain topic. E.g. they only have creatinine values or they do not have CRRT data. Our package should be able to handle this and only provide stages for values it has.

Flexible Baseline Creatinine Definition

Users should be able to choose between different baseline creatinine time windows and provide their own custom window for baseline creatinine.

Fix Testing

Write User Guide

We have to write up some user guides and documentation for our package.

Handle Missing Data

Right now, a KeyError is thrown in kdigo.py line 126 when a patient has no entry in one of the input dataframes. This will often be the case for CRRT data. Instead of throwing a KeyError, we should ignore the missing entry and not calculate the corresponding stages for this particular patient. Maybe just fill in np.nan in the corresponding stages column?

Add and redefine creatinine probes

Add a couple of creatinine probes:

Remember to test all of the probes

Implement mean urine probe

After discussing the matter with some of our kidney experts, there is demand for a second method to calculate KDIGO urine output stages: summing the urine output over the given time window, dividing by the size of the window and then dividing by weight, which yields the mean urine output over time. This method might be less "strict" but also avoids under-classifying AKI. Users should have both options.
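
The mean-based calculation can be sketched as a trailing window over hourly volumes. The function name and the input layout (one volume in ml per hour, a single body weight) are assumptions; the resulting rate would then be compared against the KDIGO thresholds.

```python
import math


def mean_uo_rate(hourly_volumes_ml, weight_kg, window_h):
    """Mean urine output in ml/kg/h over the trailing window, per hour.

    Hours without a full window behind them are reported as NaN.
    """
    rates = []
    for i in range(len(hourly_volumes_ml)):
        if i + 1 < window_h:
            rates.append(math.nan)
            continue
        window = hourly_volumes_ml[i + 1 - window_h : i + 1]
        # Total volume, divided by window length and by body weight.
        rates.append(sum(window) / window_h / weight_kg)
    return rates
```

For an 80 kg patient with hourly volumes of 60, 0, 0, 60, 0 and 0 ml, the 6 h mean is 0.25 ml/kg/h and falls below the 0.5 threshold, whereas a check that requires every single hour to be below 0.5 would not flag this window because of the two 60 ml hours.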

Interpolate Creatinine

Build an interpolation method for creatinine. Right now we are just using the sum, which is not ideal since there might be edge cases where creatinine was measured multiple times within an hour. The mean would be better. We can also forward fill the upsampled time series with the old creatinine value until we get a new one. We should also define a threshold for the amount of missing data up to which NaN values are filled (e.g. if we don't have any creatinine value for 72 h, we want to keep NaN instead). The user should be able to apply different thresholds as needed.
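
A dependency-free sketch of that resampling: average all measurements that fall into the same hour, then carry the last value forward for a limited number of hours. The function name, the `max_fill_h` parameter and the input layout (times in hours since admission) are assumptions.

```python
import math
from collections import defaultdict


def hourly_creatinine(times_h, values, n_hours, max_fill_h=72):
    """Resample creatinine to an hourly series.

    Measurements within the same hour are averaged. Hours without a
    measurement repeat the previous hourly value for at most max_fill_h
    hours and are NaN after that (and before the first measurement).
    """
    by_hour = defaultdict(list)
    for t, v in zip(times_h, values):
        by_hour[int(t)].append(v)

    series = []
    last_value, age = math.nan, 0
    for hour in range(n_hours):
        if hour in by_hour:
            last_value = sum(by_hour[hour]) / len(by_hour[hour])
            age = 0
        else:
            age += 1
        series.append(last_value if age <= max_fill_h else math.nan)
    return series
```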

Check if Calculation of UO Stage is Correct

        ### TODO: check if this is correct

https://github.com/AI2MS/pyAKI/blob/74c528e2ed6f84d931003e3fb2ae9ee1da6846ef/pyAKI/probes.py#L118

CRRT Probe and Resampler

Write a probe for CRRT. The CRRT criterion is simple: if the patient is on continuous renal replacement therapy at a given timepoint, he or she has AKI stage 3. If not, the AKI stage according to CRRT is 0. Input data should be a time series of boolean values indicating whether the patient was on RRT at each timepoint.

Pandas Warnings

/home/paul/projects/helpwave/pyAKI/pyAKI/probes.py:353: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[self.RESNAME] = 0.0

implement kdigo_abs_crea_criterion function

https://github.com/AI2MS/pyAKI/blob/021d5cdc5270f98a6cf33bb4d1a45fc88b8fdbe5/kdigo.py#L7

fix setting without copying warning

Remove unnecessary columns

Remove unnecessary columns in the pyAKI output, mostly the additional stay_id columns that are created by merging the data.

Build Tests for Current State

Same as with the docs: right now there are no tests in place, but there should be 😆 So let's write tests for our code.

Write Tests for Baseline Calculations

Floating Point Error

Fix the floating point failure in the creatinine elevation from 0.2 to 0.3.
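
This kind of failure typically comes from comparing a floating point difference against an exact threshold. A sketch of one common remedy, a small tolerance on the comparison, is shown here; the constant and function names are hypothetical, and 0.3 mg/dL is the absolute rise used by the KDIGO stage 1 creatinine criterion.

```python
ABSOLUTE_RISE_MG_DL = 0.3
TOLERANCE = 1e-9  # hypothetical; far below any clinically meaningful difference


def meets_absolute_rise(baseline, current):
    """True if creatinine rose by at least 0.3 mg/dL, allowing for rounding error."""
    return (current - baseline) >= ABSOLUTE_RISE_MG_DL - TOLERANCE
```

In binary floating point `0.7 - 0.4` evaluates to slightly less than `0.3`, so the naive check `0.7 - 0.4 >= 0.3` is false even though the rise is exactly 0.3 mg/dL on paper. The tolerant comparison accepts it.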

RRT probe throws error when stay_id not present

Our current implementation of the RRT probe still throws an error whenever a stay_id is not in the data. This should not happen: users will often only have data on patients receiving RRT, and when a patient has no data, we should simply assume that he or she did not receive RRT.
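
The desired behaviour can be sketched with a dictionary lookup that falls back to "never on RRT". The function name and the data layout (a mapping from stay_id to hourly boolean flags) are assumptions; stage 3 while on RRT and stage 0 otherwise follows the RRT criterion used elsewhere in this list.

```python
def rrt_stages_for_stays(stay_ids, rrt_flags_by_stay, n_hours):
    """Return {stay_id: hourly RRT stages}; stays without data get all zeros."""
    stages = {}
    for stay_id in stay_ids:
        # A missing stay is treated as "no RRT at any time" instead of an error.
        flags = rrt_flags_by_stay.get(stay_id, [False] * n_hours)
        stages[stay_id] = [3 if on_rrt else 0 for on_rrt in flags]
    return stages
```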

Modify baseline method

Rework name for crea probes
