pyaki's Introduction

pyAKI

Python package to detect AKI within time series data.

The goal of this package is to provide well-tested, comprehensive functions for detecting Acute Kidney Injury (AKI) in time series data according to the Kidney Disease: Improving Global Outcomes (KDIGO) criteria, established in 2012 [1].

Installation

pip install git+https://github.com/aidh-ms/pyAKI

Usage

import pandas as pd

from pyAKI.probes import Dataset, DatasetType
from pyAKI.kdigo import Analyser

data = [
    Dataset(DatasetType.URINEOUTPUT, pd.DataFrame()),
    Dataset(DatasetType.CREATININE, pd.DataFrame()),
    Dataset(DatasetType.DEMOGRAPHICS, pd.DataFrame()),
    Dataset(DatasetType.RRT, pd.DataFrame()),
]

analyser = Analyser(data)
results: pd.DataFrame = analyser.process_stays()

Tests

pytest --cov=. test/

Acknowledgement

We encourage all users to use pyAKI in their scientific work. If you do, please cite it as follows:

@misc{porschen2024pyaki,
    title={pyAKI - An Open Source Solution to Automated KDIGO classification},
    author={Christian Porschen and Jan Ernsting and Paul Brauckmann and Raphael Weiss and Till Würdemann and Hendrik Booke and Wida Amini and Ludwig Maidowski and Benjamin Risse and Tim Hahn and Thilo von Groote},
    year={2024},
    eprint={2401.12930},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

Our paper can be found on arXiv.

Footnotes

  1. Kidney Disease: Improving Global Outcomes (KDIGO) Acute Kidney Injury Work Group. KDIGO Clinical Practice Guideline for Acute Kidney Injury. Kidney inter., Suppl. 2012; 2: 1–138.

pyaki's People

Contributors

aegis301, dependabot[bot], jernsting, paul-b98

pyaki's Issues

Column Name Agnostic

Right now, we require columns to be identified strictly by fixed names such as stay_id and charttime. We should let users provide custom column names.

Add logging

I think we should add some logging to the modules so that debugging gets easier.

Interpolate Urine Output

We want to define an interpolation method for the urine output. SciPy's interp1d seems to be a good fit for the job, together with a cumulative sum for accurately calculating the urine output in every hour. The user should be able to define a threshold for missing data that decides when NaN values are filled.
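A minimal sketch of the cumulative-sum idea, assuming measurement times are given in hours since admission and each volume is the amount collected since the previous measurement. It uses NumPy's np.interp in place of SciPy's interp1d to keep the example small. The function name hourly_urine_output is hypothetical and not part of pyAKI, and the missing-data threshold is not handled here.

```python
import numpy as np


def hourly_urine_output(times_h, volumes_ml):
    """Spread irregular urine measurements onto an hourly grid (illustrative only).

    times_h: ascending measurement times in hours since admission (assumption).
    volumes_ml: volume collected since the previous measurement (assumption).
    """
    times_h = np.asarray(times_h, dtype=float)
    cumulative = np.cumsum(np.asarray(volumes_ml, dtype=float))
    # Hourly grid spanning the observed period.
    grid = np.arange(np.floor(times_h[0]), np.ceil(times_h[-1]) + 1)
    # Linearly interpolate the cumulative volume at every full hour.
    cumulative_on_grid = np.interp(grid, times_h, cumulative)
    # The output of each hour is the increase of the cumulative curve.
    return grid[1:], np.diff(cumulative_on_grid)


hours, output = hourly_urine_output([0, 2, 4], [0, 100, 60])
```

Interpolating the cumulative curve rather than the raw values keeps the total volume unchanged, which is the reason for combining interpolation with a cumulative sum.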

New Baseline: Lowest Creatinine within first 7 days

A request for a new baseline method has been made by UKM physicians: they want to define the baseline creatinine for a patient as the lowest creatinine value within the first 7 days of the ICU stay. Of course, a creatinine value on day 3, for example, should only be compared to creatinine values on day 1 or 2. In the end, we want to have 7 baseline values for the first 7 days, and after that, the last value will be kept for the rest of the stay.
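A rough sketch of this baseline, assuming the input is one value per ICU day (the lowest creatinine measured on that day). The baseline for each day is the running minimum up to and including that day, so future values are never used. Whether the current day itself should count is an assumption here, and rolling_baseline is a hypothetical name.

```python
from itertools import accumulate


def rolling_baseline(daily_min_creatinine, baseline_days=7):
    """Running-minimum baseline for the first days, then held constant."""
    if not daily_min_creatinine:
        return []
    window = daily_min_creatinine[:baseline_days]
    # Running minimum over the baseline window (no look-ahead).
    baselines = list(accumulate(window, min))
    # After the window, keep the last baseline for the rest of the stay.
    remaining_days = len(daily_min_creatinine) - len(window)
    return baselines + [baselines[-1]] * remaining_days


baseline = rolling_baseline([1.2, 1.0, 1.4, 0.9, 1.1, 1.3, 0.8, 0.6, 2.0])
```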

Add test for complete analyzer

So far we do not have a single test that covers the whole pipeline. We should add one as a usability check.

Assignees: aegis301
Labels: enhancement

Clean Up

This repo definitely needs some cleanup before publication!

Bug: Probes not working without preprocessor

Right now, if no preprocessor is provided, the pipeline stops working. The standard preprocessors are applied by default, which is fine, but we should give users the ability to opt out of preprocessing if they have already preprocessed the data on their own.

Add Baseline Methods

Users might wish for additional baseline methods to use in pyAKI. We should implement them and test them on artificial data.
Methods of interest:

  • fixed value: Users might want to give fixed creatinine values for each patient if they have a dataframe e.g. containing preoperative creatinine
  • calculation based on eGFR, weight and height: Users might provide a dataframe containing weight and height and wish to compute the creatinine, based on the assumption that patients had normal kidney function before admission. We could rearrange this formula: Est. Creatinine Clearance = [(140 - age (yr)) × weight (kg)] / [72 × serum Cr (mg/dL)] (multiply by 0.85 for women), assuming a normal eGFR of 75.
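Solving the formula above for serum creatinine gives a possible starting point for the second method. This is only a sketch: the function name is hypothetical, and the default clearance of 75 is the assumption stated in the issue.

```python
def estimate_baseline_creatinine(age_years, weight_kg, female, assumed_clearance=75.0):
    """Estimate serum creatinine (mg/dL) from an assumed creatinine clearance.

    Rearranged from: clearance = (140 - age) * weight / (72 * creatinine),
    multiplied by 0.85 for women.
    """
    sex_factor = 0.85 if female else 1.0
    return (140 - age_years) * weight_kg * sex_factor / (72 * assumed_clearance)


# Example: 60-year-old patient weighing 72 kg.
male_estimate = estimate_baseline_creatinine(60, 72, female=False)
female_estimate = estimate_baseline_creatinine(60, 72, female=True)
```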

implement kdigo_rel_crea_criterion function

For now, this function should take two NumPy vectors as input: one vector containing the urine output values, one vector containing the respective timestamps.

It should output two vectors:

  1. The first vector should contain the hourly timestamps
  2. The second vector should contain the KDIGO stages at each hour according to the KDIGO urine output criteria

Handle nan values in probes

Probes should just restart the staging whenever nan values are in one of the time windows.

Best solution: Users can define a threshold for how many values may be missing. If fewer values than that are missing, the mean of the rest of the window should be used for imputation.
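One way this could look for a single time window, sketched with plain Python lists. The name impute_window and the default threshold of 25 % are assumptions for illustration. Returning None stands for "too many gaps, restart the staging".

```python
import math


def impute_window(values, max_missing_fraction=0.25):
    """Fill NaNs with the window mean, or return None if too much is missing."""
    missing = [math.isnan(v) for v in values]
    observed = [v for v, is_missing in zip(values, missing) if not is_missing]
    if not observed or sum(missing) / len(values) > max_missing_fraction:
        return None  # caller should restart the staging for this window
    window_mean = sum(observed) / len(observed)
    return [window_mean if is_missing else v for v, is_missing in zip(values, missing)]


filled = impute_window([1.0, math.nan, 3.0, 2.0])
too_sparse = impute_window([math.nan, math.nan, 1.0, 2.0])
```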

Return nan as Stage if Input is NaN

Right now, if we use a threshold for the imputation of missing values, the interpolator correctly skips interpolation and returns NaN. The probe, however, still returns AKI stage 0 when it encounters a window that consists only of NaN values. It should return np.nan instead.

Rename CRRT to RRT

All CRRT entries should be renamed to RRT, since the definition covers dialysis of any kind, not just continuous renal replacement therapy.

Handle negative Data

We should reject negative input data and return an error message when the user supplies it.
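A small sketch of such a check, assuming the values arrive as a plain sequence of numbers. The function name and the error message are made up for illustration.

```python
def validate_non_negative(values, name="values"):
    """Raise a ValueError if any value is negative (NaN is left untouched)."""
    negatives = [v for v in values if v < 0]
    if negatives:
        raise ValueError(
            f"{name} contains {len(negatives)} negative value(s); "
            "negative measurements are not allowed"
        )


validate_non_negative([0.0, 1.2, 0.9], name="creatinine")  # passes silently
```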

implement kdigo_uo_criterion function

For now, this function should take two NumPy vectors as input: one vector containing the urine output values, one vector containing the respective timestamps.

It should output two vectors:

  1. The first vector should contain the hourly timestamps
  2. The second vector should contain the KDIGO stages at each hour according to the KDIGO urine output criteria
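The staging step could be sketched as follows. This is not the pyAKI implementation: it assumes the values are already resampled to one entry per hour and normalised to ml/kg/h, and it applies the KDIGO urine output thresholds (stage 1: below 0.5 ml/kg/h for 6 to 12 hours, stage 2: below 0.5 ml/kg/h for at least 12 hours, stage 3: below 0.3 ml/kg/h for at least 24 hours or anuria for at least 12 hours) to runs of consecutive hours.

```python
import numpy as np


def _run_length(mask):
    """Number of consecutive True values ending at each position."""
    run = np.zeros(len(mask), dtype=int)
    count = 0
    for i, flag in enumerate(mask):
        count = count + 1 if flag else 0
        run[i] = count
    return run


def kdigo_uo_stages(hourly_uo_ml_per_kg):
    """Return one KDIGO urine output stage per hour (illustrative sketch)."""
    uo = np.asarray(hourly_uo_ml_per_kg, dtype=float)
    below_05 = _run_length(uo < 0.5)
    below_03 = _run_length(uo < 0.3)
    anuria = _run_length(uo == 0.0)
    stages = np.zeros(len(uo), dtype=int)
    stages[below_05 >= 6] = 1
    stages[below_05 >= 12] = 2
    stages[(below_03 >= 24) | (anuria >= 12)] = 3
    return stages


# 3 normal hours, 12 hours at 0.4 ml/kg/h, then 12 hours of anuria.
stages = kdigo_uo_stages([1.0] * 3 + [0.4] * 12 + [0.0] * 12)
```

Because comparisons with NaN are false, a NaN hour ends the current run in this sketch and the staging starts over afterwards.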

Build CLI

We should offer a CLI to process input data.

E.g.: process_aki_stages [path_to_data_csv]
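A possible skeleton for such a command, built with argparse from the standard library. Only the positional path argument comes from the example above. The --output option and the placeholder body of main are hypothetical.

```python
import argparse


def build_parser():
    """Argument parser for a hypothetical process_aki_stages command."""
    parser = argparse.ArgumentParser(
        prog="process_aki_stages",
        description="Compute KDIGO AKI stages for the stays in a CSV file.",
    )
    parser.add_argument("data_csv", help="path to the input CSV file")
    # Hypothetical option, not taken from the issue.
    parser.add_argument("-o", "--output", default="aki_stages.csv",
                        help="where to write the resulting stages")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Placeholder: load the CSV, run the analyser, write the results.
    print(f"Would process {args.data_csv} and write stages to {args.output}")
    return 0
```

Calling main(["stays.csv"]) exercises the parser without touching sys.argv, which keeps the command easy to test.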

Generate Docstrings for Current State

So far we do not have any documentation on the current state of the project. We definitely should though 😆 So generate docstrings for all files, functions and classes.

Handle Missing Data

Sometimes users will not provide data for a certain topic. E.g. they only have creatinine values or they do not have CRRT data. Our package should be able to handle this and only provide stages for values it has.

Write User Guide

We have to write up some user guides and documentation for our package.

Handle Missing Data

Right now, a KeyError is raised in kdigo.py line 126 when a patient has no entry in one of the input dataframes. This will often be the case for CRRT data. We should not raise a KeyError here, but instead skip the corresponding stages for this particular patient. Maybe just fill in np.nan in the corresponding stages column?
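The fallback could be as simple as the sketch below, which assumes the per-stay results are kept in a dictionary keyed by stay_id. The function name and data layout are illustrative and do not reflect the code in kdigo.py.

```python
import math


def stages_or_nan(stay_id, stages_by_stay, n_hours):
    """Return the stages of a stay, or NaN placeholders if the stay has no data."""
    try:
        return stages_by_stay[stay_id]
    except KeyError:
        # No entry for this patient: report "unknown" instead of failing.
        return [math.nan] * n_hours


rrt_stages_by_stay = {101: [0, 0, 3]}
known = stages_or_nan(101, rrt_stages_by_stay, n_hours=3)
unknown = stages_or_nan(202, rrt_stages_by_stay, n_hours=3)
```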

Add and redefine creatinine probes

Add a couple of creatinine probes:

  • ROLLING_MEAN: Rolling window of defined size with mean value
  • ROLLING_MIN
  • ROLLING_FIRST
  • FIXED_MEAN
  • FIXED_MIN
  • OVERALL_FIRST
  • OVERALL_MEAN
  • OVERALL_MIN

Remember to test all of the probes

Implement mean urine probe

After discussing matters with some of our kidney experts, there is demand for a second method to calculate KDIGO urine output stages: calculating the mean urine output over the given time window, dividing by the size of the window and then dividing by weight, i.e. calculating the mean urine output over time. This method might be less "strict" but also avoids under-classifying AKI. Users should have both options.
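As a sketch of the arithmetic, assuming hourly urine volumes in ml and a fixed patient weight, the mean rate for each complete window can be computed like this. The function name is hypothetical.

```python
def mean_uo_per_kg(hourly_uo_ml, weight_kg, window_h):
    """Mean urine output in ml/kg/h for every complete window of window_h hours."""
    rates = []
    for end in range(window_h, len(hourly_uo_ml) + 1):
        window = hourly_uo_ml[end - window_h:end]
        # Total volume in the window, divided by window size and by weight.
        rates.append(sum(window) / window_h / weight_kg)
    return rates


# Five hours without urine followed by one hour with 240 ml, patient weighs 80 kg.
rates = mean_uo_per_kg([0, 0, 0, 0, 0, 240], weight_kg=80, window_h=6)
```

In this example five of the six individual hours lie below 0.5 ml/kg/h, while the window mean is exactly 0.5 ml/kg/h. This is the sense in which averaging can differ from an hour-by-hour check.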

Interpolate Creatinine

Build an interpolation method for creatinine. Right now we are just using the sum, which is not ideal since there might be edge cases where creatinine was measured multiple times within an hour. The mean would be better. Also, we can forward fill the upsampled time series with the old creatinine value until we get a new one. We should also define a threshold for the amount of missing data up to which we fill NaN values (e.g. if we don't have any creatinine value for 72h, we want to leave NaN instead). The user should be able to apply different thresholds as needed.
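With pandas, the three steps (hourly mean, forward fill, gap threshold) might be sketched as below. This assumes a Series of creatinine values with a DatetimeIndex. The function name and the default of 72 hours are illustrative.

```python
import pandas as pd


def resample_creatinine(series, max_gap_hours=72):
    """Hourly mean of creatinine, forward filled for at most max_gap_hours."""
    # Several measurements within the same hour are averaged, not summed.
    hourly = series.resample("1h").mean()
    # Carry the last value forward, but leave NaN once the gap gets too long.
    return hourly.ffill(limit=max_gap_hours)


creatinine = pd.Series(
    [1.0, 1.2, 2.0],
    index=pd.to_datetime(
        ["2024-01-01 00:10", "2024-01-01 00:40", "2024-01-01 05:00"]
    ),
)
hourly = resample_creatinine(creatinine, max_gap_hours=2)
```

With a limit of 2 hours, the two measurements in the first hour are averaged, that value is carried forward for two hours, and the remaining gap before the 05:00 measurement stays NaN.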

CRRT Probe and Resampler

Write a probe for CRRT. The CRRT criterion is simple: if the patient is under continuous renal replacement therapy at a given timepoint, he or she has AKI stage 3. If not, the AKI stage according to CRRT is 0. Input data should be a time series of boolean values indicating whether the patient was on RRT at each timepoint.
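The mapping itself is a one-liner. A minimal sketch, assuming the input is a sequence of booleans with one entry per timepoint (the function name is hypothetical):

```python
def rrt_stages(on_rrt):
    """Map a boolean RRT time series to AKI stages: 3 while on RRT, otherwise 0."""
    return [3 if flag else 0 for flag in on_rrt]


stages = rrt_stages([False, False, True, True, False])
```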

Pandas Warnings

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[self.RESNAME] = 0.0
/home/paul/projects/helpwave/pyAKI/pyAKI/probes.py:353: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
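One common way to avoid this warning is to take an explicit copy of a filtered frame before adding a column to it. The sketch below uses made-up column names and does not reproduce the actual code in probes.py.

```python
import pandas as pd

df = pd.DataFrame({"stay_id": [1, 1, 2], "creat": [0.9, 1.4, 1.1]})

# Filtering returns a new object that pandas may treat as a slice of df.
# An explicit copy makes the intent clear before assigning a new column.
subset = df[df["stay_id"] == 1].copy()
subset["stage"] = 0.0
```

The original frame stays unchanged, and the new column exists only on the copy.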

Remove unnecessary columns

Remove unnecessary columns in the pyAKI output, mostly the additional stay_id columns that are created by merging the data.

Build Tests for Current State

As with the docs, there are currently no tests in place, though there should be 😆 So let's write tests for our code.

RRT probe throws error when stay_id not present

Our current implementation of the RRT probe still throws an error whenever a stay_id is not in the data. This should not be the case, as users will often only have data on patients receiving RRT. When a patient has no data, we should just assume that he or she did not receive RRT.
