ellessenne / comorbidity



An R package for computing comorbidity scores.

Home Page: https://ellessenne.github.io/comorbidity/

License: GNU General Public License v3.0

R 96.90% TeX 2.47% Makefile 0.63%

comorbidity r rstats

comorbidity's Introduction

Hello! 👋

Alessandro here, a senior biostatistician and software developer currently working at Red Door Analytics in Stockholm, Sweden. I am also a member of openstatsware, previously known as the ASA Biopharmaceutical Section Software Engineering Working Group.

Some of my research interests are: statistical simulation, survival analysis, multilevel modelling, joint shared random-effects modelling (e.g. longitudinal-survival), computational statistics, statistical natural history models for breast cancer growth and spread.

I am predominantly an R user, but use a good amount of Stata too. I also like to fiddle around with other languages, such as C++, Python, SQL, and LaTeX, and I can write a good amount of HTML and CSS.

I developed and currently maintain the following R packages available on CRAN:

{rsimsum} [CRAN, GitHub]
{comorbidity} [CRAN, GitHub]
{KMunicate} [CRAN, GitHub]

You will also find a variety of experimental packages and code (with wildly variable maturity levels) here.

If you want to know more about me, feel free to check out my website at ellessenne.xyz, where I sometimes write about statistics, R programming, and other random stuff (tiny computers, anyone?). You'll also find my contact details there, if you want to get in touch.

comorbidity's People

Contributors

Stargazers

Watchers

Forkers

corinne-riddell nakarumanchi jwilliman guhjy fiksdala huangrh jasmark lagelab salmasian arinehart2 axjadamson dryan1102 rachel-pun jwallib manzar-123 javerya10th mattmoo jbarsotti tdvle

comorbidity's Issues

Compatibility with data.table v1.12.8

Release notes: https://cran.rstudio.com/web/packages/data.table/news/news.html

New version breaks old code

Where is your "What's new" documentation explaining what has changed, and why old code that worked fine under comorbidity 0.5.3 no longer works?

I now see this when running:

elixhauser10 <- comorbidity(x       = icd10,
                           id      = "DiagnosisID",
                           code    = "DiagnosisCode",
                           score   = "elixhauser",
                           icd     = "icd10",
                           assign0 = FALSE)

Error in comorbidity(x = icd10, id = "DiagnosisID", code = "DiagnosisCode", :
unused arguments (score = "elixhauser", icd = "icd10")

This is very frustrating during the paper submission process, for what should be a minor re-run. When introducing improvements, why not keep old code working? Why not introduce map as a new argument but keep supporting the old score argument?
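For anyone hitting the same error: comparing this call with the 1.0.0-style calls in later issues (e.g. map = "elixhauser_icd10_quan"), the old score and icd arguments appear to have been folded into a single map argument. The following is a minimal sketch of that translation. It assumes the pre-1.0.0 behaviour corresponds to the Quan et al. maps. old_to_map() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper (not part of {comorbidity}): translate the pre-1.0.0
# `score` + `icd` arguments into a single 1.0.0-style `map` name.
# Assumption: the old behaviour corresponds to the "*_quan" maps.
old_to_map <- function(score, icd) {
  score <- match.arg(score, c("charlson", "elixhauser"))
  icd <- match.arg(icd, c("icd9", "icd10"))
  paste(score, icd, "quan", sep = "_")
}

old_to_map("elixhauser", "icd10")
# The old call in this issue would then become (requires {comorbidity} >= 1.0.0):
# comorbidity(x = icd10, id = "DiagnosisID", code = "DiagnosisCode",
#             map = old_to_map("elixhauser", "icd10"), assign0 = FALSE)
```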

incorrect comorbidity calculations for pulmonary and cancer codes, maybe more

Hi, I'm the author of the R package 'icd', and I'm glad to see that several of us have worked on solving the comorbidity computation problem. Just noticed your package today. Also, glad to see you live in my home country!

cc @patrickmdnet who is the author of 'medicalrisk'

I took some time this morning to compare the comorbidity computations between our packages, both in speed and content. I was distressed to see that we all differed from each other, particularly in the COPD/chronic lung disease and cancer/tumor categories. I dug into your source code and noticed you grep for descendants of a non-existent top-level code (498), giving a false positive for chronic lung disease with a random test code, 498.82. It is an open question what we should all do when potentially valid but non-existent codes appear, particularly as different annual revisions may gain or lose codes, and we would probably want to sweep them all up when looking for comorbidities.

In 'icd' I took the view that I would count non-existent descendants of extant codes, but I would exclude codes that had no parent associated with any comorbidity.

I didn't look into the cancer side, but there were many more discrepancies which I suspect to be of the same origin.

May I suggest randomly generating strings for testing? You can see I do this in the icd source code. I see you generated test data by sampling only valid codes.

One way for us all to work together might be for you to continue to implement comorbidities how you wish, and consider importing the 'icd' package for validation and explanation of actual codes, which I've put a lot of time into. I'm open to considering other ways for us to collaborate.

Best wishes,
Jack
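The false positive described above is easy to reproduce with any prefix-based regex. A pattern built from a top-level code matches every string that starts with those digits, whether or not the full code exists.

The minimal sketch below also follows the testing suggestion by generating random digit strings. The generator is hypothetical, not the one used in 'icd'.

```r
# A prefix regex built from a (non-existent) top-level code, as described above
pattern <- "^498"
codes <- c("49882", "4980", "4191")
grepl(pattern, codes)  # TRUE TRUE FALSE: both 498x strings are flagged

# Hypothetical generator of random 3-5 digit strings for property-style tests
random_codes <- function(n) {
  vapply(sample(3:5, n, replace = TRUE),
         function(k) paste(sample(0:9, k, replace = TRUE), collapse = ""),
         character(1))
}
set.seed(1)
rc <- random_codes(1000)
# Any random string flagged by the pattern is a candidate false positive to inspect
head(rc[grepl(pattern, rc)])
```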

comorbidity function returning values other than 0 or 1 for Elixhauser depr condition

After upgrading from version 0.5.3 to 1.0.0 I'm seeing different results for processing 4000+ patients, but only for the depr condition when using map = "elixhauser_icd9_quan" or map = "elixhauser_icd10_quan". Did something change in how you're assigning a depr condition based on ICD 9 or ICD 10 codes?

Instead of only 0 and 1 values to indicate a comorbidity for depr, I'm also seeing values 2, 3, 4, 5, 6, 7, and 9 -- there were no 8s. Why is the comorbidity mapping for depr different from all others?

Here's how I'm calling the comorbidity function:

elixhauser10 <- comorbidity(x = cohortICD10,
                            id      = "PATIENT_SK",
                            code    = "DIAGNOSIS_CODE",
                            map     = "elixhauser_icd10_quan", 
                            assign0 = FALSE)
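A quick way to spot this kind of regression is to check that every indicator column contains only 0 and 1. The following is a minimal sketch. non_binary_cols() is a hypothetical helper, and res is a toy stand-in for the output of comorbidity().

```r
# Hypothetical helper: names of indicator columns containing values other than 0/1
non_binary_cols <- function(res, id) {
  ind <- setdiff(names(res), id)
  ok <- vapply(res[ind], function(v) all(v %in% c(0, 1)), logical(1))
  ind[!ok]
}

# Toy stand-in for a comorbidity() result
res <- data.frame(PATIENT_SK = 1:3, chf = c(0, 1, 0), depre = c(0, 2, 1))
non_binary_cols(res, id = "PATIENT_SK")
```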

getting "Error in data.table::setnames(x = loc, new = c(code, "ind")) : When 'new' is provided, 'old' must be provided too"

Dear colleague,

I am getting the following error when running the comorbidity command:

comorbidity(x = x, id = "id", code = "code", map = "elixhauser_icd10_quan", assign0 = TRUE)

"Error in data.table::setnames(x = loc, new = c(code, "ind")) :
When 'new' is provided, 'old' must be provided too"

Please help

Thanks

Renal Dx not captured in score

CCI: The ICD-9 code 583.0 is not being flagged as renal disease (category 13).
Also, ICD9 codes have a leading "0" for codes below 100. I think that codes like 016.72 are being counted as 167.2 because the leading zero was lost.
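One common way to lose leading zeros is letting the import step guess column types, so that codes such as 016.72 are read as numbers. This is not confirmed as the cause here. The following is a minimal sketch with base read.csv() that keeps codes as character via colClasses.

```r
csv <- "id,code\n1,016.72\n2,583.0"

# Default type guessing reads the codes as numbers: 016.72 becomes 16.72
read.csv(text = csv)$code

# Forcing character columns keeps the codes exactly as written
read.csv(text = csv, colClasses = "character")$code
```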

Conda-forge recipe.

Hello! I wanted to let you know that I created a recipe for this package in the "conda-forge" channel. This provides an alternate way to install it for people using conda for package management.

https://anaconda.org/conda-forge/r-comorbidity

https://github.com/conda-forge/r-comorbidity-feedstock

"reshape2 is deprecated, and this redirection is now deprecated as well"

Hello, thanks so much for making this package; it's a huge lifesaver! I've been experiencing a bug with the package; it seems like reshape2 is messing up the package. But I'm not the most skilled coder so this may also be human error. This is also my first time posting to GitHub so please excuse any formatting issues:

Reproducible code (borrowed from the previous issue):
x <- data.frame(
id = sample(1:15, size = 200, replace = TRUE),
code = sample_diag(200),
stringsAsFactors = FALSE
)

colnames(x) <- c("testid", "code")

test <- comorbidity(x = x, id = "testid", code = "code", score = "charlson", assign0 = TRUE, tidy.codes = TRUE)

Aggregate function missing, defaulting to 'length'
Warning message:
In data.table::melt(loc, value.name = code) :
The melt generic in data.table has been passed a list and will attempt to redirect to the relevant reshape2 method; please note that reshape2 is deprecated, and this redirection is now deprecated as well. To continue using melt methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the namespace like reshape2::melt(loc). In the next version, this warning will become an error.
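The warning comes from passing a plain list to data.table::melt(). A named list of matched codes can instead be reshaped to long format with base utils::stack(). The .matchit source printed in a later issue appears to use this approach. The toy list below is hypothetical.

```r
# Toy named list: codes matched for each comorbidity (hypothetical values)
loc <- list(chf = c("I50", "I110"), canc = "C50", aids = character(0))

# utils::stack() returns columns "values" and "ind"; empty elements add no rows
long <- utils::stack(loc)
names(long) <- c("code", "ind")
long
```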

CRAN checks are failing

See here.

Package will be archived on 2022-10-10 if not fixed.

testthat: no tests found

I've never used testthat before, so I may be doing something wrong. I cloned your repo and tried running the code in testthat.R and get the error: "Error: No tests found for comorbidity". However I was able to run the tests using test_dir("./tests/testthat/").

Not sure whether this error was the result of my misunderstanding/misuse, but I wanted to note here how I got the tests to run.

Review Issue URL: openjournals/joss-reviews#648

Allow custom list of codes

The idea is to allow the user to pass a list of codes, e.g. a different comorbidity map.
Could use something like this to turn a list of codes into a regex:

list_of_codes <- c("I10", "I11", "I12")
make_regex <- function(x) {
  x <- paste(x, collapse = "|^")
  x <- paste0("^", x)
  return(x)
}
list_of_codes
#> [1] "I10" "I11" "I12"
make_regex(list_of_codes)
#> [1] "^I10|^I11|^I12"

Created on 2021-01-14 by the reprex package (v0.3.0)
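The same helper extends naturally to a whole custom map: one vector of codes per comorbidity, turned into one regex per comorbidity with lapply(). The following is a minimal sketch. custom_map and its contents are hypothetical, and make_regex() is rewritten as an equivalent one-liner. Codes containing regex metacharacters, such as a literal dot, would need escaping first.

```r
# Equivalent one-line version of make_regex(): "^" before each code, joined by "|"
make_regex <- function(x) paste0("^", x, collapse = "|")

# Hypothetical custom map: one character vector of codes per comorbidity
custom_map <- list(hyp = c("I10", "I11", "I12"), canc = c("C50", "C86"))
regex <- lapply(custom_map, make_regex)
regex$hyp

# Which comorbidities does a single code match?
vapply(regex, function(p) grepl(p, "I110"), logical(1))
```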

The assign0 argument should either have a default or come before the icd argument

The current signature for the comorbidity() function is as follows:

comorbidity(x, id, code, score, icd = "icd10", assign0, factorise = FALSE, labelled = TRUE, tidy.codes = TRUE)

This is bad practice. All arguments that are mandatory (i.e. have no default) should come first, and all arguments with a default should come next. Either a default should be set for the assign0 argument or it should come before the icd argument. My personal vote: assign a default value of FALSE to assign0.

Missing ICD10 code in Quan et al. 2005, so not really a bug in the package

Hi there,

I was working with the package (great one btw) and looking at the ICD10 codes included for each diagnosis. I have noticed that C86 (in ICD10 of course) is missing. It is missing in Quan et al. 2005 but I see that the code does exist in newer versions of the ICD10 code. I suspect it was not taken into account by Quan et al. because it was non-existing at the time.

Here is the line where C86 is not mentioned, which is consistent with literature but a bit puzzling for common sense:

comorbidity/data-raw/make-data.R, line 191 (commit 4a52052):

lofregex[["charlson"]][["icd10"]][["canc"]] <- "^C00|^C01|^C02|^C03|^C04|^C05|^C06|^C07|^C08|^C09|^C10|^C11|^C12|^C13|^C14|^C15|^C16|^C17|^C18|^C19|^C20|^C21|^C22|^C23|^C24|^C25|^C26|^C30|^C31|^C32|^C33|^C34|^C37|^C38|^C39|^C40|^C41|^C43|^C45|^C46|^C47|^C48|^C49|^C50|^C51|^C52|^C53|^C54|^C55|^C56|^C57|^C58|^C60|^C61|^C62|^C63|^C64|^C65|^C66|^C67|^C68|^C69|^C70|^C71|^C72|^C73|^C74|^C75|^C76|^C81|^C82|^C83|^C84|^C85|^C88|^C90|^C91|^C92|^C93|^C94|^C95|^C96|^C97"

What is your thought on this? For my research, I am leaning toward tweaking the regex and mention that the computation of Charlson was based on the algorithm of Quan et al. but with this one modification.

I totally acknowledge this is not per se a bug in the package!
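If you do adopt the modification, appending an extra alternative to the existing pattern is enough. The following is a minimal sketch on a shortened excerpt of the regex above. The excerpt is for illustration only.

```r
# Shortened excerpt of the Quan et al. ICD-10 cancer regex (illustration only)
canc <- "^C81|^C82|^C83|^C84|^C85|^C88|^C90"

# Add C86 as one more alternative
canc_mod <- paste(canc, "^C86", sep = "|")

grepl(canc, "C860")      # not matched by the original pattern
grepl(canc_mod, "C860")  # matched after the modification
```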

Consider Swiss comorbidity weights

https://bmchealthservres.biomedcentral.com/articles/10.1186/s12913-020-05999-5

id column name if not 'id'

Hello,
I have been trying to use your comorbidity package. Thank you for creating it, it's great. I just thought I'd report an issue I found, which is that the id column must be named 'id' for the function to work. I think this should either be stated in the package or the function should be made more generalisable.

This is my first time using Github so please forgive me if I haven't formed this comment in the right way or in the right place!

Alex

Reproducible code:

x <- data.frame(
id = sample(1:15, size = 200, replace = TRUE),
code = sample_diag(200),
stringsAsFactors = FALSE
)

colnames(x) <- c("testid", "code")

comorbidity(x = x, id = "testid", code = "code", score = "charlson", assign0 = TRUE, tidy.codes = TRUE)

Error when running example code

Running the example code below with the most recent version of the package, I receive the following error:

Error in data.table::setnames(x = loc, new = c(code, "ind")) : 
  When 'new' is provided, 'old' must be provided too

x <- data.frame(
    id = sample(1:3, size = 30, replace = TRUE),
    code = sample_diag(n = 30)
)
charlson <- comorbidity(x = x, id = "id", code = "code", map = "charlson_icd10_quan", assign0 = FALSE)

Any thoughts on this error?

Missing variables from the return descriptions for comorbidity()

wscore and windex need to be documented as part of the dataset returned for the Charlson score (to be added around line 37 of comorbidity.R).

Similarly, for the Elixhauser score, score, index, wscore, and windex should be documented (around line 72 of comorbidity.R).

Review issue URL: openjournals/joss-reviews#648

Confusing result with assign0 set to True

Using comorbidity v. 0.5.3, I ran the following call:

elixtbl <-
  comorbidity::comorbidity(
    x = dxtbl,
    id = "CLAIM_NO",
    code = "dxvals",
    score = "elixhauser",
    icd = 'icd10',
    assign0 = TRUE
  )

So, according to the documentation, I should expect (among other things) that hypc and hypunc are never both true. However, when I look at the output, there are several such instances:

> head(elixtbl[ , c('hypc', 'hypunc')])
  hypc hypunc
1    1      1
2    0      1
3    1      0
4    1      1
5    1      1
6    1      1

In fact this seems to be the usual case:

> sum(elixtbl$hypc==1 & elixtbl$hypunc == 1) / sum(elixtbl$hypc==1)
[1] 0.8651381

This also seems to happen with the other two hierarchical conditions:

> sum(elixtbl$metacanc == 1 & elixtbl$solidtum) / sum(elixtbl$metacanc)
[1] 0.8641205
> sum(elixtbl$diabc==1 & elixtbl$diabunc==1) / sum(elixtbl$diabc)
[1] 0.4266899

Am I misunderstanding how the assign0 argument is supposed to work?
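For reference, the behaviour the reporter expected from assign0 = TRUE is simple: whenever the more severe condition is flagged, its less severe counterpart is set to 0. The three pairs come from this issue. apply_hierarchy() and the toy data are hypothetical, not the package's implementation.

```r
# Hypothetical sketch of the expected hierarchy, not the package's code
apply_hierarchy <- function(df) {
  # Pairs of (more severe, less severe) Elixhauser conditions from this issue
  hierarchy <- list(c("hypc", "hypunc"), c("metacanc", "solidtum"), c("diabc", "diabunc"))
  for (p in hierarchy) {
    # Zero out the milder condition wherever the severe one is present
    df[[p[2]]][df[[p[1]]] == 1] <- 0
  }
  df
}

elix <- data.frame(hypc = c(1, 0, 1), hypunc = c(1, 1, 0),
                   metacanc = c(1, 0, 0), solidtum = c(1, 1, 0),
                   diabc = c(0, 1, 1), diabunc = c(1, 1, 1))
out <- apply_hierarchy(elix)
sum(out$hypc == 1 & out$hypunc == 1)  # no patient keeps both
```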

Add support for latest AHRQ Elixhauser comorbidity calculation

The way the package calculates the Elixhauser comorbidity is based on previously published literature (Quan et al. and Moore et al.) which is dated. AHRQ continues to update the algorithm every year. While one could use the SAS program from HCUP, it would be ideal to make comorbidity the R equivalent of it and keep it up to date.

Other related issues are:

The score (wscore) calculated by comorbidity is only one of the two scores from Moore et al. (it is the mortality score; there also exists a readmission score). We should allow users to choose which score they want. Alternatively, we should return both, and name the output more accurately (e.g. wscore_ahrq_mortality).
Naming the output ahrq is a bit of a stretch, because what the R code returns is not the current AHRQ algorithm. Perhaps we should name it more accurately, e.g. wscore_moore, and then add an additional choice to the comorbidity() function's method argument to return the 2019 or 2020 versions, which would return wscore_ahrq.
The newer AHRQ algorithms don't rely on ICD codes alone; they also look at DRG codes to capture certain comorbidities. We should add an optional drg argument naming the column that stores the DRG value, which would only be required if the chosen method is elixhauser_ahrq.
The carit comorbidity has been dropped by AHRQ's latest code altogether.

Of note, my colleague has already created a fork at https://github.com/fiksdala/comorbidity and is working on it. The fork includes a script that can parse the SAS code and automatically generate the corresponding R code. It might be a good idea to use this Issue here to decide on the naming conventions, etc.

Missing codes cause max. score

Missing ICD10 (and probably ICD9) codes cause people to be marked as having every comorbidity.

Compare ID 1 using this code:

set.seed(1)
x <- data.frame(
  id = sample(1:15, size = 200, replace = TRUE),
  code = sample_diag(200),
  stringsAsFactors = FALSE
)

# Charlson score based on ICD-10 diagnostic codes:
comorbidity(x = x, id = "id", code = "code", map = "charlson_icd10_quan", assign0 = FALSE)

id ami chf pvd cevd dementia copd rheumd pud mld diab diabwc hp rend canc msld metacanc aids
1   1   0   0   0    0        0    0      0   0   0    0      0  0    0    1    0        0    0

With the same person after setting 1 code to missing:

set.seed(1)
x <- data.frame(
  id = sample(1:15, size = 200, replace = TRUE),
  code = sample_diag(200),
  stringsAsFactors = FALSE
)

x$code[36] <- NA_character_

# Charlson score based on ICD-10 diagnostic codes:
comorbidity(x = x, id = "id", code = "code", map = "charlson_icd10_quan", assign0 = FALSE)

   id ami chf pvd cevd dementia copd rheumd pud mld diab diabwc hp rend canc msld metacanc aids
1   1   1   1   1    1        1    1      1   1   1    1      1  1    1    1    1        1    1
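Until missing codes are handled internally, a defensive workaround is to drop rows with missing or empty codes before calling comorbidity(). The following is a minimal sketch on toy data.

A patient whose codes are all missing (id 2 here) drops out entirely. If needed, that patient should be added back with all indicators set to 0. The later .matchit source printed on this page appears to do this by merging back the unique ids.

```r
# Toy input with one missing and one empty code
x <- data.frame(id = c(1, 1, 2), code = c("I21", NA, ""), stringsAsFactors = FALSE)

# Keep only rows with a non-missing, non-empty code
x_clean <- x[!is.na(x$code) & nzchar(x$code), ]
x_clean
```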

ahrq score calculation appears incorrect

Calculated score should be: 9 (chf) - 1 (hypc) + 4 (hypothy) = 12
R output:
PT_ID chf carit valv pcd pvd hypunc hypc para ond cpd diabunc diabc hypothy rf ld pud aids lymph metacanc solidtum rheumd coag obes wloss fed blane dane alcohol drug psycho depre wscore_ahrq wscore_vw
13 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 7
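The expected value is a weighted sum of the indicators that are present. The following is a minimal sketch using only the three weights quoted in this issue (9, -1 and 4). The remaining AHRQ weights are not listed here and are omitted.

```r
# Only the weights quoted in this issue; all other conditions are absent for this patient
w <- c(chf = 9, hypc = -1, hypothy = 4)
patient_13 <- c(chf = 1, hypc = 1, hypothy = 1)

# Weighted score = sum of weight * indicator
sum(w[names(patient_13)] * patient_13)
```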

Data types in `help()`-text, `assign0` specifically

Hi! Thanks for writing this package.

It's harder than I'd like as a new user to start using it, since e.g. help(comorbidity) doesn't specify very clearly what is expected for each argument. The worst case is assign0, whose help text has no mention of the data type, only a long list of the comorbidities affected. I had to read the example to see that it expected a logical. Adding this would also make the suggestions shown while typing more helpful.

(As a further improvement, I suppose the help for id and code could read "Column name of x" instead of just "Column of x". The current wording initially had me wondering whether the function wanted that vector extracted from x beforehand, or just the name; thankfully the examples at the bottom of the help text answered my question. It could well be argued that this was a weird idea of mine, so discard this if you find it perfectly clear already.)

The help text for tidy.codes, on the other hand, is very clear!

I suggest:

Add "If TRUE; " to the beginning of the assign0-helptext.
Add a small example of what assign0 does to the examples - preferably for score impact as well.

Versions of ICD-10 codes

Hi All,

Thank you for this excellent package. This is not a bug, just a question. Am I right in my understanding that the package currently identifies Elixhauser conditions based on the 2011 version of the ICD-10 codes? If the codes for something such as diabetes are now recorded with new codes, would this then prevent the comorbidity() function from identifying the condition?

Thanks

Sebastien

Possible typo in code for the Enhanced-ICD-9-CM Charlson Comorbidity Score algorithm

I believe there may be a typo stemming from the Quan et al. paper from 2005 in the Enhanced-ICD-9-CM column of Table 1 for the coding algorithm of the Charlson Comorbidity Score under the Peripheral vascular disease comorbidity. The code in question is listed as "47.1", but I believe this may be a typo and should actually be "447.1". In the most recent ICD-9-CM code dataset, a code of "47.1" (or 0471) is listed as "Meningitis due to echo virus", while 447.1 (or 4471) is listed as "Stricture of artery", which appears to be a much more appropriate diagnosis under this particular comorbidity. If this has already been addressed, I apologize for posting this issue again. I noticed this potential error in the Quan et al. paper and also on the website here:
https://cran.r-project.org/web/packages/comorbidity/vignettes/comorbidityscores.html

Handling of mixed ICD-9 and ICD-10

Hi there,
First of all, thanks for the great package which has worked for me like a charm so far. I was wondering if the comorbidity function can also handle if both ICD-9 and ICD-10 codes co-exist in the same column? From the documentation I assumed that this is supported (with the option to apply different mappings using the map parameter) but wanted to double check that this is correct.
Thanks!

could not find function "score"

When I tried to calculate the Charlson score using the following code from the tutorial, I got Error in score(charlson, weights = NULL, assign0 = FALSE) : could not find function "score". I also wonder whether the unweighted score calculated by the score function is the same as the score column in the Charlson table.

unw_cci <- score(charlson, weights=NULL, assign0=FALSE)

Input cannot be a data.table

This works:

dt <- data.frame(
  EncID = 1234,
  DxCode = 'N390'
)

comorbidity(dt, id = 'EncID', code = 'DxCode', icd = 'icd10', score = 'charlson', assign0 = F)

But this does not:

dt <- data.table(
  EncID = 1234,
  DxCode = 'N390'
)

comorbidity(dt, id = 'EncID', code = 'DxCode', icd = 'icd10', score = 'charlson', assign0 = F)

Note that a data.table is also a data.frame, yet the latter causes the following error message:

Error in data.table::setDT(x) : 
  Argument 'x' to 'setDT' should be a 'list', 'data.frame' or 'data.table'

Using data.table version 1.12.6 and comorbidity 0.5.2

Release comorbidity 1.0.2

Prepare for release:

Submit to CRAN:

devtools::submit_cran()
Approve email

Wait for CRAN...

Accepted 🎉
usethis::use_github_release()

Is parallel processing actually supported?

This is a very nice package which has saved me quite some time, as others have already highlighted. The paper proposed for citation states that, "Parallel computing is supported out of the box to mitigate this potential problem, with no additional programming required by the user: it is sufficient to set the argument parallel = TRUE when calling comorbidity." (last paragraph on p. 1 in https://joss.theoj.org/papers/10.21105/joss.00648). Was this feature removed, or has it never made it into the released version? It's not in the documentation, the source code does not seem to provide parallelisation, and using the argument parallel = TRUE gives the expected error message that it's an unused argument.

Give me a shout if you need a hand implementing this feature.

Cheers,
Ben

README suggestions

It would be helpful to highlight that samp_diag is a function of your package. Perhaps when you illustrate how you can simulate ICD codes, you can explicitly say something like "using the samp_diag() function it is possible to simulate...". I think it was a little unclear because you had written "Using comorbidity" and I wasn't sure if you were talking about the package or the function.
Similarly, further down in the README when you discuss computing the Charlson and Elixhauser scores, I think you can again add something like, "Using the comorbidity() function, we could compute..."
When you simulate x for five individuals ("Conversely, say we have 5 individuals with a total of 100 ICD-9 diagnostic codes:..."), you follow this by calculating the scores on x9, the three-individual dataset. I think you meant to use x rather than x9 in those last two examples.

Review issue URL: openjournals/joss-reviews#648

weights for Charlson calc

The doco says this on WEIGHTS:

"Each condition from the Charlson score is assigned a score when computing the weighted Charlson index, irrespectively of the coding system utilised. In particular, diabetes with complications, hemiplegia/paraplegia, renal disease, and malignancies are assigned a score of 2; moderate/severe liver disease is assigned a score of 3; metastatic solid tumour and AIDS/HIV are assigned a score of 6; the remaining comorbidities are assigned a score of 1. comorbidity allows the option of applying a hierarchy of comorbidities should a more severe version be present: by choosing to do so (and that is the default behaviour of comorbidity) a type of comorbidity is never computed more than once for a given patient."

Those are the weights generally used before Quan et al updated the ICD-code lists and derived new weights from a newer and much larger dataset, as shown below. The AIDS/HIV @ 6 was the clue that I first noticed.

My thought is that the Quan weights should at least be an option. Making it the default at this point might be disruptive to existing users.

Doug
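The weighted Charlson index described in the quoted documentation is the sum of the weights of the conditions present. The following is a minimal sketch using the original Charlson weights from that quote, with the column names the package uses in a later issue. A Quan-weights option would simply swap in a different named vector.

```r
# Original Charlson weights as described in the quoted documentation
charlson_w <- c(mi = 1, chf = 1, pvd = 1, cevd = 1, dementia = 1, cpd = 1,
                rheumd = 1, pud = 1, mld = 1, diab = 1, diabwc = 2, hp = 2,
                rend = 2, canc = 2, msld = 3, metacanc = 6, aids = 6)

# Toy patient: only these conditions are present (all others are 0)
patient <- c(mi = 1, diabwc = 1, hp = 1)

sum(patient)                              # unweighted count of conditions
sum(charlson_w[names(patient)] * patient) # weighted Charlson index
```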

Release comorbidity 1.0.5

Prepare for release:

Submit to CRAN:

usethis::use_version('patch')
devtools::submit_cran()
Approve email

Wait for CRAN...

Allow spaces in id column name

The way #20 was fixed does not allow for the id column's name to contain spaces.

dt <- data.frame(
  `Enc ID` = 1234,
  DxCode = 'N390'
)

comorbidity(dt, id = 'Enc ID', code = 'DxCode', icd = 'icd10', score = 'charlson', assign0 = F)

What you get is an error message:

Error in comorbidity(dt, id = "Enc ID", code = "DxCode", icd = "icd10",  : 
  1 assertions failed:
 * Variable 'id': Must be a subset of {'Enc.ID','DxCode'}, but is {'Enc ID'}.
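Note that the error lists the available columns as 'Enc.ID' and 'DxCode'. By default (check.names = TRUE), data.frame() rewrites syntactically invalid names, so the reprex never actually contains a column called 'Enc ID'. The minimal illustration below shows that passing check.names = FALSE keeps the space.

```r
# Default: "Enc ID" is rewritten to a syntactically valid name
names(data.frame(`Enc ID` = 1234, DxCode = "N390"))

# check.names = FALSE keeps the original column name, space included
names(data.frame(`Enc ID` = 1234, DxCode = "N390", check.names = FALSE))
```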

Package compatibility

Following some e-mail feedback, I think it would be good to add to the package the following functions:

comorbidity_check() (or some similar name) to verify if a user-supplied comorbidity dataset is compatible with the package;
set_comorbidity() (or similar) to add the required internal structure/attributes to yield a comorbidity-compatible object.

Example code using the current (2022-08-16) dev version:

library(comorbidity)
#> This is {comorbidity} version 1.0.3.
#> A lot has changed since the pre-1.0.0 release on CRAN, please check-out breaking changes here:
#> -> https://ellessenne.github.io/comorbidity/articles/C-changes.html
df <- data.frame(
  id = 1,
  mi = 1,
  chf = 0,
  pvd = 0,
  cevd = 0,
  dementia = 0,
  cpd = 0,
  rheumd = 0,
  pud = 0,
  mld = 0,
  diab = 0,
  diabwc = 1,
  hp = 1,
  rend = 0,
  canc = 0,
  msld = 0,
  metacanc = 0,
  aids = 0
)
score(x = df)
#> Error: This function can only be used on an object of class 'comorbidity', which you can obtain by using the 'comorbidity()' function. See ?comorbidity for more details.

class(df) <- c("comorbidity", class(df))
attr(df, "map") <- "charlson_icd10_quan"
score(x = df, assign0 = FALSE)
#> [1] 3
#> attr(,"map")
#> [1] "charlson_icd10_quan"
score(x = df, weights = "quan", assign0 = FALSE)
#> [1] 3
#> attr(,"map")
#> [1] "charlson_icd10_quan"
#> attr(,"weights")
#> [1] "quan"

df$mi <- NULL
score(x = df, assign0 = FALSE)
#> Error in `[.data.frame`(x, , names(.maps[[map]])): undefined columns selected

Created on 2022-08-16 by the reprex package (v2.0.1)
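The proposed check could report which required columns are missing, rather than failing later with "undefined columns selected". The following is a minimal sketch. The function name follows the proposal above, but its behaviour here is an assumption. The required columns are the Charlson columns from the reprex.

```r
# Hypothetical implementation of the proposed check (not package API)
comorbidity_check <- function(x, required) {
  missing_cols <- setdiff(required, names(x))
  if (length(missing_cols) > 0) {
    warning("Missing columns: ", paste(missing_cols, collapse = ", "), call. = FALSE)
    return(FALSE)
  }
  TRUE
}

charlson_cols <- c("mi", "chf", "pvd", "cevd", "dementia", "cpd", "rheumd", "pud",
                   "mld", "diab", "diabwc", "hp", "rend", "canc", "msld",
                   "metacanc", "aids")
df <- as.data.frame(as.list(setNames(rep(0, length(charlson_cols)), charlson_cols)))
comorbidity_check(df, charlson_cols)  # all columns present

df$mi <- NULL
comorbidity_check(df, charlson_cols)  # warns about "mi" and returns FALSE
```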

Residual use of `ami` from #53

The code group weights still refer to acute MI (ami) rather than MI (mi), despite the name change made to the codes in #53.

It looks like this is the case because 92d59b7 touched make-mapping.R only vs. 6c1dac2 which touched both make-mapping.R and make-weights.R

I just happened to notice it in passing while tracking down an unexpected result that ultimately turned out to be the sub tree grepping described in #9. It is not obvious to me that this impacts function as it seems like weights are applied by order, not name, but it's easy to imagine confusion arising.
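Matching weights to indicator columns by name, rather than by position, would surface this kind of mismatch immediately. The following is a minimal sketch with toy weights and indicators. The values are hypothetical, not the package's.

```r
# Toy weights still using the old name, indicators using the new one
weights <- c(ami = 1, chf = 1, diabwc = 2)
indicators <- data.frame(id = 1, mi = 1, chf = 0, diabwc = 1)

# A name-based check reveals the mismatch...
setdiff(names(weights), names(indicators))

# ...whereas position-based multiplication silently "works"
sum(unlist(indicators[-1]) * weights)
```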

Missing weighting algorithm from a pre-1.0.0 version

Re-implement the weighting from the Moore (2017) paper: https://pubmed.ncbi.nlm.nih.gov/28498196/

Release comorbidity 1.0.0

Important:

Review/merge #44

Prepare for release:

Submit to CRAN:

devtools::submit_cran()
Approve email

Wait for CRAN...

Accepted 🎉
~~usethis::use_github_release()~~
usethis::use_dev_version()

Only numeric as ID ????

I got the following error related to the ID column in version 1.0.2. In former versions I could use letters in the ID column.

Error in data.table::setnafill(x = x, type = "const", fill = 0L) :
'x' argument must be numeric type, or list/data.table of numeric types
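In the later .matchit source printed on this page, setnafill() is restricted to the non-id columns via cols = which(names(x) != id), which avoids this error for character ids. The following minimal sketch shows the same idea in base R on toy data.

```r
# Toy result with a character id and missing indicator values
x <- data.frame(id = c("a", "b", "c"), chf = c(1, NA, 0), canc = c(NA, 1, NA),
                stringsAsFactors = FALSE)

# Fill NAs with 0 only in the indicator columns, leaving the id untouched
ind_cols <- setdiff(names(x), "id")
x[ind_cols][is.na(x[ind_cols])] <- 0
x
```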

a possible bug in .matchit

> comorbidity:::.matchit
function (x, id, code, regex)
{
    mvb <- id
    backup <- x[, ..mvb]
    backup <- unique(backup)
    x <- stats::na.omit(x)
    if (nrow(x) == 0)
        stop("No non-missing data, please check your input data",
            call. = FALSE)
    ..cd <- unique(x[[code]])
    loc <- sapply(X = regex, FUN = function(p) stringi::stri_subset_regex(str = ..cd,
        pattern = p))
    loc <- utils::stack(loc)
    data.table::setDT(loc)
    data.table::setnames(x = loc, new = c(code, "ind"))
    x <- merge(x, loc, all.x = TRUE, allow.cartesian = TRUE,
        by = code)
    x[, `:=`((code), NULL)]
    x <- unique(x)
    mv <- c(id, "ind")
    xin <- x[, ..mv]
    xin[, `:=`(value, 1L)]
    x <- data.table::dcast.data.table(xin, stats::as.formula(paste(id,
        "~ ind")), fill = 0)
    if (!is.null(x[["NA"]]))
        x[, `:=`(`NA`, NULL)]
    x <- merge(x, backup, by = id, all.y = TRUE)
    data.table::setnafill(x = x, type = "const", fill = 0L, cols = which(names(x) !=
        id))
    for (col in names(regex)) {
        if (is.null(x[[col]]))
            x[, `:=`((col), 0L)]
    }
    data.table::setcolorder(x, c(id, names(regex)))
    return(x)
}
<bytecode: 0x555947ff3028>
<environment: namespace:comorbidity>

Error in data.table::setnames(x = loc, new = c(code, "ind")) :
[2023-05-02, 18:49:42 UTC] {subprocess.py:93} INFO - When 'new' is provided, 'old' must be provided too
[2023-05-02, 18:49:42 UTC] {subprocess.py:93} INFO - Calls: ... eval -> eval -> comorbidity -> .matchit ->
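The error text indicates that setnames() was called with new = but without old =. Passing old = names(loc) explicitly to data.table::setnames(), or renaming with base names<-, avoids that particular error.

Separately, sapply() in the printed code returns a matrix rather than a list whenever every pattern matches the same number (greater than one) of codes. lapply() always returns a named list, which utils::stack() turns into the expected two-column data frame.

The minimal, data.table-free sketch below illustrates both points. It uses base grep() in place of stringi::stri_subset_regex().

```r
codes <- c("I10", "I11", "C50", "C51")
regex <- list(hyp = "^I1", canc = "^C5")
match_codes <- function(p) grep(p, codes, value = TRUE)

loc_s <- sapply(regex, match_codes)  # both patterns match 2 codes: simplified to a 2 x 2 matrix
loc_l <- lapply(regex, match_codes)  # always a named list

loc <- utils::stack(loc_l)           # two columns: "values" and "ind"
# Base R rename; with data.table the equivalent is
# data.table::setnames(loc, old = names(loc), new = c("code", "ind"))
names(loc) <- c("code", "ind")
loc
```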

Charlson comorbidity index conditions names

Hi, there is a small detail I am concerned about.

When I calculated the comorbidities for each patient, I found COPD in the CCI. However, according to Quan (2005), it should be chronic pulmonary disease. I tested codes for chronic pulmonary disease and COPD as follows:
df <- data.frame(id = c(1, 2), code = c("J64", "J441")), where J64 does not belong to COPD, but the result still counted J64 as COPD. So I suspect the labels were mixed up: COPD, which should have been in the Elixhauser score, was put into the CCI, while the chronic pulmonary disease category in the Elixhauser score should have been COPD.

I sincerely look forward to your reply. And thank you for your time.

Add support for ICD-9

It would be extremely helpful if ICD-9 support is also added (based on the Deyo et al. paper for Charlson, and a similar source for Elixhauser).

Possible speed increase

Hello

This package is great, and saved me doing a lot of coding. I have some suggestions that could significantly increase its speed. The main idea relies on the assumption that each code only relates to one comorbidity (I believe this is true?), in which case:

The codes can be replaced by their comorbidity group name (or dropped). The quickest way to do this is to transform the code variable to a factor and then replace the levels of the factor using regex matching (rather than checking every code individually).
The replaced variable is then transposed from long to wide.


## Load libraries
library(comorbidity)
library(data.table)
library(tictoc)

## User function to replace factor levels using regex matching
### 'replace' must be a named list of regex codes, unmatched codes are dropped.
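xfactor <- function(x, replace) {
  x <- factor(x)
  levels_tmp <- levels(x)
  # Relabel every level matching the i-th regex with that regex's name
  for (i in seq_along(replace))
    levels_tmp[grepl(replace[[i]], levels_tmp)] <- names(replace)[i]
  levels(x) <- levels_tmp
  # Keep only the comorbidity names as levels (uses `replace`, not a global `regex`);
  # unmatched codes become NA
  factor(x, levels = names(replace))
}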

## Create example dataset
### Number of patients
n_ids <- 100000
### Average number of codes per patient
n_codes <- 10

set.seed(1)
dfr <- data.frame(
  id = sample(1:n_ids, size = n_ids * n_codes, replace = TRUE),
  code = sample_diag(n_ids * n_codes),
  stringsAsFactors = FALSE
)

dfr <- dfr[order(dfr$id),]

### Set options
id = "id"
code = "code"
score = "charlson"
icd = "icd10"
regex <-  comorbidity:::lofregex[[score]][[icd]]

## Using current scoring algorithm
tictoc::tic()
### Split by ID
x1 <- utils::unstack(dfr, form = stats::as.formula(paste(code, id, sep = "~")))
### Run scoring algorithm
x1 <- comorbidity:::.score(x1, id = id, score = score, icd = icd, parallel = TRUE, mc.cores = 4)
x1[,-1] <- lapply(x1[,-1], as.integer)
tictoc::toc()
52.33 sec elapsed

## New algorithm using base R
x2 <- dfr
tictoc::tic()
x2$code_f <- xfactor(x = x2[, code], replace = regex)
x2 <- unique(x2[, c(id, "code_f")])
x2 <- reshape2::dcast(x2, id ~ code_f, length, value.var = "code_f", fill = 0)
x2$`NA` <- NULL
x2[, id] <- as.character(x2[, id])
x2[,-1] <- lapply(x2[,-1], as.integer)
tictoc::toc()
#> 12.67 sec elapsed

identical(x1, x2)
#> [1] TRUE

## A further gain of speed using the data.table package
x3 <- dfr

tictoc::tic()
setDT(x3)
x3[, code_f := xfactor(x3[, code], regex)]
x3 <- dcast.data.table(unique(x3[, .(id, code_f, value = 1L)]), id ~ code_f, fill = 0)
x3[, `NA` := NULL]
x3[, id := as.character(id)]
setDF(x3)
tictoc::toc()
#> 0.57 sec elapsed

identical(x1, x3)
#> [1] TRUE

Add test for expected number of rows

Loop B times: in each iteration, simulate a dataset with a random number of rows b, and check that comorbidity() returns a dataset with the expected number of rows (one per unique ID).
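A minimal sketch of that test loop in base R follows. check_expected_rows() and toy_map() are hypothetical helpers, and the code list is illustrative; toy_map() only stands in for the package so the sketch runs on its own. With the package installed, map_fn could instead be function(df) comorbidity(x = df, id = "id", code = "code", map = "charlson_icd10_quan", assign0 = FALSE), with codes generated by sample_diag(), and the call wrapped in a testthat test.

```r
# Hypothetical test helper: simulate B datasets with a random number of rows b
# and check that `map_fn` returns exactly one row per unique id
check_expected_rows <- function(map_fn, B = 20, max_rows = 1000, n_ids = 50) {
  codes <- c("I210", "I500", "E119", "C509", "J440")  # illustrative ICD-10 codes
  for (i in seq_len(B)) {
    b <- sample.int(max_rows, 1)  # random number of rows for this replicate
    sim <- data.frame(
      id = sample(seq_len(n_ids), b, replace = TRUE),
      code = sample(codes, b, replace = TRUE),
      stringsAsFactors = FALSE
    )
    out <- map_fn(sim)
    expected <- length(unique(sim$id))
    if (nrow(out) != expected) {
      stop(sprintf("replicate %d: expected %d rows, got %d", i, expected, nrow(out)))
    }
  }
  invisible(TRUE)
}

# Hypothetical stand-in for comorbidity(): one row per id with its code count
toy_map <- function(df) {
  ids <- sort(unique(df$id))
  data.frame(id = ids, n_codes = as.vector(table(factor(df$id, levels = ids))))
}

set.seed(1)
check_expected_rows(toy_map, B = 10)
```

Drawing b at random in each replicate exercises both small datasets (where some IDs may be missing) and large ones, which a single fixed-size test would not.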

Consider Swedish version of the Charlson comorbidity index

See e.g. here and here.

Variable labels disappearing after deriving scores

Possibly related to #39: in the code below, I can see the variable labels and access them through the variable.labels attribute of the data frame:

library(comorbidity)
set.seed(1)
x <- data.frame(
  id = sample(1:15, size = 200, replace = TRUE),
  code = sample_diag(200),
  stringsAsFactors = FALSE
)

# Charlson score based on ICD-10 diagnostic codes:
x1 <- comorbidity(x = x, id = "id", code = "code", map = "charlson_icd10_quan", assign0 = FALSE)
attributes(x1)

However, if I also compute the score (since I'm interested in both the score and the underlying comorbidities), I lose the variable.labels attribute. I'm using the tidyverse here since it's part of my workflow:

library(tidyverse)
library(comorbidity)
set.seed(1)
x <- data.frame(
  id = sample(1:15, size = 200, replace = TRUE),
  code = sample_diag(200),
  stringsAsFactors = FALSE
)

# Charlson score based on ICD-10 diagnostic codes:
x1 <- comorbidity(x = x, id = "id", code = "code", map = "charlson_icd10_quan", assign0 = FALSE) %>%
  score(x = ., weights = "charlson", assign0 = FALSE)
attributes(x1)

This seems to result from storing the variable labels as an attribute of the data frame rather than of each variable. That is harder to work around now that mapping and scoring are separate functions.

Updates to Swedish version of Charlson comorbidity score

See the latest commits here: https://github.com/bjoroeKI/Charlson-comorbidity-index-revisited

comorbidity function doesn't work

Hi,
I have updated R to version 4.3.1 and also updated the comorbidity package. Unfortunately, the comorbidity() function doesn't work and produces the following error:

Error in comorbidity(x = pat_dati.long, id = "codpaz", code = "ICD", map = "elixhauser_icd9_quan", :
unused arguments (x = pat_dati.long, id = "codpaz", code = "ICD", map = "elixhauser_icd9_quan", assign0 = TRUE, labelled = TRUE, tidy.codes = TRUE).

Here are my data (pat_dati.long); the ID column is codpaz and the code column is ICD.

head(pat_dati.long)
codpaz sesso ETA CLASSI_ETA PESO BMI CLASSE_BMI prog_sdo pat ICD
1 100448 F 79 70-79 83 NA 16612 pat1 71515
2 100448 F 79 70-79 83 NA 16612 pat2 4011
3 100448 F 79 70-79 83 NA 16612 pat3
4 100448 F 79 70-79 83 NA 16612 pat4
5 100448 F 79 70-79 83 NA 16612 pat5
6 100448 F 79 70-79 83 NA 16612 pat6

Thank you for your attention
Gianluca

Peripheral vascular disease - regex patterns

Hello,

I was comparing this package with a Stata package (http://fmwww.bc.edu/RePEc/bocode/c/charlson.html) and found a difference.

In "Internal Dataset #1: List of regex patterns" for ICD-9-CM (comorbidity/data-raw/make-data.R), I think one code is incorrect: the Peripheral vascular disease group includes code 47.1.

This code also appears in the paper "Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data" by Quan H et al. (https://www.ncbi.nlm.nih.gov/pubmed/16224307).
However, if you look the code up, it does not seem to be the right one:
447.1 - http://icd9.chrisendres.com/index.php?srchtype=diseases&srchtext=447.1&Submit=Search&action=search
47.1 - http://icd9.chrisendres.com/index.php?srchtype=diseases&srchtext=47.1&Submit=Search&action=search
I think a 4 is missing: shouldn't the correct code be 447.1?

In this table (http://mchp-appserv.cpe.umanitoba.ca/concept/Charlson%20Comorbidities%20-%20Coding%20Algorithms%20for%20ICD-9-CM%20and%20ICD-10.pdf), the code is 447.1 instead of 47.1.
The Stata package I was comparing against also uses 447.1.

Something else is missing in the same file mentioned above:
lofregex[["charlson"]][["icd9"]][["copd"]] <- "^4168|^4169|^490|^491|^492|^493|^494|^495|^496|497|^498|^499|^500|^501|^502|^503|^504|^505|^5064|^5081|^5088"

A "^" is missing before code 497, so that alternative matches 497 anywhere in a code rather than only at the start.

Try these two examples:
id | code
1 | 497
2 | V497
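The effect of the missing anchor can be checked directly with grepl() in base R. The pattern below is the COPD string quoted in the issue; copd_anchored is a hypothetical corrected version, not the package's current definition.

```r
# COPD pattern as quoted in the issue: "497" is the only alternative without "^"
copd_unanchored <- paste0(
  "^4168|^4169|^490|^491|^492|^493|^494|^495|^496|497|^498|^499|",
  "^500|^501|^502|^503|^504|^505|^5064|^5081|^5088"
)
# Same pattern with the start-of-string anchor restored before 497
copd_anchored <- sub("|497|", "|^497|", copd_unanchored, fixed = TRUE)

codes <- c("497", "V497")  # the two example codes from the issue
unanchored_hits <- grepl(copd_unanchored, codes)
anchored_hits <- grepl(copd_anchored, codes)
unanchored_hits  # TRUE TRUE: "V497" matches because "497" may occur anywhere
anchored_hits    # TRUE FALSE: only codes starting with 497 match
```

Since alternation in a regex applies the "^" to each alternative separately, every alternative needs its own anchor.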

Unable to access/modify variable labels

I would like to modify the variable labels (from the labelled = TRUE option) and include them in data exported to Stata with haven. The variable labels show up in the RStudio viewer as expected, but I'm unable to find the attribute anywhere. Using the code below (modified from the help file):

library(comorbidity)

data10 <- data.frame(
  id = sample(1:10, size = 250, replace = TRUE),
  code = sample_diag(n = 250, version = "ICD10_2011"),
  stringsAsFactors = FALSE
)
data10 <- data10[order(data10$id), ]

elixhauser10 <- comorbidity(x = data10, 
                            id = "id", 
                            code = "code", 
                            score = "elixhauser", 
                            icd = "icd10",
                            labelled = TRUE,
                            assign0 = FALSE)
str(elixhauser10)

I would expect attr(elixhauser10$id, "variable.labels") to return "ID" (as shown in the RStudio viewer), but it returns NULL instead; attributes(elixhauser10$id) also returns NULL.
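Assuming, as the earlier variable-labels report suggests, that the labels are stored in a single variable.labels attribute on the data frame, they can be read with attr(elixhauser10, "variable.labels") and copied onto the individual columns. The sketch below uses a toy data frame in place of the package output; labels_to_columns() is a hypothetical helper, and it assumes the attribute is a character vector either named by column or given in column order. Each label is written to a per-column "label" attribute, which is the attribute haven reads as the variable label when writing Stata files.

```r
# Toy stand-in for labelled output: the labels live on the data frame itself
x <- data.frame(id = c("1", "2"), ami = c(0L, 1L), stringsAsFactors = FALSE)
attr(x, "variable.labels") <- c("ID", "Myocardial infarction")  # hypothetical labels

# A single column does not carry the data-frame attribute
is.null(attr(x$id, "variable.labels"))  # TRUE

# Hypothetical helper: copy data-frame-level labels onto each column
labels_to_columns <- function(df, attr_name = "variable.labels") {
  labs <- attr(df, attr_name)
  if (is.null(labs)) return(df)
  # Assume unnamed labels are given in column order
  if (is.null(names(labs))) names(labs) <- names(df)[seq_along(labs)]
  for (nm in intersect(names(labs), names(df))) {
    attr(df[[nm]], "label") <- labs[[nm]]
  }
  df
}

y <- labels_to_columns(x)
attr(y$ami, "label")  # "Myocardial infarction"

# Labels can now be edited column by column before export
attr(y$ami, "label") <- "Acute myocardial infarction"
```

Keeping the label on the column means it follows that column when it is selected or extracted, instead of depending on the data frame that holds it.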

Remove startup message

{comorbidity} 1.0.0 was released on CRAN on 2022-01-17. When the package is loaded, a startup message is printed to the console with a certain probability, to highlight that there were significant API changes.

The startup message should be removed in a future version (e.g., 1.1.0), which will be released at least a year after version 1.0.0 (no release is planned at the moment).
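A probabilistic startup message like the one described above could work roughly as follows. This is a hypothetical sketch, not the package's actual code: startup_prob, maybe_startup_message() and the message text are illustrative. It relies only on base R's .onAttach() hook and packageStartupMessage(), so removing the message later amounts to deleting the hook.

```r
# Hypothetical sketch of a probabilistic startup message (not the package's code)
startup_prob <- 0.25  # illustrative probability of showing the message

maybe_startup_message <- function(prob = startup_prob) {
  # runif(1) is uniform on (0, 1), so the message appears with probability `prob`
  if (stats::runif(1) < prob) {
    packageStartupMessage("Illustrative message: the API changed substantially in version 1.0.0.")
  }
  invisible(NULL)
}

# In a package, the attach hook would call the helper each time it is attached
.onAttach <- function(libname, pkgname) {
  maybe_startup_message()
}
```

Because the message goes through packageStartupMessage(), users can already silence it with suppressPackageStartupMessages(library(comorbidity)).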
