Some recent papers have shown that, for explainability, stacking embeddings learned from claims/EHRs of different care settings can be valuable. This also makes intuitive sense, since the distribution and co-occurrence of diagnosis codes differ between the hospital and the clinic. Could you therefore provide additional information on the claims dataset used to learn these embeddings, including the following:
Were these claims for commercial insurance or from public insurers (e.g., Medicare/Medicaid)?
Were these claims from a mix of care settings, inpatient only, or outpatient only?
The first column in stanford_cuis_svd_300.txt contains numbers (e.g. 4411984) which don't seem to be CUIs (CUIs typically begin with the character 'C'). Is there a way to map these numbers to CUIs?
The first column in DeVine_etal_200.txt does look like it contains legitimate CUIs (e.g. C0030705).
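For anyone who wants to reproduce the check on both files, here is a minimal sketch of how the first column could be inspected. It assumes each row is whitespace-delimited with the concept identifier in the first field, and that a CUI is the letter 'C' followed by seven digits; `summarize_first_column` is a hypothetical helper, not something shipped with the embeddings. If a file starts with a header row (for example a vocabulary size and dimension), that row will simply be counted as "other".

```python
import re
from collections import Counter

# Assumption: a CUI is 'C' followed by exactly seven digits (e.g. C0030705).
CUI_PATTERN = re.compile(r"^C\d{7}$")

def summarize_first_column(lines):
    """Count how many first-column tokens look like CUIs and how many do not."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        key = parts[0]
        counts["cui_like" if CUI_PATTERN.match(key) else "other"] += 1
    return counts

# Illustrative rows only; the vector values here are made up.
sample = ["C0030705 0.12 -0.40", "4411984 0.05 0.33"]
print(summarize_first_column(sample))
```

To run it on a real file, pass an open file handle, e.g. `summarize_first_column(open("DeVine_etal_200.txt"))`.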
I am confused about the codes in the IDX_IPR_C_N_L_month_ALL_MEMBERS_fold1_s300_w20_ss5_hs_thr12.txt file inside claims_codes_hs_300.txt.gz. Do they refer to ICD-9 codes? When I looked for a particular ICD-9 code in the file, I could not find a match.
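One possible reason for the failed lookup is a formatting difference, so a more tolerant search may help. The sketch below tries a code with and without the decimal point, and with and without a code-type prefix. The "IDX_" prefix is only a guess based on the file name and has not been confirmed by the authors; `icd9_candidates` and `find_matches` are hypothetical helper names.

```python
def icd9_candidates(code, prefixes=("", "IDX_")):
    """Return spellings of an ICD-9 code to try against the file's tokens.

    Assumption: tokens may carry a type prefix such as "IDX_" (guessed from
    the file name) and may or may not keep the decimal point.
    """
    code = code.strip().upper()
    undotted = code.replace(".", "")
    forms = [code] if undotted == code else [code, undotted]
    return [prefix + form for prefix in prefixes for form in forms]

def find_matches(code, tokens):
    """Return the tokens that equal any candidate spelling of the code."""
    wanted = set(icd9_candidates(code))
    return [token for token in tokens if token in wanted]

# Illustrative tokens only; these are not taken from the actual file.
tokens = ["IDX_250.00", "C_99213", "IPR_45.13"]
print(icd9_candidates("250.00"))
print(find_matches("250.00", tokens))
```

If none of these spellings match, the tokens probably follow a different convention, and the mapping would need to come from the authors.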